This is the schema for the weighted word list...
1) Weight: The number of times the word occurs in the transcript
2) Word: The word whose frequency is being measured
3) URL: The web address associated with the word (optional)
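To make the three columns concrete, here is a minimal Python sketch of how a weighted word list with this schema could be built from raw transcript text. It is an illustration only, not the SQL Server implementation described in this post; the names WeightedWord and build_weighted_word_list, the tokenizing rule, and the sample URL are all hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, NamedTuple, Optional


class WeightedWord(NamedTuple):
    """One row of the weighted word list (hypothetical type name)."""
    weight: int                # number of occurrences in the transcript
    word: str                  # the word being counted
    url: Optional[str] = None  # optional web address for the word


def build_weighted_word_list(
    transcript: str, urls: Optional[Dict[str, str]] = None
) -> List[WeightedWord]:
    """Count each word and return rows sorted by weight, then alphabetically."""
    urls = urls or {}
    # Assumption: a "word" is a run of letters or apostrophes, case-insensitive.
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(words)
    rows = [WeightedWord(count, word, urls.get(word)) for word, count in counts.items()]
    rows.sort(key=lambda row: (-row.weight, row.word))
    return rows


sample = "The economy is strong. The economy is growing, and jobs are growing."
rows = build_weighted_word_list(sample, urls={"economy": "https://example.com/economy"})
for row in rows[:4]:
    print(row.weight, row.word, row.url or "")
```

A real list would usually also drop common stop words such as "the" and "is" before counting; that step is left out here to keep the sketch short.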
Transforming your transcript into a weighted word list is one of the services offered by Jaussi Consulting LLC. Click here for more details.
This is what the first few rows of the weighted word list for this transcript look like.
Once the weighted word list was generated, I used it as the input for creating the "Word Cloud" (also known as a "Tag Cloud") pictured below.
To accomplish this task, I created a simple SQL Server Integration Services (SSIS) package named "LoadSubSource.dtsx". It has a couple of basic components that truncate and then load a staging table. The other components are part of the SSIS package management system I engineered based on chapter two of Microsoft SQL Server 2008 Integration Services: Problem, Design, Solution (an excellent reference manual). Here is an image of the control flow design surface...
This package accomplished my first step: getting the raw transcript data into the SQL Server database. From there, I engineered a series of tables, stored procedures, functions, and views to generate the desired output. For ease of use, I call these from the following script...
The stored procedure here has parameters that limit the number of rows in the output, filter for a specific speaker (optional), and filter for a minimum frequency (word weight).
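As a rough illustration of how those three parameters interact, here is a small Python sketch that applies the same filters to per-speaker word counts. It is not the stored procedure itself; the function name top_words, the parameter names, and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Each input row is (speaker, word, weight): a hypothetical shape for illustration.
SpeakerCount = Tuple[str, str, int]


def top_words(
    rows: List[SpeakerCount],
    max_rows: int,
    speaker: Optional[str] = None,
    min_weight: int = 1,
) -> List[Tuple[int, str]]:
    """Return up to max_rows (weight, word) pairs, heaviest first."""
    totals: Dict[str, int] = {}
    for row_speaker, word, weight in rows:
        # Optional speaker filter: skip rows spoken by anyone else.
        if speaker is not None and row_speaker != speaker:
            continue
        totals[word] = totals.get(word, 0) + weight
    # Minimum-frequency filter, applied after weights are summed.
    result = [(weight, word) for word, weight in totals.items() if weight >= min_weight]
    result.sort(key=lambda pair: (-pair[0], pair[1]))
    # Row limit, applied last so the heaviest words are kept.
    return result[:max_rows]


counts = [
    ("Alice", "economy", 3),
    ("Alice", "jobs", 1),
    ("Bob", "economy", 2),
    ("Bob", "taxes", 4),
]
all_speakers = top_words(counts, max_rows=2)
bob_only = top_words(counts, max_rows=10, speaker="Bob", min_weight=3)
print(all_speakers)
print(bob_only)
```

Applying the row limit after sorting means a small limit keeps the most frequent words, which is what a word cloud typically needs.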