Technology: Tools and Raw Data

Raw data

For the primary phase of the 2008 presidential election I 'captured' all of the videos that eight of the candidates posted to YouTube between February 2007 and the New Hampshire primary. They are available here:

During the 2008 campaign I captured the views per day for the videos posted to YouTube by the McCain and Obama campaigns. They are available here: campaign views per day.

For the Obama YouTube presidency, the daily views of the videos the administration is posting to YouTube are here: presidency views per day. This includes views of videos posted after the campaign and before the inauguration. Those views, however, were collected after the fact, so they are useful only as total view counts for the videos; they cannot show the dynamics of viewing.
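The reason after-the-fact totals lose the dynamics is that views per day can only be recovered as the difference between successive cumulative snapshots; if you did not capture a snapshot each day, the per-day series is gone. A minimal sketch of that computation, using made-up dates and counts rather than the real data:

```python
# Sketch: recovering views per day from daily cumulative view counts.
# The dates and counts below are hypothetical, not the actual dataset.
cumulative = {
    "2008-10-01": 12000,
    "2008-10-02": 15500,
    "2008-10-03": 21000,
}

dates = sorted(cumulative)
# Each day's views = today's cumulative total minus yesterday's.
views_per_day = {
    day: cumulative[day] - cumulative[prev]
    for prev, day in zip(dates, dates[1:])
}
print(views_per_day)  # {'2008-10-02': 3500, '2008-10-03': 5500}
```

A single snapshot taken months later gives only the last cumulative number, which is why the post-campaign collection supports totals but not dynamics.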

Tools

Tracking web communication

Many tools have been developed to track public communication on the web. Twitter search in particular developed rapidly as Twitter's user base grew in 2009, but there is tracking software for other services as well, and the Pew Foundation has conducted a series of surveys on use of the web. The list on the left consists almost wholly of reports written by tech bloggers covering technology for searching communication on the web. The list is in chronological order, oldest at the top and newest at the bottom. It will be kept up to date -- I will add to it as I read more reports.

Jaycut is a useful online video editing system. It is a little sticky about giving up the edited video, though you can share it in some straightforward ways. It is not as good a video editor as Premiere or Final Cut Pro, but it will do a pretty good job.

Archivist is a program that runs on Windows computers and searches the Twitter API for tweets. The full panoply of search constructions in the Twitter API is available in the Archivist. The advantage of the Archivist is that it stores the results on the local disk in XML format and will export them in CSV format, so it both searches and produces a file that can be analyzed. Most search programs will search but will not produce a file. It is a new and not very well tested program, and I have been doing beta work for the programmers. The record of that investigation is a collection of notes here: experimenting with Archivist.
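The XML-to-CSV step is what makes the stored searches analyzable in a spreadsheet or statistics package. A sketch of that flattening, assuming a simple element layout (the tag names here are illustrative, not the Archivist's documented schema):

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical Archivist-style export: one <tweet> element per result.
xml_data = """<tweets>
  <tweet><author>user1</author><date>2009-06-01</date><text>first tweet</text></tweet>
  <tweet><author>user2</author><date>2009-06-02</date><text>second tweet</text></tweet>
</tweets>"""

root = ET.fromstring(xml_data)
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["author", "date", "text"])
for tweet in root.findall("tweet"):
    writer.writerow([
        tweet.findtext("author"),
        tweet.findtext("date"),
        tweet.findtext("text"),
    ])

print(out.getvalue())
```

The resulting CSV can then be loaded by any analysis tool, which is the whole point of a search program that produces a file rather than just a display.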

Twitter puts limits on searches of its API. Karsten, the lead developer of the Archivist, describes the limitations this way: "Well, the tricky thing is that they keep changing the rules and the server. So it is a moving target for sure. I do know that they only allow for apps to get 1500 results at a time. So, it is possible that the archivist could miss tweets if there are more than 1500 results within the 10 minutes. I think I’m going to update it so that it polls every 5 minutes, which should keep you from getting rate limited, but if you are running multiple instances, you could get in trouble. You can only have 150 requests per hour ( http://apiwiki.twitter.com/Rate-limiting ). So, the math is pretty simple all told, as far as instances of different twitter clients and not getting rate limited."
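The "pretty simple" math is worth making explicit: a budget of 150 requests per hour is shared across every client you run, so the polling interval times the number of instances determines whether you get rate limited. A small sketch of that arithmetic:

```python
# The rate-limit arithmetic from Karsten's description:
# 150 requests per hour, shared across all running clients.
RATE_LIMIT_PER_HOUR = 150

def requests_per_hour(poll_interval_minutes, instances=1):
    """Requests per hour generated by `instances` clients polling together."""
    return instances * (60 // poll_interval_minutes)

# One instance polling every 5 minutes uses only 12 of the 150 requests...
assert requests_per_hour(5) == 12
# ...but enough simultaneous instances push past the limit.
assert requests_per_hour(5, instances=13) > RATE_LIMIT_PER_HOUR
```

This is why a single Archivist instance on a 5-minute poll is safe, while running several copies of it (or several different Twitter clients) against the same limit is not.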

socialmention will search and export the results of the search in CSV format. It will search blogs, microblogs, bookmarks, comments, events, images, news, videos, and audio. News is web news: digg, reddit, etc. It does everything one would want except that its search is not very good -- it finds much less than Archivist, for example. It has a real-time search widget, but that does not produce a file for analysis.
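Because socialmention covers so many source types in one export, the first analysis step is usually to split the rows back out by source. A sketch of loading such an export, assuming illustrative column names rather than socialmention's actual ones:

```python
import csv
import io

# Hypothetical socialmention-style CSV export; column names are
# assumptions for illustration, not the service's documented format.
csv_data = """date,source,title,link
2009-06-01,twitter,first mention,http://example.com/1
2009-06-02,blogs,second mention,http://example.com/2
"""

rows = list(csv.DictReader(io.StringIO(csv_data)))

# Group the mentions by source type (microblogs, blogs, news, ...).
by_source = {}
for row in rows:
    by_source.setdefault(row["source"], []).append(row["title"])
print(by_source)
```

The same grouping works for any of the exported source types, which is what makes a file-producing search more useful than a display-only widget.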

FriendFeed does real-time search that continually updates itself, and the search results can be embedded in a web page. But it does not export a file.

Twitter StreamGraphs is another very useful system. It searches for the most recent 1,000 tweets and displays them on a timeline. It is very handy for looking at the dynamics, though it will not go beyond 1,000 tweets.

Content analysis -- once you have text you need to do something with it. Content or textual analysis has been embedded in many computer programs. A review of programs is here and here. The review seems somewhat out of date, though it is not dated. It says The General Inquirer is available online.
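The core operation in dictionary-based content analysis of the General Inquirer sort is simple: count how often words from predefined category lists appear in a text. A minimal sketch, with illustrative word lists rather than a real coding dictionary:

```python
import re
from collections import Counter

# Toy coding dictionary in the spirit of General Inquirer categories.
# These word lists are illustrative, not a validated instrument.
categories = {
    "positive": {"hope", "win", "great"},
    "negative": {"fail", "attack", "crisis"},
}

def code_text(text):
    """Count category hits in a text by dictionary lookup."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, vocab in categories.items():
            if word in vocab:
                counts[category] += 1
    return counts

print(code_text("A great win gives hope despite the crisis"))
```

Applied to a file of collected tweets or comments, a loop over the rows with this kind of coder produces the category counts that the dedicated content-analysis packages report.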

Comments for analyses -- I can use JS-Kit. They have a widget for taking comments called Echo. The text to be inserted is: <div class="js-kit-comments" permalink=""></div><script src="http://js-kit.com/for/myweb.uiowa.edu/comments.js"></script> I am not sure how this works, but it did seem to work for collecting comments.