Visualizing SOPA on Twitter

When I heard that Tyler Gray at Public Knowledge was looking for someone to analyze tweets that mentioned SOPA, I thought I might try Cytoscape (an open source tool used in biomedical research, but handy for large-scale data visualization) to show some of the relationships between people discussing the controversial bill on Twitter.

The result is a graph of the most active users referencing SOPA.

Public Knowledge worked with the Brick Factory to set up their slurp140 tool to record approximately 1.5 million tweets, which Tyler sent me as a 350 MB CSV file. I first used Google Refine to clean the data and narrow it down to only the tweets that were replies to someone else. This left approximately 80,000 tweets, which I then imported into R. I ranked all of the usernames by how often they appeared, both as senders and as recipients, and then picked roughly the top 1,000 users.

Since replies are sent from one user to another, the graph is directed: each edge has an origin and an arrow pointing at the recipient. There are 1,021 nodes, identified by their Twitter usernames, and 1,757 edges, a good portion of which are labeled with the text of their tweet.
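The filtering and ranking above were done in Google Refine and R; the sketch below re-creates the same idea in Python purely for illustration, not as the actual workflow. It assumes a hypothetical CSV with `from_user` and `text` columns (not necessarily slurp140's real export format), and it treats a tweet as a reply when its text begins with an @username, which is a simplification.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical sample shaped like a tweet export. The column names
# "from_user" and "text" are assumptions, not slurp140's documented schema.
SAMPLE_CSV = """from_user,text
alice,@bob SOPA would break the web
bob,@alice agreed so call your rep
carol,SOPA hearing today
dave,@alice what's the status of SOPA?
alice,@dave markup resumes next week
erin,@carol is SOPA dead yet
"""

# Simplifying assumption: a reply is a tweet whose text starts with @username.
REPLY_RE = re.compile(r"^@(\w+)")


def reply_edges(rows):
    """Yield a (sender, recipient) pair for every row that is a reply."""
    for row in rows:
        match = REPLY_RE.match(row["text"])
        if match:
            yield row["from_user"].lower(), match.group(1).lower()


def top_user_edges(rows, n_users):
    """Rank users by appearances as sender plus as recipient, keep the
    n_users highest-ranked, and drop edges that leave that set."""
    edges = list(reply_edges(rows))
    counts = Counter()
    for sender, recipient in edges:
        counts[sender] += 1
        counts[recipient] += 1
    top = {user for user, _ in counts.most_common(n_users)}
    kept = [(s, r) for s, r in edges if s in top and r in top]
    return top, kept


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
nodes, edges = top_user_edges(rows, n_users=3)
print(sorted(nodes))  # ['alice', 'bob', 'dave']
print(edges)  # [('alice', 'bob'), ('bob', 'alice'), ('dave', 'alice'), ('alice', 'dave')]
```

One design choice to note: `Counter.most_common(n)` breaks ties at the cutoff by first appearance, so a threshold on the count itself may be preferable when many users share the boundary value.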

Visualizing networks this large is more of an art than a science

I’ve tried to strike a balance between visual complexity, aesthetics, and the readability of tweets, but you’ll find that this isn’t always successful. Sometimes tweets run into nodes, sometimes edges run into labels, and sometimes the graph feels like a total mess. But that messiness is part of what made the SOPA debate on Twitter so interesting over the last month.

Thousands of people participating with plenty of cross talk.

The colors and sizes of the nodes and edges are coded in the following ways:

  • The size of a node and its label maps to the number of tweets both posted by and mentioning a user. (Ex: @BarackObama is a huge node because so many people were tweeting at him about SOPA.)
  • Node color represents the number of outgoing tweets. The greener the node, the more replies a user posted. (Ex: @Digiphile sent a lot of tweets mentioning SOPA.)
  • Edge thickness represents “edge betweenness,” which is the number of shortest paths between pairs of nodes that run through that edge. This is a rough measure of how central a given tweet is in the network. (Ex: @declanm and @mmasnick have a thick line connecting them because many shortest paths between other nodes pass through that tweet.)
  • Edge color represents the language of the tweet. (Ex: Tweets in English are blue, Spanish are yellow.)
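For readers who want to reproduce these encodings outside Cytoscape, here is a minimal Python sketch of the three node and edge quantities, assuming the reply edges are (sender, recipient) pairs. Edge betweenness is computed with Brandes' algorithm on the unweighted directed graph, counting each ordered source/target pair once and collapsing repeated replies into a single edge; Cytoscape's own tooling may normalize or weight this differently. The function names and usernames are hypothetical.

```python
from collections import Counter, defaultdict, deque


def node_metrics(edges):
    """Return (size, outgoing) Counters for a list of (sender, recipient)
    reply edges. size counts replies a user sent plus replies sent to them;
    outgoing counts only the replies they sent."""
    size, outgoing = Counter(), Counter()
    for sender, recipient in edges:
        size[sender] += 1
        size[recipient] += 1
        outgoing[sender] += 1
    return size, outgoing


def edge_betweenness(edges):
    """Unnormalized edge betweenness of an unweighted directed graph
    (Brandes' algorithm): for every edge, sum over all ordered (s, t) pairs
    the fraction of shortest s->t paths that use that edge."""
    unique = set(edges)  # repeated replies collapse into a single edge
    adj, nodes = defaultdict(list), set()
    for u, v in unique:
        adj[u].append(v)
        nodes.update((u, v))
    scores = dict.fromkeys(unique, 0.0)
    for source in nodes:
        # Breadth-first search, counting shortest paths (sigma) to each node.
        dist = dict.fromkeys(nodes, -1)
        sigma = dict.fromkeys(nodes, 0)
        preds = defaultdict(list)
        dist[source], sigma[source] = 0, 1
        order, queue = [], deque([source])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Walk back from the farthest nodes, crediting each edge on a shortest
        # path with its share of the downstream dependency.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1 + delta[w])
                scores[(u, w)] += share
                delta[u] += share
    return scores


# Usage with hypothetical usernames:
replies = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "bob")]
size, outgoing = node_metrics(replies)
print(size["bob"], outgoing["alice"])  # 3 2
print(edge_betweenness(replies)[("bob", "carol")])  # 4.0
```

In this chain the middle edge scores highest because four of the six reachable ordered pairs route through it, which is the same intuition behind the thick @declanm to @mmasnick line.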

The nodes are positioned using a “force-directed” algorithm, which is typically designed for undirected graphs, but I found it to be the most visually compelling of Cytoscape’s layout options. To learn more about force-directed graphs, take a look at this D3 tutorial visualizing the characters in Victor Hugo’s Les Misérables.
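To illustrate what a force-directed layout does, here is a minimal Fruchterman-Reingold-style sketch in Python. It is not Cytoscape's implementation; the function name, parameters, and cooling schedule are assumptions chosen for readability. Every pair of nodes repels, linked nodes attract, and a shrinking "temperature" caps how far any node moves per step. Like most force-directed layouts, it ignores edge direction.

```python
import math
import random


def spring_layout(nodes, edges, iterations=200, width=1.0, seed=0):
    """Minimal Fruchterman-Reingold layout. Repulsion k^2/d acts between
    every pair of nodes, attraction d^2/k acts along each (undirected) link,
    so linked pairs settle near the ideal distance k."""
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, width), rng.uniform(0, width)] for n in nodes}
    k = width / math.sqrt(len(nodes))  # ideal distance between nodes
    temp = width / 10  # maximum step size, reduced every iteration
    links = {frozenset(e) for e in edges if e[0] != e[1]}  # drop direction
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair pushes u away from v and vice versa.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction pulls the two ends of each link toward each other.
        for link in links:
            u, v = tuple(link)
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node at most `temp` along its net displacement, then cool.
        for n in nodes:
            dx, dy = disp[n]
            d = math.hypot(dx, dy)
            if d > 0:
                step = min(d, temp)
                pos[n][0] += dx / d * step
                pos[n][1] += dy / d * step
        temp *= 0.98
    return {n: tuple(p) for n, p in pos.items()}


# Usage with hypothetical usernames; edge direction is ignored by the layout.
layout = spring_layout(["alice", "bob", "carol", "dave"],
                       [("alice", "bob"), ("bob", "carol"), ("carol", "dave")])
print(layout)
```

The all-pairs repulsion loop is quadratic in the number of nodes, which is fine for about 1,000 users but is why production layouts typically approximate it.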

To really browse the graph, visit GigaPan, where I’ve uploaded a 32,000 x 32,000 pixel version.

I highly recommend GigaPan’s full-screen mode. I’ve also created a couple of snapshots on GigaPan that highlight interesting nodes: @BarackObama, @GoDaddy, @LamarSmithTX21, and @DarrellIssa.

If you really want, you can also download the 36 MB gigapixel file, the Cytoscape source file, and the PDF vector version of the network graph.

Thanks again to Public Knowledge, to The Brick Factory for providing the infrastructure to record the tweets, and to everyone who has helped fight against SOPA and PIPA over the last couple of months, especially those who tweeted about it.

- @fredbenenson
Written by Fred. Posted in Art, Copyright, Data, Law, Politics.

15 comments

  1. joe

    wicked awesome.

    Media companies need to stop trying to hold back technical progress (e.g., Napster) and adapt (e.g., iTunes, Rhapsody, the new Napster, Netflix).

  2. Pingback: Visualizing the SOPA discussion on Twitter - TechBeer.in | TechBeer
  3. Pingback: émergenceweb : blogue » SOPA blackout: when the Web mobilizes against censorship!
  4. Pingback: What Does SOPA Look Like On Twitter? - AllTwitter
  5. Pingback: Today's Scuttlebot: Blackout Comics and Twitter Maps - NYTimes.com
  6. Pingback: Today’s Scuttlebot: Blackout Comics and Twitter Maps | CATA NEWS
  7. Dwight Turner (@DwightTurner)

    This is wicked, I agree. I love that you explain your methodology and link to helpful tools. However, you didn’t include much analysis of your findings. Anything surprising or outstanding? Was much of the conversation dominated by arguing or general outrage? Not sure if we can make any inferences of this nature based on your findings, but I’d be curious to hear if you had any hypotheses.

    Finally, do the 80,000 tweets represent tweets in English, or tweets using a certain term or hashtag? Cool stuff. If you have time, thanks for answering.

  8. Pingback: How Twitter reacted to the SOPA blackout | Twittermania
  9. Pingback: Visualizing SOPA on Twitter | Fred Benenson’s Blog » Infographics Central
  10. Pingback: The protest against SOPA/PIPA #SOPAstrike « ilNichilista
  11. Pingback: “STOP SOPA” event tracking | 新青年·REVIEW
  12. Francesc Gómez Morales

    So amazing!!

    Thank you very much for sharing all the steps and the problems you ran into. This post is a little treasure.

  13. Pingback: Review: The #SOPA Blackout Day » By markus » netzpolitik.org
  14. Victor Pascual

    Good job! Thanks for sharing the images!
    Would it be possible to also share the graph itself in GraphML/GDF format?

    Cheers!

  15. Pingback: Stuff on Friday « blubberfisch
