Data

  • Data Jujitsu: The art of turning data into product

    Smart data scientists can make big problems small.

  • Walking the tightrope of visualization criticism

    The balance, fairness and realism of our visualization criticism must improve.

  • Stories over spreadsheets

    Kris Hammond on replacing rows and columns with sentences and paragraphs.

  • The chicken and egg of big data solutions

    Are solution vendors waiting for broad Hadoop adoption before jumping in?

  • A brief history of data journalism

    Key milestones in data journalism's development.


Economic impact of open source on small business

Results from an in-depth study of open source's role in small and medium businesses.

by Mike Hendrickson | @mikehatora | +Mike Hendrickson | July 18, 2012

A few months back, Tim O’Reilly and Hari Ravichandran, founder and CEO of Endurance International Group (EIG), had a discussion about the web hosting business. They talked specifically about how much of Hari’s success had been enabled by open source software. Hari wasn’t just telling Tim his success story; he was more interested in finding ways to give back to the communities that made his success possible. The two agreed that their companies would work together to produce a report making clear just how much of a role open source software plays in the hosting industry and, by extension, in enabling the web presence of millions of small businesses.

We hope you will read this free report while thinking about all the open source projects, teams and communities that have contributed to the economic success of small businesses or local governments. Their true economic impact is hard to measure. We combed through mountains of data, built economic models, surveyed customers and had discussions with small and medium businesses (SMBs) to pull together a fairly broad-reaching dataset on which to base our study. The results are what you will find in this report.

Here are a few of the findings we derived from data provided by Bluehost (an EIG company) and from follow-on research:

  • 60% of web hosting usage is by SMBs, 71% if you include non-profits. Only 22% of hosted sites are for personal use.
  • WordPress is a far more important open source product than most people give it credit for. In the SMB hosting market, it is as widely used as MySQL and PHP, far ahead of Joomla and Drupal, the other leading content management systems.
  • Languages commonly used by high-tech startups, such as Ruby and Python, have little usage in the SMB hosting market, which is dominated by PHP for server-side scripting and JavaScript for client-side scripting.
  • Open source hosting alternatives have at least a 2:1 cost advantage relative to proprietary solutions.

Given that SMBs are widely thought to generate as much as 50% of GDP, the productivity gains to the economy as a whole that can be attributed to open source software are significant. The most important open source programs contributing to this expansion of opportunity for small businesses include Linux, Apache, MySQL, PHP, JavaScript, and WordPress. The developers of these open source projects and the communities that support them are truly unsung heroes of the economy!


Data Jujitsu: The art of turning data into product

Smart data scientists can make big problems small.

by DJ Patil | @dpatil | +DJ Patil | July 17, 2012

Having worked in academia, government and industry, I’ve had a unique opportunity to build products in each sector. Much of this product development has been around building data products. Just as methods for general product development have steadily improved, so have the ideas for developing data products. Thanks to large investments in the general area of data science, many major innovations (e.g., Hadoop, Voldemort, Cassandra, HBase, Pig, Hive, etc.) have made data products easier to build. Nonetheless, data products are unique in that they are often extremely difficult, and seemingly intractable for small teams with limited funds. Yet, they get solved every day.

How? Are the people who solve them superhuman data scientists who can come up with better ideas in five minutes than most people can in a lifetime? Are they magicians of applied math who can cobble together millions of lines of code for high-performance machine learning in a few hours? No. Many of them are incredibly smart, but meeting big problems head-on usually isn’t the winning approach. There’s a method to solving data problems that avoids the big, heavyweight solution and instead concentrates on building something quickly and iterating. Smart data scientists don’t just solve big, hard problems; they also have an instinct for making big problems small.

We call this Data Jujitsu: the art of using multiple data elements in clever ways to solve iterative problems that, when combined, solve a data problem that might otherwise be intractable. It’s related to Wikipedia’s definition of the ancient martial art of jujitsu: “the art or technique of manipulating the opponent’s force against himself rather than confronting it with one’s own force.”

How do we apply this idea to data? What is a data problem’s “weight,” and how do we use that weight against itself? These are the questions that we’ll work through in the subsequent sections.


From smartphones and continuous data comes the social MRI

Dr. Nadav Aharony used phone sensors to explore personal behaviors and community trends.

by Mac Slocum | @macslocum | +Mac Slocum | July 13, 2012

It’s clear at this point that the smartphone revolution has very little to do with the phone function in these devices. Rather, it’s the unique mix of sensors, always-on connectivity and mass consumer adoption that’s shaping business and culture.

Dr. Nadav Aharony (@nadavaha) tapped into this mix when he was working on a “social MRI” study in MIT’s Media Lab. Aharony, who recently joined us as part of our ongoing foo interview series, described his vision of the social MRI:

“If you think about it, the three things you take with you when you go out of your home are your keys, your wallet and your phone, so our phones are always with us. In aggregate, we can use the phones in many people’s pockets as a virtual imaging chamber. So, one aspect of the social MRI is this virtual imaging chamber that is collecting tens or hundreds of signals at the same time from members of the community.” [Discussed at 1:16]

Aharony’s work focused on 150 participants (about 75 families) who were given phones for 15 months. During that time, more than one million hours of “continuous sensing data” was gathered with the participants’ consent. The data was acquired and scrubbed under MIT’s ethics guidelines, and for good measure, Aharony included his own data in the dataset.

Collecting the data was just the beginning. Parsing that information and creating experiments based on emerging signals is where the applications of a social MRI became significant.

Heavy data and architectural convergence

Data is getting heavier relative to the networks that carry it around the data center.

by Jim Stogdill | @jstogdill | +Jim Stogdill | July 9, 2012

Imagine a future where large clusters of like machines dynamically adapt between programming paradigms depending on a combination of the resident data and the required processing.


Walking the tightrope of visualization criticism

The balance, fairness and realism of our visualization criticism must improve.

by Andy Kirk | @visualisingdata | +Andy Kirk | July 2, 2012

A creative field, such as visualization, will have many different interpretations and perspectives. The resolution and richness of this opinion is important to safeguard.


UK Cabinet Office relaunches Data.gov.uk, releases open data white paper

The British government further embraces open data as a means to transparency and "prosperity."

by Alex Howard | @digiphile | +Alex Howard | June 29, 2012

The Cabinet Office of the United Kingdom released a notable new white paper on open data and relaunched its flagship open data platform, Data.gov.uk. This post features interviews on open data with Cabinet Minister Francis Maude, Tim Berners-Lee and Rufus Pollock.
