Strata - Making Data Work

Software that keeps an eye on Grandma

Networked sensors and machine learning make it easy to see when things are out of the ordinary.

by Jon Bruner | @JonBruner | +Jon Bruner | November 22, 2012

Much of health care — particularly for the elderly — is about detecting change, and, as the mobile health movement would have it, computers are very good at that. Given enough sensors, software can model an individual’s behavior patterns and then figure out when things are out of the ordinary — when gait slows, posture stoops or bedtime moves earlier.

Technology already exists that lets users set parameters for households they’re monitoring. Systems are available that send an alert if someone leaves the house in the middle of the night or sleeps past a preset time. Those systems involve context-specific hardware (e.g., a bed-pressure sensor) and conscientious modeling (you have to know what time your grandmother usually wakes up).
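To make the idea of hand-set parameters concrete, here is a minimal sketch of such a rule-based monitor. It is not taken from any particular product: the sensor names (`front_door`, `bed_pressure`), the event format and the threshold times are all assumptions chosen for illustration.

```python
from datetime import datetime, time

# Hypothetical thresholds that a caregiver would have to configure by hand.
NIGHT_START = time(23, 0)
NIGHT_END = time(5, 0)
LATEST_WAKE = time(9, 0)


def is_night(t: time) -> bool:
    """True if t falls in the overnight window, which wraps past midnight."""
    return t >= NIGHT_START or t < NIGHT_END


def check_events(events, now):
    """Return alert strings for one day's (timestamp, sensor, state) events."""
    alerts = []
    out_of_bed = False
    for ts, sensor, state in events:
        if sensor == "front_door" and state == "open" and is_night(ts.time()):
            alerts.append(f"door opened at night ({ts:%H:%M})")
        if sensor == "bed_pressure" and state == "empty":
            out_of_bed = True
    # Simplification: any "empty" reading counts as having gotten up.
    if not out_of_bed and now.time() >= LATEST_WAKE:
        alerts.append(f"still in bed after {LATEST_WAKE:%H:%M}")
    return alerts
```

Every constant at the top is something a person has to know in advance, which is exactly the "conscientious modeling" burden described above.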

The next step would be a generic system: one that, following simple setup, would learn the habits of the people it monitors and then detect the sorts of problems that beset elderly people living alone (falls, disorientation, and so forth) as well as more subtle changes in behavior that could signal other health problems.
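One simple way to picture "learning habits" is to compare each new observation against the person's own history rather than a preset value. The sketch below uses a plain z-score test; this is a generic statistical technique chosen for illustration, not the method of any system mentioned here, and the function name and threshold are assumptions.

```python
from statistics import mean, stdev


def is_unusual(history, today, threshold=3.0):
    """Flag today's value if it lies more than `threshold` standard
    deviations from the mean of the person's own history."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold


# Bedtimes expressed as minutes after noon (600 = 10 p.m.); values are made up.
bedtimes = [600, 610, 595, 605, 600, 590, 615]
print(is_unusual(bedtimes, 540))  # a 9 p.m. bedtime against this history
```

The same test could be applied to any measurable habit, such as wake time or time spent in one room, without anyone having to enter the expected value by hand.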

A group of researchers from Austria and Turkey has developed just such a system, which they presented at the IEEE’s Industrial Electronics Society meeting in Montreal in October.*

Activity as surmised in different rooms by the researchers’ machine-learning algorithms. Source: “Activity Recognition Using a Hierarchical Model.”

In their approach, the researchers train a machine-learning algorithm with several days of routine household activity using door and motion sensors distributed through the living space. The sensors aren’t associated with any particular room at the outset: their software algorithmically determines the relative positions of the sensors, then classifies the rooms that they’re in based on activity patterns over the course of the day.
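The excerpt does not say how the relative positions are determined. As a rough intuition only, and not the researchers' algorithm, one could guess which sensors are near each other by counting how often one fires shortly after another as a single occupant moves around. The function name, event format and ten-second window below are assumptions.

```python
from collections import Counter


def adjacency_counts(events, max_gap=10.0):
    """Count how often two different sensors fire in sequence within
    max_gap seconds. events: list of (seconds, sensor_id), sorted by time."""
    counts = Counter()
    for (t1, s1), (t2, s2) in zip(events, events[1:]):
        if s1 != s2 and t2 - t1 <= max_gap:
            counts[frozenset((s1, s2))] += 1
    return counts


# Made-up firing log for three unlabeled sensors.
log = [(0, "A"), (4, "B"), (7, "C"), (100, "B"), (103, "A"), (300, "C"), (305, "B")]
for pair, n in adjacency_counts(log).most_common():
    print(sorted(pair), n)
```

Pairs with high counts are plausibly adjacent, while pairs that never fire back to back are probably separated by another sensor or a wall. With more than one person in the home the signal gets much noisier, which this sketch ignores.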

Visualization of the Week: Hubway commuters’ time saved

An interactive visualization shows how bike-sharing service Hubway has saved its commuters more than 45,000 hours of travel time.

by Jenn Webb | @JennWebb | +Jenn Webb | November 21, 2012

Boston’s bike-sharing service Hubway recently opened up its data and held a visualization contest. The winners were announced this week: Overall Best Visualization went to MIT student Virot “Ta” Chiraphadhanakul. His interactive visualization measures trip times from one Hubway station to another and compares the times to the same trips using Massachusetts Bay Transportation Authority (MBTA) public transportation and/or walking. His results show that in 513,733 one-way trips, commuters have saved more than 45,000 hours of travel time by using Hubway.

The overall interactive chart allows you to hover over specific dots, or trips, to see travel times and time saved using Hubway. Bigger dots represent more popular trips.
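The headline figure is a sum of per-trip differences. Taken at face value, 45,000 hours over 513,733 trips works out to a bit more than five minutes saved per trip on average (45,000 × 60 / 513,733 ≈ 5.3). A minimal sketch of that kind of aggregation follows; the input format and the choice to net out trips where the bike was slower are assumptions, since the excerpt does not describe how the original visualization handled them.

```python
def hours_saved(trips):
    """trips: iterable of (bike_minutes, alternative_minutes, trip_count).
    Returns net hours saved; trips where the bike was slower count against
    the total (an assumption, not a documented rule)."""
    total_minutes = sum((alt - bike) * n for bike, alt, n in trips)
    return total_minutes / 60


# Made-up station pairs: (minutes by bike, minutes by transit/walking, trips).
sample = [(10, 25, 4), (12, 18, 10), (20, 15, 2)]
print(hours_saved(sample))
```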

DocGraph: Open social doctor data

An inside look at DocGraph, a data project that shows how the U.S. health care system delivers care.

by Fred Trotter | @fredtrotter | +Fred Trotter | November 19, 2012

At Strata RX in October I announced the availability of DocGraph. This is the first project of NotOnly Development, which is a Not Only For Profit Health IT micro-incubator.

The DocGraph dataset shows how doctors, hospitals, laboratories and other health care providers team together to treat Medicare patients. This data details how the health care system in the U.S. delivers care.

You can read about the basics of this data release, and you can read about my motivations for making the release. Most importantly, you can still participate in our efforts to crowdfund improvements to this dataset. We have already far surpassed our original $15,000 goal, but you can still get early and exclusive access to the data for a few more days. Once the crowdfunding has ended, the price will go up substantially.

This article will focus on this data from a technical perspective.

In a few days, the crowdfunding (hosted by Medstartr) will be over, and I will be delivering this social graph to all of the participants. We are offering a ransom license that we are calling “Open Source Eventually,” so participants in the crowdfunding will get exclusive access to the data for a full six months before the license to this dataset automatically converts to a Creative Commons license. The same data is available under a proprietary-friendly license for more money. For all of these “releases,” this article will be the go-to source for technical details about the specific contents of the file.

Strata Week: Investors embrace Hadoop BI startups

Platfora, Continuuity secure funding; the Internet of Things gets connected; and personal big data needs a national awareness campaign.

by Jenn Webb | @JennWebb | +Jenn Webb | November 16, 2012

Here are a few stories from the data space that caught my attention this week.

Two Hadoop BI startups secure funding

There were a couple of notable pieces of investment news this week. Platfora, a startup looking to democratize Hadoop as a business intelligence (BI) tool for everyday business users, announced that it has raised $20 million in series B funding, bringing its total funding to $25.7 million, according to a report by Derrick Harris at GigaOm.

Harris notes that investors seem to get the technology — CEO Ben Werther told Harris that in this funding round, discussions moved to signed term sheets in just three weeks. Harris writes that the smooth investment experience “probably has something to do with the consensus the company has seen among venture capitalists, who project Hadoop will take about 20 percent of a $30 billion legacy BI market and are looking for the startups with the vision to win that business.”

Platfora faces plenty of well-funded legacy BI competitors, but Werther told Christina Farr at Venture Beat that Platfora’s edge is speed: “People can visualize and ask questions about data within hours. There is no six-month cycle time to make Hadoop amazing.”

In other investment news, Continuuity announced it has secured $10 million in series A funding to further develop AppFabric, its cloud-based platform-as-a-service tool designed to host Hadoop-based BI applications. Alex Wilhelm reports at The Next Web that Continuuity is looking to make AppFabric “the de facto location where developers can move their big data tools from idea to product, without worrying about building their own backend, or fretting about element integration.”

3 big ideas for big data in the public sector

Predictive analytics, code sharing and distributed intelligence could improve criminal justice, cities and response to pandemics.

by Alex Howard | @digiphile | +Alex Howard | November 15, 2012

If you’re going to try to apply the lessons of “Moneyball” to New York City, you’ll need to get good data, earn the support of political leaders and build a team of data scientists. That’s precisely what Mike Flowers has done in the Big Apple, and his team has helped to save lives and taxpayer dollars. At the Strata + Hadoop World conference held in New York in October, Flowers, the director of analytics for the Office of Policy and Strategic Planning in the Office of the Mayor of New York City, gave a keynote talk about how predictive data analytics have made city government more efficient and productive.

While the story that Flowers told is a compelling one, the role of big data in the public sector was in evidence in several other sessions at the conference. Here are three more ways that big data is relevant to the public sector that stood out from my trip to New York City.

Visualization of the Week: How cities flow

Using Uber's ride data, neuroscientist Bradley Voytek created a visualization showing the flow of people throughout nine US cities.

by Jenn Webb | @JennWebb | +Jenn Webb | November 14, 2012

In January, Uber’s resident neuroscientist Bradley Voytek put together a visualization of the flow of people in San Francisco — the volume and direction of people traveling from one city neighborhood to another — using Uber’s ride data. Now, he’s done it for eight more US cities.

Here’s a look at the flow in Boston:

Click here to see the full list of visualizations.
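The "volume and direction" of movement between neighborhoods can be thought of as an origin-destination tally. The sketch below is an assumption about the general shape of such a computation, not Voytek's actual method, and the neighborhood names and ride list are invented for illustration.

```python
from collections import Counter


def flows(rides):
    """rides: iterable of (origin, destination) neighborhood names.
    Rides that start and end in the same neighborhood are ignored."""
    return Counter((o, d) for o, d in rides if o != d)


def net_flow(flow_counts, a, b):
    """Rides a->b minus rides b->a; positive means net movement toward b."""
    return flow_counts[(a, b)] - flow_counts[(b, a)]


rides = [
    ("Back Bay", "Fenway"),
    ("Back Bay", "Fenway"),
    ("Fenway", "Back Bay"),
    ("Fenway", "Fenway"),
]
counts = flows(rides)
print(counts[("Back Bay", "Fenway")], net_flow(counts, "Back Bay", "Fenway"))
```

The per-pair count gives the volume of a flow, and the sign of the net figure gives its dominant direction.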

Data science in the natural sciences

Big data is shaping diverse fields, showing that past predictions from data-driven natural sciences are now coming to pass.

by Chris Wiggins | @chrishwiggins | +Chris Wiggins | November 12, 2012

I find myself having conversations recently with people from increasingly diverse fields, both at Columbia and in local startups, about how their work is becoming “data-informed” or “data-driven,” and about the challenges posed by applied computational statistics or big data.

A view from health and biology in the 1990s

In discussions with, for example, New York City journalists, physicists, or even former students now working in advertising or social media analytics, I’ve been struck by how many of the technical challenges and lessons learned are reminiscent of those faced in the health and biology communities over the last 15 years, when these fields experienced their own data-driven revolutions and wrestled with many of the problems now faced by people in other fields of research or industry.

It was around then, as I was working on my PhD thesis, that sequencing technologies became sufficient to reveal the entire genomes of simple organisms and, not long thereafter, the first draft of the human genome. This advance in sequencing technologies made possible the “high throughput” quantification of, for example,

the dynamic activity of all the genes in an organism; or
the set of all protein-protein interactions in an organism; or even
statistical comparative genomics revealing how small differences in genotype correlate with disease or other phenotypes.

These advances required formation of multidisciplinary collaborations, multi-departmental initiatives, advances in technologies for dealing with massive datasets, and advances in statistical and mathematical methods for making sense of copious natural data.
