@medriscoll

    dna dating

    A recent start-up, Yoke.me, is attempting to build a better dating engine using Big Data and algorithms.  But what mix of data could best be used to algorithmically identify an optimal mate?  Photos, favorite albums, and religious beliefs are a start.

    But how about DNA?

    A couple of years ago at SciFoo, Toby Segaran, Meredith Carpenter, and I brainstormed about creating a start-up that would do just this.  We dubbed it GeneHarmony.

    Here’s how it would work: to become a member, you submit a saliva sample to our genomics facility, which sequences all of your genetic quirks (since most of us share DNA which is 99.6% similar, we need only sequence the differences).

    Read more

    the data science debate: domain expertise or machine learning?

    (Photo credit:  O’Reilly Radar - See Link to Full Video)

    This past Tuesday evening at Strata I moderated an Oxford-Style debate between six of the top data scientists in Silicon Valley and beyond. The motion debated was: 

    “In data science, domain expertise is more important than machine learning skill.”

    Read more

    start-ups belong in cities

    Last Saturday, I woke up and walked down to my favorite coffee shop in San Francisco, SightGlass coffee in SoMa.

    I met up with a couple of entrepreneurs pitching an amazing idea, and while ordering some mind-buzzingly-good drip coffee, ran into a mentor of mine.

    I write this because, while these interactions could have happened in the suburbs of Silicon Valley — whether the Coupa Cafe in Palo Alto or Red Rock in Mountain View — they are quintessentially enabled by four qualities of a city like San Francisco:

    •  neighborhoods that mix commerce and living, that “serve more than one primary function”
    •  blocks that are walkable, short and broken up with alleyways and side streets
    •  buildings which are a diversity of the old and new, luxury and low-rent
    •  people who are prevalent and sufficiently concentrated

    These four qualities enable the unique vibrancy of urban neighborhoods, and were laid out by Jane Jacobs in her magnum opus “The Death and Life of Great American Cities.”

    Read more

    ETL: the coal mining of the information age

    “If I were starting a NoSQL-in-the-enterprise startup, I would focus on ETL. ETL is a mess, and is a precursor for any fancy uses of data.” - @jaykreps

    “@jaykreps ETL is the coal mining of the information age: dirty, important work that fuels the economy.” - @peteskomoroch

    One of the largest obstacles facing companies who seek to derive value from data isn’t data’s size.  It’s data’s dirtiness.

    It’s been said before: 80% of the effort that goes into a data science project is extracting, transforming, and loading (ETL’ing) data into a system where it can be analyzed.

    Read more

    why everyone should be a medical data donor

    What happens to your medical records when you die?  Gil Elbaz thinks you ought to donate them to science, a thought he shared with a technology audience this past week.

    It’s a fascinating idea.  But why wait until you’re dead?  In the age of the quantified self, why shouldn’t you be able to give your DNA sequence, your diet, and your disease diagnoses to science while you’re alive?  Unlike your organs, you can donate your data away and yet still keep it.

    We have companies collecting vast swaths of data about our buying, browsing, and clicking habits to sell us more stuff.  But when it comes to understanding what behaviors keep us healthy, it’s a rocky landscape of HIPAA-regulated, technologically-challenged health insurers and providers.  We collect so much data about what makes us click, yet so little about what makes us tick.

    There are pockets of hope.  Sites such as PatientsLikeMe (which as of this writing has 122,640 patients and over a thousand conditions) and Ginger.io are green sprouts in a bottom-up, democratizing data movement for health.

    Nearly eight out of ten people on the planet earth now own a mobile phone.  These phones send so-called “heartbeat” data to cell towers every few seconds.  Imagine if, instead, we had the true heartbeat data of the humans carrying those phones?  A simple cardiac signal can betray a host of health issues, from stress and aging to a warning of impending stroke or heart attack.

    I know that I’m not alone in being willing to give my data to medical science.  If the Fitbit or Jawbone UP had a checkbox that read “donate my data”, and the receiving institution was a trusted one, it could be the beginning of a valuable data bank.  If the Red Cross can convince us to stick needles in our arms to give blood, certainly we can endure bracelets on our wrists to give data.

    lies, damned lies, and social media statistics

    Social media statistics — shares, retweets, and likes — reflect content’s value the way a funhouse mirror reflects one’s looks: grotesquely.  As the web lines its halls with social mirrors, these distortions are influencing the content we create and consume.

    One need look no further than the headlines at Hacker News for a gallery of the grotesque:  “N Reasons…”, “Why X is Wrong”, “Free Y”, and “How Z… Cancer”.  Many of these stories are explicitly crafted to achieve fifteen seconds of fame.

    I plead guilty to this seduction, with @jkottke telling me off as proof, because it’s tempting to believe that metrics are an honest measure of value.  They’re not.

    Social Media Statistics are Biased

    Hacker News readers are not a representative audience. Because of the frenzied frequency with which they flood the voting booths of cyberspace, their influence is outsized — and perversely enough, in inverse proportion to their attention spans.

    We need a balance against these biases.  A retweet from @timoreilly means more than one from @lolz69.  Klout has attempted, with some ignominy, to measure online influence. If we weighted retweet counts by influence, we might have a better measure of an article’s impact.
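    As a rough sketch of what influence weighting could look like: the snippet below sums a per-account influence score instead of counting every retweet as one.  The scores, the default value, the extra account handles, and the function name are all made up for illustration; this is not how Klout or any real service computes influence.

```python
from typing import Dict, Iterable


def weighted_retweets(
    retweeters: Iterable[str],
    influence: Dict[str, float],
    default: float = 10.0,
) -> float:
    """Sum an influence score for every account that retweeted an article.

    Accounts missing from `influence` get `default` (an arbitrary assumption).
    """
    return sum(influence.get(handle, default) for handle in retweeters)


# Hypothetical scores on a 0-100 scale, invented for this example.
influence = {"@timoreilly": 80.0, "@lolz69": 10.0}

article_a = ["@timoreilly"]                          # 1 raw retweet
article_b = ["@lolz69", "@example_a", "@example_b"]  # 3 raw retweets

score_a = weighted_retweets(article_a, influence)
score_b = weighted_retweets(article_b, influence)
```

    By raw count, the second article wins three to one.  Under these made-up weights, the first comes out ahead, which is the whole point of weighting.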

    Time matters too. All content is a zero until someone reacts, so we need to gauge the speed of +1s or shares, not just the total.
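    One simple way to gauge speed rather than totals, sketched under my own assumptions: count only the shares that landed inside a trailing time window and express them per hour.  The function name, the one-hour window, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def share_velocity(
    share_times: List[datetime],
    now: datetime,
    window: timedelta,
) -> float:
    """Return shares per hour over the trailing `window` ending at `now`."""
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    start = now - window
    # Count shares that fall in the half-open interval (start, now].
    recent = sum(1 for t in share_times if start < t <= now)
    return recent / (window.total_seconds() / 3600.0)


# Invented timestamps: four shares in total, two of them in the last hour.
now = datetime(2012, 3, 1, 12, 0)
shares = [
    datetime(2012, 3, 1, 8, 0),
    datetime(2012, 3, 1, 9, 0),
    datetime(2012, 3, 1, 11, 30),
    datetime(2012, 3, 1, 11, 50),
]

velocity = share_velocity(shares, now, timedelta(hours=1))
```

    Here the total is four shares, but the velocity over the last hour is two per hour.  An older article with a bigger total and nothing recent would score zero on this measure.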

    And positive feedback loops are everywhere.  We end up reading and sharing the same few dozen articles every day, not because these are always the most valuable, but because once they’ve bubbled up into the meme pool, they get recirculated and amplified.

    Be a First Follower

    The strongest signal of quality should be the content itself, not its number of shares or comments.  If you keep an open mind, you’ll encounter that joy of discovery once so integral to the web.  Lovely gems still lurk out there.  

    Being the first follower takes a smidgeon of bravery.  So ignore what other people think and share something no one else has.  You’ll be a democratizing force.

    Connect with People, Don’t Collect Them

    Few of us share our ideas, photographs, and experiences online solely to collect followers.  We do so to convince, to delight, to connect with people.  

    If you’re a creator, never confuse numbers with the value of your creative output.  Resist the urge to chase some earlier success.  If you create something of lasting value, which has staying power after the initial spasms of interest have passed, you will engage with your audience in a way that few metrics reveal.

    Blogging to boost your follower count is like launching a start-up to build your bank balance:  it rarely works.  Instead, focus passionately on creating value, and the rest will come.

    what to feed the mythical machine learning beast?

    One of the holy grails of machine learning is the creation of a system that can “read the web” and learn from it, as Isaac Newton read Euclid’s Elements and taught himself geometry.

    Imagine a mythical beast that could speed-read one-hundred million pages per second, consuming every Wikipedia entry, every scientific article on arxiv.org, every out-of-copyright scanned book, and beyond just indexing that information, could actually reason with it.

    Read more
