kiwitobes.com

Posting on Google+

So, it’s been over a year since I last posted here. For some reason the platform never quite clicked for me. I’ve been enjoying using Google+ (disclaimer, yes I work for Google) a lot more, sharing photos, links and thoughts there.

I’m not sure I’ll post here any more, but I’ll leave it up because some of the entries are still pretty popular.

Anyway, my Google+ page is here.

October 24th, 2011 | Category: Uncategorized | Comments (31)

Visualizing UBS Analysts

Last November, my friend Jesper and I presented at Web 2.0 New York, a talk called Freeing and Visualizing Financial Data. I’ve been meaning to put up some of the visuals we made while looking through what was available.

Banks are required to publish a report each quarter called an “Analyst Transparency Report”. The report lists all the stocks that the bank’s analysts wrote reports about, along with who wrote the report and what their recommendation was. Because the report is just a several-hundred page table, it’s very difficult to get a view of the big picture. It’s even difficult to figure out basic stuff like all the stocks covered by a particular analyst.

To get an idea of what was going on, I took the report and turned it into a large network diagram (longtime readers will start to think that this is my answer to everything).

(click on the graph to see a high-res version)

The rectangular nodes are analysts and the oval nodes are stocks. The connections between them indicate that the recommendation was buy (green), neutral (black) or sell (red).
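For readers who want to try something similar, here is a minimal sketch of how a table like this could be turned into a graph description. It is not the code used for the image above: the row layout (analyst, ticker, recommendation), the placeholder names and the choice of Graphviz DOT as the output format are all assumptions made for illustration.

```python
# Hypothetical rows in the form (analyst, ticker, recommendation).
# The real report's column layout is not described in the post, and these
# names and ratings are placeholders, not data from the UBS report.
ROWS = [
    ("Analyst A", "AAA", "buy"),
    ("Analyst A", "BBB", "neutral"),
    ("Analyst B", "BBB", "sell"),
]

# Edge colors follow the legend in the post: buy, neutral, sell.
EDGE_COLORS = {"buy": "green", "neutral": "black", "sell": "red"}


def to_dot(rows):
    """Build an undirected Graphviz DOT graph from (analyst, ticker, rec) rows."""
    analysts = sorted({analyst for analyst, _, _ in rows})
    stocks = sorted({ticker for _, ticker, _ in rows})

    lines = ["graph coverage {"]
    # Analysts are drawn as rectangles, stocks as ovals.
    for analyst in analysts:
        lines.append(f'  "{analyst}" [shape=box];')
    for ticker in stocks:
        lines.append(f'  "{ticker}" [shape=ellipse];')
    # One colored edge per recommendation.
    for analyst, ticker, rec in rows:
        lines.append(f'  "{analyst}" -- "{ticker}" [color={EDGE_COLORS[rec]}];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(ROWS))
```

Saving the printed text to a file and running it through a Graphviz layout program such as neato or dot should produce a picture along the same lines, although a report of several hundred pages would likely need some layout tuning.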

You can spend a long time looking at this and noticing patterns that emerge. One of the cool things is that it’s easy to quickly spot companies that appeared together in a report so you can tell that they potentially have related fortunes. Here’s a great example:

In this image, you can see that this analyst, Kevin Crissey, seems to specialize in the travel industry. He covers, for example, both JetBlue (JBLU) and Expedia (EXPE). What’s fantastic about this is that the industry classification for these two companies is completely different: JBLU is classified as a Regional Airline, while EXPE is classified as General Entertainment. Because the clusters in these graphs show relationships between companies that don’t match their hierarchical classifications, they are a great source of information about potential correlations.

Here’s another section of the graph:

What popped out to me here is that this group of three retail analysts all have recommendations on Zumiez (ZUMZ). Not only that, but two of them have written two reports each and changed from a neutral to a sell rating. This is curious because ZUMZ is only a $600 million company, yet it has three analysts who wrote five reports on it. There are much larger companies that have no coverage or only one analyst. Further investigation might reveal that UBS has a particular reason for all this coverage or that ZUMZ did something to attract a lot of attention.
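The same kind of observation can be pulled out of the table directly. The sketch below counts analysts and reports per stock and lists rating changes. The rows are placeholders shaped like the case described above (three analysts, five reports, two downgrades), not the actual report data, and the code assumes each analyst's reports appear in chronological order.

```python
from collections import defaultdict

# Placeholder rows in the form (analyst, ticker, recommendation).
# Tickers, analysts and ratings are invented for illustration.
REPORTS = [
    ("Analyst A", "SMALLCO", "neutral"),
    ("Analyst A", "SMALLCO", "sell"),
    ("Analyst B", "SMALLCO", "neutral"),
    ("Analyst B", "SMALLCO", "sell"),
    ("Analyst C", "SMALLCO", "neutral"),
    ("Analyst C", "BIGCO", "buy"),
]


def coverage(reports):
    """Map each ticker to (number of distinct analysts, number of reports)."""
    analysts = defaultdict(set)
    counts = defaultdict(int)
    for analyst, ticker, _rec in reports:
        analysts[ticker].add(analyst)
        counts[ticker] += 1
    return {ticker: (len(analysts[ticker]), counts[ticker]) for ticker in counts}


def rating_changes(reports):
    """List (analyst, ticker, old, new) whenever an analyst's rating changes.

    Assumes rows for the same analyst and ticker are in chronological order.
    """
    last_seen = {}
    changes = []
    for analyst, ticker, rec in reports:
        key = (analyst, ticker)
        if key in last_seen and last_seen[key] != rec:
            changes.append((analyst, ticker, last_seen[key], rec))
        last_seen[key] = rec
    return changes


if __name__ == "__main__":
    for ticker, (n_analysts, n_reports) in sorted(coverage(REPORTS).items()):
        print(ticker, n_analysts, n_reports)
    for change in rating_changes(REPORTS):
        print(change)
```

Joining the coverage counts against market capitalization would be the natural next step for spotting small companies with unusually heavy coverage, but that needs a second data source that the report itself does not provide.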

There are many other interesting groupings and links buried in the graph. I think viewing extremely long tables as a graph like this could help us spot relationships we might not have otherwise seen.

April 24th, 2010 | Category: Uncategorized | Comments (273)

My latest two books now available!

The first of these is Programming the Semantic Web. I wrote this with two of my coworkers, Jamie Taylor and Colin Evans. We were attempting to make the first ever practical guide to why regular programmers should pay attention to semantic technologies. After writing this book, I was so convinced myself that I’ve moved all my projects away from traditional relational databases to graph databases.

That animal on the cover is a Red Panda, also known as a Firefox. Many thanks to our editor Mary Treseler for being so awesome through the process of writing this.

The second is Beautiful Data, which is an essay collection that I co-edited with Jeff Hammerbacher and to which I also contributed. We found a group of people who we thought were doing awesome stuff with data and convinced them to write essays.

We have an awesome list of contributors: Peter Norvig, Nathan Yau, Jonathan Follett, Matt Holm, J.M. Hughes, Raghu Ramakrishnan, Brian Cooper, Utkarsh Srivastava, Jason Dykes, Jo Wood, Jeff Jonas, Lisa Sokol, Jud Valeski, Alon Halevy, Jayant Madhavan, Aaron Koblin, Valdean Klump, Michal Migurski, Jeff Heer, Coco Krumme, Matt Wood, Ben Blackburne, Jean-Claude Bradley, Rajarshi Guha, Andrew Lang, Pierre Lindenbaum, Cameron Neylon, Antony Williams, Egon Willighagen, Lukas Biewald, Brendan O’Connor, Hadley Wickham, Deborah Swayne, David Poole, Andrew Gelman, Jonathan P. Kastellec, Yair Ghitza, and Jeff and myself.

All royalties for Beautiful Data are split between the Sunlight Foundation and Creative Commons.

Julie Steele, the O’Reilly editor on this book, was so awesome at making sure everyone got their essays in and they were reviewed properly.

They’re both great books, and I’m really proud of how they turned out. I’m not sure I’ll be writing again for a while, though!

August 6th, 2009 | Category: Books, Data | Comments (345)

Quick updates: Wedding, Hack Day, Books

I’m about to head off to my wedding in Mexico! I’m afraid I won’t be checking email very much while I’m gone, so please don’t be sad if I don’t write back to you immediately. I’ll try to get through everything when I get back!
If you like my posts on semantics and free data, and you live in San Francisco, you should check out Freebase Hack Day to meet a bunch of like-minded people.
Finally, I believe both Programming the Semantic Web and Beautiful Data will be out in July. I’m not sure of the exact date, but we’re hoping that everything is out in time for OSCON.

June 26th, 2009 | Category: Books, Data | Comments (158)

Why Semantics?

In February I gave a tutorial and a talk at the most awesome conference ever (go Tash!) called Webstock, in Wellington, New Zealand. The talk was called Why Semantics, and was essentially about the ideas behind the semantic web and why they’re interesting to normal working developers. After I gave the talk, I had several famous (at least to me) developers tell me that they finally got it, had made many of the data-modeling mistakes that I outlined, and no longer thought the Semantic Web was all hype.

The video was just uploaded to Vimeo by the Webstock team:

(if the embed isn’t showing, you can find the video here)

And here’s the abstract:

Ever since there was a web, people have been talking about the “semantic web”, which is always just around the corner. Even though this hasn’t exactly gone to plan, people working on the ideas behind semantic data modeling have actually come up with a lot of cool stuff.

Modern web development is very concerned with rapid iteration, which has led to the increasing popularity of lightweight frameworks built on dynamic languages such as Rails, Pylons and Django. However, most of us are still stuck using traditional data-modeling methods like relational databases which aren’t designed for constant schema changes. Further, because people don’t think about “standard” ways to share data, there are thousands of different web APIs, all of which have to be dealt with separately.

In this talk Toby will explain what “semantic data” is, how entities and data can be modeled using graphs, and show examples of modeling, integrating, and extending data models for large datasets. You’ll learn how the semantic models support rapid and iterative application development, and easy integration of existing databases. Toby will introduce fast scalable back-ends for storing and querying semantic data and show examples of semantic data already available on the web.

He’ll also briefly discuss how these approaches lead into the standards-based Semantic Web, and how attendees can find short-term value in adopting some of the Semantic Web standards and platforms.

Enjoy! Let me know what you think.

Update: You can find a PDF of the slides here.

June 2nd, 2009 | Category: Data | Comments (179)

My latest project: Freerisk

For the past few months, between writing books and my day job, I’ve been working on a project with my friend Jesper called Freerisk.

A few months ago, after we first heard Tim O’Reilly’s “Work on stuff that matters” speech, we started talking about what issues, besides the environmental concerns mentioned in his speech, were important to us that we actually had the skills to work on. We came to the idea of how hackers could help the financial system, particularly when it came to evaluating default-risk of companies or looking for fraudulent behavior.

The financial system itself has always been very closed. The government republishes filings by the SEC in a variety of messy formats, but those who want clean data need to pay subscription fees and have very limited republication rights. So our plan is to make Freerisk a huge open data store of financial data taken primarily from company filings. It’s all going to be available to download or query using standards like SPARQL.

On top of that, there will be APIs for building risk models and submitting your results. We hope to show that “financial hackers” can come up with more interesting and accurate calculators that can model a wider variety of risk scenarios.

If you’re interested in this, several people have written about the project:

Harvard Business Review
Fast Company Interview
O’Reilly Radar on Freerisk
Innovation Lab Write up (Danish)
Next 6 Presentation Followup

We’ve also given several presentations. The O’Reilly emerging technologies conference was kind enough to make and post a video of our talk there (this was our first one, so it’s a little rough, but it should give you a good idea!)

We are looking for people who are interested in getting involved in this project. We have started a discussion group called Open Finance Hackers (just started, nothing there yet). If you’re interested in this at all, please email me and join the group.

April 17th, 2009 | Category: Uncategorized | Comments (164)

A crazy few months

Apologies for the lack of recent posts (I think you’ll forgive me in just a moment). I’ve had a crazy few months, but here’s what I’ve been up to, with links for stuff that you can pre-order and download!

Finished the draft of my second book, with my coworkers Jamie Taylor and Colin Evans. It’s called “Programming the Semantic Web” and it’s already listed on Amazon (the description there right now will be changed, trust me).
Working on collecting and editing essays for what will be a great collection, called Beautiful Data. I’m not sure if I’m allowed to tell you who the contributors are yet, but I will say they’re fantastic and we were very lucky to get them.
I gave a 3-hour workshop and a 40-minute session talk at Webstock which was held in Wellington, New Zealand a couple of weeks ago. It was an amazing experience and warrants a whole post on its own. For now, the slides for both sessions are available as PDFs at kiwitobes.com/webstock/

And coming up, there’s still more stuff going on:

I’m giving a talk at ETech on March 10th. It’s about the failure of risk rating agencies and ideas for how the tech community can help
I’ll also be at Web 2.0 Expo giving another talk on Sources for Data Geeks on April 2nd
And I’m getting married on July 4th!

(because a lot of people ask me, the answer is: no, conference speaking is not even slightly lucrative. I do it for fun)

March 1st, 2009 | Category: Books, Data | Comments (130)

Personal data integration (part 1)

I’ve been toying with the idea of attempting “semantic integration” of a lot of personal data in my life. I’ll be sure to share more later, but so far I’ve managed to pull together my September phone records, my email history, my contacts, my calendar and my Facebook friends (via the API, not something sketchy!) into a single triple-store.

Using this data, I was able to create this chart, which shows my friend network (I have removed myself and Brooke, since we’re connected to everyone and it ruins the layout). The people who I emailed, texted or called in September are shown in green.

You can see tight clusters of my friend groups. The tightest is the big hairball near the bottom that makes up much of Brooke’s Stanford GSB class, but also clear are groupings for my friends from MIT, Chapel Hill, Boston (post-MIT return), my San Francisco tech friends and my family. My family is the only group that is isolated from the rest of the graph — everyone else is connected, which is partly because I’ve introduced some of these groups to each other, and partly just because it’s a small world.

Also good to see is that almost every cluster has at least one green node (my family notably doesn’t, but that’s because my parents aren’t on Facebook), so I’ve generally done a good job of keeping in touch with at least a few people from different phases of my life.

There’s a lot of talk about breaking the silos in the enterprise and, in the semantic-web community, data integration across the entire web. But right now, people don’t even have decent integration across their own personal information. The current proliferation of single-feature applications encourages you to store different aspects of your life in different places — the advantage of course, is that something highly specialized is much more pleasant to use, but the disadvantage is that there’s no way to query across these aspects. I’m interested in experimenting with ways that help people “break the silos” with their own information, in the hope that this will both yield useful applications and help us get a better grip on the bigger problems.

I now have code to keep my triple-store synced with my friend network, my contacts, my phone records, my email and my calendar. I can construct queries across all of this (who did I forget to call on their birthday? Who have I seen recently who went to Stanford?). I’ll be sharing this code at some point, but I want to see how far I can take this. I’m also interested in hearing from anyone who has tried similar experiments and wants to collaborate.
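To give a flavor of what such cross-source queries look like, here is a toy sketch using a plain Python list of (subject, predicate, object) triples. It is not the code described above: the predicate names, the people, and the way calls are modeled as their own nodes are all assumptions for illustration, and a real triple-store would be queried with something like SPARQL instead of a hand-written matcher.

```python
from datetime import date

# Toy triples merged from several hypothetical sources (contacts, phone
# records, Facebook). Every name, predicate and date here is made up.
TRIPLES = [
    ("alice", "birthday", date(2008, 9, 12)),
    ("bob", "birthday", date(2008, 9, 20)),
    ("alice", "attended", "Stanford"),
    ("bob", "attended", "MIT"),
    ("call1", "to", "alice"),
    ("call1", "on", date(2008, 9, 12)),
    ("call2", "to", "bob"),
    ("call2", "on", date(2008, 9, 21)),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for ts, tp, to in triples
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


def missed_birthday_calls(triples):
    """People whose birthday has no call to them recorded on that exact day."""
    called_on = set()
    for call, _, person in match(triples, p="to"):
        for _, _, day in match(triples, s=call, p="on"):
            called_on.add((person, day))
    return sorted(
        person
        for person, _, day in match(triples, p="birthday")
        if (person, day) not in called_on
    )


if __name__ == "__main__":
    print(missed_birthday_calls(TRIPLES))
    print([s for s, _, _ in match(TRIPLES, p="attended", o="Stanford")])
```

The point of the triple shape is that a new source, say calendar events, only adds more rows with new predicates. Nothing about the existing data or the matcher has to change.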

So, anyone have any thoughts on other sources of personal data or questions you might want to ask once it’s integrated?

October 14th, 2008 | Category: Uncategorized | Comments (142)

Web 2.0 NYC, Freebase UG meeting, and Taleb

A few quick updates:

I’ll be speaking at Web 2.0 in New York City this Thursday at 3pm. If you’re at the conference, find me and say hi!
While I’m gone, Freebase is having a user group meeting. Here is the info. Great speakers, you’ll seriously love the GeoSearch API
A new article by my favorite non-fiction author, Nassim Taleb, is at Edge. Highly recommended

I’m working on a lot of new projects right now, and I’ll have more to share soon.

September 15th, 2008 | Category: Books, Data, People | Comments (66)

O’Reilly interview at OSCON

While I was at OSCON earlier this year, I did a 20-minute video interview with O’Reilly. I think the idea is to take a lot of interviews and edit them down to shorter segments for some kind of video supplement, but they’ve also posted the entire thing on YouTube.

I talk a little bit about my biotech experience, my book, working at Freebase and the importance of open data to new applications. The whole 20-minute segment is embedded below.

Let me know what you think!

August 31st, 2008 | Category: Uncategorized | Comments (129)
