Bill de hÓra

Copier Heads

Robert Scoble: "Every month longer that this deal takes is tens of millions in Google’s pockets. Why? Well, the real race today isn’t for search. Isn’t for email. Isn’t for IM. It’s for ownership of your mobile phone."

That was back at the beginning of 2008, at the height of the Microhoo excitement. It's interesting to revisit these things.

Scoble said this because he "met the guy who runs China’s telecom last week in Davos. He’s seeing six million new people get a cell phone in China every month."

That was 138 per minute, about twice the growth rate of internet/web adoption. In terms of worldwide adoption of phones, Scoble was probably off by an order of magnitude. It's not the "next big game" as one commenter (Tim O'Reilly) put it. It is the big game.
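For anyone checking the per-minute figure, here is the arithmetic as a small sketch. The 30-day month is my assumption; the quote only gives the monthly number.

```python
# Assumption: a 30-day month; the source only gives the monthly figure.
new_phones_per_month = 6_000_000
minutes_per_month = 30 * 24 * 60  # days * hours * minutes

# Average number of new phones per minute over the month.
per_minute = new_phones_per_month / minutes_per_month
print(int(per_minute))  # whole phones per minute, fraction dropped
```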

Steve Jobs: "Basically they were copier heads that just had no clue about a computer or what it could do. And so they just grabbed, eh, grabbed defeat from the greatest victory in the computer industry. Xerox could have owned the entire computer industry today. Could have been you know a company ten times its size. Could have been IBM - could have been the IBM of the nineties. Could have been the Microsoft of the nineties."

I read 99 comments back then. About half a dozen picked up on the mobile point. Everyone was talking about property rights on social graphs and inferred information, or web2.0 ad models, or search, or the importance of email. Google still seems to be the webco that best understands the importance of mobile.

August 22, 2009
Posted by: dehora
Comments (362)
android msft yahoo google

Concision

David Pollack: "We can argue about whether Scala is syntactically simpler than Java. I find that my code - I did two years of Ruby On Rails work before I did Scala and I've done Java work since 1996 - I find that Scala is as concise as Ruby. I also find that Scala's type system doesn't often get in my way. Where I have to declare types in Scala is where I should document types in Ruby, and I can also use the type system as a way of defining tests. Scala's design lead to Lift's design. Lift's design lead to abstracting away HTTP in a way that we can do real time, or what appears to be real time, push, in a way that's syntactically and semantically pleasing for the end developer."

My observation here is that the last language I found as interesting as Scala was Python. They reel you in.

August 22, 2009
Posted by: dehora
Comments (426)
scala liftweb

Process and individuality are not exclusive

update 2009/08/22: Jason Yip also recommended "Lean Product and Process Development" by Allen C. Ward.

Gadi Amit: "Against that "unreliable" branded-personality design management, multidisciplinary agencies push the notion of large teams and a rigid process. The message of the process crowd is simplistic, "have a few more disciplines in place and we can create the winning product with the right design." Here comes the ethnographer and the strategist and the focus-group studies and the 500-page dissertations, and so on. I have yet to see any hard proof that these large processes yield higher rates of success in design. I have met more than a few large organizations that will not take this any longer. The process method managed to stifle creativity and nourish argumentative myopics while exhausting corporate budgets and personnel. The case of Doug Bowman, Google's just-resigned lead designer and the 41 shades of Blue sounds painfully familiar. As you churn out more creative work, more data-points and more "scientific" validation, your design never gets better. The process method justified large design budgets yet never reliably delivered. It catered to the corporate ladder that is now gone. It required time and the ability to commit resources that we've probably lost for the next decade. "

Steven Keith responds: " However, I cannot see how what you advocate can work, in reality.

I believe one of those incessant problems with design management is that everyone who cares or seeks methods to address their own challenges is too unique or idiosyncratic to borrow workable insights, processes or anecdotes from all the great thinkers and practitioners out there. I wish I had a dollar for every time I had a client proclaim they're going to do the Sapper or Ives thing. Whatever that is. Is this the consiglieri you speak of?

In the final analysis, I see a gorgeously articulated avalanche of design mgmt ideas, methodologies and articles anchored to edge cases like Apple, Google, IDEO and the like. They're easy to swan over and even fun. But, these edge cases are so disconnected from what will work for most. In fact, I see them doing potentially more harm than good. How many presentations at conferences do you see about really average companies that believe in the promises of design thinking but are struggling because they cannot bridge their "today reality" with the mythological fully integrated design business of tomorrow? There are good reasons why."

I see an ugly fact getting in the way of both these positions and it's called the Toyota Product Development System (TPDS), which marries a strong process and measurement culture with Amit's concept of a consiglieri role. In Toyota that position is called the Chief Engineer. To argue either end of the process v creativity spectrum in product design, you have to be able to explain away the Chief Engineer role in Toyota and Toyota's phenomenal market success, which has made them one of the largest companies in the world (10th in the Global Fortune 500). In my sector, when people talk about design, they tend to obsess on Apple. They should also look closely at Toyota, who have a design system that transcends both individual brilliance and process.

I think most software people know about Toyota's methods through Lean Software Development and the Toyota Production System (TPS). TPDS is just as important for understanding the product development angles. To get a grasp on the TPDS, the book "The Toyota Product Development System: Integrating People, Process And Technology" is highly recommended, as is "Product Development for the Lean Enterprise".

August 19, 2009
Posted by: dehora
Comments (240)
industrial-design agile toyota lean

35 internet years

2001: "Thousands of new sites premiere every day. Most of them are built to support bad browsers intead of standards. It’s an epidemic. Enough already. We finally have good browsers. Let’s use them."

2008: "One accommodates Microsoft as one’s ancestors accommodated Imperial Rome. As a wiser man than me said, 'Render unto Caesar.'"

June 25, 2009
Posted by: dehora
Comments (211)

Installing buildr trunk (1.3.4 pre) on Ubuntu 8.10

Update 2009/04/11: Assaf has a better way:

"There's a snapshot of 1.3.4 you can gem install from apache.org without all the excessive dev dependencies.

sudo gem source --add people.apache.org/~assaf/build...
sudo gem install buildr"

WFM.

Buildr documentation:

To install Buildr from the source directory:

$ cd buildr

$ rake setup install

I got some errors doing that. This worked for me on Ubuntu 8.10

# cd /tmp
# wget rubyforge.org/frs/download.php/45905/rubygems-1.3.1.tgz
# tar xzf rubygems-1.3.1.tgz
# cd rubygems-1.3.1
# sudo ruby setup.rb
# sudo apt-get install python-setuptools
# sudo gem install echoe
# sudo gem install cucumber
# git clone git://github.com/buildr/buildr.git
# cd buildr
# rake setup install
# buildr --version
Buildr 1.3.4

This was to get to a post-1.3.3 Buildr to set up a Scala/Java project structure, as Buildr supports Scala compilation, plus I gather there's lots of good stuff on trunk. I still had to add require "buildr/scala" to the buildfile. As much as I prefer Buildr/Ivy for bootstrapping a project over Maven2, I wonder about needing a cross-language dependency chain (or gems) like this for doing Java/JVM stuff (such as having to install easy_install to get a gem set). Having never used it in a production/industrial setting, it's hard to say. Otherwise, I do like Buildr.

April 11, 2009
Posted by: dehora
Comments (331)

Naked CSS Day

It's Naked CSS Day, at least for the web pages here that are not HTML on the filesystem. Part of me thinks this matters less and less each year, at least for me, since I consume most weblog information through feedreaders.

April 9, 2009
Posted by: dehora
Comments (207)

A reasoned response to Scala/Ruby at Twitter...

Alex Payne: "Make things, measure them, have reasonable and respectful conversations about them, improve them, and teach others how to do the same." - Mending The Bitter Absence of Reasoned Technical Discussion.

As far as the current Ruby/Scala "debate" goes - I would say always bet on protocols and formats, the web being the prime example. As someone who likes Twitter immensely, I like that I don't have to care too much what Twitter is written in or what it runs on. I like that behind the server, the entire stack can be swapped out or rewritten from the ground up as the service owner sees fit, as seems to happen with many popular Web services as they grow. That the Twitter API can persist across such internal upgrades is a wonderful thing. This is possible because on the Web, programming languages are an implementation detail, including javascript/actionscript code on demand.

April 5, 2009
Posted by: dehora
Comments (266)

The Format Of The Long Now

Mark: "HTML is not an output format. HTML is The Format. Not The Eternal Format, but damn if it isn’t The Format Of The Now."

If that doesn't jibe with you, follow the link and view source on the markup around those statements.

Related. Now, view source on that link. Savor the irony.

March 27, 2009
Posted by: dehora
Comments (292)

Feature Creep

Joe: "The ultimate destination of programming language evolution is lisp-without-parentheses"

...with optionally typed function arguments.

March 27, 2009
Posted by: dehora
Comments (168)
programming

Backwards compatibility is commitment

Marc Andreessen: "That's a big deal, that's a very big deal. It's a very serious commitment for a company. Apple's had this commitment, Microsoft's had this commitment. It's what's called a commitment to backwards compatability. So you have to commit to never break anything. So you load up Windows Vista and it run the original Visicalc from thirty years ago, which was the original killer app on the PC, the original spreadsheet. So that is a long term institutional commitment, that takes a very serious company to be able to do."

February 26, 2009
Posted by: dehora
Comments (328)

Partial update is the problem?

Mike Amundsen: "...once you start introducing partial updates, you open yourself for caching problems. doing partial updates means all cached copies of the original resource are now invalid. "

February 4, 2009
Posted by: dehora
Comments (146)

"Just" use POST

Tim Bray: "But maybe Joe needs a bigger club, because I have to admit that limiting myself to GET and POST just doesn’t cause me that much heartburn."

I get asked a lot about PUT v POST, as do other people associated with REST based design. The question comes up online frequently as well (eg it's a regular topic on the rest-discuss and atompub lists). Usually it's in the context of updates via forms posting or how to change just a few data fields. "How do I change the title of an entry?" is a very common and valid use case. Forms posting is easy to code to and highly portable - almost all deployed client and server libraries support (and are often optimised for) forms posting.

The pro-REST answer is to use PUT. PUT means update the resource with this entity, which tends to mean "overwrite". Let's think for a moment about how that works for things like tags in a blog post - if I leave the tag out, am I saying remove it or ignore it? On the server side, a PUT to a resource involving embedded lists (eg tags in Atom/RSS entries) tends to result in ugly code when either the backing system is an RDBMS or the representation is any "joined" structure in the persistence layer - they'll have to diff what's persisted against what's sent, which for 99% of people means a "select for update" pattern (a double for loop cross-referencing the posted tag list with the database tags is a sure sign you've hit this problem). Yes, you can store the entity straight to disk or use a non-relational architecture - but now you have N indexing problems, something a relational database "just" solves for the 99.9% of developers who don't have a megadata problem.
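To make the diffing step concrete, here is a minimal sketch of reconciling a PUT tag list against what is stored. It uses set arithmetic instead of the double for loop. The function name and sample tags are hypothetical, and it assumes "overwrite" semantics, where a tag left out of the PUT means "remove it".

```python
def diff_tags(stored, posted):
    """Return (to_add, to_remove) so that the stored tags become the posted tags.

    Assumes overwrite semantics: a tag missing from `posted` is removed.
    """
    stored_set, posted_set = set(stored), set(posted)
    to_add = sorted(posted_set - stored_set)      # sent but not yet persisted
    to_remove = sorted(stored_set - posted_set)   # persisted but not sent
    return to_add, to_remove


# Hypothetical data: what the database holds vs. what the client PUT.
stored = ["rest", "http", "atom"]
posted = ["rest", "atompub"]
to_add, to_remove = diff_tags(stored, posted)
print(to_add, to_remove)
```

The resulting two lists would then drive the inserts and deletes against the join table. The "ignore it" reading of a missing tag would simply drop the to_remove half.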

So PUT often feels wrong or contorted to developers who literally want to mod a couple of fields. Hence PUT is much less popular in the wild than forms posting (all aside from the fact that PUT is excluded from HTML4 forms). In other words, people tend to see PUT as a heavyweight, sucking, POST. In turn they "just" use POST+forms.

Are we done? Unfortunately, no.

When does PUT v POST actually *matter*? It matters, as far as I can tell, when your resource stands for a collection, which is very common - folders, albums, feeds, collections, blogs, tagclouds, orders, a shopping cart - any list based structure.

Let's take AtomPub as an example - to add something to a collection using AtomPub, you use POST:

POST /collection
Host: example.org
content-type: image/png

...binary...

Easy, and you can update that uploaded object later via PUT. Updates to the collections themselves are undefined in AtomPub. But let's ask, how would we do that? We could PUT the Atom feed (sans the contained Entries) back to the collection URI. So imagine we want to change only the title - isn't an entire PUT of an Atom feed (sans the contained Entries) verbose, inefficient and stupid for that simple use case? We could "just" use a form post instead:

POST /collection
Host: example.org
content-type: application/x-www-form-urlencoded

&title=foo

Ahh. Boom. Updating the collection in this way uses the same verb as adding to the collection. How to tell the difference in client intent? The answer here, for most people, will be to use the fact that forms posting has a specific media type - so the media type "qualifies" the operation. This definitely isn't REST style, as the verb is no longer uniform; at the same time it's not an abstract concern - there'll be a big switch in the code somewhere that looks for the media type - exactly the kind of thing good programmers hate. Let's remember that AtomPub servers aren't limited to blog posting - they can accept any media type they declare support for, and thus can act as generic upload systems (if you have a stable network, more on that another time).
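For illustration, the "big switch" might look something like the sketch below. The function name and the returned labels are hypothetical, not taken from any AtomPub server. The point is only that the same POST to the same URI means two different things depending on Content-Type.

```python
FORM_TYPE = "application/x-www-form-urlencoded"


def classify_collection_post(content_type):
    """Decide what a POST to the collection URI means, based on its media type.

    Hypothetical labels: "update-collection" for a forms post that edits the
    collection itself, "add-member" for anything else (a new entry or media).
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == FORM_TYPE:
        return "update-collection"
    return "add-member"


print(classify_collection_post("image/png"))
print(classify_collection_post("application/x-www-form-urlencoded; charset=UTF-8"))
```

Note what this rules out: a server written this way can never accept a forms-encoded body as a new member of the collection, which is an odd restriction for a generic upload system.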

One workaround could be that if the client sent a corresponding "ID", like this:

POST /collection
Host: example.org
content-type: application/x-www-form-urlencoded

&title=foo&id=http%3A%2F%2Fexample.org%2Fid%2Fefgfeacbe

the server could detect that the ID is present. It feels funky though, aside from having to map the field/keys in your precious snowflake format into forms parameters.
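A rough sketch of that detection, using Python's standard urllib.parse.parse_qs. The function name and return labels are hypothetical, and I'm assuming that a present "id" field signals an update to the identified collection while its absence signals an add.

```python
from urllib.parse import parse_qs


def classify_form_post(body):
    """Return (intent, id) for a forms-encoded POST body.

    Assumption: an "id" field means the client is updating the thing with
    that ID; no "id" field means it is adding to the collection.
    """
    fields = parse_qs(body)  # percent-decodes values; empty pairs are skipped
    if "id" in fields:
        return "update-collection", fields["id"][0]
    return "add-member", None


print(classify_form_post("&title=foo&id=http%3A%2F%2Fexample.org%2Fid%2Fefgfeacbe"))
print(classify_form_post("&title=foo"))
```

It works, but the intent is now buried in the payload rather than in the verb or the URI, which is much of why it feels funky.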

Speaking as a member of the IETF WG, perhaps we shouldn't have skipped collection updates in AtomPub as it would have made the overall constraint clearer - POST can't be used in the general case for updates to collections, ergo PUT is the only uniform approach to updating their content. On the other hand lots and lots and lots of people don't, won't (and sometimes can't) care about REST/HTTP/AtomPub arcana. So some part of me thinks we need patterns and practices to help developers jfdi.

Fwiw, like Tim, I can live with the forms POST option, to either update a collection or perform a partial write. But think about it for a bit - switch on type is a fairly ugly workaround. Not quite RPC, but problematic. Blog entries in turn are often collections (containing media), as are the folders you find in WebDAV and so on - it's not a problem specific to AtomPub.

So when you ask a pro-REST person why not "just use forms" for partial updates instead of having to write out the entire data to send to the server via PUT, and they go "uhm, uhm,...", this is the kind of design kludge they're thinking about. Maybe you could PUT a form as a workaround for partials - I think that could work better than POST or having special "edit" URIs for anything collection-like. But as far as it goes, I'm not sure we in the pro-REST community have a good general answer or design pattern for partially updating a resource. Until we do, I predict people will tend to drop down to using forms posting as it's the easiest and most portable approach for deployed client libraries and web frameworks. That or define some other specialised media type for partial updates.
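As a sketch of the "PUT a form" idea from the client side, the snippet below builds (but does not send) such a request with Python's standard urllib. The helper name and URL are hypothetical, and no spec says a server must treat a forms-encoded PUT as a partial update; that interpretation is the assumption being floated here.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_partial_put(url, fields):
    """Build a PUT request carrying only the changed fields as a form body.

    Assumption: the server agrees to read a forms-encoded PUT as
    "change just these fields", which is not standard behaviour.
    """
    body = urlencode(fields).encode("ascii")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="PUT",
    )


# Hypothetical collection URI; the request is constructed but never sent.
req = build_partial_put("http://example.org/collection", {"title": "foo"})
print(req.get_method(), req.full_url, req.data)
```

The verb stays uniform, so there is no clash with POST-to-add, but the server still has to look at the media type to tell a partial PUT from a full replacement.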

February 3, 2009
Posted by: dehora
Comments (329)
web http atompub RFC5023 architecture

Containerization

Dan Diephouse on Deployment : "I'm continually amazed at how hard of a problem deployment actually is. If you're going to be deploying any reasonably sized application you have an endless list of things to worry about:

    * Taking the cluster up and down so there is no downtime
    * Managing the configuration of individual nodes
    * Operating system setup
    * Installation of required libraries/3rd party tools
    * Managing dev, QA, staging and production deployments
    * Schema migration/database updates
    * How to do rollbacks

[...]

There are a few other interesting tools out there."

I think one reason that there are only a few tools for deployment is that it's a general end to end problem, technically and organisationally. When you understand the enormity and complexity of bringing up even middling size systems, never mind big ones where components are constantly failing, it can be an overwhelming thing to bite off. Very possibly it means altering existing build systems, or even how the organisation itself is arranged (since deployment cross cuts standard boundaries such as development, qa and operations). Which could seem like ocean-boiling.

Tools like Puppet and SmartFrog take the problem space head-on and look to be general purpose solutions, so I agree with Dan's pointers to them. As an example Dan links to Steve Loughran's deck on deploying a Hadoop cluster. They're well ahead of other FOSS tools that I know of and it's remarkable how few people know they exist. But to use those means skilling up and investing in their configuration language, which might seem arduous. Xen images, version control, language and distro packing all add more flavour to the mix - is your deployment unit a tarball, a warfile, a gem, a deb, an image, a git checkout? All of them? Knowing what the container unit is matters.

Hence you see people starting out with point solutions and dealing either with problem subsets or specific pain points (code rollout but not configuration or health checks), app/framework specific tools like capistrano (deprecated afaik, thanks for the correction Bob), tail-ending sftp/tomcat tasks onto your Antfile, or in-house shell scripts. None of these scale up as systems get bigger or more layered.

If you won't adopt an external framework, probably the most important thing to do is get past shell scripts to a declarative configuration language so deployment configurations can be managed in their own right. Getting the data structures and component models that represent the state of your running system right is very important (both puppet and smartfrog have ways to describe and compose systems). Otherwise you're going to be rewriting those scripts forever. This will make your shell scripts more like command line tools than one-offs.
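To give a flavour of what "declarative" buys you, here is a toy sketch: the desired state is plain data, and a small function works out what to do by comparing it with what is running. This is not Puppet's or SmartFrog's model or syntax; the node names, version strings and action labels are all made up for illustration.

```python
def plan(desired, running):
    """Compare desired state with running state and return a list of actions.

    Both arguments map node name -> {"version": ...}. The action tuples are
    hypothetical labels that a deploy tool would then carry out.
    """
    actions = []
    for node in sorted(desired):
        want = desired[node]["version"]
        have = running.get(node, {}).get("version")
        if have is None:
            actions.append(("install", node, want))
        elif have != want:
            actions.append(("upgrade", node, want))
    # Anything running that is no longer declared gets removed.
    for node in sorted(set(running) - set(desired)):
        actions.append(("remove", node, running[node]["version"]))
    return actions


# Hypothetical cluster: the declaration is data that can live in version control.
desired = {"web-1": {"version": "1.4.2"}, "web-2": {"version": "1.4.2"}}
running = {"web-1": {"version": "1.4.1"}, "batch-1": {"version": "1.3.0"}}
for action in plan(desired, running):
    print(action)
```

The scripts that carry out each action can then stay small and dumb, because the knowledge of what the system should look like lives in the data and not in the scripts.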

"I hope that we start to see more core infrastructure managed by the infamous cloud people. Just write your app, upload, and tell it where to deploy. Then we can focus on building applications, which is what we really want to do anyway"

This offloading reminds me of the early promise of J2EE containers, but it turned into a vendor specific hell. I'd hope the hosted world can do better :) In any case, while good tools matter, deployment automation is as much about improved process quality.

January 25, 2009
Posted by: dehora
Comments (707)
hadoop Puppet SmartFrog Containerization Deployment

Format mappings and transitivity

Dare Obasanjo has responded to my post Format Debt: what you can't say by asking "Can RDF really save us from data format proliferation?". Quoting him, quoting me*:

"Bill de hÓra has a blog post entitled Format Debt: what you can't say where he writes

The closest thing to a deployable web technology that might improve describing these kind of data mashups without parsing at any cost or patching is RDF. Once RDF is parsed it becomes a well defined graph structure - albeit not a structure most web programmers will be used to, it is however the same structure regardless of the source syntax or the code and the graph structure is closed under all allowed operations.

If we take the example of MediaRSS, which is not consistently used or placed in syndication and API formats, that class of problem more or less evaporates via RDF. Likewise if we take the current Zoo of contact formats and our seeming inability to commit to one, RDF/OWL can enable a declarative mapping between them. Mapping can reduce the number of man years it takes to define a "standard" format by not having to bother unifying "standards" or getting away with a few thousand less test cases.

I've always found this particular argument by RDF proponents to be suspect. When I complained about the the lack of standards for representing rich media in Atom feeds, the thrust of the complaint is that you can't just plugin a feed from Picassa into a service that understands how to process feeds from Zooomr without making changes to the service or the input feed."

Being a proponent is relative. I'm not sure I'm considered an RDF proponent in the RDF community, having been critical in the past ;) But generally, I can't agree with the argument. Under the hood, it's just mapping and there's no magic here - technically the language (RDF in this case, there are others) will either be able to express the mappings or it won't. For example, RDF can't map Celsius to Fahrenheit, but I know it can map foo:title to atom:title.

"The issue I'm pointing out is that either way a developer has to create a mapping."

Right; the questions really are how many mappings, where they are declared and to what extent you can stand over them as being sound. We've been doing this in code for years for syndication formats by mapping them into internal object models - every library then having its own mappings that might or might not be consistent. Dare mentioned MediaRSS and without an external configuration for extension formats, we'll have to do for MediaRSS as it appears in the wild today what we do for the 9+ RSS/Atom formats that are out there. The double whammy as part of the Format Debt is that MediaRSS appears to need mapping to itself in Dare's examples, because parsing syntax can result in different dict/tree data structures.
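As a sketch of what "mapping into an internal model in code" tends to look like, here is a toy per-format field table. The element names are real RSS 2.0 and Atom ones, but the internal field names, the table itself and the choice to fold pubDate into "updated" are my simplifying assumptions, not any particular library's behaviour.

```python
# One table per source format: source element name -> internal field name.
# Every library ends up carrying its own private version of this.
FIELD_MAP = {
    "rss": {"title": "title", "description": "summary", "pubDate": "updated"},
    "atom": {"title": "title", "summary": "summary", "updated": "updated"},
}


def to_internal(fmt, parsed):
    """Map an already-parsed entry (a dict) into the internal model.

    Elements with no mapping for this format are silently dropped, which is
    exactly where extension formats tend to fall through the cracks.
    """
    mapping = FIELD_MAP[fmt]
    return {mapping[name]: value for name, value in parsed.items() if name in mapping}


rss_item = {"title": "Hello", "description": "First post", "pubDate": "Mon, 12 Jan 2009"}
atom_entry = {"title": "Hello", "summary": "First post", "updated": "2009-01-12"}
print(to_internal("rss", rss_item))
print(to_internal("atom", atom_entry))
```

Two libraries with slightly different tables will disagree about the same feed, and nothing outside the code says which one is right.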

"The problem with this argument is that there is a declarative approach to mapping between XML data formats without having to boil the ocean by convincing everyone to switch to RD; XSL Transformations (XSLT). "

Not quite the same thing (I'll explain why in a minute). XSLT is actually computationally more powerful than RDF - afaict XSLT could do the Celsius to Fahrenheit mapping. It can do the knight's tour.

"In my experience I've seen that creating a software system where you can drop in an XSLT, OWL or other declarative mapping document to deal with new data formats is cheaper and likely to be less error prone than having to alter parsing code written in C#, Python, Ruby or whatever. However we don't need RDF or other Semantic Web technologies to build such solution today. XSLT works just fine as a tool for solving exactly that problem. "

But XSLT is code. All we're saying by this is that XSLT code is cheaper and less likely to be error prone than Python et al. Which I can buy - an XSLT sheet done well can be an executable specification. All an RDF (or "interlingua") proponent will say is that RDF can be even cheaper and less error prone, and much of the reason not to adopt it is down to developer preferences, lack of familiarity, tooling and so on - i.e., much the same reason developers don't adopt XSLT, summarising the issue as "XSLT sucking".

Finally, I think you can easily argue that RDF/OWL gives more leverage for this kind of problem than XSLT, even though RDF is computationally less powerful, because it allows you to state relationships using formal semantics. For example if I write down that:

atom:title owl:sameAs foo:title

foo:title owl:sameAs bar:title

I can infer

bar:title owl:sameAs atom:title

without writing a line of code and I can use that on seeing new data. The predicate "owl:sameAs" is what the formalists call transitive (and symmetric), and this reasoning at a distance is the kind of thing RDF proponents are on about when they talk about "semantic webs". OWL in particular has a boatload of such predicates; sameAs is probably the best known.
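To show what a reasoner is doing with those statements, here is a small sketch that groups terms into equivalence classes, treating sameAs as both symmetric and transitive. It is a toy union-find over string pairs, not an OWL implementation, and the function names are hypothetical.

```python
def same_as_classes(pairs):
    """Group terms into equivalence classes from (a, b) sameAs statements."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two classes

    classes = {}
    for term in list(parent):
        classes.setdefault(find(term), set()).add(term)
    return list(classes.values())


def entails_same_as(pairs, a, b):
    """True if a sameAs b follows from the stated pairs."""
    return any(a in cls and b in cls for cls in same_as_classes(pairs))


# The two statements written down above.
facts = [("atom:title", "foo:title"), ("foo:title", "bar:title")]
print(entails_same_as(facts, "bar:title", "atom:title"))
```

With RDF/OWL you get this from the semantics of the predicate; nobody has to write or maintain the code above for each new vocabulary.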

That kind of inference is not a remotely straightforward thing to do in XSLT. Rather than emulate Greenspun's 10th Rule by writing a half-baked, incomplete, buggy predicate reasoner in XSLT, you'll end up writing multiple XSLT sheets instead, and possibly trying to chain them together. This is the real problem with using XSLT in anger for this kind of work - it doesn't scale as the number of elements to map grows. In that scenario, people fall back to regular programming languages where you can use useful data structures like dicts and lists to manage the element names and their associations. That's why things like the feedparser don't (and won't) tend to get written in XSLT, and it's why the mappings will have to stay as private details of implementations for now.

* on reflection, I blame Abba Singstar for that particular turn of phrase.

January 12, 2009
Posted by: dehora
Comments (455)

Format Debt: what you can't say

Aristotle: "In passing, though, I have to note that it would be nice if we could do a better job of what media types tried to do with their type/subtype separation, ie. have a standardised way to specify a layering of specifity of formats, including multiple formats, so that it would be possible to say that a document is text, and specifically HTML, and specifically a combination of hCard+hTag+ hEXIF+image-link, and specifically a Flickr photo, so as to allow clients to know what the representation means without having to parse it, at whatever their level of understanding of the specified format.

I don't know if this would work in practice, after all the type/subtype thing in media types is mostly a failure. Maybe that was just because of it tried to constrain types to just two layers. It would also be necessary to do a better job of what media types tried to accomodate with the '+xml' suffix contortion, ie. make sure that types reliant on possibly multiple lower-level formats are expressible in a sensible fashion."

There are diminishing returns on patching around media types and formats. This suggests doing a better job becomes increasingly hard. Let's call this "Format Debt". I think the media types construct is entirely inadequate for expressing mashed up formats in the way Aristotle wants and we will be limited to patching around it - the media type is deeply embedded into web architecture. I take a polarised position on this, because I think it's less important to be right than push the deb