Wednesday, March 28, 2007

The ODF Validation Service

No, this has nothing to do with getting discounted parking if you use ODF, though that is an intriguing idea...

Daniel Carrera (OpenDocument Fellowship and the OASIS ODF TC) has a new blog and with it comes news of a new ODF tool, an ODF Validator Service, written as part of the Fellowship's ODF Tools project by Alex Hudson.

It is in the spirit of the W3C's Markup Validation Service: upload a document and get an instant report of whether or not it is valid ODF, and if not, what problems were found. I tried a few documents and it seems to work well.

(One thing I'd like to see is an explicit privacy policy statement on the Fellowship's web site, so the user knows what things may or may not be done with the documents they upload. This is good practice for any web service that processes uploaded documents.)

It would be interesting to see if something like this could be made into a flexible framework for scanning ODF documents, at various levels. Think of a SAX-like call-back parser but at multiple levels of detail. So the framework knows how to fully parse an ODF document and identify features at the Zip and XML level. Plugins to the framework can subscribe to various parse events. So, maybe a ZipListener interface that simply has methods onFile() and onDirectory(). Then a ManifestListener interface that allows you to subscribe to notifications of the data in the manifest. Then within a document, like a spreadsheet, you could have listeners at the structural and content level, so onWorksheet(), onCell(), or in a word processor document, onTable(), onImage(), etc.

A framework like this could allow you to make a range of applications that need to scan an ODF document and take some action on it.


The benefit of the framework is the reduction in code required to get directly to the info in the ODF document you want, without having to master the ODF specification or write a lot of parsing code. Think of it as a framework for easy multi-level information extraction from ODF documents.
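
To make the listener idea concrete, here is a minimal sketch of the two outermost levels in Python. None of this is an existing library: the class names follow the ZipListener and ManifestListener interfaces imagined above, and scan_odf() and the on_manifest_entry() method are hypothetical names invented for illustration. It assumes only that an ODF package is a Zip archive with a META-INF/manifest.xml that uses the standard ODF manifest namespace.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by META-INF/manifest.xml in ODF packages.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


class ZipListener:
    """Hypothetical plugin interface for Zip-level events."""

    def on_file(self, name, size):
        pass

    def on_directory(self, name):
        pass


class ManifestListener:
    """Hypothetical plugin interface for manifest-level events."""

    def on_manifest_entry(self, full_path, media_type):
        pass


def scan_odf(source, zip_listeners=(), manifest_listeners=()):
    """Walk an ODF package once and fire callbacks on the listeners.

    `source` is a path or a file-like object, as accepted by ZipFile.
    """
    with zipfile.ZipFile(source) as package:
        # Zip level: one event per archive member.
        for info in package.infolist():
            for listener in zip_listeners:
                if info.is_dir():
                    listener.on_directory(info.filename)
                else:
                    listener.on_file(info.filename, info.file_size)

        # Manifest level: one event per manifest:file-entry element.
        if "META-INF/manifest.xml" in package.namelist():
            root = ET.fromstring(package.read("META-INF/manifest.xml"))
            for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
                path = entry.get(f"{{{MANIFEST_NS}}}full-path")
                media = entry.get(f"{{{MANIFEST_NS}}}media-type")
                for listener in manifest_listeners:
                    listener.on_manifest_entry(path, media)
```

A plugin subclasses whichever listener it cares about and overrides only the events it needs. The deeper levels (onWorksheet(), onCell(), onTable() and so on) would follow the same pattern by parsing content.xml, which this sketch does not attempt.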

Labels: ODF

# posted by Rob : 3/28/2007 04:41:00 PM  4 comments (policy) links to this post  

Tuesday, March 20, 2007

Cannibalism

An interesting post by Bob Sutor. What is OOXML's real competition, and how does that help ODF? The dynamics get interesting when you are hindered by your own install base. The main selling point of OOXML is its claimed 100% compatibility with the legacy binary formats. But if you are using Office 2000, and happy with it, what is the reason to move to OOXML? Why not remain using the binary formats? What justifies the migration?

The downside is clear. The minute you move to OOXML you have less choice about whom you can successfully exchange documents with. Users of Office for the Mac, Windows Mobile, WordPerfect Office, Google Docs and Spreadsheets, SmartSuite and ThinkFree Office, along with the numerous 3rd party applications that can read and write the binary formats, are now outside of the universe of people and applications that you can exchange documents with. Despite some early attempts from Sun and Novell, Linux users are left out as well.

So why move to OOXML? From the CTO's perspective, if your greatest concern is legacy compatibility, what is the ROI argument for changing file formats? Wouldn't the tendency be to remain where you are?

So the breakdown may happen like this:

I think that B & Z may be the dominating factors. N is large now because it includes the inertial effects of Microsoft's market dominance. Even companies that don't make an explicit choice will end up with that path by default. But even the most passive company will not fall into choice A without some thought.

It is interesting to speculate on the initial percentages. But note that this is a network effect game, so the percentages will vary over time based on expectations.

Labels: ODF, OOXML

# posted by Rob : 3/20/2007 02:30:00 PM  16 comments (policy) links to this post  

Monday, March 19, 2007

ODF Freely Available

Another step forward for ODF. After gaining ISO approval in May, and Publication status in December, ISO/IEC 26300 is now counted among ISO's "Freely Available Standards". What is the significance of this? The text is identical to what it was in May, but you no longer need to pay 342 Swiss Francs to ISO to download an official copy. It is now free. Enjoy!

Labels: ODF

# posted by Rob : 3/19/2007 05:28:00 PM  1 comments (policy) links to this post  

Tuesday, March 06, 2007

Document Migrations

If you've been around this business for a while, you've seen your share of migrations. New operating systems, new networks, new hardware, even new document formats. I'd like to share some recollections of one such migration, and then suggest a solution.

In 1995 I was working at Lotus on Freelance Graphics, along with many others, getting SmartSuite ready for Windows 95. One day, as I walked to work and rounded the corner of Binney Street, I saw something unusual, even more unusual than the usual unusual one sees in Cambridge. Something was up. There were news vans parked in front of LDB, camera crews and reporters looking for comments, Lotus security videotaping the reporters asking for comments, and me standing there, clueless.

This was how I first heard of IBM's take-over offer. It was hard to concentrate on porting to Windows 95 with all that news going on downstairs, but we managed.

In the weeks and months that followed there were many changes. At Lotus we were 100% SmartSuite users. No surprise there. Most of us did not even have a copy of Microsoft Office on our machines, unless we worked on file compatibility. Not only did we use SmartSuite for our collaborative work, creating and reviewing specifications, giving presentations, etc., we also ran some of our business processes on it. In particular we used an expense report application, done in 1-2-3 with LotusScript.

But IBM used Microsoft Office. So when IBM took over, we needed to migrate. Sure, there was whining and moaning and gnashing of teeth on our end about having to move to an inferior product. And it did take a little while to get accustomed to the different conventions of Office, typing AVERAGE() in Excel, rather than @AVG() in 1-2-3 and stuff like that. But we did it. We moved to Office. It was clear to all that the benefits of having a single file format outweighed the short-term pain of migration.

It is interesting what we did not do:

  1. We did not go and convert all existing legacy SmartSuite documents into Office format. What would have been the point? Most old documents are never touched again. Let them rest in peace.
  2. We did not delete SmartSuite from our hard drives. We kept the application there for cases where we needed to access old documents.
  3. We did not simply continue using SmartSuite and tell it to save in Office format. We knew that both fidelity-wise and performance-wise it is far better to use an application that supports a format natively than to rely on conversion software for interoperability.
  4. We did not translate 1-2-3 macro-based applications into Excel macro-based applications. We took the opportunity to move straight to web based applications. Aside from some standard presentation templates and similar boiler-plate templates we did not do a lot of conversion work.
In retrospect, the migration of file formats was one of the least contentious changes that accompanied the IBM takeover. We can handle file format changes, but eliminating the traditional Friday Beer Cart, now that was something to complain about...

I'm not much of one for committing unprovoked acts of methodology, but if I had to summarize what little wisdom I have in this area, I'd say that for a migration you want to evaluate your existing documents by three criteria: stability, complexity and business criticality, and develop a migration plan based on that.

In the first case you classify documents by how stable (unchanging) they are:
  1. Hot documents — the documents that are being heavily changed and edited today, works-in-progress, in active collaborations
  2. Cold documents — the documents which are no longer edited, though perhaps they are still read. Many of these documents may have zero value and are just taking up space. Others may be valuable records, but hidden away on someone's hard-drive.
  3. Warm documents — These are the ones that are in the middle, not seeing heavy activity, but they aren't quite frozen either.

From the perspective of complexity we have:
  1. Low complexity — simple text and graphics
  2. Medium complexity — using more advanced features, created by power users
  3. High complexity — "engineered documents", using scripting and macros to create applications.
Finally you can also look at these documents from the perspective of business criticality. Of course, this will vary according to your business. It might be relevance to ongoing litigation, it might be according to a records retention policy, it might be whether it concerns currently open projects, etc. But for the sake of argument, let's take client or public exposure as a proxy for criticality, so we get this:
  1. Internal use documents — internal presentations and reports
  2. Customer facing documents — engagement reports, proposals, etc.
  3. Publication ready documents — white papers, journal articles, etc.
These three dimensions — stability, complexity and criticality — can be combined, creating 27 different document classes. For example, our old expense report based on 1-2-3 macros would be classified as a hot, high complexity, internal use document.

So you are transitioning from Office legacy binary formats to ODF. What do you do with each of these document classes? You have four main strategies to consider:

  1. Do nothing and preserve the document in the legacy format, maintaining, as needed, access to the legacy application.
  2. Convert document to a portable high fidelity static representation, like PDF
  3. Convert directly to ODF.
  4. Reengineer as something other than a document.

So one migration policy might look like this:


Stability  Complexity  Exposure           Strategy
Cold       Low         Internal Use       Do nothing
Cold       Low         Customer Facing    Do nothing
Cold       Low         Publication Ready  Do nothing
Cold       Medium      Internal Use       Do nothing
Cold       Medium      Customer Facing    Do nothing
Cold       Medium      Publication Ready  Do nothing
Cold       High        Internal Use       Do nothing
Cold       High        Customer Facing    Convert to PDF
Cold       High        Publication Ready  Convert to PDF
Warm       Low         Internal Use       Convert to ODF
Warm       Low         Customer Facing    Convert to ODF
Warm       Low         Publication Ready  Convert to ODF
Warm       Medium      Internal Use       Convert to ODF
Warm       Medium      Customer Facing    Convert to ODF
Warm       Medium      Publication Ready  Convert to ODF
Warm       High        Internal Use       Convert to ODF
Warm       High        Customer Facing    Publish as PDF
Warm       High        Publication Ready  Publish as PDF
Hot        Low         Internal Use       Convert to ODF
Hot        Low         Customer Facing    Convert to ODF
Hot        Low         Publication Ready  Convert to ODF
Hot        Medium      Internal Use       Convert to ODF
Hot        Medium      Customer Facing    Convert to ODF
Hot        Medium      Publication Ready  Convert to ODF
Hot        High        Internal Use       Reengineer
Hot        High        Customer Facing    Reengineer
Hot        High        Publication Ready  Reengineer


There may be a better way of expressing this above (Karnaugh maps anyone?) but that gives the idea. Also, I'm not suggesting that this is the "one true answer", but merely that this may be a useful way of framing the problem.
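
One more compact way of expressing the same policy is as a handful of rules rather than 27 rows. The following Python sketch is just one reading of the table above; the function and constant names are made up for illustration, and the rules encode this example policy only, not a recommendation.

```python
from itertools import product

STABILITY = ("Cold", "Warm", "Hot")
COMPLEXITY = ("Low", "Medium", "High")
EXPOSURE = ("Internal Use", "Customer Facing", "Publication Ready")


def strategy(stability, complexity, exposure):
    """Return the migration strategy for one document class."""
    external = exposure != "Internal Use"
    if stability == "Cold":
        # Cold documents are left alone, except complex ones with
        # outside exposure, which get a static PDF rendition.
        return "Convert to PDF" if complexity == "High" and external else "Do nothing"
    if complexity == "High":
        if stability == "Hot":
            return "Reengineer"
        if external:
            return "Publish as PDF"
    # Everything else that is still warm or hot moves to ODF.
    return "Convert to ODF"


def policy_table():
    """Expand the rules back into all 27 document classes."""
    return {
        doc_class: strategy(*doc_class)
        for doc_class in product(STABILITY, COMPLEXITY, EXPOSURE)
    }
```

Calling policy_table() reproduces the full table, which makes it easy to check a rewritten rule set against the long form.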

Variations might include:


Much of this lends itself to automation. For example:

  1. First you need to find all of the documents in an organization. This could be done by an ActiveX control on a page everyone in the company visits, an agent that spiders the intranet web pages and file servers, etc.
  2. Each document is then scored.
  3. Finding the stability of a document could be done by looking at the last read and last write stamps on the file. You could also look at web logs, or maybe even metadata in the document that tells how many times it has been edited.
  4. Complexity could be determined by scanning the document to see what features it uses. Some features, like script, would weight heavily for complexity. Think of it as a "goodness of fit" metric for how well the features used in the document fit within the ODF model.
  5. Business criticality is harder to automate, but could be done based on owner of the document, metadata in the document, location of the document (public web page versus intranet), etc.
  6. Calculate the scores, suggest actions to take, and then automate the action. This could lead to a nice automated migration solution.
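
As a rough illustration of the stability step, here is a small Python sketch that classifies a file as hot, warm or cold from its last write stamp. The function name and the 30-day and 365-day thresholds are arbitrary assumptions, not figures from any real migration; a real tool would also weigh last-read times, web logs and document metadata as described in the steps above.

```python
import os
import time

SECONDS_PER_DAY = 86400


def classify_stability(path, now=None, hot_days=30, cold_days=365):
    """Classify a file as Hot, Warm or Cold by its last-modified time.

    The thresholds are illustrative defaults only. `now` is a Unix
    timestamp and defaults to the current time.
    """
    now = time.time() if now is None else now
    age_days = (now - os.stat(path).st_mtime) / SECONDS_PER_DAY
    if age_days <= hot_days:
        return "Hot"
    if age_days >= cold_days:
        return "Cold"
    return "Warm"
```

Last-read stamps (st_atime) could be folded in the same way, though many file systems are configured not to update them reliably, so the write stamp is the safer starting point.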

In summary, it probably is not worthwhile simply to go out and convert all of your legacy documents in a giant cathartic orgy of document transformations. Not all documents are worth that effort. In any organization you probably have many many documents that will never be read again, ever. You also likely have some very complex documents that probably should be reengineered as web applications on your intranet. The other documents, the ones in the middle, that is where you focus your migration effort.

Labels: ODF

# posted by Rob : 3/06/2007 08:00:00 PM  10 comments (policy) links to this post  

Thursday, March 01, 2007

OASIS Symposium and OpenDocument Workshop

OASIS will have its annual Symposium April 15th-17th in San Diego, with the theme, "eBusiness and Open Standards: Understanding the Facts, Fiction, and Future". It should be noted that this is not a real symposium, where guests recline on couches, drink wine and discuss philosophy to the accompaniment of flute-girls. On the other hand, it will have a lot of ODF, which is almost as good.


Bob Sutor will give the opening keynote. Scott Hudson will give a talk on, "DocTape: A Document Standards Interoperability Framework for DocBook, DITA, ODF and more!". I'll be joining a panel on Tuesday looking at ODF Interoperability and related topics. And Wednesday will be a half-day Workshop on ODF, with presentations on adoption, programmability, accessibility, interoperability and future directions.

Then back home on Thursday, my birthday. This gives my wife the rare opportunity to get a large present into the house without me noticing. Hint, hint...

Labels: ODF

# posted by Rob : 3/01/2007 06:42:00 PM  0 comments (policy) links to this post  

Friday, February 09, 2007

Once More unto the Breach

Stephen Walli has a blog, Once More unto the Breach. He writes mainly about open standards/open source, with a solid business/legal angle. He also has hands-on experience with standards development in the IEEE and ISO with POSIX, and an interesting perspective from his broad experience in the industry, including working with standards and community development issues at Microsoft.

Since his blog's title, like mine, is from Shakespeare, and he occasionally writes about ODF, I am doubly obliged to give a mention.

Looking at the posts tagged ODF, there is some good stuff. In Vendor-Speak: Microsoft and OOXML, he takes a close reading of some of the recent statements from Microsoft on standards and choice, a long-time confusion of terms that I had previously called a language game. Walli points out:

...Standards happen when a technology space matures to the point that customers are over-served and want choice to encourage competition. Customers complaining about price is the market signal. Competitors know they can collectively chase the incumbent vendor with a standard at this point, if they pick the right level of collective abstraction to standardize. This is how standards work in the marketplace. (I would hope the GM for standards at Microsoft knew this.)

A sympathy play isn't going to work here. Customers WANT the standard that encourages multiple implementations. True Microsoft support for ODF in their Office product suite would have been listening to customers. Complaining that the marketplace is competitive while shoving your own product specification through a standards forum is naive at best and arrogant in the extreme at worst.



Another good post is How Microsoft Should Have Played the ODF Standards Game:

The interesting thing is to look back on the number of times a vendor with a single implementation tried to win playing an overlapping standards game. Looking at the UNIX wars I remember three occasions off the top of my head where this was tried over a long period.

  • "tar wars" over archiving formats.
  • The GUI Wars where OSF/Motif and Sun's GUI toolkit battled it out.
  • The sockets versus streams debate.

In each case, we ended up with two standards being forced upon us. In each case, the dominant technology that won in the marketplace was the one with the most implementations, with the other withering on the vine. Even when both specifications became required for an implementation to claim conformance to the single standard that included them, customers in the market used the specification which was most widely implemented every time.

Microsoft chose the wrong strategy here on multiple levels, betting against customers and the market in general. It may buy a bit of time, but will ultimately cost them more in the long run.


This is a theme that Walli repeats in several of his blog posts — the standard with the most implementations wins.

So a hearty recommendation for Once More unto the Breach, a blog that deserves a slot in your feed aggregator.

Labels: FGBFFW, ODF, OOXML

# posted by Rob : 2/09/2007 10:40:00 AM  0 comments (policy) links to this post  

Sunday, February 04, 2007

Declaring Bankruptcy

Lawrence Lessig called it email bankruptcy: when you have so many unanswered emails in your inbox that you decide to make a clean start and just admit to yourself, and to those who wrote, that you are not going to respond.

I have a related problem: interesting links I've collected and have been meaning to blog about. But my links have accumulated far faster than I have been able to write about them. So I am declaring "link bankruptcy". Here is my fire sale, a set of interesting topics for only pennies on the dollar:

  1. Glyn Moody has the story about how platform dependencies have impacted one notable British institution.
  2. Even more startling results in Korea, as reported in The Cost of Monoculture and the Korean Saga.
  3. It is mainly in Polish, but some in English. More coverage of Open Standards in a new blog from Jacek Łęgiewicz.
  4. In case you missed it the first time around, here is a wonderful essay by Dan Bricklin on "Software that Lasts 200 Years". It made me think of what ramifications this has for file formats that aspire to longevity as well.
  5. This looks interesting. A free OpenOffice Calc add-in for doing "fuzzy math" in OpenOffice.
  6. Sweave adds ODF support to the open source R statistical analysis and graphing platform.
  7. Docvert, an online REST service for converting Microsoft Word documents into ODF format.
  8. I know someone was asking for this a few months ago — A Microsoft Works import filter for OpenOffice.
  9. Office Migration Planning Manager (OMPM) allows bulk conversions of legacy Office binary documents to OOXML. Does anyone have something similar for ODF? Not just bulk conversion, but detection and reporting of possible conversion problems as well.
  10. The eXtensibility Manifesto has some good schema design advice, including: #3 "Design of a data model focuses on all stakeholders' requirements for the data." #6 "Designs or components are not reinvented, but rather are leveraged where possible."
  11. "[Expert Witness] Alepin...alleged that the company [Microsoft] had subverted developers who used Microsoft's version of Java 'thinking they were developing multi-platform applications, but were actually developing Windows-specific applications' ". From PC Pro News.
  12. The Case For ODF -- a recent presentation from OpenOffice Community Manager Louis Suarez-Potts.
  13. "Office 2007 lacks some features of earlier versions of Office, and so it can't fully support some Office files created in earlier versions. For example, Word 2007 cannot open Word files that contain multiple document versions, a feature supported by Word prior to Word 2007". Anyone know what else is missing? From Directions on Microsoft.
  14. A few months old — European Cities Do Away with Traffic Signs. Does anyone know how this has turned out?
  15. Dashed Lines and their uses.
  16. David Berlind over at ZDNet: "To me, Ecma is not a standards body. As evidenced by the DVD situation (which is ridiculous if you ask me), it's little more than a puppet with a pipeline through which vendors can pump their proprietary technologies into the ISO standardization process (avoiding the rigor that should normally be applied to anything up for consideration as an ISO standard). As such, the ISO is sort of a joke too."
  17. "One trouble spot we encountered using Vista's Explorer metadata organization tools was the lack of support for some of the file types we commonly use. For instance, JPEG files happily take attributes under Vista, but PNG files do not. Along similar lines, Vista would not apply metadata to files we had created in the OpenOffice.org format. And, strangely, our attempts to apply metadata to documents created in OpenOffice.org—in Microsoft Office format—were greeted with an error message." From eWeek.
  18. What is a standard, according to David Rudin, Microsoft's official Standards Attorney? "A technical specification that enables interoperability between different products and services and is either 1) intended for widespread industry adoption or 2) has achieved wide spread industry adoption." This is a nice write-up.

Labels: ODF, OOXML, Standards

# posted by Rob : 2/04/2007 07:00:00 PM  10 comments (policy) links to this post  

Friday, February 02, 2007

Introducing ODF 1.1

ODF 1.1 is now officially approved as an OASIS Standard in a ballot which ended Wednesday. Accessibility Subcommittee Co-Chair Peter Korn breaks the story.

I played but a bit role in this story, though I watched it unfold with amazement. It was late 2005. A colleague mentioned that there were rumblings of concern in Massachusetts about their recent decision to move to ODF and what impact that would have on persons with disabilities. Although I am not an accessibility expert, I know the basics. (Every programmer should know the basics of accessibility, as well as the basics of internationalization, typography, human factors, performance, security, law, technical writing, project management and how to present and receive business cards in Asia).

Initially, I suggested that a file format has no relevance at all for accessibility. After all, the hard part of accessibility, the integration with screen readers, was all at the application and operating system level. What difference could a file format make? The file format is not even involved except when loading or saving the document, right? But since I knew ODF, I offered to do a quick spot check and report back. It wasn't long before I was able to demonstrate a handful of places where data necessary to enable accessibility was not described in the existing specification. For example, although an imported image allowed an annotation of alternate text for use by screen readers, an OLE embedding did not.

To err is human, but what happened next was extraordinary. There is a natural tendency to shrink away from criticism, to retreat inward and retrench, and at all costs avoid admitting errors. But I personally believe that every time we are corrected or criticized, it gives us another opportunity to show our character by how we handle it. The unchallenged person may be a gentleman or a scoundrel. You do not know until he is under pressure. So it is notable that the OASIS ODF TC overcame its accessibility problems not with defiance and not with acquiescence, but by enthusiastically embracing the challenge, engaging the critics, including the aggrieved community, and bringing in the experts, both from OASIS member companies as well as outside invited experts. Working within an open and transparent standards development process, it rolled up its sleeves and got to work.

The OASIS ODF Accessibility Subcommittee first met on January 27th, 2006. They delivered their evaluation report on ODF accessibility in June of 2006, followed by contributions to the ODF 1.1 specification which was approved as an OASIS Committee Specification in October, 2006, and just this week was approved by the OASIS membership as an OASIS Standard. This took a few days over a year, start to finish.

This is w