Larsblog

My report on OOXML and ODF

Three little fish...

Disclaimer: Work on this issue within the Norwegian government has been going on for years. I worked on it for four months, producing a 45-page report. This blog post oversimplifies throughout in the interests of brevity.

The full report is here, and if you can read Norwegian you can post your feedback in the form on that page.

Ever since ODF and OOXML burst onto the scene in ISO SC34 I've tried to avoid getting pulled into the mess. I was quite successful at this for several years, until one day one of our managers at Bouvet suggested we bid for a contract to write a report for the Norwegian government (strictly speaking, for the Agency for Public Management and eGovernment, Difi). The report was to address whether ODF and/or OOXML should be recommended or required in the Norwegian public sector. I couldn't come up with any valid excuse for not doing it, so we sent in a bid and eventually won the contract.

The context of the report is that in Norway the government issues a reference catalogue listing the standards that the public sector is required or recommended to use within various usage areas. Earlier versions of the catalogue required use of ODF in two usage areas, and listed OOXML as being "under observation". So for publication of editable documents on public web sites ODF has been the required format in Norway for a while now. (Note the word "editable"; documents which are not meant to be edited by the recipient must be published in either HTML or PDF.)

My task was to take into account recent developments in the field and recommend how the catalogue should be updated with regard to four specific areas of usage. Basically, this meant following up on the "under observation" part. Should the catalogue also make OOXML required or recommended for some of these areas? Or something else entirely?

Method

Approaching a task like this was not easy. What recommendations would make sense? And how could I justify them? What does the Norwegian public sector actually need? That last question gave me a place to start. If I could put together some use cases, they should show me what functionality users would need. I could then check how that functionality is described in the standards, and also do some testing to see whether interchange of documents using this functionality would work in practice.

So this is what I did. I came up with a small set of use cases for each of the usage areas. Very briefly, it goes like this:

  • Web publishing
    • #1: Forms (fill out, send in electronically)
    • #2: Templates (proposed templates for various kinds of documents)
    • #3: Contracts (proposed standard contracts, to be edited)
  • Attachments to emails from public sector to private
    • #4: Forms (receive via email, this time)
    • #5: Contract writing (with a private-sector supplier, for example)
  • Attachments to emails within public sector
    • #6: Collaborative authoring
    • #7: Interchange of budget data

The list was produced through interviews with colleagues and various representatives from the public sector. I realize the list is very short, but remember that most documents are not meant to be edited by the recipient, and for these documents the public sector is required to use HTML or PDF. Note also that, as you'll see, adding more use cases is very unlikely to change the final conclusion.

From these scenarios I then drew up a short list of the necessary functionality:

  • Basic formatting (paragraphs, lists, tables, etc; all use cases)
  • Change tracking (#5 and #6)
  • Comments (#5 and #6)
  • Spreadsheets with formulas (#7)
  • Spreadsheets with macros (#7)
  • Forms (protected against editing with a password; #1, #2, and #4)

Sunset on Canary Wharf

The specs themselves

Now, I was asked to consider only two specifications: ECMA-376:2006, which is the very first OOXML standard (not the one later published by ISO), and ODF 1.1. Together these two documents run to 6783 pages, which was a bit much for me to digest in the limited number of hours I had at my disposal. I therefore decided to focus on how the functionalities in the list above (except the first one) are specified, and to look for general reports of problems in the two specifications to get a feel for the quality of each.

For ODF 1.1 the results were basically as follows:

General quality
Lots of errors, and quite a few holes where things basically are not specified at all. The mistakes I found were mostly minor (that is, very limited in scope).
Change tracking
The handling of change tracking in running text is reasonably good, but doesn't seem to be complete. Change tracking in tables, lists, formulas, etc. is missing.
Comments
Looks perfectly fine to me. I'm not sure about the parts that describe how comments are placed, but then no-one seems to implement that, anyway, and positioning of comments is not that important.
Spreadsheets with formulas
This was my first surprise. The specification of formulas just isn't there. Section 8.1.3 of the spec discusses formulas, but is very vague. There's no formal grammar, no list of functions, no list of datatypes, and no evaluation model. Basically, it says formulas should start with "a namespace prefix", then "=", and has some informal prose on how to refer to cells and ranges. That's all.
Spreadsheets with macros
This didn't really come as a surprise: no macro language or API for macros is defined. There's a defined place to put the macros, an attribute for saying what language you used, and various bits of documents have places where you can put event handlers, but that's all.
Forms
There's a fairly big and detailed section on forms with various types of controls and so on. To my surprise, there are even mechanisms for connecting this to databases (not relevant for our purposes, but interesting, anyway). There's also a mechanism for making a section of a document read-only, and to do it you put a hash of the password into an attribute. Unfortunately, nothing is said about how to produce the hash, which rather reduces the value of the mechanism.
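To see why the missing hash definition matters, here is a minimal Python sketch. It assumes the hash ends up as a base64 string in the section's `text:protection-key` attribute; the two "tools" are hypothetical, and giving one SHA-1 and the other SHA-256 is purely an illustrative choice. Both are equally valid readings of ODF 1.1, yet a section protected by one cannot be unlocked by the other.

```python
import base64
import hashlib

# Two HYPOTHETICAL writers. Both store "a hash of the password" as ODF 1.1
# asks, but the spec does not say which digest or encoding to use.

def protection_key_tool_a(password: str) -> str:
    # Assumption: base64-encoded SHA-1 of the UTF-8 password bytes.
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")

def protection_key_tool_b(password: str) -> str:
    # Hypothetical alternative: base64-encoded SHA-256 of the same bytes.
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")

def can_unlock(stored_key: str, password: str, hasher) -> bool:
    # A reader can only check a password by re-hashing it exactly the way
    # the writer did and comparing the two strings.
    return hasher(password) == stored_key

key = protection_key_tool_a("secret")                    # saved by tool A
print(can_unlock(key, "secret", protection_key_tool_a))  # True
print(can_unlock(key, "secret", protection_key_tool_b))  # False: tool B can't unlock it
```

The correct password is rejected in the second case, which is exactly the kind of failure an underspecified mechanism invites.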

For OOXML the results were like this:

General quality
As everyone knows, there are lots of errors and mistakes in the ECMA-376:2006 specification. Even the RELAX NG schemas that come with it turned out to have syntax errors in them.
Change tracking
OOXML spends 120 pages on this, a lot of them duplicated. The functionality is very detailed, going into table changes, formatting changes, list numbering changes, etc etc. I couldn't digest it all, but as far as I could tell it was solid.
Comments
Perfectly fine.
Spreadsheets with formulas
People have made much of the date problem (no pre-1900 dates, 1900 is incorrectly specified as a leap year), but this part of the spec is mostly quite solid, and the date problem does not appear to be very relevant for the public sector. There is a formal grammar, datatypes, function definitions, etc etc. Yes, there are errors and so on, but at least it's specified in full detail.
Spreadsheets with macros
Essentially the same as for ODF: not specified.
Forms
This is described in extensive detail in the spec. The XML modelling of forms looks like it's a direct translation from the original binary format (which it probably is), so it's not exactly beautiful, but as far as I can tell everything you need is there and fully specified. The read-only protection mechanism is fairly complicated (because it's connected with the encryption mechanism), but again looks fully specified.
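The date problem is easy to show concretely. In the 1900 date system a date is stored as a day count where serial 1 is 1 January 1900, and serial 60 stands for 29 February 1900, a day that never existed (a quirk inherited from Lotus 1-2-3). Below is a minimal Python sketch of the conversion. It covers whole-day serials only and ignores both the time-of-day fraction and the alternative 1904 date system; raising an error for serial 60 is an illustrative design choice, not something the spec prescribes.

```python
from datetime import date, timedelta

def serial_to_date(serial: int) -> date:
    """Convert a whole-day serial in the 1900 date system to a real date."""
    if serial < 1:
        # No pre-1900 dates: the system simply cannot represent them.
        raise ValueError("dates before 1900-01-01 cannot be represented")
    if serial == 60:
        # The phantom leap day: 1900 was not a leap year.
        raise ValueError("serial 60 is the nonexistent date 1900-02-29")
    if serial < 60:
        # Serials 1..59 map directly onto 1900-01-01..1900-02-28.
        return date(1899, 12, 31) + timedelta(days=serial)
    # From serial 61 on, every value is one day ahead of a plain day count.
    return date(1899, 12, 30) + timedelta(days=serial)

print(serial_to_date(59))  # 1900-02-28
print(serial_to_date(61))  # 1900-03-01
```

Only serials up to 60 are affected, so for modern dates the quirk is invisible as long as both reader and writer apply the same offset.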

In short: ODF 1.1 has a huge gaping hole in it as far as spreadsheets are concerned and is full of errors and omissions. ECMA-376 appears to have all the necessary functionality, but is also full of errors. I made no attempt to judge which of the two has the greater density of errors.
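The spreadsheet hole can be made concrete. Read strictly, about all ODF 1.1 lets a consumer do with a `table:formula` value is split off the namespace prefix and the "=", and perhaps pick out cell references based on the informal prose. The Python sketch below does exactly that much and no more. The example value, the `oooc` prefix and its namespace URI are assumptions about what OpenOffice.org writes, and the regular expression is a rough reading of the prose, not a grammar from the spec.

```python
import re

# Rough pattern for the informal cell notation, e.g. [.A1] or [.A1:.B2].
# Illustration only: ODF 1.1 gives no formal grammar, and sheet-qualified
# references such as [Sheet1.A1] are not handled here.
CELL_REF = re.compile(r"\[\.([A-Z]+[0-9]+)(?::\.([A-Z]+[0-9]+))?\]")

def split_formula(value: str, namespaces: dict[str, str]) -> tuple[str, str]:
    """Split a table:formula value into (namespace URI, expression)."""
    prefix, sep, rest = value.partition(":")
    if not sep or not rest.startswith("="):
        raise ValueError("expected '<prefix>:=<expression>'")
    if prefix not in namespaces:
        raise ValueError(f"undeclared namespace prefix {prefix!r}")
    # The prefix only says *which* formula language is in use; its syntax,
    # functions, types and evaluation rules are defined outside ODF 1.1.
    return namespaces[prefix], rest[1:]

# Assumed example: the prefix and namespace OpenOffice.org uses for formulas.
ns = {"oooc": "http://openoffice.org/2004/calc"}
uri, expr = split_formula("oooc:=SUM([.A1:.A3])", ns)
print(uri)                     # http://openoffice.org/2004/calc
print(expr)                    # SUM([.A1:.A3])
print(CELL_REF.findall(expr))  # [('A1', 'A3')]
```

What `SUM` means, what datatypes it accepts and how the expression is evaluated are all left open, which is why two conforming tools can disagree about the result.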

Both specifications also have stability issues, although the problem is worse for OOXML than for ODF.

The implementations

Wikipedia lists a good number of implementations for both formats, so I picked the ones that, as far as I know, have a reasonable set of functionality. Then, for OOXML I considered those which could write OOXML, and for ODF those which could write ODF. Interestingly, there was a reasonable number of each, and for both formats there was a choice of more than 2 implementations on each of the Linux, Mac, and Windows platforms.

In theory it therefore looked like both formats could be used. If, that is, the tools really supported the formats well enough. The only way to verify that was by testing. I made very simple test documents for each of the functionalities listed above in the reference implementation (MS Office for OOXML, OpenOffice for ODF), then opened these in the other tools. If successful, I would make some changes, save to a new file, and open in the reference tool again.
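The procedure above relies on opening files in each tool and inspecting them. One crude check that could complement it (not part of the report's method) is to count whether a round trip even preserves the markup. Below is a minimal Python sketch for OOXML word-processing files. It assumes the main part is at `word/document.xml` (the usual location, though strictly it should be located via the package relationships) and that tracked changes and comments appear as `w:ins`, `w:del` and `w:commentReference` elements; the function names are hypothetical. Matching counts are necessary but not sufficient, since they say nothing about whether the right text was inserted or deleted.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by ECMA-376 (1st edition) documents.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def count_markup(docx_bytes: bytes) -> dict[str, int]:
    """Count tracked insertions, deletions and comment anchors in the main part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    return {
        "insertions": len(root.findall(f".//{W}ins")),
        "deletions": len(root.findall(f".//{W}del")),
        "comments": len(root.findall(f".//{W}commentReference")),
    }

def lost_in_round_trip(original: bytes, round_tripped: bytes) -> list[str]:
    """Return the kinds of markup whose count changed after the round trip."""
    before = count_markup(original)
    after = count_markup(round_tripped)
    return [kind for kind in before if before[kind] != after[kind]]
```

Usage, with hypothetical file names: read the file saved by the reference tool and the one re-saved by the other tool as bytes, then call `lost_in_round_trip(original, round_tripped)`; an empty list means the counts match.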

The results were downright depressing. For OOXML, in most cases none of the tools came up with usable results. For change tracking NeoOffice actually worked. And for spreadsheets NeoOffice and Gnumeric both worked fine. (IBM Lotus Symphony and Google Docs also read the spreadsheets correctly, but they can't write OOXML.)

For ODF, in most cases only IBM Lotus Symphony (which is really a fork of OpenOffice) was successful. For comments Microsoft Office (!) and AbiWord also got it right. For spreadsheets the latest Gnumeric for Windows also got it right.

In short, if you want to use ODF or OOXML today, then apparently for OOXML you must use Microsoft Office and for ODF you must use OpenOffice or IBM Lotus Symphony. Or, alternatively, you can use another tool and do lots of manual cleaning up.

I realize that the testing I describe here is very superficial, and I would not on the basis of this testing have made the claim that interchange between tools works. But most of these very, very simple tests failed. My conclusion is that if not even the simplest cases work then real documents are definitely not going to work.

Field, Flåm, Norway

Conclusion

By now I guess the conclusion should be obvious. I couldn't recommend either format. Both specs are of very low quality, and for neither format do you have much of a choice of tools. For the public sector this would essentially mean having to agree not on a format, but on a single tool to be used sector-wide. The purpose of creating standards should be to achieve interoperability, but in this case that just hasn't happened yet.

Having said that, ODF 1.2 looks like it will address nearly all the shortcomings of ODF 1.1 that my report identifies. Similarly, it looks like the next OOXML version (ISO/IEC 29500:2008 amendment 1) will solve most of the OOXML issues. If the implementors follow up and improve their converters, things will look much brighter. Unfortunately, this is going to take a couple of years.

So my conclusion in the report is that both standards should be listed as "under observation" for all usage areas.

(Note that this describes version 0.9 of the report. Version 1.0 is due within a month. Feedback over the next week or so is very much welcome.)

Now what?

If previous experience with the OOXML/ODF war is any guide, now follows the part where lots of people get very upset. That's life, I guess.

I went into this job with a genuinely open mind, curious about what I would find, and was really disappointed with the outcome. I knew the specs had problems, but I really thought they were better than this. That the tools were as poor as they are came as an even bigger surprise. In the end, given the results I got, I really had no choice about the conclusion.







Comments

Rob Weir - 2010-05-09 17:05:46

Hi Lars,

Maybe there is more to back this up in your full report, but there appears to be a "logical leap" from your implementation tests to your conclusions about the standards. For example, OOXML has 120 pages on change tracking, which you say is "solid". But then you report that it was only interoperable with NeoOffice. Does your report explain why this is so? And why you conclude that the spec has problems, when that appears to contradict what you said earlier about that section of the spec being "solid"?

The Kesan and Shah study a few years ago had a similar approach, and that approach also caused their paper to fall short of illuminating what causes situations like this. Unless you are willing to grapple with what exactly is causing a specific interoperability flaw in an implementation, you are unable to distinguish between the cases of:

A) Implementations that are unable to implement a feature interoperably.

or

B) Implementations that are unwilling to implement a feature interoperably, or even to implement it at all.

Historically, we've seen A avoided even without standards, such as the level of interoperability with legacy 1-2-3 wk1 files or Office XP-era doc files, based on vendor file format disclosures. And B is still possible even in the presence of mathematically perfect specifications.

I'm not saying that it is impossible to identify an interoperability defect and trace it directly to a defect in the standard. But it is incorrect to assume that 100% of interoperability defects are caused by errors in the standard, and 0% are caused by implementation defects and 0% are caused by intentional efforts to reduce interoperability. To really understand what is going on, you need to talk to the vendors, and ask them why a particular feature does not work. In my experience working with ODF implementers, A is almost never the reason.

It is worth noting that standards like HTML achieved their highest levels of interoperability, not by publishing amendments and corrigenda, but by encouraging vendors to implement the standards fully. In other words, interoperability was only achieved by a change in attitude by the vendors, not by a change in the standard. Often we'll see a desire for interoperability to occur first with the vendors, then worked out technically and only then standardized. That's how we got EcmaScript. In many -- perhaps most -- cases the standard does not drive interoperability, but documents that interoperability that is already being achieved. To take it in the other direction is like saying a marriage license makes a couple fall in love.

If you look at the ODF story, we went from Microsoft refusing to implement it to Microsoft committed to implement it for at least the next 10 years. None of this involved the change of even a single line in the ODF standard.

In any case, you might also want to look at the W3C Note "Variability in Specification" which gives a much-needed framework to discuss the relationship between specification conformance and interoperability.

http://www.w3.org/TR/spec-variability/

Also, I wonder, as you took into account "recent developments in the field", did you find room to mention the several multi-vendor ODF Plugfests we've had? Or the work of the OASIS ODF Interoperability and Conformance TC? (We have a recent report on the topic you should read, if you have not already). I think these activities are very relevant. I don't read Norwegian but I see your report cites the Microsoft-sponsored work at Fraunhofer, but I didn't see any mention of any interop work at OASIS or the OpenDoc Society's Plugfests. I hope this is an oversight.

For example, "The State of ODF Interoperability" report here: http://lists.oasis-open.org/archives/oic/201003/msg00020.html

In any case, if you have indeed identified "lots of errors" in ODF 1.1, I hope you will submit a list of them, either to the OASIS ODF TC's comment list directly, or to SC34/WG6. Although I think the impact on interoperability is minor, we're always pleased to fix reported errors.

-Rob

orcmid - 2010-05-09 17:33:05

Be of stout heart.

I can't speak for the quality of the OOXML specifications, especially IS 29500. I have seen how uneven the specification can be in some spots, but I've never gone deep into much of it.

After an 18-month immersion in the OASIS OpenDocument TC, I can't fault your appraisal there either. ODF 1.2 will be tighter in spots, and there will be a spreadsheet OpenFormula specification, though some of the places you identify in ODF 1.1 are still dodgy.

Considering that you had limited time to carry out your appraisal, I would say that you have done a very creditable job. Bravo!

Jesper Lund Stocholm - 2010-05-10 03:03:48

Hi Lars,

The report you have written is very interesting and for the first time, it seems that the one writing the report has actually looked into each spec and not just copied unsubstantiated claims off the internet.

Thank you for this :-)

I did some app testing using a "real" document some time ago. You can see the results at:

Introduction:

http://www.version2.dk/artikel/12498-rm-standardbrev-2s-part-1

ODF:

http://www.version2.dk/artikel/13744-rm-standardbrev-2s-part-3-odf

OOXML:

http://www.version2.dk/artikel/13316-rm-standardbrev-2s-part-2-ooxml

:-)

I disagree with your "do not include neither format", but I'll see if I can include that somehow in my feedback on your report.

Lars Marius - 2010-05-10 03:15:54

@orcmid: Thank you. :-)

@Rob: First of all, thank you for taking the time to comment in depth. It's always valuable to get some critical feedback on a piece of work like this.

There is little in my report, or in my work, to indicate why only NeoOffice appeared to support OOXML change tracking correctly. OOXML documents with change tracking written by Go-oo wouldn't open in Word at all (not sure whose fault that was). And AbiWord ignored the change tracking completely. So I have little basis for saying why this failed, although my impression is that AbiWord doesn't implement this at all.

As you say, it would be valuable to know why the different implementations of both formats failed, but short of tracking down and interviewing the developers that's difficult. And in the limited time I had for this work there was just no way there would be time for something like that.

> But it is incorrect to assume that 100% of interoperability defects
> are caused by errors in the standard, and 0% are caused by
> implementation defects and 0% are caused by intentional efforts to
> reduce interoperability.

That's certainly true. Ultimately, for our purposes, it doesn't matter that much. We can't recommend this standard for use in the public sector if interchange doesn't work. Why it doesn't work is secondary.

Having said that, I put so much emphasis on the quality of the specs because they give some indication of what one can expect from implementations, and also to some degree point to what one can expect in the future.

The general picture I see in both the specs and the implementations is one of immaturity. That is, the quality of both is headed in the right direction, but it takes time, and we're not there yet.

The report makes it very clear that I expect both the specs and the implementations to improve in the years to come, and that I expect the conclusions in the report would be different if it were written in, say, 2012 or 2013.

> you might also want to look at the W3C Note "Variability in
> Specification"

That does look interesting. I hadn't heard of it before, but will have a look.

> Did you find room to mention the several multi-vendor ODF Plugfests
> we've had? Or the work of the OASIS ODF Interoperability and
> Conformance TC?

I saw these, but did not mention them in the report. However, now that you mention it, I realize that it would be useful to cite these in the section that discusses reasons to believe interoperability will be better in the future. So I'll do that in version 1.0. Thank you for suggesting it.

> I don't read Norwegian but I see your report cites the Microsoft-
> sponsored work at Fraunhofer, but I didn't see any mention of any
> interop work at OASIS or the OpenDoc Society's Plugfests. I hope
> this is an oversight.

I cite the Fraunhofer report because section 3.4 of the report is essentially about that report. It's also the source for a mention of a problem with list counting somewhere in section 3.2.

I wouldn't call not citing the ODF interop work an oversight, but I agree that citing it will improve the report.

> In any case, if you have indeed identified "lots of errors" in ODF
> 1.1, I hope you will submit a list of them, either to the OASIS ODF
> TC's comment list directly, or to SC34/WG6. Although I think the
> impact on interoperability is minor, we're always pleased to fix
> reported errors.

My main source for the assertion that there are lots of errors is the ODF error database, to which you kindly provided the link. I trawled the errors in there, checked some of them against the spec, and even described those in the report. I also found some errors mentioned elsewhere or on my own, but as far as I can tell these are all in the database already (and all of them are listed as fixed in ODF 1.2, if I remember correctly).

The exception is the problems with change tracking in ODF. I've discussed this briefly with Patrick, and he suggested I report this in the database, which I intend to. However, first I need to get version 1.0 of the report out. Then I want to take the time to write a proper error report.

Lars Marius - 2010-05-10 03:25:45

@Jesper: Thank you for those links. I may refer to those in version 1.0. It's interesting that your results are so close to mine.

Any feedback on the report would be extremely welcome.

Paul E. Merrell, J.D. - 2010-05-10 05:55:05

Here is the URL for Google's HTML cache of the full report, which can be auto-translated. http://webcache.googleusercontent.com/search?q=cache:CCbn_-slAWMJ:standard.difi.no/filearchive/bouvetrapporten_0_9.pdf+bouvetrapporten_0_9.pdf&cd=1&hl=en&ct=clnk&gl=us

Marius, you might consider linking to that page in your parent article for those seeking a translatable version.

Since I haven't read the full report yet, I'll reserve comment for now.

Doug Mahugh - 2010-05-10 12:28:43

Interesting report, Lars. I look forward to reading the final version (in an English translation, of course).

You mentioned that for ODF you must use OpenOffice.org or IBM Lotus Symphony. Given that those applications are from the same code base, and Symphony has recently been re-aligned with the OO.o 3.x code base, do you see those as two separate implementations, or two variations of one implementation?

Lars Marius - 2010-05-10 12:52:22

@Doug: I don't know if there will be an English translation. Difi says they are negotiating a general agreement for English translation services, and once that's in place they may have some reports translated if there are requests for them. This report might be a candidate, but it's really not for me to decide.

There is actually a section in the report considering how to count the OpenOffice derivatives. I looked at Go-oo, NeoOffice, OOo, and Symphony, and decided that Go-oo and NeoOffice did not count. Symphony I decided to count as separate, as apparently it parted company with OOo at version 1.1.4. However, the report does note that the tools are very similar.

Doug Mahugh - 2010-05-10 13:53:21

FYI, past versions of Symphony were indeed based on the OOo 1.1.4 code base, but the version currently in beta is based on the OOo 3.x code used by Go-OO and others.

See for example slide #4 from the Symphony v-Next presentation at last year's OOoCon: http://conference.services.openoffice.org/index.php/ooocon/2009/paper/viewFile/120/114

Lars Marius - 2010-05-10 14:07:43

I realized they were going to add OOo 3.x code into the next version, and said as much in the report, but these slides make it sound as though they're going a bit further than that. I guess I could say currently there are 2 different tools, and in the future there will be 1.5 different tools. Tough call, this.

(Anyway, thank you for the link. Nice to finally see some detail on Symphony.)

Stephane Rodriguez - 2010-05-11 09:42:56

Hello,

Funny, I recently finished an article on application-level interoperability, taking Microsoft Office (Excel) to task, and then, out of fairness, OpenOffice (Calc). Interesting results...

http://ooxmlisdefectivebydesign.blogspot.com/

Uwe Brauer - 2010-06-24 13:37:57

Hi, you mentioned the Fraunhofer report, but maybe you will also find http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1201708 interesting.

Uwe Brauer

zhun guo - 2012-10-13 11:28:10

Is interoperability possible among web office sites?

I. Purpose

1. Online document transmission. As part of the Internet, is it necessary for web office sites to send and receive documents directly to and from each other? Should the online document transfer protocol be "advancing with the times, harmonious sharing"? Should it move on from the pre-web stage (SMTP plus attachments) to web technologies such as HTTP, XMPP, HTML5, WebSocket or SPDY?
2. Online document interoperability. Once a document has been transferred, can interoperable web-page documents be achieved? In other words, can the web editor of another site open or edit the file, perhaps even with collaborative or cross-domain editing?
3. How to generalize? What would be the best way to persuade the 20 existing web office providers (such as Google Docs, MS Office 365, iCloud, Zoho...) to accept this kind of cross-platform interoperability?
