Extreme 2007—day 1
Posted in Technology on 2007-08-08 16:02
Hotel Europa, interior detail
Extreme Markup Languages is a rather unusual conference, with an extremely technical focus and an unusual mix of the theoretical and the pragmatic. The sole criterion for getting on the program seems to be that the speaker must have something interesting to say, which is also unusual, but does, strangely enough, seem to produce excellent results. This is, I think, my fifth Extreme conference, which speaks for itself.
The conference is always held the first week in August in Montréal, which is an excellent combination of time and place. I'll return to that subject later, if I can. The last few years it's also been held in the Hotel Europa, which is definitely suitable, being a pretty extreme place in its own special way.
The first talk I attended was Thomas Passin on Easy RDF for Real-Life System Modeling. He skips explaining what RDF is, and talks about how it's easy to use for modelling, because it's flexible and the modeller doesn't need to commit to too much straight away. He also says RDF supports "hierarchicalish" data better than databases do.
He leads in to this via a discussion of how people like to view their data, and gets to a simple structured text format for RDF data. He's got a Python parser that converts this to RDF/XML. In effect, this is rather like n3, only it looks slightly different. Why he does it this way instead of using n3 he didn't say (that I heard). He talks a lot about RDF's ability to remove duplicate data.
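To make the idea concrete, here is a minimal sketch of what parsing such a text format could look like. The format below (a subject on an unindented line, followed by indented "predicate: value" lines) is invented for illustration and is not Tom's actual syntax, and the sketch stops at triples rather than emitting RDF/XML. Storing the triples in a set shows the duplicate removal: a statement entered twice only ends up in the data once.

```python
# Hypothetical text format, invented for this sketch (not the format from the talk):
# an unindented line names a subject, indented lines are "predicate: value".
SAMPLE = """\
car1
    type: Car
    owner: alice
alice
    type: Person
    name: Alice
car1
    owner: alice
"""


def parse_triples(text):
    """Parse the invented format into a set of (subject, predicate, object) triples."""
    triples = set()  # a set, so repeated statements collapse into one
    subject = None
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        if not line[0].isspace():
            subject = line.strip()  # unindented line starts a new subject
            continue
        if subject is None:
            raise ValueError(f"line {line_number}: property before any subject")
        predicate, separator, value = line.strip().partition(":")
        if not separator:
            raise ValueError(f"line {line_number}: expected 'predicate: value'")
        triples.add((subject, predicate.strip(), value.strip()))
    return triples


triples = parse_triples(SAMPLE)
for triple in sorted(triples):
    print(triple)
```

Note that "car1 owner alice" appears twice in the input, but the result holds it only once.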
The talk seems to all be about using this text format to capture data easily as you go, without doing any up-front modelling, and just iteratively changing the ontology as you go. I can see how you could do that with this text format, or n3, or LTM/CTM, or even with Ontopoly. For a technical person it would be faster with a text format, though.
The parser produces a limited subset of RDF/XML, which makes the result easier to process with XSLT. So he has a couple of generic XSLT stylesheets that display the data for him. This part doesn't seem that exciting, since an RDF engine would make this even easier to do using SPARQL, without limiting him to an RDF/XML subset. This seems to me like a case of "if all you have is a hammer..."
What he lists as issues are: lots of identifiers to remember (true, but hasn't been much of an issue for me with LTM), no support for non-text data (probably more of an issue), and interoperability with other tools (what's the problem there?).
Fabio Vitali asked my question about why Tom didn't use n3, and Tom said he felt n3 had too many features that would feel odd for users not familiar with RDF.
Tom was followed by Michael Kay on Writing an XSLT (or XQuery) Optimizer in XSLT. Michael Kay has been maintaining the best Java XSLT processor (SAXON) for years now, and so is definitely the right person to speak on this. His rationale for doing this is that optimization is rule/pattern-based rewriting of expression trees, and that this is what XSLT does (except it works on XML, not expression trees). He says "this is so obvious it's kind of remarkable nobody's done this before, and I'm kicking myself for not doing it at the start".
He's transformed the XPath expressions into an XML format, and then does transformations using XSLT. His first example shows quite well why this is useful. He rewrites count(E) = 0 (where E is any expression) to empty(E). Note how close this is to replacing one XML element type with another. He shows more examples, emphasizing that people really do write the suboptimal examples, and that a key benefit here is that the optimizations feed on each other.
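A rough sketch of that first rewrite, in Python rather than XSLT, and on a made-up XML encoding of the expression tree. The element and attribute names (compare, call, literal, path, op, name, value, select) are my assumptions for illustration; they are not the format Michael actually uses.

```python
import xml.etree.ElementTree as ET


def rewrite_count_eq_zero(node):
    """Return empty(E) if node encodes count(E) = 0, else return node unchanged."""
    if node.tag != "compare" or node.get("op") != "=":
        return node
    children = list(node)
    if len(children) != 2:
        return node
    left, right = children
    # Left side must be a call to count() with exactly one argument.
    is_count = left.tag == "call" and left.get("name") == "count" and len(left) == 1
    # Right side must be the literal 0.
    is_zero = right.tag == "literal" and right.get("value") == "0"
    if not (is_count and is_zero):
        return node
    # Build empty(E), reusing the argument subtree E as it is.
    replacement = ET.Element("call", {"name": "empty"})
    replacement.append(left[0])
    return replacement


# Hypothetical encoding of the XPath expression: count(//item) = 0
expr = ET.fromstring(
    '<compare op="=">'
    '<call name="count"><path select="//item"/></call>'
    '<literal value="0"/>'
    "</compare>"
)
optimized = rewrite_count_eq_zero(expr)
print(ET.tostring(optimized, encoding="unicode"))
```

The rule really is little more than swapping one element for another while keeping the subtree for E, which is the kind of thing an XSLT template does naturally.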
Of course, if the optimizations are to feed on each other, that means you have to do multiple passes, which again means you have to decide when to stop. There's also a question whether to do optimization bottom up (children before parents) or vice versa. He says bottom up potentially means fewer passes (because when you get to the parents the children are already simplified) but at the cost of more expensive passes.
There are complications, of course. I'm not going to record all those, since it's pretty clear from the slides, anyway, and my summary isn't going to add anything. What's interesting is that many of his optimizations produce SAXON-specific functions that implement something very close to standard XSLT functionality, but with some simplifications that speed it up. I've seen many cases where the same thing could be done in tolog, and we really ought to do the same thing at some stage.
He notes that his Java optimizer is about an order of magnitude faster than his XSLT optimizer, but adds that "this isn't bad, given that the Java optimizer was written over 6 or 7 years, and the XSLT optimizer was knocked up in a day". (He implies here that the XSLT optimizer does a lot more.) Indeed, not bad at all. However, he's not sure that the XSLT optimizer can be, uh, optimized, although he is sure that it cannot be done easily. The key, he says, is more efficient lazy tree construction, and then shows an image of a dog eating dogfood. Message taken, I think.
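The pass-counting argument can be illustrated with a small sketch. This is my own toy model, not Michael's code: expressions are nested Python tuples instead of XML, there are only two rules, and the stopping criterion is simply "a pass changed nothing" (with an assumed cap of 10 passes as a safety net). The second rule, not(empty(E)) to exists(E), is my addition to give the first rule something to feed.

```python
# Toy expression trees: (operator, operand, ...); leaves are strings or numbers.


def apply_rules(expr):
    """Apply the local rewrite rules to one node; return it unchanged if none match."""
    if not isinstance(expr, tuple):
        return expr
    # count(E) = 0  ->  empty(E)
    if (expr[0] == "=" and len(expr) == 3 and isinstance(expr[1], tuple)
            and expr[1][0] == "count" and expr[2] == 0):
        return ("empty", expr[1][1])
    # not(empty(E))  ->  exists(E)
    if expr[0] == "not" and isinstance(expr[1], tuple) and expr[1][0] == "empty":
        return ("exists", expr[1][1])
    return expr


def bottom_up(expr):
    """One pass that rewrites children before their parent."""
    if not isinstance(expr, tuple):
        return expr
    rebuilt = (expr[0],) + tuple(bottom_up(child) for child in expr[1:])
    return apply_rules(rebuilt)


def top_down(expr):
    """One pass that rewrites the parent before its children."""
    expr = apply_rules(expr)
    if not isinstance(expr, tuple):
        return expr
    return (expr[0],) + tuple(top_down(child) for child in expr[1:])


def optimize(expr, one_pass, max_passes=10):
    """Repeat passes until nothing changes; return (result, passes that changed something)."""
    passes = 0
    for _ in range(max_passes):
        rewritten = one_pass(expr)
        if rewritten == expr:
            break  # fixed point reached: time to stop
        expr = rewritten
        passes += 1
    return expr, passes


# not(count(//item) = 0)
query = ("not", ("=", ("count", "//item"), 0))
bottom_up_result = optimize(query, bottom_up)
top_down_result = optimize(query, top_down)
print(bottom_up_result)
print(top_down_result)
```

Both strategies end up at exists(//item), but bottom up gets there in one changing pass while top down needs two, because the parent is visited before its child has been simplified. The sketch says nothing about the cost of each pass, which is the other half of the trade-off he mentions.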
Martin Bryan speaking
The first speaker after lunch was Martin Bryan on OWL and the automotive repair information supply chain. It's an EU research project that's basically about trying to create a common EU market for automotive repairs. He starts with an anecdote about a car breakdown near the French-German border, and then launches into a kind of OWL tutorial. Later on, he returned to the project, which is run by a big consortium. Basically, it's a portal that helps technicians and drivers diagnose and solve problems.
The reasons given for using RDF and OWL were need for multi-lingual support, that users don't necessarily know the manufacturers' terminology, and a few other reasons. As far as I can tell, they might just as well have chosen Topic Maps.
Strangely, they started with a UML model, went via XML Schemas, then a terminology set, and finally wound up with an OWL ontology. Not sure why they took this route.
What's curious is that they also use Topic Maps in the management of the project. In other words, not in the project, but in managing it. Martin didn't know anything about this, though, so he couldn't tell me anything more about that aspect.
Then Steve Pepper spoke on Dublin Core in Topic Maps. However, the talk before his got cancelled because of technical difficulties, and so his talk was moved forward by 45 minutes. This caused me to only catch the last 2 minutes of his talk. What I did catch, however, was the discussion after the talk, which mostly centered around the web's identity crisis. It mostly took the form of W3C people trotting out old W3C proposals and these being rejected by the Topic Mappers, with various distracting interludes about Topic Maps.
This was the last talk, so after this the conference adjourned to the Acqua Lounge in the Hotel Europa, which is as bizarrely decorated as the rest of the hotel.
Filed under: extrememarkup07
Trond - 2007-08-08 06:13:43
Sounds very interesting.
"The reasons given for using RDF and OWL were need for multi-lingual support, that users don't necessarily know the manufacturers' terminology, and a few other reasons. As far as I can tell, they might just as well have chosen Topic Maps."
Sounds like a good TM candidate, yes.
Being an EU research project, perhaps they were restricted to using both open standards as well as open source software. If so, they might've concluded that their options were limited to using OWL (?). Although I don't know too much about what OWL software is out there, it is sad to say that not too many OS topic maps applications projects are up-to-date / actively maintained...
Lars Marius - 2007-08-08 10:07:11
I don't think there's any reasonable definition of "open standard" that does not include Topic Maps. It's an ISO standard, after all. It's more likely that they did this simply because John Chelsom, CEO of CSW (which employs Martin & co), is really into RDF.
Trond - 2007-08-09 03:51:14
Yes, of course TM is an open standard. I was referring to the lack of up-to-date open source Topic Maps software ;)
Ant S - 2007-08-16 05:44:53
Both comments pretty much correct; the project (MYCAREVENT) arose partly out of an OASIS draft metadata spec in RDF published in 2003. Topic Maps would have been possible for sure, OWL also did the trick. We used Protégé for the OWL modelling, then built a query/reasoning engine on top of the OWL using Jena and SPARQL.
The reason for developing the UML model first was largely one of process within the consortium (which has 20 members). We needed to build consensus around the model, and found that OWL/Protégé was not an effective way to do this - people found it too technical and hard to understand. They were much more comfortable with the UML, which was then used as a reference model from which everything else was derived.
Topic Maps within the project - we used Ontopia/Ontopoly to build a small topic map which managed and tracked development and integration within the project. This helped with the technical management of the project (11 implementors, 40-odd components), and also served to reinforce the principles behind ontologies to the partners.
Lars Marius - 2007-08-19 16:00:11
Thank you for the clarification, Antony. Much appreciated.
Peter Harder - 2007-09-11 05:07:09
So TM is or is not an open standard? Sorry but how is it connected to RDF?
Lars Marius - 2007-09-11 05:20:24
Topic Maps are about as open as a standard can get. The standard in question is ISO/IEC 13250, which is published and maintained by ISO's SC34 subcommittee. Anyone who wants to participate in the work can send in comments, and if they want a vote they can join ISO.
Comparing Topic Maps and RDF is a bit heavy for a comment, but look here: http://www.garshol.priv.no/blog/92.html
Andreas - 2008-12-01 05:31:48
Thank you Anthony, now i see TM is an open standard.