Extreme 2007—day 2
Posted in Technology on 2007-08-08 16:04
Hotel Europa, interior detail
The first talk I attended on day 2 was Patrick Durusau on Retiring your metadata shoehorn, which is really a proposal for a more powerful metadata mechanism for ODF. As far as I can tell, what they've done is to extend the ODF schema, in particular by adding IDs to lots of elements that did not have them before. They've also added more elements reminiscent of RDF/a, which they call "in-content metadata", for much the same purpose as RDF/a serves.
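I didn't catch the exact markup, but in-content metadata in the RDF/a style presumably looks roughly like the sketch below. The element and attribute names here are my own guesses, not necessarily what the proposal actually uses:

```xml
<!-- Hypothetical sketch: an ODF paragraph with an ID, containing an
     inline RDF/a-style annotation ("in-content metadata").
     Element and attribute names are guesses, not the real proposal. -->
<text:p xml:id="p42">
  The melting point of
  <text:meta xhtml:about="#compound1"
             xhtml:property="chem:meltingPoint">217°C</text:meta>
  was confirmed.
</text:p>
```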
Unfortunately, there was too much email and too many other things to catch up with while he was talking, so I lost the details of how this actually works, but it did look interesting.
After this, I attended Roy Amodeo on Applying structured content transformation techniques to software source code. He starts from the observation that source code is also content, then goes through lots of use cases showing why processing source code can be useful. I'll skip those, since this is kind of obvious, in my opinion. He wants to convert all source code to XML, and looks at various ways of achieving this, based on the grammars of the languages, since these are already defined and already turn code into a tree structure. He shows an example of doing this with OmniMark.
The generated structure looks rather like what you'd expect, with grammar symbols turned into element-type names and so on. It is simplified, however, and not a purely mechanical translation. He adds some annotations in attributes, like IDs and line numbers. He also has a ref element that's used for references (such as from a use of a variable to its definition). Strangely, the ref element uses the id attribute to hold the reference, which is a bit confusing.
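The idea is easy to sketch in a few lines of Python: the standard ast module already parses source into a grammar-derived tree, and walking it yields XML broadly in this spirit. The nid attribute name and the choice of which fields to keep are my own, not Roy's:

```python
import ast
import itertools
import xml.etree.ElementTree as ET

_ids = itertools.count(1)

def to_xml(node):
    """Turn a Python AST node into an XML element named after the
    grammar symbol, annotated with a generated ID and line number."""
    elem = ET.Element(type(node).__name__)
    # "nid" rather than "id", because some AST fields (e.g. Name.id,
    # which holds the variable name) are themselves called "id"
    elem.set("nid", "n%d" % next(_ids))
    if hasattr(node, "lineno"):
        elem.set("line", str(node.lineno))
    # keep simple-valued fields (identifiers, constants) as attributes
    for name, value in ast.iter_fields(node):
        if isinstance(value, (str, int)):
            elem.set(name, str(value))
    for child in ast.iter_child_nodes(node):
        elem.append(to_xml(child))
    return elem

print(ET.tostring(to_xml(ast.parse("x = 1")), encoding="unicode"))
```

Real grammar-based conversion for other languages would of course need a parser per language, which is presumably where OmniMark comes in.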
He gives an example scenario based on OmniMark itself: version 3 added a short syntax, which could be used together with the old verbose syntax until version 5.3. Later, well after 5.3, people wanted help automatically converting their old source code to the new style. He shows how this requires quite complicated transformations. He originally implemented most of this in OmniMark as text transformations, but couldn't do everything. Now he's doing it via the XML approach explained above, and can cover more of the transformations.
He claims his approach covers about 90% of the effort, leaving 10% to be performed manually.
Patrick Durusau speaking
I spoke just after lunch on Semantic Search with Topic Maps. I think it went quite well, and it seemed to be well received. Will write more about this later.
Jose Carlos Ramalho spoke just after me, on Topic Maps applied to PubMed. He says most of the work was actually done by his ex-student Giovani Rupert Librelotto. PubMed is a huge database of articles on medical subjects. There is an XML syntax representing the content, and the metadata looks reasonable. They have converted the content to Topic Maps using Metamorphosis, a Topic Maps tool suite they've written. He showed various parts of it, including a Topic Maps editor called XSTM.
XSTM also appears to be a kind of Topic Maps schema language, but it flew past too quickly for me to say for certain. However, they've combined it with an XML language for specifying XML-to-Topic Maps conversions, based on XPath. The conversion language looks quite powerful, but I don't really understand why they need XSTM. As far as I can tell they could just use plain XTM for the same thing.
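The core idea of such a conversion language is easy to sketch: a set of rules, each pairing a topic type with an XPath that selects the source elements whose content becomes topics. Here's a minimal Python version over a made-up, simplified PubMed-like record; the real PubMed schema and their rule syntax both differ:

```python
import xml.etree.ElementTree as ET

# A made-up, simplified PubMed-like record; the real schema differs.
record = ET.fromstring("""
<PubmedArticle>
  <ArticleTitle>Aspirin and heart disease</ArticleTitle>
  <AuthorList>
    <Author><LastName>Smith</LastName></Author>
    <Author><LastName>Jones</LastName></Author>
  </AuthorList>
  <MeshHeading>Aspirin</MeshHeading>
  <MeshHeading>Heart Diseases</MeshHeading>
</PubmedArticle>
""")

# Each rule pairs a topic type with an XPath selecting the elements
# whose text becomes a topic name. (ElementTree supports only a small
# XPath subset; the real conversion language is richer.)
rules = [
    ("article", "ArticleTitle"),
    ("author", ".//Author/LastName"),
    ("subject", "MeshHeading"),
]

topics = []
for topic_type, path in rules:
    for elem in record.findall(path):
        topics.append((topic_type, elem.text))

for topic in topics:
    print(topic)
```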
They tried converting the whole of PubMed to Topic Maps, but Metamorphosis couldn't cope with that, so instead they now intercept query results and convert only the records found by a search to Topic Maps. This gives data of a much more tractable size. In the future they want to integrate MeSH subject headings in the data to weed out "false results" (in other words, to improve precision).
After the break I heard Erik Hennum on DITA specialization by description and example. I find DITA really interesting, and am frustrated by my inability to find time to play with the combination of Topic Maps and DITA, which I think could be extremely interesting. To feed my frustration I listen to all talks about DITA that I can attend. (Or something like that.)
In general, DITA is a best practice for achieving reuse of XML content, plus some simple technology to support it. In particular, it supports subtyping of XML element types. Erik's talk is about how example documents can be used to generate vocabularies, but I can't quite follow how he does it. This is quite annoying, because it looks interesting, but I don't quite get what is going on.
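The subtyping mechanism itself is simple: every specialized element carries a class attribute listing its ancestry, so a generic DITA processor that doesn't know the new element can fall back to the nearest base type it does understand. Roughly like the sketch below, where the warning element and the safety-d module are invented for illustration:

```xml
<!-- A hypothetical specialization of the built-in note element.
     The class attribute records the ancestry: a processor that
     doesn't know "warning" can treat it as a topic/note. -->
<warning class="- topic/note safety-d/warning ">
  Disconnect the power before opening the case.
</warning>
```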
He shows an example that is a DITA vocabulary for expressing taxonomies in DITA. This is basically an XML vocabulary for taxonomies that's derived from the base DITA vocabulary. Interesting. I didn't know that existed.
The next speaker was Nikita Ogievetsky on Semantic resolvers for semantic web glasses. Unfortunately, I had to skip this presentation in order to catch my taxi to the airport.
Extreme was extremely nice again, as it always is. I'm really sorry I wasn't able to attend the last two days. Given that I leave for my holiday on Friday evening, though, I had little choice.
Filed under: extrememarkup07
Scott Hudson - 2007-11-14 17:22:31
Hi Lars! I couldn't get travel authorized for either Extreme or XML 2007 this year, so unfortunately I wasn't able to see you again.
I, too, have been very interested in exploring how DITA maps could be expressed in XTM topic maps. I think it would be an incredibly interesting and powerful tool.
Unfortunately, I have also been too busy to explore this. I still find a lot of resistance to XTM in favor of RDF. Frustrating, since I think XTM is much easier to understand and more powerful than RDF...
Hope you had a nice holiday! Cheers,