Why XTM 2.0 is different from 1.0
Posted in Technology on 2006-12-16 19:17
Many people have asked what the changes between versions 1.0 and 2.0 of XTM are, and what the rationales for the various changes are. The actual list of differences can be found in the standard itself, but the standard says nothing about why they were made, and so I thought I would give a quick overview of that here.
A bit of background may be useful. XTM 1.0 had a number of design goals specified up front, but there was no clear decision on what kind of format XTM 1.0 was supposed to be. For XTM 2.0 we took the position that the purpose of XTM 2.0 is to allow topic maps to be transferred from one place to another. Period. This meant that functionality to make it easier for people to work with topic maps stored in files (without using Topic Maps software) was not a priority at all, and some of the changes follow from this principle.
XLink and XML Base
XTM 1.0 used both XLink and XML Base, but XTM 2.0 uses neither. In both cases the reason is the same: using these technologies made the specification more complex, and using them achieved essentially nothing.
XML Base, for example, allowed one to specify a base URI for the XTM document that was different from the URI from which the document was retrieved. URIs inside the document that shared this base could then be written as relative URIs, which makes the document shorter and simpler. However, it's rare for XTM documents to have so many similar URIs that this is helpful, and the savings are in any case small, given how verbose XTM is.
The downside of having XML Base is that the specification has to explain what it means to use this attribute, and one gets into interpretation issues regarding what URI IDs should resolve to, etc. Cutting this functionality really loses nothing of value, and at the same time reduces complexity.
The same applies to XLink, since XLink doesn't really provide any functionality that XTM has any use for. All XTM needs is the ability to put URIs into XTM documents, and a simple href attribute is sufficient for the purpose. And just as with XML Base, XLink presents challenges, such as what to do with the required (but superfluous in XTM) xlink:type attribute. Again, removing XLink makes the specification simpler, and in this case there is no loss of functionality at all.
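To make the XLink point concrete, here is a minimal Python sketch of reading a URI reference in either style. It assumes that XTM 1.0 references carry an xlink:href attribute in the standard XLink namespace and that XTM 2.0 references carry a plain href attribute. The helper name read_href and the two sample fragments are made up for illustration, and the XTM namespaces on the elements themselves are left out for brevity.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

def read_href(element):
    """Return the URI reference on an element, whichever style it uses."""
    # XTM 2.0 style: a plain, unqualified href attribute.
    href = element.get("href")
    if href is None:
        # XTM 1.0 style: href qualified with the XLink namespace.
        href = element.get("{%s}href" % XLINK_NS)
    return href

# Hypothetical fragments (element namespaces omitted for brevity).
old_style = ET.fromstring(
    '<topicRef xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xlink:type="simple" xlink:href="#puccini"/>'
)
new_style = ET.fromstring('<topicRef href="#puccini"/>')

print(read_href(old_style))
print(read_href(new_style))
```

Both calls yield the same reference; the XLink version just needs an extra namespace declaration and an xlink:type attribute to say it.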
The name changes
A lot of elements have simply changed names between the two versions, and for the most part the reasons are simple. The complete list is below.
- The parameters element (used for scope on variant names in XTM 1.0) is called scope in 2.0. In XTM 1.0 variant names actually don't have scope, and so the "parameters" really provide a processing context in which the name applies. However, in TMDM this has become scope, and so the name had to change.
- The roleSpec element (used for type on association roles in XTM 1.0) is called type in 2.0. The reason is that in 1.0 roles are called members, and role types are called roles. TMDM changed this, and so the names had to change as well.
- The member element (used for roles in XTM 1.0) has changed to role in XTM 2.0, for the same reason.
- The baseName element is called simply name in XTM 2.0, for reasons of simplicity and brevity, mainly.
- Similarly, the baseNameString element changed its name to value for reasons I should think obvious.
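The renames in this list can be captured as a simple lookup table. The following Python sketch is only an illustration of the renames, not a full 1.0-to-2.0 converter: it ignores namespaces, XLink attributes, and every other change between the versions. The function name rename_elements and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

# Renames taken from the list above (XTM 1.0 name -> XTM 2.0 name).
RENAMES = {
    "parameters": "scope",
    "roleSpec": "type",
    "member": "role",
    "baseName": "name",
    "baseNameString": "value",
}

def rename_elements(root):
    """Rename 1.0 element names to their 2.0 equivalents, in place."""
    for element in root.iter():
        # Elements not in the table keep their name.
        element.tag = RENAMES.get(element.tag, element.tag)
    return root

# Hypothetical namespace-free fragment in 1.0 style.
fragment = ET.fromstring(
    "<topic><baseName><baseNameString>Tosca</baseNameString></baseName></topic>"
)
rename_elements(fragment)
print(ET.tostring(fragment, encoding="unicode"))
```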
We took out a few things in XTM 2.0:
- In XTM 1.0 all elements could have an ID attribute, which meant that if you wanted to you could refer to a specific topicRef element in a file, for example. However, only the IDs on elements which represented things that could be reified served any function. So when we came up with a better way to do reification (see below) we dropped the IDs entirely (except on topic elements).
- In XTM 1.0 the subjectIdentity and variantName elements were used as wrappers or containers, but the syntax would have worked just fine without them. In the interest of simplicity we removed both.
- In XTM 1.0 you could not only merge in another XTM file, but also add scope to all characteristics in that file when you merged it in. This was, in theory, supposed to allow people to track where statements in a topic map came from. In practice, however, it was hardly ever used, and it doesn't really do this tracking as faithfully as people think. Since we thought that this sort of "transformation operation" did not belong in an interchange syntax, we removed it. (We would have removed mergeMap, too, since this is just a syntactic convenience, but pressure from users was too strong to allow this.)
The structural changes
A number of structural changes were made in XTM 2.0, mostly reflecting how the abstract model of Topic Maps was tightened by the introduction of TMDM. Some also reflect how we learned more about the role and use of XTM in the five years from the completion of XTM 1.0 in 2001.
- In XTM 1.0 member elements could specify any number of role players (including zero). However, in TMDM, each role must have exactly one player, so in XTM 2.0 we changed the syntax accordingly.
- In XTM 2.0 the instanceOf element has been broken into two elements. instanceOf remains as the mechanism for specifying the type(s) of topics, but type was introduced for all other type specifications. This was done because the two relationships are actually different (and represented differently in TMDM), and we thought this would be clearer for users.
- In XTM 1.0 variant names could be nested, which led many to think that the name structure of topics was actually a hierarchy. In reality, the only thing the nesting did was to inherit scopes, thus allowing a more compact specification of variant names. We viewed this as a confusing syntactic convenience, and have never seen it used in practice, and so decided to remove it, which simplified the syntax without losing any functionality.
- In XTM 1.0 occurrences, associations, and roles were not required to have a specified type. We really could not see any reason to allow these to be typeless, and requiring a type in both the model and the syntax makes Topic Maps a simpler standard. Actually using this "feature" of XTM 1.0 is considered bad practice, in any case. Also, anyone wanting untyped occurrences (for example) can still do it by defining their own "untyped" type, so there is no real loss of functionality there.
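Two of the constraints above (exactly one player per role, and a required type on occurrences, associations, and roles) are easy to check mechanically. Here is a rough Python sketch of such a check. It assumes a namespace-free XTM 2.0-like fragment in which a role's player is a topicRef that is a direct child of the role element and the type is given by a type child element. The function name check_structure and the sample data are invented for the example, and this is nowhere near a complete validator.

```python
import xml.etree.ElementTree as ET

def check_structure(root):
    """Return a list of problems found under the two constraints above."""
    problems = []
    for element in root.iter():
        # Occurrences, associations, and roles must all have a type.
        if element.tag in ("occurrence", "association", "role"):
            if element.find("type") is None:
                problems.append("%s has no type" % element.tag)
        # A role must have exactly one player (direct topicRef child only;
        # the topicRef inside <type> is a grandchild and is not counted).
        if element.tag == "role":
            players = element.findall("topicRef")
            if len(players) != 1:
                problems.append("role has %d players" % len(players))
    return problems

# Hypothetical association whose second role lacks a player.
sample = ET.fromstring(
    "<association>"
    "<type><topicRef href='#composed-by'/></type>"
    "<role><type><topicRef href='#work'/></type><topicRef href='#tosca'/></role>"
    "<role><type><topicRef href='#composer'/></type></role>"
    "</association>"
)
print(check_structure(sample))
```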
A very few extensions have also been made in version 2.0:
- The version attribute was added to the topicMap element so that we would be able to make minor updates in the future. Processors should then be able to tell what version of XTM they were receiving by inspecting the version attribute. Since it's on the document element they can do this before they start reading the topic map.
- Support for typed data was added in XTM 2.0 in response to user requests for this functionality. This includes support for embedded markup. In the syntax this only shows up in the form of the datatype attribute on the resourceData element.
- Typed names are now supported. This was not added for political reasons; it was added because there was a real need for this functionality. Users were already typing names via scope, which was not very pretty, and allowing names to be typed avoided ugly modelling kluges.
- The reifier attribute has been added on all elements which represent something that can be reified. This isn't really an extension, just a change in how reification is expressed. In XTM 2.0 it is expressed by means of a reference from the construct being reified to the topic that reifies it. (In XTM 1.0 it was done by means of a subject identifier pointing to the reified construct, which wasn't really very pretty.)
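As a sketch of the version check described in the list above: because the version attribute sits on the document element, a processor can read it from the very first parse event and stop there. The Python below uses the standard library's streaming parser. The function name detect_version and the sample document are illustrative assumptions, and the sample omits the XTM namespace for brevity.

```python
import io
import xml.etree.ElementTree as ET

def detect_version(stream):
    """Return the version attribute of the document element, or None."""
    for _event, element in ET.iterparse(stream, events=("start",)):
        # The first "start" event is the document element; stop right there
        # without processing the rest of the topic map.
        return element.get("version")
    return None

# Hypothetical minimal document (namespace omitted for brevity).
sample = io.BytesIO(
    b'<topicMap version="2.0"><topic id="tosca"/></topicMap>'
)
print(detect_version(sample))
```

A document element without the attribute simply gives None, which a processor could treat as a signal to try other means of identifying the format.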
Trond - 2006-12-19 11:34:09
Thanks for clarifying the differences between XTM 1.0 and XTM 2.0.
"This meant that functionality to make it easier for people to work with topic maps stored in files (without using Topic Maps software) was not a priority at all."
Fair enough - and I agree, but didn't one of XTM's strengths over RDFS/OWL use to be that XTM is easier for people to read?
At least that's what Pepper claims in his Ten Theses on Topic Maps and RDF ... but now you're saying that this is no longer an important aspect of XTM? Or?
(I'm not saying that the (reading) complexity of XTM has reached that of RDF(S)/OWL (triples)).
"In XTM 1.0 member elements could specify any number of role players (including zero). However, in TMDM, each role must have exactly one player, so in XTM 2.0 we changed the syntax accordingly."
This is an important change. The fact that XTM 1.0 imposed no such constraint did "compromise" the whole interchange aspect, IMHO, as different developers could come up with different ways of doing the same thing -- in a non-standard way. Implicit semantics versus explicit semantics ... the whole point of an international standard...
Lars Marius - 2006-12-19 14:33:20
It's certainly true that XTM is a lot simpler than RDF/XML (which is what you have to compare XTM with on the RDF side), but whether this amounts to anything much when you compare the two technology stacks is another question entirely. I'm not sure it does.
In any case, I think XTM 2.0 is quite a bit simpler than XTM 1.0, so we sacrificed none of that in the process. What we lost was a couple of very marginal conveniences for people who want to maintain their topic maps as sets of XTM files on disk. Personally I don't think this was a loss at all, on balance.
It's true as you say that XTM 1.0 left a bit much open to interpretation, but I don't think the member-with-multiple-players issue really stood out in any sense there. The main problem was the absence of a data model that told you how to interpret the syntax. Without that, things like the players issue become a problem, but so did many other issues throughout (hierarchical variants or flat? etc etc).
As someone who's argued for six years now that the absence of a data model in XTM 1.0 was a huge problem with the standard, I can hardly disagree with your last two sentences... :)
Trond - 2006-12-31 17:12:22
"XTM is a lot simpler than RDF/XML (which is what you have to compare XTM with on the RDF side)"
Lars, I am curious as to why the XTM community "always" (papers, mailing lists, etc.) compare XTM to RDF/XML and not to e.g. OWL Lite. RDF/XML lacks the ability to define new classes and relations / ontologies, which is the purpose of RDFS & OWL (as you would know). Is it really fair to compare XTM to a technology which was not designed to do this, when alternatives / more expressive competing technologies do exist? I do understand Ontopia's (and other companies') motivation behind this, but is the XTM community really better off by ignoring OWL as a competing language/tool?
If I was new to XTM and in need of a technology in which to define my ontologies, I'd most def. not compare XTM to RDF/XML, but to OWL (which, according to Anne Creagan, may also be used in order to express Topic Maps).
(After all, XTM's ability to express ontologies using XML is a very important aspect of the technology)
Lars Marius - 2007-01-04 10:52:48
Good questions, Trond, and ones asked by many others as well. They really deserve a more substantial answer than a comment is suited for, so I'll write a separate blog posting on this.
Frederik - 2008-04-17 08:27:21
Hm, you said that the focus was on machine-readability, and not human-readability (or authoring, editing, etc.). So basically the XTM represents the TMDM in an XML file.
But I think there is one big exception from this principle: The instanceOf element. It is quite crucial for human readability, since it directly tells the class/type of a topic. But for machine readability, it's pointless, since it is resolved into a type-instance association, anyway.
So I am curious about round-tripping and preserving human readability in Topic Map processors. When I deserialize an XTM file, I store instanceOf-information in an association. When I serialize it, it would be natural for a machine to express it as an association in the XTM file, as well. But human readability would be lost.
So should XTM processors keep track of the XML representation of a Topic Map? Or should they even express all type-instance associations with instanceOf elements when serializing?
This is of course not the case for CXTM, which seems to be that hyper-standardized, human-unreadable exchange format. But XTM might require a different emphasis.
Lars Marius - 2008-04-20 12:53:17
Frederik, it's true that the <instanceOf> element is strictly speaking not needed. It's really just included because it's convenient, and because it reduces the size of XTM files quite a lot. Human readability was not really a concern, although it does help with that.
I think if you store <instanceOf> as an explicit association you should output it that way as well. The two forms of expressing the associations are equivalent, and XTM is not very readable anyway.
As for CXTM, that's really for conformance testing. See my blog posting on CXTM.