TMShare the Second
Posted in Technology on 2008-11-08 15:45
"Now I'm the client, and Marc here is the server"
Graham Moore and Marc Wilhelm Küster presented a new Topic Maps protocol called TMShare at TMRA 2008 this year. Many Topic Maps protocols have been presented already, mostly similar in conception, but TMShare is actually a completely new kind of protocol. Unlike earlier proposals it does not allow random access to topic maps on the server, but instead provides a feed of the changes to those topic maps. So essentially it provides a mechanism to replicate a topic map or part of one to another server. (I call this TMShare the Second because there was another TMShare protocol before this one.)
How it works
TMShare consists of four types of Atom feeds, starting with a top-level feed, which has one entry for each topic map offered by the server. Each of those entries links to a feed for that topic map, which consists of two entries: one for the topic map's snapshot feed, and one for its fragments feed. The snapshot feed has links to serialized (typically XTM) versions of the topic map as it was at different points in time. The protocol does not define which points these are; typically they will be set either by server policy (once a week?) or by user decision (press this button to make a snapshot).
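To make the feed hierarchy a bit more concrete, here is a minimal Python sketch of how a client might walk from the top-level feed down to the snapshot and fragments feeds. The feed documents, the example.org URLs, the entry titles, and the entry_links helper are all invented for illustration; the real TMShare feeds may use different link relations or extra elements.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical top-level feed: one entry per topic map on the server.
TOP_LEVEL_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Topic maps on example.org</title>
  <entry>
    <title>Curriculum</title>
    <link href="http://example.org/tmshare/curriculum/"/>
  </entry>
  <entry>
    <title>Photos</title>
    <link href="http://example.org/tmshare/photos/"/>
  </entry>
</feed>"""

# Hypothetical feed for one topic map: exactly two entries,
# pointing at its snapshot feed and its fragments feed.
TOPIC_MAP_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Curriculum</title>
  <entry>
    <title>Snapshots</title>
    <link href="http://example.org/tmshare/curriculum/snapshots"/>
  </entry>
  <entry>
    <title>Fragments</title>
    <link href="http://example.org/tmshare/curriculum/fragments"/>
  </entry>
</feed>"""


def entry_links(feed_xml):
    """Return {entry title: link href} for every entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    links = {}
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title")
        link = entry.find(ATOM + "link")
        links[title] = link.get("href")
    return links


topic_map_feeds = entry_links(TOP_LEVEL_FEED)   # level 1 -> level 2
sub_feeds = entry_links(TOPIC_MAP_FEED)         # level 2 -> snapshot/fragments
```

A real client would fetch each of these documents over HTTP instead of reading them from string constants.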
The most interesting part is what's called the fragments feed, where each entry represents a topic that has been changed. The timestamps on the entries allow the client to tell which changes are new, and the entries contain a full XTM 1.0 fragment of the changed topic. (That is, the fragment has all information about the topic in the server topic map.) There is also a clever little trick to account for deleted topics. So using this feed (and the snapshot feed to get started) clients can track changes to the server's topic map.
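The timestamp check can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample feed, the entry ids, and the new_entries helper are made up, and how the XTM fragment is attached to each entry is left out entirely.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical fragments feed: one entry per changed topic.
FRAGMENTS_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>urn:example:change:1</id>
    <updated>2008-10-16T09:00:00Z</updated>
  </entry>
  <entry>
    <id>urn:example:change:2</id>
    <updated>2008-10-16T13:00:00Z</updated>
  </entry>
  <entry>
    <id>urn:example:change:3</id>
    <updated>2008-10-16T15:30:00Z</updated>
  </entry>
</feed>"""


def parse_time(text):
    # Atom uses RFC 3339 timestamps; rewrite a trailing "Z" so that
    # datetime.fromisoformat also accepts it on older Python versions.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def new_entries(feed_xml, last_seen):
    """Return ids of entries updated after last_seen, oldest first."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.findall(ATOM + "entry"):
        updated = parse_time(entry.findtext(ATOM + "updated"))
        if updated > last_seen:
            fresh.append((updated, entry.findtext(ATOM + "id")))
    return [entry_id for _, entry_id in sorted(fresh)]


last_seen = datetime(2008, 10, 16, 12, 0, tzinfo=timezone.utc)
pending = new_entries(FRAGMENTS_FEED, last_seen)
```

The client would then apply the pending changes in order and remember the newest timestamp for the next poll.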
Getting topic map fragments from an outside source is of course only interesting if the local topic map is going to add more information to at least some of these topics. When a change fragment for a topic appears it's assumed that statements in the local topic map about this topic which do not appear in the fragment have been deleted on the server. The problem is that they could also have been added locally. So how to tell? TMShare annotates all received statements with item identifiers in order to track what came from the external server. (In fact, this can handle information coming from several servers.)
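The merge rule described above can be sketched as follows. This is a simplified model, not the algorithm from the paper: statements are plain tuples, and a simple origin tag stands in for the item identifiers that TMShare actually uses. The names apply_fragment, server-a and server-b are invented for the example.

```python
LOCAL = None  # origin marker for statements added in the local topic map


def apply_fragment(store, topic, fragment, server):
    """Apply a change fragment for one topic.

    store:    dict mapping topic -> {statement: origin}
    fragment: set of all statements the server now has for the topic
    server:   identifier of the server the fragment came from
    """
    statements = store.setdefault(topic, {})
    # Statements that came from this server but are missing from the
    # fragment were deleted on the server, so remove them locally too.
    stale = [s for s, origin in statements.items()
             if origin == server and s not in fragment]
    for stmt in stale:
        del statements[stmt]
    # Add everything in the fragment; a statement that already exists
    # keeps its original origin tag.
    for stmt in fragment:
        statements.setdefault(stmt, server)
    return statements


store = {
    "puccini": {
        ("name", "Puccini"): "server-a",
        ("born", "1858"): "server-a",
        ("rating", "favourite"): LOCAL,      # added locally
        ("opera", "Tosca"): "server-b",      # from a different server
    }
}
fragment = {("name", "Giacomo Puccini"), ("born", "1858")}
result = apply_fragment(store, "puccini", fragment, "server-a")
```

After the call, the old name from server-a is gone and the new one is present, while the local rating and the statement from server-b are untouched. That is the whole point of tracking origin: only statements known to come from the sending server are candidates for deletion.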
Overall, I think this is an elegant design that provides powerful functionality while still being surprisingly easy to implement. The Atom feed provides a level of indirection, so that one can easily imagine, for example, a user interface allowing users to decide whether or not to apply each individual change. Or an automated service that does the same. Or aggregation of multiple feeds. And so on.
There is actually a server already publishing a TMShare feed (or an earlier version of the same protocol): the Norwegian Ministry of Education's topic map for the national curriculum. This feed would let client sites (such as the Norwegian National Broadcasting Company project) automatically keep their topic maps up to date with changes to the national curriculum. A number of similar scenarios can easily be imagined. So it's clear that this protocol is a potentially very useful thing and widely applicable.
Topic Maps coder challenge
Peter handing Lars the prize
Knowing that Metcalfe's law applies to a protocol like this, Networked Planet announced a Topic Maps Coder Challenge at TMRA, with money prizes for the three best implementations of the protocol. Unfortunately, the challenge was announced a bit late, so Lars Heuer's was the only entry in the contest, and he collected the prize.
Graham said something about the contest continuing after the conference, to allow more people to participate, but I haven't heard anything more about it, and so don't know any of the details. I'm interested myself, because this could be very useful for tmphoto, and I can see uses for it for the OKS as well. However, for it to be applicable for this, two extensions are needed, so let me explain.
In many cases, it's desirable to let the web server just serve out content, and to do content editing somewhere else. This could be for reasons of architecture, security, or something else. Several of our customers want a setup like this, and in fact I also want it for tmphoto, since editing happens on my laptop, and is later pushed to the server. At the moment I upload an XTM file and then restart the server. This causes downtime while the topic map is reloaded, and it makes it difficult to switch to using a database on the server.
In this sort of scenario it doesn't work to have the client (the web server) poll the server (the machine where editing happens). In the case of the photo application my laptop has no fixed IP, and is often behind firewalls. Similar restrictions apply in many of the commercial scenarios, too. So what to do? A workaround could be to have the server upload all the Atom and XTM files to the client, and have the client access them locally. That's hardly beautiful, but it would work.
Another solution would be to use the Atom Publishing Protocol. This provides a way to use HTTP to push Atom fragments to recipients, instead of having recipients poll for them. In this scenario there would probably be only one feed (the fragments feed), but for this particular use case that would be no loss. It also requires the source to know about all listeners, but again that is no problem in this scenario.
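As a rough sketch of what the push variant could look like, the Python below builds the HTTP request the editing machine would send to the web server. The collection URL, the entry content, and the build_push_request helper are assumptions made for the example; the only parts taken from the Atom Publishing Protocol itself are the use of POST and the application/atom+xml;type=entry media type. The request is only constructed here, not sent.

```python
import urllib.request

# Hypothetical Atom entry describing one changed topic.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>urn:example:change:4</id>
  <title>topic changed</title>
  <updated>2008-11-08T15:45:00Z</updated>
</entry>"""


def build_push_request(collection_url, entry_xml):
    """Build (but do not send) an AtomPub-style POST of one entry."""
    return urllib.request.Request(
        collection_url,
        data=entry_xml.encode("utf-8"),
        headers={"Content-Type": "application/atom+xml;type=entry"},
        method="POST",
    )


request = build_push_request("http://www.example.org/tmshare/fragments", ENTRY)
```

Passing the request to urllib.request.urlopen would actually send it. Since the connection is opened from the editing machine, it does not matter that the laptop has no fixed IP or sits behind a firewall.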
When using the protocol in this way the normal issue of how to know which information comes from the server and which from other sources disappears, because everything comes from the server. This means that tracking added information with item identifiers is not necessary. In general, I think that this is one place where TMShare could be improved, because it's not necessarily a given that one wants all information from the server. TMSync solves this, by letting users choose what to update, and what to consider the server as master for. In fact, the item identifier convention can be expressed using TMSync as well. So maybe it should be replaced by TMSync.
Lars Heuer - 2008-11-08 10:49:52
The 1st extension is only necessary if the web server should actively collect changes from other machines. Unfortunately you do not use the terminology of the paper. You call the thing which serves the Atom feed "client" and the machine where the Topic Maps engine lives "server". Anyway, if the server (your client) is passive, a client (your server) can post changes to the Atom server. And the Atom server may be password protected (writing and/or reading). Everything could work through HTTP(S). And if you do not like updates through HTTP you can decouple the Atom representation from the storage (i.e. for fragments). The storage could be a database and the Atom server queries the database and creates the Atom feed based on that query. So you'd update the database and the server (your client) automatically serves an updated feed.
This is how the Semagia Atomico server works (or should work ;)). You can configure it to allow the POSTing of new fragments/snapshots, but you can also let the server run in a read-only mode where it queries the storage for snapshots and fragments and dynamically creates an Atom feed (the storage could be a file system, a database, a topic map...). It would also be possible to create a TM/XML feed instead of an Atom feed if the client prefers a topic map.
Cannot say anything about the update algorithm of TMShare vs. TMSync since I don't know TMSync well.
Anyway, I'd like to see the "every topic must have a subject identifier"-limitation go away from TMShare.