TMRA 2007 — day 2
Posted in Technology on 2007-10-13 23:52
The first talk I attended was by Robert Barta on Knowledge-Oriented Middleware using Topic Maps (abstract). He says he had the idea 10 years ago, and that it's only now that he's been able to realize it. What he really wants to do is syndication of Topic Maps content, and to make it possible for Topic Maps fragments to float around a landscape of knowledge syndication peers.
He did a side-track on AsTMa 3.0, which is what he actually passes around. The syntax has taken the ideas of pidgin-English text interpreted as Topic Maps, shown by Lars Heuer last year, even further.
He then went through his review of the content landscape. He basically wants to handle the variety of Topic Maps representations via virtualization, that is, via adapters that can produce a Topic Maps view of resources that are not really Topic Maps.
The basis of his virtualization is the Tau expressions, which he has a text syntax for. This has support for lots of adapters that can be used to load or connect to content. There is also an adapter for merging, where the merging may or may not be really performed. There's a number of different kinds of filters, and TMQL queries are one kind of filter. He also has a Unix command-line-like way to express conversion, which can be conversion between formats or input/output of various kinds.
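To make the idea concrete, here is a minimal Python sketch of the adapter/filter pipeline as I understood it. Everything here is invented for illustration (the function names, the record format, the stages); Robert's actual Tau expressions have their own text syntax, and his adapters are far richer than this.

```python
# Hypothetical sketch: each stage takes a stream of topic fragments
# and yields a transformed stream, composed Unix-pipe-style.

def ldap_adapter(entries):
    """Virtualize non-Topic Maps content (here: LDAP-like records)
    as a stream of topic fragments."""
    for entry in entries:
        yield {"id": entry["uid"], "name": entry["cn"], "type": "person"}

def type_filter(wanted):
    """A filter stage, loosely analogous to a (much simpler) TMQL query."""
    def stage(fragments):
        return (f for f in fragments if f["type"] == wanted)
    return stage

def pipeline(source, *stages):
    """Unix command-line-like composition of adapters and filters."""
    stream = source
    for stage in stages:
        stream = stage(stream)
    return stream

records = [{"uid": "jd", "cn": "John Doe"}]
result = list(pipeline(ldap_adapter(records), type_filter("person")))
```

The point of the design is that consumers only ever see Topic Maps fragments, regardless of what the backend actually stores.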
He then went into how he can use this for web services, and this looked a lot like the TMIP protocol he presented a couple of years ago.
The next speaker was Volker Stümpflen (abstract), and at this point my camera batteries ran out, unfortunately. His concern is not Topic Maps, but understanding complex biological systems. These are really complex, and so there is a lot of low-level information about them, and also a smaller amount of higher-level information. He wants to collect data from all these diverse sources.
He represents biological systems in Topic Maps as huge networks of biological transitions and reactions. They found Topic Maps very suitable for this, and easier to explain to biologists than RDF/OWL. One thing they use it for is to merge knowledge from different domains. They model the domains independently, but are still able to merge them.
There are lots of life science ontologies, where "ontology" is to be broadly understood: they range from simple vocabularies and taxonomies to full logical ontologies. The majority are of the simpler types.
He gave an example from their Topic Maps-based portal, showing protein structures. They have hundreds of databases and lots of content that could be text mined. Altogether it was estimated (4 years ago) at 1-2 petabytes.
He then showed examples of Topic Maps-like knowledge in free text, and they have a tool called REBIMET for mining this out. It first does entity recognition, with a list of known terms and Lucene searches. They then do the relations with the ASSERT tool (Pradhan et al 2005), which uses semantic role labelling and cooccurrence. (No, I don't know what those are.) They create SPA structures for each verb, and then map them to associations. Each association created by text mining has a connection back to the text via reification of the association. They've created PSIs for their topics, and can therefore merge the mined information with genome models and other content.
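Roughly, the last step might look like the sketch below: a mined verb structure (predicate plus role-labelled arguments) becomes an association, reified so it can carry provenance pointing back to the source sentence. The REBIMET internals were not shown in detail, so the structure and all names here are my assumptions, not their actual code.

```python
# Invented sketch of mapping a mined verb structure to an association,
# with reification carrying provenance back to the source text.

def to_association(verb, args, source_sentence):
    assoc = {
        "type": verb,    # e.g. "activates"
        "roles": args,   # role-labelled players, identified by PSIs
    }
    # Reify the association with a topic whose occurrence records
    # where in the text the statement was mined from.
    assoc["reifier"] = {"occurrence": {"source-text": source_sentence}}
    return assoc

a = to_association(
    "activates",
    {"agent": "psi:protein/P53", "patient": "psi:gene/MDM2"},
    "p53 activates MDM2 transcription.",
)
```

Because the players are identified by PSIs, associations mined from text can merge cleanly with the genome models.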
The result of all this is a huge set of data, collected from many different systems (listed in his slides). We are talking about some thousand gigabytes. It's too large for them to be able to create a single huge topic map out of this. Instead, they generate topic map fragments.
Their system, called GeKnow, is J2EE- and EJB-based. It has a semantic layer on top of all the data sources. (Hmmm. Sounds a lot like Robert.) There is a manager that does merging and retrieval of fragments. There are also Java classes that convert resource-specific formats into Topic Maps. Behind all of this is a huge storage architecture. He mentioned in passing that they have one MySQL database per genome, which makes 500 databases. The portal they built is based on Java Server Faces, but they're not happy with this, and are porting it to more generic portlets based on XSLT.
They want to open source their system so others can use it, partly because this will help spread the word about Topic Maps. Things they might do in the future include:
- Visualization of the topic maps,
- Add support for TMQL and TMCL,
- Maybe use OWL, because many ontologies are being represented in OWL,
- Add support for exchange of XTM fragments.
They are very happy with their system, and want to spread it. There are also others using Topic Maps in the life sciences in Germany. For example, the Helmholtz centers across Germany will use Topic Maps for their technology platform. He's hoping that they'll be able to spread this internationally soon.
His conclusion: Topic Maps are suitable even for data sets with 100s of millions of topics/associations.
In the Q/A session it came out that the Topic Maps engine they use is home-made, and they want to open source it. They also use TM4J in their conversion code.
Versioning of Topic Maps Templates
Volker was followed by Markus Ueberall. His goal is to support the software development process, by representing the concepts used by participants in Topic Maps, and to improve traceability to improve communication. He uses Topic Maps templates (which he spoke about last year).
I have to confess I worked on my slides for the open space session during this talk, so I don't have a summary of it. You can see more on the TMRA site though. Then there was a coffee break, which I spent hunting for batteries. (I have spares, but I forgot them at the hotel.) Eventually the kind staff of the building volunteered to get me some, which was much appreciated.
Ruby Topic Maps
Benjamin Bock spoke on RTM (Ruby Topic Maps), his Topic Maps engine written in Ruby (abstract). It has an RDBMS backend based on Active Record (an object-relational mapping tool), which enables him to plug in other backends as well. He does have support for import and export of XTM. The API has lots of conveniences that make it nice to use, but I can't really reproduce those here. He also has an API of enumerable sets that allows the user to emulate a query language in the API.
He showed use of the API, and it was really quite nice. He even has an API call for initializing the database schema. I recognize the convenience methods he's added to the API as the typical kinds of conveniences that one wants. The "query API" seems to pass in boolean conditions that are evaluated on the objects in a set, and it does really support both simpler and somewhat complex queries. You need to see the slides or the paper for this, though. He says he's finished the engine itself and the Rails integration, and the next step is to make it perform and scale.
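To give a flavour of the idea (RTM itself is Ruby, and its real API differs from this), here is a rough Python analogue of the "query API": an enumerable set of topics filtered by boolean conditions evaluated against each object, with chaining doing the work of a query language. All names here are made up.

```python
# Sketch of a chainable, predicate-filtered topic set.

class TopicSet:
    def __init__(self, topics):
        self.topics = list(topics)

    def where(self, predicate):
        """Filter with an arbitrary boolean condition; chained calls
        are what let the API emulate a query language."""
        return TopicSet(t for t in self.topics if predicate(t))

    def names(self):
        return [t["name"] for t in self.topics]

topics = TopicSet([
    {"name": "Puccini", "type": "composer", "born": 1858},
    {"name": "Tosca",   "type": "opera",    "born": None},
])
composers = topics.where(lambda t: t["type"] == "composer") \
                  .where(lambda t: t["born"] > 1850)
```

Since each `where` returns a new set, simple and fairly complex queries compose the same way.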
He shows some performance numbers, and it's clear that performance with SQLite 3 is dismal. I note that performance with SQLite 3 in Apple's Mail.app is also ridiculously poor. As far as I can tell the problem is probably SQLite 3, and not his code. Performance numbers for MySQL would be nice for comparison.
He also wants to go out and create real applications with this engine, and he wants more users. So, if you're interested in Topic Maps and Ruby, you should definitely look at RTM.
I asked him about SQLite 3 performance in the Q/A session, and he says he supports pretty much any database right now, and that performance with for example MySQL is much better than with SQLite 3.
Then Stian Danenbarger and Arnar Lundesgaard spoke on ZTM (Zope Topic Maps), which is the Topic Maps engine and CMS that's been used for many, many Norwegian Topic Maps-driven portals (abstract). It's open source, and was developed originally by Ontopia, but has since been reimplemented at least once by the Bouvet guys. They gave some background on Bouvet and their team, but I'll skip that here.
They want to support global knowledge federation (as Steve Newcomb calls it), but unfortunately the tools that ordinary users can use for this are not there yet. For them management of Topic Maps content is a key goal, and one they are working towards.
The start of the whole thing was a customer that came to them wanting to create a portal describing themselves. The customer said their organization was complex, network-like, and constantly changing. This was in 2001, just around the time when XTM 1.0 was announced. They decided to "make the leap", and basically went from not knowing Topic Maps at all and not having an implementation to building a customer solution within the scope of this single project. (This is why they got Ontopia involved.) They actually did all this within budget, on time, which is really impressive.
They built the system on Zope, which gave them a lot of help. It provided the object database (Zodb) and the publishing platform with lots of helpful functionality. That it was written in Python didn't hurt, either.
The next step was Forskning.no, and this was a bigger challenge, so they had to rethink ZTM 1.0. They wound up rewriting the whole thing and creating ZTM 2.0. In ZTM 2.0 everything in the topic map is considered content by the CMS, which means that the CMS gives them functionality for managing the topics, associations, occurrences, etc.
From this a simple message has spread in Norway, which is that for findability on the web, what you need is Topic Maps. According to them, this is good, but the problem is that very few people have picked up that this is also about flexibility and the ability to interchange knowledge. They showed a list of all the Topic Maps-driven web sites, and they had to reduce the font size quite a bit to get them all into one slide.
So why did this happen? Are Norwegians different from other people? They don't think so. I lost the thread at this point, unfortunately, so I can't expand on this.
They showed a demo of ZTM, starting with the home page of a user once logged in. They then created an ontology more or less from scratch. The demo is in the slides, so you can find it there.
What they want to do now is to turn ZTM into a real open source project, which means getting a community of developers outside of Bouvet working on it. Even though ZTM is on SourceForge they haven't really kept things up to date there, and haven't really started on attracting more developers. They ended with an invitation to others to dive in.
The first question was from Robert Barta about ZTM 3, and what's going to change there. Arnar said the main thing they want to do is to open up the text in the topic map and start to exploit the content of it more, and not just keep it as an opaque blob. He also said something about moving away from the topic-page concept towards more complex pages, but I couldn't really hear this part.
Robert Cerny wanted ZTM to support a web service interface so that it would be possible to retrieve topic map fragments from it. He says this would make it possible to have real interoperability.
Xuân Baldauf and Dmitry Bogachev wanted other kinds of rendering from ZTM, such as XSLT-based rendering, or (in the case of Dmitry) YAML- or JSON-based rendering.
Then it was time for lunch.
Open space session
Then it was time for the open space session again. As usual this was a pretty rushed affair, with speakers racing through their stuff, and frantic laptop fiddling in between speakers. The mood was really good, though, and the content likewise.
Topic Maps and Web 3.0
First out was Graham Moore. He wants to define the relevant terms. Web 3.0 is really about distribution and aggregation of information world-wide. "Not some magical dentist appointment-arranging application." (A side-swipe at the famous Scientific American article by Tim Berners-Lee et al.) Web 3.0 is going to be RESTful. Therefore all topics must be addressable in multiple formats. Must use existing formats, like Atom. Topic Maps and their ontologies are going to be the basis for this.
The use of ontologies is going to make systems self-configured. We should build on the Web 2.0 success with simple APIs and simple user interfaces. The goal is a real semantically linked web of two-way typed links. There's a need for controlled vocabularies and the ability to share them.
Social networks have to be modelled with topic maps and will be a driver in the platform. Must have context-support. Obviously we want to move from tags to topics. There will also be a move to knowledge-centric publishing, replacing existing web CMSs with something that sucks less.
Topic Maps-based Wiki
Then it was my turn, to show a Topic Maps-based wiki I made that basically uses Ontopoly as the wiki editor, and still enables a very simple wiki application. I'll try to blog about this if I have time.
Knowledge is a Mountain
Then Dino Karabeg got up to tell us about his theory of knowledge, that it is more than a set of concepts and relations, because this is too much for anyone to make sense of unless it's structured. You can't just dump this kind of structure on someone and expect them to make sense of it. Dino thinks that knowledge is structured like a mountain, with the most important concepts at the top, and less important stuff further down. While I was typing this he started quoting Steve Newcomb, but I couldn't follow that.
Dino quoted a magazine he got in the hotel: "The future is not what it used to be. We have unleashed power we no longer know how to control." He used the quotes to underpin his message that our civilization needs more knowledge. He has two corollaries to his initial thesis. The first is: knowledge is not a jungle. At this point he ran out of time, but while the next speaker got set up, he got a question. It was, of course: what is the second corollary? Answer: a mountain-building tool. Robert Barta jumped up to say that "your mountain is not my mountain," and then time was up.
A Style Language for Topic Maps
Hendrik Thomas spoke about this. He said we currently have only graph-based visualizations, but we also need different kinds of graphical views depending on what problem they are meant to solve. He's made a tool that can import any topic map, and where you can create multiple views on the topic map. He uses this to build a visualization manually, using information from the topic map. (I never had time to find out how much information in the topic map influenced the graphical view.)
He wants a standardized language to define a Topic Maps visualization. Basically: topic map + SLTM = rendering, and he wants to be able to edit both in a wiki. Something like XSLT, but for Topic Maps.
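Nothing like SLTM is actually specified yet, so the sketch below is pure invention on my part, just to illustrate the "topic map + SLTM = rendering" equation: style rules keyed by topic type, applied to a topic map to produce rendering instructions.

```python
# Invented illustration: a "style sheet" mapping topic types to
# visual properties, applied to a topic map.

STYLE = {
    "composer": {"shape": "circle", "color": "blue"},
    "opera":    {"shape": "box",    "color": "green"},
}

def render(topic_map, style):
    """Produce rendering instructions: one styled node per topic,
    falling back to a default style for unknown types."""
    default = {"shape": "box", "color": "gray"}
    return [
        {"name": t["name"], **style.get(t["type"], default)}
        for t in topic_map
    ]

tm = [{"name": "Puccini", "type": "composer"}]
out = render(tm, STYLE)
```

The analogy to XSLT is that the same topic map could be run through different style sheets to get different views.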
What's wrong with merging
Rani Pinchuk says: imagine one topic map and two users. They both make changes at the same time. If they each have their own copy, and merge them after editing, any deleted information will come back. The same applies to changes, since the old version of a topic will be in the topic map where the topic was not changed. This means that merging does not always do what we want.
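Rani's problem fits in a few lines if you model the statements as a set: naive merging is set union, so a statement deleted in one copy survives in the other copy and comes back after the merge.

```python
# The delete-resurrection problem in miniature.

original = {"s1", "s2", "s3"}

user_a = original - {"s2"}   # user A deletes s2
user_b = original | {"s4"}   # user B adds s4

merged = user_a | user_b     # naive merge = union
# "s2" is back, even though A deleted it: the merge cannot tell
# "never existed" apart from "existed and was removed".
```

This is exactly why plain merging is not a synchronization mechanism.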
He uses an example from TopiWriter. At this point I was distracted by a new speaker wanting to join the list, and lost the thread. In the discussion afterwards Peter-Paul Kruijsen and I told him the problem is solved by TMSync. Xuân said he'd outline the solution in his 5 minutes.
Naito-san began by saying: "This is advertisement," drawing a laugh. He promoted the AToMS conference in Kyoto in December. Very funny talk!
PSIs for Versioning
Xuân Baldauf spoke on this. He says duplicated data may be edited simultaneously. Why not simply merge on synchronization? He shows an example that reifies statements and states what happens to them. He doesn't even delete them, but instead marks them as deleted. There are also timestamps stating when the changes were made.
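A minimal sketch of the approach as I understood it: instead of physically deleting statements, mark them as deleted (a tombstone) and timestamp every change, so a later sync can distinguish "deleted" from "never existed". The PSI-style keys below are invented placeholders, not Xuân's actual PSIs.

```python
import time

# Tombstone-style deletion: statements are marked, never removed.

def tombstone_delete(statements, stmt_id):
    """Mark a statement deleted rather than removing it."""
    statements[stmt_id]["psi:deleted"] = True
    statements[stmt_id]["psi:changed-at"] = time.time()

def live(statements):
    """The statements a consumer actually sees."""
    return {k: v for k, v in statements.items()
            if not v.get("psi:deleted")}

stmts = {"s1": {"value": "born in 1858"},
         "s2": {"value": "born in 1885"}}
tombstone_delete(stmts, "s2")
# After sync, the other peer sees the tombstone and does not resurrect s2.
```

This directly answers Rani's resurrection problem from the previous talk: the union of two copies now carries the deletion along with it.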
Dmitry points out that the current reification concept does not support this, since reification is about creating topics for the real-world relationships represented by statements, and not for the statements themselves. They agree it can still be done, but it won't be right.
COBOL and Topic Maps
Then it was Dmitry Bogachev's turn, and he said that COBOL has useful features we should consider for Topic Maps. Object-oriented languages are good for representing things, but not knowledge about them. In fact, for a lot of what we want to do, these languages are not very suitable. However, many languages allow domain-specific languages to be built inside them.
He shows some Ruby code that does what he wants. Unfortunately he's not able to show the slide on the entire screen, so he winds up reading the code out instead. This is hard to follow.
He says TMQL, TMCL, and CTM provide a good basis for building a new subject-centric language. One that has support for many different kinds of data (date, time, multiple sources, provenance, security etc) in its core.
A Graphical Notation for Topic Maps
Maik Pressler wants this, and he's seen the GTM proposal. He is in fact writing a thesis on the subject. His impression is that the current proposal is very oriented towards data modelling, and he wants something that's more suitable for knowledge-modelling. Something similar to mind maps and other creativity techniques. He wants to explore this in his thesis. He has a list of requirements for this language, like that it should be easy to understand etc.
His approach is to evaluate the existing solutions, and to maybe propose something new if he feels it's needed.
Reification versus Annotation
Lutz Maicher then went back to Dmitry's comment about reification. There are two different use cases for reification: either to reify the relationship an association represents, or to reify the association itself qua association. TMDM does the former, but what if you want to attach metadata to the association itself, like creation time etc? He makes the point that this is the good old identity crisis, now inside Topic Maps themselves.
He doesn't want to propose a solution, however, and at this point the clock rings. In the discussion everyone agreed that using the subject locator of the reifying topic to point to the item identifier of a reified statement (or topic, actually) would make it possible to do this. There is no explicit interpretation of this in the TMDM, but there appears to be a general consensus in the community on what it means.
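In data terms the consensus trick looks like this (all identifiers invented): the reifying topic's subject locator points at the statement's item identifier, so the topic stands for the association construct itself rather than for the relationship it represents.

```python
# Sketch of annotating a statement-as-construct, per the consensus
# described above. URIs and field names are made up for illustration.

statement = {
    "item-identifier": "http://example.org/tm#assoc42",
    "type": "composed-by",
}

annotating_topic = {
    # Subject locator = the statement's item identifier: this topic's
    # subject is the association construct itself, not the real-world
    # relationship, so metadata like creation time can hang off it.
    "subject-locator": statement["item-identifier"],
    "metadata": {"created": "2007-10-13"},
}
```

TMDM reification, by contrast, would make the topic stand for the relationship, which is exactly the distinction Lutz was pointing at.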
Topic Maps Wiki
Benjamin Bock reviewed what's already here: wikis, compact Topic Maps syntaxes, editors etc. Another approach (compared to mine) is to add CTM/LTM/whatever directly into the wiki markup, rather than having wiki markup in an occurrence. He shows an example of this in practice. He also shows what it might look like in a wiki page.
My reaction is: this looks nice for people who are happy to edit CTM. Many people would be, but probably it wouldn't work for normal users. It would also be difficult to make people stick to a single ontology.
This can be used to create Topic Maps from the fragments. Old statements can be scoped as old. You can export the collection to a full topic map. How to spread this? He wants to put this into MediaWiki and create a mash-up of Wikipedia.
In the break Peter-Paul Kruijsen was to give a demo to Naito-san of Morpheus's OKS-based AJAX framework called Kamala. So many people wanted to see this, however, that they grabbed a projector and one of the session rooms. So lots of people spent the break listening to their presentation of Kamala. It also transpired that Kamala means something very bad in Finnish, although none of the Finns would tell us exactly what. (So probably it is really bad.)
Man taking a non-break
Peter Brown gave the closing keynote, under three different titles, using the dialogue of Alice and the Knight to introduce them. Peter is concerned that the language we have for talking about information is the terms of 18th and 19th century bureaucracy, terms like "document", "file", "paper", "folder" etc. And he feels that the language we're being offered today is not necessarily much better ("resource"). We need, he says, a new lexicon (set of terms) to help people talk about this.
He thinks Topic Maps are a step in the right direction, because the model clearly describes the separation between the real world and our abstractions of it. The terminology is also clean and simple to understand, without "purist reductionism". Topic Maps "enable semantic technologies with 'a human face'", unlike RDF/OWL which is more geared towards machine inference. People, however, still need to be in the loop.
He goes on to ask some questions about names, their importance, and their relationship to identity. He describes identifiers as "point identity", dimensionless handles that tell you nothing about the thing you are talking about, but let you determine which things are the same. This is good, because identifiers should not say anything about the things identified.
Peter thinks current work on graphical visualization of Topic Maps is still only beginning, and that we need to do more work in this area. He claims rich visualization is possible, and shows one famous example. (You could claim his next slide is another example.)
He also thinks that one reason Topic Maps haven't taken over the world yet is that the topic maps that do exist are not connected to each other. In his opinion we need to extend the current set of standards to include exchange protocols to make this possible.
He listed a set of advantages that he thinks Topic Maps have over other technologies:
- The new lexicon for information that makes sense today,
- The cleanness and human-friendliness of the model,
- Subject identifiers,
- That mapping knowledge really appeals to people, and
- What you can do with distributed Topic Maps.
Of course, as Peter says, we are not quite there yet with the last bullet point, but as the conference has shown, people are working on that.
Then it was time for Lutz to give some closing remarks, and the conference was over. Everyone agreed it had been a very good conference, and as usual, we walked over to Ohne Bedenken for a glass of gose. Then something interesting happened, but since it has nothing to do with TMRA, I'll come back to it later.
Rolf Guescini - 2007-10-12 09:40:40
Hey Lars Marius, thanks for another update! I really would like to get in touch with Mr. Bock and his RTM, do you have hook-up information?
Lars Marius - 2007-10-12 09:55:19
I need to tidy up the open space parts from these two days, and when I do that I'll add links. So, yes, you'll get contact info.
Lars Marius - 2007-10-18 03:41:40
I've added the link to RTM now. For anyone who wants to get in touch with Benjamin the easiest way is to join the #topicmaps channel on irc.freenode.net, where he appears as "bb".
Peter McCarthy - 2007-11-05 11:06:04
First of all, thanks for the TMRA 2007 posts. I've found them very useful.
Now, at the conference (which I didn't attend), a poster was submitted entitled: "Why Aren't Topic Maps Ruling The World (Yet)?" http://www.informatik.uni-leipzig.de/~tmra/2007/slides/redmann1_TMRA2007.pdf
I'm interested to know what your thoughts are about the issues raised by the authors. Do you agree with the authors, and, if so, how do you think the issues can be addressed (by the Topic Maps community) to accelerate the mass adoption of this cool technology?