TMRA 2007 — day 1
Posted in Technology on 2007-10-11 18:13
Lutz opening the conference
As usual, the conference was opened by Lutz, who gave a short introduction based around the conference motto of "Scaling Topic Maps". He was followed by my colleague Axel Borge, who gave the sponsor presentation on behalf of Bouvet. Graham Moore did the presentation for NetworkedPlanet. They'd coordinated their talks, and used them to explain why, in their opinion, Topic Maps have taken off in Norway. Part of the reason, they argued, is the close collaboration between local product vendors and the consultants who do the actual projects.
Then, it was time for Marc Wilhelm Küster to give the keynote, titled Scaling Topic Maps. He gave a quick review of uses and representations of knowledge over the last millennium or so, such as hierarchical classifications, encyclopedias, etc. He compared Topic Maps-based portals to the encyclopedia, and found that they are conceptually not so different. To really go beyond the old paradigms, he claims, we need to be able to share knowledge across different repositories.
Marc Wilhelm Küster
He then moved to a use case, which is a research project using Topic Maps that he has been working on, called the eGovernment Resource Network. From the ontology he showed it seems to be a kind of standards registry, showing how standards are related and what they can be used for. They were using an open source Topic Maps engine called Py4TM, and also TMCore from NetworkedPlanet. They chose this architecture to be able to demonstrate sharing of Topic Maps information across different installations using different implementations.
The interchange protocol they use is based on the Atom syndication format, containing XTM fragments. Basically, each Atom item contains a Topic Maps transaction, that is, a set of logically related updates to a topic map that make up a unit. You can actually look at this yourself at http://psi.egovpt.org.
Marc then handed the microphone to Graham, who presented the more technical side of the use case. Graham demonstrated the end-user interface they'd built using TMCore, which is really a faceted search browser. Graham then returned to the subject of scaling Topic Maps, which according to him required more use of distributed Topic Maps, more compelling use cases, more collaboration within the community, and further evolution of Topic Maps engines.
Search and Topic Maps
The first speaker after lunch was Stefan Smolnik on search and Topic Maps (abstract), based on a practical case from the chemical industry. In the chemical industry, knowledge systems typically have a wide variety of information sources, and the challenge is how to collect and structure all of this in an effective way.
He reviewed the current state of text mining research, and his conclusion is that there are still limitations and problems with text mining. He then did the same for Topic Maps, and found some challenges, such as the effort required to build an ontology, and the lack of ready-made ontologies. Another problem was the low understanding of Topic Maps, and also inconsistent views on it, in customer organizations. He also compared the processes required for using text mining and using Topic Maps, finding them to mostly be quite similar.
His nutshell summary is quite good: it's a case of automation versus modelling. The first question was from Dmitry Bogachev, and was inevitably whether Stefan had considered using both. Stefan said he'd thought of it, but that the people he was working with basically did not want to do this.
Topic Maps at Nokia
Heimo Hänninen, Antti Rauramo, and Sirpa Ruokangas then spoke on a Topic Maps project to build a user portal for Nokia Siemens Networks (abstract). The starting point for the project is complaints from the users about the current portal: that they can't find information, that search is poor, that other portals have more advanced functionality, etc. Their goals are to solve this, build bridges between the information silos, and increase customer satisfaction. They are also looking to create process benefits (basically, simplify the content production process) and system benefits (by avoiding point-to-point integrations and using a hub-and-spoke approach instead).
The first phase is the prototype stage, which they have completed. The next phase is to adapt to the enterprise architecture, and here they have to pass an enterprise architecture review. This is quite a challenge, and they have not passed this yet. Basically, they need to convince the decision makers that there will be sufficient return on investment, and get buy-in on the technology choice from the IT people. Another challenge is that the necessary metadata is partly missing, and will have to be created somehow.
They then gave a quick demo of the system, which they call "Dynamo". They've given each product what they call a "product center", which is really a topic page for the product topic, showing an overview of everything that's known about the product. They also have relations to the various editions of the product, and a nice faceted browser for the product's documentation. There are also relations to other products, etc, as well as personalized information. The data size for the portal is not too bad: more than 600 products, with 1-5 variants of each, and 1-5 releases being sold.
One of their challenges is to fit into the customer's view of Enterprise Architecture. Is this to be considered Information Integration or Knowledge Management? Is it from IBM or Oracle? The difficulty here is that Topic Maps don't really fit into any of the categories defined by the customers. So this needs to be overcome somehow.
They built the ontology by doing workshops with the subject matter experts, and built it based on the most complex product structures, to make sure it could handle everything needed. Diagrams of the ontology are in their slides. Essentially, their model is a merging of a product data management (PDM) model and a content CMS model. I think this is something most projects find: that the ontology winds up being a mix of application domain models and content models.
The subject matter experts populated the ontology by hand, using Ontopoly. The web application was built using the OKS Navigator Framework. They've also built a web service interface to the topic map to simplify working with the topic map for programmers.
There are some performance issues, given that they'll eventually have 350,000 topics and equally many associations. They worry that synchronizing new data into this might be challenging. Their next step is to start using TMSync and DB2TM for this.
A Citizen's Portal for the City of Bergen
Then it was my turn to speak about the City of Bergen project, and obviously I did not have time to summarize my own talk. The slides are on the TMRA site. I'll try to blog about this when I can. We'll see when I can find the time.
After my talk it was time for lunch, and then immediately after lunch I was speaking again, this time on A Theory of Scope. Again the slides are posted.
Comparison of Topic Maps Constraint Languages
Rani Pinchuk presented Giovani Librelotto's paper on constraint languages, because Giovani was not able to come to the conference. He compared XTche, AsTMa!, OSL, and Toma, both in general, and using an example use case. I was too focused on my photos throughout this talk to really describe it. Sorry. You can see the TMRA site for more information.
In any case, as several of the questioners pointed out, the paper really ought to have included TMCL in the comparison.
Open space session
Then it was time for the open space session, where everyone who wants to can sign up on a flipchart, to speak for five minutes. There are no restrictions at all, except that each speaker gets only five minutes. I was the session chair, just as in 2005 and 2006, and, again just as before, I enjoyed it immensely. There's something about the lower threshold for speaking, and the way the speakers have to focus on the core of their message, that makes this so enjoyable. The time constraints and general rush also somehow tend to put everyone in a good mood.
Topics as document proxies using Kamala
Peter-Paul Kruijsen spoke first, on how Morpheus uses Subversion not just for development, but also for general documents, such as meeting minutes. They have added a Subversion hook that creates a topic for each file when it is added to Subversion. So this is basically a CMS integration with Subversion as the CMS. Once the topics have been typed, the schema helps the user see what information has to be added to the topic map. Peter-Paul then gave a demo of the system. He had ready-made queries for typical lists that he wanted, like "documents not yet classified". He also showed how the metadata made it easy to find information.
The whole thing was built with Ontopia's OKS, and using Morpheus's Kamala framework, which sits on top of the OKS.
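To make the hook idea concrete, here is a minimal Python sketch. It is not Morpheus's code and does not use Kamala or the OKS: it only assumes that the hook can read the text printed by `svnlook changed`, where each line is a status column ("A" for added) followed by a path. The function names, the topic id scheme, and the dictionary layout of a topic stub are all hypothetical.

```python
import re

def added_files(svnlook_changed_output):
    """Return the paths of files added in a commit, given `svnlook changed` output."""
    paths = []
    for line in svnlook_changed_output.splitlines():
        # Each line is a status column ("A" = added) followed by the path.
        parts = line.split(None, 1)
        # Directories are listed with a trailing slash; skip them.
        if len(parts) == 2 and parts[0] == "A" and not parts[1].endswith("/"):
            paths.append(parts[1])
    return paths

def topic_id(path):
    """Derive a topic id from a repository path (hypothetical naming scheme)."""
    return "doc-" + re.sub(r"[^A-Za-z0-9]+", "-", path).strip("-").lower()

def topic_stubs(paths, repo_url):
    """Build one untyped topic stub per new file (hypothetical structure)."""
    base = repo_url.rstrip("/") + "/"
    return [
        {
            "id": topic_id(path),
            "name": path.rsplit("/", 1)[-1],
            "subject_locator": base + path,
            "type": None,  # to be filled in later by a person
        }
        for path in paths
    ]

def not_yet_classified(stubs):
    """The 'documents not yet classified' list: stubs that still lack a type."""
    return [stub for stub in stubs if stub["type"] is None]
```

A real hook would run `svnlook changed` on the committed revision and write the stubs into the topic map engine; both steps are left out here because they depend on the installation.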
Curriculum Management with Maius CIS
Quintin Siebers then followed with a demo of Maius CIS (MCIS), which is a curriculum management tool Morpheus has built for CIS (what's that?), again using OKS and Kamala. This has different stylesheets, and a nice AJAX interface. Single click to create an association.
Sorry my notes on this talk are not very extensive, but chairing the session, listening to the speakers, writing notes, and taking photos all at the same time is not trivial.
Development of TM4J 2.0
The TM4J project is currently dormant, but Xuân Baldauf is working on the 2.0 version. He's added support for importing XTM 2.0 by simply updating the XTM 1.1 import code. It does not support variants and occurrence data types, which will be added if someone needs it. TM4J 2.0 will also be updated to follow the TMDM, and will have a new API with TMDM names for methods and classes, Java 1.5 generics, and a simplified event model. The API will have a wrapper interface on each object that is compatible with the old TM4J API. There are two backends: one memory-based and one JDO-based with DB40. Xuân has had some problems with JPOX, but I couldn't pick up exactly what they were.
Xuân is not quite finished with TM4J 2.0 yet. He needs to work on unmerging, implement some more methods, and then commit to CVS. He has about 9000 lines of code that are not committed, partly because some small libraries he uses have to be open sourced first. (I didn't catch what libraries these were, and who had made them.)
Linking Topic Maps Engines
Graham and Marc
Graham Moore and Marc Wilhelm Küster returned to the subject of interchanging data between Topic Maps engines from their keynote in more detail here. Marc had originally signed up on the flipchart, but Graham spoke instead because he talks faster.
The basic idea behind this is to create an Atom feed describing the changes made to a topic map. Each item in the feed specifies one update. The items (transactions) have a GUID, because you want to be able to use these GUIDs if you pass these changes on further down the line in another feed. So readers are expected to preserve these GUIDs. Local changes should of course get new GUIDs, because they are new transactions. Graham thinks this makes for a clean interaction between nodes.
He also mentioned one issue in XTM 1.0 and 2.0: in a fragment you don't know which topic is the main topic (or topics), and which ones are just stubs used as parts of the main topic(s). He doesn't really know how to solve this. In the discussion I suggested passing the ID of the main topic outside the XTM fragment itself, but as Graham pointed out there could be more than one main topic.
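The GUID rules Graham described can be sketched in a few lines of Python. This is an illustration under assumptions, not the protocol they implemented: the function names and the dictionary layout are hypothetical, the use of GUIDs to skip already-seen transactions is my guess at one reason for preserving them, and the generated feed leaves out elements that a valid Atom feed requires (such as `updated`).

```python
import uuid
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

def new_transaction(xtm_fragment):
    """A local change is a new transaction, so it gets a fresh GUID."""
    return {"guid": "urn:uuid:" + str(uuid.uuid4()), "xtm": xtm_fragment}

def relay(transaction):
    """A change received from another node is passed on with its GUID intact."""
    return dict(transaction)

def apply_once(transaction, seen_guids, applied):
    """Apply a transaction unless its GUID has been seen before (assumed use)."""
    if transaction["guid"] in seen_guids:
        return False
    seen_guids.add(transaction["guid"])
    applied.append(transaction["xtm"])
    return True

def to_atom(feed_title, transactions):
    """Serialize transactions as Atom entries, one XTM fragment per entry."""
    feed = ET.Element("{%s}feed" % ATOM)
    ET.SubElement(feed, "{%s}title" % ATOM).text = feed_title
    for tx in transactions:
        entry = ET.SubElement(feed, "{%s}entry" % ATOM)
        ET.SubElement(entry, "{%s}id" % ATOM).text = tx["guid"]
        content = ET.SubElement(entry, "{%s}content" % ATOM,
                                {"type": "application/xml"})
        # The XTM fragment is embedded as a child element of <content>.
        content.append(ET.fromstring(tx["xtm"]))
    return ET.tostring(feed, encoding="unicode")
```

Note that the sketch does nothing about the stub problem mentioned above: nothing in an entry says which topics in the fragment are the main ones.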
TMRM in OWL
Then Jack Park spoke on his work on representing TMRM in OWL. He found he couldn't do it in OWL DL, but it was possible in OWL Full, which was a blessing in disguise, since OWL Full can do so much more. The reason given for switching to OWL Full is that the TMRM requires that each property type used to represent a subject must, itself, be a subject in the map. OWL separates properties from classes, and the TMRM requires that properties also be classes. This complicates applications written in OWL; they can validate to OWL Full, but not to OWL DL.
He didn't show slides, but showed RDF/XML in a tool called TopicSpaces-OWL, a Java application that he's written himself. The map includes subjects that support Tagomizer. The application, a work in progress, includes an incomplete editor, and a TouchGraph-based graph visualizer. It uses Jena for RDF manipulations, the H2 database for storage, Lucene for full-text indexing, and is primarily an exploration tool for developing topic maps in OWL using the SubjectProxy/SubjectProperty representations of one variant of the TMRM specification.
Robert Barta talked about the process of creating TMQL, and how it is difficult because TMQL is supposed to satisfy a community, yet it's hard to know what the community wants. Robert called it "flying in the dark". He showed the formal semantics, which are the definition of what the language actually does.
As Robert said, we want to finish the work as soon as possible. This means: we need feedback now! Quite soon the thing is going to be set in stone, and after that it is too late. It is not too late yet, so please read the draft and the tutorials, and give the editors feedback in any way that seems practical for you.
High-performance Topic Maps
Axel Borge then spoke on an approach to caching of Topic Maps query results that he and Graham developed in a project where they built the intranet for the Norwegian Mail. This intranet has 26,000 users, all of whom were to have a highly personalized home page. The topic map is about 2M TAOs, and they need to do 800 queries/second on this topic map. This meant that they needed to do caching, but since the pages were personalized this couldn't be done on the normal page level, and what they came up with they call CacheQube.
They store a trail of every topic and association type touched in a query and cache the result. If one of these is touched in an update you delete the cached query result. The result is that you can be sure that all cached results are accurate, because any outdated ones get deleted immediately. Since NetworkedPlanet uses TMRQL (which is basically a set of SQL views and functions) they can't actually build the trail while executing the query, and so the programmer has to do this part manually. It's usually simpler than it may sound.
He gave an example, but it went by so quickly that I couldn't follow it. It's in the slides, though. It seems that Axel gave this presentation because he speaks even faster than Graham.
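The invalidation idea itself is simple enough to sketch. What follows is not Axel's example and not CacheQube's API: it is a minimal Python illustration in which the class name, method names, and string identifiers are hypothetical. As in the talk, the programmer supplies the trail by hand, and any update touching something in a trail evicts the cached result.

```python
class TrailCache:
    """Caches query results together with the trail of things each query touched."""

    def __init__(self):
        self._results = {}  # query key -> cached result
        self._trails = {}   # query key -> frozenset of identifiers touched

    def get(self, key, trail, run_query):
        """Return the cached result, running the query only on a cache miss.

        `trail` is declared by the caller: the identifiers of the topics and
        association types the query depends on.
        """
        if key not in self._results:
            self._results[key] = run_query()
            self._trails[key] = frozenset(trail)
        return self._results[key]

    def invalidate(self, touched):
        """Delete every cached result whose trail overlaps an update.

        Returns the keys that were evicted.
        """
        touched = set(touched)
        stale = [key for key, trail in self._trails.items() if trail & touched]
        for key in stale:
            del self._results[key]
            del self._trails[key]
        return stale
```

Because outdated entries are deleted as soon as an update touches their trail, whatever remains in the cache can be served without checking freshness. The price is that a trail declared too narrowly will leave stale results in place.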
Semantic Search with Topic Maps
Then I gave a quick repeat of the core of my Extreme Markup talk from August. It's really about how it's possible to do a much better kind of search with Topic Maps if you make use of the semantics of the Topic Maps model. It was obviously impossible for me to take notes on this, but I'll try to write this up as a blog posting when I can.
Using Topic Maps to Teach Topic Maps
Steve Pepper was the next speaker. He's teaching a series of courses on Topic Maps at Oslo University College this fall, and showed how he had created a topic map from an Excel spreadsheet of students, converting a CSV export to LTM with regular expressions. He was interrupted by the Windows system update at this point, and when the dialog box saying "Do you want to update your system now" showed up, he exclaimed "Oh god, please don't!" Students create their own topic maps in the course, and Steve thinks these would be good source material for a thesis on the kinds of mistakes newbies make when creating topic maps.
Steve closed by exhorting people to create more topic maps, because this is useful and, not least, useful for other people.
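For readers who want to try the spreadsheet trick, here is a rough Python sketch of a CSV-to-LTM conversion with regular expressions. It is not Steve's script: the column layout (name, email), the topic id scheme, and the choice of a `student` type and an `email` occurrence type are all assumptions made for illustration. It also does no escaping, so names containing quotes or commas, or names that start with a digit, would need extra care.

```python
import re

# One CSV row with exactly two comma-separated fields (assumed: name, email).
ROW = re.compile(r"^\s*([^,]+?)\s*,\s*([^,]+?)\s*$")

def slug(name):
    """Turn a display name into a topic id (hypothetical scheme)."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

def csv_to_ltm(csv_text):
    """Convert a CSV export with a header row into LTM topics and occurrences."""
    lines = ['[student = "Student"]', '[email = "Email"]']
    for row in csv_text.splitlines()[1:]:  # skip the header row
        match = ROW.match(row)
        if match is None:
            continue  # ignore blank or malformed rows
        name, email = match.groups()
        tid = slug(name)
        # LTM topic: [id : type = "base name"]
        lines.append('[%s : student = "%s"]' % (tid, name))
        # LTM occurrence: {topic, occurrence type, "locator"}
        lines.append('{%s, email, "mailto:%s"}' % (tid, email))
    return "\n".join(lines)
```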
Scoping Subject Identifiers
Finally Lutz Maicher and Xuân Baldauf spoke on contextual subject identity. Subject identity, they claimed, is dependent on perspective. The current TMDM creates a single topic from all topics with the same subject identifier. However, they want to be able to scope the subject identifiers. If this is possible, then how many topics are produced depends on the context that's used.
This sparked an intense reaction from the audience. Steve Pepper, who had been fidgeting impatiently throughout the whole talk, immediately rose to say that this was just wrong. Graham Moore then got up to say that he agreed with Steve. Robert Barta was also against it, but said that Steve Newcomb was all for this, and that therefore the speakers should do this in TMRM, and not in TMDM. I jumped in to say that PSIs are used to mark agreement on identity, and that contextual agreement is not really a good idea. Having strict basic rules for identity is good. Identity is an equivalence relation, however, so theoretically one might have more than one identity relation. I suggested they could play around with this in the TMRM if they wanted to.
At this point the discussion degenerated into a discussion of what makes good subject indicators, and I stopped taking notes.
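To make the disagreement concrete, here is a small Python sketch contrasting the two behaviours. The first function follows the TMDM rule described above: topics sharing a subject identifier become one topic. The second is only my reading of what scoped identifiers would mean, namely that an identifier counts only if it is unscoped or scoped to the current context; the speakers did not present it in this form. The data layout and function names are hypothetical.

```python
def merge(topics):
    """Merge topics that share a subject identifier, as the TMDM requires.

    `topics` maps a topic id to its set of subject identifiers. Returns a
    sorted list of sorted lists, one per resulting topic.
    """
    groups = []  # (topic ids, identifiers) pairs with pairwise disjoint identifiers
    for tid, sids in topics.items():
        ids, idents = {tid}, set(sids)
        rest = []
        for group_ids, group_idents in groups:
            if group_idents & idents:
                # A shared identifier means the same subject: merge.
                ids |= group_ids
                idents |= group_idents
            else:
                rest.append((group_ids, group_idents))
        groups = rest + [(ids, idents)]
    return sorted(sorted(ids) for ids, _ in groups)

def merge_in_context(scoped_topics, context):
    """Merge using only identifiers that are unscoped or scoped to `context`.

    `scoped_topics` maps a topic id to a set of (identifier, scope) pairs,
    where a scope of None means the unconstrained scope (assumed semantics).
    """
    visible = {
        tid: {sid for sid, scope in pairs if scope is None or scope == context}
        for tid, pairs in scoped_topics.items()
    }
    return merge(visible)
```

With `merge`, the result depends only on the identifiers. With `merge_in_context`, the same data can yield one topic in one context and two in another, which is exactly the property the objectors in the audience disliked.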
End of day 1
This was the end of the first day of the conference, and so some of us went to Ohne Bedenken nearby to drink gose, and talk, and from there on to the conference dinner in the city centre.
At Ohne Bedenken
Stefan Lischke - 2007-10-11 06:16:44
Thanx Lars for keeping us updated ;-)
Rolf Guescini - 2007-10-12 02:27:57
It's good to know what's moving for us poor s###rs that have to stay at home ;)
Really interesting to read about how scoping of subject identity at least is starting to get mentioned openly ;) I see the cases against it, but I agree with the "yay'ers" in that it would really be useful to scope SIs in a many-faceted world. It's probably a bitch to implement and parse but would make scoping and faceting much richer.
Steve Pepper - 2007-10-12 03:30:10
No, Rolf. It would absolutely *not* make scoping and faceting much richer. It would destroy the whole point of having subject identifiers. The result would be chaos and the antithesis of collocation - the basic goal of Topic Maps.
A subject identifier is the expression of the relation between the topic and the subject that it represents: between the signifier and the signified. It captures the *intentionality* of that relation, that is to say the intention of the person who minted the identifier with respect to the subject it is intended to identify. That relation is wholly in the mind of the minter; the subject descriptor (formerly known as the subject indicator) exists in order to allow the intentionality to be communicated to other people so that they can have the opportunity to reuse the identifier and thus facilitate merging.
I see no role in this scheme for context dependency - which is what scoping amounts to.
I would be interested to know exactly what use cases the proponents of scoped identifiers have in mind. Unless someone comes up with a convincing use case, I have to assume that those people have fundamentally misunderstood the concept of subject identifiers.
(Take that as a challenge ;-)
Marc de Graauw - 2007-10-12 03:57:48
I think Steve is right on scoped subject identity, but I can see where it comes from. Basically the problem is identity statements can be true or false. If I claim "Marc de Graauw's social security number is 123" and merge my topic map with another one which says "Steve Pepper's social security number is 123", this would make me the author of "The TAO of Topic Maps" and Steve an expert on HL7v3 Web Services. (And actually false and/or colliding SSN's are a huge problem in real life.) I think the desire to scope subject identifiers comes from this problem, but scoping identifiers isn't the right solution.
Subject identity can never be context dependent. Something is what it is, and nothing else. The relation between a subject and a piece of possibly identifying information can be context dependent - "123" can actually be my social security number in the Netherlands, and Steve's in Norway. But this shouldn't be solved by scoping subject identifiers, but by modelling the subject-SSN relation as an association, and allowing TM authors to merge topics based on the presence of certain (scoped) associations.
Geir Ove Grønmo - 2007-10-12 04:09:02
The best candidates for scoping subject identities I've found to be item identifiers, particularly those that can be considered *relative*.
Let's say that I import the 'foo' topic from the bar.ltm file on the local file system into another topic map. This could, let's say, result in a topic with an item identifier: 'file:/tmp/bar.ltm#foo'. It would be extremely useful if this item identifier was scoped with a topic that represented my local file system. In that case I would not run the risk of it merging with another topic that happened to originate from a file with the same name but on another user's file system. The same issue applies to topics loaded across the network.
Subject locators should be scoped if the identifiers are to be considered 'relative', e.g. file URIs and http URIs referring to non-global hosts like localhost.
Published subject identifiers on the other hand should not really need scope. Most of them, if not all, should be placed in the unconstrained scope. This is because the whole idea of PSIs is based on the assumption that the assignee really owns the namespace (i.e. DNS domain) in which it is located.
Rolf Guescini - 2007-10-12 09:56:50
Steve and Marc, I see the implications you are showing me, and I did not of course consider all the implications of scoping PSIs. But you all point to the PSI as a local measure, either as something that is relative to the intention of the creator or connected to its domain, meaning that we would, globally, end up with many PSIs really talking about the same concept, in effect really becoming context dependent. In my mind, I was thinking of a future with global PSI repositories where scoped PSIs have one unique identifier of course, but with some way of attaching context dependent metadata to it, making it possible to talk globally about the same subjects in situations where one cannot rely on ONE normative semantic for the subject in question. That is the general idea. Specifically, one could imagine a global repository trying to create one normative definition for "God". Wars have been started for less..
Trond - 2007-10-16 05:38:02
Lars Marius, your presentation of a TM-driven search during the open source session was very inspiring.
How "generic" is this search? Obviously, the search result listing was tailored to fit the application domain, so what I'm trying to say is: how generic is it outside of the presentational aspect (I assume very ;)?
And: do you have a live demo running somewhere (it doesn't seem to be part of your tmphoto application)?
Lars Marius - 2007-10-16 05:45:41
Trond: The actual search code is 100% generic. The only thing that's specific to the photo application is the presentation of search results.
There are several reasons I don't have it up on the web site. The first is that it's a bit too rough. The second is that I need a topic map annotated with the Ontopoly ontology etc, and I'm not really ready to make that the main topic map. And, finally, it's a bit tricky to install, since it's written in Jython, rather than Java.
Anyway, I still plan to blog about this as soon as I can.
arnoud haak - 2007-10-17 07:05:36
CIS is a foundation for Dutch insurance companies. They enable insurance companies to exchange information with other companies, the police and so on. Mainly to detect and prevent fraud.