Larsblog

Extracting fragments with TM-Views

2005-11-30 23:49

Last time I wrote about how I used OSL to extract a fragment from a topic map. Today I've been working on the TMRA'05 paper on TM/XML, which also describes TM-Views. TM-Views is a mechanism, created by Dmitry Bogachev, for extracting fragments from topic maps, and I thought applying it to the example in my previous blog entry would be a useful way for me to really get to grips with it. I'd also like to see how TM-Views compares with using OSL.

So, here goes.

Extracting the fragment

The task is the same as it was last time: I want to extract all information related to TMRA'05 from my photo topic map. This means the event topic representing TMRA'05, all photos related to it by taken-during associations, all people appearing in those photos, all locations where the photos were taken, and all locations containing those locations.

The way to start is to define a view for events, which will then be applied to the single event topic for TMRA'05. This would look as follows in TM-Views:

<view xmlns="http://psi.ontopia.net/xml/tm-views/"
      id="event_view" name="Event view">

  <topic type="event">
    <identifier type="*"/>
    <basename type="*"/>
    <occurrence type="*"/>

    <association type="*" role="*"/>
  </topic>
</view>

This view says that for the event type we want all identifiers (that is, all identifying URIs), all topic names (presumably including their variants), all occurrences, and all associations. This seems fine, except that the associated topics (that is, the photos taken during the event) only get included as stubs with identifiers. That's not good, since we want to know everything about these topics, including what's associated with them.
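To make the stub problem concrete, here is a rough Python sketch of what a non-recursive view does as I understand it. This is not the TM-Views implementation: the dictionary layout, the `apply_flat_view` function, the topic ids, and the example.org URIs are all made up for illustration.

```python
# Toy data; ids, names and URIs are hypothetical.
TOPICS = {
    "tmra05": {
        "identifiers": ["http://example.org/event/tmra05"],
        "names": ["TMRA'05"],
        "occurrences": {"homepage": "http://example.org/tmra05/"},
    },
    "photo1": {
        "identifiers": ["http://example.org/photo/1"],
        "names": ["Opening session"],
        "occurrences": {"image": "photo1.jpg"},
    },
}

# (association type, topic A, topic B)
ASSOCIATIONS = [("taken-during", "tmra05", "photo1")]


def apply_flat_view(topic_id):
    """Return the full start topic, its associations, and stubs for the rest."""
    fragment = {topic_id: dict(TOPICS[topic_id])}
    associations = []
    for assoc_type, a, b in ASSOCIATIONS:
        if topic_id in (a, b):
            associations.append((assoc_type, a, b))
            other = b if a == topic_id else a
            # The associated topic is only a stub: identifiers, nothing else.
            fragment.setdefault(
                other, {"identifiers": list(TOPICS[other]["identifiers"])}
            )
    return fragment, associations


fragment, associations = apply_flat_view("tmra05")
print(sorted(fragment["tmra05"]))  # full topic: identifiers, names, occurrences
print(sorted(fragment["photo1"]))  # stub: identifiers only
```

The photo comes back with its identifier, but its names, occurrences, and its own associations are gone, which is exactly what we need to fix.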

I can get the full photo topics by putting a new <topic> element inside the <association> element to describe what I want done with the associated topics. However, that's not enough, because I want to do this recursively. Unfortunately, Dmitry decided not to support that, to avoid infinite recursion. Had he supported it, this would have been quite easy. As it is, things get harder.

Counterfactual detour

I'll spend this section looking at how this could have been solved had TM-Views supported recursion. The real solution follows further down. I could have defined a new topic view that works on all topic types, and made it the same as the one for events. In fact, since the two would be identical, we might as well have only one catch-all view, as follows:

<view xmlns="http://psi.ontopia.net/xml/tm-views/"
      id="event_view" name="Event view">

  <topic type="*">
    <identifier type="*"/>
    <basename type="*"/>
    <occurrence type="*"/>

    <association type="*" role="*"/>
  </topic>
</view>

This should work (given recursion in TM-Views), by taking the event, then all associated photos, then all the people in them, then all the locations, and everything associated with the locations. It goes a bit too far, though, because it follows every association from every topic it reaches, whether we want that or not. So for the people we get all the other photos they are in, and for locations we get not only Leipzig (where TMRA'05 was) and Germany (where Leipzig is), but also all other places in Germany. In fact, we get pretty much the whole topic map. So we need to be a bit more restrictive here.

<view xmlns="http://psi.ontopia.net/xml/tm-views/"
      id="event_view" name="Event view">

  <topic type="*">
    <identifier type="*"/>
    <basename type="*"/>
    <occurrence type="*"/>

    <association type="*" role="*">
      <except>
        <!-- for people, don't get other photos they are in -->
        <association type="depicted-in" role="depicted"/>

        <!-- for places, don't get places they contain -->
        <association type="contained-in" role="container"/>

        <!-- for places, don't get other photos taken there -->
        <association type="taken-at" role="location"/>
      </except>
    </association>
  </topic>
</view>

So now we should be done, right? Well, not quite. We forgot one crucial thing: topics are associated with their types via type-instance associations. That means that we'll be retrieving not only the topic type "event", but also all other events, which will again get us everything related to those. So we are still getting the entire topic map. However, we can solve this by adding one extra exception, which we do below.

<view xmlns="http://psi.ontopia.net/xml/tm-views/"
      id="event_view" name="Event view">

  <topic type="*">
    <identifier type="*"/>
    <basename type="*"/>
    <occurrence type="*"/>

    <association type="*" role="*">
      <except>
        <!-- for people, don't get other photos they are in -->
        <association type="depicted-in" role="depicted"/>

        <!-- for places, don't get places they contain -->
        <association type="contained-in" role="container"/>

        <!-- for places, don't get other photos taken there -->
        <association type="taken-at" role="location"/>

        <!-- and don't get the topic types, please -->
        <association type="type-instance" role="instance"/>
      </except>
    </association>
  </topic>
</view>
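Since TM-Views does not actually do this, the easiest way to check the reasoning is to simulate it. The Python sketch below is only my own model of how a recursive catch-all view with exceptions might behave. The toy topic map, the topic ids, and the role names "event", "photo", and "type" are assumptions; the association types and the other role names are the ones used in the views above. It uses a worklist with a set of already-visited topics rather than literal recursion, which also shows one way the infinite-recursion worry could be handled.

```python
from collections import deque

# Toy topic map: (association type, role A, player A, role B, player B).
# Topic ids and the roles "event", "photo" and "type" are illustrative guesses.
ASSOCIATIONS = [
    ("taken-during", "event", "tmra05", "photo", "photo1"),
    ("taken-during", "event", "other-event", "photo", "photo2"),
    ("depicted-in", "depicted", "lars", "depiction", "photo1"),
    ("depicted-in", "depicted", "lars", "depiction", "photo2"),
    ("taken-at", "location", "leipzig", "photo", "photo1"),
    ("taken-at", "location", "berlin", "photo", "photo2"),
    ("contained-in", "containee", "leipzig", "container", "germany"),
    ("contained-in", "containee", "berlin", "container", "germany"),
    ("type-instance", "instance", "tmra05", "type", "event"),
    ("type-instance", "instance", "other-event", "type", "event"),
]

ALL_TOPICS = {t for a in ASSOCIATIONS for t in (a[2], a[4])}


def neighbours(topic, excluded):
    """Yield topics one association away, skipping excluded (type, role) pairs.

    The role in each pair is the role played by the topic we are leaving,
    mirroring <association type="..." role="..."/> inside <except>.
    """
    for assoc_type, role_a, topic_a, role_b, topic_b in ASSOCIATIONS:
        for role, player, other in ((role_a, topic_a, topic_b),
                                    (role_b, topic_b, topic_a)):
            if player == topic and (assoc_type, role) not in excluded:
                yield other


def extract(start, excluded=frozenset()):
    """Collect every topic reachable from start; the seen set ensures termination."""
    seen = {start}
    queue = deque([start])
    while queue:
        topic = queue.popleft()
        for other in neighbours(topic, excluded):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


THREE = {("depicted-in", "depicted"),
         ("contained-in", "container"),
         ("taken-at", "location")}
FOUR = THREE | {("type-instance", "instance")}

print(sorted(extract("tmra05")))         # catch-all view: the whole map
print(sorted(extract("tmra05", THREE)))  # still the whole map, via the type
print(sorted(extract("tmra05", FOUR)))   # only the TMRA'05 fragment
```

On this toy map the first two calls return every topic, and only the third is limited to the event, its photo, the person in it, Leipzig, and Germany.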

The real solution

So, given that recursion is not supported in TM-Views, how can we do this? We need to go back to the original solution, and start explicitly expanding the paths we want to follow. So let's look at how that would come out. First we cover the photos taken at the event, which gives us:

<view xmlns="http://psi.ontopia.net/xml/tm-views/"
      id="event_view" name="Event view">

  <topic type="event">
    <identifier type="*"/>
    <basename type="*"/>
    <occurrence type="*"/>

    <association type="*" role="*" otherrole="*">
      <!-- photos from event -->
      <topic type="photo">
        <identifier type="*"/>
        <basename type="*"/>
        <occurrence type="*"/>

        <association type="*" role="*"/>
      </topic>
    </association>
  </topic>
</view>

We are still lacking the people in these photos and the locations at which they were taken. This is easy to add, as follows:

<view xmlns="http://psi.ontopia.net/xml/tm-views/"
      id="event_view" name="Event view">

  <topic type="event">
    <identifier type="*"/>
    <basename type="*"/>
    <occurrence type="*"/>

    <association type="*" role="*" otherrole="*">
      <!-- photos from event -->
      <topic type="photo">
        <identifier type="*"/>
        <basename type="*"/>
        <occurrence type="*"/>

        <association type="*" role="*" otherrole="*">
          <!-- places photos were taken -->
          <topic type="place">
            <identifier type="*"/>
            <basename type="*"/>
            <occurrence type="*"/>

            <!-- get place containing the place -->
            <association type="contained-in" role="containee"
                         otherrole="container">
              <!-- now what? -->
            </association>
          </topic>

          <!-- people in photos -->
          <topic type="person">
            <identifier type="*"/>
            <basename type="*"/>
            <occurrence type="*"/>
            <!-- get family relationships, but not photos from other events -->
            <association type="*" role="*" otherrole="*">
              <except>
                <association type="depicted-in" role="depicted"
                             otherrole="depiction"/>
              </except>
            </association>
          </topic>
        </association>
      </topic>
    </association>
  </topic>
</view>

The "now what?" comment marks the problem, however. How can we recursively move upwards to find all places containing the place where the photos were taken? There can be arbitrarily many steps, so no fixed number of nesting levels can be guaranteed to suffice.
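The difference between a fixed number of nesting levels and a real upward walk is easy to show outside TM-Views. The Python sketch below is just an illustration: the `CONTAINED_IN` mapping, the intermediate places "saxony" and "europe", and both function names are assumptions of mine, not anything taken from the photo topic map or from TM-Views.

```python
# containee -> container; a hypothetical containment chain.
CONTAINED_IN = {
    "leipzig": "saxony",
    "saxony": "germany",
    "germany": "europe",
}


def containers(place, contained_in):
    """Follow contained-in upwards until there is no container left."""
    chain = []
    seen = {place}
    current = place
    while current in contained_in:
        current = contained_in[current]
        if current in seen:  # guard against cyclic data
            break
        seen.add(current)
        chain.append(current)
    return chain


def containers_fixed_depth(place, contained_in, depth):
    """What a view with `depth` hand-written nesting levels can reach."""
    chain = []
    current = place
    for _ in range(depth):
        if current not in contained_in:
            break
        current = contained_in[current]
        chain.append(current)
    return chain


print(containers("leipzig", CONTAINED_IN))
print(containers_fixed_depth("leipzig", CONTAINED_IN, 2))
```

With two nesting levels the walk stops at "germany" and never sees "europe". Adding a third level fixes this particular chain, but the next, deeper chain breaks it again.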

Compared to the recursive solution, this explicit nesting is probably easier to understand, harder to maintain, and not as powerful. It's also less likely to cause problems by including more than the person writing the view really wanted.

Comparison with using a schema

The real question, of course, is how using TM-Views compares with using a schema. The best way to approach that is to look at the OSL schema I created for the photo application, which is quite big.

The main difference with TM-Views is that the views use wildcards a lot, and are directed at the task at hand, while the schema really describes everything in detail, and is much more general. However, the schema really could have used wildcards, too, and the view could have spelled everything out in detail. In fact, once you look closely it's amazing how similar the two languages are. It's quite tempting to claim that TM-Views is a schema language, too.

The real difference here lies in how the two are used. With TM-Views we start at a specific topic (or set of topics), then traverse out, including everything that matches the description. With OSL we remove what we don't want from the topic map, then delete all invalid topics.

To put it another way, there is one topic map, two descriptions of legal Topic Maps information, and two algorithms. In theory, there is no reason why both algorithms couldn't be used with both descriptions. In practice, the lack of recursion in TM-Views means that its algorithm, run against an OSL schema, can't traverse along associations.

So which is better? I think the algorithm TM-Views uses is way better than the rather ad-hoc approach I came up with in the previous entry using a schema language. However, I'm not sure we really need a view language. It seems to me that a fragmentation algorithm and a schema language might be enough.

Looking forward to hearing what Dmitry thinks of this.
