Synonyms in Topic Maps

2007-06-07 20:48


Understanding how to represent synonyms in Topic Maps is not trivial, and the discussion of it highlights some interesting things about the semantics built into Topic Maps, so I think it's worth having a look at this pattern. Let's say we want to create a topic for the city of Lviv, which is also known as Lvov. It has a few more names, but we'll pass over those for now. In his master's thesis, Roy Lachica listed three different ways to do this, and we'll go through them one by one.

[lviv = "Lviv"]
[lvov = "Lvov"]

Here we have created one topic for each name. This is not necessarily wrong, but it's not entirely right, either. Given the rule of thumb in Topic Maps that says "one subject per topic, and one topic per subject" we must assume that this means there are actually two different subjects, two different things, here. If there were just one thing we would have created just a single topic.

We can still rescue the situation if we say that these topics are instances of the type "name," because there really are two names here. We could then go on to make assertions about how long the names are (both 4 characters), when and where they have been used, in what languages, etc. This would make sense, but most people would probably feel that it's not terribly interesting, and it would be hard to disagree with them.
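As a rough illustration of what "a topic per name" buys you, here is a minimal Python sketch. The NameTopic class and its fields are hypothetical and only meant to mirror the two LTM topics above. They are not part of any Topic Maps engine.

```python
from dataclasses import dataclass


@dataclass
class NameTopic:
    """Hypothetical stand-in for a topic whose subject is a name, not a city."""
    topic_id: str
    value: str
    topic_type: str = "name"

    @property
    def length(self):
        # An assertion about the name itself, not about the city it denotes.
        return len(self.value)


# Two subjects (two names), therefore two topics.
names = [NameTopic("lviv", "Lviv"), NameTopic("lvov", "Lvov")]

for name in names:
    print(name.topic_id, name.topic_type, name.length)
```

Everything we can say here is about the strings, which is why this model is rarely the interesting one.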

If we had made these two topics instances of "city" it would have meant that there are two different cities, and this is just not the case. If we want to model cities we need to do it like this:

[lviv = "Lviv"
      = "Lvov" / russian]

This, on the other hand, makes one topic, which means we are representing one thing in the real world (the city of Lviv, which is the same place as the city of Lvov). This makes sense, as long as we make the topic an instance of "city" (or "place" or whatever). Now we can assert things like where Lviv is (Ukraine), how many inhabitants it has, and so on.
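The single-topic model can be sketched in Python as follows. The Topic and Name classes and the name_for helper are assumptions made for illustration, and scope is reduced to a set of plain strings, which is much simpler than what a real Topic Maps engine does.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Name:
    value: str
    scope: frozenset = frozenset()  # empty scope = name valid in any context


@dataclass
class Topic:
    topic_id: str
    types: set = field(default_factory=set)
    names: list = field(default_factory=list)

    def name_for(self, context=None):
        """Return the first name scoped to `context`, falling back to the
        first unscoped name (hypothetical selection rule)."""
        if context is not None:
            for name in self.names:
                if context in name.scope:
                    return name.value
        for name in self.names:
            if not name.scope:
                return name.value
        return None


# One subject, one topic, two names (mirrors the LTM fragment above).
lviv = Topic(
    topic_id="lviv",
    types={"city"},
    names=[Name("Lviv"), Name("Lvov", frozenset({"russian"}))],
)

print(lviv.name_for())
print(lviv.name_for("russian"))
```

Both names hang off the same topic, so anything asserted about the city applies no matter which name was used to find it.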

In short: in 99% of cases this is the way to do it. It's worth noting that Kal Ahmed actually made two Topic Maps patterns for modeling thesauri, rather than just one. The two correspond to exactly the distinction I've shown here, between having a topic for each concept or a topic for each word/name.

Roy listed a third way to do it, however:

[lviv = "Lviv"]
{lviv, synonym, [[Lvov]]}

Here we've created a single topic, but made an occurrence for the non-preferred name, rather than a name. This is OK in the sense that we make it clear that there is just one thing out there in the real world (the city in Western Ukraine), but it's bad in the sense that it claims that "Lvov" is not a name for the city, which is wrong.

It may sound like I'm picking nits here (and I admit I am), but this has consequences. Let's assume you use this topic map for automated classification. Ontopia's tool for this will not assume on finding the string "Lvov" that it refers to the city in Ukraine, since the city doesn't have that name. (Yes, there is an unknown property with that value, but what does that mean?) However, if you'd made it a name, this would have been picked up correctly. So these fine distinctions actually matter, and I predict that further down the road they will matter a lot more than they do right now.
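To see why the distinction matters for classification, consider this deliberately naive Python sketch. It is not how Ontopia's tool works. The classify function and the dictionary layout are assumptions, and the matcher simply looks for exact topic names among the words of a text.

```python
import re

# Option 2: "Lvov" is a second name of the topic.
with_name = {"id": "lviv", "names": ["Lviv", "Lvov"], "occurrences": {}}

# Option 3: "Lvov" is stored as an occurrence of type "synonym".
with_occurrence = {
    "id": "lviv",
    "names": ["Lviv"],
    "occurrences": {"synonym": ["Lvov"]},
}


def classify(text, topics):
    """Return ids of topics that have at least one name appearing as a
    word in `text`. Occurrences are ignored, because the matcher has no
    way of knowing that an arbitrary property value is a name."""
    words = set(re.findall(r"\w+", text))
    return [t["id"] for t in topics if words & set(t["names"])]


text = "The night train from Krakow to Lvov was delayed."

print(classify(text, [with_name]))
print(classify(text, [with_occurrence]))
```

With the name-based model the text is linked to the city. With the occurrence-based model the matcher finds nothing, unless it is given special knowledge about what the "synonym" occurrence type means.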

Comments

Steve Pepper - 2007-06-08 05:14:43

Of the three options discussed here, the second (multiple names in different scopes) is clearly the most appropriate in this particular case (unless, as you point out, the author wants to make assertions about the names themselves, in which case one needs a topic for each name).

The only situation in which the third option would be correct (or at least acceptable) is if the "name-ness" of the name is slight. An example might be a code, like "UKR" for Ukraine (or "LWO" for Lvov Airport).

For completeness I'd like to point out there are also two other options, not discussed in this blog, that are both valid in different circumstances:

1) MULTIPLE TYPED NAMES. It's not easy to give a water-tight example of this for Lviv, but the following will do. (NB It uses an extension to LTM in order to express typed names):

  [lviv = "Lviv"
        = "LWO" : airport-code]

(This is not "water-tight" because it could be argued that an airport-code is more appropriately modelled as an occurrence. As far as I'm concerned the jury is still out on that one. It's also debatable whether one should conflate a town and its airport in this way.)

2) VARIANT NAMES. If we are really talking about the same name in different forms (e.g. different transliterations), a variant name is more appropriate than a second base name:

[lviv = "Lviv" ("Лвнв" / cyrillic)]

[Disclaimer: I have no idea if my transliteration is correct, but you get the point.]

Lars Marius - 2007-06-10 07:10:08

For the record: I agree with everything you write, Steve. I think the question of what deserves to be a name is worth a blog entry of its own.

I'm afraid the transliteration of Lviv is wrong, though. You wrote "Lvnv". In Ukrainian you'd actually use "i" for the i, whereas in Russian you'd normally write that as "и".

Roy Lachica - 2007-08-08 06:35:25

I agree that option two is the most appropriate. On Fuzzzy.com, on the other hand, option three was chosen during the conception phase because it was believed to simplify the system. XTM output, the data model, system development and user interface interaction were believed to be less complicated when using occurrences as opposed to introducing scopes. The main goal of Fuzzzy was to create a system where novice users could grasp the tool immediately and developers could easily understand the API and XML outputs. No evaluation has been done to uncover the implications of using scoped names for synonyms on Fuzzzy.com.
