The supertype-subtype association

<< 2006-03-31 16:17 >>

Lots of people think that the hierarchical association type used in taxonomies is the supertype-subtype association, but this is, unfortunately, wrong. After running into three instances of this misunderstanding this week, I decided to do my bit to clear this up once and for all.

It's not really difficult to see how this confusion came about in the first place, given that taxonomies consist of terms arranged in a hierarchy, with the most general terms at the top, and the most specific ones at the bottom. It's the same with class hierarchies, and so it's natural to think that the association type used in both cases must be the same. So what's the problem, then? Well, to understand that, it helps to understand the supertype-subtype association type better.

The semantics

There are three rules about what the supertype-subtype association means, and every use of it must follow all three rules, which are:

  1. The supertype and the subtype must both be types. In other words, if you are not relating two types, you can forget about using this association type.
  2. The association tells you that every instance of the subtype is also an instance of the supertype. So if you've made car a subtype of vehicle, you are saying that every car is a vehicle (which is of course true).
  3. The association type is also transitive, which is a fancy way of saying that it behaves like "is taller than", in that if I'm taller than my girlfriend, and she's taller than her sister, then I have to be taller than her sister. So if car is a subtype of vehicle, and sports car is a subtype of car, then sports car has to be a subtype of vehicle (which it is).

Reading the formal definition in all its glory is recommended.
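
To make rules 2 and 3 a bit more concrete, here is a minimal Python sketch of what they imply. This is a toy model and not the API of any Topic Maps engine: the data, the function names (`supertypes`, `types_of`) and the topic `my-ferrari` are all made up for illustration.

```python
# Toy model only: plain tuples stand in for associations in a topic map.
# (subtype, supertype) pairs, following the car/vehicle example above.
SUBTYPE_OF = {("car", "vehicle"), ("sports car", "car")}

# (instance, type) pairs; "my-ferrari" is a hypothetical topic.
INSTANCE_OF = {("my-ferrari", "sports car")}


def supertypes(topic_type, subtype_pairs):
    """Return all direct and indirect supertypes of topic_type (rule 3)."""
    found = set()
    stack = [topic_type]
    while stack:
        current = stack.pop()
        for sub, sup in subtype_pairs:
            if sub == current and sup not in found:
                found.add(sup)
                stack.append(sup)
    return found


def types_of(instance, instance_pairs, subtype_pairs):
    """Return every type the instance belongs to, directly or via rule 2."""
    direct = {t for i, t in instance_pairs if i == instance}
    result = set(direct)
    for t in direct:
        result |= supertypes(t, subtype_pairs)
    return result


print(supertypes("sports car", SUBTYPE_OF))
print(types_of("my-ferrari", INSTANCE_OF, SUBTYPE_OF))
```

The point is that nobody ever stated that `my-ferrari` is a vehicle. Software that takes the association seriously works that out for itself, which is exactly why the rules have to hold.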

Who cares about semantics?

You might be wondering why following these rules is so important, and who really cares whether you do. Well, the Topic Maps software cares, because it will believe that you mean what you say. So tolog queries will start producing the wrong answers if you abuse this association type, as will Topic Maps validators, etc. So don't do it.

If you are not sure about the relationship you are representing, and whether it really is supertype-subtype, then just don't use the standard supertype-subtype PSIs, and call it something else. You'll lose some functionality, but it's functionality you're not sure you want, anyway.
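
Here is a small sketch of how a query can go wrong. It assumes a made-up taxonomy of body parts in which "finger" sits under "hand", and a deliberately naive `instances_of` function standing in for whatever a real query engine does. None of the names come from tolog or any actual product.

```python
# Abuse: a part-whole relation ("a finger is part of a hand") has been
# recorded as if it were supertype-subtype. All names are hypothetical.
SUBTYPE_OF = {("finger", "hand")}

# (instance, type) pairs.
INSTANCE_OF = {
    ("patient-42-left-hand", "hand"),
    ("patient-42-index-finger", "finger"),
}


def subtypes(topic_type, subtype_pairs):
    """Return all direct and indirect subtypes of topic_type."""
    found = set()
    stack = [topic_type]
    while stack:
        current = stack.pop()
        for sub, sup in subtype_pairs:
            if sup == current and sub not in found:
                found.add(sub)
                stack.append(sub)
    return found


def instances_of(topic_type, instance_pairs, subtype_pairs):
    """Return instances of topic_type, including instances of its subtypes."""
    wanted = {topic_type} | subtypes(topic_type, subtype_pairs)
    return {i for i, t in instance_pairs if t in wanted}


# Asking for all hands now also returns a finger.
print(instances_of("hand", INSTANCE_OF, SUBTYPE_OF))
```

The software has done nothing wrong here. It was told that every finger is a hand, and it believed it.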

Back to taxonomies

Taxonomies generally do not consist of types, but instead just consist of various terms (body parts, countries, diseases, academic disciplines, and so on), all mixed up. So this alone is enough to disqualify the supertype-subtype association from being used to represent taxonomies.

In fact, when librarians construct thesauri (which are effectively a superset of taxonomies) they follow a procedure where they identify the relationships between the terms in the thesaurus, and there is a set of relationship categories that generally turn into hierarchical associations in the thesaurus. This list includes the supertype-subtype relation, the part-whole relation (for example, Norway is a part of Europe), the containment relation, and so on.

So it's not the case that the supertype-subtype association never occurs in taxonomies; it's just that you can't assume that all the relationships in a taxonomy are supertype-subtype relations.

So if you want to represent a taxonomy or a thesaurus in Topic Maps, my recommendation is to use Kal Ahmed's PSIs for thesauri, which contain most of what you are likely to need. The hierarchical relationship used in taxonomies (and thesauri) is the one called "broader-narrower", which is just a generic taxonomic relationship stating that one term is more specific (narrower) than the other, which is more general (broader).
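
As a rough sketch of what that looks like in practice, the Python below stores a few hierarchical associations under a separate broader-narrower type and uses them only for navigation. The string `"broader-narrower"` is a placeholder, not the actual PSI, and the data and function names are invented for the example.

```python
from collections import defaultdict

# Placeholder label, not the real PSI for the broader-narrower association type.
BROADER_NARROWER = "broader-narrower"

# (association type, broader term, narrower term); hypothetical sample data.
ASSOCIATIONS = [
    (BROADER_NARROWER, "Europe", "Norway"),
    (BROADER_NARROWER, "Norway", "Oslo"),
    (BROADER_NARROWER, "vehicle", "car"),
]


def narrower_index(associations):
    """Map each broader term to the list of its direct narrower terms."""
    index = defaultdict(list)
    for assoc_type, broader, narrower in associations:
        if assoc_type == BROADER_NARROWER:
            index[broader].append(narrower)
    return index


def outline(term, index, depth=0):
    """Return the hierarchy under term as indented lines, for browsing."""
    lines = ["  " * depth + term]
    for child in index.get(term, []):
        lines.extend(outline(child, index, depth + 1))
    return lines


print("\n".join(outline("Europe", narrower_index(ASSOCIATIONS))))
```

You still get the hierarchy for display and navigation, but because nothing here claims to be supertype-subtype, nothing will conclude that Oslo is a kind of Europe.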

Cathy Legg - 2009-02-02 20:32:28

A good paper on automatically distinguishing between super-type-subtype and instance-class relationships is Zirn et al (2008) "Distinguishing between Instances and Classes in the Wikipedia Taxonomy", tho' this is not a paper in topic maps.
