On topic types in Topic Maps
Posted in Technology on 2008-01-16 18:13
A discussion on Svein's blog about Freebase, comparing its data model with that of Topic Maps, brought up some interesting questions about Topic Maps that I think are worth discussing.
Multiple types for topics
The first question was whether topics can have more than one type, and the answer is that they definitely can. The types do not have to be related through supertype-subtype associations; they can be completely independent. The best-known example is the Italian Opera Topic Map, where some topics are, for example, both librettist and playwright. Whether this is good modelling has been questioned several times, but I'll leave that aside for now, since the point is that it is definitely legal.
So why is this legal? There are several reasons. One is that Topic Maps were designed from the very beginning to support automatic merging of data, and it is entirely possible for the same subject to be typed differently in different topic maps. The most obvious cause is that the two topic maps use different ontologies.
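To make the merging argument concrete, here is a minimal sketch, assuming a toy in-memory topic that is just a set of subject identifiers plus a set of type names. The `Topic` class, the `merge` function and the example identifier are hypothetical illustrations, not TMDM or any engine's API. The point is that merging keeps every type assertion from both sources, so differing ontologies simply leave the merged topic with several types.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical minimal topic: subject identifiers plus a set of type names."""
    subject_identifiers: set[str]
    types: set[str] = field(default_factory=set)


def merge(a: Topic, b: Topic) -> Topic:
    """Merge two topics that share a subject identifier.

    Nothing is discarded: the result keeps every identifier and every type
    from both inputs, so two different ontologies yield multiple types.
    """
    if not (a.subject_identifiers & b.subject_identifiers):
        raise ValueError("topics do not share a subject identifier")
    return Topic(a.subject_identifiers | b.subject_identifiers, a.types | b.types)


# Hypothetical identifier for one subject described in two topic maps.
PSI = "http://example.org/subject/some-author"
from_opera_map = Topic({PSI}, {"librettist"})
from_theatre_map = Topic({PSI}, {"playwright"})
merged = merge(from_opera_map, from_theatre_map)
print(sorted(merged.types))  # ['librettist', 'playwright']
```

No ontology has to "win" here; the merged topic simply ends up with both types.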
There are also different ways of using topic typing, and not all of them will necessarily give each topic a single type. The Italian Opera example is one case of this, but other modelling approaches can lead to the same situation.
Then there is scoping. One could easily imagine a situation where a topic winds up with different types in different scopes, for example because there is disagreement on the correct type. My TMRA 2007 paper on scope describes one such situation.
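A rough sketch of how scoped types might behave, assuming type assertions are stored as (instance, type, scope) records and that an assertion applies when all of its scope themes are present in the current context. This scope semantics, the `TypeAssertion` record and the theme names are assumptions chosen for illustration; they are not taken from the TMRA paper. The Pluto reclassification is used only as a familiar case of disagreement about a type.

```python
from typing import NamedTuple


class TypeAssertion(NamedTuple):
    """Hypothetical instance-of statement that carries a scope (a set of themes)."""
    instance: str
    topic_type: str
    scope: frozenset[str]


def types_in_context(assertions, instance, context):
    """Return the types of `instance` whose scope is satisfied by `context`.

    An empty (unconstrained) scope always applies; otherwise every theme in
    the scope must be present in the context.
    """
    return {
        a.topic_type
        for a in assertions
        if a.instance == instance and a.scope <= context
    }


# Hypothetical themes standing for two conflicting classifications.
assertions = [
    TypeAssertion("pluto", "planet", frozenset({"iau-1930"})),
    TypeAssertion("pluto", "dwarf-planet", frozenset({"iau-2006"})),
]
print(types_in_context(assertions, "pluto", frozenset({"iau-2006"})))  # {'dwarf-planet'}
```

In a context that includes both themes, the topic has both types at once, which is exactly the multiple-typing situation described above.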
Finally, there is no strong reason to forbid this. Yes, it may look a bit odd for people used to traditional modelling, but there is nothing inherently wrong with it.
The relationship with schemas
The other question was what happens to the characteristics that topic types confer on their instances through the schema. In the TMDM itself, this is simple. There is no schema, and if you say that topic A is an instance of type B, well, then that's what you've said, and it has no further consequences (except for inference through subtyping). Nothing more happens.
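To illustrate "no further consequences", here is a small sketch, assuming a schema-free store holding only instance-of and supertype-subtype statements as plain Python sets. The store layout and the `all_types` helper are hypothetical. The only derived information is the transitive closure over supertypes; stating a type attaches no slots or required fields.

```python
# Hypothetical schema-free store: instance-of and supertype-subtype statements only.
instance_of = {("puccini", "composer")}
subtype_of = {("composer", "musician"), ("musician", "person")}


def all_types(topic: str) -> set[str]:
    """Direct types of `topic` plus every supertype reachable via subtype_of.

    This transitive closure is the only inference performed; having a type
    gives the topic no slots, required fields or other consequences.
    """
    result = {t for (i, t) in instance_of if i == topic}
    frontier = list(result)
    while frontier:
        current = frontier.pop()
        for sub, sup in subtype_of:
            if sub == current and sup not in result:
                result.add(sup)
                frontier.append(sup)
    return result


print(sorted(all_types("puccini")))  # ['composer', 'musician', 'person']
```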
In most traditional information technologies, like object-oriented programming or relational databases, the types in the schema define slots which are physically set aside for storage of the data. This is one reason why automatic merging in a relational database is impossible. If you only have a single email column in the person table there's nowhere to put a second email address.
In Topic Maps this is not an issue, since each statement lives a life of its own, completely independent of the existence or non-existence of a schema. It is also independent of the topic(s) it applies to. An occurrence, for example, is not put into a slot on the topic which is allocated by the schema. The occurrence itself knows which topic it's about. Exactly how this is implemented will vary, of course, but the point is that everything is so schema-independent that you're not even required to have a schema.
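A minimal sketch of the contrast with the single email column, assuming each occurrence is stored as an independent record that names its own topic. The `Occurrence` class, the topic ID and the addresses are hypothetical; real engines store statements in many different ways. Combining two sources is just combining their statements, and a second email address needs no extra slot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical occurrence: it records which topic it is about itself,
    instead of living in a slot that a schema reserved on the topic."""
    topic: str
    occ_type: str
    value: str


# Two sources each contribute an email address for the same (hypothetical) person.
source_a = [Occurrence("alice", "email", "alice@example.com")]
source_b = [Occurrence("alice", "email", "a.smith@example.org")]
store = source_a + source_b  # combining sources is just combining statements


def occurrences_of(topic: str, occ_type: str) -> list[str]:
    """Look up values by scanning statements; no per-topic slot layout is needed."""
    return [o.value for o in store if o.topic == topic and o.occ_type == occ_type]


print(occurrences_of("alice", "email"))  # ['alice@example.com', 'a.smith@example.org']
```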
But what about TMCL?
An interesting question is of course what happens if you do have a TMCL schema, and you say that topic A is an instance of topic types B and C. In general TMCL takes the statement that, say, A is an instance of B to mean that A must conform to all the constraints on B. This includes the implicit constraint that anything that is not explicitly allowed on B is forbidden. (This means that if you give a company a date of birth you get an error since the schema did not allow this.)
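The closed-world rule for a single type can be sketched as follows, assuming a drastically simplified stand-in for a TMCL schema: a dictionary listing, for each topic type, the occurrence types it permits. The `ALLOWED` table and `validate` function are hypothetical and much poorer than real TMCL constraints. Anything not listed for the type is reported as an error, as with the company's date of birth.

```python
# Hypothetical, heavily simplified stand-in for a TMCL schema: for each topic
# type, the occurrence types it explicitly permits. Real TMCL is far richer.
ALLOWED = {
    "person": {"date-of-birth", "email"},
    "company": {"founded", "email"},
}


def validate(topic_type: str, occ_types: set[str]) -> list[str]:
    """Closed-world check: anything not explicitly permitted is an error."""
    permitted = ALLOWED.get(topic_type, set())
    return [f"{o} not allowed on {topic_type}" for o in sorted(occ_types - permitted)]


print(validate("company", {"founded", "date-of-birth"}))  # ['date-of-birth not allowed on company']
```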
So what happens if A is an instance of both B and C? Well, basically, it means that it must conform to the constraints on both, and that anything that is not allowed for one of them is forbidden. And unless it's explicitly stated that a topic may be an instance of both you get an error. So it actually works out quite naturally.
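Here is a rough sketch of the two-type case, assuming each type lists occurrence types it requires and that permitted type combinations are recorded as explicit overlap declarations (TMCL has overlap declarations for this purpose, but the representation below is an assumption). The `REQUIRED`, `OVERLAPS` and `check` names are hypothetical. The sketch deliberately leaves out the closed-world rule and only shows two things: the topic must satisfy the constraints of every one of its types, and an undeclared combination of types is an error.

```python
# Hypothetical simplified schema: occurrence types each topic type requires,
# plus the pairs of types explicitly declared to overlap.
REQUIRED = {
    "librettist": {"name"},
    "playwright": {"name", "first-play"},
}
OVERLAPS = {frozenset({"librettist", "playwright"})}


def check(types: set[str], occ_types: set[str]) -> list[str]:
    errors = []
    ordered = sorted(types)
    # Every pair of types on the same topic needs an explicit overlap declaration.
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if frozenset({a, b}) not in OVERLAPS:
                errors.append(f"{a} and {b} may not overlap")
    # The topic must satisfy the constraints of each of its types.
    for t in ordered:
        for missing in sorted(REQUIRED.get(t, set()) - occ_types):
            errors.append(f"{t} requires {missing}")
    return errors


print(check({"librettist", "playwright"}, {"name"}))  # ['playwright requires first-play']
print(check({"librettist", "company"}, {"name"}))  # ['company and librettist may not overlap']
```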
Martin Stricker - 2008-01-16 13:31:13
Re: Multiple types for topics
I think it is logical to have multiple types (classes) for topics (instances), but within the Topic Maps framework I am a bit, so to speak, semantically challenged when it comes to the relationship between topic types and role types. I learned at a Topic Maps workshop that the type should be "intrinsic" to the individual, and that any "contextual" "type" or role should be modelled as a role type. So what is the difference (on a fundamental level) between these two constructs:
a) librettist and playwright (topic with multiple types, possibly subclasses of person) as librettist (role) wrote libretto for (assoc) opera (topic) as author (role) author of play (assoc) play (topic)
b) person (topic) as librettist (role) wrote libretto for (assoc) opera (topic) as author (role) author of play (assoc) play (topic)
As I am currently working with OWL/RDF and Topic Maps simultaneously, mapping role types to OWL classes would make a lot of sense to me, or am I missing something here?
trond - 2008-01-18 02:49:19
Stricker: I'm sure Lars Marius can give you a better answer, but the way I see it, it is basically a question of modelling and best practices.
In a) you claim that a librettist is a special type (or kind) of person, just as you would claim that a human (person) is a type of mammal. It is of course correct to claim that a human is a type of mammal, but is the relationship between being a librettist and a person really the same as being a human and a mammal?
As for RDF/OWL, wouldn't you express the association as an object property, using the topic types as part of the domain and range?
You might find some answers / ideas at http://www.w3.org/TR/rdftm-survey/
dmitry - 2008-02-29 10:28:27
Fortunately or unfortunately, Topic Maps allow a very open semantic interpretation of types, roles and associations. For example, in many topic maps 'person' is used as both a type and a role.
We can easily find examples like this:
john_smith isa person
new_york isa city
mary_smith isa person

likes(john_smith : person, new_york : city)
likes(john_smith : person, apple : fruit_type)
likes(john_smith : person, mary_smith : other_person)
Mapping these kinds of types, roles and associations to RDF/OWL can be quite challenging, because it requires some rethinking and normalization of the ontology.
I typically use the following strategies:
1) Replace specialized roles with more general roles:
likes(john_smith : person, new_york : city) => likes(john_smith : who, new_york : what)
2) Introduce sub-properties, "embedding" the roles into the property:
likes(john_smith : person, new_york : city) => likes_city(john_smith : who, new_york : what)
3) Introduce a new namespace:
likes(john_smith : person, new_york : city) => travel:likes(john_smith : who, new_york : what)
likes(john_smith : person, apple : fruit_type) => food_pref:likes(john_smith : who, apple : what)
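The three strategies can be sketched as small transformations from a binary association to a subject-property-object triple. This assumes the first role player becomes the subject ("who") and the second the object ("what"), which is a simplification since Topic Maps associations are unordered, and it uses plain tuples rather than any RDF library. The `Assoc` record and `strategy1`/`strategy2`/`strategy3` functions are hypothetical.

```python
from typing import NamedTuple


class Assoc(NamedTuple):
    """A binary association: its type plus two (player, role) pairs.
    Assumption: the first pair is the subject, the second the object."""
    assoc_type: str
    roles: tuple[tuple[str, str], tuple[str, str]]


def strategy1(a: Assoc) -> tuple[str, str, str]:
    """Drop the specialised roles; keep one generic property."""
    (subj, _), (obj, _) = a.roles
    return (subj, a.assoc_type, obj)


def strategy2(a: Assoc) -> tuple[str, str, str]:
    """Embed the object's role type into a sub-property name, e.g. likes_city."""
    (subj, _), (obj, obj_role) = a.roles
    return (subj, f"{a.assoc_type}_{obj_role}", obj)


def strategy3(a: Assoc, namespace: str) -> tuple[str, str, str]:
    """Keep the generic property name but qualify it with a namespace prefix."""
    (subj, _), (obj, _) = a.roles
    return (subj, f"{namespace}:{a.assoc_type}", obj)


likes_city = Assoc("likes", (("john_smith", "person"), ("new_york", "city")))
print(strategy1(likes_city))            # ('john_smith', 'likes', 'new_york')
print(strategy2(likes_city))            # ('john_smith', 'likes_city', 'new_york')
print(strategy3(likes_city, "travel"))  # ('john_smith', 'travel:likes', 'new_york')
```

In the third strategy the namespace reflects the domain of the statement (travel versus food preferences); here the caller simply passes it in, whereas in practice choosing it is the modelling decision.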
Conal - 2008-03-16 22:21:39
I tend to think that the desired approach depends very much on how detailed you want to make your model. I wonder how "intrinsic" any type really is?
Consider the case of a person who creates a cultural work under a particular pseudonym:
Is the creator (role-player) of the work a person (with their pseudonymous persona playing some other role?), or should the "creator" role be played by a topic of type "author", which is in turn related to the person topic by an "is-creative-persona-of" association? Note also that several people can write jointly under a single pseudonym. This more articulated model would make authorship a function of authors rather than directly of persons.
As an example, the British author Iain Banks has two authorial personas, which he uses to write in two distinct genres (Iain Banks and Iain M. Banks). The distinctness of these two "authors" is undeniable and is worth modelling bibliographically, because of its usefulness to library patrons.
Finally, one aside on the subject of authorship: in our work at the NZETC, we've borrowed the CIDOC CRM model of authorship, in which authors, editors, publishers and cultural works are all participants in "creation events". This allows the creative process to be modelled as a historical phenomenon, which is richer than the "static" view implicit in "person - producer - authorship - product - work".