A TMCL tutorial
Posted in Technology on 2008-10-03 17:33
[Photo: Graham explaining TMCL, Leipzig]
The TMCL standard now seems more or less stable, so it is finally possible to explain to outsiders what the language looks like and how it works. The first thing to note is that TMCL is firmly meant for validation, and not for reasoning. In other words, TMCL is a schema language, rather like DTDs, RELAX-NG, XSD, EXPRESS, SQL DDL, and so on, but one specifically designed for Topic Maps. Note: this has been updated to the latest 2009-06-16 draft.
TMCL does not have a syntax. Instead, it is defined as a Topic Maps ontology, based on a set of PSIs. The standard does define a set of CTM templates, however, in order to make it easier to write TMCL in CTM. It seems likely that most TMCL schemas will be written either in CTM or using ontology editors, but any Topic Maps syntax will do. (In fact, I have a version of the schema used as an example here written up in LTM for test purposes, because my CTM parser isn't good enough yet.)
I'm going to use a lightly modified version of the tmphoto Topic Maps ontology as a running example to illustrate the language, and will write it in CTM, since that is the easiest. So you should probably read the CTM tutorial before reading this.
To get started, you need to include the standard TMCL file that declares the TMCL templates. You also need to define the URI prefixes you are going to use. In our case, that would look as follows:
    %include http://www.topicmaps.org/tmcl/templates.ctm
    %prefix tmcl http://psi.topicmaps.org/tmcl/
    %prefix ph http://psi.garshol.priv.no/photo/
    %prefix tm http://psi.topicmaps.org/iso13250/model/
    %prefix thes http://www.techquila.com/psi/thesaurus/#
    %prefix dc http://purl.org/dc/elements/1.1/
The first line here loads the definition of all the TMCL templates, so that we can use those below.
Defining a topic type
The next step is to define a topic type, which we do like so:
    ph:photo isa tmcl:topic-type;
      has-subject-locators(1, 1, ".*");
      has-name(tm:topic-name, 1, 1);
      has-occurrence(ph:time-taken, 1, 1);
      has-occurrence(dc:description, 0, 1);
      plays-role(ph:taken, ph:taken-at, 1, 1);
      plays-role(ph:taken, ph:taken-during, 0, 1);
      plays-role(ph:categorized, ph:in-category, 0, *);
      plays-role(ph:depicted, ph:depicted-in, 0, *) .
The first line says that ph:photo is a topic type. It is an error to use topics as types (or in supertype-subtype associations) unless they are instances of tmcl:topic-type. (Actually, it's possible this could be made optional. This is one thing that needs to be cleared up in Leipzig.)
The second line calls a template to create a constraint stating that all topics of this type must have exactly one subject locator (the two 1s are min and max cardinality). The last parameter is a regular expression which the subject locator must match. In this case, the regular expression is completely open and will match anything.
The template is defined as follows:
    def has-subject-locators($tt, $min, $max, $regexp)
      ?c isa tmcl:subject-locator-constraint;
        tmcl:card-min: $min;
        tmcl:card-max: $max;
        tmcl:regexp: $regexp.
      tmcl:constrained-topic-type(tmcl:constrains: ?c, tmcl:constrained: $tt)
    end
You'll note that this actually creates a new topic (of type tmcl:subject-locator-constraint) to hold the constraint information. I think you can also see why the templates are needed: writing constraints out like this is much too long-winded to be practical, both because it takes too much typing and because the resulting schema is hard to read. (The TMCL namespace is the default namespace here, so all IDs are actually references to that namespace.)
The third line also calls a template (which is very similar to the previous one), adding a constraint stating that every topic of this type must have exactly one name of the default type.
The fourth and fifth lines add occurrence type constraints, and you'll note that the second one makes the description optional.
The last four lines state that topics of this type can play association roles of a certain type (say, ph:taken) in associations of a certain type (say, ph:taken-at). In this case, a photo must be taken-at exactly one place, while it can be in any number of categories.
The star (*) may look odd, but it's there to say that the cardinality is unlimited. This is a special syntax in CTM for indicating infinity, which was introduced because it's necessary for TMCL.
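As a rough illustration of what the cardinality and regular expression parts of has-subject-locators mean at validation time, here is a minimal Python sketch. It is not taken from any TMCL implementation: the function name check_subject_locators, the use of None to stand in for the star, and the choice to require the expression to match the whole locator are all assumptions made for the example.

```python
import re

def check_subject_locators(locators, card_min, card_max, regexp):
    """Return a list of violation messages; an empty list means the topic passes.

    card_max=None stands in for the CTM star (*), i.e. no upper bound.
    """
    problems = []
    count = len(locators)
    if count < card_min:
        problems.append(f"expected at least {card_min} subject locator(s), found {count}")
    if card_max is not None and count > card_max:
        problems.append(f"expected at most {card_max} subject locator(s), found {count}")
    pattern = re.compile(regexp)
    for locator in locators:
        # Assumption: the expression must match the whole locator.
        if pattern.fullmatch(locator) is None:
            problems.append(f"{locator!r} does not match {regexp!r}")
    return problems

# A photo with exactly one locator, checked against has-subject-locators(1, 1, ".*").
print(check_subject_locators(["http://example.org/photos/1.jpg"], 1, 1, ".*"))
```

The same min/max logic would apply to the name, occurrence, and role constraints; only the thing being counted changes.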
Defining an association type
Describing association types from the point of view of the topic type is not enough, however. We want to also describe the association type itself a little, which we can do as follows. Note that there are also templates which allow the example below to be written as a single line. I use the long form here to show what is actually going on:
    ph:depicted-in isa tmcl:association-type;
      has-role(ph:depicted, 1, 1);
      has-role(ph:depiction, 1, 1) .
    ph:depicted isa tmcl:role-type .
    ph:depiction isa tmcl:role-type .
The first line says that ph:depicted-in is an association type, which is similar to saying that photo is a topic type. The next two lines give the two allowed role types, and the cardinalities of each. So this is a typical binary association type where there are two role types which both must be used exactly once in each association. The last two lines just declare the role types as role types.
Defining occurrence types
One also wants to say which topics are occurrence types, and to specify any datatype constraints on them. The same is done with name types, but without the datatype constraints. An occurrence type is declared as follows:
    ph:time-taken isa tmcl:occurrence-type;
      has-datatype(xsd:dateTime) .
[Photo: The prime minister's office, Oslo]
TMCL allows you to declare a type as abstract. In our case, we might want to do that as follows:
    ph:image isa tmcl:topic-type;
      is-abstract() .
    tm:supertype-subtype(tm:supertype: ph:image, tm:subtype: ph:photo)
    tm:supertype-subtype(tm:supertype: ph:image, tm:subtype: ph:video)
This little fragment says that image is a common supertype of photo and video, and that it is abstract, meaning that topics must be instances of one of the two subtypes, and not directly of image.
By default, topic types are not allowed to overlap, meaning that a topic cannot be an instance of two different types unless the schema explicitly says that this is allowed. There are no examples of this in the photo ontology, but I think the basic idea is easy enough to understand anyway.
An example from the Italian Opera topic map:
    composer isa tmcl:topic-type;
      overlaps(librettist).
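The abstract and overlap rules can be pictured as two simple checks on the types a topic is a direct instance of. The Python sketch below is only an illustration under stated assumptions: the function check_types and its data structures are hypothetical, and it deliberately ignores the supertype-subtype hierarchy, which a real validator would have to take into account.

```python
from itertools import combinations

def check_types(direct_types, abstract_types, allowed_overlaps):
    """Check one topic's direct types against abstract and overlap declarations.

    allowed_overlaps is a set of frozensets, one per declared overlaps() pair.
    Assumption: the supertype-subtype hierarchy is not considered here.
    """
    problems = []
    for topic_type in sorted(direct_types):
        if topic_type in abstract_types:
            problems.append(f"{topic_type} is abstract and cannot have direct instances")
    for a, b in combinations(sorted(direct_types), 2):
        if frozenset((a, b)) not in allowed_overlaps:
            problems.append(f"{a} and {b} are not declared as overlapping")
    return problems

# The opera example: one person may be both composer and librettist.
overlaps = {frozenset(("composer", "librettist"))}
print(check_types({"composer", "librettist"}, set(), overlaps))
```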
A third feature is that it is possible to state that values of a name or occurrence type must be unique. I don't have any good examples of that in the photo ontology, but I suppose the basic idea is intuitive, anyway. This is defined as a constraint, meaning that duplicate values are not allowed, but it's quite likely that some tools will offer to merge topics which have the same values.
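Since the photo ontology has no natural example, here is a small Python sketch of what a uniqueness check amounts to: collecting the values of one name or occurrence type across all topics and reporting any value that turns up on more than one topic. The function find_duplicate_values and the sample data are hypothetical and only meant to illustrate the idea.

```python
from collections import defaultdict

def find_duplicate_values(values_by_topic):
    """Map each value used by more than one topic to the sorted list of those topics.

    values_by_topic maps a topic id to the values it has for one
    name or occurrence type.
    """
    topics_by_value = defaultdict(set)
    for topic, values in values_by_topic.items():
        for value in values:
            topics_by_value[value].add(topic)
    return {
        value: sorted(topics)
        for value, topics in topics_by_value.items()
        if len(topics) > 1
    }

# Hypothetical data: two books sharing an identifier violate the constraint.
data = {"book1": ["123"], "book2": ["456", "789"], "book3": ["123"]}
print(find_duplicate_values(data))
```

Note that book2 having two values is not a uniqueness violation. How many values each topic may have is a matter for the cardinality constraints, not for this one.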
Other role constraints
A fourth feature is the ability to constrain the type of the topic playing another role in an association, given the type of the topic playing one role. This may sound like an odd feature, but it's one that crops up quite often. The canonical use case is where the place topic type is subdivided into, say, country, province, and city, and cities must be in provinces, which must again be in countries.
This would typically be defined as follows:
    binary-association(ph:contained-in, ph:container, ph:containee)
where the container and containee roles can both be played by places. This makes it possible to say that a city is contained in a country, which is bad, since we want to go via province. The other role constraint allows this additional constraint to be added. If we assume that the schema simply says places may be both containers and containees, then the following would provide the necessary additional constraints:
    ph:contained-in isa tmcl:association-type;
      has-role(ph:containee, 1, 1);
      has-role(ph:container, 1, 1);
      other-role(ph:containee, city, ph:container, province);
      other-role(ph:containee, province, ph:container, country).
The first other-role template says that if the containee is a city then the container must be a province. The second follows the same pattern.
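The if-then reading of other-role can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a description of how TMCL engines work: the function check_other_role is hypothetical, each topic is given a single type, and each role is assumed to have exactly one player, as in the binary association above.

```python
def check_other_role(association, topic_types, constraints):
    """Check one association against a list of other-role constraints.

    association maps role type -> player; topic_types maps player -> type;
    each constraint is (role, type, other_role, other_type).
    """
    problems = []
    for role, topic_type, other_role, other_type in constraints:
        player = association.get(role)
        other = association.get(other_role)
        if player is None or other is None:
            continue  # the constraint only applies when both roles are present
        if topic_types.get(player) == topic_type and topic_types.get(other) != other_type:
            problems.append(
                f"{player} is a {topic_type}, so the {other_role} must be a "
                f"{other_type}, but {other} is not"
            )
    return problems

constraints = [
    ("ph:containee", "city", "ph:container", "province"),
    ("ph:containee", "province", "ph:container", "country"),
]
# Hypothetical topics for the example.
types = {"bergen": "city", "hordaland": "province", "norway": "country"}

# A city placed directly in a country is reported.
print(check_other_role({"ph:containee": "bergen", "ph:container": "norway"}, types, constraints))
```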
In addition to what's shown so far, TMCL has the capability to express constraints using TMQL in a Schematron-like fashion. Basically, you can create your own constraint topics, and attach TMQL queries to them, where the queries effectively specify a constraint. This allows you to say things like "every photo taken during an event must have a taken-at time later than the start of the event and earlier than the end of the event", which would not be possible with the more declarative type of constraint shown so far.
[Photo: Tordenskjold statue, Oslo]
TMCL is not yet completely finished, but I think this shows pretty well what sort of language it is at this point. It's very much a straightforward and rather basic constraint language that allows constraints on an ontology to be described, and which, through TMQL, allows these constraints to be made very detailed, if necessary.
The functionality of TMCL appears to be about the same as for OSL and the Ontopoly editor, with some minor differences. As far as we are aware today, this is probably about the level of functionality that is needed for most applications.
Lars Heuer - 2008-10-04 06:48:25
    tm:superclass-subclass(ph:image : tm:superclass, ph:photo : tm:subclass)
    tm:superclass-subclass(ph:image : tm:superclass, ph:video : tm:subclass)
you're using LTM notation where the role player is followed by the role type. In CTM you have to use the role type as first element followed by the role player, i.e.:
tm:superclass-subclass(tm:superclass: ph:image, tm:subclass: ph:photo)
Lars Marius - 2008-10-04 07:05:43
Old habits die hard, I guess. :) Thanks for pointing it out. Now corrected.
Lars Heuer - 2008-10-04 07:30:29
:) Btw, while reading the tutorial again, I wonder if "tm:superclass-subclass" etc. is wrong. Shouldn't that be "tm:supertype-subtype", "tm:supertype" ... ? Old habits, again... ;)
Lars Marius - 2008-10-04 07:41:06
You're right again. Fixed again. :)
Miles Thompson - 2008-10-05 23:17:06
Thanks for this - glad to see it firming up. Looks like this will be a very handy and easy to use format.
One comment on multi-inheritance where you wrote - "Another question is whether it might be better to turn this around by saying that classes are exclusive by default and providing a mechanism for overriding that."
I think that exclusive by default would be the way to go. The reasoning is that we are talking about a format for constraint. In the more general topic maps model, it makes a lot of sense to allow all these things to be as wide open as possible, but in the context of checking data for validity I think a 'default single inheritance' thing would be a good way to go.
There are some cases where we will want to override and allow multi-inheritance, but I can imagine that at least for someone starting out, the assumption of single inheritance would be more helpful than not.
Perhaps the way to go would be to make it possible to make a general statement that all topic-types allow multiple inheritance. Or that a given topic-type allows multi-inheritance with 'any' other.
For example:
    non-exclusive-type(any, any) - for the first case
    non-exclusive-type(ph:ERSO-Type, any) - for the second
In any event I think exclusive (or at least 'mostly exclusive') type hierarchies would be by far the most common in the real world, so you should validate that way by default.
Miles Thompson - 2008-10-05 23:19:54
Just a quick vote to say that unique occurrence types would be highly useful for our case.
For example 'Date Published' and such. The main use case for this is that our 'in memory', business-specific code gets very confused when it encounters a company with more than one FoobarFlag and/or an article with more than one publication date.
Lars Marius - 2008-10-06 05:11:44
Miles: thank you for your comments. Your remarks on making exclusivity the default are well taken. In fact, your reasoning is very similar to my own. :-)
On unique occurrences: the cardinality functionality already allows you to say that articles must have exactly one publication date. If you made publication date a unique occurrence you would require all articles to have different publication dates, and nothing would be said (in that constraint) about how many they could have each.
schtief - 2008-10-20 04:54:03
waiting for some report of TMRA 2008........
Lars Marius - 2008-10-26 13:43:14
schtief: They've started appearing now. Thanks for asking. :)