The cxtm-tests project

2008-05-23 18:21

The cxtm-tests project has just published the first release of a conformance test suite for Topic Maps implementations. This release consists of 293 separate conformance tests covering four different Topic Maps syntaxes, and more tests are being added all the time. Developers can use it to check their implementations, and customers can use it to verify that products claiming to conform to the standard actually do.

How it works

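The current test suite only tests that the import of Topic Maps data from Topic Maps syntaxes conforms to the specifications. I'll use the CTM syntax as my example here, but it works the same way for the others. The tests verify three things about the implementation being tested:

- that it accepts all valid CTM files,
- that it builds the correct topic map from each valid file, and
- that it rejects all invalid CTM files.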

Of course, there is an infinite number of valid and invalid CTM files, so we can't actually test all of them. Instead, we try to test all the different variations that are possible in CTM files, to make as certain of this as we possibly can. This is also why the number of tests keeps growing: we keep thinking of new variations to try, and add tests for them to make sure all implementations handle them.

Testing the last point, that invalid files are rejected, is easy. We put a collection of input files in a separate directory called "invalid", and implementations are required to reject them all. Some of these contain obvious errors, others more subtle ones.

Testing the first point is done in a similar way. There is a directory called "in", which contains the valid files, and the implementation must accept them all.
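As a rough sketch of these two checks, here is a small Python driver. It assumes the engine under test is wrapped in a hypothetical `parse` function that raises an exception when the engine rejects a file. Neither `check_accept_reject` nor `parse` is part of cxtm-tests; only the "in" and "invalid" directory names come from the suite.

```python
from pathlib import Path
from typing import Callable, List


def check_accept_reject(root: Path, parse: Callable[[Path], object]) -> List[str]:
    """Check the "in" and "invalid" directories of one syntax's test suite.

    `parse` is a hypothetical hook into the engine under test: it should
    import the given file and raise an exception if the engine rejects it.
    Returns a list of failure messages; an empty list means both checks passed.
    """
    failures: List[str] = []
    # Every file in "in" is valid, so the engine must accept it.
    for path in sorted((root / "in").iterdir()):
        try:
            parse(path)
        except Exception as exc:
            failures.append(f"rejected valid file {path.name}: {exc}")
    # Every file in "invalid" must be rejected, i.e. parse() must raise.
    for path in sorted((root / "invalid").iterdir()):
        try:
            parse(path)
        except Exception:
            continue  # rejected, as required
        failures.append(f"accepted invalid file {path.name}")
    return failures
```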

The middle point, however, is much trickier, and this is where CXTM comes in. For each file in the "in" directory we have made a corresponding file in a directory called "baseline", consisting of the CXTM file corresponding to exactly that input. Implementations pass if they output a CXTM file that is byte-by-byte identical to the baseline file.
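The baseline comparison can be sketched the same way. This assumes a hypothetical `canonicalize` hook that runs the engine's import followed by its CXTM canonicalizer and returns the output as bytes. The pairing of each input file with a baseline file of the same name plus ".cxtm" (for example "empty.ctm" and "empty.ctm.cxtm") is the suite's convention; `check_baselines` itself is only illustrative.

```python
from pathlib import Path
from typing import Callable, List


def check_baselines(root: Path, canonicalize: Callable[[Path], bytes]) -> List[str]:
    """Compare the engine's CXTM output with the baseline for every valid file.

    `canonicalize` is a hypothetical hook: it should import the given file
    with the engine under test and return the resulting CXTM as bytes.
    """
    failures: List[str] = []
    for path in sorted((root / "in").iterdir()):
        # The baseline for in/empty.ctm is baseline/empty.ctm.cxtm, and so on.
        baseline = root / "baseline" / (path.name + ".cxtm")
        if not baseline.exists():
            failures.append(f"no baseline for {path.name}")
            continue
        # Passing means byte-by-byte identical output, so compare raw bytes.
        if canonicalize(path) != baseline.read_bytes():
            failures.append(f"output differs from baseline for {path.name}")
    return failures
```

Comparing raw bytes, rather than parsing both files as XML and comparing the trees, matches the pass criterion directly; it is also exactly why a canonical form is needed in the first place.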

The simplest file in the "in" directory is called "empty.ctm" and contains nothing at all. This is legal in CTM and corresponds to an empty topic map. Therefore, there is a file in "baseline" called "empty.ctm.cxtm", which contains the corresponding canonicalization, as follows:


The next step is to add a topic. This is done in "simple.ctm", which looks like this:

topic .

The corresponding canonicalization is:

<topic number="1">

One question that often comes up at this point is how we can ensure that topic maps with multiple topics are always canonicalized the same way. For example, say we had the following CTM file:

topic1 . topic2 .

The canonicalization of this is:

<topic number="1">
<topic number="2">

But, of course, if the input were

topic2 . topic1 .

the result would be exactly the same topic map, so that file must have the same canonicalization as the previous one. And it does, because CXTM specifies the sort order of everything in the topic map unambiguously. Before an implementation outputs the topics, it must sort them in the canonical order, and that order is always the same.
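To illustrate why the input order doesn't matter, here is a deliberately simplified Python sketch of sort-then-number. It assumes each topic can be stood in for by a single item identifier string and sorts on that alone; the real CXTM ordering rules are more involved, so this shows only the principle, not the CXTM algorithm.

```python
from typing import Iterable, List, Tuple


def canonical_topic_order(item_identifiers: Iterable[str]) -> List[Tuple[int, str]]:
    """Number topics in a fixed order, independent of the order they were read in.

    Simplification (an assumption, not CXTM's actual rule): each topic is
    represented by one item identifier string, and topics are sorted on it.
    """
    # Sort first, then number 1..n, so the numbering depends only on the
    # content of the topic map and never on the order of the input.
    return list(enumerate(sorted(item_identifiers), start=1))


# Both input orders yield [(1, 'topic1'), (2, 'topic2')].
print(canonical_topic_order(["topic1", "topic2"]))
print(canonical_topic_order(["topic2", "topic1"]))
```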

So it really does work. The only downside, of course, is that you need a CXTM canonicalizer for your engine before you can use the test suite. Writing one is not terribly hard, but it does take a couple of hours. The benefit is that once you've done it, you get an automated test suite for your Topic Maps engine for free.

Next steps

So far, all the tests have been contributed by Lars Heuer and myself. We have done tests for XTM 2.0, CTM, LTM 1.3, and TM/XML. None of the test suites are complete, but we are working on that. Later, it would be good to add tests for at least XTM 1.0, and perhaps for other syntaxes too, if someone can provide them.

As more engines implement CXTM and start using the tests, I expect there will also be more discussion about the various test cases. Most likely we will need to document some of the test cases with information about why the canonicalization is the way it is, or why they are invalid.

We could also expand coverage to include TMCL and TMQL. For TMCL this would be easy, as all we'd need would be three sets of test cases: one of valid (schema, topic map) pairs, another of invalid (schema, topic map) pairs, and a third with invalid schemas. For TMQL we'd need queries with the corresponding output, although most likely we'd need an extension of CXTM in order to handle TMQL query results, which are not topic maps.
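For TMCL, the three sets of test cases could be driven by something like the following Python sketch. `TmclSuite`, `load_schema`, and `validate` are hypothetical names invented for illustration; nothing like this exists in cxtm-tests, and a real TMCL engine would expose its own interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class TmclSuite:
    # (schema, topic map) pairs where the topic map conforms to the schema
    valid_pairs: List[Tuple[str, str]] = field(default_factory=list)
    # (schema, topic map) pairs where the topic map violates the schema
    invalid_pairs: List[Tuple[str, str]] = field(default_factory=list)
    # schemas that are themselves invalid and must be rejected outright
    invalid_schemas: List[str] = field(default_factory=list)


def run_tmcl_suite(
    suite: TmclSuite,
    load_schema: Callable[[str], object],
    validate: Callable[[object, str], bool],
) -> List[str]:
    """Run all three kinds of TMCL test case against hypothetical engine hooks.

    `load_schema` should raise if the schema is invalid; `validate` should
    return True if the topic map conforms to the loaded schema.
    """
    failures: List[str] = []
    for schema, tm in suite.valid_pairs:
        if not validate(load_schema(schema), tm):
            failures.append(f"valid pair rejected: {schema} / {tm}")
    for schema, tm in suite.invalid_pairs:
        if validate(load_schema(schema), tm):
            failures.append(f"invalid pair accepted: {schema} / {tm}")
    for schema in suite.invalid_schemas:
        try:
            load_schema(schema)
        except Exception:
            continue  # rejected, as required
        failures.append(f"invalid schema accepted: {schema}")
    return failures
```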

We'll see how this goes. For the time being, the thing that's the most needed is more CXTM implementations so that we can test more of the engines out there. So, who's next?


Lars Heuer - 2008-05-24 04:42:43

Regarding the docs: I have already started on a simple CSV (comma-separated values) file describing the test cases. I thought about creating a topic map, but I think a CSV file for each syntax is good enough, and it's trivial to create a topic map from it.
