Archive web services: a missed opportunity
Posted in Technology on 2013-11-24 11:41
In my earlier piece on NOARK systems I accused the National Archives of standardizing the one thing that should not be standardized: the internal model. This, of course, raises the question of why not, and if not, what should have been standardized instead.
Why models shouldn't be standardized
The problem with standardizing a model is that you can't really test if implementations conform to the model. The only way to do that is to see if you can build some kind of transformation from the implementation to a representation of the model. Having to build such a transformation just to test conformance is time-consuming, expensive, and prone to errors. It's not really a good idea to put yourself in a situation where you have to do this.
What you should standardize is the system's interfaces with the external world, for two reasons: interfaces can be tested, and it's at the interface that the system interacts with the rest of the world. If the interfaces are standardized you can leave the inside of the system to implementors, and still rely on being able to treat all the different implementations as being the same.
An example of this is the standardization work we did on Topic Maps in ISO SC34. The core of the standard is the Topic Maps Data Model, which has no conformance clause, because we can't test conformance with the model. The XML syntax, however, does have a conformance clause, because here conformance can be tested. You'll note the last bullet point, which goes "a representation that is isomorphic to the data model".
This is the transformation I was talking about, but in the case of Topic Maps, a standardized transformation exists, called Canonical XTM. It describes, in excruciating detail, a transformation from the data model to an XML syntax, which has the property that any two equivalent instances of the data model will produce byte-by-byte identical XML files. Using CXTM we were able to produce an automated set of conformance tests which could be used to verify the conformance of any Topic Maps implementation that implements CXTM.
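The idea behind a canonical serialization can be sketched in a few lines of Python. This is not CXTM, and the toy model (a dict mapping topic ids to sets of names) and the function name `canonicalize` are assumptions made purely for illustration. The point is only that once every ordering choice is fixed, equivalent instances serialize to the same bytes, so a conformance test reduces to a byte comparison.

```python
import json


def canonicalize(model):
    """Serialize a toy model (topic id -> set of names) to canonical bytes.

    Every source of variation is pinned down: names are sorted, keys are
    sorted, separators and encoding are fixed.
    """
    normalized = {topic: sorted(names) for topic, names in model.items()}
    text = json.dumps(
        normalized, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


# Two implementations that built the same model in a different order.
impl_a = {"puccini": {"Giacomo Puccini", "Puccini"}, "tosca": {"Tosca"}}
impl_b = {"tosca": {"Tosca"}, "puccini": {"Puccini", "Giacomo Puccini"}}
# A third one that lost a name along the way.
impl_c = {"puccini": {"Puccini"}, "tosca": {"Tosca"}}

if __name__ == "__main__":
    print(canonicalize(impl_a) == canonicalize(impl_b))
    print(canonicalize(impl_a) == canonicalize(impl_c))
```

A test suite built this way needs only a set of input files and the expected canonical output for each; the implementation under test never has to expose its internals.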
Interfaces to standardize
In the case of NOARK there are two interfaces that could be standardized: the web service interface, and the XML export format. The XML export format is described in the NOARK 5 standard, and while it could be specified more exactly, at least it's in the standard and there are XSD schemas for the various kinds of files, etc. It's possible to do some level of conformance testing using this format, although it will necessarily be incomplete. That is, you can automatically verify that all the right files are there, and that they all have the right structure. What cannot be tested is whether the files contain all the information they should, nor whether the information is correct.
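The part that can be automated might look roughly like the sketch below. The file names and root element names are hypothetical placeholders, not taken from the NOARK 5 standard, and a real checker would validate against the published XSD schemas rather than just checking well-formedness and the root element. What the sketch cannot do, as noted above, is tell whether the content is complete or correct.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical file names and root elements, for illustration only.
EXPECTED = {"structure.xml": "archive", "log.xml": "changelog"}


def check_export(directory, expected=EXPECTED):
    """Return a list of structural problems found in an export directory."""
    problems = []
    for filename, root_tag in expected.items():
        path = Path(directory) / filename
        if not path.is_file():
            problems.append(f"missing file: {filename}")
            continue
        try:
            root = ET.parse(str(path)).getroot()
        except ET.ParseError as err:
            problems.append(f"{filename} is not well-formed XML: {err}")
            continue
        if root.tag != root_tag:
            problems.append(
                f"{filename}: expected root <{root_tag}>, found <{root.tag}>"
            )
    return problems
```

An empty list means the export passed the structural checks, nothing more.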
Which leads me to the web service interface. In most government organizations there are many IT systems which should be integrated with the archive. Building such integrations makes it easier to archive documents, and automates the entering of metadata. This both increases the percentage of documents archived (thus avoiding later embarrassment) and improves the quality of the metadata, because it does not need to be manually retyped.
The key to building such integrations is obviously the archive web service interface. In Norway, two standards for this exist. One is NOARK 4 WS, which is a web service interface defined for the previous version of NOARK, outside of the official standard. The other is a newer standard, GeoIntegration, which the NOARK 5 standard says "should" be supported. That's it as far as web service interfaces go.
This "GeoIntegration" standard is very poorly marketed. It has a confusing name, a confusing web site, and very few people know about it. That suits the vendors of archive systems very well, because they can then go around selling their own proprietary web service interfaces, which they do with great enthusiasm. The consequence is that government agencies build integrations against these proprietary interfaces instead. Once they've done a few of these integrations, changing archive systems comes with an additional cost of several million kroner just to rewrite all the archive integrations against a new proprietary web service interface.
All this while an archive standard exists, and half-hearted attempts at standardized interfaces also exist.
What is to be done?
I think three things should be done. First, one should sit down and consider carefully the purpose of the standard. Why should Norwegian government agencies be forced to use software that supports a specific, Norwegian-only standard? Good answers to this question may well exist, but they should be made explicit, and the standard should then be designed to support those specific goals.
Second, the web service interface should be a key part of the standard. Instead of specifying the internal model, the standard should specify the interface, and the export format. Given a proper specification of the web service interface the standard would suddenly be a real standard. Not only that, but it would be possible to create automated conformance tests. It would also be possible to test conformance with the export format automatically. Given code which creates cases and documents, one could then do an XML export, and verify that all the information entered via the web service interface is correctly present in the export.
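A round-trip conformance test of this kind could be sketched as follows. Everything here is an assumption for illustration: the three methods `create_case`, `add_document` and `export_xml` stand in for a standardized web service interface that does not exist yet, the XML layout is invented, and `InMemoryArchive` is a stand-in for a vendor system so that the sketch runs on its own.

```python
import xml.etree.ElementTree as ET


class InMemoryArchive:
    """Stand-in for a vendor system; a real test would call its web service."""

    def __init__(self):
        self._cases = []

    def create_case(self, title):
        self._cases.append({"title": title, "documents": []})
        return len(self._cases) - 1

    def add_document(self, case_id, title):
        self._cases[case_id]["documents"].append(title)

    def export_xml(self):
        root = ET.Element("export")
        for case in self._cases:
            case_el = ET.SubElement(root, "case", title=case["title"])
            for doc_title in case["documents"]:
                ET.SubElement(case_el, "document", title=doc_title)
        return ET.tostring(root, encoding="unicode")


def round_trip_problems(archive, fixtures):
    """Enter fixtures via the interface, export, and report what got lost.

    `fixtures` maps case titles (assumed unique) to lists of document titles.
    """
    for case_title, doc_titles in fixtures.items():
        case_id = archive.create_case(case_title)
        for doc_title in doc_titles:
            archive.add_document(case_id, doc_title)

    exported = {}
    for case_el in ET.fromstring(archive.export_xml()).iter("case"):
        exported[case_el.get("title")] = sorted(
            doc_el.get("title") for doc_el in case_el.iter("document")
        )

    problems = []
    for case_title, doc_titles in fixtures.items():
        if case_title not in exported:
            problems.append(f"case missing from export: {case_title}")
        elif exported[case_title] != sorted(doc_titles):
            problems.append(f"documents differ for case: {case_title}")
    return problems


FIXTURES = {
    "Building permit 2013/17": ["Application", "Site plan"],
    "Complaint 2013/42": ["Letter"],
}

if __name__ == "__main__":
    print(round_trip_problems(InMemoryArchive(), FIXTURES))
```

The same test function could be pointed at any implementation of the interface, which is exactly what makes the interface, rather than the internal model, the testable thing.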
With this in place it would be possible to do rigorous testing before giving the official stamp of approval to an implementation as being NOARK 6 compliant. Government agencies would have some level of guarantee that they can switch between archive systems without imposing ruinous costs on themselves.
Third, the standard contains much that is actually requirements for user-level functionality. It's hard to see what these are doing in a standard. These requirements may be useful input to agencies looking to buy an archive system, in that the agencies can look at these requirements and pick the ones that they agree with for use in their own requirements documents. But, again, what are they doing in a standard? It would be far better to move them out into appendices and treat them as what they are.
Unfortunately, there is still one problem remaining with the web service interfaces. The current archive interfaces, that is, NOARK 4 WS, GeoIntegration, and the proprietary ones, all have some serious design issues. I've already gone way over normal length for a blog post, so the description of those issues will need to wait for the next post.