Architectural problems with archive web services

2013-12-15 11:19

Ski tracks, Eggedal

I've already been through the problems with the NOARK standard, and hinted at issues with the way the web services to these systems have been designed. What I describe here applies not just to the semi-standardized NOARK web services, but also to the proprietary interfaces offered by the archive products.

Let's go through the issues one by one.

Using the proprietary interfaces

Organizations using the proprietary interfaces will find that as the number of archive integrations grows, switching to a different archive system gets more and more expensive, because all of the archive integrations have to be rewritten from scratch.

In addition, they quickly find that since the client applications have been integrated directly against the archive, all integrations have to be retested every time the archive is upgraded. Once you go beyond 2-3 integrations this starts getting really painful.

(Quite a few organizations have "solved" this last issue by doing so many customizations to their archive system that upgrades are no longer possible.)

Synchronicity

These web service interfaces, and the clients that use them, are generally synchronous. That is, they've been designed so that everything hangs until the archive has completed processing the request and sent a response. Since NOARK implementations are generally neither fast nor particularly stable, users wind up spending a good deal of time waiting for the archive. And if the archive is down, substantial chunks of the functionality in client systems may not work at all.

Again, as the number of integrations grows, the problem gets steadily worse.
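
To make the synchronous behaviour concrete, here is a minimal Python sketch. Everything in it is made up for illustration: SlowArchive, ArchiveDown and save_case_document are hypothetical names, and the sleep merely simulates a slow archive. It is not the interface of any real archive product.

```python
import time


class ArchiveDown(Exception):
    """Raised by the simulated archive when it is unavailable."""


class SlowArchive:
    """Hypothetical stand-in for a synchronous archive web service."""

    def __init__(self, latency_seconds, available=True):
        self.latency_seconds = latency_seconds
        self.available = available

    def store(self, document):
        if not self.available:
            raise ArchiveDown("archive is not responding")
        time.sleep(self.latency_seconds)  # simulated processing time
        return "archived:" + document


def save_case_document(archive, document):
    """Synchronous client code: the caller waits until the archive answers."""
    started = time.monotonic()
    receipt = archive.store(document)  # blocks for as long as the archive takes
    waited = time.monotonic() - started
    return receipt, waited


receipt, waited = save_case_document(SlowArchive(0.05), "letter.pdf")
print(receipt, "after waiting", round(waited, 2), "seconds")
```

The time the user waits is at least the time the archive takes, and if the archive is constructed with available=False the client operation fails outright rather than degrading gracefully.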

Gothic arches, York

Hard coupling

Many people think that web services by their very nature are loosely coupled, but everything is relative. These interfaces are generally designed so that client systems must first either create or find and reuse a case, and only afterwards can they place a document in the case. Further, the clients need to fill in metadata according to the structure used by the archive. This means filling in taxonomy categories and the many fixed, required fields that NOARK archives love.

What happens when the interface is designed this way is that the internal metadata structure of the archive becomes hard-wired into the code of all client applications. Every single client application must know what defines a case, which fields are used, and what values go where, in exquisite detail. As integrations accumulate, the structure becomes wired into more and more clients.
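
A small Python sketch of what this hard-wiring looks like in client code. The archive class, the field names (classification_code, document_type, access_code) and the code values are all invented for illustration and are not taken from NOARK or from any product.

```python
class FakeArchive:
    """Hypothetical in-memory stand-in for an archive web service."""

    def __init__(self, valid_classification_codes):
        self.valid_classification_codes = set(valid_classification_codes)
        self.cases = {}       # case title -> case id
        self.documents = []   # (case id, metadata) pairs

    def find_or_create_case(self, title, classification_code):
        if classification_code not in self.valid_classification_codes:
            raise ValueError("unknown classification code: " + classification_code)
        if title not in self.cases:
            self.cases[title] = len(self.cases) + 1
        return self.cases[title]

    def add_document(self, case_id, metadata):
        missing = {"title", "document_type", "access_code"} - set(metadata)
        if missing:
            raise ValueError("missing required fields: " + ", ".join(sorted(missing)))
        self.documents.append((case_id, metadata))


def archive_invoice(archive, invoice):
    """Client code with the archive's internal structure baked in."""
    # The client has to know what defines a case and how it is classified.
    case_id = archive.find_or_create_case(
        title="Invoices " + invoice["year"],
        classification_code="233",  # archive taxonomy, hard-coded in the client
    )
    # It also has to know which fixed fields the archive demands, and their values.
    archive.add_document(case_id, {
        "title": "Invoice " + invoice["number"],
        "document_type": "I",
        "access_code": "U",
    })
    return case_id


invoice = {"year": "2013", "number": "1001"}
case_id = archive_invoice(FakeArchive({"233", "041"}), invoice)

# After the archivists reorganize the taxonomy, the unchanged client breaks.
reorganized = FakeArchive({"A-12", "B-07"})
try:
    archive_invoice(reorganized, invoice)
    client_survived = True
except ValueError:
    client_survived = False
print("client survived reorganization:", client_survived)
```

Nothing in archive_invoice is about invoices as the invoicing system understands them. Nearly every line encodes a decision that belongs to the archivists, which is why a reorganization of the archive forces a rewrite of the client.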

At one organization I visited, the archivists had wanted to reorganize the archive for seven years, but had been unable to, because that meant rewriting the single archive integration they had. While I was there they were finally able to carry out the reorganization, because the integration had to be ported to a new web service interface anyway.

Imagine the situation once you get up to five or ten integrations.

Metadata quality

Generally, the quality of the metadata produced by these integrations is very, very poor. Even something so simple as who is responsible for the document very often gets lost. In many cases, a system user representing the external system gets registered as the responsible user for all documents coming from that system.

Nearly all documents in the archive have an external contact either as the sender or the recipient, and this is a key piece of metadata. A very important user requirement is to be able to see all correspondence with a single external contact. Most integrations, however, do not include the identity of the contact in the metadata, but simply repeat name, address, etc. for each document. And as these may change, be mistyped, etc., the identity is effectively lost.

Most NOARK systems have an internal register of contacts that could be used, but since external systems generally have their own registers this becomes too cumbersome to support, and so data quality goes out the window. And even if the register were used, typically one would import the client contacts into the archive, duplicating those that are already there. And different clients generally have different contact databases, further compounding duplication.
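
The effect of losing the contact's identity can be shown with a few lines of Python. The records below are fabricated sample data, and contact_id is a hypothetical stable identifier of the kind most integrations leave out.

```python
from collections import defaultdict

# Three documents to or from the same person. The name is mistyped once
# and the address changes once, as happens in practice.
documents = [
    {"title": "Application", "contact_id": "c-17",
     "contact_name": "Kari Nordmann", "address": "Storgata 1"},
    {"title": "Reply", "contact_id": "c-17",
     "contact_name": "Kari Nordman", "address": "Storgata 1"},
    {"title": "Reminder", "contact_id": "c-17",
     "contact_name": "Kari Nordmann", "address": "Lillegata 5"},
]


def group_titles(docs, key_fields):
    """Group document titles by the given metadata fields."""
    groups = defaultdict(list)
    for doc in docs:
        key = tuple(doc[field] for field in key_fields)
        groups[key].append(doc["title"])
    return dict(groups)


# Without an identity, name and address are all there is to match on.
by_name_and_address = group_titles(documents, ("contact_name", "address"))

# With a stable identity, the correspondence stays together.
by_identity = group_titles(documents, ("contact_id",))

print(len(by_name_and_address), "apparent contacts without an identity")
print(len(by_identity), "contact with an identity")
```

Matching on the repeated name and address splits one person's correspondence into three apparent contacts, while the identifier keeps it together.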

Beach huts, Whitby

Solutions

So how can this be solved? Actually, it's not that hard. What you need is a web service interface where clients can hand over documents with the metadata they have, using the client's internal metadata vocabulary. The server queues these, then responds. Once it's ready, the server translates the metadata into its internal model, adds additional metadata, and archives the document. I should explain how, but this blog post is already too long, so that will have to wait for the next blog post.
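
In the meantime, the queue-then-translate flow described above can be sketched roughly as follows. This is only one guess at the shape of such a service, not the design the next post will describe. ArchiveIntake, translate_invoicing and all field names and values are hypothetical.

```python
import queue


class ArchiveIntake:
    """Hypothetical intake service: accept now, translate and archive later."""

    def __init__(self, translators):
        self._pending = queue.Queue()
        self._translators = translators  # client name -> translation function
        self._submitted = 0
        self.archived = []

    def submit(self, client, document, metadata):
        """Queue the document with the client's own metadata and respond at once."""
        self._submitted += 1
        receipt = client + "-" + str(self._submitted)
        self._pending.put((receipt, client, document, dict(metadata)))
        return receipt

    def process_pending(self):
        """Run when the server is ready: translate metadata, then archive."""
        processed = 0
        while True:
            try:
                receipt, client, document, metadata = self._pending.get_nowait()
            except queue.Empty:
                return processed
            record = self._translators[client](metadata)  # client terms -> archive model
            record["receipt"] = receipt
            record["document"] = document
            self.archived.append(record)
            processed += 1


def translate_invoicing(metadata):
    """Server-side mapping from the invoicing system's vocabulary (invented)."""
    return {
        "title": "Invoice " + metadata["invoice_no"],
        "contact_id": metadata["customer_id"],
        "classification_code": "233",  # added by the server, not by the client
    }


intake = ArchiveIntake({"invoicing": translate_invoicing})
receipt = intake.submit("invoicing", b"invoice bytes",
                        {"invoice_no": "1001", "customer_id": "c-17"})
processed = intake.process_pending()
print(receipt, processed, intake.archived[0]["title"])
```

The client only ever speaks its own vocabulary (invoice_no, customer_id) and gets its receipt before any archiving happens. The knowledge of the archive's structure lives in one place on the server side, so a reorganization of the archive means changing the translation functions rather than every client.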
