The curse of NOARK
Oslo city archive, city hall
I'm writing about a phenomenon that's specifically Norwegian, but some things are easier to explain to foreigners, because we Norwegians have been conditioned to accept them. In this case I'm referring to the state of the art for archiving software in the Norwegian public sector, where everything revolves around the standard known as NOARK.
Let's start at the beginning. Scandinavian chancelleries have a centuries-long tradition of a specific approach to archiving, which could be described as a kind of correspondence journal. Essentially, all incoming and outgoing mail, as well as important internal documents, were logged in a journal, with title, from, to, and date for each document. In addition, each document would be filed under a "sak", which translates roughly as "case" or "matter under consideration". Effectively, it's a kind of tag which ties together one thread of documents relating to a specific matter.
The classic example is if the government receives a request of some sort, then produces some intermediate documents while processing it, and then sends a response. Perhaps there may even be a couple of rounds of back-and-forth with the external party. This would be an archetypal "sak" (from now on referred to as "case"), and you can see how having all these documents in a single case file would be absolutely necessary for anyone responding to the case. In fact, it's not dissimilar to the concept of an issue in an issue-tracking system.
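To make the model concrete, here is a minimal sketch in Python of a journal entry and a case. The class and field names are my own illustrations, not taken from NOARK; the only things drawn from the description above are the four logged fields (title, from, to, date) and the idea that a case ties a thread of documents together.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class JournalEntry:
    """One logged document: the four fields the journal records for it."""
    title: str
    sender: str      # "from" in the journal
    recipient: str   # "to" in the journal
    logged_on: date


@dataclass
class Case:
    """A "sak": the tag tying together all documents on one matter."""
    case_id: str     # hypothetical identifier format
    title: str
    entries: List[JournalEntry] = field(default_factory=list)

    def log(self, entry: JournalEntry) -> None:
        """File a document under this case."""
        self.entries.append(entry)

    def thread(self) -> List[JournalEntry]:
        """Return the case's documents in chronological order."""
        return sorted(self.entries, key=lambda e: e.logged_on)


# The archetypal case: a request comes in, a response goes out.
case = Case("2013/42", "Application from a citizen")
case.log(JournalEntry("Response", "Agency", "Citizen", date(2013, 10, 14)))
case.log(JournalEntry("Application", "Citizen", "Agency", date(2013, 10, 1)))
for entry in case.thread():
    print(entry.logged_on, entry.title)
```

Seen this way, the similarity to an issue tracker is fairly plain: the case is the issue, and the journal entries are the comments and attachments on it.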
So far, so good. As a general, abstract model for building an archive this is very sensible. In Norway, the government has not merely adopted it, but mandated it by law for all government agencies. The National Archive has developed IT standards that these archives must follow, now up to version 5 (NOARK5), and also certifies implementations as following the standard.
The standard, however, is a bit of a problem. It's designed by archivists, and so the focus is very much on the needs of archivists. The purpose of IT standards generally is to ensure interoperability, that is, that different systems can work together simply by implementing the same standard. Ideally, the various implementors of a standard don't even need to know about one another. So one would imagine that government agencies can freely switch between archive systems, because they all implement the same standard, anyway. Sadly, that's not even remotely true.
The NOARK standard actually standardizes the one thing IT standards should not concern themselves with: the internal model of the system. For decades, it did not even specify the interchange format to be used for handing over archive data to the National Archive. Nor did it specify the means by which other applications could interface with the archive. So, strictly speaking, it is not a standard at all, just a fairly abstract functional specification.
Place de la Comedie, Montpellier
As I said, the functional specification was designed by archivists, and so mainly takes their needs into account. Further, Norway being a small country, combined with the absence of a true standard, has meant that competition in the archive software area is fairly limited. As a result, government workers are generally faced with systems that are deeply user-unfriendly, requiring lots of detailed metadata fields to be filled in, and often obstructing work processes with unreasonable demands made for archive-technical reasons.
I'll give a simple example, just to illustrate. A clerk receives an application, and writes a preliminary response logged on the case in the archive. The case is passed on to a clerk further on in the chain for comment/approval. The second clerk finds that the first clerk wrote in the wrong recipient, but since the field has been locked for editing (as required by NOARK) the second clerk can't fix it. The response could have gone out now, because everything else is fine. As it is, the case has to be referred back to clerk 1, wait for clerk 1 to be ready to attend to it, be referred to clerk 2 again, wait for clerk 2 to pick it up, and then finally it goes out. Thus the resolution of the case can easily be delayed by a week, for no good reason. That's just one example. I could give many more.
Because people really, really dislike using these systems, important documents are in many cases not filed. In one specific case, a minister had to answer hostile questions on TV as to why an important critical report had been hidden away from the public. When people in the ministry tried to figure out why the report was suppressed they found that in fact there was no attempt to hide it, but nobody had bothered to archive the report. The minister tried to say as much, but it didn't sound very believable.
Cases like these recur regularly. This week it happened again: four different government agencies are accused of hiding sensitive documents. It looks really, really bad in the news, and the only possible explanations are either that agencies are deliberately hiding information, or that they aren't in control of their own data. Probably the reason is the latter, but in neither case do the agencies look good.
This is another way in which NOARK causes archives to fail: the metadata may conform to the standard, but because interfaces and processes are so awkward, in many cases lots of key documents are not present at all. This is a problem for the proper functioning of democracy, since newspapers and citizens can't get hold of information they're entitled to. It's also a problem for the agencies themselves, since agency workers will often struggle to find the information they need.
Which brings us to another issue: these systems are notoriously poor at searching, so that they often become "black holes", into which information flows in order to meet a legal requirement, but then is never seen again. In fact, a very common solution is to build custom-made systems which support the work processes, and automatically file documents in the archive, thus hiding the archive completely from the users. This makes users happy, but makes a mockery of the original requirements in the NOARK standard, since the workflow and process requirements are now bypassed by simply moving the functionality out of the archive.
Steam train, Beck Hole, Yorkshire
A further consequence is that, since there is no standard interface to archive systems, each archive system offers its own proprietary interface. Over time, many of the IT systems in the organization wind up being integrated with the archive. The result is that replacing the archive system becomes a very expensive project, requiring a large number of integrations to be replaced. This further reduces the competitive pressure in this area. (I'm exaggerating a little. A web service interface for NOARK4 exists, and there is work on a kind of standard for NOARK5, too. There are serious problems here, too, however, and I will return to that subject in another post.)
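To illustrate why a shared interface matters, here is a small sketch in Python. Everything in it is hypothetical: the ArchiveClient interface, the two vendor classes and their identifier formats are invented for the example and do not come from NOARK or any real product. The point is only that an integration written against a common interface does not have to change when the archive behind it is replaced.

```python
from typing import Dict, List, Protocol


class ArchiveClient(Protocol):
    """Hypothetical common interface that every archive would implement."""

    def file_document(self, case_id: str, title: str) -> str:
        """File a document under a case and return its archive reference."""
        ...


class VendorAArchive:
    """Stand-in for one vendor's system (invented for this sketch)."""

    def __init__(self) -> None:
        self._docs: Dict[str, List[str]] = {}

    def file_document(self, case_id: str, title: str) -> str:
        docs = self._docs.setdefault(case_id, [])
        docs.append(title)
        return f"{case_id}-{len(docs)}"


class VendorBArchive:
    """Stand-in for a competing system with different internals."""

    def __init__(self) -> None:
        self._rows: List[tuple] = []

    def file_document(self, case_id: str, title: str) -> str:
        self._rows.append((case_id, title))
        return f"B:{len(self._rows)}"


def archive_report(archive: ArchiveClient, case_id: str, title: str) -> str:
    """An integration: it knows the interface, not the vendor."""
    return archive.file_document(case_id, title)


# Swapping the archive system requires no change to the integration.
print(archive_report(VendorAArchive(), "2013/42", "Critical report"))
print(archive_report(VendorBArchive(), "2013/42", "Critical report"))
```

Without the shared interface, archive_report would have to be rewritten for each vendor, and an organization has not one such integration but dozens.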
Unsurprisingly, the quality of available archive software is low. It's no exaggeration to say that these systems are widely hated. Further, poor documentation of interfaces and the obscurity of NOARK details mean that very often one is forced to use consultants from the vendor in order to set up these systems, or to write integrations against them. Switching to another system is no help, because the other systems are no better.
Ultimately, the consequence is very often that the archive winds up being a purely formal affair, a system that exists only to fulfill a legal requirement, and that only does so in a technical sense. Very often, many important documents are not archived, metadata are poor, and the formal rules regarding procedure are not followed. Where the rules are followed they tend to incur considerable extra cost and delay. And, worse, the organization is left with a very poor common archive. Having gone about a sensible and worthy goal in the wrong way, one has caused a lot of pain for very little gain.
How this compares to the situation in other countries I don't really know. Information on that would be very much welcome.
Dave Pawson - 2013-10-30 06:01:09
"agencies are deliberately hiding information, or that they aren't in control of their own data. "
I would have thought an accusation like the latter would get a reaction, Lars?
How to educate no.gov as to what a standard is / should be? Are there other archival standards of information that you could quote as effective, from the user perspective?
And what's a photo of Yorkshire doing amidst a piece on Norway and its data standards?
Lars Marius - 2013-10-30 12:05:05
@Dave: Well, in the Norwegian article I linked to, the newspaper basically accuses them of hiding documents, to which their response is that they don't have control over what's archived. My guess is that the agencies are right, but most people don't believe them.
I'm not aware of relevant archiving standards from elsewhere, and part of the purpose of this post was to ask if such things might exist.
That's not a photo of Yorkshire, Dave. It's a photo of obsolete technology. :)
Jon Bjerkelien - 2013-11-27 08:16:20
Hi Lars Marius
Moreq 2010 is a standard developed and maintained by DLM Forum (a European not-for-profit body). It is more conceptual than Noark 5 but the end goal is the same: addressing the issues of preservation of electronic documentation in a lifecycle perspective (which often exceeds the lifespan of the system in which it is created and stored).
In the US the Department of Defense (DoD) has created a standard that is widely used within the US government. DoD 5015.2 is its name.
Personally I like the framework developed by the Australians and formalised in the ISO standard 15489 1-2. I find this conceptually sound and a valuable guide to ECM everywhere.
In my view the Noark 5 standard is widely misused and misinterpreted, and also has some basic flaws due to the goal of "backward compatibility" with its predecessor, Noark 4. In addition to this the national law regulating public archives is outdated. In sum these factors create a substantial hindrance when trying to create business value for clients in day-to-day consulting.
Maybe we should have coffee some day and talk it over. Send me an e-mail if you want to get in touch.