Published subjects and PSIs
Posted in Technology on 2007-01-04 15:25
People often find the basic idea of published subjects quite clear and simple, but stumble over the details, so I thought I'd write a little overview of the territory. The idea is to sketch out the basic concepts and how it all works.
There are basically two layers to this:
The TMDM defines the concept of a "subject indicator", which is just a web page that indicates a subject. That is, if a human being reads the page it should clearly indicate one single subject to the reader. The URI of that page can then be attached to a topic as a "subject identifier". (A subject identifier is always a URI.)
Praying Mantis (Hiroshima, Japan)
An example will make this clearer. Let's say I have a photo, and I want to say that the photo shows a praying mantis. This is straightforward enough: I make a topic for the photo, another for the concept of "praying mantis", and then associate the two. But let's say I want to increase the chances that if I merge my photos with those of other people we'll get a single place to find all photos of praying mantises. I can do that by attaching a likely subject identifier to my praying mantis topic. A good choice might be http://en.wikipedia.org/wiki/Praying_Mantis which is a well-known page that clearly identifies this concept to a human being.
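As a rough illustration of the example above, here is a minimal sketch in Python. The `Topic` and `Association` classes and the "depicts" association type are hypothetical names made up for this sketch; they are not taken from the TMDM or from any real Topic Maps engine.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Topic:
    """Hypothetical minimal topic: a name plus a set of subject identifier URIs."""
    name: str
    subject_identifiers: Set[str] = field(default_factory=set)


@dataclass
class Association:
    """Hypothetical typed link between two topics."""
    type: str
    source: Topic
    target: Topic


# One topic for the photo, one for the concept it shows.
photo = Topic("Photo of a praying mantis")
mantis = Topic("Praying mantis")

# Associate the two ("depicts" is an assumed association type).
depicts = Association("depicts", photo, mantis)

# Attach the URI of the Wikipedia page to the concept topic as a subject identifier.
mantis.subject_identifiers.add("http://en.wikipedia.org/wiki/Praying_Mantis")
```

Note that only the concept topic gets the subject identifier here; the photo topic is left without one.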
There's nothing about the web page itself that makes it a subject indicator. It only becomes one when you attach its URI to a topic as a subject identifier. So at the moment the Wikipedia page for praying mantis is, as far as I know, not a subject indicator. But if I really created that topic, the page would become a subject indicator for my topic.
What happens in a Topic Maps implementation when you do this is that any other topic that has the same subject identifier (that is, the same URI) attached to it will be forced to merge with my topic. This is why it's important that one choose a page that clearly identifies a single subject; otherwise you might get merges with subjects that don't match exactly.
The Topic Maps implementation doesn't follow the URI in any way; the only thing it does is to compare the URIs as strings. If they are equal, the topics merge. If they are not, nothing happens. And since an exact URI match is required for a merge, you really do want a well-known page with a simple URI. Note that this means that if the subject identifier doesn't actually refer to any web page everything will still work on the technical level. This is not considered optimal practice, but it works.
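The merge rule described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the dict layout (`names` and `identifiers` sets) and the `merge_topics` function are invented for the sketch, and a real implementation would merge much more than names. The point it shows is that the URIs are compared purely as strings and never dereferenced.

```python
def merge_topics(topics):
    """Merge topics that share at least one subject identifier.

    Each topic is a dict with 'names' and 'identifiers' sets (a hypothetical
    layout). Identifiers are compared as plain strings; no URI is fetched.
    """
    merged = []
    for topic in topics:
        current = {
            "names": set(topic["names"]),
            "identifiers": set(topic["identifiers"]),
        }
        remaining = []
        for other in merged:
            if current["identifiers"] & other["identifiers"]:
                # Exact string match on some URI: fold the other topic in.
                current["names"] |= other["names"]
                current["identifiers"] |= other["identifiers"]
            else:
                remaining.append(other)
        remaining.append(current)
        merged = remaining
    return merged


mine = {"names": {"Praying mantis"},
        "identifiers": {"http://en.wikipedia.org/wiki/Praying_Mantis"}}
theirs = {"names": {"Mantis"},
          "identifiers": {"http://en.wikipedia.org/wiki/Praying_Mantis"}}
# Same page in spirit, but the URI string differs in case, so no merge happens.
near_miss = {"names": {"Mantid"},
             "identifiers": {"http://en.wikipedia.org/wiki/praying_mantis"}}

result = merge_topics([mine, theirs, near_miss])
```

Here `mine` and `theirs` collapse into one topic carrying both names, while `near_miss` stays separate, which is why a well-known page with a simple URI matters.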
Thus far the TMDM. The next part of the story comes from the Published Subjects recommendation from OASIS, which builds on the TMDM.
There are several issues with just choosing an existing web page and using it as a subject indicator:
- What if the page suddenly moves, or changes?
- Usually the subject is not 100% well-defined, and could be interpreted in different ways.
- There is no guarantee that other people will choose the same page for the same subject, unless someone is actively promoting a single page for that subject.
- In many cases there are no suitable pages for the subject you want. (Try to think of a good subject indicator for April 4, 2000, for example.)
Published Subject Indicators (PSIs) solve (or at least alleviate) these problems. A published subject indicator is just a web page that was created specifically to be a subject indicator. (And a published subject is a subject for which there is a PSI.) There is actually a PSI for April 4, 2000, which you may want to look at.
This is good because a good publisher will
- Not move, delete, or change the page.
- Define the subject clearly.
- Promote their page as the one right PSI for that subject.
- Choose good, simple URIs for their pages.
Of course, there's no guarantee that a publisher will be good, but over time the best ones for any given subject should win out. And if there is no PSI for the subject you want already, you can become a publisher yourself.
Marc de Graauw - 2007-01-28 21:59:44
Reading this prompted me to write down some old reservations I have always had about the concept of PSI's.
See http://www.marcdegraauw.com/2007/01/28/the-trouble-with-psis/ for details.