Larsblog

Blog metadata in Topic Maps

Posted in Technology on 2008-01-19 12:47

Wenlock basin, Islington, London

I've been thinking for a while of representing metadata about the blog in Topic Maps, and Robert Cerny brought this up again with his request for a way to get metadata about the entries through a web service. This would of course be a very cool thing to play around with, and having someone actually use the result would be even more fun, so clearly this needs to be done.

The ontology

So, how to represent the metadata? Well, there's not that much metadata available, and this doesn't really need to be made very difficult, either, so maybe this will do:

[entry1 : foo:blog-entry = "TMRAP support in the blog"
 %"http://www.garshol.priv.no/blog/145.html"]
{entry1, dc:date, [[2008-01-10T18:09:00]]}
dc:creator(entry1 : dcc:resource, foo:lars-marius-garshol : dcc:value)
dc:subject(entry1 : dcc:resource, foo:tmrap : dcc:value)
dc:subject(entry1 : dcc:resource, foo:tmphoto : dcc:value)
dc:subject(entry1 : dcc:resource, foo:tmxml : dcc:value)

I could also associate the entry with a topic representing the blog, and add the category. I'm not sure whether either is needed, or whether there are any pre-existing PSIs for this.

Robert wanted this in XTM 2.0 or JTM. I think I'd prefer XTM 2.0, simply because it's the more standardized format (and probably less work on my side), but other formats could of course be added as needed.
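To give a feel for what the XTM 2.0 output might look like, here is a rough sketch in Python that builds the topic for entry1 from the fragment above. The helper names (q, entry_topic, entry_topic_map) and the "#dc-date" topic reference are made up for illustration, and the type and the dc:creator/dc:subject associations are left out to keep it short. This is not the code running on the blog.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/"  # XTM 2.0 namespace
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def q(tag):
    """Qualify a tag name with the XTM namespace (ElementTree's Clark notation)."""
    return "{%s}%s" % (XTM_NS, tag)


def entry_topic(topic_id, locator, title, date_type_ref, published):
    """Build a <topic> element for one blog entry (hypothetical helper)."""
    topic = ET.Element(q("topic"), {"id": topic_id})
    # The LTM %"..." syntax is a subject locator: the entry page is the subject.
    ET.SubElement(topic, q("subjectLocator"), {"href": locator})
    name = ET.SubElement(topic, q("name"))
    ET.SubElement(name, q("value")).text = title
    # The dc:date occurrence; its value goes in <resourceData>.
    occ = ET.SubElement(topic, q("occurrence"))
    occ_type = ET.SubElement(occ, q("type"))
    ET.SubElement(occ_type, q("topicRef"), {"href": date_type_ref})
    data = ET.SubElement(occ, q("resourceData"), {"datatype": XSD_DATETIME})
    data.text = published
    return topic


def entry_topic_map(topic):
    """Wrap a topic in a <topicMap version="2.0"> element."""
    tm = ET.Element(q("topicMap"), {"version": "2.0"})
    tm.append(topic)
    return tm


# Serialize with the XTM namespace as the default namespace.
ET.register_namespace("", XTM_NS)
topic = entry_topic(
    "entry1",
    "http://www.garshol.priv.no/blog/145.html",
    "TMRAP support in the blog",
    "#dc-date",  # assumed local id of the topic representing dc:date
    "2008-01-10T18:09:00",
)
xtm = ET.tostring(entry_topic_map(topic), encoding="unicode")
```

The associations would be added to the topic map element in the same way, one association element per dc:creator or dc:subject statement.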

The protocol

Then there is the question of how to get hold of this information. Robert suggested doing a GET on the blog entry with an "Accept" header specifying that the client does not want HTML, but rather XTM 2.0/JTM. This would work, and it would be extremely RESTful, but it would require me to change the entire architecture of this blog. (Whether what you get is another representation of the same resource or a different resource is also a question, and I'm not convinced that this is the right way to do it.)
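For the record, the client side of Robert's suggestion could look roughly like the sketch below. It only prepares the request and does not send it. The media type string is an assumption on my part (as far as I know there is no registered media type for XTM), and metadata_request is a made-up name.

```python
import urllib.request

# Assumption: a placeholder media type for XTM, used only for illustration.
XTM_MEDIA_TYPE = "application/xtm+xml"


def metadata_request(entry_url, media_type=XTM_MEDIA_TYPE):
    """Prepare (but do not send) a GET that asks for a non-HTML representation."""
    return urllib.request.Request(
        entry_url,
        headers={"Accept": media_type},
        method="GET",
    )


req = metadata_request("http://www.garshol.priv.no/blog/145.html")
```

The server would then have to inspect the Accept header on every entry request and choose a representation, which is exactly the architectural change I would like to avoid.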

It struck me that there is another way to do this, one that fits more naturally into the infrastructure I already have: continue to use TMRAP. Basically, all I need to do is to implement the get-topic request in TMRAP, so that you can do a GET on http://www.garshol.priv.no/blog/tmrap.py/get-topic?subject=http://www.garshol.priv.no/blog/145.html to get the topic map fragment above.
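A client could build that request URL with something like the following sketch. The base URL and the subject parameter are taken from the example above; the function name get_topic_url is just for illustration.

```python
from urllib.parse import urlencode

TMRAP_BASE = "http://www.garshol.priv.no/blog/tmrap.py"


def get_topic_url(subject, base=TMRAP_BASE):
    """Build a get-topic request URL for the given subject URI."""
    # Keep ':' and '/' unescaped so the URL matches the form shown in the post;
    # other reserved characters in the subject URI are still percent-encoded.
    query = urlencode({"subject": subject}, safe=":/")
    return f"{base}/get-topic?{query}"


url = get_topic_url("http://www.garshol.priv.no/blog/145.html")
```

Fetching the fragment is then an ordinary GET on that URL, for example with urllib.request.urlopen(url).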

This has the added benefit that if I talk about a new dc:subject that the client has not seen before, the client can do a get-topic on that topic, too, and get its name, type, and other information.
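On the client side, working out which subjects to look up could be as simple as the sketch below. The subject identifiers are invented example.org URIs, since I haven't settled on the real PSIs, and unseen_subjects is a hypothetical helper.

```python
def unseen_subjects(fragment_subjects, known):
    """Return the subjects in a fragment that the client has not seen yet.

    Order is preserved and duplicates are dropped.
    """
    new = []
    for subject in fragment_subjects:
        if subject not in known and subject not in new:
            new.append(subject)
    return new


# Hypothetical identifiers, for illustration only.
known = {"http://example.org/psi/tmrap"}
in_fragment = [
    "http://example.org/psi/tmrap",
    "http://example.org/psi/tmphoto",
    "http://example.org/psi/tmxml",
    "http://example.org/psi/tmphoto",
]
to_fetch = unseen_subjects(in_fragment, known)
```

Each subject left in to_fetch would then be the target of its own get-topic request.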

Next steps

So, Robert, and others, what do you think? If you think this sounds good I'll just implement it. Having a first, crude version up and running should not take long.

Comments

Robert Cerny - 2008-01-19 08:14:05

This sounds good and is actually quite RESTful in my book, since RESTfulness is developing more into a continuum. The higher up you are on the discrete scale, the more you can benefit from existing web infrastructure. And if you are all the way on the peak, you get dizzy :) The GET request to the URL you proposed is fine, and not reacting to the Accept header is ok for me. But having the correct Content-Type set in the response would be important: 'application/xtm+xml;2.0'. Not sure if caching would kick in with the query part.

That a client can learn about new subjects and that there is an automated way to an address where i can find more information about the subject is good. Getting a topic map back with more information about that subject is just wonderful. After some time in the community, i am still not sure what a topic map fragment is :-) Is it a topic map with subjects referenced which are not represented by a topic in the map?

The DC Vocabulary should be fine. Using only two role types for all association types is somehow unfortunate. I know Steve Pepper suggests it in his TMRA 2007 paper. Some razor of Occam. I will need to fix an issue[1] with Topincs first to be able to render that in a nice way. I am depending on distinct role types somehow. Having the publication date in such great detail might leave me with various data types for publication date. Most of the time this information is not provided with such accuracy. This is no problem for the human consumer, but will provide a challenge for machines.

You did not mention item identifiers. Anything would be ok for me. The only thing that matters is that cool URIs do not change. At this point I do not care whether the items are network retrievable or not.

I want to be able to merge an XTM 2.0 map that comes from a foreign domain into a Topincs Wiki page in the web browser, to allow a step of human editing. I have to implement a few features [2,3] in order to do that. But if you provide your service, i will do that soon. We might have to adjust the contract at a few places, once the idea takes form, but i do not see any huge obstacles along the way. Looking forward to the moment when i first merge in information about your blog entries, instead of manually editing it. It might take as long, but will scale much better.

[1] http://www.topincs.com/issues/wiki/id:167
[2] http://www.topincs.com/issues/wiki/id:168
[3] http://www.topincs.com/issues/wiki/id:169

Lars Marius - 2008-01-19 08:31:51

Great that you think this is OK. The content-type we can certainly set correctly, but not to 'application/xtm+xml;2.0', since that is not registered (yet).

There's no inherent reason why caching would not be performed on the query part, but maybe some tools don't do it.

A topic map fragment is a complete topic map in itself, but one that is just a fragment of a larger, complete topic map. Typically, like you say, it will have "stub topics," which are just referenced by identity without having any further information attached in the fragment.

My DC representation follows the latest ISO draft of this. Having just a single pair of role types as that draft does is not perfect, but it does seem better than any alternative I can think of. Anyway, I will do whatever the ISO standard eventually says. I think probably your issue will need to be solved anyway, so...

The thing about the publication date I didn't fully understand. What is the issue there?

The item identifiers will not matter much, since all topics will have a higher form of identity, but I agree they should not change. That should be easy to avoid. They will not be network-retrievable, I'm afraid. I can't think of any sensible way to support that.

Supporting manual merging of topic map fragments in the browser would be wildly cool. In fact, that would make it interesting for people to add XTM fragments of metadata in more places, just to be able to play. Hmmmmm. This really got me thinking. Would be cool to play around with the same thing in Ontopoly...

I agree we may have to adjust the contract a couple of times to get this right, but so be it.

Dmitry - 2008-01-19 13:53:37

I wish we have more blogs available in XTM format :)

Robert Cerny - 2008-01-24 03:05:12

@Dmitry: It took me a while to understand the actual message you are sending, actually Lars Marius had to interpret it for me :-) Your blog at subjectcentric.com also offers a Topic Map per blog entry. Any chance you will support XTM 2.0 in the future?

@Lars: The issue with the publication date is that you offer it as datetime and many times it will be only date. Thanks for the explanation on the fragment, that would have been my second guess .-) So it is a topic map! It would be good if stubs would have everything so that they can be displayed (a name and their type with a name).

I will release Topincs 2 pretty soon. It's not possible to add the merge in browser feature until then. That would cause more harm than good. But i will implement it in Topincs 2.1 and look very much forward to using it. My first intention was to use it for knowledge exchange between Topincs Stores, but even better if information from other sources can be integrated.

Robert Cerny - 2008-01-31 08:28:47

I noticed that the topic map you are providing uses a value element to encode the value of the occurrence. The version i use for reference[1], encodes the value in a resourceData element. What is correct?

[1] http://www.isotopicmaps.org/sam/sam-xtm/2006-06-19/

Lars Marius - 2008-02-08 06:20:44

Robert, resourceData is correct. This was a simple mix-up while writing the code. I've fixed the code now, and will deploy the fix asap.

trond - 2008-03-14 09:07:07

Dmitry wrote: "I wish we have more blogs available in XTM format :)"

I've finally released the Wordpress plugin for exporting Wordpress blogs to XTM (1.0) [see http://www.topicobserver.com/wp2tm/download/]

Hopefully, it can be of use/help..
