Larsblog - technology

Bayesian identity resolution

Stian Danenbarger has been telling me for a while about entity resolution (as he and many others call it), or identity resolution (as Wikipedia calls it). Basically, it's the process of working out which records/entities/objects actually represent the same real-world things by comparing their properties. Once Stian confirmed that Bayesian inference was a common method for this, I suddenly saw how you can do a poor man's version of it with just a little basic scripting. ...
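To make the idea concrete, here is a minimal sketch of how per-property comparisons might be folded into a single match probability using a naive Bayes update. It is not necessarily the approach the full posting takes: the property names, the probabilities in `weights`, and the helper names `combine` and `match_probability` are all made up for illustration.

```python
def combine(p1, p2):
    """Naive Bayes combination of two independent probability estimates.

    Both values must lie strictly between 0 and 1.
    """
    return (p1 * p2) / (p1 * p2 + (1.0 - p1) * (1.0 - p2))


def match_probability(record_a, record_b, weights):
    """Estimate the probability that two records describe the same thing.

    weights maps a property name to (p_if_equal, p_if_different): the
    probability of a match given that the two values are equal, and the
    probability given that they differ. These numbers are hypothetical.
    """
    prob = 0.5  # start undecided
    for prop, (p_if_equal, p_if_different) in weights.items():
        if prop not in record_a or prop not in record_b:
            continue  # a missing value gives no evidence either way
        if record_a[prop] == record_b[prop]:
            prob = combine(prob, p_if_equal)
        else:
            prob = combine(prob, p_if_different)
    return prob


# Hypothetical weights: an equal name is strong evidence, an equal city weak.
weights = {"name": (0.9, 0.2), "city": (0.6, 0.4)}

a = {"name": "Acme AS", "city": "Oslo"}
b = {"name": "Acme AS", "city": "Oslo"}
c = {"name": "Bolt AS", "city": "Bergen"}

same = match_probability(a, b, weights)
different = match_probability(a, c, weights)
```

Record pairs scoring above a chosen threshold would then be treated as the same real-world thing. Real data would usually need fuzzy comparison of values rather than strict equality.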

Read | 2011-02-11 13:23 | 14 comment(s)

What's up?

While RSS and Atom are a great way to stay up to date on what is published around the web, I think the feed-centric approach taken by most feed readers is suboptimal. For some feeds I want to read everything that is posted, but for others I want to read only those few posts which are about subjects I care about, or by authors I particularly like. Another problem is that some feeds (for example those of newspapers) have hundreds of posts every day. Staying on top of that is just too much manual effort. ...

Read | 2011-02-03 19:50 | 11 comment(s)

The applications of SDshare

A few years ago, Graham Moore came up with the idea of publishing changes to topic maps using Atom, and a CEN project has now developed and published a specification for it called SDshare. Work is also underway to make SDshare a full ISO standard. ...

Read | 2010-11-21 14:29 | 0 comment(s)

My report on OOXML and ODF

Disclaimer: Work on this in the Norwegian government has been going on for years. I worked on this for four months, producing a 45-page report. This blog posting oversimplifies most of the way through in the interests of brevity. ...

Read | 2010-05-09 20:47 | 12 comment(s)

A path language for Topic Maps

I sketched a little path-based query language for Topic Maps this summer, mostly to explore what such a language might look like. My TMQL co-editor, Rani Pinchuk, asked me to write up a more detailed description of it, and that's what this blog posting is. ...

Read | 2009-09-23 11:01 | 13 comment(s)

Datatype validation with TMCL

It's long been generally assumed that TMCL (the Topic Maps Constraint Language) should be able to validate datatyped values, but very little thought has so far been devoted to exactly how. It may look like a trivial issue, but in fact datatypes are an enormous tangle of complex problems. To pick one example at random, consider the ordering of time durations in XML Schema. This posting is an attempt to consider what TMCL should and, equally important, should not do. ...
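As a small illustration of why duration ordering is awkward, the sketch below follows the comparison rule from XML Schema 1.0 Part 2 as I understand it: two durations are ordered only if adding each to four fixed reference instants gives the same answer every time. The reduction of a duration to a `(months, days)` pair and the helper names are simplifying assumptions made for this example.

```python
import calendar
from datetime import datetime, timedelta

# Reference instants XML Schema 1.0 uses when comparing durations.
REFERENCE_STARTS = [
    datetime(1696, 9, 1),
    datetime(1697, 2, 1),
    datetime(1903, 3, 1),
    datetime(1903, 7, 1),
]


def add_duration(start, months, days):
    """Add a simplified (months, days) duration to a datetime."""
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    # Clamp the day to the length of the target month.
    day = min(start.day, calendar.monthrange(year, month)[1])
    return start.replace(year=year, month=month, day=day) + timedelta(days=days)


def compare_durations(a, b):
    """Compare two (months, days) durations.

    Returns '<', '>' or '=' when all reference instants agree,
    and None when the order is indeterminate.
    """
    outcomes = set()
    for start in REFERENCE_STARTS:
        end_a = add_duration(start, *a)
        end_b = add_duration(start, *b)
        outcomes.add("<" if end_a < end_b else ">" if end_a > end_b else "=")
    return outcomes.pop() if len(outcomes) == 1 else None


one_month = (1, 0)
print(compare_durations(one_month, (0, 27)))  # one month vs 27 days
print(compare_durations(one_month, (0, 30)))  # one month vs 30 days
print(compare_durations(one_month, (0, 32)))  # one month vs 32 days
```

One month is longer than 27 days and shorter than 32 days whichever month you start in, but against 30 days the answer depends on the starting month, so that pair has no defined order. A constraint language that validates datatyped values has to decide what to do with cases like this.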

Read | 2009-07-20 14:37 | 3 comment(s)

A Topic Maps file system

The idea of a Topic Maps file system is not new. Robert Barta presented one at TMRA 2008, and Inge Henriksen is also working on one. However, I have had my own take on this that I have wanted to realize for several years. The starting point was the Mac screensaver which shows all photos from a given directory as a kind of slide show. I've set it to the root folder I store my photos in, but then it shows all photos, which is not always that pleasant when you're on a projector in a meeting, for example. ...

Read | 2009-06-03 16:25 | 5 comment(s)

My Twitterhood

I've been using Twitter for just about a year now (username: larsga), ever since Tim Bray wrote enough about it to make me curious about what it was. I've since come to enjoy it as a kind of mix between blogs and chat, and have developed a very mixed crowd of people that I follow. One day I started thinking about categorizing these people, and I started wondering what clusters of Twitterers I was really following. ...

Read | 2009-04-05 20:43 | 6 comment(s)

The Prague meeting

The ISO SC34 meeting in Prague was a big affair with five different working groups and many attendees. Working group 3 had a lower attendance than usual (for a number of reasons), and perhaps for that very reason had a highly productive three days focusing on TMCL. The status before the meeting was that we had a fairly loose draft that showed in rough outline the intended functionality of the language and gave a good indication of the way it was intended to be specified. The task of the meeting was to process this to the point where the editors could write something quite close to the final specification. I'm happy to say I think that's what we did. ...

Read | 2009-04-02 10:50 | 0 comment(s)

TMShare the Second

Graham Moore and Marc Wilhelm Küster presented a new Topic Maps protocol called TMShare at TMRA 2008. Many Topic Maps protocols have been presented already, mostly similar in conception, but TMShare is actually a completely new kind of protocol. Unlike earlier proposals it does not allow random access to topic maps on the server, but instead provides a feed of the changes to those topic maps. So essentially it provides a mechanism to replicate a topic map, or part of one, to another server. (I call this TMShare the Second because there was another TMShare protocol before this one.) ...

Read | 2008-11-08 15:45 | 1 comment(s)
