Some thoughts on identity
The concept of identity is at the heart of both Topic Maps and RDF, and consequently it has been much discussed over the last few years, but I've seen very little writing on what identity itself actually is. This posting offers some thoughts on that.
Identity is sameness of referents
One difficulty with the term "identity" is that it is very abstract, but the basic meaning is pretty clear: "exact sameness", WordNet calls it, while Webster says "the state or quality of being identical, or the same; sameness". In other words, two things are identical if they are the same. On the face of it, this sounds absurd, because if the two things really are the same then they are of course not two things at all, but one thing. So how can this be?
This is where we have to step back a little, and remember that we do not really experience the physical world directly, but only indirectly via sense data. In other words, when I saw a person on the bus yesterday morning, and then saw a person in a pub yesterday evening, I was dealing with two different sets of sensory data. I could decide that the two persons are identical (i.e. that I saw the same person twice), or I could decide that they are distinct (i.e. different, but perhaps similar-looking).
The same thing comes up with names and identifiers. Are Ceylon and Sri Lanka identical? That is, are they the names of two things, or of one thing? Similarly, are "no" and "nor" identifiers for the same thing? In ISO 639 they are: "no" is the two-letter code and "nor" the three-letter code, and both identify the language "Norwegian".
In other words, identity is only ever an issue when we look at pairs of proxies for some thing (sensory data, names, identifiers, ...) and try to decide whether or not these proxies are proxies for the same thing. So identity is a relationship between two proxies that determines whether in some sense they are proxies for the same thing.
Identity is an equivalence relation
Mathematically, identity is a relation. In fact, it's a particular kind of relation known as an equivalence relation. What this means is three simple things:
- All things are identical to themselves. That is, any thing is the same as itself. (Pretty obvious, I should think.)
- If A is identical to B, then B is identical to A. (This just says that identity is not directional, which is again obvious.)
- If A is identical to B, and B is identical to C, then A must be identical to C. To put it another way, if Clark Kent is Superman, and Superman is Kal-El, then Clark Kent must be Kal-El. (Again, this should be beyond dispute.)
This means (as I've blogged about before) that if you apply the identity relation to a set of proxies they will be sorted into equivalence classes, where each class contains all the proxies for one thing.
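To make the sorting into equivalence classes concrete, here is a minimal sketch in Python. It assumes we already have a list of proxies and a list of pairs that someone has judged to be identical; the function name `equivalence_classes` and the sample data are mine, purely for illustration. It uses a simple union-find structure, which is one common way to do this, not the only one.

```python
from collections import defaultdict


def equivalence_classes(proxies, same_pairs):
    """Group proxies into equivalence classes, given pairs judged identical."""
    # Every proxy starts out as the representative of its own class
    # (reflexivity: each thing is identical to itself).
    parent = {p: p for p in proxies}

    def find(p):
        # Follow parent links up to the class representative,
        # shortening the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in same_pairs:
        # Joining the two classes takes care of symmetry and transitivity.
        parent[find(a)] = find(b)

    classes = defaultdict(set)
    for p in proxies:
        classes[find(p)].add(p)
    return list(classes.values())


# Illustrative data only.
proxies = ["Clark Kent", "Superman", "Kal-El", "Ceylon", "Sri Lanka", "Norway"]
same_pairs = [
    ("Clark Kent", "Superman"),
    ("Superman", "Kal-El"),
    ("Ceylon", "Sri Lanka"),
]
classes = equivalence_classes(proxies, same_pairs)
```

Note that nobody ever stated that Clark Kent is Kal-El; the two end up in the same class because of transitivity. "Norway" was never paired with anything, so it sits in a class of its own.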
Identity in Topic Maps
So how does this square with identity in Topic Maps? Pretty well, actually. Topics are proxies for things called subjects, and if two topics are found to represent the same subject they merge. It's easy to show that the test for sameness is an equivalence relation, so topics eventually become the union of all the proxies in one equivalence class.
In the Topic Maps Reference Model (TMRM) things are a bit more complicated. The TMRM defines a class of operators known as merging operators, and sets out some minimal requirements these must satisfy. It does not say anything about identity as such, but it does explicitly say that merging forms equivalence classes, and the requirements it places on the merging operator correspond to my three claims about identity relations above.
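A rough sketch of what merging looks like, in Python. This is a deliberately simplified model and not the actual Topic Maps data model: the `Topic` class, its two fields, and the example.org identifiers are all hypothetical, and the only sameness test used is "the two topics share at least one identifier".

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    # Hypothetical, simplified topic: a set of identifiers and a set of names.
    identifiers: set = field(default_factory=set)
    names: set = field(default_factory=set)


def merge_topics(topics):
    """Merge topics that share an identifier until no two topics overlap."""
    merged = []  # invariant: topics in here never share an identifier
    for topic in topics:
        current = Topic(set(topic.identifiers), set(topic.names))
        remaining = []
        for other in merged:
            if current.identifiers & other.identifiers:
                # Same subject: the result is the union of both proxies.
                current.identifiers |= other.identifiers
                current.names |= other.names
            else:
                remaining.append(other)
        merged = remaining + [current]
    return merged


# Illustrative data only; the identifiers are made up.
topics = [
    Topic({"http://example.org/id/sri-lanka"}, {"Sri Lanka"}),
    Topic({"http://example.org/id/norway"}, {"Norway"}),
    Topic({"http://example.org/id/ceylon"}, {"Ceylon"}),
    Topic({"http://example.org/id/sri-lanka", "http://example.org/id/ceylon"}),
]
result = merge_topics(topics)
```

The first and third topics share no identifier with each other, but the fourth overlaps with both, so all three collapse into a single topic carrying both names. The Norway topic is left alone.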
Identity in RDF
In RDF identity is in a sense very simple: nodes with URIs either have the same URI, or they don't. So identity at the basic level is just equality of URIs, which is an equivalence relation, and clearly identity in the sense described here.
However, OWL modifies the picture somewhat, by adding the owl:sameAs property, which allows you to state that resource A is the same as resource B. It's not easy to dig the precise meaning of this out of the OWL specifications, but as far as I can tell the semantics map it to "=", which again is an equivalence relation.
OWL also defines the owl:InverseFunctionalProperty class, which is a class of properties. What this says is that if p is an inverse functional property, and you have the two statements (a, p, v) and (b, p, v) then a = b. This just provides an additional way to test for identity, but one that is conformant with the general picture described above.
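The inverse functional property rule is easy to sketch in Python over plain tuples. This is only an illustration of the rule as described above, not how any particular OWL reasoner works; the function name `infer_same`, the `ex:` names, and the property `ex:passportNumber` are all made up for the example.

```python
from collections import defaultdict


def infer_same(triples, inverse_functional):
    """Return pairs (a, b) inferred to be identical because they share a
    value for a property declared inverse functional."""
    subjects_by_key = defaultdict(list)
    for s, p, v in triples:
        if p in inverse_functional:
            subjects_by_key[(p, v)].append(s)

    pairs = set()
    for subjects in subjects_by_key.values():
        first = subjects[0]
        for other in subjects[1:]:
            if other != first:
                # (first, p, v) and (other, p, v) both hold, so first = other.
                pairs.add((first, other))
    return pairs


# Illustrative data only.
triples = [
    ("ex:a", "ex:passportNumber", "N1234567"),
    ("ex:b", "ex:passportNumber", "N1234567"),
    ("ex:c", "ex:passportNumber", "N7654321"),
    ("ex:a", "ex:name", "Alice"),
    ("ex:c", "ex:name", "Alice"),
]
same = infer_same(triples, {"ex:passportNumber"})
```

Here `ex:a` and `ex:b` are inferred to be the same because they share a passport number. `ex:a` and `ex:c` share a name, but `ex:name` was not declared inverse functional, so nothing follows from that.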
So, to summarize:
- identity is always a test on proxies for things (otherwise we wouldn't need an identity concept at all),
- identity always tests whether the things are the same things, and
- identity is an equivalence relation.
Further, this general picture of identity holds in Topic Maps and in RDF. I would assume that it also holds in other technologies that are identity-based.