Ontologies: validation or reasoning

2006-10-15 19:37

Information about the structure of an ontology can be used for two different purposes, validation or reasoning, and most people do not yet seem to be aware of the distinction. I'll try to clear up the confusion in this blog posting as best I can.

sheep-1 and sheep-2

Simplistic distinction

Let's assume that we have an ontology that contains the association type creator-of with the role types creator and creation. Let's further assume that we have a constraint (represented with TMCL, for example) that states that only topics of type person can play the role of creator. Now, if we encounter the following topic map (in LTM), what should happen?

creator-of(black-thing : creation, sheep-1 : creator)
[sheep-1 : sheep = "Sheep number 1"]

There are two possible answers here, depending on whether the constraints are being used for validation or for reasoning. If they are being used for validation, the validator will see that the player of the creator role is not a person, so it will consider the topic map invalid and complain about our one creator-of association. However, if we had used the constraints for reasoning, the reasoner would have assumed that the instance data was correct, and inferred that sheep-1 is not just a sheep but also a person.
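The two readings can be sketched in a few lines of Python. This is only an illustration, not how any TMCL or OWL implementation works: the dictionaries and the functions validate and infer are made up for this example, and the topic map is reduced to plain Python data.

```python
# Toy data mirroring the LTM fragment above; all names here are illustrative.
topic_types = {"sheep-1": {"sheep"}, "black-thing": set()}
associations = [("creator-of", {"creation": "black-thing", "creator": "sheep-1"})]
# Constraint: in creator-of, the creator role must be played by a person.
constraints = {("creator-of", "creator"): "person"}


def validate(topic_types, associations, constraints):
    """Validation semantics: report players that lack the required type."""
    errors = []
    for assoc_type, roles in associations:
        for role, player in roles.items():
            required = constraints.get((assoc_type, role))
            if required and required not in topic_types.get(player, set()):
                errors.append((assoc_type, role, player, required))
    return errors


def infer(topic_types, associations, constraints):
    """Reasoning semantics: trust the data and add the required type."""
    inferred = {topic: set(types) for topic, types in topic_types.items()}
    for assoc_type, roles in associations:
        for role, player in roles.items():
            required = constraints.get((assoc_type, role))
            if required:
                inferred.setdefault(player, set()).add(required)
    return inferred


print(validate(topic_types, associations, constraints))
print(sorted(infer(topic_types, associations, constraints)["sheep-1"]))
```

The same constraint drives both functions. The validator treats it as a test the data can fail, while the reasoner treats it as a rule that produces new type statements.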

Adding the shades of grey

At this point it sounds as though there is a black-and-white distinction between the two scenarios, but this is not the case. Let's say that we extend the example topic map above with the following information:

supertype-subtype(person : supertype, farmer : subtype)
[farmer-giles : farmer = "Farmer Giles of Ham"]
creator-of(barn-x : creation, farmer-giles : creator)
tmcl:disjoint-with(sheep : tmcl:disjoint, person : tmcl:disjoint)

In this case, the validator would have accepted the second creator-of association, because it would have used reasoning to work out that since farmer-giles is a farmer he must also be a person, even if this is not stated explicitly, and so he conforms to the constraint.

What's interesting is that in this case the reasoner would have detected that there is a problem with the first creator-of association, because it implies that sheep-1 must be a person, but this is inconsistent with the information we already have about it being a sheep (because of the disjoint-with association), and so something must be wrong somewhere.

In other words: validation does some reasoning, and reasoning does some validation. The difference lies in the emphasis more than anything else.
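To make the overlap concrete, here is a small Python sketch of the extended example. It is a toy under stated assumptions: the names supertypes, disjoint, with_supertypes, validate and reason are invented for this illustration, and the creator constraint is hard-coded as "must be a person".

```python
supertypes = {"farmer": {"person"}}          # subtype -> direct supertypes
disjoint = [frozenset({"sheep", "person"})]  # types that cannot share an instance
topic_types = {"sheep-1": {"sheep"}, "farmer-giles": {"farmer"}}


def with_supertypes(types):
    """Return the given types plus all their transitive supertypes."""
    closed, todo = set(), list(types)
    while todo:
        t = todo.pop()
        if t not in closed:
            closed.add(t)
            todo.extend(supertypes.get(t, ()))
    return closed


def validate(player):
    """Validator: reasons over subtyping, but never from the constraint."""
    return "person" in with_supertypes(topic_types[player])


def reason(player):
    """Reasoner: adds person, then checks the result against disjointness."""
    types = with_supertypes(topic_types[player] | {"person"})
    clashes = [sorted(pair) for pair in disjoint if pair <= types]
    return types, clashes


for creator in ("sheep-1", "farmer-giles"):
    print(creator, validate(creator), reason(creator)[1])
```

The validator accepts farmer-giles only because it follows the supertype-subtype association, and the reasoner flags sheep-1 only because it checks the disjointness statement. Each side borrows a little of the other's job.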

RDFS, OWL, TMCL

At this point, RDFS and OWL only have reasoning semantics, but no validation semantics. It seems that TMCL will have only validation semantics, and no reasoning semantics. My personal opinion is that TMCL has gotten this right, while RDFS and OWL have not. The reason is that the most common business requirement is to be able to verify that the data in the topic map (or RDF model) is correct, and a constraint language with validation semantics is the best way to achieve this. An ontology language with reasoning semantics does some of the same, but not enough.

Of course, I'm not trying to say that reasoning is useless, only that I think validation is more important. This implies, of course, that we may want to create a reasoning semantics for TMCL at some point, but also that at the moment validation semantics is the main thing.

Comments

rho - 2006-10-20 12:15:00

I find this presentation of the affairs somewhat misleading. :-)

In the first sheep example, where the sheep creates something, a validator can only reject a map as invalid if it knows that sheep and person are disjunct. This can be the case (a) when CWA is part of the semantics of the constraints or (b) when an open world assumption is used and person != sheep is provided explicitly, as in your second code snippet.

From that snippet I also gather that you want TMCL to have OWA semantics, otherwise an explicit tmcl:disjoint would not have to be necessary. But then TMCL is in the *same* boat as OWL, as that one also uses the (somewhat weaker) OWA. That makes sense in a SW scenario where new information could be around every corner.

And then I do not quite understand why you present 'validation' and 'reasoning' as different things. As you write yourself, both are always connected (as the farmer example above shows). Whenever you validate, you have to do some form of reasoning. Always.

What I guess that you want to distinguish is *how* a given constraint is used: constructively (adding new knowledge to a given instance map), or destructively (filtering out all those fragments of a map which do not satisfy the constraint).

Yes?

Lars Marius - 2006-10-20 12:35:14

rho writes "In the first sheep example, where the sheep creates something, a validator can only reject a map as invalid if it knows that sheep and person are disjunct."

No, not necessarily. In OSL it's enough that you did not say that the sheep is a person. Since you didn't say that, the association is invalid. This is what I call "validation" semantics, where no reasoning is done from the constraints, even if reasoning is done from other information, such as subclassing.

rho also writes: "From that snippet I also gather that you want TMCL to have OWA semantics, otherwise an explicit tmcl:disjoint would not have to be necessary."

This isn't just about OWA/CWA. I want an even stricter semantics for TMCL, which is what I refer to as validation semantics. I guess you could refer to this as "correct data assumption" (CDA, or reasoning semantics) versus "incorrect data assumption" (IDA, or validation semantics).

In CDA, when you see creator-of referring to a sheep, but only persons can be creators, you assume this means that the sheep is also a person. This is what OWL/RDFS do.

In IDA, when you see this, you assume that the data is wrong, because you assume that if the sheep really were a person, that information would have been in the data.

To put it yet another way, in IDA or validation semantics, the constraints are taken literally: the player of the creator role must be a topic that is an instance of "person" or one of its subtypes.

The reason to choose IDA or validation semantics is that if you find a creator which is not explicitly stated to be a person, something is most likely wrong somewhere, and you want it fixed.

