Search in Topic Maps portals
Posted in Technology on 2007-08-21 11:23
One of the features that sets Topic Maps-based portals apart is their support for search, which is generally better than in ordinary portals. However, implementing search in any given portal usually requires lots of discussion with the customer and interaction designers, and it's not always clear what the best approach is.
As part of my work on semantic searching (about which more later) I looked at how various Topic Maps-based portals have approached search so far. This informal survey doesn't actually fit anywhere, so I thought it might be just as well to make a blog entry out of it, rather than throw it away. I've also added some recommendations based on the experiences I've had so far.
I looked at the following aspects of the search implementations in a number of Topic Maps-based portals:
- Whether users are given the chance to apply some kind of structured filter before they search, using a drop-down list of categories or by other means.
- Whether users are able to filter search results after they have been displayed.
- Whether or not search results are displayed grouped by type. Here I don't mean whether the type is displayed, but whether the entire layout of the results is organized so that topics of the same (or similar) types are displayed together as a group.
- Whether or not categories can be found in search.
- Whether or not topic types can be found in search.
- Whether or not statements about the topics (beyond just the name and type) are used to provide more information about the topics found.
- The quality of the relevance ranking of search results.
The results were as follows:
I've put "Y-" in some cases where sites have a feature, but only in a rather limited way.
Pre-filtering is quite rare, and in general it does not appear to work very well. The main problem is that people are reluctant to filter before they search because they have less of an idea what the filters mean before they have searched. And in any case they don't know if they need to filter until afterwards, and it's just as easy to do it then.
Post-filtering is quite widely supported and very powerful, and it is definitely one of the aspects of Topic Maps-based portals that has worked very well. However, it's easy to make the filtering interface too crowded and complex for users. The main challenge, then, is to make this rather complex feature intuitive for non-technical people. In other words, at least with Topic Maps, this is more of a design challenge than a technical one.
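To make the idea of post-filtering concrete, here is a minimal sketch, assuming a hypothetical `Hit` record that carries each topic's type and categories. It counts the facet values present in a result list (so the interface can show only filters that actually match something) and narrows the list without disturbing the relevance order. None of the names come from any particular portal or product.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Hit:
    """A search hit (hypothetical model): topic name, topic type, categories, relevance score."""
    name: str
    topic_type: str
    categories: list[str] = field(default_factory=list)
    score: float = 0.0


def facet_counts(hits: list[Hit]) -> dict[str, Counter]:
    """Count hits per facet value so the UI can show only filters that match something."""
    return {
        "type": Counter(h.topic_type for h in hits),
        "category": Counter(c for h in hits for c in h.categories),
    }


def post_filter(hits: list[Hit], topic_type: Optional[str] = None,
                category: Optional[str] = None) -> list[Hit]:
    """Narrow an already-ranked result list; the relevance order is preserved."""
    return [
        h for h in hits
        if (topic_type is None or h.topic_type == topic_type)
        and (category is None or category in h.categories)
    ]


hits = [
    Hit("Henrik Ibsen", "person", ["Literature"], 0.9),
    Hit("Ibsen museum", "museum", ["Museums", "Literature"], 0.7),
    Hit("Book about Ibsen", "book", ["Literature"], 0.6),
]
print(facet_counts(hits)["category"]["Literature"])              # 3
print([h.name for h in post_filter(hits, category="Museums")])  # ['Ibsen museum']
```

The counting part is what keeps the filter interface from getting crowded: facet values with zero hits never need to be shown.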
Grouping is not very common, and my experience with it has been very negative. It can seem attractive at first, but users don't expect it, it makes it more complicated for them to "parse" the results page, and it makes it harder to scan the list of results. The worst thing, however, is that it breaks the relevance ranking, since the ordering of results is determined by the order of the groups.
In the City of Bergen portal there are four groups, which means that, assuming the best hit is equally likely to fall into any of them, in 3 out of 4 cases on average the best hit will not be listed at the top. This is not because of some limitation in the ranking itself, but because the order of the groups is fixed. In other words, vertical grouping defeats the ranking of results. Fuzzzy.com also has grouping, but only into two groups, and since it lays them out horizontally it works better. (I'm still not sure it's a good idea, though.)
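The "3 out of 4" argument can be checked with a small sketch. It assumes four hypothetical groups named A to D shown in a fixed order, one hit per group, and places the single best hit in each group in turn. Grouping ranks only within a group, so the best hit reaches the top of the page only when it happens to sit in the first group.

```python
GROUPS = ["A", "B", "C", "D"]  # hypothetical group names; only the fixed order matters


def ranked_order(hits):
    """Plain relevance ranking: highest score first. hits are (name, group, score) tuples."""
    return sorted(hits, key=lambda h: h[2], reverse=True)


def grouped_order(hits, group_order):
    """Vertical grouping: groups in a fixed order, relevance ranking only within each group."""
    result = []
    for group in group_order:
        result.extend(ranked_order([h for h in hits if h[1] == group]))
    return result


# Put the best hit in each group in turn (one hit per group) and check
# whether the grouped layout still shows it at the top of the page.
best_on_top = 0
for best_group in GROUPS:
    hits = [(f"hit-{g}", g, 1.0 if g == best_group else 0.5) for g in GROUPS]
    if grouped_order(hits, GROUPS)[0] == ranked_order(hits)[0]:
        best_on_top += 1

print(f"best hit on top in {best_on_top} of {len(GROUPS)} cases")  # 1 of 4
```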
Nearly all sites allow you to find categories when searching, but in several cases I've been involved with, getting the customer to agree to this has been a real uphill struggle. I don't know why, but to customers it often seems wrong that categories should be findable through search. The winning argument for making them findable has been that the customer typically spends considerable effort collecting the most relevant content possible under each category. If the user then types the name of a category, why not offer what is in effect a hand-made page of search results among the other hits? Nobody I've heard from has been able to formulate a good reason not to.
Topic types cannot be found via search in any portal I've seen, probably because the portals tend not to have pages for the topic types. This makes sense, given that a list of all the persons or articles in a portal is rarely very useful. Still, people do perform this kind of search, and it's not obvious that such a page would be useless.
The descriptions of search hits are very limited in most portals, but some Topic Maps portals go much further in this regard. In the Kulturnett portal, for example, search hits for "Ibsen" are described as "book by Atle Næss", "museum in Oslo", "author", etc., and these descriptions are structured. This is a very useful feature, since it tells users much more about what they've found without taking up much screen real estate. I think many of the portals that left this out did so because their ontologies are too weak to describe the topics in much detail.
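Descriptions like "book by Atle Næss" can be generated from the topic map itself rather than written by hand. The sketch below assumes a hypothetical `Topic` model in which each association has already been turned into a (phrase, related topic) pair; how a real portal maps association and role types to phrases such as "by" or "in" is not described in the post, so that mapping is an assumption here. The topic names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical model: a topic's name, its type, and (phrase, related topic) pairs
    derived from its associations, e.g. ("by", "Atle Næss") for an authorship association."""
    name: str
    topic_type: str
    relations: list[tuple[str, str]] = field(default_factory=list)


def describe(topic: Topic, max_relations: int = 1) -> str:
    """Build a short hit description from the topic type plus a few associated topics."""
    parts = [topic.topic_type]
    for phrase, other in topic.relations[:max_relations]:
        parts.append(f"{phrase} {other}")
    return " ".join(parts)


print(describe(Topic("Ibsen", "book", [("by", "Atle Næss")])))      # book by Atle Næss
print(describe(Topic("Ibsen museum", "museum", [("in", "Oslo")])))  # museum in Oslo
print(describe(Topic("Henrik Ibsen", "author")))                    # author
```

Capping the number of relations with `max_relations` is one way to keep the description short enough not to eat into the results page.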
Svein Arild Myrer - 2007-08-21 09:43:01
How do you measure "the quality of the relevance ranking", and what do you consider good relevance? In the topic map portals I've been involved in, we have attached weights to both topic types and occurrences for ranking the search results. For example, a 'person'-typed topic would be ranked higher in the search results than an 'article'-typed topic, as we would normally consider a 'person'-typed topic to be of higher importance. So far this has proven to be an OK solution.
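The weighting described in this comment could look roughly like the sketch below. The weight values, the occurrence type names, and the idea of a per-occurrence raw match score are all assumptions for illustration; the comment does not say how the weights are combined, so this simply multiplies them in.

```python
# Hypothetical weights; real values would have to be tuned per portal.
TYPE_WEIGHTS = {"person": 2.0, "article": 1.0}
OCCURRENCE_WEIGHTS = {"name": 3.0, "description": 1.0, "body": 0.5}


def weighted_score(topic_type, matches):
    """matches maps the occurrence type that matched the query to a raw text-match score.
    Each match is weighted by its occurrence type, then the total by the topic type."""
    text = sum(OCCURRENCE_WEIGHTS.get(occ, 1.0) * s for occ, s in matches.items())
    return text * TYPE_WEIGHTS.get(topic_type, 1.0)


def rank(hits):
    """hits: list of (name, topic_type, matches); returns names, best first."""
    ordered = sorted(hits, key=lambda h: weighted_score(h[1], h[2]), reverse=True)
    return [name for name, _, _ in ordered]


hits = [
    ("Article mentioning Hansen", "article", {"body": 1.0, "description": 0.4}),
    ("Ola Hansen", "person", {"name": 0.3}),
]
print(rank(hits))  # ['Ola Hansen', 'Article mentioning Hansen']
```

Here the article has the stronger raw text match, but the person topic still ends up first because both its topic type and the matched occurrence (the name) carry higher weights.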
Lars Marius - 2007-08-21 15:00:21
The "measurement" of ranking quality was done very unscientifically, by trying out a couple of searches for things I knew were in the topic maps and seeing how the presumed best hits were ranked.
Typically, if I search for part of a person's name, and an article with no obvious relevance to the name is ranked before people with that name, I consider that poor ranking. And so on.
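The informal check described here can be made a little more repeatable by writing down each test query together with its presumed best hit and recording where that hit ends up. The sketch below does exactly that; the `search` callable and the fake result lists are hypothetical stand-ins for a real portal's search engine. Summarising the ranks as a mean reciprocal rank is a standard retrieval measure added here for illustration, not something the post describes.

```python
def rank_of(expected, results):
    """1-based position of the presumed best hit in a result list, or None if absent."""
    return results.index(expected) + 1 if expected in results else None


def check_ranking(search, cases):
    """search: callable returning a ranked list of hit names for a query.
    cases: {query: presumed best hit}. Returns {query: rank or None}."""
    return {query: rank_of(best, search(query)) for query, best in cases.items()}


def mean_reciprocal_rank(ranks):
    """Average of 1/rank; a missing best hit contributes 0."""
    ranks = list(ranks)
    if not ranks:
        return 0.0
    return sum(1.0 / r for r in ranks if r) / len(ranks)


# Hypothetical stand-in for a portal's search engine.
FAKE_RESULTS = {
    "hansen": ["Article mentioning Hansen", "Ola Hansen"],
    "bryggen": ["Bryggen", "Bryggen Museum"],
}
report = check_ranking(lambda q: FAKE_RESULTS.get(q, []),
                       {"hansen": "Ola Hansen", "bryggen": "Bryggen"})
print(report)                                 # {'hansen': 2, 'bryggen': 1}
print(mean_reciprocal_rank(report.values()))  # 0.75
```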
None of the portals I've worked on so far do the kind of score weighting you mention, but I agree that it's a good idea. I've been pushing for it for a while in various contexts, and it looks like Bergen is implementing it now. I want to go further and offer it as part of the OKS product out of the box, but that will take a while yet.
Svein Ølnes - 2007-08-24 17:32:58
This is indeed an interesting topic. I think the reason Topic Maps-based portals generally have better search is obvious: the search leans on an underlying semantic structure. Or at least it should.
Most CMSs have poor search facilities because they have weak or no support for semantic structures. It is also very common to find that the search application knows nothing about the site's structure; it behaves as if it were in a vacuum. I also think many information architects underestimate the importance of building the search function on the site's semantic structure. (I am going to give a talk on this at EuroIA 2007 in Barcelona in mid-September together with my colleague Nils Arne..)
The problem with some Topic Maps-based search facilities is the temptation to "show them all we know". The key to good searching is to restrict the features to the most useful ones rather than trying to show all the information in the topic map. I think the most useful feature often boils down to categorisation by topic type.
Your survey is interesting, but the table leads one to think that the more "Y"s the better. I don't think this is the case. For one thing, the quality of the semantic structure, the ontology, is extremely important for getting a good result. When Bergen divides information into services and articles, that is a very poor categorisation. The two are obviously not the same kind of thing, and the search will suffer from this unclear division.
Lars Marius - 2007-08-25 05:21:54
I definitely agree with your comments on CMSs. Many of them don't even support searching PDFs and other files attached to articles, and in general they could do much better than they do. So could the Topic Maps systems, admittedly.
I also agree that more Ys are not necessarily better. I'm skeptical about showing topic types, and I think the "grouping" and "pre-filter" columns are downright negative. In fact, I collected this data precisely because I think these features are negative; I wanted to be able to show future customers this and say: look, these features are not very popular, and that's for a reason.
Looking forward to seeing your EuroIA slides, and, I hope, a blog posting about the talk.