Ontologies, Semantic Data Integration and Semantic Web

Monday, May 10th, 2010

I’ve just been contributing to a long discussion in the Semantic Web group of LinkedIn (check it out in full here – subscription to the group is required).  It reminded me that there are still fundamental differences about what the semantic web is, could be, and is for.

Quite a while ago now I wrote a series of articles on our experience of developing large-scale semantic systems for drug discovery and drug development work. You can find them either on SlideShare or directly as PDFs on this site:

DDT – Ontologies and Drug Discovery (http://slidesha.re/d8GqQV)

DDT – Ontologies and Semantic Data Integration (http://slidesha.re/d0z4j3)

DDW – Ontologies: Networks of Pharmaceutical Knowledge (http://slidesha.re/c7btI5)

To give a flavour of my perspective in the LinkedIn discussion, here is my initial answer:

“Depends whether you are a computer scientist, linguist or philosopher – and therein lies the problem. Semantics is about understanding the meaning of information and, as Rudy says, this is inherently subjective, depending on the context in which the objects are observed and the experience of the observer. And voilà, we’re back to semiotics, ontology and epistemology.

As someone specializing in semantic search, a really big thing to bear in mind here is that most semantic web stuff focuses on the objects themselves, giving them unique identifiers and tags of their attributes. The more useful thing about ‘truly’ semantic applications is when you can start to understand more of the nuance of the relationships between objects in a given context. My standard example is a glass of wine sitting on a table. You could classify this as a clear drinking vessel for most contexts, but in a bar fight it becomes a lethal weapon. Same object, different context, very different experience.

The true goal of semantic applications is to be able to bring a user to a set of data aggregated and integrated from a wide variety of sources and let them find the information that is relevant to them (the right objects in the right context with sufficient trustworthy evidence) and let them visualize and use it in ways that make sense to them.

We do this every day in our heads – a good example is planning a night out on the town, integrating train and bus timetables, cinema listings and restaurant reviews and plotting everything in the context of a timeline and geolocation. We are beginning to get mashups of this type where there are common semantic maps (a literal Google Maps being a good starting point), but we can also do it in chemistry, and in parts of biology, to a reasonable degree now. What we need are good starting points (defined domain nomenclatures – note, not always necessarily ontologies in a CS sense) and tools that let us map these together.

We also need a cultural shift – we have become so Google-bound that we can’t see the results for the tool. Very quick example – drug researchers spend a lot of time looking for info in scientific literature. They use keyword search tools including Google. They seem happy to get back 3 million hits to a simple query like ‘Which compounds cause asthma’ and then to read through a few hundred abstracts to convince themselves of the answer (obviously ignoring the vast bulk of the hits). In fact the answer is about 500, give or take, and a semantic search tool can get you there instantly and (if done well) its answers are very nearly complete. Interestingly, that seems to freak people out a bit, as they seem to think this is less comprehensive than the Google search that gave them 2.999 million wrong or redundant hits. Go figure – back to epistemology… ;-)
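
As a footnote to the wine glass example in the answer above, here is a minimal Python sketch of what "same object, different context" could look like as data. Everything in it is hypothetical and made up for illustration: the `Statement` record, the `classify` helper and the context labels are not taken from any real semantic web toolkit or from the systems described in the articles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One assertion about an object, valid only within a named context."""
    subject: str
    relation: str
    obj: str
    context: str


# Toy data (invented for this sketch): the same wine glass, described in two contexts.
STATEMENTS = [
    Statement("wine_glass", "is_a", "drinking_vessel", "dinner"),
    Statement("wine_glass", "is_a", "weapon", "bar_fight"),
    Statement("wine_glass", "located_on", "table", "dinner"),
    Statement("wine_glass", "located_on", "table", "bar_fight"),
]


def classify(subject, context, statements=STATEMENTS):
    """Return the sorted classes asserted for `subject` within `context`."""
    return sorted(
        s.obj
        for s in statements
        if s.subject == subject and s.relation == "is_a" and s.context == context
    )


if __name__ == "__main__":
    for ctx in ("dinner", "bar_fight"):
        print(ctx, classify("wine_glass", ctx))
```

The point of the sketch is only that the classification hangs off the combination of object and context, not off the object's identifier alone. A real system would need far richer relationships and some notion of evidence for each statement.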