Quest for more meaning online
By RICHARD POYNDER
6th May 2002
The World Wide Web has revolutionised the way in which information can be published and exchanged, but many argue that it now needs upgrading.
Last year, Tim Berners-Lee, inventor of the web and director of the main web standards organisation, the World Wide Web Consortium (W3C), outlined his vision for what he calls the "semantic web" in an article in Scientific American. He envisages a web in which information is much easier to find and computers do much of the hard work for us.
Though the article attracted a lot of attention, it was widely misunderstood, and Mr Berners-Lee and the W3C have since spent a lot of time clarifying what they mean. For example, contrary to popular belief, the semantic web will not replace today's web, says Eric Miller of the W3C, but will be layered over it.
"It is simply an extension of the current web based on a series of enabling technologies designed to create formal descriptions of information to allow machines to take advantage of it."
By making web information meaningful to computers, the semantic web would allow far greater automation. Today, for instance, the HTML tags, or mark-up, used in web pages dictate how information will be displayed (a headline here, italics there) but give no clue as to its meaning.
"What we now need is mark-up that can talk about what the content is, what it means, what it is about, what it can do," says Ian Horrocks, a senior lecturer in computer science at Manchester University.
The first building block of the semantic web is XML, or Extensible Mark-up Language. Invisible to the human viewer, XML tags can be used to describe how the information on a page is structured, allowing visiting computers to read and act on it without human intervention.
"I can either go to an airline's website and find out the time of a plane and book a ticket ... or I can have a software program do it," says Simon Witts, a Microsoft vice-president for Europe. "Instead of me personally visiting the site and reading the information through my browser, I can have a program go and read the XML."
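The difference can be sketched in a few lines of Python. The timetable vocabulary below (`<timetable>`, `<flight>`, `<departs>`) and the flight number are invented for illustration rather than taken from any airline. The point is that a program can find the departure time by asking for the element that means "departure time", something the display-oriented HTML tags of the same page would not allow.

```python
import xml.etree.ElementTree as ET

# With HTML, the same data might read:
#   <p><b>XY123</b> leaves at <b>08:40</b></p>
# The tags say "bold", not "flight number" or "departure time".

# Hypothetical XML vocabulary in which each tag names what its content is.
XML_PAGE = """<timetable>
  <flight number="XY123">
    <origin>Airport A</origin>
    <destination>Airport B</destination>
    <departs>08:40</departs>
  </flight>
</timetable>"""

def departure_time(xml_text, flight_number):
    """Return the departure time for a flight, or None if it is not listed."""
    root = ET.fromstring(xml_text)
    for flight in root.iter("flight"):
        if flight.get("number") == flight_number:
            # findtext returns the text of the first <departs> child.
            return flight.findtext("departs")
    return None

print(departure_time(XML_PAGE, "XY123"))  # prints 08:40
```

A real booking program would also have to know which vocabulary the airline uses, which is the difficulty taken up in the next paragraph.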
However, there are problems with this approach. Unlike HTML, which is a single, predefined language, XML is a metalanguage: a language for describing other languages. While this allows anyone to design customised mark-up languages for endless different types of documents, it means visiting computers need to be familiar with the specific XML language before they can interpret it.
To deal with this, XML-encoded pages can point to an XML "schema" located elsewhere on the web. "The idea is that there will be an XML tag that tells you the name of the schema," says Simon Phipps, chief technology evangelist at Sun Microsystems. "So if my computer did not understand the format of your data, it could refer to the schema the tag pointed to for an explanation."
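In the W3C's XML Schema recommendation this pointer takes the form of an attribute rather than a separate tag: `xsi:noNamespaceSchemaLocation` (or `xsi:schemaLocation` for documents that use namespaces). The sketch below uses Python's standard library only to read that pointer. The document and schema URL are placeholders, and fetching the schema and validating against it would need a schema-aware library that is not shown here.

```python
import xml.etree.ElementTree as ET

# Namespace the XML Schema recommendation reserves for instance attributes.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical document; the schema URL is a placeholder, not a real schema.
DOC = """<timetable xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:noNamespaceSchemaLocation="http://example.com/schemas/timetable.xsd">
  <flight number="XY123"><departs>08:40</departs></flight>
</timetable>"""

def schema_location(xml_text):
    """Return the URL of the schema a document points to, or None."""
    root = ET.fromstring(xml_text)
    # ElementTree expands prefixed attribute names to {namespace}localname.
    return root.get("{%s}noNamespaceSchemaLocation" % XSI)

print(schema_location(DOC))  # prints http://example.com/schemas/timetable.xsd
```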
The concept of linking to other information to provide contextual meaning lies at the heart of another building block of the semantic web: the Resource Description Framework, or RDF. Using the subject, verb, object structure of the simple sentence, RDF can link different pages, concepts and assertions in order to create relationships between - and make statements about - people, things and properties.
"RDF allows us to formalise relationships and begin to contextualise web resources as more than just links," Mr Miller says. "We can say, for instance, that A is dependent on B, and define relationships not only between documents but [also] between documents and people, people and organisations, people and places."
Using RDF, for instance, it becomes possible to state in a machine-readable way that the Financial Times Group is a subsidiary of Pearson; or that a particular journalist is the author of a specific article. However, RDF still only takes us so far. "While you can specify, say, a relationship of ownership, understanding that relationship implies knowing what ownership means," says Mr Horrocks.
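A minimal sketch of the idea in Python, assuming plain strings in place of the web addresses (URIs) that real RDF uses to name things, and a hand-rolled set of tuples rather than an RDF library. The predicate names are invented. Each statement is one subject-predicate-object triple, and a query is simply a pattern with blanks.

```python
# Each statement is a (subject, predicate, object) triple, mirroring the
# subject-verb-object sentence. Names are illustrative, not real RDF URIs.
TRIPLES = {
    ("FinancialTimesGroup", "isSubsidiaryOf", "Pearson"),
    ("Journalist_A", "isAuthorOf", "Article_42"),
    ("Article_42", "dependsOn", "Article_17"),
}

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    )

print(match(TRIPLES, predicate="isSubsidiaryOf"))
# prints [('FinancialTimesGroup', 'isSubsidiaryOf', 'Pearson')]
```

The program can retrieve the subsidiary statement, but nothing in the triple tells it what `isSubsidiaryOf` means, which is the limitation Mr Horrocks describes.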
To provide greater depth of meaning, therefore, ideas from artificial intelligence are also being co-opted, notably the concept of ontologies. An ontology will generally include a taxonomy, or ordered classification system, defining classes of things and their relationships, together with rules governing how those relationships can be used.
"An ontology will not only talk about, say, footballers - but about different kinds of sport and their relationship to each other," says Mr Horrocks. "It will tell you, for example, that a footballer is a person who participates in the game of football and that football is a game played with a ball and 11 players on each side."
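The footballer example can be sketched as a tiny ontology in Python. The class names, the `participatesIn` relationship and the single rule are hypothetical, chosen to mirror Mr Horrocks' description; real ontology languages are far richer than a dictionary of parent classes.

```python
# Taxonomy: each class points to its parent class (hypothetical names).
SUBCLASS_OF = {
    "Footballer": "Person",
    "Football": "BallGame",
    "Rugby": "BallGame",
    "BallGame": "Sport",
}

# Property values attached to classes, following Mr Horrocks' description.
PROPERTIES = {
    ("Footballer", "participatesIn"): "Football",
    ("Football", "playersPerSide"): 11,
}

def ancestors(cls):
    """Return every class above cls in the taxonomy, nearest first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain

def is_a(cls, other):
    """True if cls is other or one of its subclasses."""
    return cls == other or other in ancestors(cls)

def related(cls_a, cls_b):
    """True if two classes share a parent somewhere in the taxonomy."""
    return bool(set(ancestors(cls_a)) & set(ancestors(cls_b)))

def valid_participation(player_cls, game_cls):
    """Rule: participatesIn may only link a kind of Person to a kind of Sport."""
    return is_a(player_cls, "Person") and is_a(game_cls, "Sport")

print(ancestors("Football"))  # prints ['BallGame', 'Sport']
print(valid_participation("Footballer", PROPERTIES[("Footballer", "participatesIn")]))  # prints True
```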
Vital to the concept of the semantic web is the use of software agents. These will be programmed - or learn how - to seek out relevant ontologies to aid their understanding. The aim is to populate the web with "dictionaries of meaning" that agents can refer to as they traverse cyberspace.
Finally, some form of "reasoning engine" will be needed to interrogate these ontologies intelligently. This function could, for instance, sit within a dedicated semantic search engine that agents would consult.
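What such a reasoning engine does can be illustrated with a toy forward-chaining loop, assuming facts stored as triples and just two rules modelled loosely on RDF Schema's treatment of subclasses: subclass links chain together, and anything belonging to a class also belongs to that class's parents. The engine applies the rules until no new facts appear. The facts and names are illustrative only.

```python
# Hypothetical starting facts, stored as (subject, predicate, object) triples.
FACTS = [
    ("Football", "subClassOf", "BallGame"),
    ("BallGame", "subClassOf", "Sport"),
    ("match_1", "type", "Football"),
]

def infer(facts):
    """Apply two subclass rules repeatedly until no new triples appear."""
    facts = set(facts)
    while True:
        new = set()
        for (a, p1, b) in facts:
            for (c, p2, d) in facts:
                if b != c:
                    continue
                # Rule 1: subClassOf is transitive.
                if p1 == p2 == "subClassOf":
                    new.add((a, "subClassOf", d))
                # Rule 2: a member of a class is a member of its superclasses.
                if p1 == "type" and p2 == "subClassOf":
                    new.add((a, "type", d))
        if new <= facts:  # nothing new was derived: a fixed point
            return facts
        facts |= new

derived = sorted(infer(FACTS) - set(FACTS))
print(derived)
# prints [('Football', 'subClassOf', 'Sport'), ('match_1', 'type', 'BallGame'), ('match_1', 'type', 'Sport')]
```

Practical reasoners use far more efficient algorithms and much richer rule sets; the loop here only shows the principle of deriving facts that were never stated.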
The semantic web may also enable computers to infer information. "Suppose, for example, a software agent was instructed to search for a Georgian coffee table," says Mr Horrocks. "It could access an ontology specialising in antique furniture and establish that Georgian furniture is furniture made in the UK between 1714 and 1830. If it then came across an advertisement for a table manufactured in the UK in 1805, it would be able to infer that the table matched what it was looking for - even if the advertisement did not describe it as Georgian."
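The Georgian-table inference can be sketched in Python as follows. The ontology entry encodes only the definition quoted above (made in the UK between 1714 and 1830); the advertisement records and their field names are hypothetical. The agent never looks for the word "Georgian" in an advert; it derives the classification from where and when the piece was made.

```python
from dataclasses import dataclass

# Ontology entry as quoted by Mr Horrocks: Georgian furniture is furniture
# made in the UK between 1714 and 1830 (inclusive bounds assumed here).
GEORGIAN = {"country": "UK", "from_year": 1714, "to_year": 1830}

@dataclass
class Advert:
    """Hypothetical advertisement record an agent might encounter."""
    description: str
    country: str
    year: int

def is_georgian(ad, definition=GEORGIAN):
    """Infer the style from country and year of manufacture, not from a label."""
    return (ad.country == definition["country"]
            and definition["from_year"] <= ad.year <= definition["to_year"])

ads = [
    Advert("mahogany table", "UK", 1805),    # never described as Georgian
    Advert("oak table", "UK", 1850),
    Advert("walnut table", "France", 1790),
]
matches = [ad.description for ad in ads if is_georgian(ad)]
print(matches)  # prints ['mahogany table']
```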
Is the semantic web achievable? Certainly there are sceptics. One practical consideration is that semantically enabling the millions of web pages already out there would require a massive retagging effort, since few currently contain XML tags.
For this reason, suggests Alexander Linden, a research director at Gartner, early implementation will probably focus on e-commerce applications such as procurement. "Today's procurers have a hard time finding the products they need. The semantic web will be good at defining product specifications." (See box.)
In the long term, however, Mr Berners-Lee expects the semantic web to prove truly revolutionary. In his book* he predicts that software agents will eventually take on all the low-level work of discovering and exchanging information, freeing humans to be more creative. The result, he believes, will be a rapid increase in the pace of human learning.
*Weaving the Web, Tim Berners-Lee, Texere Publishing, 2000.