Let's put that in the context of the Semantic Technology Value Chain. This launch is a perfect illustration of a company (Reuters, which is behind OpenCalais) seeking to deliver value to the user through a Semantic Application (in this case, one that will enable Semantic Pages) so that it can leverage all those user interactions to grow and improve its Semantic Metadata Store. Smart.
So does the Semantic Application do the job?
I've used the beta demo and it's already quite good. It mostly appears to work as a "reverse Google": you input a page and it outputs the relevant keywords it found in that page. But it goes one step further, by organizing those keywords into categories and displaying a "relevance" factor (which deserves a fuller description, I must say). With URIs on top, as OpenCalais announces, it's going to rock. The value of recognizing people, companies, places, etc. and making those "URIzable" is immense, as it allows plenty of automatic linking across web resources.
The main audience for now appears to be programmers. OpenCalais likely hopes that some third-party programmer is going to use its APIs to offer useful applications for webmasters, and that things trickle down from there. You could think of tools using Semantic Proxy to enrich blog links and content; of tools to mash up information once the URIzing function is up; or of a tool to convert the entire web into a semantic graph. Thinking "blue ocean" a little, that last one could take existing websites and save semantic versions of those websites, possibly at a URL such as "websitename.sw" for a site entitled "websitename.com", where sw stands for "semantic web". Wow.
Unreasonably taking my review of Semantic Proxy one step further, one thing I'll look forward to understanding better is how much of the current extraction is actually "semantic" as opposed to "linked data". There is a tendency to use both terms interchangeably when really they are different. Linked data based on poor extraction methods would be poor semantic data, linked or not linked.
OpenCalais seems to nicely recognize "micro-"entities like names, locations and dates, but it still "appears" to use a statistical engine that picks keywords in the text rather than getting down to semantics and using that understanding to propose relevant entities (please right my wrong if I'm mistaken). My question for Thomas Tague: how much farther do you anticipate OpenCalais/Semantic Proxy being able to go beyond those recognizable entities and recognize higher-level concepts, i.e. what the text is about, even if it is not stated in black and white?
And beyond that, to what extent is the output really "RDFized"? Let me explain: the site mentions the output is RDF, but how much does it actually make use of the power of triples by including smart predicates? Are these predicates currently mainly the verb "BE", as in "Paris Hilton IS a person", "Paris IS a city", "1945 IS a date", etc.?
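To make that question concrete, here is a minimal sketch in plain Python of how one might measure it. Everything in it is an assumption made for illustration: the `ex:` names, the `ex:operates` predicate and the sample triples are hypothetical and do not come from actual OpenCalais output. The idea is simply to count how many triples only say "X IS a Y" (the `rdf:type` predicate in RDF terms) versus how many carry a richer relationship.

```python
from collections import Counter

# Standard RDF way of saying "X IS a Y".
TYPE_PREDICATE = "rdf:type"

# Hypothetical triples as (subject, predicate, object).
# These are illustrative only, NOT real OpenCalais output.
triples = [
    ("ex:ParisHilton", "rdf:type", "ex:Person"),
    ("ex:Paris", "rdf:type", "ex:City"),
    ("ex:1945", "rdf:type", "ex:Date"),
    ("ex:Reuters", "rdf:type", "ex:Company"),
    ("ex:Reuters", "ex:operates", "ex:OpenCalais"),  # a "smart" predicate
]


def predicate_profile(triples):
    """Count how often each predicate appears in the triples."""
    return Counter(predicate for _, predicate, _ in triples)


def type_only_share(triples):
    """Return the fraction of triples that merely state 'X IS a Y'."""
    if not triples:
        return 0.0
    return predicate_profile(triples)[TYPE_PREDICATE] / len(triples)


if __name__ == "__main__":
    print(predicate_profile(triples))
    print(type_only_share(triples))
```

A share close to 1 would suggest the output is mostly entity typing; the lower it goes, the more the triples describe how entities relate to each other.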
What I'm trying to assess is the degree of both built-in and produced semantics. As a disclaimer, I haven't dug into OpenCalais's technical guides and FAQs in a long time, so feel free to point us to them. From a pure marketing standpoint, adding some quick popularization information to the demo wouldn't hurt and would certainly expand the audience. I know and agree with Thomas Tague's mantra "if you have to explain it, I don't want it", but in this case a little explanation will make the rest of us want it even more.
All in all, this could well be one small step for Reuters, one giant leap for the semantic web.
