Thomas Tague of Calais dropped by and answered the question I was asking in my last post, regarding the use of sophisticated predicates (i.e. "verb-type relationships") in Calais / Semantic Proxy. Beyond simple name-dropping (yes, Tom reads my blog! Well, actually, I asked him the question directly...:), the reason I post about it is that his comment is informative and insightful enough that I think anyone interested in the semantic web, RDF and their future should read it.
As some parts of it may feel a little techie to newcomers to the SemWeb world (that's why even Calais could use marketers ;), let me give my quick take-aways (I welcome any correction to my interpretation):
- Tom reports almost 1.5M transactions per day and climbing. Impressive! I suppose that by transactions he means hits. I wonder how many RDF triples are created per day?
- The entity extraction engine in Calais is based on a set of complex technologies centred on Natural Language Processing (i.e. pattern recognition and rules), lexicons and some statistical analysis. It's not purely statistical, contrary to what I was thinking. This brings up the question of how it compares to Google's engine. My assumption is that it's more processor-intensive but also more powerful. In other words, it's semantic! Admittedly, Google has been reported to use some of the technologies mentioned by Tom as well. Do they, really? Could Google launch a Calais?
- The app creates some smarter predicates than just "IS" (as in "this IS that"), which seem to be derived from entity types such as Facts and Events. It does still include a lot of "IS" predicates for simple one-to-one relationships, and I would welcome more examples of the smart predicates Calais creates.
- This said, creating smart predicates does not appear to be something Calais does a lot of, for now. Quoting Tom: "We don’t dip into the global linked data brain or Dbpedia or other assets to find and deliver more information about what we’ve extracted. If we tell you someone is a “Person” - we don’t tell you that people are mammals. As far as I’m concerned – that’s where linked data and large scale “describe the world” ontologies come in. So – in summary. Entity recognition (the relatively easy part of what we do) is always about “IS A” type relationships. The harder (and cooler in the long run) stuff is much more sophisticated."
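
To make the difference between "IS A" predicates and the smarter, verb-type ones a bit more concrete, here is a minimal sketch in plain Python. Everything in it is my own assumption for illustration: the sentence, the company names, the predicate labels and the `Triple` type are all made up, and none of it is actual Calais output or the Calais API.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A bare-bones subject / predicate / object statement."""
    subject: str
    predicate: str
    obj: str


# Hypothetical extraction result for the invented sentence
# "Acme Corp acquired Widget Inc." (not real Calais output).
triples = [
    Triple("Acme Corp", "is_a", "Company"),         # simple "IS A" typing
    Triple("Widget Inc", "is_a", "Company"),        # simple "IS A" typing
    Triple("Acme Corp", "acquired", "Widget Inc"),  # verb-type relationship
]


def split_by_kind(triples):
    """Separate plain type assertions from richer relationships."""
    type_assertions = [t for t in triples if t.predicate == "is_a"]
    relationships = [t for t in triples if t.predicate != "is_a"]
    return type_assertions, relationships


type_assertions, relationships = split_by_kind(triples)
```

In this toy example, entity recognition alone would yield only the two `is_a` statements; the `acquired` statement is the kind of thing I understand Tom to mean by the harder, more sophisticated stuff.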
In the end, this provides a mixed picture of the present use of RDF, but some line of sight into how the potential of RDF could be leveraged.
On the one hand, Calais does rely a lot on "IS" and probably a limited number of other simple relationships / predicates. RDF seems like a bit of overkill for that.
On the other hand, it does report creating some smart predicates, and it looks like one could use the same technologies, i.e. Natural Language Processing and some statistical analysis, to convert lots of data into smart RDF triples with sophisticated predicates. Undoubtedly, there are other semantic extraction technologies on their way (from the research labs or B2B applications), which will make that even easier.
I know the next question on many readers' minds will be: "Cool, so what? What does RDF do for me?" and that's also one I want to explore further. But for now, I feel quite reassured that RDF may well have the potential to provide a solid foundation enabling machines to process data in very interesting ways.
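As a hedged illustration of what "processing data in interesting ways" might look like, here is a small sketch of pattern matching over triples, again in plain Python. The data, the predicate names and the `match` helper are hypothetical and only meant to show the idea; a real system would more likely use an RDF store and a query language such as SPARQL.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Invented data for illustration only.
triples = [
    ("Acme Corp", "is_a", "Company"),
    ("Widget Inc", "is_a", "Company"),
    ("Jane Doe", "is_a", "Person"),
    ("Acme Corp", "acquired", "Widget Inc"),
    ("Jane Doe", "works_for", "Acme Corp"),
]


def acquirers_employing_people(triples):
    """Find companies that acquired something and employ at least one Person."""
    acquirers = {s for (s, _, _) in match(triples, predicate="acquired")}
    people = {s for (s, _, _) in match(triples, predicate="is_a", obj="Person")}
    return {
        o
        for (s, _, o) in match(triples, predicate="works_for")
        if s in people and o in acquirers
    }


result = acquirers_employing_people(triples)
```

The point of the sketch is that a question like this needs the verb-type predicates (`acquired`, `works_for`); with "IS A" statements alone, the most a machine could do is list the companies and the people.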
That's not to say I wouldn't like to hear from Nova Spivack on my second question: does Twine really use RDF and store all its tags as triples?
