The Semantic Web Gang gathered this month to discuss the recent launch of Wolfram Alpha and the endorsement of RDFa by Google.
My impression of Wolfram, to talk about it for a second, is that it fills a clear white space in the search engine arena, an arena I would divide into two sub-fields:
- FIND: when you seek a specific, well-defined piece of information, you're in FIND mode. IMHO, that's a task in which Google's supremacy is fast eroding. If I seek a precise answer to a question, say the names of the different provinces in India or all the movies in which Sharon Stone played (not that I'd ever look for that), I tend to rely less and less on the search engine gorilla. I either go directly to Wikipedia (although it's a little like Google in that it often serves me 'too much information'), use vertical databases (such as IMDb for movies), or land directly on more targeted search engines such as Powerset or, now, Wolfram, which impressed me.
Granted, I sometimes still use Google to access Wikipedia. But the point is, Google is not my exclusive entry point to the web in that scenario. So Wolfram may well have found a key weakness to exploit, as the statistical approach *may* not be ideally suited to this task. Will Wolfram steal a significant volume of clicks from Google? I don't know; a lot of that comes down to execution, but there is no denying it found a crack in the shiny armor.
On top of that, the trick of using incoming links as a key determinant of search result relevancy is destined to be supplanted by approaches that let machines interpret the content of the pages themselves (read: semantics), and ultimately use that as the primary relevancy driver. Remember, incoming link data is just a proxy for popularity, and like any intermediary, it is destined to be cut out of the web search food chain sooner or later. Popularity is useful as a driver when I want to see what others read. It's not very useful when I seek a specific answer.
Once a machine is able to figure out with high accuracy what a page is about, and furthermore what each piece of data in that page is about (a problem services like OpenCalais and Twine are working hard to crack), it becomes much more precise at serving me what I am looking for, and only that. Given that I currently still have to go through the 95% of information I don't need to find the 5% I do, I say such an improvement is more than welcome, and Google had better watch out for better mousetraps.
- DISCOVER: the other use of search engines is general surfing on a more or less well-defined subject, to DISCOVER interesting content. Google still dominates that activity in my day-to-day use (alongside Wikipedia, I'd say). It's pretty good at showing me things I didn't think of, from a variety of sources, and letting me explore as I wish.
Say I'm looking to learn about ocean navigation tools under the Roman Empire: Google would be my first stop. There are many competing services for that task, but nothing yet replaces the one-stop shop that Google constitutes for that activity. Whatever I want to learn more about, I pretty much know I'll get interesting pages from Google, ranked with an algorithm I am ready to trust.
Over time, I could see this Google activity being threatened too. But the popularity-based approach strikes me as something that's likely to endure, and statistics are pretty good at determining popularity. Better than semantics, that is.
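To make the contrast between the two relevancy drivers a bit more concrete, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: the four pages, their links, the query, and the two scoring functions. Counting inbound links is only a crude stand-in for link-based popularity, and word overlap is a very poor man's substitute for real semantic interpretation; neither is how Google or Wolfram actually works.

```python
from collections import Counter

# Hypothetical toy corpus: page name -> (page text, pages it links to).
PAGES = {
    "stone_filmography": ("complete list of sharon stone movies", []),
    "celebrity_gossip": ("celebrity gossip and photos of sharon stone", []),
    "movie_news": ("latest movies news and reviews", ["celebrity_gossip"]),
    "fan_blog": ("my favorite actors", ["celebrity_gossip", "movie_news"]),
}


def inbound_link_counts(pages):
    """Count how many pages link to each page (a crude popularity proxy)."""
    counts = Counter({name: 0 for name in pages})
    for _text, links in pages.values():
        for target in links:
            counts[target] += 1
    return counts


def content_score(query, text):
    """Fraction of query words found in the page text (a crude content match)."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)


def rank_by_popularity(pages):
    """Order pages by inbound links, most linked first; the query is ignored."""
    counts = inbound_link_counts(pages)
    return sorted(pages, key=lambda name: (-counts[name], name))


def rank_by_content(query, pages):
    """Order pages by how well their text matches the query."""
    return sorted(
        pages,
        key=lambda name: (-content_score(query, pages[name][0]), name),
    )


if __name__ == "__main__":
    query = "sharon stone movies"
    print("popularity:", rank_by_popularity(PAGES))
    print("content:   ", rank_by_content(query, PAGES))
```

In this made-up corpus, the popularity ranking puts the much-linked gossip page on top and the filmography last, while the content ranking puts the filmography first. That is the FIND problem in miniature: the page everybody links to is not necessarily the one holding my answer.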
Ultimately, statistics and semantics won't be used in isolation from one another, and the two activities above are not black and white either: there is a lot of grey area, with a mix of FIND and DISCOVER. But overall, I bet we could divide a good chunk of searches between those two activities. So the logical follow-up question is: how do our searches divide up between them, volume-wise? I don't know. If anyone has information on that, please share. In the meantime, I'll be watching my own searches...
All this to say I'll be listening with interest to the Semantic Web Gang's review of Wolfram Alpha and Google’s adoption of RDFa, which I wasn't a part of this month due to a prior client commitment following the exciting Web 3.0 conference (which I'll discuss later). The Semantic Web Gang's recording can be found here.
