In my previous post, I argued that two things can help bring to life a truly semantic web: the first one is the Linked Data medium. One person commented that Linked Data is not just a medium, but creates meaning. I see the point, if you assert that meaning is created through data transformation, as one can process RDF triples (the Linked Data format) through SPARQL (the query language, like SQL for databases) and also create new associations of triples linked through common URIs (universal concept addresses, which I described in the previous post). Depending on how you define meaning, you could characterize that as meaning-creating. Or not.
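To make the mechanics concrete, here is a minimal sketch in plain Python of what "triples linked through common URIs" amounts to. It is not RDF tooling or SPARQL itself: the `http://example.org/` namespace, the predicate names and the `match` helper are all assumptions made up for illustration. Two independent sources are merged, and a pattern query can follow the shared URI from one to the other.

```python
# Hypothetical namespace and predicates, for illustration only.
EX = "http://example.org/"

# Two independent sources, each a set of (subject, predicate, object) triples.
source_a = {
    (EX + "TimBL", EX + "name", "Tim Berners-Lee"),
    (EX + "TimBL", EX + "invented", EX + "WWW"),
}
source_b = {
    (EX + "WWW", EX + "label", "World Wide Web"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def inventions_with_labels(graph):
    """Join name, invention and label, roughly what a small SPARQL query would do."""
    results = []
    for person, _, thing in match(graph, p=EX + "invented"):
        for _, _, name in match(graph, s=person, p=EX + "name"):
            for _, _, label in match(graph, s=thing, p=EX + "label"):
                results.append((name, label))
    return sorted(results)


# Merging is just a set union: the shared URI for "WWW" is what creates the link.
merged = source_a | source_b

print(inventions_with_labels(source_a))  # nothing to join against yet
print(inventions_with_labels(merged))    # the shared URI connects both sources
```

Note that the query itself still had to be written by someone, and that it only works because both sources used exactly the same URI for the same concept.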
Let me specify my thoughts further. As I see it, the biggest hurdle in enabling the semantic web right now is in creating “clean” triples, and the right links to the right URIs, from unstructured data. That type of data transformation from unstructured to structured really is where 80% of the meaning (to pick a number that sounds right – let me bet that too will get me interesting tweets) is added. That’s why, in most cases, it’s still best done by humans. Because it’s tough. And it’s tough because it adds lots of value to the original data.
I see how further data transformation of the type enabled by Linked Data can add extra value, by allowing the processing and linking of data across the web. But technically, that is (1) not adding "as much" meaning, in the sense that most of the meaning created comes from having the right triples and linkages in the first place (if the data is poorly structured or poorly linked, Linked Data will just turn garbage into more garbage). And (2) most of the meaning added on top of that is derived from creating the right filters, using SPARQL for instance, and SPARQL still needs to be programmed, which requires humans or other extraction algorithms, something that by any definition is not Linked Data. Linked Data just gives us better tools. Just as a hammer does not by itself assemble a bookcase, Linked Data does not create meaning. It just makes it easier for the same technologies we already use to create it: usually, human programming and inputs, and text analysis algorithms, based on taxonomies, natural language processing, statistical methods, and other approaches.
In other words, most of the meaning is created by structuring unstructured data, and the rest is created by programming the right algorithms to process and filter data. None of that is Linked Data. If you still think Linked Data does help create some meaning, I won’t disagree, since it’s probably related to your definition of meaning creation differing from mine. But I maintain that the main contribution of Linked Data lies in encouraging us and making it easier to add meaning by opening up the data and linking it across, and then enabling processing on all those granular bits. That’s why I tagged it with “Open” in my graphic representation, and I tagged Data Structuring with “Smart”.
One more thing on Linked Data. I reserve my judgment as to whether it will and ought to become the dominant medium to carry information going forward. While it has made great progress in the past year, I have not seen it being adopted by new start-ups in the semantic space, and at this stage I would have expected it to be. Tom Tague agreed during our last podcast. Ian Davis of Talis also pointed that out in his article Where Are All The RDF-based Semantic Web Apps? I would like to see counterexamples of this, so please fire away if you know of successful start-ups that are leveraging RDF, OWL and URIs. And I mean beyond Linked Data hosting platform plays (such as Talis, for instance), since those are not application plays.
Trying to answer my own question, I came across Ivan Herman’s Use Case and Case Study collection page, which is referenced by some readers, but I couldn’t find any live application, which makes it hard to assess the performance of Linked Data. His presentation on Applications is also interesting, but the examples are mostly not accessible, and those which are, are not always compelling, such as Twine, which actually shied away from using RDF to store data some time ago, I believe. Having said that, there seem to be some concrete examples in the latest issue of Nodalities, e.g. O’Reilly’s use of Linked Data, which I will take a closer look at... soon!
I suspect this relative lack of start-up adoption is due to RDF being quite bulky as an information format, as I discussed in part 2, and thus requiring extensive processing. As such, I am not yet convinced it is destined to become the universal way to model data in a semantic web world. But so far, just like Democracy, it’s also the “least worst” for opening up your data, and many people are working on improving it. There was a seminar recently organized by Franz, a leading player in the space, on Solving Scale and Reasoning in Large RDF Datasets, for instance. As we well know by now, formats can win thanks to network effects, even without being initially the best technological option (at least in the short term, till something ten times better comes along, the rule-of-thumb says). No wonder some of the Linked Data supporters are so adamant about pushing Linked Data as the universal format for the Semantic Web… Did Darwin ever consider network effects as an advantage of one species over another?
So while there is no way to state conclusively whether Linked Data is worth its weight in R&D funding (that's the temporary conclusion of my Cost-Benefit Analysis of Linked Data, for those keeping track), it clearly hinges on its ability to deliver a more granular, more flexible experience. So far that has proved a little elusive, due on the one hand to performance question marks, which I have little doubt can be solved, and on the other to a lack of data sources, which is the tough part of the Semantic Web, and one that Data Structuring technologies will help resolve.
The Surprise Guest: Text Analysis and Other Data Structuring Methods
Let’s say we have the medium, what do we feed it now? This is the real problem behind the Semantic Web and Linked Data. It’s good to have a better way to carry water from a river and distribute it to all the other villages, but if that water is polluted, it doesn’t help that much. As I alluded to earlier: Garbage In, Garbage Out.

By that I do not mean to diminish the huge achievement that Linked Data represents in any way. Certainly, where the data is clean and structured, and it is in many places, it makes sense to have a better technology to link it out. And where it’s not, the availability of such a channel is a new incentive to decontaminate that “water”. In fact, it’s the best incentive we’ve ever had.
But what I am pointing out is that the Linked Data format does not contain that Brita filter to select and clean the “water” it’s carrying: data. Neither does it have a mechanism to automatically sort the water into the right bottle sizes depending on village sizes, weather and other conditions (it offers us the parts to do it, but we still need to put those together in a clever fashion). It does not create smart data, it only enables it. And when the noise-to-signal ratio is undeniably the biggest problem we face today on the web, I think investing more in moving data from unstructured and dumb to structured and smart, in a coordinated fashion just like the effort that has been deployed for Linked Data, would be wise. Without it, the ROI for Linked Data will remain invariably negative, because it will have to rely on existing sources of clean, granular, structured data, which are only a portion of all the information we exchange and create daily. It will be polluted by the other data streams, and likely add noise rather than reduce it.
So, technologies to turn unstructured data into structured data are really where we ought to invest, and focus our efforts. The good thing about Linked Data is that, if it manages to impose itself as a key medium for the semantic web, it will increasingly expose the limitations of our data analysis technologies.
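As a rough illustration of why that structuring step is the hard part, here is a deliberately naive sketch of turning text into triples with a single hand-written pattern. Everything in it is an assumption for the sake of the example: the `http://example.org/` namespace, the `founderOf` predicate and the regular expression are made up, and real text analysis relies on taxonomies, NLP and statistical methods rather than one regex.

```python
import re

# Hypothetical namespace and predicate, for illustration only.
EX = "http://example.org/"

# One hand-written rule: "<Capitalized Name> is the founder of <Capitalized Name>".
PATTERN = re.compile(
    r"([A-Z]\w+(?: [A-Z]\w+)*) is the founder of ([A-Z]\w+(?: [A-Z]\w+)*)"
)


def to_uri(name):
    """Mint a URI from a name by replacing spaces (a simplistic assumption)."""
    return EX + name.replace(" ", "_")


def extract_triples(text):
    """Return (subject, predicate, object) triples for every match of the rule."""
    return [
        (to_uri(subject), EX + "founderOf", to_uri(obj))
        for subject, obj in PATTERN.findall(text)
    ]


text = (
    "Seth Grimes is the founder of Alta Plana. "
    "Alta Plana was founded by Seth Grimes."
)

# Only the first sentence fits the rule; the paraphrase is silently missed.
for triple in extract_triples(text):
    print(triple)
```

The second sentence says the same thing and yields nothing, which is the point: the value, and the difficulty, sits in the extraction rules, not in the triple format that carries the result.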
What’s the endgame for the Semantic Web? I’d propose that it is a web where any information you input is immediately cleaned up, pre-structured and pre-connected to the rest. There is a variant of this vision that would see any information input remaining in its raw format until one needs it, at which point it is structured and connected on the fly, using the perspective of the person who queried to shape the structure and the connections. The problem with this vision obviously is that unless you have scouting agents that can query the whole web instantaneously for every query, and structure and link data on the fly -- and I think we can safely say that’s not going to happen anytime soon -- you need some pre-defined structure and connections so you know what the information is about and where it is. We need to meet those agents half-way. How much you process data ahead of time depends on the information’s intended use, type, importance, and other factors. But ultimately, it needs some level of structuring to be smarter. Linked Data is the medium, a medium that will be fed through Data Structuring, and a medium that will motivate us to invest further in those technologies, and fulfill much more of their potential at last.
If you’re interested in text analysis, I invite you to listen to the Interviews with Innovators podcast by Jon Udell with Seth Grimes, the Founder of a company called Alta Plana, on Business Analytics. Absolutely mind-opening.
The need for more effective data structuring technologies is nothing new, but this time, whoever gets it right and ties it together in a nice application that leverages the open channel created by Linked Data (if it can be made to scale to web proportions) could well be on their way to dominating a new sort of web. The key is to turn unstructured data into triples that make sense. By itself, Linked Data is unlikely to be a source of blockbuster applications, but the newly-created ability to link data in more flexible ways will act as an echo chamber for other technologies, giving them a much larger market and amplifying their success. If you want to be successful, consider mashing up Linked Data with other technologies.
For now, the best is yet to come for Web 3.0, the Semantic Web, Linked Data and Data Structuring technologies: once again, Tim Berners-Lee was ahead of the curve when he said the Semantic Web was open for business. Let’s just say the Semantic Web is open, and business is welcome.