Why is Thomson-Reuters, the multinational behind open-data advocate OpenCalais, suing Zotero over data reuse?
I've just recently become aware of this lawsuit, served to Zotero by Thomson-Reuters (TR), the corporation behind OpenCalais. What I read on the web seems to generally favor Zotero, placing the issue in the context of the Open vs. Proprietary debate.
TR presents Endnote as "the industry standard software tools for publishing and managing bibliographies". It appears to include text-extraction software that extracts quotes and creates metadata on publications. As I understand it, TR is trying to prevent Zotero from reading the Endnote data format. It asserts that Zotero illegally reverse-engineered its proprietary algorithms. Zotero, on the other hand, frames the issue as one of giving users the freedom to convert their Endnote information for use in Zotero. Obviously, what is at stake in this case is the Endnote user base, which TR is trying to retain and Zotero is vying to attract (and it's happening! See this blog for example).
In spite of the obvious financial motivation, I was surprised to learn that TR might want to "impede data interoperability", as it was presented in many write-ups on the issue: that would be more than a little paradoxical, given the push towards Open Data that it is advocating through its OpenCalais service. So I ran my own little enquiry, visiting the sites of both companies and reading what they had to say on the issue.
Of course, I also wondered how Thomas Tague at OpenCalais felt about all that, having often stated that corporations need to open their data, so I asked him. He told me, as expected, that he was not authorized to comment on this.
Ultimately, and without conjecturing on whether reverse-engineering took place here, the question for me came down to this: should one be allowed to reverse-engineer algorithms to force companies to open up their data? In a nutshell, that seems wrong. If the company wants to keep its data closed, it should be able to. In the long run, I'd assume it would open it up and instead focus its efforts on creating a better product, so consumers are retained because they like the product, not because they can't move elsewhere. But even that doesn't matter. The company invested to create a format, it was successful at it, and it should be able to collect the dividends of its efforts, at least for a period of time, just like composers and writers do.
Often, at first sight, the proprietary vs. open question seems like a no-brainer: open it all! But things are not so simple. To drive innovation, investments are necessary. If companies can't profit from the creation of new algorithms, because reverse-engineering them and copying them are allowed, they simply won't invest in that space anymore. Then who will? Open source developers might, but they need to be supported financially too, somehow. How?
Having said that, allow me to expand the conversation to include other open vs. proprietary questions that build on, but do not directly touch upon, the TR vs. Zotero case.
In particular, should data and metadata be open or proprietary?
I see this case as another sign that, as I pointed out before, semantic web companies are going to wrestle for control of both the metadata and the software methods that extract it. They are going to wrestle between themselves, and they are also going to wrestle along the shifting frontier of the proprietary vs. open camps. It already is a strategic issue for the web that is, and it will simply intensify, becoming the key battleground for the web to come.
Although many say that a company can generate more money by opening up its data than it would by keeping it exclusive, it remains to be seen which business models will make that possible. Ultimately, especially in professional markets that are ready to pay (law, tax, medical research, finance, etc.) but don't generate massive web traffic, selling the metadata may be a more profitable business than giving it away and trying to make money indirectly, through ads for instance.
Some might decide to push that one step further, by playing a negative brand awareness game. They would use the metadata created through what ultimately was an R&D effort funded by another company, without giving anything back to that company. They would feel entitled to do so because the originating company would suffer negative brand awareness if it refused. And frankly, they may not think about all this, because reusing information this way has become a prevalent way of operating on the web.
Well, as Zotero just found out, the only data that's free is data that's commoditized. Ultimately, data is never free to create, and very little of the data on the web was created without a monetary end goal, be it direct or indirect (Wikipedia might look like a counterexample, but even then, most of it was created by people trying to establish or maintain a reputation in a space, a reputation that they ultimately convert into more tangible benefits). Open data is data whose ROI is maximized when the data is open. There still is a lot of data whose ROI is maximized when that data is closed. Want to see more open data? Create business models that make it more financially rewarding to open that data up.
Advocating for open standards is one thing, but forcing a proprietary environment to open up is another one entirely. The journal Nature recently commented on the case: “The virtues of interoperability and easy data-sharing among researchers are worth restating.” But so are the virtues of profit-driven innovation.
Most of us desire to move towards more open and democratic standards, but to get there, the best driver would be to create new data monetization mechanisms that further incentivize the creation of quality content.
In an attempt to be constructive, and at the risk of being wrong, I predict that a trailing revenue mechanism is going to emerge that will let authors of content and metadata collect a share of the revenue generated by "remix" and "mashup" services. It will make it easier to reuse bits and pieces of premium content. Currently, the main currency provided for reusing these bits and pieces is brand awareness, which only sometimes turns into dollars and cents for the initial content creator. Although that may be enough for many authors on the web, it pretty much prevents the producers of expensive quality content from opening their data up. Open standards will blossom where they help create proprietary wealth.