The semantic web folks, including Sir Tim Berners-Lee, have been saying for years that the Internet could become significantly more compelling by cooking more intelligence into the way things link around the network.
The movement is finding its legs these days, but the solution doesn’t look quite like what the visionaries expected. It’s starting to look more human.
The more obvious journey toward a linked data world starts with releasing data publicly on the Internet.
Many startups have proven that opening data creates opportunity. And now the trend has turned into a movement within government in the US, the UK and many other countries.
Sir Tim Berners-Lee drove home this message in his 2009 TED talk, where he got the audience to shout “Raw data now!”:
“Before you make a beautiful web site, first give us the unadulterated data. You have no idea the number of excuses people come up with to hang on to their data and not give it to you even though you’ve paid for it as a taxpayer.”
Openness makes you more relevant. It creates opportunity. It’s a way into people’s hearts and minds. It’s empowering. It’s not hard to do. And once it starts happening it becomes apparent that it mustn’t and often can’t stop happening.
The forward-thinking investors and politicians even understand that openness is fuel for new economies in the future.
We recently held a hack day at the Guardian for the Cabinet Office, where the benefits of open data in government took concrete form in a postcode newspaper built together by Tom Taylor, Gavin Bell and Dan Catt:
“It’s a prototype of a service for people moving into a new area. It gathers information about your area, such as local services, environmental information and crime statistics.”
Opening data is making government matter more to people. That’s great, but it’s just the beginning.
After openness, the next step is to work on making data discoverable. The basic unit for creating discoverability for content on a network is the link.
Now, the hyperlink of today simply says, “there’s a thing called X which you can find over there at address Y.”
The linked data idea is basically to put more data in and around links to things in a specific structure that matches our language:
subject -> predicate -> object
This makes a lot of sense. Rather than leaving meaning to be derived, explicit relationship data can eliminate vast amounts of noise around the information that we care about.
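To make the subject -> predicate -> object idea concrete, here is a minimal sketch in Python. It is not RDF itself, just the same three-part shape held in memory. The example.org URLs, the predicate names and the `match` helper are all made up for illustration.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: subject -> predicate -> object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical statements; the URLs and predicates are placeholders.
TRIPLES = [
    Triple("http://example.org/people/alice", "wrote", "http://example.org/posts/1"),
    Triple("http://example.org/people/alice", "knows", "http://example.org/people/bob"),
    Triple("http://example.org/people/bob", "wrote", "http://example.org/posts/2"),
]


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every field that was given.

    A field left as None acts as a wildcard.
    """
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "What did Alice write?" needs no text analysis, only a lookup.
alice_wrote = match(TRIPLES, subject="http://example.org/people/alice", predicate="wrote")
```

Because the relationship is stated rather than inferred, the question is answered by filtering, which is where the reduction in noise comes from.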
However, there are other ways to add meaning into the network, too. We can also create and derive meaning across a network of linked data with short messages, as we’ve seen happening organically via Twitter.
What do we often write when we post to Twitter?
@friend said or saw or did this interesting thing over here http://website.com/blah
The subject is a link to a person. The predicate is the verb connecting the person and the object. And the object is a link to a document on the Internet.
Twitter is already a massive linked data cloud.
It’s not organized and structured like the links in HTML and the semantic triple format RDF. Rather it is verbose connectivity, a human-readable statement pointing to things and loosely defining what the links mean.
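As a rough illustration of reading a tweet as a loose triple, the sketch below pulls the @mention, the free text and the link out of a message shaped like the example above. The regular expression and the `tweet_to_triple` function are my own assumptions rather than anything Twitter provides, and turning the mention into a profile URL is a guess at how you might make the subject a link.

```python
import re
from typing import Optional, Tuple

# Assumed shape: "@user <free text> <url>". Real tweets vary far more than this.
TWEET_PATTERN = re.compile(
    r"^@(?P<user>\w+)\s+(?P<statement>.*?)\s*(?P<url>https?://\S+)\s*$"
)


def tweet_to_triple(text: str) -> Optional[Tuple[str, str, str]]:
    """Return (subject, predicate, object), or None if the tweet does not fit."""
    m = TWEET_PATTERN.match(text)
    if m is None:
        return None
    # Assumption: the subject link is the person's profile URL.
    subject = "https://twitter.com/" + m.group("user")
    # The "predicate" stays as human-readable text; nothing defines it formally.
    return (subject, m.group("statement"), m.group("url"))


triple = tweet_to_triple(
    "@friend said or saw or did this interesting thing over here http://website.com/blah"
)
```

The predicate comes back as whatever the person typed, which is exactly the verbose, loosely defined connectivity described here. A message with no mention or no link simply returns `None`.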
So, now it starts to look like we have some opposing philosophies around linked data. And neither is a good enough answer to Tim Berners-Lee’s vision.
Short messages lack standard ways of explicitly declaring meaning within links. They are often transient ideas that have no links at all. They create a ton of noise. Subjectivity rules. Short messages can’t identify or map to collections of specific data points within a data set. The variety of ways links are expressed is vast and unmanageable.
The semantic web vision seems like a faraway place if it’s dependent on whether or not an individual happens to create a semantic link.
But a structural overhaul isn’t a much better answer. In many ways, RDF means we will have to rewrite the entire web to support the new standard. The standard is complicated. Trillions of links will have to obtain context that they don’t have today. Documents will compete for position within the linked data chain. We will forever be reidentifying meaning in content as language changes and evolves. Big software will be required to create and manage links.
The issue isn’t about one model versus another. As people found with tags and taxonomies, the two are better when both exist together.
But there’s another approach to the linked data problem being pioneered by companies like MetaWeb, which runs an open data service called Freebase, and Zemanta, which analyzes text and recommends related links.
The approach here sits comfortably in the middle and interoperates with the extremes. They focus on being completely clear about what a thing is and then helping to facilitate better links.
For example, Freebase has a single ID for everything. There is one ID and one URL that represents Abraham Lincoln:
http://www.freebase.com/view/en/abraham_lincoln
They know that the Wikipedia, New York Times and Congressional Biography web sites, which are all very authoritative on politicians, each have a single URL representing everything they know about Abraham Lincoln, too.
So, Freebase maintains a database (in addition to the web site that users can see) that links the authoritative Abraham Lincoln pages on the Internet together.
This network of data resources on Abraham Lincoln becomes richer and more powerful than any single resource about Abraham Lincoln. There is some duplication between each, but each resource is also unique. We know facts about his life, books that are written about him, how people were and still are connected to him, etc.
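One way to picture that linking database is as a two-way index between a canonical ID and the pages that describe the same thing. The sketch below is only a guess at the shape, not how Freebase actually stores it. The `IdentityIndex` class is hypothetical, and the example.org addresses are placeholders standing in for the real authoritative pages.

```python
from collections import defaultdict
from typing import Dict, List, Set


class IdentityIndex:
    """Maps one canonical ID to every known page about the same thing."""

    def __init__(self) -> None:
        self._pages: Dict[str, Set[str]] = defaultdict(set)  # canonical ID -> page URLs
        self._canonical: Dict[str, str] = {}                 # page URL -> canonical ID

    def link(self, canonical_id: str, url: str) -> None:
        """Record that `url` is a page about `canonical_id`."""
        self._pages[canonical_id].add(url)
        self._canonical[url] = canonical_id

    def pages_for(self, canonical_id: str) -> List[str]:
        """All pages linked to this ID, in sorted order."""
        return sorted(self._pages.get(canonical_id, ()))

    def same_thing(self, url_a: str, url_b: str) -> bool:
        """True when both URLs are linked to the same canonical ID."""
        a = self._canonical.get(url_a)
        return a is not None and a == self._canonical.get(url_b)


LINCOLN = "http://www.freebase.com/view/en/abraham_lincoln"

index = IdentityIndex()
# Placeholder URLs; the real pages would live on each publisher's own site.
index.link(LINCOLN, "http://example.org/encyclopedia/abraham_lincoln")
index.link(LINCOLN, "http://example.org/newspaper/topics/abraham_lincoln")
index.link(LINCOLN, "http://example.org/congress/biography/lincoln")
```

With an index like this, an app that starts from any one of the pages can reach all the others, which is what makes the combined network richer than any single resource.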
Of course, explicit relationships become more critical when the context of a word with multiple meanings enters the ecosystem. For example, consider Apple, which is a computing company, a record company, a town, and a fruit.
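A small sketch shows why one ID per thing helps here. The label "Apple" is shared, but each meaning gets its own identifier, so a link that carries the ID is never ambiguous. The `ex:` identifiers and the `candidates` and `resolve` helpers are invented for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical IDs: four different things that share one label.
ENTITIES: Dict[str, Dict[str, str]] = {
    "ex:apple_computing_company": {"label": "Apple", "type": "computing company"},
    "ex:apple_record_company": {"label": "Apple", "type": "record company"},
    "ex:apple_town": {"label": "Apple", "type": "town"},
    "ex:apple_fruit": {"label": "Apple", "type": "fruit"},
}


def candidates(label: str) -> List[str]:
    """Every ID whose label matches; the bare word alone cannot choose among them."""
    return sorted(eid for eid, e in ENTITIES.items() if e["label"] == label)


def resolve(label: str, entity_type: str) -> Optional[str]:
    """Pick the single ID with this label and type, or None if that is not unique."""
    matches = [eid for eid in candidates(label) if ENTITIES[eid]["type"] == entity_type]
    return matches[0] if len(matches) == 1 else None
```

The word by itself yields four candidates. Adding one explicit fact about the thing, here its type, narrows that to exactly one.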
Once the links in a network are known, then the real magic starts to happen when you mix in the social capabilities of the network.
Because of the relationships inherent in the links, new apps can be built that tell more interesting and relevant stories because they can aggregate data together that is connected.
You can imagine a whole world of forensic historians begging for more linked data. Researchers spend years mapping together events, geographic locations, relationships between people and other facts to understand the past. For example, a company called Six to Start has been working on using Google Maps for interactive historical fiction:
“The Six to Start team decided to literally “map” Cumming’s story, using the small annotation boxes for snippets of text and then illustrating movement of the main character with a blue line. As users click through bits of the story, the blue line traces the protagonist’s trajectory, and the result is a story that is at once text-based but includes a temporal dimension—we watch in real time as movement takes place—as well as an information dimension as the Google tool is, in a sense, hacked for storytelling.”
Similarly, we will eventually have a bridge of links into the physical world. This will happen with devices that have sensors that broadcast and receive short messages. OpenStreetMap will get closer and closer to providing a data-driven representation of the physical world, built collectively by people with GPS devices carefully uploading details of their neighborhoods. You can then imagine that games developers will make the real world itself into a gaming platform based on linked data.
We’ve gotten a taste of this kind of thing with Foursquare. “Foursquare gives you and your friends new ways of exploring your city. Earn points and unlock badges for discovering new things.”
And there’s a fun photo sharing game called Noticin.gs. “Noticings are interesting things that you stumble across when out and about. You play Noticings by uploading your photos to Flickr, tagged with ‘noticings’ and geotagged with where they were taken.”
It’s conceivable that all these forces and some creative engineers will eventually shrink time and space into a massive network of connected things.
But long before some quasi-Matrix-like world exists there will be many dotcom casualties who have benefitted from the existence of friction in finding information. When those challenges go away, so will the business models.
Search, for example, is an amazingly powerful and efficient middleman linking documents off the back of the old school hyperlink, but its utility may fade when the source of a piece of information can hear and respond directly to social signals asking for it somewhere in the world.
It’s all pointing to a frictionless information network, sometimes organized, sometimes totally chaotic.
It wasn’t long ago that I worried the semantic web had already failed, but I’ve begun to wonder if Tim Berners-Lee’s larger vision is in fact going to happen, just in a slightly different way than most people thought it would.
Now that linked data is happening at a grassroots level in addition to the standards-driven approach, I’m starting to believe that a world of linked data is actually possible, and perhaps closer than it might appear.
Again, his TED talk has some simple but important ideas that perhaps need to be revisited:
Paraphrasing: “Data is about our lives – a relationship with a friend, the name of a person in a photograph, the hotel I want to stay in on my holiday. Scientists study problems and collect vast amounts of data. They are understanding economies, disease and how the world works.
A lot of the knowledge of the human race is in databases sitting on computers. Linking documents has been fun, but linking data is going to be much bigger.”