Orchestrating streams of data from across the Internet

The liveblog was a revelation for us at the Guardian. The sports desk had been doing them for years, experimenting with different styles, methods and tone. And then about 3 years ago the news desk started using them liberally to great effect.

I think it was Matt Wells who suggested that perhaps the liveblog was *the* network-native format for news. I think that’s nearly right…though it’s less the ‘format’ of a liveblog than the activity powering the page that demonstrates where news editing in a networked world is going.

It’s about orchestrating the streams of data flowing across the Internet into something compelling, in one form or another. One way to render that data is the liveblog. Another is a map with placemarks. Another is an RSS feed. A stream of tweets. Storify. Etc.

I’m not talking about Big Data for news. There is certainly a very hairy challenge in big data investigations and intelligent data visualizations to give meaning to complex statistics and databases. But this is different.

I’m talking about telling stories by playing DJ to the beat of human observation pumping across the network.

We’re working on one such experiment with a location-tagging tool we call FeedWax. It creates location-aware streams of data for you by looking across various media sources including Twitter, Instagram, YouTube, Google News, Daylife, etc.

The idea with FeedWax is to unify various types of data through shared contexts, beginning with location. These sources may only have a keyword to join them up or perhaps nothing at all, but when you add location they may begin sharing important meaning and relevance. The context of space and time is natural connective tissue, particularly when the words people use to describe something may vary.
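To make the idea of a shared context a bit more concrete, here is a rough sketch of how items from unrelated sources might be joined purely by place and time. This is not how FeedWax itself is built; the `Item` class, the `shared_context` function and the sample data are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass
class Item:
    """One piece of media from any source (hypothetical structure)."""
    source: str
    text: str
    lat: float
    lon: float
    when: datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_km * asin(sqrt(a))


def shared_context(items, lat, lon, radius_km, start, end):
    """Keep items that fall inside both the time window and the radius."""
    return [
        item for item in items
        if start <= item.when <= end
        and haversine_km(lat, lon, item.lat, item.lon) <= radius_km
    ]


# Invented sample data: no shared keyword, only place and time in common.
ITEMS = [
    Item("twitter", "Whoa, look at that", 34.05, -118.25, datetime(2012, 10, 13, 10, 0)),
    Item("instagram", "Shuttle on my street", 34.06, -118.24, datetime(2012, 10, 13, 11, 30)),
    Item("youtube", "Rainy day in London", 51.50, -0.12, datetime(2012, 10, 13, 10, 15)),
    Item("flickr", "Downtown at dusk", 34.05, -118.25, datetime(2012, 10, 1, 18, 0)),
]
START = datetime(2012, 10, 12)
END = datetime(2012, 10, 14, 23, 59)

for item in shared_context(ITEMS, 34.05, -118.25, 5, START, END):
    print(item.source, item.text)
```

Note that the two items that survive the filter use completely different words. Space and time are doing the joining, not keywords.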

We’ve been conducting experiments in orchestrated stream-based and map-based storytelling on n0tice for a while now. When you start crafting the inputs with tools like FeedWax you have what feels like a more frictionless mechanism for steering the flood of data that comes across Twitter, Instagram, Flickr, etc. into something interesting.

For example, when the space shuttle Endeavour flew its last flight and subsequently labored through the streets of LA there was no shortage of coverage from on-the-ground citizen reporters. I’d bet not one of them considered themselves a citizen reporter. They were just trying to get a photo of this awesome sight and share it, perhaps getting some acknowledgement in the process.

You can see the stream of images and tweets here: http://n0tice.com/search?q=endeavor+OR+endeavour. And you can see them all plotted on a map here: http://goo.gl/maps/osh8T.

Interestingly, the location of the photos gives you a very clear picture of the flight path. This is crowdmapping without requiring that anyone do anything they wouldn’t already do. It’s orchestrating streams that already exist.
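As a minimal sketch of why this works, assume each photo arrives with a timestamp and a coordinate pair. Sorting by time and reading off the coordinates is enough to trace a route. The `path_from_photos` function and the sample points below are invented for illustration and are not real Endeavour data.

```python
from datetime import datetime


def path_from_photos(photos):
    """Order geotagged photos by capture time and return the route as (lat, lon) pairs."""
    ordered = sorted(photos, key=lambda photo: photo["taken"])
    return [(photo["lat"], photo["lon"]) for photo in ordered]


# Invented sample points, deliberately out of order as they would be in a raw stream.
PHOTOS = [
    {"taken": datetime(2012, 10, 13, 14, 0), "lat": 34.00, "lon": -118.33},
    {"taken": datetime(2012, 10, 12, 9, 0), "lat": 33.94, "lon": -118.40},
    {"taken": datetime(2012, 10, 13, 8, 0), "lat": 33.96, "lon": -118.36},
]

print(path_from_photos(PHOTOS))
```

Nobody in the crowd had to agree on anything in advance. The route falls out of metadata the cameras were already recording.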

This behavior isn’t exclusive to on-the-ground reporting. I’ve got a list of similar types of activities in a blog post here which includes task-based reporting like the search for computer scientist Jim Gray, the use of Ushahidi during the Haiti earthquake, the Guardian’s MPs Expenses project, etc. It’s also interesting to see how people like Jon Udell approach this problem with other data streams out there such as event and venue calendars.

Sometimes people refer to the art of code and code-as-art. What I see in my mind when I hear people say that is a giant global canvas in the form of a connected network, rivers of different colored paints in the form of data streams, and a range of paint brushes and paint strokes in the form of software and hardware.

The savvy editors in today’s world are learning from and working with these artists, using their tools and techniques to tease out the right mix of streams to tell stories that people care about. There’s no lack of material or tools to work with. Becoming network-native sometimes just means looking at the world through a different lens.

Mobilising the web of feeds

I wrote this piece for the Guardian’s Media Network on the role that RSS could play now that the social platforms are becoming more difficult to work with. GeoRSS, in particular, has a lot of potential given the mobile device explosion. I’m not suggesting necessarily that RSS is the answer, but it is something that a lot of people already understand and could help unify the discussion around sharing geotagged information feeds.


Powered by Guardian.co.uk

This article titled “Mobilising the web of feeds” was written by Matt McAlister, for theguardian.com on Monday 10th September 2012 16.43 UTC

While the news that Twitter will no longer support RSS was not really surprising, it was a bit annoying. It served as yet another reminder that the Twitter-as-open-message-utility idea that many early adopters of the service loved was in fact going away.

There are already several projects intending to disrupt Twitter, mostly focused on the idea of a distributed, federated messaging standard and/or platform. But we already have such a service: an open standard adopted by millions of sources; a federated network of all kinds of interesting, useful and entertaining data feeds published in real-time. It’s called RSS.

There was a time when nearly every website was RSS-enabled, and a cacophony of Silicon Valley startups fought to own pieces of this new landscape, hoping to find dotcom gold. But RSS didn’t lead to gold, and most people stopped doing anything with it.

Nobody found an effective advertising or service model (except, ironically, Dick Costolo, CEO of Twitter, who sold Feedburner to Google). The end-user market for RSS reading never took off. Media organisations didn’t fully buy into it, and the standard took a backseat to more robust technologies.

Twitter is still very open in many ways and encourages technology partners to use the Twitter API. That model gives the company much more control over who is able to use tweets outside of the Twitter-owned apps, and it’s a more obvious commercial strategy that many have been asking Twitter to work on for a long time now.

But I think we’ve all made a mistake in the media world by turning our backs on RSS. It’s understandable why it happened. But hopefully those who rejected RSS in the past will see the signals demonstrating that an open feed network is a sensible thing to embrace today.

Let’s zoom out for context first. Looking at the macro trends in the internet’s evolution, we can see one or two clear winners as more information and more people appeared on the network in waves over the last 15 years.

Following the initial explosion of new domains, Yahoo! solved the need to surface only the websites that mattered through browsing. The Yahoo! directory became saturated, so Google then surfaced pages that mattered within those websites through searches. Google became saturated, so Facebook and Twitter surfaced things that mattered that live on the webpages within those web sites through connecting with people.

Now that the social filter is saturated, what will be used next to surface things that matter out of all the noise? The answer is location. It is well understood technically. The software-hardware-service stack is done. The user experience is great. We’re already there, right?

No – most media organisations still haven’t caught up yet. There’s a ton of information not yet optimised for this new view of the world and much more yet to be created. This is just the beginning.

Do we want a single platform to be created that catalyses the location filter of the internet and mediates who sees what and when? Or do we want to secure forever a neutral environment where all can participate openly and equally?

If the first option happens, as historically has been the case, then I hope that position is taken by a force that exists because of, and is reliant on, the second option.

What can a media company do to help make that happen? The answer is to mobilise your feeds. As a publisher, being part of the wider network used to mean having a website on a domain that Yahoo! could categorise. Then it meant having webpages on that website optimised for search terms people were using to find things via Google. And more recently it has meant providing sharing hooks that can spread things from those pages on that site from person to person.

Being part of the wider network today suddenly means all of those things above, and, additionally, being location-enabled for location-aware services.

It doesn’t just mean offering a location-specific version of your brand, though that is certainly an important thing to do as well. The major dotcoms use this strategy increasingly across their portfolios, and I’m surprised more publishers don’t do this.

More importantly, though, and this is where it matters in the long run, it means offering location-enabled feeds that everyone can use in order to be relevant in all mobile clients, applications and utilities.

Entrepreneurs are all over this space already. Pure-play location-based apps can be interesting, but many feel very shallow without useful information. The iTunes store is full of travel apps, reference apps, news, sports, utilities and so on that are location-aware, but they are missing some of the depth that you can get on blogs and larger publishers’ sites. They need your feeds.

Some folks have been experimenting in some very interesting ways that demonstrate what is possible with location-enabled feeds. Several mobile services, such as FlipBoard, Pulse and now Prismatic, have really nice and very popular mobile reading apps that all pull RSS feeds, and they are well placed to turn those into location-based news services.

Perhaps a more instructive example of the potential is the augmented reality app hypARlocal at Talk About Local. They are getting location-aware content out of geoRSS feeds published by hyperlocal bloggers around the UK and the citizen journalism platform n0tice.com.
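For anyone who hasn’t looked at GeoRSS, here is a minimal sketch of what pulling location out of a feed can look like, using only the Python standard library. It assumes the GeoRSS-Simple encoding, where a `georss:point` element holds latitude then longitude separated by a space. The sample feed, its URLs and the `located_items` function are made up for illustration, and this is not a description of how hypARlocal actually does it.

```python
import xml.etree.ElementTree as ET

# Namespace used by GeoRSS-Simple elements such as <georss:point>.
GEORSS_NS = {"georss": "http://www.georss.org/georss"}

# Invented sample feed: one geotagged post and one without a location.
SAMPLE_FEED = """<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Example hyperlocal blog</title>
    <item>
      <title>Road closed on the high street</title>
      <link>http://example.com/road-closed</link>
      <georss:point>51.5362 -0.1033</georss:point>
    </item>
    <item>
      <title>Post with no location</title>
      <link>http://example.com/no-location</link>
    </item>
  </channel>
</rss>"""


def located_items(feed_xml):
    """Return title and coordinates for every item that carries a georss:point."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        point = item.find("georss:point", GEORSS_NS)
        if point is None or not point.text:
            continue  # skip posts that are not geotagged
        lat, lon = (float(value) for value in point.text.split())
        results.append({"title": item.findtext("title"), "lat": lat, "lon": lon})
    return results


print(located_items(SAMPLE_FEED))
```

The point is how little is needed. One extra element per item is enough for a location-aware client to place a story on a map.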

But it’s not just the entrepreneurs that want your location-enabled feeds. Google Now for Android notifies you of local weather and sports scores along with bus times and other local data, and Google Glass will be dependent on quality location-specific data as well.

Of course, the innovations come with new revenue models that could get big for media organisations. They include direct, advertising, and syndication models, to name a few, but have a look at some of the startups in the rather dense ‘location’ category on Crunchbase to find commercial innovations too.

Again, this isn’t a new space. Not only has the location stack been well formed, but there are also a number of bloggers who have been evangelising location feeds for years. They already use WordPress, which automatically pumps out RSS. And many of them also geotag their posts today using one of the many useful WordPress mapping plugins.

It would take very little to reinvigorate a movement around open location-based feeds. I wouldn’t be surprised to see Google prioritising geotagged posts in search results, for example. That would probably make Google’s search on mobile devices much more compelling, anyhow.

Many publishers and app developers, large and small, have complained that the social platforms are breaking their promises and closing down access, becoming enemies of the open internet and being difficult to work with. The federated messaging network is being killed off, they say. Maybe it’s just now being born.

Media organisations need to look again at RSS, open APIs, geotagging, open licensing, and better ways of collaborating. You may have abandoned it in the past, but RSS would have you back in a heartbeat. And if RSS is insufficient then any location-aware API standard could be the meeting place where we rebuild the open internet together.

It won’t solve all your problems, but it could certainly solve a few, including new revenue streams. And it’s conceivable that critical mass around open location-based feeds would mean that the internet becomes a stronger force for us all, protected from nascent platforms whose future selves may not share the same vision that got them off the ground in the first place.


guardian.co.uk © Guardian News & Media Limited 2010


Local community data reporting

EveryBlock has taken a very data-intensive look at local news reporting. As founder Adrian Holovaty explains:

“An overall goal of EveryBlock is to point you to news near your block. We’ve been working hard to do a good job of this so far by accumulating public records, cataloging newspaper stories and pulling together various other geographic information from the Web.”

This generally takes the form of raw data points placed on maps. They recently rolled out a variation on the theme by using topic-specific data which adds more context to the local news reporting idea.

“A week or so ago, 15 people were arrested on bribery charges as part of a federal probe into corruption in Chicago city government. We’ve analyzed U.S. Attorney Patrick J. Fitzgerald’s complaint documents and cataloged the specific addresses mentioned within. On the project’s front page, you can view every location we found, along with a relevant excerpt from the complaint. You can sort this data in various ways, including a list and map of all the alleged bribe locations.”

This is the type of value that’s otherwise kind of missing from the experience. Rather than providing a mostly pure research tool, the site now gives some insight and perspective with an editorial view on the data. In this case, the data is telling a story that otherwise might seem a little distant to you until you see how the issue may in fact be a very real one right in your backyard, so to speak.

But it occurred to me that the community is probably even better able to capture and share this level of useful insight. It would be really neat to see EveryBlock open the reporting and mapping process so that anyone who has an interest in exposing the trends in their neighborhood or elsewhere had a platform to do so.

[Chart: Average payment (€) by Area]

Similar to the way Swivel allows you to collect data in spreadsheet form, visualize it and then share it the way Flickr and YouTube allow you to share, EveryBlock could provide an environment for individuals to do the reporting in their neighborhood that matters to them. The wider community could then benefit from the work of a few, and suddenly you have a really powerful local news vehicle.

This isn’t necessarily in contrast to the approach Outside.in has taken by aggregating shared information from around the web, but it certainly puts some structure around it in a way that may be necessary.

Managing a community is a very different problem than aggregating and presenting useful local data. But I wonder if it’s a necessary next step to get both of these fledgling but very forward-thinking local media services closer to critical mass.

Local news is going the wrong way

Google’s new Local News offering misses the point entirely.

As Chris Tolles points out, Topix.net and others have been doing exactly this for years. Aggregating information at the hyperlocal level isn’t just about geotagging information sources. Chris explains why they added forums:

“…there wasn’t enough coverage by the mainstream or the blogosphere…the real opportunity was to become a place for people to publish commentary and stories.”

He shouldn’t worry about Google, though. He should worry more about startups like Outside.in who upped the ante by adding a slightly more social and definitely more organic experience to the idea of aggregating local information.

Yet information aggregation still only dances around the real issue.

People want to know what and who are around them right now.

The first service that really nails how we identify and surface the things that matter to us when and where we want to know about them is going to break ground in a way we’ve never seen before on the Internet.

We’re getting closer and closer to being able to connect the 4 W’s: Who, What, Where and When. But those things aren’t yet connecting to expose value to people.

I think a lot of people are still too focused on how to aggregate and present data to people. They expect people to do the work of knowing what they’re looking for, diving into a web page to find it and then consuming what they’ve worked to find.

There’s a better way. When services start mixing and syndicating useful data from the 4 W vectors then we’ll start seeing information come to people instead.

And there’s no doubt that big money will flow with it.

Dave Winer intuitively noted, “Advertising will get more and more targeted until it disappears, because perfectly targeted advertising is just information. And that’s good!”

I like that vision, but there’s more to it.

When someone connects the way information surfaces for people and the transactions that become possible as a result, a big new world is going to emerge.

The useful convergence of data

I have only one prediction for 2008. I think we’re finally about to see the useful combination of the 4 W’s – Who, What, Where, and When.

Marc Davis has done some interesting research in this area at Yahoo!, and Bradley Horowitz articulated how he sees the future of this space unfolding in a BBC article in June ’07:

“We do a great job as a culture of “when”. Using GMT I can say this particular moment in time and we have a great consensus about what that means…We also do a very good job of “where” – with GPS we have latitude and longitude and can specify a precise location on the planet…The remaining two Ws – we are not doing a great job of.”

I’d argue that the social networks are now really homing in on “who”, and despite having few open standards for “what” data (other than UPC) there is no shortage of “what” data amongst all the “what” providers. Every product vendor has their own version of a product identifier or serial number (such as Amazon’s ASIN, for example).

We’ve seen a lot of online services solving problems in these areas either by isolating specific pieces of data or combining the data in specific ways. But nobody has yet integrated all 4 in a meaningful way.


Jeff Jarvis’ insightful post on social airlines starts to show how these concepts might form in all kinds of markets. When you’re traveling it makes a lot of sense to tap into “who” data to create compelling experiences that will benefit everyone:

  • At the simplest level, we could connect while in the air to set up shared cab rides once we land, saving passengers a fortune.
  • We can ask our fellow passengers who live in or frequently visit a destination for their recommendations for restaurants, things to do, ways to get around.
  • We can play games.
  • What if you chose to fly on one airline vs. another because you knew and liked the people better? What if the airline’s brand became its passengers?
  • Imagine if on this onboard social network, you could find people you want to meet – people in the same business going to the same conference, people of similar interests, future husbands and wives – and you can rendezvous in the lounge.
  • The airline can set up an auction marketplace for at least some of the seats: What’s it worth for you to fly to Berlin next Wednesday?

Carrying the theme to retail markets, you can imagine that you will walk into H&M and discover that one of your first-degree contacts recently bought the same shirt you were about to purchase. You buy a different one instead. Or people who usually buy the same hair conditioner as you at the Walgreens you’re in now are switching to a different hair conditioner this month. Though this wouldn’t help someone like me who has no hair to condition.

Similarly, you can imagine that marketing messages could actually become useful in addition to being relevant. If Costco would tell me which of the products I often buy are on sale as I’m shopping, or which of the products I’m likely to need given what they know about how much I buy of what and when, then my loyalty there is going to shoot through the roof. They may even be able to identify that I’m likely buying milk elsewhere and give me a one-time coupon for Costco milk.

Bradley sees it playing out on the phone, too:

“On my phone I see prices for a can of soup in my neighbourhood. It resolves not only that particular can of soup but knows who I am, where I am and where I live and helps me make an intelligent decision about whether or not it is a fair price.

It has to be transparent and it has to be easy because I am not going to invest a lot of effort or time to save 13 cents.”

It may be unrealistic to expect that this trend will explode in 2008, but I expect it to at least appear in a number of places and inspire future implementations as a result. What I’m sure we will see in 2008 is dramatic growth in the behind-the-scenes work that will make this happen, such as the development and customization of CRM-like systems.

Lots of companies have danced around these ideas for years, but I think the ideas and the technologies are finally ready to create something real, something very powerful.

Photo: SophieMuc

Why Outside.in may have the local solution

The recent blog frenzy over hyperlocal media inspired me to have a look at Outside.in again.


It’s not just the high profile backers and the intense competitive set that make Outside.in worth a second look. There’s something very compelling in the way they are connecting data that seems like it matters.

My initial thought when it launched was that this idea had been done before too many times already. Topix.net appeared to be a dominant player in the local news space, not to mention similar but different kinds of local efforts at startups like Yelp and amongst all the big dotcoms.

And even from their strong position, Topix’s location-based news media aggregation model was kind of, I don’t know, uninteresting. I’m not impressed with local media coverage these days, in general, so why would an aggregator of mediocre coverage be any more interesting than what I discover through my RSS reader?

But I think Outside.in starts to give some insight into how local media could be done right…how it could be more interesting and, more importantly, useful.

The light went on for me when I read Jon Udell’s post on “the data finds the data”. He explains how data can be a vector through which otherwise unrelated people meet each other, a theme that continues to resonate for me.

Media brands have traditionally been good at connecting the masses to each other and to marketers. But the expectation of how directly people feel connected to other individuals by the media they share has changed.

Whereas the brand once provided a vector for connections, data has become the vehicle for people to meet people now. Zip code, for example, enables people to find people. So does marital status, date and time, school, music taste, work history. There are tons of data points that enable direct human-to-human discovery and interaction in ways that media brands could only accomplish in abstract ways in the past.

URLs can enable connections, too. Jon goes on to explain:

“On June 17 I bookmarked this item from Mike Caulfield… On June 19 I noticed that Jim Groom had responded to Mike’s post. Ten days later I noticed that Mike had become Jim’s new favorite blogger.

I don’t know whether Jim subscribes to my bookmark feed or not, but if he does, that would be the likely vector for this nice bit of manufactured serendipity. I’d been wanting to introduce Mike at KSC to Jim (and his innovative team) at UMW. It would be delightful to have accomplished that introduction by simply publishing a bookmark.”

Now, Outside.in allows me to post URLs much like one would do in Newsvine, Digg or any number of other collaborative citizen media services. But Outside.in leverages the zip code data point as the topical vector rather than a set of predetermined one-size-fits-all categories. It then allows miscellaneous tagging to be the subservient navigational pivot.

Suddenly, I feel like I can have a real impact on the site if I submit something. If there’s anything near a critical mass of people in the 94107 zip code on Outside.in then it’s likely my neighbors will be influenced by my posts.

Fred Wilson of Union Square Ventures explains:

“They’ve built a platform that placebloggers can submit their content to. Their platform “tags” that content with a geocode — an address, zip code, or city — and that renders a new page for every location that has tagged content. If you visit outside.in/10010, you’ll find out what’s going on in the neighborhood around Union Square Ventures. If you visit outside.in/back_bay, you’ll see what’s going on in Boston’s Back Bay neighborhood.”

Again, the local online media model isn’t new. In fact, it’s old. CitySearch in the US and UpMyStreet in the UK proved years ago that a market does in fact exist in local media somewhere, somehow, but the market always feels fragile and susceptible to ghost town syndrome.

Umair Haque explains why local is so hard:

“Why doesn’t Craigslist choose small towns? Because there isn’t enough liquidity in the market. Let me put that another way. In cities, there are enough buyers and sellers to make markets work – whether of used stuff, new stuff, events, etc, etc.

In smaller towns, there just isn’t enough supply or demand.”

If they commit to building essentially micro media brands based exclusively on location I suspect Outside.in will run itself into the ground spending money to establish critical mass in every neighborhood around the world.

Now that they have a nice micro media approach that seems to work they may need to start thinking about macro media. In order to reach the deep dark corners of the physical grid, they should connect people in larger contexts, too. Here’s an example of what I mean…

I’m remodeling the Potrero Hill shack we call a house right now. It’s all I talk about outside of work, actually. And I need to understand things like how to design a kitchen, ways to work through building permits, and who can supply materials and services locally for this job.

There must be kitchen design experts around the world I can learn from. Equally, I’m sure there is a guy around the corner from me who can give me some tips on local services. Will Architectural Digest or Home & Garden connect me to these different people? No. Will The San Francisco Chronicle connect us? No.

Craigslist won’t even connect us, because that site is so much about the transaction.

I need help from people who can connect with me on both my interest vector and the more local geographic vector. Without fluid connections on both vectors, I’m no better off than I was with my handy RSS reader and my favorite search engine.

Looking at how they’ve decided to structure their data, it seems Outside.in could pull this off and connect my global affinities with my local activities pretty easily.

This post is way too long already (sorry), but it’s worth pointing out some of the other interesting things they’re doing if you care to read on.

Outside.in is also building automatic semantic links with the contributors’ own blogs. By including my zip code in a blog post, Outside.in automatically drinks up that post and adds it into the pool. They even re-tag my post with the correct geodata and offer GeoRSS feeds back out to the world.

Here are the instructions:

“Any piece of content that is tagged with a zip code will be assigned to the corresponding area within outside.in’s system. You can include the zip code as either a tag or a category, depending on your blogging platform.”
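Here is a rough sketch of the kind of rule those instructions imply, assuming posts arrive as simple records with tags and categories. Outside.in hasn’t published how it does this, so the `assign_to_areas` function, the record layout and the sample posts are all hypothetical.

```python
import re
from collections import defaultdict

# A US zip code for this sketch is exactly five digits.
ZIP_PATTERN = re.compile(r"\d{5}")


def assign_to_areas(posts):
    """Group post titles by any zip code found among their tags or categories."""
    areas = defaultdict(list)
    for post in posts:
        labels = post.get("tags", []) + post.get("categories", [])
        # Use a set so a zip used as both tag and category counts once.
        zips = {label for label in labels if ZIP_PATTERN.fullmatch(label)}
        for zip_code in sorted(zips):
            areas[zip_code].append(post["title"])
    return dict(areas)


# Invented sample posts.
POSTS = [
    {"title": "Kitchen remodel, week 3", "tags": ["remodel", "94107"]},
    {"title": "New cafe on 18th Street", "tags": ["94107"], "categories": ["94107", "food"]},
    {"title": "Thoughts on RSS", "tags": ["rss", "media"]},
    {"title": "Union Square farmers market", "categories": ["10010"]},
]

print(assign_to_areas(POSTS))
```

The blogger does nothing beyond adding one more tag, and the post with no zip code simply stays out of the pool.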

I love this.

30Boxes does something similar where I can tell it to collect my Upcoming data, and it automatically imports events as I tag them in Upcoming.

They are also recognizing local contributors and shining light on them with prominent links. I can see who the key bloggers are in my area and perhaps even get a sense of which ones matter, not just who posts the most. I’m guessing they will apply the “people who like this contributor also like this contributor” type of logic to personalize the experience for visitors at some point.

Now what gets me really excited is to think about the ad model that could happen in this environment of machine-driven semantic relationships.

If they can identify relevant blog posts from local contributors, then I’m sure they could identify local coupons from good sources of coupon feeds.

Let’s say I’m the national Ace Hardware marketing guy, and I publish a feed of coupons. I might be able to empower all my local Ace franchises and affiliates to publish their own coupons for their own areas and get highly relevant distribution on Outside.in. Or I could also run a national coupon feed with zip code tags cooked into each item.

To Umair’s point, that kind of marketing will only pay off in major metros where the markets are stronger.

To help address the inventory problem, Outside.in could then offer to sell ad inventory on their contributors’ web sites. As an Outside.in contributor, I would happily run coupons from Center Hardware, my local Ace affiliate, on my blog posts that talk about my remodeling project if someone gave them to me in some automated way.

If they do something like this then they will be able to serve both the major metros and the smaller hot spots that you can never predict will grow. Plus, the incentives for the individuals in the smaller communities start feeding the wider ecosystem that lives on the Outside.in platform.

Outside.in would be pushing leverage out to the edge both in terms of participation as they already do and in terms of revenue generation, a fantastic combination of forces that few media companies have figured out yet.

I realize there are lots of ‘what ifs’ in this assessment. The company has a lot of work to do before they break through, and none of it is easy. The good news for them is that they have something pretty solid that works today despite a crowded market.

Regardless, knowing Fred Wilson, Esther Dyson, John Seely Brown and Steven Berlin Johnson are behind it, among others, no doubt they are going to be one to watch.