Orchestrating streams of data from across the Internet

The liveblog was a revelation for us at the Guardian. The sports desk had been doing them for years experimenting with different styles, methods and tone. And then about 3 years ago the news desk started using them liberally to great effect.

I think it was Matt Wells who suggested that perhaps the liveblog was *the* network-native format for news. I think that’s nearly right…though it’s less the ‘format’ of a liveblog than the activity powering the page that demonstrates where news editing in a networked world is going.

It’s about orchestrating the streams of data flowing across the Internet into a compelling use in one form or another. One way to render that data is the liveblog. Another is a map with placemarks. Another is a RSS feed. A stream of tweets. Storify. Etc.

I’m not talking about Big Data for news. There is certainly a very hairy challenge in big data investigations and intelligent data visualizations to give meaning to complex statistics and databases. But this is different.

I’m talking about telling stories by playing DJ to the beat of human observation pumping across the network.

We’re working on one such experiment with a location-tagging tool we call FeedWax. It creates location-aware streams of data for you by looking across various media sources including Twitter, Instagram, YouTube, Google News, Daylife, etc.

The idea with FeedWax is to unify various types of data through shared contexts, beginning with location. These sources may only have a keyword to join them up or perhaps nothing at all, but when you add location they may begin sharing important meaning and relevance. The context of space and time is natural connective tissue, particularly when the words people use to describe something may vary.

We’ve been conducting experiments in orchestrated stream-based and map-based storytelling on n0tice for a while now. When you start crafting the inputs with tools like FeedWax you have what feels like a more frictionless mechanism for steering the flood of data that comes across Twitter, Instagram, Flickr, etc. into something interesting.

For example, when the space shuttle Endeavour flew its last flight and subsequently labored through the streets of LA there was no shortage of coverage from on-the-ground citizen reporters. I’d bet not one of them considered themselves a citizen reporter. They were just trying to get a photo of this awesome sight and share it, perhaps getting some acknowledgement in the process.

You can see the stream of images and tweets here: http://n0tice.com/search?q=endeavor+OR+endeavour. And you can see them all plotted on a map here: http://goo.gl/maps/osh8T.

Interestingly, the location of the photos gives you a very clear picture of the flight path. This is crowdmapping without requiring that anyone do anything they wouldn’t already do. It’s orchestrating streams that already exist.

This behavior isn’t exclusive to on-the-ground reporting. I’ve got a list of similar types of activities in a blog post here which includes task-based reporting like the search for computer scientist Jim Gray, the use of Ushahidi during the Haiti earthquake, the Guardian’s MPs Expenses project, etc. It’s also interesting to see how people like Jon Udell approach this problem with other data streams out there such as event and venue calendars.

Sometimes people refer to the art of code and code-as-art. What I see in my mind when I hear people say that is a giant global canvas in the form of a connected network, rivers of different colored paints in the form of data streams, and a range of paint brushes and paint strokes in the form of software and hardware.

The savvy editors in today’s world are learning from and working with these artists, using their tools and techniques to tease out the right mix of streams to tell stories that people care about. There’s no lack of material or tools to work with. Becoming network-native sometimes just means looking at the world through a different lens.

Calling your web site a ‘property’ deprives it of something bigger

BBC offered another history of London documentary the other night, a sort of people’s perspective on how the character of the city has changed over time, obviously inspired by Danny Boyle’s Opening Ceremony at the Olympics.

Some of the sequences were interesting to me particularly as a foreigner – the gentrification of Islington, the anarchist squatters in Camden, the urbanization of the Docklands, etc.  – a running theme of haves vs have-nots.

It’s one of a collection of things inspiring me recently including a book called ‘The Return of the Public‘ by Dan Hind, a sort of extension to the Dewey v Lippman debates, what’s going on with n0tice, such as Sarah Hartley’s adaptation for it called Protest Near You and the dispatch-o-rama hack, and, of course, the Olympics.

I’m becoming reinvigorated and more bullish on where collective action can take us.

At a more macro level these things remind me of the need to challenge the many human constructs and institutions that are reflections of the natural desire to claim things and own them.

Why is it so difficult to embrace a more ‘share and share alike’ attitude?  This is as true for children and their toys as it is for governments and their policies.

The bigger concern for me, of course, is the future of the Internet and how media and journalism thrive and evolve there.

Despite attempts by its founders to shape the Internet so it can’t be owned and controlled, there are many who have tried to change that both intentionally and unwittingly, occasionally with considerable success.

How does this happen?

We’re all complicit.  We buy a domain. We then own it and build a web site on it. That “property” then becomes a thing we use to make money.  We fight to get people there and sell them things when they arrive.  It’s the Internet-as-retailer or Internet-as-distributor view of the world.

That’s how business on the Internet works…or is it?

While many have made that model work for them, it’s my belief that the property model is never going to be as important or meaningful or possibly as lucrative as the platform or service model over time. More specifically, I’m talking about generative media networks.

Here are a few different ways of visualizing this shift in perspective (more):

Even if it works commercially, the property model is always going to be in conflict with the Internet-as-public-utility view of the world.

Much like Britain’s privately owned public spaces issue, many worry that the Internet-as-public-utility will be ruined or, worse, taken from us over time by commercial and government interests.

Playing a zero sum game like that turns everyone and everything into a threat.  Companies can be very effective at fighting and defending their interests even if the people within those companies mean well.

I’m an optimist in this regard.  There may be a pendulum that swings between “own” and “share”, and there are always going to be fights to secure public spaces.  But you can’t put the Internet genie back in the bottle.  And even if you could it would appear somewhere else in another form just as quickly…in some ways it already has.

The smart money, in my mind, is where many interests are joined up regardless of their individual goals, embracing the existence of each other in order to benefit from each other’s successes.

The answer is about cooperation, co-dependency, mutualisation, openness, etc.

We think about this a lot at the Guardian. I recently wrote about how it applies to the recent Twitter issues here. And this presentation by Chris Thorpe below from back in 2009 on how to apply it to the news business is wonderful:

Of course, Alan Rusbridger’s description of a mutualised newspaper in this video is still one of the strongest visions I’ve heard for a collaborative approach to media.

The possibility of collective action at such an incredible scale is what makes the Internet so great.  If we can focus on making collective activities more fruitful for everyone then our problems will become less about haves and have-nots and more about ensuring that everyone participates.

That won’t be an easy thing to tackle, but it would be a great problem to have.

The importance of co-dependency on the Internet

The thing that still blows my mind about the Internet is that through all the dramatic changes over the last 2 or 3 decades it remains a mostly open public commons that everyone can use.

There are many land ownership battles happening all around it. But it has so far withstood challenges to its shape, its size, its governance and its role in all aspects of our lives.

Will that always be the case? Or are too many special interests with unbalanced power breaking down the core principles that made it our space, a shared space owned by nobody and available to everybody?

It used to be that corporate interests were aligned with public interests on the Internet.

Many early pioneers thrived because they recognized how to create and earn value across the network, in the connection between things, the edges (read more on Graph Theory). They succeeded by developing services that offered unfettered and easy access to something useful, often embracing some sort of sharing model for partners powered by revenue models built on network effects.

These innovators were dependent on open and public participation and a highly distributed network adhering to common standards such as HTTP and HTML.

The public and private interests were aligned. Everyone’s a winner!

However, hypertext was designed for connecting text, not for connecting people. The standards designed for communication such as SMTP and later XMPP were fine, but no standard for connecting people to other people achieved global adoption.

Even backing from the web’s founder for a standard called FOAF and a Bill of Rights for Social Networking Users backed by Arrington and Scoble failed to lift the movement for standards in social connections.

Without a standard the entrepreneurs got busy coming up with solutions, and, eventually, Facebook and Twitter were born. Like their predecessors, they recognized how to create and earn value in the connections between things, in this case, a network of people, a privately owned social graph.

But the social graph that they created is not an open, owner-less, public commons.

Facebook is interested in Facebook’s existence and have no interest in FOAF or XMPP. Industry standards are to be ignored at Facebook (and Apple, Microsoft and sometimes Google, too) unless they help to acquire customers or to generate PR.

Governance is held privately in these spaces. They consult with their customers, but they write their own privacy policies and are held accountable to nobody but public opinion and antiquated legal frameworks around the world.

Now, it’s a very appealing idea to think about an open and public alternative to the dominant social networks. Many have tried, including StatusNet and Diaspora. And more challengers are on the way.

But there are harder problems to solve here that I think matter more than that temporary band-aid.

  • We need to know why the industry failed to adopt a global standard for social networking. Will we go through this again as a market forms inevitably around a digital network of real-world things? Will it repeat again when we learn how to connect raw data, too?
  • Can the benefits (and therefore the incentives) to ensuring contributions are owned and controlled by a platform’s contributors in perpetuity be more commercially viable and legally sensible? Are there ways to support those benefits globally?
  • In what ways can those who are adversely affected by centralized control of social activity hold those forces to account? Where is the balance of power coming from, and how can it be amplified to be an equal force?

Much like the founders of the U.S. Constitution, there were some very clever folks who codified open access to this public space we call the Internet in ways that effectively future-proofed it. They structured it to reward individual contributions that also benefit the wider community.

Instead of pen and paper, the Internet’s founders used technology standards to accomplish this goal.

While it seems very doable to build a successful and lasting digital business that embraces these ideals, the temptation of money and power, the unrelenting competitive pressures, and weak incentives to collaborate will steer many good intentions in bad directions over time.

I don’t think Facebook was ever motivated to support an open public commons, so it’s no surprise when they threaten such a thing. But Twitter somehow seemed different.

At the moment, it seems from outside Twitter HQ that the company is being crushed by the weight of all these things and making some bad choices over small issues that will have big impact over time.

It’s not what Facebook and Twitter do that really matters, though.

The greater concern, in my opinion, is that the types of people who originally created this global network for the rest of us to enjoy aren’t being heard or, worse, not getting involved at all anymore.

I’m not suggesting we need more powerful bureaucratic standards bodies to slow things down. Arguably, these increasingly powerful platforms are already crushing competition and squeezing out great innovations. We don’t need more of that. Stifling innovation always has disastrous effects.

What I’m suggesting is that more people will benefit in more ways over a longer period of time if someone can solve this broader organizational problem – the need to codify open access, shared governance, and other future-proofing tactics for public spaces in whatever shape they take.

Social networking is threatening the open public network today. Something else will threaten it tomorrow.

We need to work out what can be done today to fuel and secure an ongoing and healthy co-dependency between our public spaces and commercial interests.

Dispatchorama: a distributed approach to covering a distributed news event

We’ve had a sort of Hack Week at the Guardian, or “Discovery Week“. So, I took the opportunity to mess around with the n0tice API to test out some ideas about distributed reporting.

This is what it became (best if opened in a mobile web browser):

http://dispatchorama.com/



It’s a little web app that looks at your location and then helps you to quickly get to the scene of whatever nearby news events are happening right now.

The content is primarily coming from n0tice at the moment, but I’ve added some tweets with location data. I’ve looked at some geoRSS feeds, but I haven’t tackled that, yet. It should also include only things from the last 24 hours. Adding more feeds and tuning the timing will help it feel more ‘live’.

The concept here is another way of thinking about the impact of the binding effect of the digital and physical worlds. Being able to understand the signals coming out of networked media is increasingly important. By using the context that travels with bits of information to inform your physical reality you can be quicker to respond, more insightful about what’s going on and proactive in your participation, as a result.

I’m applying that idea to distributed news events here, things that might be happening in many places at once or a news event that is moving around.

In many ways, this little experiment is a response to the amazing effort of the Guardian’s Paul Lewis and several other brave reporters covering last year’s UK riots.

There were 2 surprises in doing this:

  1. The twitter location-based tweets are really all over the place and not helpful. You really have to narrow your source list to known twitter accounts to get anything good, but that kind of defeats the purpose.
  2. I haven’t done a ton of research, yet, but there seems to be a real lack of useful geoRSS feeds out there. What happened? Did the failure of RSS readers kill the geoRSS movement? What a shame. That needs to change.

The app uses the n0tice API, JQuery Mobile, and Google’s location APIs and a few snippets picked off StackOverflow. It’s on GitHub here:
https://github.com/mattmcalister/dispatchorama/

The generosity strategy

I’ve wondered for a long time why WordPress doesn’t get the dotcom homage some of the other perhaps less interesting organizations are showered with.

There are many reasons to pay attention to them, but there’s one primary aspect of what they do that’s worth spending some time thinking about – what they give to the market in order to fuel a network that they benefit from in the end.

As Andy Weissman wrote about TED, sometimes giving away the core assets of your business is exactly what will create success for you.

“With the content, processes and brand more freely available, the community and the set of values can instead drive the business. And those are not as easily replicable.”

This attitude is what won them the war with their first rival in the blogging world, Movable Type.  It turns out that generosity is a very competitive strategy in a globally connected world.

Except it’s not my impression that was what they were intending the strategy to be used for. I think they used that strategy because it resonated with what type of company they wanted to have first and foremost.

It’s also true that ‘free’ can destroy established markets in order to create advantage for alternative models.  Bill Gurley has written before about how Google is intentionally using a scorched earth policy with Android, in particular, in order to build an unapproachable moat around their core business.

The WordPress approach has similar effects in the content management market, but they’ve built the core business itself on the open strategy.  They have made themselves dependent on the success of their customers.

I had the good fortune of inteviewing Matt Mullenweg on stage at The Guardian’s Activate Summit event last week where we spent most of our time talking about how the WordPress team operates. This is not a man chasing wealth for the sake of wealth, though wealth may in fact be chasing him. This is a person who understands the DNA of the Internet and knows intuitively how to craft a movement optimized to use the most powerful aspects of the network.

In case you’re unaware, WordPress is a publishing platform. They sell access to the WordPress tools, and they also give away the software.  The whole thing. They put it out there to download for free with an open license. They even make it super easy to install. No tricks. It’s genuine. They want you to use their software even if they don’t see a shred of direct value coming back to them as a result.

Their software is their core asset. Without it they have nothing. Why would they give it away?

What they are building is not your traditional enterprise software business. What they are building is at the very least a partner network if not something even bigger, something that looks more like a movement.

Looking at their business through the enterprise sofware lens is easy to do and certainly worth more consideration. They are leaving a lot of money on the table. They know this, and they’ve made impressive progress recapturing that lost opportunity with their VIP business.

But the founder’s philosophies lead the commercial strategy, not the other way around. WordPress wants to be a platform for free speech. Everything else comes after that.

As Matt said at Activate, “We are a neutral force. We participated in the SOPA blackout because we felt it posed a threat to our ability to stay that way.”

Operating the business strategy at that level creates a framework for all their decision-making.

They can open source their core assets because it strengthens the collective power of the WordPress toolset as a platform for free speech. In addition, it gives them a sensible model for working with developers who want to contribute code to the platform. They can operate with a small staff, prioritize product over profit, and play fast-follower to the break-neck pace of innovation that most of the rest of the top players in the business may be forced to play.

What’s the result of the generous nature of their business?

75M blogs, about half of which are hosted by them, and many of those pay them a monthly hosting fee. 341M monthly users across the network. 20,000 software plugins built by a huge network of developers working on the platform…many of whom make money being professional service providers and premium template designers for WordPress.

Now, they have a lot of powerful forces challenging their existence. Not least of which is the atomization of everything and challenges to the idea of blogs and even articles.

But by embracing a strategy of giving and a deep-seated commitment to enabling others to speak their minds on the global stage, WordPress has something more valuable than robust revenue streams. They have a network of customers who need them to succeed in the world.

That network of people is more valuable than any software or hardware distribution platform.

The power of collective research, task-based investigations and swarm intelligence

In January 2007 a well known computer scientist named Jim Gray was lost at sea off the California coast on his way to the Farallon islands.

It was a moment that many will remember either because Jim Gray was a big influence personally or professionally or because the method of the search for him was a real eye opener about the power of the Internet.  It was a group task-based investigation of epic proportions using the latest and greatest technology of the day.

I didn’t know him, but I will never forget what happened.  Not only did the Coast Guard’s air and surface search cover 40,000 square miles, but a distributed army of 12,000 people scanned NASA satellite imagery covering 30,000 square miles.  We all used Amazon’s Mechanical Turk to flip through tiles looking for a boat that would’ve been about 6 pixels in size.

They attacked the search in some phenomenal ways.  Here is Werner Vogel’s public call for help. You can also go back and read the daily search logs posted by his friends on the blog here.  Both Wired and the New York Times covered this incredible drama in detail.

Since then we’ve seen the Internet come to the rescue or at least try to make a difference using similar crowdmapping techniques.  Perhaps the most powerful example is the role crisis mappers and the Ushahidi platform played in the major Haiti earthquake in 2010.

But it’s not just crisis where these technologies are serving a public good.  We’ve seen these swarming techniques applied in a range of ways for journalism and many other activities on the Internet.

Perhaps the gold standard for collective investigative reporting is the MPs Expenses experiment by Simon Willison at the Guardian where 170,000 documents were reviewed by 15,000 people in the first 80 hours after it went live.  The Guardian has deployed its readers to uncover truth in a range of different stories, most recently with the Privatised Public Spaces story.  We’ve also looked at crowdmapping broadband speeds across the UK, and Joanna Geary’s ‘Tracking the Trackers‘ project uncovered some fascinating data about the worst web browser cookie abusers.

Last year Germany’s defense minister Karl-Theodor zu Guttenberg, a man once considered destined for an even larger role in the government, was forced to resign from his post as a result of allegations that he plagiarized his doctoral thesis.  It was proved to be true by a group of people working collectively on the investigation using a site called GuttenPlag Wiki.

ProPublica is a real pioneer in collective reporting and data journalism.  For example, their 2010 investigation into which politicians were given Super Bowl tickets provided a wonderful window into the investigative process.  And the Stimulus Spotcheck project invited people to assess whether or not the 2009 stimulus package in the US was in fact having an impact.

Also, Kevin Anderson reminded me of http://www.ipaidabribe.com tracking local corruption and http://oilreporter.org/ which came out of the Gulf of Mexico Oil Spill in 2010 and helps people report wildlife damage, share photos, etc.

Of course, swarming projects can have a range of different intentions, and if one were to try and count them I would bet only a small percentage are high impact journalistic endeavors.

Andy Baio is a pioneer in this kind of concept and has either been the curator of data already in existence or the inspiration for a crowdsourced investigation.  For example, his “Girl Turk” collective research uncovered an exhaustive list of artist and track names sampled for Girl Talk’s Feed the Animals album.

The big advertising brands intuitively understand the power of swarming intelligence, too, as they see it as a way to use their loyal customers to help them acquire new customers or to at least build a stronger direct relationship with a large group of people.  This is essentially the pitch once used by MySpace and adopted by Facebook, Twitter and Google +…Step 1: create a brand page where people can congregate, Step 2: inspire people to do something collectively that spreads virally.

The technologies that make these group tasks possible are getting easier and more accessible all the time. The wiki format works great for some projects.  DocumentCloud is a tremendous platform.   Google Docs are providing a lot of power for collective investigations, as we’ve discovered several times on the Guardian’s Datablog. And, of course, crowdmapping can be done with little technical intervention using Ushahidi and n0tice.

Of course, you can’t discount the power of the social networks as distribution platforms and amplifiers for group-based investigations.  Creating the space for swarming activity is one thing, but getting the word out is a role that Facebook and Twitter are very good at playing.  It’s a perfect marriage, in many ways.

An army of helpers may be accessible in other ways, too.

Amanda Michel who famously drove the Off The Bus campaign at HuffPo (more on that below) produced a guide to “Using Amazon’s Mechanical Turk for Data Projects” while at ProPublica where she describes how they hired workers to complete short, simple tasks.

But I imagine that the next wave of activity will arise as some of the human patterns of group tasks inspire more sustainable technology platforms.  As Martin Kotynek and ‘PlagDoc’ acknowledge in their wonderful report “Swarm of thoughts” there’s a need for some sort of centralized research platform so this kind of activity is easier to trigger and run with.

Perhaps it’s a matter of identifying a few very specific collective research concepts that work and fueling ongoing community activity around those ideas.  Citizen journalism, for example, is an obvious activity where communities are forming.

CNN’s iReport has a ready-built citizen journalist network incentivized by exposure on cnn.com, and the n0tice platform can enable citizen-powered crowdmapping activity for a range of different projects and get exposure and distribution across different platforms.  Both are capable of serving an ongoing role as useful every-day citizen journalism services that can crank up the volume on a particular issue when the appropriate moment arises.

Platforms can create some ongoing momentum, but so can issues.

Off The Bus was an 18-month HuffPo initiative where readers and staff covered the US elections collaboratively from their own communities. The project had the additional benefit of generating insights that turned into larger editorial investigations such as the Superdelegate Investigation, a report on the Evangelical Vote and the Political Campaign HQ crowdmapping project.  Ryan Tate’s book The 20% Doctrine goes into some detail about Off The Bus, how it developed, and how Amanda managed it all.

I suspect that a whole class of swarming intelligence projects is starting to bubble up that may only appear when the human story, the technology, and the amplifier join up and create a perfect storm.

In the end, it comes down to projects that resonate with people on a personal level.

Though Jim Gray was never found, the thinking about how to conduct the search amongst the leaders of the crowd at the time could not have been more cogent.  The instructions for participants were inspiring, detailing a simple task and the result of completing it:

You will be presented with 5 images. The task is to indicate any satellite images which contain any foreign objects in the water that may resemble Jim’s sailboat or parts of a boat. Jim’s sailboat will show up as a regular object with sharp edges, white or nearly white, about 10 pixels long and 4 pixels wide in the image. If in doubt, be conservative and mark the image. Marked images will be sent to a team of specialists who will determine if they contain information on the whereabouts of Jim Gray. Friends and family of Jim Gray would like to thank you for helping them with this cause.

It’s conceivable that the most important thing social media has accomplished over the last 3-5 years is that it has unlocked the natural desire people have to impact what’s happening in the world in a way they may not have felt empowered to do for decades.

Now it’s simply a matter of joining up the technologies in ways that enable those ideas to come to life.


 

A List of Collective Investigations

Below are some of the projects mentioned above and several others that have been sent to me.  I’ve included a few things that aren’t journalism investigations that are worth a closer look simply because they can be instructive.

Tenacious SearchSince January 28, the San Francisco police, the Coast Guard and Jim’s friends and family have conducted an extensive search to find him and his sailboat, Tenacious, off the California coast. I want to summarize the status of that search here, so that the broad volunteer community that’s done so much knows where we stand.

Embedly Powered

Crisis mapping brings online tool to Haitian disaster relief effortPatrick Meier learned about the earthquakes at 7 p.m. Tuesday while he was watching the news in Boston. By 7:20, he’d contacted a colleague in Atlanta. By 7:40, the two were mobilizing an online tool created by a Kenyan lawyer in South Africa. By 8, they were gathering intelligence from everyplace,…

Embedly Powered

The Brian Lehrer Show – Are You Being Gouged?Our latest “crowdsourcing” project asks listeners to go to their local grocery store and find out the price of three goods: milk, lettuce and beer. You don’t have to buy them (or consume them), but we want to know how much they cost in different neighborhoods throughout the New York area.

Embedly Powered

via Wnyc
Investigate your MP’s expensesWe have 458,832 pages of documents. 33,105 of you have reviewed 226,139 of them. Only 232,693 to go… Start reviewing Please read our privacy policy to find out how we use your data. You must also read our terms of service.

Embedly Powered

Privately owned public space: where are they and who owns them?We’re in the middle of a creeping privatisation of public space. Streets and open spaces are being defined as private land after redevelopment. It began with Canary Wharf but is now a standard feature of urban regeneration. In future, one of the biggest public squares in Europe – Granary square, in the new development around Kings Cross – will be privately owned.

Embedly Powered

Broadband Britain: how fast is your connection?With your help, the Guardian is creating an up-to-date broadband map of Britain, showing advertised versus real speeds. We want to highlight the best and worst-served communities, and bring attention to the broadband blackspots.

Embedly Powered

Tracking the trackers: help us reveal the unseen world of cookiesCookies and other web trackers monitor our online behaviour and store our browsing habits, but who are the companies behind them and what are they doing with our data? We have teamed up with Mozilla to try to find out.

Embedly Powered

GuttenPlag WikiAchtung: Dies sind keine Initiativen von GuttenPlag Dies ist eine kollaborative Dokumentation der Plagiate – jeder ist eingeladen, hier mitzuarbeiten. Ergänzungen und Änderungen in diesem Wiki sind transparent und jederzeit nachvollziehbar. Jede Bearbeitung wird protokolliert. Siehe: Letzte Änderungen (ohne Diskussionsbeiträge) Guttenbergs Dissertation und die Plagiatsvorwürfe wurden seit dem 16.

Embedly Powered

via Wikia
ProPublica’s Super Bowl Blitz: Which Congressmen Are Getting Super Bowl Perks?Carson, André (D)(202) 225-4011 7th IN Don’t call [ Sebastian Jones, ProPublica ] Don’t call [ Sebastian Jones, ProPublica ] Awaiting reply [ Kathleen McLaughlin, Indianapolis Business Journal | Congressional staff Glendal Jones, press secretary, Feb 3, 2010 ] Delahunt, Bill (D)(202) 225-3111 10th MA Staff doesn’t know.

Embedly Powered

I Paid a Bribe | Uncover the market price of corruption in Indiaipaidabribe: Share your story on bribes and corruption. Read latest news on corruption in Indian bureaucracy and civic agencies. Read corruption and bribery related stories from all over India

Embedly Powered

Deepwater Oil Reporter Crowdsourcing PlatformHere are a few things we think you need to know before joining this open data sharing initiative. Please read before you proceed. Know that all data reported on Oil Reporter is PUBLIC. If you don’t want to share information with the public, Oil Reporter isn’t for you.

Embedly Powered

Girl Turk: Mechanical Turk Meets Girl Talk’s “Feed the Animals” – Waxy.orgGirl Talk’s Feed the Animals is one of my favorite albums this year, a hyperactive mish-mash sampling hundreds of songs from the last 45 years of popular music. Gregg Gillis created a beautiful, illegal mess of copyright clearance hell, which you should download immediately.

Embedly Powered

via Waxy
HuffPost Launches OffTheBus Citizen Journalism Project Ahead of 2012 ElectionsWASHINGTON — If you are like most people, you don’t much like the way the “national media” cover politics. As a long-time member of the Washington press corps, I agree with you. We can be trivial, shortsighted, credulous, ideologically blinkered and timid — on a good day.

Embedly Powered

Netflix Prize: HomeThe Netflix Prize sought to substantially improve the accuracy of predictions about how much someone is going to enjoy a movie based on their movie preferences. On September 21, 2009 we awarded the $1M Grand Prize to team “BellKor’s Pragmatic Chaos”. Read about their algorithm, checkout team scores on the Leaderboard, and join the discussions on the Forum.

Embedly Powered

HerdictWeb : AboutAbout Us Herdict is a project of the Berkman Center for Internet & Society at Harvard University. Herdict is a portmanteau of ‘herd’ and ‘verdict’ and seeks to show the verdict of the users (the herd). Herdict Web seeks to gain insight into what users around the world are experiencing in terms of web accessibility; or in other words, determine the herdict.

Embedly Powered

The High Price of Creating Free Ads – New York TimesFrom an advertiser’s perspective, it sounds so easy: invite the public to create commercials for your brand, hold a contest to pick the best one and sit back while average Americans do the creative work. But look at the videos H. J. Heinz is getting on YouTube.

Embedly Powered

SpotCrime Crime MapArrest Arson Assault Burglary Robbery Shooting Theft Vandalism Other Loading Crime Data… City and county crime map showing crime incident data down to neighborhood crime activity. Subscribe for crime alerts and reports.

Embedly Powered

The Peer to Patent Project – Community Patent ReviewThe Community Patent Review: Peer to Patent project On June 15, 2007, the United States Patent and Trademark Office (USPTO) opened the patent examination process for online public participation for the first time.

Embedly Powered

via Nyls
Prize4LifeOur mission is to accelerate the discovery of treatments and a cure for ALS by using powerful incentives to attract new people and drive innovation. We know that the solutions to some of the biggest challenges in ALS research will require out-of-the-box thinking, and some of the most critical discoveries may come from unlikely places.

Embedly Powered

FixMyStreetHow to report a problem Enter a nearby GB postcode, or street name and area Locate the problem on a map of the area Enter details of the problem We send it to the council on your behalf 1,616 reports in past week 2,529 fixed in past month 204,852 updates on reports

Embedly Powered

Reporting Recipe: Using Amazon’s Mechanical Turk for Data ProjectsOf all of journalism’s recent evolutions, data-driven reporting is one of the most celebrated. But as much as we should toast data’s powers, we must acknowledge its cost: Assembling even a small dataset can require hours of tedious work, deterring even the most disciplined of journalists and their editors.

Embedly Powered

HuffPost’s OffTheBus Superdelegate InvestigationWe asked HuffPost readers to join with us and profile the hundreds of superdelegates who are likely to decide the Democratic nomination for president. Hundreds of you responded and we can now present our initial findings. Just click on a state or territory and a list of superdelegate profiles, as compiled by our citizen journalists, will pop up.

Embedly Powered

The Political Campaign HQ Next Door: OffTheBus Special Ops PhotographsWhere are the state campaign headquarters located, exactly, for the party that claims to represent Main Street? Where are they located for the party that claims to represent everyone? Thanks to the work of HuffPost OffTheBus Special Ops, you can visit offices around the nation in just a few key strokes.

Embedly Powered

Introducing Stimulus Spot CheckJuly 20, 2009: This post has been corrected. It’s the middle of July and we’re all wondering whether the stimulus is working. If we do as the administration has advised, we should remain patient – and let the administration measure its own success.

Embedly Powered

WNYC – Mapping the Storm Clean-upWe’ve been asking readers and listeners to let us know if their streets have been plowed. Here are maps from Tuesday, Wednesday and Thursday (white balloons represent unplowed streets, blue plowed). Click the balloons for full information and voice messages where available. Submit yours by texting PLOW to 30644.

Embedly Powered

via Wnyc
How Do You Feel About the Economy? – Interactive Feature – NYTimes.comEnter the word that best describes your current mood. You can submit a response once a day.

Embedly Powered

Adjunct ProjectThe Project The Adjunct Project exists for the growing number of graduate degree holders who are unemployed and underemployed. Many of these highly educated and passionate people are being forced to take jobs dramatically below their achievement and earning potential.

Embedly Powered

The Scrapbook – POPS Report: Tell Us About New York City’s Privately-Owned Public SpacesListen: Project Intro from October 19th // Listen: Wrap-Up from November 9th // WNYC’s Brian Leher Show and The New York World are collaborating on a project to map and report on New York City’s Privately-Owned Public Spaces, aka POPS. We want to figure out how public these public spaces really are.

Embedly Powered

via Wnyc

An open community news platform: n0tice.com

The last several weeks I’ve been working on a new project, a SoLoMo initiative, as John Doerr or Mary Meeker would call it.

One of those places
Noticeboard photo by Jer*ry

It’s a mobile publishing platform that resembles a community notice board.  It’s called n0tice*:

http://n0tice.com.

After seeing Google’s “News near you” service announced on Friday I thought it was a good time to jump into the conversation and share what I’m up to.  Clearly, there are a lot of people chasing the same or similar issues.

First, here’s some background.  Then I’ll detail what it does, how it works, and what I hope it will become.

What is n0tice?

It began as a simple hack day project over a year ago.  I was initially just curious about how location worked on the phone.  At first I thought that was going to be beyond me, and then Simon Willison enlightened me to the location capabilites inherent in modern web browsers. There are many solutions published out there. Here’s one.

It took half a second from working out how to identify a user’s location to realizing that this feature could be handy for citizen reporters.

Around the same time there was a really interesting little game called noticin.gs going around which was built by Tom Taylor and Tom Armitage, two incredibly talented UK developers.  The game rewarded people for being good at spotting interesting things in the world and capturing a photo of them.

Ushahidi was tackling emergency response reporting. And, of course, Foursquare was hitting its stride then, too.

These things were all capturing my imagination, and so I thought I would try something similar in the context of sharing news, events and listings in your community.

Photo by Roo Reynolds

However, I was quite busy with the Guardian’s Open Platform, as the team was moving everything out of beta, introducing some big new services and infusing it into the way we operate.  I learned a lot doing that which has informed n0tice, too, but it was another 12 months before I could turn my attention back to this project.  It doesn’t feel any less relevant today than it did then. It’s just a much more crowded market now.

What does n0tice do?

The service operates in two modes – reading and posting.

n0tice.com - what's near you nowWhen you go to n0tice.com it will first detect whether or not you’re coming from a mobile device.  It was designed for the iPhone first, but the desktop version is making it possible to integrate a lot of useful features, too.

(Lesson:  jQuery Mobile is amazing. It makes your mobile projects better faster. I wish I had used it from day one.)

It will then ask your permission to read your location.  If you agree, it grabs your latitude and longitude, and it shows you what has been published to n0tice within a close radius.

(Lesson: It uses Google Maps and their geocoder to get the location out of the browser, but then it uses Yahoo!’s geo services to do some of the other lookups since I wanted to work with different types of location objects.  This combination is clunky and probably a bad idea, but those tools are very robust.)

You can then zoom out or zoom in to see broader or more precise coverage.

Since it knows where you are already, it’s easy to post something you’ve seen near you, too.  You can actually post without being logged in, but there are some social incentives to encourage logged in behavior.

Like Foursquare’s Mayor analogy, n0tice has the ‘Editor’ badge.

The first person to post in a particular city becomes the Editor of that city.  The Editor can then be ousted if someone completes more actions in the same city or region.

It was definitely a challenge working out how to make sensible game mechanics work, but it was even harder finding the right mix of neighborhood, city, country, lat/long coordinates so that the idea of an ‘Editor’ was consistent from place to place.

London and New York, for example, are much more complicated given the importance of the neighborhoods yet poorly defined boundaries for them.

(Lesson: Login is handled via Facebook. Their platform has improved a lot in the last 12 months and feels much more ‘give-and-take’ than just ‘take’ as it used to. Now, I’m not convinced that the activities in a person’s local community are going to join up naturally via the Facebook paradigm, so it needs to be used more as a quickstart for a new service like this one.)

The ‘Editor’ mechanics are going to need a lot more work.  But what I like about the ‘Editor’ concept is that we can now start to endow more rights and priveleges upon each Editor when an area matures.

Perhaps Editors are the only ones who can delete posts. Perhaps they can promote important posts. Maybe they can even delegate authority to other participants or groups.

Of course, quality is always an issue with open communities. Having learned a few things about crowdsourcing activities at the Guardian now, there are some simple triggers in place that should make it easier to surface quality should the platform scale to a larger audience.

For example, rather than comments, n0tice accepts ‘Evidence’.

You can add a link to a story, post a photo, embed a video or even a storify feed that improve the post.

Also, the ratings aren’t merely positive/negative.  They ask if something matters, if people will care, and if it’s accurate. That type of engagement may be expecting too much of the community, but I’m hopeful it will work.

Of course, all this additional level of interactivity is only available on the desktop version, as the mobile version is intended to serve just two very specific use cases:

  1. getting a snapshot of what’s happening near you now
  2. posting something you’ve seen quickly and easily

How will n0tice make money?

Since the service is a community notice board, it makes sense to use an advertising model that people already understand in that context: classifieds.

Anyone can list something on n0tice for free that they are trying to sell.  Then they can buy featured promotional positions based on how large the area is in which they want their item to appear and for how long they want it to be seen there.

(Lesson: Integrating PayPal for payments took no time at all. Their APIs and documentation feel a little dated in some ways, but just as Facebook is fantastic as a quickstart tool for identity, PayPal is a brilliant quickstart for payments.)

Promotion on n0tice costs $1 per 1 mile radius per day. That’s in US dollars.

While still getting the word out and growing the community $1 will buy you a featured spot that lasts until more people come along and start buying up availability.

But there’s a lot we can do with this framework.

For example, I think it would make sense that a ‘Publisher’ role could be defined much like the ‘Editor’ for a region.

Perhaps a ‘Publisher’ could earn a percentage of every sale in a region.  The ‘Publisher’ could either earn that privelege or license it from us.

I’m also hopeful that we can make some standard affiliate services possible for people who want to use the ad platform in other apps and web sites across the Internet.  That will only really work if the platform is open.

How will it work for developers and partners?

The platform is open in every way.

There are both read and write APIs for it.  The mobile and desktop versions are both using those APIs, in fact.

The read API can be used without a key at the moment, and the write API is not very complicated to use.

So, for example, here are the 10 most recent news reports with the ‘crime’ tag in machine-readable form:

http://n0tice.com/api/readapi-reports.php?output=xml&tags=crime&count=10

The client code for the mobile version is posted on Github with an open license (we haven’t committed to which license, yet), though it is a few versions behind what is running on the live site.  That will change at some point.

And the content published on n0tice is all Creative Commons Attribution-Share Alike so people can use it elsewhere commercially.

The idea in this approach to openness is that the value is in the network itself, the connections between things, the reputation people develop, the impact they have in their communities.

The data and the software are enablers that create and sustain the value.  So the more widely used the data and software become the more valuable the network is for all the participants.

How scalable is the platform?

The user experience can scale globally given it is based on knowing latitude and longitude, something treated equally everywhere in the world.  There are limitations with the lat/long model, but we have a lot of headroom before hitting those problems.

The architecture is pretty simple at the moment, really.  There’s not much to speak of in terms of directed graphs and that kind of thing, yet.  So the software, regardless of how badly written it is, which it most definitely is, could be rewritten rather quickly.  I suspect that’s inevitable, actually.

The software environment is a standard LAMP stack hosted on Dreamhost which should be good enough for now.  I’ve started hooking in things like Amazon’s CloudFront, but it’s not yet on EC2.  That seems like a must at some point, too.

The APIs should also help with performance if we make them more cacheable.

The biggest performance/scalability problem I foresee will happen when the gaming mechanics start to matter more and the location and social graphs get bigger.  It will certainly creak when lots of people are spending time doing things to build their reputation and acquire badges and socialize with other users.

If we do it right, we will learn from projects like WordPress and turn the platform into something that many people care about and contribute to.  It would surely fail if we took the view that we can be the only source of creative ideas for this platform.

To be honest, though, I’m more worried about the dumb things like choking on curly quotes in users’ posts and accidentally losing users’ badges than I’m worried about scaling.

It also seems likely that the security model for n0tice is currently worse than the performance and scalability model. The platform is going to need some help from real professionals on that front, for sure.

What’s the philosophy driving it?

There’s most definitely an ideology fueling n0tice, but it would be an overstatement to say that the vision is leading what we’re doing at the moment.

In its current state, I’m just trying to see if we can create a new kind of mobile publishing environment that appeals to lots of people.

There’s enough meat to it already, though, that the features are very easy to line up against the mission of being an open community notice board.

Local UK community champion Will Perrin said it felt like a “floating cloud of data that follows you around without having to cleave to distribution or boundary.”

I really like that idea.

Taking a wider view, the larger strategic context that frames projects like this one and things like the Open Platform is about being Open and Connected.  Recently, I’ve written about Generative Media Platforms and spoken about Collaborative Media.  Those ideas are all informing the decisions behind n0tice.

What does the future look like for n0tice?

The Guardian Media Group exists to deliver financial security for Guardian News and Media.

My hope is that we can move n0tice from being a hack to becoming a new GMG business that supports the Guardian more broadly.

The support n0tice provides should come in two forms: 1) new approaches to open and collaborative journalism and 2) new revenue streams.

It’s also very useful to have living projects that demonstrate the most extreme examples of ‘Open and Connected‘ models.  We need to be exploring things outside our core business that may point to the future in addition to moving our core efforts where we want to go.

We spend a lot of time thinking about openness and collaboration and the live web at the Guardian.  If n0tice does nothing more than illustrate what the future might look like then it will be very helpful indeed.

However, the more I work on this the more I think it’s less a demo of the future and more a product of the present.

Like most of the innovations in social media, the hard work isn’t the technology or even the business model.

The most challenging aspect of any social media or SoLoMo platform is making it matter to real people who are going to make it come alive.

If that’s also true for n0tice, then the hard part is just about to begin.

 


* The hack was originally called ‘News Signals’.  But after trying and failing to convince a few people that this was a good idea, including both technical people and potential users, such as my wife, I realized the name really mattered.

I’ve spent a lot of time thinking about generative media platforms, and the name needed to reflect that goal, something that spoke to the community’s behaviors through the network. It was supposed to be about people, not machines.

Now, of course, it’s hard to find a short domain name these days, but digits and dots and subdomains can make things more interesting and fun anyhow. Luckily, n0tice.com was available…that’s a zero for an ‘o’.

Chad Dickerson to join Etsy in New York

Chad Dickerson writes about leaving Yahoo! and starting his new job at Etsy running their technology efforts. He’ll be relocating to New York after spending 10 years in the Bay Area.

“To my Bay Area friends and colleagues, you have given me so much in my time here, I can’t thank you enough. The time I spent out in California this past ten years has literally been life-changing. To my New York friends, I look forward to reconnecting.”

It’s great to see talented people getting rewarded for their hard work. Unfortunately, this is a loss for Yahoo! who needs forward-thinking leaders like Chad who can make things happen. Retention must be top of mind at Yahoo! before key institutional knowledge slips out the door and forces people to rethink things that have already been thought through.

There are lots of great reasons to participate in the future of Yahoo! where the Open Strategy stuff is unfolding. The Flickr Era set the stage for a lot of these smart ideas at Yahoo!. I only worry that the pace of release at the company will fail to create the impact that will make those changes matter. It’s not uncommon for great technology to lose due to bad timing.

The timing worked well for Chad and Etsy, though. He’s going to be a huge asset to Etsy at an important stage in its growth. If it wasn’t already, Etsy should be ranked high on everyone’s companies-to-watch leaderboard.

Local community data reporting

EveryBlock has taken a very data intensive look at local news reporting. As founder Adrain Holovaty explains:

“An overall goal of EveryBlock is to point you to news near your block. We’ve been working hard to do a good job of this so far by accumulating public records, cataloging newspaper stories and pulling together various other geographic information from the Web.”

This generally takes the form of raw data points placed on maps. They recently rolled out a variation on the theme by using topic-specific data which adds more context to the local news reporting idea.

“A week or so ago, 15 people were arrested on bribery charges as part of a federal probe into corruption in Chicago city government. We’ve analyzed U.S. Attorney Patrick J. Fitzgerald’s complaint documents and cataloged the specific addresses mentioned within. On the project’s front page, you can view every location we found, along with a relevant excerpt from the complaint. You can sort this data in various ways, including a list and map of all the alleged bribe locations.”

This is the type of value that’s otherwise kind of missing from the experience. Rather than providing a mostly pure research tool, the site now gives some insight and perspective with an editorial view on the data. In this case, the data is telling a story that otherwise might seem a little distant to you until you see how the issue may in fact be a very real one right in your backyard, so to speak.

But it occurred to me that the community is probably even better able to capture and share this level of useful insight. It would be really neat to see EveryBlock open the reporting and mapping process so that anyone who has an interest in exposing the trends in their neighborhood or elsewhere had a platform to do so.

Average payment (€) by Area
Similar to the way Swivel allows you to collect data in spreadsheet form, visualize it and then share it the way Flickr and YouTube allow you to share, EveryBlock could provide an environment for individuals to do the reporting in their neighborhood that matters to them. The wider community could then benefit from the work of a few, and suddenly you have a really powerful local news vehicle.

This isn’t necessarily in contrast to the approach Outside.in has taken by aggregating shared information from around the web, but it certainly puts some structure around it in a way that may be necessary.

Managing a community is a very different problem than aggregating and presenting useful local data. But I wonder if it’s a necessary next step to get both of these fledgling but very forward-thinking local media services closer to critical mass.

The useful convergence of data

I have only one prediction for 2008. I think we’re finally about to see the useful combination of the 4 W’s – Who, What, Where, and When.

Marc Davis has done some interesting research in this area at Yahoo!, and Bradley Horowitz articulated how he sees the future of this space unfolding in a BBC article in June ’07:

“We do a great job as a culture of “when”. Using GMT I can say this particular moment in time and we have a great consensus about what that means…We also do a very good job of “where” – with GPS we have latitude and longitude and can specify a precise location on the planet…The remaining two Ws – we are not doing a great job of.”

I’d argue that the social networks are now really honing in on “who”, and despite having few open standards for “what” data (other than UPC) there is no shortage of “what” data amongst all the “what” providers. Every product vendor has their own version of a product identifier or serial number (such as Amazon’s ASIN, for example).

We’ve seen a lot of online services solving problems in these areas either by isolating specific pieces of data or combining the data in specific ways. But nobody has yet integrated all 4 in a meaningful way.


Jeff Jarvis’ insightful post on social airlines starts to show how these concepts might form in all kinds of markets. When you’re traveling it makes a lot of sense to tap into “who” data to create compelling experiences that will benefit everyone:

  • At the simplest level, we could connect while in the air to set up shared cab rides once we land, saving passengers a fortune.
  • We can ask our fellow passengers who live in or frequently visit a destination for their recommendations for restaurants, things to do, ways to get around.
  • We can play games.
  • What if you chose to fly on one airline vs. another because you knew and liked the people better? What if the airline’s brand became its passengers?
  • Imagine if on this onboard social network, you could find people you want to meet – people in the same business going to the same conference, people of similar interests, future husbands and wives – and you can rendezvous in the lounge.
  • The airline can set up an auction marketplace for at least some of the seats: What’s it worth for you to fly to Berlin next Wednesday?

Carrying the theme to retail markets, you can imagine that you will walk into H&M and discover that one of your first-degree contacts recently bought the same shirt you were about to purchase. You buy a different one instead. Or people who usually buy the same hair conditioner as you at the Walgreen’s you’re in now are switching to a different hair conditioner this month. Though this wouldn’t help someone like me who has no hair to condition.

Similarly, you can imagine that marketing messages could actually become useful in addition to being relevant. If CostCo would tell me which of the products I often buy are on sale as I’m shopping, or which of the products I’m likely to need given what they know about how much I buy of what and when, then my loyalty there is going to shoot through the roof. They may even be able to identify that I’m likely buying milk elsewhere and give me a one-time coupon for CostCo milk.

Bradley sees it playing out on the phone, too:

“On my phone I see prices for a can of soup in my neighbourhood. It resolves not only that particular can of soup but knows who I am, where I am and where I live and helps me make an intelligent decision about whether or not it is a fair price.

It has to be transparent and it has to be easy because I am not going to invest a lot of effort or time to save 13 cents.”

It may be unrealistic to expect that this trend will explode in 2008, but I expect it to at least appear in a number of places and inspire future implementations as a result. What I’m sure we will see in 2008 is dramatic growth in the behind-the-scenes work that will make this happen, such as the development and customization of CRM-like systems.

Lots of companies have danced around these ideas for years, but I think the ideas and the technologies are finally ready to create something real, something very powerful.

Photo: SophieMuc