How much is the media ecosystem worth?

How do we know that media as an industry is doing well? If you look at digital ad growth you would think we’re killing it. Or the market cap for the big platforms. Or the number of new companies and job growth in media tech.

Mary Meeker’s Internet Trends Report for KPCB, 2016

Everything is going up, up, up.

Except the parts that are failing.

Quality news is suffering terribly, and therefore maybe media is not as successful as it appears from the outside.

It may not be Google’s fault or Facebook’s fault. It might be poor decisions at the top creating this existential crisis. Cutting cancers out of bodies saves lives. Perhaps news as we know it doesn’t need to exist and other forms are replacing it. Perhaps it’s a form of natural selection and that’s just the way it is.

You wouldn’t be wrong for sitting in those camps. But you would be short-sighted. We can do better. It starts by looking at media as an ecosystem, connected parts that serve different purposes to make a healthier and happier society as a whole.

The beauty of networks, the idea that has made Facebook and countless Silicon Valley success stories possible, is that the connections become inherently valuable. People thrive on being connected. Businesses do, too.

Unfortunately, news orgs have never connected to anything very well. They’ve maintained their islands, and piece by piece every aspect of the business gets replaced by things that connect instead.

Connecting them should be easy, though. And if we can connect news orgs to the wider media ecosystem then the whole market will win.

One way to connect news orgs into the wider media ecosystem is to use technology to deconstruct the news and identify things that should connect and the value of those connections.

Entity extraction tools such as the service from Aylien identify concepts in documents

In any article we know the people, places and things the article mentions, and we can deduce what the most important entities are in a given story. By using these concepts as connective tissue we can link networks and network activity with the news.
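To make the extraction step concrete, here’s a minimal sketch using the open source spaCy library, standing in for a commercial service like Aylien; the example sentence and the count-based importance heuristic are purely illustrative:

```python
# A minimal sketch of entity extraction, assuming spaCy and its small English
# model are installed (pip install spacy; python -m spacy download en_core_web_sm).
# Counting mentions is a crude stand-in for a real importance weighting.
from collections import Counter

import spacy

nlp = spacy.load("en_core_web_sm")

text = (
    "Prince Harry has asked the media to stop harassing his girlfriend, "
    "the American actress Meghan Markle, Kensington Palace said."
)

doc = nlp(text)

# The people, places and things the article mentions...
entities = Counter((ent.text, ent.label_) for ent in doc.ents)

# ...with the most frequently mentioned treated as the most important.
for (name, label), count in entities.most_common():
    print(f"{label:10} {name:20} mentioned {count}x")
```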

On Facebook people amplify stories about those concepts and therefore they give them more value (or ignore the ones they don’t value). Data from stock markets or Wikipedia or other sources can weight the importance of those things. Search marketing and targeted impression values tell us what commercial demand looks like for a given entity, too.

Taking this idea a step further, we know from Kaleida research, for example, that 100 articles have been written about Meghan Markle in the last month by leading publishers in the US and UK. She is an American TV actress who is now Prince Harry’s girlfriend.

Those articles have been shared on Facebook about 100,000 times or about 33 shares per article per day, on average. She is not a top tier celebrity like Kim Kardashian or Jay Z, so we don’t have to weight the value of her name for this example…not yet, anyhow.

We can get search rankings and bidding prices for her name on Google AdWords or other ad platforms. Let’s say her name commands about $2 per click against roughly 100k searches per month; if a reasonable share of those searches turn into paid clicks, that’s maybe $2,000 per day in value.

We know quality publishers value this topic. We know people care about the subject by how much they are sharing those articles. And we understand advertiser demand around the topic from PPCs.

Maybe there’s a formula here:

Value of subject = (Performance ad revenue x Social distribution) — Cost of production

$2,000 per day from search clicks x 33 average shares per article about Meghan Markle across a selection of quality media outlets = $66,000 per day in media value, or about $2M for the month. After an estimated $100,000 in production costs the topic becomes worth $1.9M.
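To make the arithmetic concrete, here’s the same back-of-the-envelope calculation as a short Python sketch. Every number in it is one of the illustrative figures above, not measured data:

```python
# The "value of subject" formula above, using the illustrative Meghan Markle
# figures from the text (all assumptions, not measured data).
SEARCH_VALUE_PER_DAY = 2_000     # ~$2/click against ~100k monthly searches
AVG_SHARES_PER_ARTICLE = 33      # average Facebook shares per article per day
PRODUCTION_COST = 100_000        # estimated cost of a month of coverage
DAYS_IN_MONTH = 30

def topic_value(ad_revenue_per_day, social_distribution, production_cost,
                days=DAYS_IN_MONTH):
    """Value of subject = (performance ad revenue x social distribution) - cost of production."""
    daily_media_value = ad_revenue_per_day * social_distribution
    return daily_media_value * days - production_cost

value = topic_value(SEARCH_VALUE_PER_DAY, AVG_SHARES_PER_ARTICLE, PRODUCTION_COST)
print(f"${value:,}")  # -> $1,880,000, i.e. roughly $1.9M for the month
```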

If there are, say, 500 topics a month of equal value then we’re talking about $1B per month in total value across the media ecosystem related to news.

That’s not money generated by publishers. Equally, it’s not value captured by platforms in a vacuum. It’s a recognition of value in the market related to news coverage on a topic by topic basis, using concepts as connections.

There are lots of ways to apply such a model on the publisher side. For example, more coverage of a topic doesn’t make it more valuable either for the publisher or the wider media ecosystem. More coverage means articles have to work harder to break through, so maybe less coverage with higher impact is the way to both increase value and sustain positioning. This model could prove that.

Again, this is merely a concept. It’s meant to demonstrate that connecting the news into a larger context is not only possible but perhaps necessary.

If we want to understand the strength of media and technology as an industry then we need to measure the impact of the news. And if we want to become stronger we need to balance the amazing growth we’re seeing in distribution and advertising with substance that reinforces it or perhaps even accelerates it. Until news is connected success in digital media is just big muscles on small bones.

We’re starting to expose some of this thinking about connecting concepts in the news through data we provide at Kaleida, so keep an eye out for it in our daily email newsletters and the charts and tools on kaleida.com. Also, if you are in the business of measuring value in networks and ecosystems we would love to talk to you.

How Hamilton gave media their mojo back

How does an editor know a story will be big?

Knowing a big story intuitively is one of those skills that becomes more and more important the higher up the food chain an editor sits. But it gets harder and harder to work out the answer to that question as an editor’s readership scope goes beyond what he or she can see.

That crucial ability becomes muted in a Facebook-dominated world with no newsstand sales or web site analytics to track, a world where filter bubbles are reinforced by opaque distribution. If you can’t feel what your readers care about you’ll just be guessing.

Despite these challenges, that uncanny editor’s spidey-sense kicked in this weekend when a stage actor delivered a passionate plea for diversity in America to VP-elect Mike Pence, who was sitting in the audience.

How did editors know this would take off? There can be no doubt they were right.

Topic performance for Hamilton story. Source: Kaleida, Nov 2016

All the publishers we are tracking covered the story. We’ve seen about 40 articles, videos and cartoons, so far. And all those pieces are performing well.

In fact, they are performing so well that articles about the play Hamilton, the actor Brandon Victor Dixon and the show’s creator Lin-Manuel Miranda are outperforming stories about everything in the news other than Trump and Obama. When you remove the presidential outliers, this is what you get:

Kaleida data, Nov 2016

As a percentage of total coverage, stories about them are even more positive than stories about Michelle Obama. And that’s saying something.

Regardless of political slant, people love a good David vs Goliath story. Clinton, mainstream media and the status quo were the Goliath to Trump’s David. He slew his giant with a relentless onslaught of soundbites, but the tables turned quickly. The people want him to do the listening now.

One thing we can be sure of is that mainstream media is changing its tone. Coverage since the election result is clearly more neutral than it was prior to the election when the positive and negative tone was much more dramatic. Trump coverage is normalizing.

The normalization of Trump coverage. Source: Kaleida data, Nov 2016

On one hand that may seem like acquiescence. But maybe that’s how to rebuild the trust that was lost in the build up to the election.

Perhaps something more profound is happening. Maybe publishers are relearning how to apply their resources and assert themselves in the new post-truth world.


When Trump reacted to the speech by demanding an apology, he solidified his Goliath position vs the people’s David.

Most editors will intuitively read pressure from the President to behave the way he expects as an indicator of things to come.

That kind of relationship with power is one they know how to deal with.

Dear Journalism

This election campaign has been pretty crazy, hasn’t it? It’s been fascinating to watch. Exciting sometimes, and disgusting. But somehow I felt like you were next to us shaking your fist at the TV, too, when you should’ve been out on the street, applying a bit of your level-headed perspective and a whole lot of your inquisitive mind to unpick the many strands coming at us from all angles.

Instead you mostly just made everything a whole lot louder.

CNN, October 20, 2016

Your headlines quoted Trump directly. You topped articles about Clinton with videos of Donald Trump saying shocking things. You challenged Trump by saying, “He thinks Saddam Hussein killed terrorists ‘so good’. How scary and what terrible grammar!” And you posted videos of his speeches on Facebook and encouraged your readers to watch them and share them.

When Nigel Farage of the right-wing UKIP party in Britain spoke at rallies defending Trump’s offensive statements, the BBC and all the rest posted the video on all their channels, a double amplification, this time with context.

You created a deep cultural deficit by normalizing this kind of language in this way. Soledad O’Brien was right to eviscerate CNN for its shoddy coverage, and yet it never seemed to offer any sort of course correction. Will it now?

Your fact-checking efforts were worthy but ineffective. You made exhaustive lists that lasted for only moments in the public discourse.

You seem to be aware of what you’re doing, but with only days to go before the election you are still doing it. The New York Times editorial on Thursday described Trump’s latest threats against Clinton and their implications in detail, articulating his plans with greater strength than he could ever have delivered himself. In contrast, and wisely, The Washington Post put it in context and focused on the issues, not the quotes.

“News is what people do not want you to print. All the rest is advertising.” -Lord Northcliffe

It took you a year or so following the announcement of his candidacy to offer any real independent challenge to Trump’s qualifications as a candidate for President of the United States.

But it was billionaire Warren Buffett, not Journalism, who finally discredited Trump’s claim that he couldn’t release his tax returns. He can. He should. He hasn’t. And except for a brief moment in the campaign when The New York Times got their hands on his 1995 return, he has gotten away with it. How could you let that happen?

Mother Jones, Aug 30, 2016

Smaller independents like Mother Jones did some solid independent reporting, including the story on Trump companies encouraging workers to violate immigration laws. There was so much more to do here, but it seemed to stop.

David Fahrenthold of The Washington Post used old-school methods to investigate Trump’s foundation, which led to a major story about his impropriety there. Why was The Washington Post doing this practically alone?

Paul Lewis of The Guardian used video effectively to get beneath the traditionally vapid vox pops and ask the tough questions that surface the real issues affecting people who want Trump or don’t want Clinton. Again, why weren’t more journalists doing work like this?

Finally, why are we only now reading Newsweek’s exposé on Trump’s illegal email policies and stories about the FBI’s support for Trump? You discovered that Trump’s strategy was the path to his corruption pretty late in the process.

It seems to me there are a few reasons. One is simply that this whole campaign is unprecedented in so many ways, and it’s just so easy to get caught up in the news cycle. You need to be competitive. And, of course, you need to survive, and easy traffic is (sometimes) easy money.

Unfortunately, in today’s media-savvy world that kind of hypocrisy is transparent to all and makes it impossible for you to challenge Trump on that basis.

As a result, you’re losing our trust, and that is much worse than losing your income. You can always find a way to make money, but not everyone forgives betrayal.

These are difficult times for you. Everyone knows that. And not everyone is in agreement about how important you are for a healthy society. You have to work harder than ever to make an impact.

“Were it left to me to decide whether we should have a government without newspapers, or newspapers without a government, I should not hesitate a moment to prefer the latter.” — Thomas Jefferson

And I don’t think it’s a coincidence that the steady decline of your role in society maps inversely to the rise of right wing zealotry around the world. It’s not about liberal vs conservative. It’s a lack of perspective of what matters, a simple sanity check that disappears when you lose focus.

Your efforts to modernize and invest in technology may have cost you the more robust research and investigations that those investments are intended to support. A new web site, better mobile app, video gear and teams of interactive software developers aren’t silver bullets. The packaging matters, but without the substance it’s an empty vessel.

In any business you are what you measure. If you measure for profit against display advertising then you are going to keep traveling down this death spiral. If you measure impact against output then you are putting yourself on a much more challenging but much more important path.

I agree with Jeff Jarvis when he said, “After this election, the news business needs to enter into a brutal post-mortem of its performance and value.”

Until you choose impact over eyeballs, the powerful will continue to take advantage of you, and therefore of everyone you serve, the same way Donald Trump has done this year.

Sensible voices on both the left and the right worry about that, and people in all corners of society are ready to help you. It’s never been clearer how much you matter to us all. You just need to activate us, and we’ll be there for you.

A new era of Journalism starts now. Let’s digest the lessons from this year and move on quickly. We’ve got work to do if Oscar Wilde is still going to be right in four years’ time:

“In America the President reigns for four years, and Journalism governs forever and ever.”

Using human editorial decisions to make a better algorithm

Machine learning tools can make people smarter. The thing that makes the magic happen is the data we feed them, the source material the mathematics turns into insights.

It’s not just social platforms and retailers and cars that can benefit from machine learning. Anyone working in media can get smarter if they have the right tools at hand, too.

In the case of news orgs it’s the choices editors make implicitly and explicitly that provide the training data for the kind of machine that will help publishers make better decisions.

Kaleida has been tracking home page articles by leading publishers. CTO Graham Tackley has been developing systems for clustering similar stories together, capturing social media activity, monitoring how each publisher treats their stories, and rolling these and other inputs into a realtime picture of what matters right now in the media.

We have about 100k articles from the last few months and social signals and trending data for each one. Before even applying any kind of machine learning Graham has been discovering some surprising facts. Only about 5% of the articles promoted on publishers’ home pages earn over 2,000 engagements on Facebook. Articles about the US election perform equally well regardless of whether the headline is more about Trump or Clinton.


We take all this information and run it through tools like IBM’s Watson APIs, Google’s entity extractor, the Aylien sentiment analysis API and Amazon’s Machine Learning web service, among others. We’ve been feeding it all into Elasticsearch, which makes this much easier to do.
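For a sense of what that pipeline looks like in practice, here’s a minimal sketch of the indexing step, assuming a local Elasticsearch instance and a recent version of the official Python client; the index and field names are illustrative, not Kaleida’s actual schema:

```python
# A sketch of the indexing step: take one article, attach the signals computed
# for it, and store it in Elasticsearch so it can be queried later.
from datetime import datetime

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

article = {
    "publisher": "CNN",
    "url": "http://www.cnn.com/example-article",          # placeholder URL
    "headline": "Mosul: Most intense day of fighting since offensive began",
    "published_at": datetime(2016, 11, 16).isoformat(),
    "entities": ["Mosul", "ISIS", "Iraq"],                 # from an entity extractor
    "sentiment": -0.6,                                     # from a sentiment API
    "facebook_engagements": 4500,                          # from the Facebook Graph API
    "promoted_on_homepage": True,                          # from our home page tracker
}

# Use the URL as the document id so re-indexing updates rather than duplicates.
es.index(index="articles", id=article["url"], document=article)
```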

What have we learned?

Our initial research is designed to see what impact publishers’ editorial choices have on how well an article performs on social. So, we trained the algorithm to see those patterns first.

For example, it may seem obvious that promoting a story on your home page or on your branded social media page is a good idea, but machines can tell you just how much it matters. We can see that different words in the headline have a different effect for different publishers. Want to know what the ceiling looks like for a story assuming it doesn’t go viral? Want to know which topics out there have the most potential? Machines can answer all these questions, too.
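As a toy illustration of the approach, here’s a sketch of a model that predicts an article’s eventual Facebook engagement from a handful of editorial signals, using scikit-learn. The features and training rows are invented for demonstration; the real system uses far more inputs:

```python
# A toy model that predicts an article's eventual Facebook engagement from a few
# editorial signals. The features and training rows are invented for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Features per article: [hours promoted on home page, posted to brand FB page (0/1),
#                        headline length in words, engagements in the first hour]
X_train = np.array([
    [6, 1, 9, 250],
    [1, 0, 12, 40],
    [12, 1, 7, 900],
    [3, 0, 10, 120],
    [8, 1, 11, 400],
])
y_train = np.array([4500, 600, 18000, 1500, 7000])  # final Facebook engagements

model = GradientBoostingRegressor().fit(X_train, y_train)

new_article = np.array([[5, 1, 8, 300]])  # signals observed for a fresh article
print(f"predicted engagements: {model.predict(new_article)[0]:.0f}")
```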

Algorithms like this one can make predictions with surprising accuracy.


The machine predicted CNN’s piece “Mosul: Most intense day of fighting since offensive began” would earn 4,800 engagements on Facebook, and, in fact, it earned 4,500.

It was off in a few cases, too, of course. The machine’s prediction for Fox News’ story on “Millennials are clueless about socialism (call it the ‘Bernie Sanders effect’)” looked accurate at first. It said the story would earn 633 engagements, and the story appeared to die on Facebook at 611. But it took off again over the weekend and now has double that. Most of the failures were lowball figures on stories that became very successful.

After this test we now have some ideas on things we can feed the machine to predict the potential for virality. But there are more interesting use cases than simply predicting the number of likes a story will get.

Algorithms can help identify better words to use in a headline, or where to promote a story and for how long. They can provide guidance on more nuanced decisions, too, like who is the best writer to cover a topic, or which tone and subjects will resonate with a particular publisher’s readers. They can probably tell whether a particular story helps convert readers into paying subscribers, too.

The trick is identifying the questions people want answered and using the data they already generate along the way to answer them. The machines just accelerate and amplify the little decisions we often make intuitively.

At worst algorithms can validate what publishers already know with hard evidence. At best they might help us all fix the media business.


Originally published at www.kaleida.com on October 23, 2016.

Why publishers are turning to art for answers this week


The world has become so complex that even Stephen Hawking is unsure what’s going on. We can’t explain what we don’t understand, and when words elude us we turn to art for answers. Several publishers this week are tapping into the collective concern people are feeling.

Buzzfeed News is covering Inktober (artists from all over the world make one ink drawing a day for the entire month of October) and Ohio artist Shawn Coss, whose gorgeous but haunting images of mental health disorders are getting a tremendous response.

On a lighter note The Guardian is talking about newly discovered artwork by Finnish writer and artist Tove Jansson who is best known for the Moomins.

The Internet has gone a bit bonkers over an odd looking terracotta sculpture of baby Jesus.

But that’s nothing compared to the #TrumpBookReport meme. Antonio French kicked it off in a tweet where he compared Trump’s foreign policies to badly written teenage book reports:

Trump’s foreign policy answers sound like a book report from a teenager who hasn’t read the book. “Oh, the grapes! They had so much wrath!”

— Antonio French (@AntonioFrench) October 20, 2016

Now, whether or not it counts as art I’m not sure, but we love this bike lock that emits a horrible smell that will make a thief vomit if they cut the lock.

SkunkLock was crowdfunded on Indiegogo

What an imaginative way to use technology to course correct without violence. I hope we find similar antidotes to the threats posed by AI that Hawking sees ahead of us.


Originally published at www.kaleida.com on October 21, 2016.

Who won the coverage of the US presidential debate?


After tracking three debates Kaleida can show patterns in the way leading publishers are covering them. It goes like this:

Step one: Write a “heads up” piece on the day, maybe the day before. Tell people what to expect and entice them to come back for your live coverage or follow on analysis. There are probably relevant news events or research studies, including polls that say one candidate has to ‘step up their game’ or something like that.

WINNER: NBC News, “‘Wall’ of Taco Trucks Line Up at Trump’s Hotel in Protest”

Step two: Do something live. Whether it’s video or a liveblog or whatever, make sure that you are in the game and competing for position on Google News, ready to break the story of the debate as soon as it happens, whatever it is.

WINNER: CNN Live

Step three: Get the story. There are a handful of types of stories that can be written in the first few hours following the debate:

Step four: Hear what ‘the people’ have to say. You can do vox pops and interviews in places mentioned in the debate or with people demographically targeted in the candidates’ statements. There’s always a tweet that goes viral, so pick that up, too.

WINNER: These are still coming in

Step five: Amplify the key stories. Produce more analysis and thought pieces that either capture the mood following the debate or dive into the issues raised and what the candidates’ positions actually mean.

WINNER: The likely candidates are Trump’s refusal to commit to accepting the election result, saying he would deport ‘Bad Hombres’, or Clinton’s and Trump’s views about the Second Amendment. Though not directly related to the debate it does appear Trump’s children are increasingly drawing fire, too.


Originally published at www.kaleida.com on October 20, 2016.

Here’s what happens if you change the URL of a story that’s going viral on Facebook


On Sunday afternoon FoxNews.com reported that a Republican Party headquarters in North Carolina was firebombed the night before.

Kaleida showed that the story was moving really fast on Facebook, earning over 40 engagements per minute. The article had 18,000 engagements and climbing, and then, suddenly, it fell off a cliff.

Data from Kaleida.com

Engagements hit zero at 5:45 am GMT (12:45 am Eastern Time) and instead of climbing at 40 engagements per minute and reaching for 20k or more in total, the article started from zero again and earned about 2 or 3 per minute for the next several hours.

Now it has about 5,000 engagements in total.

What happened?

At some point during this article’s life Fox News changed the URL. When it launched Sunday afternoon (8:36pm GMT/3:35 Eastern) the URL was:

http://www.foxnews.com/politics/2016/10/16/north-carolina-gop-headquarters-firebombed.html

The URL probably changed at 12:09 am Eastern Time and became the following:

http://www.foxnews.com/politics/2016/10/17/north-carolina-gop-headquarters-firebombed.html

Fox News changed the day in the URL, presumably for enhanced positioning in Google News.

Interestingly, Facebook knows the original URL, as you can see from their debugger tools. But it appears that when the ‘canonical’ URL was changed they zeroed the engagement count. It’s also interesting to note that Facebook still recognized the old URL for about 30 to 40 minutes before changing the way they were dealing with it.

Why did the momentum crash so suddenly? Just because the URL changed shouldn’t mean people would share it any less, right?

We have to assume that it was moving quickly because the story was hot and people were sharing it on Facebook a lot. But when it suddenly registered zero engagements the Facebook algorithm must have reprioritized other stories in front of it.

The new URL meant that Facebook thought it was a new article with no engagement instead of the highly active article that was flying across their network only moments before.

All those shares coming from within the world of Facebook, particularly URLs being viewed via the mobile browser or as instant articles, disappeared behind the news feed algorithm.

The lesson here is to be careful about changing URLs, particularly the <link rel="canonical"> tag in your HTML. It seems Facebook will consider it a new page and start counting engagement from scratch on the new page. All your accumulated engagements will be lost.
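If you want to see the effect for yourself, one way is to ask the Facebook Graph API for the engagement counts attached to each URL, assuming you have a valid access token; when the canonical URL changes, the new URL starts from zero. A rough sketch:

```python
# A rough sketch: ask the Facebook Graph API for the engagement attached to the
# old and new URLs. The access token is a placeholder and the API version is
# whichever one is current; the "engagement" field returns reaction, comment
# and share counts for a URL.
import requests

ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"  # placeholder

OLD_URL = "http://www.foxnews.com/politics/2016/10/16/north-carolina-gop-headquarters-firebombed.html"
NEW_URL = "http://www.foxnews.com/politics/2016/10/17/north-carolina-gop-headquarters-firebombed.html"

for url in (OLD_URL, NEW_URL):
    resp = requests.get(
        "https://graph.facebook.com/v2.8/",
        params={"id": url, "fields": "engagement", "access_token": ACCESS_TOKEN},
    )
    print(url)
    print("  ", resp.json().get("engagement"))
```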

This is a case where what works in search may actually do damage in social.


Originally published at www.kaleida.com on October 17, 2016.

Analysis: Publishers have overcooked the Kim Kardashian robbery story

Coverage analysis by Kaleida

Late Sunday night reports emerged that Kim Kardashian was robbed in Paris. Publishers were quick to cover it, and they have been publishing related stories for 3 days now.

Is the effort paying off? Not so much.

Kaleida shows that among the 12 sources we are tracking at the moment, about 70 articles have been published covering the Kim Kardashian robbery. In total these articles have earned about 150,000 engagements on Facebook, a mean of roughly 2k per article.

The BBC, for example, has published 7 related articles earning a combined total of 30,000 engagements. Its most successful story has fewer than 10,000 engagements. Here’s how some of the publishers are performing so far, ranked by efficiency:

                Stories  Engagements  Top story  Avg
Buzzfeed News   4        24k          10k        6k
NYT             2        11k          7k         5.5k
BBC             7        30k          10k        4.3k
CNN             6        22k          11k        3.6k
NBC News        6        19k          14k        3.2k
The Guardian    6        8.5k         8k         1.4k
The Telegraph   10       7k           2k         700
Fox News        8        1.4k         1k         175

Celebrity news can open opportunities to raise issues that are core to your brand as a publisher, though there seem to be few examples of that. Most if not all of these articles seem to be placed to drive traffic, rank high in Google News and find younger readers through social. It would seem to be a story practically designed for social news channels.

Unfortunately, the low engagement numbers relative to the output don’t seem to justify the resources.

Just to illustrate the point, let’s use an average cost per article of $500, which could include both cost of production and total cost of delivery, and then use that figure to see what social efficiency looks like.

                Total cost  Cost per engagement
Buzzfeed News   $2,000      $0.08
NYT             $1,000      $0.09
BBC             $3,500      $0.12
CNN             $3,000      $0.13
NBC News        $3,000      $0.16
The Guardian    $3,000      $0.36
The Telegraph   $5,000      $0.71
Fox News        $4,000      $2.86
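The table above is simple enough to reproduce in a few lines of Python, which also makes it easy to rerun with a different cost assumption:

```python
# The efficiency table above in code: a flat $500 per article (an assumption,
# not a measured cost) divided into each publisher's total Facebook engagements.
COST_PER_ARTICLE = 500

coverage = {  # publisher: (articles, total engagements)
    "Buzzfeed News": (4, 24_000),
    "NYT": (2, 11_000),
    "BBC": (7, 30_000),
    "CNN": (6, 22_000),
    "NBC News": (6, 19_000),
    "The Guardian": (6, 8_500),
    "The Telegraph": (10, 7_000),
    "Fox News": (8, 1_400),
}

for publisher, (articles, engagements) in coverage.items():
    total_cost = articles * COST_PER_ARTICLE
    print(f"{publisher:15} ${total_cost:>5,}   ${total_cost / engagements:.2f} per engagement")
```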


It’s likely Buzzfeed has a lower cost per article than The New York Times, so we can safely say they win in this case.

Fox News, on the other hand, seems to have performed particularly poorly compared to other leading publishers on this story. They published more stories than necessary, and they are getting very little lift on social for the effort.

There are some caveats to mention, not least of which is the fact that we’re only including stories in this analysis that publishers have promoted on their web site home pages. Some of these publishers may have had a lot more success on Facebook than indicated here. They may have had success on other platforms including Twitter and Snapchat. They may have had video views that made it all worthwhile. And perhaps they are drawing stronger visitor figures to their websites than this analysis implies.

But comparing similar coverage from similar publishers we can certainly derive a few lessons.

First, Kim Kardashian may not be much of a draw for national news media sources on the web.

Second, publishers can easily overspend on a hot story and fail to get much value out of it.

The second point is the important one.

Data-informed editorial decisions become increasingly important as the story in question falls further from a publisher’s core brand values. Otherwise, valuable editorial resources get wasted.

Journalism as an industry can’t afford to waste resources on coverage that doesn’t matter.


Originally published at www.kaleida.com on October 5, 2016.

What Mary Meeker’s Internet Trends report means to publishers

In the early 2000s many companies buried their heads in the sand hoping the internet would go away. The retreat strategy in those days involved a reinvestment in analog-based business models. There was plenty of life there to do interesting things, and they got a chance to think through their digital transition after everything calmed down.

But the game has changed again. It is moving much faster than before, and failure to complete the previous transition from analog-to-digital means there are no safety nets to fall back on.

Getting through this storm is going to require something really different.

What’s exciting for those willing to think in terms of networks instead of destinations is that the arrows on Mary Meeker’s slide on Google and Facebook’s dominance in advertising are all pointing up and to the right.

Mary Meeker’s Internet Trends Report 2016

There’s a massive advertising market that keeps getting bigger and, as the newcomers keep demonstrating, there are new ways to capture pieces of that market appearing every day.

What’s stopping media orgs from joining up and doing something about this?

I’ve tried to explain what’s going on and describe some of the things the media industry can do in an article for The Guardian. I think the answer is to form a network.

http://www.theguardian.com/media-network/2016/jun/15/google-facebook-publishers-network-collaborate-trends-report

How algorithms are like puppies and why it’s useful to understand that

Photo by MickiTakesPictures

Originally published at www.theguardian.com on June 6, 2016.

Algorithms are often characterised as dark and scary robotic machines with no moral code. But when you open them up a little and look at their component parts, it becomes apparent how human-powered they are.

Last month, Google open sourced a tool that helps make sense of language. Giving computers the power to understand what people are saying is key to designing them to help us do things. In this case, Google’s technology exposes what role each word serves in a sentence.

The technical jargon for it is natural language processing (NLP). There is mathematics cooked into the tools, but knowing what sine and cosine mean is not a prerequisite to understanding how they work.

When you give Google’s tool or any NLP system some text, it uses what it has been told to look for to decipher what it is looking at. If the creators taught it parts of speech then it will find nouns, verbs, etc. If the creators taught it to look for people’s names then it will identify word pairs and match them against lists they were given by trainers.

The computer then processes the things it found and provides results. As the ones asking for results, users have to decide things like whether to exclude words whose relevance scores fall below a certain threshold, or whether to only match words in a whitelist they have provided.

Different tools provide different types of results. Maybe the tool is designed to look for negative or positive sentiment in the text. Maybe it’s designed to identify mentions of city streets. Maybe it’s designed to find all the articles in a large data set that are talking about the same subject.
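Here’s a small illustration of those steps using the open source spaCy library, which stands in for whatever NLP system you happen to use; the sentence and the whitelist are made up for demonstration:

```python
# A small illustration of those steps, with spaCy standing in for whatever NLP
# system you use. The sentence and the whitelist are made up for demonstration.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Paul Lewis of The Guardian interviewed voters in Youngstown, Ohio.")

# What the trainers taught it: parts of speech...
print([(token.text, token.pos_) for token in doc])

# ...and named people, places and organisations.
print([(ent.text, ent.label_) for ent in doc.ents])

# What the user decides: keep only the entities on a whitelist they provide.
whitelist = {"The Guardian", "Ohio"}
print([ent.text for ent in doc.ents if ent.text in whitelist])
```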

Many startups today are using NLP to inform artificial intelligence systems that assist with everyday tasks such as x.ai’s calendar scheduler. Over time we are going to see more startups using these tools.

But there are many practical things that media companies can do with NLP today, too. They might want to customise emails or cluster lots of content. I can imagine publishers creating ad targeting segments on the fly using simple algorithms powered by NLP systems.

It’s worth noting that Google is actually late to the game, as many solutions already exist. IBM’s AlchemyAPI will look at text users supply and return data about relationships between people, places and things. There is an open source solution called OpenNLP from the Apache Foundation. Apache is also where you find Lucene, a popular search library that underpins products such as Elasticsearch and can solve some of the same problems NLP systems solve.

At their most basic level, these technologies essentially automate decisions at scale. They take a lot of information in; they work out what answers resolve certain kinds of questions based on what people teach them; and then they spit out an answer or lots of answers.

But every step of the way, people have told the computers what to do. It is people who provide the training data. It is people who instruct the algorithms to make the decisions they want made on their behalf. It is people who apply the results the algorithm returns.

These tools are particularly powerful when they are given the authority to make lots of decisions quickly that could never be done by hand. And that is also where problems emerge.

Lots of small errors in judgment can turn into an offensive or even threatening force. Sometimes adverse effects are introduced accidentally. Sometimes they are not. And sometimes unwanted behaviour is really just an unintended consequence of automating decisions at scale.

As with any new technology, we don’t yet have a clear model for understanding and challenging what people are doing with algorithms. The Tow Center’s recent book on algorithms dives into the issues and poses important questions about accountability.

How has the algorithm been tuned to benefit certain stakeholders? What biases are introduced through the data used to train them? Is it fair and just, or discriminatory?

It’s a great piece of research that begins to expose the implications of this increasingly influential force in the world that effectively amplifies commercial and government power. The key, according to the Tow report, is “to recognise that [algorithms] operate with biases like the rest of us”.

Algorithms aren’t monsters. Think of them more like puppies. They want to make you happy and try to respond to your instructions. The people who train them all have their own ideas of what good behaviour means. And what they learn when they are young has a profound effect on how they deal with people when they’re grown up.

As policy folks get their heads around what’s going on they are going to need some language to deal with it. Perhaps we already know how to talk about accountability and liability when it comes to algorithms.

California’s dog laws state that the owner of a dog is liable if it hurts someone. They reinforce that by saying the owner is liable even if they didn’t know their dog could or would hurt someone. Finally, being responsible for the dog’s behaviour also means the owner must do what a court decides, which may include “removal of the animal or its destruction if necessary.”

A policy with similar foundations for algorithms would encourage developers to think carefully about what they are training their machines to do for people and to people.

Maybe it’s overkill. Maybe it’s not enough.

But let there be no confusion about who is creating these machines, teaching them what to do, and putting them to work for us. It is us. And it is our nature that is being reflected, multiplied and amplified through them. Tools like natural language processing are merely paint brushes used by the artists who dream them up.