data – Page 2 – Matt McAlister

Data dynamics: How the rules of sharing are changing

Today it’s easy to store and share my pictures, my favorite URLs, my thoughts and lots of other things online. There is a range of data repositories that allow me to do this kind of thing in different ways.

What still needs work is how I give trusted services access to much more private data — things like my current location, my spending behavior, access to my friends and family, etc.

To date, most services follow the premise that the looser the controls, the more fluidly data will travel. And that’s all that mattered when it was still hard to get data flowing.

Data flow is no longer an issue. Perhaps data flow has actually become too easy now. And therein lies the problem.

Clearly, blogging, RSS and feed readers drove a lot of the early thinking about syndication. Blogging enabled people to post content in a publicly accessible data repository somewhere for anyone to pull out without any privacy or permissioning controls. The further your content then syndicated, the better.

Wikis and community sites like Slashdot created a slightly more complex read/write dynamic against the central content repository that lots of people could access together. The permissioning model was essentially hierarchical where controls were kept in the hands of a smaller community.

Then Flickr broke ground with a new approach. They applied a user-centric friends and family relationship model to permissioning access to personal photos. Flickr opened up what was once considered private data and defaulted it to a public read-only permission status. But each individual still has a great deal of control over the data he or she contributes.
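To make the friends-and-family idea concrete, here is a minimal sketch of a per-item permission check. It is not Flickr's actual implementation; the class names, the four visibility levels and the `can_view` function are all assumptions made up for illustration. The only things taken from the description above are the public read-only default and the owner's control over each item.

```python
from dataclasses import dataclass, field
from enum import Enum


class Visibility(Enum):
    PUBLIC = "public"
    FRIENDS = "friends"
    FAMILY = "family"
    PRIVATE = "private"


@dataclass
class Owner:
    name: str
    friends: set = field(default_factory=set)
    family: set = field(default_factory=set)


@dataclass
class Photo:
    owner: Owner
    title: str
    # Hypothetical default mirroring the "public read-only" starting point.
    visibility: Visibility = Visibility.PUBLIC


def can_view(photo, viewer):
    """Return True if `viewer` (a user name, or None for anonymous) may see the photo."""
    if viewer == photo.owner.name:
        return True  # owners always see their own photos
    if photo.visibility is Visibility.PUBLIC:
        return True
    if photo.visibility is Visibility.FRIENDS:
        return viewer in photo.owner.friends
    if photo.visibility is Visibility.FAMILY:
        return viewer in photo.owner.family
    return False  # PRIVATE: owner only


matt = Owner("matt", friends={"dana"}, family={"sam"})
beach = Photo(matt, "beach")                       # falls back to the public default
nursery = Photo(matt, "nursery", Visibility.FAMILY)
```

The point of the sketch is that the decision hangs off the owner's own relationship lists rather than a site-wide hierarchy, which is what distinguishes this model from the wiki-style controls described earlier.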

Similarly, del.icio.us made it possible to store and publicly address what had previously been private data. The nice twist here was the easy-to-understand URLs that allowed machines to consume, interpret and redistribute data stored in del.icio.us.

Where services like Facebook and Wesabe are now breaking ground again is in identifying a security model around highly sensitive data. Contact lists are very personal, but there aren’t many data sets more personal than my purchases and spending patterns.

Neat things can happen when I give machines access to my data, both the things I explicitly ‘own’ and my implicit behaviors. I want machines to act on my behalf and make my data more useful to me in a range of different contexts.

For example, I like the fact that Facebook slurps up my Twitter activity and shares it with my friends in the Facebook network. I don’t want to change my ‘status’ on every service that shows status messages. Similarly, I like that Last.fm captures my listening behavior from iTunes and then uses that data to give back personal recommendations on a badge posted to my blog.

Allowing machines to automatically act on personal data on my behalf is the right direction for things to go. But important questions need to be resolved.

For example, what happens to my data in all the places I’ve allowed it to appear when I change it? How do permissions pass from one service to another? How do I guarantee that a permission type I grant in one service means the same thing in another service? How do changes propagate? How does consent get revoked?

And even trickier than all that will be the methods for enforcing protection of privacy and penalties for breaking those permissions.

Until trust is measurable with explicit consensual triggers, loosely coupled networks that act on the data I wish to protect are going to struggle to talk to each other. Standards need to enable common sharing tactics. Responsibility needs to be clearly defined. And policies need to be enforceable.

Empowering a person to invest in storing and sharing the more sensitive data he or she owns is going to require a lot more than traditional read/write controls. But given the pace of change right now I suspect the answers will happen as the people behind these services work things out together before the industry taskforces, legal entities and blogosphere sort it out for them.

The business of network effects

The Internet platform business has some unique challenges. It’s very tempting to adopt known models to make sense of it, like the PC business, for example, and think of the Internet platform like an operating system.

The similarities are hard to deny, and who wouldn’t want to control the operating system of the Internet?

In 2005, Jason Kottke proposed a vision for the “WebOS” where users could control their experience with tools that leveraged a combination of local storage and a local server, networked services and rich clients.

“Applications developed for this hypothetical platform have some powerful advantages. Because they run in a Web browser, these applications are cross platform, just like Web apps such as Gmail, Basecamp, and Salesforce.com. You don’t need to be on a specific machine with a specific OS…you just need a browser + local Web server to access your favorite data and apps.”

Prior to that post, Nick Carr offered a view on the role of the browser that surely resonated with the OS perspective for the Internet:

“Forget the traditional user interface. The looming battle in the information technology business is over control of the utility interface…Control over the utility interface will provide an IT vendor with the kind of power that Microsoft has long held through its control of the PC user interface.”

He also responded later to Kottke’s vision saying that the reliance on local web and storage services on a user’s PC may be unnecessary:

“Your personal desktop, residing entirely on a distant server, will be easily accessible from any device wherever you go. Personal computing will have broken free of the personal computer.”

But the client layer is merely a piece of the much larger puzzle, in my opinion.

Dare Obasanjo more recently broke down the different ideas of what “Cloud OS” might mean:

“I think it is a good idea for people to have a clear idea of what they are talking about when they throw around terms like “cloud OS” or “cloud platform” so we don’t end up with another useless term like SOA which means a different thing to each person who talks about it. Below are the three main ideas people often identify as a “Web OS”, “cloud OS” or “cloud platform” and examples of companies executing on that vision.”

He defines them as follows:

WIMP Desktop Environment Implemented as a Rich Internet Application (The YouOS Strategy)
Platform for Building Web-based Applications (The Amazon Strategy)
Web-based Applications and APIs for Integrating with Them (The Google Strategy)

The OS metaphor has lots of powerful implications for business models, as we’ve seen on the PC. The operating system in a PC controls all the connections from the application user experience through the filesystem down through the computer hardware itself out to the interaction with peripheral services. Being the omniscient hub makes the operating system a very effective taxman for every service in the stack. And from there, the revenue streams become very easy to enable and enforce.

But the OS metaphor implies a command-and-control dynamic that doesn’t really work in a global network controlled only by protocols.

Internet software and media businesses don’t have an equivalent choke point. There’s no single processor or function or service that controls the Internet experience. There’s no one technology or one company that owns distribution.

There are lots of stacks that do have choke points on the Internet. And there are choke points that have tremendous value and leverage. Some are built purely and intentionally on top of a distribution point such as the iPod on iTunes, for example.

But no single distribution center touches all the points in any stack. The Internet business is fundamentally made of data vectors, not operational stacks.

Jeremy Zawodny shed light on this concept for me using building construction analogies.

He noted that my building contractor doesn’t exclusively buy Makita or DeWalt or Ryobi tools, though some tools make more sense in bundles. He buys the tool that is best for the job and what he needs.

My contractor doesn’t employ plumbers, roofers and electricians himself. Rather he maintains a network of favorite providers who will serve different needs on different jobs.

He provides value to me as an experienced distribution and aggregation point, but I am not exclusively tied to using him for everything I want to do with my house, either.

Similarly, the Internet market is a network of services. The trick to understanding what the business model looks like is figuring out how to open and connect services in ways that add value to the business.

In a prescient viewpoint from 2002 about the Internet platform business, Tim O’Reilly explained why a company that has a large and valuable data store should open it up to the wider network:

“If they don’t ride the horse in the direction it’s going, it will run away from them. The companies that “grasp the nettle firmly” (as my English mother likes to say) will reap the benefits of greater control over their future than those who simply wait for events to overtake them.

There are a number of ways for a company to get benefits out of providing data to remote programmers:

Revenue. The brute force approach imposes costs both on the company whose data is being spidered and on the company doing the spidering. A simple API that makes the operation faster and more efficient is worth money. What’s more, it opens up whole new markets. Amazon-powered library catalogs anyone?

Branding. A company that provides data to remote programmers can request branding as a condition of the service.

Platform lock in. As Microsoft has demonstrated time and time again, a platform strategy beats an application strategy every time. Once you become part of the platform that other applications rely on, you are a key part of the computing infrastructure, and very difficult to dislodge. The companies that knowingly take their data assets and make them indispensable to developers will cement their role as a key part of the computing infrastructure.

Goodwill. Especially in the fast-moving high-tech industry, the “coolness” factor can make a huge difference both in attracting customers and in attracting the best staff.”

That doesn’t clearly translate into traditional business models necessarily, but if you look at key business breakthroughs in the past, the picture today becomes more clear.

The first breakthrough business model was based around page views. The domain created an Apple-like controlled container. Exposure to eyeballs was sold by the thousands per domain. All the software and content was owned and operated by the domain owner, except the user’s browser. All you needed was to get and keep eyeballs on your domain.
The second breakthrough business model emerged out of innovations in distribution. By building a powerful distribution center and direct connections with the user experience, advertising could be sold both where people began their online experiences and at the various independent domain stacks where they landed. Inventory beget spending beget redistribution beget inventory…it started to look a lot like network effects as it matured.
The third breakthrough business model seems to be a riff on its predecessors and looks less and less like an operating system. The next breakthrough is network effects.

Network effects happen when the value of the entire network increases with each node added to the network. The telephone is the classic example, where every telephone becomes more valuable with each new phone in the network.

This is in contrast to TVs which don’t care or even notice if more TVs plug in.
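A rough way to see the difference between phones and TVs is to count who can reach whom. The sketch below treats "number of possible two-way connections" as a stand-in for value, which is an assumption for illustration only and not a measure of real worth. The function names are made up.

```python
def possible_connections(nodes):
    """Distinct pairs that can reach each other in a two-way network (phones)."""
    return nodes * (nodes - 1) // 2


def new_connections_from_joining(nodes_before):
    """Links created when one more node joins a network of `nodes_before` nodes."""
    return possible_connections(nodes_before + 1) - possible_connections(nodes_before)


def broadcast_links(receivers):
    """A broadcast audience (TVs): each set connects only to the broadcaster."""
    return receivers


for n in (2, 10, 100):
    print(n, possible_connections(n), broadcast_links(n))
```

With 10 phones there are 45 possible conversations and the eleventh phone adds 10 more, while the eleventh TV adds exactly one link, and none of the other ten sets gains anything from it.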

Recommendation engines are the ultimate network effect lubricator. The more people shop at Amazon, the better their recommendation engine gets…which, in turn, helps people buy more stuff at Amazon.

Network effects are built around unique and useful nodes with transparent and highly accessible connection points. Social networks are a good example because they use a person’s profile as a node and a person’s email address as a connection point.

Network effects can be built around other things like keyword-tagged URLs (del.icio.us), shared photos (flickr), songs played (last.fm), news items about locations (outside.in).

The contribution of each data point wherever that may happen makes the aggregate pool more valuable. And as long as there are obvious and open ways for those data points to talk to each other and other systems, then network effects are enabled.

Launching successful network effect businesses is no easy task. The value a participant can extract from the network must be higher than the cost of adding a node in the network. The network’s purpose and its output must be indispensable to the node creators.

Massively distributed network effects require some unique characteristics to form. Value not only has to build with each new node, but the value of each node needs to increase as it gets leveraged in other ways in the network.

For example, my email address has become an enabler around the Internet. Every site that requires a login is going to capture my email address. And as I build a relationship with those sites, my email address becomes increasingly important to me. Not only is having an email address adding value to the entire network of email addresses, but the value of my email address increases for me with each service that is able to leverage my investment in my email address.

Then the core services built around my email address start to increase in value, too.

For example, when I turned on my iPhone and discovered that my Yahoo! Address Book was automatically cooked right in without any manual importing, I suddenly realized that my Yahoo! Address Book has been a constant in my life ever since I got my first Yahoo! email address back in the ’90’s. I haven’t kept it current, but it has followed me from job to job in a way that Outlook has never been able to do.

My Yahoo! Address Book is becoming more and more valuable to me. And my iPhone is more compelling because of my investment in my email address and my address book.

Now, if the network was an operating system, there would be taxes to pay. Apple would have to pay a tax for accessing my address book, and I would have to pay a tax to keep my address book at Yahoo!. Nobody wins in that scenario.

User data needs to be open and accessible in meaningful ways, and revenue needs to be built as a result of the effects of having open data rather than as a margin-based cost-control business.

But Dare Obasanjo insightfully exposes the flaw in reducing openness around identity to individual control alone:

“One of the bitter truths about “Web 2.0” is that your data isn’t all that interesting, our data on the other hand is very interesting…A lot of “Web 2.0” websites provide value to their users via wisdom of the crowds approaches such as tagging or recommendations which are simply not possible with a single user’s data set or with a small set of users.”

Clearly, one of the most successful revenue-driving opportunities in the networked economy is advertising. It makes sense that it would be since so many of the most powerful network effects are built on people’s profiles and their relationships with other people. No wonder advertisers can’t spend enough money online to reach their targets.

It will be interesting to see how some of the clever startups leveraging network effects such as Wesabe think about advertising.

Wesabe have built network effects around people’s spending behavior. As you track your finances and pull in your personal banking data, Wesabe makes loose connections between your transactions and other people who have made similar transactions. Each new person and each new transaction creates more value in the aggregate pool. You then discover other people who have advice about spending in ways that are highly relevant to you.

I’ve been a fan of Netflix for a long time now, but when Wesabe showed me that lots of Netflix customers were switching to Blockbuster, I had to investigate and before long decided to switch, too. Wesabe knew to advise me based on my purchasing behavior which is a much stronger indicator of my interests than my reading behavior.

Advertisers should be drooling at the prospects of reaching people on Wesabe. No doubt Netflix should encourage their loyal subscribers to use Wesabe, too.

The many explicit clues about my interests I leave around the Internet — my listening behavior at last.fm, my information needs I express in del.icio.us, my address book relationships, my purchasing behavior in Wesabe — are all incredibly fruitful data points that advertisers want access to.

And with managed distribution, a powerful ad platform could form around these explicit behaviors that can be loosely connected everywhere I go.

Netflix could automatically find me while I’m reading a movie review on a friend’s blog or even at The New York Times and offer me a discount to re-subscribe. I’m sure they would love to pay lots of money for an ad that was so precisely targeted.

That blogger and The New York Times would be happy to share revenue back to the ad platform provider who enabled such precise targeting that resulted in higher payouts overall.

And I might actually come back to Netflix if I saw that ad. Who knows, I might even start paying more attention to ads if they started to find me rather than interrupt me.

This is why the Internet looks less and less like an operating system to me. Network effects look different to me in the way people participate in them and extract value from them, the way data and technologies connect to them, and the way markets and revenue streams build off of them.

Operating systems are about command-and-control distribution points, whereas network effects are about joining vectors to create leverage.

I know little about the mathematical nuances of chaos theory, but it offers some relevant philosophical approaches to understanding what network effects are about. Wikipedia addresses how chaos theory affects organizational development:

“Most of the focus on chaos theory is primarily rooted in the underlying patterns found in an otherwise chaotic environment, more specifically, concepts such as self-organization, bifurcation and self-similarity…

Self-organization, as opposed to natural or social selection, is a dynamic change within the organization where system changes are made by recalculating, re-inventing and modifying its structure in order to adapt, survive, grow and develop. Self-organization is the result of re-invention and creative adaptation due to the introduction of, or being in a constant state of, perturbed equilibrium.”

Yes, my PC is often in a state of ‘perturbed equilibrium’ but not because it wants to be.

How to fix building construction bureaucracy

Sometimes I forget to step outside of our little bubble here and see how people use or in fact don’t use the Internet. When I get that chance I often wonder if anything I’m doing in my career actually matters to anyone.

Usually, however, I’m reminded that even though the Internet isn’t woven into every aspect of everything, it has great potential in places you might not consider.

For example, I’ve been remodelling my house to make room for a new little roommate due to be delivered in September. I’m trying to do most of the work myself or with help from friends and neighbors. I’m trying to save money, but I also really enjoy it. It’s a fantastic way to reconnect with the things that matter…food, shelter, love and life.

Well, I made the mistake of working without permits fully aware that I probably should have them. It’s my natural inclination to run around bureaucracy whenever possible.

As luck would have it, just as the pile of demolition debris on the sidewalk outside my house was at its worst, a building inspector happened to drive by on his way to another job. He asked to see my permit to which I replied, “The boss isn’t here. Can you come back later?”

The building inspector just laughed. After pleading a bit and failing, I started making calls to get drawings and to sort out the permits.

It was at this moment I realized how much building planning and construction could benefit from the advances made in the Internet market the last few years. The part of construction that people hate most is the one that is perhaps the most important. And it is this part that the Internet is incredibly well-suited to improve.

Admittedly, the permit process was not actually that painful and relatively cheap, too. I have spent in total maybe 1 day dealing with permits and drawings, so far, with a bit more to come, I’m sure.

But the desired effect of permitting jobs is sorely underserved by its process.

At the end of the day what you want is the highest building quality possible. You want builders using proven methods with at least semi-predictable outcomes. You want to make sure nobody gets hurt. And you want incentives for people to share expertise and information.

Rather than be a gatekeeper, the city needs to be an enabler.

One of the brochures I read called “How to Obtain a Permit” includes a whitelist of project types. I’m apparently allowed to put down carpets and hang things on my walls without a permit. Glad to know that.

Strangely, after explaining all the ways the city asserts itself into the process, on the very last page of the brochure it then says, “Remember, we are here to assist you. If you have any questions about your project, please give us a call!” I didn’t meet one person in the 6 queues I waded through the first morning who wanted to help me. They were mostly bored out of their brains.

Instead, the city should be putting that brainpower to work finding ways to lubricate conversation and collaboration around solving building problems. If the building community was in fact a community powered by thoughtful city-employed engineers, then I would be much more interested in working with them. I might even become dependent on them.

For example, if they helped me organize, store, print and even share my plans, then I’d be more than happy to let them keep my most current drawings, the actual plans I’m using to build with. If they could connect me to licensed contractors and certified service providers, I’d gladly give them my budget.

As it stands, my incentive is to avoid them and hide information whenever possible.

Imagine if I was able to submit a simple SketchUp plan to a construction service marketplace. I could then sit back and watch architects and interior designers bid for the planning work. My friends in the network could recommend contractors. Tools and parts suppliers could offer me discounts knowing exactly what I needed for the job. I could rate everything that happens and contribute to the reputation of any node in the ecosystem.

Imagine how much more value would be created in the home buying market if a potential buyer could see all this data on a house that was for sale. I might be able to sell my home for a higher price if my remodel was done using highly reputable providers. There would be a financial incentive for me to document everything and to get the right certifications on the work.

Imagine lenders knowing that I’m an excellent remodeller based on my reputation and sales track record. I might be able to negotiate better terms for a loan or even solicit competing bids for my mortgage on the next house I want to invest in.

At every step in the process, there is a role for the city government to add value and thus become more relevant. Then the more I contribute, the more it knows about what’s happening. The more it knows, the more effective it can be in driving better standards and improving safety and legislating where necessary.

My mind spins at the possibilities in such a world. Of course, when you have a hammer everything looks like a nail. But it seems to me that the building permit and inspection business is broken in exactly the places that the Internet is more than capable of fixing.

Crime data stories

My Potrero Hill neighbors tell me that the sweet song of crackling firearms in the evening always begins again in May as the days get longer, hotter and schoolless.

Recently, I witnessed a sample of the gun play happening in the nearby projects, and I decided to do some of my own research to understand what’s going on. The first thing I found was that I wasn’t the only witness to this particular incident:

“Two of the bullets hit our daughters bedroom– one went through the wall and crossed a small portion of the room and lodged in another wall near her sliding glass door.

[The Police] told us that based on the 24 bullet shells they found up the hill on Missouri St. near the public housing, there were two guns involved, one of which was an AK47 the other was probably a 9mm pistol. The police have no idea who was firing the guns and given that there are not witnesses, there is not likely to be any resolution to the incident. The officers were confident that the two bullets that hit the condo were random and not targeted at us.”

There are lots of factors behind violent neighborhoods, and the San Francisco projects are pretty densely representative of many of those factors. But it really irritates me that guns are so prevalent in the area, and, in general, so prevalent in America.

So, I started my journey at the old PotreroHillSF Crime Mashup which apparently doesn’t work any more. There is an ongoing “Police Blotter” on the site, though, with some good reporting.

I then found the official San Francisco Police Department Crime Map. Of course, the data is wrapped in their own heavy-handed user interface and unavailable in common shareable web data formats. The tool is burdened with legal trappings and strangely fails to acknowledge homicides, though they offer an explanation:

“A homicide may not appear correctly on the map because:

The incident was initially reported as an assault and the victim died some time later from the injuries.

The incident was reported as an arson, and the body was not found until a later time.

A body was found and the cause of death was not obvious to the officer making the incident report.”

I’m hoping that the City has more advanced reporting capabilities internally, as it seems pretty obvious that we have a data visualization failure going on here. I can see some data around assaults, robberies, larceny, vandalism, drug incidents, etc.

But the compelling visual storytelling is missing.

I want to know how many crime incidents in the projects this year involved guns. How many guns in these events are registered/unregistered? How many of the gun incidents were or became homicides vs non-gun related incidents? Where did the guns come from? What kinds are being used?

I suspect most guns aren’t registered which is an argument used by those who think a gun ban would be useless. People who want guns will find them, legal or not. But I also suspect that the victims aren’t carrying guns. Thus, the argument that people should have the right to own a gun to protect themselves isn’t a counterbalancing force. People who avoid violence won’t carry guns, legal or not.

As I progressed with this research I realized that somewhere in between raw data and overt campaigning is an interesting space. Data can help us learn and make more intelligent and informed decisions about how to manage and evolve our society and its rules.

Unfortunately, that space seems more difficult to find than it should be. I should be able to download data for myself or at least be able to visualize the stories behind the data in relevant pictures and charts.
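If incident data were downloadable, the questions I asked above about gun incidents would take only a few lines to answer. The sketch below uses made-up records and made-up field names (`weapon`, `registered`, `fatal`); no real feed with this shape exists as far as I know, so treat it purely as an illustration of how little stands between raw data and a first answer.

```python
from collections import Counter

# Hypothetical incident records; field names and values are invented.
incidents = [
    {"category": "assault", "weapon": "handgun", "registered": False, "fatal": False},
    {"category": "robbery", "weapon": "handgun", "registered": None, "fatal": False},
    {"category": "assault", "weapon": "rifle", "registered": False, "fatal": True},
    {"category": "assault", "weapon": "knife", "registered": None, "fatal": False},
    {"category": "vandalism", "weapon": None, "registered": None, "fatal": False},
]

GUNS = {"handgun", "rifle"}


def summarize(incidents):
    """Split incidents into gun and non-gun groups and tally the basics."""
    gun = [i for i in incidents if i["weapon"] in GUNS]
    other = [i for i in incidents if i["weapon"] not in GUNS]
    return {
        "gun_incidents": len(gun),
        "gun_types": Counter(i["weapon"] for i in gun),
        # None means the registration status is unknown.
        "registration": Counter(i["registered"] for i in gun),
        "fatal_gun": sum(i["fatal"] for i in gun),
        "fatal_other": sum(i["fatal"] for i in other),
    }


summary = summarize(incidents)
```

The tallies themselves are not the story, but they are the raw material a chart or an editorial lens would need.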

Of course, there’s the fantastic ChicagoCrime.org web site which has done a lot to raise awareness about crime data. Despite the lack of available data from the local government, site owner Adrian Holovaty found a way to collect what he needed to make this site through an automated script:

“Each weekday, my computer program goes to the Chicago Police Department’s website and gathers all crimes reported in Chicago.”

The site has some great info (such as this screenshot of “Armed Robbery: Handgun Incidents”), though I still want to see an editorial lens on this data that puts a bit more meaning behind it.

For example, it only takes a glance to see in this series of Census images of San Francisco that the City is incredibly segregated, something I think many residents choose to ignore under the mask of open-mindedness. Even here, though, the story is incomplete without some intelligence wrapped around the data. What’s the trend? Is it becoming whiter? Where are people going who are leaving?

This same question punctures my happy place every time I exit onto Palo Alto’s University Avenue from Highway 101 and pass what is now a high end office park where one of the most dangerous areas in the country used to exist only a decade or so ago. I’m very pleased it’s a safer place, but do we understand the cost of that transition? Where did those people go? Are they better off?

Yahoo! colleague Micah Laaker pointed me to an interesting project he worked on back in 2002 and 2003 called the Denver Census Tract Animation Project. He worked with Citizen Mapmakers to trend movement of the African-American population in Denver from 1960 to 2000. Here’s a snapshot of their work:

I really like the way they visualized data to tell a story here. We need similar visualizations for crime data.

The InfoPlease “School Shootings” site gets closer to telling a story about guns just by focusing on a type of statistic and representing it. What a powerful domain name! However, the data here is still pretty raw and limited. This is hugely important information, but there’s an implicit argument here that should be made much more explicit with actionable information and analysis. In its current state it’s just telling us that there are a lot of school shootings (a surprising number in Europe, actually).

The Citizen Crime Watch site for New Orleans gets even closer to what I want to see. Similar to ChicagoCrime.org, they visualize with your standard data-on-a-map mashup, but the hover links point to coverage in the local media. I’m suddenly given a much more human window into the crime scene, and I can read about each event. For example, on April 9, 2007, there was a homicide in a trailer park:

“…Officers found Williams lying on the floor of the trailer with blunt-force trauma to her head. Emergency medical technicians declared her dead at the scene. An autopsy shows she had been beaten to death, said John Gagliano, chief investigator for the Orleans Parish coroner’s office.

The trailer is in a trailer park at 6801 Press Drive run by the Federal Emergency Management Agency. Although the trailer park is near the campus of Southern University, the chancellor, Victor Ukpolo, said neither faculty nor students live there.

The murder is being investigated by Detective Harold Wischan, who can be reached at (504) 658-5300.”

I’m very thankful for local reporting from sites like Nola.com, The Times-Picayune, and community leaders such as Mike Lin of PotreroHillSF and the increasingly active Yahoo! Group Potrero Hill Parents Association who all help surface this kind of information, but it’s not enough. The City needs to make it easier for its residents to both report on things that matter to us and to collect the data, filter it, and act on it.

People will always want greater access to information. This is particularly true in communities where poor decision-making creates mistrust:

“Under pressure from constituents who say New Orleans police stonewall requests for crime data, the City Council’s criminal justice subcommittee took police representatives to task Wednesday, calling for a faster, freer flow of public information…When asked for a written breakdown of policy and procedures relating to the release of public information, Maj. Michael Sauter, the head of technology, told the council most of that information was ‘not meant for the public.’”

Similarly, Rick Klau has begun experimenting with this kind of thing in response to the Magnetix toy recall incident. He calls it “Open source parenting” and observes that bottom-up community-driven politics is likely to be more successful than anything a politician can enable:

“If the government is under-staffed and under-funded to help parents avoid harmful toys, then why can’t we help ourselves?…Give thousands of parents the tools to easily identify harmful products, leverage the community’s ability to provide visibility to legitimate threats while minimizing less serious risks, and quickly disseminate information that could be instrumental in avoiding a serious accident.”

I’m suddenly wondering what role politicians will play if communities are able to form solutions to issues locally, nationally and internationally on their own. Maybe instead of legislators (or merely professional campaigners/marketers), politicians will become community managers.

I also start wondering what politicians do all day if they can’t sort out ways to curb violence in our neighborhoods. I don’t see why anyone living in this country or any other should have to worry about whether their child will be shot accidentally in his or her bedroom by stray AK47 bullets or intentionally while at school.

I’m convinced the answer is in the data that is already being collected in various government crime databases. And I’m sure the answer is related to gun access.

Where is Tufte when you need him?
