Monday, January 29, 2007

The data dimension

The network-centric way of looking at the web has been bugging me for a while, but I’m still struggling to come up with an alternative to this analysis. I suspect the answer is a data-centric view. The network is already pretty pervasive, so “ubiquity” isn’t news. Moore’s Law is driving sophisticated packet processing into the core of the network, so “smarts” isn’t quite it, either. What matters most about the network’s current development is that its reach and processing are being leveraged by complex new content: the data dimension.

I’ve argued that one can see the IT developments of the last 25 years as three waves of commoditization:

’80s: Commodity Computing
’90s: Commodity Communications
’00s: Commodity Data

The current view of networking is packet- and processor-centric. A packet-centric view buys into the end-to-end worldview, where the network is dumb and all the interesting stuff happens outside it, on the edge. This is of course not the case; consider in-transit virus scanning, large data caches (Akamai), in-network codec transcoding, and deep packet inspection. The processor-centric view adds this edge and core computing to the picture. It assumes that packets flow largely unchanged across lines of communication between nodes. What matters are the lines and the nodes, not what flows over and between them.

This approach undervalues the importance of massive, distributed, evolving data sets. I’m not saying that processors and packets are unimportant. The three decades of commoditization I list are cumulative: one needs commodity computing and commodity communications to get the real value of commodity data. But the novelty is in the data, and that’s where the hardest new questions of policy and governance will arise.

I haven’t thought through all the implications of the data-centric view. But to get past the old mental image of glowing tubes connecting pulsing processors [1], imagine yourself as a data file:

You start life, let’s say, on Alice’s computer. You’re small and lonely, and when you look out at the world, all you see is Alice’s computer.

Suddenly you feel a little bigger, and you can see another computer: Bob’s. Alice has sent a copy of you to Bob.

In fact, if you paid attention, there was a blink when the world seemed to flash by: it got bigger for a moment, with the flicker of many router processors as you passed over the network from Alice to Bob. You briefly grew as copies of you were cached along the way.

After a while you find yourself in a much bigger world; Bob copied you up to YouTube, and computers flicker in and out of existence around you as YouTube users download and watch you. You don’t feel much bigger, because nobody’s keeping a copy of you. Occasionally, though, you grow and start morphing, as someone takes your original self and mashes it into a new version.

Then your world changes again. You can now see many PCs: Charlie got hold of you and posted you to a BitTorrent tracker.

More parts of you start morphing and growing as people incorporate you into other data files. You still feel as if these new existences are part of you, but they’ve also started to change on their own. Each of them opens up windows onto new host machines.

If you can’t resist thinking in terms of nodes and links (and who can, in this age of network theory), imagine that the nodes are data sets, and the links are semantic and genealogical connections. The meaning is in these nodes and links, not in those boring old fiber links and CPUs.

[1] Since I’m preoccupied with mental models, I can’t resist pointing out how pervasive the “communications is stuff flowing in pipes” metaphor is. It’s not just poor Sen. Stevens; the term “data flow” occurs 1.2 million times on the web, according to Google. I also don’t think it’s a coincidence that a dominant, though increasingly discredited, folk theory of linguistic meaning is the conduit metaphor.
