The Long Tail Paradox - You don't need a tail

August 09, 2006 at 09:15 AM | categories: python, oldblog

This is something I've been itching to write for a while, and I'm not sure I've got my analogies and descriptions right yet, but the gist is more or less right (perhaps less than I'd like): the long tail sucks, viva le long tail...

The long tail - or rather Zipf's Law - has been getting a lot of publicity recently, which is nice in some ways, but surprising in others. I first came across the long tail 8 years ago, when I started work at the Janet Web Cache Service, in a variety of papers from the 3rd web caching workshop (which had been held in Manchester just before I started working there). The one thing that struck me was the fact that Zipf's law is a fundamental aspect of human behaviour (as fundamental as fire).

As a result, when people started putting catalogues on the internet, whole new communities - business communities - started to see the effects of the long tail of human behaviour on their profits. Intuitively we all understand the long tail - we know that we want things that suit us, that match our desires.

However, the reason why "hits" and "hit culture" took off is simple to explain - we all want quality. We don't want to choose from 20,000,000 things - this for me is the nightmare of the record store (online or real). Ask me what music I like and I'll say "good stuff". Don't ask me if it's garage or rap or indie, I don't know. I don't care about that subdivision, that label. I like cool stuff. I can point at stuff by Queen, stuff by Rob Dougan, Pictures at an Exhibition, the theme tune from Twin Peaks, Ernie the Milkman, Lily the Pink, and so on. I don't classify my likes. As a result music stores (of almost all kinds) suck for me.

Even doubling that choice, let alone expanding it 10 fold or 20 fold, leaves me cold. I'm aware that this makes it more likely that it is _possible_ for me to find something I'll like, but my ability to find it decreases as that choice size increases. This is Mooers' law in action. The usefulness of the system to me decreases as the amount of information increases.

I'm not alone. (I might be one datapoint, but I'm not arrogant enough to believe that, on a planet with 6 billion people, I'm that different from everyone else.)

That's why top tens are good. It's why, if I'm at an airport about to go on a long flight, I'm pleased that they have a top ten (or a variety of top tens). It's why I'm pleased that shops tend to operate on a principle of "survival of the fittest". If a book is good, it's likely to stay on the shelves (through restocking). If it's not, it's likely to disappear or get covered in dust. The smaller the shelf space, the more important those decisions become. Too much dust and you go out of business. Sure, the books available are less likely to be a good fit for me, but they're also more likely to be closer to the head than the tail (the tail being where the likelihood of it sucking for me increases).

This for me is the real issue. Why do you see Zipf distributions? Because, by and large, the values as to what is good are shared by many people. We do tend to have similar likes, on some levels, to other people. That is why places like Amazon are particularly good: they don't just operate a long tail - every online bookstore does that - they allow people to gain insight into what's going on in that long tail. Similarly, Google News has the ability to look at what thousands of journalists worldwide have chosen to write about and chosen to publish. They then allow you to look at the head of that snake, by time, date, a search or a combination of all of these.

These services make the head & body of the long tail visible, which in essence is what we all want anyway, though personal to us.
The long tail exists because our tastes all subtly differ, and we all want quality. What you think is terrible, I might think is great. I still remember seeing Spawn at the cinema, how much fun I found it, and how good a film I find it, and yet I'm still to find another person who agrees. The phrase "so bad it's good" is a cliche, and with good reason. If I say "The Matrix", however, you'll find lots of people who agree that it's a cool film.

As a result, the head of the snake is useful. The head of the snake is a means of navigating yourself to content that lots of other people who may share similar tastes to you think is good for some reason. If you make the place to choose from attractive to a wide audience who choose from the wide variety of content, then the head of that snake will be attractive to that wide audience. And that's why hit culture took off. As long as everyone was choosing from the same pot and the reporting on that pot was accurate, then the top 10, top 40, top 100 was useful. That's why the top 40 in your local supermarket might be more relevant to you than a general top 40.

The really interesting aspect of things like recommendation engines is that they're personalising this snake. They're turning the snake into a hydra, and each head is a real user.

However, the interesting point is this: caching makes sense. To be effective, a cache has to identify the head of the snake. By identifying the head of the snake while still making the tail available, the cache stays useful, and provides a time benefit to the user and a cost benefit to the provider of the content. What does this mean in the context of a long tail? It means that small stores can exist, and can stock a wide variety of useful content, and can even use simple heuristics to make money. This is essentially what web caching does, after all.

And why does caching make sense? It identifies the head & body of the snake, allowing you to take advantage of the fact that the head and body have the same business or bandwidth value as the entirety of the tail, which is a choice set many, many, many, many times larger.
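To put a rough number on that "head and body versus tail" claim, here is a minimal sketch. It assumes an idealised Zipf distribution with exponent 1 (the item at rank r is requested in proportion to 1/r) over a finite catalogue. The catalogue size and the function names are made up for illustration, and real request distributions will differ.

```python
def zipf_weights(n_items, exponent=1.0):
    """Unnormalised Zipf weights: the item at rank r gets weight 1 / r**exponent."""
    return [1.0 / rank ** exponent for rank in range(1, n_items + 1)]


def head_size_for_share(weights, share=0.5):
    """Smallest number of top-ranked items whose weights cover `share` of the total."""
    target = share * sum(weights)
    running = 0.0
    for count, weight in enumerate(weights, start=1):
        running += weight
        if running >= target:
            return count
    return len(weights)


catalogue_size = 1_000_000  # hypothetical catalogue size, not a real store's figure
weights = zipf_weights(catalogue_size)
head = head_size_for_share(weights, share=0.5)
print(f"{head} of {catalogue_size} items ({head / catalogue_size:.2%}) "
      f"account for half of all requests")
```

Under those assumptions the half that matters is a tiny sliver of the catalogue, well under 1% of the items. Change the exponent and the split moves, so treat this as an illustration of the shape rather than a measurement.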

Now, I'm not a business person (by choice), but I'm savvy enough to realise this: if a web cache (a fixed amount of storage, and so of choice) can cope with the vagaries of an effectively infinite choice Zipf distribution, and still turn a profit (i.e. be worth running), surely the same can be true of a business. You don't have to say "we'll stock everything"; merely being able to get everything, and being able to serve the high quality stuff (as chosen by that audience), is sufficient. Furthermore, it's entirely likely that, given a sufficiently "good" recommendation engine, the amount you stock can be kept small.
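The fixed-size cache argument can be sketched as a small simulation. This is an illustration under stated assumptions, not how any particular web cache works: requests are drawn independently from a Zipf distribution with exponent 1, the eviction policy is plain LRU (my choice for the sketch), and the function name and all the sizes are hypothetical.

```python
import random
from collections import OrderedDict
from itertools import accumulate


def simulate_lru_hit_rate(n_items, cache_size, n_requests, exponent=1.0, seed=0):
    """Replay Zipf-distributed requests through a fixed-size LRU cache; return the hit rate."""
    rng = random.Random(seed)
    # Cumulative Zipf weights: the item at rank r is requested in proportion to 1 / r**exponent.
    cum_weights = list(accumulate(1.0 / rank ** exponent for rank in range(1, n_items + 1)))
    requests = rng.choices(range(n_items), cum_weights=cum_weights, k=n_requests)

    cache = OrderedDict()  # keys are item ids, ordered from least to most recently used
    hits = 0
    for item in requests:
        if item in cache:
            hits += 1
            cache.move_to_end(item)  # mark as most recently used
        else:
            cache[item] = True  # fetch from the "tail" and stock it locally
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used item
    return hits / n_requests


# Hypothetical sizes: the store "stocks" 1% of a 100,000 item catalogue.
hit_rate = simulate_lru_hit_rate(n_items=100_000, cache_size=1_000, n_requests=200_000)
print(f"hit rate with a cache holding 1% of the catalogue: {hit_rate:.1%}")
```

The cache never holds more than `cache_size` items, yet every item in the catalogue can still be fetched on a miss. That is the "small store with a virtual tail" idea in miniature: the stock follows what people actually ask for.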

The paradox of the long tail is this: you don't need a tail to take advantage of it, a virtual tail is sufficient - as long as you're willing to change your body and head to match the whims and desires of those choosing. If you can provide insight into that long tail, and shift content into a local store - turning that tail into a body & head - then you increase the value of your proposition to the audience, and they will move your store along the long tail of online stores, away from the tail and further towards the head.

After all, if you could go into a store on the high street and say "give me something cool to listen to", and they did, and every time you went there not only did they give you something cool, but it got cheaper with time, surely you'd go back? You'd stop caring about the size of the tail, as long as you could get at it.
