
17/02/06

Archived and unpublished article on Google Print from last summer

The impetus for posting this article, which has lain unpublished for over six months, was Cory Doctorow’s article in BoingBoing, which proved that the story is not dead - and in some part gave me the confidence to post this. Read it.

The ongoing, and surprisingly vitriolic, reaction of publishers, agents and authors to Google and their Google Print project has been the most fascinating development in the book trade of recent weeks.

That it has come at a time when the wounded ‘content’ industries – most notably the film and music businesses – appear to have won some small victory against piracy via the peer-to-peer file-sharing networks is perhaps confusing, perhaps serendipitous. The recent MGM vs. Grokster case (a studio suing a software company for making file-sharing software) has been the most fascinating development outside the book trade in recent weeks. But whilst these fascinations are almost as far as the similarities between the industries go, there is some value in making some comparisons – to highlight why publishers are in danger of making the same mistakes as the film and music companies before them, and getting ‘digital’ wrong.

The Grokster case is notable in that, like several key cases before it, the underlying theme is not really about a piece of technology, but about studios blaming someone else for the financial free-falls of the music – and more recently, film – businesses since downloads became a familiar term.

As many have said before me, the real blame can be laid at the door of the corporate giants who were very slow to adopt a distribution platform that the ‘kids’ – who must surely be the largest audience for both music and films – were clearly embracing. As iTunes has shown us, had the music industry provided a legal download service earlier, it is very likely that the illegal options would not have prevailed to the extent that they have. The words of Levi Strauss spring to mind, explaining, after realising that they had been caught napping at the emergence of rivals’ ‘baggy’ jeans, that “Baggy jeans are not a trend, they are a paradigm shift.” It took Levi’s many years of very expensive product R&D and advertising to catch back up with the seismic shift that this trend represented – and probably with the permanent loss of the market share they enjoyed previously. The content industries are experiencing a very similar and equally seismic paradigm shift: the adoption and ubiquity of digital distribution networks.

Which brings me to Google, and publishing. Google is a very smart company. It may be, in ten years’ time, more powerful, valuable and reviled than Microsoft. It is possible that it will be the most powerful company on the planet. “Google’s mission is to organize the world’s information and make it universally accessible and useful.” Let’s quickly run over their achievements.

Before Google emerged, searching the web was a gamble. Search engines were slow, random, overblown and ad-stuffed ‘portals’ that explicitly tried to distract you from what you were looking for: searching, or rather, finding, web pages. These distractions were scatter-gun ‘content’ in the form of news, sport, sex – and so on – in the hope that you would instead make them money by clicking on an advert. There was no distinction between a visitor looking for news and one looking for sex: they all saw the same ads.

Google cut through all that. Its home page was almost challengingly simple: a text box, and a go button. Even today, with the additional searches offered (images, products, news, etc) it retains that simplicity.

We all marvelled at Google – it magically found what we were looking for, pretty much all of the time. And it did this because it was smart. Its method for ranking pages – called an ‘algorithm’ – was wonderfully simple: it was meritocratic.

All search engines ‘trawl’ the web, ‘indexing’ pages and following the links from one to another and building up a snapshot of how pages, and sites, and terms, are related. But Google counted each link to a page (e.g. click here for Bret Easton Ellis) as a vote for the target site for the term ‘Bret’ or ‘Easton’ or ‘Ellis’ or all. It then looked at the target page, and counted how often, and with what weight, the term ‘Bret Easton Ellis’ appeared on that page. And then it compared that to all other pages saying ‘Bret Easton Ellis’, counted them all up and the site with the most points came top of the list. Sites with more points in turn had weightier votes when it came to linking to other sites. Google’s original name was BackRub – and the idea worked, and still works.
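For the curious, the ‘links as votes’ idea can be sketched in a few lines of code. This is a toy, simplified PageRank-style calculation and certainly not Google’s actual algorithm: the page names, the damping value of 0.85 and the number of passes are all assumptions of mine, and it ignores the on-page term counting described above.

```python
def rank_by_links(links, damping=0.85, iterations=50):
    """Toy link-vote ranking. `links` maps each page to the pages it links to.

    Every page starts with an equal score. On each pass a page hands its
    score out, in equal shares, to the pages it links to, so a link from
    a high-scoring page is a weightier vote than one from an obscure page.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Small baseline score for every page (the 'damping' assumption).
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its vote evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical mini-web: two pages link to an author's home page.
example_links = {
    "fan_site": ["ellis_home"],
    "review": ["ellis_home", "fan_site"],
    "ellis_home": [],
}
example_scores = rank_by_links(example_links)
top_page = max(example_scores, key=example_scores.get)
```

In this made-up example `top_page` comes out as `ellis_home`, because it collects votes from both of the other pages.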

When Google started placing adverts on search results, you may have thought that would be the end of good results: that either people would pay for rankings, or the banner style ads that ruined the other sites would feature: in fact Google showed how smart and how different it was.

Their ads were text only – unobtrusive, quick to download, democratically uniform – and were displayed only when a search term was entered that the advertiser had nominated as a ‘key word’ for their ad: for example American Psycho, Less Than Zero, Bret, Brett, Easton, Easton-Ellis, Bateman, Patrick, and so on.

Finally, they only charged advertisers if someone clicked on their advert, and the amount was set by how many people had recently put the same terms into the search engine. This is so far removed from ‘old’ advertising, which believes that buying a poster or other media space in a position where some of the target audience may see it and then commit its message to memory is how to sell more products. Old media space is very expensive. Google’s model says that it will only show your ad to people you say you want to see it, and it will only charge you if they react. This is to my mind, a very good deal.
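The arithmetic of that deal is simple enough to sketch. The figures below are entirely hypothetical (I have no inside knowledge of Google’s pricing), and the function name is my own; the point is only that the bill follows the clicks, not the number of people who saw the ad.

```python
def pay_per_click_cost(impressions, click_rate, cost_per_click_pence):
    """Cost of a pay-per-click ad: you are billed for clicks, not for views.

    All three inputs are illustrative assumptions, not real Google prices.
    """
    clicks = round(impressions * click_rate)
    return clicks * cost_per_click_pence


# Hypothetical campaign: 10,000 searchers see the ad, 2% click, 25p a click.
cost_pence = pay_per_click_cost(10_000, 0.02, 25)   # 200 clicks at 25p = 5000p
# If nobody clicks, the advertiser pays nothing at all.
no_click_cost = pay_per_click_cost(10_000, 0.0, 25)
```

Compare that with a poster site, where the fee is the same whether or not anyone reacts.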

It’s also a very good deal for Google: they make all of their money from ads. And naturally they want to make more money. So they take the targeted adverts model and they take it further, and they do this aggressively. The key to their growth is to sustain their position as the best search engine, which means improving their algorithm, and increasing the amount of content they index. So as well as searching web pages, they take on more web content: the discussion groups, business and phone directories, images, catalogues, news, blogs, shops (‘froogle’).

And then, in book industry terms, they go for the jugular: they decide that what is missing is the ‘offline’ content – the content of books. At the London Book Fair in 2004 Google announced that they had agreements with some of the world’s greatest libraries to scan the contents of all of their millions of books: “the libraries of Harvard, Stanford, the University of Michigan, and the University of Oxford, and The New York Public Library [have allowed Google] to digitally scan books from their collections so that users worldwide can search them in Google.”

The mammoth scale of this task aside, I’m not sure that any copyright alarm bells started ringing with publishers at the time, possibly because it was presumed that the contents of the library were out of copyright – or they were too busy in the rights centre. But before we look at the copyright issues let’s project to what they may have been trying to achieve.

Because of how Google’s algorithm has evolved (and the lack of links in printed books) results ranking is most likely to be based on keyword density – the number of times the searched-for term appears on a page, or in a chapter. This density includes the priority of that term in the relevant book – i.e. in its title, subtitle, chapter headings and so on. So a book by Bret Easton Ellis probably out-ranks one that mentions him in passing.
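To make my guess concrete, here is a minimal sketch of weighted keyword density. It is speculation on my part about how such a ranking might work, not a description of what Google does: the field names, the weights, and the inclusion of an ‘author’ field under ‘and so on’ are all assumptions.

```python
import re

# Assumed weights: a match in a prominent field counts for more than one
# buried in the body text. The numbers are illustrative only.
FIELD_WEIGHTS = {
    "author": 5.0,
    "title": 5.0,
    "subtitle": 3.0,
    "chapter_headings": 2.0,
    "body": 1.0,
}


def count_phrase(text, phrase):
    """Count case-insensitive occurrences of `phrase` in `text`."""
    return len(re.findall(re.escape(phrase.lower()), text.lower()))


def score_book(book, phrase):
    """Sum, over each field, weight x (phrase occurrences / words in field)."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = book.get(field, "")
        word_count = len(text.split())
        if word_count == 0:
            continue
        score += weight * count_phrase(text, phrase) / word_count
    return score


# Two hypothetical records: one book by the author, one mentioning him once.
book_by_ellis = {
    "author": "Bret Easton Ellis",
    "title": "Less Than Zero",
    "body": "People are afraid to merge on freeways in Los Angeles",
}
passing_mention = {
    "author": "A. N. Other",
    "title": "Eighties Fiction",
    "body": "Writers such as Bret Easton Ellis defined the decade",
}
ranked = sorted(
    [book_by_ellis, passing_mention],
    key=lambda book: score_book(book, "Bret Easton Ellis"),
    reverse=True,
)
```

With these made-up weights the book by Ellis sorts ahead of the one that only mentions him in passing, which is the behaviour I am guessing at above.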

Whilst the thinking behind this makes sense – and the presumption that the content is ‘out of copyright’ softens the concept to publishers – the business gains for Google are harder to figure. Of course it has increased its range of content, but a user who has searched and found out of copyright information on the site is unlikely to click on any ad as they should (if the algorithm is up to speed) have found everything they are looking for.

So Google has not made any money from this transaction, which raises the question of whether the library scanning project is a loss-leader. This remains unanswered until Google begins to announce a number of other similar print-based programmes, including ‘Scholar’ (indexing academic texts) and most recently ‘Print’.

‘Print’ rolls the Library model out to contemporary publishers and offers the following service: if a publisher supplies Google with a finished copy of a book, it will scan and index it, and make the resulting page available to users who search for terms that make the page relevant. To avoid any infringement of copyright, it only displays restricted content – say the highest ranking page, plus the preceding and successive two pages only.
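The ‘restricted content’ rule I describe (the best page plus two either side) amounts to a small window calculation. A minimal sketch, assuming pages are numbered from 1 and that the window is simply clipped at the covers; the function name and the default of two pages are mine, not Google’s:

```python
def visible_pages(best_page, total_pages, radius=2):
    """Pages shown to the user: the best match plus `radius` pages either side,
    clipped so the window never runs off the front or back of the book."""
    first = max(1, best_page - radius)
    last = min(total_pages, best_page + radius)
    return list(range(first, last + 1))


# Best match on page 10 of a 300-page book: pages 8 to 12 are shown.
window = visible_pages(10, 300)
```

A match on the very first or last page shows fewer pages, since there is nothing before page 1 or after the final page.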

This snippet view is designed to help users find the book in their search results and make a decision about whether to go find a physical copy of the book with just bibliographic information and a few short sentences around their search query.

The user only sees five or six pages. This is currently changing so that users may only see ‘snippets’ or have to log in to view content. [UPDATE – this has now changed so that only snippets are shown; and snippets amount to the number of words allowed by 'fair usage' in existing copyright laws.]

In return, the normal Google book page listing results for ‘Bret Easton Ellis’ will feature a link saying ‘Search for Bret Easton Ellis in Books’. That arguably makes any other results redundant.

Using a recent search, the term ‘snowblind’ on www.google.com brings back the US edition of Robert Sabbag’s Snowblind at the top of the list. Not surprising - Sabbag’s US publishers, Grove/Atlantic, have (like many US publishers) entered into the Google Print programme. Clicking through to the Google Print page for Snowblind brings back the first three pages, facsimiled from the book, and a list of e-tailers offering the book for sale (Amazon.com, Barnes&Noble.com, Booksense, Froogle, Amazon.co.uk, Blackwells and WHSmith).

As an aside, tant pis for his UK publishers (Canongate) or anyone else selling the non-US edition: the etailers make links by ISBN – and hilariously the UK links are for import editions of the US version, despite UK editions being widely available and in print. As another aside, the price for the US version on Amazon.co.uk is £5.71, the UK edition £5.59. Let’s not go into Amazon and ‘authority’ here…

As an exercise, I put the above terms into the Google Ad programme to estimate the value of this in advertising terms. (Result: it will cost £712.05 per day to get #1 ranking for Snowblind, snow, blind, Robert Sabbag.) The same – if not better – results will be achieved by entering the Print programme. Which is free.

Further, and more specific, results can be found by entering the same terms in print.google.com, which will only search the content of books. There are some initial restrictions. A www.google.com search for ‘cocaine smuggling book’ will not bring back Google Print results – it seems to only bring back very explicit matches for the title of a book (‘Snowblind’ works, but ‘snowblind cocaine smuggling’ does not).

Now of course, perhaps the Google user (me) goes to the Print entry for Snowblind, searches for the smuggling tip he is looking for (aka the ‘Duplicate Bag Switch’), reads the relevant extracts, if possible from the snippets - and does not buy the book. Perhaps Grove has lost out on my $10 – but arguably the same thing would have happened had I gone to a bookstore to do my research.

Nonetheless, Grove’s signing up for the programme is a very smart move. The global audience looking for information (rather than a book) on Google far outweighs any single audience in a bookshop, newspaper, or other market. And the space is much cheaper (read: free). If a publisher were to ask my advice on how to boost sales for a title I would suggest either taking out targeted Google adwords (with the concomitant cost) or sending the book to Google for scanning. In my mind, engaging in the Google Print Programme is the single most cost-effective piece of marketing a publisher can do for their titles – period.

At least, I think this is true for the majority of non-fiction titles – I can’t see as much call for a result on ‘Dumbledore, Harry, Hogwarts’ bringing back much other than Harry Potter – but then again there is a sales value in that book (via Bloomsbury, or Scholastic’s official sites) coming up top of a Google search, as it would if it were in the Print programme. Better for Bloomsbury to direct traffic to their own e-commerce site than for customers to go to a retailer, surely? Think again, Nigel Newton?

So whilst at the moment a ‘web’ search will only give a cursory comparison to results from print, over time and as the number of titles grows, this is bound to change. And what will that mean for the publishers who are – and aren’t – involved in the programme?

A lot of UK Publishers seem terrified that agreeing to get involved in this programme is akin to digitising their content and opening themselves up to the piracy that has plagued the music and other industries. The concern is that Google will convert their books into a format that can be ripped off and distributed via networks such as Grokster. And they could be right – but for a number of key factors:

Two devices – the iPod and the PC or laptop – made digitised versions of music and films simple and pleasurable to absorb. These devices were readily available, and more attractive than the original devices (CD player and cinema) and so the market adopted them. The paperback book has evolved over 70 years at the last count and at £5.99 is basically perfect. We’re miles away from a credible, user-friendly and cheap ebook device. Books can make as much of a style statement as an iPod – and arguably more so given the ubiquity of the iPod. [Update – we'll wait and see about the hype surrounding the new E-Ink Sony Reader device.]

Even if such a device existed, getting files pirated from Google onto it would probably be difficult. It’s unlikely that Google has digitised into PDF or other simple file formats: see below for more.

Google is very, very, very smart and very, very ambitious. It has invested a huge amount of money into its Print Programme, identifying that this is where its growth lies, at least in part. Google’s revenue is based not on digitising the books, nor selling them, nor even on taking a cut of the sales made to visitors who click on the links. Their revenue is, still, based on the ads that appear next to the listing. And they share that with the copyright-holding publisher. Thus their business model depends on publishers getting involved in this. It is my belief that Google, being smart, will have identified that copyright infringement, and the risk of publishing going the way of music, will have been a major barrier to publishers’ entry – and that they will have addressed this through their technology, contracts and infrastructure. I imagine that they would be able to shut anything down immediately were there any security infringements.

Finally, and as a safety measure, if I were a publisher getting involved, I would write into my contract an immediate and total reversion clause that held Google accountable for any copyright infringement.

Tim O’Reilly, a US publisher, recently said that “Obscurity is a much bigger threat to most authors than piracy.” [Update, I was recently informed by someone who knows, that Tim was quoting Doctorow.] The ‘most’ is telling here. Of course Nigel Newton and Bloomsbury can afford to not include Harry Potter in the Google print campaign, and of course if there is a risk of copyright infringement then it is too much for a property as valuable as theirs.

But for the more obscure titles, I believe that Google Print is the cure for that obscurity: if you have published (or even written) an obscure treatise on Scottish gravestone typographic heroes – and if you have, you may have been marginalised by the book chains, the literary editors, and even Amazon may be claiming you are ‘hard to find’ – then if there is an audience for your book at all, it is likely that it is looking for your words on Google. And if you are in the Print Programme, that audience will find you – and if your book is what they want to read, they will buy it. Surely if this is the case then publishers, authors, and their agents must recognise that this is only very clever marketing, or very targeted advertising, and that they would be fools to think otherwise.

July 20, 2005

Update. Since I wrote, and neglected to publish, this, things have obviously changed, including - in parts - my stance. First of all, as this was aimed at The Bookseller magazine, I asked people at Google to comment, and they didn’t.

I then began listening to the publishers who told me that what this amounted to was an incredibly arrogant move by Google which, although maybe not legally true, felt like it amounted to a hijacking of their crown jewels: copyright. Of course, that’s not true - but it was a breathtakingly arrogant move by Google to go for the ‘opt-out’ rather than ‘opt-in’ model.

Then we had the Frankfurt Book Fair, the lawsuits against Google, the PA debate, and all the other search engines doing the same - and the smart money being on Amazon, who seem to have been able to do it all with publishers’ co-operation. Since then Google has softened its stance a little, renamed things a bit and tightened things up. But how much further along they are remains unclear.

Posted by Peter Collingridge in Future of the book, Google print, Publishing.


