Google’s secret effort to scan every book in the world, codenamed “Project Ocean,” began in earnest in 2002 when Larry Page and Marissa Mayer sat down in the office together with a 300-page book and a metronome. Page wanted to know how long it would take to scan more than a hundred-million books, so he started with one that was lying around. Using the metronome to keep a steady pace, he and Mayer paged through the book cover-to-cover. It took them 40 minutes.
With that 40-minute number in mind, Page approached the University of Michigan, his alma mater and a world leader in book scanning, to find out what the state of the art in mass digitization looked like. Michigan told him that at the current pace, digitizing their entire collection—7 million volumes—was going to take about a thousand years. Page, who’d by now given the problem some thought, replied that he thought Google could do it in six.
An absolutely fascinating dive into the history of Project Ocean, covering how it started at Google, how Google scanned the books (camera arrays, clever algorithms and human page turners), and the years-long legal wrangle between Google, the Authors Guild and the DOJ.
It’s there. The books are there. People have been trying to build a library like this for ages—to do so, they’ve said, would be to erect one of the great humanitarian artifacts of all time—and here we’ve done the work to make it real and we were about to give it to the world and now, instead, it’s 50 or 60 petabytes on disk, and the only people who can see it are half a dozen engineers on the project who happen to have access because they’re the ones responsible for locking it up.
Interestingly, Page later opined during a Q&A that maybe it would be a good idea to “set aside a part of the world” to try out some “exciting things you could do that are illegal or not allowed by regulation.” At the time he was roundly criticised as an annoying, out-of-touch billionaire, but perhaps he was just being wistful.