Day 118: What we learned from 5 million books

How do we maximise our broader knowledge of the world? Reading has always been a popular tool, but most people read a handful of books closely, getting to know their particular content well while missing the bigger picture. What if you could search for words and ideas across a broad spectrum of all the books ever written? In this talk, Jean-Baptiste Michel and Erez Lieberman Aiden show how this can be done using Google's Ngram Viewer.
Google have recently digitised 15 million books, or about 12% of all the books ever written. When Google digitise a book, they convert it into a usable format that provides the text itself as well as metadata such as the title, author and date of publication. Releasing the full text of books raises obvious legal problems, so instead of the books themselves, Michel and Aiden decided to release statistics about them. Working from 5 million of these books, they counted how often each word and phrase (each "n-gram") appears in each year, producing a table of two billion lines that gives a picture of how culture has changed over time.
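To make the idea of a table of n-gram counts concrete, here is a minimal Python sketch that turns raw rows into yearly relative frequencies, which is roughly what an Ngram graph plots. It assumes a tab-separated row layout of n-gram, year, match count and volume count, plus a separate `totals` mapping of year to total words printed that year. The function name `yearly_frequencies` and the sample numbers are hypothetical, not Google's actual pipeline. Dividing by the yearly total matters because far more books are printed in later years, so raw counts alone would make almost every word appear to grow in popularity.

```python
from collections import defaultdict


def yearly_frequencies(lines, totals):
    """Return {ngram: {year: relative_frequency}}.

    Assumed (hypothetical) row layout: ngram<TAB>year<TAB>match_count<TAB>volume_count.
    `totals` maps year -> total words printed that year.
    """
    # Sum match counts per n-gram per year, in case a pair appears on several rows.
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        ngram, year, match_count, _volume_count = line.rstrip("\n").split("\t")
        counts[ngram][int(year)] += int(match_count)

    # Normalise by the total word count for each year; skip years with no total.
    return {
        ngram: {y: c / totals[y] for y, c in by_year.items() if totals.get(y)}
        for ngram, by_year in counts.items()
    }


if __name__ == "__main__":
    rows = [
        "influenza\t1917\t50\t10\n",
        "influenza\t1918\t400\t80\n",
        "influenza\t1919\t300\t70\n",
    ]
    totals = {1917: 1_000_000, 1918: 1_000_000, 1919: 1_000_000}
    print(yearly_frequencies(rows, totals)["influenza"])
```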
Looking up a word such as "influenza" in the Ngram Viewer produces a graph of how often it appears each year, and its peaks mark the points at which there was an epidemic. Very strong signals, where a name's frequency rises or falls sharply, can suggest that the person in question was the subject of propaganda or suppression.
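One crude way to surface such strong signals automatically is to compare each year's frequency with the year before and flag large jumps or drops. This Python sketch is an illustrative heuristic, not the method Michel and Aiden used. The function `flag_sharp_changes`, the threshold of 3 and the sample series (in occurrences per million words) are all hypothetical.

```python
def flag_sharp_changes(series, factor=3.0):
    """Return (year, ratio) pairs where frequency rises or falls by at least `factor`.

    `series` maps year -> relative frequency. Consecutive entries (in year order)
    are compared. The default threshold is an arbitrary illustrative choice.
    """
    years = sorted(series)
    flagged = []
    for prev, curr in zip(years, years[1:]):
        before, after = series[prev], series[curr]
        if before == 0:
            continue  # a ratio cannot be formed from a zero baseline
        ratio = after / before
        if ratio >= factor or ratio <= 1 / factor:
            flagged.append((curr, ratio))
    return flagged


if __name__ == "__main__":
    # Hypothetical name frequencies, occurrences per million words.
    series = {1935: 2.0, 1936: 2.1, 1937: 0.5, 1938: 0.4, 1939: 0.45, 1940: 1.8}
    for year, ratio in flag_sharp_changes(series):
        print(year, round(ratio, 2))
```

A flagged drop is only a prompt to look closer: a name can fade from print for many reasons other than suppression.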
This field of study is termed culturomics: the application of massive-scale data analysis to the study of human culture.
So who wants to use the Ngram Viewer? Well, it turns out that over a million people did on the first day. It can be used for all kinds of projects, and could ultimately transform our understanding of our past and present.

