Google’s New Database for Geeks like Me

by Rachel Baker on December 16, 2010

The New York Times today has an article about the Google release of a new book database.

The digital storehouse, which comprises words and short phrases as well as a year-by-year count of how often they appear, represents the first time a data set of this magnitude and searching tools are at the disposal of Ph.D.’s, middle school students and anyone else who likes to spend time in front of a small screen. It consists of the 500 billion words that are contained in books published between 1800 and 2000 in English, French, Spanish, German, Chinese, Russian and Hebrew.

The intended audience is scholarly, but a simple online tool also allows anyone with a computer to plug in a string of up to five words and see a graph that charts the phrase’s use over time …

To read the story click here: http://www.nytimes.com/2010/12/17/books/17words.html?hp

The database is apparently enormous, and what you can do with it is pretty interesting.

If you go here: http://ngrams.googlelabs.com

and type in (to use the example discussed by the NY Times) women, men, you get a chart showing

that ‘women,’ in comparison with ‘men,’ is rarely mentioned until the early 1970s, when feminism gained a foothold. The two lines, moving in opposite directions, finally cross paths in about 1986.

Obviously, this could be a tool for showing how what is being written relates, culturally or socially, to what is going on in the world at the same time.

However, it may also be an interesting tool for someone who is reading a specific genre and wants to see when particular topics were written about the most, and what else was happening in the world at that time.

As an example, I’m really into reading the short stories of Henry Rider Haggard and Edgar Rice Burroughs right now. Both of these authors wrote adventure stories about lost continents and civilizations. And while reading, I’ve had a nagging question about why these particular types of stories were popular when they were published. This question led me to research short stories published in penny magazines and pulp fiction.

Okay, so that’s the literary aspect, but what about the social and cultural aspects … what made people find the lost civilization and lost continent sub-genres so interesting? I’ve not really gotten to the full answer to this, but when I saw the NY Times article today, I thought I’d look at the time-line where these two sub-genres and Africa converged … because hey, why not?

So, just to see what I’d get, I typed in the search box:

africa, lost civilizations, lost continents
and changed the dates to 1860 and 1965 (when Africa takes off on its own in literature with no convergence with these two other terms).
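
For anyone who would rather script a search like this than type it in, here is a minimal sketch of the same steps: split the comma-separated string into phrases, check each against the five-word limit the Times article mentions, and build a link with the date range. The base address and the parameter names (content, year_start, year_end) are my assumptions about how the viewer's links look, not something taken from the article, and the function names are made up for illustration.

```python
from urllib.parse import urlencode

MAX_WORDS = 5  # the Times article says a phrase can be up to five words


def parse_phrases(query):
    """Split a comma-separated query into trimmed, non-empty phrases."""
    phrases = [part.strip() for part in query.split(",")]
    phrases = [p for p in phrases if p]
    for p in phrases:
        if len(p.split()) > MAX_WORDS:
            raise ValueError(f"phrase longer than {MAX_WORDS} words: {p!r}")
    return phrases


def build_url(query, year_start, year_end):
    """Build a viewer link (base URL and parameter names are assumptions)."""
    params = {
        "content": ",".join(parse_phrases(query)),
        "year_start": year_start,
        "year_end": year_end,
    }
    return "https://books.google.com/ngrams/graph?" + urlencode(params)


print(build_url("africa, lost civilizations, lost continents", 1860, 1965))
```

Keeping the parsing separate from the link-building means the five-word check still works even if the link format turns out to be different.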

These three terms converge around 1895, then again 1915 through 1921, then again around 1932 – 1937, then again between the late 1950s and the very early 1960s.  Interestingly, Africa is not coupled with the other two terms until the late 1880s when Africa and lost continent converge.  Lost Civs are not written about until 1870 and Lost Continents not until 1880. Lost continent and lost civilization converge in the mid 1880s, but then separate until the mid 1890s.

Africa takes off on its own in the middle of the 1960s (South Africa had become a republic a few years earlier, in 1961) and then it sky-rockets far away from the lost civs and lost continents. Now, in the 1950s there’s more written about lost continents than lost civs, but there is more of both than of Africa. When they converge again in the sixties, they stay that way until the early eighties, when lost civs breaks away from the trend line and increases, leaving lost continents on a pretty even keel.

What does this mean? I don’t know … it’s just sort of interesting (to me at least).

Check out both links, type in your own keywords (remember to put a comma between them) and you know what?  Share the keyword strings here – I’m interested to know what others may be looking at.
