April 24, 2005
Amazon Analysis: "Text Stats" and more
Newsflash: Einstein is harder to read than Shakespeare. Tolstoy used more words per sentence than Mark Twain. “The Da Vinci Code” is suitable for readers with at least a 7th grade reading level. And the most common word in “The Education of Henry Adams” is: Adams. (It occurs 1072 times.)
This is just some of the strangely fascinating but not terribly meaningful information you can glean from the newest feature on Amazon.com. Some time ago the folks at Amazon began offering “Search Inside the Book,” making it possible to search the entire text of books whose publishers have allowed them to be scanned in. Now they are offering up some quantitative analysis of all that text they have stored on their disks.
For each scanned book you can find a concordance: a list of the 100 most frequently used words in the text, excluding common words such as "of" and "it." The more a word is used, the bigger it appears. There is also a text analysis that spits out various ratings of the book’s readability and complexity, based on the average number of syllables per word and the average number of words per sentence.
Thus you can discover that according to the Flesch-Kincaid index, Shakespeare is suitable for anyone with a 4th grade reading level. This will undoubtedly come as a surprise to the countless high school students who have gotten bogged down in the Bard’s work. But he was definitely a short-words-and-short-sentences guy, averaging about 9 words per sentence, with only 8% of his words at 3 syllables or more.
Einstein, on the other hand, averages 26.8 words per sentence, with 19% of the words being 3 syllables or more. (Too bad he insisted on using that five-syllable word “relativity” so much.) Thus the Flesch-Kincaid index suggests he is suitable reading for a junior in college. Hopefully a very smart junior.
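For the curious, the standard Flesch-Kincaid grade-level formula is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. Here is a minimal Python sketch of it. Amazon has not said how it splits sentences or counts syllables, so the vowel-counting shortcut and the function names below are assumptions for illustration, not Amazon's actual method.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Standard Flesch-Kincaid grade-level formula applied to raw text."""
    # Naive sentence split on ., ! and ? (abbreviations will fool it).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
```

Feed it a passage and it returns a U.S. school grade. Very short, simple text can even score below zero, which is a reminder not to take the number too seriously.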
The readability indexes make for interesting comparisons, as do such stats as words-per-dollar. “War and Peace” is a good value at 51,707 words per dollar, while Chris Van Allsburg’s “The Polar Express” has only 53 words per dollar. (What was he thinking, including all those pictures?) The concordances are less interesting, because they offer few surprises. The most common word in “The Adventures of Tom Sawyer,” for instance, is “Tom.” Shocking!
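A concordance of this kind is easy to approximate at home. The sketch below counts words, drops a handful of common ones, and keeps the most frequent. The tiny stopword list and the function name are made up for illustration; Amazon's actual exclusion list is not published.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be far longer.
STOPWORDS = {"a", "an", "and", "the", "of", "it", "to", "in", "is", "was"}


def concordance(text: str, top_n: int = 100) -> list[tuple[str, int]]:
    """Return the top_n most frequent non-stopwords with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)
```

Run it on a few chapters of “The Adventures of Tom Sawyer” and you can guess which word comes out on top.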
Amazon also now spotlights what it calls Statistically Improbable Phrases, or SIPs. If a phrase pops up a large number of times in a particular book, it gets listed; click on it and you can find other books that use the same phrase. One of the SIPs for “The Da Vinci Code” is “sacred feminine,” used by Dan Brown 26 times. With a click we can learn that it is also used 50 times in the spiritual text “The Return of the Mother,” which is only about 300 thousand places lower than “The Da Vinci Code” in Amazon’s book rankings.
Ah, the rankings. Authors are a chronically insecure lot. They are even more likely to dip into Amazon to check their ranking than they are to dip into the sherry to ease their pain. I have one friend who logs on every hour to track the rise and fall of his book. (OK, that’s actually me.) The text analysis features have opened up new vistas for anxiety and fretting. Imagine writers nervously comparing their readability to Stephen King’s, desperately trying to cram in more words per dollar than Harry Potter, or emulating the SIPs used by Dave Barry (“garbage barge,” “creamed chipped beef,” and “pig parts,” to name a few).
Where does my own book, “The Greatest Stories Never Told,” score? According to the Flesch-Kincaid index, it is suitable for someone with a 9th grade reading level, making it harder than Mark Twain, but easier than Tolstoy. (Please don’t tell PBS, which has recommended the book to children in grades 3-8.) It offers only 2,866 words per dollar, and sadly, has no Statistically Improbable Phrases. None at all. I am in no way insecure about that, but you can be darn sure that in my next book, “Pig Parts: The Story of Creamed Chipped Beef,” I won’t make the same mistake.
Posted by rickbeyer at April 24, 2005 07:40 AM