Books are great! Not only can you read them, but you can make computers read them, too. Once upon a time, my brother made a computer read a book and shared this result:
Quick side note: this post will probably not work very well on a mobile device. Sorry.
To make the image, I made the computer write every word in the book along the top and side of a table. Then it colored each cell black if the words in the top and the side are the same. For example:
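If you want to try the word-by-word table yourself, here is a minimal Python sketch. It is not the code behind the images in this post; `word_grid` and `render` are names I made up for illustration, and lowercasing and splitting on whitespace is an assumed (and very crude) way to find the words.

```python
def word_grid(text):
    """Build a square table: grid[i][j] is True when word i equals word j."""
    # Assumption: words are whatever is left after lowercasing and
    # splitting on whitespace. Punctuation is not handled.
    words = text.lower().split()
    return [[a == b for b in words] for a in words]


def render(grid):
    """Draw the table as text: '#' for a black cell, '.' for a white one."""
    return "\n".join(
        "".join("#" if cell else "." for cell in row) for row in grid
    )


print(render(word_grid("the cat saw the dog")))
# #..#.
# .#...
# ..#..
# #..#.
# ....#
```

The diagonal is always filled in because every word matches itself; the off-diagonal marks are the repeated "the".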
However, some books are very, very long. War and Peace is about half a million words and an image that was half a million by half a million pixels big would be bigger than my screen. So instead of counting each word, the computer divides the book up into sections. It compares each section and counts the number of matching words. The total count is then used to color a pixel some shade of grey where white means no common words, and black means all the words are the same.
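Here is a rough Python sketch of the section-by-section version. The post does not spell out exactly how matches are counted, so this assumes one reasonable choice: count the words two sections share (with repeats) and divide by the length of the longer section. The function names and the 0 to 255 grey scale are also my assumptions.

```python
from collections import Counter


def split_sections(words, n_sections):
    """Cut the word list into roughly n_sections equal chunks."""
    size = max(1, -(-len(words) // n_sections))  # ceiling division
    return [words[i:i + size] for i in range(0, len(words), size)]


def similarity(a, b):
    """Share of matching words between two sections, from 0.0 to 1.0."""
    # Counter & Counter keeps the smaller count of each shared word.
    shared = sum((Counter(a) & Counter(b)).values())
    longest = max(len(a), len(b))
    return shared / longest if longest else 0.0


def heatmap(words, n_sections):
    """Grid of grey values: 255 is white (nothing shared), 0 is black."""
    sections = split_sections(words, n_sections)
    return [
        [int(255 * (1 - similarity(a, b))) for b in sections]
        for a in sections
    ]
```

Calling `heatmap(text.lower().split(), 500)` would give a 500 by 500 grid no matter how long the book is, which is the whole point of working in sections.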
Also, I’ve made some other adjustments:
- Since each section of a book is the same as itself, I turn that middle diagonal line white so it’s not distracting.
- Words like “the” and “a” are removed because they are so common. In fact, I remove all words that make up more than 1% of the total.
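Both adjustments are small enough to sketch in Python. The names `drop_common_words` and `whiten_diagonal` are hypothetical, and the sketch assumes the grid is a list of rows of grey values where 255 means white.

```python
from collections import Counter


def drop_common_words(words, max_share=0.01):
    """Remove every word that makes up more than max_share of the text."""
    counts = Counter(words)
    limit = max_share * len(words)
    return [w for w in words if counts[w] <= limit]


def whiten_diagonal(grid, white=255):
    """Return a copy of the grid with the self-comparison cells set to white."""
    return [
        [white if i == j else value for j, value in enumerate(row)]
        for i, row in enumerate(grid)
    ]
```

The 1% cutoff is measured against the whole book, so a word has to be common everywhere, not just in one section, to get dropped.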
Alright, want to see some books and if these charts are useful or not? All of the text for these books came from Project Gutenberg.
Persuasion, by Jane Austen
I’ve not yet been convinced to read Persuasion. Perhaps if the title were more enticing. Anyway, take a look at this. Click or shift+click around, drag the sliders and see the text at different sections.
It’s fairly uniformly gray. There’s that darker square near the end, but since I haven’t read the book, I couldn’t tell you why.
I made another variation of the chart by comparing word pairs rather than single words. For instance take
prided himself on remaining single
and turn it into
Here’s what Persuasion looks like with that method:
Seems like it just filters out more stuff.
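For the curious, one plausible way to build word pairs is to take every two neighboring words, overlapping as you go. That is an assumption on my part about what "word pairs" means here, and `word_pairs` is just an illustrative name.

```python
def word_pairs(words):
    """Pair each word with the one right after it (overlapping pairs)."""
    # Assumption: pairs overlap, so n words give n - 1 pairs.
    return list(zip(words, words[1:]))


pairs = word_pairs("prided himself on remaining single".split())
print(pairs)
# [('prided', 'himself'), ('himself', 'on'), ('on', 'remaining'), ('remaining', 'single')]
```

Two sections now only match where they share the same two words in the same order, which would explain why so much more gets filtered out.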
I’m still not quite sure what these charts show other than similarity. It could be writing style similarity. It could be topical similarity. It could be diction similarity. Probably just a combination. Here’s what Ben Franklin’s autobiography looks like:
It’s overall lighter than Persuasion, which I guess means it has more variety? But also, there’s a section right in the middle of the book that is unlike any other part of the book! That section begins with him talking about a blacksmith making an ax. Again, I haven’t read this book, so I don’t know why that part is different.
I’ve wondered if these kinds of heatmaps can help you identify authors. Do isolated squares indicate different authors? To test this, here’s a look at Pride and Prejudice and Persuasion, both written by Jane Austen:
And again using the word pair method, which shows .
Now let’s add another contemporary author’s book and see how different it is from the other two. Here’s a compilation of Pride and Prejudice, Persuasion and The Count of Monte Cristo (finally, a book I’ve read!):
I thought I could see where each book began, but I accidentally thought the first part of Monte Cristo was Persuasion. Here’s where they .
Pride and Prejudice is very much like Persuasion, and both are very different from The Count of Monte Cristo. Both Austen books are also much more like themselves than Monte Cristo is like itself.
Here’s The Count of Monte Cristo by itself:
That is the moment Dantès finds the treasure. It’s a moment that divides the plot in two; how fun that it’s a moment unlike most others in the story.
The Bible is a book written by various people. How does it look? Here’s the King James version:
The black square right in the middle is Psalms. It’s interesting to see Deuteronomy’s echo later in the book. The Gospels (Matthew, Mark, Luke and John) are very similar to each other, as are Paul’s epistles.
The Bible is pretty evenly similar to itself—that is, not very similar for the most part. Not what I would have guessed. I was expecting more distinct squares. Perhaps it was evened out because it was translated into English by a smallish group in a shortish time?
Book of Mormon
Anyway, back to the image from the beginning, which is also a book full of books: the Book of Mormon. Here it is again, but interactive:
This book looks different than all the others I’ve looked at.
Definitely the most unique section is the in Jacob. Here’s the in 2 Nephi. Here is Jesus Christ’s .
Here it is again with the word pair method:
So what does this mean? Does this prove the Book of Mormon is true? By “true” I mean “written by ancient prophets and translated by Joseph Smith.”
Nah. It’s just interesting. A computer’s not going to be able to tell you if it’s true or not.
If you’re not familiar with the book, in brief: it’s a record kind of like the Bible, but written on the American continent rather than around the Middle East. The book was assembled on golden plates by a prophet named Mormon (hence the title), buried in the ground by his son, Moroni, then given by Moroni (now an angel) to Joseph Smith in the 1800s, who then translated it “by the gift and power of God,” as he says.
Plenty of people dispute Joseph Smith’s story. I can understand why—it does sound a bit outlandish. There have been attempts (and will continue to be attempts) to prove and disprove its authenticity. While interesting, to me, the proof is in the pudding.
The pudding, in this case, is what the Book of Mormon teaches. I’m a better person for reading the book.
If you hang around members of the Church of Jesus Christ of Latter-day Saints long enough, they’ll eventually mention a promise Moroni wrote down near the end. Moroni promises that you can know whether the Book of Mormon is true by reading it, then asking God “in the name of Christ, if these things are not true … with a sincere heart, with real intent, having faith in Christ.” I believe Moroni’s promise is true, and I have tested it and feel that the promise has been proved to me personally.
There’s another thing Moroni suggests right before the promise, and it has been one of the most meaningful things I’ve ever done when reading the scriptures:
when ye shall read these things, if it be wisdom in God that ye should read them, that ye would remember how merciful the Lord hath been unto the children of men, from the creation of Adam even down until the time that ye shall receive these things, and ponder it in your hearts.
Remember how merciful the Lord has been throughout all time. Throughout your life. Take some quiet time to ponder.
It was fun to recreate my brother’s original image. Any other ideas for things you can make computers do to books?
Here’s the code used to make these images and here are a few more books: