As previously discussed, black is the color mentioned most often in The Lord of the Rings, and white is right behind it. But grey is #3. Take that, Edwin Muir!
I fed the list of X11 color names into a text-processing program and collected all the color mentions I could find. With one exception: “tan” is a part of so many English words that it would be unfair to expect a computer to pick out which words containing that trigram were colors and which were not, so I deleted it from the list. This is what came out.
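The trouble with “tan” is easy to reproduce. The original text-processing program isn't shown, so what follows is only a minimal Python sketch: the function name, the short color list, and the sample sentence are all made up for illustration. It counts the same words two ways, as raw substrings and as whole words.

```python
import re
from collections import Counter

# Hypothetical short list; the post used the X11 color names.
COLOR_WORDS = ["black", "grey", "tan"]

def count_colors(text, colors, whole_words=True):
    """Count mentions of each color word in text, case-insensitively."""
    counts = Counter()
    lowered = text.lower()
    for color in colors:
        pattern = re.escape(color)
        if whole_words:
            # \b restricts the match to standalone words.
            pattern = r"\b" + pattern + r"\b"
        counts[color] = len(re.findall(pattern, lowered))
    return counts

# Made-up sample sentence, not a quotation from the book.
sample = "A black tower stood in the distance, important and grey under a tan sky."

as_substrings = count_colors(sample, COLOR_WORDS, whole_words=False)
as_words = count_colors(sample, COLOR_WORDS, whole_words=True)
print(as_substrings["tan"], as_words["tan"])
```

With substring matching, “distance” and “important” both register as “tan”. Whole-word matching avoids that, though it would still miss inflected or compound forms, so neither approach is obviously the right one.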
Figure 1. Frequency of color mentions
There are ten colors mentioned more than ten times in the text. Their relative frequencies are shown in the pie chart in Figure 1. Oddly, none of the top-notch Tolkien illustrators has used this palette. I wonder why.
The places colors are most often found are sometimes surprising. The chapter in which black is mentioned most is “The Siege of Gondor”. White, “The King of the Golden Hall”. Grey, “The Great River”. Red, “The Tower of Cirith Ungol”. Green and brown are mentioned in “Treebeard” more than any other chapter. Blue, yellow, and gold are mentioned most in “In the House of Tom Bombadil”; sometimes the place is not surprising at all.
Silver is most mentioned in “Lothlorien”. That chapter is #3 for “gold” instead of #1, because when a character has a color in her name, that tends to skew the distribution. Gold and silver are strongly present in all three chapters involving Lorien, though.
If we make a vector out of the fraction of each color’s mentions that happen in each chapter, we can test which colors tend to form clusters in the narrative. The dendrogram is in Figure 2. (I’ve inflicted dendrograms on you before.) As we trace a line from one color to another, the further left we have to go, the less related the colors are in their occurrence in the text.
Figure 2. Which colors go together in the text
But what do we do with all these measurements? With an Idiosopher’s well-trained eye for the most significant thematic content of a work, I zeroed in on the disagreement between Celeborn and Treebeard. “Yet they should not go too far up that stream, nor risk becoming entangled in the Forest of Fangorn,” said Celeborn. “Do not risk getting entangled in the woods of Laurelindorenan!” said Treebeard. What’s the subject of their disagreement?
Figure 2 gives us an insight: brown is used to describe Fangorn more than any other place. Gold and silver are dominant in Lothlorien. The two forests agree on green, but to get from brown to gold and silver, we have to go all the way to the left edge of the diagram. These are the furthest-apart colors in the text. So here is our answer: the source of the ancient enmity between the two forests is interior decorating. When Galadriel sang the woods of Lothlorien into existence, she may have had an idea of the kind of forest she didn’t want, and Fangorn may have been it.
Coda: Boring Details
Sometimes a color word is also a noun. Olive dropped out of the analysis because it’s only mentioned twice, once as a color and once as a noun. That was an easy one. I tried to separate mentions of gold and silver into the color and the metal, but quickly discovered any partition I could make would be arbitrary. Tolkien doesn’t clearly separate them. He rarely mentions the metals without the colors being important, so I left them all in.
The method: First, all the color words were pulled from the text. Then they were classified into a standard color-word. Usually that was straightforward. The exception was “scarlet”, which got absorbed into “red”. Then each instance of a color was collected into a histogram by chapter or whatever.
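The classify-then-count step can be sketched in a few lines of Python. This is an illustration under assumptions, not the code actually used: the only mapping taken from the post is “scarlet” to “red”, and the function name, chapter labels, and input pairs are hypothetical.

```python
from collections import Counter, defaultdict

# Variant -> standard color-word. Only "scarlet" -> "red" comes from the post.
CANONICAL = {"scarlet": "red"}

def histogram_by_chapter(mentions):
    """Build {color: Counter({chapter: count})} from (chapter, word) pairs."""
    hist = defaultdict(Counter)
    for chapter, word in mentions:
        word = word.lower()
        color = CANONICAL.get(word, word)  # fall back to the word itself
        hist[color][chapter] += 1
    return hist

# Made-up input: each pair is one color mention found in one chapter.
mentions = [("ch1", "red"), ("ch1", "scarlet"), ("ch2", "Red"), ("ch2", "green")]
hist = histogram_by_chapter(mentions)
print(dict(hist["red"]))
```

Here the “scarlet” in the first chapter is counted as a second “red”, which is the absorption described above.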
Instances of a color by chapter form a vector in a 62-dimensional space. Vectors were normalized so the elements of each color’s vector were the fraction of mentions that were in that chapter. The distance between two vectors was computed using the linear distance between elements. (This is not the Euclidean distance between unit vectors; I re-did the analysis with those and got similar results, but they were not as easy to interpret in a way that made sense with respect to the text. Linear differences seem more relevant to text analysis, but it’s always good to check.) The vectors were clustered using the R hclust function with complete linkage.
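For readers who want to see the moving parts, here is a rough pure-Python stand-in for the normalize, measure, and cluster steps. It is not the R code used for Figure 2. It assumes “linear distance” means the sum of absolute differences between elements (Manhattan distance), which is one reading of the description, and it runs on made-up counts over four chapters rather than the real 62.

```python
from itertools import combinations

def normalize(counts):
    """Turn per-chapter counts into fractions of the color's total mentions."""
    total = sum(counts)
    return [c / total for c in counts]

def linear_distance(u, v):
    # Assumption: "linear distance" read as the sum of absolute differences.
    return sum(abs(a - b) for a, b in zip(u, v))

def complete_linkage(vectors):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, height)."""
    def cluster_dist(a, b):
        # Complete linkage: distance between the two most distant members.
        return max(linear_distance(vectors[x], vectors[y]) for x in a for y in b)

    clusters = [frozenset([name]) for name in vectors]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_dist(*pair))
        merges.append((set(a), set(b), cluster_dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

# Made-up counts per chapter, for illustration only.
raw = {"gold": [3, 1, 0, 0], "silver": [2, 2, 0, 0], "brown": [0, 0, 1, 3]}
vectors = {color: normalize(counts) for color, counts in raw.items()}
merges = complete_linkage(vectors)
for a, b, height in merges:
    print(sorted(a), sorted(b), height)
```

In this toy data gold and silver merge first, and brown joins them only at the maximum possible height, which mirrors the kind of separation read off the dendrogram. Each merge height corresponds to how far left one would have to trace in a plot like Figure 2.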