A research team at the Oxford English Dictionary has released a visualization engine for text analysis. This is fun: give it a text (up to 500 words, for the moment) and it will make a graph showing how common each word is in English (vertical axis), the year each word entered the English language (horizontal axis), the frequency of each word in the sample (size of the circle), and the language group from which we got the word (color).
This can be used for lots of things. We can test (for example) J.R.R. Tolkien’s success at keeping words that entered English after 1600 out of his prose.
Me, I wanted to go back to something that bothered me when I was a teenager. The first description of Minas Tirith, seen from a distance, sounded weird to me.
For the fashion of Minas Tirith was such that it was built on seven levels, each delved into the hill, and about each was set a wall, and in each wall was a gate. But the gates were not set in a line: the Great Gate in the City Wall was at the east point of the circuit, but the next faced half south, and the third half north, and so to and fro upwards; so that the paved way that climbed towards the Citadel turned first this way and then that across the face of the hill. And each time that it passed the line of the Great Gate it went through an arched tunnel, piercing a vast pier of rock whose huge out-thrust bulk divided in two all the circles of the City save the first. For partly in the primeval shaping of the hill, partly by the mighty craft and labour of old, there stood up from the rear of the wide court behind the Gate a towering bastion of stone, its edge sharp as a ship-keel facing east. Up it rose, even to the level of the topmost circle, and there was crowned by a battlement; so that those in the Citadel might, like mariners in a mountainous ship, look from its peak sheer down upon the Gate seven hundred feet below. The entrance to the Citadel also looked eastward, but was delved in the heart of the rock; thence a long lamp-lit slope ran up to the seventh gate. Thus men reached at last the High Court, and the Place of the Fountain before the feet of the White Tower: tall and shapely, fifty fathoms from its base to the pinnacle, where the banner of the Stewards floated a thousand feet above the plain.
LotR, V,i
Here’s what that looks like in the visualizer. Huge cluster of blue and green for English and other Germanic languages. The thing that struck teenaged me, though I didn’t know it at the time, was all that red. This paragraph is loaded with French words, from “fashion” at the beginning to “plain” at the end. 30 out of 300.
For comparison, here’s the first description of Edoras.
‘I see a white stream that comes down from the snows’, he said. ‘Where it issues from the shadow of the vale a green hill rises upon the east. A dike and mighty wall and thorny fence encircle it. Within there rise the roofs of houses; and in the midst, set upon a green terrace, there stands aloft a great hall of Men. And it seems to my eyes that it is thatched with gold. The light of it shines far over the land. Golden, too, are the posts of its doors. There men in bright mail stand; but all else within the courts are yet asleep.’
‘Edoras those courts are called’, said Gandalf, ‘and Meduseld is that golden hall. There dwells Theoden son of Thengel, King of the Mark of Rohan. We are come with the rising of the day. Now the road lies plain to see before us. But we must ride more warily; for war is abroad, and the Rohirrim, the Horse-lords, do not sleep, even if it seem so from afar.
Draw no weapon, speak no haughty word, I counsel you all, until we are come before Theoden’s seat.’
LotR, III,vi
Crunched and visualized, Edoras looks like this:
A bare smattering of French words (10 out of 200). All the words from before 1600, with two exceptions. One of those yellow others is “Rohan”, which the OED thinks is Sanskrit (and I’m sure it is). We’ll let that slide. The other is “afar”, which is listed as Cushitic. I’m not sure I believe that — it sounds like the Old English prefix “a-” stuck to the Old English-derived “far”. This descriptive passage passes Tolkien’s constraint test easily.
In conclusion, my old suspicion has been quantified: Gondor is 10% French. Tolkien may have been using French words to designate social hierarchy, which Gondor has in bucket-loads. I suspect a lot more French words will appear in Gondor once we can process more than 500 words at a time. We’ll see if the OED research team lets us do that before my Signum classmate James Tauber releases the same capability open source.
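For anyone who wants to check the arithmetic, here is a minimal Python sketch of the comparison. The counts are the round figures quoted above (30 out of 300 for Minas Tirith, 10 out of 200 for Edoras), not exact tallies exported from the OED tool, and the `share` helper and `passages` dictionary are names made up for illustration.

```python
def share(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# (French-derived words, total words), as quoted in the post.
# These are rounded figures, not exact output from the visualizer.
passages = {
    "Minas Tirith": (30, 300),
    "Edoras": (10, 200),
}

# Percentage of French-derived words in each passage.
shares = {
    name: share(french, total)
    for name, (french, total) in passages.items()
}

for name, pct in shares.items():
    print(f"{name}: {pct:.0f}% French")
```

On those figures the Minas Tirith passage comes out at 10% French and the Edoras passage at 5%, so Gondor's description carries roughly twice the proportion of French-derived words.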
Tip of the hat to Thijs Porck for letting us know about this via Twitter.
Cerberus
Most interesting. Could it be that Tolkien used more French and Latin words to describe more urban environments? For cities are associated with complexity, science, and refined techniques.
In fact, such words may have come to him naturally, as many scientific and technical words reached England only with the French invasion, or only came to be used more commonly after that point. I am thinking of such words as city, citadel, court, line, plain, circuit, circle, divide, point, but also various French or Latin words ultimately of Germanic origin, such as bastion, battlement, tunnel.
Joe
“French” is also in that last category.
Cerberus
Quite frankly, you are correct.
LeesMyth
Cool – I just ran this on some descriptions of Orthanc (from http://www.henneth-annun.net/places_view.cfm?plid=87)
Joe
If you think it made a mistake, you can download all the results as a spreadsheet and fix them. Nice to see humility from software guys.