Reading by the Numbers: When Big Data Meets Literature

Oct 30, 2017 · 33 comments
David (Ann Arbor, MI)
Bouvard and Pécuchet do literary criticism.
David B. Benson (southeastern Washington state)
But we will want well-read robots.
rad6016 (Indian Wells)
The whole exercise sounds like a literary chapter in Tom Wolfe's "The Painted Word". Artistic one-upmanship masking an absence of real ideas.
Steve Beck (Middlebury, VT)
I give up. Timothy Snyder's Lesson # 9 in his 20 Lessons on Tyranny was to divorce yourself from the internet and READ MORE BOOKS. My take? Computers ruin everything.
Bob S (New Jersey)
Mr. Moretti wrote, “Reading is one of life’s greatest pleasures,” which we “would be insane” to give up, he said. “But the question is whether reading and knowledge are continuous with each other.” It appears that Mr. Moretti left out that reading as a pleasure many times allows for thinking that produces knowledge of value. So far Mr. Moretti has failed to show that his idea of using "distant reading" has created any knowledge of value. It appears so far that "distant reading" and knowledge are not continuous with each other.
Jane Eyrehead (California)
The Times has written about Moretti before--the reporter asked Harold Bloom to comment, and HB's shudder was palpable. Technology can be an aid in the serious study of literature--another writer commented on the excellent Princeton Dante digital project--but don't be surprised when this sort of endeavor is used as a way to sidestep reading literature.
Doug (NJ)
I find it interesting the number of comments that confuse reading with analysis and large language usage trend data with individual writing style. I give Mr. Moretti a big thumbs up for placing a stake in the ground. Perhaps someone will pick up the challenge and try to extend it.
Blunt (NY)
I have had the hardest time understanding how Moretti managed, over the years, to paint a portrait of himself as a scholar and literary critic deserving of tenured positions at Columbia and Stanford. His so-called analysis of data arrives at trivial conclusions: for example, in his "Atlas of the European Novel 1800-1900", conclusions about the class society of England reached by counting the number of times different parts of London or the English countryside are mentioned before and after a marriage takes place in an Austen novel. Pointless and inane come to mind. Moretti is a well-read person. There is no question about that. The question is why he bothers doing what he does instead of being a literature professor and critic who imparts knowledge to his students and readers, rather than producing charlatan's nonsense that makes even the now-discredited structuralist and deconstructionist tomfoolery seem relatively more useful.
Bob S (New Jersey)
“distant reading” would be very valuable for analysis of the large number of the lines of Shakespeare that are typed by chimpanzees.
Mike S. (Monterey, CA)
Since the 1970s, when I first started working with computers, I have seen so many situations where people apparently operate according to the maxim, "it comes from the computer, so it must be right." While it is true that most of the time computer error is due to the work of fallible humans, computers cannot operate without humans. So, in addition to being tools for doing calculations very rapidly, computers are also tools for making mistakes very rapidly. Perhaps some day we will invent true AI, but thus far even the smartest computer system is dependent on humans for building analysis algorithms, collecting information to feed to the computer algorithms, and interpreting the results of the automated analysis. On the other hand, one of my favourite science fiction stories is one by Isaac Asimov about a computer that was used to determine the source of humor, and the consequences of knowing the answer to that question.
Bob S (New Jersey)
Doug NJ Search is not the same as analysis. "they found, through automated analysis" I have worked with computers for over 30 years and there is no such thing as "automated analysis". Thinking is required for analysis and computers do not think. Software can be built to search for given symbols in text. Software can search for the word "apple" in computer files, but to the computer this is simply a search for the symbols apple. I am always amazed at how many people cannot understand that computers are machines and that machines do not think.
Bob S (New Jersey)
For literature it would be nice if the Oxford English Dictionary were available as a computer program, so that it was not necessary to access the Oxford English Dictionary using a web site on the internet. This would allow researchers to do intensive searching. At some point individuals will understand that users cannot do intensive searches on information that is contained in web sites, since 50 users doing intensive searching might bring down the web site. I sometimes have to roll my eyes when I think about academia and computers. Colleges want to do online teaching when instead any amount of information regarding any academic subject can be made available on an inexpensive computer that does not require the internet. At some point in time academia might see that this will be the real revolution in education.
Michael Paine (Marysville, CA)
It used to be. I have that last version on my computer, and I find it so much better, needless to say, than the two-volume edition. Problem is, this is OED 2; all subsequent additions and changes are now online and must be subscribed to. Sob!!
ss (los gatos)
The digital humanities projects I have seen provide excellent starting points for real reading and research--not end points. Unrelated question: did the proliferation of "ands" in banking texts coincide with the annoying trend a couple of decades ago for academics in the humanities to drop the "ands" from lists of nouns? Could there be an economy of "and" in which the conjunction migrates to find where it is welcome?
Al Maki (Victoria)
The only thing I've read of his is "The Bourgeois" but my understanding is that he is not saying one should not read novels, but rather if you want to understand the form you need to look at lots of them - thousands of them - and see what they all have in common and how this changes across time. He wrote "Novels are the fossils of extinct social relations." To understand those relations, you need to look at more than just a few extraordinary peaks.
Bob S (New Jersey)
I am a software developer. I created a computer program that can collect tens of thousands of text files and can search for text in the collected files. Searching can be called "distant reading" but for analysis it still requires a human being to perform "close reading" of the text that was found by "distant reading". Computers can build a forest of information but it still takes a human being to process the information.
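As a rough illustration of the kind of search described in the comment above, a minimal Python sketch might look like the following. The function name `find_in_texts` and the two-file corpus are hypothetical, and this is not the commenter's actual program. It only returns the matching lines; reading and interpreting them is left to a person.

```python
import re


def find_in_texts(texts, term):
    """Return (name, line_number, line) for each line containing `term` as a whole word."""
    # \b word boundaries keep "apple" from matching inside "pineapple".
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for name, text in texts.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((name, number, line.strip()))
    return hits


# Hypothetical in-memory corpus standing in for a folder of text files.
corpus = {
    "a.txt": "The apple fell.\nNothing here.",
    "b.txt": "Pineapple juice.\nAn Apple a day.",
}

hits = find_in_texts(corpus, "apple")
for name, number, line in hits:
    print(f"{name}:{number}: {line}")
```

Each hit carries the file name and line number, so a reader can go back to the passage in context.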
Doug (NJ)
Search is not the same as analysis. They are performing textual analysis on collected works, not mere search. That is why some of their discoveries were interesting, because they found, through automated analysis (not close reading) of thousands of texts, word use relationships that were not obvious given prior small sample sizes.
Gregory de Nasty Man, an ORPy (Old Rural Person) (Boulder Ck. Calif. Home Of an armchair warrior)
I grew up, not in Stanford, but schooled with some that lived there. One thing that's prevalent in academia is the pressure to publish some type of book on a periodic basis. Then you retire. What happens in the interim is that other scholars will review your work/writing, presenting critiques of your work. That is how professorships work, I guess. Some books are informing and well written – and some may be triple-trifle, and hardly worth the time spent by a human to absorb, as another commenter 'rich guy' has noted. If only I could figure out a way to get a machine/computer to do my reading for me… But then I wouldn't get to savor much.
Bob S (New Jersey)
“distant reading”: the computer-assisted crunching of thousands of texts at a time. Computers do not crunch anything. Instead a computer data program can count the number of specified data in files. The specified data in thousands of text files could be words. A computer data program could be created to provide counts on given word(s) in sentences and search thousands of text files. Some of the results of the computer data program might provide information on the use of words over different time periods of the thousands of text files. To obtain any possible real information from the counts of data provided by the computer data program would require a human being to read and review the sentences and paragraphs of each line that was found to have data. Computer programs can be created to aid human beings working with large amounts of text information. Computer programs cannot process text information. Only human beings are capable of processing text information. At some point in time individuals will understand that computers are simply machines that count specified data.
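The counting described in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `count_word_by_decade`, the grouping by decade, and the three sample sentences are all invented here and are not taken from Mr. Moretti's lab or from the article.

```python
import re
from collections import defaultdict


def count_word_by_decade(documents, word):
    """Count occurrences of `word` per decade.

    `documents` is an iterable of (year, text) pairs; returns {decade: count}.
    """
    counts = defaultdict(int)
    target = word.lower()
    for year, text in documents:
        decade = (year // 10) * 10
        # Crude tokenizer: runs of letters and apostrophes, lowercased.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts[decade] += tokens.count(target)
    return dict(counts)


# Invented sample texts with hypothetical publication years.
sample = [
    (1791, "Virtue and duty; duty above all."),
    (1795, "Her duty was plain."),
    (1884, "The hard grey stone of the street."),
]

result = count_word_by_decade(sample, "duty")
print(result)
```

The output is a table of counts and nothing more; deciding what a rise or fall in those counts means still requires someone to read the passages behind them.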
Doug (NJ)
But computers are very good at counting broad selections of data through multiple associations, without prejudice, which is something humans are really poor at doing. Computers handle data volumes that the human mind cannot comprehend, and they don't fall prey to that all too human error of seeing patterns where none exist. The very synaptic behavior that keeps us alive in the wilderness at night is really poor at avoiding pattern pitfalls in big data. Your fifth paragraph presents a false dichotomy. Without machine data being able to infer language usage, machine language translation would not be possible. Last, I am not claiming that they are employing deep learning or AI in any form, but deep learning/machine learning is not simply machines counting specified data. Deep learning AI uses data history to provide weighted paths to interpret unspecified data, and that is a huge difference.
Bob S (New Jersey)
A computer product can search and provide the count of the number of toys that have been sold over a number of days for a company but it is a human being that does the analysis of the counts of toys that have sold on different days. By the way it is also a human being that sets up the search that the computer product will do.
Bob S (New Jersey)
"Without machine data being able to infer language usage." Machines do not infer language usage since computers deal with numbers that are recognized by the machine. When a user presses the key with a symbol a number is sent to the machine. The machine only deals with numbers and does not deal with the symbol that was pressed by the user.
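The point that text reaches the machine as numbers can be seen directly in Python. This small sketch shows character encoding only; it assumes UTF-8 and does not model what a physical keyboard sends, which is a separate hardware-level code.

```python
symbol = "a"

# Unicode code point of a single character.
code_point = ord(symbol)

# The word "apple" as the sequence of byte values stored in a UTF-8 file.
numbers = list("apple".encode("utf-8"))

# The mapping runs both ways: the number can be turned back into the symbol.
round_trip = chr(code_point)

print(code_point, numbers, round_trip)
```

A search for "apple" is therefore a comparison of number sequences, which is consistent with the comment's description.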
richguy (t)
Not such a radical departure from more traditional literary criticism. To me, it sounds like scholars don't want to plow through pages and pages and pages of subpar writing from the 1800's. I suspect they will still read Middlemarch and the Wings of the Dove with immense care and pleasure, but 90's New Historicism unearthed a trove (perhaps not a treasure) of third rate novels that had some popularity during their time and that might, due to that popularity, shed some light on the mental habits, the episteme of those eras. Scholars such as Walter Benn Michaels have made brilliant careers from excavating era-bound habits of thought from tedious novels far, far below Dickens and Collins in quality. This is the equivalent of using computers to read through pages and pages of bad poetry from college writing classes and slam poetry events to discover how people conceptualized themselves. I think most critics will still sit down with dog-eared copies of Beckett, Proust, and Dickens.
richguy (t)
Imagine wanting to use the Fifty Shades novels or the Twilight series or Anne Rice's novels or various bad detective or romance novels to help characterize the mindset of the early 21st C. Nobody wants to read all that stuff. It's a bit elitist. Bad literature is good enough for data mining, but not for reading.
Doug (NJ)
It is more than that. Bad, or at best, average literature is excellent for data mining general patterns of language use over long time periods. The worst literature, and the best, become data points a couple of standard deviations from the norm, which leaves a huge trove of data in the middle to glean for daily usage analysis. Then we can stick to reading the books we want to read for pleasure, whatever that book is to any given individual. Reading for analysis is not the same as reading for pleasure.
candidie (san diego)
Reading is a physical activity, literature barely touched in the opening sentence. To turn reading into robotics may enhance science but removes it from the realm of art.
SGK (Austin Area)
Read, count, interpret, analyze, read some more. What's to lose? Everything can be knowledge, enjoyment, theory, and especially, et cetera. I've got no problem with Mr. Moretti's approach, if used as one of many. Decades ago I received a "punitive B" in a grad school English class for refusing to count syllables or some such in a sonnet, and instead wrote a sonnet of my own. Glad I did it. But I probably missed a chance to sharpen my addition skills.
John Brews✅✅ (Reno, NV)
The computational work is good for establishing context - trends that set the matrix within which writing occurs. Awareness of such broad shifts cannot but aid our understanding. Perhaps the advent of Facebook and Twitter will make the context within which we write and read more obviously important?
Coco Pazzo (Florence Italy)
Not sure if it falls under the purview of "distant reading" but projects such as the Digital Dante (Columbia) and the Princeton Dante Project have been incredibly helpful in explicating the Divine Comedy and placing Dante's writings within an otherwise overwhelming context of historical, Biblical, theological, and literary precedents and references. And from this understanding comes an even greater appreciation of the poem.
msaby2002 (Middle of nowhere, more or less)
What a strange approach to pursue during the terrifying rise of illiteracy in American politics. Moretti obviously didn't single-handedly elect the worst president in history, but this might not be the best time to downplay the significance of reading in order to verify impressions about the trajectory of stylistic changes in literature that a well-read person could guess. For instance, the assertion that the language in the British novel from 1785 to 1900 shifted from words relating to moral judgment to words of concrete description--"duh" is exactly the right term for that revelation. Beware machine-obsessed academics besotted with the notion of themselves as "revolutionaries." But they might be able to affect the continuing downward spiral of the English major as it attempts to compete with the hollow edifice of capitalism in the business college next door. I'm sure that will be wonderful for us all. Right?
ecco (connecticut)
rather taxonomy than criticism, computation for sure, its usefulness and validity to be determined, but hardly comparable to the immediacy of an experience of reading or watching a film (see the deft sorting of a tv series by its devotees who have the schematics of dozens of episodes down pat) or actually listening to a piece of music, appreciating its moment as well as, to the degree experience allows, its echoes. if the "laws of literature" are your thing, "computational criticism" may save some time, like a law or medical dictionary or say the classic penguin summary of literary terms, a bedside must for the taxonomist but if "subtle meanings" and "singularity" inspire, try instead a serious review of a difficult work, say gilbert seldes' review of james joyce's "ulysses." "Mr. Moretti’s own output has a similar dividedness. His early work was grounded in close reading..." and now he seems to be passing up the trees (works of literature) for the forest (the laws of literature)...but if failure in his self-defined "revolution" makes him happy, who is anyone to jeer as the emperor leads his parade.
Amir
Although the academic virtues of Moretti’s work are debated, is it possible that some political genius is using Moretti’s methodology to create political ads/speech that are narrowly tailored to a specific audience when presented but then in the aggregate lead to broad voter turnout and electability of that candidate? Is Russia’s involvement in our election a prelude to such?
CLP (Meeteetse Wyoming)
Not only is the data fascinating, but it seems like a gift to readers... Having left academia and spent years as a common reader, I like the possibility that these efforts liberate us! -- let the crunching analysis be done more efficiently, compartmentalized even, so we can have more intimate, profound, soul-connecting engagement with the texts? Mr. Moretti asks whether "reading and knowledge are continuous with each other." The answer depends on what kind of knowledge we are talking about.