I’m just getting into topic modeling as a research method, thanks to my husband, Jonathan Goodwin. This post represents my first attempt to make sense of it, because its value hasn’t been immediately obvious to me. You take a huge corpus of thousands of academic articles (or whatever), then run a program that extracts groups of words that tend to occur together in the articles (the “topics”) and presents them to you.
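For readers curious about what “run a program” involves, here is a toy sketch of one common topic-modeling technique, latent Dirichlet allocation (LDA) fit by collapsed Gibbs sampling. The post doesn’t say which program Jonathan used, so treat this as an illustration of the general idea rather than his method. The four-document corpus, the function names `lda_gibbs` and `top_words`, and the parameter values (`alpha`, `beta`, `iterations`) are all hypothetical choices for the example.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

def lda_gibbs(docs: List[List[str]], num_topics: int, iterations: int = 200,
              alpha: float = 0.1, beta: float = 0.01,
              seed: int = 0) -> Tuple[List[List[int]], List[Dict[str, int]]]:
    """Toy LDA via collapsed Gibbs sampling (hypothetical helper)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]                 # words in doc d assigned to topic k
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                               # all words assigned to topic k
    assignments = []
    # Start by assigning every word to a random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    # Repeatedly resample each word's topic given all the other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # A topic is likely if it is common in this document AND
                # this word is common in that topic.
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                           / (topic_total[t] + vocab_size * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word

def top_words(topic_word: List[Dict[str, int]], n: int = 5) -> List[List[str]]:
    """List each topic's most frequently assigned words, like the lists below."""
    return [[w for w, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n] if c > 0]
            for counts in topic_word]

# Hypothetical mini-corpus; a real run would use thousands of articles.
corpus = [
    "cicero quintilian oratory eloquence cicero rhetoric".split(),
    "blair campbell lectures belles lettres scottish".split(),
    "cicero oratory roman eloquence quintilian".split(),
    "blair lectures scottish campbell rhetoric".split(),
]
doc_topic, topic_word = lda_gibbs(corpus, num_topics=2)
for k, words in enumerate(top_words(topic_word, n=4)):
    print(k, " ".join(words))
```

On a corpus this small the results are fragile; real tools run the same kind of procedure over far more text and many more topics (100, in Jonathan’s case).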
Jonathan took all of JSTOR’s archives of College English, CCC, Rhetoric Review, Rhetoric Society Quarterly, and JAC and generated 100 topics. Some were coherent, while others seemed more random – though also somewhat interesting; see postscript. Here’s an example of a coherent one:
The program will show you a long list of articles that are associated with the topic. So if you were determined to find absolutely everything having to do with Hugh Blair, this method might give you some articles you wouldn’t have found via a regular JSTOR search.
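Under the hood, that list of associated articles usually comes from a document-topic table: for each article, the share of its words the model assigned to each topic. Here is a minimal sketch of pulling and ranking articles for one topic, assuming such a table is already available. The article titles, weights, the `min_weight` cutoff, and the function name `articles_for_topic` are hypothetical.

```python
from typing import Dict, List, Tuple

def articles_for_topic(doc_topics: Dict[str, Dict[int, float]],
                       topic: int,
                       min_weight: float = 0.1) -> List[Tuple[str, float]]:
    """Return (article, weight) pairs for one topic, strongest first (hypothetical helper)."""
    hits = [(title, weights.get(topic, 0.0))
            for title, weights in doc_topics.items()
            if weights.get(topic, 0.0) >= min_weight]
    # Sort by descending weight; break ties alphabetically for stable output.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical document-topic proportions (topic number -> share of the article).
doc_topics = {
    "Article A": {0: 0.45, 1: 0.05},
    "Article B": {0: 0.08, 1: 0.60},
    "Article C": {0: 0.22, 2: 0.30},
}
print(articles_for_topic(doc_topics, topic=0))
```

The cutoff matters: set it too low and the list fills with articles that mention a topic only in passing; set it too high and you miss the kind of unexpected finds described above.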
OK, but besides that, what do you DO with the topics? That has always been the confusing question for me, though I have read some work about topic modeling. I want to write a series of posts explaining what I’ve been doing with the topics, to sort it out for myself.
When Jonathan sent me the list of 100 topics, I went through all of them and selected 53 that I thought were interesting. Mostly these were the ones that were most coherent, like the Blair example. I then pasted them into a document, labeled each one, and grouped them together, like so:
History of Rhetoric
classical cicero ancient rhetoric greek roman oratory orator eloquence quintilian invention renaissance speaking aristotle orators rhetoricians vols modem history
plato sophists gorgias socrates sophistic greek phaedrus ancient platonic greece athenian greeks athens sophist protagoras logos dialogues carolina isocrates
lectures belles century lettres historians rise influential reform hugh british campbell scottish southern rhetorical blair late alexander england founded
philosophical philosophy truth logic philosopher rational doctrine philosophers theory aristotle essays thing writings mere science thinking human mind truths
rhetoric rhetorical persuasion rhetoricians kenneth communication speech burke aristotle classical audience argumentation philosophy discourse persuasive arguments quarterly speaker invention
Now, I haven’t yet looked at the lists of articles associated with these topics, but here’s a list of questions that might be answered by giving these lists a close review:
How has interest in these topics changed over time? This seems to be the favored approach among nerds like my husband: visualizations, such as graphs that plot a topic’s trajectory, showing when people started becoming interested in it, when interest peaked, and when it waned. Below is an obligatory graph for the Hugh Blair topic:
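One simple way to get the numbers behind a trajectory graph like that is to average a topic’s weight over all the articles published in each year. This is a sketch under that assumption; real tools may smooth, normalize, or weight the values differently. The `(year, weight)` records and the name `topic_trend` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def topic_trend(records: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Average one topic's weight across the articles published in each year (hypothetical helper)."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for year, weight in records:
        totals[year] += weight
        counts[year] += 1
    # Return years in chronological order, ready to plot as a line.
    return {year: totals[year] / counts[year] for year in sorted(totals)}

# Hypothetical (publication year, topic weight) pairs, one per article.
records = [(1980, 0.40), (1960, 0.10), (1970, 0.05), (1960, 0.30)]
trend = topic_trend(records)
for year, avg in trend.items():
    print(year, round(avg, 3))
```

Averaging per year, rather than summing, keeps years in which the journals simply published more articles from looking like peaks of interest.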
Another question I’ve never heard anyone ask, though, is this: which journals are publishing the most on these topics? Most of us in rhetoric and composition assume that if you have a manuscript about Cicero, you send it to Rhetoric Society Quarterly or Rhetoric Review, not to College English, because College English doesn’t publish that kind of thing, generally speaking. But are we sure about that? To what extent? I can now do a frequency count of the journals that are represented in the “classical cicero ancient” topic and make a pie chart showing the breakdown. More on that later.
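The frequency count itself is straightforward. Here is a minimal sketch, assuming you already have the journal name for each article associated with a topic. The sample list and the name `journal_breakdown` are hypothetical, not real counts from the “classical cicero ancient” topic.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def journal_breakdown(journals: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Count articles per journal and each journal's share of the total (hypothetical helper)."""
    counts = Counter(journals)
    total = sum(counts.values())
    # most_common() lists journals from most to fewest articles.
    return [(journal, n, n / total) for journal, n in counts.most_common()]

# Hypothetical journal names for articles associated with one topic.
journals = ["Rhetoric Society Quarterly", "Rhetoric Review",
            "College English", "Rhetoric Review"]
breakdown = journal_breakdown(journals)
for journal, n, share in breakdown:
    print(f"{journal}: {n} articles ({share:.0%})")
```

The shares are exactly the slices a pie chart would show. Because the journals in the corpus published different numbers of articles overall, it may also be worth comparing each share against that journal’s share of the whole corpus.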
And Now for Expressivism
For six of the topics, I wasn’t sure what they meant, but I went ahead and labeled them “Expressivism.” Because expressivism doesn’t have an attendant set of terms, and is misunderstood so often and so profoundly as a theory of writing and teaching philosophy, I was interested in seeing what kinds of articles were listed. As it turns out, most of them were actually about literature, or were works of creative writing. But one of them yielded a few interesting bits:
Most of those, of course, are commonly used words, and I’m not really convinced that this is an “expressivist” topic. Still, I did find these articles, among many others:
Richard K. Redfern, "A Brief Lexicon of Jargon: For Those Who Want to Speak and Write Verbosely and Vaguely", College English, 1967
Donald Murray, "Henry James in the Advanced Composition Course", College English, 1963
Peter Elbow, "Exploring My Teaching", College English, 1971
Joseph J. Firebaugh, "On being Unacademic", College English, 1946
Geraldine Hammond, "How Gladly Do We Teach?", College English, 1951
Winfield H. Rogers, "Responsibilities of the English Teacher in the Urban University", College English, 1940
J. Mitchell Morse, "Why Write like a College Graduate?", College English, 1970
J. Mitchell Morse, "The Case for Irrelevance", College English, 1968
In reviewing the list of articles associated with this topic, I think I now have a better handle on the sources of the theory some of us call expressivism: certainly it arises from attitudes of respect, concern, and care for students (see Rogers, 1940; Hammond, 1951). But I now see the overlap between expressivist values and the study of literature, as well as the study and practice of creative writing, and descriptive approaches to linguistics*. Those connections should have been obvious, but they weren’t. I can see that as early as 1940, some of the ideas associated with expressivism were in circulation – though certainly some would say that the seeds were planted even earlier, with Fred Newton Scott’s work (see Linda Adler-Kassner’s excellent article “Ownership Revisited” in CCC, 1998). I also think there are many more expressivists than we'd realized, and that a lot of people are/were expressivists but don't/didn't know it. No one is old enough to have witnessed all of this, comprehensively, in real time, and topic modeling is almost like having such a person.
More to come. For now, I’ll say that the best way to grasp the value of topic modeling as a method is to focus on one topic and mine the articles.
* I’m prepared to argue that James Sledd was an expressivist; I think I have a good bit of evidence.