How should descriptive grammars cover interjections?

Interjections are, in Felix Ameka’s memorable formulation, “the universal yet neglected part of speech” (1992). They are rarely the subject of historical, typological or comparative research in linguistics, and as Aimée Lahaussois has shown (2016), they are notably underrepresented in descriptive grammars. As grammars are the main source of data for typologists, this is of course a perfect example of a self-reinforcing feedback loop. How can we break this trend?

I was at a very stimulating workshop last week, organized by Maïa Ponsonnet, Aimée Lahaussois and Yvonne Treis as the kickoff of a larger project on Typologizing Interjections. This blog post captures some of my reflections after the workshop.

In more than 11 words

The neglect of interjections is not a modern phenomenon. The following 11 words (out of a total of 11k) constitute the full treatment of interjections in one grammar of Zulu (Grout 1849):

Interjections. The principal interjections are: —au! mame! mamo! maye! o! ou!

And even if some grammars today devote more than 11 words to the topic, the basic format of the interjection section, if it exists, is still a list of items (as Lahaussois 2016 shows). This invites a view of interjections as items that stand apart from all other linguistic structures, without anything in the way of semantic, pragmatic and combinatorial structure that might be worth investigating. Indeed, the single structural fact mentioned about interjections in most grammars is that some of them feature speech sounds that deviate from those of the average lexical item.

Good models

The good news is that better models are already available. An unparalleled example, and possibly the most extensive grammatical treatment of interjections in any one language, is Felix Ameka’s grammar of Ewe, a Kwa language of Ghana. Ameka devotes a whopping three hundred pages to “illocutionary devices and constructions used in interpersonal communication” — so many that the proofreader for my Annual Review article, where I mention this fact, tagged it with a clarification question: “[AU: Highlighted range correct? It spans several hundred pages.]”. Within this part, there is a chapter on interjections spanning 50 pages, with separate discussions of about 30 specific emotive, cognitive, and phatic interjections (another dozen conative interjections are treated in an earlier chapter).

Earlier examples exist. One is Charles Fries’ (1952) The Structure of English, the first grammatical description of English based on actual recorded speech. Fries found that the most frequent single-unit utterances were items of a form class whose function was “continued attention, conventionally signaled”, counting “yes”, “yeah”, and “mhmm” among them. Another is Yuen Ren Chao’s famous grammar of spoken Chinese (1965). It devotes only a handful of pages to interjections, but makes them count. It pulls out some of the most frequent interjections, exemplifies them, and contrasts them with one another, providing ample detail on possible phonetic realizations. In the passage below, Chao reviews Ng ~ M ~ ə̃, “the weakest form of assent, which is little more than acknowledging ‘I am listening'” (Chao 1965: 819).

Excerpt from Chao's grammar of spoken Chinese (1965), describing the interjections Ng ~ M ~ ə̃, "the weakest form of assent, which is little more than acknowledging 'I am listening'"

More recent grammars with above-average coverage of interjections include those of Acholi (Rüsch 2020), Alto Perené (Mihas 2017), Kalamang (Visser 2022), Lao (Enfield 2007), and Zapotec (Sicoli 2020). For instance, Visser’s grammar of Kalamang provides the customary notes on the phonology of interjections and a list of them, but also has a few pages exemplifying some of the more common ones. Sicoli’s grammar doesn’t have a section on interjections, as it is organised primarily by interactional domains: how people use linguistic resources to offer, recruit, repair, and resonate in interaction. But because it focuses on studying language in vivo, almost every single extract of conversation in it features interjections, and it provides invaluable materials for the comparative study of these items.

Practical proposals

Based on the examples of these grammars and other recent work on the question of how to cover interjections in grammatical descriptions, we can isolate a few best practices.

I. Present and analyse interjections in the context of conversational sequences

When we analyse grammatical items like case markers or numeral classifiers, we take care to consider their immediate grammatical contexts: the words they latch onto or agree with, the semantic contribution they make to the whole of the sentence. For interjections, this strategy breaks down, because they typically appear alone. The solution is to consider the relevant level of structure.

Interjections come alive in social interaction. They are responsive to prior utterances or events, and invite certain responses in turn. To understand their function and to enable comparison, we should consider them in their primary ecology: the conversational sequence. In the example below from Mark Sicoli’s (2020) multimodal grammar of Zapotec, we see interjections at lines 2 and 4, and their sequential context demonstrates an important part of their interactional function:

1 Angeles:  Sofía:   [S is outside the kitchen]        SUMMONS
            ‘Sofía’
2 Sofía:    Eè?                                        ACKNOWLEDGEMENT
            ‘Huh?’
3 Angeles:  Lìkkì’ ínza itta no á                      DIRECTIVE
            L*-likì’ ínza ita=no=á
            imp-(pot) give water relax=instr=1s
            ‘(You) would give water so I can soften (the dough).’
4 Sofía:    Áà                                         (First Sign Of Fulfillment)

II. Cover at least the most frequent and interactionally consequential interjections

Given the long tradition in grammar-writing of just listing interjections without any guidance, it can be difficult to know where to start. Which interjections should be featured in a grammar? In principle, of course, any linguistic item that is part of larger linguistic systems deserves coverage; but in practice we cannot do equal justice to everything, and choices have to be made. My suggestion here is to start from an evidence-based perspective and aim to identify and describe at least some of the most common and interactionally consequential interjections.

Cross-linguistic work on interjections (some of it my own, but building on the work of many others) has isolated a number of highly frequent interactional jobs for which languages tend to mobilize interjections. Three of these are the continuer, the repair initiator, and the news receipt (Dingemanse 2023). These represent very common functions that will be encountered even in a few minutes of everyday conversation. They are interactionally consequential in that we rely on them for negotiating mutual understanding and realizing complex syntax (as in storytelling). They are also grammatically relevant, as they punctuate complex turns (continuer), invite repetition and clarification (repair initiator), and intersect with epistemic markers (news receipt). In my opinion, any grammar that doesn’t cover at least these interactional resources should be considered incomplete.

III. Gloss interjections in more specific ways than just INT/INTJ

Interjections are word forms with specific functions, just like grammatical morphemes. A filler like uhm is formally and functionally different from a continuer like mhmm, and glossing them both as INTJ obscures that difference. Consider the analogy of morphemes marking tense, aspect and mood: if we glossed all of them TAM indiscriminately, our grammars would be much less useful. Eline Visser’s grammar of Kalamang (2022) is exemplary in this regard: it offers a list of interjections and their gloss, and one can search for these glosses and find examples of most of the interjections in context.
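To make the practical payoff concrete, here is a minimal sketch of how function-specific glosses support exactly the kind of searching that Visser’s grammar enables. The forms and gloss labels (CONT, REP.INIT, NEWS, HESIT) are invented for illustration and not drawn from any of the grammars cited; the point is simply that a specific label can be queried in a way a blanket INTJ cannot.

```python
# Hypothetical glossed examples; forms and gloss labels are made up for
# illustration, not drawn from any particular grammar.
examples = [
    ("mm",  "CONT",     "signals continued attention during a telling"),
    ("ha?", "REP.INIT", "initiates repair on the prior turn"),
    ("oo",  "NEWS",     "receipts newly conveyed information"),
    ("eh",  "HESIT",    "holds the turn during a word search"),
]

def find_by_gloss(glossed_examples, label):
    """Return all examples whose gloss matches the requested label."""
    return [ex for ex in glossed_examples if ex[1] == label]

# With specific glosses, a reader (or a script) can pull out just the continuers:
for form, gloss, note in find_by_gloss(examples, "CONT"):
    print(form, gloss, note)

# If every item were glossed INTJ, the same query would return everything
# indiscriminately, and the functional differences would be invisible.
```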

IV. Consider animal-oriented interjections as a locus for comparison

This is perhaps the quirkiest suggestion, but it is based on some of the better grammars and linguistic descriptions covering a particular class of functions for which interjections are often recruited: animal-oriented utterances. Felix Ameka’s grammar of Ewe devotes a considerable number of pages to such conative interjections, but so do the grammars of Lao (Enfield 2007) and Kalamang (Visser 2022). One thing that is especially interesting about animal-oriented interjections is that they represent a corner of language specifically devoted to interspecies interaction. And we can expect the resources in this corner to be adapted at least partly for that particular purpose, with potential cross-linguistic similarities as a result.

One example I’ve documented in my own work is a curious convergence in the sounds that occur in shooing words: interjections used to chase away birds, especially domestic chickens. As I found, shooing words, but not words for ‘chicken’, often feature sibilant or fricative sounds — which may be connected with the finding, independently established by ethologists, that such sounds are among those chickens find most aversive (Dingemanse 2020:396-8).

Focusing on animal-oriented conative interjections is also a way of achieving comparability, and so it would represent a promising direction for a comparative typological approach to interjections.

Note

This post summarises some of my take-aways from the workshop and builds on some proposals I’ve made in my contribution to the Oxford Handbook of Word Classes. That paper is available here: osf.io/preprints/psyarxiv/ngcrs. Most if not all of the grammars and sources cited in this post are also discussed and cited there.

Grammars cited

  • Ameka, Felix K. 1991. ‘Ewe: Its Grammatical Constructions and Illocutionary Devices’. PhD dissertation, Australian National University.
  • Chao, Yuen Ren. 1965. A Grammar of Spoken Chinese. Berkeley: University of California Press.
  • Enfield, N. J. 2007. A Grammar of Lao. Berlin: Mouton de Gruyter.
  • Fries, C. C. 1952. The Structure of English: An Introduction to the Construction of English Sentences. New York: Harcourt, Brace.
  • Mihas, E. 2017. Conversational Structures of Alto Perené (Arawak) of Peru. John Benjamins.
  • Rüsch, M. 2020. A Conversational Analysis of Acholi: Structure and Socio-Pragmatics of a Nilotic Language of Uganda. Brill.
  • Sicoli, M. A. 2020. Saying and Doing in Zapotec: Multimodality, Resonance, and the Language of Joint Actions. London; New York: Bloomsbury Academic.
  • Visser, Eline. 2022. A Grammar of Kalamang. Language Science Press. https://doi.org/10.5281/zenodo.6499927.

Other sources cited

  • Ameka, F. K. (1992). Interjections: The Universal Yet Neglected Part of Speech. Journal of Pragmatics, 18(2–3), 101–118.
  • Amha, A. (2013). Directives to Humans and to Domestic Animals: The Imperative and some Interjections in Zargulla. In M.-C. Simeone-Senelle & M. Vanhove (Eds.), Proceedings of the 5th International conference on Cushitic and Omotic languages. Cologne: Rüdiger Köppe.
  • Dingemanse, M. (2020). Recruiting assistance and collaboration: A West-African corpus study. In S. Floyd, G. Rossi, & N. J. Enfield (Eds.), Getting others to do things: A pragmatic typology of recruitments (pp. 369–421). Berlin: Language Science Press. doi: 10.5281/zenodo.4018388
  • Dingemanse, M. (2023). Interjections. In E. van Lier (Ed.), The Oxford Handbook of Word Classes. Oxford University Press. doi: 10.31234/osf.io/ngcrs
  • Fries, C. C., & Pike, K. L. (1949). Coexistent Phonemic Systems. Language, 25(1), 29–50.
  • Lahaussois, A. (2016). Where have all the interjections gone? A look into the place of interjections in contemporary grammars of endangered languages. In C. Assunção, G. Fernandes, & R. Kemmler (Eds.), Tradition and Innovation in the History of Linguistics (pp. 186–195). Nodus Publikationen. Retrieved from https://hal.archives-ouvertes.fr/hal-01361106
  • Lahaussois, A. (2020). Descriptive and methodological issues in Kiranti grammar(s) (PhD thesis, Université de Paris). Université de Paris. Retrieved from https://halshs.archives-ouvertes.fr/tel-03030562
  • Ponsonnet, M. (2023). Interjections. In C. Bowern (Ed.), Handbook of Australian Languages. Oxford: Oxford University Press.

When removing ‘disfluencies’ in NLP is like sawing off the branch you’re sitting on

There is a minor industry in speech science and NLP devoted to detecting and removing disfluencies. In some of our recent work we’re showing that treating talk as sanitised text can adversely impact voice user interfaces. However, this is still a minority position. Googlers Dan Walker and Dan Liebling represent the mainstream view well in this blog post:

People don’t write in the same way that they speak. Written language is controlled and deliberate, whereas transcripts of spontaneous speech (like interviews) are hard to read because speech is disorganized and less fluent. One aspect that makes speech transcripts particularly difficult to read is disfluency, which includes self-corrections, repetitions, and filled pauses (e.g., words like “umm”, and “you know”). Following is an example of a spoken sentence with disfluencies from the LDC CALLHOME corpus:

But that’s it’s not, it’s not, it’s, uh, it’s a word play on what you just said.

It takes some time to understand this sentence — the listener must filter out the extraneous words and resolve all of the nots. Removing the disfluencies makes the sentence much easier to read and understand:

But it’s a word play on what you just said.

Fair enough, you might say. Everyone understands there are use cases for identifying and sometimes removing these items, for instance (possibly) when subtitling or transcribing spoken material for written consumption. And surely in this example, the sanitized version seems “much easier to read and understand” than the original.

Easier for whom and relative to what?

Hold on. Easier to read for whom? Easier to understand relative to what? It never hurts to go back to the source. Here is a more precise transcript of the interaction from the CALLHOME corpus. The target utterance appears at line 11:

 1  B  yeah
 2  A  so he would like to have a place where eh they can come and visit him
 3  B  yeah
 4  A  and so, and so
 5  B  /laugh/
 6  A  its fine with [me
 7  B                [but anyway- never [mind
 8  A                                   /laugh/ what?
 9  B  i- i dont want to say
10  A  /laugh/
11  B  [but thats its not- its not- its- uh its a word play [on what you just said
12  A                                                       [oh
13  B  its kind_of a switcheroo
14  A  whats- what is it?
15  B  /laugh/ well th- th- he or they can they can visit him and come on him
16  A  ah  ha  ha
17  B  instead of coming and visiting
18  A  yeah  well  okay /laugh/
19  B  i just

What happens at line 11 cannot be understood without the immediate prior context. Technically (that is, using the analytical tools of conversation analysis) we can describe it as a case of ‘disfluency’ or ‘hesitation’ deployed to do the interactional work of showing an orientation to inappropriateness (Lerner 2013) — in this case of a lame pun with some sexual innuendo. The pun (“kind of a switcheroo” as B says) is a juvenile word play that exchanges “come and visit him” for “visit and come on him” (15). After an initial laugh (5), B does considerable work drawing attention to what crossed his mind while at the same time casting doubt on its tellability: “anyway- never mind” and “I don’t want to say” (7, 9). It is quite remarkable to see so many evasive moves. All this forms the backdrop to the turn in focus:

but that’s it’s not- it’s not- its- uh- it’s a word play on what you just said (11)

When something is produced after so much evasion and in such a belaboured, disfluent, hesitant way, you can bet the delivery is meaningful in itself. The hemming and hawing is the point. It contributes to putting up a smokescreen of ambiguous commitment to what might become (we already sense at that point) something problematic. The deflationary “kind of switcheroo” (13) further aims to defuse a delicate situation. Only after A’s second request to deliver the goods does B produce the word play. And then the whole thing falls flat, as seen among other things in A’s performative laughter particles, B’s explanation (any pun that needs an explanation is dead on arrival), A’s non-committal “yeah well okay”, the subdued laughter by both, and B’s self-deprecating “I just” (16-19).

An infrastructure for collaborative indiscretion

When we live through episodes like this in everyday life, we get all this in a split second. The slipperiness of jokes and puns, the inescapable social accountability that always hovers over anything we say, and the degree to which we depend on others for realizing indiscretions. We get it when others do it, and we do it ourselves. As I said, the hemming and hawing is the point. Disfluencies are a key interactional tool that we use to navigate interactionally delicate episodes (Jefferson 1974). Gene Lerner (2013) has described hesitations in this kind of context as an infrastructure for collaborative indiscretion. The point: there is a great deal of order and regularity even to things like hesitations and disfluencies.

Let’s back up a bit. First we have an original utterance, warts and all, situated in an actual interaction, formulated in a way that displays self-consciousness, saturated with accountability. Then we have, in the Googlers’ version, an abbreviated, regularized, decontextualized version that is emptied of all significance, all of the wrinkles ironed out. The original relates to the sanitized form approximately as a living, fluttering butterfly relates to a pinned and preserved specimen. The latter may be easier to classify, easier to POS-tag, and easier to vectorize — which is probably what most NLPers mean when they say “read and understand”. But it is not the same. In fact I’d say it is almost the inverse.

(And note, too, that even the cleaned up version is not going to lead to better understanding of what’s actually going on. After all, the hesitations and so on only served to foreshadow that a word play crossed the speaker’s mind, which is only revealed after the whole back and forth. Good luck to your co-reference resolution algorithm!)

Why this matters

How we say something has implications for what it means, how we want it to be taken up, how we expect to be held accountable for it (Jefferson 1974, Clift 2016). People in interaction frequently mobilize disfluencies to stall for time, to display uncertainty, to foreshadow disagreement, to find an ally to co-produce an indiscretion, and a great many other things. Perhaps there are contexts or applications where it may be useful to detect or even hide disfluencies, but erasing them wholesale should raise red flags. And yet that appears to be the sole purpose of Walker and Liebling’s work. As they write,

we created machine learning (ML) algorithms that identify disfluencies in human speech. Once those are identified, we can remove the extra words to make transcripts more readable. This also improves the performance of natural language processing (NLP) algorithms that work on transcripts of human speech.

If we ‘clean up’ transcripts of talk to look more like sanitised text data, NLP algorithms trained on text data also perform better on the cleaned-up transcripts. I bet they do! And again, for some purposes, this may be useful. But it so happens that for this particular case —which, remember, I didn’t pick, they did— the act of cleaning up actually conceals what happened and why. Something essential was lost in the process. Not just the disfluencies, but our power to understand what people do when they wield disfluencies.
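To make the mechanics concrete, here is a minimal rule-based sketch of the ‘identify and remove’ idea. To be clear, this is emphatically not Walker and Liebling’s ML model; it is a naive stand-in with a hand-written filler list and a check for immediately repeated words, and all names and choices in it are mine.

```python
import re

# Naive stand-in for a disfluency-removal pipeline (illustrative only):
# 1. drop filled pauses from a small hand-written list
# 2. collapse immediately repeated words
FILLERS = {"uh", "um", "uhm", "erm"}

def remove_disfluencies(utterance: str) -> str:
    tokens = re.findall(r"[\w']+", utterance.lower())
    tokens = [t for t in tokens if t not in FILLERS]   # drop filled pauses
    deduped = []
    for t in tokens:
        if not deduped or t != deduped[-1]:            # drop immediate repeats
            deduped.append(t)
    return " ".join(deduped)

print(remove_disfluencies(
    "But that's it's not, it's not, it's, uh, it's a word play on what you just said."
))
# -> but that's it's not it's not it's a word play on what you just said
```

Even this toy filter silently deletes the “uh” and one of the restarts; getting all the way to the sanitised “But it’s a word play on what you just said” additionally requires detecting the self-corrections, which is precisely the hedging that, in context, did the interactional work.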

Scaling up and losing touch

Let’s think ahead. When feeding only ‘sanitised’ transcripts like this to NLP algorithms, one thing you’re doing is teaching those algorithms to pick up and reproduce, say, lame jokes without the hedging and disfluency they are sometimes produced with (as in this case). It doesn’t take a lot of imagination to see how scaling this up might lead to serious problems (see Birhane et al. 2023 on the not-so-innocent nature of scaling). The case Walker and Liebling picked happened to be a relatively tame pun. Racism, sexism, gaslighting, and all forms of subtle and not so subtle verbal abuse — these occur in real data, and the way they are produced and responded to is immensely important for a deeper understanding of human interaction.

By removing disfluencies and turning situated talk into sanitised text, you’re removing all public evidence of the very resources people mobilize to manage social accountability and navigate episodes of interactional delicacy. You’re sawing off the branch you’re sitting on. You’re sabotaging your own ability to understand how ethical norms and values are socially enforced in interaction. You’re blocking the path to meaningful improvement. You’re forcing rich, ambiguous, human interaction into a straitjacket of tokenizers and transformers. You are, fundamentally, dehumanizing human interaction.

Talk, warts and all

Linguists and computer scientists have long been conditioned to separate competence from performance, and to regard the latter as essentially disposable. If pristine competence is the supreme goal, only to be reached by excavating it from under the rubble of performance, no wonder that we work hard to remove all evidence of the human in our texts and transcripts (Dingemanse & Enfield 2023).

However, even though the competence/performance distinction has loomed large in NLP, and likely forms part of the cultural backdrop to unexamined choices like this (the standard ‘stopword removal’ procedure is another example), it’s not the only game in town and never has been. A century ago, anthropologist Bronislaw Malinowski wrote:

Indeed behaviour is a fact, a relevant fact, and one that can be recorded. And foolish indeed and short-sighted would be the [wo]man of science who would pass by a whole class of phenomena, ready to be garnered, and leave them to waste, even though [s]he did not see at the moment to what theoretical use they might be put!

Malinowski 1922:20

If we take this whole class of phenomena to include human interactive behaviour, recorded and represented as faithfully as possible, then it should be clear today that not only are there ample theoretical uses for it, but also practical ones. The theoretical uses include forming a sophisticated understanding of how people exchange information and build social relations through situated talk; a critical prerequisite to any serious work on human language technology. The practical uses include building on such insights to make language technologies that do not sanitise and dumb down what we say, but that instead harness our linguistic abilities — including our formidable and sophisticated abilities to delay, hesitate, backtrack, and repair. As conversational agents and voice-driven interfaces grow increasingly ubiquitous, now is the time to move beyond text-bound conceptions of language, and to start taking talk seriously.

References

  • Birhane, A., Prabhu, V. U., Han, S., Boddeti, V., & Luccioni, S. (2023). Into the LAION’s Den: Investigating Hate in Multimodal Datasets. Presented at the Thirty-seventh Conference on Neural Information Processing Systems, Datasets and Benchmarks Track. Available at https://openreview.net/forum?id=6URyQ9QhYv
  • Clift, Rebecca. 2016. Conversation Analysis. Cambridge: Cambridge University Press.
  • Dingemanse, Mark, and N. J. Enfield. 2023. ‘Interactive Repair and the Foundations of Language’. Trends in Cognitive Sciences. https://doi.org/10.1016/j.tics.2023.09.003.
  • Jefferson, Gail. 1974. ‘Error Correction as an Interactional Resource’. Language in Society 2: 181–99.
  • Lerner, Gene. 2013. ‘On the Place of Hesitating in Delicate Formulations: A Turn-Constructional Infrastructure for Collaborative Indiscretion’. In Conversational Repair and Human Understanding, edited by Makoto Hayashi, Geoffrey Raymond, and Jack Sidnell, 95–134. Studies in Interactional Sociolinguistics 30. Cambridge: Cambridge University Press.
  • Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. ‘The Timing Bottleneck: Why Timing and Overlap Are Mission-Critical for Conversational User Interfaces, Speech Recognition and Dialogue Systems’. In Proceedings of the 24th Annual SIGdial Meeting on Discourse and Dialogue. Prague. https://aclanthology.org/2023.sigdial-1.45/
  • Malinowski, Bronislaw. 1922. Argonauts Of The Western Pacific. London: Routledge & Kegan Paul.
  • Walker, Dan, and Dan Liebling. 2022. ‘Identifying Disfluencies in Natural Speech’. Google Research Blog. 30 June 2022. https://blog.research.google/2022/06/identifying-disfluencies-in-natural.html.

How to avoid all-male panels (manels)

The last time I blindly accepted an invitation to speak was in 2012, when I was invited to an exclusive round table on the future of linguistics at a renowned research institute. As a fresh postdoc I was honoured and bedazzled. When the programme was circulated, I got a friendly email from a colleague asking me how I’d ended up there, and whether I thought the future of linguistics was to be all male. Turns out the round table was not merely exclusive but also exclusionary. (Things that often go together.)

I was ashamed and embarrassed. Both that it happened and that I had not seen it. That was my introduction to the notion of an all-male panel, or manel for short. It seems stupid to me now, but that notion had not occurred to me before. Yet once I knew it, I saw it everywhere. I realized that I had been too dense to see what more than half the world’s population can’t help seeing.

The title of this post is ‘how to avoid manels’. I’m going to take it as a given that you understand why you might want to avoid them, but if you would like to sealion about that I would invite you to read up on the literature, starting maybe with Martell et al. (1996), followed by Martin (2016). I can also recommend the personal perspective from Särmä (2016), one of the folks behind the fabulous Tumblr blog Congrats, you have an all-male panel! Also let me note this post is aimed especially at (white, cis) men who keep finding themselves being invited to panels, or keep putting together panels without thinking about diversity.

So, how do you avoid manels? It is said that sophisticated people can hold up to ten rules in their mind but I’m going to boil it down to two:

  1. Don’t participate in them. This means —if you’re a (white, cis) male— pledging to not be part of line-ups where everybody looks like you, and letting organisers know at the first opportunity that this is a condition for your participation. In the film industry this is also called an inclusion rider or equity rider.
  2. Don’t organize them. This means thinking of diversity ahead of time. If you’re organizing panels or inviting speakers, diversity should not be an afterthought. In particular, don’t start thinking about it after you’ve invited the first five male speakers and one of them (if you’re lucky) mentions this.

For both #1 and #2 it’s good to be prepared to suggest speakers that you think should be represented. This holds especially if you’re a (white, cis) male, as too often, this kind of work falls to minoritized people. Oh, and two more things:

  • ‘But I can’t think of anyone!’ is probably more an indictment of your own thinking than of the state of the field. If you don’t know excellent speakers beyond a few white male usual suspects, you don’t know the field well enough.
  • Likewise ‘I tried, but they said no!’ is not an excuse but more likely an indication that you started too late or went for the already overburdened superstar everyone reaches for. Branch out to early career researchers; platform new voices. It takes work to achieve diversity and inclusion.

There is more to say. Perhaps most importantly, underrepresentation is not just a gender issue, as a second look at just about any offending panel will show. Intersectionality matters. On this, Better Allies is worth following. Another thing is that many people have written about this more eloquently than I have; some references are below (and see, e.g., seminar speaker selection). And finally, today there are many resources to help you find excellent speakers, e.g., in one of my own fields, Women in Cognitive Science.

The Panel Pledge is now ten years old, but unfortunately no less relevant. The immediate reason to write this post was that I saw an advert for an all-male event on replication and open science methods in linguistics — a field where there is a lot of choice when it comes to qualified folks, and where the “bro” problem has specifically been called out by scholars like Whitaker and Guest (2020). I don’t know the speakers at this event and am sure they are qualified — but I hope they’ll sign the Panel Pledge and prevent themselves, the organisers and all of us from the embarrassment of all-male panels in the future.


Pitfalls of fossil-thinking: a peer review II

This is the second part in a two-part series of peer commentary on a recent preprint. The first part is here. I ended that post by noting I wasn’t sure all preprint authors were aware of the public nature of the preprint. I am now assured they are, and have heard from the senior author that they are working on a revised version. Since the first preprint version is still public and since the senior author responded publicly, I also want to commit the below comments to the public record.

Di Paola, Giovanna, Ljiljana Progovac, and Antonio Benítez-Burraco. 2023. “Revisiting the Hypothesis of Ideophones as Windows to Language Evolution: The Human Self-Domestication Perspective.” PsyArXiv. https://doi.org/10.31234/osf.io/7mkue.

Legitimate critique is not fear

Independent from Part I of my review, fellow ideophone and iconicity expert Dr. Ian Joo responded to the preprint with an excellent thread on twitter, a crumbling social media network where links are not guaranteed to keep working. To this, corresponding author Antonio Benítez-Burraco responded as follows:

Thanks for the criticism. This is very helpful. With regards to the “primitiveness” issue… I see that most people are afraid of finding less complex elements in present-day languages. Perhaps they don’t exit, or perhaps we would not like to find them. I understand the reasons. But I think they are ideologically-motivated rather than scientifically-motivated (avoid racism, etc.).

I want to take a strong stand against this reframing. It casts legitimate critique as a form of being “afraid” and places the authors in the role of intrepid explorers boldly going where no one else dares to go. But this seems a bit silly. Being aware of historical harms perpetrated in the name of science is important. Trying to avoid pitfalls of prior work (e.g., Lévy-Bruhl’s selective reading) is objectively useful. Neither has anything to do with being “afraid” to find something.

We’re calling into doubt the scientific utility of seeing something as a “fossil” (a relic frozen in time, an archaism, a remnant of what once was). And we’re drawing attention to the ways in which such a construal limits what one can find out, and risks imposing a kind of tunnel vision on a phenomenon. Those are the pitfalls of fossil-thinking. This has nothing to do with ideological motivation, and everything to do with good science.

I should add that being a good scientist, to me, certainly also means not being racist.

“Words”?

Anyway, on to the preprint. The ms starts off with an odd claim that we have to get out of the way first. Section §2 is entitled “Ideophones as sound-symbolic ‘words’” (with ‘words’ in scare quotes), and a footnote explains:

1 Technically speaking, ideophones may not be ‘words’ in the traditional sense, because they do not combine with other words to form specific phrases, as is the case with typical words such as e.g. nouns, verbs, adjectives, etc. They instead seem to be holistic expressions which attach to full sentences to provide a vivid depiction of the scene. For the lack of a better term, we refer to ideophones here as ‘words.’

Di Paola, Progovac and Benítez-Burraco 2023: 4

This statement is at odds with decades of scientific work on ideophones. While ideophones can indeed sometimes be seen as expressions that “attach to full sentences to provide a vivid depiction of the scene”, the ms provides no argument why this would exclude them from being words. Ideophones are recorded in dictionaries, learned by children, incorporated in sentences, described in good grammars. It is unclear what one gains from calling into question their status as words.

More importantly, it is simply untrue that “they do not combine with other words to form specific phrases”, as shown in work on ideophone constructions in Basque, Gbaya, Japanese, Korean, Mandarin, Semai, Siwu, Turkish, Quechua or Zulu (see for example Akita & Dingemanse 2019; Ibarretxe-Antuñano 2017; Van Hoey 2023; and sources cited therein). I’ll revisit the claim below, as it appears in multiple places in the ms; but suffice it to say, for now, that when it comes to purporting to present a scientific understanding of ideophones, the ms starts off on the wrong foot.

Questionable claims

The first major section of the ms introduces ideophones in a way that appears designed to shed the most favourable light on the hypothesis to be defended in the ms, namely that ideophones qualify as ‘linguistic fossils’ (see part I on why I believe that is an ill-advised aim, even if one thought, as I think we all do, that ideophones can definitely be relevant to understanding language evolution). As these claims are used to support points made later in the ms, it is important to consider them carefully.

(i) They can be enriched with specific sound types that are located in specific word positions

Obscured in this statement is the admission that the base material of ideophones is often perfectly phonologically regular, as indeed linguists working on ideophones have long described. This undercuts the claim that ideophones are not quite “words” (see above), and also undercuts later claims about the supposed holophrase-like nature of ideophones.

(ii) Prosodically, ideophones often contrast with other words in the utterance because of their higher or lower pitch

No beef with this claim; it is indeed a fair summary of the literature (the only such claim in this section).

(iii) Morpho-phonologically, ideophones are frequently created through reduplication (…) Ideophones can be said to lack double articulation (i.e., combinatoriality of sounds plus combinatoriality of morphemes, as discussed further below), which is typical of ordinary words

Some major problems here. First, yes, reduplication is a striking characteristic of some ideophones, but there are many non-reduplicated ideophones: in the few inventories for which we have counts (sources in Dingemanse 2015), the number of reduplicated base forms ranges from 7% (Somali) to 35% (Japanese) and 59% (Siwu), so defining them as reduplicative words (as the ms does later on, referring back to this questionable claim) risks disregarding 93% to 41% of ideophones depending on the language.

Second, introduced in this paragraph and mobilised later on in the ms, is the strong claim that ideophones “lack double articulation”. No source is given for this remarkable statement. The notion of double articulation (due to Martinet, and closely related to Hockett’s duality of patterning) is often connected to the arbitrariness of the sign. In a simplistic view of ideophones as “iconic” signs, their iconicity would stand in the way of them exhibiting this kind of duality.

However, ideophones are not simply iconic words. They have, all of them, arbitrary characteristics and clearly function as conventionalised words in larger linguistic systems — a point made in my 2019 paper on ‘Ideophone’ as a comparative concept, which is cited in the ms but not engaged with; but also by ideophone scholars like Diffloth, Kita, Nuckolls, Ibarretxe-Antuñano and many others. Their phonological form, riffing on the larger phonological system of the language (Diffloth 1980), is one piece of evidence for this. The fact that they can combine with other bits of morphology in many languages is another (Van Hoey 2023).

Types of reduplicative morphology in Siwu ideophones (from Dingemanse 2015)

In fact, even reduplication, presented in the ms as the simplest operation possible and therefore implicitly as evidence of a supposed lack of double articulation, presents counterevidence. In several languages, the processes of reduplicative morphology found in ideophones form something like a mini-grammar with its own rules and regularities of form and meaning (Diffloth 1976, Dingemanse 2015; and see Table above). So when we consider all the evidence, this claim shoots itself in the foot.

(iv) Syntactically, ideophones tend to occur in separated utterances, although they can be occasionally used as completive clauses (Diffloth, 1972), or as part of a phrase (Dingemanse and Akita, 2017).

The “tend to” is problematic here, as most evidence points to this being possible but not necessarily the most frequent form of occurrence. It is hard to say what this statement is based on, because the sources cited here and further on in the same paragraph all actually show how ideophones are often integrated into the utterance to various degrees. Indeed a key point of the second paper cited in this paragraph is the evidence-based observation that Japanese ideophones are separate utterances (‘holophrases’) only in a handful of cases, and that it is much more common for them to be part of larger morphosyntactic constructions (Dingemanse & Akita 2017:502).

There is also extensive work on ideophone collocations (Samarin 1971, Van Hoey 2023) that falsifies this claim. Perhaps on a sympathetic reading this is meant to refer to the relative syntactic freedom of ideophones — relative, it is important to note, to other elements in the utterance, i.e. taking ideophones to be part of the same utterance and not a separate one. However, as we see below, it appears not to be so innocent.

(v) Semantically, ideophones convey many different types of meanings, but their typical function is to qualify verbs of perception

This statement is too categorical to be accurate, but to the extent that it is true (which is only partly), it directly conflicts with statement (iv) — after all, a common way for ideophones to “qualify” verbs of perception is to actually appear in lexical collocation with them, i.e. not as separate utterances but as integrated constructions.

(vi) Pragmatically, ideophones are mostly meant to cause a major engagement with the addressee. Because they occur mainly in spoken discourses, ideophones tend to highlight the acoustic and visual dimensions of conversational exchanges (Clark, 2016). Also, in their affective-imagistic dimension, they contribute to the emotivity of the discourse (Baba, 2003), whereas as depictions, they can be construed as performance

Again, this statement is too categorical and generalizing to be accurate. Do I mean to “cause a major engagement with the addressee” when I use ideophones in doctor-patient interaction (as in Japanese, Sakamoto et al. 2014), or in expert-novice learning situations (as in Ashéninka Perené, Mihas 2013), or in dance classes (Keevallik 2021)? I think this is painting an overly simplistic picture of what ideophones are used for in interaction. Later on in the ms, this paragraph is mostly mobilized for its aside on “emotivity”, a rhetorical narrowing that does not represent what we know about ideophones.

Rhetorical slippage

The rhetorical function of the claims in the introductory section is to set the stage for an argument that presents “Ideophones as linguistic fossils” (§3). To the extent that the argument depends on those claims, it will falter when those claims turn out to be questionable. What makes the overall edifice weaker still is that later sections often rely on a telegraphed, oversimplified form of the claim which is even more questionable.

This rhetorical strategy is on full view in §3.3, where the goal is to argue for “a degree of continuity with primate abilities” (yes, the ms really appears to go there, or at least does not fend off the deeply problematic interpretation that ideophones are basically like nonhuman primate vocalizations):

(i) ideophones are loosely integrated words with the simplest possible reduplicative structure; (ii) they typically constitute full, holistic utterances; (iii) they are mostly used to convey emotional content; (iv) they are often accompanied by gestures (as mentioned above); and (v) they are tightly linked to the context of use.

I must confess I find it hard to be gentle here; to my mind this is a truly irresponsible reduction of ideophones, and a striking example of a modern-day attempt to exoticise ideophones. They have “the simplest possible reduplicative structure”? Even ignoring that significant parts of ideophone inventories are not reduplicative, every empirical study of reduplication in ideophones has shown intricate patterns of reduplicative morphology.

“They typically constitute full, holistic utterances”? Note that by this point, the “tend to occur in separated utterances” of claim (iv) is silently upgraded to “full, holistic utterances”; a rhetorical sleight of hand. Second, this is a flat-out untruth, no way around it: the available evidence is not kind on this claim. Only 12% of all ideophone tokens in a corpus of Siwu are holophrases, making this the least frequent construction (Dingemanse 2017); and only 3 out of 692 tokens in a corpus of Japanese are holophrases (Dingemanse & Akita 2017). Puzzlingly, both these papers are cited in the ms, so the evidence is available to the authors.

“They are mostly used to convey emotional content”? Note again the upgrading of what was a small aside in another claim to a highly questionable generalisation. This sudden shelving away of ideophones as conveying “emotional content” also seems to make the common error of conflating depictive/performative iconicity and expressive/emotive indexicality; several of the papers cited in the ms point out why this is unnecessary and misleading, so I won’t go into that here (see also this old blog post).

Drive-by citations

There is a larger point here that struck me about this ms, though it is not unique to this ms. Citations are important in academia for a couple of reasons: they are supposed to back up the claims we make (evidence); they provide ways to establish the lineage of ideas (attribution); and they enable us to engage in a dialogue with different bodies of work (engagement).

In the first version of this preprint, however, many citations appear to be used for something other than these three things. For instance, eleven distinct papers of mine are cited, but if we look up the actual citations in context, as we see above, many of them treat the empirical evidence contained in the cited work as irrelevant (no evidence); many do not properly attribute original ideas (no attribution); and many fail to enter into a dialogue with the work cited (no engagement).

Instead, the citations seem to act as a kind of epistemic cover: a way to claim legitimacy for an argument without the content of the citation actually conferring that legitimacy. Andrew Perrin has coined a very useful term for this: drive-by citations. As he defines them (in the context of discussing how they appear in students’ essays):

These are, essentially, references to a work that make a very quick appearance, extract a very small, specific point from the work, and move on without really considering the existence or depth of connection between the student’s work and the cited work. This is an issue, in part, because the claim or finding being cited is often much more nuanced and complex than the quick way it is used in the citing work.

Drive-by citations have probably always been a feature of academic work (and not only in student essays). So let me make clear that I don’t think the paper I’m reviewing here is uniquely culpable for them. I did see them here more sharply because the paper cites so much of my own work.

When I see drive-by citations as a peer reviewer, I try to call them out, because they muddle the picture of attribution and don’t help to build cumulative progress. At best, they constitute a mere nod to other nominally relevant work; at worst, they misrepresent the cited literature and give readers a skewed view of others’ evidence and arguments. Since we cannot all be expected to know all of the literature cited, we need to be able to trust each others’ use of citations for evidence, attribution and engagement. Drive-by citations dilute academic discourse and hamper scientific progress.

Misrepresenting work on word learning

Drive-by citations are a problem throughout, as the above discussion shows, but they are perhaps especially grating in section §3.6 on “language acquisition”. Rhetorically, it would of course be useful for the argument if ideophones and iconic words more generally were learned early and if infants came with innate biases for iconicity. It so happens that we critically reviewed the word learning & iconicity literature and found things to be a lot less simple:

The combined weight of evidence suggests that the role of iconicity in word learning may be more complicated than supposed: if the processing and understanding of iconicity has its own developmental trajectory and occurs partly in parallel with non-iconic word learning rather than prior to it, iconicity loses some of its bootstrapping appeal, and it becomes more critical to understand the distribution and functions of iconicity by itself.

Nielsen & Dingemanse 2021

The preprint under review here cites that paper, so one might hope it takes careful note of these complications. Alas, it does not. Our paper is cited as follows: “According to the “sound-symbolism bootstrapping hypothesis” (Imai & Kita, 2014; Nielsen & Dingemanse, 2021), this special sensitivity to sound symbolism by preverbal children can be ascribed to a biologically endowed ability to map and integrate multi-modal inputs”. But even a quick skim of our paper shows that in fact we critically review that bootstrapping framework and find the most common interpretations of it wanting.

In the language learning section of the preprint, other literature is cited only if it fits the narrative (e.g., if it purports to show early effects). The more numerous experimental studies showing that there is a learning trajectory to children’s understanding of iconicity itself (in gesture as well as in sound) are all ignored, even though they are highly relevant and prominently reviewed in our paper. This is a fatal combination of drive-by citation and tunnel vision. I would not hold it against the authors if they did not know the literature; they are by their own admission not experts in this domain. However, to cite literature that undermines one’s very argument is generally not a sustainable practice.

Needless to say, all of this seriously complicates the ontogeny-recapitulates-phylogeny narrative the preprint seeks to push, undercutting yet another pillar of the argument for seeing ideophones as ‘linguistic fossils’.

In sum

As I noted in the first post, I have written about ideophones and their possible relevance for matters of language evolution myself. Already then I combined a critical note on published work (in that case by Kita) with a constructive contribution:

Still, I do not bring up this topic just to air some scepticism and move on. Ideophones are clearly relevant to the evolution of language in at least one important sense. Even if they do not provide us with a peek into the minds of our protolinguistic ancestors, they do show us how aeons of cultural evolution may shape and hone spoken language into a system in which both description and depiction play important roles — a system in which speech is not just about something, but is something, to use Peek’s turn of phrase. I see no need to dispute the possibility that depiction came before description in the evolution of language. What I argue is that it is difficult to tell at this point (see Davidson and Noble 1989 for discussion), and that one need not commit to speculations to still appreciate that ideophones are powerful proof of the fact that the depictive potential of speech may be exploited by evolutionary processes. From this perspective, what is typologically interesting about ideophonic languages is that the depictive use of speech has taken on a life of its own, in the form of a sizable class of words which is primarily depictive. This possibility, which we can think of as just one of the many possible trajectories of the ever-evolving bio-cultural hybrid that is human language (Keller 1998; Croft 2000; Evans and Levinson 2009), has often been overlooked or downplayed by linguists focusing on Standard Average European languages. Ideophone systems offer a useful corrective here, shedding light on another corner of the design space of language.

Dingemanse 2011:342

In sum. Things are more nuanced, and therefore more interesting, than a view of ideophones as linguistic fossils would suggest. Work on iconicity in relation to language evolution will benefit from a broad view of the evidence, and from a strong and even-handed grasp of empirical work on ideophones and depictive constructions across languages and modalities.

References

  • Akita, Kimi. 2020. “A Typology of Depiction Marking: The Prosody of Japanese Ideophones and Beyond.” Studies in Language. https://doi.org/10.1075/sl.17029.aki.
  • Akita, Kimi, and Mark Dingemanse. 2019. “Ideophones (Mimetics, Expressives).” Edited by Mark Aronoff. Oxford Research Encyclopedia of Linguistics. https://doi.org/10.1093/acrefore/9780199384655.013.477.
  • Davidson, Iain, and William Noble. 1989. “The Archaeology of Perception: Traces of Depiction and Language.” Current Anthropology 30 (2): 125–55. https://doi.org/10.2307/2743542.
  • Diffloth, Gérard. 1972. “Notes on Expressive Meaning.” Chicago Linguistic Society 8: 440–47.
  • Diffloth, Gérard. 1976. “Expressives in Semai.” Oceanic Linguistics Special Publications, no. 13: 249–64.
  • Diffloth, Gérard. 1979. “Expressive Phonology and Prosaic Phonology in Mon-Khmer.” In Studies in Mon-Khmer and Thai Phonology and Phonetics in Honor of E. Henderson, edited by Theraphan L. Thongkum, 49–59. Bangkok: Chulalongkorn University Press.
  • Dingemanse, Mark. 2011. “Ezra Pound among the Mawu: Ideophones and Iconicity in Siwu.” In Semblance and Signification, edited by Pascal Michelucci, Olga Fischer, and Christina Ljungberg, 39–54. Iconicity in Language and Literature 10. Amsterdam: John Benjamins. https://doi.org/10.1075/ill.10.03din.
  • Dingemanse, Mark. 2015. “Ideophones and Reduplication: Depiction, Description, and the Interpretation of Repeated Talk in Discourse.” Studies in Language 39 (4): 946–70. https://doi.org/10.1075/sl.39.4.05din.
  • Dingemanse, Mark. 2018. “Redrawing the Margins of Language: Lessons from Research on Ideophones.” Glossa: A Journal of General Linguistics 3 (1): 1–30. https://doi.org/10.5334/gjgl.444.
  • Ibarretxe-Antuñano, Iraide. 2017. “Basque Ideophones from a Typological Perspective.” The Canadian Journal of Linguistics / La Revue Canadienne de Linguistique 62 (2): 196–220.
  • Keevallik, Leelo. 2021. “Vocalizations in Dance Classes Teach Body Knowledge.” Linguistics Vanguard 7 (s4). https://doi.org/10.1515/lingvan-2020-0098.
  • McLean, Bonnie. 2020. “Revising an Implicational Hierarchy for the Meanings of Ideophones, with Special Reference to Japonic.” Linguistic Typology, October. https://doi.org/10.1515/lingty-2020-2063.
  • Mihas, Elena. 2013. “Composite Ideophone-Gesture Utterances in the Ashéninka Perené ‘Community of Practice’, an Amazonian Arawak Society from Central-Eastern Peru.” Gesture 13 (1): 28–62. https://doi.org/10.1075/gest.13.1.02mih.
  • Sakamoto, Maki, Yuya Ueda, Ryuchi Doizaki, and Yuichiro Shimizu. 2014. “Communication Support System Between Japanese Patients and Foreign Doctors Using Onomatopoeia to Express Pain Symptoms.” Journal of Advanced Computational Intelligence and Intelligent Informatics 18 (6): 1020–26.
  • Van Hoey, Thomas. 2022. “A Semantic Map for Ideophones.” OSF Preprints. https://doi.org/10.31219/osf.io/muhpd.
  • Van Hoey, Thomas. 2023. “ABB, a Salient Prototype of Collocate–Ideophone Constructions in Mandarin Chinese.” Cognitive Linguistics, March. https://doi.org/10.1515/cog-2022-0031.

Pitfalls of fossil-thinking: a peer review I

One of the benefits of today’s preprint culture is that it is possible to provide constructive critique of pending work before it is out, thereby enabling a rapid cycle of revision before things are committed to print. I have myself benefited from comments on preprints, and have acknowledged such public pre-publication reviews in several of my papers. The below remarks are shared in that spirit. (This is part I; part II is here.)

Di Paola, Giovanna, Ljiljana Progovac, and Antonio Benítez-Burraco. 2023. “Revisiting the Hypothesis of Ideophones as Windows to Language Evolution: The Human Self-Domestication Perspective.” PsyArXiv. https://doi.org/10.31234/osf.io/7mkue.

I could keep this short, and I probably should. I wrote about the idea of ideophones as “linguistic fossils” in my thesis:

In Kita’s (2008) view, they are “fossils of protolanguage” and they provide us with a peek into the minds of our protolinguistic ancestors. Although many have granted such ideas at least some intuitive plausibility, in the absence of evidence I think it is hardly useful to speculate about the matter. The lack of evidence shows itself in the fact that speculation goes both ways. For instance, whereas Kita argues that “sound symbolic words are fossils of protolanguage that have been engulfed and incorporated (albeit not fully) into the system of modern language” (Kita 2008:32), we find Diffloth arguing the opposite: “Expressives [ideophones, MD] are not a sort of ‘pre-linguistic’ form of speech, somehow half-way between mimicry and fully structured linguistic form. They are, in fact, at the other end of the spectrum, a sort of ‘post-linguistic’ stage where the structural elements necessary for prosaic language are deliberately re-arranged and exploited for their iconic properties, and used for aesthetic communication” (Diffloth 1980:58).

Dingemanse 2011:341-2

(The passage in my thesis is longer than this; it dismantles some further assumptions about ideophone meanings and formulates a constructive take on the relevance of depiction and multimodality to the cultural evolution of language. As I show, such a take has no need for a problematic notion of “fossils”.)

In my thesis and my subsequent work, I come down more on the side of Diffloth based on a detailed review of the typological evidence and in-depth study of the ideophone system of Siwu, which I have found to be sophisticated, creative, complex, systematic, conventionalized, and an integral part of the larger linguistic system. It is therefore surprising that the present ms, which cites no less than 11 of my publications, somehow finds a way to paint a maximally exceptionalist and exoticist picture of ideophones.

A fraught history

The ms that I review here triggered my Google Scholar notification for “intitle:ideophones”. I was naturally interested to read it, though I was immediately wary of the notion of “linguistic fossils”.

Why might I be wary of such notions in a piece on ideophones? There is a shameful history of scientific racism in the field of the evolution of language and culture, and hastily formed impressions of ideophones have played a questionable role in this. I have called out this scientific racism in 2011 (‘Ezra Pound among the Mawu‘) and 2018 (‘Redrawing the margins‘). As I wrote in the first paper, there is a recurrent pattern where Western scholars meet with ideophones “on the unholy ground of scientific racism and cultural evolutionism” (p. 52). A driving force here has been philosopher and professional racist Lucien Lévy-Bruhl, whose view of sound-symbolic words as signs of “primitive mentality” came to dominate public perceptions of ideophones and iconicity in the early 20th century. As I have documented, Lévy-Bruhl’s account acquires its force (if it has any) only through a highly selective and misleading reading of Westermann, a primary source on ideophones at the time.

Given this fraught history, I would expect any paper that broaches this topic to tread very carefully. At the very least, it would seem a good idea to acknowledge the problems and harms of past approaches in this area, and to avoid Lévy-Bruhl’s approach of selective reading, instead carefully weighing all available evidence and arriving at a balanced account.

Unfortunately, the version of the ms I am reviewing here does not show awareness of this history, and I feel that it engages in several cases of oversimplification and what has been called “drive-by citation”. Indeed, it seems to me that this ms presents a strikingly one-sided view of ideophones that not only ignores large bodies of scholarly work on ideophones (even while citing them) but also risks reverting to centuries-old exoticist views without clearly attempting to fend off problematic interpretations.

The notion of “linguistic fossil” is far from innocent in this regard. It paints a living linguistic system in the most static way possible. The use of this metaphor in creole studies and in work on click sounds has been critiqued in the past as indirectly or directly linked to racist and colonial-era notions about supposedly ‘primitive’ language structures (see Güldemann & Sands 2009). It is not clear what using this tarnished term buys one, academically speaking.

That’s it (for now)

I was actually all set to publish a longer post with detailed and substantive critique of many aspects of this preprint. I think those comments deserve to be public just as the preprint is public. However, I noticed that the preprint was uploaded by a senior corresponding author and that the other two authors (including the first, who may be an early career researcher) do not appear to be very active online.

In this situation, I am weighing the very public nature of the preprint (and my blog) against the not-so-public-yet profile of the first author, and choosing to err on the side of caution. I don’t even know whether they are aware that their work is public, and up for public discussion, in this way. So as a first port of call I have shared my detailed comments on the text with the senior corresponding author.

At the same time, since the fraught history alluded to above is apparently not known widely enough, I am choosing to publish this little prelude here and may revisit the post at a later point with an update.

Update: part II is now online.

Putting interaction centre-stage

I’ve been taking part (virtually) in a workshop today at the Cognitive Science conference in Sydney entitled “Putting interaction center-stage for the study of knowledge structures and processes”.

Kicking off the workshop, my own contribution was a summary of our Beyond Single-Mindedness manifesto. This was followed by Nick Enfield, who argued that concepts are necessarily social-relational, and by Joanna Rączaszek-Leonardi & Julian Zubek, who drew attention to the importance of first-person experiences of active agents as they couple with others in interaction. These talks were bundled in a ‘theoretical’ block, though each of them also had empirical components.

One feeling I had during the discussion following the talks is that it’s too easy to get muddled in theoretical distinctions and philosophical musings, and THAT is where interaction offers firm empirical grounding. If I look at the work of Lucy Suchman, Ed Hutchins, Gail Jefferson, or Linda Smith — it’s the direct empirical grounding afforded by looking at rich records of interaction that makes it possible to achieve real theoretical and conceptual progress.

For instance, Lucy Suchman (1987), by carefully studying how people interact with photocopiers, was able to singlehandedly upend the classical cognitive science agenda of plans as individual representations and instead show compellingly how they emerge as situated actions. From that empirical work we can then derive concepts like the contingent co-production of a shared material world (=Suchman’s definition of interaction).

As we write in Beyond Single-Mindedness (following Wittgenstein), interaction offers a form of direct empirical access to the multiscale dynamics of cognition that is hard to get otherwise. In that sense, interaction is a privileged locus for cognitive scientists interested in interactional resources and cognitive processes. Looking at it and closely observing it ought to be our first stop, not an afterthought.

Methods

The second half of the workshop was devoted to methodological contributions: Michael Richardson & Rachel Kallen on nonlinear modelling and machine learning approaches to predicting movement and action; Veronica Romero, Alexandra Paxton and Tahiya Chowdhury presenting a range of tools including OpenPose and OpenSmile, and recommending Whisper ASR for automatic transcription. Finally, Kristian Tylén showed how coordinated epistemic interaction makes cognition a ‘public process’ and Hadar Karmazyn-Raz & Linda Smith presented work on the dynamics of caregiver-infant interactions.

I liked seeing this work presented, and I have learned new things. At the same time I have some doubts about unintended side-effects of some of these methods. I should clarify that we’ve used kinematics, unsupervised machine learning and speech recognition ourselves, so I’m aware of the utility. My own experience when it comes to such methods is that they are cool and potentially useful, but they also risk being “methods in search of questions” and moreover methods that risk putting us at larger distance from the actual empirical data. After all, Lucy Suchman didn’t need models of nonlinear coupled oscillators to turn the classical cogsci take on planning on its head. Gail Jefferson didn’t need recurrence quantification analysis to bring to light the turn-taking system in what would become the most cited paper of linguistics: she pioneered a system for the detailed transcription and analysis of interactive behaviour that is still the standard in conversation analysis.

New methods create new affordances and allow different types of analyses. But as I use them in my own work, complementary to the rigorous and systematic qualitative ways of looking at data furnished by ethnomethodology and conversation analysis, I do sometimes feel these new methods may have the effect of putting us at a larger distance from the data as it unfolds in the lived experience of the participants themselves.

A very simple illustration of this problem is the recommendation to use OpenAI’s Whisper ASR for automatic transcription. Our own recent research has shown that Whisper, like most if not all currently available ASR solutions, is terrible at representing timing and overlap, and erases many of the little words that are interactionally important. By our measure, such ASR systems erase roughly one out of every eight words, and the words erased are some of the most interactionally consequential ones. So if you use transcripts or timing data coming out of Whisper without labour-intensive human quality control and correction, they’re nearly useless for fine-grained work on timing, alignment, and intersubjectivity. You’d be working with a funhouse mirror version of your ‘data’ and you wouldn’t see it unless you dive into the output yourself and compare it to the actual conversations. Unexamined use of technology has a way of putting us out of touch with the realities of interactions as they unfold.
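
To make the problem concrete, here is a toy sketch in R of the kind of check I mean. The word lists are invented and this is not our actual evaluation pipeline (which works on time-aligned transcripts); it only illustrates how quickly the small interactional words drop out.

# Toy sketch: compare a human reference transcript with ASR output and see
# which words go missing. Invented word lists, not real data.
reference  <- c("yeah", "so", "I", "was", "thinking", "mm",
                "we", "could", "go", "uh", "tomorrow", "right")
asr_output <- c("so", "I", "was", "thinking", "we", "could", "go", "tomorrow")

dropped <- setdiff(reference, asr_output)
dropped                                 # "yeah" "mm" "uh" "right"
length(dropped) / length(reference)     # share of words erased (here 1 in 3)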

Grounding

We can also frame that as a question: as our methods become more technologically sophisticated, is there not a risk of losing sight of the value of careful qualitative observation of interaction? If we replace ANOVAs and 2×2 designs by RQAs and nonlinear modelling, what have we gained?

In response to this, Michael Richardson countered that you can use some of these methods to “uncover structures that you cannot uncover by observation”. He did concur, though, that these methods are no silver bullet; you cannot simply apply them and get something — “you still need to ground them theoretically”.

I would add that the grounding needs to be empirical as well (and perhaps in the first place). There is an unnecessary rift in cognitive science discourse (in general, I’m not talking about Richardson’s point here) where theory is cast as high-minded conceptual work and empirical research (especially of the observational kind) is seen more as grunt work, paving the way for the real (often experimental and computational) work. That is not how things work in my experience at all: there is a direct line between empirical, data-driven observation and theory development that does not always need to be mediated by experiments. Some of the strongest theoretical claims in my work (and some of the most replicable ones) derive directly from fine-grained empirical observation of co-present interaction.

Experiments are nice to check hunches; computational models are good to force oneself to specify things in unambiguous ways; but careful, systematic, disciplined observational work forms the empirical backbone of a lot of the most consequential research in human interaction over the past five decades. I’m putting things purposefully strongly here; of course there is a place for all these things besides one another, and I have used all of them in my own work. But the grounding, ultimately, has to come from the ground: the earthy, artisanal reality of everyday interaction.

References

  • Rączaszek-Leonardi, Joanna, Kristian Tylen, Mark Dingemanse, Linda Smith, Hadar Karmazyn Raz, Nick Enfield, Rachel W. Kallen, et al. 2023. “Putting Interaction Center-Stage for the Study of Knowledge Structures and Processes.” Proceedings of the Annual Meeting of the Cognitive Science Society 45 (45). https://escholarship.org/uc/item/8571r2dz.
  • Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “The Timing Bottleneck: Why Timing and Overlap Are Mission-Critical for Conversational User Interfaces, Speech Recognition and Dialogue Systems.” SIGDIAL 2023 (arXiv: https://doi.org/10.48550/arXiv.2307.15493)

Opening up ChatGPT: Evidence-based measures of openness and transparency in instruction-tuned large language models

With the first excitement of ChatGPT dying down, people are catching on to the risks of relying on closed and proprietary models that may stop being supported overnight or may change in undocumented ways. Good news: we’ve tracked developments in this field and there are now over 20 alternatives with varying degrees of openness, most of them more transparent than ChatGPT.

Ah yes, LLaMA 2, you may think, I heard about that one. Nope — by our measures that is literally the least open model currently available. This is from the company that has had no qualms experimenting on millions of Facebook users without consent and wrecking the mental health and body images of further untold millions on Instagram. They don’t deserve an ounce of trust when they release an “open” language model that they trained, using god knows which training data and bespoke “Meta reward modelling” based on over a million undisclosed data points.*

So anyway, I have two things to share:

  1. We’ve published a short paper on why openness is important and how to assess it across a wide range of dimensions, from availability to technical documentation and access methods (a toy illustration of such an assessment follows below this list). We’ve used LLM+RLHF architectures as a timely test case but the framework is more generally applicable to current generative AI and ML releases.
  2. Along with the paper comes a live tracker backed by a crowd-sourced repository that enables us to keep up with this fast-evolving field. For instance, while the paper was published before LLaMA 2 was released, it’s right there on our live tracker, and as I mentioned above it’s not looking good: it is the least open model of all ~20 ‘open’ models currently available.
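
To give a feel for what assessing openness across dimensions can look like in practice, here is a minimal sketch in R. The model names, dimension labels and scores below are invented for illustration; this is not the paper’s actual scoring scheme nor the code behind the live tracker.

# Purely illustrative: tally openness points per project across a few
# dimensions and rank the projects. All names and scores are invented.
library(dplyr)
library(tibble)

scores <- tribble(
  ~model,    ~dimension,       ~score,   # 1 = open, 0.5 = partial, 0 = closed
  "model_A", "training_data",  1,
  "model_A", "model_weights",  1,
  "model_A", "documentation",  0.5,
  "model_B", "training_data",  0,
  "model_B", "model_weights",  0.5,
  "model_B", "documentation",  0
)

scores %>%
  group_by(model) %>%
  summarise(openness = sum(score)) %>%
  arrange(desc(openness))

The basic move is the same in our framework: many small, checkable judgements, each backed by evidence, that add up to a comparable openness profile per project.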

In this post I’ll give some personal background on this project. First off, credits to first author Andreas Liesenfeld who pitched this idea as early as January 2023, when there were at best a handful of open alternatives. I confess I was skeptical at first —mostly because one of our problems is that we have too many paper ideas— but we decided to keep tracking this space until March, when a CUI paper deadline provided a useful anchoring point to make a go / no go decision. Obviously it was a go: by then, there were >10 alternatives, at revision time at the end of May we added another 5, and the live tracker currently features over 20 in total, with several more on our radar (you can help us out if you want!).

My personal interest in LLMs goes back to at least 2020, when I saw GPT3 and blogged about the unstoppable tide of uninformation this kind of language model was bound to cause if released to the masses. A year ago, in September 2022, I used GPT3 in teaching, first impressing my (undergrad) students with some smooth text output and then letting them demystify the system for themselves by giving them assignments to probe the limits and poke holes in its abilities (they were very good at that).

So in November 2022 I watched the release and breathless press coverage of ChatGPT with mild exasperation (and made a prediction: this creates a whole new market for generative AI detection). And I started worrying more about the closed and proprietary nature of ChatGPT. By cynically giving folks ‘free research access’ and keeping all prompts, what OpenAI was doing was harvesting human collective intelligence at unprecedented scale. Truly OpenAI gives with one hand and takes away with the other.

Our paper calls out OpenAI for their exploitative practices and highlights the role of publicly funded science in building alternatives that can be used responsibly in fundamental research and in education. ChatGPT as it stands is unfit for responsible use. Fortunately, there are currently enough alternatives that we need not resort to it except as the cynical prototype of maximally closed corporate profiteering — right at the very bottom of our live tracker.

Critical and constructive contributions

A final note of clarification. Opening up ChatGPT is important, not because this technology is so beneficial or offers a good model of language (it does not), but because we can only effectively understand and responsibly limit it when it is open sourced: when we can audit its data and document its harms; when we can examine the reinforcement learning methods to study the contribution of human labour; when we can tinker with models to test the consequences of synthetic data and relying on automatic evaluation methods.

From what I’ve seen so far I doubt that bringing this technology into the world will be a net benefit to humanity. Many current harms have been pointed out by bright minds all around the globe. Let me single out Emily Bender, Timnit Gebru, Margaret Mitchell, Angelina McMillan-Major and their teams in particular. Not only have they articulated some of the most important fundamental critiques of large language models (Bender et al. 2021), they have also made immense constructive contributions towards doing things better, by spelling out frameworks for model cards (e.g., Mitchell et al. 2019), datasheets (e.g., Gebru et al. 2021), and data statements (e.g., McMillan-Major et al. 2023). Also, Abeba Birhane and co-authors like Vinay Uday Prabhu and Emmanuel Kahembwe (Birhane et al. 2021) deserve major credit for showing how to do the incredibly important work of auditing datasets used in current machine learning. Only some models currently are open enough to allow this kind of auditing, but it is absolutely critical for any responsible use.

Many of these elements are directly incorporated into our framework as dimensions on which projects can claim openness points. The BLOOMZ model, an outcome of an audacious year-long global collaboration under the moniker ‘BigScience Workshop’, currently tops our list as a project with exemplary openness on all fronts. We’ve been quite impressed by it (and it is no surprise that some of the same names mentioned above are involved in it), even if openness doesn’t mean there are no problems with language models. This project shows what scientists can do when they put their minds to working together, and when openness and transparency are included as design criteria right from the start.

Our work is only a small pebble in this larger stream of work towards responsible, transparent, and accountable AI/ML. As we write:

Openness is not the full solution to the scientific and ethical challenges of conversational text generators. Open data will not mitigate the harmful consequences of thoughtless deployment of large language models, nor the questionable copyright implications of scraping all publicly available data from the internet. However, openness does make original research possible, including efforts to build reproducible workflows and understand the fundamentals of LLM + RLHF architectures. Openness also enables checks and balances, fostering a culture of accountability for data and its curation, and for models and their deployment. We hope that our work provides a small step in this direction.

Thanks for reading!

Paper · Live tracker · Repository

References

* At least that’s what the July 18, 2023 version of their own report hosted on their own server said. I try not to cite stuff that is not clearly version controlled; here is their link.

  • Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜.” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–23. Virtual Event Canada: ACM. https://doi.org/10.1145/3442188.3445922.
  • Birhane, Abeba, Vinay Uday Prabhu, and Emmanuel Kahembwe. 2021. “Multimodal Datasets: Misogyny, Pornography, and Malignant Stereotypes.” arXiv. https://doi.org/10.48550/arXiv.2110.01963.
  • Gebru, Timnit, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2021. “Datasheets for Datasets.” Communications of the ACM 64 (12): 86–92. https://doi.org/10.1145/3458723.
  • Gebru, Timnit, Emily M. Bender, Angelina McMillan-Major, and Margaret Mitchell. 2023. “Statement from the Listed Authors of Stochastic Parrots on the ‘AI Pause’ Letter.” https://www.dair-institute.org/blog/letter-statement-March2023.
  • Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In ACM Conference on Conversational User Interfaces (CUI ’23). Eindhoven. https://doi.org/10.1145/3571884.3604316.
  • McMillan-Major, Angelina, Emily M. Bender, and Batya Friedman. 2023. “Data Statements: From Technical Concept to Community Practice.” ACM Journal on Responsible Computing. https://doi.org/10.1145/3594737.
  • Mitchell, Margaret, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. “Model Cards for Model Reporting.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, 220–29. FAT* ’19. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3287560.3287596.

How robots become social: A comment on Clark & Fischer

— by Mark Dingemanse & Andreas Liesenfeld, Radboud University Nijmegen

Clark & Fischer propose that people see social robots as interactive depictions and that this explains some aspects of people’s behaviour towards them. We agree with C&F’s conclusion that we don’t need a novel ontological category for these social artefacts and that they can be seen as intersecting with a lineage of depictions from Michelangelo’s David to Mattel’s talking Barbie doll. We have two constructive contributions to make.

First, we think C&F undersell the power of their depiction account as a tool for designers: it can help us understand the work of creating simulacra of social agents (Suchman, 2007), and can help explain people’s initial responses to perceptions of them. When mechanical artefacts are endowed with cues to agency like voice, movement, and likenesses of body parts, this shapes people’s perceptions of their affordances for interaction. It may also help explain the uncanny valley effect (Mori, MacDorman, & Kageki, 2012): when depictions are so lifelike as to be mistakeable for the real thing, a closer look may jolt us from ‘as’ to ‘as-if’ perception.

Second and more critically, we note that C&F leave unexamined the notion of “social” in social robots; the question of how technologies like this become enmeshed in human sociality (Heath & Luff, 2000). In particular, C&F’s focus on robots-as-depictions risks losing sight of the question of how exactly people make robots part of their ongoing interactional business. The transcripts shown by C&F provide direct evidence of this interactional work. For instance, the laughter and non-serious responses to Smooth and Aibo show how people normalise strange situations by turning to humour and playful exploration (Moerman, 1988). It is not just that people can see these objects as depictions; they treat them as “liminal members” that can barely hold up their end of a conversation (Kopp & Krämer, 2021). This is where the limits of the depiction account come into view. There are many other beings that, for better or worse, are sometimes treated in interaction as having partial or liminal membership (Sacks, 1989). Surely children and pets are not depictions, and yet they too can be treated as less than full agents, with less than full social accountability.

In sum, while the robots-as-depictions account helps make sense of the construction and perception of humanoid robots, it needs to be coupled with investigations of our interaction with them to explore how they become social.

References

  • Clark, H. H., & Fischer, K. (2022). Social robots as depictions of social agents. Behavioral and Brain Sciences, 1–33. doi: 10.1017/S0140525X22000668
  • Heath, C., & Luff, P. (2000). Technology in action. Cambridge, U.K.; New York: Cambridge University Press.
  • Kopp, S., & Krämer, N. (2021). Revisiting Human-Agent Communication: The Importance of Joint Co-construction and Understanding Mental States. Frontiers in Psychology, 12. doi: 10.3389/fpsyg.2021.580955
  • Moerman, M. (1988). Talking Culture: Ethnography and Conversation Analysis. Philadelphia: University of Pennsylvania Press.
  • Mori, M., MacDorman, K. F., & Kageki, N. (2012). The Uncanny Valley [From the Field]. IEEE Robotics Automation Magazine, 19(2), 98–100. doi: 10.1109/MRA.2012.2192811
  • Sacks, H. (1989). Extract Nine: For children: A limited set of categories. Human Studies, 12(3–4), 363–364. doi: 10.1007/BF00142783
  • Suchman, L. A. (2007). Human-machine reconfigurations: Plans and situated actions (2nd ed.). Cambridge; New York: Cambridge University Press.

Consolidating iconicity research

Readers of this blog know that I believe serendipity is a key element of fundamental research. There is something neatly paradoxical about this claim. We might like ‘key elements’ to be plannable so that we can account for them on budgets and balance sheets. But here is an element that I think can make a huge difference to the quality of our scientific work yet that is pretty much the antithesis of gantt charts, KPIs and work packages. (Then again, who has ever thought that gantt charts contributed to the quality of their work?)

A lot of my collaborations start out serendipitously. Someone mentions a cool topic in the comments on my blog. A colleague mentions a weird observation at a conference. You have some interesting data, I have a research question (or vice versa). This is how I’ve come to work on vowel-colour synaesthesia, a link between ‘r’ and rough, and communicative redoings across species. Sometimes such serendipitous collaborations turn into more durable associations. That is the case with three papers on iconicity that were published this month.

Making and breaking iconicity

My involvement in iconicity research has always been a bit cautious. The topic doesn’t play a large role in my early work on ideophones, because I saw that this was where prior work had sometimes gone astray (either overclaiming or underclaiming the relevance of iconicity). When I did get into it, it was with the goal of bringing more conceptual and methodological precision (as in our teasing apart of iconicity vs systematicity, our cross-linguistic study of what sound-symbolism can and cannot do, and Gwilym Lockwood’s PhD work on individual differences in sensitivity to iconicity). In a 2017 keynote at SALC in Lund, I summarised my approach as “Making and breaking iconicity”.

To the extent that I continued work on iconicity, it was in this vein. The three studies out this month have grown out of long-term collaborations and each pick up some of these threads. Here they are:

  1. Winter, Bodo, Gary Lupyan, Lynn K. Perry, Mark Dingemanse, and Marcus Perlman. “Iconicity Ratings for 14,000+ English Words.” Behavior Research Methods, April 20, 2023. https://doi.org/10.3758/s13428-023-02112-6.
  2. McLean, Bonnie, Michael Dunn, and Mark Dingemanse. “Two Measures Are Better than One: Combining Iconicity Ratings and Guessing Experiments for a More Nuanced Picture of Iconicity in the Lexicon.” Language and Cognition, April 11, 2023, 1–24. https://doi.org/10.1017/langcog.2023.9.
  3. Van Hoey, Thomas, Arthur L. Thompson, Youngah Do, and Mark Dingemanse. “Iconicity in Ideophones: Guessing, Memorizing, and Reassessing.” Cognitive Science 47, no. 4 (2023): e13268. https://doi.org/10.1111/cogs.13268.

These papers build on prior work, replicating some of our findings in larger data sets, different languages, and complementary methods. They contribute to rounding out the picture of iconicity. And yet despite their excellent topical and methodological fit, they originated on different timelines, with teams in Birmingham, Uppsala and Hong Kong that worked entirely independently from one another.

Iconicity ratings for lots of words

This paper arises out of a very pleasant collaboration with Bodo Winter and Marcus Perlman at Birmingham along with Gary Lupyan (Wisconsin) and Lynn Perry (Miami). Noting the growing interest in iconicity ratings, we operationalised a measure that we believe captures raters’ subjective feeling of iconicity quite well, and we collected ratings for 14k English words (with at least 10 raters per word). This is currently the largest resource of carefully operationalised iconicity ratings available. You can find the ratings in the OSF repository.

There are many findings in this paper — the figure above highlights some of them. The strongest correlation of the new iconicity ratings is with humor ratings, a theoretically motivated association that we investigated in prior work using a smaller set of ratings for about 1,400 words (and imputed ratings for >60,000 words). These new ratings constitute the strongest replication yet of that finding — both the positive correlation between humor (or word-level funniness) and iconicity ratings, and the negative correlation to various measures of structural markedness which we hypothesized underpinned this link.

From rating to guessing

Another new paper out this month is with Bonnie McLean and Michael Dunn at Uppsala. This is a report of a range of experimental designs for assessing lexical iconicity. The key point is, as the title says, that ‘Two measures are better than one’: we can learn more about iconicity by using information from both guessing tasks (in which people link forms to meanings or meanings to forms) and rating tasks (in which people rate the match between forms and meanings).

Bonnie wrote a guest post about this paper so I don’t need to say too much. I do want to highlight two aspects of this work. First, the paper offers a systematic tour of the possibility space of experimental approaches to lexical iconicity. If you wonder whether to use random foils or opposite ones; whether to let people guess between the meanings of words (given a form) or between forms (given a meaning); or whether who rates words (native speaker or non-speaker) makes a difference for how ratings come out — this paper has you covered.
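
As a toy illustration of one corner of that design space, here is how one might construct two-alternative guessing trials with opposite versus random foils. The forms and glosses are invented, and this sketch is not the icotools API mentioned below; it only shows the logic of the two foil types.

# Toy sketch: build two-alternative guessing trials with opposite or random
# foils. Invented forms and glosses, not real stimuli.
library(dplyr)
library(tibble)

set.seed(1)
stimuli <- tibble(
  form     = c("pakaso", "milu",   "tserere", "dobo"),
  meaning  = c("rough",  "smooth", "bright",  "dark"),
  opposite = c("smooth", "rough",  "dark",    "bright")
)

# opposite foils: the alternative is the antonym of the target meaning
trials_opposite <- stimuli %>%
  transmute(form, target = meaning, foil = opposite)

# random foils: the alternative is any other meaning from the stimulus set
trials_random <- stimuli %>%
  rowwise() %>%
  mutate(foil = sample(setdiff(stimuli$meaning, meaning), 1)) %>%
  ungroup() %>%
  select(form, target = meaning, foil)

trials_opposite
trials_random

In an actual experiment you would of course also randomize which option appears first and counterbalance items across participants; the paper walks through what difference such design choices make.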

Second, as part of doing these experiments, Bonnie also wrote a Python package, icotools, that makes it easy to set up rating and guessing experiments of all kinds. Software work is rarely credited in academia, but it is hugely important for reproducibility and for cumulative progress. Our hope is that this package will help more people to do experimental work on lexical iconicity — and if they do, they will be able to report the results in transparent and reproducible ways.

From guessing to memorizing and reassessing

The third new paper this month comes out of my collaboration with Youngah Do at Hong Kong University, with whom I co-direct a GRF project on iconicity. Postdocs Thomas van Hoey and Arthur Thompson have carried out a number of experimental studies of cross-linguistic iconicity. The backbone of this paper is a replication effort of two of our previous studies: a cross-linguistic one where we test the guessability of ideophones from 5 languages, and one on Japanese where we test the learning (or memorability) of ideophones.

Here as above, key results do indeed replicate. One prior study tested the guessability of ideophones from Japanese, Korean, Semai, Siwu and Ewe. This new paper adds another West-African language (Igbo) and replicates the work on Japanese and Korean. And where the prior study had speakers of Dutch doing the guessing, here we have speakers of Cantonese, showing that the result generalizes (we didn’t think Dutch speakers would have a unique aptitude for guessing ideophones, but it’s always nice to actually see that something works the same for speakers of a different language halfway around the globe).

So ideophones may be somewhat easier to guess, but does that also make them easier to memorise? In this paper, we find that people are pretty good at learning ideophones and adjectives (about as good, in fact, for the forms tested here), even if you swap the meanings to opposite foils. In this respect the findings differ from my earlier work on Japanese ideophones — a partial non-replication. An important difference though is that in that prior study we used the same preselection filter for both ideophones and adjectives, while here that filter is only used for adjectives, stacking the deck against finding a difference. Can we say more about this? We introduce a fun new analysis that digs into this question by looking at “flip-flopping”, or reassessing, to use a more respectable term. For the form-meaning pairs they encountered with ‘wrong’ foils, it turns out that people are more willing to revise their mapping in the right direction for ideophones than for adjectives. More about this paper on Thomas’ blog.
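
For readers who like to see the logic in code: below is a minimal sketch of how one could quantify this kind of reassessment per word class. The responses are simulated and the rates are made up; they are not the values from the paper.

# Toy sketch of the 'flip-flopping' measure: for items learned with a 'wrong'
# (opposite) foil, how often do participants later revise towards the actual
# meaning, per word class? Simulated responses, invented rates.
library(dplyr)
library(tibble)

set.seed(42)
responses <- tibble(
  word_class = rep(c("ideophone", "adjective"), each = 100),
  revised    = c(rbinom(100, 1, 0.35),
                 rbinom(100, 1, 0.20))
)

responses %>%
  group_by(word_class) %>%
  summarise(revision_rate = mean(revised), n = n())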

In closing

I started out by drawing attention to the role of serendipity in fundamental research. The coming together of these three papers is also an example of that. It almost looks like it’s planned: a bouquet of complementary methods, with findings that establish the reliability and replicability of some of the first studies that brought together ideophones and iconicity. And yet the trajectories of these three papers have been entirely independent, which makes it all the more remarkable that they’ve come out within two weeks of one another.

So, here’s to serendipity!

At the smallest scale of human interaction, prosocial behavior follows cross-culturally shared principles

We have a new paper out in which we find that people overwhelmingly like to help one another, independent of differences in language, culture or environment. This finding is surprising from the perspective of anthropological and economic research, which has tended to foreground differences in how people work together and share resources. Key to this new work is a focus on the microscale of human interaction.

Sampling many hours of everyday social interactions around the world, we find that small requests for assistance are very common —about once every two minutes on average— and that people overwhelmingly help each other out, with compliance seven times more likely than rejection. And in the few cases where people do reject, they tend to give a reason.

We found that the frequency of small requests for assistance is highest in task-focused interactions like cooking dinner, with an average of one request every 1 minute and 42 seconds, and lower in talk-focused activities, with an average of one request per 7 minutes and 42 seconds.

The kind of low-cost requests we see —hand me that thing, hold this for me, lift your arm, let me see that, zip me up— are very different from the type of higher stakes exchanges often studied by economists and behavioural ecologists, which are much rarer and so perhaps more susceptible to cultural inflection. These small acts of human kindness suggest deep similarities in cooperative behaviour across cultures.

The report is the latest outcome of a large-scale comparative project that started in Nijmegen over a decade ago, in Nick Enfield’s ERC project Human Sociality and Systems of Language Use. Earlier work from the same team documented universals and cultural diversity in the expression of gratitude (2018) and universal principles in the repair of communicative problems (2015). Our work on small requests for assistance (or ‘recruitments’) is also documented in Getting others to do things, an open access book published with Language Science Press.

Rossi, G., Dingemanse, M., Floyd, S., Baranova, J., Blythe, J., Kendrick, K. H., Zinken, J., & Enfield, N. J. (2023) Shared cross-cultural principles underlie human prosocial behavior at the smallest scale. Sci Rep 13, 6057. nature.com/articles/s41598-023-30580-5

New paper – What do we really measure when we measure iconicity?

Guest post by Bonnie McLean

The first paper from my thesis is now out in Language & Cognition!🎉

You can find (and cite) it below:

McLean, B., Dunn, M., & Dingemanse, M. (2023). Two measures are better than one: Combining iconicity ratings and guessing experiments for a more nuanced picture of iconicity in the lexicon. Language and Cognition, 1-24. doi:10.1017/langcog.2023.9


Playing with R: unrolling conversation

A lot of our recent work revolves around working with conversational data, and one thing that’s struck me is that there are no easy ways to create compelling visualizations of conversation as it unfolds over time. The most common form seems to be pixelated screenshots of transcription software not made for this purpose. In the Elementary Particles of Conversations project we’re aiming to change that — see our ACL2022 paper and stay tuned for updates on talkr. Here I want to give a sneak peek into the kitchen as I experiment with new ways of plotting conversational structure.

Also, this post is my first time blogging to WordPress directly from rmarkdown. I’m using the amazing goodpress package: https://github.com/maelle/goodpress/.

Ten minutes of conversation

I’m going to walk through an example of plotting a ten minute stretch of conversation. We start by loading some sample data. For privacy reasons I’ll use only the timing data. The most important bits here: every annotation corresponds to a turn at talk by some participant with a begin and end time in milliseconds. This corresponds to the minimal flat data format for diarised conversational data we specify here.


library(tidyverse)  # loads readr, dplyr, tidyr and ggplot2, used throughout this post

extract <- readr::read_csv('sample_conversation.csv',show_col_types = F) %>%
  select(begin,end,duration,participant,uid,nwords,nchar,n,rank,freq,overlap,priorby,FTO,overlapped,talk_all,talk_rel,load,transitions,topturn,focus,scope,participant_int,begin0,end0,participation)

head(extract)
#> # A tibble: 6 × 25
#>   begin   end duration participant    uid      nwords nchar     n  rank     freq
#>   <dbl> <dbl>    <dbl> <chr>          <chr>     <dbl> <dbl> <dbl> <dbl>    <dbl>
#> 1  1271  3158     1887 A_agent_text   hungari…      1     2    84    40  1.29e-6
#> 2  3158  4830     1672 A_agent_text   hungari…      4    20     1   112  1.53e-8
#> 3  6221  8150     1929 A_agent_text   hungari…      1     7     1   112  1.53e-8
#> 4  7344  8145      801 A_speaker_text hungari…     NA    NA    NA    NA NA      
#> 5  8150  9821     1671 A_agent_text   hungari…      4    26     1   112  1.53e-8
#> 6  9007 10151     1144 A_speaker_text hungari…      4    17     1   112  1.53e-8
#> # ℹ 15 more variables: overlap <chr>, priorby <chr>, FTO <dbl>,
#> #   overlapped <chr>, talk_all <dbl>, talk_rel <dbl>, load <dbl>,
#> #   transitions <dbl>, topturn <dbl>, focus <chr>, scope <chr>,
#> #   participant_int <dbl>, begin0 <dbl>, end0 <dbl>, participation <chr>

We start by setting a few basic parameters that will help us to divide these ten minutes of conversation into as many lines. Basically, we cut up the conversational turns by categorizing their begin values into intervals the width of window_size.

  • extract_length (ms): total length of desired stretch of conversation
  • window_size (ms): length of a single line
  • window_breaks: integer vector of break points, running from 0 to extract_length in steps of window_size

extract_length <- 600000 # 10 min
window_size <- 60000 # 1 min
window_breaks <- as.integer(c(0:round(extract_length/window_size)) * window_size)

extract <- extract %>%
  mutate(end = end - min(begin), # reset timestamps to start from 0
         begin = begin - min(begin),
         line = cut(begin,window_breaks,right=F,labels=F)) %>%
  drop_na(line) %>%
  group_by(line) %>%
  mutate(begin0 = begin - min(begin), # reset timestamps to 0 for each new line
         end0 = end - min(begin)) %>%
  ungroup()

Now that our extract has turns divided into lines, we start with a regular plot in which the conversation flows from left to right and from top to bottom. We number the lines so that you can see how turns get assigned to them.

There’s a bunch of things you might note about this plot. We reverse the y scale because we want the first turns (line 1) to be on top. We plot the actual turns (or at least their timing) with geom_rect(). And there’s another layer of items plotted with geom_point(); these are interjections, one-word turns like uh-huh and yeah.


plot.base <- extract %>%
  ggplot(aes(y=participant_int)) +
  ggthemes::theme_tufte() + theme(legend.position = "none",
                        strip.text = element_blank(),
                        axis.text.y = element_blank(),
                        axis.ticks.y = element_blank(),
                        plot.title.position = "plot") +
  ylab("") + xlab("time (s)") +
  viridis::scale_fill_viridis(option="plasma",direction=1,begin=0.2,end=0.8) +
  scale_y_reverse(breaks=seq(1,max(extract$line,1)),
                  labels=seq(1,max(extract$line,1))) +
  scale_x_continuous(limits=c(0,window_size),
                     breaks=seq(0,window_size,10000),
                     label=seq(0,window_size/1000,10),
                     oob = scales::oob_keep) 

plot.simple_grid <- plot.base +
  theme(axis.text.y = element_text()) +
  ggtitle("Ten minutes of conversation",
          subtitle="Points are interjections, time moves left-right and top-bottom") +
  geom_rect(aes(xmin=begin0,xmax=end0,ymin=line-0.5+participant_int/3-0.2,ymax=line-0.5+participant_int/3+0.2), 
                      linewidth=0,colour=NA,fill="lightgrey") +
  geom_point(data=. %>% filter(nwords == 1, topturn == 1),
             aes(x=begin0+200,fill=rank,y=line-0.5+participant_int/3),colour="white",size=2,shape=21,stroke=1)
plot.simple_grid

A grid is boring though. Can we make it more interesting visually? Let’s try a polar plot. One lovely thing about ggplot is that this is as easy as adding coord_polar():


plot.simple_grid +
  coord_polar() 

Make it spiral

It would be much nicer of course if this were a spiral instead of concentric rings. For that, we’re going to have to take a slightly different approach. Recall that the y coordinate of individual turns is currently determined by their line number. If we want to make the end of one line meet the beginning of the next, we need a variable that increments over the window_size. Let’s call it line_polar:

extract <- extract %>% 
  mutate(line_polar = line+((1+begin0)/window_size)) 

# we'll need to update our plot.base with that new dataset
plot.base <- extract %>%
  ggplot(aes(y=participant_int)) +
  ggthemes::theme_tufte() + theme(legend.position = "none",
                        strip.text = element_blank(),
                        axis.text.y = element_blank(),
                        axis.ticks.y = element_blank(),
                        plot.title.position = "plot") +
  ylab("") + xlab("time (s)") +
  viridis::scale_fill_viridis(option="plasma",direction=1,begin=0.2,end=0.8) +
  scale_y_reverse(breaks=seq(1,max(extract$line,1)),
                  labels=seq(1,max(extract$line,1))) +
  scale_x_continuous(limits=c(0,window_size),
                     breaks=seq(0,window_size,10000),
                     label=seq(0,window_size/1000,10),
                     oob = scales::oob_keep) 

We’ll also need a line to guide our eyes along the spiral. We make a grid of points that we can plot as a geom_line(). We now recreate our plot using the updated dataframe and position the turns on the line_polar variable:


spiral <- expand.grid(seconds = seq(0,window_size,1000), line = 1:max(extract$line))
spiral$line_polar <- spiral$line+(spiral$seconds/window_size)

plot.polar <- plot.base +
  coord_polar() +
  geom_rect(aes(xmin=begin0,xmax=end0,ymin=line_polar-1+participant_int/3-0.2,ymax=line_polar-1+participant_int/3+0.2),
            linewidth=0,colour=NA,fill="lightgrey") +
  geom_line(data=spiral,aes(x=seconds,y=line_polar-0.5,group=line),
            color="darkgrey",linewidth=0.5) +
  geom_point(data=. %>% filter(nwords == 1, topturn == 1),
             aes(x=begin0+200,fill=rank,y=line_polar-1+participant_int/3),colour="white",size=2,shape=21,stroke=1)

plot.polar

Cool! It starts to look like a spiral. But wait — now perhaps the time dimension doesn’t make a lot of sense anymore: it starts at the top and rolls… inward? It would be nicer if things moved outward, giving a sense of conversation unrolling. That’s as easy as un-reversing the y-axis (which, you recall, we reversed because in plot.simple_grid we wanted it to run from top to bottom).

We get a warning here for setting the y-axis for a second time but that’s okay.


plot.polar + scale_y_continuous() 
#> Scale for y is already present.
#> Adding another scale for y, which will replace the existing scale.

Bonus plot. The line_polar trick we use to weld line ends to next line beginnings is easy to see if we unroll this plot. Commenting out coord_polar() gives us:


plot.base + 
  #  coord_polar() +
  theme(axis.text.y=element_text()) +
  geom_rect(aes(xmin=begin0,xmax=end0,ymin=line_polar-0.5+participant_int/3-0.2,ymax=line_polar-0.5+participant_int/3+0.2),
            linewidth=0,colour=NA,fill="lightgrey") +
  geom_line(data=spiral,aes(x=seconds,y=line_polar,group=line),
            color="darkgrey",linewidth=0.5) +
  geom_point(data=. %>% filter(nwords == 1, topturn == 1),
             aes(x=begin0+200,fill=rank,y=line_polar-0.5+participant_int/3),colour="white",size=2,shape=21,stroke=1)

That’s it for today. Note that none of these experimental plots are proposed as serious scientific visualizations. In particular, the polar plot has the obvious drawback of deforming time. So really the point of this blog post is just to test a new workflow for blogging straight from R, which will be useful if I have more plots and code to share. For more information on conversational structure and its importance for NLP, linguistics and the cognitive sciences, check out Elementary Particles of Conversation. Thanks for reading!

Malinowski (1922) on Large Language Models

It’s easy to forget amidst a rising tide of synthetic text, but language is not actually about strings of words, and language scientists would do well not to chain themselves to models that presume so. For apt and timely commentary we turn to Bronislaw Malinowski who wrote:

there is a series of phenomena of great importance which cannot possibly be recorded by questioning or computing documents, but have to be observed in their full actuality. Let us call them the imponderabilia of actual life.

In follow-up work, Malinowski has critiqued the unexamined use of decontextualised strings of words as a proxy for Meaning:

To define Meaning, to explain the essential grammatical and lexical characters of language on the material furnished by the study of [written records], is nothing short of preposterous in the light of our argument. Yet it would be hardly an exaggeration to say that 99 per cent of all linguistic work has been inspired by the study of dead languages or at best of written records torn completely out of any context of situation.

Malinowski did not write this on his substack, in an op-ed in the New York Times, or in a preprint. He spent time doing primary fieldwork, lived with people whose language he learned, and based on close observation of language in everyday use came to an informed critique of his contemporaries’ extreme reliance on strings of text.

He did all this over a century ago, and yet here we are, running in circles around stochastic text generators or text regurgitators, as we may call the LLMs that today excel in next token prediction. Makes me think of something Wittgenstein wrote in another context, for a similar problem: “A picture held us captive. And we could not get outside it, for it lay in our language and language seemed to repeat it to us.”

  • Malinowski, B. (1922). Argonauts Of The Western Pacific. London: Routledge & Kegan Paul.
  • Malinowski, B. (1923). The problem of meaning in [underdescribed*] languages. In C. K. Ogden & Richards (Eds.), The meaning of meaning (pp. 296–336). London: Kegan Paul.

* I write [underdescribed] where Malinowski had ‘primitive’ to draw attention to the following: Malinowski wrote at a time when scientific racism meant that “modern” or “civilized” languages were habitually contrasted with “primitive” or “savage” ones — even as his own work helped demolish that distinction and showed the primacy of language use in everyday life across societies.

Update July 2023: If, despite this, you’re interested in “Large Language Models”, we have some relevant work for you: Opening up ChatGPT (and accompanying paper, and blog post).

Mindblowing dissertations

We don’t generally see PhD dissertations as an exciting genre to read, and that is wholly our loss. As the publishing landscape of academia is fast being homogenised, the thesis is one of the last places where we have a chance to see the unalloyed brilliance of up and coming researchers. Let me show you using three examples of remarkable theses I have come across in the past years.

Unflattening by Nick Sousanis

I didn’t even know it was possible to do a PhD dissertation in graphic novel style. And yet here we are. This is a mindblowing work that (my colleagues can attest) I keep raving about. From the back blurb:

Nick Sousanis defies conventional forms of scholarly discourse to offer readers both a stunning work of graphic art and a serious inquiry into the ways humans construct knowledge. Unflattening is an insurrection against the fixed viewpoint. Weaving together diverse ways of seeing drawn from science, philosophy, art, literature, and mythology, it uses the collage-like capacity of comics to show that perception is always an active process of incorporating and reevaluating different vantage points.

The title, “Unflattening”, reads as a reference to the famous mathematical allegory by Abbott, Flatland: the two-dimensional world where one day A. Square is visited by a being from another dimension. Just as A. Square learns to “unflatten” his world as he gets to know the three-dimensional Sphere visiting him, so the readers of Sousanis’ work are invited to venture out of their conceptual comfort zones and explore a multiplicity of perspectives.

(Update: Sousanis notes: “I came up with the word Unflattening early on, and in my planning, figured I should use Flatland – originally at the end, but moved up to front…”)

The published version of this work is a book to have lying around on your desk. It’s always a pleasure to leaf through and there’s always something new to discover.

The equidistribution of … number fields, by Piper Harron

The full title of this work is “The Equidistribution of Lattice Shapes of Rings of Integers of Cubic, Quartic, and Quintic Number Fields: an Artist’s Rendering”, and if this makes you glaze over, you should skip right to the prologue where it says:

Respected research math is dominated by men of a certain attitude. Even allowing for individual variation, there is still a tendency towards an oppressive atmosphere, which is carefully maintained and even championed by those who find it conducive to success. As any good grad student would do, I tried to fit in, mathematically. I absorbed the atmosphere and took attitudes to heart. I was miserable, and on the verge of failure. The problem was not individuals, but a system of self-preservation that, from the outside, feels like a long string of betrayals, some big, some small, perpetrated by your only support system. When I physically removed myself from the situation, I did not know where I was or what to do. First thought: FREEDOM!!!! Second thought: but what about the others like me, who don’t do math the “right way” but could still greatly contribute to the community? I combined those two thoughts and started from zero on my thesis. What resulted was a thesis written for those who do not feel that they are encouraged to be themselves

Harron 2016:1

The whole thing is written in an irreverent, cool and clear writing style, smoothly segueing from “laysplaining” sections aimed at a general audience to more technical parts aimed at mathematicians. No wonder that Harron’s thesis went viral, winning coverage on Mathbabe, Scientific American and even getting its own Wikipedia article. Even for readers without the requisite background in mathematics, this thesis is a wonder to behold and a joy to read.

Swarmpunk by Jonny Saunders

Swarmpunk! The name alone is a reason to check it out — and be sure to have a look at the corresponding website. There are many things to admire about this thesis. Content-wise, it is hard to summarise (the mark of a good thesis to my mind). Here’s the author’s own attempt:

Drawing from decades of digital infrastructural history within and beyond science, I will sketch a path by which we might build systems that empower, rather than control us. I will argue a better future for science is not utopian, nor solely dependent on funding and administrative agencies, but something we can organize ourselves. Woven together with my work in ill-defined phonetic categories and distributed experimental systems, I have written a love letter to the power of swarms: how by embracing heterogeneity and rough consensus we might make science more boisterous, creative, and human.

I love the wide-ranging nature of this work all the more because the swarm metaphor does pull things together in a compelling way. Who would have thought that the neuroscience of phonetic categorization in mice could come together with a technical description of peer to peer protocols and trenchant critiques of academia’s infrastructural woes. And yet it all makes sense.

The sheer breadth of topics covered evokes Bruno Latour’s sprawling Actor Network Theory framework and also shares a lot of insights with current approaches to collective intelligence (e.g., all intelligence is collective intelligence). What’s also impressive about this thesis is the way it combines a hard-hitting critique of the current state of academic workflows with concrete tools and constructive proposals to make things better. This makes it a truly excellent example of what Ivan Illich calls counterfoil research.

And then I haven’t even discussed the type-setting (in lovely Tufte-style), the anarchist easter eggs (like figure captions that say “taken without permission”) and the sharp-witted writing throughout. Read this work and let it broaden your conceptions of how to build a better academia.

Real life is gnarly

Here’s another thing I love about these three theses: they all show something of the making process. They don’t present a fully polished surface; they have rough edges and that is entirely okay (especially for a thesis — but I think this holds in general for the scientific process).

Sousanis’ Unflattening was revised for publication with Harvard University Press, so I imagine some of the loose ends have been tied up; but it includes, at the end, a set of sketches that document the very beginning of the work, dated to April 14, 2011. Why don’t more theses have this?

First idea map for Unflattening, April 14, 2011 (Sousanis 2015:194)

Meanwhile, for Harron, working on the thesis had to find a place next to other momentous life events. In the Acknowledgements, there is a parenthetical aside: “I would thank the children, but frankly, they’ve been no help.” One reason for this is found on the opening page of Chapter 5: a cartoon of the author working on “THESIS (draft)” followed by a panel with an inset where the author says “OW”, and contractions start…

Saunders’ Swarmpunk moves from work on phonetic categorization in mice to an unfinished chapter of musings on the topic of language games, and from there even more abruptly to work on a software infrastructure for scientific experimentation and collaboration (check out Autopilot). As they explain:

I had intended to finish my dissertation with an experiment that was the next logical conclusion of the mouse model of phonetics, doing longitudinal mesoscopic calcium imaging of auditory cortex as the phonemes were being learned in order to model the changes in network activity. (…) It comes out of chronological order in the spring of 2021, after my work with Autopilot and a covid-induced awakening of the possibility of public engineering with the People’s Ventilator Project[72]. I was restless and not ready to return to basic research while the world was still so broken, and so it was abandoned in favor of the last piece on digital infrastructure. Accordingly, it ends relatively abruptly, without satisfying conclusion. I include it here in its unfinished form, roughly edited, warts and all, as something I intend to pick up perhaps one day when basic science is more possible.

And yet the end product is amazing. Perhaps precisely because of the abrupt transitions, it invites the reader to fill in the gaps, to create those conceptual linkages, to contribute the cognitive work that makes all of science a collective enterprise.

More importantly, it reminds us that we are humans first and foremost, and science is a human enterprise in which the neat plans we may make can be greatly affected by real life events — as they should.

A plea for serendipity

Dissertations like these may be rare. But they exist. And by existing, they show that true creativity is still alive in academia. For me personally, this kind of work strengthens my conviction that as supervisors, we need to be serious about ideas and open-minded about form and content. (Perhaps this is why I have always liked my PhD student Gwilym Lockwood’s mad MS-Paint skills — and the Harron-like irreverent style of the opening pages of his 2017 PhD.)

Today’s PhD projects tend to be considerably less free-form than even 5 or 10 years ago. Today we ask graduate students to write full-blown project proposals, force them to specify their planning in gantt charts, and herd them through a series of checkpoints. I understand why this is done, and that it is helpful to some. But I cannot ignore that it is also a machinery for control, arising from management processes built towards verification, certification, and standardization. It is, at base, a stark form of risk-aversion.

But here is the thing about science: it thrives on serendipity. If all risks had been averted and all processes duly followed, it is unlikely that we would have seen these three mindblowing dissertations. Here’s to an academia with room for serendipity!

Update: turns out Sousanis has a wonderful overview page on alternative forms of scholarship.

References

  • Harron, P. A. (2016). The Equidistribution of Lattice Shapes of Rings of Integers of Cubic, Quartic, and Quintic Number Fields: An Artist’s Rendering (PhD thesis, Princeton University) https://www.theliberatedmathematician.com/thesis/
  • Saunders, J. L. (2022). Swarmpunk: Rough Consensus and Running Code in Brains, Machines and Society (PhD dissertation, University of Oregon) https://jon-e.net/dissertation/
  • Sousanis, N. (2015). Unflattening. Cambridge, Massachusetts: Harvard University Press.

On Bakhtinians

Perhaps only those who haven’t read Bakhtin can call themselves true Bakhtinians: the ideas have to reach you and influence you through a polyphony of other texts and people.

We’ve long since moved elsewhere

NRC wonders whether scientists will keep sharing their work on Twitter, and finds on Twitter no fewer than 7 avidly tweeting scientists who, when asked, confirm they are staying on Twitter.

Twitter has indeed been important for science, but it seems to be journalism above all that is still hooked up to the Twitter drip. The wonderful collective WO in Actie derived part of its clout from social media; all the more surprising, then, that NRC leaves unmentioned that Robeyns’ loyal companions Remco Breuker and Rens Bod completed their twexit long ago. Well-known scientists such as Ionica Smeets, Daniël Lakens, Iris van Rooij, and Marc van Oostendorp — each good for 8k-80k Twitter followers — have also moved over to Mastodon, but the newspaper suddenly could not find them, because, well, they are not on Twitter.

Large parts of tweeting academia have, in short, long since moved elsewhere. They write, podcast, blog, and toot in full freedom and openness, without the nagging feeling of co-sponsoring a pitiful billionaire who seeks validation from the far right. Group blogs such as Neerlandistiek, Stuk Rood Vlees, Astroblogs and Bij Nader Inzien are flourishing. Podcasts abound, and fine initiatives such as Nemo Kennislink, the MuseumJeugdUniversiteit and the IMC Weekendschool bring a wider audience into contact with science and research.

In short, when it comes to channels for knowledge sharing and opinion forming, the media landscape outside the Twitter bubble has not looked as flourishing as it does now for a long time. Come out and play!

Thinking visually with Remarkable

Sketches, visualizations and other forms of externalizing cognition play a prominent role in the work of just about any scientist. It’s why we love using blackboards, whiteboards, notebooks and scraps of paper. Many folks who had the privilege of working with the late Pieter Muysken fondly remember his habit of grabbing any old piece of paper that came to hand, scribbling while talking, then handing it over to you.

Since the summer of 2021 I have owned a Remarkable, and it has become an essential part of my scientific workflow because it seamlessly bridges this physical form of thinking with the digital world of drafts, files and emails. I rarely rave about tools (to each their own, etc.) but this is one of those that has changed my habits for the better in several ways: I’ve been reading more, taking more notes, writing more, and also doodling and sketching more. As a cognitive scientist I would describe it as a distraction-free piece of technology with just the right affordances for powerful forms of extended cognition (it is probably no coincidence that it was recommended to me by fellow traveller Sébastien Lerique, whose interests range from embodied rationality to interaction).

One of the ways in which the Remarkable has changed my workflow and my collaborations is that it is much easier to sketch a basic idea for a visualization and share it digitally. We use this during brainstorms to produce first impressions or visualize hypotheses. Often such a rough sketch then functions as a placeholder in a draft until we’ve made an actual version based on our data.

The above example from a recent paper with Andreas Liesenfeld shows this process: first my rough sketch of what the plot might look like, which fuels our discussion and helps me work out how to transform our source data in R; then a ggplot version I made in R that preserves the key idea and adds some bells and whistles like loess lines and colour.
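For readers curious what that second step can look like, here is a minimal sketch in R (not the actual figure from the paper: the data frame and column names are hypothetical stand-ins), showing the general pattern of layering loess lines and colour onto a basic ggplot:

```r
# Minimal sketch of the step described above: hypothetical data, a basic
# scatterplot, and the "bells and whistles" of loess lines and colour.
library(ggplot2)

d <- data.frame(
  time    = rep(1:50, times = 2),
  measure = c(cumsum(rnorm(50)), cumsum(rnorm(50, mean = 0.2))),
  series  = rep(c("A", "B"), each = 50)
)

ggplot(d, aes(x = time, y = measure, colour = series)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "loess", se = FALSE) +   # the loess lines
  theme_minimal()
```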

I want to credit my collaborator Andreas Liesenfeld for pushing me to do more of this visual-first way of thinking. One of the things Andreas often asks when brainstorming about a new paper is: “okay but what’s the visual?”. Thinking early about compelling visualizations has made our papers more tightly integrated hybrids of text and visuals than they might otherwise have been. For instance, our ACL paper has 7 figures, approximately one to a page, that support the arguments, help organize the flow, and generally make for a nicer reading experience.

Conceptual frameworks

Sketches can also be useful to work out conceptual frameworks. In a recent collaboration with Raphaela Heesen, Marlen Fröhlich, Christine Sievers and Marieke Woensdregt we spent a lot of time talking about ways to characterize various types of communicative “redoings” across species. A key insight was that the variety of terms used in different literatures (e.g. primatology vs. human interaction) could actually be linked by looking more closely at the sequential structure of communicative moves. I sent off a quick Saturday morning doodle to my collaborators, and ultimately we published a polished version of it in our paper on communicative redoings across species (PDF here).

Finally, sketches are useful to express ideas and hypotheses visually even before the data is in. For instance, in current work with Bonnie McLean and Michael Dunn we’re thinking a lot about transmission biases and how they influence cultural evolution over time. Bonnie’s dataset looks at biases and rates of change in how concepts relate to phonemic features. Sketching has helped me express my thinking on this visually, and I can’t wait to see what Bonnie ultimately comes up with. (This visualization is inspired in part by something I read about parallax in Nick Sousanis’ amazing book Unflattening.)

Sketch showing three panels side by side. On the left, a plot showing a time series with a multitude of grey lines in the lower range and a single black line rising above the grey mass to occupy a distinctly higher position on the Y axis.

In the middle, a skewed square with points corresponding to the end points of all the lines in the left panel, suggesting that it is a sliver of the end of the first plot.

On the right, the middle panel turned towards the reader into a square X-Y plot with a mass of grey dots joined by isolines roughly in the middle and a solitary black dot in the top right.

Not a review

This is not a review of the Remarkable — just a reflection on how it’s changed my academic life for the better. Every device has pros and cons. For instance, I don’t particularly love the overpriced stylus (‘Marker Plus’) or how they sell Connect subscriptions for slightly better syncing options — though you should be aware you don’t need a subscription to do any of the things I’ve described in this post. On the other hand, I absolutely do love the litheness of this device, the just-right friction when writing, and the fact that it has no backlight. The design in general strikes me as a perfect embodiment of what philosopher Ivan Illich has called ‘convivial tools’: tech that is sophisticated yet also responsibly limited in ways that support human flourishing. Anyway, there’s a good Remarkable subreddit if you’re in the market for a device like this.

Note. Remarkable has a referral program that gives you a $40 (or equivalent) discount if you use this link to purchase one. If you like the device and keep it, that would also mean I earn $40, which I would use to treat my team to fancy coffee and cakes!

Monetizing uninformation: a prediction

Over two years ago I wrote about the unstoppable tide of uninformation that follows the rise of large language models. With ChatGPT and other models bringing large-scale text generation to the masses, I want to register a dystopian prediction.

Of course OpenAI and other purveyors of stochastic parrots are keeping the receipts of what they generate (perhaps full copies of generated output, perhaps clever forms of watermarking or hashing). They are doing so for two reasons. First, to mitigate the (partly inevitable) problem of information pollution. With the web forming a large part of the training data for large language models, you don’t want these things to feed on their own uninformation. Or at least I hope they’re sensible enough to want to avoid that.

But the second reason is to enable a new form of monetization. Flood the zone with bullshit (or facilitate others doing so), then offer paid services to detect said bullshit. (I use bullshit as a technical term for text produced without commitment to truth values; see Frankfurt 2009.) It’s guaranteed to work because as I wrote, the market forces are in place and they will be relentless.

Universities will pay for it to check student essays, as certification is more important than education. Large publishers will likely want it as part of their plagiarism checks. Communication agencies will want to claim they offer certified original human-curated content (while making extra money with a cheaper tier of LLM-supported services, undercutting their own business). Google and other behemoths with an interest in high quality information will have to pay to keep their search indexes relatively LLM-free and fight the inevitable rise of search engine optimized uninformation.

I believe we should resist this, and hold the makers of LLMs directly accountable for the problems they create instead of letting them hold us ransom and extract money from us for ‘solutions’. Creating a problem and then selling the solution is one of the oldest rackets in corporate profiteering, and we should not fall for it. Not to mention the fact that generic AI detection tools (an upcoming market) will inevitably create false positives and generate additional layers of confusion and bureaucratic tangles for us all.

Meanwhile, academics will be antiquarian dealers in that scarce good: human-curated information, slowly and painstakingly produced. My hope is that they will devote at least some of their time to what Ivan Illich called counterfoil research:

Present research is overwhelmingly concentrated in two directions: research and development for breakthroughs to the better production of better wares and general systems analysis concerned with protecting [hu]man[ity] for further consumption. Future research ought to lead in the opposite direction; let us call it counterfoil research. Counterfoil research also has two major tasks: to provide guidelines for detecting the incipient stages of murderous logic in a tool; and to devise tools and tool-systems that optimize the balance of life, thereby maximizing liberty for all.

Illich, Tools for Conviviality, p. 92

  • Frankfurt, H. G. (2009). On Bullshit. In On Bullshit. Princeton University Press. doi: 10.1515/9781400826537
  • Illich, I. (1973). Tools for conviviality. London: Calder and Boyars.

Talk, tradition, templates: a meta-note on building scientific arguments

Chartres cathedral (Gazette Des Beaux-Arts, 1869)

Reading Suchman’s classic Human-machine reconfigurations: plans and situated actions, I am impressed by her description of David Turnbull’s work on the construction of gothic cathedrals. In brief, the intriguing point is that no blueprints or technical drawings or even sketches are known to have existed for any of the early gothic cathedrals, like that of Chartres. Instead, Turnbull proposes, their construction was massively iterative and interactional, requiring — he says — three main ingredients: “talk, tradition, templates”. This sounds like an account worth reading; indeed perhaps also worth emulating or building on. In the context of the language sciences, an analogue readily suggests itself. Aren’t languages rather like cathedrals — immense, cumulative, complex outcomes of iterative human practice?

Okay nice. At such a point you can go (at least) two ways. You can take the analogy and run with it, taking Turnbull’s nicely alliterative triad and asking, what are “talk, traditions, and templates” for the case of language? It would be a nice enough paper. The benefit would be that you make it recognizably similar and so if the earlier analysis made an impact, perhaps some of its success may rub off on yours. The risk is that you’re buying into a triadic structure devised for a particular rhetorical purpose in the context of one particular scientific project.

Going meta

The second way is to ‘go meta’ and ask, if this triad is a useful device to neatly and mnemonically explain something as complex as gothic cathedrals, what is the kind of rhetorical structure we need to make a point that is as compelling as this (in both form and content) for the domain we are interested in (say, language)? See, and I like that second move a lot more. Because you’ve learnt from someone else’s work, but on a fairly abstract level, without necessarily reifying the particular distinctions or terms they brought to bear on their phenomenon.

While writing these notes I realise that in my reading and reviewing practice, I also tend to judge scientific work on these grounds (among others). Does it work with (‘apply’) reified distinctions in an unexamined way, or does it go a level up and truly build on others’ work? Does it treat citations perfunctorily and take frameworks as given, or does it reveal deep reading and critical engagement with the subject matter? The second approach, to me, is not only more interesting — it is also more likely to be novel, to hold water, to make a real contribution.

Further reading

  • Gould, S. J. (1997). The exaptive excellence of spandrels as a term and prototype. Proceedings of the National Academy of Sciences, 94(20), 10750–10755. doi: 10.1073/pnas.94.20.10750
  • Suchman, L. A. (2007). Human-machine reconfigurations: Plans and situated actions (2nd ed). Cambridge ; New York: Cambridge University Press.
  • Turnbull, D. (1993). The Ad Hoc Collective Work of Building Gothic Cathedrals with Templates, String, and Geometry. Science, Technology, & Human Values, 18(3), 315–340. doi: 10.1177/016224399301800304