When is it fair and valid to compare bi/multilinguals with monolinguals?

This is the question investigated by a team of 19 researchers in Norway, Spain, Germany, the U.K., the Netherlands, and the U.S. (Rothman et al., 2022), who wrote the commentary that I summarize in this post. A commentary is an essay with examples from empirical studies (“empirical studies” being articles with an Introduction, research Methods, Results and Discussion; they’re also known as IMRD studies). Rothman et al. begin by discussing when it may not be scientifically valid to compare bi/multilinguals with monolinguals, though they also explain that in certain cases it is valid to do so. In other words, instead of monolingual native speakers of Spanish/English/Japanese being the norm against which bi/multilingual learners/users of that language are compared, the “monolingual comparison” should be justified each time it is made, and psycholinguistic studies do not always need a monolingual control group. Next, Rothman et al. discuss alternative ways of designing studies besides comparing participants to monolingual control groups, using as examples their own studies that compare bi/multilinguals with other bi/multilinguals, and arguing for the benefits of this approach. Their studies focus on heritage bilinguals (people who grew up speaking a language at home that is not the dominant language in their society). I end the post by commenting on how sociolinguistic (ethnographic/qualitative) research can extend these psycholinguistic experimental/quantitative research findings.

The authors (Rothman, Bayram, DeLuca, Di Pisa, Duñabeitia, Gharibi, Hao, Kolb, Kubota, Kupisch, Laméris, Luque, van Osch, Pereira Soares, Prystauka, Tat, Tomić, Voits, and Wulff) say they speak “many languages (including minority, heritage, and majority ones), acquired at different points in life, and some of us also raising multilingual children… we have all worked with diverse populations of both mono- and multilingual speakers” (p. 2). From the beginning of their paper, they describe both monolinguals and bi/multilinguals as diverse: although these two categories exist, there is as much variety within each group as there may be similarity between them, even though people tend to focus on between-group differences. This brings the authors to the question: Should bi/multilinguals be compared to monolinguals?

They give the analogy of medical experiments in which people are randomly assigned to experimental and control groups: 50% of the people are given a drug, and the other 50% are given a placebo, which is basically a pill that does nothing. Researchers then compare outcomes between the two groups. This works well in medicine and other STEM fields, but not so well in linguistic research, because control groups are not perfect (i.e., identical to the experimental group in every way except for the experimental condition). In a drug trial, you can ensure that everyone in both groups is the same age, that no one has underlying health conditions, that fitness levels follow a normal bell curve in both groups, and so on. In bi/multilingual research, by contrast, you cannot randomly assign someone to be a second or heritage language speaker, so when you compare first language speakers to second or heritage language speakers, you always end up comparing two groups that differ because of their natural circumstances, e.g.:

  • L1 Korean speakers in Korea with L2 Korean learners who have limited input in Korean outside of class
  • Heritage speakers of Turkish in Germany who speak Turkish at home mostly and are schooled in German, with L1 speakers of Turkish who have lived their whole lives in Turkey and are schooled in Turkish

And when studies conclude that “these L2/HL speakers don’t perform like L1 speakers with respect to X, Y, and Z,” Rothman et al. state that this “is not terribly interesting or informative” (p. 5). This is not to say that L1-L2 or L1-HL comparisons are always inappropriate. “For example, if one is interested in documenting the (potential) role that crosslinguistic influence has in the development of bilingual grammars in childhood, it could be reasonable to compare a child bilingual group to a monolingual one” (p. 4). Another case is checking whether tests used to diagnose potential language impairments in children, if piloted with monolinguals, would be meaningful for bilinguals (p. 4). Or (my example) suppose you wanted to study the effectiveness of immersion programs: Do Canadian kids in a French immersion program whose L1 is English suffer in terms of their academic knowledge if taught in French? To know the answer, you’d have to compare them to peers taught in English (Swain & Lapkin, 1982).

However, let’s go back to situations where the comparison is neither interesting nor informative. Take, for example, comparing heritage speakers (HSs) of a language, defined by Rothman (2009) as people who speak a language at home that is not the dominant language of the larger national society, with people who have lived their whole lives in the HSs’ country of origin. Given the differences between these HL and L1 speakers in terms of “quantities and qualities of input, opportunities for converting input into intake, the (lack of) opportunities for formal training in the HL, the social milieu and distribution of language use” (p. 6), etc., both the scientific and the practical contributions of such a comparison would be limited, because you don’t always know which of the variables (of which there are quite a few!) is/are accounting for the differences. The same can be said of comparing foreign language learners with native speakers.

So what should researchers do instead, the authors ask, to better isolate the relevant factors that facilitate language acquisition? Compare bi/multilinguals with each other. These people vary in innumerable ways (as do monolingual speakers of the target language). If you study the bi/multilingual population in terms of larger or smaller social networks, degree of literacy and formal education, or patterns of language use in home, school, and work contexts, you will come up with more meaningful (and valid) theorizations, not to mention more equitable comparisons. [Side note: Other things to examine, from a psycholinguistics class I took in university, might be ethnolinguistic vitality (how much a language is used in a community, and in what domains) or family composition (e.g., having multiple children who are L1 speakers of the dominant societal language seems to pressure parents to use that dominant language more than if they had only one child; Bridges & Hoff, 2014).] As Rothman et al. argue, researchers should also be on the lookout for potential overlap between groups. For example, if HL speakers are well educated in their HL, say by majoring in Spanish at a university in Ohio, are they equal to L1 users of Spanish in Mexico?

What have psycholinguistic studies found when bi/multilinguals were compared to each other?

Rothman et al. focus on two studies that compared heritage speakers to other heritage speakers of the same language, rather than to L1 speakers. Each of these two studies isolated an interesting key factor in its findings.

The first study is Bayram et al. (2019). Bayram and colleagues studied German citizens and residents of Germany who were of Turkish descent. They wanted to know how these people constructed passives in Turkish. Basically, since passives are constructed differently in an agglutinating language (Turkish) than in an isolating language (German), they predicted that certain passive constructions in Turkish would be underused by these HL speakers compared to L1 Turkish speakers. They were right when the two groups were compared in aggregate (i.e., combining all the scores of people in the same group). But what about individual variation? Rothman et al. write:

Applying logistic regression analysis, they [Bayram et al.] regressed age at the time of testing, parental background (immigration status of both parents), and exposure to formal Turkish literacy (none, self-taught, less than 3 years in Turkish supplementary school or 5 years or more). As it turned out, only one variable mattered: Turkish literacy scores provided a wonderful fit of the data. (p. 12).

In other words, participants who had enough years of Turkish supplementary school did not underproduce those Turkish passives (though a cursory look at Bayram et al.’s appendix graphs shows that some participants may have overproduced them, a potential side effect of classroom-based language learning). But those who did not get enough formal instruction definitely underproduced them, or may not have known them at all. At the same time, this is not to say that HL speakers should be performing like L1 speakers anyway, because the way Turkish functions in their lives may be different.

The second study is Lloyd-Smith et al. (2020). They had Turkish HL speakers narrate a wordless picture book, Frog, Where Are You?, which is commonly used to study proficiency in many languages.

The participants have to supply the verbal narration in the target language (Turkish in this case). Lloyd-Smith et al. looked at “type-token frequency (TTR),” a ratio-based measure of lexical diversity (how many different words a speaker uses relative to the total number of words), and “clausal morphosyntactic complexity (cMSC),” which appears to be a measure of grammatical complexity at the clause level. The results?

Like the previous study, there was significant variation across the HSs [heritage speakers]. Running the same variables as regressors, there was again systematicity in coverage of individual differences for both measures in Turkish. However, the variable this time was different: In both cases it was parental background. For both TTR and cMSC, it did not matter if one was 8 or 12 at the time of testing, nor whether or not they had any formal training in Turkish but indeed whether or not both one or none of their parents were immigrants themselves (as opposed to 2nd generation HSs) of Turkish. (p. 12; my bold)
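For readers unfamiliar with TTR, here is a minimal sketch of how a type-token ratio is commonly computed: the number of distinct word forms (types) divided by the total number of words (tokens). The tokenization below (lowercasing, splitting on whitespace, stripping punctuation) is my simplifying assumption, not Lloyd-Smith et al.’s procedure, and the function name and sample sentence are hypothetical. Note that in an agglutinating language like Turkish, every differently inflected form of a word would count as a separate type under this naive approach.

```python
def type_token_ratio(text: str) -> float:
    """Return the number of distinct word types divided by total word tokens.

    Assumption (illustrative only): tokens are whitespace-separated words,
    lowercased, with surrounding punctuation stripped. Real studies may
    tokenize, lemmatize, or length-correct differently.
    """
    tokens = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    tokens = [t for t in tokens if t]  # drop tokens that were pure punctuation
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical narration snippet (in English, for readability):
# 10 tokens, 7 distinct types ("the" and "looks" repeat)
sample = "The boy looks for the frog. The dog looks too."
print(type_token_ratio(sample))  # prints 0.7
```

Because raw TTR tends to fall as a narrative gets longer (common words keep repeating), comparisons across speakers who tell stories of very different lengths are usually interpreted with some caution.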

Seeing both studies juxtaposed, I had some mild critiques of this conceptual paper, which I really like, but let’s start with what I agree with in Rothman et al.’s argument:

  • There is much variation within groups (L1 users, L2 users, and HL users of a language), not just between them. Thus, we need to look within groups for variation, and across groups for similarities, rather than just comparing groups. I love this quote from the authors: “Underlying this approach is the hypothesis that individual-level quantities, qualities, usage patterns, and contextual opportunities with language(s)-related experience ultimately predict individual outcomes regardless of -lingualism type.” (p. 10) [See my post on this here.]
  • As Rothman et al. argue, if we compare bi/multilinguals with other bi/multilinguals instead of meaninglessly comparing them to monolinguals, we will see systematicity and patterns in the ways bi/multilinguals differ. Thus, we don’t always need monolingual “control” groups. What is meant by “systematicity” and “patterns”? Categories like “HL speaker” and “FL learner” are associated with stereotypes such as “HL speakers can speak fluently but can’t read/write well, especially in formal registers of the language” and “FL learners know the grammar but can’t converse.” But these stereotypes only reflect each group’s typical patterns of experience: lack of formal education in the HL, or lack of immersion in the FL. An HL speaker who has had formal education in the HL, or an FL user who regularly chats with an extensive online social network, may not fit these patterns.
  • As the authors put it, “There is no question that monolingual-to-bilingual comparisons have been and can continue to be fruitful and theoretically relevant. Yet, in our view, the juxtaposing of monolingualism against bilingualism… has contributed to the sweeping under the rug of (inherent) confounds” (p. 13); for some of these confounding variables, see the previous point.

How can sociolinguistic (ethnographic/qualitative research) extend these findings?

  • Language competencies require early immersion, e.g., intuitive use of articles in English. They are implicit knowledge, acquired naturalistically. This was the finding in Lloyd-Smith et al. (2020), which suggests that HL speaker children have an edge in the everyday grammar and vocabulary tested with Frog, Where Are You? if one (or, better, both?) of their parents are first-generation immigrants. This is worth demonstrating with HL speakers, but it’s well known (even for L1 users and for L2 users in bilingual programs) that childhood immersion gives you basic intuitive grammar knowledge (but not academic sentence structures) and a wide range of everyday vocabulary.
  • Literacy competencies require formal schooling, e.g., use of elaborate relative clauses in academic English writing [I don’t know about other languages because I don’t have academic proficiency in any other language]. Literacy competencies don’t require early immersion, because there are adult L2 users who have them and L1 (native) speakers who don’t: compare, for example, a person who learned academic English as an L2 with a person who has English as an L1 but didn’t finish high school. This was the finding with regard to Turkish in Bayram et al. (2019).
  • Yes, by all means, compare bi/multilinguals to other bi/multilinguals unless the monolingual-bi/multilingual comparison is justified.
  • Investigate what we don’t already know. I appreciate that Rothman et al. added a discussion about the need to study new populations: “a larger array of minoritized individuals with distinct profiles (not mainly ones available in psychology pools at our universities)” (p. 9). In other words, study people other than your own students (“extra credit if you come to my experiment!”) and colleagues (“I’ll treat you to coffee!”).
  • Investigate populations in an interdisciplinary way: quantitative studies can isolate the factors that are important conditions, if not absolute guarantees, while qualitative studies can explain what makes a potential into a reality. Read research outside your own methodological area, but which investigates the same practical area, to make the necessary connections.

In these ways and others, we are more likely to generate research findings about bi/multilingualism that are empirically valid, new in terms of discovery, more societally equitable, and replete with practical implications.


Chen, S.-C. (2020). Language policy and practice in Taiwan in the early twenty-first century. In H. Klöter & M. Söderblom Saarela (Eds.), Language diversity in the Sinophone world: Historical trajectories, language planning, and multilingual practices (pp. 122–141). Routledge.

Jaffe, A. (2003). Talk around text: Literacy practices, cultural identity and authority in a Corsican bilingual classroom. In A. Creese & P. Martin (Eds.), Multilingual classroom ecologies: Inter-relationship, interactions and ideologies (pp. 42–60). Multilingual Matters.

Jia, G. (2008). Heritage language development, maintenance, and attrition among recent Chinese immigrants in New York City. In A. W. He & Y. Xiao (Eds.), Chinese as a heritage language: Fostering rooted world citizenry (pp. 189–203). National Foreign Language Resource Center.

Lloyd-Smith, A., Bayram, F., & Iverson, M. (2020). The effects of heritage language experience on lexical and morphosyntactic outcomes. In F. Bayram (Ed.), Studies in Turkish as a heritage language (pp. 63–86). John Benjamins Publishing. 

Swain, M., & Lapkin, S. (1982). Evaluating bilingual education: A Canadian case study. Multilingual Matters.

Published by annamend

Assistant Professor in the Department of Linguistics, University of Illinois at Urbana-Champaign
