Investigating quantity and quality of vocabulary input in language immersion programs

In language revitalization efforts, school-based “immersion programs” are considered the best type of program. However, what needs to be monitored is not just how people avoid use of other languages in the classroom (i.e., avoid translanguaging), but also the quantity and quality of input in the target language. This issue was recently investigated by two veteran applied linguists, William O’Grady in Hawai’i and Jeanette King in New Zealand, along with their colleagues Raina Heaton and Sharon Bulalang, with regard to the languages Kaqchikel (Maya) in Guatemala, Western Subanon in the Philippines, and Māori in New Zealand. For each learning situation, the group uncovered previously unknown facts about (1) number of words heard per hour in immersion programs, (2) variety of vocabulary, and (3) distribution of words from most to least common. They compare these findings to immersion in a home setting and give recommendations for pedagogical strategies in the classroom. Their accessibly written and open access article also offers many findings and pedagogical implications that are relevant to second language immersion programs for young children.

O’Grady, W., Heaton, R., Bulalang, S., & King, J. (2021). The role of input in language revitalization: The case of lexical development. Language Documentation & Conservation, 15, 433-457.

O’Grady, Heaton, Bulalang, and King begin by calling attention to “what is arguably the single most important factor for linguistic development: the availability of ample high-quality ‘input’ in the form of speech” (p. 434). Because this is partly under a teacher’s control, it can enhance the opportunity for successful language learning. They were therefore curious to study “teacher talk” in three language revitalization programs in terms of quantity (number of words learners hear in the language) and diversity (variety of words, or number of different words that they hear). 

Studies of “lexical development” (vocabulary development) have historically focused on quantity of input, but more recently they have also examined diversity of input. A footnote in the introduction explains that vocabulary size is an important aspect of language development, and it also relates to syntactic development (the ability to form sentences) and cognitive development (p. 435). That is, if you know different parts of speech, you can probably create a greater variety of sentences, using a word as a noun on one occasion, and its verb form the next… and the more words you know, the more concepts you know as well. 

Hoff et al. (2012) investigated the development of Spanish and English in 47 children in the U.S. at ages 1;10, 2;1, and 2;6 (linguist shorthand for 1 year 10 months, 2 years 1 month, and 2 years 6 months); interestingly, they found that the children who had balanced exposure to English and Spanish were in the middle range for vocabulary size. Those whose input had a greater percentage of English had the smallest vocabulary size, and those whose input had a greater percentage of Spanish had the BIGGEST vocabulary size. This makes sense: if you are largely exposed to English, the dominant societal language, there is no need to learn the other; if you are exposed to the other language at home (and the more of it the better), you will ALSO learn English eventually. (Kudos to my husband’s Cantonese-speaking friend schooling a small child in Edmonton, Canada during the COVID-19 quarantine, who figured this out without any applied linguistics background.)

Rowe (2012) found that “whereas the sheer quantity of words that a child hears is important in the second year of life, diversity and sophistication of vocabulary become the better predictors of lexical growth in the third and fourth years” (p. 437). This highlights that caregiver speech is essential to vocabulary growth, and so this study uses it in comparison to teacher speech in immersion programs for endangered languages, with relevance for school language policy, curriculum design, and pedagogy.

Next, O’Grady et al. discuss the three different language revitalization contexts they studied.

Similarities and differences between 3 language revitalization situations

In the first two language revitalization situations, there was a “classic immersion” experience for children who initially had limited to no proficiency in the endangered language.

In the Nimaläj Kaqchikel Amaq’ program at an elementary school in Chimaltenango, Guatemala, children are taught different subjects (math, art, computers, physical education, and language arts) in Kaqchikel for two hours per day, and are encouraged to use the language during recess and lunch. All other instruction and activities are in Spanish. Kaqchikel is classified as “Vulnerable” in the Catalog of Endangered Languages. In the cities, it has not been used for at least two generations (Heaton & Xoyón, 2016), but in rural areas it is still the primary means of communication. In this school, the researchers focused on the grades 1-4 math classes taught by a male teacher in his early 20s, a native speaker of Kaqchikel and Spanish. Nearly all students at the school came from impoverished situations. They were all ethnically Kaqchikel but not from Kaqchikel-speaking homes, living in a city of 100,000.

In a total Māori immersion school in Christchurch, South Island, New Zealand, all instruction took place in Māori. About 21% of the ethnically Māori population in New Zealand can speak Māori, and about 40% of those over 65 can speak it fluently. “Domains of use include traditional temples (marae), churches, language nests, immersion schools, radio, and television” (p. 438). The participants at this site were 25 students in year 5 of elementary school, aged 9 and 10, many of whom had been in Māori immersion since age 5, and half of whom received “semi-regular exposure to Māori in the home or community” (p. 439). Many came from low socioeconomic status (SES) backgrounds. The teacher was a second language speaker of Māori in her mid-30s; her first language was English.

The third program differed in the sense that the students already spoke the language, but it was being used less and less in its traditional territory. 

In the Malayal Community School in Zamboanga del Norte, Mindanao, Philippines, an immersion program used Western Subanon to teach language arts, history, math, music, art, physical education, and values education, and as a medium to learn English and Filipino (Tagalog), the national language. Dominant languages in the region include Tausug, Chavacano, and Cebuano (the lingua franca of the Visayas and Mindanao), making Subanon about five levels down in a diglossic hierarchy. The participants were 35 first grade students and 36 second grade students whose parents spoke to them in Subanon at home, and family members were also fluent to varying degrees in Cebuano and Chavacano. Students came from impoverished situations. Both the first and second grade teachers were Malayal natives who were native speakers of Western Subanon; they taught all subject areas in their classes.

In the methods section, the researchers describe how they compared teacher talk in these classrooms with the extensive published data on child-directed speech in English-speaking families in the U.S. who spoke no other language. They state:

We do not believe that immersion programs can be expected to replicate the conditions under which family-based first language acquisition takes place. Nonetheless, acquisition of a first language in a monolingual family provides a useful baseline in its own right since it represents the one setting in which language acquisition is invariably successful. As we will see, comparisons with this setting not only prove to be helpful but at times also yield pleasantly surprising results. (p. 440)

The data were collected in 2016 and 2017, and many things may have changed in each setting since then—language policy, curriculum or teachers. The researchers state that the goal “is not to evaluate or pass judgement on any of the programs, but only to contribute to a better understanding of the workings of school-based language revitalization and to make available information that might be of use to other programs” (p. 440).


The researchers had the four teachers wear recording devices to gather speech samples. They did not record children’s speech, as their goal was to focus on input (and besides, gathering data from students would have required a much more complex research protocol and a far larger set of permissions). Of course, children learn a language through conversation, including with other students, but in this research, few children were that fluent in the target language (except in the Western Subanon program).

The teachers’ talk was “transcribed by fluent native speakers according to standard orthographic conventions for the language” (p. 440). (I notice that the transcribers had to be both fluent AND native, two entirely different things that I define in this other post.) The transcriptions were tagged for content words (e.g., “noun,” “verb”) and other information. All tagging was done with time stamps in ELAN, a computer program used to annotate audio- or video-recorded linguistic data for the purposes of language documentation.

From these tagged transcriptions, the researchers took basic measurements like total number of words and total number of unique words, which ELAN can search for and export, similar to software that is designed for corpus research. The researchers defined “unique word” as a root word plus any associated affixes—that is, “eat,” “eating,” and “eaten” count as one unique word. Remember that the study purpose was to measure quantity and quality (diversity) of vocabulary, but the researchers promise that “a follow-up paper… will report on the occurrence and distribution of morphosyntactic [grammatical] features with a view to assessing the extent to which the input is rich enough to support grammatical development” (p. 441). Given how accessible this study was to read, as it was likely written so that school staff (and researchers in other areas like me) could understand it, I can’t wait to read that one too.
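To make the two basic measurements concrete, here is a minimal sketch in Python of counting total words and unique words when inflected forms are grouped under one root. It is only an illustration, not the researchers' procedure (they worked from tagged transcriptions in ELAN). The `ROOTS` table and the function name `count_words` are hypothetical, and the sample sentence is invented.

```python
# Hypothetical lookup from inflected forms to a shared root.
# In a real project the roots would come from the annotated transcript.
ROOTS = {"eats": "eat", "eating": "eat", "eaten": "eat"}


def count_words(transcript: str) -> tuple[int, int]:
    """Return (total words, unique words), grouping inflected forms by root."""
    tokens = transcript.lower().split()
    # Replace each token by its root when one is listed; otherwise keep it.
    roots = [ROOTS.get(token, token) for token in tokens]
    return len(roots), len(set(roots))


total, unique = count_words("we eat she eats they have eaten we are eating")
print(total, unique)  # 10 words in total, 6 unique words
```

Here “eat,” “eats,” “eaten,” and “eating” are counted as four words heard but only one unique word, which is the definition described above.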


1. Speech Time. Inevitably, the schools varied in amount of data they provided due to timetabling. Table 1 shows the total speaking time of each teacher, relative to the total amount of time students spent in all classes at school. In other words, the students were immersed in the target language at school 13% to 26% of the time, which highlights that immersion is almost always a program, but it largely doesn’t disrupt the wider schooling in the dominant societal language(s) (e.g., Spanish, English, or Cebuano/Filipino).

O’Grady et al. (2021), p. 442

That said, we would also expect to see the same proportions of language use opportunities for Kaqchikel, Māori and Western Subanon in the natural world outside of school; in fact, “the amount of speech heard in an immersion classroom on a per-hour basis may well be greater than what would be encountered in a noneducational context” (p. 442)… where people communicate in whatever languages are most convenient, and these are unlikely to be the endangered ones.

2. Number of Words. In terms of words per contact hour, here were the findings:

O’Grady et al. (2021), p. 443

The lower number for Kaqchikel might be due to one particular subject, Math, being surveyed. These numbers are all similar to those found in studies of caregiver speech in English-only speaking families in the United States:

O’Grady et al. (2021), p. 443

Therefore, the similarity across immersion classrooms AND home immersion in terms of input quantity is a pleasant surprise… i.e., at least quantity-wise, the school setting is meeting goals for vocabulary input, measured as sheer number of words per hour in the target language that are heard by the learners.

3. Lexical Diversity (Word Variety). This is where the teacher who spoke the taught language as a second language performed differently from the other three teachers, who were first language speakers of the languages they taught. Even though the researchers never explicitly pointed this out, it was apparent in the numerical data.

O’Grady et al. (2021), p. 444

However, in a home immersion setting for English monolingual children, the token-to-type ratio (i.e., number of words versus number of unique words) was estimated to be 8.78 to 1 (Montag et al., 2018, p. 378), and hence these classrooms ALL face a challenge in terms of lexical diversity. This is related to the fourth and last finding, with regard to distribution of words.
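For readers who want to see how a token-to-type ratio is computed, here is a small Python sketch. The function name and the counts in the example are invented for illustration and are not taken from the article's tables.

```python
def token_type_ratio(total_words: int, unique_words: int) -> float:
    """Number of words heard per unique word (tokens divided by types)."""
    if unique_words <= 0:
        raise ValueError("unique_words must be positive")
    return total_words / unique_words


# Invented counts: 8,780 words heard, of which 1,000 are unique.
print(token_type_ratio(8780, 1000))
```

Arithmetically, a higher ratio means each unique word is repeated more often on average, so the same amount of talk contains less variety. Keep in mind that this ratio also tends to grow with the size of the sample, so ratios from samples of very different lengths are hard to compare directly.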

4. Distribution of Words. Vocabulary researchers are well acquainted with Zipf’s Law, which holds across languages. According to this law, the words used in natural speech are heavily skewed with respect to their frequency. The second most frequent word in a language is used only about half as often as the most frequent word, the third about a third as often as the most frequent word, the fourth about a quarter as often, and so on.

O’Grady et al. (2021), p. 446

According to the researchers, “This cannot but affect the opportunities for learners to extend their vocabulary. The fact that a very large proportion of the words in a language are encountered very infrequently means that learners will have only fleeting exposure to most lexical items” (p. 446). Note the similarity between the “ideal” Zipfian curve (in theory) and the Zipfian curve for teacher talk found for the four teachers’ talk in three languages:

Thus, no matter what the language, natural talk has this universal pattern. In fact, every word after the 20 or so most frequent words will be in <1% of the corpus, no matter what the language or how big or small the sample is… as long as we’re talking about natural everyday conversation. This is why we DON’T want classroom talk to be 100% like natural conversation in a language learning classroom in a society where students have limited exposure to the language outside of class. With no deliberate shaping of input, varied vocabulary learning is hardly facilitated.
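To get a feel for the shape of this curve, here is a rough Python sketch of the idealized form of Zipf's Law, in which a word's frequency is proportional to 1 divided by its rank. The vocabulary size of 1,000 is an arbitrary assumption for illustration, and real speech only approximates this curve.

```python
def zipf_shares(vocab_size: int) -> list[float]:
    """Share of all words spoken taken by each rank in an idealized Zipfian vocabulary."""
    # Weight of rank r is 1/r; dividing by the sum turns weights into shares.
    weights = [1 / rank for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [weight / total for weight in weights]


shares = zipf_shares(1000)  # assumed vocabulary size, for illustration only
for rank in (1, 2, 3, 4, 20, 100):
    print(f"rank {rank}: {shares[rank - 1]:.2%} of all words spoken")
```

In this idealized model, the word at rank 2 gets half the share of the word at rank 1, and by rank 20 each word already accounts for well under 1% of everything said.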

Another common finding across the 3 languages in this study was that fewer than half of the 100 most frequent words in each teacher talk sample were “content words” (words like nouns, verbs, adjectives, and adverbs). That means more than half of the top 100 most frequent words used in each language classroom were common function words like “a” and “on” and “but.”

O’Grady et al. (2021), p. 448

Finally, when the researchers calculated the percentage of words the teachers used that were “rare” (defined as appearing less than 3 times, that is only once or twice, in the corpus for that language program), the teacher who was a second language speaker of the endangered language again performed differently from the others, this time with regard to what percentage of her words were that rarer kind, even though she taught entirely in Māori:

O’Grady et al. (2021), p. 449

However, the researchers downplayed this finding, simply stating: “the proportion of infrequent words was similar across the corpora, with ~40%-60% of all unique words appearing fewer than three times in the speech recorded for each language” (p. 449). In fact it doesn’t matter, as the same challenges—regarding Zipf’s Law, content words being a minority, and most of them not being heard more than once or twice in the classroom—undoubtedly exist for all three language learning settings. The point is not whether the teacher is a native speaker, because even if they were, these problems would still be there because of Zipf’s Law. The point is how deliberate pedagogy and program design can “defeat” Zipf’s Law. (In other words, if you count on natural immersion without any deliberate program design, native speaker teachers will perform better than non-native speaker teachers in the narrow area of lexical diversity in spontaneous oral speech, but it won’t really make a difference in terms of students’ learning… due to Zipf’s Law.)
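The “rare word” measure described above (the share of unique words that appear fewer than three times) can be sketched in a few lines of Python. The function name and the English sample are hypothetical. The sketch also treats every distinct spelling as a unique word, which is a simplification of the root-based definition used in the study.

```python
from collections import Counter


def rare_word_share(words: list[str], threshold: int = 3) -> float:
    """Fraction of unique words that occur fewer than `threshold` times."""
    counts = Counter(words)
    if not counts:
        raise ValueError("words must not be empty")
    rare = sum(1 for count in counts.values() if count < threshold)
    return rare / len(counts)


sample = "the cat sat on the mat and the dog sat on the rug by the door".split()
print(rare_word_share(sample))  # 9 of the 10 unique words occur fewer than 3 times
```

Even in this tiny invented sample, only “the” is heard three or more times, and every other word is heard once or twice.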

The researchers conclude with four suggestions for deliberate pedagogy and program design, and these recommendations are common to first/heritage language revitalization programs AND second language immersion programs.

Discussion and implications

Recall that in terms of lexical quantity, the school-based language immersion programs were similar to home immersion in terms of number of words students heard per hour in the target language. In terms of lexical quality, immersion programs fell short on word diversity.

But even quantity-wise, all things cannot be considered equal because 10,000 words per hour, 15 hours per week, is not the same as growing up with the language in and out of school, at 10,000 words per hour, 90 hours per week. If Zipf’s Law shows that all but the most frequent words in a language occur <1% of the time in everyday speech, foreign language settings or language revitalization settings will not give people much opportunity to learn words beyond the most common ones. There is an important quote from Harris et al. (2011, pp. 51, 57) in the discussion section of the article:

first exposure to a word yields no more than ‘a cursory understanding of word meaning; repeated exposures to a new word in varied contexts, or the provision of definitions to which children can relate, lead to a deeper, more nuanced understanding of word meaning.’ (as cited in O’Grady et al., 2021, p. 450)

A study by Schwartz and Terrell (1983) is also cited; it found that children aged 12-18 months require, on average, 10-12 exposures to a new word to produce it appropriately (this probably means repeated encounters in different contexts and in different modalities). Another study by Childers and Tomasello (2006) found that exposure to new words on different days is more important than just the number of exposures, as the learning presumably goes deeper if there are gaps between encounters. I think this is like “digestion” (but I suppose the gaps should not be too long; that would be forgetting and re-learning).
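The arithmetic behind the quantity comparison above can be worked through in a short Python sketch. The figures of 10,000 words per hour, 15 hours, and 90 hours come from this post. The word that makes up 1 in every 100,000 words spoken is purely an assumption for illustration, as are the function names.

```python
def weekly_words(words_per_hour: int, hours_per_week: int) -> int:
    """Total words heard in the target language per week."""
    return words_per_hour * hours_per_week


def expected_weekly_exposures(words_per_hour: int, hours_per_week: int, one_in: int) -> float:
    """Expected weekly encounters with a word that makes up 1 in `one_in` words spoken."""
    return weekly_words(words_per_hour, hours_per_week) / one_in


RARE_WORD_ONE_IN = 100_000  # assumed rarity, not a figure from the article

for label, hours in (("immersion classroom", 15), ("language in and out of school", 90)):
    total = weekly_words(10_000, hours)
    exposures = expected_weekly_exposures(10_000, hours, RARE_WORD_ONE_IN)
    print(f"{label}: {total:,} words per week, about {exposures} exposures to the rare word")
```

Under these assumptions the classroom learner hears 150,000 words a week and meets the rare word between once and twice, while the child surrounded by the language hears 900,000 words and meets it about nine times. The gap in exposure grows in direct proportion to the hours.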

Similar findings apply to second language learning. Of course, teacher input is not everything: student participation in conversations, especially one-on-one conversations, drives language acquisition quite strongly (Romeo et al., 2018). But in classrooms where students’ proficiency in the target language is still emergent, this may not be feasible. What, then, is feasible? The researchers conclude with four points (pp. 451-452).

Concluding remarks: To be effective, what should language immersion programs do?

It is important to remember that even in the best circumstances, a typical immersion program takes up 25% of a child’s total waking hours. That’s enough to acquire a second or additional language to some degree (Baker, 2014, p. 38; Genesee, 2007). However,

It is clear that the success of immersion programs lies in finding ways to increase the types of exposure and interactions that are available to young language learners, particularly in contexts where the language is not widely spoken in the home and community. The key to this effort, we believe, lies in informed planning, careful implementation, and regular assessment. Four considerations require special attention. (p. 451)

“First, it is important to have at hand a basic lexicon (vocabulary list) of the language.” This can include body parts, common objects in the environment, numbers from zero to 100, days of the week and months of the year, telling the time, words to describe feelings, the weather, plants, animals, shapes, and so on. These simple words are in fact very common across languages and can be translated across the students’ dominant language and the target language with few problems. Wordbank, an online database, has information about early vocabulary in 29 languages.

“Second, it is essential that immersion programs track children’s lexical development.” They can do this in any number of ways, including picture naming tasks, periodic quizzes, etc. A wide range of word types must be introduced, a handful must be deliberately assessed each lesson, and they must be reviewed periodically. For me, the takeaway here is that instead of avoiding translanguaging or policing first language use, immersion programs should direct attention to what students are actually learning in the target language and maximize this.

“A third essential point involves the need to incorporate lexical items into the curriculum in a way that maximizes the chances of acquisition [despite Zipf’s Law]” i.e., “flatten the curve.” I understand this to mean that if you can’t increase the frequency of exposure to 300 rarer words, you can still increase the chances students learn most of them over an academic year by drawing students’ explicit attention to them through quizzes (with recycling/review), reading, journaling, doing role-plays, making skits, singing songs (including those made up by the teacher/class to familiar tunes), giving short speeches, etc.

“A fourth point calling for action involves finding ways to increase the amount of input that children receive over the course of a day.” For example, fluent speakers of the language can be brought to the classroom as guest speakers or conversation buddies. Moreover, literacy materials can help language learning a lot, because they let teachers

pre-plan the choice of words and the contexts in which they are used, and control their distribution and frequency—factors that are virtually impossible to monitor and manage in the case of spontaneous speech (p. 452).
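As one possible way to act on this, here is a minimal Python sketch of checking whether each word on a target list appears often enough across a set of pre-planned texts. Everything in it is hypothetical: the function name, the English example texts, and the minimum of 2 exposures, which is an arbitrary value chosen to keep the example small.

```python
import re
from collections import Counter


def exposure_gaps(texts: list[str], target_words: list[str], min_exposures: int) -> dict[str, int]:
    """Return the target words that appear fewer than `min_exposures` times, with their counts."""
    counts = Counter()
    for text in texts:
        # Simple tokenizer: runs of letters and digits. Orthographies that use
        # apostrophes inside words would need a more careful pattern.
        counts.update(re.findall(r"\w+", text.lower()))
    return {
        word: counts[word.lower()]
        for word in target_words
        if counts[word.lower()] < min_exposures
    }


texts = [
    "The frog sat on a leaf.",
    "A frog and a heron met by the river.",
    "The heron flew over the river.",
]
print(exposure_gaps(texts, ["frog", "heron", "river", "leaf"], min_exposures=2))
```

In this invented example only “leaf” falls short, so a teacher would know to work it into another story or activity. Like the earlier counts, this sketch matches exact spellings only and would need root-based matching for languages with rich affixation.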

Teachers can make picture books (or co-author short nonfiction readings for older children about topics of interest, current events, etc.) using the words in the school word bank, then have fluent community members come in to read and discuss these texts with learners in small groups. Then, there can be oral or written production activities for students (skits, role-plays, journals, etc.) to use the vocabulary. The researchers powerfully conclude:

In sum, educators need to be aware of both what they are doing with language and what the students in the classroom are learning from it. In the case of children acquiring a first language in a monolingual setting, essentially everything can be left to chance. Over the course of time, given a reasonable amount of exposure, those children will acquire their language, including its vocabulary, to a satisfactory level of proficiency. In the case of immersion programs for endangered languages, in contrast, nothing can be left to chance. The stakes are too high, and the risks are too great to do anything other than engage in a careful program of teacher training, curriculum planning, and proficiency assessment. Vocabulary learning offers an ideal opportunity to put this policy into practice. (p. 453)


Baker, C. (2014). A parents’ and teachers’ guide to bilingualism (4th ed). Clarendon, UK: Multilingual Matters.

Childers, J., & Tomasello, M. (2006). Are nouns easier to learn than verbs? Three experimental studies. In K. Hirsh-Pasek & R. Golinkoff (Eds.), Action meets word: How children learn verbs (pp. 311–335). Oxford, UK: Oxford University Press.

Genesee, F. (2007). A short guide to raising children bilingually. Multilingual Living Magazine, 2, 18–21.

Harris, J., Golinkoff, R., & Hirsh-Pasek, K. (2011). Lessons from the crib for the classroom: How children really learn vocabulary. In S. Neuman & D. Dickinson (Eds.), Handbook of early literacy research (pp. 49–65). New York, NY: The Guilford Press.

Heaton, R. & Xoyón, I. (2016). Assessing language acquisition in the Kaqchikel program at Nimaläj Kaqchikel Amaq’. Language Documentation & Conservation, 82, 317–352.

Hoff, E., Core, C., Place, S., Rumiche, R., Señor, M., & Parra, M. (2012). Dual language exposure and early bilingual development. Journal of Child Language, 39, 1–27.

Montag, J., Jones, M., & Smith, L. (2018). Quantity and diversity: Simulating early word learning environments. Cognitive Science, 42, 375–412.

Romeo, R., Leonard, J. A., Robinson, S. T., West, M. R., Mackey, A. P., Rowe, M. L., & Gabrieli, J. D. E. (2018). Beyond the 30-million-word gap: Children’s conversational exposure is associated with language-related brain function. Psychological Science, 29(5), 700–710.

Rowe, M. (2012). A longitudinal investigation of the role of quantity and quality of child-directed speech in vocabulary development. Child Development, 83, 1762–1774.

Schwartz, R. & Terrell, B. (1983). The role of input frequency in lexical acquisition. Journal of Child Language, 10, 57–64.

Published by annamend

Assistant Professor in the Department of Linguistics, University of Illinois at Urbana-Champaign
