10.6.2 Calculating lexical load

You may wish to use an unadulterated authentic written or spoken text to present new vocabulary. In this case, it is useful to gauge how difficult comprehension of a text as a whole is likely to be. We know, from vocabulary frequency lists drawn from a variety of corpora, that knowing even fifty of the most frequent words in a language will get you quite far in decoding a spoken text, but for a written one it has been estimated that around five thousand word families need to be known (Nation and Waring, 1997).

There are two useful formulae for calculating the vocabulary load of a given text (Laufer and Nation, 1995). In order to use these formulae, it is necessary to understand the difference between the terms type and token, and between content words and function words. To distinguish between type and token, look again at the paragraph above this one, specifically the passage beginning 'In this case …' and ending '… need to be known'. If you count all the words in that passage, you will arrive at a total of 74 words, or tokens. However, some of these words occur more than once: 'vocabulary' occurs once, and 'a' occurs six times, for example. So the word 'vocabulary' represents one token and one type, 'a' represents six tokens but one type, and so on. As for content words, these are words that carry meaning, whereas function words are the grammatical words that glue the content words together into coherent sentences (for written text) or utterances (for spoken text). In the passage you have just been studying, 'vocabulary' is a content word and 'a' is a function word.

One way of gauging a text's difficulty is to see how many different words need to be known in order to understand it, and one useful formula for this is called lexical variation. It is calculated by dividing the number of types by the number of tokens and multiplying by 100.
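If you want to automate this calculation, the short Python sketch below shows one way of doing it. The sample sentence and the crude whitespace-and-punctuation tokenisation are purely illustrative assumptions, not part of Laufer and Nation's procedure; a real analysis would use a proper tokeniser.

    import string

    def lexical_variation(text):
        """Return (types / tokens) * 100 for the given text."""
        # Crude tokenisation: split on whitespace, strip surrounding
        # punctuation and lower-case, so 'The' and 'the' count as one type.
        tokens = [w.strip(string.punctuation).lower() for w in text.split()]
        tokens = [t for t in tokens if t]   # drop anything left empty
        types = set(tokens)                 # the distinct word forms
        return len(types) / len(tokens) * 100

    sample = "the cat sat on the mat and the dog sat on the rug"
    print(round(lexical_variation(sample), 1))  # 8 types / 13 tokens = 61.5

Here the sample sentence contains 13 tokens but only 8 types, giving a lexical variation of about 61.5.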

Another measure of a text's difficulty is the proportion of content words it contains; this proportion is known as lexical density. Texts with a high proportion of content words are said to be lexically dense. The formula for calculating lexical density is to divide the number of content words by the total number of tokens and multiply by 100. It should be noted, however, that type-token ratios such as lexical variation generally go down as texts get longer, so it is important to compare texts of similar lengths.
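The same approach extends to lexical density, provided the function words can be identified. In the Python sketch below, FUNCTION_WORDS is a tiny illustrative sample of English grammatical words that I have assumed for the example, not a complete list; in practice you would need a full stoplist.

    import string

    # A tiny illustrative sample of function words; a real analysis
    # would need a complete list of grammatical words.
    FUNCTION_WORDS = {"the", "a", "an", "and", "on", "in", "of", "to", "it", "is"}

    def lexical_density(text):
        """Return (content words / total tokens) * 100 for the given text."""
        tokens = [w.strip(string.punctuation).lower() for w in text.split()]
        tokens = [t for t in tokens if t]
        content = [t for t in tokens if t not in FUNCTION_WORDS]
        return len(content) / len(tokens) * 100

    sample = "the cat sat on the mat and the dog sat on the rug"
    print(round(lexical_density(sample), 1))  # 6 content words / 13 tokens = 46.2

In the sample sentence, 6 of the 13 tokens ('cat', 'sat', 'mat', 'dog', 'sat' and 'rug') are content words, giving a lexical density of about 46.2.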

For easier alternatives, Paul Nation's VocabProfile is available free from the author (Nation, 2003, 2004); it will assess the difficulty of a text and simplify it for you, and it will also give the frequency of every word in the text. The tools menu in Microsoft Word will also give you basic text statistics.

Activity 19

Even if French is not one of your languages, you will probably be able to do this activity, as it involves a routine business communication event. Do you consider the following extract to be authentic? Which vocabulary items would you choose to present from this extract? How would you teach them?

A : Établissements Desnos, bonjour.

B : Bonjour, Mademoiselle, Laurence Bellamy de la société Haut-Brane. Pourriez-vous nous envoyer votre nouveau catalogue avec le tarif correspondant ?

A : Veuillez ne pas quitter, Madame, je vous passe le service commercial.

C : Allô ! Oui, j'écoute.

B : Laurence Bellamy de la société Haut-Brane. Nous serions intéressés par vos produits. Vous serait-il possible de nous adresser votre catalogue ainsi que votre tarif ?

C : Bien sûr. Pouvez-vous me laisser vos coordonnées ?

B : Société Haut-Brane, 35, rue Jourdan, 33020 Bordeaux Cedex.

C : Voilà, c'est noté …

B : Je vous remercie. Au revoir, Monsieur.

C : Je vous en prie. Au revoir, Madame.

(From Danilo and Penfornis, 1993: 30)

Click on 'Commentary' for feedback on this activity.