Delphi module

13.1.6.3 A mixed system?

There are a number of ways in which the distinction between norm- and criterion-referencing is less clear-cut than the above might suggest. First, even in large-scale systems of essentially criterion-referenced assessment, such as those run by school awarding bodies or The Open University's language programmes, norm-referencing still plays a role in ensuring consistency from year to year; ie it is expected that, broadly speaking, a comparable proportion of students will fall within each of the major grade bands each year. This acts as a check on standards and on the difficulty of the exam, which may have proved more or less challenging in a particular year. (Yet the gradual inflation of GCSE and 'A' level results in recent years shows that the system still allows for year-on-year improvements in performance.)

Second, criterion-referencing almost always involves an element of ranking too. In universities, for example, most of the categories described by criteria are broad (eg 60-69), so once a student's performance has been allocated to one such category, the marker still has to decide where within it to place the student (is it 62, 65 or 68?). Usually the only practical way to do this is to compare the student's performance with that of another student in the same band: eg 'I gave Smith 62, but this one is a bit better, so 65'. (It may be questioned whether this is meaningful or reliable, and therefore whether it is a sensible use of staff time in both marking and subsequent moderation, but most mark schemes require it, so you are likely to have to do it.)

Universities nowadays have detailed regulations governing the allocation of marks on modules and the profiling of degree classifications, and most module handbooks include criteria grids. Alongside this apparently universal acceptance of criterion-referencing, however, one can detect a kind of informal, parallel norm-referencing. Award eight first-class marks in a group of 15 students, for example, and colleagues are likely to cast doubt on (a) the quality of your assessment task and (b) the soundness of your marking. This is because most tutors retain, alongside their commitment to criterion-referencing, a notional norm or quota of students who should be awarded a first-class mark, a 2.1, and so on. There is no doubt that this cuts across a commitment to criterion-referenced assessment, and that restricting marks in practice to the relatively narrow 35-75 band fails to distinguish adequately between student performances and so distorts results. However, given the current HE assessment culture, the practice does act as a check on grade inflation: if individual markers award too many marks over 70%, students' final profiles or mark averages are pushed upwards and, ultimately, more students are awarded first-class degrees. Conversely, an excessive run of low marks can lead to a higher proportion of failures. University exam boards, in collaboration with external examiners, are charged with ensuring that quality and standards are broadly maintained from one year to the next, but the process starts with the individual tutor marking a group's assignments.

These musings lead on to bigger questions, such as how, in a criterion-referenced system in which successive cohorts of students are supposed to be assessed against the same criteria, the proportion of firsts a department awards can rise over a ten-year period, or how any university can urge its departments to award more firsts. Add to this the strongly held view of many academics in modern languages that both the linguistic demands made on students and the quality of students' written work are lower than they were 20 years ago, throw in the much wider range of ability now found in most language programmes, and one might well wonder how anyone can still talk about maintaining absolute standards in the assessment of language proficiency. (As the earlier reference to rising GCSE and 'A' level results suggests, this applies more generally: it has been shown that students throughout education systems are being awarded higher marks; see Elton, 1998.) Of course, no such claim can be made, but no university is going to admit this in public. For whatever reason, standards do change over time, usually very gradually and sometimes imperceptibly. What matters is that tutors ensure, as far as they can, a consistent approach in their own work and that, allowing for the quality issues mentioned above, they seek to apply high-quality criterion-referenced mark schemes. The emphasis on the quality of the mark schemes is important, since good criteria that differentiate effectively between categories (first, 2.1, 2.2, etc) will reduce the need for informal norm-referenced adjustment.