A Study of the Context Validity of Second Language Writing Tests (English)

2.1 Writing assessment: A brief history

The assessment of proficiency in writing generally falls into four periods (Hamp-Lyons, 2001; Yancey, 1999): ① direct testing of writing (i.e., essay tests), represented by the practice of selecting government officials in ancient China, ② the indirect assessment of writing ability or multiple-choice tests of writing knowledge (1920s—1970s), ③ the renewed use of the timed impromptu writing test or direct assessment (1980s—present), and ④ the portfolio assessment movement (mid-1980s—present).

The beginning of the testing of writing ability can be traced back to the Sui Dynasty (581—618) in China, where performance on written examinations determined the placement of individuals into government positions (Han & Yang, 2001). In contrast, oral assessments were the primary method of placement in the early academic institutions of both Europe and the United States. During the mid- to late nineteenth century, assessment of students' academic proficiency shifted from oral to written examinations as enrollments in colleges increased and as demand grew for assessments that were more scientific and less subjective. The written exam was first introduced in Britain in 1858, while in US universities its use can be traced back to Harvard University's 1873—1874 entrance examination (Hamp-Lyons, 2002). Since then, direct testing of writing proficiency has waxed and waned but has never disappeared.

In the twentieth century, criticisms began to mount that essay examinations were flawed in terms of reliability and validity. Notably, questions were raised regarding the reliability of essay scoring, particularly in large-scale, high-stakes assessments of student writing ability. The lack of agreement between different raters, and of consistency within individual raters, was recognized as a major source of measurement error in direct writing assessment. Driven by the psychometric claim that well-constructed indirect measures were indeed valid predictors of writing ability, as demonstrated by their high correlations with actual writing samples (Godshalk et al., 1966, cited in Kroll, 1998), a number of limited-response formats were developed, such as multiple-choice tests of error recognition, cohesion, and coherence. Economical and reliable, indirect assessment of writing prevailed in school and college entrance examinations during the 1950s and 1960s (Spolsky, 1995). Until the late 1970s, and even into the 1980s, it was regarded as acceptable to estimate a person's ability to write extended prose from indirect measures.

Despite the proliferation of indirect measures of writing, the use of essay examinations continued even in the heyday of indirect assessment of writing (Huot, 2002). For example, a direct test of writing has been an integral component of most of the Cambridge ESOL examinations ever since the Certificate of Proficiency in English (CPE) was introduced in 1913 (Taylor, 2004). Most likely, this is due to the persistent belief on the part of educators that indirect measures are not valid measures of writing performance (Grabe & Kaplan, 1996).

From the 1950s, the problems inherent in indirect testing of writing were increasingly recognized. Researchers realized that indirect assessment of writing was not as valid as had been claimed (Quellmalz, 1986); as Cumming et al. (2001: 3) commented, "the exact nature of the construct they assess remains uncertain." Because indirect tests do not represent what proficient writers can do (Hamp-Lyons, 1990; White, 1985), they fail to reveal how candidates might perform on longer, practical, productive writing tasks. Furthermore, such indirect tests were more likely to exert a negative washback effect on the teaching and learning of writing.

In response to the deficiencies of indirect tests of writing, as well as demand from the general public, designers of large-scale, high-stakes tests began to supplement indirect measures with direct writing assessment, or even to replace them altogether. Throughout the 1980s, direct measures of writing ability were incorporated into nationwide achievement assessments, college entrance examinations, and high-school graduation requirements.

More recently, large-scale direct writing assessment has come to be viewed with increasing skepticism by the literacy community, and deep concerns have been raised regarding the validity of the writing sample. It was claimed that the writing produced in timed tests corresponded to only a small portion of what we now understand to be involved in writing, and that performance on the writing sample therefore no longer appeared to be an adequate representation of the accepted theoretical construct of writing. In response to these limitations of the timed impromptu writing test, another movement in writing assessment arose, namely portfolio assessment of writing. Portfolio assessment is generally considered a comprehensive and systematic form of assessment that involves multiple samples of performance, demonstrates performance in a variety of modes of discourse, and provides evidence of the processes and interactions involved in composing (Camp, 1993; Camp & Levine, 1991; Condon & Hamp-Lyons, 1991; Freedman, 1993; Murphy & Smith, 1992; Wolf, 1989). While portfolio assessment holds out promises such as high validity and positive consequences (Genesee & Upshur, 1996; Linn et al., 1991; Ruetten, 1994; Wolf, 1989; Yancey, 1999), and has thus sparked the interest and imagination of teachers, educators, and testing specialists, it also raises problems of accurate measurement. Portfolio assessment is reported to have been fraught with indecision and indeterminacy as to how such a complex collection of work of various types can be assessed objectively (Brown & Hudson, 1998; Hamp-Lyons, 1991a; Hamp-Lyons & Condon, 1993; Herman et al., 1993; Supovitz et al., 1997).

In summary, writing assessment has evolved dramatically to serve the needs of education and society, and the quest for more valid, reliable, and efficient means of assessment remains in progress. Whether in the form of the timed impromptu test or portfolio assessment, however, "the norm" (Kroll, 1998: 221) of the day is the direct testing of students' writing ability, in which writing task construction plays an essential role in producing valid assessment and should always reflect state-of-the-art research on the writing construct.

In the field of language testing, we cannot develop sound language tests without defining what it means to know a language, since every language test is the operationalization of a definition of language or language use. As Weir and Shaw (2006: 9) explicitly pointed out, "[A]dequate construct definition … is a vital principle in language testing". Due consideration, therefore, must be given to adequate definition of the writing construct before developing any writing test. In view of this, the next two sections are devoted to a discussion of the construct of writing in general and of ESL/EFL writing in particular.