Post-secondary Entry Writing Placement: A Brief
Synopsis of Research
Richard H. Haswell
Texas A&M University, Corpus Christi
November, 2004
In most countries, placement into writing
courses does not happen on admission into higher education. Testing
of writing proficiency may be part of a qualification system for college,
as with Britain's A-Level exams, but that proficiency is construed as
acceptability for entrance, not readiness for instruction at some point
within a curricular sequence. There is no post-secondary composition
curriculum in which to be placed. But in the US, where open admissions
stands as an ideal and sometimes as a reality (over 60% of two-year
institutions currently endorse it), where millions of students are enrolled
each year, and where writing courses usually form some sort of instructional
sequence, placement into writing courses is the norm. The most recent
survey found over 80% of public institutions using some form of it (Huot,
1994), a figure quite similar to the findings of earlier surveys, which also
report a comparable proportion of private institutions with placement exams in place
(Lederman, Ryzewic, and Ribaudo, 1983; Noreen, 1977).
Equally
the norm, however, is a sense of displacement in many aspects of entry
writing placement systems. The most pervasive of the disconnects is
that the act of assessment leading to placement usually is standardized
and decontextualized, whereas the effects of placement lodge a particular student in the concrete
surround of a particular academic class. Traditionally the physical
site of placement testing is removed from the college writing classroom
and indeed from most normal life, more like an isolation chamber, perhaps
a summer-orientation room full of computer monitors with shielded screens,
or an eleventh-year schoolroom with proctors hawking students hunched
over bubblesheets. This clash between ungrounded assessment and grounded
teaching has energized placement, and the research into placement, from
the beginning.
Consider
US post-secondary education's most characteristic writing course, freshman
English. In 1874 Harvard University added a writing component to its
entrance examinations, a short extemporaneous essay rated by teachers.
More than half of the students failed, and many had to take "subfreshman"
courses or undergo extra-curricular tutoring. Ten years later the outcomes
were no better, and Harvard moved its sophomore forensic course to the
first year, turning it into a remedial writing course required of everyone
who didn't exempt out of it. Yet as placement pressures were shaping
early curriculum, teachers were resisting such pressures. Frequently
one form of their resistance was to distance themselves from placement
by turning it over to someone else. The College Entrance Examination
Board began its work in regularizing the certification of applicants
in 1900. As documented by a number of fine studies touching upon the
history of writing placement, the rest of the century saw testing firms
grow ever more influential and departments of English grow ever more
divided over whether to use ready-made goods, run their own placement
examinations, or forgo placement altogether (Elliott, 2005; Mizell,
1994; Russell, 2002; Soliday, 2002; Trachsel, 1992; Wechsler, 1977).
A good
deal of the research into writing placement involves issues that inhere
in all acts of writing evaluation, such as rater agreement, rater bias,
reader response, test equivalency, construct validity, and social, cultural,
and political influences. For these assessment issues, good research
reviews are available: Ewell, 1991; Hartnett, 1978; Huot, 1990; Ruth
and Murphy, 1988; Schriver, 1989; Speck, 1998. Here I focus on three
assessment issues especially crucial for studies of placement per se:
writer reliability, indirect measurement, and predictability. The first
asks how much we can trust a single piece of writing from a person to
reflect that person's level of writing as shown by further pieces. The
answer, long demonstrated and long ignored in standard placement testing,
is not much. In a study whose careful controls can still be admired,
Kincaid (1953) found that 58% of Michigan State students significantly
changed their score when writing a second essay one day after the first.
A fifth of the lowest quartile rose from the bottom with their second
essay, and about half of the highest quartile fell from the top. Diederich
(1964) found a quarter of students at the University of Chicago changed
their grade with a second essay. A number of later studies show that
performance variability increases when a second prompt requires writing
in a different mode or on a different topic (e.g., Hake, 1986; Quellmalz,
Capell, and Chih-ping Chou, 1982). Although there are periodic calls
for more test-retest studies, few have been done in the last thirty
years. Testing firms are not about to find evidence that they need to
pay for more raters to achieve a valid score, nor are colleges that
give local tests eager to give more of them (although students allowed
to retest their placement decision show a high rate of reversal; Freedman
and Robinson, 1982).
On the
other hand, the issue of indirect versus direct testing of writing,
of even longer heritage, is still unsettled. Teachers have always wanted
students placed into their writing classes on evidence that they lack,
but can learn, the kind of rhetorical skills the course actually covers.
Normally the college writing curriculum does not center on mastery of
spelling demons, naming of grammatical parts, recognition of learned
vocabulary, and other skills measured by item examinations, that is,
by indirect testing of writing proficiency. Unfortunately, direct testing,
simply asking students to write an essay to show their essay-writing
proficiency, runs into problems of rating and therefore of cost. To
achieve acceptable scorer reliability, an essay has to be read independently
at least by two scorers and sometimes a third when the first two don't
agree. Historically the large testing firms, caught between efficiency
and credibility, have waffled. For two decades after 1900 the College
Entrance Examination Board read essays, but changed to short-answer items
after Hopkins (1921) proved that there was more variability between
raters than between essays. Eventually they switched back under pressure
from teachers, but during and after WWII many schools (including Harvard)
dropped the College Board's essay exams in favor of its Scholastic
Aptitude Test, which measured verbal skills with machine-scored items.
In 2004, under threat of a boycott of the SAT by the University of California,
the College Board decided to add a 25-minute essay, holistically scored.
Current holistic methods of scoring essays, which the College Board,
ACT, Pearson Educational, and other testing firms simplify to the point
that they are cost-effective, seem on the surface one viable resolution of the
clash between teachers and testers.
But only
so long as a third issue of writing placement is kept under wraps, the
issue of predictability. How well does holistic scoring or any other
method of writing assessment predict the future performance of students
in the courses into which it has placed them? As a sorting tool, what
is its instructional validity or—to use Smith's useful term (1993)—its
adequacy? The answer is that all these methods, direct and indirect,
have about the same predictive power, and it is painfully weak. This
has been known for a long time. Only six years after the College Entrance
Examination Board began operations, Thorndike (1906) began finding very
low correlations between standing in the exams and standing in junior
and senior classes. In 1927 A. D. Whitman found an average .29 correlation
of College Entrance Examination Board essay scores with future college
performance in courses. Huddleston (1954) reviewed fifteen studies of
indirect tests and McKendy (1992) thirteen more studies of direct tests,
all correlating test score with first-year college composition course
grade or teacher appraisal of student work, and the range was from random
to .4. Correlations can be increased with methods that institutions
rarely apply in actual placement—averaging scores on several writing
samples or running multiple regressions with variables such as high-school
English grades and verbal proficiency scores—but even then the gain
in predictability is small. A review of thirty or so more studies not reported
in Huddleston and McKendy supports their finding, that for decades college
writing placements have been made on scores that leave unexplained,
at best, two thirds of the variance in future course performance, and,
on average, nine-tenths of it. Yet supporters of the tests will sometimes
put such a positive spin on discouraging outcomes that even the most
statistically gullible shouldn't be fooled. Mzumara et al. (1999) write
about a machine-scoring placement procedure that they implemented at
their institution, "The placement validity coefficients for a sample
drawn from Fall of 1998 averaged in the mid-teens, but this finding
is still useful for placement purposes" (pp. 3-4). Useful, one supposes,
to argue that the procedure should be scrapped, since such coefficients
account for no more than 2% of student performance in the courses.
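The arithmetic behind these claims is simple enough to sketch: squaring a correlation gives the proportion of variance in course performance that the placement score explains, and the figures below use only the correlations reported above.
\[ r = .40 \;\Rightarrow\; r^{2} = .16, \ \text{leaving 84\% of the variance unexplained} \]
\[ r = .29 \;\Rightarrow\; r^{2} \approx .08, \ \text{leaving roughly nine-tenths unexplained} \]
\[ r \approx .15 \;\Rightarrow\; r^{2} \approx .02, \ \text{the ``mid-teens'' coefficients reported by Mzumara et al.} \]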
Converting
typical placement statistics to actual head count is a little dismaying.
Many students are put in courses too hard for them or, worse, suffer
the stigma of assignment to a remedial course when they could pass the
regular course. Matzen and Hoyt (2004), for example, rated first-week
course writing and calculated that 62% of the students at their two-year
college had been misplaced by standardized testing. Smith (1993), which
goes further than any other study in plumbing and explaining the intricacies
of the placement/curriculum interface, determined that 14% of his university's
freshmen were being placed too low, and that was with a local placement
system far better than most, where the teachers of the target courses
read two-hour essays and assigned the writer to an actual course, not
to a point on an abstract scale of writing quality. Even when the degree
of misplacement may seem to impact very few students, Smith rightly
points out that "for the students and for the teachers, 'very few'
is too many" (p. 192).
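To make the head count concrete, one can apply the reported percentages to a hypothetical entering class of 1,000 (the cohort size is illustrative, not drawn from either study):
\[ 1000 \times .62 = 620 \ \text{students misplaced, by Matzen and Hoyt's estimate} \]
\[ 1000 \times .14 = 140 \ \text{students placed too low, by Smith's estimate} \]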
The placement
system Smith designed, where scoring of essays is done by the teachers
of the curriculum affected, is only one of several attempts by local
writing programs to replace ungrounded national testing procedures with
local contextualized ones and then to validate the new outcomes.
Students with marginal scores can be assigned into the mainstream course
but with required ancillary work in a writing center (Haswell, Johnson-Shull,
and Wyche-Smith, 1994) or a "studio course" (Grego and Thompson,
1995). Or they can be assigned to a two-semester "stretch"
course (Glau, 1996). Students can submit a portfolio of previous writing,
allowing teacher-raters a better sense of where they should start (Huot,
2002; Willard-Traub et al., 1999). On evidence of essays written early
in a course, misplaced students can be reassigned by their teachers
to a more appropriate course (Galbato and Markus, 1995, report that teachers
recommend retrofitting of one third of their students placed into their
courses by indirect testing). At the "rising-junior" or halfway
point in the undergraduate course of studies, portfolios of previous
academic written work can be quickly reduced to a few problematical
ones, which are evaluated by groups of expert readers that include a
teacher in the student's major (Haswell, 1998). All of the studies cited
here include careful validation of placement adequacy, although they
do so with a variety of methods. Perhaps on ethical grounds, however,
none uses the most informative method of validation, which would be
to test the predictability of placement decisions by mainstreaming all
of the students and conducting follow-up studies comparing putative
test placement with actual course performance (for one study using that
design, see Hughes and Nelson, 1991, which found that 37% of the students
whom ASSET scores would have put into basic writing passed the regular
course).
Mainstreaming,
moreover, raises volatile issues connected with minority status, nonstandard
dialect, bilingualism, second-language acquisition, cultural and academic
assimilation, as well as the writing curriculum itself. Typically non-majority
students end up in basic writing courses in a much higher proportion than
their share of the student body. Does the placement
apparatus serve a classist system of controlled access to higher education
and of oppressive tracking within it? The evidence is mixed. Soliday
and Gleason (1997) found that most students who would have been barred
from general-education courses because of their poor performance on
a placement writing test performed passably if allowed to take the courses,
and Shor (2000) and Adams (1993) describe students who avoided their
placement in basic writing and passed the mainstream course. On the
other hand, retention studies have found that students who take basic
writing are more likely to stay in college and graduate (Baker and Jolly,
1999; White, 1995). These questions about tracking necessarily involve
issues of class, ethnicity, and language preference, and take us back
to commercial testing of writing proficiency, on which minority students,
first-language speakers or not, tend to perform worse than majority
speakers, especially on indirect tests (e.g., Larose et al., 1998; Pennock-Román,
2002; Saunders, 2000; White and Thomas, 1981).
Commercial
firms excuse the poor predictability of their tests, direct or indirect,
by arguing that academic performance is a "very fallible criterion"
(Ward, Kline, and Flaugher, 1986). Yet from the beginning theorists
have responded by pointing out that the tests themselves are fallible
since they measure performance, not potential, and are "examinations
of past work, rather than of the power for subsequent work" (Arthur
T. Hadley in 1900, cited by Wechsler, 1977). As Williamson put it nearly
a century later, teachers reading student writing truly for placement
"do not judge students' texts; they infer the teachability of the
students in their courses on the basis of the texts they read"
(1993, p. 19). In Haswell's terms (1991), diagnosis, good placement
practice, looks through a placement essay in order to predict the student's
future performance, whereas pseudodiagnosis, poor placement
practice, pretends to do that while actually only ranking the essay
in comparison with other essays (pp. 334-335).
Today the
situation of college entry-level writing placement could be called schizophrenic
with some justice. On the one hand, for reasons of expediency many open-admission
schools in the USA, Canada, and Australia are now placing students with
scores produced by computer analysis of essays (e.g., ETS's E-rater,
ACT's e-Write, the College Board's WritePlacer), even
though the scores correlate highly with human holistic ratings and therefore
inherit the same weak predictability (Ericsson and Haswell, forthcoming).
On the other hand, studies of local evaluation of placement essays show
the process too complex to be reduced to directly transportable schemes,
with teacher-raters unconsciously applying elaborate networks of dozens
of criteria (Broad, 2003), using fluid, overlapping categorizations
(Haswell, 1998), and grounding decisions on singular curricular experience
(Barritt, Stock, and Clark, 1986). Perhaps in reaction to such dilemmas,
some institutions are returning to a venerable though largely unvalidated
method of placement, informed or directed self-placement, in which students
decide their own curricular fate based upon information and advice provided
by the program and upon their own sense of self-efficacy (Blakesley,
2002; Royer and Gilles, 2003; Schendel and O'Neill, 1999). For
an overview of current placement practices, see Haswell,
2005.
All in
all, the most solid piece of knowledge we have from writing-placement
research is that systems of placement are not very commensurable. Winters
(1978) tested the predictive validity of four measures of student essays—general-impression
rating, the Diederich expository scale, Smith's Certificate of Secondary
Education analytic scale, and t-unit analysis—and found that they
would have placed students quite differently (on the first three, students
judged by their teachers as low performing did better than those judged
as high performing). Olson and Martin (1980) found that 1,002 (61%)
of their entering students would be placed differently by indirect proficiency
testing than by teacher rating of an essay, and 1,051 (64%) would be
placed differently by that teacher assessment than by their own self-assessment.
Meyer (1982) reports that faculty who read take-home essays placed 44%
of the students in a lower course than the one into which they would
have been placed by an indirect verbal-skills examination. Shell, Murphy,
and Bruning (1986) found that a holistic evaluation of a 20-minute essay
correlated with three measures of self-efficacy, the students' "confidence
in being able to successfully communicate what they wanted to say,"
at .32, .17, and .13. Much can be made of this often confirmed finding
of incommensurability, but one conclusion seems hard to gainsay. Educators
who wish to measure writing promise, whatever the system of
placement, should implement multiple measures and validate with multiple
measures.
Works Cited
Adams, Peter Dow. 1993. Basic writing reconsidered. Journal
of Basic Writing, 12.1, 22-36.
Ewell, Peter T. 1991. To capture the ineffable: New forms
of assessment in higher education. In Janet Bixby; Gerald Grant (Eds.),
Review of research in education (No. 17) (75-126). Washington,
D. C.: American Educational Research Association.
Baker, Tracey; Peggy Jolly. 1999. The "hard evidence":
Documenting the effectiveness of a basic writing program. Journal
of Basic Writing, 18.1, 27-39.
Barritt, Loren; Patricia T. Stock; Francelia Clark. 1986.
Researching practice: Evaluating assessment essays. College Composition
and Communication, 37.3, 315-327.
Blakesley, David. 2002. Directed self-placement in the
university. WPA: Writing Program Administration, 25.2, 9-39.
Broad, Bob. 2003. What we really value: Beyond rubrics
in teaching and assessing writing. Logan, UT: Utah State University Press.
Diederich, Paul B. 1964. Problems and possibilities of
research in the teaching of written composition. In David H. Russell
(Ed.), Research design and the teaching of English: Proceedings of
the San Francisco Conference, 1963
(52-73). Champaign, IL: National Council of Teachers of English.
Elliott, Norbert. 2005. On a scale: A social history
of writing assessment in America. New York: Peter Lang.
Ericsson, Patricia; Richard H. Haswell (Eds.). Forthcoming.
Machine scoring of student essays: Truth and consequences.
Freedman, Sarah Warshauer; William S. Robinson. 1982.
Testing proficiency in writing at San Francisco State University. College
Composition and Communication,
33.4, 393-398.
Galbato, Linda; Mimi Markus. 1995. A comparison study of three
methods of evaluating student writing ability for student placement
in introductory English courses. Journal of Applied Research in the
Community College,
2.2, 153-167.
Glau, Gregory R. 1996. The "stretch program":
Arizona State University's new model of university-level basic writing
instruction. Writing Program Administration, 20.1-2, 79-91.
Grego, Rhonda C.; Nancy S. Thompson.
1995. The writing studio program: Reconfiguring basic writing/freshman
composition. Writing Program Administration 19.1-2, 66-79.
Hake, Rosemary. 1986. How do we judge what they write?
In Karen L. Greenberg, Harvey S. Wiener, and Richard D. Donovan (Eds.),
Writing assessment: Issues and strategies
(153-167). New York: Longman.
Hartnett, Carolyn G. 1978. Measuring
writing skills. ERIC Document Reproduction Service, ED 170 014.
Haswell, Richard H. 1991. Gaining
ground in college writing: Tales of development and interpretation.
Dallas, TX: Southern Methodist University Press.
Haswell, Richard H. 2005. Post-secondary
entrance writing placement. http://comppile.tamucc.edu/placement.htm
Haswell, Richard H. 1998. Rubrics, prototypes, and exemplars:
Categorization and systems of writing placement. Assessing Writing,
5.2, 231-268.
Haswell, Richard H.; Lisa Johnson-Shull; Susan Wyche-Smith.
1994. Shooting Niagara: Making portfolio assessment serve instruction
at a state university. WPA: Writing Program Administration, 18.1-2, 44-55.
Hopkins, L. T. 1921. The marking system of the College
Entrance Examination Board. Harvard Monographs in Education, Series 1, No. 2.
Huddleston, Edith M. 1954. Measurement of writing ability
at the college entrance level: Objective vs. subjective testing techniques.
Journal of Experimental Education, 22, 165-213.
Hughes, Ronald Elliott;
Carlene H. Nelson. 1991. Placement scores and placement practices:
An empirical analysis. Community College Review, 19.1, 42-46.
Huot, Brian. 2002. (Re)articulating writing assessment
for teaching and learning.
Logan, UT: Utah State University Press.
Huot, Brian. 1990. The literature of direct writing assessment:
Major concerns and prevailing trends. Review of Educational Research,
60.2, 237-263.
Kincaid, Gerald Lloyd. 1953. Some factors affecting
variations in the quality of students' writing [dissertation]. East Lansing, MI: Michigan State University.
Larose, Simon; Donald U.
Robertson; Roland Roy; Fredric Legault. 1998. Nonintellectual learning
factors as determinants for success in college. Research in Higher
Education, 39.3, 275-297.
Lederman,
Marie Jean; Susan Remmer Ryzewic; Michael Ribaudo. 1983. Assessment
and improvement of the academic skills of entering freshmen: A national
survey. ERIC Document Reproduction Service, ED 238 973.
Matzen, Richard N.; Jeff E. Hoyt. 2004. Basic writing
placement with holistically scored essays: Research evidence. Journal
of Developmental Education,
28.1, 2-4, 6, 8, 10, 12, 34.
McKendy, Thomas. 1992. Locally developed writing tests
and the validity of holistic scoring. Research in the Teaching of
English 26.2, 149-166.
Meyer, Russell J. 1982.
Take-home placement tests: A preliminary report. College English, 44.5, 506-510.
Mizell, Linda Kay. 1994. Major shifts in writing assessment
for college admission, 1874-1964 [dissertation]. Commerce, TX: East Texas State University.
Mzumara, Howard R.; Mark D. Shermis; Jason M. Averitt. 1999. Predictive validity
of the IUPUI web-based placement test scores for course placement at IUPUI: 1998-1999. Indianapolis, IN: Indiana University Purdue University Indianapolis.
Noreen, Robert C. 1977. Placement procedures
for freshman composition: A survey. College Composition and Communication
28.2, 141-144.
Olson, Margot A.; Diane Martin. 1980. Assessment of
entering student writing skill in the community college. ERIC Document Reproduction Service, ED 235 845.
Quellmalz, Edys S.; Frank
J. Capell; Chih-ping Chou. 1982. Effects of discourse and response mode
on the measurement of writing competence. Journal of Educational
Measurement, 19.4, 241-258.
Pennock-Román,
Maria. 2002. Relative effects of English proficiency on general admissions
tests versus subject tests. Research in Higher Education, 43.5,
601-623.
Royer, Daniel J.; Roger
Gilles (Eds.). 2003. Directed self-placement: Principles and practices. Cresskill, NJ: Hampton
Press.
Russell, David R. 2002. Writing in the academic disciplines:
A curricular history. 2nd
ed. Carbondale, IL: Southern Illinois University Press.
Ruth, Leo; Sandra Murphy. 1988. Designing writing
tasks for the assessment of writing.
Norwood, NJ: Ablex.
Saunders, Pearl I. 2000. Meeting
the needs of entering students through appropriate placement in entry-level
writing courses. ERIC Document Reproduction Service, ED 447 505.
Schendel, Ellen; Peggy O'Neill. 1999.
Exploring the theories and consequences of self-placement through ethical
inquiry. Assessing Writing 6.2, 199-227.
Schriver, Karen A. 1989. Evaluating text quality: The
continuum from text-based to reader-focused methods. IEEE Transactions
on Professional Communication,
32.4, 238-255.
Shor, Ira. 2000. Illegal illiteracy. Journal of Basic
Writing 19.1, 100-112.
Shell, Duane F.; Carolyn Colvin Murphy;
Roger Bruning. 1986. Self-efficacy and outcome expectancy: Motivational
aspects of reading and writing performance. ERIC Document Reproduction
Service, ED 278 969.
Smith, William L. 1993. Assessing the reliability and adequacy of
using holistic scoring of essays as a college composition placement
technique. In Williamson, Michael M.; Brian Huot (Eds.), Validating
holistic scoring for writing assessment: Theoretical and empirical foundations (142-205). Cresskill,
NJ: Hampton Press.
Soliday, Mary; Barbara Gleason. 1997. From remediation to enrichment:
Evaluating a mainstreaming project. Journal of Basic Writing, 16.1, 64-79.
Soliday, Mary. 2002. The politics of remediation:
Institutional and student needs in higher education. Pittsburgh, PA: University of Pittsburgh Press.
Speck, Bruce W. 1998. Grading student writing: An
annotated bibliography. Westport,
CT: Greenwood Press.
Thorndike, E. L. 1906. The future of
the College Entrance Examination Board. Educational Review
31 (May), 470-593.
Trachsel, Mary. 1992. Institutionalizing literacy:
The historical role of college entrance exams in English. Carbondale, IL: Southern
Illinois University Press.
Ward, William C.; Roberta
G. Kline; Jan Flaugher. 1986. College Board computerized placement
tests: Validation of an adaptive test of basic skills. ERIC Document Reproduction
Service, ED 278 677.
Wechsler,
Harold. 1977. The qualified student: A history of selective college
admissions in America. New York: John Wiley.
White,
Edward M. 1995. The importance of placement and basic studies. Journal
of Basic Writing 14.2, 75-84.
White, Edward M.; Leon
L. Thomas. 1981. Racial minorities
and writing skills assessment in the California State University and
Colleges. College English, 43.3, 276-283.
Whitman, A. D. 1927. The selective value of the examinations
of the College Entrance Examination Board. School and Society, 25 (April 30), 524-525.
Willard-Traub, Margaret;
Emily Decker; Rebecca Reed; Jerome Johnston. 1999. The development of large-scale portfolio placement assessment at
the University of Michigan: 1992-1998. Assessing Writing, 6.1, 41-84.
Williamson,
Michael M. 1993. An introduction to holistic scoring: The social, historical
and theoretical context for writing assessment. In Williamson, Michael M.; Brian Huot (Eds.),
Validating holistic scoring for writing assessment: Theoretical and
empirical foundations (1-44). Cresskill, NJ:
Hampton Press.
Winters,
Lynn. 1978. The effects of differing response criteria on the assessment
of writing competence. ERIC Document Reproduction
Service, ED 212 659. |