
Passing Scores and Passive Voice:
A Case for Quantitative Analysis in Writing Assessment

Craig B. Wilson

Department of Teacher Education
California State University, East Bay



This study describes a quantitative analysis of discrete writing elements that can inform feedback to writing teachers and students. Analysis of 80 passing and failing graduate writing assessment essays showed the use of the passive voice to be positively correlated with success on the test. This finding suggests that basic-writing students may benefit from instruction in the rhetorical uses of the passive voice. This instruction should include attention to how proficient writers use various grammatical features to accomplish rhetorical tasks.


Providing students with timely, useful feedback is one of the cornerstones of effective instruction.1 In writing instruction, the essential role of feedback has been reinforced by the adoption of process approaches, in which exchanging written work with peers and engaging in peer feedback are essential steps. Whether it comes from a teacher or a peer, feedback on writing is characteristically individualized: it consists of commentary on a single piece of writing or on the writing of a single individual.

However, there are circumstances, as in the case of large-scale writing assessments, when feedback cannot be individualized. When basic writers (and their teachers) receive writing assessment results only in the form of numerical ratings that provide no guidelines for improving writing performance, they experience a “feedback gap.” Useful feedback to address this gap may be provided by quantitative analysis of discrete writing elements. The analysis reported here shows that one discrete element, the use of passive voice, is positively associated with success on a particular large-scale writing assessment.

Over the past 20 years, several broad questions have been raised regarding the delivery of timely, useful writing feedback. Among the questions addressed are those related to focus of feedback (e.g., errors, mechanics, grammar, content); form of feedback (e.g., degree of specificity, organization of written comments, checklists); and effect of feedback (e.g., student understanding, changes in subsequent drafts). In the feedback studies reviewed (including those with non-native as well as native English-language writers), the focus of feedback is most commonly reported to be on errors, particularly those in mechanics and grammar (Chandrasegaran, 1986; Cohen & Cavalcanti, 1990; Radecki & Swales, 1985). Identifying the optimal degree of feedback specificity (Dorow & Boyle, 1998) and the most effective format (Robb, Ross, & Shortreed, 1986; Willingham, 1990) constitute another set of concerns.

Regarding the effectiveness of feedback, Fathman and Whalley (1990) concluded in a study of English as a second language writers that feedback focused on content was less effective than grammar feedback. In another comparison of effectiveness, Saito and Fujita (2004) found that peer feedback was no better than teacher feedback in improving writing. The weakness of peer feedback, according to Nilson (2003), is the failure to provide students with feedback guidance to keep them focused on the text instead of making emotion-laden evaluations of their peers’ writing. In other studies, the value of feedback for improving writing may not always be evident. Cohen’s (1987) survey of students writing in their native and second languages revealed that writers mentally noted less than half of the teacher feedback, and that less than ten percent was actually incorporated into their subsequent drafts.

To an increasing extent, writing is required for large-scale gatekeeping assessments, such as those to meet a writing skills requirement for college graduation. Written work produced under large-scale testing conditions is treated differently from writing done as part of a class assignment. In cases where a numerical rating is the only indication that the essay has been read, writers do not have even the minimal forms of feedback described in the literature. Nevertheless, writing specialists assert the right of students to feedback, even in high-stakes assessments. Consider this statement by the Conference on College Composition and Communication (CCCC) Committee on Assessment (1995), asserting that “teachers and students must have access to the results in order to be able to use them to revise existing curricula and/or plan programs for individual students” (p. 432).

Background of the Study

An opportunity to conduct the study came through my association with the Writing Skills Sub-committee of the academic senate at California State University, East Bay. Since 1977, the California State University system has imposed a high-stakes writing assessment through the Graduate Writing Assessment Requirement (see CSU GWAR Review Committee, 2003). One outcome of the GWAR has been the responsibility placed on each CSU campus to implement writing assessment policies that are valid, reliable, and fair.

The Writing Skills Sub-committee reviews results of administrations of the Writing Skills Test (WST) in order to assure that students “successfully demonstrate a level of writing proficiency commensurate with the expectations for all CSU graduates” (Chamberlain, 2003, p. 28). The sub-committee receives an annual statistical report of each test administration, detailing passing rates for test-takers identified by a range of variables, including primary language, age, gender, and major. These annual reports supply the only “feedback” the committee receives regarding the graded essays.

In 2003, an analysis of completed WST essays was proposed, with the intention that this analysis might provide information useful for the sub-committee’s policy decisions. Because the locus of concern was the borderline between passing and failing essays, examining essays abutting this borderline became a central motivation. 2 The line dividing a failing essay from a passing essay is the difference between a “7” and an “8” on a scale of 0-12. Each WST essay is graded by two readers, each of whom assigns a score of “0” to “6.” If both readers assign a “4” to an essay, it passes with an “8.” If one assigns a “4” and the other a “3,” the essay receives a failing grade of “7.” 3
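The pass/fail arithmetic just described can be sketched in a few lines. This is an illustrative sketch only; the function name and error handling are my own, not part of the WST:

```python
def wst_result(reader1: int, reader2: int) -> str:
    """Combine two holistic ratings into a WST outcome.

    Each reader assigns a score from 0 to 6; the two scores are summed.
    A combined "8" (e.g., two "4"s) passes, while a "7" (a "4" and a
    "3") falls just below the passing line.
    """
    for score in (reader1, reader2):
        if not 0 <= score <= 6:
            raise ValueError("each reader assigns a score from 0 to 6")
    return "pass" if reader1 + reader2 >= 8 else "fail"
```

For example, `wst_result(4, 4)` returns `"pass"` while `wst_result(4, 3)` returns `"fail"`, reproducing the “7-8 line” described above.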

Given the thousands of essays available from the spring 2003 WST administration, a small sample was chosen for analysis of essay characteristics. 4 The first problem was determining characteristics that could easily be measured and also serve as the basis for useful, representative feedback. Many of the characteristics identified as criteria for holistic assessments (e.g., strong thesis, adequate organization and development, clear and persuasive reasoning) do not lend themselves to easy identification and measurement. 5 On the other hand, quantification of discrete grammatical features, though an easier task, does not have an immediately apparent value as feedback. Fortunately, a positive relationship between the frequency of some discrete features and academic writing has been demonstrated by linguist Douglas Biber. Extensive statistical analysis by Biber and colleagues (1988, 2001, 2002) 6 shows that certain features (e.g., conjuncts, nominalizations, passives) occur with greater frequency in academic prose than in other genres. The relationship of discrete features in academic writing with passing/failing scores in a large-scale writing assessment program is examined in this study.

Method and Results


Fifteen “7” essays and 15 “8” essays were randomly selected from a pool of 215 “7” essays and 430 “8” essays responding to one of three prompts. 7 The analysis of selected features proceeded in three “rounds.” The first round consisted of counting, calculating, and recording values for the following nine features in each essay: number of sentences, number of tokens, number of token characters, number of types, number of type characters, tokens/sentence, type/token ratio, characters/token, and characters/type. 8
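Following the definitions given in note 8 (a “sentence” ends in a period, “tokens” are words, “characters” are letters, and “types” are the unique words among the first 400 tokens), these round-one measures could be computed roughly as follows. The study does not specify its counting procedure beyond note 8; the function below is my own rough illustration:

```python
def round_one_features(essay: str, type_window: int = 400) -> dict:
    """Rough computation of the nine round-one measures for one essay."""
    sentences = [s for s in essay.split(".") if s.strip()]
    tokens = essay.split()
    stripped = [t.strip(".,;:!?\"'") for t in tokens]
    token_chars = sum(len(t) for t in stripped)
    # Counting types over a fixed 400-token window avoids favoring short
    # essays, which repeat fewer words simply because they have fewer tokens.
    types = {t.lower() for t in stripped[:type_window]}
    type_chars = sum(len(t) for t in types)
    return {
        "sentences": len(sentences),
        "tokens": len(tokens),
        "token_characters": token_chars,
        "types": len(types),
        "type_characters": type_chars,
        "tokens_per_sentence": len(tokens) / max(len(sentences), 1),
        "type_token_ratio": len(types) / max(min(len(tokens), type_window), 1),
        "characters_per_token": token_chars / max(len(tokens), 1),
        "characters_per_type": type_chars / max(len(types), 1),
    }
```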

In the second round of analysis, each essay was read multiple times (to increase coding reliability) in order to count grammatical errors and misused words, spelling errors, and occurrences of key words and phrases from the prompt. These three categories have their origin in the WST grading criteria, which caution readers not simply to count errors but to note errors that impede communication.

In this round, three of the linguistic features identified by Biber (1988) as typical of written academic, professional, and official genres were also counted and compared. These features are conjuncts (e.g., consequently, furthermore, therefore, moreover); nominalizations (words ending in, e.g., -tion, -ment, -ness); and passives (by-passives and agentless passives combined into one category). A fourth analyzed feature combined the forms of the first-person pronoun (I, me, my, mine).9 First-person pronouns contrast with the other three features in that they are typical of telephone conversations and personal letters, and atypical of academic prose. 10
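The four features above could, in principle, be counted automatically. The following sketch is my own heuristic approximation, not the procedure used in the study (which coded features by hand), and the word lists are small illustrative subsets of Biber’s feature inventories:

```python
import re

# Illustrative subsets only; Biber's actual feature inventories are larger.
CONJUNCTS = {"consequently", "furthermore", "therefore", "moreover"}
FIRST_PERSON = {"i", "me", "my", "mine"}
BE = r"\b(?:is|are|was|were|be|been|being)\b"

def rough_feature_counts(text: str) -> dict:
    """Heuristic counts of the four features compared in this round.

    The passive pattern (a form of "be" followed by a word ending in
    -ed or -en) catches only regular participles and will both miss and
    over-count cases; hand coding remains more reliable.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "conjuncts": sum(w in CONJUNCTS for w in words),
        "nominalizations": sum(
            w.endswith(("tion", "ment", "ness")) for w in words
        ),
        "passives": len(re.findall(BE + r"\s+\w+(?:ed|en)\b", text.lower())),
        "first_person": sum(w in FIRST_PERSON for w in words),
    }
```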

Preliminary t-test analysis of the 30 essays chosen in the first round showed that the occurrence of passive constructions was significantly greater in the passing essays than in the failing essays (p < .05); comparisons of the other features showed no significant differences. The final round of analysis enlarged the pool of essays from 30 to 80 in order to replicate the previous analysis: 25 more “7” and 25 more “8” essays were randomly selected and added to the existing data set, resulting in a total of 40 “7” and 40 “8” essays.

After a count of all possible passive constructions in all 80 essays, the two raters discussed and compared the identified constructions 11 and, working to full consensus, developed simple criteria for determining “true” passive constructions. The raters eliminated the 10 percent of constructions that did not fit these criteria, arriving at a final count of passive constructions for each essay and a 90 percent inter-rater reliability. A ratio index for each essay was computed by dividing the number of passive constructions by the essay’s word count (see Appendix).
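The ratio index and the subsequent significance test can be sketched as follows. The essay figures below are hypothetical stand-ins rather than values from the study, and the t statistic is the standard pooled-variance (Student’s) independent-samples formula:

```python
import math
from statistics import mean, stdev

def passive_ratio(passive_count: int, word_count: int) -> float:
    """Ratio index: passive constructions per word of the essay."""
    return passive_count / word_count

def independent_t(a: list, b: list) -> float:
    """Student's independent-samples t statistic (pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

# Hypothetical (passives, words) pairs standing in for scored essays.
failing = [passive_ratio(p, w) for p, w in [(1, 610), (2, 580), (1, 590)]]
passing = [passive_ratio(p, w) for p, w in [(4, 600), (3, 550), (5, 620)]]
t = independent_t(failing, passing)  # negative: failing essays have the lower ratio
```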


Of the nine features compared during the first round of analysis (word length, sentence length, etc.), independent t-tests showed no significant differences. Among the error counts, however, essays scored as “7” (failing) had more sentence errors (M = 53.00; SD = 21.27) than essays scored as “8” (passing) (M = 26.67; SD = 21.31), a statistically significant difference, t (28) = 3.39, p = .002. This finding is to be expected and validates the essay graders’ scores.

The addition of 50 essays and recalculation of the passive constructions in all 80 essays strongly confirmed the significance of the passive construction in distinguishing passing essays from failing essays. An independent-samples t-test confirmed that the ratio of passive constructions was greater in the passing essays than in the failing essays (M = 0.0035; SD = 0.0031), a statistically significant difference, t (78) = -3.29, p = .002.

Sentence errors and the use of the passive voice stand alone among the features in showing statistically significant differences between passing and failing essays. As the list of identified passive constructions shows, and using an a priori alpha of .05, the number of passive voice constructions in the 40 passing essays (M = 4.08; SD = 3.63) totals 163, compared to 86 constructions in the failing essays (M = 2.15; SD = 2.14), t (39) = -2.78, p = .008. Further validating this finding, the average number of words in passing (M = 606.80; SD = 189.02) and failing (M = 604.55; SD = 148.33) essays was virtually the same (p = .679).

Discussion


Sentence-structure and word-choice errors are expected differentiating features of failing and passing essays, attesting to the validity of grading rubrics. The significant role of the passive construction in passing essays, on the other hand, may surprise some, in part because of its negative reputation as a structure to be avoided in writing. Turning the positive findings of the passive voice into useful feedback requires a “makeover” to convince writing teachers and students that the passive voice can make writing better. One approach is to demonstrate the value of the passive voice in performing vital rhetorical functions.

The rhetorical uses of the passive voice have already been the subject of linguistic study (Lock, 1996; Shintani, 1979; Thompson, 1987). Further, these uses have been explained in practical terms; Celce-Murcia and Larsen-Freeman (1999) provide the following list of reasons for using the passive voice. The italicized examples come from the corpus of passing essays examined for this study.

  • The nonagent is more closely related than the agent to the theme of the text [thematic cohesion]. In the following example, using the passive voice keeps the focus on “television addict”: Another common trait among television addicts is ignoring their surroundings including their families and friends. Because the television addict is controlled by television….
  • The nonagent is a participant in the immediately preceding sentence [cross-clausal considerations]. Here, the idea of millions of deaths is carried over as “magnitude of the disaster” in the second sentence: The 1984 drought that claimed millions of my countrymen could have claimed the lives of many millions more. The magnitude of the disaster was downplayed by the government….
  • The agent is redundant or easy to supply. Children have needs that must be attended to at any given moment.
  • The agent is unknown.
  • The agent is very general. Watching TV was never mentioned as the solution.
  • The speaker/writer is being tactful.
  • The speaker/writer is being evasive.

(Adapted from Celce-Murcia & Larsen-Freeman, pp. 352-354)

The italicized examples above demonstrate that the passive constructions in the student essays do perform rhetorical tasks, especially the task of maintaining focus on a theme or previously introduced idea. Note the uses of the passive construction (italics added) in the excerpt below, taken from an “8” essay, #815:

There are many different programs that can be seen on T.V. some more educational oriented then others but you can still learn many things from it. The special program design for educational purposes like P.B.S., Discovery Channel and C.N.N. can be highly educational for those that views it. P.B.S. which many kids watch are use to teach kids the basic concept of learning and ideas to prepared them for what is expected when they start to attend school, and for those that have already started school programs showed on P.B.S. can be use to guide them and bring them to a better understanding of what they are taught in school or what is not cover.

Three examples of the passive promote thematic cohesion by maintaining focus on the themes “programs” and “PBS.” The other three passive constructions (those beginning with “what”) are instances in which the agent can easily be supplied by the reader. Note that none of the passive constructions includes an agent. This reflects the relative infrequency of agents in passive use. Most analyses, according to Celce-Murcia and Larsen-Freeman (1999), show agents appearing in only 15-20 percent of passive constructions.

Implications for Research

This study supports the claim that using a particular linguistic feature can improve a student’s chances of passing a writing assessment. It is apparent that readers respond to positive evidence of proficiency in academic writing, as well as to errors and other forms of negative evidence. Biber’s (1988) factor analysis of 67 linguistic features identified others (modals, infinitives) that are associated with an argumentative style; these features, too, should be examined across failing and passing persuasive essays. The linguistic features of other genres may also be useful focal points for research.

Celce-Murcia (2002) has argued that structural elements above the sentence level should also be studied and taught to students. The passive voice is itself a structural element whose influence (in maintaining thematic cohesion) goes above the sentence level. Continued analysis of authentic writing should increase our understanding of the areas we should target in assisting students to gain mastery of a variety of academic genres, at all levels.

Implications and Suggestions for Practice

One implication of this study is that writing teachers should be well acquainted with how and why the passive voice is used in successful academic writing. This knowledge may lead to a change in attitude. Writing teachers and writing texts may represent the passive voice as something to be avoided. One example of misrepresentation can be found in the required text for a writing class at CSU East Bay. 12 The text provides only one example of the passive voice, an awkward distortion of a naturally active sentence, and advises writers to “use active verbs unless there is no comfortable way to get around using the passive voice” (p. 68).

Writing teachers may describe the passive voice as the enemy of clear writing. It is noteworthy, however, that one of the best-loved books championing vigorous, concise prose, Strunk and White’s (1959) The Elements of Style, provides examples of how to use the passive voice and of when it is preferred over the active voice. The text itself employs many passive constructions, 13 thus serving as a model of the usefulness of the passive in promoting written clarity.

A second implication for teaching writing is that teachers should help students identify the linguistic features and their functions in “authentic texts.” Scarcella (2003) argues that effective writing instruction entails teachers’ explicit knowledge of the linguistic features of writing, knowledge that is based on real language use. “Teaching grammar” does not mean a return to the workbook drills of earlier decades. On the contrary, learning about the features of academic writing should entail closer contact with real academic writing: “Language features of academic written English are, in general, best taught in the context of actual reading and writing assignments. Rather than learning rigid rules—a set of writing do’s and don’ts—students can investigate how language is used in real texts” (Scarcella, p. 48).

Finally, teachers should look for materials and approaches that promote attention to how rhetorical tasks are accomplished in real texts. Nilson (2003), for example, provides examples for replacing over-general peer-editing guidelines with more text-based guidelines. Students read more carefully when they have tasks that “demand that students carefully attend to the details …” (p. 36). Examples of text-based feedback prompts are “In each paragraph of the paper, underline the topic sentence,” “Underline all the logical transitions,” and “Bracket any sentences that you find particularly strong or effective.” If a teacher chooses to focus on the passive voice, students could be asked to identify the passive voice and explain why it is used. 14

In finding use of the passive voice as statistically significant for successful gate-keeping essays, this study shows that statistical feedback has a role in closing the feedback gap in large-scale writing assessments. This particular feedback suggests that trained readers are sensitive to the passive voice as evidence of proficiency in academic writing, thus confirming the corpus-based findings of Biber and colleagues (1988, 2001, 2002). Careful examination of good writing shows that the passive voice has several rhetorical uses, including promoting thematic cohesion. Blanket statements about the passive voice or any other linguistic feature being “bad” or “good” misrepresent our linguistic resources and shortchange basic writers.

Notes


  1. “Prompt feedback” is, for example, one of the Seven Principles of Good Practice in Undergraduate Education (Chickering & Gamson, 1987).


  2. At the time, the sub-committee was examining data that showed a sharp drop in passing rates for both undergraduates and graduates during the 2002-2003 academic year. Undergraduates had passed at 49%, down from 65% the previous year; the drop for graduate students was from 69% to 54%. A low pass rate was particularly conspicuous for English language learners, with only 22% of the 699 “ESL” undergraduate first-time test takers passing. Nearly three out of every 10 test takers identified themselves as speakers of English as a second language.


  3. With the elimination in 2002 of an objective (multiple choice) component on the WST, passing or failing is determined solely by the “7-8 line.” The actual range of possible ratings is 0-12; 99% of essays are distributed across the 4-10 range. In 2002-2003, 17% of test takers received a “7” rating and 34% received an “8” rating. Focusing on the “7” and “8” essays thus brings to light essays that are representative of 51% of those taking the test in that period.


  4. These initial decisions were made on the basis of face validity and jointly with Linda Smetana, Associate Professor in the Department of Teacher Education, CSU East Bay.


  5. These criteria are among those listed by the CSU Hayward (now East Bay) Writing Skills Test Essay Scoring Guide.


  6. Wolcott and Legg (1998) may have intentionally referred to the reader’s sensitivity to linguistic features in their discussion of holistic evaluation: “…holistic scoring is a matter of the reader’s mentally absorbing and balancing all the elements—rhetorical as well as mechanical and grammatical—that contribute to the overall impression a paper makes” (pp. 72-73).


  7. Prompt: “The self-confessed television addict often feels he ‘ought’ to do other things—but the fact that he doesn’t read and doesn’t plant his garden or sew or crochet or play games or have conversations means that those activities are no longer as desirable as television viewing. He is living in a holding pattern, as it were, passing up the activities that lead to growth or development or a sense of accomplishment. This is one reason people talk about their television viewing so ruefully, so apologetically. They are aware that it is an unproductive experience, that almost any other endeavor is more worthwhile by any human measure. Is TV viewing as worthless and damaging as this author seems to assert? Write an essay examining the validity of the author’s claims, developing your response with reasons and examples drawn from your own observations, readings, and/or experience.”


  8. “Sentences” end in a period; “tokens” is the total number of words; “characters” are letters; “types” are the unique words for the first 400 words of the essay. Biber (1988, pp. 238-239) recommends sampling types from the first 400 words of an essay. Counting types in this way eliminates the distortion that favors the type count for shorter essays, which are less likely to repeat words by virtue of the smaller number of tokens.


  9. The contribution of Biber’s analysis was its identification of linguistic features that could easily be counted and that might play a role in distinguishing passing from failing essays.


  10. Biber distinguishes the following 23 genres in his analysis of variations of linguistic features across speech and writing: press reportage, press editorials, press reviews, religion, hobbies, popular lore, biographies, official documents, academic prose, general fiction, mystery fiction, science fiction, adventure fiction, romantic fiction, humor, personal letters, professional letters, face-to-face conversations, telephone conversations, interviews, broadcasts, spontaneous speeches, and prepared speeches.


  11. These discussions took place between the author and Sarah Nielsen, assistant professor in the Department of English, CSU East Bay.


  12. Zinsser, W. K. (1998). On writing well, 25th anniversary: The classic guide to writing non-fiction (6th ed.). New York: HarperCollins.


  13. Note, for example, the four passive constructions in this passage from Strunk and White: “Ordinarily, however, a subject requires subdivision into topics, each of which should be made the subject of a paragraph. The object of treating each topic in a paragraph by itself is, of course, to aid the reader. The beginning of each paragraph is a signal to him that a new step in the development of the subject has been reached. As a rule, single sentences should not be written or printed as paragraphs. An exception may be made of sentences of transition, indicating the relation between the parts of an exposition or argument” (p. 11). An arbitrary sampling of the section entitled “Avoid fancy words” in The Elements of Style yields a passive construction/word ratio of .0056, putting it in the top half of passive ratios among the passing essays in the study.


  14. Teachers looking for assistance in developing lessons on the passive voice may refer to a list of pedagogical resources provided in Celce-Murcia and Larsen-Freeman (1999) (page 360).

References



Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge University Press.

Biber, D., & Conrad, S. (2001). Quantitative corpus-based research: Much more than bean counting. TESOL Quarterly, 35(2), 331-336.

Biber, D., Conrad, S., Reppen, R., Byrd, P., & Helt, M. (2002). Speaking and writing in the university: A multidimensional comparison. TESOL Quarterly, 36(1), 9-48.

CCCC Committee on Assessment. (1995). Writing assessment: A position statement. College Composition and Communication, 46(3), 430-437.

Celce-Murcia, M. (2002). On the use of selected grammatical features in academic writing. In M. Schleppergrell and M.C. Colombi (Eds.), Developing advanced literacy in first and second languages (pp. 143-158). Mahwah, NJ: Lawrence Erlbaum.

Celce-Murcia, M., & Larsen-Freeman, D. (1999). The passive voice. The grammar book (2nd ed., pp. 343-360). Boston: Heinle & Heinle.

Chamberlain, S. (2003). Writing skills test: Annual report 2002-2003. California State University, Hayward: Office of Assessment and Testing.

Chandler, J. (2003). The efficacy of various kinds of error feedback for improvement in the accuracy and fluency of L2 student writing. Journal of Second Language Writing, 12(3), 267-296.

Chandrasegaran, A. (1986). An exploratory study of EL2 students' revision and self-correction skills. RELC Journal, 17(2), 26-40.

Chickering, A. W., & Gamson, Z. F. (1987, March). Seven principles for good practice in undergraduate education. AAHE Bulletin.

Cohen, A. D. (1987). Student processing of feedback on their compositions. In A. Wenden & J. Rubin (Eds.), Learner strategies in language learning (pp. 57-69). Englewood Cliffs, NJ: Prentice Hall International.

Cohen, A. D., & Cavalcanti, M. C. (1990). Feedback on compositions: Teacher and student verbal reports. In B. Kroll (Ed.), Second language writing: Research insights for the classroom (pp. 155-177). Cambridge, England: Cambridge University Press.

CSU GWAR Review Committee. (2003). A review of the CSU graduation writing assessment requirement (GWAR) in 2002. Long Beach, CA: California State University.

Dorow, L.G., & Boyle, M.E. (1998). Instructor feedback for college writing assignments in introductory classes. Journal of Behavioral Education, 8(1), 115-129.

Fathman, A. K., & Whalley, E. (1990). Teacher response to student writing: Focus on form versus content. In B. Kroll (Ed.), Second language writing: Research insights for the classroom (pp. 178-190). Cambridge, England: Cambridge University Press.

Lock, G. (1996). Functional English grammar. Cambridge, England: Cambridge University Press.

Nilson, L.B. (2003). Improving student peer feedback. College Teaching, 51(1), 34-38.

Radecki, P., & Swales, J. (1985). ESL student reaction and response to feedback on their written work. Papers in Applied Linguistics, 1(2), 70-89.

Robb, T., Ross, S., & Shortreed, I. (1986). Salience of feedback on error and its effect on EFL writing quality. TESOL Quarterly, 20(1), 83-93.

Saito, H., & Fujita, T. (2004). Characteristics and user acceptance of peer rating in EFL writing classrooms. Language Teaching Research, 8(1), 31-54.

Scarcella, R. (2003). Academic English: A conceptual framework. University of California Linguistic Minority Research Institute. Technical Report 2003-1.

Shintani, M. (1979). The frequency and usage of the English passive. Unpublished doctoral dissertation, University of California, Los Angeles.

Strunk, W., Jr., & White, E. B. (1959). The elements of style. New York: Macmillan.

Thompson, S. (1987). The passive in English: A discourse perspective. In R. Channon & L. Shockey (Eds.), In honor of Ilse Lehiste (pp. 497-511). Dordrecht, Holland: Foris Publications.

Willingham, D.B. (1990). Effective feedback on written assignments. Teaching of Psychology, 17(1), 10-13.

Wolcott, W., & Legg, S. (1998). An overview of writing assessment: Theory, research and practice. Urbana, IL: National Council of Teachers of English.

Zinsser, W.K. (1998). On writing well, 25th anniversary: The classic guide to writing non-fiction (6th ed.). New York: HarperCollins.

Posted May 24, 2006.

All material appearing in this journal is subject to applicable copyright laws.
Publication in this journal in no way indicates the endorsement of the content by the California State University, the Institute for Teaching and Learning, or the Exchanges Editorial Board. ©2006 by Craig B. Wilson.
