

Now might be a good time to rethink the predominant role of testing in American schools. Most education leaders and lay citizens are aware of the negative consequences of too much testing. Teachers and students are stressed, and narrow test formats drive the curriculum at the expense of more creative and challenging learning goals. Yet, education leaders feel obliged to persist in these practices because of constraints in federal law and the sometimes-contradictory demands from parents for regular reports on student progress.

What few realize is that the tremendous growth in the amount of testing over time has systematically undermined the quality of many standardized tests and worsened their negative effects. This happens because high-quality tests are expensive to develop and time-consuming to score, while lower-quality tests are cheap and provide results quickly.

Assessment literacy is intended to solve these problems. Indeed, the various definitions and principles offered by proponents of assessment literacy are well-founded. The assumption is that arming educators with a better understanding of measurement principles and assessment data will lead them to choose better tests, or to develop high-quality assessments themselves, and to render better decisions.

In this article, however, I argue that stand-alone assessment literacy initiatives are not likely to improve assessment practices at either the policy or the classroom level. Instead, I propose that fundamental changes in assessment are more likely to be implemented successfully if they are grounded in present-day research on learning and motivation and tied directly to teacher professional development aimed at implementing engaging curricula and ambitious teaching practices. Before outlining this research base and alternative assessment principles, I give a short history of accountability testing and the proliferation of interim commercial tests, explaining how we got to where we are today. I also reference popular assessment literacy proposals and consider why these remedies may fall short.

A brief history of accountability testing

The passage of No Child Left Behind (NCLB) in 2001 prompted the dramatic growth of commercial interim testing products. If districts did not meet NCLB’s adequate yearly progress (AYP) requirements toward the goal of 100 percent of students proficient by 2014, increasingly draconian consequences would follow. Those consequences began with notifications to parents and district funding of free tutorial services and escalated to school closure or restructuring.

Alarmed by these stakes, superintendents and school boards purchased commercial interim testing products that promised to give early warning of end-of-year results. These products, administered three or more times per year, have only increased in popularity over time. In nearly all cases, these multiple-choice-only tests were of lower quality than state tests as measures of important learning goals. Thus, many districts now have two layers of narrow testing formats driving their curricula.

Definitions, intentions, and limitations

Definitions of assessment literacy are quite varied. Key ideas are shared across approaches, but different flavors of assessment literacy tend to emphasize either measurement or instructional perspectives. Everyone agrees on the cardinal measurement principles of validity, reliability, and fairness, but how these ideas are realized should look quite different in classroom contexts.

Instructionally, reliability would be experienced by students as consistency in the criteria used to give feedback rather than as test-retest correlations. To support learning, fairness should attend to culturally relevant pedagogy and to honoring students’ home backgrounds rather than to the statistical screening of test questions. In general, measurement versions of assessment literacy focus on formal test instruments instead of the informal instructional processes by which student understandings are identified and built upon.

At the classroom level, measurement perspectives emphasizing interpretation of test scores can undermine learning. Studies of data-driven decision-making, for example, caution that examining data in professional learning communities (PLCs) can short-circuit equity and deeper learning goals. This happens when accountability interpretations drive out commitments to instructional improvement or when students’ demographic characteristics are blamed for outcomes, according to data and equity research by Amanda Datnow, Jennifer C. Greene, and Nora Gannon-Slater. By contrast, instructional approaches to assessment focus on the substantive insights needed to guide instructional next steps.


District leaders would be ill-advised to invest in stand-alone assessment literacy initiatives. Measurement-focused versions won’t stop the proliferation of limited, commercial interim tests because testing companies claim that their products meet validity, reliability, and fairness requirements and serve every purpose under the sun. It takes sophisticated psychometric training to recognize that highly correlated subtests can’t yield diagnostic information, or to see that just because multiple-choice items fit somewhere within a standards framework doesn’t mean they fully or adequately capture learning goals. A good thing to remember about multiple-choice tests is that asking students to pick right answers is not the same kind of learning as asking them to develop their own answers and explain their thinking.

Research on learning and motivation

From 50 years of research on learning and motivation, we know that many traditional teaching methods are counterproductive, especially if they rely on rewards and punishments, memorization before thinking, test-teach-test instructional routines, and passive student receptivity. 

One critically important insight is that learning and motivation are much more thoroughly integrated than previously thought. A student’s developing identity and sense of self are completely entwined with cognitive growth. In addition, deep learning, critical thinking, and other 21st-century skills are best developed within the disciplines as part of subject-matter learning, according to research by the National Academies of Sciences, Engineering, and Medicine and the National Research Council.

New, more challenging learning goals—such as modeling or argumentation in mathematics—require interactive and participatory instructional practices. To develop thinking and reasoning skills, students need sustained and socially meaningful opportunities to try out their explanations and to ask each other critical questions. This is why there is so much emphasis on discourse-based instructional practices and collaborative learning. Formative assessment questioning and observation, along with peer- and self-assessment, work best when they are part of interactive instructional routines.

Such integration is challenging for teachers if they are asked to make sense of new instructional practices and new assessment ideas separately. Moreover, new standards call for more open-ended tasks and evidentiary reasoning. Inclusive and culturally relevant pedagogies require that teachers learn how to draw on students’ resources from home and community to solve these tasks and how to help students translate everyday understandings into canonical ones. These ambitious practices are coherently aligned but, again, are difficult to implement if each idea is new and presented as a separate reform.

Participatory, engaged, and inclusive classroom structures are also supported by decades of research on motivation. Since the 1980s we have known that rewards and punishments do, indeed, change behavior, but that extrinsic rewards often drive out intrinsic motivation to learn. 

Here, research on motivation and research on cognitive feedback are closely aligned. Feedback that focuses on the task and offers ways to improve enhances learning. In contrast, feedback that compares the individual to others harms subsequent learning more than giving no feedback at all. Yet, still in the shadow of NCLB, many schools today post children’s scores and proficiency levels on data walls in ways known to be detrimental to subsequent risk-taking and learning.

Assessment, not measurement

Teacher learning about assessment is best supported when it is integrated with disciplinary curricular reforms and ambitious teaching practices, not framed as measurement training. Instead of stand-alone assessment literacy initiatives, school district leaders would be better advised to invest in “professional development and coaching structures (e.g., time and supports for educator collaboration) that help to coordinate all of the different new things that teachers are being asked to learn.” Those things include learning and motivation theories, asset-based pedagogy, disciplinary practices, and classroom assessment principles.

This advisory statement is offered in support of a set of “Classroom Assessment Principles to Support Teaching and Learning” developed with my colleagues and our district and state department of education partners. Our 11 principles articulate an equity-focused vision for effective classroom assessment practices that are deeply integrated with instruction:

  1. Develop a shared understanding of valued learning goals.
  2. Integrate curriculum, instruction, and assessment based on well-founded theories of learning.
  3. Recognize and build on the knowledge and experiences that students bring from their homes and communities.
  4. Ensure that authentic instructional and assessment tasks are drawn from and connect to life outside of school to enhance both meaning and transfer.
  5. Engage in instructional practices where students talk with each other around meaningful tasks to elicit and extend student thinking and to help students learn to listen and support the development of each other’s ideas.
  6. Value student ideas by presenting tasks in multiple modes and by using artifacts and other representations to document their thinking and learning.
  7. Provide accessible and actionable information about how students and teachers can improve.
  8. Foster student agency and self-regulation.
  9. Integrate linguistic and graphical scaffolds recommended for English learners as a regular part of both instruction and assessment.
  10. Help students and teachers establish a productive relationship between formative feedback and summative assessments used for grading.
  11. Develop grading practices that validly reflect intended learning goals and success criteria, while avoiding the use of grades as motivators. 

To enact such a vision, district leaders would need to foster greater collaboration among curriculum, assessment, and professional development specialists. Teachers do, indeed, need support to enhance their assessment literacy, but professional development about assessment will have the greatest consequences for student learning if it is deeply connected with other strands of professional development, especially those aimed at subject-matter standards and diversity and inclusion.

Ideally, conversations in PLCs would switch from focusing on scores to substantive conversations around student work and instructional next steps that are directly responsive to student ideas. This also might be the time for districts to consider divesting from costly commercial testing systems that misrepresent subject matter goals and convey to students the false idea that the reason for learning is to score well on tests.

Lorrie A. Shepard (lorrie.shepard@colorado.edu) is University Distinguished Professor in the School of Education, University of Colorado Boulder, Boulder, Colorado.
