Here’s a piece that was just posted on Diane Ravitch’s blog. Lots of comments! Hopefully this will move the conversation forward to good public policy.
Miriam Kurtzig Freedman, an attorney who represents public schools in education matters, including testing and special education—and is currently working to reform special education—posted this comment. Her website is http://www.schoollawpro.com.
Can we really use student tests to measure teacher effectiveness?
Miriam Kurtzig Freedman, M.A., J.D.
This is the year! Tests related to the Common Core State Standards (CCSS) are launching across our country. They are designed to measure how well students are learning the CCSS. Meanwhile, some states, with federal encouragement, plan to use them to measure teacher effectiveness as well. Is this use valid?
There is no shortage of controversy about educational testing and, unfortunately, this controversy includes the opportunity to file lawsuits. The use of student achievement data to also evaluate teacher effectiveness is certainly controversial. Notably, Arne Duncan, the Secretary of Education, gave states a year’s reprieve on implementing this practice. Across the country, teacher unions have called it unfair. My concern is far more basic. It’s about validity.
As an attorney who has represented public schools for more than 30 years, I am concerned about this multipurpose use. It may not get us what we need: a valid, reliable, fair, trusted, and transparent accountability system. The tests at issue are those being developed by PARCC and SBAC, two multi-state consortia funded by the U.S. Department of Education and private funders. The consortia were charged with developing an assessment system aligned to the CCSS by the 2014-15 school year.
At last count, these consortia have signed up 27 states and the District of Columbia, affecting 42% of U.S. students, according to Education Week.
The media remind us constantly that our ‘failing’ schools need fixing; that, to do so, we should assess student skills and knowledge to help teachers improve instruction; that we also need to evaluate and rate teachers and weed out poor performers. And we are told that these tests can be multipurposed to do all of the above!
Sounds good? Actually, it sounds too good to be true. Does this multipurpose use to evaluate teacher effectiveness clear a key psychometric hurdle: test validity?
What is test validity?
At its core, it is the basic, bedrock requirement that a test measure what it is designed to measure. Thus, if a test is designed to measure how well 3rd graders decode, we judge the test according to how well it does that: can students decode? If it is designed to be predictive, say, to measure whether students are ‘on track’ or progressing toward college or career readiness, we judge it accordingly. Either way, we must ask: can a test whose purpose is to measure what students learn, or whether they are ‘on track’, also be used to measure something else, such as how well teachers teach?
So what are these tests’ purposes? For answers, let’s review the PARCC and SBAC websites. First PARCC, the Partnership for Assessment of Readiness for College and Careers:
PARCC is a group of states working together to develop a set of assessments that measure whether students are on track to be successful in college and their careers. These high quality, computer-based K–12 assessments in Mathematics and English Language Arts/Literacy give teachers, schools, students, and parents better information whether students are on track in their learning and for success after high school, and tools to help teachers customize learning to meet student needs.
PARCC is based on the core belief that assessment should work as a tool for enhancing teaching and learning. Because the assessments are aligned with the new, more rigorous Common Core State Standards, they ensure that every child is on a path to college and career readiness by measuring what students should know at each grade level. They will also provide parents and teachers with timely information to identify students who may be falling behind and need extra help. [Emphasis added]
Second, the SBAC, Smarter Balanced Assessment Consortium:
The [SBAC] is a state-led consortium working to develop next-generation assessments that accurately measure student progress toward college- and career-readiness. Smarter Balanced is one of two multistate consortia awarded funding from the U.S. Department of Education in 2010 to develop an assessment system aligned to the Common Core State Standards (CCSS) by the 2014-15 school year.
The work of Smarter Balanced is guided by the belief that a high-quality assessment system can provide information and tools for teachers and schools to improve instruction and help students succeed – regardless of disability, language or subgroup.
Smarter Balanced involves experienced educators, researchers, state and local policymakers and community groups working together in a transparent and consensus-driven process. [Emphasis added]
Clearly, these tests’ purposes are to (a) measure student progress on the Common Core State Standards (CCSS) and toward college or career readiness, (b) give teachers and parents better information about students, and (c) help improve instruction. No mention is made of gauging teacher effectiveness.
Yet, questions about the validity of using these tests in this multipurpose way seem to be missing from national discussions, even as other validity issues are raised. For example, questions are raised about score validity when tests are administered in different ways (on a computer or with paper and pencil) and at different times of the year.
Also discussed are questions about whether these tests are aligned to the CCSS. The media report battles among states, unions, and others about how to measure teacher effectiveness through these tests, e.g., through value-added models, student growth percentages, or other approaches. But the basic question of whether these tests are valid for this additional purpose in the first place is not part of today’s public discourse.
They should be.
If we continue on this track of creating high stakes for teachers with tests designed for a different purpose, we may well end up with unintended consequences, including distrust of the system, questionable accountability, and lawsuits.
My suggestion? Given the reprieve for states and growing concern among the public about these tests and the CCSS themselves, test consortia and our federal and state governments should take a deep breath and do two things.
First, the consortia should remind the public that the purpose of these tests is to measure student achievement on the new CCSS and career and college readiness, provide better information to teachers and parents, and improve instruction.
Second, the states (with federal approval and encouragement) that intend to use these results also to evaluate teacher effectiveness must inform the public explicitly about how they intend to validate the tests for this new purpose. They need to provide solid proof that their proposed use, which differs from the stated purpose of these tests, is valid, reliable, and fair. The current silence is worrisome, not transparent, and unwise.
This test validity issue needs to be fully aired and resolved satisfactorily before we can begin to tackle the larger issues about the multiple uses of testing. Otherwise, in our litigious land of opportunity, the ensuing battles may be costly and not pretty. Let’s not go there.