IS STANDARDIZED TESTING a gateway or a gatekeeper, a road to equal opportunity
or a means of maintaining white male privilege? American public opinion always
has been sharply divided. But you’d never know it from the last presidential campaign,
in which the candidates vied for the position of most enthusiastic test booster.
In one of the debates, George W. Bush scolded Al Gore for allegedly favoring only
“voluntary” testing of America’s students. “You can’t have voluntary testing,” Bush insisted.
“You must have mandatory testing. You must say that if you receive money, you must show
us whether or not children are learning to read and write and add and subtract…Testing is
the cornerstone of reform.”
Now, President Bush has a blueprint for education reform based on this principle. Called
“No Child Left Behind,” the plan, which forms the basis for bills pending in the Senate and
the House, says that “schools must have clear measurable goals focused on basic skills and
essential knowledge. Requiring annual state assessments in math and reading in grades 3–8
will ensure that the goals are being met for every child, every year.”
But of course testing alone ensures no such thing. Because tests are very visible and can
be put into place quickly, they often are instituted as the first step in educational reform,
before changes in curriculum standards and instruction are put into place. And testing can
divert resources that could otherwise be used to implement these crucial changes.
Many states have made testing a centerpiece of their education programs in recent
years, only to find that improvements in student learning did not obediently follow the
implementation of the new assessments. Alaska, Arizona, Illinois and Massachusetts are
among the states reporting failure rates of 50 percent or more on some components of their
statewide exams.
Some schools have resorted to extraordinary means in order to demonstrate score increases,
including an Oregon elementary school that was acclaimed as the state’s most improved school
earlier this year. The school was found to have tested only 55 percent of its third graders
with the standard state reading exam, mainly as a result of exempting students with limited
English proficiency. Although the school was
evidently playing by the rules, its participation rate was far lower than the state average of
90 percent.
Outright cheating by school personnel on standardized tests has been reported in at least
a dozen states in recent years. The massive test cheating scandal in New York City, which
allegedly involved more than 50 educators, is still in the news two years after it was brought
to light.
Testing proponents argue that despite its rocky start, the standards-based reform
movement, which emphasizes accountability through testing, will ultimately boost student
achievement. The new approach, they claim, just needs some time to work. In the
meantime, what could be bad about monitoring student learning? One reply comes from an
unlikely test critic—Greg Anrig, who was the third president of Educational Testing
Service. Anrig used to say that testing grade school kids on a frequent basis is like
repeatedly pulling up carrots to see how they’re growing. Testing, in other words, can
interrupt the very process it is intended to assess.
The amount of classroom time spent on testing has escalated dramatically in recent
years. In California, which leads the nation in terms of hours devoted to standardized
testing, according to a recent Education Week survey, students in grades two through 11
spend an average of six to eight hours per year on tests. And the amount of time devoted to
assessment is likely to increase nationwide: In addition to annual state testing of students in
grades three through eight, the Bush plan declares that “a sample of students in each state
will be assessed annually with the National Assessment of Educational Progress (NAEP)
fourth and eighth grade assessment in reading and math.”
In order to receive their full share of federal education dollars, states will have to
demonstrate progress by “disadvantaged” students on the states’ own tests, and these gains
will have to be “confirmed” by the NAEP results. (The House version of the bill gives states
the option of confirming their results with other tests that meet “widely recognized
professional and technical standards.”) It’s no wonder that Bush’s reform package is
referred to in some government circles as “No Child Left Untested.”
And of course, it is not merely the testing time itself that is lost when new assessment
programs are added. Teachers, parents and researchers all have bemoaned the “teaching to
the test” phenomenon, in which test preparation drills crowd out instruction on more
complex and important material.
In a national survey of public school teachers conducted by Education Week in 2000,
nearly 70 percent of teachers said that state standards have caused instruction to focus “far
too much” or “somewhat too much” on tests. One teacher quoted at a National Education
Association convention last year vividly described the current testing frenzy as an
“education-eating bacteria” that is overtaking our schools.
According to some critics, teaching to the test is the primary explanation for the “Texas
Miracle”—the large score gains on statewide tests for both minority and white students in
Bush’s home state. To see if these increases were reflected in other measures of achievement,
researchers at the Rand Corporation compared scores on the Texas Assessment of
Academic Skills (TAAS) to results for Texas and for the nation on the National Assessment
of Educational Progress—the very test that is to be used to confirm state gains, according to
“No Child Left Behind.”
The researchers, Stephen P. Klein, Laura S. Hamilton, Daniel F. McCaffrey and Brian
M. Stecher, focused on changes in fourth grade math and reading achievement and eighth
grade math achievement during the 1990s. (Data were not available for an analysis of eighth
grade reading.) TAAS and NAEP gains were compared in terms of “standardized
differences,” obtained by dividing the change in the average score by the standard
deviation, an index of the variability of the scores. Although NAEP results confirmed that
school achievement in Texas improved, only in fourth grade math were the Texas gains
substantially greater than those for the nation as a whole.
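The "standardized difference" described above is a simple calculation, and a minimal sketch may make it concrete. The function name and every number below are hypothetical illustrations, not figures from the Rand study, and the sketch assumes a single standard deviation is supplied for each test, since the description here does not say which year's deviation the researchers used.

```python
def standardized_difference(mean_before, mean_after, std_dev):
    """Return the change in the average score in standard-deviation units."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (mean_after - mean_before) / std_dev


# Hypothetical averages and standard deviations for two tests on different scales.
state_test_gain = standardized_difference(70.0, 76.0, 20.0)        # 6 points / 20
national_test_gain = standardized_difference(210.0, 213.0, 30.0)   # 3 points / 30

# Dividing by the standard deviation puts both gains on a common scale,
# so they can be compared even though the raw score scales differ.
ratio = state_test_gain / national_test_gain
print(f"state: {state_test_gain:.2f}, national: {national_test_gain:.2f}, ratio: {ratio:.1f}")
```

With these made-up inputs, the state gain comes out about three times the national gain in standard-deviation units. That is the kind of comparison the researchers could draw between TAAS and NAEP, whose raw scores are not directly comparable.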
More significantly, the score gains on the TAAS dwarfed the NAEP increases, especially
for minority students. For example, between 1994 and 1998, the increase in fourth grade
reading achievement for African American students on the TAAS was about three times as
large as the gain on NAEP. And while the gap between minorities and whites on the TAAS
shrank between 1994 and 1998, this decrease was not paralleled by the NAEP results. (A report
just released by the National Education Goals Panel, a bipartisan group of governors and
legislators, shows that the Texas score gap on NAEP held steady during the 1990s, lending
support to the Rand conclusions.)
What is the reason for the discrepancies between NAEP and TAAS? The Rand researchers
speculated that “many schools are devoting a great deal of class time to highly specific
TAAS preparation. It is also plausible that the schools with relatively large percentages
of minority and poor students may be doing this more than other schools.” The authors
reasoned that the preparation must have been quite narrow in scope because, “if TAAS scores
were affected by test preparation, then the effects did not appear to generalize to the
NAEP exams.”
Just as it occupies classroom time, testing, of course, drains financial resources as well. A
question that is all too rarely asked is, “Could the money expended to add more testing be
put to use in a more effective way?” According to one state testing director, the cost of
assessing a child is roughly $15 per year, including test development, administration, scoring,
analysis and reporting. Not a huge sum, perhaps, but under the Bush plan, that’s $15 per
year for every third through eighth grader in the United States. One testing expert
anticipates that the Bush plan will add $150 million to the states’ expenditures on K–12
testing, currently estimated to be about $400 million. How else could we spend that money?
What if it were used to increase teachers’ salaries and improve their continuing education
opportunities; or to beef up course offerings and tutoring programs for students; or to repair
decaying school buildings and expand libraries and computing facilities? Can promoters of
increased assessment make the case that adding tests is a more effective use of resources?
On the subject of the “No Child Left Behind” proposal, Democratic Senator Barbara
Mikulski of Maryland remarked, “We’re worried that no child be left out of the appropriations process.”
Education reform requires a commitment of resources to the
improvement of teaching and learning, especially in poor communities. Testing should
follow rather than precede these changes.
Thermometers don’t cure fevers, and testing does not fix school problems. Testing is not
the cornerstone of educational reform. Learning is.