"For the first time in history, every state has an approved accountability plan to ensure academic proficiency for every child. Achievement gaps are being identified and addressed. The success of schools is now being measured on the academic achievement of all students so that children who need help aren't hidden in averages."
This is part of what then-Secretary of Education Rod Paige had to say about the No Child Left Behind Act (NCLB) in a March 2004 "memo to editorial writers" that still appears on the U.S. Department of Education website. Now, as the 2007 reauthorization of NCLB approaches, a vast number of children who need help in school have vanished from attention entirely. And these children are likely to be racial or ethnic minorities, special education students, or students with limited English proficiency.

This may seem surprising, given that NCLB proponents have relentlessly touted the law's requirement that results be considered separately for ethnic and socioeconomic groups, for students with disabilities, and for those with limited English proficiency. Each school must calculate, for each grade, subject area and student group, the percentage of students who are proficient according to the state's federally approved standards. If even one of these groups, in a single grade and subject area, does not attain the targeted proficiency rate, the school in question will be labeled as having failed to make adequate yearly progress (AYP). Or that's how it's supposed to work.

From the very beginning, though, there has been an out: the "minimum n," or minimum group size. If the number of students in a particular group falls below a minimum, the school is off the hook, because the results for that group do not count in AYP calculations. (These students are still included in the overall school results, but they are, in fact, "hidden in averages," the very situation NCLB was supposed to prevent.)

In April 2006, the Associated Press estimated, based on its own analysis, that nearly two million students were not being counted because of this loophole. Even more troubling, minority students were seven times as likely to be excluded as white students. In response to the AP analysis, Secretary of Education Margaret Spellings said, "We ought to do more about that," adding that the issue would be addressed during the reauthorization process.

So how did this disturbing situation come about? In their NCLB plans, states were asked to propose a minimum n—the number of students below which a group's results wouldn't count. These minimum group sizes, along with the rest of the states' accountability plans, had to be approved by the Department of Education. Although there was some variation, most states initially set minimum group sizes of 30 to 40 students. But in 2004, the states were invited to initiate changes to their NCLB plans, resulting in a flurry of state requests for amendments and exceptions.

The many modifications to state accountability policies have drawn the attention of a number of organizations, including the Council of Chief State School Officers (CCSSO); the Center on Education Policy (CEP), an independent research organization; and the Harvard Civil Rights Project, all of which have published reports on this topic. According to a November 2005 CEP report, 23 states asked for and received amendments to their minimum n rules in 2004 or 2005. Some states have now received approval to use minimum numbers as large as 100 or 200 (e.g., Florida and Virginia respectively).

As the CEP report notes, "Many states have responded [to the difficulty of making AYP] by raising the minimum subgroup size, so that more students with disabilities can be excluded from school-level AYP calculations." For example, Alaska obtained permission from the Education Department to increase its minimum group size from 20 to 40 for students with disabilities, and for students with limited English proficiency. The motivation for Alaska's request, according to an October 2005 CCSSO report, was that 65 of the state's schools had failed to make AYP because of missed targets for students with disabilities.

In addition, some states have received permission to use "confidence intervals" in making AYP determinations. According to the CEP, 31 states included these in their original plans, and 27 states were granted permission to amend or add a confidence interval procedure in 2004 or 2005.

Confidence intervals are widely applied in many statistical analyses, including political polls. How do they work? Suppose a pollster seeks to estimate the percentage of U.S. adults supporting Candidate X. The true percentage can never be known exactly, because the pollster is not contacting all U.S. adults. Based on a random sample of 1,000 people, the pollster reports that the percentage supporting the candidate is 47 percent, plus or minus three percentage points. This conclusion is roughly equivalent to saying we're 95 percent confident that the "true" percentage lies between 44 percent and 50 percent. (A slightly more rigorous way of describing this result is that this procedure will lead to an interval that brackets the true value 95 percent of the time.) The interval ranging from 44 percent to 50 percent is called a 95 percent confidence interval for the true percentage of voters supporting Candidate X.
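For readers who want to see where "plus or minus three percentage points" comes from, here is a minimal sketch in Python. It assumes the usual normal-approximation formula for a proportion; the function name `margin_of_error` and the multiplier 1.96 (the standard value for 95 percent confidence) are illustrative choices, not anything specified by a pollster or by NCLB.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Half-width of the normal-approximation interval for a proportion.

    p_hat: observed proportion (0.47 means 47 percent)
    n:     number of people sampled
    z:     multiplier for the confidence level (1.96 for 95 percent)
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# The poll described above: 47 percent support in a sample of 1,000.
p_hat, n = 0.47, 1000
moe = margin_of_error(p_hat, n)
low, high = p_hat - moe, p_hat + moe

print(f"margin of error: {moe * 100:.1f} percentage points")
print(f"95 percent interval: {low:.1%} to {high:.1%}")
```

The margin works out to roughly three points, which is why a poll of this size is conventionally reported that way.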

A similar idea is behind the use of confidence intervals for NCLB proficiency rates (although the situation is somewhat different, in that a school's estimated proficiency rate is not based on a random sample of students). Suppose a school's proficiency rate for a particular category of students is 47 percent, with a confidence interval ranging from 44 percent to 50 percent. If the school's target rate is 50 percent, the school is considered to have met the target, even though its actual rate is lower, because 50 percent is included in the interval. This may not seem particularly troublesome in this example, but in fact, the confidence interval for a small school would be much wider than six percentage points, which means that a school with a proficiency rate much lower than the target could escape accountability.
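The effect of group size can be checked with a short calculation. The sketch below again assumes the normal-approximation interval, and it assumes a decision rule in which a group is treated as meeting the target whenever the upper end of its interval reaches the target. Actual state procedures differ in their technical details, so the names `wald_interval` and `meets_target` are hypothetical illustrations only.

```python
import math

def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation interval for a proportion, clipped to [0, 1]."""
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

def meets_target(p_hat, n, target, z=1.96):
    """Assumed rule: the group passes if the interval's upper end reaches the target."""
    _, high = wald_interval(p_hat, n, z)
    return high >= target

# Same 47 percent rate, shrinking group sizes: the interval widens quickly.
for n in (1000, 200, 50, 30):
    low, high = wald_interval(0.47, n)
    width = (high - low) * 100
    print(f"n={n:4d}: {low:.1%} to {high:.1%} (width {width:.1f} points)")

# A 35 percent rate against a 50 percent target:
print(meets_target(0.35, 1000, 0.50))  # large group
print(meets_target(0.35, 30, 0.50))    # small group
```

Under these assumptions, a rate of 35 percent falls well short of a 50 percent target in a group of 1,000 students but is treated as meeting it in a group of 30.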

Confidence intervals "automatically" take into account the fact that, all other things being equal, a proficiency rate based on a smaller group of students is less precisely determined than a rate based on a larger number. While the computation of confidence intervals is a well-established procedure, the inconsistent application of this approach in NCLB compounds the inequities resulting from the lack of uniform minimum group size rules. Confidence interval rules essentially supersede minimum n rules: even if the minimum n has been met, the confidence interval provides a further route through which a group can be exempted from AYP calculations. If two states have the same minimum n, but one uses confidence intervals, the confidence interval state is, in effect, imposing a less stringent standard.

To further complicate things, the confidence interval procedures adopted by the states vary in their confidence level (and in other technical respects). In AYP calculations, 95 percent or 99 percent confidence intervals are typically used. For a given proficiency rate and number of students, a 99 percent interval will be wider, which, in effect, gives the state additional leeway.
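To put a number on that additional leeway, the following sketch compares the half-width of a 95 percent and a 99 percent interval for the same group. It assumes the normal-approximation formula and uses Python's standard `statistics.NormalDist` to look up the multiplier for each confidence level; the function name `half_width` and the example values (a 30 percent rate among 30 students) are chosen only for illustration.

```python
import math
from statistics import NormalDist

def half_width(p_hat, n, confidence):
    """Half-width of a two-sided normal-approximation interval."""
    # Two-sided interval: split the leftover probability between both tails.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat, n = 0.30, 30
hw95 = half_width(p_hat, n, 0.95)
hw99 = half_width(p_hat, n, 0.99)

print(f"95 percent: plus or minus {hw95 * 100:.1f} points")
print(f"99 percent: plus or minus {hw99 * 100:.1f} points")
print(f"the 99 percent interval is {hw99 / hw95:.2f} times as wide")
```

Under this formula, moving from 95 to 99 percent confidence widens the interval by about a third for any rate and group size, because only the multiplier changes.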

To better understand the effect of differing policies involving minimum group sizes and confidence intervals, consider a hypothetical example involving four schools, one in each of four states. Suppose the target proficiency rate in, say, grade four math for all four states is 50 percent. Now let's focus on a particular student population—say, students with limited English proficiency. For simplicity, we'll assume there are exactly 30 such students in each of the four schools. Suppose that the schools have met AYP standards for all other student groups, so their fate rests on the determination that is made regarding the math proficiency of grade four students with limited English proficiency. Let's say the percentage of such students who are math-proficient is 49 percent in School one, 35 percent in School two, 30 percent in School three, and 25 percent in School four. Under very realistic conditions, we could find that School one, with the highest proficiency rate, did not make AYP, while the other three schools did.

How could that happen? Suppose School one is in a state that uses a minimum n of ten, like Louisiana. In that case, its math proficiency rate of 49 percent for grade four students with limited English proficiency counts toward the AYP decision and, although the rate is close to the 50 percent target, it falls short.

School two, with a proficiency rate of 35 percent, is located in a state that uses 95 percent confidence intervals, like Illinois. Because the interval (which I calculated using the proficiency rate and the number of students) stretches from 17.9 percent to 52.1 percent, the school's proficiency rate is deemed indistinguishable from the state target rate of 50 percent, and the school is in the clear.

School three, with a proficiency rate of 30 percent, is in a state that uses 99 percent confidence intervals, like Mississippi. Here, the interval ranges from 8.4 percent to 51.6 percent, so again, the conclusion is that the school's proficiency rate is not statistically different from the state target rate.

Finally, School four, with a proficiency rate of only 25 percent, has not jeopardized its AYP status because the state in which it is located uses a minimum n of 40 for students with limited English proficiency, like Minnesota.
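The four outcomes can be reproduced with a few lines of Python. This is a sketch under stated assumptions rather than any state's actual procedure: it uses the normal-approximation interval (which is consistent with the interval endpoints quoted above), it treats a group as passing when the upper end of its interval reaches the target, and it assumes a minimum n of 30 for the two confidence interval states, a figure the example does not specify. The names `StateRule` and `group_clears_ayp` are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist
from typing import Optional

@dataclass
class StateRule:
    min_n: int                          # groups smaller than this do not count
    confidence: Optional[float] = None  # e.g. 0.95, or None if no interval is used

def group_clears_ayp(rate, n, target, rule):
    """Return (clears, reason) for one student group under one state's rule."""
    if n < rule.min_n:
        return True, "group below minimum n, so its results do not count"
    if rate >= target:
        return True, "rate meets the target"
    if rule.confidence is not None:
        z = NormalDist().inv_cdf(0.5 + rule.confidence / 2)
        upper = rate + z * math.sqrt(rate * (1 - rate) / n)
        if upper >= target:
            return True, f"upper end of interval ({upper:.1%}) reaches the target"
    return False, "rate falls short of the target"

TARGET, GROUP_SIZE = 0.50, 30
schools = {
    "School one":   (0.49, StateRule(min_n=10)),
    "School two":   (0.35, StateRule(min_n=30, confidence=0.95)),  # min_n assumed
    "School three": (0.30, StateRule(min_n=30, confidence=0.99)),  # min_n assumed
    "School four":  (0.25, StateRule(min_n=40)),
}

outcomes = {}
for name, (rate, rule) in schools.items():
    clears, reason = group_clears_ayp(rate, GROUP_SIZE, TARGET, rule)
    outcomes[name] = clears
    print(f"{name}: rate {rate:.0%}, clears AYP: {clears} ({reason})")
```

Run as written, only School one, the school with the highest proficiency rate, fails to clear the bar.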

Now, it's not bad that NCLB is flexible in its implementation. No one who cares about education would argue for rigid adherence to a one-size-fits-all policy for all aspects of the legislation. But the variation across states is troubling from two perspectives. First, some state actions—like Alaska's request to raise the minimum n for students with limited English proficiency and for students with disabilities—are clearly attempts to evade the law's intended focus on student groups that have historically been ignored. Students who are excluded from NCLB group-level calculations are, of course, likely to be those from smaller groups such as ethnic minorities. Second, the government's decisions about what policies to permit seem to be quite haphazard, resulting in an odd patchwork of inconsistent guidelines.

As a result, states can't be compared to each other, or even to themselves over time. Consider that states started out with different tests and different proficiency definitions, as well as different proposed trajectories toward the official goal of 100 percent proficiency. Now add to this the effect of the modifications and exceptions to state accountability plans. As the CCSSO report notes, "the accumulation of amendments and 'flexibility' may result in educational accountability systems that lack any real connection to the achievement goals they were supposed to realize."

Put another way, the federal government claims that under NCLB, all children are being counted. They are not.