THE FINDINGS reported in “Academically Adrift,” the recent book by Richard Arum and Josipa Roksa, have come as a shock for the higher education community. The authors report on a study in which the Collegiate Learning Assessment (CLA) was given to students at four-year colleges across the country at the beginning of their freshman year and in the spring semester of their sophomore year. The CLA measures critical thinking among college students. It is a written assessment that gauges the ability of students to make or break arguments in a coherent and well-reasoned fashion. According to the authors, only about half of the students in their sample showed any improvement on this assessment in the first two years of college. This is shocking: Could it be that so few students make any gain at all in critical thinking in the first two years?
Probably not. The claim that 45 percent of students show no learning gains rests on an analysis that does not adhere to standard methods of calculating improvement. Every test administered to students results in a score that has two components. First, there’s the component that’s due to the student’s actual performance on the assessment. In the case of the CLA, this is the part of the score that reflects the student’s critical thinking ability. The second part is due to measurement error: an essentially random set of circumstances that would make students score higher or lower than they ought to, given their actual level of critical thinking skills. For instance, some students with strong critical thinking skills may not respond well to a certain question on the test, while other students with weak critical thinking skills may perform better because the exam asks about an area that they happen to know well. Measurement error on a well-designed assessment like the CLA is, on average, zero. Yet for any individual student, the amount of measurement error might be quite large.
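The two-component view of a test score can be sketched in a few lines of Python. This is only an illustration: the score scale, the spread of true ability, and the size of the error are invented numbers, not figures from the CLA or from the Arum and Roksa data.

```python
import random
import statistics

def simulate_scores(n_students, error_sd=80.0, seed=0):
    """Build observed scores as true ability plus random measurement error.

    All numbers are hypothetical: a mean of 1100 and a spread of 100 for
    true ability, and an error spread of 80, were chosen only for illustration.
    """
    rng = random.Random(seed)
    true_ability = [rng.gauss(1100.0, 100.0) for _ in range(n_students)]
    errors = [rng.gauss(0.0, error_sd) for _ in range(n_students)]
    observed = [t + e for t, e in zip(true_ability, errors)]
    return true_ability, errors, observed

true_ability, errors, observed = simulate_scores(10_000)

# Across the whole group the errors roughly cancel out...
mean_error = statistics.fmean(errors)
# ...but the typical error for any one student is far from zero.
typical_abs_error = statistics.fmean([abs(e) for e in errors])

print(f"mean error across students: {mean_error:.1f}")
print(f"typical size of one student's error: {typical_abs_error:.1f}")
```

Under these assumed settings the average error sits close to zero while the typical individual error is dozens of points, which is the distinction the rest of the argument depends on.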
Roksa and Arum, in a methodological appendix co-written with Melissa Velez, acknowledge that the CLA has a large amount of measurement error. Not to worry, they say, since measurement error simply makes estimates at the group level somewhat less precise. They argue that this is not problematic, as the CLA should not be used for high-stakes assessments of individual learning.
Yet, puzzlingly, the authors subject the students in their sample to exactly this kind of high-stakes test. To determine whether students have increased their critical thinking between entering college and the spring of their sophomore year, the authors compare each individual student’s scores from the first time they took the test to the second time. If the student’s score did not increase by more than a benchmark determined by the authors—a benchmark described by Alexander Astin in the Chronicle of Higher Education as “utterly arbitrary”—they say that the student did not learn during their three completed semesters in college. According to this calculation, 45 percent of students did not improve their critical thinking skills. This calculation is highly sensitive to the benchmark chosen. If one were to choose slightly different benchmarks for increases in learning, the proportion of students determined to have improved or not would change markedly.
This is not the way we usually calculate gains in learning. Instead of comparing individuals, we compare groups. When the authors compared the sophomores in their sample to the freshmen as groups, they found that, on average, scores did increase. This increase was modest, but real. In fact, when calculated on a semester-by-semester basis, the amount of gains in learning shown by students in the Arum and Roksa study is about the same as the amount of gains in learning calculated by Terenzini and Pascarella in their comprehensive summary of student learning in the 1980s. The gains shown by students in the Arum and Roksa data were apparent across sub-groups, including those with differing levels of parental education and students from different racial or ethnic groups.
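The contrast between the two approaches can be sketched as a small simulation. It assumes, purely for illustration, that every student truly gains the same 40 points and that each sitting carries random error with a spread of 80 points; the benchmarks of 0, 25 and 50 points are likewise hypothetical and are not the benchmark the authors used.

```python
import random
import statistics

def simulate_gains(n_students=10_000, true_gain=40.0, error_sd=80.0, seed=1):
    """Return observed gain scores when every student truly improves.

    Hypothetical setup: each student's second score is their ability plus
    the same true gain, and both sittings carry independent random error.
    """
    rng = random.Random(seed)
    gains = []
    for _ in range(n_students):
        ability = rng.gauss(1100.0, 100.0)
        first = ability + rng.gauss(0.0, error_sd)
        second = ability + true_gain + rng.gauss(0.0, error_sd)
        gains.append(second - first)
    return gains

def share_not_improved(gains, benchmark):
    """Share of students whose observed gain does not exceed the benchmark."""
    return sum(g <= benchmark for g in gains) / len(gains)

gains = simulate_gains()

# Individual approach: the answer depends heavily on the benchmark chosen.
shares = {b: share_not_improved(gains, b) for b in (0.0, 25.0, 50.0)}
for benchmark, share in shares.items():
    print(f"benchmark {benchmark:>4.0f}: {share:.0%} flagged as not improving")

# Group approach: the average gain recovers the true gain quite closely.
mean_gain = statistics.fmean(gains)
print(f"average gain across the group: {mean_gain:.1f}")
```

In this made-up setting every student improved, yet the individual comparison flags somewhere between roughly a third and roughly half of them as not improving, depending on the benchmark. The group average, by contrast, lands close to the 40 points that were built in.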
Arum and Roksa’s results do provide us with an important addition to our body of knowledge. They confirm that students gain a modest amount of critical thinking skills in their first two years of college. Their subsequent analyses also provide us with important confirmation of several of the factors that have long been suspected to affect learning in higher education. Students who spend more time on classwork and homework perform better on critical thinking tests than their peers. Students who are more engaged in academic life, for example by being enrolled at a campus where faculty have high expectations or by studying by themselves out of class, show bigger increases in critical thinking than students who are less engaged. Students distracted by other parts of their life, and most importantly by non-academic extracurricular activities, show lower increases in critical thinking skills.
The work by Arum and Roksa joins a long line of studies that indirectly measure some of the things we hope will happen in college. These include increases in critical thinking, academic engagement, social engagement, changes in work habits, and general increases in maturity. Their work, like the other work, demonstrates that it is simply not enough to expect that colleges and universities, left to their own devices, will ensure that learning gains occur. While the increases in critical thinking may not be occurring in as few students as the authors suggest, there is no way to characterize them as anything other than modest among students as a whole.
It is important to know that increases in critical thinking are quite modest in the first two years of college. Yet most students don’t come to college to increase their critical thinking skills. They also don’t come to college to do the kinds of things that other indirect assessments, like the National Survey of Student Engagement, measure. Instead, most students come to college so they can get a better job. They expect that they will learn some specific concepts and actual skills so that they can be employable when they leave.
Similarly, a large part of the support that colleges and universities enjoy with the broader population and with the business community has little to do with any of the peripheral activities associated with a college education. Instead, the public and businesses expect that colleges and universities will create an educated workforce and provide graduates with a good standard of living.
The higher education community has developed several high-quality, but indirect, assessments of collegiate performance. We need more assessments that tell us exactly what students know and are able to do. Arum and Roksa’s finding of a 0.5 standard deviation gain in critical thinking skills over the first three semesters of college could be worrying or not—one would have to be quite knowledgeable about the psychometric properties of the exam to assess this finding. Business leaders and the public could readily assess the meaning of tests that told us what percentage of students are proficient in interpreting a budget, or drawing up a plan for directing a project, or interpreting the results of a laboratory test.
Despite the fact that two of the major constituencies of higher education are concerned primarily with specific skills and knowledge, we still have not turned the corner in assessing higher education to be able to say what students know and are able to do. The same arguments for why this isn’t appropriate were also aired during the standards revolution in K–12 education. They were just as unconvincing then.
In some of the few assessments that have been conducted recently that actually measure students’ specific knowledge and skills, the results were no more encouraging than the results from the Arum and Roksa study. For instance, the National Assessment of Adult Literacy found that only 31 percent of college students scored as being proficient in quantitative literacy, a stunning revelation given that these are precisely the kinds of skills most rewarded in the current labor market.
It’s time to stop beating around the bush and instead begin directly measuring what students know and are able to do. It’s not as though these kinds of assessments do not exist now. A large percentage of students, from several of the most popular undergraduate majors, must take standards-based exams to be licensed in their fields. Teachers, engineers, nurses and accountants, amongst others, must take a high-stakes exam that measures specifically what they know and are able to do in their current fields. Of course, there is debate about the quality and relevance of these exams. This hasn’t stopped their widespread adoption. The reason is that for these fields there is a general acknowledgment that we must have some standard of quality for graduates. It is not enough in these fields to simply trust that students will get what they need from college. But for the large group of undergraduates who do not plan to work in fields that require these certifications, there is no standard of quality beyond the degree from their institution. Why do we trust colleges in these areas but not in others?
Even though students in fields such as nursing or engineering have been subject to specific standards of learning that will be measured on exams, instruction in these areas hasn’t been completely devoted to “teaching to the test.” Instead, the traditions of academic freedom and faculty governance have worked exactly as they should. In institutions where the results of these tests are taken seriously, the faculty have shown a large amount of flexibility in adapting their instruction so that students can learn what faculty think is important and be able to measure up to the standards of the field. The two concepts are not divorced from one another.
Indeed, it is actually much easier, and intellectually richer, to shape instructional activities around specific learning goals rather than more general concepts such as “critical thinking.” While it seems likely faculty do indeed hope that their students gain in their ability to think critically, it also seems unlikely that they specifically teach this. Instead, one imagines that faculty teach what they’ve been hired to teach—English, chemistry, music, physics, and so on. It’s much more likely that a chemistry professor will be able to reshape her instructional techniques in order to improve learning in chemistry as opposed to improving critical thinking.
For the purposes of public policy, what we need to know is the level of educational capital in the population as a whole, including the population of college graduates. Work by Margaret Miller and Peter Ewell as part of the Measuring Up project has led the way toward this objective. We next need to understand the contribution of colleges and universities to this level of educational capital. Last, we need to understand how colleges and universities can go about ensuring appropriate levels of performance relative to real standards for specific topics. None of these steps needs to happen in exactly the same way in every state, nor for every college. There is adequate room at every stage for diversity and experimentation.
Continuing with further indirect measures of collegiate outcomes will give us the same kinds of results we have seen—appealing, revealing, yet ultimately difficult to act on. It’s time to get to the specifics of what needs to change.
William R. Doyle is an assistant professor of higher education at Vanderbilt University.