David Owen

Fellowship Title:

Testing Teachers

David Owen
May 14, 1984


In November of 1983, the Governor of Arkansas signed a bill requiring public school teachers to pass competency tests in order to keep their jobs. As you might expect, the new law was harshly criticized by teachers. As you might not expect, it was also criticized by the Educational Testing Service (ETS), the nation’s largest testing company and the manufacturer of the examination Arkansas originally wanted its teachers to take.

“It is morally and educationally wrong,” said Gregory R. Anrig, president of ETS, “to tell someone who has been judged a satisfactory teacher for many years that passing a certain test on a certain day is necessary to keep his or her job.” Anrig said his company would no longer sell its National Teacher Examinations (NTE) to states or school boards that use the tests to determine the futures of practicing teachers.

Education writers generally applauded Anrig’s announcement as an unexpected gesture from a tax-exempt $130-million company that has never been eager to acknowledge the limitations of its products (which include not only the NTE but also the Scholastic Aptitude Test, the Graduate Record Examinations, the Graduate Management Admission Test, parts of the Law School Admission Test, and dozens of occupational certification and licensing exams).

But the NTE controversy is more complicated than it may seem. The new ETS position arises more from self-interest than from concern about test use, and it raises hard questions that ETS has consistently found it lucrative to ignore. And attempts by ETS to defend its test only serve to reveal its deficiencies.

The decline of American schools has been a hot topic in recent years. Most people would probably agree with the National Commission on Excellence in Education, which determined in 1983 that deteriorating schools had produced “a rising tide of mediocrity that threatens our very future as a Nation and a people.” Most would probably also agree with the commission’s conclusion that American teachers are poorly trained, badly paid, and not very bright to begin with.

State legislators have been quick to seize the teaching issue. At least sixteen states now require that prospective teachers pass competency tests before receiving teaching certificates. Most use the NTE. “You’re eventually going to have to pass it,” a spokesperson for the Louisiana Board of Education told me, “if you want to get a regular teacher certificate.”

This, apparently, is a use of the NTE that ETS approves of. Indeed, the company’s confidential “Corporate Plan” for fiscal 1984 calls for the addition of “two new states as users of the NTE programs for certification.” In addition, ETS has begun actively marketing a Pre-Professional Skills Test (PPST), a shortened knock-off of the NTE “Core Battery,” as an “employment test.” The PPST, which was introduced in May of 1983, is now required for teacher certification in Delaware.

But if using the NTE and its offspring to test practicing teachers is “morally and educationally wrong,” as the president of ETS has asserted, why is it right to use the same tests to determine which aspiring teachers will be allowed to enter the profession? If the NTE doesn’t test knowledge that teachers need, why should people have to pass it in order to earn teaching certificates? And if it tests knowledge that teachers do need, what’s wrong with requiring teachers to take it?

The true source of Anrig’s concern about the NTE is surely not fairness but rather the flood of lawsuits that would inundate his company if veteran teachers began losing their jobs (or their raises or their promotions) as a result of their performance on an ETS multiple-choice examination. Prior to his announcement, the Arkansas Education Association had said that it would go to court to challenge the state’s new testing law. ETS doesn’t run into this problem with most of its other tests, because most other test-takers–unlike practicing teachers–aren’t represented by powerful unions. Nor are most test-takers in a position to prove ETS wrong, by demonstrating that they are actually capable of doing what ETS says they are not. A high school student who scores poorly on the SAT and as a result is rejected by the college of his choice could never prove in court that he had been treated unfairly.

Anrig told the New York Times last November that using the NTE “as a sole criterion for determining employment or pay scales violates all kinds of Federal laws about the relevance of tests to the workplace.” But if the NTE bears no relevance to the workplace–in this case, the classroom–then it ought to be abolished, not confined to people who lack the means to defend themselves.

How relevant is the NTE? The only way to find out is to look at the test. Curiously enough, much of the NTE–a test intended to separate the able from the inept–is barely literate. Choosing the desired answer to a sample item from the test for kindergarten teachers requires test-takers to form this sentence: “[T]he skill that is prerequisite to the successful completion of this experiment is the ability to be able to make accurate observations” [emphasis added]. Other items are worded so that selecting a single answer is impossible:

Of the following aids to the pronunciation of an unknown word, which would ordinarily be used by a reader after all others have failed?

      1. Configuration clues
      2. Context clues
      3. Phonic analysis
      4. The dictionary
      5. Structural analysis

Since this item requires the test-taker to assume, in evaluating each choice, that all other choices have been tried without success, any of the choices is correct. It doesn’t matter which you choose if the necessary assumption with each is that you have eliminated all the others.

Frequently the NTE is simply wrong, as in this item from the test in “speech communication”:

Which of the following is the most effective way for a newspaper to protect itself against libel suits?

      1. Avoiding the use of the names of celebrities
      2. Giving the source of questionable information
      3. Avoiding ambiguous headlines
      4. Printing an immediate retraction if challenged
      5. Obtaining the consent of persons being quoted

I once worked as a researcher for a magazine that was sued for libel with some regularity. In no case did “obtaining the consent of persons being quoted”–ETS’s “correct” answer–provide any protection whatsoever, because people typically do not sue publications over things that they themselves have said; they go to court over things the publications have said about them. In one memorable case–in which the magazine I worked for was sued for many millions of dollars–charges would never have been filed if the editor had adopted a policy of “avoiding ambiguous headlines.” In every instance where libel suits were feared, the magazine’s attorneys breathed easy if writers were able to provide “the source of questionable information.” Even when these other measures failed, legal threats could almost invariably be stemmed simply by “printing an immediate retraction.” It may also be worth noting that in the most widely publicized libel trial in recent years, Carol Burnett won substantial damages from the National Enquirer, which wouldn’t have been sued in the first place if it had made a practice of “avoiding the use of the names of celebrities.”

It probably doesn’t matter whether “speech communication” teachers know even the first thing about avoiding libel suits. But they shouldn’t be denied jobs just because they know more than ETS.

Validating Tests

The NTE examinations are written by ETS employees with the help of “professional educators from all sections of the country,” according to the 1983-84 Bulletin of Information for NTE programs. More interesting than how the tests are written, though, is how they are “validated.” Validity, in testing, is a technical term that refers to the relationship between a test and the purpose to which it is put. ETS measures the validity of the Scholastic Aptitude Test, for example, by comparing students’ scores on the test with their first-year college grades, which are referred to as the “criterion.” If the correlation between the scores and the criterion seems high enough, ETS judges the test to be valid for its intended use in choosing among college applicants.
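The validity check described above comes down to a correlation between two lists of numbers. The sketch below assumes a plain Pearson correlation, the most common measure of this kind of relationship, between test scores and a criterion such as first-year grades. The student data are invented for illustration, and because the article does not say what correlation ETS considers "high enough," no threshold is assumed.

```python
import math


def pearson_r(scores, criterion):
    """Pearson correlation between test scores and a criterion measure.

    Returns a value between -1 and 1; values near 1 mean that students who
    score higher on the test also tend to have higher criterion values.
    """
    if len(scores) != len(criterion) or len(scores) < 2:
        raise ValueError("need two equal-length lists with at least two entries")
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(criterion) / n
    # Unnormalized covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, criterion))
    var_x = sum((x - mean_x) ** 2 for x in scores)
    var_y = sum((y - mean_y) ** 2 for y in criterion)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when either list is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: five students' test scores and first-year grade-point averages.
test_scores = [450, 520, 600, 650, 700]
first_year_gpa = [2.1, 2.8, 2.6, 3.4, 3.2]
print(round(pearson_r(test_scores, first_year_gpa), 2))
```

A coefficient computed this way only says how closely the two lists move together; it says nothing about whether the criterion itself is a good measure of what the test is supposed to predict, which is exactly the difficulty with the NTE.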

Validating the NTE is much more difficult, because there is no real criterion against which to measure test scores. Teachers don’t receive grades the way college students do. No one knows how to define good teaching, much less how to measure it. But the law states that an employment test is illegal unless its validity has been proven. The NTE would be valid, ETS decided, if the material covered on the test bore a sufficient resemblance to the material taught in teacher education programs.

The NTE offshoot that Delaware uses is the Pre-Professional Skills Test (PPST). To determine whether the questions in it were valid for certifying teachers in Delaware, ETS submitted a copy of the PPST to a professor from a teacher training school, a public school administrator, and 22 elementary, junior high, and high school teachers. These 24 experts were assembled one day and asked to answer the following questions about the PPST: 1) “Will those who must pass the tests have had the opportunity to learn the basic skills tested?” 2) “Are the tests valid assessments of the basic skills needed to teach in Delaware?” 3) “What would the minimally qualified teacher candidate have to score to pass the test?”

When ETS added up the results, it was mightily impressed. Although only one member of the panel actually worked in a teacher training program, “panel members overwhelmingly reported that candidates would have had the opportunity to learn the skills tested,” according to the final report, which was written by ETS’s Gary Echternacht. Furthermore, although “most panel members reported finding some questionable test items, most panel members believed that fewer than 13%…of the items on any one test were questionable….These results strongly support the validity of the test.” (They also strongly support the elimination of a number of test items, but this was not done.) The only real sour notes, according to Echternacht, had come from a vocational education teacher who thought the test might be too difficult “because some voc-ed teaching certificates do not require college educations,” and from a panel member who thought that spelling ought to be tested in the multiple-choice writing test.

Setting the passing score was also easy. Panel members were instructed to arrange individual items from the test into “homogenous groups” and then estimate what percentage of the items in each group a “minimally qualified candidate” might be expected to answer correctly. These percentages were then multiplied by the number of questions in each group, and the resulting figures were added up to provide a minimum passing score for the test. Of course, panel members could have achieved the same result in less time by simply counting the number of items on the test that they felt were important. But ETS prides itself on its scientific approach to testing, and figuring percentages and performing multiplication seem vastly more sophisticated than merely adding up the number of relevant items. In his report, Echternacht refers to the percentage-and-multiplication procedure as “Ebel’s method.”
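The percentage-and-multiplication arithmetic can be sketched in a few lines. This follows the procedure as the Delaware report is described here, not necessarily every detail of Ebel’s published method, and the groups and percentages below are hypothetical estimates from a single imaginary panelist on a forty-item test.

```python
def ebel_cut_score(groups):
    """Passing score for one panelist under the procedure described above.

    groups: list of (item_count, percent_correct) pairs, where percent_correct
    is the panelist's estimate (0-100) of the share of items in that group a
    "minimally qualified candidate" would answer correctly.
    """
    for count, pct in groups:
        if count < 0 or not 0 <= pct <= 100:
            raise ValueError("counts must be non-negative and percentages 0-100")
    # Multiply each group's percentage by its item count and add the products.
    # Integer products are summed first and divided once to limit rounding noise.
    return sum(count * pct for count, pct in groups) / 100


# Hypothetical panelist: 20 items at 80%, 12 items at 60%, 8 items at 50%.
estimates = [(20, 80), (12, 60), (8, 50)]
total_items = sum(count for count, _ in estimates)
print(ebel_cut_score(estimates), "of", total_items)  # 27.2 of 40
```

Every input to this calculation is a guess, so different panelists feeding different percentages into the same arithmetic can arrive at very different passing scores.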

The estimates panel members arrived at through Ebel’s method varied widely. On the forty-item math test, for example, the panel’s suggested passing scores ranged from 15 to 38–from fewer than two-fifths of the questions to all but two of them. To arrive at a single score, ETS suggested averaging all the estimates and then adding in a small cushion to allow for “measurement error.” This is what Delaware did.

Using an average as a cutting score is an interesting idea, since it means that one half of the expert panelists must believe that Delaware is now certifying unqualified teachers, while the other half must believe that it is rejecting qualified ones. But a test without a passing score wouldn’t be much use. And besides, finding qualified teachers isn’t the only, or even the most important, purpose of the PPST. ETS recommended to Delaware that it review its passing score every year “to take into account the supply and demand of teachers.” In other words, according to ETS, the test should be used not to identify a pool of qualified teachers but to control the competition for available jobs.

Frank B. Murray, Dean of the University of Delaware’s College of Education, had a different concern. In a letter to William B. Keene, superintendent of the state department of public instruction, Murray said, “I am concerned that the procedure we are following to set a cut-off score is not wise. Apart from the fact that the expert panel procedure (Ebel’s method) is only recommended when you cannot get data from a sample of test-takers, I am worried that the whole point of the competency testing of teachers will be defeated because expert panels, where they have been used, have uniformly set the score too high. To have to lower the cut-off score at some later time would be a public relations disaster. (There are also real questions about whether the panel even understood the Ebel procedure….)”

Despite the threat to its public relations, Delaware now uses the PPST as a requirement for teacher certification, and ETS is working hard to ensure that other states will join it.

Most people don’t distinguish between the question, “Should teachers be competent?” and the question, “Should teachers be required to pass competency tests?” But in fact the two are entirely different. No reasonable person would advocate the hiring of incompetent teachers. But a reasonable person might very well advocate the abolition of teacher competency tests, or at least of the NTE. In fact, one could argue that the use of the NTE and tests like it actually works to reduce the quality of the nation’s teachers. This idea seems paradoxical at first, but it is actually quite simple. There are several reasons:

  1. The NTE adds yet another layer to the educational bureaucracy that makes it difficult to reward good teachers and nearly impossible to remove bad ones. If you think it’s hard to get rid of a bad teacher now, just try firing one whose “competence” has been certified by ETS.

  2. Minimum competency tests have a way of becoming maximum competency tests. School districts that have imposed competency tests as requirements for high school graduation have often discovered that the tests tend to establish ceilings rather than floors for student achievement. Why bother reading books if all you have to do to graduate is be able to write checks and read airline schedules?

  3. Far from facilitating educational reform, the existence of an NTE requirement makes it nearly impossible to improve teacher training programs, which in turn make it nearly impossible to change the NTE. Since ETS bases the content of the NTE on its understanding of what is taught in teachers colleges–and since teachers colleges not only admit students on the basis of NTE scores but also dedicate themselves to helping those same students prepare for NTE certifying tests–the tests and the school programs are mutually reinforcing. This is a disturbing phenomenon, particularly since there is now substantial agreement that one of the major causes of the current educational crisis is the dismal quality–and irrelevant curricula–of teacher training programs.

  4. Tests like the NTE are often used less to screen potential teachers than to limit the size of the job market–hence ETS’s recommendation that Delaware adjust its passing score annually to reflect supply and demand. Since minority candidates, as a group, tend to score lower on the NTE than white candidates do, the effects are particularly unfortunate.

Is there an alternative to the NTE? The question assumes that the tests now perform some necessary function. This is not the case. The easiest way to improve on the NTE would be to get rid of it. Beyond this, there are no easy answers. Use of the NTE is a simple-minded response to a very complicated problem. State legislators find the tests irresistibly appealing because they already exist and because requiring them costs taxpayers nothing. The NTE enables all of us to believe we’re taking concrete steps to improve our schools. But the tests, at least as ETS conceives them, can only exacerbate our problems.

No one stands to gain from the proliferation of shoddy competency tests, except the companies that manufacture them. The NTE tests constitute one of the fastest growing programs at ETS. In fiscal 1982, “testing of teachers and other professionals” earned the company $13.7 million. It’s money that should have been spent elsewhere.

©1984 David Owen


David Owen, senior writer on leave from Harper’s Magazine, is investigating standardized testing and American education.
