“Teaching Skill Under Scrutiny”
by W. James Popham
Popham, W. J. (1971). Teaching skill under scrutiny. Phi Delta Kappan, 53(1), 599-602.
Results of a recently reported series of investigations reveal that experienced teachers may not be significantly more proficient than “people off the street” with respect to accomplishing intended behavior changes in learners. In three separate replications, groups of experienced teachers were unable to outperform non-teachers in bringing about specified changes in learners. This article will 1) summarize those investigations and the rationale underlying them, 2) consider the generalizability of the results, and 3) offer recommendations for altering an unacceptable state of affairs in the teaching profession.
A Measure of Teacher Effectiveness
The research reported here stemmed from an attempt to isolate a readily usable indicator which could be employed to assess a teacher’s instructional skill. Anyone who has followed the search for a satisfactory measure of teaching proficiency must conclude that this area of inquiry may well represent one of the most high-investment/low-yield activities of our field. For over 70 years researcher after researcher has tried out such devices as administrator ratings, pupil ratings, systematic observations, and student performance on standardized tests. With few exceptions, the results have been thoroughly disappointing. Briefly, let’s see why.
Ratings of teaching skill, whether supplied by administrators, pupils, or a visiting mother-in-law, are notoriously inaccurate. The administrator-rater looks in on Mrs. Jones’s class and, if he sees her engaging in those splendid techniques he employed during his own marvelous moments as a classroom teacher, Mrs. Jones gets a good rating. Pupils may rate an instructor positively because he is a lenient grader or because he has a good sense of humor. In other words, ratings of teaching proficiency are based on highly variable conceptions of what constitutes good teaching. One rater’s “dynamic” teacher is another’s “unorganized” failure. That these variably derived and often unreliable ratings of teaching skill do not correlate well with most measures of pupil achievement should not surprise us.
Another widely used index of teaching effectiveness involves the use of systematic observations of the teacher’s classroom practices. Employing more or less systematized check sheets, someone observes the teacher in action and derives an estimate of the teacher’s skill based on the degree to which certain process variables are present (for example, frequency of teacher questions, pupil talk, etc.). The problem with the observation approach is that it is so process-focused that the observer rarely moves to the logical follow-up question: “What happens to pupils as a consequence of the teacher’s using these processes?” The chief problem for proponents of observation-derived estimates of teaching skill stems from the clear evidence that widely divergent instructional tactics can be used to promote identical instructional goals. For one teacher a nondirective approach may be ideal, while another teacher might find a highly directive approach preferable. Yet, because of their idiosyncratic personalities, prior experience, and other variables, both teachers’ approaches may be equally effective. Thus, while observational techniques may be helpful to a teacher for analyzing his instructional activities, they should not be employed as an index of teacher effectiveness. The correlation between instructional process and results in learners is not strong enough.
The third most widely used measure of teaching skill is pupil performance on standardized tests. But since standardized tests are designed chiefly to discriminate among learners,[2] not necessarily to indicate the degree to which identifiable skills have been mastered, they have not provided us with sufficiently sensitive estimates of how much progress pupils have made with a given teacher. An even more important reason for eschewing standardized tests is the fact that different teachers have markedly different emphases, even in the same course. One geography instructor will emphasize topography, another will stress natural resources. Given the grossness of standardized tests to begin with, such instruments cannot accommodate teachers’ differential emphases.
Teaching Performance Tests
In an effort to provide a more defensible approach to the measurement of teaching skill, a series of investigations initiated at UCLA in 1964 resulted in the development of the teaching performance test, a heretofore untried vehicle for assessing instructional proficiency. This approach is predicated on the assumption that the chief reason for a teacher’s existence is to promote beneficial changes in learners. While we may expect a teacher to perform other functions, perhaps the most important role of the teacher is to modify learners so that they possess more knowledge, employ it more skillfully, cope more satisfactorily with their environment and, in general, function as more humane members of a perilously threatened world society. One crucial ingredient of the teacher’s skill rests on this ability to change learners. A teaching performance test measures such ability.
Briefly, teaching performance tests avoid the measurement problems arising from different teachers’ pursuits of different objectives. This is accomplished by asking teachers to achieve the same objectives, yet permitting them to employ their own pedagogical preferences in doing so. By holding instructional goals constant it becomes possible to contrast teachers with respect to their skill in accomplishing identical goals. Procedurally, a teaching performance test is carried out as follows:
1. The teacher is given one or more explicit instructional objectives (and, preferably, a sample of the measurement procedure used to assess each objective), plus any necessary background information related to the objectives.
2. The teacher is given sufficient time to plan an instructional sequence to accomplish the objective.
3. The teacher then instructs a group of learners in an effort to have the learners achieve the objective.
4. At the conclusion of the instruction the learners are measured with respect to the objectives, their performance providing an estimate of the teacher’s instructional skill.
Development and Validation
As the chief focus of a four-year investigation at UCLA, three teaching performance tests were developed in the fields of social science, electronics, and auto mechanics.
The social science performance test dealt with the topic of social science research methods and consisted of 13 specific instructional objectives measured by a 68-item posttest. The electronics performance test treated basic power supplies and contained 23 instructional objectives measured by a 47-item posttest. The auto mechanics performance test dealt with carburetion and possessed 29 instructional objectives measured by a 99-item posttest. In addition, all three performance tests contained a set of resource materials which could be used in planning an instructional sequence to accomplish the objectives. All materials associated with each of the performance tests were reviewed during development by a number of practicing teachers and other subject matter experts. In addition, each test was subjected to several field trials before the final versions were assembled.
In deciding on a reasonable approach to validate this method of assessing teacher effectiveness, a construct validation strategy was selected. Considering the nature of the requirements of teaching performance tests, it seemed that these tests ought to be able at least to distinguish between grossly disparate groups such as credentialed, experienced teachers and those who were neither credentialed nor experienced. In other words, if one were to ask a group of experienced teachers to complete a given performance test, in contrast to a group of people off the street, the experienced teachers ought to markedly outperform their inexperienced counterparts.
To test this validation hypothesis, suitable numbers of teachers and non-teachers were recruited in the Southern California region. After several months of recruiting and establishing administrative arrangements, 13 high school social science teachers, 16 high school and junior college electronics teachers, and 28 high school and junior college auto mechanics teachers were chosen to participate in the research. Identical numbers of non-teachers were also located. For the social science performance test, state college students were selected who were social science majors or minors but who had never taught or completed any professional education course work. For the auto mechanics test, garage mechanics served as the non-teachers. The non-teachers for electronics were television repairmen and electronics industries workers. All three performance tests were subjected to validation contrasts in school situations involving 2,326 public school students. Although there were slight differences in the three tests, the general procedure required that each participating teacher have at least two sections of an appropriate class. One of these classes was then randomly assigned to the non-teacher, while another was randomly assigned to the regular teacher. Approximately two weeks prior to instruction, both the teacher and the non-teacher received the objectives for the performance test and the resource materials. Each was directed to plan a short unit of instruction to accomplish the objectives. No restrictions regarding instructional tactics were imposed; participants were asked only to achieve as many of the objectives as they could in the time available. Nine instructional hours were allowed for the electronics and auto mechanics tests, four hours for the social science test.
On a prearranged date both the teacher and the non-teacher commenced instruction. At the close of the instructional period a member of the project research staff administered the posttest, previously unseen by teacher and non-teacher participants, to all pupils. In addition, a brief affective questionnaire was administered to students regarding their feelings about the subject matter of the unit.
Contrary to prediction, the experienced teachers did not markedly outperform their inexperienced counterparts on any of the three teaching performance tests. Although there were slight differences in favor of the teachers, none reached statistical significance. Posttest results are presented in Table I using average classroom means as the analysis unit.
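The idea of using the classroom mean as the analysis unit can be illustrated with a brief sketch. Everything in it is an assumption made for illustration: the scores are invented, the function names are hypothetical, and the paired t statistic is simply one conventional way to contrast matched sections. The article does not state which statistical test the investigators actually used, and these numbers are not the study's data.

```python
from math import sqrt
from statistics import mean, stdev


def classroom_mean(scores):
    """Average posttest score for one class section (the analysis unit)."""
    return mean(scores)


def paired_t(teacher_means, nonteacher_means):
    """Paired t statistic on matched classroom means (teacher minus non-teacher).

    Assumption: each teacher's section is matched with the one section
    taught by the corresponding non-teacher. Returns (t, degrees_of_freedom).
    """
    if len(teacher_means) != len(nonteacher_means):
        raise ValueError("each teacher needs exactly one matched non-teacher")
    diffs = [t - n for t, n in zip(teacher_means, nonteacher_means)]
    if len(diffs) < 2:
        raise ValueError("at least two matched pairs are required")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs))), len(diffs) - 1


# Hypothetical posttest scores, one inner list per class section.
teacher_sections = [[30, 34, 32], [28, 31, 31], [35, 33, 34]]
nonteacher_sections = [[31, 32, 30], [29, 30, 28], [33, 35, 34]]

teacher_means = [classroom_mean(s) for s in teacher_sections]
nonteacher_means = [classroom_mean(s) for s in nonteacher_sections]
t_stat, df = paired_t(teacher_means, nonteacher_means)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

Treating each class section as a single observation keeps the comparison at the level where instruction was assigned, which is why pupil scores are averaged within a section before any contrast is made.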
In addition, analyses of students’ responses to the anonymous questionnaires revealed no significant differences between the teacher and non-teacher groups. In short, no reliable differences favoring the experienced teachers were found. Why?
Although space limitations preclude an examination of possible methodological defects which might contribute to these results, there appear to be no readily available loopholes by which we can explain away the non-significant outcomes. A more straightforward explanation is available. Experienced teachers are not particularly skilled at bringing about specified behavior changes in learners.
We should not be surprised that teachers are not skilled goal achievers. Certainly they have not been trained to be; teacher education institutions rarely foster this competence. Nor is any premium placed on such instructional skill after the teacher concludes pre-service training. The general public, most school systems, and professional teachers’ groups rarely attach special importance to the teacher’s attainment of clearly stated instructional objectives.
For further corroboration of this interpretation, one needs only to speculate on the typical intentions of most public school teachers. They wish to cover the content of the course, to maintain classroom order, to expose the student to knowledge, and so on. Rarely does one find a teacher who, prior to teaching, establishes clearly stated instructional objectives in terms of learner behavior and then sets out to achieve those objectives. Only recently, in fact, do we find many teachers who are even familiar with the manner in which instructional objectives are stated in measurable form.
But while it may be true that experienced teachers in general (and there are obviously notable exceptions) are not particularly proficient in promoting learner attainment of specified instructional objectives, this is a totally unacceptable state of affairs. Every profession worthy of the name derives its professionalism precisely from the fact that its members possess a special expertise not present in non-members of the profession. Lawyers can prepare legal briefs. Surgeons can perform operations. Accountants can balance financial reports. People off the street can’t do these things. But do teachers bring anything to bear on an instructional situation other than a general education, native intelligence, reasonable dedication, and borrowed teaching tricks? These attributes will permit a teacher to get through the school day, and a number of pupils will undoubtedly learn something. But contrast our current educational situation with the enormous dividends we might be getting if members of the teaching profession possessed really unique capabilities to promote desirable behavior changes in learners.
What can be done to improve this situation? How can teachers become more skillful in accomplishing their major classroom mission? One general trend offers the promise of improvement along this line: specifically, the increasingly widespread support of objective-based instruction and, more broadly, the concept of educational accountability.[4] Rather than attending almost exclusively to instructional process, where innovation is applauded for its own sake irrespective of what happens to learners, American educators are beginning to get concerned about outputs of the system. More and more we see educators trying to take responsibility for what happens to the learners under their tutelage. Frequently, such accountability strategies are organized around measurable instructional objectives. To illustrate, the Instructional Objectives Exchange,[5] a nonprofit educational corporation, currently receives over 5,000 orders per month for its collections of measurable instructional objectives. Even assuming that many of these objectives collections never leave the educator’s bookshelf, their widespread circulation attests to the fact that many educators are becoming far more attentive to results than to process.
A more specific and direct approach can be taken to augment instructional skill. We can provide teachers with what any instructional psychologist would consider a critical ingredient for modifying one’s behavior; that is, we can provide teachers with practice in doing what we want them to do. First, we must amass a sufficient number of short-term teaching performance tests, perhaps involving instructional periods of no more than 15-30 minutes. At least one commercial firm is now distributing such teaching performance tests[6] and, hopefully, more agencies will soon be developing them.
By employing commercially available performance tests or by constructing their own, both in-service and pre-service teacher educators can arrange for a series of teaching performance test clinics. To illustrate how such clinics might work, we can consider an in-service example. A departmental faculty (English, for instance) might meet once per week after school for a one-hour session. At the beginning of the hour, as his colleagues observe, one teacher would carry out a previously planned 15-minute lesson with a half dozen randomly selected learners. After the learners had been posttested on their attainment of the objectives, and ideally also on their affective responses to the teaching, they would be dismissed and the staff would clinically analyze the teacher’s instruction. The analysis should be nonpunitive in nature, for the focus must be on improving the instructional skills not only of the “teacher of the day” but of all those present. Furthermore, analysis of the teaching must be based on results displayed by the learners, not on the observers’ personal preferences. If the learners achieved the objectives, what aspects of the instructional plan seemed to contribute? If the objectives were unachieved, what alternative tactics might have been used? The main thrust of the clinic strategy is to make public a teacher’s instructional decision making and, obviously, to share demonstrably effective teaching tactics among colleagues. During subsequent weeks, other teachers can take their turns completing the same or different teaching performance tests. As always, the post-lesson clinical analyses would stem from observed results with learners.
Comparable applications, of course, can be designed for pre-service teacher education programs. Indeed, as a vehicle for assessing the adequacy of a teacher education program, such performance tests have considerable utility. If, for example, a pre-service credential program cannot demonstrate that its candidates are far more skilled on such performance tests than they were when they commenced the program, then program modifications are clearly in order.
A Crucial Component
The ability to bring about specified behavior changes in learners is by no means the only dimension to consider in evaluating a teacher. One can readily imagine an instructor who was quite skilled in changing specified learner behavior yet grossly deficient in a number of personal and ethical categories. Even so, it may not be an overstatement to assert that the skill necessary to bring about intentional changes in learners should be considered a necessary but not sufficient attribute of the high-quality teacher. In view of research results such as those reported here, the teaching profession clearly must initiate schemes without delay whereby its members acquire this essential skill.
1. For a more extensive account of this research see W. James Popham, “Performance Tests of Teaching Proficiency: Rationale, Development, and Validation,” American Educational Research Journal, January, 1971, pp. 105-17.
2. R. Glaser, “Instructional Technology and the Measurement of Learning Outcomes: Some Questions,” American Psychologist, Vol. 18, 1963, pp. 519-21.
3. I am indebted to the San Diego City Schools and several Orange County school districts for their cooperation in this project.
4. See, for example, the recently published volume by John D. McNeil, Toward Accountable Teachers. New York: Holt, Rinehart, and Winston, 1971.
5. Box 24095, Los Angeles, California 90024.
6. Vimcet Associates, P.O. Box 24714, Los Angeles, California 90024, distributes a series of teaching performance tests plus filmstrip-tape programs regarding how to build and use such tests.