ZSfHD 1-2/1996: Fundamental Considerations of the Evaluation Process: Goals, Validity, and Utility

Zeitschrift für Hochschuldidaktik Nr. 1-2/1996:
Qualität der Hochschullehre

William FULTON
Fundamental Considerations of the Evaluation Process: Goals, Validity, and Utility
Grundsätzliche Überlegungen zu Evaluationsverfahren: Ziele, Validität und Nutzen

Introduction

This paper was presented, in a somewhat condensed form, together with Wim GIJSELAERS' paper of the same title, at a plenary session of the conference, Qualität der Lehre (in der Medizin). Wim presented his paper first, and I followed, the intention being to address the topic from two different perspectives. As can be seen from his paper, Wim has considerable experience in the development and implementation of a comprehensive course evaluation and instructional improvement program at the University of Limburg, in Maastricht in the Netherlands. Working in the department of educational development and educational research, he has also conducted several research projects on different aspects of the evaluation process.

My experience with the course evaluation process has been entirely on the practical side. As Faculty Coordinator, I have been responsible for the administration and processing of course evaluations for each course taught at Webster Vienna over the last 9 years, and for making decisions concerning the renewal of teaching contracts based to a large extent on the results of these evaluations.

Webster Vienna is a private American university in Vienna, one of four European branches of a liberal arts university based in St. Louis, Missouri. The Vienna branch has 380 students from 65 different countries, and employs adjunct faculty (LehrbeauftragtInnen) for 80% of all courses. There are 90 active adjunct faculty members, many of whom teach as few as one or two courses each year.

Virtually all American universities require students to fill out course evaluations for each of their courses each term, and we are no exception. The course evaluation process at Webster Vienna consists of the following steps:

Students fill out the course evaluation form anonymously.¹ It is administered by a staff member at the beginning of the penultimate class session. The instructor is not present during this process.
To preserve the anonymity of the student, the completed evaluation forms, including all written comments made by students, are entered into a spreadsheet program which calculates the average, high, and low ratings for each question, and lists all comments made.
The results of the evaluation, consisting of all student ratings, statistical data, and comments, are given to instructors after the grades for the course are turned in.
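As a rough illustration of the spreadsheet step described above, the following Python sketch computes the average, high, and low rating for each question and collects the written comments. It is not the program actually used at Webster Vienna; the question names, the 1-to-5 rating scale, and the function name `summarize_evaluations` are assumptions made for the example.

```python
from statistics import mean


def summarize_evaluations(forms):
    """Summarize anonymous evaluation forms (hypothetical data layout).

    Each form is a dict with:
      "ratings": mapping of question name -> numeric rating
      "comment": free-text comment (may be empty)
    """
    ratings_by_question = {}
    for form in forms:
        for question, rating in form["ratings"].items():
            ratings_by_question.setdefault(question, []).append(rating)

    # Average, high, and low rating for each question.
    summary = {
        question: {
            "average": mean(ratings),
            "high": max(ratings),
            "low": min(ratings),
        }
        for question, ratings in ratings_by_question.items()
    }

    # List all non-empty comments without any link to a student.
    comments = [form["comment"] for form in forms if form.get("comment")]
    return summary, comments


# Illustrative forms only; the scale and questions are assumed.
forms = [
    {"ratings": {"explains_clearly": 4, "uses_time_well": 5},
     "comment": "Please hand out copies of the transparencies."},
    {"ratings": {"explains_clearly": 5, "uses_time_well": 3},
     "comment": ""},
    {"ratings": {"explains_clearly": 3, "uses_time_well": 4},
     "comment": "More time for discussion."},
]

summary, comments = summarize_evaluations(forms)
for question, stats in summary.items():
    print(question, stats)
print(comments)
```

Because only the aggregated figures and the transcribed comments are passed on, the instructor never sees an individual form or a student's handwriting.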

Following the same structure as Wim, I will present my thoughts on the goals of the course evaluation process, some comments on the validity of its results, and some conclusions about its utility, based on my experience working in the Webster Vienna environment.

Goals

"Good teachers are costly, but bad teachers cost more."

(Bob TALBERT, quoted in CHARLTON, 1994, p. 95)

In my view, the course evaluation process serves three main purposes:

To improve teaching.
To assess faculty members in order to decide whether to renew teaching contracts.
To assist students in the selection of their courses.

I will discuss these goals in turn.

Improve Teaching

Webster Vienna prides itself on being a teaching institution whose primary objective is to provide a high quality education to its students. We therefore put a high priority on teaching effectiveness, and the primary purpose of course evaluations is to provide feedback to instructors to allow them to improve their courses.

The course evaluation results provide the only formal feedback the instructor gets from students about the course and her or his effectiveness as an instructor. Students are guided to make specific comments about different aspects of the course and the instructor, so this feedback can be very useful in helping identify weaknesses and even providing suggestions for improvement. After all, students are the real experts on teaching effectiveness. They are exposed to a wide range of instructors using many different teaching styles and know from this experience what facilitates their learning and what may hinder it. They can provide suggestions for improvement based on practices or techniques used effectively by their other instructors.

Student suggestions for improvement are often very simple, such as a plea to make copies of the overhead transparencies so they don't have to waste time transcribing them, or a complaint about the textbook being outdated or far too expensive. Sometimes they may be more global, such as a suggestion to use various teaching aids to improve the presentation of the material, to use class time more effectively by keeping to the topic at hand, or to allow more time for discussions.

Instructors are always very interested in the course evaluation results and, I believe, in most cases they are responsive to suggestions for improvement. Some suggestions are very simple to implement, others require more effort, and some require a fundamental rethinking of the instructor's teaching style and goals. I have been very impressed over the years by many instructors who monitor the effectiveness of their teaching methods regularly and continually make changes to improve their performance. This has led to significant improvements in their course evaluation results.

Some instructors, of course, are less responsive to the results of their course evaluations. In this case, the results can be used by the administration to identify possible problems, especially recurring issues, and to engage the instructor in a discussion of these matters. The evaluations are objective and even verifiable through repetition, and thus provide a sound foundation for a discussion about an instructor's teaching.

In the last section, on utility, we will discuss further how course evaluations can be used to improve teaching. It is sufficient to note here that course evaluation results provide the information base for a process of continuous improvement in teaching that can significantly improve the general level of instruction at an institution. The important thing is to create an organizational culture in which continuous improvement is a high priority.

Assess Faculty

The second main purpose of course evaluations is to help in assessing the teaching effectiveness of faculty members, or their suitability for teaching specific courses. Evaluation results are, of course, only one of many tools used to assess an instructor's teaching effectiveness. Peer appraisals, direct consultation with the instructor, and anecdotal feedback from students or others, can also provide valuable insight into an instructor's teaching methods and effectiveness.

As part of the course evaluation process, we also ask instructors to fill out a questionnaire about the course that includes questions both about their students and about the extent to which they felt they succeeded in achieving their course goals. This formal feedback provides a very useful supplement to the student course evaluation results.²

Course evaluations are useful for assessment purposes because they provide regular and formal feedback that can alert the administration to possible problems. At Webster Vienna course evaluations and the supplemental faculty course feedback provide the primary input on an instructor's teaching effectiveness. We do not have a regular program of peer appraisals, though these are occasionally used as an additional mechanism when recurring problems are indicated by the course evaluations. We do, of course, also get direct feedback from students or the instructor, which may also suggest possible problems.

Assessment of instructors is essential in any institution concerned about teaching effectiveness. Teaching is, after all, an extremely difficult job, if it is done well. The effective instructor must not only master the subject but also be able to present it coherently and with enthusiasm, explain its complexities clearly, understand students' questions, direct discussions effectively, enliven the material by relating it to the real concerns of her or his students, be genuinely interested in students and motivate them to become interested in a subject that may be deadly dull to them. She or he should also take an active interest in didactics and in continuously improving the course by changing her or his teaching style and/or methods. To succeed in any of these areas is a considerable achievement, and it is virtually impossible for any single person to excel in all of them. Teaching is thus a very challenging job which affords ample opportunities for improvement.

Naturally, we try to hire only instructors who we believe will be effective teachers. Since most of our courses are taught by adjunct professors, however, we have the advantage of not having to make any long-term commitment to instructors. We therefore make it a principle to hire a new instructor for only a single course and to make subsequent assignments dependent on our assessment of her or his effectiveness in that first course. Over the last 9 years, our renewal rate for new instructors has been about 47%. This means that contracts were not renewed for 53% of all new instructors. This is a much higher rejection rate than I would like, but I do not think it reflects badly on the very qualified people, both academic and professional, who have a genuine interest in teaching - rather, it merely confirms the enormous difficulty of the job.

Part of the problem of finding effective instructors is that we operate in an environment in which teaching effectiveness is not the highest priority for university professors. In one case we had to replace an instructor (who teaches at another institution) after the first two weeks of the class because of the considerably different expectations on each side. After this experience we started administering an early evaluation in the third week of the course for all new instructors to allow us to identify possible problems and react to them promptly. This serves the class in question, and gives the instructor a better chance to adapt to our expectations in that critical first course.

For current faculty, assessment of teaching effectiveness serves several purposes: identification of recurrent problems that instructors have not successfully dealt with themselves, determination of the suitability of an instructor to teach specific courses, and observation of slippage in an instructor's overall teaching effectiveness. On the positive side, assessment based on course evaluations also helps to identify candidates for the Excellence in Teaching award given each year.

Course evaluation results provide a good basis for a discussion with an instructor about recurring problems. The results are objective and generally provide a clear description of the problem. As faculty coordinator, I meet with the instructor and together we try to work out some way of dealing with such a problem.

However, the basis of discussion breaks down if the instructor does not accept the validity of the evaluations. This has happened to me on only two occasions. In both cases, the instructors were heavily into denial. They consistently denied the validity of the evaluation results. I like to think that by virtue of having to deal with these kinds of cases, my stay in purgatory should be significantly shortened. I will discuss these two cases in more detail in the section on utility/experience.

Course evaluation results are also very useful in helping identify the courses an instructor is best suited to teach, and conversely, those she or he is not well suited to teach. I have occasionally made the mistake of "stretching" some of our most effective instructors by having them teach courses outside their main area of competence. The course evaluation results have convinced me that it is always better to find specialists in each area who are, or can become, effective instructors.

If an instructor's overall evaluation results decline significantly, I would again use this as a basis for discussing the problem with her or him. There may be many reasons for such a decline and I would try to understand the cause of the problem and agree on a means of addressing it. If the subsequent results fall below an acceptable level, the instructor's contracts would not be renewed.

The importance of good teaching for a private institution like Webster, which is wholly supported by student tuition fees, cannot be overestimated. As Bob Talbert says, "Good teachers are costly, but bad teachers cost more."

Assist Students

An important subsidiary goal of the course evaluation process is to assist students in the selection of their courses. If the course evaluation results are made available to them, students can make much better choices about the courses they want to take. If a course is offered with two different instructors, they can compare the evaluation results and choose the instructor whose approach and strengths best suit their needs.

Many American universities make course evaluation results available to students, either in part or in their entirety. Since the evaluation input is provided by students, it seems logical to let them have access to the output as well. But there is also a strong emotional factor present in any discussion of the posting of course evaluation results, as we discovered in a small way at Webster Vienna.

Last year the student council approached the faculty with the request to post the results of the course evaluations for students to use in making their course selections. In a memo to the faculty, the student council president presented the following good reasons for posting the evaluation results:

to develop a better understanding of how an instructor is conducting his/her course
to see if an instructor is responding to students' comments
to make the Webster community more competitive.
to provide students with another source of information.
to provide instructors with the opportunity to look at other teachers' performance and to profit from it by developing probably different methods of teaching.
The students will fill out the course evaluation more conscientiously. (Louise ANDRASEVIC, 1995, p. 1)

At a meeting called by the Student Council to discuss the issue, students voted unanimously in favor of posting the results.

The question was then discussed at a general faculty meeting and after a brief but open exchange of views, the faculty voted overwhelmingly to make the evaluation results available to students in their entirety, that is, both ratings and a complete list of student comments. However, the issue was then revisited at a weekend faculty retreat held three days after the faculty meeting. Because of the concerns raised at the retreat it was decided to conduct a vote by mail of the entire faculty on the issue. After a concerted lobbying effort by the anti-posting faction, the general balloting failed by a single vote.³

As a result, the subsidiary goal of serving students through the evaluation process has not yet been achieved at Webster Vienna. However, there is a general trend among American universities to make evaluation results available to students, and I am confident that this trend will catch up with Webster in time. I thus expect that we will be able to use the evaluation process in the not too distant future to serve students as well.

Validity

"Facts are stubborn things; and whatever may be our wishes, our inclinations, or the dictates of our passions, they cannot alter the state of facts and evidence."

(John ADAMS, quoted in BARTLETT, 1992, p. 337)

As we have said above, course evaluations are a tool that can be used to help achieve the goals outlined in the first section. The utility of this tool, that is, its efficacy in helping us achieve our goals, depends on its validity. We will therefore consider the validity of course evaluations in this section and then turn to a discussion of their utility in the following section.

Course evaluations have a long history, and the earliest research studies on the course evaluation process were made more than 70 years ago. In a review of this research, E. R. GUTHRIE (1954) includes studies as far back as 1924 (cited in ALEAMONI, 1987, p. 26). In the following discussion, I will first review the research findings concerning the validity of course evaluations and then make a few comments based on my own experience with the process.

Research

In a study reviewing faculty concerns about course evaluations, Lawrence ALEAMONI addresses the several common objections instructors have raised about the validity of evaluations, including these four:

Students are not qualified to evaluate their instructors.
Course evaluations are simply a popularity contest - instructors who are warm, friendly, and humorous get good evaluations and others don't.
There are many extraneous variables that affect the outcome of the evaluations, such as the size and constitution of the class, the age and maturity of students, the motivation of students according to whether the class is required or not, the difficulty of the class, and the grading pattern of the instructor.
Course evaluations are conducted too early, before students really know how much they have benefitted from a class. (ALEAMONI, 1987, pp. 25-26)

According to the first objection, students are too immature, inexperienced, and capricious to make consistent judgements about their instructors, and lack the knowledge necessary to accurately evaluate them. The only persons qualified to assess an instructor's effectiveness would be colleagues with excellent publication records and experience.

In answering this objection, ALEAMONI cites three studies of course evaluations (GUTHRIE, 1954; COSTIN, GREENOUGH, and MENGES, 1971; COOPER and PETROSKY, 1976) that show students at both the university and high school levels are in fact very consistent in their evaluations of instructors (ALEAMONI, 1987, p. 26). As for the concern that only peer evaluations are valid, ALEAMONI cites four studies, including one of his own (ALEAMONI and YIMER, 1973; STALLINGS and SPENCER, 1967; LINSKY and STRAUS, 1975; MARSH, 1984), that show a high positive correlation between student ratings and peer ratings, suggesting that student ratings are at least as valid as peer ratings (ALEAMONI, 1987, pp. 26-27). Doron GIL reports on a similar study by R. T. BLACKBURN and M. J. CLARK (1975) which showed substantial agreement between students, faculty peers, and administration ratings. The only non-congruence was found between these three groups and faculty self-ratings (GIL, 1987, p. 62). This was confirmed in a study by J. A. CENTRA (1973), which also showed that faculty tended to give themselves better ratings than students gave them (cited in GIL, 1987, p. 62). This bias is perhaps natural but it does point to a very useful feature of the course evaluation process - that it provides instructors with a more objective view of their performance than they themselves possess.

Numerous studies have addressed the second objection, that course evaluations are nothing more than popularity contests. J. E. GRUSH and F. COSTIN (1975) looked at students' personal attraction to instructors and found a very low correlation between that attraction and the students' evaluations of their instructors (cited in ALEAMONI, 1987, p. 27). P. C. ABRAMI, L. LEVENTHAL, and R. P. PERRY (1982) examined the influence of instructor personality on student ratings (the "Dr. Fox" effect) and concluded that "higher ratings could not be attributed simply to the fact that the instructor was providing a nice, friendly, humorous atmosphere in the classroom; the students are much more discriminating than that" (cited in ALEAMONI, 1987, p. 27). And a comparison of written comments with numerical ratings by ALEAMONI (1976) showed that students do differentiate between various aspects of teaching. If an instructor is entertaining, tells great jokes, and commands the students' full attention, she or he will get good ratings for use of humor and classroom manner, but this does not influence ratings of other teaching skills such as using class time effectively or explaining clearly (cited in ALEAMONI, 1987, p. 27). All these studies confirm that while students do of course respond positively to an instructor's enthusiasm and personality, they are also very discriminating and will judge each aspect of teaching separately according to the instructors' merits.

The third objection, that many extraneous variables affect the outcome of evaluations, has also been the subject of many research studies. In a review of such studies (ALEAMONI and HEXNER, 1980), ALEAMONI concludes that there is "little or no relationship between such variables as class size, gender of the student or gender of the instructor, . . . , the major or nonmajor status of the student . . . and the way in which students rate a course or instructor" (cited in ALEAMONI, 1987, pp. 28-29). He found similar results in regard to grades - no correlation between the grades students receive, or expect to receive, and their ratings (ALEAMONI, 1987, p. 29).

Two factors that do seem to make a difference in evaluation results are age (or course level) and motivation. ALEAMONI's review of the research leads him to conclude that beginning students rate instructors more harshly than advanced students, and students taking a course as a requirement tend to rate the instructor lower than students taking the course as an elective (ALEAMONI, 1987, p. 28). These factors should thus be taken into consideration when using the evaluations for assessment purposes.

Some interesting long-term studies have been carried out to address the fourth concern, that students are not capable of making accurate judgements about a course or its instructor until some years after the course has finished. A. J. DRUCKER and H. H. REMMERS (1950, 1951) compared ratings by alumni who had been out of Purdue University for 5 and 10 years with ratings of the same instructors by current students and found a high positive correlation in both cases (cited in ALEAMONI, 1987, pp. 27-28). This result was corroborated by a similar study by ALEAMONI and M. YIMER (1974) at the University of Illinois involving graduating students who looked back at the courses they had taken 2 and 3 years earlier (cited in ALEAMONI, 1987, p. 28), a procedure repeated with similar results by H. W. MARSH (1977) at the University of California, Los Angeles (UCLA) and the University of Washington (cited in ALEAMONI, 1987, p. 28).

This faculty concern about evaluations being carried out too soon is partly motivated by the feeling that course evaluations may not be a reliable predictor of the only thing that really matters in education, student learning. How well students learn is a complex matter influenced by many factors, only one of which is the role of the instructor. How are we to know what role the instructor plays, and whether the qualities attributed to "good instructors" are really those that promote student learning most effectively?

John CENTRA reports on several studies that compared student ratings with how much students learn from a given instructor. He cites three such studies (ALEAMONI and HEXNER, 1980; MCKEACHIE, 1979; and MARSH, 1984) that look at particular courses, such as introduction to psychology or composition, taught at large universities by 10 or 20 different instructors, where there is a common final exam, and where students are assigned randomly to the different instructors. All three studies showed a positive correlation between course evaluation results and student learning (cited in CENTRA, 1987, pp. 48-49). ALEAMONI cites four further studies (S. A. COHEN and BERGER, 1970; FREY, 1973; FREY, LEONARD, and BEATTY, 1975; and WHITE, HSU, and MEANS, 1978), all of which reported a fairly high correlation between the objective measures of learning and the way students rate their instructors (ALEAMONI, 1987, p. 28).

Another interesting study of student learning by David KEMBER and Lyn GOW (1994) compares two different teaching orientations, the traditional approach, which they call "knowledge transmission," and a more student-centered approach, which they call "learning facilitation." Using a standard tool to measure the quality of student learning (the Biggs Study Process Questionnaire), they found a significant correlation between the learning facilitation approach and more meaningful student learning on the one hand, and between knowledge transmission and less desirable study approaches on the other hand (KEMBER and GOW, 1994, p. 66). This strongly suggests that the methods used by an instructor can and do have a significant influence on student motivation and learning.

Doron GIL reports on a similar study relating teaching styles to student motivation (MCKEACHIE, LIN, MOFFETT, and DAUGHERTY, 1978), which divided instructors into three groups, "facilitator-persons," "experts," and "authorities," and found that instructors classified as facilitators were more effective than the others in terms of motivating students (GIL, 1987, p. 63).

The research on course evaluations thus provides substantial support for the conclusion that their results are valid and important, and, as the last two studies show, it also supports the view that student-centered teaching methods are more effective in achieving our learning objectives than traditional methods.

Experience

My own personal experience with evaluations from courses I have taught over the last 13 years at Webster Vienna corroborates the research conclusions presented above. I know from my own evaluation results that students are consistent in their ratings, that the results are multidimensional in that they discriminate between different aspects of teaching, and that they tend to be more critical than my own subjective self-evaluation. I also know that my ratings go down when I am insufficiently prepared, and conversely, they go up when I am able to prepare for classes to my own satisfaction. All of this convinces me personally of the validity of the course evaluation process.

Other instructors have reported similar experiences, for example, that their ratings vary directly in proportion to the amount of time they spend in preparing their courses. Three examples that illustrate the ability of students to differentiate between different aspects of teaching are the following:

An instructor who is very popular because he has excellent rapport with students, goes out of his way to help them, and always gives A's (the highest grade) to all his students, gets high ratings for his genuine interest in students and his ability to create a relaxed class atmosphere, but nonetheless gets very low marks in a number of other areas, such as his ability to explain the subject and his effective use of class time.
An outstanding instructor in every respect who was voted by students to receive the Teacher of the Year award in her second year of teaching nonetheless got weak ratings when she taught a course outside her main area of competence.
Another outstanding and popular instructor, who was voted twice by students to receive the Teacher of the Year award, nonetheless received very critical ratings by students when for a number of reasons his performance in the classroom began to decline.

In the first section above, I mentioned two instructors who consistently deny the negative aspects of their evaluation results. In both cases, the instructors' objections are similar to those given above as typical faculty concerns about the validity of evaluations. In one case, the instructor is consistently criticized for his diversions, telling long stories unrelated to the subject at hand. The instructor maintains that these diversions are not unrelated at all but rather illustrate important points about the subject matter. In his opinion, students who complain about the diversions simply fail to see the connections and their comments should thus be ignored.

In the other case, when the instructor gets disappointing evaluation results he often argues that they are invalid because they were administered too early - always at a critical point in the course when student frustrations were at a peak. If we only administered the evaluations some time after the course was finished, he maintains, the results would be very good. I think the research evidence addresses both of these concerns quite well, but I am also sure that no amount of evidence will be sufficient to convince all instructors of the validity of the process as it relates to them.

This is, of course, not to say that every evaluation result is necessarily valid. There are certainly exceptions, and the validity of the entire process depends on a number of critical factors, chief among which are the following:

The course evaluation questionnaire must be a reliable survey instrument. It should be tested before being put into general use.
The course evaluations must be administered by responsible persons in the absence of the instructor. The administrator should carefully explain the purpose and particulars of the survey and answer any questions students may have about it.
The anonymity and confidentiality of student responses must be strictly guarded to assure confidence in the process on the part of students. Nothing will undermine the validity of the process faster than if students feel that faculty members receive or have access to their questionnaires.
Any special circumstances that would cast doubt on the significance of the results, such as sampling bias or an unusually small sample size, should be noted both internally and in communicating the results to faculty members.

I will close this section on a cautionary note by describing two egregious examples of invalid evaluation results. These examples show that even with the most careful procedures and seriousness of purpose, there will always be exceptions - evaluations that for one reason or another are not valid. In the first case, a very strong-willed instructor had a personal conflict with a very strong-willed student. The student was initially a great fan and friend of the instructor but, after an unfortunate incident involving a personality clash, turned against him and became quite hostile towards him. The student complained to me about the incident and the instructor, and then, not satisfied that I was sufficiently convinced of the instructor's lack of suitability for teaching, organized a campaign against him during the course evaluation process. As a consequence, the integrity of student responses was affected and the results rendered worthless. This is the only case known to me of a deliberate attempt to influence the evaluation results but it represents a very real danger to the integrity of the entire process because one may not learn that such a campaign is under way.

The second case is especially troublesome because it suggests that in some exceptional cases the faculty concern about students being qualified to evaluate instructors may be justified. The instructor in question was a charismatic person who conducted his class in a very dynamic and colorful manner. Students were completely enthralled by him and one student wrote in the course evaluation, which was extremely positive, that he was very much like the main character, a teacher, in the film "Dead Poets Society," played by Robin Williams. He was certainly entertaining; the only problem was that he was not competent to teach the course, which was an introductory course in international politics.

In discussions with the instructor during the term, I realized that his approach was demagogic, that he held conspiracy theories about international politics that he could not rationally defend, and that he could not write a coherent English sentence. He was a kind of academic Elmer Gantry - an excellent entertainer but a poor excuse for a teacher. I do believe that if he had taught subsequent courses, students would soon have recognized his lack of depth, but there is no question that in that first course he was able to win their approval with very few reservations. This experience confirms the first part of Abraham LINCOLN's famous saying, "you may fool all the people some of the time; you can even fool some of the people all the time, but you can't fool all of the people all the time" (quoted in BARTLETT, 1992, p. 451). Fortunately, my experience with course evaluations also confirms the last part of the dictum.

I think these examples clearly show that course evaluations are not infallible. Evaluation results should therefore never be used as the sole criterion for the assessment of teaching effectiveness.

Utility

"Don't try to fix the students, fix ourselves first. The good teacher makes the poor student good and the good student superior. When our students fail, we, as teachers too, have failed."

(Marva COLLINS, quoted in CHARLTON, 1994, p. 93)

The utility of the course evaluation process, that is, the extent to which it succeeds in achieving its purposes, depends to a large extent on the support systems put in place to translate its results into actions. This is particularly true of the primary purpose of the evaluation process, namely to improve instruction. We will focus in this section on how evaluation results can best be used to achieve this end. Once again, I will proceed by reviewing the research literature on the utility of the process and then discuss my own experiences with it.

Research

Robert WILSON studied the utility of a support system developed at the University of California, Berkeley, for helping faculty members use their evaluation results to improve teaching. The system had two main features:

A formal means of conveying the experience of the most effective instructors to others according to areas of weakness indicated by the course evaluation results.
The use of teaching improvement consultants to work with faculty members to select and implement ideas for improvement.

A database of 450 teaching tips - ideas and practices solicited from instructors who had received the university's Distinguished Teaching Award - was established and sorted according to 30 teaching characteristics that corresponded to questions on the course evaluation questionnaire. After a course was evaluated, the teaching improvement consultant would meet with the instructor, identify areas for improvement according to weaknesses indicated by the evaluation results, and agree on one or two teaching tip items to use the next time the course was taught (WILSON, 1987, pp. 9 - 13).
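The lookup step of such a system can be illustrated with a minimal sketch. It assumes that evaluation results arrive as a mean rating per questionnaire item and that each tip is filed under exactly one item. The names `Tip`, `weakest_items`, and `suggest_tips`, the rating values, and the tip texts are all hypothetical and are not taken from WILSON's database; only the two item wordings come from his questionnaire. In the Berkeley program the final choice was made jointly by consultant and instructor, so the code only produces a shortlist for that conversation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tip:
    item: str  # questionnaire item the tip is filed under
    text: str  # the teaching idea itself (hypothetical wording)


def weakest_items(ratings, limit=2):
    """Return the `limit` lowest-rated questionnaire items."""
    return sorted(ratings, key=ratings.get)[:limit]


def suggest_tips(ratings, tips, limit=2):
    """Shortlist the tips filed under each of the weakest items."""
    by_item = {}
    for tip in tips:
        by_item.setdefault(tip.item, []).append(tip.text)
    return {item: by_item.get(item, []) for item in weakest_items(ratings, limit)}


# Hypothetical mean ratings for one course (higher is better).
ratings = {
    "states objectives of each class session": 2.1,
    "varies the speed and tone of his/her voice": 3.4,
}

# Hypothetical tips; a real collection would hold several per item.
tips = [
    Tip("states objectives of each class session",
        "Write the objectives of the session on the board before starting."),
    Tip("varies the speed and tone of his/her voice",
        "Pause briefly before each key point."),
]

suggestions = suggest_tips(ratings, tips, limit=1)
for item, ideas in suggestions.items():
    print(item, "->", ideas)
```

Keeping the shortlist to one or two items mirrors the program's practice of agreeing on only one or two tips per course.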

WILSON compared the results of course evaluations before and after the teaching improvement program was used and found significant improvement in the evaluation of target items for a plurality of instructors in each case. The percentage of instructors who improved varied according to the item, ranging from 39% for "varies the speed and tone of his/her voice" to 90% for "states objectives of each class session." Improvement in the overall effectiveness of teaching was found for 52% of the 46 instructors participating in the program, while there was no significant change for 28% of the instructors and a decrease for 20% (WILSON, 1987, pp. 14 and 18).

These results show that substantial improvements can be made in an instructor's teaching effectiveness in a very short period of time using an extensive support program. The use of teaching improvement consultants to work with instructors in selecting and explaining teaching ideas to use in their classes was an important element in the success of the program. In an effort to reduce the cost of the program, the consultants were replaced by a written compendium of 200 of the original teaching tips, with disappointing results. When a test group of instructors was given the compendium along with their evaluation results, only half of them tried to use it at all and fewer than 10% used more than 2 ideas (WILSON, 1987, p. 22). The compendium of ideas is available as one of two final reports on the project (WILSON and TAUXE, 1986).

A meta-analysis of research on course evaluations conducted by Peter COHEN (1980) confirms the importance of involving consultants in the process. He concluded that the use of course evaluations led to a 15% improvement in teaching effectiveness, but that a significantly greater improvement was achieved when the feedback was accompanied by consultation (cited in STEVENS, 1987, p. 34). COHEN cites an additional study (ALEAMONI, 1978) supporting this conclusion (cited in MCKEACHIE, 1987, p. 3).

Doron GIL argues that impersonal course evaluations are far less effective in changing faculty behavior than interpersonal feedback provided by consultants (GIL, 1987, p. 58). He quotes W. H. BERQUIST and S. R. PHILLIPS approvingly as follows:

Change is a subtle and complex process. It is not encouraged by the use of an insensitive, often arbitrary, reliance on evaluative ratings of performance. Preparation for change . . . occurs when the teacher is confronted with information that is discrepant with his self-image but which does not deflate his self-esteem. This information is . . . descriptive, rather than evaluative; it is concrete, rather than general; it is presented in a context of trust, rather than threat. The process of change takes place only when the instructor is presented with information, training and consultation directly related to perceived needs. (quoted in GIL, 1987, p. 60)

GIL emphasizes the importance of positive feedback and training in motivating instructors to improve their teaching. He says, "positive feedback is more desirable and has a better impact than negative feedback," and consultants should train faculty "about teaching, about themselves, about the ways students learn, and about what prevents students from learning" (GIL, 1987, pp. 58, 62).

According to GIL, training is important because very few instructors have any familiarity with the literature about teaching and learning (GIL, 1987, p. 63). But it is equally important to tailor training to the needs of each individual faculty member:

"University professors can be looked on as learners, each of them with individual strengths, weaknesses, capabilities, and limitations. Thus, those responsible for faculty development need to individualize faculty evaluation and development and the processes through which they reach faculty." (GIL, 1987, p. 61)

Faculty consultants, it would seem, should facilitate faculty development in the same way they would have faculty facilitate student learning. If we would have faculty address the individual learning needs of their students, we too should address the individual information and training needs of the faculty.

W. J. MCKEACHIE agrees with GIL about providing for each instructor's individual need for feedback from students, saying that the course evaluation process should be flexible enough to allow instructors to add their own questions (MCKEACHIE, 1987, p. 5). This tailoring of the questionnaire is also a feature of the University of Limburg system described by Wim GIJSELAERS in the companion paper to this one.

What about Webster Vienna? How do we address the individual information and training needs of the faculty? How successful is the Webster system in improving instruction? These questions bring us to our own experience.

Experience

Karl Popper has observed that we learn only from our mistakes. Following this dictum, I have learned a great deal about the utility of course evaluations. One of my worst mistakes was to fall behind in the processing of the evaluations so that I was returning them several months after the courses had been held. This delay significantly reduced the utility of the process since many instructors had taught other classes by the time they received the earlier results, so the class in question was no longer fresh in their minds. Our goal now is to return the evaluation results when instructors turn in their final course grades.

I also learned the hard way about the detrimental effect of negative feedback in motivating instructors. One of the first things I did to try to improve the utility of the process was to write individual notes to instructors highlighting salient features of their results. The mistake was that in my misguided concern to improve every perceived shortcoming, I focussed my attention on the weaknesses indicated by student comments and ratings. In consequence, faculty complained that I saw only the negative aspects of their performance and not the positive elements. Repeatedly harping on the negative was de-motivating and hence counter-productive.

Course evaluations provide both positive and negative feedback, and I know from my own experience as an instructor that both are valuable. The positive feedback is highly rewarding and motivating, and the critical feedback suggests possible mistakes from which we can learn. The trick for the faculty consultant is to learn how to use the positive feedback to motivate faculty to address the concerns expressed in the negative feedback. As I quickly learned, the head-on, horn-locking approach is futile. Charles Schwab, the first president of the U.S. Steel Company and an eminently successful manager, took the opposite approach. He confirms what GIL and BERQUIST and PHILLIPS say about the utility of positive feedback as follows:

I consider my ability to arouse enthusiasm among my people the greatest asset I possess, and the way to develop the best that is in a person is by appreciation and encouragement. There is nothing . . . that so kills the ambitions of a person as criticisms from superiors. I never criticize anyone. I believe in giving a person incentive to work. So I am anxious to praise but loath to find fault. If I like anything, I am hearty in my approbation and lavish in my praise. (Quoted in CARNEGIE, 1992, pp. 46 - 47).

I think this is a very difficult lesson to learn because it often runs counter to our natural instincts. When someone screws up, our gut reaction is often to want to make that person immediately, unequivocally, and uncomfortably aware that she or he has really done it this time. Our motive of course is entirely pure and even altruistic - we want to help the person avoid making such mistakes in future. Unfortunately, the effect is often just the opposite. Dale CARNEGIE argues compellingly that all criticism is futile, counter-productive, and even dangerous (CARNEGIE, 1992, pp. 34 - 45).

A shortcoming of the Webster evaluation system is that it does not allow instructors to tailor the evaluation form to their own needs. Increased functionality always has to be weighed against its cost. We once expanded the questionnaire by doubling the number of questions and guiding students to comment on different aspects of the course and the instructor's performance but found that the increased effort of processing the additional input was so great that we soon had to scale back.

On the positive side, we do try to address the individual information and training needs of the faculty. As faculty coordinator, I see my role as GIL describes it - to facilitate faculty development by providing feedback and training. The course evaluations provide formal feedback, which is supplemented informally by hearty approbation and lavish praise where appropriate, and individual consultations where necessary. Information is provided in the form of regular memos to the faculty focussing on teaching didactics, which I frequently illustrate with positive and negative examples drawn from the course evaluations. ⁴ I also write a regular faculty newsletter that addresses teaching and policy issues, and which also serves as an open forum for faculty exchanges.

As for faculty training, I work with the faculty development committee to plan training events focussing on teaching issues or skills, as well as regular faculty meetings at which an experienced instructor introduces and leads a discussion on some aspect of teaching. These meetings and events are very useful in that they allow faculty members to share their experiences in an informal way and thus learn from each other. Another very useful forum for faculty interaction is our annual weekend faculty retreat, which combines training sessions with discussions of teaching issues and other faculty concerns.

New faculty members receive a faculty handbook that describes our academic policies and guidelines, and contains as an appendix a number of my recent teaching memos. ⁵ Orientation meetings are also held regularly for new instructors to introduce them to the Webster Vienna environment and explain our expectations about teaching.

In my view, the faculty coordinator or consultant should use the feedback provided by course evaluations to engage the faculty, individually and as a whole, in an ongoing dialog on teaching. She or he should be a kind of animator, praising, cajoling, consoling, informing, and learning together with the faculty how to improve teaching effectiveness.

The fundamental goal should be to create a climate in which instructors feel comfortable to experiment with new approaches, and an institutional culture of continuous improvement. Course evaluations provide an information base for continuous improvement, but their utility depends on a proactive support system of supplemental information and training.

What about our success in improving teaching effectiveness? I am very pleased with what we have been able to achieve thus far at Webster, but I am also a great believer in the need for continuous improvement.

I am often surprised by how well instructors respond to their course evaluation results and to various training activities. Many instructors have made significant changes in the way they conduct their courses with substantial improvement in their course evaluation results. These changes are often a direct result of training activities, such as the use of mind-mapping techniques in preparing lecture notes or the use of practical projects to apply theoretical knowledge. Recently, a visiting faculty member who teaches in New York called me to ask if I could put an item in the faculty newsletter to find a company to sponsor a real-life project for his upcoming marketing class. After attending a workshop in Vienna on the use of real-life projects in teaching, which was conducted by a fellow instructor, he now uses this method regularly in his courses.

Not all experiments succeed, or succeed fully. Each instructor has to find the teaching method and style that best suits her or his unique personality and abilities. There is no one right way to teach, but a plethora of possibilities, and a university should accommodate a wide range of styles and methods, as long as they achieve its learning goals.

One of the most exciting recent developments at Webster Vienna has been the increase in the level of interaction and productive cooperation among faculty members. Last summer four instructors joined forces to conduct an interdisciplinary course on the transition in South Africa that covered media communications, political science, psychology, and computer science, and included a research project. The class consisted of 10 Webster Vienna students and 10 South African students. The South African students came to Vienna, along with one of their instructors, for the first part of the course, and then the entire group went to South Africa to carry out interviews and do the research for the project. The course was thus not only inter-disciplinary, but also included a very valuable cross-cultural experience, both for the South African and for the Webster students. This experience will be repeated this summer.

Other very promising ideas for inter-disciplinary cooperation were born at each of our last two faculty meetings. Currently, two faculty members are conducting a major research project in finance funded by the Nationalbank together with a class of graduate students, and next fall we will offer an inter-disciplinary course on media and politics that will focus on the U.S. presidential elections. Another project involves an inter-disciplinary course on art, media, and psychology.

These kinds of projects have had a very stimulating influence on both students and faculty. At present there is a remarkable dynamic working at Webster Vienna, driven by a number of relatively new, dedicated instructors, whose enthusiasm is infectious.

How much of the improvement in teaching and the development of this cooperative dynamic can be attributed to the course evaluation process? The course evaluations provide the foundation by giving the instructor feedback about her or his effectiveness that is the necessary condition for making improvements. By supplementing this feedback with information and training, we have been able to develop a climate that fosters experimentation and continuous improvement. But in the end, everything depends on the instructors. It is they who must act on the information they receive and translate it into an effective teaching style that best suits their own interests and abilities.

Conclusion

"Who dares to teach must never cease to learn."

(John Cotton DANA, quoted in CHARLTON, 1994, p. 97)

The instructor is the key element in the learning enterprise. As Sidney HOOK says,

"Everyone who remembers his own educational experience remembers teachers, not methods or techniques. The teacher is the kingpin of the educational situation. He makes or breaks the program." (Quoted in CHARLTON, 1994, p. 93)

Because of the critical importance of the instructor, the first task of the faculty coordinator is to select and retain effective instructors. I know of no sure way of judging in advance a person's suitability to teach. But course evaluations provide a very good means of determining the effectiveness of an instructor after the fact. They are therefore highly useful in weeding out instructors who show little or no promise of becoming effective teachers.

The course evaluation process can also be very effective in improving teaching, especially if it is supplemented by a teaching improvement program. If evaluation results are simply returned to faculty members without any further support, some instructors, the 10 - 15% who are most highly motivated, will try to make changes to improve their performance, and the rest will more or less ignore the evaluations. More improvement can be made if instructors are given additional information about different teaching methods or techniques, or specific ideas about how to improve different aspects of teaching. If this is provided by teaching consultants or experienced colleagues, significant improvements can be made.

University instructors typically have no training in how to do their principal job, namely teaching, apart from having been exposed to teachers during their own education. Unfortunately, much of that exposure is counter-productive, since the general level of teaching at most institutions is relatively low. There is therefore a great need for training about teaching and also about the psychology of learning. An effective teaching improvement program would therefore include regular training events as well as feedback, information, and individual consultations. If we are to be successful in motivating our students to learn, we must ourselves be open to the learning process, and actively seek to improve our teaching effectiveness.

References

ABRAMI, P. C., LEVENTHAL, L., and PERRY, R. P. (1982).: "Educational Seduction." Review of Educational Research, 52 (3), 446-464.
ALEAMONI, Lawrence M. (1976).: "Typical Faculty Concerns About Student Evaluation of Instruction." National Association of Colleges and Teachers of Agriculture Journal, 20 (1), 16-21.
ALEAMONI, Lawrence M. (1978).: "The Usefulness of Student Evaluations in Improving College Teaching." Instructional Science, 7, 95-105.
ALEAMONI, Lawrence M. (1987).: "Typical Faculty Concerns About Student Evaluation of Teaching." In Lawrence M. ALEAMONI (ed.), Techniques for Evaluating and Improving Instruction, 25-31. San Francisco: Jossey-Bass.
ALEAMONI, Lawrence M., and HEXNER, P. Z. (1980).: "A Review of the Research on Student Evaluation and a Report on the Effect of Different Sets of Instructions on Student Course and Instructor Evaluation." Instructional Science, 9, 67-84.
ALEAMONI, Lawrence M., and YIMER, M. (1973).: "An Investigation of the Relationship Between Colleague Rating, Student Rating, Research Productivity, and Academic Rank in Rating Instructional Effectiveness." Journal of Educational Psychology, 64, 274-277.
ALEAMONI, Lawrence M., and YIMER, M. (1974).: Graduating Senior Ratings' Relationship to Colleague Rating, Student Rating, Research Productivity, and Academic Rank in Rating Instructional Effectiveness. Research Report No. 352. Urbana: Office of Instructional Resources, Measurement and Research Division, University of Illinois.
ANDRASEVIC, Louise (1995).: "Posting of the Course Evaluation Results and Maintenance of an Exam File." Memo to Faculty, Webster University Vienna, February 2, 1995, 1.
BERQUIST, W. H., and PHILLIPS, S. R. (1975).: "Components of an Effective Faculty Development Program." Journal of Higher Education, 46, 177-211.
BARTLETT, John (1992).: Familiar Quotations. Sixteenth Edition. Boston: Little, Brown and Company.
BLACKBURN, R. T., and CLARK, M. J. (1975).: "An Assessment of Faculty Performance: Some Correlates Between Administrator, Colleague, Student, and Self-Ratings." Sociology of Education, 48, 242-256.
CARNEGIE, Dale (1992).: How to Win Friends and Influence People. Revised Edition. Great Britain: Cedar.
CENTRA, John A. (1973).: "Self-Ratings of College Teachers: A Comparison With Student Ratings." Journal of Educational Measurement, 10, 287-295.
CENTRA, John A. (1987).: "Formative and Summative Evaluation: Parody or Paradox?" In Lawrence M. ALEAMONI (ed.), Techniques for Evaluating and Improving Instruction, 47-55. San Francisco: Jossey-Bass.
CHARLTON, James (1994).: A Little Learning is a Dangerous Thing. New York: St. Martin's Press.
COHEN, Peter A. (1980).: "Effectiveness of Student-Rating Feedback for Improving College Instruction: A Meta-Analysis of Findings." Research in Higher Education, 13 (4), 321-341.
COHEN, Peter A. (1981).: "Student Ratings of Instruction and Achievement: A Meta-Analysis of Multisection Validity Studies." Review of Educational Research, 51 (3), 281-309.
COHEN, S. A., and BERGER, W. G. (1970).: "Dimensions of Students' Ratings of College Instructors Underlying Subsequent Achievement on Course Examinations." Proceedings of the 78th Annual Convention of the American Psychological Association, 5, 605-606.
COOPER, C. R., and PETROSKY, A. (1976).: "Secondary School Students' Perceptions of Math Teachers and Math Classes." Mathematics Teacher, 69 (3), 226-233.
COSTIN, F., GREENOUGH, W. T., and MENGES, R. J. (1971).: "Student Ratings of College Teaching: Reliability, Validity, and Usefulness." Review of Educational Research, 41, 511-535.
DRUCKER, A. J., and REMMERS, H. H. (1950).: "Do Alumni and Students Differ in Their Attitudes Toward Instructors?" Purdue University Studies in Higher Education, 70, 62-64.
DRUCKER, A. J., and REMMERS, H. H. (1951).: "Do Alumni and Students Differ in Their Attitudes Toward Instructors?" Journal of Educational Psychology, 42, 129-143.
FREY, P. W. (1973).: "Student Ratings of Teaching: Validity of Several Rating Factors." Science, 182, 83-85.
FREY, P. W., LEONARD, D. W., and BEATTY, W. W. (1975).: "Student Ratings of Instruction: Validation Research." American Educational Research Journal, 12 (4), 435-447.
GIL, Doron H. (1987).: "Instructional Evaluation as a Feedback Process." In Lawrence M. ALEAMONI (ed.), Techniques for Evaluating and Improving Instruction, 57-64. San Francisco: Jossey-Bass.
GRUSH, J. E., and COSTIN, F. (1975).: "The Student as Consumer of the Teaching Process." American Educational Research Journal, 12, 55-66.
GUTHRIE, E. R. (1954).: The Evaluation of Teaching: A Progress Report. Seattle: University of Washington.
KEMBER, David, and GOW, Lyn (1994).: "Orientations to Teaching and Their Effect on the Quality of Student Learning." Journal of Higher Education, 65 (1), 58-74.
LINSKY, A. S., and STRAUS, M. A. (1975).: "Student Evaluations, Research Productivity, and Eminence of College Faculty." Journal of Higher Education, 46, 89-102.
MCKEACHIE, W. J. (1979).: "Student Ratings of Faculty: A Reprise." Academe, 65, 384-397.
MCKEACHIE, W. J. (1987).: "Can Evaluating Instruction Improve Teaching?" In Lawrence M. ALEAMONI (ed.), Techniques for Evaluating and Improving Instruction, 3-5. San Francisco: Jossey-Bass.
MCKEACHIE, W. J., LIN, Y-G., MOFFETT, M., and DAUGHERTY, M. (1978).: "Effective Teaching: Facilitative Versus Directive Style." Teaching of Psychology, 5, 193-194.
MARSH, H. W. (1977).: "The Validity of Students' Evaluations: Classroom Evaluations of Instructors Independently Nominated as Best and Worst Teachers by Graduating Seniors." American Educational Research Journal, 14, 441-447.
MARSH, H. W. (1984).: "Students' Evaluations of University Teaching: Dimensionality, Reliability, Validity, Potential Biases, and Utility." Journal of Educational Psychology, 76 (5), 707-754.
STALLINGS, W. M., and SPENCER, R. E. (1967).: Ratings of Instructors in Accountancy 101 from Videotape Clips. Research Report No. 265. Urbana: Office of Instructional Resources, University of Illinois.
STEVENS, Joseph J. (1987).: "Using Student Ratings to Improve Instruction." In Lawrence M. ALEAMONI (ed.), Techniques for Evaluating and Improving Instruction, 33-38. San Francisco: Jossey-Bass.
WHITE, W. F., HSU, Y. M., and MEANS, R. S. (1978).: "Prediction of Student Ratings of College Instructors from Multiple Achievement Test Variables." Educational and Psychological Measurement, 38 (4), 1077-1083.
WILSON, Robert C. (1987).: "Toward Excellence in Teaching." In Lawrence M. ALEAMONI (ed.), Techniques for Evaluating and Improving Instruction, 9-24. San Francisco: Jossey-Bass.
WILSON, Robert C., and TAUXE, C. (1986).: Faculty Views of Factors that Affect Teaching Excellence in Large Lecture Classes. Berkeley: Teaching Innovation and Evaluation Services, Research on Teaching Improvement and Evaluation, University of California.

Footnotes

Three of the questionnaires we have used at Webster Vienna over the last 9 years are provided in the appendices of the workshop report, "How Can We Use Course Evaluations to Improve Teaching and the Curriculum?", which is included in the proceedings of the conference.
This questionnaire, the faculty survey form, is provided in Appendix E of the workshop report, "How Can We Use Course Evaluations to Improve Teaching and the Curriculum?", which is included in the proceedings of the conference.
By the time the voting deadline arrived, however, the outcome had become purely academic, as the anti-posting faction had taken their case to the home office in St. Louis, which decreed that course evaluations could not be made available to students under any circumstances as a matter of university policy.
When I use a positive example, I always praise the instructor in question, and when I use a negative example, I omit the instructor's name.
I am happy to send copies of the faculty handbook to anyone interested in receiving it. My address is given at the end of this journal.
