The Shibboleth Blog

Assessment of student learning is the process of evaluating the extent to which participants in education have developed their knowledge, understanding and abilities. This blog shares our ideas about education, especially the lessons in Assessment of Student Learning, commonly called Ed 103, taught under the instruction of Dr. Ava Clare Marie Robles.

Ed 103: What is it All About

This course is designed to acquaint students with major methods and techniques of evaluation used to assess and report growth, development, and academic achievement of learners in elementary and secondary schools, including interpretation of standardized test information.



Course Objectives: General course objectives for the student include:

• Awareness of the role of assessment in teaching

• Understanding of the various methods of assessment and circumstances for appropriate use of each

• Skill building in the development of various teacher-made tests and evaluative procedures

• Awareness of the needs of special populations, such as those with disabilities, multicultural populations and those not proficient in English, related to assessment

• Understanding of elementary statistics as related to the interpretation and utilization of data provided by standardized tests

• Awareness of trends and issues in assessment with regard to educational reform.

Friday, April 29, 2011

Constructing and Scoring Essay Tests

Many teachers believe that essay tests are the easiest type of instrument to construct and score. This is not true. Considerable time and effort are necessary if essay items and tests are to yield meaningful information. An essay test permits direct assessment of the attainment of numerous goals and objectives. In contrast with the objective test item types, an essay test demands less construction time per fixed unit of student time but a significant increase in labor and time for scoring. This chapter introduces the problems and procedures involved in developing, administering, and scoring essay tests.
General Types of Essay Items
          There are two types of essay items: extended response and restricted response.
          An extended response essay item is one that allows for an in-depth sampling of a student’s knowledge, thinking processes, and problem-solving behavior relative to a specific topic. The open-ended nature of the task posed by an instruction such as “discuss essay and objective tests” is challenging to a student. To answer this question well, the student has to recall specific information and then organize, evaluate, and write an intelligible composition. Because it is poorly structured, such a free-response essay item tends to yield a variety of answers from the examinees, with respect to both content and organization, and thus inhibits reliable grading. The potential ambiguity of an essay task is probably the single most important contributor to unreliability. In addition, the more extensive the responses required, the fewer questions a teacher can ask, which lowers the content validity of the test.
          On the other hand, a restricted response essay item is one where the examinee is required to provide a limited response based on a specified criterion for answering the question. It follows, therefore, that a more restricted response essay item is, in general, preferable. An instruction such as “discuss the relative advantages and disadvantages of essay tests with respect to (1) reliability, (2) objectivity, (3) content validity, and (4) usability” presents a better-defined task that is more likely to lend itself to reliable scoring and yet allows examinees sufficient opportunity or freedom to organize and express their ideas creatively.
Learning Outcomes Measured Effectively with Essay Items
          Essay questions are designed to provide the students the opportunity to answer questions in their own words (Ornstein, 1990). They can be used in assessing the student’s skill in analyzing, synthesizing, evaluating, thinking logically, solving problems, and hypothesizing. According to Gronlund and Linn (1990), there are 12 complex learning outcomes that can be measured effectively with essay items. These are the abilities to:
·         Explain cause-effect relationships;
·         Describe application of principles;
·         Present relevant arguments;
·         Formulate tenable hypotheses;
·         State necessary assumptions;
·         Describe the limitations of data;
·         Explain methods and procedures;
·         Produce, organize, and express ideas;
·         Integrate learning in different areas;
·         Create original forms; and
·         Evaluate the worth of ideas.
Content versus Expression
          It is frequently claimed that the essay item allows the student to present his or her knowledge and understanding and to organize the material in a unique form and style. More often than not, factors like expression, grammar, and spelling are evaluated along with content. If the teacher has attempted to develop students’ skills in expression, and if this learning outcome is included in the table of specifications, the assessment of such skills is appropriate and valid. If these skills are not part of the instructional program, it is not right to assess them. If the score for each essay question includes an evaluation of the mechanics of English, this should be made known to the student. If possible, separate scores should be given for content and expression.
Specific Types of Essay Questions
          The following set of essay questions is presented to illustrate how an essay item is phrased or worded to elicit particular behaviors and levels of response.
     I.        Recall
A.   Simple Recall
1.   What is the chemical formula for sodium bicarbonate?
2.   Who wrote the novel, “The Last of the Mohicans”?
B.   Selective Recall in which a basis for evaluation or judgment is suggested
1.   Who among the Greek philosophers affected your thinking as a student?
2.   Which method of recycling is the most appropriate to use at home?
   II.        Understanding
A.   Comparison of two phenomena on a single designated basis
1.   Compare 19th century and present-day Filipino writers with respect to their involvement in societal affairs.
B.   Comparison of two phenomena in general
1.   Compare the Philippine Revolution of 1896 with the People Power Revolution of 1986.
C.   Explanation of the use or exact meaning of a phrase or statement
1.   The legal system of the Mesopotamians was anchored on the principle of “an eye for an eye, a tooth for a tooth.” What does this principle mean?
D.  Summary of a text or some portion of it
1.   What is the central idea of communism as an economic system?
E.   Statement of an artist’s purpose in the selection or organization of material
1.   Why did Hemingway describe in detail the episode in which Gordon, lying wounded, engages the incoming enemy?
  III.        Application. It should be clearly understood that whether or not a question requires application depends on the student’s prior educational experience. If an analysis has been taught explicitly, a question calling for that analysis is merely simple recall.
A.   Cause or Effects
1.   Why did Fascism prevail in Germany and Italy but not in Great Britain and France?
2.   Why does frequent dependence on penicillin for the treatment of minor ailments result in its reduced effectiveness against major invasion of body tissues by infectious bacteria?
B.   Analysis
1.   Why was Hamlet torn by conflicting desires?
2.   Why was the Propaganda Movement a successful failure?
C.   Statement of Relationship
1.   A researcher reported that teaching style correlates with students’ achievement at about 0.75. What does this correlation mean?
D.  Illustrations or Examples of Principles
1.   Identify three examples of the uses of the hammer in a typical Filipino home.
E.   Application of Rules or Principles in Specified Situations
1.   Would you weigh more or less on the moon? Why or why not?
F.   Reorganization of Facts
1.   Some radical Filipino historians assert that the Filipino revolution against Spain was a revolution from the top, not from below. Using the same observation, what other conclusion is possible?
 IV.        Judgment
A.   Decision for or against
1.   Should members of the Communist Party of the Philippines be allowed to teach in colleges and universities? Why or why not?
2.   Nature is more influential than the environment in shaping an individual’s personality. Prove or disprove this statement.
B.   Discussion
1.   Trace the events that led to the downfall of the dictatorial regime of Ferdinand Marcos.
C.   Criticism of the adequacy, correctness, or relevance of a statement
1.   Former President Joseph Estrada was convicted of plunder by the Sandiganbayan. Comment on the adequacy of the evidence used by the tribunal in reaching a decision on the case filed against the former chief executive of the country.
D.  Formulation of new questions
1.   What should be the focus of research in education to explain the incidence of failure among students with a high intelligence quotient?
2.   What questions should parents ask their children in order to determine why they join fraternities and sororities?
          Following are examples of essay questions based on Bloom’s Taxonomy of Cognitive Objectives.
A.   Knowledge
Explain how Egypt came to be called the gift of the Nile.
B.   Comprehension
What is meant when the person says, “I had just crossed the bridge”?
C.   Application
Give at least three examples of how the law of supply operates in our economy today.
D.  Analysis
Explain the causes and effects of the People Power Revolution on the political and social life of the Filipino people.
E.   Synthesis
Describe the origin and significance of the celebration of Christmas the world over.

Sources of Difficulty in the Use of Essay Tests
          There are four sources of difficulty that are likely to be encountered by teachers in the use of essay tests (Greenberg et al, 1996). Let us go over each of these difficulties and look into ways to minimize them.
          Question Construction. The preparation of the essay item is the most important step in the development process. Language use and word choice are particularly important during construction. The language dimension is critical not only because it controls the comprehension level of the item for the examinee, but also because it specifies the parameters of the task. As a test constructor, you need to narrowly specify, define, and clarify what it is that you want from the examinees. Examine the example essay question, “Comment on the significance of Darwin’s Origin of Species.” The question is quite broad, considering that there are several ways of responding to it. While the intention of the teacher who wrote this item was to provide an opportunity for the students to display their mastery of the material, students could write for an hour and still not discover what their teacher really wants them to do with the topic. An improved version of the same question follows: “Do you agree with Darwin’s concept of natural selection resulting in the survival of the fittest and the elimination of the unfit? Why or why not?”
          Reader Reliability. A number of studies have been conducted over the years on the reliability of grading free-response test items. The results of these studies failed to demonstrate consistently satisfactory agreement among essay raters (Payne, 2003). Some of the specific factors contributing to the lack of reader reliability include the following: quality of composition and penmanship; item readability; racial or ethnic prejudice in essay scoring; and subjectivity of human judgment.
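          One simple way to look at agreement among raters is to count how often two teachers give the same score to the same paper. The sketch below is only an illustration: the function name, the 1 to 4 scale, and the scores are assumptions made for the example, and exact percent agreement is just one of several possible measures.

```python
def percent_agreement(rater_a, rater_b):
    """Share of papers on which two raters gave exactly the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of papers.")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)


# Hypothetical scores (1 to 4 scale) given by two teachers to the same five essays.
teacher_1 = [4, 3, 2, 4, 1]
teacher_2 = [4, 2, 2, 3, 1]

# The teachers agree on three of the five papers.
print(percent_agreement(teacher_1, teacher_2))  # 0.6
```

          A low value on a check like this suggests that the scoring criteria need to be made more explicit before the papers are graded.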
          Instrument Reliability. Even if an acceptable level of scoring consistency is attained, there is no guarantee that measurement of desired behaviors will be consistent. There remains the issue of the sampling of objectives or behaviors represented by the test. One way to increase the reliability of an essay test is to increase the number of questions and restrict the length of the answers. The more specific and narrowly defined the questions, the less likely they are to be ambiguous to the examinee. This procedure should result in more uniform understanding and performance of assigned tasks, and hence in increased reliability of the instrument and scoring. It also helps ensure better coverage of the domain of objectives.
          Instrument Validity. The number of test questions influences both the validity and reliability of essay questions. As commonly constructed, an essay test contains a small number of items; thus, the sampling of desired behaviors represented in the table of specifications will be limited, and the test suffers from lowered content validity.
          There is another sense in which the validity of an essay test may be questioned. Theoretically, the essay test allows the examinees to construct a creative, organized, unique and integrated communication. Nonetheless, examinees very frequently spend most of their time simply recalling and organizing information, rather than integrating it. The behavior elicited by the test, then, is not that hoped for by the teacher or dictated by the table of specifications. Again, one way of handling the problem is by increasing the number of items on the test.

Guidelines for Constructing, Evaluating and Using Essay Tests
Consider the following suggestions for constructing, evaluating and using essay tests:
·         Limit the problem that the question poses so that it will have a clear or definite meaning to most students.
·         Use simple words which will convey clear meaning to the students.
·         Prepare enough questions to sample the material of the subject area broadly, within a reasonable time limit.
·         Use the essay question for purposes it best serves, like organization, handling complicated ideas and writing.
·         Prepare questions which will require considerable thought, but which can be answered in relatively few words.
·         Determine in advance how much weight will be accorded each of the various elements expected in a complete answer.
·         Without knowledge of students’ names, score each question for all students.
·         Require all students to answer all questions on the test.
·         Write questions about materials immediately relevant to the subject.
·         Study past questions to determine how students performed.
·         Make gross judgments of the relative excellence of answers as a first step in grading.
·         Word a question as simply as possible in order to make the task clear.
·         Do not judge papers on the basis of external factors unless they have been clearly stipulated.
·         Do not make a generalized estimate of an entire paper’s worth.
·         Do not construct a test consisting of only one question.

Scoring Essay Tests
Most teachers would agree that the scoring of essay items and tests is among the most time-consuming and frustrating tasks associated with classroom assessment. Teachers are frequently unwilling to devote the large chunk of time necessary for checking essay tests. It almost goes without saying that if reliable scoring is to be achieved, the teacher needs to spend considerable time and effort.
          Before focusing on the specific methods of scoring essay tests, let us consider the following guidelines. First, it is critical that the teacher prepare in advance a detailed ideal answer. This is necessary as it will serve as the criterion by which each student’s response will be judged. If this is not done, the subjectivity of the teacher could seriously prevent consistent scoring, and it is also possible that student responses might dictate what constitutes a correct answer. Second, student papers should be scored anonymously, and all answers to a given item should be scored at one time, rather than grading each student’s total test separately.
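          The second guideline, anonymous scoring done one question at a time, can be organized with a short script. This is a minimal sketch under stated assumptions: the function name, the paper codes such as P001, and the student names are all hypothetical and only illustrate the idea.

```python
import random


def anonymous_grading_order(student_names, question_numbers, seed=None):
    """Give each paper a code and list (question, code) pairs, one question at a time."""
    rng = random.Random(seed)
    shuffled = list(student_names)
    rng.shuffle(shuffled)  # codes should not follow the order of the class list
    code_for = {name: f"P{index:03d}" for index, name in enumerate(shuffled, start=1)}
    codes = sorted(code_for.values())
    # Every paper is scored on question 1 before any paper is scored on question 2.
    order = [(question, code) for question in question_numbers for code in codes]
    return code_for, order


# Hypothetical class of three students and a two-question essay test.
code_for, order = anonymous_grading_order(["Ana", "Ben", "Cara"], [1, 2], seed=0)
for question, code in order:
    print(f"Score question {question} on paper {code}")
```

          The mapping from names to codes is kept aside until all scores are recorded, so that the teacher does not know whose paper is being read.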
          As already pointed out, essay questions are the most difficult to check owing to the absence of uniformity of response on the part of the students who took the test. Moreover, there are a number of distractors in the students’ responses that can contribute to subjective scoring of an essay item (Hopkins et al, 1990). These distractors include the following: handwriting, style, grammar, neatness, and knowledge of the students.
          There are two ways of scoring an essay test: holistic and analytic (Kubiszyn & Borich, 1990).
          Holistic scoring. In this type of scoring, a total score is assigned to each essay question based on the teacher’s general impression or overall assessment. Answers to an essay question are classified into one of the following categories: outstanding; very satisfactory; fair; and poor. A score value is then assigned to each of these categories. An outstanding response gets the highest score, while a poor response gets the lowest score.
          Analytic scoring. In this type of scoring, the essay is scored in terms of its components. An essay scored in this manner has separate points for organization of ideas; grammar and spelling; and supporting arguments or proofs.
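          The difference between the two approaches can be sketched in a few lines of Python. The categories and components come from the descriptions above, but the point values and maximum points are assumptions chosen only for illustration; a teacher would set these in advance for each test.

```python
# Holistic scoring: one overall category per answer, mapped to a score value.
# The point values are assumed for this example.
HOLISTIC_POINTS = {"outstanding": 4, "very satisfactory": 3, "fair": 2, "poor": 1}

# Analytic scoring: separate points per component.
# The maximum points per component are assumed for this example.
ANALYTIC_MAX = {
    "organization of ideas": 5,
    "grammar and spelling": 5,
    "supporting arguments": 10,
}


def holistic_score(category):
    """Return the score value attached to the teacher's overall impression."""
    return HOLISTIC_POINTS[category.lower()]


def analytic_score(points):
    """Check the points given for each component and return their total."""
    if set(points) != set(ANALYTIC_MAX):
        raise ValueError("Give points for every component and no others.")
    for component, earned in points.items():
        if not 0 <= earned <= ANALYTIC_MAX[component]:
            raise ValueError(f"Points for {component!r} are out of range.")
    return sum(points.values())


print(holistic_score("Fair"))  # 2
print(analytic_score({
    "organization of ideas": 4,
    "grammar and spelling": 3,
    "supporting arguments": 8,
}))  # 15
```

          The analytic total also shows the student where points were gained or lost, which a single holistic score does not.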
          As an essay test is difficult to check, there is a need for teachers to ensure objectivity in scoring students’ responses (Hopkins et al, 1990). To minimize subjectivity in scoring an essay test, the following guidelines have to be considered by the teacher (Airasian, 1994):
·         Decide what factors constitute a good answer before administering an essay question.
·         Explain these factors in the test item.
·         Read all the answers to a single essay question before reading other questions.
·         Reread essay answers a second time after initial scoring.

Administering and Scoring Objective Paper-and-Pencil Tests

While it is true that test formats and content coverage are important ingredients in constructing paper-and-pencil tests, the conditions under which students shall take the test are equally essential.
          This chapter is focused on how tests should be administered and scored.
Arranging Test Items
          Before administering a teacher-made test, test items have to be reviewed. Once the review is completed, these items have to be assembled into a test. The following guidelines should be observed in assembling a test (Airasian, 1994; Jacobsen et al, 1993).
1.   Similar items should be grouped together. For example, multiple choice items should be together and separated from the true-false items.
2.   Arrange the test items logically. Test items have to be arranged from the easiest to the most difficult.
3.   Selection items should be placed at the start of the test and supply items at the end.
4.   Short-answer items should be placed before essay items.
5.   Specify directions that students have to follow in responding to each set of grouped items.
6.   Avoid cramming items too close to each other. Leave enough space for the students to write their answers.
7.   Avoid splitting multiple-choice or matching items across two different pages.
8.   Number test items consecutively.
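          Several of the guidelines above (grouping similar items, placing selection items before supply items, ordering from easiest to most difficult, and numbering consecutively) can be sketched as a small sorting routine. The item types, the difficulty ratings, and the function name are assumptions for the example; ordering by difficulty within each group is one possible reading of guideline 2.

```python
# Assumed order of item types: selection items first, then supply items,
# with short-answer items placed before essay items.
TYPE_ORDER = ["true-false", "multiple-choice", "matching", "short-answer", "essay"]


def assemble_test(items):
    """Group items by type, order each group from easiest to hardest, and number them."""
    ordered = sorted(
        items,
        key=lambda item: (TYPE_ORDER.index(item["type"]), item["difficulty"]),
    )
    return [
        (number, item["type"], item["text"])
        for number, item in enumerate(ordered, start=1)
    ]


# Hypothetical reviewed items; difficulty 1 is the easiest.
reviewed_items = [
    {"type": "essay", "difficulty": 3, "text": "Essay item"},
    {"type": "multiple-choice", "difficulty": 2, "text": "Harder multiple-choice item"},
    {"type": "true-false", "difficulty": 1, "text": "True-false item"},
    {"type": "multiple-choice", "difficulty": 1, "text": "Easier multiple-choice item"},
    {"type": "short-answer", "difficulty": 2, "text": "Short-answer item"},
]

for number, item_type, text in assemble_test(reviewed_items):
    print(f"{number}. [{item_type}] {text}")
```

          Directions for each group and the page layout still have to be handled by the teacher when the test is typed up.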
Administering the Test
          Test administration is concerned with the physical and psychological setting in which students take the test, so that the students can do their best (Airasian, 1994). Some guidelines that teachers should observe in administering a test are discussed below.
·         Provide a quiet and comfortable setting.  This is essential as interruptions can affect students’ concentration and performance on their test.
·         Anticipate questions that students may ask. This is also necessary as students’ questions can interrupt test-taking. In order to avoid questions, teachers have to proofread their test questions before administering them to the class.
·         Set a proper atmosphere for testing. This means that students have to know in advance that they will be given a test. Such information can lead them to prepare for the test and reduce test anxiety.
·         Discourage cheating.  Students cheat for a variety of reasons. Some of these are pressures from parents and teachers, as well as intense competition in the classroom. To prevent and discourage cheating, Airasian (1994) recommends two sets of strategies: strategies before testing and strategies during testing.
Strategies Before Testing
·         Teach well.
·         Give students sufficient time to prepare for the test.
·         Acquaint the students with the nature of the test and its coverage.
·         Define to the students what is meant by cheating.
·         Explain the disciplinary action to be imposed on students caught cheating.
Strategies During Testing
·         Require students to remove unnecessary materials from their desks.
·         Have students sit in alternating seats.
·         Go around the testing room and observe students during testing period.
·         Prohibit the borrowing of materials like pens and erasers.
·         Prepare alternate forms of the test.
·         Implement established cheating rules.
·         Help students keep track of time.
Scoring Tests
          After the administration of the test, the teacher needs to check the students’ test papers in order to summarize their performance on the test. The difficulty of checking a test differs with the kind of test items used. Selection items are the easiest to score, followed by short-answer and completion items.
Scoring Objective Test. The following guidelines have to be considered by a teacher in scoring an objective test:
·         Prepare the key to correction in advance for use in scoring the test papers.
·         Apply the same rules to all students in checking their responses to the test questions.
·         Score each part of the test to have a clear picture of how students fared and to determine the areas they failed to master.
·         Sum up the scores for grading purposes.
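          These steps can be sketched as follows. The answer key, the student's responses, and the function name are all made up for illustration; one point per correct item is also an assumption.

```python
def score_objective_test(answer_key, responses):
    """Return the score for each part of the test and the overall total."""
    part_scores = {}
    for part, key in answer_key.items():
        answers = responses.get(part, [])
        # zip stops at the shorter list, so items left unanswered at the end earn no credit.
        part_scores[part] = sum(
            1 for correct, given in zip(key, answers) if given == correct
        )
    return part_scores, sum(part_scores.values())


# Hypothetical key prepared in advance, with one list of answers per part of the test.
ANSWER_KEY = {
    "true-false": ["T", "F", "T"],
    "multiple-choice": ["b", "d", "a", "c"],
}

# Hypothetical responses from one student; the same key and rules apply to all students.
student_responses = {
    "true-false": ["T", "T", "T"],
    "multiple-choice": ["b", "d", "c", "c"],
}

part_scores, total = score_objective_test(ANSWER_KEY, student_responses)
print(part_scores)  # {'true-false': 2, 'multiple-choice': 3}
print(total)        # 5
```

          Keeping the score for each part separate makes it easier to see which areas the class has not yet mastered.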
Conducting Post-test Review
          After scoring a test and recording the results, teachers have to give students information on their performance. This can be done by writing comments on the test paper to indicate how students fared on the test. Answers to the items also have to be reviewed in class so that students know where they made mistakes. In so doing, students become aware of the right answers and of how the test was scored and graded.

REFERENCES
Airasian, P. W. (1994). Classroom Assessment, 2nd Ed. New York: McGraw-Hill, Inc.
Kubiszyn, T. & Borich, G. (1990). Educational Testing and Measurement. Glenview, Illinois: Scott Foresman.
Hopkins, K. D. et al (1990). Educational and Psychological Measurement and Evaluation, 7th Ed. Englewood Cliffs, NJ: Prentice-Hall, Inc.
Jacobsen, D. et al (1993). Methods for Teaching: A Skill Approach. Boston: Allyn & Bacon.