Guidelines for Test Design and Construction

 

 

Step 1:   Define the constructs you want to measure and outline the proposed content of the test

 

 

Aptitude tests for job applicants :

 

First, conduct a job analysis (task analysis)  :  list the important components of the position you are trying to fill.

 

The job analysis will contain the critical incidents: a list of work-related behaviors that are essential for successful completion of the job.

 

A well designed aptitude test will contain items which measure the entire cross-section of critical incidents.

 

To paraphrase, a properly constructed test will measure a representative sample of most critical incidents.

 

 

Test Planners must consider a variety of issues :

 

 

1.  What are the topics and materials to be tested ?

 

 

 

2.  What Kind of Questions should be constructed ?

 

 

 

3. What item and test formats should be used ?

 

 

 

4.  When, where, and how is the test to be given ?

 

 

 

5. How should the tests be scored ?

 

 

 

 

Answers to questions 1, 2, and 3 are covered in Chapter 2

 

Answers to questions 4 and 5 are covered in Chapter 3

 

 

 

 

 

1. What are the topics and materials to be tested ?

 

 

For employment aptitude tests, this calls for the job analysis with special attention being paid to the critical incidents.

 

For achievement tests,  this calls for a content analysis in which the key subject areas are listed and the percentage of the test to be devoted to each individual subject area is decided.

 

Content analysis for classroom achievement tests can be highly subjective if created by a single individual with no feedback from knowledgeable colleagues.

 

 

 

 

2. What kind of questions should be constructed ?

 

 The answer to this question is partially dependent upon the Educational Objectives you want to include in the test.

 

Since the 1950s, researchers have developed several taxonomies (hierarchical categorizations) of cognitive, affective, and psychomotor objectives to be addressed within testing situations.

 

 

2 Taxonomies of Cognitive Objectives

 

Taxonomy of Educational Objectives:

The Cognitive Domain  :  This list was developed by Bloom & Krathwohl in 1956, and the latest revision appeared in 1984.

 

 

Gerlach & Sullivan taxonomy of 1967

 

 

The Taxonomy of Educational Objectives:

The Cognitive Domain

 

Lists 6 categories which vary in difficulty with respect to cognitive abilities or level of understanding.

 

I.  Knowledge  :  recall of specific facts.  Knowledge questions can be identified by key verbs such as define, identify, list, and name.

 

II. Comprehension :  understanding the purpose or meaning of something.  Comprehension questions can be identified with key  verbs such as convert, explain, and summarize.

 

III. Application :  using information and ideas in novel situations.  Application questions can be identified with key verbs such as compute, determine, and solve.

 

IV. Analysis :  Breaking down large pieces of information in order to examine the structure and interrelationships among its component parts.  Analysis questions can be identified by key verbs such as analyze, differentiate, and relate.

 

 

 

 

 


 

 

V. Synthesis : Combining various elements or parts into a structural whole. Synthesis questions can be identified by key verbs such as design, devise, formulate, and plan.

 

VI. Evaluation : making a judgement based upon reasoning. Evaluation questions can be identified by key verbs such as compare, critique, evaluate, and judge.

 

 

These 6 categories are progressively inclusive.

In order for a test taker to succeed on questions from the higher categories, they must have the ability to answer questions from the lower-order categories.

 

e.g.   Critiquing a position or theory requires knowledge of the subject material to be evaluated, and the ability to analyze and synthesize it.
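
The key verbs listed for the six categories above can be used as a rough first-pass check on which level a draft question targets. The following is a minimal sketch, not a validated classifier: the function name guess_bloom_levels is hypothetical, the verb lists are only the examples given in these notes, and a word such as "list" or "plan" used as a noun will produce a false match.

```python
import re

# Key verbs copied from the six category descriptions in these notes.
BLOOM_KEY_VERBS = {
    "Knowledge": ["define", "identify", "list", "name"],
    "Comprehension": ["convert", "explain", "summarize"],
    "Application": ["compute", "determine", "solve"],
    "Analysis": ["analyze", "differentiate", "relate"],
    "Synthesis": ["design", "devise", "formulate", "plan"],
    "Evaluation": ["compare", "critique", "evaluate", "judge"],
}


def guess_bloom_levels(question):
    """Return every category whose key verbs appear in the question text.

    Hypothetical helper: it only matches whole words, so it cannot tell
    a verb from a noun and should be treated as a hint, not a verdict.
    """
    words = set(re.findall(r"[a-z]+", question.lower()))
    return [
        level
        for level, verbs in BLOOM_KEY_VERBS.items()
        if words & set(verbs)
    ]


print(guess_bloom_levels("Define reliability and list two ways to estimate it."))
print(guess_bloom_levels("Critique the use of projective tests in hiring."))
```

A question that matches no category, or several, is a signal to reread the item by hand rather than an error.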

 

 

 

The Gerlach & Sullivan taxonomy of Cognitive Objectives  (1967)

 

Also has six levels of varying difficulty.

 

1.  Identifying :  consists of indicating which member of a set belongs in a particular category :

 

       Which of the following tests is considered a projective test of personality ?

 

       A. MMPI

       B. The draw a person test

       C. The 16PF

       D. The Rorschach inkblot test

 

2.  Naming : Supplying the proper verbal label for a referent, or a set of referents.

 

       Collectively, the MMPI, the Rorschach, and the 16PF are considered tests of _______________ .

 

3. Describing :  consists of reporting relevant categories of objects, events, properties, or relationships.

      

       What are the ten clinical scales of the MMPI-2, and what do high scores on each of those scales indicate ?

 

 


 

 

4. Constructing : creating a product according to certain specifications.

 

       Give the outline of a treatment method for a fear of flying that combines Systematic Desensitization with the classic elements of Rogerian therapy .

 

 

5. Ordering : consists of arranging two or more referents in a specific ranking.

 

       List the following events in chronological order :

 

       A. Americans begin using civil service exams.

       B. Chinese begin using civil service exams.

       C. Wilhelm Wundt operates the first psychological laboratory.

       D. Sigmund Freud publishes “The Interpretation of Dreams”.

 

 

6. Demonstrating : Performing a certain behavior to accomplish a test relevant task.

 

       Conduct a clinical interview and develop a DSM-IV diagnosis for the next individual who enters the room.

 

When the test developer has fully identified the content areas to be tested and the cognitive objective to be measured, she can create a table of specifications which will guide her through the test creation.

 

The table of specifications allows for a thorough analysis of content and difficulty, and provides a framework for specific test item construction.
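
As an illustration of how a table of specifications can be laid out, the sketch below crosses content areas with cognitive objectives and converts percentage weights into item counts. The content areas, the weights, the 50-item total, and the function name build_table_of_specifications are all assumptions made for the example, not values from these notes.

```python
# Hypothetical weights, in percent. Each set should sum to 100.
CONTENT_WEIGHTS = {"Reliability": 40, "Validity": 40, "Item analysis": 20}
LEVEL_WEIGHTS = {"Knowledge": 50, "Application": 30, "Evaluation": 20}


def build_table_of_specifications(content_weights, level_weights, total_items):
    """Return {content area: {cognitive level: number of items}}.

    Each cell gets total_items * area% * level%. Rounding can make the
    cells add up to slightly more or less than total_items for other
    weights, so the grand total should always be checked.
    """
    table = {}
    for area, area_pct in content_weights.items():
        table[area] = {
            level: round(total_items * area_pct * level_pct / 10000)
            for level, level_pct in level_weights.items()
        }
    return table


table = build_table_of_specifications(CONTENT_WEIGHTS, LEVEL_WEIGHTS, 50)
for area, row in table.items():
    print(area, row)
print("Total items:", sum(sum(row.values()) for row in table.values()))
```

Each cell then tells the item writer how many questions to construct for that combination of content area and cognitive objective.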

 

 

Test Developers also have to take into consideration time constraints when determining the length of tests.

 

General Guidelines for timing deadlines :

 

Essay Tests :  Five ½-page essays may be completed in an hour.

 

Multiple Choice Questions :  1 per minute.

 

True-False Questions :  2 per minute

 

Performance Tasks :  task dependent
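
The timing guidelines above translate directly into a rough estimate of testing time. The sketch below is one way to do that arithmetic. The item-type labels and the function name estimated_minutes are hypothetical, and performance tasks are left out because their timing is task dependent.

```python
# Minutes per item implied by the general guidelines above.
MINUTES_PER_ITEM = {
    "essay_half_page": 60 / 5,  # five half-page essays per hour
    "multiple_choice": 1.0,     # one per minute
    "true_false": 0.5,          # two per minute
}


def estimated_minutes(item_counts):
    """Estimate total testing time for a dict of {item type: count}."""
    return sum(MINUTES_PER_ITEM[kind] * n for kind, n in item_counts.items())


# Hypothetical test plan: 30 multiple choice, 20 true-false, 1 short essay.
plan = {"multiple_choice": 30, "true_false": 20, "essay_half_page": 1}
print(estimated_minutes(plan))  # 30 + 10 + 12 = 52.0 minutes
```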

 

 

Constructing Specific Test Items

 

 

Many decisions must be made by the test developers

 

Should the test be Objective (multiple choice, true-false) or Subjective (essay tests) in nature ?  The answer depends upon the cognitive objectives being assessed.  Some tests may have a combination of objective and subjective methods.

 

Objectivity and Subjectivity refer to the standardization of the scoring procedures.

 

 

If we decide on an objective test, we must then decide whether or not to supply the answer choices.  This choice marks the difference between testing recognition memory and testing recall memory.

 

Also referred to as selection vs. supply, or identification vs. constructed response.

 

Free Recall is considered more difficult in most cases.

 

Essay Tests  vs.  Multiple Choice Tests

 

Advantages of Essay Tests :

 

Essay tests are relatively easy to construct, compared to the time required to construct a multiple choice test.

 

Essay Tests allow for the examination of higher order cognitive objectives.

 

Allows test takers to show the depth of knowledge they have of a particular subject area.

 

Allows test takers to practice their writing skills.

 

Disadvantages of Essay Tests

 

Subjectivity in grading means it is possible for two people evaluating the same essay to assign different grades.

 

Grading essays takes a significant amount of time (may not be feasible for tests of a large group of people).

 

Because of the time it takes to write an essay, the test may not be able to survey a large portion of the subject material (a problem with representative sampling).

 

Students may answer a different question than asked.

 

Advantages of Multiple Choice Tests

 

Objective scoring procedures mean anyone can score a particular exam and come up with the same grade.  (helps to increase reliability of the test)

 

A representative sample of questions from all subject areas to be tested can be easily obtained.

 

Exams can be scored relatively quickly, very suitable for large numbers of test takers.

 

 

Disadvantages of Multiple Choice Tests :

 

Take considerable time to create, compared to essay tests.

 

Much more difficult to assess higher order cognitive objectives such as analysis and evaluation.

 

Can be made unintentionally more difficult by the use of :

 

Negatives or (even worse) double negatives within the question.

 

Interlocked items, where you must get the correct answer to a preceding question in order to get the right answer to the current question.

 

Can be made unintentionally easier by the use of interrelated items, where one item gives away a clue to the answer of another.

 

 

Standard Multiple Choice Formats

 

 

Short Answer Questions :  Test taker must supply the answer.

 

3 Major Guidelines :

 

Questions are preferable to incomplete statements

 

If fill-in-the-blank format is used, the blank should come at the end of the statement. 

 

Avoid multiple blanks in the same item, as in the following example :

 

       The ____________ is a good example of a __________   test.

 

 

True-False Items :  Among the most commonly used and simplest test items to create.  True-false items are sometimes criticized for encouraging rote memorization.

 

True-False answers are affected by the use of specific determiners.  Words such as never, always, and only usually indicate a false statement.

 

Words such as often, sometimes, and usually commonly indicate a true statement.

Guidelines for constructing good True-False statements :

 

 

1.  Ensure the statements deal with non-trivial information to discourage rote memorization.

 

2.  Keep statements short in length and unambiguous.

 

3. Avoid negatively stated items, especially double negatives.

 

4. Avoid specific Determiners such as always, never, and only.

 

5. On Opinion statements, cite the source of the opinion.

 

6. Avoid tricky items.

 

7. Make true and false statements about equal in length, and include about equal numbers of each.

 

8.  Make wrong answers more attractive by wording items in such a way that they are not obviously wrong.
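
Guidelines 3 and 4 above lend themselves to a simple automated screen of draft statements. The following is a minimal sketch under stated assumptions: the function name review_true_false_item and the list of negative words are hypothetical, and word matching of this kind will miss many cases that a human reviewer would catch.

```python
import re

# Specific determiners named in these notes, with the answer they tend to cue.
SPECIFIC_DETERMINERS = {
    "always": "false", "never": "false", "only": "false",
    "often": "true", "sometimes": "true", "usually": "true",
}
# Assumed list of negative words; extend as needed.
NEGATIVES = {"not", "no", "never", "none"}


def review_true_false_item(statement):
    """Return a list of warnings for one draft true-false statement."""
    words = re.findall(r"[a-z']+", statement.lower())
    warnings = []
    for word in sorted(set(words) & set(SPECIFIC_DETERMINERS)):
        warnings.append(
            f"specific determiner '{word}' (cues {SPECIFIC_DETERMINERS[word]})"
        )
    negatives = sum(1 for w in words if w in NEGATIVES or w.endswith("n't"))
    if negatives >= 2:
        warnings.append("possible double negative")
    elif negatives == 1:
        warnings.append("negatively stated")
    return warnings


print(review_true_false_item("Reliability coefficients are always positive."))
print(review_true_false_item("It is not true that essay tests are never objective."))
```

An empty list means only that none of these surface cues were found. It does not mean the statement is well written.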

 

 

Guidelines for Matching Items

 

 

Matching items are reasonably easy to construct.

 

The drawback is that matching items primarily test lower order cognitive objectives, and thus encourage rote memorization.

 

5 Guidelines for Matching Items 

 

1. Arrange the premises and response options in a clear, logical column format, with the question stems on the left-hand side and the items to be matched on the right.

 

2. Number the question stems sequentially, and use letters to differentiate among response choices.

 

3. Use between 6 and 15 question stems, and two to three more response options than question stems.

 

4. Clearly specify whether the matching is one-to-one or one-to-many, and whether or not response choices can be used more than once.

 

5.  Put the entire matching section on a single test page for clarity.

 

 

Ranking and rearrangement items are a special type of matching question in which order counts.

 

Multiple-Choice Questions

 

 

Multiple-choice questions are very versatile, as they can measure the attainment of both lower and higher order cognitive objectives.

 

Scores on Multiple choice questions are less affected by response bias (response set) than are true-false items.

 

The key to constructing a high quality multiple choice question is to have good, plausible distractors.

 

 

Shortcomings of multiple choice items :

 

1. Good items take time and thought to construct

 

2. Multiple choice stresses recognition over recall.

Familiarity with an answer can lead to correct guessing.

 

3.  Require more time to answer than true-false questions.

 

 

 

Guidelines for constructing multiple choice items

 

 

1. Question format preferred to incomplete statement.  If incomplete statement is used, blank should be at end of statement.

 

2. State question clearly and at the appropriate reading level.

 

3. Place as much of the item in the stem as possible.  Answer choices should be as short as possible.

 

4. Use opinion questions sparingly and cite the source or authority of the opinion.

 

5.  Four to five choices are standard, but two or three choices can be used as well.

 

 How many choices appear is partially dependent on the ease of generating high quality distractors

 

6. If the answer choices have a natural order, arrange them in that fashion. (Dates, ages).  Otherwise answer choices should be randomly arranged.

 

7. Try to make answer choices equal in length and complexity.

 

 

 


 

8.  Try to make all answer choices plausible, but with only one correct answer.

 

9. Formulate a logical reason why someone who does not know the correct answer would select each distractor.

 

10. Avoid the use of negatives, and in particular, double negatives.

 

11. Ambiguous and tricky options should be avoided.

 

12.  Specific determiners should be avoided, and “all of the above” and “none of the above” options should be used infrequently.

 

13. Place options in Stacked format, rather than row by row.

 

14.  Make sure the number of items tested is appropriate for the time constraints of the testing session.

 

15. Item difficulty should be such that average performance falls halfway between chance (pure guessing) and 100%.  This gives the test its maximum ability to differentiate among test takers according to performance.
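
The "halfway between chance and 100%" rule in guideline 15 can be worked out for any number of answer choices. The sketch below assumes that chance performance on an item with k equally attractive choices is 1/k. The function name target_proportion_correct is hypothetical.

```python
def target_proportion_correct(num_choices):
    """Midpoint between chance (1 / num_choices) and a perfect score (1.0)."""
    chance = 1 / num_choices
    return (chance + 1) / 2


for k in (2, 3, 4, 5):
    print(k, "choices ->", round(target_proportion_correct(k), 3))
```

For a four-choice item, chance is 25%, so the target is about 62.5% correct. For a true-false item (two choices), it is 75%.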

 

Constructing Distractor Items

 

Usually, the question you want to ask and its answer are relatively easy to develop.

 

But what often takes the most time is coming up with plausible distracters.

 

 

Two approaches can be taken :

 

 

The Rational Approach : The test developer's understanding of the subject material and ability to organize that material lead them to adopt specific distractors for specific test items.

 

The Empirical Approach : You select distractors based on pre-test data.

 

You administer the test without any answer choices and have people complete the measure as a short answer test.

 

Your test subjects should be similar in composition to the population you will actually be testing.

 

You compile a list of incorrect answers for each question, and use the most popular (most frequent) incorrect answers as your distractor choices when you reassemble the exam.
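
 

The tallying step of the empirical approach can be sketched as follows. This is an illustration only: the pre-test responses are invented, the function name pick_distractors is hypothetical, and answers are compared after lowercasing and trimming spaces, which is an assumption about how free-text answers would be cleaned.

```python
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so 'Freud ' and 'freud' match."""
    return " ".join(text.lower().split())


def pick_distractors(responses, correct_answer, how_many=3):
    """Return the most frequent incorrect short-answer responses."""
    correct = normalize(correct_answer)
    wrong = Counter(
        normalize(r) for r in responses if normalize(r) != correct
    )
    return [answer for answer, _count in wrong.most_common(how_many)]


# Invented pre-test data for the question
# "Who operated the first psychological laboratory?"
pretest = [
    "Freud", "freud", "James", "Wundt", "Freud ",
    "James", "Pavlov", "Wundt", "Skinner",
]
print(pick_distractors(pretest, "Wundt", how_many=2))
```

The returned answers are in normalized (lowercase) form, so they would need to be restored to their proper spelling before being placed on the reassembled exam.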