Step 1: Define the constructs you want to measure and outline the proposed content of the test
Aptitude tests for job applicants:
First conduct a job analysis (task analysis): a listing of the important components of the position you are trying to fill.
The job analysis will contain the critical incidents: a list of work-related behaviors that are essential for successful completion of the job.
A well-designed aptitude test will contain items that measure the entire cross-section of critical incidents.
To paraphrase, a properly constructed test will measure a representative sample of the most critical incidents.
Test planners must consider a variety of issues:
1. What are the topics and materials to be tested?
2. What kind of questions should be constructed?
3. What item and test formats should be used?
4. When, where, and how is the test to be given?
5. How should the tests be scored?
Answers to questions 1, 2, and 3 are covered in Chapter 2.
Answers to questions 4 and 5 are covered in Chapter 3.
1. What are the topics and materials to be tested?
For employment aptitude tests, this calls for the job analysis, with special attention paid to the critical incidents.
For achievement tests, this calls for a content analysis in which the key subject areas are listed and the percentage of the test to be devoted to each subject area is decided.
Content analysis for classroom achievement tests can be highly subjective if created by a single individual with no feedback from knowledgeable colleagues.
2. What kind of questions should be constructed?
The answer to this question depends partly on the educational objectives you want to include in the test.
Since the 1950s, researchers have developed several taxonomies (hierarchical categorizations) of cognitive, affective, and psychomotor objectives to be addressed within testing situations.
Taxonomy of Educational Objectives: The Cognitive Domain. This list was developed by Bloom & Krathwohl in 1956, and the latest revision appeared in 1984.
Gerlach & Sullivan taxonomy of 1967
The Taxonomy of Educational Objectives: The Cognitive Domain
Lists six categories that vary in difficulty with respect to cognitive abilities or level of understanding.
I. Knowledge: recall of specific facts. Knowledge questions can be identified by key verbs such as define, identify, list, and name.
II. Comprehension: understanding the purpose or meaning of something. Comprehension questions can be identified by key verbs such as convert, explain, and summarize.
III. Application: using information and ideas in novel situations. Application questions can be identified by key verbs such as compute, determine, and solve.
IV. Analysis: breaking down large pieces of information in order to examine the structure and interrelationships among their component parts. Analysis questions can be identified by key verbs such as analyze, differentiate, and relate.
V. Synthesis: combining various elements or parts into a structural whole. Synthesis questions can be identified by key verbs such as design, devise, formulate, and plan.
VI. Evaluation: making a judgement based upon reasoning. Evaluation questions can be identified by key verbs such as compare, critique, evaluate, and judge.
These six categories are progressively inclusive.
For test takers to succeed on questions from the higher categories, they must be able to answer questions from the lower-order categories.
For example, critiquing a position or theory requires knowledge of the subject material, as well as the ability to analyze and synthesize the material being evaluated.
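As a rough illustration, the key verbs listed for the six categories can be stored in a lookup table and used to guess which category a draft question targets. This is only a sketch: the function name guess_bloom_categories is hypothetical, and simple word matching will misfire when a verb is used as a noun (for example, "list").

```python
# Key verbs copied from the six category descriptions above.
BLOOM_KEY_VERBS = {
    "Knowledge": ["define", "identify", "list", "name"],
    "Comprehension": ["convert", "explain", "summarize"],
    "Application": ["compute", "determine", "solve"],
    "Analysis": ["analyze", "differentiate", "relate"],
    "Synthesis": ["design", "devise", "formulate", "plan"],
    "Evaluation": ["compare", "critique", "evaluate", "judge"],
}


def guess_bloom_categories(question):
    """Return every category whose key verb appears as a word in the question.

    Hypothetical helper: matching is on whole words only, ignoring case
    and surrounding punctuation.
    """
    words = {w.strip(".,;:?!()\"'").lower() for w in question.split()}
    return [
        category
        for category, verbs in BLOOM_KEY_VERBS.items()
        if words & set(verbs)
    ]


print(guess_bloom_categories("Define the term reliability."))
print(guess_bloom_categories("Critique the theory and summarize its assumptions."))
```

A question that matches more than one category would normally be classified at the highest matching level, since the categories are progressively inclusive.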
The Gerlach & Sullivan taxonomy of Cognitive Objectives (1967)
Also has six levels of varying difficulty.
1. Identifying: indicating which member of a set belongs in a particular category.
Which of the following tests is considered a projective test of personality?
A. MMPI
B. The Draw-a-Person test
C. The 16PF
D. The Rorschach inkblot test
2. Naming: supplying the proper verbal label for a referent or a set of referents.
Collectively, the MMPI, the Rorschach, and the 16PF are considered tests of _______________.
3. Describing: reporting relevant categories of objects, events, properties, or relationships.
What are the ten clinical scales of the MMPI-2, and what do high scores on each of those scales indicate?
4. Constructing: creating a product according to certain specifications.
Give the outline of a treatment method for a fear of flying that combines systematic desensitization with the classic elements of Rogerian therapy.
5. Ordering: arranging two or more referents in a specific ranking.
List the following events in chronological order:
A. Americans begin using civil service exams.
B. Chinese begin using civil service exams.
C. Wilhelm Wundt operates the first psychological laboratory.
D. Sigmund Freud publishes “The Interpretation of Dreams”.
6. Demonstrating: performing a certain behavior to accomplish a test-relevant task.
Conduct a clinical interview and develop a DSM-IV diagnosis for the next individual who enters the room.
When the test developer has fully identified the content areas to be tested and the cognitive objectives to be measured, she can create a table of specifications, which will guide her through the test creation.
The table of specifications allows for a thorough analysis of content and difficulty, and provides a framework for specific test item construction.
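One way to picture a table of specifications is as a grid of content areas by cognitive objectives, with each cell holding the number of items to write. The sketch below assumes the developer has already chosen percentage weights for each content area and each objective; the content areas, weights, and function name are hypothetical and purely illustrative.

```python
def build_table_of_specifications(total_items, content_pct, objective_pct):
    """Allocate items to each (content area, objective) cell.

    Both weight dictionaries hold whole-number percentages that sum to 100.
    Cell counts are rounded, so with less convenient weights the cells may
    not add up exactly to total_items and would need manual adjustment.
    """
    table = {}
    for area, area_weight in content_pct.items():
        for objective, objective_weight in objective_pct.items():
            items = total_items * area_weight * objective_weight / 10000
            table[(area, objective)] = round(items)
    return table


# Hypothetical weights for a 40-item classroom achievement test.
content_pct = {"Reliability": 50, "Validity": 30, "Item analysis": 20}
objective_pct = {"Knowledge": 50, "Application": 50}

table = build_table_of_specifications(40, content_pct, objective_pct)
for (area, objective), count in table.items():
    print(f"{area:15s} {objective:12s} {count}")
```

Each cell then tells the item writer how many questions of a given type to construct for a given subject area.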
Test developers also have to take time constraints into consideration when determining the length of tests.
General guidelines for time limits:
Essay tests: five ½-page essays may be completed in an hour.
Multiple choice questions: 1 per minute.
True-false questions: 2 per minute.
Performance tasks: task dependent.
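The timing guidelines above can be turned into a quick estimate of how long a planned test will take. This is a minimal sketch; the dictionary keys and the function name estimate_minutes are hypothetical, and performance tasks are left out because their timing is task dependent.

```python
# Minutes per item implied by the guidelines above.
MINUTES_PER_ITEM = {
    "essay_half_page": 60 / 5,  # five half-page essays per hour
    "multiple_choice": 1.0,     # one question per minute
    "true_false": 0.5,          # two questions per minute
}


def estimate_minutes(item_counts):
    """Estimate total testing time from a mapping of item type to count."""
    return sum(MINUTES_PER_ITEM[kind] * count for kind, count in item_counts.items())


planned_test = {"multiple_choice": 30, "true_false": 20, "essay_half_page": 1}
print(estimate_minutes(planned_test))  # 30 + 10 + 12 = 52.0 minutes
```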
Many decisions must be made by the test developers.
Should the test be objective (multiple choice, true-false) or subjective (essay tests) in nature? The answer depends upon the cognitive objectives being assessed. Some tests may have a combination of objective and subjective methods.
Objectivity and subjectivity refer to the standardization of the scoring procedures.
If we decide on an objective test, we must then decide whether or not to supply the answer choices. This is the difference between testing recall memory and testing recognition memory, also referred to as supply vs. selection, or constructed response vs. identification.
Free recall is considered more difficult in most cases.
Advantages of Essay Tests:
Quick and easy to construct, compared to the time required to construct a multiple choice test.
Essay tests allow for the examination of higher-order cognitive objectives.
They allow test takers to show the depth of knowledge they have of a particular subject area.
They allow test takers to practice their writing skills.
Disadvantages of Essay Tests:
Subjectivity in grading means it is possible for two people evaluating the same essay to assign different grades.
Grading essays takes a significant amount of time (may not be feasible for tests of a large group of people).
Because of the time it takes to write an essay, the test may not be able to survey a large portion of the subject material (a problem with representative sampling).
Students may answer a different question than the one asked.
Advantages of Multiple Choice Tests:
Objective scoring procedures mean anyone can score a particular exam and come up with the same grade (helps to increase the reliability of the test).
A representative sample of questions from all subject areas to be tested can be easily obtained.
Exams can be scored relatively quickly, making them very suitable for large numbers of test takers.
Disadvantages of Multiple Choice Tests:
Take considerable time to create, compared to essay tests.
Much more difficult to assess higher-order cognitive objectives such as analysis and evaluation.
Can be made unintentionally more difficult by the use of:
Negatives or (even worse) double negatives within the question.
Interlocked items: where you must get the correct answer to a preceding question in order to get the right answer to the current question.
Can be made unintentionally easier by the use of interrelated items (where the wording of one item gives a clue to the answer to another).
Short Answer Questions: the test taker must supply the answer.
Three major guidelines:
Questions are preferable to incomplete statements.
If a fill-in-the-blank format is used, the blank should come at the end of the statement.
Avoid multiple blanks in the same item, as in this poorly constructed example:
The ____________ is a good example of a __________ test.
True-False Items: some of the most commonly used and simplest test items to create. True-false items are sometimes criticized for encouraging rote memorization.
True-false answers are affected by the use of specific determiners. Words such as never, always, and only usually indicate a false statement.
Words such as often, sometimes, and usually commonly indicate a true statement.
Guidelines for constructing good true-false statements:
1. Ensure the statements deal with non-trivial information to discourage rote memorization.
2. Keep statements short and unambiguous.
3. Avoid negatively stated items, especially double negatives.
4. Avoid specific determiners such as always, never, and only.
5. On opinion statements, cite the source of the opinion.
6. Avoid tricky items.
7. Make true and false statements about equal in length, and include an equal number of both.
8. Make wrong answers more attractive by wording items in such a way that they are not obviously wrong.
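The guideline on specific determiners lends itself to a simple automated check of draft statements. The following is only a sketch: the word list combines the determiners named above (never, always, only, often, sometimes, usually), and the function name find_specific_determiners is hypothetical.

```python
# Determiners named in the notes as cues to a false or a true statement.
SPECIFIC_DETERMINERS = {"never", "always", "only", "often", "sometimes", "usually"}


def find_specific_determiners(statement):
    """Return the specific determiners found in a true-false statement.

    Matching is on whole words, ignoring case and surrounding punctuation.
    """
    words = [w.strip(".,;:?!()\"'").lower() for w in statement.split()]
    return [w for w in words if w in SPECIFIC_DETERMINERS]


print(find_specific_determiners("Projective tests are always unreliable."))
print(find_specific_determiners("The MMPI is a personality inventory."))
```

A non-empty result flags a statement the item writer may want to reword before it goes on the test.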
Matching items are reasonably easy to construct.
The drawback is that matching items primarily test lower-order cognitive objectives, and thus encourage rote memorization.
Guidelines for constructing matching items:
1. Arrange the premises and response options in a clear, logical column format, with the question stems on the left-hand side and the items to be matched on the right.
2. Number the question stems sequentially, and use letters to differentiate among response choices.
3. Use between 6 and 15 question stems, and two to three more response options than question stems.
4. Clearly specify whether the matching is one-to-one or one-to-many, and whether or not response choices can be used more than once.
5. Put the entire matching section on a single test page for clarity.
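The numeric part of these guidelines (the number of question stems and the number of extra response options) can be checked mechanically. This is a minimal sketch with a hypothetical function name; the remaining guidelines concern layout and instructions and still need a human reviewer.

```python
def check_matching_section(num_stems, num_responses):
    """Return a list of guideline violations for a matching section."""
    problems = []
    if not 6 <= num_stems <= 15:
        problems.append("use between 6 and 15 question stems")
    if num_responses - num_stems not in (2, 3):
        problems.append("use two to three more response options than question stems")
    return problems


print(check_matching_section(10, 12))  # no violations
print(check_matching_section(4, 4))    # too few stems and too few extra options
```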
Ranking and rearrangement items are a special type of matching question in which order counts.
Multiple Choice Questions
Are very versatile, as you can measure the attainment of both lower- and higher-order cognitive objectives.
Scores on multiple choice questions are less affected by response bias (response set) than are scores on true-false items.
The key to constructing a high-quality multiple choice question is to have good, plausible distractors.
Shortcomings of multiple choice items:
1. Good items take time and thought to construct.
2. Multiple choice stresses recognition over recall. Familiarity with an answer can lead to correct guessing.
3. They require more time to answer than true-false questions.
Guidelines for Multiple Choice Items
1. The question format is preferred to the incomplete statement. If an incomplete statement is used, the blank should be at the end of the statement.
2. State the question clearly and at the appropriate reading level.
3. Place as much of the item in the stem as possible. Answer choices should be as short as possible.
4. Use opinion questions sparingly, and cite the source or authority of the opinion.
5. Four to five choices are standard, but two or three choices can be used as well. How many choices appear depends partly on the ease of generating high-quality distractors.
6. If the answer choices have a natural order (dates, ages), arrange them in that order. Otherwise, answer choices should be randomly arranged.
7. Try to make answer choices equal in length and complexity.
8. Try to make all answer choices plausible, but with only one correct answer.
9. Formulate a logical reason why someone who doesn’t know the correct answer would select each distractor.
10. Avoid the use of negatives and, in particular, double negatives.
11. Ambiguous and tricky options should be avoided.
12. Specific determiners should be avoided, and “all of the above” and “none of the above” options should be used infrequently.
13. Place options in a stacked format, rather than row by row.
14. Make sure the number of items is appropriate for the time constraints of the testing session.
15. Item difficulty should be such that overall performance is halfway between chance (pure guessing) and 100%. This gives the test the maximum ability to separate test takers according to performance.
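The "halfway between chance and 100%" rule can be worked out directly for any number of answer choices. The sketch below assumes every choice is equally likely under pure guessing; the function name is hypothetical.

```python
def target_proportion_correct(num_choices):
    """Return the proportion correct halfway between chance and 100%.

    Assumes chance performance is 1 / num_choices (pure guessing among
    equally attractive choices).
    """
    chance = 1 / num_choices
    return (chance + 1) / 2


print(target_proportion_correct(4))  # four-choice items: 0.625
print(target_proportion_correct(2))  # true-false items: 0.75
```

For example, on a test of four-choice items, chance performance is 25%, so the target average score is 62.5%.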
Constructing Distractor Items
Usually, the question you want to ask and its answer are relatively easy to develop.
What often takes the most time is coming up with plausible distractors.
Two approaches can be taken:
The Rational Approach: the test developer’s understanding of the subject material, and their ability to organize that material, leads them to adopt specific distractors for specific test items.
The Empirical Approach: remove the answer choices and have people complete the measure as a short answer test.
Your test subjects should be similar in composition to the population you will actually be testing.
You compile a list of incorrect answers for each question, and use the most popular (most frequent) incorrect answers as your distractor choices when you reassemble the exam.
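Tallying the most frequent incorrect answers from a short answer tryout is straightforward to automate. The sketch below uses Python's collections.Counter; the function name pick_distractors and the pilot responses are hypothetical, and in practice near-duplicate spellings of the same wrong answer would need to be merged by hand first.

```python
from collections import Counter


def pick_distractors(responses, correct_answer, how_many=3):
    """Return the most frequent incorrect answers to one question.

    Responses are compared to the correct answer ignoring case and
    surrounding whitespace; everything else counts as incorrect.
    """
    correct = correct_answer.strip().lower()
    wrong = [r.strip() for r in responses if r.strip().lower() != correct]
    return [answer for answer, _ in Counter(wrong).most_common(how_many)]


# Hypothetical pilot responses to: "Name a projective test of personality
# that uses inkblots."
pilot_responses = [
    "Rorschach", "MMPI", "16PF", "MMPI", "TAT", "16PF", "MMPI", "rorschach",
]
print(pick_distractors(pilot_responses, "Rorschach", how_many=2))
```

The returned answers, together with the correct answer, become the choices when the item is reassembled as a multiple choice question.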