Norming Distributions and Standardization



Most psychological tests are not mastery tests scored against a fixed criterion, so a different method must be used to classify scores as low or high.


       In order to assess overall performance, most psychological tests employ a standardization sample which allows the test makers to create a normal distribution which can be used for comparison of any specific future test score.


Standardization Sample :  a large sample of test takers who represent the population for which the test is intended.  This standardization sample is also referred to as the norm group (or norming group).


       We convert the raw scores of the sample group into percentiles in order to construct a normal distribution to allow us to rank future test takers.
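As a rough illustration of that conversion, the sketch below computes a percentile rank as the percentage of norm-group scores at or below a given raw score. The function name `percentile_rank` and the scores in `norm_group` are made up for this example; published tests may use a slightly different definition (for instance, counting only half of the tied scores).

```python
from bisect import bisect_right

def percentile_rank(norm_scores, raw_score):
    """Percentage of norm-group scores at or below raw_score (assumed definition)."""
    ordered = sorted(norm_scores)
    at_or_below = bisect_right(ordered, raw_score)  # count of scores <= raw_score
    return 100.0 * at_or_below / len(ordered)

# Hypothetical raw scores from a very small norm group
norm_group = [12, 15, 15, 18, 20, 22, 25, 27, 30, 34]

print(percentile_rank(norm_group, 22))  # 6 of the 10 scores are at or below 22
```

A real norm group would contain hundreds or thousands of scores, but the ranking logic is the same.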


Norms are not standards of performance, but serve as a frame of reference for test score interpretation.


Norm groups can range in size from a few hundred to a hundred thousand people. The more people we use in our norm group, the closer the approximation to a normal population distribution we get.


Sampling methods for selecting a norming group


Sample must be representative :  Test children if you are developing a test of children's IQ;  test adults if you are interested in assessing adult interests.


The closer the match between your sample and your intended population of test takers, the more accurate the distribution will be as a ranking guide.


Simple Random Sampling : every person in the target population has an equal chance of being in the standardization sample.


Stratified Sampling : The test developer takes into account all demographic variables which can accurately describe the population of interest and then selects individuals at random, but in proportion to the demographic portrait of the test population.

This is generally the most accurate way of developing a norm group.

Common demographics to stratify :  age, gender, socioeconomic status, geographic region.
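A minimal sketch of proportional stratified sampling is shown here, assuming each person is a dictionary and a single demographic variable defines the strata. The helper `stratified_sample`, the `region` field, and the population itself are hypothetical. Because each stratum's share is rounded, the total can differ slightly from the requested size when the proportions do not divide evenly.

```python
import random
from collections import defaultdict

def stratified_sample(people, stratum_of, n, seed=0):
    """Draw about n people at random, proportionally to each stratum's size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in people:
        strata[stratum_of(person)].append(person)

    sample = []
    for members in strata.values():
        # Share of the sample matches the stratum's share of the population
        k = round(n * len(members) / len(people))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

# Hypothetical population: 60% urban, 40% rural
population = [{"id": i, "region": "urban" if i < 600 else "rural"}
              for i in range(1000)]
sample = stratified_sample(population, lambda p: p["region"], 100)

urban = sum(p["region"] == "urban" for p in sample)
print(len(sample), urban)  # sample size and number of urban members
```

Stratifying on several variables at once (age by gender by region, for example) works the same way if `stratum_of` returns a tuple.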


Cluster Sampling : Sampling begins by dividing a geographic region into blocks and then randomly sampling within those blocks.

Cluster sampling is more likely than simple random sampling to come up with a representative sample and is less time consuming than stratified sampling.
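The sketch below shows one common two-stage variant of this idea, under the assumption that a random subset of blocks is chosen first and people are then drawn at random inside each chosen block. The function `cluster_sample` and the block and person labels are invented for the example.

```python
import random

def cluster_sample(blocks, n_blocks, n_per_block, seed=0):
    """Pick n_blocks blocks at random, then n_per_block people within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(blocks), n_blocks)  # stage 1: choose blocks
    sample = []
    for name in chosen:
        residents = blocks[name]
        # stage 2: random draw inside the chosen block
        sample.extend(rng.sample(residents, min(n_per_block, len(residents))))
    return sample

# Hypothetical region: 20 blocks of 50 residents each
blocks = {f"block_{b}": [f"person_{b}_{i}" for i in range(50)]
          for b in range(20)}
sample = cluster_sample(blocks, n_blocks=5, n_per_block=10)
print(len(sample))
```

Testers only need to travel to the chosen blocks, which is where the time saving comes from.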



Item Sampling


Often, test developers need to produce more than one version of a standardized test.


This is particularly important if you believe you will have an individual complete a psychological test more than once.


Item sampling refers to the procedure of giving two norm groups different items from the same exam.


This allows us to shorten the time it takes to conduct our representative sampling.



Difference between group norms and local norms :

       Sometimes educators are interested in how students performed relative to other students in the same grade, or other students in adjacent districts.

       For these purposes, test users will develop local norms for statistical comparison, rather than using the group norms supplied with the test.

       When scoring is done by computer, local norms can be easily developed.



Converting Raw Scores into Percentile Ranks



Remember, one major assumption in both psychology and psychological measurement is that all variables of psychological interest are normally distributed.


Since these variables fall into a normal distribution, we can specify what proportion of the population falls at or below (or at or above, or between) any score on a particular test.


In a normal distribution, the average value is the midpoint of the distribution and has a percentile rank of 50.



By knowing the mean (arithmetic average) and the standard deviation (the typical distance of scores from the mean) of any psychological test, we can construct the normal distribution.


About 68% of all scores fall within +/- 1 standard deviation of the mean.


About 95% of all scores fall within +/- 2 standard deviations of the mean.


The IQ distribution, for example, has a mean of 100 and a standard deviation of 15.
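These proportions can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) with the IQ mean and standard deviation given above; the variable names are only illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Proportion of scores within 1 and 2 standard deviations of the mean
within_1sd = iq.cdf(115) - iq.cdf(85)   # about 0.683
within_2sd = iq.cdf(130) - iq.cdf(70)   # about 0.954

# Percentile rank of a score: percentage of the population at or below it
pr_mean = 100 * iq.cdf(100)             # the mean sits at the 50th percentile
pr_115 = 100 * iq.cdf(115)              # about the 84th percentile

print(round(within_1sd, 3), round(within_2sd, 3))
print(round(pr_mean, 1), round(pr_115, 1))
```

The same calls work for any test once its mean and standard deviation are known.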



Specific Types of Normal Distributions commonly used in psychology.


Psychologists refer to these distributions often because they provide a common reference for interpreting raw scores :



The Z distribution :  The Z distribution has a mean of 0 and a standard deviation of 1.

  From a Z score, it is extremely easy to tell :


Whether a score is above or below average (by the sign, positive or negative)


Whether the score falls within the average range or deviates from it. Z scores from -1 to +1 are average; Z scores from -2 to -1 are below average and from +1 to +2 are above average; Z scores below -2 or above +2 are atypical scores (outliers).
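The ranges above can be written as a small lookup. This is only a sketch: the function name `describe_z` is made up, and the notes do not say which label applies at exactly -2, -1, +1, or +2, so the boundary handling here is an assumption.

```python
def describe_z(z):
    """Label a Z score using the ranges from the notes.

    Assumption: a score exactly at +/-1 counts as average and a score
    exactly at +/-2 counts as above/below average.
    """
    if abs(z) <= 1:
        return "average"
    if abs(z) <= 2:
        return "above average" if z > 0 else "below average"
    return "atypical (outlier)"

for z in (0.4, -1.5, 1.5, 2.3):
    print(z, describe_z(z))
```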


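The T distribution : The T distribution (the T-score scale, not Student's t distribution from inferential statistics) has a mean of 50 and a standard deviation of 10.   It is easy to tell from a T score :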

Whether a score is above or below average (T<50 below average, T>50 above average)

How far above or below average, because each standard deviation is ten points.

Sometimes preferred to Z because negative T values are extremely rare.



Converting Raw Scores to Z Scores and Back




Z = (Raw Score - Average) / Standard Deviation


Through simple algebra, we can isolate any term we are interested in solving for :


Raw Score = ( Z * SD) + Average


Average = Raw Score - (Z * SD)


SD = (Raw Score - Average) / Z     (only when Z is not 0)
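The four rearrangements can be checked against each other with a quick sketch. The function names are hypothetical, and the numbers reuse the IQ scale (mean 100, standard deviation 15) from earlier in these notes.

```python
def z_score(raw, mean, sd):
    return (raw - mean) / sd

def raw_score(z, mean, sd):
    return z * sd + mean

def mean_from(raw, z, sd):
    return raw - z * sd

def sd_from(raw, mean, z):
    if z == 0:
        # A score exactly at the mean carries no information about the SD
        raise ValueError("SD cannot be recovered when Z is 0")
    return (raw - mean) / z

# An IQ of 130 is two standard deviations above the mean
z = z_score(130, 100, 15)
print(z)                        # 2.0
print(raw_score(z, 100, 15))    # 130.0
print(mean_from(130, z, 15))    # 100.0
print(sd_from(130, 100, z))     # 15.0
```

Each function is the same equation solved for a different term, so any three known values give back the fourth.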


Understanding this relationship, we can convert a Z score onto any other standard score scale we like by multiplying it by the new standard deviation and adding the new mean.




T = 10Z + 50




SAT = 100Z + 500
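Both conversions are the same linear rescaling with a different mean and standard deviation. The sketch below applies the two formulas exactly as written above; the function names `t_from_z` and `sat_from_z` are illustrative only.

```python
def t_from_z(z):
    """T score: mean 50, standard deviation 10."""
    return 10 * z + 50

def sat_from_z(z):
    """SAT-style score as given in the notes: mean 500, standard deviation 100."""
    return 100 * z + 500

for z in (-0.5, 0.0, 1.5):
    print(z, t_from_z(z), sat_from_z(z))
```

A Z score of 1.5, for instance, corresponds to a T score of 65 and an SAT-style score of 650.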




Parallel and Equated Tests



When more than one version of a standardized test is needed, alternate forms must be developed.




Parallel Forms :  If the two tests have the same types and numbers of items of equal difficulty, the alternate versions are said to have parallel form.

Scores on parallel forms are highly correlated.


Parallel Forms are difficult to develop because the mean and standard deviation on both tests must be equivalent.


Equated Forms :  When we can't develop two alternate forms with the exact same mean and standard deviation, we can still compare tests of equivalent difficulty through the use of a common metric, for example the Z score distribution.
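As one possible illustration of using Z scores as the common metric, the sketch below places a score from one form onto the scale of another by matching Z scores. The norm-group data for `form_a` and `form_b` are invented, the helper `equate` is hypothetical, and the choice of the population standard deviation (`pstdev`) is an assumption; operational equating is considerably more involved.

```python
from statistics import mean, pstdev

def equate(score, from_norms, to_norms):
    """Map a score from one form to another by matching Z scores."""
    z = (score - mean(from_norms)) / pstdev(from_norms)
    return z * pstdev(to_norms) + mean(to_norms)

# Hypothetical norm-group raw scores on two alternate forms
form_a = [10, 20, 30, 40, 50]   # mean 30, wider spread
form_b = [20, 25, 30, 35, 40]   # mean 30, narrower spread

# A raw score of 40 on Form A sits as far above average as 35 on Form B
equivalent = equate(40, form_a, form_b)
print(round(equivalent, 2))
```

Two test takers with the same Z score are treated as having performed equally well, even though their raw scores differ.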


Item Response Theory can also be used to equate two tests on difficulty and discriminability, through a procedure called linking.