Professional Testing, Inc.
Providing High Quality Examination Programs

From the Item Bank

The Professional Testing Blog

 

Reliability and Decision Consistency

August 12, 2015

There are many philosophical debates about what constitutes validity, or the validity of test score inferences. Within certification we hear things like “a job analysis is typically conducted every five years unless the content changes more or less frequently.” You may have also heard that more than eight subject matter experts (SMEs) are required for test development activities, while others suggest six or more is sufficient. Some suggest that the same SMEs should not take part in separate test development meetings. So, in essence, psychometricians will typically say “it depends” in response to questions about the appropriate way to design a certification examination program.

One element of the validation process that has a more definitive answer is reliability. Generally speaking, reliability refers to the consistency or replicability of scores. In a perfect world, if a test were given twice to the same person, the person would achieve the same score both times. Reliability is a requirement for claims of validity. However, reliability does not constitute validity, as one could have consistent scores measuring the wrong content.

In certification, we are primarily concerned with the pass/fail decision when discussing reliability. While an estimate of internal consistency (e.g., Coefficient Alpha) can bring some insight into an examination’s reliability, other methods contribute more to the validation discussion. There are two approaches to evaluating whether the instrument is making consistent classifications (pass and fail). The first approach is test-based and the second uses conditional standard errors.
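For readers who want to see what an internal consistency estimate looks like in practice, here is a minimal sketch of Coefficient Alpha in Python. The function name and the tiny 0/1 response matrix are made up for illustration, and the sketch assumes complete data (every candidate answered every item) and uses sample variances.

```python
from statistics import variance

def coefficient_alpha(item_scores):
    """Coefficient Alpha for a candidates-by-items score matrix.

    item_scores: list of rows, one row per candidate, one column per item.
    Assumes no missing responses and at least two items and two candidates.
    """
    k = len(item_scores[0])
    # Sample variance of each item across candidates.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the total (summed) scores.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

# Hypothetical responses: 5 candidates, 4 items scored 0/1.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(coefficient_alpha(responses), 3))
```

Note that nothing in this calculation refers to a cut-score, which is why it says little about the consistency of pass/fail decisions.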

Test-based approaches to estimating decision consistency reliability are those that return a value, theoretically, between 0.0 and 1.0. Like internal consistency estimates, higher values represent stronger reliability. In their 1986 textbook Introduction to Classical and Modern Test Theory, Crocker and Algina list four factors that may affect decision consistency:

  1. Test length
  2. Location of the cut-score in the score distribution
  3. Test score generalizability
  4. Similarity of score distributions of different exam forms.

What separates this from traditional measures of internal consistency is the second factor above. Decision consistency estimates take into account where the cut-score falls on the score continuum. So, all other things being equal, a cut-score at or immediately near the mean of the observed scores will produce a lower decision consistency estimate than a cut-score one or two standard deviations away from the mean, because more candidates score close enough to the cut for measurement error to change their classification. There are a number of methods for calculating this, including, but not limited to, those by Livingston, Brennan and Kane, Huynh, and Subkoviak. These and other decision consistency estimates are well described in Crocker and Algina’s 1986 textbook Introduction to Classical and Modern Test Theory. It is important for certification programs to document some type of evidence related to the decision consistency of scores. In essence, decision consistency means that passing candidates should pass if tested again within a short time frame, and failing candidates should fail.
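The effect of cut-score location can be illustrated with a short Python sketch. This is not one of the published methods named above; it is a simplified model that assumes standardized observed scores (mean 0, SD 1), normally distributed true scores and errors, and two parallel forms with independent errors. The function name and the reliability value of 0.85 are chosen purely for illustration.

```python
import math
from statistics import NormalDist

def decision_consistency(cut_z, reliability, steps=2001):
    """Probability that two parallel forms give the same pass/fail decision.

    cut_z: cut-score in standard deviation units from the observed mean.
    reliability: score reliability, assumed strictly between 0 and 1.
    Integrates numerically (midpoint rule) over the true-score distribution.
    """
    true_sd = math.sqrt(reliability)
    error_sd = math.sqrt(1.0 - reliability)
    true_dist = NormalDist(0.0, true_sd)
    error_dist = NormalDist(0.0, error_sd)
    lo, hi = -6.0 * true_sd, 6.0 * true_sd
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * width
        # Chance that a candidate with true score t passes a single form.
        p_pass = 1.0 - error_dist.cdf(cut_z - t)
        # Same decision twice: pass both forms or fail both forms.
        agree = p_pass ** 2 + (1.0 - p_pass) ** 2
        total += true_dist.pdf(t) * agree * width
    return total

for cut in (0.0, 1.0, 2.0):
    print(cut, round(decision_consistency(cut, 0.85), 3))
```

Under these assumptions, the estimate is lowest when the cut-score sits at the mean and rises as the cut-score moves one or two standard deviations away, even though the reliability of the scores themselves has not changed.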

For adaptive exams, conditional standard errors are used to make final decisions. In this approach, candidates are given an exam that adapts based on item selection criteria after each item is administered. The goal is to shrink the conditional standard error band around a candidate’s score. This error band reflects the small range within which we expect the candidate’s score to vary. The exam continues until the candidate’s error band no longer includes the exam cut-score, so that a pass/fail decision can be made. A more detailed discussion of this can be found here.
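A rough sketch of this stopping logic in Python follows. The post does not say which measurement model or band width is used, so the Rasch model, the 1.96 multiplier (a 95% band), and all function names here are assumptions made for illustration only.

```python
import math

def rasch_probability(theta, difficulty):
    """Probability of a correct response under the Rasch model (assumed)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def conditional_se(theta, difficulties):
    """Conditional standard error at ability theta for the items given so far."""
    # Each Rasch item contributes p * (1 - p) to the test information.
    info = sum(
        p * (1.0 - p)
        for p in (rasch_probability(theta, b) for b in difficulties)
    )
    return 1.0 / math.sqrt(info)

def classify(theta_hat, se, cut, z=1.96):
    """Return 'pass', 'fail', or 'continue' (band still includes the cut)."""
    lower = theta_hat - z * se
    upper = theta_hat + z * se
    if lower > cut:
        return "pass"
    if upper < cut:
        return "fail"
    return "continue"

# More administered items shrink the error band around the same estimate.
print(conditional_se(0.0, [0.0] * 4))    # 1.0
print(conditional_se(0.0, [0.0] * 16))   # 0.5
print(classify(1.0, 0.3, cut=0.0))       # pass
print(classify(0.2, 0.3, cut=0.0))       # continue
```

An operational adaptive exam would also re-estimate the candidate’s ability after each item and cap the test length; those pieces are left out here to keep the focus on the error band and the cut-score.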

Validating an exam program includes making judgments about the appropriateness of the inferences made from a test score. With certification examinations, the judgments made are typically pass or fail. These decisions need to be reliable.
