Test Item Validation
The Bell Curve
When you were a student, in any grade up to and even including college, you probably ran across a teacher (or two or three) who graded “on the curve.” This method was used to distribute grades across certain thresholds and provide a “balanced” distribution of scores. That works fine for academic results, but not so well when an employer is trying to determine the competency of their workers. I guarantee you that the next time your car is in the shop, you don’t want your mechanic to have passed their training because the instructor “graded on the curve.” I am certain you would prefer that this person had demonstrated an established level of competence in training before working on your vehicle.
Further, I’m sure you would prefer it if your mechanic had gotten 100% on all of the performance and knowledge tests in auto repair school. Or, to take a more extreme example, wouldn’t you want your doctor or dentist to have gotten perfect scores on all of their tests in school? Unfortunately, this is not a reasonable expectation.
You can, of course, demand high performance, but you can’t require perfection when it comes to learning new skills or knowledge. Which raises the question: how is an appropriate level of competence established when someone is becoming proficient at a new job? If 100% is too high, what about 90%? Picking a percentage out of thin air seems rather arbitrary. Fortunately, there is a process that can help test designers set an appropriate “cut score,” as it is called.
What is Test Item Validation?
Dr. William Angoff wrote about his test item validation process way back in 1971. It is a means of certifying that the tests given in conjunction with a training course or class are a reliable and fair indication of how well learners have mastered the material being tested. By using a panel of experts (preferably five, but you can use three or four) known as Subject Matter Experts (SMEs), test writers are able to develop questions that are clear, reliable, and valid. These SMEs are knowledgeable and competent in the content area being tested. Plus, they are able to determine what an acceptable “passing score” might be for a newly trained, yet competent trainee.
What is a Minimally Competent Person?
You can’t expect a newly trained person to be an expert on the job. You may want that, but it’s not realistic. What you can expect from trainees is the following:
• Knows the basic functions required for the job – can perform low-level tasks
• Is willing and able to learn
• Knows the fundamentals
• Is able to build on the concepts taught
• Can pull from resources
• Knows where to find things
• Shows initiative coming out of school
• Practices safe working habits
What are SMEs Responsible For?
Each SME on the panel will read through the test that has been developed, as if they were a student in the training course. They are instructed to consider a “minimally” competent person (newly trained) when making their determinations. The question then becomes: Out of 100 minimally competent people, how many do you think should be able to answer the question correctly? A super easy question would get a 100. A tougher question might get 70, 80 or 90.
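To make the polling step concrete, here is a minimal Python sketch, using invented ratings, of how one item’s panel ratings translate into an item-level estimate. The SME names and numbers are purely illustrative.

```python
# Hypothetical Angoff ratings for a single test item: each SME estimates how
# many of 100 minimally competent people would answer the item correctly.
item_ratings = {"SME 1": 80, "SME 2": 90, "SME 3": 85, "SME 4": 90, "SME 5": 80}

# The item's estimate is simply the average of the panel's ratings.
item_estimate = sum(item_ratings.values()) / len(item_ratings)
print(f"Item estimate: {item_estimate:.0f}")  # -> Item estimate: 85
```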
In addition to establishing cut scores, SMEs must also evaluate three additional attributes for each question:
- Linkage (or Validity). Is it a good question to ask? Is it relevant to the job? Does it test the behavior in the training objective?
- Clarity. Is it clear? Does it make sense? Can a student understand what is being asked?
- Reliability. Can you expect all students to give the same answer once they understand the material?
If a test item fails any of the above attributes, the Facilitator will lead a discussion on how to “fix” it. If it can be reworded easily, it can be saved. Otherwise, the item will need to be deleted from the test.
What Else Does the Facilitator Do?
In addition to leading the discussions described above, the Facilitator (the person running the validation session) has the following responsibilities:
- Poll the SMEs for their cut scores, and enter ratings into a spreadsheet
- Check the standard deviation of the ratings on each individual item. Wide extremes between ratings are not acceptable and must be discussed (see the sketch after this list).
- Watch the clock so the meeting doesn’t get bogged down.
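As a rough illustration of that spread check, the Python sketch below flags items whose ratings disagree too widely. The ratings and the tolerance threshold are invented for the example; the Angoff method doesn’t prescribe a specific cutoff, so each panel sets its own ground rule.

```python
from statistics import pstdev

# Hypothetical panel ratings, one list of SME ratings per test item.
ratings = {
    "Item 1": [80, 85, 80, 90, 85],
    "Item 2": [95, 60, 90, 70, 100],  # wide disagreement between SMEs
    "Item 3": [70, 75, 70, 80, 75],
}

MAX_SPREAD = 10  # assumed tolerance; each panel sets its own ground rule

for item, scores in ratings.items():
    spread = pstdev(scores)  # standard deviation of the panel's ratings
    if spread > MAX_SPREAD:
        print(f"{item}: standard deviation {spread:.1f} -- discuss before accepting")
```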
More About Cut Scores
Cut scores are established for each individual test item (question): an item’s cut score is the average of the ratings from all of the SMEs. The average of the cut scores for all the items then becomes the cut score for the test as a whole. For example, if there are 10 questions that each have a cut score of 90, the learner must get 9 out of 10 questions correct to pass the test. Cut scores must be reasonable. They should never be so high that no one can pass the test, nor so low that anyone off the street could pass it.
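Putting that arithmetic together, here is a short Python sketch, with made-up item cut scores, showing how item-level averages roll up into a test-level cut score and a required number of correct answers. Rounding up to a whole question is an assumption; your organization may have its own rounding policy.

```python
import math

# Hypothetical item cut scores: the averaged SME ratings for each of 10 questions.
item_cut_scores = [90, 85, 95, 80, 90, 85, 90, 95, 85, 90]

# The test's cut score is the average of the item cut scores.
test_cut_score = sum(item_cut_scores) / len(item_cut_scores)  # 88.5

# Convert the percentage into the number of questions the learner must answer
# correctly, rounding up (an assumed policy).
required_correct = math.ceil(len(item_cut_scores) * test_cut_score / 100)

print(f"Test cut score: {test_cut_score:.1f}% -> "
      f"{required_correct} of {len(item_cut_scores)} questions to pass")
```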
Benefits of Using the Angoff Method
The Angoff Method of Test Item Validation is a systematic and well-documented approach. Using this method will help ensure that your tests are reliable, valid, and fair. In addition, they will be legally defensible as a means of determining employee competency. Finally, the method has been widely used and accepted for over 40 years.