The Public Safety Group Blog


How to Set a Cut Score

by Nora Vanni, Jun 18, 2015

There are many ways to construct an exam. You can use short-answer questions, fill-in-the-blank questions, multiple-choice questions, true/false questions, essay questions: the possibilities are endless! When it comes to grading that exam, some questions have a single correct answer and some are graded on the accuracy and quality of the response. But how do you determine who passes the test?

What is a Cut Score?

The score at which a student passes or fails is called the cut score. It is the score that separates passing examinees from failing examinees. This standard is set for a minimally acceptable candidate (MAC): such an examinee isn't an A or B student, but rather a borderline C or D performer.

But… isn’t a cut score completely subjective or capricious? How do you determine it? Setting a cut score, that is, deciding what level of knowledge is acceptable to possess, is a critical and difficult task. Arbitrary cut scores, such as the 70% common in typical letter grading, are a harmful standard because they ignore the difficulty of the individual items and their relative importance.

Norm-referenced grading (“grading on a curve”) can be similarly problematic. Designed to pass only the “best of the best,” norm-referenced results are relative to the cohort’s performance: some students definitely pass and some definitely fail. With this method, there may be candidates who have mastered the content but still fail.

Standard Setting

This process of determining an effective and valid cut score is referred to as standard setting.1 There are a number of methods for doing so, some of which are more common than others.

Many methods of standard setting utilize the expertise of Subject Matter Experts (SMEs). These item reviewers are experts in their fields, as well as (preferably) educators working in close proximity to students. The contribution of SMEs is significant because it impacts the validity of the exam items. By their involvement, SMEs lend the credibility of their experience and credentials to the exam itself.

When standard setting, SMEs must determine the desired level of competence. This benchmark represents the material students should be expected to know in order to succeed in their field. This level of competence remains the same for all test candidates and is not adjusted between student cohorts.

Criterion-Based Standard Setting

At Fisdap, we utilize two common methods of criterion-based standard setting. A criterion-based standard is established before the exam is administered. With the bar set ahead of time, it is possible for everybody to pass, or for everybody to fail (as opposed to the previously mentioned norm-referenced grading, in which some candidates will definitely pass and some will definitely fail).

In this post, we’ll cover Nedelsky’s Method and Angoff’s Method, though these are only two of many models of standard setting that exist. Other methods include the Bookmark method, the Direct Consensus method, the Contrasting Groups method, the Borderline Groups method, and more. To learn more about methods for determining a cut score, take a look at Standard Setting by Cizek and Bunch, which was heavily referenced in writing this post.

The Angoff Method

The Angoff method is by far the most common standard-setting tool used in the context of modern certification and licensure examinations.2 In the Angoff method, each reviewer individually estimates what percentage of minimally acceptable entry-level students would answer the question correctly. The average of those estimates then becomes the cut score for that item. The cut score for the exam is the average of the individual cut scores for each item in the exam.

For a better understanding, take a look at this sample illustration of the Angoff process:

SME         Item 1   Item 2   Item 3   Item 4
Luke          65       70       90       85
Han           70       75       80       70
Leia          85       80       70       60
Cut Score     73       75       80       72

In the Angoff method, the overall cut score directly relates to the difficulty of the items on the exam. To obtain the exam cut score, we average each item’s cut score:

Item 1   Item 2   Item 3   Item 4   Exam Cut Score
  73       75       80       72           75
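
To make the arithmetic concrete, here is a minimal Python sketch of the calculation above. The ratings mirror the sample table; this is an illustration of the method, not Fisdap’s actual code:

```python
# Hypothetical SME ratings from the sample table: each number is the
# estimated percentage of minimally acceptable candidates who would
# answer that item correctly.
ratings = {
    "Luke": [65, 70, 90, 85],
    "Han":  [70, 75, 80, 70],
    "Leia": [85, 80, 70, 60],
}

num_items = len(next(iter(ratings.values())))

# Item cut score: the average of the SME estimates for that item.
item_cuts = [
    round(sum(sme[i] for sme in ratings.values()) / len(ratings))
    for i in range(num_items)
]
print(item_cuts)  # [73, 75, 80, 72]

# Exam cut score: the average of the item cut scores.
exam_cut = round(sum(item_cuts) / len(item_cuts))
print(exam_cut)  # 75
```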

The Nedelsky Method

In the Nedelsky method, three Subject Matter Experts (SMEs) individually judge which incorrect options a minimally qualified candidate would be able to rule out. (This method of standard setting is typically used only for multiple-choice questions, due to its focus on ruling out distractors.) The “Nedelsky value” for an item is then calculated by taking the reciprocal of the number of answer options that remain.

For example, let’s say a group of reviewers determined that, for a certain four-option item, a minimally qualified student could be expected to rule out two of the distractors. This leaves the student with two options to choose from. The Nedelsky value for this item would be ½ = 0.50. The cut score for the exam is determined by adding up the average Nedelsky value for each item, which gives the number of items a minimally qualified candidate would be expected to answer correctly.
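
Here is a minimal Python sketch of that calculation, using hypothetical per-item counts of the options left after the agreed-upon distractors are eliminated; again, an illustration of the method rather than Fisdap’s actual code:

```python
# Hypothetical judgments for a four-item exam: for each item, the number
# of answer options left once a minimally qualified candidate rules out
# the distractors the SMEs agree can be eliminated.
options_remaining = [2, 2, 4, 3]

# Nedelsky value: the reciprocal of the remaining options, i.e. the
# probability of answering correctly by choosing among what is left.
nedelsky_values = [1 / n for n in options_remaining]

# Exam cut score: the sum of the item values, i.e. the expected number
# of items a minimally qualified candidate answers correctly.
cut_score = sum(nedelsky_values)
print(f"{cut_score:.2f} out of {len(options_remaining)} items")  # 1.58 out of 4 items
```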

Standard Error of Measurement

Even after using this rigorous process, there’s still a possibility that measurement error could result in a competent student failing the exam. To mitigate the effect of error on pass rates, all Fisdap exams include a 5% standard error of measurement (colloquially known as the “fudge factor”). This buffer helps remove the influence of factors such as ambiguous test items, test-taker fatigue, and guessing. By protecting against these variables, we keep students from missing the cutoff “by one question,” so to speak.
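
The post doesn’t spell out exactly how the 5% buffer is applied; one plausible reading, sketched below in Python, is simply subtracting five percentage points from the raw cut score:

```python
# Assumed form of the buffer: subtract five percentage points from the
# raw cut score so a competent student isn't failed by measurement error.
raw_cut = 75   # exam cut score from the Angoff example above
buffer = 5     # the 5% "fudge factor"

adjusted_cut = raw_cut - buffer
print(adjusted_cut)  # 70: students scoring 70% or above pass
```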

No Exam is Perfect

All this said, an exam standard is ultimately just a recommendation. The final evaluation of a student’s performance lies with educators and educational institutions. No exam is perfect: an educator will always know more about a student than any exam ever could. When considering a student’s performance on a summative exam, educators should take into account the body of evidence that student has built up over the course of their academic career.

Fisdap encourages all educators to attempt the exam(s) administered to their students. Doing so builds respect for the suggested cut score and an understanding of the exam’s difficulty. If you are an educator, contact Fisdap support to attempt an exam and provide Mike Bowen with feedback.


Sources Cited:
1. Gregory J. Cizek and Michael B. Bunch, Standard Setting: A Guide to Establishing and Evaluating Performance Standards on Tests (Thousand Oaks, CA: Sage Publications, 2007), 5.
2. Ibid., 82.
