Informed judgment method - This is a test-based approach to standard setting in which relevant stakeholders review the overall test in order to suggest a percent-correct score that each believes ought to be earned by a minimally competent examinee. This method stands in contrast to the conjectural methods, such as the modified Angoff, in which a panel of judges suggests a percent-correct score for each item.
Internal consistency - The internal consistency measure of reliability estimates how well the items on a test form correlate with one another. This method of reliability is likely to produce higher values for norm-referenced tests than for criterion-referenced tests (CRTs), given that CRTs are often designed to measure a broad range of content topics.
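A widely used internal-consistency estimate is Cronbach's alpha, which compares the sum of the individual item variances to the variance of the total scores. The sketch below is illustrative only; the examinee-by-item score matrix is made up, and dichotomous (0/1) item scoring is assumed:

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def cronbach_alpha(scores):
    """Cronbach's alpha: scores is a list of examinee rows,
    each a list of 0/1 item scores."""
    k = len(scores[0])                       # number of items
    items = list(zip(*scores))               # transpose to per-item columns
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Five hypothetical examinees, four dichotomously scored items
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(scores), 3))  # → 0.8
```

Values closer to 1.0 indicate that the items hang together more tightly; for a CRT covering heterogeneous content, a lower value would not be surprising.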
Interrater reliability - This method of reliability is used when a test includes performance tasks, or other items that need to be scored by human raters. Interrater reliability estimates the consistency, or dependability, of the scores produced by the human raters.
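One common way to summarize interrater consistency for two raters assigning categorical rubric scores is Cohen's kappa, which adjusts raw percent agreement for the agreement expected by chance. A minimal sketch, using hypothetical ratings of ten performance tasks on a 1-3 rubric:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical scores."""
    n = len(rater_a)
    # Observed proportion of exact agreements
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Two raters scoring ten essays on a 1-3 rubric (made-up data)
a = [1, 2, 3, 2, 2, 1, 3, 3, 2, 1]
b = [1, 2, 3, 2, 1, 1, 3, 2, 2, 1]
print(round(cohens_kappa(a, b), 3))  # → 0.697
```

Kappa of 1.0 would indicate perfect agreement, 0 indicates agreement no better than chance; other statistics (e.g., intraclass correlation) are used when scores are treated as continuous or when there are more than two raters.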
Item - A test question is referred to more formally as a test item. This term is used because, while the examinee is always being asked to respond to something, the item is frequently not structured as a direct question.
Item analysis - This refers to a set of statistical procedures used to evaluate the quality of test items. The item analysis is conducted after examinees have responded to the set of items. The measures most commonly included in the item analysis are the item difficulty index, item discrimination index, and distractor analysis.
Item bank - The item bank, or item pool, typically comprises the entire set of items that have been written for the exam program. The bank may include items that have not yet been pretested and retired items that are no longer in use, along with items that are available for current, operational use. In most exam programs, additional information about the items, such as content and cognitive classifications, is stored along with the item text.
Item banking software application - This refers to the database-type software program that may be used to store the exam program's items, along with additional information about the items.
Item difficulty index - This is a measure of the proportion of examinees who responded to the item correctly. It is also referred to as the p-value.
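For dichotomously scored items, the p-value is simply the proportion of correct responses. A minimal sketch, with made-up 0/1 scores for one item:

```python
def p_value(item_scores):
    """Item difficulty index: proportion of examinees scored correct (1)."""
    return sum(item_scores) / len(item_scores)

# Hypothetical responses to one item: three of five examinees correct
print(p_value([1, 1, 0, 1, 0]))  # → 0.6
```

Note that despite the name, a higher p-value means an easier item, not a harder one.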
Item discrimination index - This is a measure of how well an item can distinguish between examinees who performed well on the overall test, and those who did not.
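One simple version of the index (others, such as the point-biserial correlation, are also common) is the upper-lower difference D: the item's p-value in a high-scoring group minus its p-value in a low-scoring group. A sketch with a hypothetical score matrix, using top and bottom thirds by total score:

```python
def discrimination_index(scores, item):
    """Upper-lower discrimination index D for one item.
    scores: rows of 0/1 item scores per examinee; item: column index."""
    ranked = sorted(scores, key=sum, reverse=True)   # best total score first
    k = max(1, len(ranked) // 3)                     # top and bottom thirds
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(row[item] for row in upper) / len(upper)
    p_lower = sum(row[item] for row in lower) / len(lower)
    return p_upper - p_lower

# Five hypothetical examinees, four dichotomously scored items
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(discrimination_index(scores, 1))  # → 1.0
```

A value near +1 indicates the item sharply separates high and low scorers; values near zero, or negative, typically flag an item for review.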
Item review - This refers to a test development phase in which items are examined by subject matter experts, professional editors, measurement experts, and others to ensure that they satisfy a variety of quality criteria.
Item specifications - Item specifications are very detailed requirements sometimes provided to the subject matter experts (SMEs) who are tasked with writing items for an exam program.
Item types - This refers to the variety of test item structures or formats that can be used to measure examinees' knowledge, skills, and abilities.