Reliability

Is the exam reliable and replicable? Does it distinguish consistently between a high and a low level of satisfactory fulfilment among students (inter-item reliability), and are the exam and its result rated in the same way by different parties, e.g. teachers and external examiners (inter-examiner reliability)?

ASSESSMENT METHODS | RELIABILITY
MCQ and similar tests | Objectivity in the definition of right and wrong answers results in very high reliability.
Written invigilated exam without aids | Using a number of questions with relatively unambiguous answers increases reliability.
Written invigilated exam with aids | The more questions that have unambiguous answers, the greater the reliability.
Written paper | If the question/problem is formulated independently by the student, the actual question also needs to be included in the marking in order to ensure reliability.
Portfolio | To ensure high reliability, it is important to establish clear marking criteria for the portfolio.
Logs | It is important to establish clear marking criteria for the specific log postings.
Internship report | Work experience placements vary greatly, so reliability is low.
Oral exam/Viva without preparation | Reliability depends on the person. Using different questions diminishes reliability.
Oral exam/Viva with preparation without aids | Reliability depends on the person. Using different questions diminishes reliability.
Oral exam/Viva with preparation and aids | Reliability depends on the person. Using different questions diminishes reliability.
Student presentations | The criteria for a good student presentation must be clear, and the students must be aware of these, in order to ensure reliability.
Objective structured clinical exam | Increases with the number of stations.
Practical test | High reliability is contingent on clear marking criteria.
Active participation | In terms of reliability, it is crucial that the criteria for the individual assignments are unambiguous and known to the students.
Oral presentation based on synopsis | As this type of exam focuses on one or more questions, it is difficult to achieve a high level of reliability. If the question/problem is formulated independently by the student, the actual question also needs to be included in the marking in order to ensure reliability.
Written paper with oral defence | Usually low, as a written paper usually only deals with part of the syllabus. However, it is possible to ask detailed questions about the rest of the syllabus in the oral exam. Relatively high, depending on the types of questions.
Portfolio and oral exam | To ensure high reliability, it is important to establish clear marking criteria for the portfolio. The oral exam gives the internal examiner the opportunity to ask detailed questions.
Project exam | Usually low, as a project report usually only deals with one or a few topics. If the outline is formulated independently by the student(s), the actual outline also needs to be included in the marking in order to ensure reliability.

 

More about reliability

The chosen type of exam must enable precise, uniform marking, partly in order to fulfil statutory requirements and partly to ensure that students are evaluated fairly.

The desire for reliability is also a desire for the most accurate possible assessment of the student’s performance. This is why ensuring reliability is also a matter of minimising possible sources of error in the marking. Making the marking criteria clear and unambiguous is an attempt to prevent the examiners from introducing subjective and normative assessments of the student’s performance.

As already stated, reliability can be increased by making the exam paper and the marking criteria clear and unambiguous. This applies to the students, so that it is clear what is expected of them and at what point they have reached the target. It also applies between examiners (usually the internal and external examiner), who must agree on the nature of the assignment and on how it is to be assessed in order to ensure consistent marking.

In this connection, the number of examiners can also be regarded as an expression of reliability. The greater the number of examiners, the greater the likelihood of a fair mark. A type of exam that is linked to one internal examiner and one external examiner is therefore inherently more reliable than a type of exam with only the internal examiner examining.
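Agreement between an internal and an external examiner can be put into numbers. The following is a minimal sketch, not a method named in this guide: it assumes each examiner independently assigns a categorical mark to the same set of students and uses Cohen's kappa, a standard statistic that corrects the raw agreement rate for the agreement expected by chance. The function name `cohens_kappa` and the example marks are hypothetical.

```python
from collections import Counter


def cohens_kappa(marks_a, marks_b):
    """Chance-corrected agreement between two examiners (Cohen's kappa).

    marks_a and marks_b are equal-length sequences of categorical marks,
    one entry per student, in the same student order.
    """
    if len(marks_a) != len(marks_b) or len(marks_a) == 0:
        raise ValueError("both examiners must mark the same non-empty set of students")

    n = len(marks_a)
    # Share of students on whom the two examiners gave the same mark.
    observed = sum(a == b for a, b in zip(marks_a, marks_b)) / n

    # Agreement expected by chance, from each examiner's own mark frequencies.
    count_a = Counter(marks_a)
    count_b = Counter(marks_b)
    expected = sum(count_a[mark] * count_b[mark] for mark in count_a) / (n * n)

    if expected == 1.0:
        # Both examiners gave one and the same mark to every student.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical marks for six students (illustrative data only).
internal = ["pass", "pass", "fail", "pass", "fail", "pass"]
external = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(round(cohens_kappa(internal, external), 2))  # prints 0.67
```

A value of 1 means the two examiners agree on every student, while a value near 0 means they agree no more often than chance would produce. The sketch treats marks as unordered categories, so it does not distinguish a near miss on a graded scale from a large disagreement.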

In designing the exam, reliability can be enhanced by increasing the number of problems to be examined. The more sub-questions an exam comprises, the less chance there is of the student being marked on the basis of coincidence and good or bad luck. An exam result is not reliable, for example, if only one problem is examined and the student just happens to have been reading up on it recently, or if the one problem tested happens to be the one area of the subject that the student finds difficult to get to grips with.
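The effect of adding questions can be illustrated with the Spearman-Brown formula from classical test theory. This is an outside illustration rather than something this guide prescribes, and it assumes that the added questions are comparable in quality and difficulty to the existing ones. The function name `spearman_brown` and the starting reliability of 0.30 are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after lengthening an exam.

    reliability   -- reliability of the current exam, between 0 and 1
    length_factor -- new number of questions divided by the current number
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical exam with a single problem and an assumed reliability of 0.30.
single_problem_reliability = 0.30
for number_of_problems in (1, 2, 5, 10):
    predicted = spearman_brown(single_problem_reliability, number_of_problems)
    print(number_of_problems, round(predicted, 2))
# prints:
# 1 0.3
# 2 0.46
# 5 0.68
# 10 0.81
```

Under these assumptions the gain is largest for the first few additional problems and then levels off, which fits the point above that an exam built on a single problem leaves the most room for good or bad luck.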

In practice, the soundness of the chosen type of exam often rests on a negotiated balance between validity and reliability. To the right, there are four scenarios combining high and low validity with high and low reliability. The centre of the target represents the key learning targets, and the “bullets” are the students’ answers. High validity combined with high reliability is, of course, preferable, and the combination of low validity and low reliability is not worth striving for. However, the two intermediate positions are perhaps the most realistic. Here, all we can do is encourage exam planners to be aware of the strengths of the particular type of exam they want, and of where additional questions or combinations of exams could usefully be added.

(The target metaphor is derived from Babbie, E. (2010). The practice of social research (12th ed.). Wadsworth, Cengage Learning.)