[Closed] Statisticians, start your calculators!

14 Posts
6 Users
0 Reactions
47 Views
Posts: 6071
Free Member
Topic starter
 

Is a 50-question exam with an even-numbered pass mark as fair and equal for candidates as a 100-question exam with the same pass mark?
On the face of it you award 1% per correct answer, or 2% per correct answer for the shorter version, and with an even-numbered pass mark you get a clean pass/fail result. The exam is multiple choice with four options per question.
The questions for both exams will be drawn from the same 100-question paper, just 50 fewer of them in one case... and while writing this I see a potential element of unfairness in how the questions are selected...
With the 100-question version you can get more answers wrong before failing; is this an advantage?
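The "more wrong answers before failing" question can be explored with a quick simulation: give a hypothetical candidate a fixed per-item probability of answering correctly and see how often each paper passes them. A minimal sketch — the 0.45 ability figure and the 50% pass mark are assumptions for illustration:

```python
import random

def sim_pass_rate(n_items, p_correct, pass_frac=0.5, trials=20000, seed=1):
    """Estimate how often a candidate with a fixed per-item probability
    of answering correctly passes an n_items exam at the given pass mark."""
    rng = random.Random(seed)
    passes = 0
    for _ in range(trials):
        score = sum(rng.random() < p_correct for _ in range(n_items))
        if score >= pass_frac * n_items:
            passes += 1
    return passes / trials

# A candidate whose true ability sits just below the 50% pass mark:
p50 = sim_pass_rate(50, 0.45)
p100 = sim_pass_rate(100, 0.45)
print(p50, p100)
```

Under these assumptions the below-threshold candidate scrapes a pass noticeably more often on the 50-item paper: the longer paper's extra wrong-answer headroom isn't an advantage, it just measures ability more accurately.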


 
Posted : 27/01/2019 8:02 am
Posts: 41642
Free Member
 

Surely yes, but by how much would depend on the format. With multiple choice it becomes twice as likely to pass just by guessing. With "pick any one question and write an essay" they probably approach the same level of difficulty (if you can't answer 1 in 50 you probably can't answer the rest). It would also depend on timing: is the exam deliberately short so that most candidates would only answer 50, and only the above-average get the chance to stretch themselves on further questions?

It depends on the difficulty of the questions too. I've done exams like that where the questions get progressively harder, so group A had to answer Q1–75 and group B Q26–100. Same pass mark for either, but how that was then graded would have been different.
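The "pass just by guessing" point above can be checked exactly: with four options per question, the number of lucky hits is Binomial(n, 1/4). A minimal sketch, assuming a 50% pass mark on both papers:

```python
from math import comb

def p_pass_by_guessing(n_items, pass_mark, p=0.25):
    """Exact probability of reaching the pass mark by blind guessing:
    upper tail of a Binomial(n_items, p) distribution."""
    return sum(comb(n_items, k) * p**k * (1 - p)**(n_items - k)
               for k in range(pass_mark, n_items + 1))

p50 = p_pass_by_guessing(50, 25)    # 50 items, need 25 correct
p100 = p_pass_by_guessing(100, 50)  # 100 items, need 50 correct
print(p50, p100, p50 / p100)
```

Run exactly, the shorter paper turns out to be far more than twice as easy to pass by blind luck — the ratio is well over a hundredfold — though both probabilities are tiny in absolute terms at a 50% pass mark.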


 
Posted : 27/01/2019 8:18 am
Posts: 47
Free Member
 

I am not a statistician, but...

Wouldn't, all other variables being the same, the 100-question exam give students more opportunity to trend toward their 'actual' performance, whereas a 50-question exam is comparatively more likely to be affected by accidents (e.g. reading a question wrongly)?


 
Posted : 27/01/2019 8:34 am
Posts: 6071
Free Member
Topic starter
 

Qs are theoretically at the same level (knowledge-only content of a textbook, no understanding/calcs). The 50 Qs are to gain an hour of classroom time over 100.


 
Posted : 27/01/2019 8:36 am
Posts: 6071
Free Member
Topic starter
 

@pjm Yep, that makes sense


 
Posted : 27/01/2019 8:46 am
Posts: 0
Free Member
 

I see a potential element of unfairness by selecting questions…

Raw scores from a test always depend on the sample of questions selected; that's why reported scores are adjusted psychometrically to compensate for the difficulty of the questions. There is a huge amount of research on this if you look at the academic literature.

Questions are never of the same difficulty, regardless of whether they came from the textbook or not. On top of that, the distractors in multiple-choice questions will vary in their attractiveness, so that will affect difficulty. However, having a larger number of questions will mean that the average question difficulty will be more stable between different versions of the test.

The longer test will have higher reliability (i.e. the rank-ordering of the students will be more stable if you compare scores from multiple versions of the test). Think of it as like a sports tournament. If you have a round-robin type format, a single lucky or unlucky fluke result tends to get cancelled out because there are multiple chances. In a sudden-death elimination, there is no chance to recover from a single unlucky event. For example, South Africa were beaten in the last Rugby World Cup by Japan. Great result for Japan and the fans, but not really representative of the quality of the teams. If those two teams played 10 games, S.A. would probably have won 9 out of 10. Tests are the same. A longer test is always fairer because the effect of luck is reduced.

A multiple-choice test of 50 questions is going to be pretty rubbish reliability-wise. 100 questions is considerably better; 200 probably quite good.
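The relationship between test length and reliability is usually quantified with the Spearman-Brown prophecy formula. A minimal sketch — the 0.70 starting reliability is an illustrative assumption, not a figure from any real test:

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by the given
    factor with comparable items (Spearman-Brown prophecy formula)."""
    k, r = length_factor, reliability
    return k * r / (1 + (k - 1) * r)

r50 = 0.70                      # assumed reliability of the 50-item test
r100 = spearman_brown(r50, 2)   # doubled to 100 items
r200 = spearman_brown(r50, 4)   # quadrupled to 200 items
print(round(r100, 3), round(r200, 3))
```

Under that assumption, doubling a 0.70-reliability test predicts roughly 0.82 and quadrupling roughly 0.90 — consistent with the 50/100/200 intuition above, with diminishing returns as the test gets longer.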


 
Posted : 27/01/2019 8:57 am
 poly
Posts: 8699
Free Member
 

Your question was “is it as fair and equal” - but I’m not sure that is really what you mean.

It depends partly on the diversity of the syllabus/questions. If the 100 questions cover a broader range of subjects than the 50 questions then it potentially removes the benefit for a student who happened to have studied (or better understood) the right half of the course; and of course the corollary for the student who knew all the questions you didn’t ask!

It also depends if you assume that a student who doesn’t know an answer will just guess (and therefore if it is marked “negatively” for wrong answers). The proper understanding may even need to consider how often a student makes a simple recording error (eg determined the answer to be C but writes it as D).

However consider for a moment that you split the 100 question paper in two, with the questions randomly assigned into two groups of 50 of supposedly similar overall difficulty. If you gave the same students the two papers consecutively you wouldn’t expect every student to get the same mark on both papers. The similarity of the two would be a good test of your ability to set fair papers. The question you may be trying to understand would be how often would a student who got 48% (assuming a 50% pass mark) on paper A get >=52% on paper B? (And vice versa). The more proper construction might be how often would a student pass or fail paper A or B but get the other outcome on the combined total?
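That split-half thought experiment is straightforward to simulate. A rough sketch — the uniform spread of student abilities and the 50% pass mark are assumptions purely for illustration:

```python
import random

rng = random.Random(42)
n_students, flips = 2000, 0

for _ in range(n_students):
    ability = rng.uniform(0.3, 0.7)  # per-item probability of a correct answer
    half_a = sum(rng.random() < ability for _ in range(50))
    half_b = sum(rng.random() < ability for _ in range(50))
    pass_a = half_a >= 25                    # 50% pass mark on paper A alone
    pass_combined = half_a + half_b >= 50    # 50% pass mark on all 100 items
    flips += pass_a != pass_combined

print(f"{flips / n_students:.1%} of simulated students pass one of "
      "paper A / the combined paper but fail the other")
```

With abilities clustered around the pass mark the disagreement rate is noticeable; it shrinks as the paper gets longer or as abilities move away from the threshold, which is the whole fairness question in miniature.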

But, assuming you could define more learning points (or ask questions about the same learning point in different ways), why is 100 questions automatically the correct number? Would 200 questions not be better? So at what point does student fatigue kick in and you start getting poorer measurements of understanding?


 
Posted : 27/01/2019 4:01 pm
Posts: 6071
Free Member
Topic starter
 

I'll be finding an hour somewhere else, thanks all


 
Posted : 28/01/2019 5:47 am
 poly
Posts: 8699
Free Member
 

I’ll be finding an hour somewhere else, thanks all

I’m not sure you understood the answers you were given...


 
Posted : 28/01/2019 9:22 am
Posts: 6071
Free Member
Topic starter
 

Thanks to all of the great replies I understand (and can justify) that cutting to 50 Qs isn't ideal, and that the exam isn't the place to save an hour


 
Posted : 28/01/2019 6:19 pm
 poly
Posts: 8699
Free Member
 

Timba - I don’t think that is what you were told, though! What was said was that all the other variables in setting a test will matter far more than the difference in granularity between a 51% and a 52% cut-off.

If your concern is the performance of students right on the threshold I would think spending that extra hour with those students learning would be more valuable than trying to manipulate the statistics in their favour.


 
Posted : 28/01/2019 6:41 pm
Posts: 0
Free Member
 

Or split it into two 50-item tests and give students near the cut point the option of taking the second test after a week's revision. Or just give everybody two 50-item tests over two weeks and use the best result. It kinda depends on what the purpose of the test is: is it meant to be a measurement instrument (in which case reliability will be important and a longer test preferred), or a formative tool to encourage students to revise the work (in which case multiple short tests over several weeks might be preferable)?
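The best-of-two option has a simple lower bound if the attempts were independent. A sketch with an assumed per-attempt pass probability of 0.4 for a hypothetical borderline candidate:

```python
def p_pass_best_of(p_single, attempts):
    """Probability of passing at least once across independent attempts,
    when the best result counts."""
    return 1 - (1 - p_single) ** attempts

p = 0.4   # assumed per-attempt pass probability for a borderline candidate
print(p_pass_best_of(p, 1), p_pass_best_of(p, 2))  # roughly 0.4 vs 0.64
```

In practice a week's revision should raise the second-attempt probability, so the independence assumption understates the benefit; the point is just that a second attempt materially changes outcomes for borderline students.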


 
Posted : 28/01/2019 11:10 pm
Posts: 6071
Free Member
Topic starter
 

is it meant to be a measurement instrument

Yes, summative rather than formative, and there is an option of a retake as well. I've enough info for my needs now, thanks; someone better paid will make the decision 🙂


 
Posted : 29/01/2019 7:32 am
Posts: 24498
Free Member
 

I found this an interesting discussion. I've often wondered about the validity of multi-choice exams, as I reckon with technique they're quite easy to 'game' and get higher scores than your actual knowledge warrants (assuming you can follow the instructions and not muck up the selection process, that is!!)

- a proportion of questions you know the answer to anyway, you just need to tick the right box

- a proportion you think you know and then seeing that answer on the selection confirms it

- a proportion you don't know but can eliminate 1, 2 or 3 that you know it isn't and increase the odds of a right guess, even up to 100% if you can identify 3 wrong ones (assuming for sake of argument 4 options)

- and a proportion you have no idea at all but still a 1/4 chance of a correct guess

So unless there's a sophisticated marking system taking marks off for wrong answers you're measuring knowledge plus logic plus lucky guessing.
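That "knowledge plus logic plus lucky guessing" mix can be written out as an expected score. The category proportions below are made up purely for illustration, not taken from any real exam:

```python
# (fraction of questions, probability of a correct answer in that category)
categories = [
    (0.40, 1.0),    # known outright
    (0.10, 1.0),    # all three distractors eliminated: certain
    (0.10, 1 / 2),  # two distractors eliminated: coin flip between two
    (0.20, 1 / 3),  # one distractor eliminated: guess among three
    (0.20, 1 / 4),  # no idea at all: blind guess among four options
]

expected_score = sum(frac * p for frac, p in categories)
print(f"expected mark: {expected_score:.1%}")
```

With these made-up proportions, a student who knows only 40% of the material outright would still average around two thirds — which is exactly why negative marking, or well-designed distractors that resist elimination, matter so much.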


 
Posted : 29/01/2019 7:34 am
Posts: 0
Free Member
 

I’ve often wondered about the validity of multi-choice exams, as I reckon with technique they’re quite easy to ‘game’ and get higher than expected scores for your actual knowledge

That's been extensively researched. If you're really interested, look for research by Howard Wainer on the subject. Firstly, "validity" refers to the inferences you make from scores, not to exams themselves. There are plenty of valid inferences you can make from properly designed multi-choice exams.

All knowledge is partial knowledge, nobody knows everything about any subject. The point of any exam (multi-choice, short-answer, essay, etc.) is that a person with more partial knowledge of the subject will have a higher probability of answering correctly than a person with less partial knowledge. Although you can succeed on a single MC question by lucky guessing, over a long test, your proportion correct will generally be 1/n, where n equals the number of response options. Depending on how the distractors are designed, some distractors may be easy to eliminate with very little partial knowledge, others may require more partial knowledge to eliminate. This means that the distractors are part of the test item, so designing effective distractors is extremely important.

This means that so-called "guessing strategies" are not actually guessing; they are an application of relevant knowledge, assuming the test was properly developed with effective distractors. Many tests, especially classroom tests, are badly developed (teachers are trained to be teachers, not psychometricians). That doesn't mean that all MC tests are somehow invalid, just like the existence of shitty cheap BSOs doesn't mean that all bikes are shit.


 
Posted : 29/01/2019 8:03 am
