The best of the best: Building a fairer test of General Mental Ability

"Look for three things in a person - intelligence, energy & integrity. If they don't have the last one, don't even bother with the first two." - Warren Buffett

It's difficult to argue with Warren, but for raw work performance the most useful thing to measure is a slice of his first requirement. Psychometric tests that do this are assessing General Mental Ability (GMA). This is the part of intelligence that's independent of our ability to use learned knowledge, skills, and experience. And it's particularly valuable because GMA is important for every job. That's because we're all required to "think on our feet", absorb fresh information, learn new processes, and solve problems on a regular basis. As such GMA is a powerful predictor of work performance and the ability to learn. In fact it's often the best single predictor of performance and trainability for all jobs, ranging from those that are unskilled to graduate-level roles.

GMA tests are also fairer than other cognitive measures, in particular those that specifically assess verbal and numerical ability. Additionally, research has shown they are language and culture free, and are less affected by socio-economic factors and educational attainment.

Sounds good, doesn't it? However, there are two really important questions to ask: how much weight can be placed on an individual GMA test result? And what's that got to do with test taker experience? Let's unpack this further.

Making a good decision

When scientists are trying to understand how accurate a measurement is in classifying people, they use a "confusion matrix." It looks like this:

[Figure: Confusion Matrix, comparing True Class with Predicted Class]

This is how it works. Imagine you already know how much GMA someone has and you get them to complete a test. They have a high level of GMA (True Class) and the test confirms they have a high level of GMA (Predicted Class). This is a "true positive" (TP) result. Likewise, if they have a low level of GMA and the test confirms it, this is a "true negative" (TN) result. This all makes perfect sense and it would be reasonable to act on the results.

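Now I suspect you can probably see where this is going: in terms of decision making, the two categories that are concerning are "false positive" (FP), where the test reports high GMA for someone whose true level is low, and "false negative" (FN), where the test reports low GMA for someone whose true level is high.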
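To make the four categories concrete, here is a minimal sketch in Python that tallies them from paired results. The function name `confusion_counts` and the six-person data set are invented for illustration, and the sketch assumes each person can simply be labelled "high" or "low" GMA, which is a simplification of how real test scores work.

```python
def confusion_counts(true_high, predicted_high):
    """Tally TP, TN, FP and FN from paired booleans (True = high GMA)."""
    if len(true_high) != len(predicted_high):
        raise ValueError("both lists must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(true_high, predicted_high):
        if actual and predicted:
            counts["TP"] += 1   # high ability, test agrees
        elif not actual and not predicted:
            counts["TN"] += 1   # low ability, test agrees
        elif predicted:
            counts["FP"] += 1   # test says high, true level is low
        else:
            counts["FN"] += 1   # test says low, true level is high
    return counts


# Hypothetical results for six test takers (not real data).
true_high = [True, True, False, False, True, False]
predicted_high = [True, False, False, True, True, False]

print(confusion_counts(true_high, predicted_high))
# {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```

In this made-up example the test is right four times out of six, with one person flattered by the result (FP) and one sold short (FN).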

Controlling cheating

Taking FP first: this means that someone gets a higher result than they should. In practice that should not happen unless someone knows the answers before taking the test. This would be cheating, and test designers do their best to stop it from happening. Making sure the correct answers do not become public knowledge is therefore very important, and great efforts are taken to keep them secret.

There are also things that can be done with the design of a test to stop people memorizing the sequence of answers, and then passing that on to someone else. For example, the answer options can be presented randomly each time someone attempts the test. Also, it's possible to have a number of parallel versions of the same test and to rotate these between test takers - this way no one knows which version they will be asked to complete.
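As a rough illustration of these two design measures, the sketch below shuffles the answer options for a question and picks one of several parallel versions at random. Everything here is hypothetical: the function names, the sample question, and the form labels are invented, and a real testing platform would handle this inside its own delivery system.

```python
import random


def shuffled_options(options, correct, rng):
    """Return the options in a random order, plus the new position of the correct one."""
    order = list(options)          # copy so the original list is left untouched
    rng.shuffle(order)
    return order, order.index(correct)


def pick_form(forms, rng):
    """Choose one parallel version of the test at random for this test taker."""
    return rng.choice(forms)


rng = random.Random()              # unseeded, so each session differs

# Hypothetical question and parallel forms, for illustration only.
options = ["circle", "square", "triangle", "hexagon"]
order, correct_index = shuffled_options(options, "triangle", rng)
form = pick_form(["Form A", "Form B", "Form C"], rng)

print(form, order, correct_index)
```

Because the correct answer's position changes on every attempt, a memorized sequence such as "B, D, A, C" is of no use to the next person.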

rolling the dice on cheating, not very good odds

You might also be thinking that a FP result could arise because of random guessing. In reality this is unlikely to make a big impact on the overall result. This is easily illustrated by considering a 20 question test with four answer options per question. Each guess has a 1 in 4 chance of being right, so on average someone who guessed every answer would end up with five correct. However, guessing like this is very unusual in candidates. Besides, five is only the average result, and on many occasions such a strategy would produce fewer correct answers. And just in case you're wondering what the odds are of correctly guessing all the answers in the same 20 question test, they're 1 in 1,099,511,627,776 (that is, 4 multiplied by itself 20 times)!
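The arithmetic behind these figures can be checked with a short sketch. It assumes every guess is independent and equally likely to land on any of the four options, so the number of correct guesses follows a binomial distribution; the helper name `prob_at_least` is made up for this example.

```python
from math import comb

QUESTIONS = 20
OPTIONS = 4
P_CORRECT = 1 / OPTIONS            # chance of guessing one question correctly


def prob_at_least(k, n=QUESTIONS, p=P_CORRECT):
    """Probability of getting k or more correct out of n by pure guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


expected_correct = QUESTIONS * P_CORRECT   # average score from guessing
all_correct_one_in = OPTIONS ** QUESTIONS  # 1 chance in this many

print(expected_correct)        # 5.0
print(all_correct_one_in)      # 1099511627776
print(prob_at_least(10))       # chance of guessing half or more correctly
```

Running the last line shows that even reaching half marks by guessing alone is rare, which is why random guessing is not a realistic route to a false positive.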

Confronting the elephant

Unsurprisingly, the real elephant in the room is the FN result. This is when the test-taking experience lowers someone's score, so that they have a higher level of ability than their test results suggest. If a decision were based only on this result, it would clearly be unfair. Naturally there could be all sorts of reasons why someone is unable to demonstrate their potential. They may just get very anxious when taking tests or dislike being assessed under timed conditions, especially in a high-stakes hiring situation. It could be that they process information more slowly than other people or are not very good at remembering things when under pressure. The fact is there are many, many ways of getting a lower score than you deserve. The main thing is, what can be done about it?

Making tests fairer

To recap, for the two "true" conditions it's reasonable to take test results at face value as long as the test is reliable and valid: it produces consistent results that are directly related to someone's ability and to work performance. To keep the lid on FP results, measures like keeping the "scoring key" carefully under wraps, making it difficult to remember sequences of answers, and having multiple versions of a test make it very difficult to cheat.

However when it comes to FN results, the best way to avoid these is through thoughtful user design:

  • Make the instructions short and easy to understand, and keep words to a minimum.
  • Provide examples and practice questions — you might also want to set up a practice site with a full version of the test and explanations of the correct answers.
  • Keep the test itself as short as possible — within the bounds of keeping it psychometrically sound.
  • Remove the time limit — no timers ticking down to zero!

This last point is one of the best ways to control test anxiety. And of course, from a practical perspective this only works for short tests.

The challenge is to build all these features, which make tests fairer, into an engaging and friendly assessment experience.

Traitify is hard at work to bring you a fair GMA assessment! If you'd like to learn more, connect with us.