As presented, it did seem a bit rigid to me. Now, of course, there may have been leeway in actual practice, which is why I asked about those three scenarios. It's one thing to literally accept only the precise three words "I don't know" (or four, depending on how you count contractions; I tend to count them as two words, though the standard seems to be to count them as one), but another to simply accept an admission that the given answer is just a process for reaching an imperfect answer, for example. The original poster may have accepted a broader range of answers than the post suggested. Or maybe not. That's still their right as the one hiring, I suppose.
I brought up the example of fairy-tale style tests like getting suitors to choose between a gold cask, a silver cask, and a lead cask, where the "correct" answer is, of course, to choose the lead cask since it supposedly indicates a noble lack of desire for worldly things. There are a lot of problems with such tests from a perspective outside the realm of fairy tales and simplistic morality. For starters, in pretty much all of these tales, the ones giving the test are filthy rich. So, you have the situation where some king or nobleman or whatever, who almost certainly cares a great deal about worldly riches, is judging people for their materialism. I mean, it's not really a paradox. It could be simple hypocrisy or even a non-hypocritical desire to actually try to maximize the daughter's happiness. It still comes off a bit judgy given who is making the test, though. Then there's the cynicism aspect I mentioned. Anyone shrewd enough, regardless of greed level, recognizes that the right choice in these classical tests is the least greedy-seeming option. So, such tests are probably more likely to select for the most conniving weasel than for the most virtuous suitor. Then, of course, there's the fundamental flaw of assuming that seeking some sort of material wealth is not virtuous. Clearly, it's normally considered virtuous to seek a paycheck to feed, clothe, and house your family, for example. For the example with the casks of different value, how does the test-giver tell the difference between "If I take the gold cask my fortune will grow even more!" and "Wow, think how many starving widows and orphans I can feed when I sell that!"?
Of course, a real-world, and frankly tragic (among many pejoratives that could be used), version of that kind of simplistic fairy-tale logic is an old standard for deciding whether or not to stone a female rape victim to death. Basically, the way it went is that if she was in the city and didn't cry out, she would be stoned to death. (If she did cry out, and could prove it because someone actually heard her and did not ignore her, she was presumably in the clear for public stoning at least, though probably still facing other social consequences and/or punishment, up to and including potentially being killed anyway.) In the country, she was clear of the crying-out requirement (though still facing potentially terrible consequences). So, these rules follow the simple logic that she is complicit if she does not take specific action to prevent the rape, and they assume that she will be heard in the city and not in the country. Nice and simple, and incredibly stupid (not to mention misogynistic, hyper judgemental, and frankly just murderous). Anyone with an iota of sense recognizes some basic problems with that requirement, for example that a person can have their mouth covered, be knocked out, be dragged into a building where screams simply won't reach the outside, be attacked in, or moved to, a place where everyone nearby is basically complicit and does nothing, be attacked in a place where the people around are "good, decent" people who are still effectively complicit through apathy or self-interest and simply ignore cries for help, etc. There's also the obvious knife to the throat or other threat of death with a warning to keep quiet, but for the hyper judgemental, that one is no problem, since they would likely conclude that the virtuous should rather die than be raped, so a victim who didn't let herself be killed is guilty.
The same "logic" applies to those who realize that, if they draw attention to the attack, the consequences from society, even without being stoned to death, will be terrible and may still include a shallow grave somewhere. Obviously, for the hyper judgemental, that kind of reasoning is simply not virtuous. So, that's a bit of a ramble, but it's a real-world example that has always bothered me tremendously. The "logic" involved is ridiculously rigid and based on a ridiculously simple model of the world. Now, someone would have adjudicated this, of course. Some of those doing the judging might very well have taken all of the real-world circumstances into consideration and made a more reasonable judgement than the stupidity of the actual law (note that when I say "more reasonable", that's still in a paradigm where women are being held criminally liable for being raped, so take it with a kilogram of salt, which still would not be enough). There certainly also would have been those judging who simply said "the law is the law" and ignored actual fairness in favor of a rigid interpretation.
So, in a very long-winded way, I am saying that overly rigid tests that don't really consider all the possibilities are problematic. Working with a rigid rule but adapting and making judgement calls may ameliorate things somewhat, but it may not make up for the basic problem of the rigidity of the rule in the first place. Basically all behavioral tests to see if someone will do the "right thing" in a manufactured (or poorly modeled) situation are inherently flawed. For the very specific "I don't know" test we're discussing, a lot hinges on how the answer actually gets judged and, ultimately, if the employer wants to apply such a test, they can, but they get what they get, for better or for worse.