I’ve seen it once. It may be a random test selection or limited to queries it arbitrarily decides that it has a good (always 100% wrong) answer for.
All of their models have consistently done pretty well on any sort of standard test, and then performed horribly in real use. Which makes sense, because if you train it specifically to produce something that looks like the answers to a test, it will probably get good at producing answers to that test, but it’s still fundamentally just a language parser and predictor without knowledge or any sort of internal model.
Their entire approach is just so fundamentally lazy and grifty, burning massive amounts of energy on what is, at bottom, a dumbshit way to build AI. It’s like trying to make a brain by just making the speech-processing lobe bigger and bigger and expecting it’ll eventually get so good at talking that the things it says will be intrinsically right instead of merely looking like the right text.
You can just say “what if the dog was driving the average American suburban assault vehicle.”