https://xeon-wiki.win/index.php/Why_is_GPT-5.5_57%25_Accurate_but_86%25_Hallucination_on_AA-Omniscience%3F
Hallucination rates are vanity metrics. A model may pass one test yet hit a 30% error rate on HalluHard. Because Vectara’s HHEM and AA-Omniscience measure factuality differently, the number you get depends on which yardstick you pick.
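
A minimal sketch of why two such numbers can coexist, assuming (this is an assumption, not a confirmed definition of either benchmark) that one metric divides by all questions while the other divides only by the questions the model failed to answer correctly. The function names and the counts below are hypothetical; the counts were chosen only so the results land near the 57% and 86% figures in the linked page's title.

```python
# Hypothetical illustration: the same set of answers scored with
# three different denominators. Not the official formula of any benchmark.

def accuracy(correct: int, incorrect: int, abstained: int) -> float:
    """Share of ALL questions answered correctly."""
    total = correct + incorrect + abstained
    return correct / total


def overall_error_rate(correct: int, incorrect: int, abstained: int) -> float:
    """Share of ALL questions answered incorrectly."""
    total = correct + incorrect + abstained
    return incorrect / total


def hallucination_share_of_misses(correct: int, incorrect: int, abstained: int) -> float:
    """Share of NON-CORRECT responses that were wrong answers
    rather than abstentions (assumed definition)."""
    misses = incorrect + abstained
    return incorrect / misses if misses else 0.0


# Hypothetical counts out of 100 questions.
counts = dict(correct=57, incorrect=37, abstained=6)

print(f"accuracy:                {accuracy(**counts):.0%}")                       # 57%
print(f"overall error rate:      {overall_error_rate(**counts):.0%}")             # 37%
print(f"hallucination of misses: {hallucination_share_of_misses(**counts):.0%}")  # 86%
```

Under these assumed definitions the model is right on most questions, yet almost every miss is a confident wrong answer instead of an abstention, so both headline figures describe the same run.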