freadhrkvt posted an update 1 month ago

How Claude Opus 4.5 Misclassified 38% of Implicit Assumptions in Real-World Tests

The data suggests Claude Opus 4.5 struggles more than advertised at spotting hidden premises in complex prompts. In a suprmind.ai controlled benchmark of 2,500 real-world scenarios – legal summaries, clinical vignettes, technical design briefs – the model flagged only 62% of implicit assumptions that human reviewers