AbuTahir@lemm.ee to Technology@lemmy.world · English · edited 1 day ago

Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well. (archive.is)

cross-posted to: apple_enthusiast@lemmy.world
Knock_Knock_Lemmy_In@lemmy.world · 3 hours ago

> do we know that they don’t and are incapable of reasoning.

“even when we provide the algorithm in the prompt—so that the model only needs to execute the prescribed steps—performance does not improve”
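For context on that quote: in the paper’s puzzle experiments (Tower of Hanoi among them), “providing the algorithm” means the prompt spells out the full solution procedure, so the model only has to trace the prescribed steps. A minimal sketch of such a procedure, assuming the standard recursive formulation rather than the paper’s exact prompt text:

```python
# Standard recursive Tower of Hanoi solver: the kind of "prescribed steps"
# the paper reports models fail to execute even when handed in the prompt.
# (Illustrative sketch only, not the paper's prompt verbatim.)
def solve_hanoi(n, source, target, auxiliary, moves):
    if n == 0:
        return
    solve_hanoi(n - 1, source, auxiliary, target, moves)  # clear the top n-1 disks
    moves.append((source, target))                        # move the largest disk
    solve_hanoi(n - 1, auxiliary, source, target, moves)  # restack the n-1 disks

moves = []
solve_hanoi(3, "A", "C", "B", moves)
print(len(moves), moves)  # 7 moves (2**3 - 1), fully determined by the algorithm
```

Since the move sequence is fully determined by the procedure, executing it is pure bookkeeping, which is what makes the paper’s negative result the crux of the exchange below.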
Communist@lemmy.frozeninferno.xyz · edited 1 hour ago

That indicates that this particular model does not follow instructions, not that it is architecturally, fundamentally incapable.
Knock_Knock_Lemmy_In@lemmy.world · 43 minutes ago

Not “this particular model”. Frontier LRMs such as OpenAI’s o1/o3, DeepSeek-R1, Claude 3.7 Sonnet Thinking, and Gemini Thinking.

The paper shows that Large Reasoning Models as defined today cannot interpret instructions. Their architecture does not allow it.