o1-preview (via Web) performs much better on "trick" math reasoning problems than other language models. Paper: Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning.
- Posted: 1 month ago
- External URL: arxiv.org/abs/2405.06680
LLMs seem to have an issue with understanding the entire question; parts of it are simply ignored because they aren't deemed necessary information. In this case it seems as though LLMs ignore that you already gave them the answer, that one of the drinks was poisoned. I wonder if the problem lies in how the LLM determines what's important and what isn't, rather than in its ability to reason.
Edit: If I demand that the LLM pay attention to how many drinks are poisoned, it gets it right more often. At the end of the prompt I put "Key information, pay attention!" followed by the information that matters. It seems to be about how the LLM determines importance rather than its ability to understand the question, because otherwise it doesn't even notice that only one drink has been poisoned.
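For reference, here's a minimal sketch of the prompt tweak described in the edit, assuming access through the OpenAI Python client rather than the web UI the poster used; the riddle text and the model name are placeholders, not the poster's exact setup. The only point of the sketch is the prompt construction: restating the critical constraint verbatim at the end of the question.

```python
# Sketch of the "Key information, pay attention!" prompt tweak.
# Assumes the openai package is installed and OPENAI_API_KEY is set;
# riddle text and model name are placeholders.
from openai import OpenAI

client = OpenAI()

riddle = "<trick question goes here, e.g. the poisoned-drinks riddle>"

# Restate the fact the model tends to drop, appended after the question.
key_info = "Key information, pay attention! Only ONE of the drinks was poisoned."

prompt = f"{riddle}\n\n{key_info}"

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; the poster tested o1-preview via the web UI
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```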