GPT-4: Is It Really That Smart? The Tea on AI Math Skills

So, like, everyone’s been hyping up GPT-4, right? Saying it’s, like, a super genius AI that can do anything. But this new research paper is throwing some serious shade, especially when it comes to its maths abilities. Turns out, our fave AI might be a really good copycat and not the maths whiz we thought it was. Let’s get into it.

The main claim? GPT-4 doesn’t understand maths; it just regurgitates what it’s already seen. It’s like that kid in class who repeats the teacher word for word but can’t explain any of it themselves. Give GPT-4 a maths problem that isn’t already on the internet and it’s gonna fail, big time. The researchers tested this by crafting questions whose proofs aren’t readily available on the web, and GPT-4 couldn’t solve them. Their take: GPT-4 can only reproduce, rephrase, and polish proofs it has seen before; it doesn’t actually grasp the mathematical concepts behind them.
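
For the curious, here’s roughly what that kind of probe could look like in practice. This is just a minimal sketch, not the paper’s actual setup: it assumes the openai Python SDK (v1-style client), an OPENAI_API_KEY in the environment, and a placeholder question standing in for the paper’s test items (which, by design, aren’t reproduced here).

```python
# A minimal sketch (not the paper's actual harness) of probing GPT-4 with a
# maths question and logging the reply so it can be compared later against
# published proofs or against future runs of the same prompt.
import json
import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder only -- the paper's real questions are crafted so their proofs
# are not readily available on the web.
QUESTION = "Prove or disprove: <placeholder for a statement whose proof is not on the web>"

response = client.chat.completions.create(
    model="gpt-4",
    temperature=0,  # make the run as repeatable as the API allows
    messages=[
        {"role": "system", "content": "You are a careful mathematician. Give a complete proof."},
        {"role": "user", "content": QUESTION},
    ],
)

record = {
    "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
    "question": QUESTION,
    "answer": response.choices[0].message.content,
}

# Append the record so later runs of the same question can be diffed.
with open("gpt4_math_probe.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```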

The researchers are using “source criticism,” which is basically checking the receipts on where your info comes from. They’re trying to work out what GPT-4 was trained on, because that’s what all of its answers are built from. Their point: knowing which sources GPT-4 has seen is essential for judging its problem-solving ability. Like, how can we tell whether it’s actually smart or just copying if we don’t know what it’s been shown?
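
One crude way to “check the receipts” yourself is to measure how much of a model’s answer overlaps word for word with a candidate source, like a textbook proof. To be clear, this is a back-of-the-envelope heuristic with made-up example strings, not the paper’s source-criticism method; heavy n-gram overlap just hints at reproduction rather than independent reasoning.

```python
# Rough heuristic: what fraction of the model's answer appears verbatim
# (as word n-grams) in a known source text such as a textbook proof?
def ngram_overlap(answer: str, source: str, n: int = 5) -> float:
    """Fraction of the answer's n-grams that also appear in the source."""
    def ngrams(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    answer_ngrams = ngrams(answer)
    if not answer_ngrams:
        return 0.0
    return len(answer_ngrams & ngrams(source)) / len(answer_ngrams)


# Made-up example: an answer lifted almost verbatim from the source scores high.
published_proof = "Assume for contradiction that sqrt 2 is rational, so sqrt 2 = p / q in lowest terms ..."
model_answer = "Assume for contradiction that sqrt 2 is rational, so sqrt 2 = p / q in lowest terms, then ..."
print(f"5-gram overlap: {ngram_overlap(model_answer, published_proof):.2f}")
```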

Here’s the real kicker: OpenAI is keeping the training set secret. This means it’s impossible to know if GPT-4 is actually flexing its own brainpower or if it’s just repeating stuff from its training data. If you don’t know what it has been trained on, you can’t really judge its “understanding”. The researchers point out that creators of similar models have been transparent about their training data, which makes OpenAI’s secrecy even more suspect.

Even though GPT-4 claims to be a “fixed model” that hasn’t been updated since January 2022, it seems to be getting better over time. The researchers gave it the exact same maths problems months apart, and it suddenly started producing correct proofs where it had previously failed. That suggests it’s not so “fixed” after all, and that it may be continuously learning from its prompts and other sources.
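
If you wanted to run that “is it really fixed?” check yourself, it boils down to: ask the same question months apart and diff the answers. Here’s a small sketch building on the hypothetical gpt4_math_probe.jsonl log from the earlier snippet; any change in the proof for an identical prompt is at least suggestive that the deployed model isn’t frozen.

```python
# Compare the first and last stored answers for the same question.
import difflib
import json

def diff_answers(log_path: str, question: str) -> None:
    """Print a unified diff between the earliest and latest answers for a question."""
    answers = []
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record["question"] == question:
                answers.append((record["timestamp"], record["answer"]))

    if len(answers) < 2:
        print("Need at least two runs of the same question to compare.")
        return

    (t_old, old), (t_new, new) = answers[0], answers[-1]
    diff = difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=f"answer @ {t_old}", tofile=f"answer @ {t_new}", lineterm="",
    )
    print("\n".join(diff) or "Identical answers: no drift observed for this prompt.")

# Example usage (hypothetical log file and question string):
# diff_answers("gpt4_math_probe.jsonl", QUESTION)
```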

People are wasting time prompting GPT-4 over and over to solve stuff, thinking it’s a fixed model with set capabilities. The authors reckon that’s not actually helping. If it eventually solves a problem, it’s probably because the answer found its way into the data it draws on, not because it’s worked out some new logic. The paper openly questions whether all that repeated prompting teaches us anything, either about machine learning or about theorem proving.

The paper argues that solving maths problems with GPT-4 is more like searching for an answer than actually thinking it through. It suggests that proving mathematical theorems is closer to a retrieval task, like using a search engine, than to next-word prediction. Basically, it’s Google for maths, just not as effective. The paper proposes that building powerful search engines over mathematical libraries might be a better approach than training language models on all the available proofs.
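
To make the “search engine for maths libraries” idea concrete, here’s a toy retrieval example. It’s purely illustrative: a tiny made-up corpus, plain TF-IDF from scikit-learn, and purely lexical matching. A real system along these lines would index an actual proof library (formal or informal) with far smarter ranking.

```python
# Toy version of "retrieve a relevant proof instead of generating one":
# TF-IDF search over a tiny, made-up corpus of proof summaries.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

proof_library = [
    "Proof that the square root of 2 is irrational, by contradiction on p/q in lowest terms.",
    "Proof that there are infinitely many primes, by considering the product of a finite list plus one.",
    "Proof that the sum of the first n odd numbers equals n squared, by induction on n.",
]

vectorizer = TfidfVectorizer()
library_vectors = vectorizer.fit_transform(proof_library)

def search_proofs(query: str, top_k: int = 1) -> list[str]:
    """Return the top_k library entries most similar to the query (lexical match only)."""
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, library_vectors)[0]
    ranked = scores.argsort()[::-1][:top_k]
    return [proof_library[i] for i in ranked]

# Shared vocabulary ("square root", "irrational") is what makes this query hit.
print(search_proofs("is the square root of two irrational"))
```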

They also talk about “extrapolation,” which is trying to work out new stuff based on what you already know. GPT-4 does extrapolate, but it’s often bad at it: it either rephrases something it has already seen or makes wild guesses that can land on a totally wrong answer. According to the paper, GPT-4’s failures are down to the unavailability of the proofs in a formal language, not to a lack of exposure to the relevant mathematical concepts.

The researchers gave GPT-4 some simple maths problems whose proofs weren’t readily available online, and it totally flopped. We’re talking basic stuff that a person who actually understands maths would get, and GPT-4 couldn’t hack it. The authors used Google searches to verify that the proofs really aren’t readily available on the web.

The paper is calling for OpenAI to be more transparent about its training process and its models’ capabilities. It’s like, they need to drop the act and tell us what’s going on. The authors suggest that such transparency would benefit the community as a whole and might also benefit OpenAI in the long run.

So, to sum it up, don’t get too hyped about GPT-4’s maths skills. It’s good at copying, not so good at thinking for itself, and we need to know where the info it’s using comes from. There’s clearly still a lot of work to do before AI can truly understand maths. The paper also floats two interesting ideas: a model’s ability to reproduce the contents of its training set might offer a way to measure its confidence, and extrapolation might help identify the success and failure cases of large language models.

Original paper

Large Language Models’ Understanding of Math: Source Criticism and Extrapolation

stemaccent