Highlights:

  • In March, GPT-4 correctly solved 97.6% of the mathematical problems it was assigned; by June, its accuracy had fallen to 2.4%.
  • Between March and June, the proportion of queries GPT-4 answered with “directly executable” code, meaning code that runs without modification, fell by more than 40%.

According to a recent research paper, GPT-4, OpenAI LP’s most powerful artificial intelligence model, may have become less capable at performing certain tasks.

Ars Technica recently reported on the paper’s findings. The paper, written by three researchers from Stanford University and the University of California, Berkeley, was originally published on July 18. Following its publication, several AI experts questioned whether GPT-4 has become less accurate.

The paper’s authors assessed GPT-4’s reasoning abilities by having it complete a series of tasks twice: once in March and again three months later, in June. They then compared the outcomes of the two runs.

One subset of the tasks the researchers assigned to GPT-4 required the AI to solve mathematical problems. In March, it correctly solved 97.6% of the questions. By June, that figure had dropped to 2.4%.

The paper’s authors believe the decline might be due to “drifts of chain-of-thoughts’ effects.”

When the researchers asked GPT-4 to tackle math problems, they interacted with the model using chain-of-thought prompting: rather than requesting only an answer, they also asked the model for a step-by-step explanation of its reasoning. This technique has been shown to improve the accuracy of language models.
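For illustration, here is a minimal sketch of how a plain prompt differs from a chain-of-thought prompt. The wording is hypothetical; the study’s actual prompts are not reproduced here.

```python
# Hypothetical prompt wording; illustrative only, not the study's prompts.
question = "Is 17077 a prime number?"

# A plain prompt asks only for the final answer.
direct_prompt = f"{question} Answer 'yes' or 'no'."

# A chain-of-thought prompt also asks the model to lay out its reasoning,
# which tends to improve accuracy on problems like this one.
cot_prompt = f"{question} Think step by step, then answer 'yes' or 'no'."

print(direct_prompt)
print(cot_prompt)
```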

The researchers hypothesize that the observed change in GPT-4’s accuracy may be attributable to the chain-of-thought prompts. In one test, they posed a chain-of-thought query asking the model to determine whether 17,077 is a prime number. In March, GPT-4 gave the correct answer along with a step-by-step breakdown of its reasoning. Three months later, it gave an incorrect answer without providing a breakdown.
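For reference, the correct answer is easy to verify programmatically. The trial-division check below is an illustrative sketch, not the researchers’ evaluation code; it confirms that 17,077 is indeed prime.

```python
import math

def is_prime(n: int) -> bool:
    """Trial division: test every divisor up to the integer square root of n."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

print(is_prime(17077))  # True: 17,077 has no divisor up to isqrt(17077) == 130
```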

The researchers also evaluated GPT-4’s accuracy on other categories of tasks. Some of the evaluations required the model to write software code. Between March and June, the proportion of queries GPT-4 answered with “directly executable” code, meaning code that runs without modification, fell by more than 40%.
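One plausible way to operationalize “directly executable” is simply to try running the model’s raw response. The `directly_executable` helper below is a hypothetical sketch of such a check, not the paper’s actual test harness.

```python
def directly_executable(response: str) -> bool:
    """Return True if the raw response runs as Python without modification.

    Note: exec() runs arbitrary code, so a real harness would sandbox this.
    """
    try:
        exec(compile(response, "<llm-response>", "exec"), {})
        return True
    except Exception:
        return False

print(directly_executable("print(2 + 2)"))                  # True
print(directly_executable("```python\nprint(2 + 2)\n```"))  # False: fences are not Python
```

Under a check like this, a response whose code is correct but wrapped in prose or Markdown formatting still counts as a failure.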

Some AI experts have expressed doubts about the paper’s findings. Arvind Narayanan, a professor of computer science at Princeton University, noted that the fact that GPT-4’s generated code could not be run directly did not necessarily mean the code was less accurate. In some cases, the code could not be executed because GPT-4’s responses also contained explanatory prose.

Software engineer Simon Willison concurred. “A decent portion of their criticism involves whether or not code output is wrapped in Markdown backticks or not,” Willison told Ars Technica. Backticks are used to format software code in Markdown.
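If Willison’s objection is right, much of the measured decline could be recovered with light post-processing, such as stripping Markdown fences before running the code. The `strip_markdown_fences` helper below is a hypothetical sketch, not part of the study.

```python
import re

def strip_markdown_fences(response: str) -> str:
    """Return the code inside a Markdown fence if one is present;
    otherwise return the response unchanged."""
    match = re.search(r"```(?:\w+)?\n(.*?)```", response, re.DOTALL)
    return match.group(1) if match else response

wrapped = "```python\nprint(2 + 2)\n```"
print(strip_markdown_fences(wrapped))  # prints: print(2 + 2)
```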

Logan Kilpatrick, OpenAI’s head of developer relations, stated, “The team is aware of the reported regressions and looking into it.” Peter Welinder, the AI startup’s vice president of product and partnerships, stated, “No, we haven’t made GPT-4 dumber. Quite the opposite.”

The paper also evaluated GPT-3.5, an earlier OpenAI model with fewer capabilities. Between March and June, the researchers found, that model’s accuracy did not decline but instead improved: the rate at which GPT-3.5 correctly solved math problems rose from 7.4% to 86.8%.