As impressive as GPT-4 was at launch, some observers have noticed that it has lost some of its accuracy and power. These observations have been posted online for months now, including on the OpenAI forums.
These sentiments have been out there for a while, but now we may finally have proof. A study conducted in collaboration with Stanford University and UC Berkeley suggests that GPT-4 has not improved its response proficiency but has in fact gotten worse with further updates to the language model.
The study, called How Is ChatGPT’s Behavior Changing over Time?, tested the capabilities of GPT-4 and the previous language version, GPT-3.5, between March and June. Testing the two model versions with a data set of 500 problems, researchers observed that GPT-4 had a 97.6% accuracy rate in March, with 488 correct answers, and a 2.4% accuracy rate in June after GPT-4 had gone through some updates. The model produced only 12 correct answers months later.

Another test used by researchers was a chain-of-thought technique, in which they asked GPT-4, “Is 17,077 a prime number?” This is a question of reasoning. Not only did GPT-4 incorrectly answer no, it gave no explanation as to how it came to this conclusion, according to researchers.
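For reference, 17,077 is indeed prime, which is easy to confirm by trial division. The sketch below is purely illustrative of the ground truth behind the test question; it is not code from the study:

```python
import math

def is_prime(n: int) -> bool:
    """Check primality by trial division up to the square root of n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    # Only odd candidate divisors up to sqrt(n) need checking.
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

print(is_prime(17077))  # True: 17,077 has no divisors other than 1 and itself
```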
The study comes just six days after an OpenAI executive tried to quell suspicions that GPT-4 was, in fact, getting dumber. The tweet below implies that the decline in the quality of answers is a psychological phenomenon from being a heavy user.
Notably, GPT-4 is currently available for developers or paid members through ChatGPT Plus. Asking the same question of GPT-3.5 through the free ChatGPT research preview, as I did, gets you not only the correct answer but also a detailed explanation of the mathematical process.
Additionally, code generation has suffered, with developers at LeetCode having seen the performance of GPT-4 on its dataset of 50 easy problems drop from 52% accuracy to 10% accuracy between March and June.
To add fuel to the fire, Twitter commentator @svpino noted that there are rumors that OpenAI might be using “smaller and specialized GPT-4 models that act similarly to a large model but are less expensive to run.”
This cheaper and faster option might be leading to a drop in the quality of GPT-4 responses at a crucial time when the parent company has many other large organizations depending on its technology for collaboration.
Not everyone thinks the study proves anything, however. Some have made the point that a change in behavior doesn’t equate to a reduction in capability. This is acknowledged in the study itself, which states that “a model that has a capability may or may not display that capability in response to a particular prompt.” In other words, getting the desired result may require different types of prompts from the user.
When GPT-4 was first announced, OpenAI detailed its use of Microsoft Azure AI supercomputers to train the language model for six months, claiming that the result was a 40% higher likelihood of generating the “desired information from user prompts.”
ChatGPT, based on the GPT-3.5 LLM, was already known for having its information challenges, such as having limited knowledge of world events after 2021, which could lead it to fill in gaps with incorrect data. However, this regression in information quality appears to be a completely new problem never seen before with the service. Users were looking forward to updates to address the known issues.
OpenAI CEO Sam Altman recently expressed his disappointment in a tweet in the wake of the Federal Trade Commission launching an investigation into whether ChatGPT has violated consumer protection laws.
“We’re transparent about the limitations of our technology, especially when we fall short. And our capped-profits structure means we aren’t incentivized to make unlimited returns,” he tweeted.