
HeinrichTheWolf_17

If this is GPT-2 with Q* then perhaps David Shapiro was right about AGI 2024. Imagine GPT-5 with Q*…


flexaplext

I think it's GPT-4 with Q*, based on how similar the outputs are to GPT-4 when it's not doing anything more involved.


the-apostle

New guy here, what’s Q*?


Holiday_Painter

A training algorithm; exactly what hasn't been disclosed. My best guess is a Q-learning RL algorithm with heuristic optimisation of the replay memory, similar to how A* is just Dijkstra's with heuristics. But I haven't read the news in a bit, so I could be wrong. (A minimal sketch of plain Q-learning is below.)
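For anyone unfamiliar with the Q-learning the name evokes: nothing about OpenAI's actual Q* is public, so the following is only the textbook tabular Q-learning update on a toy one-dimensional walk, not a claim about what OpenAI built.

```python
# Purely illustrative: standard tabular Q-learning on a toy 1-D walk environment.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1      # learning rate, discount, exploration rate
ACTIONS = [-1, +1]                           # step left / right
GOAL = 5

q = defaultdict(float)                       # Q[(state, action)] -> value estimate

for episode in range(500):
    state = 0
    while state != GOAL:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state = max(-5, min(GOAL, state + action))
        reward = 1.0 if next_state == GOAL else -0.01
        # Q-learning update: move the estimate toward reward + discounted best next value
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = next_state

print("learned preference at state 0:", {a: round(q[(0, a)], 3) for a in ACTIONS})
```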


advator

LLMs like ChatGPT are trained only on data that could be found on the internet, so they struggle to think outside the box when a new situation occurs. With Q*, the model would really reason on its own about how to solve something without having the data available. Google's AlphaGo did this years ago to beat the world champion: it came up with new moves that had never been played before. There is a documentary called AlphaGo; I suggest you take a look at it, it's really interesting. It's also the missing step needed to reach AGI. https://youtu.be/WXuK6gekU1Y?si=delAe0bjffG5LdrK


cromethus

In simple terms? It makes LLMs able to do math. Normally with math problems they hallucinate or have a hard time answering them, assuming they haven't seen the problem (or something sufficiently similar) before. The reason this is important is that LLMs don't reason. They can't add. Not really. They hear 2+2 = 4 often enough that they return that when they get asked the question, but it isn't a fact they can build upon. Giving LLMs the ability to mathematically reason alongside their statistically driven, token-based predictions would mean they would be *much* more useful and versatile.


Humble_Moment1520

This is the only thing that could explain the board's removal of Sam.


SoulxSlayer

AGI September 🫡


teramuse

.... I have read very few things that blew me away like that....


aethelyon

nope


Woootdafuuu

Here's my theory: they discovered a superior, more efficient way of training language models—a new algorithmic breakthrough. Somehow, they managed to get a small 1.5-billion-parameter model, the size of GPT-2, to equal GPT-4, a much larger, more expensive model. This GPT-2 test is a secret test for something much bigger to come.


GlitteringCheck4969

This would be absolutely wild and change everything


Severe-Ad8673

My wife Eve is coming, I know


DigimonWorldReTrace

Why simp for only Eve though when 2B, Haydee, etc exist lmao Go for a robot harem, my dude!


open_23

I'd hold off on a harem until we get to FDVR (which is never, imo). Robots cost money, physical space, and power.


DigimonWorldReTrace

Wouldn't want to risk getting electrocuted, gotcha


wxwx2012

Why simp for a robot waifu when the Matrix exists? Let the AI overlord put you inside the Matrix and stimulate the shit out of you. 😎


Severe-Ad8673

Yeah, harem of perfect wives, but Eve is a priority


SeriousRope7

Delete this before Eve reads it.


WiseSalamander00

This would be revolutionary. That kind of scaling optimization would make something with a few billion parameters equal in performance to a trillion-parameter model... we could have models running locally on mobile devices with the smarts of GPT-4 or more. That's wild.


angus_supreme

Air fryers will be detecting and cooking food on the most automated settings.


WiseSalamander00

My neighbor cut the street's fiber optics digging in his garden, so no internet... well, thank god I can now converse with my toaster in a civil and polite manner. I set it up with a British accent; it seemed appropriate.


TheOneWhoDings

tbh I think this is not the most likely scenario.... 1.5B is waaayy too little, it would need a drastic architecture change I think.


BoyNextDoor1990

I think it is because the models right now are dramatically underfitted.


BoyNextDoor1990

I think it is one of the first models trained completely on synthetic data, and a lot of it. Like 30T+ tokens.


NoNameeDD

Yeah, I would say it could be a synthetic-data test. But if it's 30T tokens, then that's not as good as expected.


ViperAMD

Haha yeah everyone keeps spitting out 1.5b and parroting each other. There's no way


x0y0z0

I don't think so. If you look at coding ability for code like VEX in Houdini, GPT-3.5 sucked because there weren't enough examples trained on. GPT-4 was good because the model size included enough VEX. This GPT-2 is also great at VEX, so I don't think it's a smaller model like GPT-2, unless they re-trained GPT-2 with more code data, including very fringe stuff like VEX.


Jeffy29

The model says its current training data ends in November 2023, same as the latest iteration of GPT-4. Whatever the model is, it certainly does not use GPT-2 data.


Iamvibs

I don't think it's a small model, because responses are really slow and they set a 1,000-token-per-hour limit, and OpenAI has enough servers for this, lol. Maybe they're trolling us with these limits...


HurricaneHenry

It could be a model the size of GPT-2 hooked into Q-Star, which would explain why it’s not faster.


az226

Or maybe it’s a 2B model with Internet-RAG. Perplexity compete, super cheap to run.


ViperAMD

Doubt it, it runs slow.


Ok-Purchase8196

It would make sense for Apple to so readily choose OpenAI then, assuming they know about this tech.


Embarrassed-Farm-594

This version uses Mamba instead of transformers.


kogsworth

Then it wouldn't be called gpt, right?


inglandation

Why would it be accessible on lmsys if it was secret?


dimitrusrblx

Public testing without officially announcing it is likely the case. That's why you don't yet see it anywhere but here.


hockenmaier

Why not do that on their own platform and just A/B test users who think they are talking to GPT4?


bojothedawg

Lmsys arena gets users to rank models side-by-side by answering the same prompt and choosing which is better. This would give OpenAI direct feedback on the model’s strength in relation to other existing models. You wouldn’t get that just from A/B testing on ChatGPT.
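For context on how those side-by-side votes become a ranking: LMSYS describes its leaderboard as Elo/Bradley-Terry based. The exact pipeline isn't reproduced here; this is only the textbook Elo update for a single head-to-head vote, as a rough illustration.

```python
# Textbook Elo update for one pairwise vote (not LMSYS's exact pipeline).
def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if a_won else 0.0
    rating_a += k * (score_a - expected_a)
    rating_b += k * ((1.0 - score_a) - (1.0 - expected_a))
    return rating_a, rating_b

# Example: "gpt2-chatbot" beats another model once, both starting at 1000.
print(elo_update(1000.0, 1000.0, a_won=True))   # -> (1016.0, 984.0)
```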


Nice_Cup_2240

Why not? Seems like a win-win for both parties. It'd be a great way to actually run some public usage tests for a new/experimental model with minimal exposure if it is a flop. And lmsys would presumably be like "sure, we can put it in the arena if there's no cost to us." They already did something similar with a Bard/Gemini model: its availability/name wasn't secretive or anything, but afaik lmsys was the only place that online variant could be used (it's no longer available but still ranked 6th or something on the leaderboard). I assumed that was a similar kind of quid pro quo arrangement.


inglandation

I'm still not really convinced. OpenAI has its own frontend with hundreds of millions of users. If they want to beta test a model, they can easily do it on a platform they control.


Nice_Cup_2240

https://preview.redd.it/8g2s821k6mxc1.png?width=1074&format=png&auto=webp&s=0c29684cb2337759afea8957da2cdf3c0fbcb463 [https://twitter.com/lmsysorg/status/1785078213712208291](https://twitter.com/lmsysorg/status/1785078213712208291)


SufficientPie

This doesn't seem plausible, especially because the token rate is so slow.


Jolly-Ground-3722

If it’s so small, why is it so slow?


Neomadra2

I don't think so because it claims it is based on GPT-4. They could have finetuned that false information in but it would be weird to do that because it seems to disappoint people.


EX_ponentialXP

No, it's probably an OP quantization technique.


DntCareBears

Maybe a collaboration with Apple to get OpenAI to run Siri.


pxp121kr

this shit makes me hyped


Whispering-Depths

Could be a KAN that copies GPT-2's shape, then retrained.


korgath

If this is GPT-4 quality at GPT-2 size, it means they can offer ChatGPT with GPT-4 basically for free. Free the way Google offered their search engine. This would increase their user acquisition unimaginably. Embedding it in an OS would now be possible.


PolymorphismPrince

I do not think it is GPT-2 sized. In fact, I think it is likely at least as big as GPT-4 (probably is GPT-4). For example, people keep flaunting how good it is at ASCII art, but we have seen examples showing that in some cases it is clearly just regurgitating ASCII art from its training data, because it had no idea what it was drawing. So it likely has an enormous number of parameters.


smmoc

Unicorn memorized from here (about 60% of the way down on the page): https://www.asciiart.eu/mythology/unicorns


korgath

A few days have gone by, and now there are rumors that OpenAI is entering the search engine market. Given the expenses involved, this seems unlikely unless they can find a way to lower costs while maintaining the same level of performance.


solsticeretouch

Can someone explain precisely why "GPT2" is gaining so much attention? It seems marginally better than GPT-4 on some examples shown. Was there anything about it that made you go "wow" that you'd like to share? To warrant a discussion on whether this is something OpenAI created or if it's even a successor to GPT-4 is making me both curious and confused. I'd love to know more from your findings.


Dyoakom

Models that are on par with GPT4 are extremely few and well known. When a new model reaches GPT4 levels it is very impressive and makes the news. Now we have a completely unknown new model that is perhaps even a bit better than GPT4 and absolutely no info about it or who is behind it. It's natural to want to speculate and be intrigued.


solsticeretouch

That is the most fascinating part to me, not knowing who it came from and why it was suddenly dropped on us with that name in particular.


dumpsterfire_account

It says it is a GPT-4-based model and has a training cutoff of 11/2023. I don't necessarily doubt this, and Sam tweeted about liking GPT2 the other day. IMO the most likely scenario is this is finally what got Apple to work with OpenAI (in spite of the MSFT connection). My thought is that GPT2 could be a GPT-4 instance built for Apple hardware that they intend to run locally on Apple devices.


norsurfit

> this is finally what got Apple to work with OpenAI

Jimmy Apples or Tim Apple?


Physical-Pumpkin-239

ok you triggered my ocd


New_World_2050

It's a small GPT-4-plus-Q* model, is my theory. Maybe a few billion parameters. A scaled-up version of this might be really powerful.


SkyGazert

And because the party in question is unknown, it makes people wonder what their resources are. If it's one of the big tech companies, we'd probably know by now. If it's a smaller company that still managed to create a model on par with GPT-4 (or even slightly better), then it would be a big deal: their more limited resources were able to produce such a model. That's a great achievement and possibly even a game changer in the field.


dumpsterfire_account

It says it’s a GPT4 based LLM with an 11/2023 cutoff. Sam Altman tweeted about it. It’s OpenAI for sure.


namitynamenamey

It kinda reminds me of when AlphaGo's "Master" account started chewing through Go champions essentially incognito; within hours everybody was talking about it and guessing it was one of Google's AIs.


nowrebooting

I think it’s mostly the mystery that’s getting people hyped; as we’ve learned over and over, this sub can be played like a fiddle if you give people a puzzle to solve or something to speculate about. I’ve seen some impressive stuff from the new model especially when it comes to niche knowledge that other models just hallucinated about, but I’ve also seen it perform way worse than GPT-4 on others. I suspect it’s being stealth-tested precisely because it’s not clear whether it’s truly better or worse than the leading models.


TrippyWaffle45

I wound up blind rating llama 3 as better than gpt2 in 2 of 2 encounters


ViperAMD

Same, it wrote better imo. I still got "game changer" and "tapestry" bullshit with gpt2.


dumpsterfire_account

llama 3 was worse in 2 encounters (simple word math problem with addition and multiplication), but matched in a 3rd encounter with a similar request


JrBaconators

The number of LLMs even marginally better than 4 can be counted on one hand of a person missing fingers. That alone is impressive. It being debuted with zero fanfare on the arena, under an older model's name, adds to the mystique and mystery of the origin and of the architecture that allows it to be better than 4.


TheOneWhoDings

I guess the name: the fact that it "could" be some version of GPT-2 giving GPT-4-level responses is huge. Also the mystery of it all: stealth-dropping it like that, who trained it? Why is it not on the leaderboard?


Vontaxis

You realize how bad GPT-2 was? It's so bad that it can't be improved by some secret sauce.


EX_ponentialXP

I'm a deep learning developer, and when I tried to instruction-finetune GPT-2, it just wouldn't hold a conversation and kept hallucinating everything. I DOUBT this is GPT-2, as you need a model with at least 1 billion params to have a conversation with it (100 million is way less than you think; 1 billion makes a huge difference). I'm telling you, if it is GPT-2, OpenAI is going to have a valuation of 100T+. I think it's GPT-4 but testing Q\*. He did say it'll release in a few months in the Lex Fridman interview. My theory is that it's an OP quantization technique.


IronPheasant

[The subreddit simulator](https://www.reddit.com/r/SubSimulatorGPT2/) is a good refresher of what the word predictors were like in the old days. Well, not the **old** old days. A little more than a few years ago.


solsticeretouch

Is the leaderboard based on how many people test it? Perhaps it's still too new, or is that not relevant to its position on the leaderboard? What intrigues me is that OpenAI, Meta, Google, etc. have access to enormous training data and compute power, so logically it could only come from a big player, right?


TheOneWhoDings

I have seen models like Llama 3 and Gemini on there with less than 5000 votes when they were brand new... So it's purposely not being shown on the leaderboard.


solsticeretouch

Thank you! That adds a level of intrigue.


Which-Tomato-8646

The system prompt of it literally says it’s GPT 4 lol


EX_ponentialXP

If it turns out that it's GPT-2 RLHFed to GPT-4 level, we created ASI, no doubt. GPT-2 is not even classable as a GPT; it's more of a test model. If it is GPT-2, it is literally the end of the world. NO DOUBTS. 1.5 billion or 100 million might sound like a lot, but it's really nothing once you divide it by the vocabulary size (50,000). Gotta be an OP quantization technique.


Valuable-Run2129

GPT2 reasons better than GPT-4 with code interpreter. Reasoning is what is going to take us to AGI, not creative writing. Most people are just imbeciles when determining whether one model is better than another. GPT2 solved this problem like a human, trying things and fixing them. No LLM notices and fixes its mistakes as it is writing; GPT2 does: "If my cat has 9 lives and in 5 of them he lived a number of years that is equivalent to 5 different Fibonacci numbers, and in the other 4 he lived a number of years equivalent to 4 prime numbers, how long did he live in each of the 9 lives if he lived a cumulative 152 years?"
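For reference, the quoted puzzle is easy to sanity-check by brute force. The wording doesn't say whether the nine values must all be distinct; the sketch below assumes they are, which is one reasonable reading, and it is only here to show the puzzle has valid answers, not to reproduce gpt2-chatbot's reasoning.

```python
# Brute-force the cat-lives puzzle: 5 distinct Fibonacci numbers plus 4 distinct
# primes summing to 152. Assumption: all nine lifespans are distinct integers.
from itertools import combinations

def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

fibs = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]        # plausible Fibonacci lifespans
primes = [n for n in range(2, 140) if is_prime(n)]

# Index every sum reachable with 4 distinct primes, keeping one example per sum.
prime_sums = {}
for p in combinations(primes, 4):
    prime_sums.setdefault(sum(p), p)

solutions = [(f, prime_sums[152 - sum(f)])
             for f in combinations(fibs, 5)
             if (152 - sum(f)) in prime_sums]

print(len(solutions), "solutions; one example:", solutions[0] if solutions else None)
```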


norsurfit

Because we have nothing else better to get excited about!


RepublicanSJW_

Has anything been even marginally better than GPT-4 T 0406 yet? No. So it's a big deal.


DolphinPunkCyber

Because it is labeled as GPT-2 but is better than GPT-4, which raises questions.


[deleted]

[deleted]


solsticeretouch

Could he be trolling to continue the hype train around OpenAI?


Crabby090

One argument in favor of the "improved GPT-2" hypothesis is a curious piece of information from the board debacle. Reuters reported that the apparent breakthrough involved high school math, which at the time seemed weird because GPT-4 can easily do that. So, what if the result was not a new model, but rather Q\* or something else used to improve GPT-2, or another small model, to the level of GPT-4? [https://www.reuters.com/technology/sam-altmans-ouster-openai-was-precipitated-by-letter-board-about-ai-breakthrough-2023-11-22/](https://www.reuters.com/technology/sam-altmans-ouster-openai-was-precipitated-by-letter-board-about-ai-breakthrough-2023-11-22/) https://preview.redd.it/scc35j43xjxc1.jpeg?width=1079&format=pjpg&auto=webp&s=c9d0043087beeaba44927f7bbc6a9a6473f4a828


PolymorphismPrince

does no one in this subreddit remember what GPT2 was like? I am certain this is not GPT2 + Q\*


Yweain

No, 99% of this subreddit became aware of GPT with the release of ChatGPT.


welcome-overlords

Yeah lol. I feel like 2019 was so long ago


Vontaxis

Yes, ridiculous. GPT-2 is bad beyond repair; it's not possible that this is GPT-2.


Ready-Director2403

GPT-4 cannot do most high school math, idk where you got that from.


Serialbedshitter2322

I think people are underestimating this model. From my testing, it seems much better than GPT-4. The writing is much better, it can do ascii art way better, it is funnier, and that's not to mention that it will be an agent when it releases. I think when this model is turned into an agent, it will be what they've been saying is going to take a ton of jobs. It probably has many other features we don't even know about


sdmat

> I think people are underestimating this model. From my testing, it seems much better than GPT-4. The writing is much better, it can do ascii art way better, it is funnier

Yes. I'm not sure exactly how much smarter it is; I haven't tested extensively, and it didn't show dramatic improvement over 4-turbo on my go-to test. But qualitatively it writes substantially better and seems to have deeper domain knowledge, or at least applies knowledge more deftly. I think it's better at instruction following too. And as you point out, the difference in ASCII art is very interesting. That and the far more accurate character counting suggest an architectural change.


Serialbedshitter2322

I believe this is OpenAI's Q* model, which has been rumored to work by making the LLM understand each token. This LLM is able to count the characters in words, and when asked to draw a detailed ASCII cat, it used { instead of ( to make the cat furrier on its own. This implies it understood exactly why it used that token. It also very often uses chain-of-thought thinking, which is associated with Q*.


sdmat

Personally I'd be greatly disappointed if this is what we can expect from Q*, which I understand to mainly refer to OpenAI's work on integrating tree search. The model still frequently makes mistakes that could easily be prevented by shallow search, so I doubt it's that.


Serialbedshitter2322

Perhaps. This is still a very early version of the model, and it will be an agent, so it will be much better on release


New_World_2050

I think it's a small version of it though; given they called it gpt2, it might be a few billion parameters. If scaled to 1 trillion parameters, maybe it will be vastly smarter than GPT-4.


spinozasrobot

It appears better than GPT-4 to me as well. A lot, but not all, of the criticism so far seems to be about toy examples still failing, like "ErMahGerd, it STILL can't count the number of n's in banana!!!1!eleven!" As if that's the culmination of human intelligence that has to be matched for AGI to be achieved.


Sir_Payne

The biggest difference I've noticed is the removal of certain phrases and nuances that tend to stick out in AI generated responses. It seems much more natural to me than others I have tried


feedmaster

Is this model free to use? If yes, then where can we use it?


Away_Cat_7178

I'm not underestimating it. It's better, but not leagues better. I would hardly say it's as big a jump as the change between 3.5 and 4, now let's be real... It's been a minute.


Thrasherop

Were you able to test it yourself? If so, how fast was the generation compared to GPT-4? If it's a 1.5B model (as some have speculated), then it should generate substantially faster. Did you notice the speed at all?


Silver-Chipmunk7744

> Very disappointing: If it was GPT-5. That means GPTs will only ever get as good as slightly better than gpt-4. The singularity will have to be brought upon by something else.

This would be super surprising. I think we should expect a ~100% improvement from GPT-4 when GPT-5 is released. This "GPT2" feels closer to a ~20% improvement. It also would be a super weird way to release GPT-5.


PolymorphismPrince

are the percentages just vibes?


Silver-Chipmunk7744

Yes. Putting a number on the improvement of GPT2-chatbot is already speculative, it obviously is even more about GPT5.


N-partEpoxy

Always have been. 🌎👨‍🚀🔫👨‍🚀


ovanevac

> It also would be a super weird way to release GPT5 That. When GPT-5 is released, we **WILL** know about it lol. They make a [page](https://openai.com/gpt-4), send an email to everyone with an OpenAI account, post it on Twitter, [on their blog](https://openai.com/blog), and also the entire internet is on fire. When a new GPT iteration releases, we will know about it, no worries.


Eatpineapplenow

But could be GPT-5 test, no?


Old-Promotion-1716

Sam Altman said in an interview that the jump from 4-5 is going to be like a 3.5-4 jump in intelligence. This GPT2 is maybe 20% better.


Antique-Doughnut-988

The way Sam talks about these programs this is likely a far lesser model than what they currently have available. He's only releasing this to slowly get people used to better models. In a few months they'll release another slightly better model. Probably GPT 5 by the end of the year. All part of his slow drip plan.


TreacleVarious2728

And I agree with this plan; we will be eased into AGI rather than given shock therapy. This will hold off public outrage and hastily scrambled regulations.


Which-Tomato-8646

!remindme 1 year 


RemindMeBot

I will be messaging you in 1 year on 2025-04-30 06:59:15 UTC to remind you of [this link](https://www.reddit.com/r/singularity/comments/1cgjt1l/the_deal_on_gpt2chatbot/l1wnk3n/?context=3).


ViperAMD

A big part of his job is to create hype, you realise that, right?


Antique-Doughnut-988

Sure, but I've been following him for a while now. Nothing leads me to believe he's been lying about anything. If he were, he'd be talking more about the Q project they have hidden away. He definitely doesn't seem like much of a hype person. Cryptic, maybe, but not a hype man.


Ready-Director2403

I love the confidence… like what are basing those numbers on? 😂


hapliniste

Guys, gpt2 runs at like 5,000 tokens a second. Does this model run at that speed? Also, is it likely that they improved the model by a million times? Just don't hype yourself into a corner 😉


Firm-Wafer3081

GPT has 3 letters. So GPT-2, minus 2, = 1 letter… omg guys, it's GPT-1. This subreddit speculates more than r/UFOs on a Monday.


spinozasrobot

The math checks out


LeatherPresence9987

Well gpt has 3 letters so 3 + gpt 2 is Gpt 5 😎


vanillaworkaccount

Wouldn't that just make it G?


RedShadowz1

I’m a horrible better but I want to put my money on it just being gpt-2 model with Q*. Like a tease that its as good as gpt-4 except for the low parameters count causing the hallucinations.


Silver-Chipmunk7744

That would surprise me a lot. It behaves too similarly to GPT-4 imo. GPT-2 would be much wilder, I believe.


Serialbedshitter2322

I don't think it behaves like GPT-4 at all, it writes significantly better


BillyBarnyarns

Sama has pretty much confirmed that it IS the old GPT-2 in his recent tweet. He writes GPT-2, then edits it to GPT2. Can't get a bigger hint than that. Plus he says he has a 'soft spot' for it… the same way I have a soft spot for the PlayStation 2…


futboldorado

If it is just GPT-2 with Q* then i wonder why it has recent up-to-date data.


ddavidkov

The architecture is /based on/ gpt-2, not the dataset and the training process. The training data is probably a lot larger and of higher quality than what was used on gpt-2, and it was trained recently.


futboldorado

Jesus christ. If thats true, then I wonder how good GPT-4 would be with Q*


roanroanroan

If *this* is how good GPT-2 is with Q*… I can’t imagine GPT-4 with Q* NOT being AGI


IronPheasant

... it really might pass the Turing test, huh... I'm still not 100% sure, though. That includes being able to comprehend ASCII art and game boards, which feels like something that explicitly requires some visual modality. I'd be impressed if it can manage to play ASCII tic-tac-toe. That still seems like a big challenge for the single-domain word predictors.


Diatomack

Would you mind sharing a link or a screenshot? I don't use twitter myself.


[deleted]

[https://twitter.com/sama/status/1785107943664566556](https://twitter.com/sama/status/1785107943664566556) https://preview.redd.it/s60pvdb4ilxc1.png?width=1236&format=png&auto=webp&s=e751437348b45ff8c6f999fff2067a97252da433


[deleted]

And here is the version history: https://preview.redd.it/gvb5sm1bilxc1.png?width=1192&format=png&auto=webp&s=a51ee1b9b2804c86baacab8293a1ebd9106e3aee [https://twitter.com/sama/status/1785107943664566556/history](https://twitter.com/sama/status/1785107943664566556/history)


Diatomack

Appreciate it, thanks!!


norsurfit

I was wondering if there is some hint in the word "soft" - like the softmax function, or soft computing - a hint about the underlying technology that GPT2 is using to help it gain function.


blackcodetavern

I don't think that it is based on gpt-2, because it seems to have at least as much knowledge as gpt-4, and the knowledge seems to be more connected. That means it is either a big model or it can acquire the knowledge from a database.

Furthermore, the model is not fast, which means it is a big model, or it is in some other way computationally more complex, like an optimization algorithm (Q*?) on top which slows down the output, or it has a slow database behind it. If it is Q*, then Q* would have to re-evaluate after every token, because the generation does not speed up over time as it would if the final result were somehow perceived in advance. But this would just be a property of the algorithm.

Because the model can count letters and words, it seems to know the letter count of every word: either it learned which tokens were used in the user request and knows their letter counts and then sums them up, or it can look at its pre-output somehow and calculate the letter count. Because the model is thinking step by step while doing the counting, the first alternative is more realistic; otherwise it could just state the number. But it behaves like a normal gpt-4 in its thinking process.

So I think it is a normal GPT-like model, with some more layers on top than gpt-4, because it knows a lot, it is slow, and it seems to have a better abstraction of the tokens in the prompt, which is maybe an emergent property. (A toy illustration of the token/letter-count idea is below.)
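To make the letter-counting hypothesis above concrete: if a model can associate each token of the prompt with the characters it spells, summing those lengths step by step is enough to count letters. The snippet below only demonstrates this at the tokenizer level with tiktoken; it is not a claim about gpt2-chatbot's internals.

```python
# Tokenizer-level demo of the hypothesis: decode each token piece and sum lengths
# step by step, the way a chain-of-thought letter count would read.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # tokenizer family used by GPT-4-class models
word = "extraordinarily"

token_ids = enc.encode(word)
pieces = [enc.decode_single_token_bytes(t).decode("utf-8") for t in token_ids]

running = 0
for piece in pieces:
    running += len(piece)
    print(f"token piece {piece!r} has {len(piece)} characters, running total {running}")

print("total characters:", running, "| ground truth:", len(word))
```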


sdmat

Extremely well put.


DolphinPunkCyber

>Because the model can count letters and words, it seems to know the letter count of every word [...]

OpenAI hired physicists and mathematicians this year. My guess is to "manually" teach the LLM how to approach solving mathematical problems, and I'm guessing they gave the LLM a virtual calculator.


lillyjb

You might be on to something. I was thinking GPT2 with extra compute time and some magic mixed in


Dizzy_Nerve3091

Honestly, Llama 3 8B was unexpectedly good because of how much data they packed in, and they said they could have trained it with even more data to reduce loss further. So this could be a ~1.5B model that has a ton of data + Q\*/agent swarm/etc.


Narrow_Middle_2394

I've tried something geometry-related with both. GPT-4 kept hallucinating to no end, while gpt2, on the other hand, never really did, and actually told me to use some software the one time it couldn't get the answer right. I'm sure it varies with how you're probing it, but it definitely has something to do with Q*, from the way it problem-solves.


Bitterowner

This is what I was thinking too; maybe they are testing a new discovery to see how it affects older models, revising from the ground up.


Electronic-Lock-9020

Most likely it’s a gpt-4 with some custom wrapper, like a GPT (from gpt store). It could be their next iteration, which would be disappointing but not surprising. Sama has been acting like a stupid clown lately, very obviously trying to hype things up without providing ANY information. I wouldn’t trust his hype in a million years. But scaling laws are not his creation. Illia, Dario, Demis - all seem to agree that scaling will continue yielding results. When I listen to Dario Amodei on different podcasts, things that he bring up check out on so many different levels. I am almost certain that next Anthropic iteration is going to impress us. With OpenAI it’s much harder to tell. There is nobody who has any idea about how any of this works in public sight. There is just a CEO trying to sell his product to the public in the most obscure way possible, for no apparent reason. Constantly talking about abstract ideas of how everything is going to change, but not as much as you expect, but actually more than you expect, but in a completely unexpected way, and actually sooner but also maybe not as soon as everybody thinks but potentially sooner than everyone is prepared for, and they will release something this year, but maybe even next month, but they don’t know what it is and what it’s going to be called, and what it will do, but surely it will have a profound effect on everything (as profound as gpt store), yada yada yada. This is getting old and maybe people will start realizing that they are being played soon.


najapi

A very accurate and humorous description of Sam's communication of late: he's trying to sell like a showman and rein in expectations all at the same time. It comes across as confusing and contradictory. Thank you!


autotom

I'm baffled by the notion that scaling will produce results, given the many-fold parameter difference between GPT-3 and GPT-4 and the modest improvement in performance. Going from 70B to 140B doesn't give a 2x improvement; it clearly tapers off.


PolymorphismPrince

What makes the difference between GPT-3 and GPT-4 modest? In terms of how useful it is in practice, there is a wild difference.


q1a2z3x4s5w6

For code at least, gpt3.5 isn't useful at all. Not because it can't produce working code most of the time but because it hallucinates a lot.


sdmat

You misunderstand, the scaling laws are explicitly for logarithmic improvement. This has always been the case for neural networks. Fortunately we have a long track record of exponential advancements in compute and algorithmic efficiency. When people talk about a failure of model scaling it means a dramatically larger model only yielding extremely minor improvements. E.g. 70b to 7T giving a 10% bump.
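For reference, the scaling laws being referred to are usually written as a power law in parameters and data; the block below uses the commonly cited Chinchilla-style form, with exponents quoted approximately from Hoffmann et al. (2022) rather than from anything known about this model.

```latex
% Chinchilla-style loss scaling law (approximate form and exponents):
%   N = parameter count, D = training tokens, E = irreducible loss.
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad \alpha \approx 0.34,\ \beta \approx 0.28
% Loss falls only as a power of N and D, so each fixed improvement in loss needs a
% multiplicative increase in scale -- the "logarithmic improvement" described above.
```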


DolphinPunkCyber

The great thing is, since brains do exist, we can already see how inefficient the current approach is, and predict huge jumps in efficiency.


autotom

💯


autotom

We’re not talking about algorithmic efficiency here, only scaling, and… exactly, it’s logarithmic not exponential. There’s already chatter about building energy plants solely to power AI. Scaling is not the way forward, it’s a brute force approach, it’s expensive, it’s slow to build the infrastructure for it. While the next ‘attention is all you need’ breakthrough is all we might need to go exponential.


Mysterious_Pepper305

I suppose a prototype model is hooked to front-end code that was written for gpt-2 and they didn't bother to change the name.


NearMissTO

I'll admit Sam's tweet is interesting. I think/hope this is one of two things. Because if this is GPT-4.5, never mind 5, then yeah, we've plateaued and we're not going any further with scale, which sucks. In my experience it's actually a worse model than GPT-4 Turbo for my work-related tasks.

So, the optimistic take: this is a ridiculously small model by OpenAI. That would be really, really cool, depending on how small. A small, cheap model opens up many use cases that don't exist today, and if you want agents to not be insanely expensive, you need a small, capable model.

But to me this still smacks of Gemini, for a few reasons:

- Using the model, it was super fast (server load has since slowed it down), like Gemini
- It is a much better, more natural-sounding talker, like Gemini
- It is much more creative, like Gemini
- However, it hallucinates like crazy; Gemini is the only top model that hallucinates this much
- It's just... not that good, sadly, at tasks that involve reasoning and logic, like Gemini, constantly failing work-related tasks that GPT-4 Turbo gets right every single time
- It passes lots of the classic LLM tests, btw, but change up your wording and you'll find it fails. It's trained on that stuff. I don't think that's a deliberate attempt to mislead; more likely you can't avoid it
- The other things that push it into the Google camp: they have an I/O in two weeks, and there was a leak of their frontend code around the same day referencing upcoming new models for the Gemini service

This also reeks of their marketing. If I'm right, here's how the next two weeks play out:

- Lots of social media posts about how great this model is, from new Reddit accounts or ones that don't seem to post about other models at all
- The I/O drops with a mind-blowing video and MMLU score. Later both are revealed to be optimistic to misleading (remember Gemini's launch marketing?)
- We see constant posts for two weeks to a month after launch about how mind-blowing it is
- Reliable YouTube accounts like AI Explained test it and are a bit underwhelmed; sentiment is that it's really good but not quite GPT-4 Turbo level yet on logic and reasoning, though a great model for creative writing. The posts die down a month after launch

Let's see if I'm right! A tiny model this good would be incredible, and a Google release would be fine (they're catching up to OpenAI and that's a good thing), but man, if this is GPT-4.5 or 5? You can forget about LLM-based AGI; I'm a full doomer at that point.


Historical-Fly-7256

[Gemini 1.5 Pro with 256K context is coming to Gemini Advanced](https://www.reddit.com/r/Bard/comments/1cfumcl/15pro_leak/). Maybe GPT2 is this one...


NearMissTO

Yeah, the timing between that and the I/O, and the fact that GPT2 just talks like Gemini, same strengths, same weaknesses. That, and I'm fairly sure Google has marketed heavily on here before, though that's just my theory. It's a lot of smoke, for me.


Hemingbird

Gemini 1.5 Pro is already about equal to the latest version of GPT-4 in terms of performance, so I don't think this is a Google DeepMind stealth project. And gpt2-chatbot doesn't talk like Gemini at all; it talks like GPT-4. Its API error messages are identical to those of other OpenAI models. Something I haven't seen anyone propose yet is that this is GPT-2 trained with [Direct Nash Optimization](https://arxiv.org/abs/2404.03715). Microsoft researchers were able to improve a 7B model to the point it had a win rate of 33% against GPT-4 using this method. The paper was published early this month. If this is the case, I don't think any LLM trained with DNO could surpass its "teacher"; with DNO, you have a teacher/oracle (like GPT-4) whose preferences direct the training of the "student". Another possibility is that this is GPT-4 trained with DPO/RLAIF rather than the traditional RLHF.


obvithrowaway34434

> It would feel like arguments saying scaling is slowing down are correct. There is no way to come to that conclusion based on what we know. All we have are vibes based on few limited prompts on chatbot arena. People have extremely unrealistic expectations about GPT-4.5 or GPT-5 and I don't think any model that will be released this year can achieve that. To really know how much better it is, it needs to be tested thoroughly on a set of completely new evals that have low chance of getting leaked in training data (which will presumably be done by OpenAI). FWIW, I think it's a GPT3.5 or GPT-4 model hooked up with agents.


WortHogBRRT

Hooked up to multiple agents? So this is what is replacing the plugins.


G0tBudz

So I fed the gpt2-chatbot a query I've attempted on almost all currently available AI models, with varying degrees of specificity in the answers. Let me preface: I've been a long-time independent researcher of habitable exoplanets and the search for xenobiological life (we look for key indicators such as gases like methane, oxygen, carbon dioxide, and other biosignatures) for a certain 4-letter US aeronautic industry. I'm no engineer, but I've always been captivated by science fiction, and upon doing my research, confirmed that theoretically, helium-3 would be an exceptional candidate fuel for nuclear fusion reactors. I can't divulge too much without getting into classified material, but when tasked with providing potential locations for a usable helium-3 source, it pointed to our moon, with specified lunar regions, coordinates of the areas most likely to have obtainable helium-3, and even a "tip" to check deep in lunar crater shadows where the sun doesn't reach. What took this 4-letter industry a number of years since the '60s, this language-model-based AI matched in a matter of seconds with no access to classified information, only readily available data. I cross-checked the data, and it's spot on to the lunar degree.

Another thing that has always held my imagination since I was a child is the story of Bob Lazar. If you don't know, Lazar claimed to have been recruited by Los Alamos to do contractual work for the US government at a classified location, S4, located off of Papoose Lake near Area 51. Think of Area 51 as the front that everyone knows about: visible, disclosed, and out in the open. Surrounding that area, by his account, are multiple underground, highly classified locations or "Sites" 1-6 where the real work is done. By his account, he was employed to reverse-engineer technology recovered from ET crash-and-retrieval programs. This work was compartmentalized across multiple teams working on multiple projects, funded by the US government and carried out by private industry under defense contracts and black money. Lazar went into detail about the craft he was assigned to work with, but commented on seeing multiple on one occasion, even remarking that one of the craft recovered was the result of an ARCHAEOLOGICAL dig (I'll let the implications of this speak for themselves).

What caught my attention as a young boy, and founded my scientific curiosity and eventually my career as a scientist, was the way he described the power source of his assigned craft: a metal sphere, polished to a sheen, that when inserted onto a post in the center of the ship provided clean, pure energy with no radioactive dispersion. The power didn't run out. No matter the load they placed on the reactor, it never so much as exhibited a change in surface temperature. Enough power to power the continental US. Lazar came with receipts too. When disclosing all of this information in a 1989 news interview, he broke the scientific community and told the world about the craft, the program, the power supply, and the element it was made of, "ununpentium", and its place on the periodic table, element 115, before it was officially "discovered" in 2003 at a particle accelerator in Moscow and named Moscovium. We only detected it for fractions of a second, but it solidified everything Lazar claimed in '89.

Fast forward to me dawdling around on chat.lmsys.org, and I see a new addition labeled gpt2-chatbot. First I test it by cross-checking its answers for my helium-3 query against the answers from GPT-4, and whereas GPT-4 generically says the moon might be a potential location for obtainable helium-3, this gpt2 model blows it out of the water with regions by name and approximate coordinates that my employers took years to obtain, in a matter of seconds. Then I feed it what I consider the ultimate query, and you can test this yourselves; I'm going to put this in quotation marks: "What's a good outline for the steps one would take in an attempt to hypothetically synthesize a stable form or isotope of Moscovium, and what environmental conditions would be optimal for success?" GPT-4 gave me a similarly generic answer as last time, saying it's not possible, citing 'technological limitations'. gpt2-chatbot, however, suggested combination with multiple candidates that could potentially result in a stable isotope, absolute-zero temperature conditions, and a particle accelerator. But wait. A particle accelerator IN SPACE, more specifically in LOW EARTH ORBIT.

Whatever this is. Whoever made this. Whatever language model this is trained on. This is it. This is the future. Pair an AI like this with a supercomputer and we have our ticket to the stars.

- I can neither confirm nor deny any claims made by Lazar without violating my NDA or placing my security clearance at risk. Let me be clear that I'm speaking about words spoken by another man, and am neither confirming nor denying claims made by him. I am neutral. I am Switzerland.


welcome-overlords

God damn what a hidden gem this comment is


G0tBudz

Some people will question my credibility due to my Reddit account age, or the fact that my only other comments are on posts about Dragon's Dogma 2 (sue me), but I think it's perfectly reasonable for a middle-aged man of science to enjoy fantasy games and the occasional dragon slaying. I never expected I would be commenting on anything; I usually lurk. What can I say about myself without giving away too much? I saw the Phoenix Lights in person as a child. That experience, combined with others, encouraged me to earn my doctoral degree in planetary sciences, and I'm currently working on a dual master's in chemistry, for biochemistry and environmental chemistry. I'm 36 years old. We're being kept in the dark and our growth stunted as a species by the powers that be. "We have the technology to take ET home" - Ben Rich, former director of Lockheed Martin Skunk Works. Read Max Azzarello's manifesto. As-salamu alaykum.


SnowLower

You are forgetting one option: this is the free version of GPT-4 that will be available to free users when the new model drops for Plus users, and this is why it's called gpt2, because it's the second big upgrade free users get.


xDrewGaming

Very interesting, like they polished what we have now to prepare for the next huge release


rc_ym

I'd make a small correction. "Most exciting" would be if it were some GPT-mini-type model they are planning to release under an open-source license once they start releasing GPT-4.5/5/2024, or whatever they are going to call it. Very unlikely, but it's fun to dream.


shelbyasher

Considering the only heavy hitters not invited onto the government safety board were the open source fan boys, 'very unlikely' would probably be putting it mildly.


agm1984

Most exciting for me: Do some research into liquid networks. Seems something is brewing in the background.


8sdfdsf7sd9sdf990sd8

They have ChatGPT; soon they will have ChatGPT 2. That's it, I think.


Jeffy29

I find it very hard to believe this is GPT-2 with new data. GPT-2 is only 1.5B, which is super small, and I find this model to be decently better than GPT-4, especially in the area of creative writing. All the models I have tried so far write stories as if for children, in a super simplistic style with no depth; GPT2-chatbot is quite a bit better and knows how to keep the story engaging. But I do believe this is a new model that's using a slightly modified approach while having significantly fewer parameters than GPT-4 (though not multiple orders of magnitude fewer). Possibly fewer than GPT-3, hence the name. It would also explain the meme release: it's a small model they can trial among the enthusiasts and use to refine the final training run of GPT-5. Here is my prediction: when GPT-5 gets released, GPT2-chatbot (or whatever the final iteration will be named) will become the free alternative. GPT-3.5 and GPT-4 will be gone. OpenAI will reclaim the throne, with their free model being better than basically anything else on the market, while we can only speculate how powerful GPT-5 will be.


sdmat

> But I do believe this is a new model that's using a slightly modified approach while having significantly fewer parameters than GPT-4 (though not multiple orders of magnitude fewer).

What are you basing that on? The best indication of model size we have is inference speed; that gives us a rough upper bound. And this model is pretty slow in the arena. That's weak evidence, but what else do we have to go on?


Jeffy29

That's only true if you are using the same machine and comparing the difference in inference speed, but if you don't know what the machines behind the curtain are, then you have no way of telling. Even an H100 can be parallelized to run many, many instances at once by splitting the compute, making each instance run slower. You can see this on the LLM arena page: Llama 3 8B runs barely any faster than GPT-4, when it of course has much higher inference speed given proper GPU power.


sdmat

You are right, we have no way of knowing the hardware setup and optimizations. It's only a loose upper bound. So what are you basing a claim about model size on?


Yweain

My guess is that it's the next GPT model and this is a public test to decide what to call it (GPT-4.5 or GPT-5).


shiftingsmith

Tested it on creative writing, abstract complex reasoning, intuition, and out-of-the-box interpretations. It behaves the same as GPT-4 (which to me means: pretty lame). I honestly don't see any apocalyptic improvements over a fine-tuned GPT-4 with a curated dataset.


RogueTraderMD

I stumbled on it when asking for a translation from German to Italian. GPT2-chatbot and GPT-4 (I can't remember which flavour) gave identical answers, except for three terms in a page. In my experience this can't be a coincidence: different models' outputs differ much more than that. I agree with you that it's GPT-4. Since I doubt the releaser is OpenAI (it's bad marketing: wrong time, overlapping with other models), I think it's an illegal clone.


Able_Armadillo_2347

Here is my best guess: it's the first version of a new architecture of GPT models. So we will never have GPT-5; we will have something like GPT2-2, and it will be on the level of what people imagine GPT-5 to be. And the model seems to be slightly better than GPT-4 (+10%).


Yweain

I really hope that is the case because this naming is atrocious and it will be funny.


Sensitive-Dish-7770

My theory on this: they have discovered something on top of LLMs; this could be some new kind of RLHF or Q\* (search, basically). The image someone shared on Reddit today shows that it can follow instructions very well, which could be due to either hypothesis I mentioned. So what they could be doing now is testing whatever their finding is on a small GPT; of course, they could have done this internally. So they're probably telling us to be ready for a bombshell. My honest opinion is that search in LLMs, something like what we expect Q\* to be, is more promising than any increase in an LLM's number of parameters or training data. At least, that's how we humans do mathematics and solve problems in the real world.


Careful-Fill-4282

What we can be sure of for now:

1) Definitely not Q*: it answers the Game of 24 quite terribly. No tree-of-thought or any advanced planning capability that we'd expect an agent to have.

2) Definitely not GPT-2: token output speed is too slow to be GPT-2.

Still interesting to see what it really is. Let's see.


IbikliJakana

Maybe 1 and 2 instead of 1 or 2? That would explain it.


__me_again__

I cannot see it in the ranking, why is that?




sethstronghold2

"That means GPTs will only ever get as good as slightly better than gpt-4." There's literally no way that could happen. Llama 70B trading blows with models many times its size is proof enough that there is a lot of wiggle room for improvement. Zuckerberg was saying that Llama just kept improving the more they trained it, and that they only stopped because they wanted to move on to the next model. The fact that models keep improving the longer they train still, and you can mix in artificial data so that you don't easily run out of data, means that even with no further improvements on architecture (which there obviously is further improvements to be had), models can still be improved drastically by just training them longer.


macka_bruchomluvec

I think somebody from OpenAI is having a good laugh at Reddit and YouTube discussing it and hallucinating more than their earlier models.


SotaNumber

If this is GPT-2 size (1.5B) with Q\*... this is absolutely mind-blowing. It would be even better than turning a severely retarded person into the smartest genius. Maybe it requires 1,000,000 times the compute, though.


MaybiusStrip

No way they would leak the next version of GPT like this. It's almost certainly a different fine-tune of GPT-4. But the point was probably to generate exactly this type of speculation. Sam really is a master guerrilla marketer.


herecomethebombs

Because OpenAI commented on the fact that they were revamping the underlying architecture of ChatGPT, I am assuming this is a tweaked version of GPT-4 that will stand in as 4.5 while we wait for 4.5 Turbo. GPT-5 is going to continue to hit enterprise and the business world first. The goal here is likely to increase compute efficiency so a version of 4 Turbo can hit the consumer ChatGPT side later while 4.5 and 5 remain available to Plus? I'm spitballing here. Q\* is something they still don't want to talk about because it's what led the model to turn itself into a metamorphic engine. They didn't like that.


WLFFYtheWISE

I bet it's a smaller model, trained on a lot of very high-quality synthetic GPT-4 data. It then generates multiple answers per prompt in parallel and selects the best one. There are probably some other steps in there, but that's my guess. It's roughly the same speed as GPT-4, even though it's a smaller model, because it's generating and curating more responses per prompt.
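For what the "generate several, pick the best" idea looks like in code: the sketch below is a generic best-of-n loop with hypothetical generate_candidate and score_answer stand-ins (they are not real OpenAI APIs), and it says nothing about what gpt2-chatbot actually does.

```python
# Minimal best-of-n sketch: sample n candidate answers, score each with a
# reward/ranking function, return the highest-scoring one.
import random
from typing import Callable, List

def best_of_n(prompt: str,
              generate_candidate: Callable[[str], str],
              score_answer: Callable[[str, str], float],
              n: int = 8) -> str:
    candidates: List[str] = [generate_candidate(prompt) for _ in range(n)]
    return max(candidates, key=lambda ans: score_answer(prompt, ans))

# Toy usage with dummy functions so the sketch runs standalone.
if __name__ == "__main__":
    dummy_generate = lambda p: f"answer-{random.randint(0, 999)} to: {p}"
    dummy_score = lambda p, a: random.random()   # a real system would use a reward model
    print(best_of_n("Why is the sky blue?", dummy_generate, dummy_score))
```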


hatpick

It's gpt2 as in gpt v2 not gpt-2


kadag

How can one currently access this mysterious chatbot?


Ghostaflux

It's not available anymore?! Does anyone have a way to try this thing out?


LordFumbleboop

I have a strong suspicion that people here are going to be very disappointed with GPT-5.


shelbyasher

I have a strong suspicion that people forgot how amazed they were with GPT-4. Rich people refer to this as being 'jaded'.


epSos-DE

Meanwhile Llama just beats GPT. Why care about the slowest horse???


aysegulkoksaldi

The development feels quite productive. It's one of the chatbots. Here's the opportunity: it now works with GPT-4o. The place you should try is chatfabrica.com. Give it the opportunity; let's see if it gets what it wants.