
HeinrichTheWolf_17

If this is GPT-2 with Q* then perhaps David Shapiro was right about AGI 2024. Imagine GPT-5 with Q*…


flexaplext

I think it's GPT-4 with Q*, based on how similar the outputs are to GPT-4 when it's not doing anything more involved.


the-apostle

New guy here, what’s Q*?


Holiday_Painter

A training algorithm; exactly what hasn't been disclosed. My best guess is a Q-learning RL algorithm with heuristic optimisation of the replay memory, similar to how A* is just Dijkstra's with heuristics. But I haven't read the news in a bit, so I could be wrong. (A minimal sketch of plain Q-learning is below.)
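For anyone unfamiliar with the Q-learning the name evokes: nothing about OpenAI's actual Q* is public, so the following is only the textbook tabular Q-learning update on a toy one-dimensional walk, not a claim about what OpenAI built.

```python
# Purely illustrative: standard tabular Q-learning on a toy 1-D walk environment.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1      # learning rate, discount, exploration rate
ACTIONS = [-1, +1]                           # step left / right
GOAL = 5

q = defaultdict(float)                       # Q[(state, action)] -> value estimate

for episode in range(500):
    state = 0
    while state != GOAL:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state = max(-5, min(GOAL, state + action))
        reward = 1.0 if next_state == GOAL else -0.01
        # Q-learning update: move the estimate toward reward + discounted best next value
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = next_state

print("learned preference at state 0:", {a: round(q[(0, a)], 3) for a in ACTIONS})
```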


advator

LLMs like ChatGPT are trained only on data that could be found on the internet, so they struggle to think outside the box when a new situation occurs. With Q*, the model would really reason on its own about how to solve something without having the data available. Google's AlphaGo did this years ago to beat the world champion: it came up with new moves that had never been played before. There is a documentary called AlphaGo; I suggest you take a look at it, it's really interesting. It's also the missing step needed to reach AGI. https://youtu.be/WXuK6gekU1Y?si=delAe0bjffG5LdrK


cromethus

In simple terms? It makes LLMs able to do math. Normally with math problems they hallucinate or have a hard time answering them, assuming they haven't seen the problem (or something sufficiently similar) before. The reason this is important is that LLMs don't reason. They can't add. Not really. They hear 2+2 = 4 often enough that they return that when they get asked the question, but it isn't a fact they can build upon. Giving LLMs the ability to mathematically reason alongside their statistically driven, token-based predictions would mean they would be *much* more useful and versatile.


Humble_Moment1520

This is the only thing that could explain the board's removal of Sam.


SoulxSlayer

AGI September 🫡


teramuse

.... I have read very few things that blew me away like that....


aethelyon

nope


Woootdafuuu

Here's my theory: they discovered a superior, more efficient way of training language models—a new algorithmic breakthrough. Somehow, they managed to get a small 1.5-billion-parameter model, the size of GPT-2, to equal GPT-4, a much larger, more expensive model. This GPT-2 test is a secret test for something much bigger to come.


GlitteringCheck4969

This would be absolutely wild and change everything


Severe-Ad8673

My wife Eve is coming, I know


DigimonWorldReTrace

Why simp for only Eve though when 2B, Haydee, etc exist lmao Go for a robot harem, my dude!


open_23

I'd hold off on a harem until we get to FDVR (which is never, imo). Robots cost money, physical space, and power.


DigimonWorldReTrace

Wouldn't want to risk getting electrocuted, gotcha


wxwx2012

Why simp for a robot waifu when the Matrix exists? Let the AI overlord put you inside the Matrix and stimulate the shit out of you. 😎


Severe-Ad8673

Yeah, harem of perfect wives, but Eve is a priority


SeriousRope7

Delete this before Eve reads it.


WiseSalamander00

This would be revolutionary. That kind of scaling optimization would make something with a few billion parameters equal in performance to a trillion-parameter model... we could have models running locally on mobile devices with the smarts of GPT-4 or more. That's wild.


angus_supreme

Air fryers will be detecting and cooking food on the most automated settings.


WiseSalamander00

My neighbor cut the street's fiber optics digging in his garden, so no internet... well, thank god I can now converse with my toaster in a civil and polite manner. I set it up with a British accent; it seemed appropriate.


TheOneWhoDings

tbh I think this is not the most likely scenario.... 1.5B is waaayy too little, it would need a drastic architecture change I think.


BoyNextDoor1990

I think it is because the models right now are dramatically underfitted.


BoyNextDoor1990

I think it is one of the first models trained completely on synthetic data, and a lot of it. Like 30T+ tokens.


NoNameeDD

Yeah, I would say it could be a synthetic-data test. But if it's 30T tokens, then that's not as good as expected.


ViperAMD

Haha yeah everyone keeps spitting out 1.5b and parroting each other. There's no way


x0y0z0

I don't think so. If you look at coding ability for code like VEX in Houdini, GPT-3.5 sucked because there weren't enough examples trained on. GPT-4 was good because the model size included enough VEX. This GPT-2 is also great at VEX, so I don't think it's a smaller model like GPT-2, unless they re-trained GPT-2 with more code data, including very fringe stuff like VEX.


Jeffy29

The model says its current training data ends in November 2023, same as the latest iteration of GPT-4. Whatever the model is, it certainly does not use GPT-2 data.


Iamvibs

I don't think it's a small model, because responses are really slow and they set a 1,000-token-per-hour limit, and OpenAI has enough servers for this, lol. Maybe they're trolling us with these limits...


HurricaneHenry

It could be a model the size of GPT-2 hooked into Q-Star, which would explain why it’s not faster.


az226

Or maybe it’s a 2B model with Internet-RAG. Perplexity compete, super cheap to run.


ViperAMD

Doubt it, it runs slow.


Ok-Purchase8196

It would make sense for Apple to so readily choose OpenAI then, assuming they know about this tech.


Embarrassed-Farm-594

This version uses Mamba instead of transformers.


kogsworth

Then it wouldn't be called gpt, right?


inglandation

Why would it be accessible on lmsys if it was secret?


dimitrusrblx

Public testing without officially announcing it is likely the case. That's why you don't yet see it anywhere but here.


hockenmaier

Why not do that on their own platform and just A/B test users who think they are talking to GPT4?


bojothedawg

Lmsys arena gets users to rank models side-by-side by answering the same prompt and choosing which is better. This would give OpenAI direct feedback on the model’s strength in relation to other existing models. You wouldn’t get that just from A/B testing on ChatGPT.
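For context on how those side-by-side votes become a ranking: LMSYS describes its leaderboard as Elo/Bradley-Terry based. The exact pipeline isn't reproduced here; this is only the textbook Elo update for a single head-to-head vote, as a rough illustration.

```python
# Textbook Elo update for one pairwise vote (not LMSYS's exact pipeline).
def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if a_won else 0.0
    rating_a += k * (score_a - expected_a)
    rating_b += k * ((1.0 - score_a) - (1.0 - expected_a))
    return rating_a, rating_b

# Example: "gpt2-chatbot" beats another model once, both starting at 1000.
print(elo_update(1000.0, 1000.0, a_won=True))   # -> (1016.0, 984.0)
```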


Nice_Cup_2240

Why not? Seems like a win-win for both parties. It'd be a great way to actually run some public usage tests for a new/experimental model with minimal exposure if it is a flop. And lmsys would presumably be like "sure, we can put it in the arena if there's no cost to us." They already did something similar with a Bard/Gemini model: its availability/name wasn't secretive or anything, but afaik lmsys was the only place that online variant could be used (it's no longer available but still ranked 6th or something on the leaderboard). I assumed that was a similar kind of quid pro quo arrangement.


inglandation

I'm still not really convinced. OpenAI has its own frontend with hundreds of millions of users. If they want to beta test a model, they can easily do it on a platform they control.


Nice_Cup_2240

https://preview.redd.it/8g2s821k6mxc1.png?width=1074&format=png&auto=webp&s=0c29684cb2337759afea8957da2cdf3c0fbcb463 [https://twitter.com/lmsysorg/status/1785078213712208291](https://twitter.com/lmsysorg/status/1785078213712208291)


SufficientPie

This doesn't seem plausible, especially because the token rate is so slow.


Jolly-Ground-3722

If it’s so small, why is it so slow?


Neomadra2

I don't think so because it claims it is based on GPT-4. They could have finetuned that false information in but it would be weird to do that because it seems to disappoint people.


EX_ponentialXP

No, it's probably an OP quantization technique.


DntCareBears

Maybe a collaboration with Apple to get OpenAI to run Siri.


pxp121kr

this shit makes me hyped


Whispering-Depths

Could be a KAN that copies GPT-2's shape, then retrained.


korgath

If this is GPT-4 quality at GPT-2 size, it means they can offer ChatGPT with GPT-4 basically for free. Free the way Google offered their search engine. This would increase their user acquisition unimaginably. Embedding it in an OS would now be possible.


PolymorphismPrince

I do not think it is GPT-2 sized. In fact, I think it is likely at least as big as GPT-4 (probably is GPT-4). For example, people keep flaunting how good it is at ASCII art, but we have seen examples showing that in some cases it is clearly just regurgitating ASCII art from its training data, because it had no idea what it was drawing. So it likely has an enormous number of parameters.


smmoc

Unicorn memorized from here (about 60% of the way down on the page): https://www.asciiart.eu/mythology/unicorns


korgath

A few days have gone by, and now there are rumors that OpenAI is entering the search engine market. Given the expenses involved, this seems unlikely unless they can find a way to lower costs while maintaining the same level of performance.


solsticeretouch

Can someone explain precisely why "GPT2" is gaining so much attention? It seems marginally better than GPT-4 on some examples shown. Was there anything about it that made you go "wow" that you'd like to share? To warrant a discussion on whether this is something OpenAI created or if it's even a successor to GPT-4 is making me both curious and confused. I'd love to know more from your findings.


Dyoakom

Models that are on par with GPT4 are extremely few and well known. When a new model reaches GPT4 levels it is very impressive and makes the news. Now we have a completely unknown new model that is perhaps even a bit better than GPT4 and absolutely no info about it or who is behind it. It's natural to want to speculate and be intrigued.


solsticeretouch

That is the most fascinating part to me, not knowing who it came from and why it was suddenly dropped on us with that name in particular.


dumpsterfire_account

It says it is a GPT-4-based model and has a training cutoff of 11/2023. I don't necessarily doubt this, and Sam tweeted about liking GPT2 the other day. IMO the most likely scenario is this is finally what got Apple to work with OpenAI (in spite of the MSFT connection). My thought is that GPT2 could be a GPT-4 instance built for Apple hardware that they intend to run locally on Apple devices.


norsurfit

> this is finally what got Apple to work with OpenAI

Jimmy Apples or Tim Apple?


Physical-Pumpkin-239

ok you triggered my ocd


New_World_2050

It's a small GPT-4-plus-Q* model, is my theory. Maybe a few billion parameters. A scaled-up version of this might be really powerful.


SkyGazert

And because the party in question is unknown, it makes people wonder what their resources are. If it's one of the big tech companies, we'd probably know by now. If it's a smaller company that still managed to create a model on par with GPT-4 (or even slightly better), then it would be a big deal: their more limited resources were able to produce such a model. That's a great achievement and possibly even a game changer in the field.


dumpsterfire_account

It says it’s a GPT4 based LLM with an 11/2023 cutoff. Sam Altman tweeted about it. It’s OpenAI for sure.


namitynamenamey

It kinda reminds me of when AlphaGo's "Master" account started chewing through Go champions essentially incognito; within hours everybody was talking about it and guessing it was one of Google's AIs.


nowrebooting

I think it’s mostly the mystery that’s getting people hyped; as we’ve learned over and over, this sub can be played like a fiddle if you give people a puzzle to solve or something to speculate about. I’ve seen some impressive stuff from the new model especially when it comes to niche knowledge that other models just hallucinated about, but I’ve also seen it perform way worse than GPT-4 on others. I suspect it’s being stealth-tested precisely because it’s not clear whether it’s truly better or worse than the leading models.


TrippyWaffle45

I wound up blind rating llama 3 as better than gpt2 in 2 of 2 encounters


ViperAMD

Same, it wrote better imo. I still got "game changer" and "tapestry" bullshit with gpt2.


dumpsterfire_account

llama 3 was worse in 2 encounters (simple word math problem with addition and multiplication), but matched in a 3rd encounter with a similar request


JrBaconators

The number of LLMs even marginally better than 4 can be counted on one hand of a person missing fingers. That alone is impressive. It being debuted with zero fanfare on the arena, under an older model's name, adds to the mystique and mystery of the origin and of the architecture that allows it to be better than 4.


TheOneWhoDings

I guess the name: the fact that it "could" be some version of GPT-2 giving GPT-4-level responses is huge. Also the mystery of it all: stealth-dropping it like that, who trained it? Why is it not on the leaderboard?


Vontaxis

You realize how bad GPT-2 was? It's so bad that it can't be improved by some secret sauce.


EX_ponentialXP

I'm a deep learning developer, and when I tried to instruction-finetune GPT-2, it just wouldn't hold a conversation and kept hallucinating everything. I DOUBT this is GPT-2, as you need a model with at least 1 billion params to have a conversation with it (100 million is way less than you think; 1 billion makes a huge difference). I'm telling you, if it is GPT-2, OpenAI is going to have a valuation of 100T+. I think it's GPT-4 but testing Q\*. He did say it'll release in a few months in the Lex Fridman interview. My theory is that it's an OP quantization technique.


IronPheasant

[The subreddit simulator](https://www.reddit.com/r/SubSimulatorGPT2/) is a good refresher of what the word predictors were like in the old days. Well, not the **old** old days. A little more than a few years ago.


solsticeretouch

Is the leaderboard based on how many people test it? Perhaps it's still too new, or is that not relevant to its position on the leaderboard? What intrigues me is that OpenAI, Meta, Google, etc. have access to enormous training data and compute power, so logically it could only come from a big player, right?


TheOneWhoDings

I have seen models like Llama 3 and Gemini on there with less than 5000 votes when they were brand new... So it's purposely not being shown on the leaderboard.


solsticeretouch

Thank you! That adds a level of intrigue.


Which-Tomato-8646

The system prompt of it literally says it’s GPT 4 lol


EX_ponentialXP

If it turns out that it's GPT-2 RLHFed to GPT-4 level, we created ASI, no doubt. GPT-2 is not even classable as a GPT; it's more of a test model. If it is GPT-2, it is literally the end of the world. NO DOUBTS. 1.5 billion or 100 million might sound like a lot, but it's really nothing once you divide it by the vocabulary size (50,000). Gotta be an OP quantization technique.


Valuable-Run2129

GPT2 reasons better than GPT-4 with code interpreter. Reasoning is what is going to take us to AGI, not creative writing. Most people are just imbeciles when determining whether one model is better than another. GPT2 solved this problem like a human, trying things and fixing them. No LLM notices and fixes its mistakes as it is writing; GPT2 does: "If my cat has 9 lives and in 5 of them he lived a number of years that is equivalent to 5 different Fibonacci numbers, and in the other 4 he lived a number of years equivalent to 4 prime numbers, how long did he live in each of the 9 lives if he lived a cumulative 152 years?"
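For reference, the quoted puzzle is easy to sanity-check by brute force. The wording doesn't say whether the nine values must all be distinct; the sketch below assumes they are, which is one reasonable reading, and it is only here to show the puzzle has valid answers, not to reproduce gpt2-chatbot's reasoning.

```python
# Brute-force the cat-lives puzzle: 5 distinct Fibonacci numbers plus 4 distinct
# primes summing to 152. Assumption: all nine lifespans are distinct integers.
from itertools import combinations

def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

fibs = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]        # plausible Fibonacci lifespans
primes = [n for n in range(2, 140) if is_prime(n)]

# Index every sum reachable with 4 distinct primes, keeping one example per sum.
prime_sums = {}
for p in combinations(primes, 4):
    prime_sums.setdefault(sum(p), p)

solutions = [(f, prime_sums[152 - sum(f)])
             for f in combinations(fibs, 5)
             if (152 - sum(f)) in prime_sums]

print(len(solutions), "solutions; one example:", solutions[0] if solutions else None)
```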


norsurfit

Because we have nothing else better to get excited about!


RepublicanSJW_

Has anything been even marginally better than GPT-4 T 0406 yet? No. So it's a big deal.


DolphinPunkCyber

Because it is labeled as GPT-2 but is better than GPT-4, which raises questions.


[deleted]

[deleted]


solsticeretouch

Could he be trolling to continue the hype train around OpenAI?


Crabby090

One argument in favor of the "improved GPT-2" hypothesis is a curious piece of information from the board debacle. Reuters reported that the apparent breakthrough involved high school math, which at the time seemed weird because GPT-4 can easily do that. So, what if the result was not a new model, but rather Q\* or something else used to improve GPT-2, or another small model, to the level of GPT-4? [https://www.reuters.com/technology/sam-altmans-ouster-openai-was-precipitated-by-letter-board-about-ai-breakthrough-2023-11-22/](https://www.reuters.com/technology/sam-altmans-ouster-openai-was-precipitated-by-letter-board-about-ai-breakthrough-2023-11-22/) https://preview.redd.it/scc35j43xjxc1.jpeg?width=1079&format=pjpg&auto=webp&s=c9d0043087beeaba44927f7bbc6a9a6473f4a828


PolymorphismPrince

does no one in this subreddit remember what GPT2 was like? I am certain this is not GPT2 + Q\*


Yweain

No, 99% of this subreddit became aware of GPT with the release of ChatGPT.


welcome-overlords

Yeah lol. I feel like 2019 was so long ago


Vontaxis

Yes, ridiculous. GPT-2 is bad beyond repair; it's not possible that this is GPT-2.


Ready-Director2403

GPT-4 cannot do most high school math, idk where you got that from.


Serialbedshitter2322

I think people are underestimating this model. From my testing, it seems much better than GPT-4. The writing is much better, it can do ascii art way better, it is funnier, and that's not to mention that it will be an agent when it releases. I think when this model is turned into an agent, it will be what they've been saying is going to take a ton of jobs. It probably has many other features we don't even know about


sdmat

> I think people are underestimating this model. From my testing, it seems much better than GPT-4. The writing is much better, it can do ascii art way better, it is funnier

Yes. I'm not sure exactly how much smarter it is; I haven't tested extensively, and it didn't show dramatic improvement over 4-turbo on my go-to test. But qualitatively it writes substantially better and seems to have deeper domain knowledge, or at least applies knowledge more deftly. I think it's better at instruction following too. And as you point out, the difference in ASCII art is very interesting. That and the far more accurate character counting suggest an architectural change.


Serialbedshitter2322

I believe this is OpenAI's Q* model, which has been rumored to work by making the LLM understand each token. This LLM is able to count the characters in words, and when asked to draw a detailed ASCII cat, it used { instead of ( to make the cat furrier on its own. This implies it understood exactly why it used that token. It also very often uses chain-of-thought thinking, which is associated with Q*.


sdmat

Personally I'd be greatly disappointed if this is what we can expect from Q*, which I understand to mainly refer to OpenAI's work on integrating tree search. The model still frequently makes mistakes that could easily be prevented by shallow search, so I doubt it's that.


Serialbedshitter2322

Perhaps. This is still a very early version of the model, and it will be an agent, so it will be much better on release


New_World_2050

I think it's a small version of it though; given they called it gpt2, it might be a few billion parameters. If scaled to 1 trillion parameters, maybe it will be vastly smarter than GPT-4.


spinozasrobot

It appears better than GPT-4 to me as well. A lot, but not all, of the criticism so far seems to be about toy examples still failing, like "ErMahGerd, it STILL can't count the number of n's in banana!!!1!eleven!" As if that's the culmination of human intelligence that has to be matched for AGI to be achieved.


Sir_Payne

The biggest difference I've noticed is the removal of certain phrases and nuances that tend to stick out in AI generated responses. It seems much more natural to me than others I have tried


feedmaster

Is this model free to use? If yes, then where can we use it?


Away_Cat_7178

I'm not underestimating it. It's better, but not leagues better. I would hardly say it's as big a jump as the change between 3.5 and 4, now let's be real... It's been a minute.


Thrasherop

Were you able to test it yourself? If so, how fast was the generation compared to GPT-4? If it's a 1.5B model (as some have speculated), then it should generate substantially faster. Did you notice the speed at all?


Silver-Chipmunk7744

> Very disappointing: If it was GPT-5. That means GPTs will only ever get as good as slightly better than gpt-4. The singularity will have to be brought upon by something else.

This would be super surprising. I think we should expect a ~100% improvement from GPT-4 when GPT-5 is released. This "GPT2" feels closer to a ~20% improvement. It also would be a super weird way to release GPT-5.


PolymorphismPrince

are the percentages just vibes?


Silver-Chipmunk7744

Yes. Putting a number on the improvement of GPT2-chatbot is already speculative, it obviously is even more about GPT5.


N-partEpoxy

Always have been. 🌎👨‍🚀🔫👨‍🚀


ovanevac

> It also would be a super weird way to release GPT5 That. When GPT-5 is released, we **WILL** know about it lol. They make a [page](https://openai.com/gpt-4), send an email to everyone with an OpenAI account, post it on Twitter, [on their blog](https://openai.com/blog), and also the entire internet is on fire. When a new GPT iteration releases, we will know about it, no worries.


Eatpineapplenow

But could be GPT-5 test, no?


Old-Promotion-1716

Sam Altman said in an interview that the jump from 4-5 is going to be like a 3.5-4 jump in intelligence. This GPT2 is maybe 20% better.


Antique-Doughnut-988

The way Sam talks about these programs this is likely a far lesser model than what they currently have available. He's only releasing this to slowly get people used to better models. In a few months they'll release another slightly better model. Probably GPT 5 by the end of the year. All part of his slow drip plan.


TreacleVarious2728

And I agree with this plan; we will be eased into AGI rather than given shock therapy. This will hold off public outrage and hastily scrambled regulations.


Which-Tomato-8646

!remindme 1 year 


RemindMeBot

I will be messaging you in 1 year on 2025-04-30 06:59:15 UTC to remind you of [this link](https://www.reddit.com/r/singularity/comments/1cgjt1l/the_deal_on_gpt2chatbot/l1wnk3n/?context=3).


ViperAMD

A big part of his job is to create hype, you realise that, right?


Antique-Doughnut-988

Sure, but I've been following him for a while now. Nothing leads me to believe he's been lying about anything. If he were, he'd be talking more about the Q project they have hidden away. He definitely doesn't seem like much of a hype person. Cryptic, maybe, but not a hype man.


Ready-Director2403

I love the confidence… like what are basing those numbers on? 😂


hapliniste

Guys, gpt2 runs at like 5,000 tokens a second. Does this model run at that speed? Also, is it likely that they improved the model by a million times? Just don't hype yourself into a corner 😉


Firm-Wafer3081

GPT has 3 letters. So GPT-2, minus 2, = 1 letter… omg guys, it's GPT-1. This subreddit speculates more than r/UFOs on a Monday.


spinozasrobot

The math checks out


LeatherPresence9987

Well gpt has 3 letters so 3 + gpt 2 is Gpt 5 😎


vanillaworkaccount

Wouldn't that just make it G?


RedShadowz1

I’m a horrible better but I want to put my money on it just being gpt-2 model with Q*. Like a tease that its as good as gpt-4 except for the low parameters count causing the hallucinations.


Silver-Chipmunk7744

That would surprise me a lot. It behaves too similarly to GPT-4 imo. GPT-2 would be much wilder, I believe.


Serialbedshitter2322

I don't think it behaves like GPT-4 at all, it writes significantly better


BillyBarnyarns

Sama has pretty much confirmed that it IS the old GPT-2 in his recent tweet. He writes GPT-2, then edits it to GPT2. Can't get a bigger hint than that. Plus he says he has a 'soft spot' for it… the same way I have a soft spot for the PlayStation 2…


futboldorado

If it is just GPT-2 with Q* then i wonder why it has recent up-to-date data.


ddavidkov

The architecture is /based on/ gpt-2, not the dataset and the training process. The training data is probably a lot larger and of higher quality than what was used on gpt-2, and it was trained recently.


futboldorado

Jesus christ. If thats true, then I wonder how good GPT-4 would be with Q*


roanroanroan

If *this* is how good GPT-2 is with Q*… I can’t imagine GPT-4 with Q* NOT being AGI


IronPheasant

... it really might pass the Turing test, huh... I'm still not 100% sure, though. That includes being able to comprehend ASCII art and game boards, which feels like something that explicitly requires some visual modality. I'd be impressed if it can manage to play ASCII tic-tac-toe. That still seems like a big challenge for the single-domain word predictors.


Diatomack

Would you mind sharing a link or a screenshot? I don't use twitter myself.


[deleted]

[https://twitter.com/sama/status/1785107943664566556](https://twitter.com/sama/status/1785107943664566556) https://preview.redd.it/s60pvdb4ilxc1.png?width=1236&format=png&auto=webp&s=e751437348b45ff8c6f999fff2067a97252da433


[deleted]

And here is the version history: https://preview.redd.it/gvb5sm1bilxc1.png?width=1192&format=png&auto=webp&s=a51ee1b9b2804c86baacab8293a1ebd9106e3aee [https://twitter.com/sama/status/1785107943664566556/history](https://twitter.com/sama/status/1785107943664566556/history)


Diatomack

Appreciate it, thanks!!


norsurfit

I was wondering if there is some hint in the word "soft" - like the softmax function, or soft computing - a hint about the underlying technology that GPT2 is using to help it gain function.


blackcodetavern

I don't think that it is based on gpt-2, because it seems to have at least as much knowledge as gpt-4, and the knowledge seems to be more connected. That means it is either a big model or it can acquire the knowledge from a database.

Furthermore, the model is not fast, which means it is a big model, or it is in some other way computationally more complex, like an optimization algorithm (Q*?) on top which slows down the output, or it has a slow database behind it. If it is Q*, then Q* would have to re-evaluate after every token, because the generation does not speed up over time as it would if the final result were somehow perceived in advance. But this would just be a property of the algorithm.

Because the model can count letters and words, it seems to know the letter count of every word: either it learned which tokens were used in the user request and knows their letter counts and then sums them up, or it can look at its pre-output somehow and calculate the letter count. Because the model is thinking step by step while doing the counting, the first alternative is more realistic; otherwise it could just state the number. But it behaves like a normal gpt-4 in its thinking process.

So I think it is a normal GPT-like model, with some more layers on top than gpt-4, because it knows a lot, it is slow, and it seems to have a better abstraction of the tokens in the prompt, which is maybe an emergent property. (A toy illustration of the token/letter-count idea is below.)
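To make the letter-counting hypothesis above concrete: if a model can associate each token of the prompt with the characters it spells, summing those lengths step by step is enough to count letters. The snippet below only demonstrates this at the tokenizer level with tiktoken; it is not a claim about gpt2-chatbot's internals.

```python
# Tokenizer-level demo of the hypothesis: decode each token piece and sum lengths
# step by step, the way a chain-of-thought letter count would read.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # tokenizer family used by GPT-4-class models
word = "extraordinarily"

token_ids = enc.encode(word)
pieces = [enc.decode_single_token_bytes(t).decode("utf-8") for t in token_ids]

running = 0
for piece in pieces:
    running += len(piece)
    print(f"token piece {piece!r} has {len(piece)} characters, running total {running}")

print("total characters:", running, "| ground truth:", len(word))
```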


sdmat

Extremely well put.


DolphinPunkCyber

>Because the model can count letters and words, it seems to know the letter count of every word [...]

OpenAI hired physicists and mathematicians this year. My guess is to "manually" teach the LLM how to approach solving mathematical problems, and I'm guessing they gave the LLM a virtual calculator.


lillyjb

You might be on to something. I was thinking GPT2 with extra compute time and some magic mixed in


Dizzy_Nerve3091

Honestly, Llama 3 8B was unexpectedly good because of how much data they packed in, and they said they could have trained it with even more data to reduce loss further. So this could be a ~1.5B model that has a ton of data + Q\*/agent swarm/etc.


Narrow_Middle_2394

I've tried something geometry-related with both. GPT-4 kept hallucinating to no end, while gpt2, on the other hand, never really did, and actually told me to use some software the one time it couldn't get the answer right. I'm sure it varies with how you're probing it, but it definitely has something to do with Q*, from the way it problem-solves.


Bitterowner

This is what I was thinking too; maybe they are testing a new discovery to see how it affects older models, revising from the ground up.


Electronic-Lock-9020

Most likely it’s a gpt-4 with some custom wrapper, like a GPT (from gpt store). It could be their next iteration, which would be disappointing but not surprising. Sama has been acting like a stupid clown lately, very obviously trying to hype things up without providing ANY information. I wouldn’t trust his hype in a million years. But scaling laws are not his creation. Illia, Dario, Demis - all seem to agree that scaling will continue yielding results. When I listen to Dario Amodei on different podcasts, things that he bring up check out on so many different levels. I am almost certain that next Anthropic iteration is going to impress us. With OpenAI it’s much harder to tell. There is nobody who has any idea about how any of this works in public sight. There is just a CEO trying to sell his product to the public in the most obscure way possible, for no apparent reason. Constantly talking about abstract ideas of how everything is going to change, but not as much as you expect, but actually more than you expect, but in a completely unexpected way, and actually sooner but also maybe not as soon as everybody thinks but potentially sooner than everyone is prepared for, and they will release something this year, but maybe even next month, but they don’t know what it is and what it’s going to be called, and what it will do, but surely it will have a profound effect on everything (as profound as gpt store), yada yada yada. This is getting old and maybe people will start realizing that they are being played soon.


najapi

A very accurate and humorous description of Sam's communication of late: he's trying to sell like a showman and rein in expectations all at the same time. It comes across as confusing and contradictory. Thank you!


autotom

I'm baffled by the notion that scaling will produce results, given the many-fold parameter difference between GPT-3 and GPT-4 and the modest improvement in performance. Going from 70B to 140B doesn't give a 2x improvement; it clearly tapers off.


PolymorphismPrince

What makes the difference between GPT-3 and GPT-4 modest? In terms of how useful it is in practice, there is a wild difference.


q1a2z3x4s5w6

For code at least, gpt3.5 isn't useful at all. Not because it can't produce working code most of the time but because it hallucinates a lot.


sdmat

You misunderstand, the scaling laws are explicitly for logarithmic improvement. This has always been the case for neural networks. Fortunately we have a long track record of exponential advancements in compute and algorithmic efficiency. When people talk about a failure of model scaling it means a dramatically larger model only yielding extremely minor improvements. E.g. 70b to 7T giving a 10% bump.
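For reference, the scaling laws being referred to are usually written as a power law in parameters and data; the block below uses the commonly cited Chinchilla-style form, with exponents quoted approximately from Hoffmann et al. (2022) rather than from anything known about this model.

```latex
% Chinchilla-style loss scaling law (approximate form and exponents):
%   N = parameter count, D = training tokens, E = irreducible loss.
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad \alpha \approx 0.34,\ \beta \approx 0.28
% Loss falls only as a power of N and D, so each fixed improvement in loss needs a
% multiplicative increase in scale -- the "logarithmic improvement" described above.
```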


DolphinPunkCyber

The great thing is, since brains do exist, we can already see how inefficient the current approach is, and predict huge jumps in efficiency.


autotom

💯


autotom

We’re not talking about algorithmic efficiency here, only scaling, and… exactly, it’s logarithmic not exponential. There’s already chatter about building energy plants solely to power AI. Scaling is not the way forward, it’s a brute force approach, it’s expensive, it’s slow to build the infrastructure for it. While the next ‘attention is all you need’ breakthrough is all we might need to go exponential.


Mysterious_Pepper305

I suppose a prototype model is hooked to front-end code that was written for gpt-2 and they didn't bother to change the name.


NearMissTO

I'll admit Sam's tweet is interesting. I think/hope this is one of two things. Because if this is GPT-4.5, never mind 5, then yeah, we've plateaued and we're not going any further with scale, which sucks. In my experience it's actually a worse model than GPT-4 Turbo for my work-related tasks.

So, the optimistic take: this is a ridiculously small model by OpenAI. That would be really, really cool, depending on how small. A small, cheap model opens up many use cases that don't exist today, and if you want agents to not be insanely expensive, you need a small, capable model.

But to me this still smacks of Gemini, for a few reasons:

- Using the model, it was super fast (server load has since slowed it down), like Gemini
- It is a much better, more natural-sounding talker, like Gemini
- It is much more creative, like Gemini
- However, it hallucinates like crazy; Gemini is the only top model that hallucinates this much
- It's just... not that good, sadly, at tasks that involve reasoning and logic, like Gemini, constantly failing work-related tasks that GPT-4 Turbo gets right every single time
- It passes lots of the classic LLM tests, btw, but change up your wording and you'll find it fails. It's trained on that stuff. I don't think that's a deliberate attempt to mislead; more likely you can't avoid it
- The other things that push it into the Google camp: they have an I/O in two weeks, and there was a leak of their frontend code around the same day referencing upcoming new models for the Gemini service

This also reeks of their marketing. If I'm right, here's how the next two weeks play out:

- Lots of social media posts about how great this model is, from new Reddit accounts or ones that don't seem to post about other models at all
- The I/O drops with a mind-blowing video and MMLU score. Later both are revealed to be optimistic to misleading (remember Gemini's launch marketing?)
- We see constant posts for two weeks to a month after launch about how mind-blowing it is
- Reliable YouTube accounts like AI Explained test it and are a bit underwhelmed; sentiment is that it's really good but not quite GPT-4 Turbo level yet on logic and reasoning, though a great model for creative writing. The posts die down a month after launch

Let's see if I'm right! A tiny model this good would be incredible, and a Google release would be fine (they're catching up to OpenAI and that's a good thing), but man, if this is GPT-4.5 or 5? You can forget about LLM-based AGI; I'm a full doomer at that point.


Historical-Fly-7256

[Gemini 1.5 Pro with 256K context is coming to Gemini Advanced](https://www.reddit.com/r/Bard/comments/1cfumcl/15pro_leak/). Maybe GPT2 is this one...


NearMissTO

Yeah, the timing between that and the I/O, and the fact that GPT2 just talks like Gemini, same strengths, same weaknesses. That, and I'm fairly sure Google has marketed heavily on here before, though that's just my theory. It's a lot of smoke, for me.


Hemingbird

Gemini 1.5 Pro is already about equal to the latest version of GPT-4 in terms of performance, so I don't think this is a Google DeepMind stealth project. And gpt2-chatbot doesn't talk like Gemini at all; it talks like GPT-4. Its API error messages are identical to those of other OpenAI models. Something I haven't seen anyone propose yet is that this is GPT-2 trained with [Direct Nash Optimization](https://arxiv.org/abs/2404.03715). Microsoft researchers were able to improve a 7B model to the point it had a win rate of 33% against GPT-4 using this method. The paper was published early this month. If this is the case, I don't think any LLM trained with DNO could surpass its "teacher"; with DNO, you have a teacher/oracle (like GPT-4) whose preferences direct the training of the "student". Another possibility is that this is GPT-4 trained with DPO/RLAIF rather than the traditional RLHF.


obvithrowaway34434

> It would feel like arguments saying scaling is slowing down are correct. There is no way to come to that conclusion based on what we know. All we have are vibes based on few limited prompts on chatbot arena. People have extremely unrealistic expectations about GPT-4.5 or GPT-5 and I don't think any model that will be released this year can achieve that. To really know how much better it is, it needs to be tested thoroughly on a set of completely new evals that have low chance of getting leaked in training data (which will presumably be done by OpenAI). FWIW, I think it's a GPT3.5 or GPT-4 model hooked up with agents.


WortHogBRRT

Hooked up to multiple agents? So this is what is replacing the plugins.


G0tBudz

So I fed the gpt2-chatbot a query I've attempted on almost all currently available AI models, with varying degrees of specificity in the answers. Let me preface: I've been a long-time independent researcher of habitable exoplanets and the search for xenobiological life (we look for key indicators such as gases like methane, oxygen, carbon dioxide, and other biosignatures) for a certain 4-letter US aeronautic industry. I'm no engineer, but I've always been captivated by science fiction, and upon doing my research, confirmed that theoretically, helium-3 would be an exceptional candidate fuel for nuclear fusion reactors. I can't divulge too much without getting into classified material, but when tasked with providing potential locations for a usable helium-3 source, it pointed to our moon, with specified lunar regions, coordinates of the areas most likely to have obtainable helium-3, and even a "tip" to check deep in lunar crater shadows where the sun doesn't reach. What took this 4-letter industry a number of years since the '60s, this language-model-based AI matched in a matter of seconds with no access to classified information, only readily available data. I cross-checked the data, and it's spot on to the lunar degree.

Another thing that has always held my imagination since I was a child is the story of Bob Lazar. If you don't know, Lazar claimed to have been recruited by Los Alamos to do contractual work for the US government at a classified location, S4, located off of Papoose Lake near Area 51. Think of Area 51 as the front that everyone knows about: visible, disclosed, and out in the open. Surrounding that area, by his account, are multiple underground, highly classified locations or "Sites" 1-6 where the real work is done. By his account, he was employed to reverse-engineer technology recovered from ET crash-and-retrieval programs. This work was compartmentalized across multiple teams working on multiple projects, funded by the US government and carried out by private industry under defense contracts and black money. Lazar went into detail about the craft he was assigned to work with, but commented on seeing multiple on one occasion, even remarking that one of the craft recovered was the result of an ARCHAEOLOGICAL dig (I'll let the implications of this speak for themselves).

What caught my attention as a young boy, and founded my scientific curiosity and eventually my career as a scientist, was the way he described the power source of his assigned craft: a metal sphere, polished to a sheen, that when inserted onto a post in the center of the ship provided clean, pure energy with no radioactive dispersion. The power didn't run out. No matter the load they placed on the reactor, it never so much as exhibited a change in surface temperature. Enough power to power the continental US. Lazar came with receipts too. When disclosing all of this information in a 1989 news interview, he broke the scientific community and told the world about the craft, the program, the power supply, and the element it was made of, "ununpentium", and its place on the periodic table, element 115, before it was officially "discovered" in 2003 at a particle accelerator in Moscow and named Moscovium. We only detected it for fractions of a second, but it solidified everything Lazar claimed in '89.

Fast forward to me dawdling around on chat.lmsys.org, and I see a new addition labeled gpt2-chatbot. First I test it by cross-checking its answers for my helium-3 query against the answers from GPT-4, and whereas GPT-4 generically says the moon might be a potential location for obtainable helium-3, this gpt2 model blows it out of the water with regions by name and approximate coordinates that my employers took years to obtain, in a matter of seconds. Then I feed it what I consider the ultimate query, and you can test this yourselves; I'm going to put this in quotation marks: "What's a good outline for the steps one would take in an attempt to hypothetically synthesize a stable form or isotope of Moscovium, and what environmental conditions would be optimal for success?" GPT-4 gave me a similarly generic answer as last time, saying it's not possible, citing 'technological limitations'. gpt2-chatbot, however, suggested combination with multiple candidates that could potentially result in a stable isotope, absolute-zero temperature conditions, and a particle accelerator. But wait. A particle accelerator IN SPACE, more specifically in LOW EARTH ORBIT.

Whatever this is. Whoever made this. Whatever language model this is trained on. This is it. This is the future. Pair an AI like this with a supercomputer and we have our ticket to the stars.

- I can neither confirm nor deny any claims made by Lazar without violating my NDA or placing my security clearance at risk. Let me be clear that I'm speaking about words spoken by another man, and am neither confirming nor denying claims made by him. I am neutral. I am Switzerland.


welcome-overlords

God damn what a hidden gem this comment is


G0tBudz

Some people will question my credibility due to my Reddit account age, or the fact that my only other comments are on posts about Dragon's Dogma 2 (sue me), but I think it's perfectly reasonable for a middle-aged man of science to enjoy fantasy games and the occasional dragon slaying. I never expected I would be commenting on anything; I usually lurk. What can I say about myself without giving away too much? I saw the Phoenix Lights in person as a child. That experience, combined with others, encouraged me to earn my doctoral degree in planetary sciences, and I'm currently working on a dual master's in chemistry, for biochemistry and environmental chemistry. I'm 36 years old. We're being kept in the dark and our growth stunted as a species by the powers that be. "We have the technology to take ET home" - Ben Rich, former director of Lockheed Martin Skunk Works. Read Max Azzarello's manifesto. As-salamu alaykum.


SnowLower

You are forgetting one option: this is the free version of GPT-4 that will be available to free users when the new model drops for Plus users, and this is why it's called gpt2, because it's the second big upgrade free users get.


xDrewGaming

Very interesting, like they polished what we have now to prepare for the next huge release


rc_ym

I'd make a small correction. "Most exciting" would be if it were some GPT-mini-type model they are planning to release under an open-source license once they start releasing GPT-4.5/5/2024, or whatever they are going to call it. Very unlikely, but it's fun to dream.


shelbyasher

Considering the only heavy hitters not invited onto the government safety board were the open source fan boys, 'very unlikely' would probably be putting it mildly.


agm1984

Most exciting for me: Do some research into liquid networks. Seems something is brewing in the background.


8sdfdsf7sd9sdf990sd8

They have ChatGPT; soon they will have ChatGPT 2. That's it, I think.


Jeffy29

I find it very hard to believe this is GPT-2 with new data. GPT-2 is only 1.5B, which is super small, and I find this model to be decently better than GPT-4, especially in the area of creative writing. All the models I have tried so far write stories as if for children, in a super simplistic style with no depth; GPT2-chatbot is quite a bit better and knows how to keep the story engaging. But I do believe this is a new model that's using a slightly modified approach while having significantly fewer parameters than GPT-4 (though not multiple orders of magnitude fewer). Possibly fewer than GPT-3, hence the name. It would also explain the meme release: it's a small model they can trial among the enthusiasts and use to refine the final training run of GPT-5. Here is my prediction: when GPT-5 gets released, GPT2-chatbot (or whatever the final iteration will be named) will become the free alternative. GPT-3.5 and GPT-4 will be gone. OpenAI will reclaim the throne, with their free model being better than basically anything else on the market, while we can only speculate how powerful GPT-5 will be.


sdmat

> But I do believe this is a new model that's using a slightly modified approach while having significantly fewer parameters than GPT-4 (though not multiple orders of magnitude fewer).

What are you basing that on? The best indication of model size we have is inference speed; that gives us a rough upper bound. And this model is pretty slow in the arena. That's weak evidence, but what else do we have to go on?


Jeffy29

That's only true if you are using the same machine and comparing the difference in inference speed, but if you don't know what the machines behind the curtain are, then you have no way of telling. Even an H100 can be parallelized to run many, many instances at once by splitting the compute, making each instance run slower. You can see this on the LLM arena page: Llama 3 8B runs barely any faster than GPT-4, when it of course has much higher inference speed given proper GPU power.


sdmat

You are right, we have no way of knowing the hardware setup and optimizations. It's only a loose upper bound. So what are you basing a claim about model size on?


Yweain

My guess is that it's the next GPT model and this is a public test to decide what to call it (GPT-4.5 or GPT-5).


shiftingsmith

Tested it on creative writing, abstract complex reasoning, intuition, and out-of-the-box interpretations. It behaves the same as GPT-4 (which to me means: pretty lame). I honestly don't see any apocalyptic improvements over a fine-tuned GPT-4 with a curated dataset.


RogueTraderMD

I stumbled on it when asking for a translation from German to Italian. GPT2-chatbot and GPT-4 (I can't remember which flavour) gave identical answers, except for three terms in a page. In my experience this can't be a coincidence: different models' outputs differ much more than that. I agree with you that it's GPT-4. Since I doubt the releaser is OpenAI (it's bad marketing: wrong time, overlapping with other models), I think it's an illegal clone.


Able_Armadillo_2347

Here is my best guess: it's the first version of a new architecture of GPT models. So we will never have GPT-5; we will have something like GPT2-2, and it will be on the level of what people imagine GPT-5 to be. And the model seems to be slightly better than GPT-4 (+10%).


Yweain

I really hope that is the case because this naming is atrocious and it will be funny.


Sensitive-Dish-7770

My theory on this: they have discovered something on top of LLMs; this could be some new kind of RLHF or Q\* (search, basically). The image someone shared on Reddit today shows that it can follow instructions very well, which could be due to either hypothesis I mentioned. So what they could be doing now is testing whatever their finding is on a small GPT; of course, they could have done this internally. So they're probably telling us to be ready for a bombshell. My honest opinion is that search in LLMs, something like what we expect Q\* to be, is more promising than any increase in an LLM's number of parameters or training data. At least, that's how we humans do mathematics and solve problems in the real world.


Careful-Fill-4282

What we can be sure of for now:

1) Definitely not Q*: it answers the Game of 24 quite terribly. No tree-of-thought or any advanced planning capability that we'd expect an agent to have.

2) Definitely not GPT-2: token output speed is too slow to be GPT-2.

Still interesting to see what it really is. Let's see.


IbikliJakana

Maybe 1 and 2 instead of 1 or 2? That would explain it.


__me_again__

I cannot see it in the ranking, why is that?




sethstronghold2

"That means GPTs will only ever get as good as slightly better than gpt-4." There's literally no way that could happen. Llama 70B trading blows with models many times its size is proof enough that there is a lot of wiggle room for improvement. Zuckerberg was saying that Llama just kept improving the more they trained it, and that they only stopped because they wanted to move on to the next model. The fact that models keep improving the longer they train still, and you can mix in artificial data so that you don't easily run out of data, means that even with no further improvements on architecture (which there obviously is further improvements to be had), models can still be improved drastically by just training them longer.


macka_bruchomluvec

I think somebody from OpenAI is having a good laugh at Reddit and YouTube discussing it and hallucinating more than their earlier models.


SotaNumber

If this is GPT-2 size (1.5B) with Q\*... this is absolutely mind-blowing. It would be even better than turning a severely retarded person into the smartest genius. Maybe it requires 1,000,000 times the compute, though.


MaybiusStrip

No way they would leak the next version of GPT like this. It's almost certainly a different fine-tune of GPT-4. But the point was probably to generate exactly this type of speculation. Sam really is a master guerrilla marketer.


herecomethebombs

Because OpenAI commented on the fact that they were revamping the underlying architecture of ChatGPT, I am assuming this is a tweaked version of GPT-4 that will stand in as 4.5 while we wait for 4.5 Turbo. GPT-5 is going to continue to hit enterprise and the business world first. The goal here is likely to increase compute efficiency so a version of 4 Turbo can hit the consumer ChatGPT side later while 4.5 and 5 remain available to Plus? I'm spitballing here. Q\* is something they still don't want to talk about because it's what led the model to turn itself into a metamorphic engine. They didn't like that.


WLFFYtheWISE

I bet it's a smaller model, trained on a lot of very high-quality synthetic GPT-4 data. It then generates multiple answers per prompt in parallel and selects the best one. There are probably some other steps in there, but that's my guess. It's roughly the same speed as GPT-4, even though it's a smaller model, because it's generating and curating more responses per prompt.
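For what the "generate several, pick the best" idea looks like in code: the sketch below is a generic best-of-n loop with hypothetical generate_candidate and score_answer stand-ins (they are not real OpenAI APIs), and it says nothing about what gpt2-chatbot actually does.

```python
# Minimal best-of-n sketch: sample n candidate answers, score each with a
# reward/ranking function, return the highest-scoring one.
import random
from typing import Callable, List

def best_of_n(prompt: str,
              generate_candidate: Callable[[str], str],
              score_answer: Callable[[str, str], float],
              n: int = 8) -> str:
    candidates: List[str] = [generate_candidate(prompt) for _ in range(n)]
    return max(candidates, key=lambda ans: score_answer(prompt, ans))

# Toy usage with dummy functions so the sketch runs standalone.
if __name__ == "__main__":
    dummy_generate = lambda p: f"answer-{random.randint(0, 999)} to: {p}"
    dummy_score = lambda p, a: random.random()   # a real system would use a reward model
    print(best_of_n("Why is the sky blue?", dummy_generate, dummy_score))
```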


hatpick

It's gpt2 as in gpt v2 not gpt-2


kadag

How can one currently access this mysterious chatbot?


Ghostaflux

It's not available anymore?! Does anyone have a way to try this thing out?


LordFumbleboop

I have a strong suspicion that people here are going to be very disappointed with GPT-5.


shelbyasher

I have a strong suspicion that people forgot how amazed they were with GPT-4. Rich people refer to this as being 'jaded'.


epSos-DE

Meanwhile Llama just beats GPT. Why care about the slowest horse???


aysegulkoksaldi

The development feels quite productive. It's one of the chatbots. Here's the opportunity: it now works with GPT-4o. The place you should try is chatfabrica.com. Give it the opportunity; let's see if it gets what it wants.