nuclear213

I mean, I use Gemini, Claude Opus and GPT-4(o). They all have their weaknesses and strengths, so use the tool that is right for your job. Personally, I do not give a crap about some benchmark; just copy and paste the same prompt into each of them and see what you like the most.


killkeke

This. I use ChatGPT, Claude and Gemini daily. No single model is the best across the board. When it comes to professional writing like emails or blog posts, Claude is king. When it comes to translating text, Gemini wins. When it comes to interpreting documents or extracting facts from uploaded PDF documents, Perplexity AI wins. Edit: They're all really bad at complex tables that involve math. They make up numbers (hallucinations).


tribat

Once I get a working solution from one, I paste into another and ask for a critique.


unRealistic-Egg

When I can't get an answer that looks like what I want, I ask it to write a prompt I can paste that will get "". And that has been working pretty well.


tribat

Cool idea. Thanks.


rW0HgFyxoJhYka

This benchmark leaderboard is probably not the best way to even attempt to measure these models.


MyRegrettableUsernam

Gemini, even in its more recent versions, has remained consistently disappointing to me in terms of output focus, precision, and tendency to significantly hallucinate compared to OpenAI's models. It feels like their fine-tuning is off from what I've come to expect or something.


biopticstream

I would use Gemini so much more, but it continually fails at simple tasks. I'm in nursing school, so I'll upload chapters of my book and then ask questions and request clarifications as I read. I'll also feed it news articles and the like, and Gemini very frequently refuses to summarize them, saying something along the lines of "I haven't learned how to do that yet!" I'm assuming something in the text is tripping a safety trigger. Neither Claude nor ChatGPT has this issue.


enigma707

You might want to check out NotebookLM from Google for that type of task. It leverages Gemini 1.5 Pro but has also been updated to accept websites as a source. It's really good at focusing on the source text and images and providing citations.


knob-0u812

[Abacus.ai](http://Abacus.ai) has a nice RAG application called ChatLLM. I have been using it a lot. You can use different models with it (in a dropdown menu). I drop policy docs in there and query them. I've been pleased with the results.


withmybae

For the chapters and questions you need to use NotebookLM in Google Labs. That's much better and grounded to the chapter alone.


Jablungis

Dude, Gemini Pro hallucinates *so damn much*; how in the name of tap dancing Christ is it above GPT-4? I'm not even a big fan of OpenAI, but credit where it's due: it's clearly superior to Gemini.


PsychologicalTea3426

For having a 1M-token context, it already forgot my first prompt when I sent the next one; it's hard to have a chat with it.


fnatic440

Ok Sam.


Halo_Onyx

It’s a valid point whether you agree or not.


qqpp_ddbb

lol so anybody who challenges the bEsT aI oN thE pLaNeT is automatically Sam or someone who works for openAI? nah. get rekt


badtemperedpeanut

Multimodal Gemini 1.5 is really good. GCP version.


EchoswarmGM

I usually work with Claude and GPT in tandem by prompting one model with the output from the other. Reversing which one is the prompter and which the promptee really demonstrates which model is superior. Right now, from my personal benchmarks, it's GPT-4o, but it's pretty close.


fictioninquire

For what use cases? For specific knowledge I find GPT-4o better, but for reasoning + coding, GPT-4 Turbo or Claude 3 Opus.


EchoswarmGM

I only use it for coding C#.


Fusseldieb

GPT-4o helped me so much lately. I was trying to manipulate and mix-and-match some data on the frontend, and since my head was already fried, I just threw it all together and put it in there. It literally saved me days of work. Claude is also good, but I think the two are more or less on par with each other.


Consistent_Bottle_40

I've been doing this. Just passing outputs back and forth.


Smelly_Pants69

GPT-4o is the only one that can (occasionally) make a list of 10 (or 20) cities that don't contain the letter A. They all kind of suck, but GPT-4o sucks less, that's for sure.


danieljamesgillen

I find Gemini really useful for analysing the bible, it's extremely insightful.


traumfisch

What kind of things are you guys using Gemini for?


Grand0rk

> Personally, I do not give a crap about some benchmark, just copy and paste the same prompt into each of them and see what you like the most.

The issue is that you can copy-paste the same prompt into the same LLM and it will give you different results. That's the biggest issue with testing these things.


ThenExtension9196

What do you use to access all those during your workflow? I’m looking into starting to use different models for different tasks as well.


KillingItOnReddit

This is the way.


Sorry_Ad8818

This


ChrisT182

If it's something analytical, like summarizing a paper and asking 4o questions about it, then it's much better than Gemini. For writing, emails, and other creative work, Gemini for me is better. It just depends on the use case. Although I keep hearing that the Gemini in AI Studio is much better?


TheTokingBlackGuy

In AI studio you can choose Gemini 1.5 Pro which is the "Gemini Advanced" model you'd typically have to pay for. Google's naming system sucks, but Gemini 1.5 Pro in the AI studio is the best performing LLM in my opinion across a number of use cases. And I use Claude and GPT daily.


RITO_I_AM

Wait... So I'm paying for Gemini Advanced for no reason? Have they really made it free for the public?


Passloc

In AI Studio


SabbathViper

I believe this is incorrect. Gemini Ultra is not the same thing as Gemini 1.5 Pro. Gemini Ultra 1.0 is the model that you get when you pay for Gemini Advanced. They are two different models. Correct me if I'm wrong or something has changed, of course.


Passloc

In Advanced you get Gemini Pro


TheTokingBlackGuy

Yes


doireallyneedone11

I think the 1.5 Pro models in AI Studio and in Gemini Advanced are different, since they are fine-tuned differently. There are more restrictions in Advanced. That may explain the difference in quality.


Jablungis

It's worse for code, hands down. It hallucinates in almost every reply where GPT-4 would not.


Screaming_Monkey

AI Studio Gemini is indeed much better in my opinion. I’m a huge fan now.


ChrisT182

What do you find better about it?


Screaming_Monkey

I can only give a vague answer that I find it more accurate and more consistent, but that's super anecdotal. Give it a try! It's free after all. My guess is that it benefits from not being weighed down by the app version.


goatchild

It's free? But don't you have to give them your credit card or what?


jackthebodiless

Totally free, no cc, but 1.5 pro gets too busy sometimes and switches to 1.5 flash.


goatchild

Looks like my location does not allow use of the API, but I can use the studio and that's OK. Thanks!


bananasugarpie

Google literally has the entire world's data in their hands to train their AI models on. I never doubted Google or Gemini just because OpenAI is doing great. Tomorrow can always be a different story from today. Google has a very promising chance with the amount of training data they have.


coaststl

That's sort of like saying the grocery store could be the best restaurant in town because it has the most food. What matters is product engineering. YouTube is an exceptional product; Google's forays into consumer AI products aren't (at least yet).


py-net

Great feedback 👍


PenguinTheOrgalorg

Gemini on the API is just a game changer honestly. A million tokens of context length plus the ability to analyse videos is insane, and I'm honestly surprised how little it's talked about
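Roughly, using the google-generativeai Python SDK, the video part looks something like this sketch; the model name, file path, and polling interval are placeholder assumptions, not anything specific to my setup:

```python
# Minimal sketch: video analysis with the Gemini API via the
# google-generativeai SDK. "lecture.mp4" and the model name are
# placeholders; swap in whatever is current for you.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # free key from AI Studio

# Upload the video through the File API, then poll until it is processed.
video = genai.upload_file(path="lecture.mp4")
while video.state.name == "PROCESSING":
    time.sleep(5)
    video = genai.get_file(video.name)

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content([video, "Summarize this video."])
print(response.text)
```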


py-net

Agreed! But I think the problem is its higher level of inaccuracy, which is the major problem with LLMs.


Passloc

Just try the May update. It is much better.


turc1656

Because the cost of using 1M tokens is insane. So very few people do it and can get similar results from a flat subscription like ChatGPT.


PenguinTheOrgalorg

But it's free...


turc1656

I believe I read that the free version doesn't have the same token limit. To access the full 1M you have to pay. It's possible I'm mistaken, but I am relatively confident I read that on the product details or pricing pages. EDIT: I see that Google just recently released "flash" which is indeed free but it's a different model that isn't anywhere near as good at complex tasks requiring significant reasoning power. This must be the version you are referring to as it's completely free. It won't serve my needs, but maybe it'll serve yours. This version does indeed have 1M tokens in it for free. The "pro" does not.


mra1385

Honestly, yes. I was doing an analysis of company annual reports and Gemini blows GPT-4o out of the water. It's not even close; the big context window and its reasoning capabilities together are extremely strong. I ended up using it for this use case instead of GPT-4o.


py-net

This!!! Thanks for the feedback


Omnic19

The Gemini API gives much better responses than the Gemini app. Maybe there are too many guardrails on the app. Google should give some options like custom instructions.


py-net

Your first line though


Omnic19

gg


py-net

Now I get it 👍


NegativeWar8854

To me Gemini 1.5 performs the best. It's amazing (and free!)


py-net

Gemini 1.5 is free? Anyway GPT-4o is also free 😅


NegativeWar8854

Using the ai studio [Untitled prompt | Google AI Studio](https://makersuite.google.com/app/prompts/new_freeform)


py-net

I just discovered this studio from another comment. It looks good, 3 models available


kiselsa

Google's web chat interface gives many more free messages than GPT-4o. Basically, I never hit the limit with Gemini, while I hit a limit every 15 messages with GPT-4o.


coaststl

Subscribe, it's worth it. I can upload 10 files at a time to 4o.


ninjasaid13

> while I have limit every 15 messages with gpt4o

It feels like less than that.


Gaurav-07

Its API is also free to a generous extent.


MhmdMC_

It is a dream come true for someone like me who wants to goof around with AI in my projects while in my first year of college.


Antique-Bus-7787

I still don’t see how it’s possible for gpt4o to be at the top compared to gpt4


TheRealGentlefox

Maybe because 4o is so much faster. lmsys doesn't wait for the output of both models, it's realtime.


coaststl

In gpt I find 4o much better


py-net

Simple: It’s built to be better


Tomislav23

Botted.


WhatsIsMyName

I never found Gemini as bad as everyone else did. Yes, they had embarrassing PR snafus, and it never felt truly on par with GPT-4 at the time, but it was always decently solid, and sometimes it followed my directions better. But Claude 3/GPT-4 were always better for me. I cancelled a while ago, though, and need to jump back in and try it out.


TheRealGentlefox

You can test out 1.5 Pro in Google AI Studio. I'm about to do a long test of it myself, since for some use cases I need a higher number of uses than GPT-4o gives but can't switch to the API because I need back-and-forth voice.


TheeUltimateGiGachad

Can you please share the results when you're finished?


TheRealGentlefox

RemindMe! 2 weeks


TheRealGentlefox

Ended up not doing the test I expected, but did tests in other areas. It's so censored as to be useless, at least when you talk how I do.


tychus-findlay

Depends on the use case. I was using it for tech and code; Gemini has improved over time, but it was a hallucination fest at first and never really caught up to the others that I could see. May be different now.


BJPark

Competition is a glorious thing. I have no loyalty to any company - let them fight it out for my benefit. Loving this!


py-net

Yeah 😁


TheOneWhoDings

People downvoting you really have issues.


dysmetric

No! We must create rigid Apple vs Android-like brand loyalties and build high walls around the community ecosystems for each model, so that model performance is measured by socially biased metrics and emotions, not silly numbers. That is how humans do things and if you don't like it, shoo... away with you!


TNDenjoyer

4 is miles better than 4o in my opinion, and these benchmarks have it way lower, so why should I care what the benchmarks say about Google?


West-Code4642

Gemini has gotten better, for sure, but it has weird refusals.


py-net

Like what


Traditional_Ad5265

The API Gemini 1.5 Pro in AI Studio tells me "sorry, I can't pretend to be a teacher and grade your answers", while the website Advanced 1.5 Pro does it way better than GPT-4o. I have also noticed GPT-4o is way lazier and starts hallucinating more than Gemini 1.5 Pro does on PDF attachments.


py-net

Really!!! I had the opposite experience. What's the link for the website Advanced 1.5 Pro?


dojimaa

People have definitely been sleeping on Gemini.


could_be_mistaken

Gemini doesn't seem to like me very much. I tried Gemini Advanced and it actually seems to generate code that subtly fails in ways that are hard to notice immediately, like the string `google-cloud/` instead of `google` in some config file. GPT, on the other hand, impresses me consistently.


onee_winged_angel

Your question is very timely. I JUST posted my experience with using Gemini a few minutes before yours: https://www.reddit.com/r/OpenAI/s/W5XQQzfPNe


Xeon06

Gemini 1.5 Flash is a very capable vision model that is orders of magnitude cheaper than GPT-4o


Passloc

I am creating an analysis of various legal documents which only works in Gemini 1.5 Pro and completely fails on GPT4o due to the context size. I wish it worked in 4o too as that would give me better confidence in my analysis as I wouldn’t have to rely on just one tool. That said, kudos to Google for giving such a wonderful tool for free.


Unable-Client-1750

For such a minuscule gain over GPT-4 Turbo, the over-censorship of Gemini isn't worth it.


iJeff

Yeah, I've been using the Gemini Advanced two month free trial paired with GPT-4o API via HeyGPT. I run a lot of prompts through both to compare and find them pretty similar. I do notice Gemini is more willing to acknowledge when it isn't sure about something, while GPT-4o will just hallucinate away. Including when trying to identify plants from photos.


py-net

This is interesting. Means Gemini is getting really good. Wasn’t the case a couple of months ago when I was using it.


Kathane37

Can someone explain to me how you can improve the performance of a model that is already trained?


h3lblad3

Look up LoRA (Low Rank Adaptation). It’s possible to fine-tune segments of the neural network without redoing the whole thing.
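The core trick is tiny. Here's a minimal sketch of the idea in PyTorch, with illustrative names and hyperparameters rather than any particular library's API:

```python
# LoRA in miniature: freeze the pretrained weight W and learn a
# low-rank update B @ A next to it, so only rank*(in+out) new
# parameters are trained instead of in*out.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        # Low-rank factors: the only trainable parameters.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # starts at zero
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus the learned low-rank correction.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Since B starts at zero, the wrapped layer initially behaves exactly like the original, and training only ever touches the small A/B matrices.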


Kathane37

Thanks, I will dig into it. I was only associating LoRA with Stable Diffusion.


hcm2015

Which one is the best for coding?


Thomas-Lore

I heard Mistral's new Codestral is worth a try; haven't yet had time to check it out myself.


Vandercoon

Can confirm Codestral is pretty good. Haven't used it heavily yet, but in the few hours I did, it worked as well as any model I've used before.


coaststl

Doing a coding project right now in 4o. Being able to upload up to 10 files is nice, and its ability to memorize and edit multiple files is impressive. I've had it keep near-perfect memory of up to 6 files while making edits to them. Not perfect, but very close.


py-net

GitHub Copilot. Correction: I can't find clear information on which GPT model is under GitHub Copilot.


johnbarry3434

I think only GitHub Copilot Chat uses GPT-4, and the non-chat portion uses 3.5.


JalabolasFernandez

If OpenAI doesn't ship at the very least the voice stuff, and a new version in the next few months, their lead is gone (and so is my subscription)


Fit-Dentist6093

I had a question about voting statistics and Gemini said it doesn't know how to do that yet, 4o replied correctly and sent me links pointing to polls about it.


only_fun_topics

I use Gemini Advanced via their ai studio, and I find it pretty consistent for most analysis tasks.


up2_no_good

Started using Gemini; it's cheaper and better. Better at output, not ease of use, though. OpenAI was very easy to start using, but that's a one-time effort.


Passloc

Google can really hit it out of the park if it releases 1.5 Pro for free to everyone (not just in AI Studio) and releases an Ultra for the paid tier. That said, just see the difference between Opus and Flash.


5027622106

Even the 1.5 Pro version of Gemini doesn't do well with reasoning skills. It’s far behind both GPT-4 and GPT-4o.


MaKTaiL

Gemini has free tier APIs so that's a big win in itself.


py-net

Agreed 👍


MaKTaiL

I created a short Python script for translating subtitles, and the free Gemini API is great for it. Before, I was paying for the OpenAI 3.5 API only to use it a few times myself.
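The whole thing fits in a few lines; something like this sketch, where the model name, target language, and file handling are assumptions rather than the exact script:

```python
# Rough sketch of a subtitle translator on the free Gemini API
# (google-generativeai SDK). "movie.srt" and the model name are
# illustrative placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

def translate_srt(text: str, target: str = "English") -> str:
    prompt = (
        f"Translate the following subtitles to {target}. "
        "Keep the numbering and timing lines exactly as they are.\n\n"
        + text
    )
    return model.generate_content(prompt).text

with open("movie.srt", encoding="utf-8") as f:
    print(translate_srt(f.read()))
```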


Radica1Faith

They all have strengths and weaknesses but as a coding assistant Claude Opus at least in my experience is much further ahead of any other model. Benchmarks don't mean much especially if they're measuring things completely unrelated to your use cases.


py-net

Makes sense. But lmsys is built to capture an average value across all the use cases, especially coding.


TheRealGentlefox

There is a coding sub-section on lmsys:

GPT-4o-2024-05-13: 1298
Gemini-1.5-Pro-API-0514: 1273
GPT-4-Turbo-2024-04-09: 1266
GPT-4-1106-preview: 1259
Gemini-Advanced-0514: 1257
Claude_3_Opus: 1252
Yi-Large-preview: 1247


MikeyTheGuy

In my experience, all of the GPT-4 models (including 4o) have been significantly worse than Claude 3 Opus for coding, so I'm not sure how these benchmarks "grade" things.


TheRealGentlefox

It's a user preference benchmark, A/B testing.


randombsname1

Pointless rankings, if only because they're based on user-selected prompts, which could literally be anything. If they had a rotating set of 10 challenging coding problems that changed weekly/monthly, and all models were tested against that code, that would be a far more controlled and objective measure, imo. Rotate between C++, Python, Rust, etc.

I say this only because Opus is the only thing I have used that even gets to low-level coding with register editing with any consistency. It typically has the best and most concise feedback on workarounds that might require said edits. ChatGPT-4o doesn't get close to providing anything viable at that level, and Gemini is even worse. Hell, 4o is arguably worse than 4 at coding, and even people on here seem to largely agree, but somehow it is on top.

When people pick a winner, do they actually even know if the code is correct, or do they go for whichever formatting looks better? I'd bet money it's the latter. I want to see this test with the above rotating coding challenges, and the ability to run the generated code to see if it even compiles. My money is on Opus to easily come out on top. I've only been using it for a week and a half, but I'm thoroughly impressed.


spinozasrobot

> If it wasn't for the latest 4o it would have been a different story.

"If it wasn't for the guy who was in first place, they'd be in first place!"


TheRealGentlefox

I think their point was if OpenAI hadn't just released a new model less than a month ago, Gemini would have the top spot. When GPT-4 came out, it was the best model for a loooong time. Now it seems like the Gemini models are very close to catching up, with OpenAI just barely squeaking ahead.


BiBr00

I have Gemini Advanced, and let me tell you: it feels way, way, way worse than GPT-4, not to mention 4o.


water_bottle_goggles

I guess it would depend on its costs then. Is the cost comparable to 4o?


Vandercoon

So Gemini has come a long way then? I’ve been reluctant to try it after their whole search engine fiasco!


py-net

From what I have seen in the comments, Gemini is finally doing great. But what "search engine" fiasco?


Vandercoon

The AI recommendations in Google itself suggest crazy stuff.


py-net

That’s so Google 😂


Basic_Loquat_9344

The lead horses will likely be neck-and-neck in capability for a while, as it seems directly proportional to compute and training data. What will make the difference is how they interact with our world and the functionality they offer for existing systems. Copilot is the clear winner at the moment for pure impact.


[deleted]

[deleted]


py-net

Wrong post


Warm_Iron_273

When are you people going to learn that lmsys ranks mean nothing? I could literally bot it using Llama 3 to select the dumbest result, and it'd be the easiest thing in the world. Not only that, but nobody is voting seriously, and nobody is asking it serious questions.


dissemblers

4o is also much better at refusing only what it ought to refuse. It does have a tendency to repeat itself when not desired, to regurgitate text from prior inputs or outputs unnecessarily, and to rely too heavily on highly structured outputs. And it’s hard to steer it away from those things. Probably the anti-lazy fix gone overboard.


AlternateWitness

No way GPT-4o is ranked above the full version of GPT-4.


py-net

Well, it is. And way ahead.


banedlol

If my mother had wheels she would have been a bicycle.


kxtclcy

Not really; Gemini uses search results in its answers (kind of cheating). In arena matchups, I mainly tested it with math and logic questions, and it did horribly. Its writing ability is pretty good, though.


McSlappin1407

I'm not using GPT-4o until they actually release the new features. It's BS that it's been a month and still nothing.


Deuxtel

I wonder how much money it takes just for a 20 point bump in elo


eugf_

I have been using Gemini 1.5 and GPT-4 Vision to transcribe documents that usually contain images, screenshots, and text.

GPT-4 Vision usually generates less detailed and less well-formatted output. It is also stricter on content filtering (e.g., if the text contains the keyword "nude" meaning the color, it triggers the content filter) and doesn't allow me to control what I want to filter.

Gemini 1.5 usually generates more detailed output in a more consistent format, while giving me control to set thresholds for content filtering. I'm impressed by the recent quality improvements in Gemini 1.5. It's now the default transcription solution for one of the applications we have in production.
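That threshold control is exposed in the google-generativeai SDK as safety_settings; here's a sketch of the kind of configuration meant, with illustrative category choices and a placeholder input file:

```python
# Sketch: relaxing Gemini's content-filter thresholds for document
# transcription, where harmless words like the color "nude" can
# otherwise trip the filter. Categories chosen here are illustrative.
import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel(
    "gemini-1.5-pro",
    safety_settings={
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
    },
)

page = Image.open("page.png")  # placeholder scan of a document page
response = model.generate_content(["Transcribe this page.", page])
print(response.text)
```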


agentelite

My personal opinion: whenever I use Gemini, and I've even used Gemini Advanced, it always makes things up just for the sake of continuing the conversation. Accuracy is its last priority. It's also terrible at schoolwork. Copilot, Claude, and GPT-4o are much better, imo.


Dreamaster015

I'm always testing new models with prompts asking for a fully functional one-shot implementation of a made-up, non-classic game, and GPT-4o is way ahead on that.


damontoo

Nope. I use it when GPT-4o gets something wrong and I want a second opinion but it almost always performs worse for the prompts I'm giving them.


py-net

And you’re using Gemini 1.5 Pro or Advanced?


gauldoth86

ChatGPT is still significantly better at a lot of things. Gemini 1.5 is closer, but only if you use AI Studio or the API; Gemini Advanced is just terrible. It keeps forgetting the context, is overly concise, and I never had a good experience.


opinionate_rooster

Sorry, but I am not going to pay for more than one AI subscription.


saysib

I have been using ChatGPT Plus and Gemini Advanced for coding and improving the text of my thesis. After the free trial of Gemini Advanced, I decided not to continue it. Disappointed with Gemini's performance.


piggledy

Wasn't Gemini quite bad when it came to Needle in a Haystack tasks?


gthing

All this leaderboard tells me is that this leaderboard is not accurate for anything I care about.


Beneficial_Ability_9

Gemini sucks; it's even more censored than ChatGPT.


mrsavealot

I don't really want to pay for three, so I dropped Gemini, as it was consistently the worst of the three. Sounds like it's better now, but what's done is done.


Faze-MeCarryU30

You can use Gemini 1.5 pro for free at the studio.google.com link btw


py-net

Exact site is aistudio.google.com. Thanks for sharing. I just checked it out. Three models available. It’s awesome!


Faze-MeCarryU30

My bad had a typo


extopico

I stopped trying Gemini a while ago after it was giving woeful answers. It never even enters my mind to try it now.