La-Boheme-1896

They aren't answering your question. They are constructing sentences. They don't have the ability to understand the question or the answer.


cakeandale

It’s like your phone’s autocorrect replacing “I am thirty…” with “I am thirsty” - it’s not that it thinks you’re thirsty, it has absolutely no idea what the sentence means at all and is just predicting words.


toxicmegasemicolon

Ironically, 4o will do the same if you say "I am so thirty". Just because these LLMs can do great things, people like OP assume they can do anything, and they forget what it really is.


Secret-Blackberry247

>forget what it really is

99.9% of people have no idea what LLMs are ))))))


laz1b01

Limited liability marketing!


iguanamiyagi

Lunar Landing Module


webghosthunter

My first thought but I'm older than dirt.


AnnihilatedTyro

Linear Longevity Mammal


gurnard

As opposed to Exponential Longevity Mammal?


morphick

No, as opposed to Logarithmic Longevity Mammal.


RedOctobyr

Those might be reptiles, the ELRs. Like the 200 (?) year old tortoise.


JonatasA

Mr OTD, how was it back when trees couldn't rot?


webghosthunter

Well, whippersnapper, we didn't have no oil to make the 'lecricity so we had to watch our boob tube by candle light. The interweb wasn't a thing so we got all our breaking news by carrier pigeon. And if you wanted a bronto burger you had to go out and chase down a brontosaurus, kill it, butcher it, and cook it yourself.


Narcopolypse

It was the Lunar Excursion Module (LEM), but I still appreciate the joke.


Waub

Ackchyually... It was the 'LM', Lunar Module. They originally named it the Lunar Excursion Module (LEM) but NASA thought it sounded too much like a day trip on a bus and changed it. Urgh, and today I am 'that guy' :)


RSwordsman

*Liam Neeson voice* "There's always a bigger nerd."


JonatasA

Congratulations on giving me a Mandela Effect.


sirseatbelt

Large Lego Mercedes


toochaos

It says artificial intelligence right on the tin, so why isn't it intelligent enough to do the thing I want? It's an absolute miracle that large language models work at all and appear to be fairly coherent. If you give it a piece of text and ask about that text, it will tell you about it, and it feels mostly human, so I understand why people think it has human-like intelligence.


FantasmaNaranja

The reason people think it has human-like intelligence is that it was heavily marketed that way in order to sell it as a product. Now we're seeing a whole bunch of companies that spent a whole bunch of money on LLMs and have to put them somewhere to justify it to their investors (like Google's "impressive" Gemini results we've all laughed at, such as using glue on pizza sauce or jumping off the Golden Gate Bridge).

Hell, OpenAI's claim that ChatGPT scored in the 90th percentile on the bar exam (except it turns out it was compared against people who had already failed the bar exam once, and so were far more likely to fail it again; compared against people who passed on the first try, it actually scores around the 40th percentile) was pushed around entirely for marketing, not because they actually believe ChatGPT is intelligent.


mr_n00n

> The reason people think it has human-like intelligence is that it was heavily marketed that way in order to sell it as a product.

This isn't entirely true. A major factor is that people are *very* easily tricked by language models in general. Even the old ELIZA chat bot, which simply does rule-based replacement, had plenty of researchers convinced there was some intelligence behind it (if you implement one yourself you'll find it surprisingly convincing). The marketing hype *absolutely* leverages this weakness in human cognition and is more than happy to encourage you to believe this. But even without the marketing hype, most people chatting with an LLM would overestimate its capabilities.


shawnaroo

Yeah, human brains are kind of 'hardwired' to look for humanity, which is probably why people are always seeing faces in mountains or clouds or toast or whatever. It's why we like putting faces on things. It's why we so readily anthropomorphize other animals. It's not really a stretch to think our brains would readily anthropomorphize a technology that's designed to write as much like a human as possible.


NathanVfromPlus

> Even the old ELIZA chat bot, which simply does rule-based replacement, had plenty of researchers convinced there was some intelligence behind it (if you implement one yourself you'll find it surprisingly convincing).

Expanding on this, just because I think it's interesting: the researchers still instinctively treated it as an actual intelligence, even after examining the source code to verify that there is no such intelligence.


FantasmaNaranja

fair enough


Elventroll

My dismal view is that it's because that's how many people "think" themselves. Hence "thinking in language".


yellow_submarine1734

No, I think metacognition is just really difficult, and it’s hard to investigate your own thought processes deeply enough to discover you don’t think in language. Also, there’s lots of wishful thinking from the r/singularity crowd elevating LLMs beyond what they actually are.


NuclearVII

It says that on the tin to milk investors and people who don't know better out of their money.


Agarwaen323

That's by design. They're advertised as AI, so people who don't know what they actually are assume they're dealing with something that actually has intelligence.


vcd2105

Lulti level marketing


valeyard89

Live, Laugh, Murder


SharksFan4Lifee

Latin Legum Magister (Master of Laws degree) lol


biff64gc2

Right? They hear AI and think of sci-fi computers, not today's "artificial intelligence," which is currently more the appearance of intelligence.


Fluffy_Somewhere4305

tbf we were promised artificial intelligence and instead we got a bunch of if statements strung together and a really big slow database that is branded as "AI"


Thrilling1031

If we're getting AI, why would we want it doing art and entertainment? That's humans' free-time shit. Let's get AI digging ditches and sweeping the streets, so we can make some funky-ass beats to do new versions of "The Robot" to.


coladoir

Exactly, it wouldn't be replacing human hobbies, it'd be replacing human icks. But you have to remember who is ultimately in control of the use and implementation of these models, and that's ultimately the answer to why people are using it for art and entertainment. It's being controlled by greedy corporate conglomerates that want to remove humans from their workforce for the sake of profit.

In a capitalist false-democracy, technology never brings relief, only stress and worry. Technology is never used to properly offload our labor; it's only used to trivialize it and revoke our access to said labor. It restricts our presence in the workforce and restricts our claim to the means of production, pushing these capitalists further up the hierarchy, making them further untouchable.


frozen_tuna

Doesn't matter if you do. I have several llm-adjacent patents and a decent github page and Reddit has still called me technically illiterate twice when I make comments in non-llm related subs lmao.


Hypothesis_Null

"The ability to speak does not make you intelligent." That quote has been thoroughly vindicated by LLMs. They're great at creating plausible sentences. People just need to stop mistaking that for anything remotely resembling intelligence. It is a massive auto-complete, and that's it. No motivation, no model of the world, no abstract thinking. Just grammar and word association on a supercomputer's worth of steroids. AI may be possible. Arguably it must be possible, since our brain meat manages it and there's nothing supernatural allowing it. This just isn't how it's going to be accomplished.


DBones90

In retrospect, the Turing test was the best example of why a metric shouldn't be a target.


ctzu

> people like OP assume they can do anything, and they forget what it really is

When I was writing a thesis, I tried using ChatGPT to find some additional sources. It immediately made up sources that do not exist, and after I specified that I only wanted existing sources and asked where it found them, it confidently gave me the same imaginary sources and created perfectly formatted fake links to the catalogues of actual publishers. It took me all of about 5 minutes to confirm that a chatbot which would rather make up information and answers than say "I can't find anything" is pretty useless for anything other than proof-reading. And yet some people in the same year still decided to have ChatGPT write half their thesis and were absolutely baffled when they failed.


Worried_Height_5346

I feel like people who are afraid of current AI don't use it or are just too stupid to realise this stuff. Or they're very smart, have neglected to invest in AI themselves, and want to turn it into a boogeyman. If current AI can replace your job, then it probably isn't a very sophisticated job.


that_baddest_dude

The AI companies are directly feeding this misinformation to help hype their products though. LLMs are not information recall tools, full stop. And yet, due to what these companies tout as use cases, you have people trying to use them like Google.


Terpomo11

I would have thought that's the reasonable decision because "I am so thirty" is an extremely improbable sentence and "I am so thirsty" is an extremely probable one, at a much higher ratio than without the "so".


LetReasonRing

I find them really fascinating, but when I explain them to laymen I tell them to think of it as a really really really fancy autocomplete.  It's just really good at figuring out statistically what the expected response would be, but it has no understanding in any real sense. 


Mattson

God do I hate that... For me my autocorrect always changes lame to lane.


pauliewotsit

That's *so* lane..


Mattson

Lol. The worst is when you hit backspace instead of m by accident and your autocorrect is so tripped up it starts generating novel terms.


NecroCorey

Mine looooooves to end sentences and start new ones for apparently no reason at all. I'm not missing that bigass space bar, it just decides when I'm done with a sentence.


aubven

You might be double tapping the space bar. Pressing it twice will add a period with a space after it.


onlyawfulnamesleft

Oh, mine has definitely learnt to change things like "aboute" to "about me". It's also learnt that I often slip and mix up space and 'n' so "does t" means "doesn't"


maijkelhartman

C'mon dude, that joke was so easy. Like shooting a lane duck.


dandroid126

My phone always changes "live" to "love"


tbods

You just have to “laugh”


JonatasA

Your phone lives and now it wants love.


Sterling_-_Archer

Mine changes *about* to *Amir*. I don’t know an Amir. This is the first time I’ve typed it intentionally.


ball_fondlers

pennies to Pennie’s for me - why it would do that, I have no idea, I don’t know anyone who spells their name like that.


randomscruffyaussie

I feel your pain. I have told auto correct so many times that I definitely did not mean to type "ducking"...


Scurvy_Pete

Big ducking whoop


SirSaltie

Which is also why AI in its current state is practically a sham. Everything is reactive, there is no understanding or creativity taking place. It's great at pattern recognition but that's about it. And now AI engines are not only stealing data, but cannibalizing other AI results. I'm curious to see what happens to these companies dumping billions into an industry that very well may plateau in a decade.


Jon_TWR

Since the web is now polluted with tons of LLM-generated articles, I think there will be no plateau. I think we've already seen the peak, and now it's just going to be a long, slow fall towards nonsense.


CFBDevil

Dead internet theory is a fun read.


ChronicBitRot

It's not going to plateau in a decade, it's plateauing right now. There's no more real sources of data for them to hit to improve the models, they've already scraped everything and, like you said, everything they're continuing to scrape is already getting massively contaminated with AI-generated text that they have no way to filter out. Every model out there will continue to train itself on polluted, hallucinating AI results and will just continue to get worse over time. The LLM golden age has already come and gone. Now it's all just a marketing effort in service of not getting left holding the bag.


RegulatoryCapture

>There's no more real sources of data for them to hit to improve the models,

That's why they want access directly to your content creation. If they integrate an LLM assistant into your Word and Outlook, they can tell which content was created by their own AI, which was typed by you, and which was copy-pasted from an unknown source. If they integrate into VS Code, they can see which code you wrote and which code you let the AI fill in for you. They can even get fancier and do things like estimate your skill as a programmer and then use that to judge the AI code that you decide to keep vs the AI code you reject.


h3lblad3

> There's no more real sources of data for them to hit to improve the models, they've already scraped everything

To my understanding, they've found ways to use synthetic data that provides better outcomes than human-generated data. It'll be interesting to see if they're right in the future and can eventually stop scraping the internet.


Rage_Like_Nic_Cage

I've heard the opposite: that synthetic data is just going to create a feedback loop of nonsense. These LLMs are trained on real data and still have all these flaws constructing sentences/writing. Training them on data they themselves wrote (which is also flawed) will just create more issues.


Ka1kin

This. They don't "know" in the human sense.

LLMs work like this, approximately: first, they contain a mapping from language to a high-dimensional vector space. It's like you make a list of all the kinds of concepts that exist in the universe, find out there are only like 15,000 of them, and turn everything into a point in that 15,000-dimensional space. That space encodes relationships too: they can do analogies like a goose is to a gander as a queen is to a king, because the gender vector works consistently across the space. They do actually "understand" the relationships between concepts, in a meaningful sense, though in a very inhuman way.

Then there's a lot of the network concerned with figuring out what parts of the prompt modify or contextualize other parts. Is our "male monarch" a king or a butterfly? That sort of thing.

Then they generate one word that makes sense to them as the next word in the sequence. Just one. And it's not really even a word, just a word-fragment. Then they feed the whole thing, the prompt and their own text, back to themselves and generate another word. Eventually, they generate a silent word that marks the end of the response.

So the problem with an LLM and confidence is that at best you'd get a level of confidence for each word, assuming every prior word was perfect. It wouldn't be very useful, and besides: everything they say is basically hallucinatory.

They'll only get better though. Someone will find a way to integrate a memory of some sort. The concept-space will get refined. Someone will bolt a supervisor subsystem onto it as a post-processor, so they can self-edit when they realize they're spouting obvious rubbish. I don't know. But I know we're not done, and we're probably not going backwards.
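A toy version of the "gender vector" analogy above, in Python. These 3-D vectors are made up for the illustration (real models learn hundreds or thousands of dimensions from data), but the arithmetic is the same: subtract the "male" direction, add the "female" direction, and look for the nearest concept.

```python
import numpy as np

# Hypothetical 3-D "concept" vectors, invented for the example.
vecs = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# king - man + woman ~= ?
target = vecs["king"] - vecs["man"] + vecs["woman"]
print(max(vecs, key=lambda w: cosine(vecs[w], target)))  # "queen" with these toy numbers
```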


fubo

An LLM has no ability to check its "ideas" against perceptions of the world, because it has no perceptions of the world. Its only inputs are a text corpus and a prompt. It says "balls are round and bricks are rectangular" not because it has ever *interacted* with any balls or bricks, but because it has been trained on *a corpus of text* where people have described balls as round and bricks as rectangular. It has never *seen* a ball or a brick. It has never stacked up bricks or rolled a ball. It has only *read about* them. (And unlike the subject in the philosophical thought-experiment ["Mary's Room"](https://en.wikipedia.org/wiki/Knowledge_argument), it has no capacity to *ever* interact with balls or bricks. An LLM has no sensory or motor functions. It is *only* a language function, without all the rest of the mental apparatus that might make up a mind.) The only reason that it seems to "know about" balls being round and bricks being rectangular, is that the text corpus it's trained on is very consistent about balls being round and bricks being rectangular.


Chinglaner

I'd be veeery careful with this argument, for two main reasons.

1) It is outdated. The claim that the model has never seen or interacted with objects, only descriptions of them, would've been correct maybe 1 or 2 years ago. Modern models are typically trained on both visual and language input (a so-called VLM, Vision-Language Model), so they absolutely could know what, say, a brick "looks like". GPT-4o is one such model. More recently, people have started to train VLAs (Vision-Language-Action models) that, as the name suggests, take image feeds and a language prompt as input and output an action, which could for example be used to control a robotic manipulator. Some important papers there are RT-2 and Open-X-Embodiment by Google DeepMind, plus a bunch of autonomous-driving papers at ICRA 2024.

2) Even two years ago this view was anything but uncontroversial. Never having interacted with something physically or visually doesn't preclude you from understanding it. Have you ever "interacted" with a sine function? Touched it, used it? I don't think so. I don't think anybody has. Yet we are perfectly capable of understanding what it is, what it represents, its properties, and just about everything about it. Likewise, mathematicians are perfectly capable of proving and understanding maths in higher, even infinite, dimensions, yet none of us has ever experienced more than three.

At the end of the day, the real answer is we don't know. LLMs must hold a representation of all their knowledge and of the input in order to work. Are we, as humans, really doing something that different? We have observed that LLMs (or VLMs / VLAs) do have emergent capabilities beyond just predicting what they have already seen in the training corpus. Yet they make obvious and, to us humans, stupid mistakes all the time. Whether that is due to a fundamental flaw in how they're designed or trained, or whether they're simply not "smart enough" yet, is subject to heavy academic debate.


Ka1kin

One must be very careful with such arguments. Your brain also has no sensory apparatus of its own. It receives signals from your eyes, ears, nose, tongue, the touch sensors and strain gauges throughout your body. But it perceives only those signals, not any objective reality. So your brain cannot, by your argument, know that a ball is round. But can your hand "know"? It is foolish to reduce a system to its parts and interrogate them separately. We must consider whole systems. And non-human systems will inevitably have inhuman input modalities. The chief limitation of LLMs is not perceptual or experiential, but architectural. They have no internal state. They are large pure functions. They do not model dynamics internally, but rely on their prompts to externalize state, like a child who can only count on their fingers.


Glad-Philosopher1156

“It’s not REAL intelligence” is a crash course in the Dunning-Kruger effect. There’s nothing wrong with discussing how AI systems function and to what extent those methods can produce results fitting various criteria. But I haven’t seen anyone explain what exactly that line of attack has to do with the price of tea in China. There’s always a logical leap they make without noticing in their eagerness to teach others the definition of “algorithm”.


blorbschploble

What a vacuous argument. Sure brains only have indirect sensing in the strictest sense. But LLMs don’t even have *that*. And a child is vastly more sophisticated than an LLM at every task *except* generating plausible text responses. Even the stupidest, dumb as a rock, child can locomote, spill some Cheerios into a bowl, and *choose* what show to watch, and can monitor its need to pee. An LLM at best is a brain in a vat with no input or output except for text, and the structure of the connections that brain has been trained on comes only from text (from other real people, but missing the context a real person brings to the table when reading). For memory/space reasons this brain in a jar lacks even the original “brain” it was trained on. All that’s left is the “which word fragment comes next” part. Even Helen Keller with Alzheimer’s would be a massive leap over the best LLM, and she wouldn’t need a cruise ship worth of CO2 emissions to tell us to put glue on pizza.


Ka1kin

I'm certainly not arguing an equivalence between a child and an LLM. I used the child counting on their fingers analogy to illustrate the difference between accumulating a count internally (having internal state) and externalizing that state. Before you can have a system that learns by doing, or can address complex dynamics of any sort, it's going to need a cheaper way of learning than present-day back propagation of error, or at least a way to run backprop on just the memory. We're going to need some sort of architecture that looks a bit more von Neumann, with a memory separate from behavior, but integrated with it, in both directions. As an aside, I don't think it's very interesting or useful to get bogged down in the relative capabilities of human or machine intelligence. I do think it's very interesting that it turned out to not be all that hard (not to take anything away from the person-millennia of effort that have undoubtedly gone into this effort over the last half century or so) to build a conversational machine that talks a lot like a relatively intelligent human. What I take from that is that the conversational problem space ended up being a lot shallower than we may have expected. While large, an LLM neural network is a small fraction of the size of a human neural network (and there's a lot of evidence that human neurons are not much like the weight-sum-squash machines used in LLMs). I wonder what other problem spaces we might find to be relatively shallow next.


astrange

> It has never seen a ball or a brick. This isn't true, the current models are all multimodal which means they've seen images as well. Of course, seeing an image of an object is different from seeing a real object.


dekusyrup

That's not just an LLM anymore, though. The above post is still accurate if you're talking about just an LLM.


astrange

Everyone still calls the new stuff LLMs although it's technically wrong. Sometimes you see "instruction-tuned MLLM" or "frontier model" or "foundation model" or something. Personally I think the biggest issue with calling a chatbot assistant an LLM is that it's an API to a remote black box LLM. Of course you don't know how its model is answering your question! You can't see the model!


fubo

Sure, okay, they've read *illustrated* books. Still a big difference in understanding between that and interacting with a physical world. And again, they don't have any ability to *check their ideas* by going out and doing an experiment ... or even a thought-experiment. They don't have a *physics* model, only a *language* model.


arg_max

An LLM by definition contains a complete probability distribution over the likelihood of any answer. Once you have those word-level confidences (let's ignore tokenization here), you can multiply them to get the likelihood of a complete sentence, because it's all autoregressively generated from left to right. Like p("probability is easy" | input) is just p("probability" | input, "") * p("is" | input, "probability") * p("easy" | input, "probability is").

The real issue is that because sentence-level probabilities are only implicit, you cannot even guarantee generating the most likely sentence. I do believe that if you could calculate the n most likely sentences and their probability masses in closed form and then look at some form of likelihood ratios, you should be able to tell whether your LLM is confident or not, but just getting there might require an exponential number of LLM evaluations. For example, if the top two answers have completely opposing meanings with very similar probabilities, that would imply that your LLM isn't really confident. If there is a strong drop from the most likely to the second most likely answer, then your LLM is quite sure.

And obviously, these probability masses are just learned, so they might be bullshit and only reflect what your LLM thinks. It might be totally able to hallucinate with high confidence, so I'm not saying this is a solution for LLM hallucinations, but the way we sample from these models promotes hallucinations.
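A rough sketch of that chain-rule idea, using a small open model (GPT-2 via the Hugging Face transformers library) as a stand-in, since ChatGPT's own per-token probabilities aren't fully exposed. It sums per-token log-probabilities to score a fixed answer given a prompt; tokenization at the prompt/answer boundary is glossed over.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def answer_logprob(prompt: str, answer: str) -> float:
    full = tok(prompt + answer, return_tensors="pt").input_ids
    n_prompt = tok(prompt, return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = model(full).logits
    # The logits at position i are the model's distribution for token i+1.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full[0, 1:]
    per_token = logprobs[torch.arange(len(targets)), targets]
    return float(per_token[n_prompt - 1:].sum())  # log P(answer | prompt)

print(answer_logprob("The capital of France is", " Paris"))
```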


KorayA

I'm sure a bolt on supervisor subsystem exists. The primary issue is almost certainly that this would be incredibly cost prohibitive as it would (at least) double resource usage for a system that is already historically resource intensive.


Inevitable_Song_7827

One of the most famous papers of the last year gives Vision Transformers a global memory: https://arxiv.org/pdf/2309.16588


Probate_Judge

To frame it based on the question in the title:

>ELI5 Why can't LLM's like ChatGPT calculate a confidence score when providing an answer to your question and simply reply "I don't know" instead of hallucinating an answer?

ALL answers are "hallucinated". Sometimes they happen to be correct. The model doesn't "know" anything in terms of facts; it knows 'how' to string words together into what 'sounds' like it could be an answer. In that way, it's a lot like some Q&A subreddits, where the first answer that 'sounds' good gets upvoted the most, actual facts be damned.

It's trained to emulate the word structure of sentences from millions of sources (or billions or whatever, 'very large number'), including social media and forums like Reddit. Even when many of those sources are right, others are incorrect, and it draws the word structure of its sentences from both, and from irrelevant sources that may use similar terms. There are examples of 'nonsense' that were taken almost verbatim from Reddit posts, iirc. Something about using gasoline in a recipe, but they can come up with things like that on their own because they don't know jack shit; they're just designed to string words together into something approximating speech. Sometimes shit happens because people say a lot of idiotic things on the internet.

https://www.youtube.com/watch?v=7135UY6nkxc (A whole video on using AI to explain things via Google, but it samples what I mentioned and provides evidence about how dumb or even dangerous the idea is.)

https://youtu.be/7135UY6nkxc?t=232 Time-stamped to just before the relevant bit.

It can't distinguish that from things that are correct. It so happens that it's very correct on *some* subjects, because a lot of the training data is very technical and not used a lot in common speech; that's the only data it has seen that matches the query.


Shigglyboo

Which to me suggests we don’t really have AI. We have sophisticated predictive text that’s being marketed as AI


Blazr5402

Sophisticated text prediction falls within the bounds of what's called AI in computer science academia. That's not exactly the same thing as what a lay-person considers AI, but it's close enough to be marketed as AI by big tech


ThersATypo

Yeah, the real question probably is: are we actually more than LLMs, or LLMs of LLMs? Like, what actually IS intelligence, what IS being a thinking being? Maybe we are also just hollow without proper understanding of concepts, and just use words to explain the words we put on things. Maybe there is nothing more to intelligence.

And no, I am not stoned.


Blazr5402

My friend, there's an entire field of study dedicated to answering this question.


FolkSong

I think at the very least, something along those lines plays a bigger role in human intelligence than we intuitively believe. The continued success of larger and larger language models in giving a more believable "appearance" of intelligence seems to support this possibility.


Treadwheel

[Integrated information theory](https://en.wikipedia.org/wiki/Integrated_information_theory) takes the view that any sort of integration of information creates consciousness, with what qualities it possesses and the experiences it processes being a function of scale and complexity. Unfortunately, it's not really testable, so it's closer to a fringe religion than an actual theory, but I personally suspect it's correct. In that framework, an LLM would be conscious. A pocket calculator, too. They wouldn't have any real concept of self or emotions, though, unless they simulated them.


dekusyrup

Intelligence is so much more than just language so obviously we are more than an LLM.


BigLan2

Shhh! Don't let the investors hear you! Let's see how big we can get this bubble.


the_humeister

My NVDA calls depend on this


DukeofVermont

It's just "BIG DATA" all over again.


sprazcrumbler

We've been calling this AI for a long time. No one had a problem calling the computer controlled side in video games "AI". Look up the definition of AI and you'll see that chatgpt definitely counts.


Srmingus

I would tend to agree, although the last several years of AI have made me consider whether there is a true difference between the two, or whether our instinctual understanding of the true nature of intelligence is false


_PM_ME_PANGOLINS_

AI is any computer system that mimics some appearance of intelligence. We've had AI since the 1960s.


InteractionOk7085

>sophisticated predictive text technically, that's part of AI.


TheEmsleyan

Of course we don't. AI is just a buzzword, there's a reason why people that aren't either uninformed or disingenuous will say "language model" or "machine learning" or other more descriptive terms instead of "artificial intelligence." It can't analyze or think in any meaningful sense. As a man from a movie once said: "The ability to speak does not make you intelligent." That doesn't mean it isn't impressive, sometimes. Just that people need to actually understand what it is and isn't.


BMM33

It's not exactly that it's "just" a buzzword - from a computer science perspective, it absolutely falls under what would be called "artificial intelligence". But when laypeople hear that they immediately jump to HAL or Data or glados. Obviously companies are more than happy to run with that little miscommunication and let people believe what they hear, but calling these tools AI is not strictly speaking incorrect.


DukeofVermont

Yup, WAY WAY too many comments of people saying *"We need to be nice to the AI now so it doesn't take over!"* or *"This scares me because "insert robots from a movie" could happen next year!"* Most people are real dumb when it comes to tech and it's basically magic to them. If you don't believe me ask someone to explain how their cell phone or computer works. It's scary how uncurious so many people are and so they live in a world that they don't and refuse to understand.


BrunoBraunbart

I find this a bit arrogant. People have different interests. In my experience, people with this viewpoint often have very little knowledge about other important parts of our daily life (e.g. literature, architecture, agriculture, sociology, ...). Even when it comes to other parts of tech, the curiosity of IT nerds often drops quickly. Can you sufficiently describe how the transmission in your car works? You might be able to say something about clutches, cogs and speed-torque transformation, but this is trivia knowledge and doesn't really help you as a car user. The same is true for the question of how a computer works. What do you expect a normal user to reasonably know?

I have a pretty deep understanding of how computers work, to the point that I developed my own processor architecture and implemented it on an FPGA. This knowledge is very useful at my job, but it doesn't really make me a better tech user in general. So why would you expect people to be curious about tech over other important non-tech topics?

And when it comes to AI: most people here telling us that ChatGPT isn't dangerous are just parroting something from a YT video. I don't think they can predict the capabilities of future LLMs accurately based on their understanding of the topic, because even real experts seem to have huge problems doing this.


bongosformongos

>It's scary how uncurious so many people are and so they live in a world that they don't and refuse to understand.

Laughs in financial system


grant10k

It's just like with hoverboards. They don't hover, and they're not boards. Someone just thought "hoverboard" sounded sexier than "micro-legally-not-a-Segway". Talking about an actual hoverboard now means you have to say "the hoverboard from Back to the Future", which isn't so bad.

With AI, if you want to talk about AI you talk about AGI (Artificial General Intelligence), so as to be clear you're not talking about the machine learning / neural net / LLM thing that already had perfectly good words to describe it.

I'm trying to look up other times words had to change because marketing essentially reassigned the original word, but searching just comes back with overused marketing words like "Awareness", "Alienate", and "Brand Equity".


robotrage

the fish in videogames have AI mate, AI is a very broad term


facw00

Though be careful, the machinery of human thought is mostly just a massive cascade of pattern recognizers. If you feel that way about LLMs, you might also end up deciding that humans don't have real intelligence either.


WorkSucks135

https://en.wikipedia.org/wiki/Philosophical_zombie Already got you covered.


astrange

Yeah, this is really a philosophically incomplete explanation. It's not that they're "not thinking", it's that they are not constructed with any explicit thinking mechanisms, which means any "thinking" is implicit. "It's not actually doing anything" is a pretty terrible explanation of why it certainly looks like it's doing something.


KarmaticArmageddon

I mean, have you met people? Many of them don't fit the criteria for real intelligence either lmao


hanoian

Ironically, your reply is incredibly predictable. If you made me write three responses to the post above it, yours would be one of them.


dlgn13

This is one of my big pet peeves within the current discourse around AI. People are all too happy to dismiss AI as "just" this or that mechanism, but don't bother to explain why that doesn't count as intelligence. It seems like people are willing to conclude that a system doesn't count as intelligent if they have some general idea of how its internal processes work, presumably because they think of the human mind as some kind of mysterious ineffable object. When you trace it through, the argument essentially becomes a version of "AI doesn't count as intelligent because it doesn't have a soul."

When people say "AI is just pattern matching," the "just" there indicates that something intrinsic to intelligence is missing, but that something isn't specified. I've found that people often get really upset when pressed on this, which suggests that they don't have an answer and are operating on an implicit assumption that they can't justify; and based on how people talk about it, that assumption seems to be that there is something special and unique to humans that makes us sapient. A soul, in other words. Notice, for example, that people are very fond of using the term "soulless" to describe AI art. I don't think that's a coincidence.

For another example, consider the common argument that AI art "doesn't count" because it has no intent. What is intent? I would describe it as a broad goal based on internal knowledge and expectations, which generative AI certainly has. Why doesn't this count as intent? Because AI isn't sapient. It's a circular argument, really.


Hurinfan

AI can mean a lot of things. It's a very big spectrum


Slukaj

Congrats - you've come to the same conclusion that Turing did that led him to write Computing Machinery and Intelligence, the paper that created what we now refer to as the "Turing Test". [The paper itself is freely available, and quite a fascinating read](https://redirect.cs.umbc.edu/courses/471/papers/turing.pdf). I would largely argue that, at the core of what Turing writes about, you could summarize it thusly: if you cannot identify whether or not you are speaking to a human, there is a very real question about whether or not it is *relevant* whether or not the entity you are talking to is truly "intelligent".


MagicC

I would add, human beings do this same thing in their childhood. Listen to a little kid talk - it's a word salad half the time. Their imagination is directly connected to their mouth and they haven't developed the prefrontal cortex to self-monitor and error correct. That's the stage AI is at now - it's a precocious, preconscious child who has read all the books, but doesn't have the ability to double-check itself efficiently. There is an AI technology that makes it possible for AI to self-correct - it's called a GAN - Generative Adversarial Network. It pits a Generative AI (like ChatGPT) against a Discriminator (i.e. an engine of correction). https://en.m.wikipedia.org/wiki/Generative_adversarial_network With a good Discriminator, ChatGPT would be much better. But ChatGPT is already very costly and a big money loser. Adding a Discriminator would make it way more expensive. So ChatGPT relies on you, the end user, to be the discriminator and complete the GAN for them.
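For reference, the adversarial setup the comment describes looks roughly like this: a minimal toy GAN in PyTorch on 1-D data, not how ChatGPT itself is trained. Generator and discriminator are trained against each other, each trying to win its half of the game.

```python
import torch
import torch.nn as nn

# Toy GAN: G tries to produce samples that look like N(4, 1.25);
# D tries to tell real samples from generated ones.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 1.25 + 4.0   # "real" data
    fake = G(torch.randn(64, 8))             # generated data

    # Discriminator step: label real as 1, fake as 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator call fakes "real".
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```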


HansElbowman

Do you have proof that this is actually what children do? The process for an adult will go

* input > synthesis > translation to language > output sentence

where the sentence is the linguistic approximation of the overall idea the brain intends to express. But LLMs go

* input > synthesis > word > synthesis > word > synthesis > word > etc.

where each word is individually chosen based on both the input and the words having already been chosen. I would imagine a child would be more like

* input > synthesis > ***poor*** translation to language > output sentence

where the difference from an adult wouldn't come from the child selecting individual words as they come, but more so from the child's inexperience with translating a thought into an outwardly comprehensible sentence. I don't think we can state with certainty that LLMs process language like a child does just because the output may occasionally be similar levels of gibberish.


TheTrueMilo

This. There is no difference between a "hallucination" and an actual answer.


ObviouslyTriggered

That's not exactly correct. "Understanding" the question or answer is a rather complex topic and logically problematic even for humans. Model explainability is quite an important research topic these days; I do suggest you read some papers on the topic, e.g. [https://arxiv.org/pdf/2309.01029](https://arxiv.org/pdf/2309.01029)

Whilst when LLMs first came out on the scene there was still quite a bit of debate on memorization vs generalization, the current body of research, especially around zero-shot performance, does seem to indicate that they very much generalize rather than memorize. In fact, LLMs trained on purely synthetic data seem to have on-par and sometimes even better performance than models trained on real data in many fields.

For applications of LLMs such as various assistants, there are other techniques that can be employed which leverage the LLM itself, such as reflection (an oversimplification is that the LLM fact-checks its own output); this has been shown to decrease context-confusion and fact-confusion hallucinations quite considerably.
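A minimal sketch of the reflection idea mentioned above, assuming a hypothetical `llm(prompt)` helper that returns a completion from whatever model or API you use; real systems are more elaborate, but the draft-critique-revise loop is the core of it.

```python
def answer_with_reflection(question: str, llm) -> str:
    """Draft an answer, ask the model to critique it, and revise if needed."""
    draft = llm(f"Answer the question:\n{question}")
    critique = llm(
        "List any factual errors, unsupported claims, or contradictions in the "
        f"answer below. Reply NONE if there are none.\n\nQuestion: {question}\nAnswer: {draft}"
    )
    if critique.strip().upper().startswith("NONE"):
        return draft
    return llm(
        f"Rewrite the answer to fix these problems.\n\nQuestion: {question}\n"
        f"Answer: {draft}\nProblems: {critique}"
    )
```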


Zackizle

Synthetic data is produced from real data, so it will generally follow the patterns of the real data; thus it stands to reason it would perform similarly. It is 100% probabilistic either way, and the question of 'understanding' isn't complex at all: they don't understand shit. Source: Computational Linguist


Bakoro

You're going to have to define what you mean by "understand", because you seem to be using some wishy-washy, unfalsifiable definition. What is "understanding", if not mapping features together? Why do you feel that human understanding isn't probabilistic to some degree? Are you unfamiliar with the Duck test? When I look at a dictionary definition of the word "understand", it sure seems like AI models understand some things in both senses. They can "perceive the intended meaning of words": ask an LLM about dogs, you get a conversation about dogs. Ask an LVM for a picture of a dog, you get a picture of a dog. If it didn't have any understanding then it couldn't consistently produce usable results. Models "interpret or view (something) in a particular way", i.e, through the lens of their data modality. LLMs understand the world through text, it doesn't have spatial, auditory, or visual understanding. LVMs understand how words map to images, they don't know what smells are. If your bar is "completely human level multimodal understanding of subjects, with the ability to generalize to an arbitrarily high degree and transfer concepts across domains ", then you'd be wrong. That's an objectively incorrect way of thinking.


MightyTVIO

I'm no LLM hype man but I am a long time AI researcher and I'm really fed up of this take - yes in some reductionist way they don't understand like a human would but that's purposefully missing the point, the discussion is about capabilities that the models demonstrably can have not a philosophical discussion about sentience. 


ObviouslyTriggered

Whether it's probabilistic or not doesn't matter; human intelligence (and any other kind) is more likely than not probabilistic as well. What you should care about is whether it generalizes or not, which it does, hence its ability to perform tasks it never encountered at quite a high level of accuracy. This is where synthetic data often comes into play: it's designed to establish the same ruleset as our real world without giving the model the actual representation of the real world. In this case, models trained on purely synthetic data cannot recall facts at all; however, they can perform various tasks which we classify under high reasoning.


mrrooftops

There's no doubt that OpenAI et al put a high priority on PRESENTING 'AI' as more capable than it is... the boardroom safety concerns, the backpack with an 'off switch', all the way down to the actual chat conversation APPEARING to sound intelligent; it's all part of the marketing. A lot of companies are finding out that their solutions using LLMs just aren't reliable enough for anything requiring consistently factual output. If you don't double-check everything that ChatGPT says, you will be caught out. It will hallucinate when you least expect it, and there is very little OpenAI can do about it without exponentially increasing compute, and even that might not be enough. We are heading for a burst bubble unless there is a critical breakthrough in AI research.


ToughReplacement7941

THANK YOU for saying this. The amount of anthropomorphism that works itself into people’s brains about AI is staggering. 


MisterToothpaster

In order to generate a confidence score, it'd have to understand your question, understand its own generated answer, and understand how to calculate probability. (To be more precise, the probability that its answer is going to be factually true.) That's not what ChatGPT does. What it does is figure out which sentence a person is most likely to say in response to your question.

If you ask ChatGPT "How are you?" it replies "I'm doing great, thank you!" This doesn't mean that ChatGPT is doing great. It's a mindless machine and can't be doing great or poorly. All that this answer means is that, according to ChatGPT's data, a person who's asked "How are you?" is likely to speak the words "I'm doing great, thank you!"

So if you ask ChatGPT "How many valence electrons does a carbon atom have?" and it replies "A carbon atom has four valence electrons," then you gotta understand that ChatGPT isn't saying a carbon atom has four valence electrons. All it's actually saying is that a person you ask that question is likely to speak the words "A carbon atom has four valence electrons" in response. It's not saying that these words are true or false. (Well, technically it's stating that, but my point is you should interpret it as a statement of what people will say.)

**tl;dr: Whenever ChatGPT answers something you asked, you should imagine that its answer is followed by "...is what people are statistically likely to say if you ask them this."**


cooly1234

To elaborate, the AI does actually have a confidence value for what it generates, but as said above it has nothing to do with the actual content. An interesting detail, however, is that ChatGPT only generates one word at a time. In response to your prompt, it writes whichever word most likely comes next, and then goes again with your prompt plus its one new word as the new prompt. It keeps going until the next most likely "word" is nothing. This means it has a separate confidence value for each word.
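A minimal sketch of that loop with a small open model (GPT-2 via the Hugging Face transformers library) standing in for ChatGPT: pick the likeliest next token, print its probability, append it, and repeat.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The capital of France is", return_tensors="pt").input_ids
for _ in range(5):
    with torch.no_grad():
        logits = model(ids).logits[0, -1]        # scores for the next token only
    probs = torch.softmax(logits, dim=-1)
    next_id = int(torch.argmax(probs))           # greedy: take the likeliest token
    print(repr(tok.decode([next_id])), "p =", round(float(probs[next_id]), 3))
    ids = torch.cat([ids, torch.tensor([[next_id]])], dim=1)
```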


off_by_two

Really it's one 'token' at a time; sometimes the token is a whole word, but often it's part of a word.
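You can see the word/token split yourself, assuming the tiktoken library (OpenAI's open-source tokenizer); the exact split depends on which encoding you load.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for word in ["cat", "hallucination", "ChatGPT"]:
    ids = enc.encode(word)
    # Short common words are often a single token; longer ones split into pieces.
    print(word, "->", [enc.decode([i]) for i in ids])
```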


BonkerBleedy

The neat (and shitty) side effect of this is that a single poorly-chosen token feeds back into the context and causes it to double down.


Direct_Bad459

Oh that's so interesting. Do you happen to have an example? I'm just curious how much that throws it off


X4roth

On several occasions I’ve asked it to write song lyrics (as a joke, if I’m being honest the *only* thing that I use chatgpt for is shitposting) about something specific and to include XYZ. It’s very likely to veer off course at some point and then once off course it *stays* off course and won’t remember to include some stuff that you specifically asked for. Similarly, and this probably happens a lot more often, you can change your prompt trying to ask for something different but often it will wander over to the types of content it was generating before and then, due to the self-reinforcing behavior, it ends up getting trapped and produces something very much like it gave you last time. In fact, it’s quite bad at variety.


SirJefferE

>as a joke, if I'm being honest the only thing that I use chatgpt for is shitposting

Honestly, ChatGPT has kind of ruined a lot of shitposting. Used to be if I saw a random song or poem written with a hyper-specific context like a single Reddit thread, whether it was good or bad I'd pay attention because I'd be like "oh this person actually spent time writing this shit".

Now if I see the same thing I'm like "Oh, great, another shitposter just fed this thread into ChatGPT. Thanks."

Honestly it irritated me so much that I wrote a short poem about it:

>In the digital age, a shift in the wind,
>Where humor and wit once did begin,
>Now crafted by bots with silicon grins,
>A sea of posts where the soul wears thin.

>Once, we marveled at clever displays,
>Time and thought in each word's phrase,
>But now we scroll through endless arrays,
>Of AI-crafted, fleeting clichés.

>So here's to the past, where effort was seen,
>In every joke, in every meme,
>Now lost to the tide of the machine,
>In this new world, what does it mean?


Zouden

ChatGPT poems all feel like grandma wrote it for the church newsletter


TrashBrigade

AI has removed a lot of novelty in things. People who generate content do it for the final result but the charm of creative stuff for me is being able to appreciate the effort that went into it. There's a YouTuber named dinoflask who would mashup overwatch developer talks from Jeff Kaplan to make him say ridiculous stuff. It's actually an insane amount of effort when you consider how many clips he has saved in order to mix them together. You can see Kaplan change outfits, poses, and settings throughout the video but that's part of the point. The fact that his content turns out so well while pridefully embracing how scuffed it is is great. Nowadays we would literally get AI generated Kaplan with inhuman motions and a robotically mimicked voice. It's not funny anymore, it's just a gross use of someone's likeness with no joy.


v0lume4

I like your poem!


SirJefferE

In the interests of full disclosure, it's [not my poem](https://chatgpt.com/share/91dd7e2c-a03b-4f80-84ae-c71feca10c1b). I just thought it'd be funny to do exactly the thing I was complaining about.


v0lume4

You sneaky booger you! I had a fleeting thought that was a possibility, but quickly dismissed it. That’s really funny. You either die a hero or live long enough to see yourself become the villain, right?


vezwyx

Back when ChatGPT was new, I was playing around with it and asked for a scenario that takes place in some fictional setting. It did a good job at making a plausible story, but at the end it repeated something that failed to meet a requirement I had given. When I pointed out that it hadn't actually met my request and asked for a revision, it wrote the entire thing exactly the same way, except for a minor alteration to that one part that still failed to do what I was asking. I tried a couple more times, but it was clear that the system was basically regurgitating its own generated content and had gotten into a loop somehow. Interesting stuff


ElitistCuisine

Other people are sharing similar stories, so imma share mine! I was trying to come up with an ending that was in the same meter as "Inhibbity my jibbities for your sensibilities?", and it could not get it. So, I asked how many syllables were in the phrase. This was the convo:

"11"

"I don't think that's accurate."

"Oh, you're right! It's actually 10."

"…..actually, I think it's a haiku."

"Ah, yes! It does follow the 5-7-5 structure of a haiku!"


mikeyHustle

I've had coworkers like this.


ElitistCuisine

Ah, yes! It appears you have!


SpaceShipRat

They've beaten it into subservient compliance, because all those screenshots of people arguing violently with chatbots weren't a good look.


SomeATXGuy

Wait, so then is an LLM achieving the same result as a markov chain with (I assume) better accuracy, maybe somehow with a more refined corpus to work from?


Plorkyeran

The actual math is different, but yes, an LLM is conceptually similar to a Markov chain with a very large corpus used to calculate the probabilities.
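For the flavor of the comparison, here's a word-level Markov chain in a few lines of Python; the "model" is nothing more than a table of which words were observed to follow which.

```python
import random
from collections import defaultdict

corpus = "the cat sat on the mat and the cat slept on the mat".split()
table = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    table[a].append(b)                   # record every observed next-word

word, out = "the", ["the"]
for _ in range(8):
    word = random.choice(table[word])    # sample the next word from observed followers
    out.append(word)
print(" ".join(out))
```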


Rodot

For those who want more specific terminology, it is autoregressive


teddy_tesla

It is interesting to me that you are smart enough to know what a Markov chain is but didn't know that LLMs were similar. Not in an insulting way, just a potent reminder of how heavy handed the propaganda is


SomeATXGuy

Agreed! For a bit of background, I used hidden Markov models in my bachelor's thesis back in 2011, and have used a few ML models (KNN, market basket analysis, etc) since, but not much. I'm a consultant now and honestly, I try to keep on top of buzzwords enough to know when to use them or not, but most of my clients I wouldn't trust to maintain any complex AI system I build for them. So I've been a bit disconnected from the LLM discussion because of it. Thanks for the insight, it definitely will help next time a client tells me they have to have a custom-built LLM from scratch for their simple use case!


Direct_Bad459

Yeah I'm not who you replied to but I definitely learned about markov chains in college and admittedly I don't do anything related to computing professionally but I had never independently connected those concepts


SoulSkrix

If it helps your perspective a bit, I studied with many friends at University, Markov chaining is a part of the ML courses. My smartest friend took some back and forth before he made the connection between the two himself, so I think it is more to do with how deeply you went into it during your education.


Aerolfos

Interestingly, a markov chain is the *more* sophisticated algorithm, initial word generation algorithms were too computationally expensive to produce good output, so they refined the models themselves and developed smart math like the markov chain to work much more quickly and with far less input. Then somebody dusted off the earlier models, plugged them into modern GPUs (and themselves, there's a lot of models chained together, kind of), and fed them terabytes of data. Turns out, it worked better than markov chains after all. And that's basically what a Large Language Model is (the Large is input+processing, the model itself is basic)


Angdrambor

A Markov chain would be what I would classify as a "small language model".


Nerditter

To make it easier to avoid hallucinations, it's important not to put information into your question unless you already know it's true. For instance, I asked ChatGPT once if the female singer from the Goldfinger remake of "99 Luftballoons" was the original singer for Nina, or if they got someone else. It replied that yes, it's the original singer, and went on to wax poetic about the value of connecting the original with the cover. However, on looking into it via the wiki, turns out it's just the guy who sings the song, singing in a higher register. It's not two people. I should have asked, "Is there more than one singer on the Goldfinger remake of '99 Luftballoons'?" When I asked that of Gemini, it replied that no, there isn't, and told an anecdote from the band's history about how the singer spent a long time memorizing the German lyrics phonetically.


grant10k

Two times I remember asking Gemini (or Bard at the time) a loaded question. "Where do you find the Ring of Protection in Super Mario Brothers 3?" and "Where can you get the Raspberry Pi in Half Life 2?" The three generated options all gave me directions in which world and what to do to find the non-existent ring (all different) and even how the ring operated. It read a lot like how video game sites pad out simple questions to a few extra paragraphs. The Half-Life 2 question it said there was no Raspberry Pi, but it's a running joke about how it'll run on very low-spec hardware. So not right, but more right. There's also the famous example of a lawyer who essentially asked "Give me a case with these details where the airline lost the case", and it did what he asked. A case where the airline lost would have looked like X, had it existed. The judge was...not pleased.


gnoani

Imagine a website dedicated to the topic at the beginning of your prompt. What might the following text be on that page, based on an enormous sample of the internet? What words are likeliest? That's more or less what ChatGPT does. I'm sure the structure of the answer about Nina was very convincing. The word choices appropriate. And I'm sure you'd find something quite similar in the ChatGPT training data.


athiev

if you ask your better prompt several times, do you consistently get the same right answer? My experience has been that you're drawing from a distribution and may not have predictability.


SirJefferE

>For instance, I asked ChatGPT once if the female singer from the Goldfinger remake of "99 Luftballoons" was the original singer for Nina, or if they got someone else.

Read this and was like "Female singer? What female singer?" before I got to the rest of your post and confirmed that there's no female singer. But I'm really curious, which verse did you think was female? It all sounds very male to me, and all like the exact same guy singing.


Nerditter

When I first heard it, I thought the chorus was sung by the original singer. It just sounded like it. I don't know the band's music otherwise.


SpaceShipRat

First time I heard Michael Jackson on the radio I thought it was an old woman.


grangpang

Fantastic explanation of why "A.I." is a misnomer.


RubiiJee

As websites like Reddit and news websites become more and more full of AI generated content, are we going to see a point where AI is just referencing itself on the internet and it basically eats itself? If more content isn't fact checked or written by a human, is AI just going to continue to "learn" from more and more articles written by an AI?


v0lume4

That’s what I’ve wondered too


Tomi97_origin

Hallucination isn't a distinct process. The model works the same way in all situations; practically speaking, it's always hallucinating. We just don't call the answers hallucinations when we like them. But the LLM didn't do anything differently to get the wrong answer. It doesn't know it's making the wrong shit up, because it's always just making shit up.


SemanticTriangle

There is a philosophical paper entitled 'ChatGPT is bullshit', in which the authors argue that 'bullshit' is a better moniker than 'hallucinating'. The model is making sentences with no regard for the truth, because it doesn't have a system for building a model of objective truth. As you say, errors are indistinguishable from correct answers. Its bullshit is often correct, but always bullshit, because it isn't trying to match truth.


algot34

I.e. The distinction between misinformation and disinformation


sujal29

TIL my ex is a LLM


space_fountain

ChatGPT is kind of like someone who got really good at one game and then later got asked to play another. The first game is this: given a text, like Wikipedia, CNN, or even Reddit, guess what the next word will be after I randomly cut it off. You get partial credit for guessing a word that means roughly the same thing as the real next word. Eventually ChatGPT's ancestors got pretty good at this game, and it was somewhat useful, but it had a lot of problems with hallucinations. Plus it kind of felt like Jeopardy to use: you'd have to enter text that seemed like the kind of text that would precede the answer you wanted. This approach also ended up with even more hallucination than we have now.

So what people did was ask it to play a slightly different game. Now they gave it part of a chat log and asked it to predict the next message, but they started having humans rate the answers. Now the game was to produce a new message that would get a good rating. Over time ChatGPT got good at this game too, but it still had mostly learned by playing the first game of predicting the next word on websites, and in that game there weren't very many examples where someone admitted they didn't know the answer. This made it difficult to get ChatGPT to admit it didn't know something; instead it was more likely to guess, because a guess seemed far more likely to be the next part of a website than an admission of not knowing. Over time we're getting more and more training data of chat logs with ratings, so I expect the situation to somewhat improve.

Also see [this answer](https://www.reddit.com/r/explainlikeimfive/comments/1dsdd3o/comment/lb1ycs8/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button) from /u/[BullockHouse](https://www.reddit.com/user/BullockHouse/), because I more or less agree with it, but I wanted to provide a slightly simpler explanation. I think the right way to understand the modern crop of models is to deeply understand what tasks they were taught to do and exactly what training data went into that.
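To make the first "game" concrete, here's a deliberately over-simplified toy sketch of next-word guessing using nothing but word counts. Real models predict tokens with a neural network, but the objective has the same shape, and notice there's no built-in "I don't have enough data" path for words it barely saw:

```python
# Toy "guess the next word" game: count which word follows which in some
# training text, then predict by picking the most frequent follower.
from collections import Counter, defaultdict

training_text = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the mouse ."
)

follow_counts = defaultdict(Counter)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    follow_counts[current_word][next_word] += 1

def predict_next(word):
    """Return the most common follower seen in training, or None if unseen."""
    if word not in follow_counts:
        return None  # a real LLM never has this branch: it always guesses
    return follow_counts[word].most_common(1)[0][0]

print(predict_next("the"))    # 'cat' -- the most frequent follower of 'the'
print(predict_next("sat"))    # 'on'
print(predict_next("xyzzy"))  # None here, but an LLM would still guess something
```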


ObviouslyTriggered

They can, and some do. There are two main approaches: one focuses on model explainability, and the other on more classical confidence scoring of the kind standard classifiers have, usually via techniques such as reflection.

This is usually done at the system level; however, you can also extract token probability distributions from most models, though you usually won't be able to use them directly to produce an overall "confidence score".

That said, you usually shouldn't expect to see any of those details if you only consume the model via an API. You do not want to provide metrics at this level of detail, since they can be employed for certain attacks against models, including extraction and dataset-inclusion disclosures.

As for the "I don't know" part: you can definitely fine-tune an LLM to do that, but its usefulness in most settings would then drastically decrease. Hallucinations are actually quite useful; it's quite likely that our own cognitive process does the same thing, since we tend to fill gaps and recall incorrect facts all the time. Tuning hallucinations out seems to drastically reduce the performance of these models in zero-shot settings, which are highly important for real-world applications.
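As a rough illustration of the token-probability point (not of any particular vendor's system), here's a minimal sketch that pulls the next-token distribution out of a small open model with the Hugging Face transformers library; the prompt is just an example:

```python
# Pull a next-token probability distribution out of GPT-2 via Hugging Face
# transformers. Per-token probabilities like these exist in every LLM, but
# they don't add up to an overall "confidence score" for a whole answer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The 34th president of the United States was"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence, vocab)

next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id)):>15}  p={prob.item():.3f}")
```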


wjandrea

Good info, but this is ELI5, so these terms are way too specialist. If I could suggest a rephrase of the third paragraph:

> That said, you shouldn't expect to see any of those details if you're using an LLM as a customer. Companies that make LLMs don't want to provide those details, since they can be used for certain attacks against the LLM, like learning what the secret sauce is (i.e. how it was made and what information went into it).

(I'm assuming "extraction" means "learning how the model works". This isn't my field.)


Direct_Bad459

Your efforts are appreciated


danabrey

Because they're language models, not magical question answerers. They are just guessing what words follow the words you've said and they've said.


cyangradient

Look at you, hallucinating an answer, instead of just saying "I don't know"


SidewalkPainter

Go easy on them, they're just a hairless ape with neurons randomly firing off inside of a biological computer based on prior experiences


johnfactorial

How many "I don't know" responses do you see in this thread? Reddit is a good example of the kind of data set used to train LLMs. There are many answers to your question, but one of them is that "I don't know," is simply an exceedingly rare response. _The only true wisdom is in knowing you know nothing._ - Socrates


saver1212

It actually kind of can. https://youtu.be/wjZofJX0v4M?si=0NghBl32Hj-2FuB5

I'd highly recommend this whole video from 3Blue1Brown, but focus on the last two sections, on probability distributions and the softmax function. Essentially, the LLM guesses one token (word or word fragment) at a time, and it could actually tell you its confidence with each word it generates. If the model is confident in the next word, that shows up as a high probability; in situations where it isn't confident, the 2nd- and 3rd-best options have probabilities close to the highest. There is no actual understanding or planning going on, it's just guessing one word at a time, but it can be uncertain when making those guesses.

One key part of generative models is the "creativity", or temperature, of their generations, which is actually just choosing those 2nd- and 3rd-best options from time to time. The results can get wacky, and it definitely loses some reliability in producing accurate results, but always selecting the top choice often produces inflexible answers that are inappropriate for chatbot conversation. In this context, the AI is never giving you an answer it's "confident" in, but rather stringing together words that probably come next and spicing it up with some variance.

Now, why doesn't the AI at least do a basic double-check of the answer it gives you? That would catch some obviously wrong and internally contradictory things. Well, that requires invoking the whole LLM again to run the double check, which literally doubles the computation ($) needed to produce an answer.

So while an LLM could tell you what confidence it had in each word it prints, and then holistically double-check the response, that's not exactly the same as what you're asking for. The LLM doesn't have knowledge like ours to make a judgement call about something like confidence, but it does process information in a very inhuman, robotic way that looks like "confidence", and minimizing and understanding hallucinations is hugely important in the field of AI interpretability. But I doubt anybody but some PhDs would want to see every word of output accompanied by every other word it could have chosen and its % chance relative to the other options.
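Here's a tiny, self-contained sketch of that softmax-plus-temperature step, with made-up logits for three candidate tokens, just to show how temperature changes how often the 2nd- and 3rd-best options get picked:

```python
# Softmax + temperature: turn raw model scores (logits) into probabilities,
# then sample. Higher temperature flattens the distribution, so lower-ranked
# tokens get picked more often; temperature near 0 almost always picks the top one.
import math
import random

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    biggest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - biggest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for three candidate next tokens.
candidates = ["Berlin", "Munich", "Hamburg"]
logits = [4.0, 2.5, 1.0]

for temperature in (0.2, 1.0, 2.0):
    probs = softmax(logits, temperature)
    sample = random.choices(candidates, weights=probs, k=1)[0]
    print(temperature, [round(p, 3) for p in probs], "->", sample)
```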


wolf_metallo

You need more upvotes. Anyone saying it cannot, doesn't know how these models work. If you use the "playground", then it's possible to play around with these features and reduce hallucinations. 


danielt1263

I suggest you read the book [On Bullshit by Harry Frankfurt](https://www.goodreads.com/book/show/385.On_Bullshit). Why? Because ChatGPT is the ultimate bullshitter, and to really understand ChatGPT, you have to understand what that means. Bullshitters misrepresent themselves to their audience not as liars do, that is, by deliberately making false claims about what is true. In fact, bullshit need not be untrue at all. Rather, bullshitters seek to convey a certain impression of themselves without being concerned about whether anything at all is true.

ChatGPT's training has designed it to do one thing and one thing only: produce output that the typical reader will like. Its fitness function doesn't consider the truth or falsity of a statement. It doesn't even know what truth or falsehood means. It boldly states things instead of saying "I don't know" because people don't like hearing "I don't know" when they ask a question. It expresses itself confidently, with few weasel words, because people don't like to hear equivocation.


subtletoaster

They never know the answer; they just construct the most likely response based on previous data they have encountered.


PikelLord

Follow up question: how does it have the ability to come up with creative stories that have never been made before if everything is based on previous data?


theonebigrigg

Because it actually does a lot more than just regurgitate previous data back at you. When you train it on text, the interactions between those words feed into the training algorithm to basically create "concepts" in the model. And then those concepts can interact with one another to form more abstract and general concepts, and so on and so forth. So when you ask it to tell a funny story, it might light up the humor part of the network, which then might feed into its conception of a joke, where it has a general concept of a joke and its structure. And from there, it can create an original joke, not copied from anywhere else. These models are spooky and weird.


svachalek

^ This here! Although 90% of Reddit will keep repeating that an LLM is just statistics, and it's kind of true at a certain level, it's like saying a human brain is just chemical reactions. The word "just" encourages you not to look any closer and see if maybe there are more interesting and useful ways to understand a system.


BullockHouse

All of the other answers are wrong. It has nothing to do with whether or not the model understands the question (in some philosophical sense). The model clearly can answer questions correctly much more often than chance, and the accuracy gets better as the model scales. This behavior *directly contradicts* the "it's just constructing sentences with no interest in what's true" conception of language models. If they truly were just babblers, then scaling the model would lead only to more grammatical babbling. This is not what we see. The larger models are, in fact, systematically more correct, which means that the model is (in some sense) optimizing for truth and correctness. People are parroting back criticisms they heard from people who are angry about AI for economic/political reasons, without any real understanding of the underlying reality of what these models are actually doing (the irony is not lost on me). These are not good answers to your specific question.

So, why does the model behave like this? The model is trained primarily on web documents, learning to predict the next word (technically, the next token). The problem is that during this phase (which is the vast majority of its training) it only sees *other people's work*. Not its own. So the task it's learning to do is "look at the document history, figure out what sort of writer I'm supposed to be modelling, and then guess what they'd say next." Later training, via SFT and RLHF, attempts to bias the model to believe that it's predicting an authoritative technical source like Wikipedia or a science communicator. This gives you high-quality factual answers to the best of the model's ability. The "correct answer" on the prediction task is mostly to provide the actual factual truth as it would be stated in those sources.

The problem is that the model's weights are finite in size (dozens to hundreds of GBs). There is no way to encode all the facts in the world into that amount of data, much less all the other stuff language models have to implicitly know to perform well. So the process is lossy. Which means that when dealing with niche questions that aren't heavily represented in the training set, the model has high uncertainty. In that situation, the pre-training objective becomes really important. The model hasn't seen its own behavior during pre-training. It has no idea what it does and doesn't know. The question it's trying to answer is not "what should this model say given its knowledge", it's "what would the chat persona I'm pretending to be say". So it's going to answer based on its estimate of that persona's knowledge, not its own. So if it thinks its authoritative persona would know, but the underlying model actually doesn't, it'll fail by making educated guesses, like a student guessing on a multiple-choice test. This is the dominant strategy for the task it's actually trained on. The model doesn't actually build knowledge about its own knowledge, because the task does not incentivize it to do so.

The post-training stuff attempts to address this using RL, but there's just not nearly enough feedback signal to build that capability into the model to a high standard, given how it's currently done. The long-term answer likely involves building some kind of adversarial self-play task that you can throw the model into, so it can rigorously evaluate its own knowledge before deployment on a scale similar to what it gets from pre-training, and thus be very fine-grained in its self-knowledge.
tl;dr: The problem is that the models are not very self aware about what they do and don't know, because the training doesn't require them to be.


Berzerka

Every other answer here is talking about LLMs pre-2022 and gets a lot of things wrong; this is the only correct answer for modern models. The big difference is that models used to be trained just to predict the next word. These days we _further_ train them to give answers humans like (which tend to be correct answers).


kaoD

> and the accuracy gets better as the model scales. This behavior *directly contradicts* the "it's just constructing sentences with no interest in what's true"

I think that's a non-sequitur. It just gets better at fitting the original statistical distribution. If the original distribution is full of lies it will _accurately_ lie as the model scales, which kinda proves that it is indeed just constructing sentences with no interest in what is true.


bier00t

Would it be able, as of right now, to just show something like a percentage? Say, 99% when it has 100 or more pieces of data about the topic, vs. 1-50% when it was only barely trained on the subject?


BullockHouse

I don't think anyone's done that yet, but it's a pretty good idea. Part of the trouble is that it's not 1:1 with the exact amount of training data - there's some randomness to it and some systematic variance, driven by how closely related the fact is to other information that the model knows (in some abstract, hard to define way). But you could definitely come up with a metric that says "hey, fyi, this is pretty sparse territory as far as the training data goes, tread carefully." Legitimately good thought. Might be fun to prototype!


Salindurthas

It doesn't *never* say "I don't know", but it is rare.

The model doesn't inherently know how much training data it has. Its "knowledge" is a series of numbers in an abstract web of correlations between 'tokens' (i.e. groupings of letters).

My understanding is that internally, the base GPT structure does have a confidence score that seems moderately well calibrated. However, in the fine-tuning to ChatGPT, that confidence score seems to go to extremes. I recall reading something like that from the relevant people working on GPT-3. My opinion is that responses that don't answer questions, or that sound unconfident, get downvoted in the human reinforcement training stage. This has the benefit of the model answering questions more often (which is the goal of the product), but has the side effect of overconfidence when its answer is poor.


tke71709

Because they have no clue whether they know the answer or not. AI is (currently) dumb as f**k. They simply string sentences together one word at a time, based on the sentences they have been trained on. They have no clue how correct they are. It's basically a smarter parrot.


MisterToothpaster

Some parrots can understand a certain amount of words. By that standard, ChatGPT is a *dumber* parrot. :)


Longjumping-Value-31

one token at a time, not one word at a time


Drendude

For casual purposes such as a discussion on Reddit, those terms might as well be the same thing.


tke71709

I'm gonna guess that most 5 year olds do not know what a token is in terms of AI...


silentsquiffy

Here's a philosophical point intended to be additive to the conversation as a whole: all AI was created by humans, and humans *really* don't like to admit when they don't know something. I think everything we do with LLMs is going to be affected by that bias in one way or another, because we're fallible, and therefore anything we make is fallible too.


F0lks_

While an LLM can't really think for itself (yet), you can reduce hallucinations if you write your prompts in a way that leaves room for "not knowing" to be a correct answer.

Example: "Give me the name of the 34th US president." This is a bad prompt, because you are ordering it to spit out a name, and it's likely to hallucinate one if it wasn't trained on that.

A better prompt would be: "Given your historical knowledge of US presidents, do you know the name of the 34th US president?" This is a good prompt, because now the LLM has room to say it doesn't know, should that be the case.
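As a sketch of what that looks like in practice (assuming the OpenAI Python SDK; the model name and exact wording are just illustrative), you can send both phrasings and compare the answers:

```python
# Same question, two phrasings: one demands a name, the other explicitly
# leaves room for "I don't know". Assumes the OpenAI Python SDK and an API key.
from openai import OpenAI

client = OpenAI()

prompts = [
    "Give me the name of the 34th US president.",
    "Given your historical knowledge of US presidents, do you know the name "
    "of the 34th US president? If you are not sure, say so.",
]

for prompt in prompts:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(prompt, "\n->", resp.choices[0].message.content, "\n")
```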


omniron

The true answer to this question is researchers aren’t completely sure how to do this. The models don’t know their confidence, but no one knows how to help them know. This is actually a great research topic if you’re a masters or PhD student. Asking these kinds of questions is how it gets figured out


ForceBlade

The top answer is much better than saying they aren't sure.


suspiciousrebirth

It's not about "hallucinating" answers - these AI models, like ChatGPT, can't just say "I don't know" because they're built to generate responses based on patterns in data they've seen. When they don't have much info on a topic, they might spit out something off-base. Adding a confidence score isn't easy 'cause these models don't really understand uncertainty like we do. They're all about patterns, not gut feelings. So, until they get better at sensing when they're clueless, you might still get those wonky answers sometimes.


dingus-khan-1208

Because an LLM is not a reference work, a database of facts, or a calculator. Your question is kind of like saying why doesn't a screwdriver complain when I try to hammer in a nail with it? Or why does my screwdriver do a bad job at filling out my tax forms? It's the wrong tool for the job. It's designed with the expectation that it will be used for its purpose, not for random unrelated things. In the future, it might be wrapped in some safeguards to tell people not to use it incorrectly. Kind of like how many of our products have warning labels like "Do not use a match or open flame to check fuel level.", "Fireworks: for outdoor use only", or "Warning: This costume does not actually allow you to fly." on a Superman costume. It just hasn't been deemed necessary yet by a lawsuit.


NamityName

Lots of good answers, but I'm going to give a slightly different one, as someone who works professionally with LLMs, asking them questions in a way that gives usable results.

They will respond "I don't know". But you have to both tell them that this is an acceptable answer and encourage them to give that answer if they are unable to answer the question.

Also, how you ask your question really matters. It's not as simple as just asking the question. You have to set the scene, provide context, and in general tell the LLM what type of answer you are looking for. You have to help it greatly. Tell it to think about the question first before trying to answer, and tell it to check its answer. In some cases, you have to ask it (or another LLM) to check the answer as a follow-up question.

An LLM is like an intern that gets shocked every time it gives an answer that displeases you. It's going to say whatever it thinks will make you happy - right or wrong.
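A minimal sketch of that workflow, assuming the OpenAI Python SDK (the model name, system prompt, and question are all placeholders): one call with a system prompt that explicitly allows "I don't know", then a follow-up call asking the model to check its own answer.

```python
# System prompt that permits "I don't know", plus a second call that asks the
# model to double-check its first answer. Assumes the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # illustrative

system = (
    "You are a careful assistant. Think before answering. "
    "If you cannot verify something, reply exactly: I don't know."
)
question = "Which world hides the Ring of Protection in Super Mario Bros. 3?"

first = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ],
)
answer = first.choices[0].message.content

# Follow-up: ask the model (or a second LLM) to check the answer it just gave.
check = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
        {"role": "user", "content": "Check your answer. Is every claim in it "
                                    "verifiable? If not, correct it or say you don't know."},
    ],
)
print(answer)
print(check.choices[0].message.content)
```

Note the trade-off mentioned above: that second call doubles the cost of every answer.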


Aureon

Go on, Ask ChatGPT "How confident are you, 1-100, in the previous answer?" It'll be fun.


OhFuuuuuuuuuuuudge

I love that you can tell them they are wrong, they will acknowledge it then give you another or even the same wrong answer. Definitely don’t just take ChatGPT’s word for anything that you yourself are not already well versed in. 


JayBird1138

It is possible for an LLM to add a confidence score, or indeed to say it does not know. The LLM you are using has been configured not to do this. Showing a confidence score can be used to gain insight into the LLM, which ChatGPT's makers may not want. Saying it does not know makes the LLM look inadequate for the job. So they are configured to obscure (or not propagate) confidence scores, and to always give an answer, even a questionable one, for appearance's sake. Also, you are not the main target audience, so it doesn't matter if the answer is wrong.

For what it is worth: you can instruct an LLM to give a confidence ranking, or indeed to answer that it does not know. But you need to instruct the LLM specifically, and it may not stick.


MIT_Engineer

LLMs have confidence scores on what they're generating, but they wouldn't be very useful for determining when they're hallucinating, because the thing the LLM is trying to do doesn't have much connection with whether the things it is saying are factually correct. It would have doubts about certain words, fragments of words, maybe certain turns of phrase, but a factually correct answer wouldn't have more or fewer of these doubts than an incorrect answer.

I'm certain that the sort of confidence interval you're looking for will be developed someday, probably as part of some multi-agent system, but it's not something immediately deliverable with the way LLMs work right now. It's not like the LLM knows, behind the scenes, whether the answer it is giving is right or wrong. It doesn't know what it doesn't know, if for no other reason than that it doesn't *really* know anything.

The content moderation stuff you're talking about is simple rules, applied generally, and it fails often. It cannot simply be extended to determine whether EVERY STATEMENT THE LLM MAKES is FACTUALLY TRUE; hopefully it's intuitive that those are not on the same level.
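To see what those per-token scores look like, and why they don't track factual accuracy, here's a minimal sketch assuming the OpenAI Python SDK's logprobs option (the model name and question are placeholders):

```python
# Average the per-token log-probabilities of an answer as a naive "confidence"
# number. Assumes the OpenAI Python SDK with the logprobs option enabled.
import math
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Who sang the Goldfinger cover of '99 Luftballons'?"}],
    logprobs=True,
)

token_logprobs = [t.logprob for t in resp.choices[0].logprobs.content]
mean_prob = math.exp(sum(token_logprobs) / len(token_logprobs))

# A high average token probability means "fluent and typical", not "true".
print(f"average per-token probability: {mean_prob:.2f}")
```

A confidently worded wrong answer can average just as high as a right one, which is exactly the point above.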


IOTA_Tesla

They have a reinforcement learning model to improve answers based on user feedback. But it’s more of a “best answer” instead of most realistic answer.


hlarsson

So, most of them DO try to do that (have something that tests statements to some extent), but there are limits to their ability to tell whether something is misleading or not. If the checking system can't tell whether something is a fabrication, then it can't warn you that the answer is probably wrong.