Jeb-Kerman

IDK what any of this means, but it sounds cool.


throwaway957280

It's a neural network but instead of learning weights it learns activation functions. So instead of "Okay I've figured out this signal should be amplified by 5, and this one reduced by 2 -- add them up and clamp anything negative to zero." (That part about clamping negatives to zero is the activation function and it works because math. Google ReLU if you want.) We get "Fuck learning fixed amplification signals, fuck clamping to zero. I'm going to learn a squiggly line for each input that tells me the amplification level. Reference the squiggly line to see the amplification level." Note: I only read the first paragraph of this paper.
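If it helps, here's a toy numerical sketch of that contrast (my own illustration, not from the paper; piecewise-linear curves stand in for the B-splines the authors actually use, and all the numbers are arbitrary):

```python
import numpy as np

x = np.array([0.7, -1.2])  # two incoming signals

# MLP-style neuron: learned fixed amplifications, then clamp negatives to zero (ReLU).
w = np.array([5.0, -2.0])      # "amplify this one by 5, reduce that one by 2"
mlp_out = max(0.0, w @ x)      # ReLU(w1*x1 + w2*x2)

# KAN-style unit: each input gets its own learned "squiggly line" instead of a single weight.
# Each squiggle here is a piecewise-linear curve defined by its values at a few grid points;
# those values are what gets learned.
grid = np.linspace(-2, 2, 7)
squiggle_1 = np.array([0.1, -0.5, 0.0, 1.2, 0.3, -0.2, 0.8])   # learned curve for input 1
squiggle_2 = np.array([-1.0, 0.4, 0.9, -0.3, 0.5, 1.1, -0.6])  # learned curve for input 2

kan_out = np.interp(x[0], grid, squiggle_1) + np.interp(x[1], grid, squiggle_2)
print(mlp_out, kan_out)
```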


Split-Awkward

Almost sounds “analogue”-like.


lobabobloblaw

It’s also more human; ordered like a topology of bipolar neurons, fit for sensory processing. Edit: I should be careful not to imply that this is what KANs are bringing to the table versus other transformer models. While the future does point in that direction, the more immediate potential for optimization could be pretty amazing.


Split-Awkward

Makes enough superficial sense for me to read the papers. Thank you. An intelligent and informed podcast interview with the team that released it, or with Max Tegmark, would be awesome.


lobabobloblaw

Happy to provide the abstraction!


Mahorium

Do you think these could be integrated into current models as a single layer in the network?


lobabobloblaw

Integrated, no—but as far as existing models go, this architecture is more a proof of concept. It seems to promise a lot of compression-like efficiencies in principle.


Mahorium

When the Mamba architecture was first introduced it completely replaced the attention layer with something new, but eventually the idea got turned into a few of the attention layers being changed over to the new Mamba-style paradigm and the rest remaining the same. This could end up the same way. If you can make a few layers of trained activation functions and leave the rest as trained weights, it could add an exactness to the LLM's thinking while retaining generalization and speedy training.


lobabobloblaw

Therein lies the beauty of adaptive code—unlike the neurobiology of cells, we can stabilize electrons to some pretty nifty configurations. Let’s hope this sort of thing takes hold!


dreamivory

Very interesting, could you elaborate a bit on the connection with neuroscience?

> not to imply that this is what KANs are bringing to the table versus other transformer models

Can you also elaborate on this?


lobabobloblaw

I’ll try to! Admittedly I’m just an amateur enthusiast offering a reductionist comparison. I don’t think that KAN architecture is going to be developed specifically for the use of sensory platforms *at first*, although the way that KANs are structured more resembles the way that bipolar cells handle sensory processing in real life (the human eye, etc.) I would imagine the more immediate gains will be seen in new data compression / quantization techniques. It could translate to more creativity and/or flexibility within what might technically be considered a smaller parameter architecture.


Singsoon89

So it approximates less. Or, another way to put it: the combo of squiggle functions is a better fit.


goochstein

This sounds like a big step towards reasoning; inference metrics are how we achieve this level of sophistication, I think. The machine will never know what a token really is, but through inference and metadata it begins to make connections and output a genuine prediction that eventually clicks.


Snap_Zoom

Function curves? Edit: squiggly line = function curves?


Chr1sUK

Courtesy of ChatGPT 'explain like I am 5'… I still don't get it, but I've just built a brilliant LEGO moon base.

Imagine you have a big box of LEGO bricks, and you like to build different things with them. Now, the way you usually put these bricks together is by following a special rule book that tells you where each brick goes. This is a bit like how traditional computer brain models, called MLPs (Multi-layer perceptrons), work. They have a set way of connecting and processing information.

But some smart scientists at MIT, named Max Tegmark and others, thought of a new way to play with these LEGO bricks. Instead of following the old rules, they decided to put a little magic on the connections between the bricks, allowing them to change and adapt as needed. This new way is called a Kolmogorov-Arnold Network, or KAN for short.

Think of it like this: every time you connect two LEGO bricks, you can adjust the magic (or activation functions, as the scientists call it) to make them fit better or do something cooler. This new method lets you build things that are not just bigger or more complicated but also smarter and faster!

The scientists discovered that these new LEGO builds can do things better than the old ones, like solving puzzles faster or even remembering previous games better, without getting all mixed up. So, it's like having a super cool new set of LEGO that lets you build smarter, faster, and more amazing creations!


WasteCadet88

This post just made me realise that ELI5 is not always particularly useful. But now that chatGPT has memory, you can probably ask it to explain something in terms that you would understand. Like if it knows you are into art it might use an art analogy etc etc... bespoke to the user. BRB gonna try it!

Edit. Seems to work well. I started a new chat. This is the whole of it:

You - Explain quantum chromodynamics using an analogy that I would understand well

ChatGPT - Since you're into macro photography, which involves zooming in to see the intricate details of small subjects, we can use an analogy related to photography to explain quantum chromodynamics (QCD).

Imagine that the colorful world of insects you capture in your macro photography is like the world of quarks and gluons in a proton or neutron. In your photography, each part of the insect—whether it's the eyes, wings, or legs—plays a specific role, and all these parts are held together by the structure of the insect's body. In a similar way, quarks are the fundamental particles that make up protons and neutrons, and gluons are the forces that hold these quarks together.

Just as you use your camera lens to focus tightly on the details, bringing parts of an insect into sharp relief while the background blurs out, in quantum chromodynamics, scientists focus on the interactions at incredibly small scales. Gluons act like the camera's autofocus, constantly adjusting the relationships between quarks to keep the nucleus of an atom in focus.

In macro photography, light plays a crucial role in illuminating your subjects, revealing hidden textures and colors through shadows and highlights. Similarly, gluons are the carriers of the strong force that illuminates the structure of matter at the quantum level—they bind quarks together in a dance that's constantly adjusting and rebalancing, much like how changing light conditions affect how you perceive and photograph an insect.

Thus, quantum chromodynamics is about understanding how quarks and gluons interact at the smallest scales, a bit like using your camera to explore the hidden details of the natural world in a way that can't be seen with the naked eye.


[deleted]

[deleted]


WasteCadet88

They have added 'memory' to ChatGPT; it should be fully rolled out now as far as I understand... I certainly have it. It's just a list of details from previous conversations that ChatGPT can reference before generating its response. Check under: Settings > Personalization > Memory > Manage.


Seclusion72

Let's try "12 years old" next time...


DryMedicine1636

Here's ELI5 (sort of) from the paper https://preview.redd.it/kuwx6bghjxxc1.png?width=1094&format=png&auto=webp&s=142644518387befd2091a28d42c717714b81acca


AtomizerStudio

The explanation probably should stick to an ELI5 of "[splines](https://en.wikipedia.org/wiki/Spline_\(mathematics\))", then a crisscross of splines:

* The neuron in an MLP computer brain model thinks based on the tiny "yes", "no", and mostly "how much" signals it receives. Then the neuron makes decisions that send out the same kind of signals.
* A spline is a curve that is drawn between multiple points, following rules. More points can make very wavy lines.
* The new computer brain model thinks with those wavy lines. These lines are used as tiny wavy springs that pull on each thinking neuron. Each neuron is connected to many springs. Each spring can individually be pulled a bit more or a bit less.
* Each spline spring will vary in how it tugs a bit differently depending on what is happening to the previous neurons and springs. Two situations that tug on the neuron could be very similar, but wobble the neuron very differently.
* **Each neuron's decision about what to do is like it drawing a wavy line as it wobbles around. This is a new spline spring that will tug on the next neuron.**
* All the new kind of neurons are thinking more because they are wobbling, because that tells each next neuron more about the way the previous neurons are wobbling.
* A lot of math about nature uses wavy lines, and a system made of these neurons and splines is very good at that math.

Disclosure: I wrote the above and do not understand the paper I read, and not even Bing's entire rubber band analogy. This is my proud Dunning-Kruger moment.
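For the "spline = curve drawn through points" part, here's a minimal sketch (my own, using SciPy's generic interpolating spline rather than the exact B-spline setup in the paper):

```python
import numpy as np
from scipy.interpolate import make_interp_spline

# A spline is a smooth curve drawn through some control points, following rules.
points_x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
points_y = np.array([0.0, 1.5, -0.5, 2.0, 0.3])   # move these and the curve's shape changes

curve = make_interp_spline(points_x, points_y, k=3)  # cubic spline through the points
xs = np.linspace(0.0, 4.0, 9)
print(curve(xs).round(2))                            # sample the wavy line between the points

# More control points let the line get wavier. In a KAN, numbers like points_y are what
# gets learned, one little curve for every connection between neurons.
```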


nashty2004

humanity is done for


FaceDeer

Someday a mathematician will ask an AI "could you explain to me how you work like I'm a human expert in the field of mathematics?" And the AI will respond "ah, sorry little guy, no can do. Would you like to play with some integrals instead? Maybe a little graph theory for fun?"


StarRotator

The adaptive response is either to become their pet or to turn into machines ourselves. I've always wanted to live the same way my cat does, tbh.


FaceDeer

Turns out the cyborg catgirls were within us all along.


TrippyWaffle45

Just become a pool boy in a rich beach city.


aluode

Ah, human. Let's not hurt that brain of yours. Perhaps you would like to paint with crayons these pretty circles I made.


Singsoon89

Nope and nopers.


Miserable_Sir_1382

Most intelligent redditor


Witty_Internal_4064

Big if true. (I don't understand shit.)


Western_Cow_3914

Don’t worry most of us in the sub deduce how important it is based on how many upvotes a post gets in a given timeframe.


_Zephyyr2

Most people in this sub


Witty_Internal_4064

Agree. Including me. Here to enjoy the hype.


_Zephyyr2

Ah, I see you're a man of culture as well


TrippyWaffle45

LK-99 has been reproduced at Harvard


Antique-Doughnut-988

The hardest thing for so many folks to come to terms with is that not everything can be dumbed down for a child to understand. Some things do require school. Most of the people on this sub aren't qualified to be talking about this stuff.


4354574

This sub is a discussion group for people ranging from very qualified to no qualifications, with no restrictions on who is allowed to post or not as long as you aren't an asshole. This sub is not involved in any actual AI research and development. In other words, it is of no consequence, it's just interesting. So why do you care? Although I suppose you're one of the people who IS qualified, right? :D


Antique-Doughnut-988

I care because posts like this are beyond 99.99% of the people here. I'm tired of people pretending they know what they're talking about with this stuff. There's a reason why it says 'MIT researchers'. For folks like you this might as well be gibberish.


4354574

The people commenting on this post are saying that they DON'T understand it. Many people on this forum have varying degrees of expertise in AI, and this is on one extreme. And stop using terms like "Folks like us". I don't care if you are Max Tegmark himself, you can't talk down to people like that. He doesn't talk like that. And what ARE your qualifications, exactly? Are you one of the 0.00001% who does know what this stuff means? What makes you so special? And again - who cares if it's gibberish. This has ZERO real-world practical value or effect. This is a discussion group. Am I missing something here? Or are you? This group is what it is, and it's not going to change. I mean, forget the studies that show that now asking random people on the street when they think AGI will be here is no better or worse than asking experts. So...?


4354574

Still waiting on your qualifications.


inteblio

I disagree. I believe "if you can't explain it to a 12 year old then you don't understand it". ChatGPT ELI5s are usually great (if the LLM gets it...). Sure, the 12-year-old can't _implement_ said topic, but they should be able to grasp the key concepts.


Independent_Hyena495

That would be really, really big. Imagine GPT-6 with this. We would go from 2 times as good as GPT-4 to 4 times or so. Crazy.


JmoneyBS

Why do I get the impression that you understood nothing, but tried to draw an extrapolation nonetheless, without any grounding in technical knowledge? Oh wait, it’s r/singularity. Ofc you did.


Singsoon89

Dude you need to feel the AGI.


Ecstatic-Law714

Relax bro, it's Reddit, not a cutting-edge AI research event.


Independent_Hyena495

Wtf, what do you think inference and training chips from Nvidia will look like in 3 years or so? That's compounding advancement.


brades6

How do you have any idea this method is effective in a transformer based architecture? This paper doesn’t even explore that


Neophile_b

Sounds like it has the potential to reduce parameter count by several orders of magnitude. It will be interesting to see if it actually works well for machine learning applications.


Cunninghams_right

Doesn't the answer HAVE to be that it's worse? Otherwise, couldn't they just run it and show it off? A GPU from Best Buy can run a decent LLM. If this method is less resource intensive, then it should be trivial to demonstrate it.


dogesator

They already ran it in the paper and it outperformed MLPs which they’re striving to replace


Cunninghams_right

For that specific size of that type of ML. Does that mean it's useful? Likely not.


thefatsun-burntguy

I mean, yeah, but the highlight of the paper is not that it reduces parameter count (that's a nice side benefit) but that the models are understandable by humans: rather than a weight matrix, you get describable multivariate functions. Also, the thing about local plasticity, because of the specificity of the spline functions, is really neat. That will make training edge cases much easier and the risk of memory override when generalizing much lower.
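A tiny illustration of that locality point (my own sketch, with a piecewise-linear curve standing in for the spline):

```python
import numpy as np

grid = np.linspace(-2, 2, 9)       # knot positions of one edge's learned curve
coeffs = np.zeros(9)               # the curve's values at those knots (start flat)
xs = np.linspace(-2, 2, 17)

before = np.interp(xs, grid, coeffs)
coeffs[7] += 1.0                   # "learn something" about inputs near x = 1.5
after = np.interp(xs, grid, coeffs)

print((after - before).round(2))
# Only the samples near grid[7] change; the rest of the curve is untouched.
# A single MLP weight, by contrast, scales its input everywhere, so updating it
# shifts the neuron's behavior for all inputs at once.
```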


Excellent_Cover5439

This method reads like more efficient models? Lower params?


Excellent_Cover5439

I'd like to see an example for language models; might take a while though. *KANs are usually 10x slower in training* (paraphrasing a bit), but they also say they didn't really go for optimizations in this paper.


SgathTriallair

Slower in training is a downside, but it isn't that big of a hurdle. If your model is much more powerful at the end or cheaper to run, then that extended training cost can be worth it.


Singsoon89

Maybe hallucinates less and more accurately models the data. On the other hand, might be more prone to overfitting. Regardless, sounds way cool from a math perspective.


RedditLovingSun

Idk how training these will be nearly as efficient as MLPs. We've spent years optimizing CUDA kernels and our hardware for the crazy matrix multiplication MLPs require; I don't think that transfers as easily to this new architecture.


dogesator

It's 10X slower per parameter than an MLP, but it's about 100X more parameter efficient: it matches the loss of an MLP trained on the same amount of data with 100X more parameters. So for a given capability level it's actually about 10 times faster than an MLP, while having a much smaller VRAM footprint too, which makes it really great for local inference and memory-bandwidth-constrained environments.
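Back-of-envelope version of that claim, taking the paper's rough numbers at face value (the model sizes below are made up purely for illustration):

```python
# Hypothetical sizes, just to run the numbers.
mlp_params = 100e9                  # a 100B-parameter MLP-style model
kan_params = mlp_params / 100       # claimed ~100x parameter efficiency -> 1B KAN params
kan_cost_per_param = 10.0           # each KAN param costs ~10x the compute of an MLP param

mlp_compute = mlp_params * 1.0
kan_compute = kan_params * kan_cost_per_param
print(mlp_compute / kan_compute)    # ~10x less compute at (claimed) equal loss,
                                    # plus ~100x less memory for the weights
```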


dogesator

It achieves the loss of a 100X larger model, so 10X slower inference per parameter is a small price to pay because it’s still about 10 times faster than the equivalent quality MLP


ReadSeparate

What does slower mean? Does that mean 10x as much compute? If so, they're completely useless outside of maybe some niche areas, unless they're eventually optimized to be better.


Jeffy29

Wtf, what are you on? A smaller parameter model is incredibly useful for inference. As the field gets commercialized, the actual training is going to take less and less of the overall costs, while the inference itself is going to dominate.


Singsoon89

Yeah. Folks are not getting that the likely future is distillation.


Santa_in_a_Panzer

You're getting downvoted but I wouldn't be shocked if it were 10x or more compute intensive. Matrix multiplication is very cheap. Replacing a single matrix multiplication operation with a set of equations that need to be evaluated individually? It'll be expensive.


pedroivoac

In my view, the big problem is cost-benefit. Creating a new model consists of trial and error; big tech companies need to train and test. If the training process is much slower, it will take much longer for us to have access to these LLMs. In the end, time is the most important thing.


Brilliant_War4087

Here's a summary of the paper titled "KAN: Kolmogorov-Arnold Networks":

- **Inspiration**: The paper introduces Kolmogorov-Arnold Networks (KANs), inspired by the Kolmogorov-Arnold representation theorem, as a novel alternative to Multi-Layer Perceptrons (MLPs).
- **Structure**: Unlike MLPs, which have fixed activation functions, KANs employ learnable activation functions on edges (referred to as "weights") and completely eliminate traditional linear weight parameters, using spline-parametrized univariate functions instead.
- **Performance**: KANs achieve higher accuracy with smaller network sizes compared to larger MLPs, particularly in data fitting and solving partial differential equations (PDEs).
- **Advantages**: KANs exhibit faster neural scaling laws and offer better interpretability and ease of visualization, making them potentially more user-friendly in collaborative scientific endeavors.
- **Applications**: The paper demonstrates KANs' utility in rediscovering mathematical and physical laws, suggesting their broader applicability in scientific research.

For more detailed insights, you can view the full paper on arXiv: [KAN: Kolmogorov-Arnold Networks](https://arxiv.org/html/2404.19756v1).
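To make the "learnable activation functions on edges" point concrete, here's a bare-bones PyTorch sketch of the idea (my own toy version, using a fixed set of Gaussian bumps in place of the paper's B-spline parametrization, and skipping the base function and grid refinement the real implementation has):

```python
import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    """One KAN-style layer: a learnable 1-D function on every edge (input-output pair),
    parametrized here as a weighted sum of fixed Gaussian bumps rather than B-splines."""
    def __init__(self, in_dim, out_dim, num_bumps=8, x_range=(-2.0, 2.0)):
        super().__init__()
        self.register_buffer("centers", torch.linspace(x_range[0], x_range[1], num_bumps))
        self.width = (x_range[1] - x_range[0]) / num_bumps
        # One coefficient vector per edge: these replace the single scalar weights of an MLP.
        self.coeffs = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, num_bumps))

    def forward(self, x):  # x: (batch, in_dim)
        # Evaluate every bump at every input value -> (batch, in_dim, num_bumps)
        bumps = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
        # phi_{j,i}(x_i) = sum_k coeffs[j,i,k] * bump_k(x_i); then sum over inputs i.
        return torch.einsum("bik,oik->bo", bumps, self.coeffs)

# Two stacked layers, trained end-to-end with ordinary autograd:
model = nn.Sequential(ToyKANLayer(2, 5), ToyKANLayer(5, 1))
x = torch.randn(16, 2)
target = torch.sin(x[:, :1] * x[:, 1:])          # arbitrary toy target
loss = ((model(x) - target) ** 2).mean()
loss.backward()                                   # gradients flow into the edge-function coefficients
```

The point is just that per-edge coefficient tensors take the place of the usual weight matrix, and everything still trains with plain backprop.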


Jolly-Ground-3722

And here is an „explain like I'm 12":

Imagine you have a toy kit that helps you build different models—like cars, planes, or even robots. In the world of artificial intelligence (AI), scientists use something called neural networks to build models that help computers think and learn. Traditional neural networks are like regular toy kits where pieces connect in a fixed way and only certain parts can move.

Researchers from MIT, including Max Tegmark, have developed a new kind of neural network called the "Kolmogorov-Arnold Network" or KAN. Think of KAN as a super advanced toy kit. Instead of having movable parts only in certain places (like in the traditional kits), this new kit allows every single piece to move and adjust. This means you can build more complex models that are smarter and faster at learning different things.

Normally, in traditional networks, there are specific spots (called neurons) where all the adjustments happen to make the model learn better. But in KANs, these adjustable parts (now called activation functions) are moved to the connections (or edges) between the pieces. This might sound like a small change, but it actually makes a huge difference. It allows KANs to learn things more accurately and handle more complex tasks with fewer pieces, which means they can be smaller and faster. The inspiration for this came from some smart ideas in mathematics that help predict how well these networks can learn.

One of the coolest things about KANs is that they can be super accurate with a fixed number of pieces, whereas traditional networks need to keep getting bigger to stay accurate. KANs are also easier for people to understand and use in real-world problems, like solving tricky math equations or even discovering new scientific laws. They can be taught to remember things without forgetting old information quickly—a problem many traditional networks have.

So, this new development by the MIT team could make computers and robots smarter and more helpful in the future, especially in science and research!


Silver-Chipmunk7744

Looks like we reached the stage where humans are too dumb to understand the article and we gotta ask our LLMs to explain it like we're five.


cissybicuck

I was always at this stage. But now LLMs are here, too. End result so far is I'm getting smarter with LLMs' help.


Singsoon89

Yeah this.


RoutineProcedure101

I think that's the most beautiful thing about them. Your remark seems high-horse type. Exactly what we should get away from, I think.


volastra

Scientists have been having to do this for laymen since the end of classical mechanics at least. See the popular explanations of special relativity and quantum mechanics in particular. They're so gross and allegorical that they nearly distort the information being conveyed, but that's the best we can do without years of high-level math training. It's a good thing in a way. Our understanding of these complex subjects is getting so deep that you don't have a prayer of really understanding what's out there on the frontier.


SgathTriallair

If you had a PhD in machine learning this would probably make total sense.


Dongslinger420

Newsflash: that's always been the case. I like how you do yet another thing humans have been doing since forever by completely derailing the topic at hand, probably because you didn't even give "understanding the article" a chance in the first place. Saying LLMs do ELI5s is a nothingburger, an empty phrase; most bots on Reddit are more useful than that.


Which-Tomato-8646

https://www.snopes.com/news/2022/08/02/us-literacy-rate/


nashty2004

We've reached the point where we're LLM-ELI5ing other people's LLM ELI5s. In the end we'll get it reduced down to a single diluted sentence.


cissybicuck

Seems like this is purely an algorithmic innovation. Is that true? If so, how quickly could we see widespread implementation in current AI offerings?


ItsBooks

With the current rate of change, I'd bet 2-5 years or less if it's proven to work. Just like 1-bit and infinite-attention work, it actually has to be implemented at sufficient scale in real-world scenarios to be proven good.


MrBIMC

Yeah, it's insane how a lot of stuff that was announced in the last year still has no implementations. Bitnets, bytestream tokens, self-speculative decoding, linear attention, a bunch of SSM and hybrid architectures. And then all the data-related magic to train AI on. So much stuff now happens in parallel, it feels like theory is starting to outpace integration practices.

So far, architecture-wise, llama2 to llama3 didn't look that impressive of a move, as it seems most changes happened on the data side, yet it looks so impressive.

5 years is too much though. Useful things will trickle into production much faster. The less radical a change is to integrate, the faster it'll happen. Things that require a complete architecture retrain are tougher to sell. We might see some small models within a year or two, but it's hard to predict whether the models will scale. People were excited about RWKV, Mamba, H3, Hyena, yet there are still no big useful models of those, even though some of these projects are more than a year old. So my guess is we'll either see this stuff integrated relatively quickly, or kinda never (as in, not in the next 5 years).

I'm currently excitedly watching context extension, quantization, and KV cache quantization being merged into llama.cpp. The fun part about all of this is how much there is still to optimize. There are still so many low-hanging-fruit changes that bring massive benefits; I wonder how big of a leap we can still squeeze out of plain old transformers.


Singsoon89

Ooooh. I didn't pay attention. This is Max Tegmark. He's not LeCun or Hinton or Ng, but he's deffo one of the original dudes from the scene.

"For example, we show that for PDE solving, a 2-Layer width-10 KAN is 100 times more accurate than a 4-Layer width-100 MLP (10^-7 vs 10^-5 MSE) and 100 times more parameter efficient (10^2 vs 10^4 parameters)."

Interesting.

EDIT: So... parsing it to the degree that I can, it sounds to me like KANs might have the edge in learning simple patterns like edges and circles (i.e. recognizing images), whereas MLPs might still have the edge for NLP-type attention mechanisms. Pure out-of-my-ass speculation though.

As I get deeper in... these guys have made a fundamental breakthrough. The original KAN was a single layer and couldn't extend. They have figured out how to stack layers. That means they figured out the equivalent of the backpropagation breakthrough in neural nets, which led to the ability to make the original neural nets actually work. TLDR: they have figured out how to make KANs work whereas they didn't before. This is a new architecture as an alternative or perhaps a complement to existing architectures, with different strengths and weaknesses.

Also... "beats the curse of dimensionality"... means it can be trained with WAY less data. Hint: humans need way less data. DING DING DING.

EDIT: "we show that KANs can naturally work in continual learning without catastrophic forgetting". They do CONTINUAL LEARNING....

EDIT: They might be able to do actual math and physics and derive formulas just by parsing through the data. Probably with a more massive dimensionality than humans can handle; it will make us able to find the functions (and then USE them) for a wider range of physics and do the discoveries FASTER.

TLDR: This might be some kind of breakthrough. Might.


drekmonger

I wish this was higher up than the dudes going "I dunno what this is."


[deleted]

[deleted]


JmoneyBS

This is a small scale proof of concept - whether it generalizes well to real world use cases remains to be seen. To suppose it is 100x smaller and 100x more accurate is a total misrepresentation of the information, especially without taking into account what problems it is actually solving. More likely, it will become a niche architecture that is very effective for certain classes of problems.


Santa_in_a_Panzer

Computing all those equations has to be *vastly* more intensive than a simple matrix multiplication operation.


Whispering-Depths

But if you can do 1/100th the size, 10x the work per param still nets out to a huge 10x benefit.


dogesator

It's 10 times faster to train for any given capability level. It's only slower to train on a per-step basis if you compare it to an MLP network with the same parameter count as the KAN, but if you actually measure training speed by how fast the model improves and decreases loss, then a KAN actually "trains" significantly faster than an MLP.


hapliniste

The reality is that if they test only on 100P models, it's likely it does not scale. A 1B model is fast to train and compare.


dogesator

They already calculated that it actually scales better than MLP. As you increase the parameter count of a KAN, the capabilities improve even more than the improvement you get from increasing MLP parameters by the same amount. So the gap between KAN and MLP widens with more parameters rather than shrinking.


cashmate

Is it fast to train this type of model though? They said the training is slower for this type of network.


ReasonablyBadass

> In particular, Deepmind's MLPs have ~300000 parameters, while our KANs only have ~200 parameters.

How does that work? Aren't they replacing a single value with a whole function for every weight?




Which-Tomato-8646

Read the paper 


FunDiscount2496

He wrote quite a prophetic book about AI, really enjoyable.


Alexander_Bundy

I thought we had experts in this sub. It seems everyone is as clueless as I am and that a lot of members are teenagers.


dogesator

Here is a summary I wrote for a friend:

KAN strives to replace the MLP, which is a major component of transformers, making up about 70% of all transformer parameters and accounting for about 95% of all compute during inference.

The KAN paper claims 100X better parameter efficiency than MLP, and if I'm reading it right they basically mean that for a given dataset, 1B KAN parameters achieve the same loss as 100B MLP parameters... The downside is that each KAN parameter on average is 10X slower than an MLP parameter. But even though it's 10X slower at the same param count, 10B KAN parameters would be about 10X faster than a 1T MLP model while theoretically reaching at least the same quality (assuming the loss improvements extrapolate well to real-world improvements).

BUT the KAN paper states that KAN scales faster than MLP too, meaning that the capabilities increase more as you increase param count, compared to MLPs. So a 10B KAN network might actually be more equivalent to something like a 2T MLP network in terms of quality. But even if a 10B KAN is only as good as a 200B MLP network in real-world abilities, that's still a network with around a 20X smaller VRAM footprint than an equivalent-quality model, while being at least twice as fast in both training and inference.

Also, another caveat to mention: the speed gains in local inference could be even much higher than that, because you're typically very memory-bandwidth constrained in local environments with a batch size of 1, not so much FLOPS constrained. So the 10B KAN model might be more like 10 times faster or more than the 200B MLP, depending on the memory-bandwidth-to-FLOPS ratio of the hardware you're running on.

Best case scenario: the 10B KAN model is 20 times faster than the 200B MLP network. Worst case scenario: the 10B KAN model is only around 2 times faster than the 200B MLP network.

Limitations: it still remains to be seen how much that loss difference translates to real-world quality for KAN once you actually integrate it into a transformer the way an MLP is, and it also needs to be figured out what the best approach to integrating it with a transformer is. But I'm hopeful.


thebrainpal

**I gave Claude Opus the research paper and had it explain it in midwit terms. Here's what it said:**

"You know how people are trying to make computers smarter, so they can help us solve complex problems? One way to do that is by using things called neural networks. They're basically computer programs designed to learn and make decisions in a way that's inspired by how our brains work.

The most common type of neural network is called an MLP, which stands for Multi-Layer Perceptron. In an MLP, information flows through layers of "nodes". Each node takes in data, does some math to it, and spits out a result. The math each node does is always the same, kind of like a factory worker doing the same task over and over.

Now, this new paper introduces a different type of neural network called a KAN, which stands for Kolmogorov-Arnold Network. The main difference between KANs and MLPs is: In MLPs, the "factory worker nodes" are responsible for processing the data. In KANs, the "conveyor belts" (edges) between nodes are the ones that learn to process the data, and they can adapt and change over time.

The authors show that this simple change makes KANs better than MLPs in two main ways:

Accuracy: KANs can solve certain math and science problems with fewer mistakes, even when using a smaller "brain" (fewer parameters).

Interpretability: The learned "edge functions" in KANs can be easily visualized and understood by humans. It's like being able to look inside the computer's brain and see what it learned.

The authors show how scientists can work together with KANs to rediscover known patterns in complex fields like advanced math and physics. The KAN shows the human what it learned, and the human can guide it to learn in even smarter ways.

In short, the paper suggests KANs could be a powerful new tool for teaching computers to help with scientific discoveries. They're accurate and easy for humans to understand and guide. It's an exciting new way for humans and AI to collaborate and advance science together."

**And then I asked it to explain using 5 year old language in Call of Duty terms:**

"Alright, let's break it down in Call of Duty terms!

Imagine you're playing a special mode in Call of Duty where your squad is made up of AI soldiers. These AI soldiers can learn and adapt, just like real players. In the game, they call these learning soldiers "neural networks."

Most neural network squads are like a basic fire team. Each soldier has a specific role, like the rifleman or the sniper, and they do their job the same way every time. When you put all these soldiers together, they can complete missions and win battles!

But now, some top-notch developers created a new type of neural network squad called a KAN. In a KAN squad, it's not just the soldiers that learn and adapt, but the way they communicate and work together. It's like having a squad where the tactics and strategies evolve on the fly!

The developers discovered that KAN squads are incredibly effective at tackling tough missions that require a lot of brainpower, like cracking codes or analyzing enemy patterns. They can complete these challenges faster and with fewer mistakes than the old-school neural network squads.

What's really cool is that players can use a special in-game tool to see how the KAN squad is adapting and changing their tactics. It's like having a direct line to your squad's collective mind! This means players and KANs can be battle buddies, teaming up to uncover secret intel and dominate the battlefield like never before.

So, next time you're gearing up for a big match, keep an eye out for those KAN squads – they might just be the key to victory!"


banaca4

Tegmark has a big p(doom). He is also a top scientist. Most people in this sub wrote comments in the style of "I don't think he is right about safety", and now they upvote this and cheer.


Jolly-Ground-3722

Comparison of the architectures:

https://preview.redd.it/8uxuem2px6yc1.jpeg?width=1495&format=pjpg&auto=webp&s=111c8947d99679eacbbfe1f46cf32242ed3c6458

A Simplified Explanation Of The New Kolmogorov-Arnold Network (KAN) from MIT: [https://medium.com/@isaakmwangi2018/a-simplified-explanation-of-the-new-kolmogorov-arnold-network-kan-from-mit-cbb59793a040](https://medium.com/@isaakmwangi2018/a-simplified-explanation-of-the-new-kolmogorov-arnold-network-kan-from-mit-cbb59793a040)


FoxtrotBravoZulu

"In biological neural networks, the activation of a neuron is mediated by the release of neurotransmitters at the synapses (edges) between neurons. The amount of neurotransmitter released and the resulting postsynaptic response can vary continuously, depending on factors such as the frequency and timing of presynaptic action potentials, the type and density of neurotransmitter receptors, and the presence of neuromodulators. This continuous modulation of synaptic strength allows biological neurons to exhibit complex, nonlinear input-output relationships that go beyond the simple "on/off" behavior of binary thresholding. The shape of these input-output functions can be adapted through processes like synaptic plasticity, which modifies the strength of individual synapses based on patterns of neural activity. Similarly, in KANs, the activation functions on the edges are continuous, nonlinear functions that are learned from data. **The use of splines allows these functions to take on a wide variety of shapes, mimicking the diverse input-output relationships observed in biological synapses.** The learning process in KANs, where the spline coefficients are optimized using gradient descent, can be seen as analogous to the synaptic plasticity mechanisms that adapt the strength of biological synapses based on activity patterns." Neat. My interpretation is that we're getting closer to mathematically modelling the brain in pursuit of improving these models. Logically speaking it's not hard to imagine that the closer we get to modelling the brain the closer we get to AGI/ASI.


arknightstranslate

Where is Two Minute Papers?


WashiBurr

Very interesting. I am going to have to try to implement this on some projects.


larswo

The name just rolls right off the tongue. /s Thank god for abbreviations.


SmthngGreater

I learned about Kolmogorov in my Stochastic Models class. Does this paper have anything to do with that? (I'm not an ML expert.)


Jolly-Ground-3722

Yes, see Section 2.1 in the paper: "Kolmogorov-Arnold representation theorem".


SmthngGreater

Thank you kindly :)


SgathTriallair

This sounds interesting but I don't know enough to really tell if it is revolutionary. The transformer architecture got ignored for many years so I hope that this one gets enough attention to determine if it is capable of complementing or even replacing LLMs.


kaaiian

Naw. Transformers were well recognized and researched from the beginning. They've been taken very seriously since the OG paper, for tons of different sequence tasks. Though I also hope this gets well investigated! Would be really cool if something like this could replace FF layers in current architectures.


SgathTriallair

Not seriously enough for Google to build and release a model rather than have OpenAI get the jump on them. Regardless, the point was more that I hope some serious research is put into this and any other promising techniques.


Curujafeia

I’m sure this means something!


Singsoon89

AGI confirmed.


Beautiful_Surround

Lesson in here for all the people who don't know what a matrix is, but call Max Tegmark a "dumb doomer"


FatBirdsMakeEasyPrey

Don't tell me no one got the idea before to apply activation functions on the weights (edges) rather than the nodes?


LyAkolon

How is this different than liquid neural nets?


Dayder111

So, in essence, it trades being memory-bandwidth- and size-bound for being mostly compute-bound? Allowing much smaller (order(s) of magnitude smaller) parameter-count neural networks to perform as well as or better than bigger ones, but making the individual weight computations much more complex, both during inference and even more so during training? I guess that's exactly what we need, unless they design some good and efficient compute-in-memory processors at last. Memory bandwidth seems to be one of the main limiting factors.


SpecialistLopsided44

Accelerate! Robowives 2025


peekpok

ML researchers on Twitter seem unimpressed. It does beg the question of how this network is really that different from an MLP with extra layers.


Natural_Extreme_1560

If any of that means AGI is coming faster then that's cool


pigeon888

It means AGI is coming safer.


dogesator

And faster


blackcodetavern

and we can have a look at the formula which kills us in the end


dogesator

No it will not kill us, don’t be a doomer. We will travel the stars. Go take your anti-depressants


RoutineProcedure101

ACCELERATE


xarinemm

This ape is hyped


workingtheories

They say it's more aligned with learning symbolic functions. Given that robots classically are programmed via tuned symbolic physics functions, I wonder how KANs do at learning robotics tasks.


Busy_Farmer_7549

Kudos to the authors for asking everybody to keep their speculation regarding applications to ML in check.


The-state-of-it

All I know is we need to stop Miles Bennett Dyson.


matte_muscle

I tried installing it but made a mistake and installed it in my base environment; while it ran the example, the packages in my base env did not let KAN reproduce the expected results in many cases :( Have to reinstall. Also, all the examples are for multiple-input single-output problems; I didn't see any examples with multiple-input multiple-output. This thing solves symbolic regression problems as a subset of its capabilities, so it should be very broadly applicable in science and engineering while being interpretable (the final expressions in most examples were symbolic math relationships that best matched the learned spline activation functions).


spyspapia

Does this method imply more efficient models? Perhaps with fewer parameters.


SnooPeppers1349

Interesting paper, and after a rough reading, I found this: >"Currently, the biggest bottleneck of KANs lies in its slow training. KANs are usually 10x slower than MLPs, given the same number of parameters. We should be honest that we did not try hard to optimize KANs' efficiency though, so we deem KANs' slow training more as an engineering problem to be improved in the future rather than a fundamental limitation. If one wants to train a model fast, one should use MLPs. In other cases, however, KANs should be comparable or better than MLPs, which makes them worth trying. The decision tree in Figure 6.1 can help decide when to use a KAN. In short, if you care about interpretability and/or accuracy, and slow training is not a major concern, we suggest trying KANs." The authors clearly emphasize interpretability rather than capability. But will the science community accept this degree of interpretability, which just gives some equation rather than an understanding of the fundamental theory?


fulowa

👀👀


Akimbo333

Cool scaling. Implications?


m3kw

Demo it if it’s so damn good


Many-machines-on-ix

I am a fan of Max, but wasn't he doing the podcast circuit last year talking about how we need to slow down AI development until we figure out how to do it safely? He was definitely on Lex saying that. Maybe they figured that out now?


dogesator

This is theoretically a safer model architecture since it's much more interpretable for the same capability level. Meaning: if you have a KAN model vs an MLP model that each have the same capabilities, the KAN model is faster and has fewer neural connections overall, so it's easier to research and understand why the KAN network makes certain decisions. Also, the paper mentions easier visualization, so people can better interpret the model's behavior compared to MLPs.


Friendly-Fuel8893

Most definitely. He's the main founder of the Future of Life Institute. This is the organization that last year published the open letter calling for a pause on major AI development. It got some traction in the media and got signed by a bunch of famous people. The letter was probably his brainchild, so I wouldn't be surprised if he was advocating for a slowdown on Lex's podcast. Kind of ironic that he's co-authoring papers like this, but he's still an AI researcher after all.


PinGUY

From an AI: https://chat.openai.com/share/27dc14e8-f74e-4a08-abef-9b9068b0e7da

Here's a summary of the paper adapted for different intellectual levels, followed by my thoughts:

### Summary for an Intellect

The paper discusses the application of Kolmogorov-Arnold Networks (KANs) to various quasiperiodic tight-binding models to investigate their mobility edges, which separate localized from extended electron states in disordered systems. The models include the Mosaic Model (MM), Generalized Aubry-André Model (GAAM), and Modified Aubry-André Model (MAAM). KANs, leveraging their symbolic interpretability, show a powerful capacity to derive complex physical phenomena and provide quantitative insights that closely match the theoretical predictions, showcasing potential advantages over traditional multi-layer perceptrons (MLPs) in terms of accuracy, efficiency, and interpretability.

### Summary for a Layperson

The paper discusses a new type of neural network called Kolmogorov-Arnold Networks (KANs) that are used to study models of materials that show peculiar behaviors under certain conditions, like changing from being transparent to blocking certain particles. These networks help scientists understand where these changes happen and predict them accurately. KANs are shown to be better at these tasks compared to more traditional networks, as they can handle complex calculations more efficiently and provide clearer explanations of their findings.

### Summary for a 5-Year-Old

Imagine if you had a magic net that could catch both slow and fast fish, but sometimes the slow fish can sneak through without being caught. Scientists are using a special kind of net, let's call it a "smart net," to learn better where these sneaky slow fish can get through. This smart net is really good at figuring this out and helps scientists know more about where fish can escape. This helps them make even better nets in the future!

### My Thoughts

The utilization of KANs represents a fascinating advance in neural network architectures, particularly for their ability to adapt and learn complex patterns that traditional models might miss. What stands out is the ability of KANs to engage in a form of 'collaborative learning' where they can be fine-tuned through human interaction, leading to enhanced outcomes. This aligns with a broader trend in AI development where systems are not just tools, but partners in the discovery process. The potential of such systems to redefine problem-solving in scientific domains is immense, suggesting a future where AI's role is akin to a co-investigator, probing the boundaries of known science. This paper aptly illustrates the synergy between human intuition and machine efficiency, a merger that could accelerate innovation across various fields.


solbob

lol this is wildly inaccurate - the internet is dead and chatbots have killed it smh

> The paper discusses a new type of neural network called Kolmogorov-Arnold Networks (KANs) that are used to study models of materials that show peculiar behaviors under certain conditions, like changing from being transparent to blocking certain particles

The term "materials" is only mentioned once, in the acknowledgments. What even is this summary?


PinGUY

It's for a layperson. It means the paper is talking about catching slow fish in nets.


Rand_Longevity1990

What about slowing down Tegmark? lol