Hacker News story: LLMs Powered by Kolmogorov-Arnold Networks

Seeing as the authors claim that KANs can reduce the catastrophic forgetting we see in MLPs, I thought, "Wouldn't it be nice if there were an LLM that substituted MLPs with KANs?" I looked around and didn't find one, so I built one!

- PyTorch module: kan_gpt
- Deployed to PyPI
- MIT Licence
- Test cases to ensure forward-backward passes work as expected
- Training script

I am currently training it on the WebText dataset to compare it with the original GPT-2, but I'm running into a few out-of-memory issues at the moment. Perhaps the vocab size (50257) is too large?

I'm open to contributions and would love to hear your thoughts!
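
One rough way to sanity-check the vocab-size hunch is to count parameters. The sketch below assumes, hypothetically, that the output projection (hidden size 768, as in the smallest GPT-2, to 50257 vocab entries) is itself a KAN layer, and that each input-output edge stores grid_size + spline_order spline coefficients plus one base weight, as is common in B-spline KAN implementations. Whether kan_gpt is actually laid out this way is not stated in the post; the function names, default values, and the "4 copies" training estimate are illustrative assumptions.

```python
def linear_params(d_in: int, d_out: int) -> int:
    """Weights in a plain bias-free linear layer: one scalar per edge."""
    return d_in * d_out


def kan_params(d_in: int, d_out: int, grid_size: int = 5, spline_order: int = 3) -> int:
    """Assumed KAN layer size: each edge holds (grid_size + spline_order)
    spline coefficients plus one base weight. This is an assumption about
    the layer design, not a statement about kan_gpt's code."""
    edges = d_in * d_out
    return edges * (grid_size + spline_order + 1)


def training_bytes(n_params: int, bytes_per_param: int = 4, copies: int = 4) -> int:
    """Rough float32 training footprint, assuming 4 copies per parameter:
    weights, gradients, and two Adam moment buffers. Activations are ignored."""
    return n_params * bytes_per_param * copies


if __name__ == "__main__":
    hidden, vocab = 768, 50257  # GPT-2 small hidden size, vocab size from the post
    for name, n in [("linear head", linear_params(hidden, vocab)),
                    ("KAN head (assumed)", kan_params(hidden, vocab))]:
        gib = training_bytes(n) / 2**30
        print(f"{name}: {n:,} params, ~{gib:.2f} GiB for weights + optimizer state")
```

Under these assumptions the KAN head carries 9x the parameters of a linear head of the same shape, so the vocab-sized layer is a plausible place to look first; activation memory (batch size times sequence length times vocab size for the logits) is not counted here and may matter just as much.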

Reviewed by Tha Kur on May 04, 2024. Rating: 5
