LLMs Powered by Kolmogorov-Arnold Networks
Seeing as the authors claim that KANs reduce the catastrophic forgetting we see in MLPs, I thought: "Wouldn't it be nice if there were an LLM that substituted KANs for its MLPs?" I looked around and didn't find one, so I built one!

- PyTorch module (`kan_gpt`)
- Deployed to PyPI
- MIT License
- Test cases to ensure forward-backward passes work as expected
- Training script

I am currently training it on the WebText dataset to compare it against the original GPT-2. I'm facing a few out-of-memory issues at the moment; perhaps the vocab size (50257) is too large? I'm open to contributions and would love to hear your thoughts!
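For readers unfamiliar with the substitution being described: where an MLP applies fixed activations at the nodes and learns scalar weights on the edges, a KAN learns a univariate function on each edge and simply sums at the nodes. The toy sketch below (numpy, Gaussian RBF bases instead of the B-splines used in the KAN paper; this is an illustration, not the repo's actual implementation) shows one such layer's forward pass:

```python
import numpy as np

def kan_layer(x, coeffs, centers, width=1.0):
    """Toy KAN layer: out[q] = sum_p phi_{q,p}(x[p]), where each edge
    function phi_{q,p} is a learnable combination of Gaussian RBF bases.

    x: (in_dim,)  input vector
    coeffs: (out_dim, in_dim, n_basis)  learnable per-edge coefficients
    centers: (n_basis,)  basis-function centers shared across edges
    """
    # Evaluate every basis function at every input coordinate: (in_dim, n_basis)
    basis = np.exp(-((x[:, None] - centers[None, :]) / width) ** 2)
    # Edge functions phi_{q,p}(x[p]) = sum_b coeffs[q, p, b] * basis[p, b]
    edge = np.einsum('qpb,pb->qp', coeffs, basis)
    # Each output node just sums its incoming edges -- no fixed activation
    return edge.sum(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=4)
coeffs = rng.normal(size=(3, 4, 5))
centers = np.linspace(-2.0, 2.0, 5)
out = kan_layer(x, coeffs, centers)   # shape (3,)
```

In a GPT block, a layer like this (stacked and batched) would stand in for the two linear layers of the feed-forward MLP.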
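The out-of-memory suspicion is plausible: in a KAN, each edge carries a whole vector of basis coefficients rather than a single scalar weight, so a KAN projection onto a 50257-token vocabulary multiplies the head's parameter count by the grid size. A back-of-envelope comparison, using GPT-2 small's embedding width and an illustrative grid size of 8 (an assumption, not the repo's setting):

```python
# Back-of-envelope parameter count for the LM head.
# n_embd is GPT-2 small's width; grid is an illustrative assumption.
n_embd, vocab, grid = 768, 50257, 8

linear_head = n_embd * vocab      # one scalar weight per edge
kan_head = n_embd * vocab * grid  # ~`grid` coefficients per edge

print(linear_head, kan_head)  # 38597376 308779008
```

That is roughly 38.6M parameters for a plain linear head versus ~309M for a KAN head, before counting activation memory, which suggests keeping the output projection linear (or shrinking the vocab) as a first mitigation.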
Hacker News story. Reviewed by Tha Kur on May 04, 2024.