

source link: https://github.com/johnma2006/mamba-minimal

mamba-minimal
Simple, minimal implementation of Mamba in one file of PyTorch.
Featuring:
- Numerical output equivalent to the official implementation for both the forward and backward pass (a rough verification sketch follows the lists below)
- Simplified, readable, annotated code
Does NOT include:
- Speed. The official implementation is heavily optimized, and these optimizations are core contributions of the Mamba paper. I kept most of the implementation simple for readability.
- Proper parameter initialization (though this could be added without sacrificing readability)
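A rough way to check the equivalence claim above is to load the same checkpoint in both implementations and compare logits. The sketch below is an assumption-laden illustration, not part of this repo: the import path and output structure of the official mamba_ssm package are assumed here (consult the official repo for the exact API), and the official kernels require a CUDA device.

import torch
from transformers import AutoTokenizer
from model import Mamba
# Assumption: the official package exposes MambaLMHeadModel at this path.
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

tokenizer = AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')
input_ids = tokenizer('Mamba is the', return_tensors='pt').input_ids

minimal = Mamba.from_pretrained('state-spaces/mamba-370m')
official = MambaLMHeadModel.from_pretrained('state-spaces/mamba-370m').cuda()

with torch.no_grad():
    logits_minimal = minimal(input_ids)                         # (batch, seq_len, vocab_size)
    logits_official = official(input_ids.cuda()).logits.cpu()   # assumed to return an output with a .logits field

# Loose tolerance, since the fused CUDA kernels accumulate in a different precision.
print(torch.allclose(logits_minimal, logits_official, atol=1e-2))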
See demo.ipynb for examples of prompt completions.
from model import Mamba
from transformers import AutoTokenizer

model = Mamba.from_pretrained('state-spaces/mamba-370m')
tokenizer = AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')

# generate() is defined in demo.ipynb; a rough sketch of it appears below.
generate(model, tokenizer, 'Mamba is the')
Mamba is the world's longest venomous snake with an estimated length of over 150 m. With such a large size and a venomous bite, Mamba kills by stabbing the victim (which is more painful and less effective than a single stab of the bite)
150 meters... 🫢 scary!
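The generate helper above is not part of model.py; it is defined in demo.ipynb. Below is a minimal sketch of what such a sampling loop might look like, assuming model(input_ids) returns logits of shape (batch, seq_len, vocab_size); the helper name and defaults are illustrative, not the notebook's exact code.

import torch
import torch.nn.functional as F

def generate(model, tokenizer, prompt, n_tokens=50, temperature=1.0, top_k=40):
    model.eval()
    input_ids = tokenizer(prompt, return_tensors='pt').input_ids
    with torch.no_grad():
        for _ in range(n_tokens):
            # No recurrent cache in the minimal model, so the full prefix is re-run each step.
            logits = model(input_ids)[:, -1] / temperature      # logits for the last position
            if top_k is not None:
                v, _ = torch.topk(logits, top_k)
                logits[logits < v[:, [-1]]] = -float('inf')     # keep only the top-k candidates
            probs = F.softmax(logits, dim=-1)
            next_token = torch.multinomial(probs, num_samples=1)
            input_ids = torch.cat([input_ids, next_token], dim=1)
    return tokenizer.decode(input_ids[0])

Re-running the full prefix every step is part of the speed trade-off noted above: the official implementation avoids this with its optimized recurrent/scan kernels.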
References
The Mamba architecture was introduced in Mamba: Linear-Time Sequence Modeling with Selective State Spaces by Albert Gu and Tri Dao.
The official implementation is here: https://github.com/state-spaces/mamba/tree/main