MambaByte
Overview:
MambaByte is a token-free language model that learns directly from raw bytes, eliminating the biases introduced by subword tokenization. Operating on bytes, however, yields significantly longer sequences, which challenges the scalability of standard autoregressive Transformers. MambaByte is a token-free adaptation of the Mamba state space model, trained autoregressively on byte sequences. Experiments show that MambaByte is more computationally efficient than other byte-level models, and that it matches or even surpasses the performance of state-of-the-art subword Transformers. Moreover, thanks to its linear scaling in sequence length, MambaByte achieves faster inference than Transformers. These findings establish the viability of token-free language modeling with MambaByte.
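The core idea of byte-level modeling can be sketched in a few lines: raw UTF-8 bytes serve directly as token IDs, so the vocabulary is fixed at 256 and no learned subword tokenizer is needed. The function names below are illustrative, not from the MambaByte codebase.

```python
# Minimal sketch of byte-level "tokenization" as used by models like MambaByte.
# Each UTF-8 byte becomes one token ID in [0, 255]; no subword vocabulary
# is trained, which removes tokenizer-induced bias at the cost of
# significantly longer sequences.

def bytes_to_ids(text: str) -> list[int]:
    """Map text to a sequence of byte IDs (vocabulary size 256)."""
    return list(text.encode("utf-8"))

def ids_to_bytes(ids: list[int]) -> str:
    """Invert the mapping, decoding the byte sequence back to text."""
    return bytes(ids).decode("utf-8")

ids = bytes_to_ids("Mamba")  # one ID per byte, e.g. 5 IDs for "Mamba"
text = ids_to_bytes(ids)     # lossless round trip
```

Note that a multi-byte UTF-8 character (e.g. an accented letter) expands to several IDs, which is one reason byte sequences are much longer than subword tokenizations.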
Target Users:
MambaByte is suitable for language modeling tasks that require eliminating subword tokenization bias and improving computational efficiency.
Total Visits: 29.7M
Top Region: US (17.94%)
Website Views: 47.2K
Use Cases
Applying the MambaByte model to natural language processing tasks
Using MambaByte in text generation applications
Case study of MambaByte for sentiment analysis
Features
Token-Free Language Modeling
Eliminating Subword Tokenization Bias
Byte-Level Model Training
Improving Computational Efficiency
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025 AIbase