Neural Audio Codecs: Audio Compression Using LLMs

Mikhail T. (Sh0ny)

June 20, 2026

1 min read

Updated July 4, 2026

In short

The French company Kyutai has released Moshi, the first open-source end-to-end speech model for real-time conversation, which is built on the Mimi neural audio codec. Let’s take a closer look at how such codecs work.

In July 2024, the French company Kyutai unveiled the Moshi model—the world’s first open-source end-to-end voice AI capable of real-time conversation. The key technology behind it is the Mimi neural audio codec.

How does it work?

Instead of predicting audio samples directly, a system built on a neural audio codec works in three stages:

Audio tokenization: the codec's encoder converts the audio signal into a sequence of discrete tokens.
Predicting the next tokens using an LLM: the neural network learns to predict which tokens will follow.
Reconstructing the audio: the codec's decoder converts the tokens back into sound.
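
The three stages above can be sketched with a deliberately simple toy. This is not how Mimi works internally: here uniform quantization stands in for the learned neural encoder and decoder, and a bigram count table stands in for the LLM. The names `tokenize`, `detokenize`, `BigramModel` and the codebook size `LEVELS` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

LEVELS = 16  # hypothetical codebook size for this toy example


def tokenize(samples, levels=LEVELS):
    """Stage 1: map each sample in [-1.0, 1.0] to an integer token (uniform quantization)."""
    tokens = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clamp out-of-range samples
        tokens.append(min(levels - 1, int((s + 1.0) / 2.0 * levels)))
    return tokens


def detokenize(tokens, levels=LEVELS):
    """Stage 3: map each token back to the centre of its quantization bin."""
    return [(t + 0.5) / levels * 2.0 - 1.0 for t in tokens]


class BigramModel:
    """Stage 2 stand-in: predicts the next token from counts of token pairs."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, tokens):
        for prev, nxt in zip(tokens, tokens[1:]):
            self.counts[prev][nxt] += 1

    def predict_next(self, prev):
        followers = self.counts.get(prev)
        if not followers:
            return prev  # unseen context: repeat the last token
        return followers.most_common(1)[0][0]


# A 440 Hz sine sampled at 8 kHz stands in for real audio.
SAMPLE_RATE = 8000
audio = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(800)]

tokens = tokenize(audio)
model = BigramModel()
model.fit(tokens)

# Continue the sequence by 10 predicted tokens, then convert everything back to samples.
generated = list(tokens)
for _ in range(10):
    generated.append(model.predict_next(generated[-1]))
reconstructed = detokenize(generated)

# Reconstruction is lossy: each sample is off by at most half a bin width.
max_error = max(abs(a - b) for a, b in zip(audio, reconstructed))
```

Even in this toy the round trip is lossy: every sample comes back as the centre of its bin, so the error is bounded by half a bin width. A real neural codec trades that simple bound for a learned representation that needs far fewer tokens per second.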

This approach compresses audio heavily with little perceptible loss of quality, which opens up new possibilities for voice interfaces and real-time communication.
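
To put the compression claim in numbers, here is a rough back-of-the-envelope calculation. The figures are assumptions that do not come from this post: they are the configuration publicly reported for Mimi (24 kHz audio, 12.5 token frames per second, 8 codebooks of 2048 entries each), compared against uncompressed 16-bit mono PCM.

```python
import math

# Assumed figures (not stated in this post): reported Mimi configuration.
SAMPLE_RATE_HZ = 24_000
PCM_BITS_PER_SAMPLE = 16
FRAME_RATE_HZ = 12.5
NUM_CODEBOOKS = 8
CODEBOOK_SIZE = 2048

bits_per_token = math.log2(CODEBOOK_SIZE)  # bits needed to index one codebook entry
pcm_bitrate = SAMPLE_RATE_HZ * PCM_BITS_PER_SAMPLE  # bits per second, mono
codec_bitrate = FRAME_RATE_HZ * NUM_CODEBOOKS * bits_per_token  # bits per second
ratio = pcm_bitrate / codec_bitrate

print(f"Uncompressed PCM: {pcm_bitrate / 1000:.0f} kbps")
print(f"Codec tokens:     {codec_bitrate / 1000:.1f} kbps")
print(f"Compression:      about {ratio:.0f}x")
```

Under these assumptions the token stream is a few hundred times smaller than the raw signal, which is what makes it practical for an LLM to model speech token by token in real time.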

Source: Best Posts of the Week

neural networks, audio codecs, LLM, artificial intelligence, open source
