Testing the Qwen and Whisper ASR models on pre-revolutionary Russian

Mikhail T. (Sh0ny)
June 24, 2026

In short

Modern speech recognition systems promise to take context into account, but their capabilities are limited. We tested the Qwen and Whisper models on pre-revolutionary texts to evaluate transcription quality for long recordings and in the presence of noise.

Recording your thoughts by voice or transcribing conversations is convenient, but not always reliable. New-generation automatic speech recognition (ASR) systems such as Qwen and Whisper can take context into account and produce meaningful text. However, they have architectural limitations.

To understand whether these models are ready for real-world scenarios, we ran a benchmark on Hugging Face. We focused primarily on pre-revolutionary Russian, a rare language variety that is difficult to recognize.

What We Tested

  • Context Window: Does understanding break down in long video recordings?
  • Impact of Noise: How does background noise affect transcription quality?
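The write-up does not say how the noisy test recordings were prepared. One common way to study the impact of noise is to mix a noise signal into clean speech at a chosen signal-to-noise ratio (SNR). The sketch below shows that idea in plain Python on lists of samples; the function names `mix_at_snr` and `measured_snr_db` and the synthetic signals are assumptions made for illustration, not part of the original benchmark.

```python
import math
import random


def _power(samples):
    """Mean squared amplitude of a list of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mix has the requested SNR in dB.

    Hypothetical helper for illustration; not taken from the benchmark.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    speech_power = _power(speech)
    noise_power = _power(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both be non-silent")
    # SNR_dB = 10 * log10(P_speech / (scale^2 * P_noise)), solved for scale.
    scale = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixed):
    """Recover the SNR of a mix, given the clean speech it was built from."""
    residual = [m - s for s, m in zip(speech, mixed)]
    return 10 * math.log10(_power(speech) / _power(residual))


# Synthetic stand-ins for real audio: a 220 Hz tone and Gaussian noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

noisy = mix_at_snr(speech, noise, snr_db=5.0)
print(round(measured_snr_db(speech, noisy), 3))
```

Repeating the mix at several SNR values and transcribing each version would give one curve of transcription quality against noise level per model.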

Results

  • The models differed in their robustness to long recordings: Qwen maintained context better, while Whisper was more accurate on short segments.
  • Noise significantly reduces the accuracy of both models, especially at low signal-to-noise ratios.
  • Both models make more errors with pre-revolutionary Russian than with modern Russian.
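The summary does not name the metric behind these comparisons. A common choice for ASR is word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal pure-Python version; the function name `word_error_rate` and the sample strings are assumptions made for illustration, not data from the benchmark.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper; the original benchmark may use another metric.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: a reference in pre-reform spelling (final hard signs)
# against a hypothesis in modern spelling.
reference = "онъ былъ дома"
hypothesis = "он был дома"
print(word_error_rate(reference, hypothesis))
```

In this made-up example two of the three words differ only by the final hard sign, yet each counts as a full substitution. Whether transcripts are normalized to one orthography before scoring can therefore change the reported error rate considerably.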

Conclusions

Testing showed that even state-of-the-art ASR systems are not perfect. Further refinements are needed to improve recognition quality under specific conditions (long recordings, noise, rare languages).

Source: Habr

Tags: news, AI, neural networks, technology