A 9B model outperformed a 30B model on a local LLM agent task

Mikhail T. (Sh0ny)

July 1, 2026

2 min read

In short

The author of the experiment gave the same agent task to two local models, one with 9B parameters and one with 30B. The smaller model completed the task in three steps, while the larger one took 24 steps and still got it wrong. The post explains why model size alone does not determine how well an agent performs.

A common belief holds that the more parameters a model has, the smarter it is and the better it performs. In practice this is not always true, especially when the job is not a single answer to a question but the work of an LLM agent that must plan its moves, call tools, and carry the task through to completion.

The author of the original post spent three months working with local agents and ran a simple experiment: the same task was given to two models, one with 9 billion parameters and one with 30 billion. The result was unexpected.

What the Experiment Showed

The 30B model took 24 steps to solve the task and ultimately produced an incorrect answer.
The 9B model completed the same task in just 3 steps and produced the correct result.

The difference lies not in the model’s “intelligence” per se, but in how its workflow is organized within the agent framework.

Why Size Isn’t the Main Factor

For a local LLM agent, the final quality depends not only on the number of parameters but on a whole set of engineering decisions around the model itself:

how the reasoning-and-action loop is structured (planning, tool calls, verifying the result);
how clearly and concisely the system prompt is formulated;
how the model handles context and maintains focus across long chains of steps;
whether there are mechanisms to prevent the agent from performing unnecessary or redundant actions;
how well the tools and their descriptions are tailored to the specific model.
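
To make the list above concrete, here is a minimal sketch of such a loop in Python. It is not the author's framework: the `Action` and `RunResult` types, the `run_agent` function, the `max_steps` budget, and the scripted stand-in for the model are all hypothetical, chosen only to show where planning, tool calls, and an explicit stop condition sit.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    """One decision by the model: call a tool, or return a final answer."""
    tool: Optional[str] = None
    arg: str = ""
    final_answer: Optional[str] = None


@dataclass
class RunResult:
    answer: Optional[str]
    steps: int
    stopped_by_budget: bool


def run_agent(
    decide: Callable[[list[str]], Action],
    tools: dict[str, Callable[[str], str]],
    task: str,
    max_steps: int = 8,
) -> RunResult:
    """Plan -> act -> observe loop with a hard step budget (hypothetical design)."""
    history = [f"TASK: {task}"]
    for step in range(1, max_steps + 1):
        action = decide(history)  # the model picks its next move from the history
        if action.final_answer is not None:
            # Explicit stop condition: the model declared it is done.
            return RunResult(action.final_answer, step, False)
        if action.tool not in tools:
            # Report the error back instead of crashing, so the model can recover.
            history.append(f"ERROR: unknown tool {action.tool!r}")
            continue
        observation = tools[action.tool](action.arg)
        history.append(f"{action.tool}({action.arg}) -> {observation}")
    # Budget exhausted: stop rather than let the loop run on indefinitely.
    return RunResult(None, max_steps, True)


# Usage with a scripted stand-in for the model (no real LLM involved).
def scripted_model(history: list[str]) -> Action:
    if len(history) == 1:
        return Action(tool="add", arg="2+40")
    return Action(final_answer=history[-1].split("-> ")[-1])


tools = {"add": lambda expr: str(sum(int(x) for x in expr.split("+")))}

print(run_agent(scripted_model, tools, "What is 2+40?"))
```

The step budget is the simplest form of stop logic: a model that never declares a final answer is cut off after `max_steps` instead of piling up calls.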

Without such constraints, a large model can “get stuck” in excessive reasoning, piling up steps without any real benefit. A smaller model running in a properly configured loop acts more directly and predictably, and so reaches the correct answer sooner.
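
One common way to keep a model from piling up steps is to refuse exact repeats of a tool call. The sketch below assumes a simple rule, a cap on identical (tool, argument) pairs; the `RepeatGuard` class is hypothetical rather than part of any library, and the source does not say which mechanism the author used.

```python
from collections import Counter


class RepeatGuard:
    """Rejects a tool call once the same (tool, argument) pair has been issued
    more than `max_repeats` times. Hypothetical helper, not a library API."""

    def __init__(self, max_repeats: int = 1) -> None:
        self.max_repeats = max_repeats
        self.seen = Counter()  # (tool, normalized arg) -> number of calls so far

    def check(self, tool: str, arg: str) -> bool:
        """Record the call; return True if allowed, False if it is redundant."""
        key = (tool, arg.strip())  # trivial normalization: ignore surrounding spaces
        self.seen[key] += 1
        return self.seen[key] <= self.max_repeats


guard = RepeatGuard(max_repeats=1)
calls = [("search", "config path"), ("read", "app.toml"), ("search", "config path ")]
for tool, arg in calls:
    if guard.check(tool, arg):
        print(f"run {tool}({arg!r})")
    else:
        # In a real loop this message would go back to the model as an observation.
        print(f"skip {tool}({arg!r}): repeated call, reuse the earlier result")
```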

Conclusion for Those Building Agents Locally

The main practical lesson: when building local agents, invest not only in picking the largest model you can run but also in the architecture around it: the system prompt, stop logic, tools, and how context is passed from step to step.
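
For the last item, one simple option is to pass the model a trimmed history rather than the full transcript. This is a hypothetical sketch: `build_context` keeps the system prompt and the task verbatim, then adds the most recent steps that fit a budget. Characters stand in for tokens here; a real implementation would count with the model's tokenizer.

```python
def build_context(system_prompt: str, history: list[str], max_chars: int = 2000) -> list[str]:
    """Assemble the messages sent to the model on the next step.

    Keeps the system prompt and the task (history[0]) verbatim, then adds the
    newest steps that still fit in `max_chars`. Older steps in between are
    replaced by one placeholder line (not counted against the budget).
    Characters are a stand-in for tokens in this sketch.
    """
    if not history:
        return [system_prompt]
    head = [system_prompt, history[0]]
    budget = max_chars - sum(len(m) for m in head)
    tail: list[str] = []
    for msg in reversed(history[1:]):  # walk from the newest step backwards
        if len(msg) > budget:
            break
        tail.append(msg)
        budget -= len(msg)
    tail.reverse()  # restore chronological order
    dropped = len(history) - 1 - len(tail)
    note = [f"[{dropped} earlier step(s) omitted]"] if dropped else []
    return head + note + tail


history = [
    "TASK: find the bug",
    "step1 " + "a" * 50,
    "step2 " + "b" * 50,
    "step3 short",
]
for line in build_context("SYS", history, max_chars=90):
    print(line)
```

Keeping the task pinned at the top is the design choice that matters here: on long chains, the original goal stays in view even after earlier steps are dropped.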

It is these details, not the parameter count, that determine how many steps the agent needs to complete a task and how accurate the final result will be.

Source: Habr

news ai llm agents
