Researchers introduce Self-Harness, a framework that lets AI agents rewrite their own rules, boosting performance up to 60%

VentureBeat AI (CN)

AI Summary

Researchers at the Shanghai Artificial Intelligence Laboratory have introduced Self-Harness, a framework that enables language-model-based agents to rewrite and improve their own operating rules, with reported performance gains of up to 60%. The approach replaces manual, ad hoc harness engineering with a systematic feedback loop for continuous improvement.

Not every company can or should build their own frontier AI language model. However, the harness controlling the model is something that most enterprises can and should customize for their specific purposes. Of course, this is easier said than done. Agent harnesses are still largely tuned through manual, ad hoc debugging, a process that relies heavily on intuition rather than systematic feedback loops, making it difficult to keep pace with rapidly evolving LLMs.

To solve this challenge, researchers at the Shanghai Artificial Intelligence Laboratory have introduced “Self-Harness,” a new paradigm in which an LLM-based agent systematically improves its own operating rules. By examining its own execution traces to apply edits, the system trades manual guesswork for empirical evidence. Self-improving harnesses can enable development teams to deploy robust custom agents that continually adapt their own execution protocols to overcome model-specific weaknesses.

The challenge of harness engineering

An LLM-based agent's performance is not determined solely by its underlying base model, but also by its harness: the surrounding system that provides context and enables the model to interact with the environment. A harness includes components like system prompts, tools, memory, verification rules, runtime policies, orchestration logic, and failure-recovery procedures.

This layer is crucial because many common agent failures stem from the harness rather than the model. For example, an agent may report success without checking the model’s response (e.g., running the code to see if it passes the tests), or it might retry a failed action repeatedly. The harness is also responsible for preventing context rot or overload when the agent’s interaction history grows very large. Examples of popular harnesses include SWE-agent, Claude Code, Codex, and OpenHands.
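
To make the two failure examples above concrete, here is a minimal sketch of how a harness might enforce a verification rule and stop repeated retries. It is an illustration only, not code from any of the harnesses named above: `run_with_harness_rules`, `StepResult`, `act`, `verify`, and `max_attempts` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    success: bool
    attempts: int
    reason: str


def run_with_harness_rules(
    act: Callable[[], str],          # hypothetical: one agent action, returns its output
    verify: Callable[[str], bool],   # hypothetical: external check, e.g. "do the tests pass?"
    max_attempts: int = 3,
) -> StepResult:
    """Run one agent action under two illustrative harness rules."""
    seen_outputs = set()
    for attempt in range(1, max_attempts + 1):
        output = act()
        # Rule 1: never report success without an explicit verification step.
        if verify(output):
            return StepResult(True, attempt, "verified")
        # Rule 2: stop as soon as the agent repeats an output that already failed.
        if output in seen_outputs:
            return StepResult(False, attempt, "repeated failed action")
        seen_outputs.add(output)
    return StepResult(False, max_attempts, "attempt budget exhausted")


# A stuck agent that keeps producing the same failing output is cut off early.
stuck = run_with_harness_rules(lambda: "same patch", lambda out: False)
print(stuck)
# StepResult(success=False, attempts=2, reason='repeated failed action')
```

In a real harness the `verify` callable would wrap something like a test run, and the retry rule would likely compare actions rather than raw outputs; both simplifications are assumptions of this sketch.
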

Harness engineering remains a significant challenge, but the bottleneck isn't necessarily that humans are too slow or incapable. In fact, Hangfan Zhang, lead author of the Self-Harness paper, told VentureBeat that "in many cases, an experienced engineer with deep domain knowledge can still propose better changes than an LLM can today."

Instead, the true bottleneck of manual engineering is that it relies heavily on ad hoc debugging rather than a verifiable, empirical feedback loop. "The deeper issue is that the current harness-engineering paradigm often lacks a systematic feedback loop," Zhang explained. "Many edits are made based on intuition, a few observed failures, or ad hoc debugging."

With new models being released at a rapid pace, depending on human intuition to manually tune model-specific harnesses becomes increasingly costly and untenable. While some approaches use stronger models to improve the harnesses of weaker target agents, this dependence on external guidance has its own challenges, as these models may be costly, unavailable for frontier models, or mismatched to the target model's failure modes.

How Self-Harness works

The Self-Harness paradigm enables an LLM-based agent to improve its own harness without relying on human engineers or stronger external models. This continuous self-evolution is driven by a three-stage iterative loop that turns behavioral evidence into harness updates:

Weakness mining: Starting from an initial harness, the agent runs a set of tasks, producing execution traces with verifiable outcomes. The agent categorizes failed traces and tries to detect model-specific failure patterns.

Harness proposal: Based on these failure patterns, the agent uses a “proposer” role to generate a set of diverse yet minimal harness modifications, each tied to a specific failure mechanism to avoid overly general corrections.

Proposal validation: The system evaluates candidate modifications through regression tests.
An edit is promoted only if it improves performance without causing measurable degradation on held-out tasks. If multiple candidate modifications pass the regression tests, they are merged into the next version of the harness, which then serves as the starting point for the next iteration.

To visualize why an enterprise would need this, imagine an automated issue-fixing agent that reads internal documentation, writes patches, and opens pull requests. If the company updates its documentation style, the agent might suddenly fail, pulling the wrong context or writing bad patches. On the surface, the agent simply looks broken. But Self-Harness turns this ambiguous failure into a solvable problem. "The failure traces expose where the agent is misusing the new documentation format; the proposer can generate a targeted harness edit... and the evaluator can decide whether that edit improves the failing cases without regressing other cases," Zhang said.

Self-Harness in action

The researchers evaluated Self-Harness on Terminal-Bench-2.0, a benchmark that tests general tool-based execution, including artifact management, command use, verification beh
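
The three-stage loop described under "How Self-Harness works" (weakness mining, harness proposal, proposal validation) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the researchers' implementation: the harness is modeled as a tuple of rule strings, and `self_harness_step`, `pass_rate`, `toy_run`, `toy_propose`, the task names, and the failure labels are all hypothetical. The split into a working task set and a held-out set is also an assumption about how the regression check might be organized.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Harness = Tuple[str, ...]   # assumption: a harness is just an ordered set of rule strings
Trace = Dict[str, object]   # assumption: {"task": str, "passed": bool, "failure": str}


def pass_rate(harness: Harness, tasks: List[str],
              run: Callable[[Harness, str], Trace]) -> Tuple[float, List[Trace]]:
    """Run every task (list must be non-empty) and return the pass fraction plus traces."""
    traces = [run(harness, task) for task in tasks]
    return sum(1 for t in traces if t["passed"]) / len(tasks), traces


def self_harness_step(harness, tasks, heldout_tasks, run, propose) -> Harness:
    # Stage 1: weakness mining. Run tasks and group failed traces by pattern.
    base_score, traces = pass_rate(harness, tasks, run)
    base_heldout, _ = pass_rate(harness, heldout_tasks, run)
    patterns = Counter(t["failure"] for t in traces if not t["passed"])
    if not patterns:
        return harness  # nothing failed, so there is nothing to mine
    # Stage 2: harness proposal. One small candidate edit per failure pattern.
    candidates = propose(harness, patterns)
    # Stage 3: proposal validation. Keep an edit only if it helps on the
    # working tasks and does not degrade the held-out tasks.
    accepted = []
    for rule in candidates:
        trial = harness + (rule,)
        score, _ = pass_rate(trial, tasks, run)
        heldout, _ = pass_rate(trial, heldout_tasks, run)
        if score > base_score and heldout >= base_heldout:
            accepted.append(rule)
    # Edits that pass are merged into the next harness version.
    return harness + tuple(accepted)


# --- Toy stand-ins for a real agent and proposer (purely illustrative) ---
RULE_FOR_FAILURE = {
    "unverified_success": "run tests before reporting success",
    "stale_docs": "re-read docs when the format changes",
}
TASK_FAILURE = {"fix-tests": "unverified_success", "fix-docs": "stale_docs",
                "rename-var": None}


def toy_run(harness: Harness, task: str) -> Trace:
    failure = TASK_FAILURE[task]
    fixed = failure is None or RULE_FOR_FAILURE[failure] in harness
    passed = fixed and "skip all checks" not in harness  # a deliberately harmful rule
    return {"task": task, "passed": passed,
            "failure": "" if passed else (failure or "harmful_rule")}


def toy_propose(harness: Harness, patterns: Counter) -> List[str]:
    targeted = [RULE_FOR_FAILURE[f] for f in patterns if f in RULE_FOR_FAILURE]
    return targeted + ["skip all checks"]  # included to show a rejected edit


next_harness = self_harness_step((), ["fix-tests", "fix-docs"], ["rename-var"],
                                 toy_run, toy_propose)
print(next_harness)
# ('run tests before reporting success', 're-read docs when the format changes')
```

Note that this sketch validates each candidate edit independently and then merges the survivors without re-checking the merged harness; a production system would likely re-run the regression tests on the merged version before adopting it.
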
