
A report from Epoch AI examines the economics of reinforcement learning (RL) environments, a market that has become a core input into how frontier AI models are trained.
Based on interviews with 18 people across RL environment startups, neolabs and frontier AI labs, the report brings together insights on how RL environments are built, priced, and used.
On per-task pricing, one RL environment founder said, “I’ve seen $200 to $2,000 mostly. $20k per task would be rare but possible.” Epoch AI stated that the $20k figure “comes up for especially complex software engineering tasks, but it’s rare.”
In modern RL training, an environment defines the world a model operates in—what actions it can take, such as running code, clicking through software interfaces, querying databases, or using tools, and how the system responds.
Tasks sit on top of these environments, specifying the objective and the grader that determines whether the objective has been achieved. Once built, a single environment can support hundreds of tasks, which is what makes the business viable despite high upfront costs.
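The split between an environment and the tasks layered on top of it can be sketched in a few lines of Python. This is a minimal illustration, not code from the Epoch AI report: the names `KeyValueEnv` and `Task` are hypothetical, and the toy key-value store stands in for a real simulated application.

```python
from dataclasses import dataclass
from typing import Callable, Dict


class KeyValueEnv:
    """Hypothetical toy environment: a key-value store the agent can edit."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}

    def step(self, action: str, key: str, value: str = "") -> str:
        """Apply one action and return the observation shown to the agent."""
        if action == "set":
            self.state[key] = value
            return "ok"
        if action == "get":
            return self.state.get(key, "<missing>")
        return "unknown action"


@dataclass
class Task:
    """An objective plus a grader that inspects the final environment state."""
    prompt: str
    grader: Callable[[KeyValueEnv], bool]


# One environment definition, several tasks defined on top of it.
tasks = [
    Task("Store the value 'blue' under the key 'colour'.",
         lambda env: env.state.get("colour") == "blue"),
    Task("Store the value '42' under the key 'answer'.",
         lambda env: env.state.get("answer") == "42"),
]

env = KeyValueEnv()
env.step("set", "colour", "blue")  # stand-in for actions chosen by a model
results = [task.grader(env) for task in tasks]  # one verdict per task
```

Adding a task here only requires a new prompt and grader, while the environment code is written once, which mirrors the cost structure the report describes.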
Epoch AI cited examples of RL environments such as a Bloomberg terminal clone, where tasks involve calculating metrics such as five-year compound annual growth rates, with the system simulating the interface and automatically verifying the results.
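To make the automatic verification concrete, here is a hedged sketch of how a grader for a five-year CAGR task might work. It uses the standard formula, CAGR = (end / start)^(1 / years) - 1. The function names, the tolerance, and the example figures are assumptions for illustration and do not come from the report or from any real product.

```python
import math


def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% a year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def grade_cagr(submitted: float, start_value: float, end_value: float,
               years: float = 5.0, tol: float = 1e-4) -> bool:
    """Hypothetical grader: pass if the answer is within tol of the true CAGR."""
    expected = cagr(start_value, end_value, years)
    return math.isclose(submitted, expected, abs_tol=tol)


# Illustrative figures: a value that doubles from 100 to 200 over five years.
correct = grade_cagr(0.1487, start_value=100.0, end_value=200.0)
wrong = grade_cagr(0.20, start_value=100.0, end_value=200.0)
```

A tolerance-based check like this is one simple way to score numeric answers without a human in the loop; a production grader would also need to handle formatting, units, and rounding conventions.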
The report points to a growing ecosystem of startups that build and sell RL environments as a service.
Companies such as Mercor, Surge, Handshake, and Turing, traditionally known for providing human-labelled data, now also sell RL environments.
Spending in this market is substantial. “Contract sizes are often six to seven figures per quarter,” the report said.
One RL environment founder noted that contracts frequently reach seven figures per quarter or more, while a neolab researcher said they had seen contracts in the $300,000 to $500,000 range, depending on task volume.
RL environments and tasks can be sold exclusively to a single lab or non-exclusively to multiple customers.
Two RL environment founders independently told Epoch AI that exclusive deals are roughly four to five times more expensive than non-exclusive ones.
Recently, SemiAnalysis also reported that so-called “UI gym” environments—mocked-up replicas of real websites used to train agents—typically cost around $20,000 per website.
It added that “OpenAI has purchased hundreds of sites for ChatGPT Agent training and development.” These environments are usually built once and reused across multiple model generations, improving their return on investment.
The Information previously reported that Anthropic had discussed spending more than $1 billion on RL environments over the course of a year.
According to Epoch AI, RL environments are reused across multiple stages of model development.
The same environment–task pair can be used for reinforcement learning, benchmarking, or supervised fine-tuning on successful trajectories. In practice, reinforcement learning dominates.
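The three uses differ mainly in what is done with the graded rollouts. The sketch below illustrates this under simplifying assumptions: the `Trajectory` type, the binary reward, and the helper names are hypothetical and are not described in the report.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """One attempt at a task: the actions taken and the grader's verdict."""
    actions: List[str]
    passed: bool


def rl_rewards(trajectories: List[Trajectory]) -> List[float]:
    """Reinforcement learning: turn each verdict into a scalar reward."""
    return [1.0 if t.passed else 0.0 for t in trajectories]


def benchmark_pass_rate(trajectories: List[Trajectory]) -> float:
    """Benchmarking: report the fraction of attempts that passed."""
    return sum(t.passed for t in trajectories) / len(trajectories)


def sft_examples(trajectories: List[Trajectory]) -> List[List[str]]:
    """Supervised fine-tuning: keep only the successful trajectories."""
    return [t.actions for t in trajectories if t.passed]


# Illustrative rollouts from a single environment-task pair.
rollouts = [
    Trajectory(["open_sheet", "compute", "submit"], passed=True),
    Trajectory(["submit"], passed=False),
    Trajectory(["open_sheet", "submit"], passed=False),
    Trajectory(["open_sheet", "compute", "check", "submit"], passed=True),
]

rewards = rl_rewards(rollouts)
pass_rate = benchmark_pass_rate(rollouts)
demos = sft_examples(rollouts)
```

In each case the grader's verdict is the shared ingredient, which is why one well-built environment-task pair can serve all three purposes.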
One RL environment startup employee said, “RL is the main use. We have some requests for creating [environments] for benchmarking. I’d say perhaps 10–20x more the former vs the latter.”
Early RL environments focused on mathematics and coding tasks with verifiable answers.
While coding remains a major source of demand, interviewees said growth is increasingly coming from enterprise workflows—tasks that mirror real business processes and can be scored reliably.
The report also notes growing interest in longer-horizon tasks, where models must complete multi-step objectives across multiple tools or interfaces rather than single-turn problems.
Across sectors, interviewees emphasised that progress is constrained less by compute and more by the availability of high-quality, robust environments that resist reward hacking and provide meaningful learning signals.
The post Complex Reinforcement Learning Tasks Can Cost Up to $20,000 Each: EpochAI Report appeared first on Analytics India Magazine.