Agentic Sandbox

You are building an AI coding agent that calls third-party APIs. Running the agent against the real service during development is expensive, rate-limited, and produces unpredictable results.

Problem

Third-party API calls from an agent cost money, consume quota, and can fail for reasons unrelated to the agent’s logic. Iterating on agent behavior requires reproducible responses, and a live service cannot guarantee them.

Solution

Point the agent at a Counterfact mock instead of the real service. Control exactly what the mock returns so you can test every response scenario — including failures — cheaply and repeatably. Use the REPL to change mock behavior while the agent is running, without restarting anything.

Example

Generate the mock from the target API’s OpenAPI spec:

npx counterfact@latest https://raw.githubusercontent.com/stripe/openapi/master/openapi/spec3.yaml stripe-mock

Configure the agent to point at the mock instead of the real Stripe API:

const stripe = new Stripe("sk_test_fake", { host: "localhost", port: 3100, protocol: "http" });
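
If the same agent code must sometimes reach the real API, the connection options can be chosen in one place. The following is a minimal sketch, not part of Counterfact or the Stripe library: `stripeEndpoint`, `StripeEndpoint`, and the `STRIPE_TARGET` variable are hypothetical names, and port 3100 is taken from the example above. It defaults to the mock so that a misconfigured run does not touch the live service.

```typescript
// Hypothetical helper: picks connection options for the Stripe client.
// The names and the "real" switch value are illustrative assumptions.
interface StripeEndpoint {
  host: string;
  port: number;
  protocol: "http" | "https";
}

function stripeEndpoint(target: string | undefined): StripeEndpoint {
  if (target === "real") {
    // Only an explicit opt-in reaches the live service.
    return { host: "api.stripe.com", port: 443, protocol: "https" };
  }
  // Anything else (including an unset variable) points at the local mock.
  return { host: "localhost", port: 3100, protocol: "http" };
}
```

The result can be passed as the second constructor argument, for example `new Stripe("sk_test_fake", stripeEndpoint(process.env.STRIPE_TARGET))`.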

Customize the handler to return exactly what your agent needs to see:

// stripe-mock/routes/v1/charges.ts
export const POST: HTTP_POST = ($) => {
  return $.response[200].json({
    id: "ch_mock_001",
    status: "succeeded",
    amount: $.body.amount,
    currency: $.body.currency,
  });
};

To test how the agent handles a rate limit, add a flag to the shared context, have the handler check it, and flip the flag from the REPL while the agent runs:

// stripe-mock/routes/_.context.ts
export class Context {
  simulateRateLimit = false;
}

// stripe-mock/routes/v1/charges.ts
export const POST: HTTP_POST = ($) => {
  // When the flag is set, every charge request is rejected with a 429.
  if ($.context.simulateRateLimit) {
    return $.response[429].json({ error: { message: "Too many requests" } });
  }
  // Otherwise return the same successful charge as before.
  return $.response[200].json({
    id: "ch_mock_001",
    status: "succeeded",
    amount: $.body.amount,
    currency: $.body.currency,
  });
};

⬣> context.simulateRateLimit = true

The agent’s next POST to /v1/charges receives the 429, so its retry logic runs against a real HTTP response rather than a stubbed function call.
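
For illustration, the kind of retry logic this exercises might look like the sketch below. It is an assumption about how an agent could be written, not code from Counterfact or the Stripe client: `withRateLimitRetry`, `RetryOptions`, and `StatusResponse` are hypothetical names, and the exponential backoff schedule is one common choice among several.

```typescript
// Hypothetical retry helper: resends a request while the server answers 429.
interface StatusResponse {
  status: number;
}

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // wait before the first retry; doubles on each later retry
  sleep?: (ms: number) => Promise<void>; // injectable so tests can skip real waiting
}

const realSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRateLimitRetry<T extends StatusResponse>(
  send: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  const sleep = options.sleep ?? realSleep;
  let response = await send();
  for (
    let attempt = 1;
    attempt < options.maxAttempts && response.status === 429;
    attempt++
  ) {
    // Exponential backoff: baseDelayMs, then 2x, then 4x, and so on.
    await sleep(options.baseDelayMs * 2 ** (attempt - 1));
    response = await send();
  }
  // Either the first non-429 response or the last 429 once attempts run out.
  return response;
}
```

Wrap whatever call the agent makes to the mock in `send`. While `simulateRateLimit` is `true`, the helper returns the last 429 after `maxAttempts` tries. If the flag is set back to `false` from the REPL between attempts, the next attempt succeeds.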

Consequences