How an AI Agent Chooses What to Do Under Tokens, Latency, and Tool-Call Budget Constraints?

**Why a Cost-Aware Planning Agent Matters**

As AI agents become more prevalent in our daily lives, it’s crucial to develop systems that make decisions based not just on accuracy, but also on real-world constraints. In this tutorial, I’ll show you how to build a cost-aware planning agent that balances output quality against token usage, latency, and tool-call budgets. This matters because AI agents are no longer just assistants: they are decision-makers that must explicitly weigh trade-offs, efficiency, and resource limits.

**Budgeting Abstractions: The Foundation of Cost-Aware Planning**

To start, we define the core budgeting abstractions that allow our agent to reason about costs. We model token usage, latency, and tool calls as first-class quantities and provide utility methods to accumulate and validate spend. This setup gives us a clear basis for enforcing constraints during planning and execution.

**A Look Under the Hood: Budget and Spend Classes**

Here’s an example of how we define our Budget and Spend classes:
```python
from dataclasses import dataclass, fields
from typing import List, Dict, Optional, Tuple, Any

@dataclass
class Budget:
    max_tokens: int
    max_latency_ms: int
    max_tool_calls: int

@dataclass
class Spend:
    tokens: int = 0
    latency_ms: int = 0
    tool_calls: int = 0

    def inside(self, b: Budget) -> bool:
        # True only when every dimension is within its limit.
        return (self.tokens <= b.max_tokens and
                self.latency_ms <= b.max_latency_ms and
                self.tool_calls <= b.max_tool_calls)

    # Returns a new Spend with both spends summed per dimension.
    # The method name `add` is an assumption; only the return type survives in the source.
    def add(self, other: "Spend") -> "Spend":
        return Spend(
            tokens=self.tokens + other.tokens,
            latency_ms=self.latency_ms + other.latency_ms,
            tool_calls=self.tool_calls + other.tool_calls
        )
```

**Executing Actions and Generating Text: The LLM Wrapper**

Next, we introduce the data structures that represent individual action choices and full plan candidates. We also define a lightweight LLM wrapper that standardizes how text is generated and how its cost is measured.
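As a rough illustration of what these structures might look like, here is a minimal sketch. The names `StepOption`, `PlanCandidate`, and `MeasuredLLM` are hypothetical and do not come from the text above, the token count is a crude whitespace estimate rather than a real tokenizer, and counting each model call as one tool call is an assumption. The wrapper takes any text-generating callable so that it runs without a model provider.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Spend:
    tokens: int = 0
    latency_ms: int = 0
    tool_calls: int = 0

@dataclass
class StepOption:
    """One way to carry out a step, with its estimated cost and value (hypothetical)."""
    name: str
    executor: str        # assumed convention: "local" or "llm"
    est_spend: Spend
    est_value: float

@dataclass
class PlanCandidate:
    """A full plan: an ordered list of steps plus aggregate estimates (hypothetical)."""
    steps: List[StepOption]
    est_spend: Spend
    est_value: float

class MeasuredLLM:
    """Wraps any text-generating callable and reports what the call cost."""

    def __init__(self, generate_fn: Callable[[str], str]) -> None:
        self.generate_fn = generate_fn

    def generate(self, prompt: str) -> Tuple[str, Spend]:
        start = time.perf_counter()
        text = self.generate_fn(prompt)
        latency_ms = int((time.perf_counter() - start) * 1000)
        # Assumption: whitespace-separated words stand in for tokens.
        tokens = len(prompt.split()) + len(text.split())
        # Assumption: one model call counts as one tool call.
        return text, Spend(tokens=tokens, latency_ms=latency_ms, tool_calls=1)

# Usage with a stub generator in place of a real model.
llm = MeasuredLLM(lambda prompt: "Draft: " + prompt)
text, spend = llm.generate("summarize the report")
step = StepOption("draft", "llm", est_spend=spend, est_value=6.0)
plan = PlanCandidate(steps=[step], est_spend=spend, est_value=6.0)
```

Returning the measured `Spend` alongside the text keeps cost accounting in one place, so callers never have to time or count a call themselves.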

**Planning and Execution: Finding the Highest-Value Combination of Steps**

With the building blocks in place, we implement the budget-constrained planning logic that searches for the highest-value combination of steps under strict limits.
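One simple way to picture such a search is brute force over subsets of candidate steps. The sketch below is not necessarily the search strategy the tutorial uses: exhaustive enumeration is swapped in here because it is easy to verify, and it is only practical for a handful of options. The `Option` class, the `fits` and `best_plan` helpers, and all numbers are illustrative assumptions.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Sequence, Tuple

@dataclass(frozen=True)
class Budget:
    max_tokens: int
    max_latency_ms: int
    max_tool_calls: int

@dataclass(frozen=True)
class Option:
    """A candidate step with estimated cost and value (hypothetical)."""
    name: str
    tokens: int
    latency_ms: int
    tool_calls: int
    value: float

def fits(steps: Sequence[Option], budget: Budget) -> bool:
    """Check that the summed estimates stay inside every limit."""
    return (sum(s.tokens for s in steps) <= budget.max_tokens and
            sum(s.latency_ms for s in steps) <= budget.max_latency_ms and
            sum(s.tool_calls for s in steps) <= budget.max_tool_calls)

def best_plan(options: Sequence[Option], budget: Budget) -> Tuple[List[Option], float]:
    """Try every non-empty subset and keep the feasible one with the highest value."""
    best: Tuple[Option, ...] = ()
    best_value = 0.0
    for size in range(1, len(options) + 1):
        for combo in combinations(options, size):
            if not fits(combo, budget):
                continue
            value = sum(s.value for s in combo)
            if value > best_value:
                best, best_value = combo, value
    return list(best), best_value

# Illustrative numbers only.
options = [
    Option("outline", 200, 300, 0, 3.0),
    Option("research", 600, 900, 1, 5.0),
    Option("draft", 900, 1200, 1, 6.0),
    Option("polish", 400, 500, 1, 1.5),
]
budget = Budget(max_tokens=1500, max_latency_ms=2500, max_tool_calls=2)
chosen, total_value = best_plan(options, budget)
print([s.name for s in chosen], total_value)  # ['research', 'draft'] 11.0
```

Adding "outline" to the chosen pair would raise the value further, but it pushes the token total to 1700 and breaks the 1500-token limit, which is exactly the kind of trade-off the planner has to resolve.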

**Executing the Plan: Tracking Actual Resource Usage**

Finally, we execute the chosen plan and track actual resource usage step by step. We dynamically select between local and LLM execution paths and combine the final outputs into a coherent draft.
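The execution loop can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the `run_local`, `run_llm`, and `execute` helpers are hypothetical, a plan is reduced to `(step name, path)` pairs, tokens are approximated by whitespace-separated words, and a stub callable stands in for the model.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Spend:
    tokens: int = 0
    latency_ms: int = 0
    tool_calls: int = 0

    def add(self, other: "Spend") -> "Spend":
        return Spend(self.tokens + other.tokens,
                     self.latency_ms + other.latency_ms,
                     self.tool_calls + other.tool_calls)

def run_local(step: str) -> Tuple[str, Spend]:
    """Local path: cheap deterministic logic, no model call."""
    start = time.perf_counter()
    text = f"[{step}] handled locally"
    latency_ms = int((time.perf_counter() - start) * 1000)
    return text, Spend(tokens=0, latency_ms=latency_ms, tool_calls=0)

def run_llm(step: str, llm: Callable[[str], str]) -> Tuple[str, Spend]:
    """LLM path: call the model and measure what the call cost."""
    start = time.perf_counter()
    text = llm(step)
    latency_ms = int((time.perf_counter() - start) * 1000)
    tokens = len(step.split()) + len(text.split())  # rough word-count estimate
    return text, Spend(tokens=tokens, latency_ms=latency_ms, tool_calls=1)

def execute(plan: List[Tuple[str, str]],
            llm: Callable[[str], str]) -> Tuple[str, Spend, List[Tuple[str, Spend]]]:
    """Run each step on its path, accumulate actual spend, and join the outputs."""
    total = Spend()
    log: List[Tuple[str, Spend]] = []
    parts: List[str] = []
    for name, path in plan:
        text, spent = run_llm(name, llm) if path == "llm" else run_local(name)
        total = total.add(spent)
        log.append((name, spent))   # per-step record of actual usage
        parts.append(text)
    return "\n".join(parts), total, log

# Usage with a stub model.
def stub_llm(prompt: str) -> str:
    return "generated text for " + prompt

draft, total, log = execute([("outline", "local"), ("draft", "llm")], stub_llm)
```

Keeping a per-step log next to the running total makes it easy to compare actual spend against the planner's estimates after the run.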

**Conclusion**

By developing a cost-aware planning agent, we can create more scalable and controllable AI systems that are sensitive to real-world constraints. This is a crucial step towards building AI systems that can effectively collaborate with humans and make decisions that balance multiple objectives.
