    Articles Stock

    How an AI Agent Chooses What to Do Under Token, Latency, and Tool-Call Budget Constraints?

    By Naveed Ahmad | 24/01/2026 | Updated: 30/01/2026 | 3 Mins Read

    **Why a Cost-Aware Planning Agent Matters**

    As AI agents take on more real-world work, they need to make decisions based not only on accuracy but also on practical constraints. In this tutorial, I’ll show you how to build a cost-aware planning agent that balances output quality against token usage, latency, and tool-call budgets. This matters because agents are no longer just assistants; they are decision-makers that must weigh trade-offs, efficiency, and resource limits explicitly.

    **Budgeting Abstractions: The Foundation of Cost-Aware Planning**

    To start, we define the core budgeting abstractions that let our agent reason about costs. We model token usage, latency, and tool calls as first-class quantities and provide utility methods to accumulate and validate spend. This gives us a clear basis for enforcing constraints during planning and execution.

    **A Look Under the Hood: Budget and Spend Classes**

    Here’s an example of how we define our Budget and Spend classes:
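    ```python
    from dataclasses import dataclass, fields
    from typing import List, Dict, Optional, Tuple, Any

    @dataclass
    class Budget:
        # Hard limits the agent must stay within.
        max_tokens: int
        max_latency_ms: int
        max_tool_calls: int

    @dataclass
    class Spend:
        # Resources consumed so far (or estimated for a step).
        tokens: int = 0
        latency_ms: int = 0
        tool_calls: int = 0

        def inside(self, b: Budget) -> bool:
            # True only if every dimension is within its limit.
            return (self.tokens <= b.max_tokens and
                    self.latency_ms <= b.max_latency_ms and
                    self.tool_calls <= b.max_tool_calls)

        # The method name was lost in the source listing; `add` is assumed here.
        def add(self, other: "Spend") -> "Spend":
            # Combine two spends field by field into a new Spend.
            return Spend(
                tokens=self.tokens + other.tokens,
                latency_ms=self.latency_ms + other.latency_ms,
                tool_calls=self.tool_calls + other.tool_calls
            )
    ```
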
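
To see how these abstractions behave, here is a small usage sketch. It repeats the two classes so that it runs on its own, and it assumes the combining method is called `add`. The budget and the per-step numbers are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Budget:
    max_tokens: int
    max_latency_ms: int
    max_tool_calls: int

@dataclass
class Spend:
    tokens: int = 0
    latency_ms: int = 0
    tool_calls: int = 0

    def inside(self, b: Budget) -> bool:
        # Every dimension must be within its limit.
        return (self.tokens <= b.max_tokens
                and self.latency_ms <= b.max_latency_ms
                and self.tool_calls <= b.max_tool_calls)

    def add(self, other: "Spend") -> "Spend":
        # `add` is an assumed name; it returns a new Spend and mutates nothing.
        return Spend(
            tokens=self.tokens + other.tokens,
            latency_ms=self.latency_ms + other.latency_ms,
            tool_calls=self.tool_calls + other.tool_calls,
        )

# Illustrative numbers only.
budget = Budget(max_tokens=1000, max_latency_ms=2000, max_tool_calls=2)
step_a = Spend(tokens=400, latency_ms=800, tool_calls=1)
step_b = Spend(tokens=500, latency_ms=900, tool_calls=1)
step_c = Spend(tokens=200, latency_ms=100, tool_calls=0)

running = step_a.add(step_b)       # 900 tokens, 1700 ms, 2 tool calls
print(running.inside(budget))      # True

over = running.add(step_c)         # 1100 tokens exceeds the 1000-token limit
print(over.inside(budget))         # False
```

Because `add` returns a new `Spend`, a planner can test a hypothetical next step against the budget without disturbing the running total.
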
    **Executing Actions and Generating Text: The LLM Wrapper**

    Next, we introduce the data structures that represent individual action choices and full plan candidates. We also define a lightweight LLM wrapper that standardizes how text is generated and how its cost is measured.
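
The article does not list these structures, so the following is only a minimal sketch of what they might look like. The names `StepOption`, `PlanCandidate`, and `MeasuredLLM` are hypothetical, the token count is a crude whitespace split rather than a real tokenizer, and the model call is a stub function passed in by the caller.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Spend:
    tokens: int = 0
    latency_ms: int = 0
    tool_calls: int = 0

@dataclass
class StepOption:
    # One way of performing a step, with its estimated cost and value.
    name: str
    executor: str          # "local" or "llm" (assumed labels)
    est_spend: Spend
    est_value: float

@dataclass
class PlanCandidate:
    # An ordered list of chosen step options.
    steps: List[StepOption] = field(default_factory=list)

    def total_value(self) -> float:
        return sum(s.est_value for s in self.steps)

class MeasuredLLM:
    # Wraps any text-generation callable and reports what the call cost.
    def __init__(self, generate_fn: Callable[[str], str]):
        self.generate_fn = generate_fn

    def generate(self, prompt: str) -> Tuple[str, Spend]:
        start = time.perf_counter()
        text = self.generate_fn(prompt)
        latency_ms = int((time.perf_counter() - start) * 1000)
        # Crude approximation: count whitespace-separated words as tokens.
        tokens = len(prompt.split()) + len(text.split())
        return text, Spend(tokens=tokens, latency_ms=latency_ms)

# Stub model so the sketch runs without any external service.
llm = MeasuredLLM(lambda prompt: "stub reply to: " + prompt)
text, spend = llm.generate("summarize the budget")
print(text)
print(spend.tokens, spend.tool_calls)

plan = PlanCandidate([
    StepOption("outline", "local", Spend(latency_ms=5), 2.0),
    StepOption("draft", "llm", Spend(tokens=700, latency_ms=900), 5.0),
])
print(plan.total_value())
```

Returning the measured `Spend` alongside the text keeps cost accounting in one place, so the rest of the agent never has to time or count anything itself.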

    **Planning and Execution: Finding the Highest-Value Combination of Steps**

    With the building blocks in place, we implement the budget-constrained planning logic, which searches for the highest-value combination of steps that stays within strict limits.
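
The search strategy is not spelled out here, so the sketch below swaps in the simplest possible one: an exhaustive search over every subset of candidate steps, which is only practical for a handful of options. The `Step` class, the `fits` and `best_plan` helpers, and all costs and values are assumptions for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Sequence, Tuple

@dataclass(frozen=True)
class Budget:
    max_tokens: int
    max_latency_ms: int
    max_tool_calls: int

@dataclass(frozen=True)
class Step:
    name: str
    tokens: int
    latency_ms: int
    tool_calls: int
    value: float

def fits(steps: Sequence[Step], budget: Budget) -> bool:
    # A plan fits only if its summed cost respects every limit.
    return (sum(s.tokens for s in steps) <= budget.max_tokens
            and sum(s.latency_ms for s in steps) <= budget.max_latency_ms
            and sum(s.tool_calls for s in steps) <= budget.max_tool_calls)

def best_plan(options: List[Step], budget: Budget) -> Tuple[Step, ...]:
    # Brute force: try every non-empty subset, keep the most valuable one that fits.
    best: Tuple[Step, ...] = ()
    best_value = 0.0
    for size in range(1, len(options) + 1):
        for combo in combinations(options, size):
            value = sum(s.value for s in combo)
            if value > best_value and fits(combo, budget):
                best, best_value = combo, value
    return best

# Illustrative options: (name, tokens, latency_ms, tool_calls, value).
options = [
    Step("outline", 200, 300, 0, 2.0),
    Step("web_search", 300, 1200, 1, 3.0),
    Step("llm_draft", 700, 900, 0, 5.0),
    Step("llm_polish", 400, 600, 0, 2.5),
]
budget = Budget(max_tokens=1200, max_latency_ms=2500, max_tool_calls=1)

plan = best_plan(options, budget)
print([s.name for s in plan])  # ['outline', 'web_search', 'llm_draft']
```

With these numbers, adding `llm_polish` to the winning plan would push the total to 1600 tokens, so the planner leaves it out even though it has positive value. For larger option sets, a beam search or dynamic program would be a more realistic choice than enumerating all subsets.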

    **Executing the Plan: Tracking Actual Resource Usage**

    Finally, we execute the chosen plan and track actual resource usage step by step. We dynamically choose between local and LLM execution paths and assemble the final output into a coherent draft.
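
As a rough sketch of that loop, the code below runs each step through either a local function or a stand-in for an LLM call, measures what each step actually cost, and joins the results into one draft. The `execute_plan`, `run_local`, and `run_llm` names are hypothetical, the LLM is a stub, and tokens are approximated by a whitespace split.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Spend:
    tokens: int = 0
    latency_ms: int = 0
    tool_calls: int = 0

    def add(self, other: "Spend") -> "Spend":
        return Spend(self.tokens + other.tokens,
                     self.latency_ms + other.latency_ms,
                     self.tool_calls + other.tool_calls)

def run_local(step: str) -> str:
    # Cheap deterministic path: no model call, so no tokens are charged.
    return f"[local] {step}"

def run_llm(step: str) -> str:
    # Stand-in for a real model call (assumption: replace with your client).
    return f"[llm] expanded {step}"

def execute_plan(
    plan: List[Tuple[str, str]],
    local_fn: Callable[[str], str] = run_local,
    llm_fn: Callable[[str], str] = run_llm,
) -> Tuple[str, Spend, List[Tuple[str, Spend]]]:
    total = Spend()
    sections: List[str] = []
    log: List[Tuple[str, Spend]] = []
    for name, mode in plan:
        start = time.perf_counter()
        if mode == "llm":
            text = llm_fn(name)
            tokens = len(text.split())   # crude token estimate
        else:
            text = local_fn(name)
            tokens = 0
        latency_ms = int((time.perf_counter() - start) * 1000)
        step_spend = Spend(tokens=tokens, latency_ms=latency_ms)
        total = total.add(step_spend)    # running total of actual usage
        log.append((name, step_spend))   # per-step record for inspection
        sections.append(text)
    return "\n".join(sections), total, log

draft, total, log = execute_plan([("outline", "local"), ("draft", "llm")])
print(draft)
print(total.tokens, len(log))
```

The per-step log makes it possible to compare estimated and actual spend afterwards. This sketch does not count tool calls; a fuller version would increment `tool_calls` wherever a step invokes an external tool.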

    **Conclusion**

    By building a cost-aware planning agent, we get AI systems that are more scalable, more controllable, and sensitive to real-world constraints. This is an important step toward agents that collaborate effectively with humans and make decisions that balance multiple objectives.
