How to Build a Cost-Aware LLM Routing System with NadirClaw Using Local Prompt Classification and Gemini Model Switching

By Naveed Ahmad · 10/05/2026
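The closing steps of the walkthrough script, reproduced below, exercise the router end to end: they replay a ten-prompt workload mixing trivial one-liners with deep engineering questions through the local NadirClaw proxy, record which Gemini model each prompt was routed to along with latency and token usage, price the run against an always-Pro baseline, print NadirClaw's built-in report, and finally shut the proxy down.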


# Continuation of the walkthrough script: proxy_alive(), PORT, and
# server_proc are defined in the earlier steps (not shown here).
import os
import signal
import subprocess
import time

import pandas as pd
from openai import OpenAI

if proxy_alive():
    print("\n[10] Mixed 10-prompt workload…")
    workload = [
        "Capital of France?",
        "Read foo.py",
        "Type hint for a list of dicts",
        "Lowercase: HELLO",
        "One-sentence summary of REST",
        "Refactor a callback chain into async/await with proper error handling",
        "Design a sharded multi-region key-value store with linearizable reads",
        "Analyze the asymptotic complexity of this code and prove the bound rigorously",
        "Debug why our gRPC stream stalls when the client TCP window saturates",
        "Compare and contrast B-trees and LSM-trees for write-heavy workloads",
    ]
    runs = []
    # Point the OpenAI client at the local NadirClaw proxy; model="auto"
    # lets the router's classifier pick the model for each prompt.
    client = OpenAI(base_url=f"http://localhost:{PORT}/v1", api_key="local")
    for p in workload:
        t0 = time.time()
        try:
            r = client.chat.completions.create(
                model="auto",
                messages=[{"role": "user", "content": p}],
                max_tokens=140,
            )
            usage = getattr(r, "usage", None)
            runs.append({
                "prompt": p[:55],
                "model": r.model,
                "latency_s": round(time.time() - t0, 2),
                "in_tok": getattr(usage, "prompt_tokens", 0) if usage else 0,
                "out_tok": getattr(usage, "completion_tokens", 0) if usage else 0,
            })
        except Exception as e:
            runs.append({"prompt": p[:55], "model": "ERROR",
                         "latency_s": None, "in_tok": 0, "out_tok": 0,
                         "error": str(e)[:80]})
    rdf = pd.DataFrame(runs)
    print(rdf.to_string(index=False))
    # Per-token prices (USD) for the two Gemini tiers the router chooses between.
    PRICE = {
        "flash": {"in": 0.30 / 1e6, "out": 2.50 / 1e6},
        "pro":   {"in": 1.25 / 1e6, "out": 10.0 / 1e6},
    }
    def price_for(model_str, in_t, out_t):
        # Infer the tier from the returned model name; anything that
        # isn't Flash is billed at the Pro rate.
        m = (model_str or "").lower()
        tier = "flash" if "flash" in m else "pro"
        return in_t * PRICE[tier]["in"] + out_t * PRICE[tier]["out"]
    cost_routed = sum(price_for(r["model"], r["in_tok"], r["out_tok"]) for r in runs)
    cost_no_route = sum(price_for("gemini-2.5-pro", r["in_tok"], r["out_tok"]) for r in runs)
    print(f"\n[10] Cost (NadirClaw routed)        : ${cost_routed:.6f}")
    print(f"     Cost (always-Pro baseline)     : ${cost_no_route:.6f}")
    if cost_no_route > 0:
        print(f"     Estimated savings on this run  : "
              f"{(1 - cost_routed/cost_no_route) * 100:.1f}%")
print("\n[11] `nadirclaw report` (parses the JSONL request log):")
rep = subprocess.run(["nadirclaw", "report"], capture_output=True, text=True, timeout=60)
print(rep.stdout or rep.stderr)
if proxy_alive():
    print("\n[12] Stopping the proxy…")
    try:
        # Signal the whole process group where possible so any worker
        # children exit along with the proxy.
        if hasattr(os, "killpg"):
            os.killpg(os.getpgid(server_proc.pid), signal.SIGTERM)
        else:
            server_proc.terminate()
        server_proc.wait(timeout=10)
    except Exception:
        try:
            server_proc.kill()
        except Exception:
            pass
    print("    ✓ proxy stopped.")
print("\nDone. 🎉")




Naveed Ahmad is a technology journalist and AI writer at ArticlesStock, covering artificial intelligence, machine learning, and emerging tech policy.
