
    How to Build Traceable and Evaluated LLM Workflows Using Promptflow, Prompty, and OpenAI

    By Naveed Ahmad, 29/04/2026 (updated 29/04/2026)


    # LLM-as-judge prompt: grades one answer against the expected fact and replies as JSON.
    (WORK_DIR / "judge.prompty").write_text("""---
    name: Judge
    model:
      api: chat
      configuration:
        type: openai
        connection: open_ai_connection
        model: gpt-4o-mini
      parameters:
        temperature: 0
        max_tokens: 150
        response_format: {type: json_object}
    inputs:
      question: {type: string}
      answer:   {type: string}
      expected: {type: string}
    ---
    system:
    You are a strict grader. Decide whether the assistant's answer contains the expected fact (case-insensitive, allowing reasonable phrasing/synonyms). Reply ONLY as JSON: {"score": 0 or 1, "reason": "..."}.


    user:
    Question: {{question}}
    Expected: {{expected}}
    Answer:   {{answer}}
    """)


    # Class-based flex flow: __call__ scores one row, __aggregate__ summarizes the whole run.
    (WORK_DIR / "eval_flow.py").write_text(textwrap.dedent('''
        import json
        from pathlib import Path
        from promptflow.tracing import trace
        from promptflow.core import Prompty


        BASE = Path(__file__).parent


        class Evaluator:
            def __init__(self):
                self.judge = Prompty.load(source=BASE / "judge.prompty")


            @trace
            def __call__(self, question: str, answer: str, expected: str) -> dict:
                raw = self.judge(question=question, answer=answer, expected=expected)
                # The judge may return a JSON string instead of a parsed dict.
                if isinstance(raw, str):
                    try:
                        raw = json.loads(raw)
                    except Exception:
                        raw = {"score": 0, "reason": f"unparseable:{raw[:80]}"}
                return {"score": int(raw.get("score", 0)), "reason": str(raw.get("reason", ""))}


            def __aggregate__(self, line_results):
                """Run-level aggregation. Whatever this returns shows up in pf.get_metrics()."""
                scores = [r["score"] for r in line_results if r]
                return {
                    "accuracy": (sum(scores) / len(scores)) if scores else 0.0,
                    "passed":   sum(scores),
                    "total":    len(scores),
                }
    '''))


    # Flow definition that points promptflow at the Evaluator class.
    (WORK_DIR / "eval.flex.yaml").write_text(
        "$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json\n"
        "entry: eval_flow:Evaluator\n"
    )


    print("\n=== Evaluation run ===")
    # Evaluate the outputs of base_run against the expected values in the dataset.
    eval_run = pf.run(
        flow=str(WORK_DIR / "eval.flex.yaml"),
        data=str(data_path),
        run=base_run,
        column_mapping={
            "question": "${data.question}",
            "expected": "${data.expected}",
            "answer":   "${run.outputs.answer}",
        },
        stream=True,
    )
    
    
    eval_details = pf.get_details(eval_run)
    print(eval_details)
    
    
    print("\n=== Aggregated metrics (from __aggregate__) ===")
    print(json.dumps(pf.get_metrics(eval_run), indent=2))
    
    
    import pandas as pd
    # Cross-check the aggregated accuracy by recomputing it from the per-row scores.
    if "outputs.score" in eval_details.columns:
        s = pd.to_numeric(eval_details["outputs.score"], errors="coerce").fillna(0)
        print(f"Manual accuracy: {s.mean():.2%}  ({int(s.sum())}/{len(s)})")
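
The two pure-Python pieces of the evaluator, parsing the judge's reply and aggregating per-row scores, can be checked offline without calling OpenAI or promptflow. The following is a minimal sketch, assuming the judge returns either a dict or a JSON string with `score` and `reason` keys. The helper names `normalize_judge_output` and `aggregate` are hypothetical and are not part of promptflow; the sample rows are made up for illustration.

```python
import json


def normalize_judge_output(raw):
    """Coerce a judge reply (dict or JSON string) into {"score": int, "reason": str}."""
    if isinstance(raw, str):
        try:
            raw = json.loads(raw)
        except json.JSONDecodeError:
            # Unparseable replies count as failures; keep a short excerpt for debugging.
            return {"score": 0, "reason": f"unparseable:{raw[:80]}"}
    if not isinstance(raw, dict):
        return {"score": 0, "reason": "unexpected reply type"}
    try:
        score = int(raw.get("score", 0))
    except (TypeError, ValueError):
        score = 0
    return {"score": score, "reason": str(raw.get("reason", ""))}


def aggregate(line_results):
    """Same arithmetic as the run-level aggregation: skip empty rows, then average."""
    scores = [r["score"] for r in line_results if r]
    return {
        "accuracy": (sum(scores) / len(scores)) if scores else 0.0,
        "passed": sum(scores),
        "total": len(scores),
    }


# Hypothetical sample replies, not real model output.
rows = [
    normalize_judge_output('{"score": 1, "reason": "matches"}'),
    normalize_judge_output({"score": 0, "reason": "wrong city"}),
    normalize_judge_output("not json"),
    None,  # stands in for a failed row, which aggregation skips
]
metrics = aggregate(rows)
print(metrics)
```

Unlike the evaluator above, this sketch also guards against non-dict replies and non-numeric scores. That is a design choice for illustration, not something the original flow does.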




    Naveed Ahmad

    Naveed Ahmad is a technology journalist and AI writer at ArticlesStock, covering artificial intelligence, machine learning, and emerging tech policy. Read his latest articles.
