    How to Design a Swiss Army Knife Research Agent with Tool-Using AI, Web Search, PDF Analysis, Vision, and Automated Reporting

    By Naveed Ahmad · 21/02/2026 · 7 Mins Read


    In this tutorial, we build a "Swiss Army Knife" research agent that goes far beyond simple chat interactions and actively solves multi-step research problems end-to-end. We combine a tool-using agent architecture with live web search, local PDF ingestion, vision-based chart analysis, and automated report generation to show how modern agents can reason, verify, and produce structured outputs. By wiring together small agents, OpenAI models, and practical data-extraction utilities, we show how a single agent can find sources, cross-check claims, and synthesize findings into professional-grade Markdown and DOCX reports.

    %pip -q install -U smolagents openai trafilatura duckduckgo-search pypdf pymupdf python-docx pillow tqdm
    
    
    import os, re, json, getpass
    from typing import List, Dict, Any
    import requests
    import trafilatura
    from duckduckgo_search import DDGS
    from pypdf import PdfReader
    import fitz  # PyMuPDF
    from docx import Document
    from docx.shared import Pt
    from datetime import datetime


    from openai import OpenAI
    from smolagents import CodeAgent, OpenAIModel, tool


    if not os.environ.get("OPENAI_API_KEY"):
        os.environ["OPENAI_API_KEY"] = getpass.getpass("Paste your OpenAI API key (hidden): ").strip()
    print("OPENAI_API_KEY set:", "YES" if os.environ.get("OPENAI_API_KEY") else "NO")


    if not os.environ.get("SERPER_API_KEY"):
        serper = getpass.getpass("Optional: Paste SERPER_API_KEY for Google results (press Enter to skip): ").strip()
        if serper:
            os.environ["SERPER_API_KEY"] = serper
    print("SERPER_API_KEY set:", "YES" if os.environ.get("SERPER_API_KEY") else "NO")


    client = OpenAI()


    def _now():
        return datetime.utcnow().strftime("%Y-%m-%d %H:%M:%SZ")


    def _safe_filename(s: str) -> str:
        s = re.sub(r"[^a-zA-Z0-9._-]+", "_", s).strip("_")
        return s[:180] if s else "file"

    We set up the full execution environment and securely load all required credentials without hardcoding secrets. We import all dependencies required for web search, document parsing, vision analysis, and agent orchestration. We also initialize shared utilities to standardize timestamps and file naming throughout the workflow.
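    To make the sanitization behavior concrete, here is a minimal, stdlib-only sketch of the filename-cleaning logic defined above (the function name `safe_filename` is ours, chosen to avoid shadowing the tutorial's `_safe_filename`):

```python
import re

def safe_filename(s: str) -> str:
    # Collapse each run of characters outside [a-zA-Z0-9._-] into a single "_",
    # strip leading/trailing underscores, and cap the length at 180 characters.
    s = re.sub(r"[^a-zA-Z0-9._-]+", "_", s).strip("_")
    return s[:180] if s else "file"

print(safe_filename("My Report: v2 (final).pdf"))  # → My_Report_v2_final_.pdf
```

    Note that interior underscores produced by the substitution are kept; only leading and trailing ones are stripped, so an empty or all-symbol input falls back to `"file"`.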

    try:
        from google.colab import files
        os.makedirs("/content/pdfs", exist_ok=True)
        uploaded = files.upload()
        for name, data in uploaded.items():
            if name.lower().endswith(".pdf"):
                with open(f"/content/pdfs/{name}", "wb") as f:
                    f.write(data)
        print("PDFs in /content/pdfs:", os.listdir("/content/pdfs"))
    except Exception as e:
        print("Upload skipped:", str(e))
    
    
    def web_search(query: str, k: int = 6) -> List[Dict[str, str]]:
        serper_key = os.environ.get("SERPER_API_KEY", "").strip()
        if serper_key:
            resp = requests.post(
                "https://google.serper.dev/search",
                headers={"X-API-KEY": serper_key, "Content-Type": "application/json"},
                json={"q": query, "num": k},
                timeout=30,
            )
            resp.raise_for_status()
            data = resp.json()
            out = []
            for item in (data.get("organic") or [])[:k]:
                out.append({
                    "title": item.get("title", ""),
                    "url": item.get("link", ""),
                    "snippet": item.get("snippet", ""),
                })
            return out

        # Fallback: free DuckDuckGo search when no Serper key is available
        out = []
        with DDGS() as ddgs:
            for r in ddgs.text(query, max_results=k):
                out.append({
                    "title": r.get("title", ""),
                    "url": r.get("href", ""),
                    "snippet": r.get("body", ""),
                })
        return out


    def fetch_url_text(url: str) -> Dict[str, Any]:
        try:
            downloaded = trafilatura.fetch_url(url)
            if not downloaded:
                return {"url": url, "ok": False, "error": "fetch_failed", "text": ""}
            text = trafilatura.extract(downloaded, include_comments=False, include_tables=True)
            if not text:
                return {"url": url, "ok": False, "error": "extract_failed", "text": ""}
            title_guess = next((ln.strip() for ln in text.splitlines() if ln.strip()), "")[:120]
            return {"url": url, "ok": True, "title_guess": title_guess, "text": text}
        except Exception as e:
            return {"url": url, "ok": False, "error": str(e), "text": ""}

    We enable local PDF ingestion and establish a flexible web search pipeline that works with or without a paid search API. We show how we gracefully handle optional inputs while maintaining a reliable research flow. We also implement robust URL fetching and text extraction to prepare clean source material for downstream reasoning.
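    The key design choice in the search layer is normalizing both providers into one record shape, so downstream tools never care which backend answered. Here is an offline sketch of that normalization step with fake Serper-style data (Serper returns hits under `"organic"` with a `"link"` key; the `normalize_serper` helper name is ours):

```python
from typing import Any, Dict, List

def normalize_serper(data: Dict[str, Any], k: int = 6) -> List[Dict[str, str]]:
    # Map Serper's "organic" entries onto the uniform
    # {title, url, snippet} records the agent tools consume.
    out = []
    for item in (data.get("organic") or [])[:k]:
        out.append({
            "title": item.get("title", ""),
            "url": item.get("link", ""),
            "snippet": item.get("snippet", ""),
        })
    return out

fake = {"organic": [{"title": "Smolagents docs",
                     "link": "https://example.com",
                     "snippet": "Build agents..."}]}
print(normalize_serper(fake))
```

    Because the DuckDuckGo fallback is normalized to the same three keys (`href` → `url`, `body` → `snippet`), swapping providers requires no changes anywhere else in the pipeline.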

    def read_pdf_text(pdf_path: str, max_pages: int = 30) -> Dict[str, Any]:
        reader = PdfReader(pdf_path)
        pages = min(len(reader.pages), max_pages)
        chunks = []
        for i in range(pages):
            try:
                chunks.append(reader.pages[i].extract_text() or "")
            except Exception:
                chunks.append("")
        return {"pdf_path": pdf_path, "pages_read": pages, "text": "\n\n".join(chunks).strip()}


    def extract_pdf_images(pdf_path: str, out_dir: str = "/content/extracted_images", max_pages: int = 10) -> List[str]:
        os.makedirs(out_dir, exist_ok=True)
        doc = fitz.open(pdf_path)
        saved = []
        pages = min(len(doc), max_pages)
        base = _safe_filename(os.path.basename(pdf_path).rsplit(".", 1)[0])

        for p in range(pages):
            page = doc[p]
            img_list = page.get_images(full=True)
            for img_i, img in enumerate(img_list):
                xref = img[0]
                pix = fitz.Pixmap(doc, xref)
                if pix.n - pix.alpha >= 4:  # CMYK or similar: convert to RGB before saving as PNG
                    pix = fitz.Pixmap(fitz.csRGB, pix)
                img_path = os.path.join(out_dir, f"{base}_p{p+1}_img{img_i+1}.png")
                pix.save(img_path)
                saved.append(img_path)

        doc.close()
        return saved
    
    
    import base64

    def vision_analyze_image(image_path: str, question: str, model: str = "gpt-4.1-mini") -> Dict[str, Any]:
        # The Responses API expects images as base64 data URIs under "image_url"
        with open(image_path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode("utf-8")

        resp = client.responses.create(
            model=model,
            input=[{
                "role": "user",
                "content": [
                    {"type": "input_text", "text": f"Answer concisely and accurately.\n\nQuestion: {question}"},
                    {"type": "input_image", "image_url": f"data:image/png;base64,{b64}"},
                ],
            }],
        )
        return {"image_path": image_path, "answer": resp.output_text}

    We focus on deep document understanding by extracting structured text and visual artifacts from PDFs. We integrate a vision-capable model to interpret charts and figures instead of treating them as opaque images. We ensure that numerical trends and visual insights can be converted into explicit, text-based evidence.
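    The only non-obvious plumbing in the vision step is packaging raw image bytes into the base64 data-URI string that the Responses API accepts under `"image_url"`. Here is a small, stdlib-only sketch of that encoding (the `to_data_uri` helper name is ours):

```python
import base64

def to_data_uri(img_bytes: bytes, mime: str = "image/png") -> str:
    # Build the "data:<mime>;base64,<payload>" string the API expects
    # for inline image input, from raw bytes read off disk.
    return f"data:{mime};base64," + base64.b64encode(img_bytes).decode("utf-8")

print(to_data_uri(b"abc"))  # → data:image/png;base64,YWJj
```

    Inlining the image this way avoids hosting extracted chart PNGs anywhere public just to ask the model about them.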

    def write_markdown(path: str, content: str) -> str:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write(content)
        return path


    def write_docx_from_markdown(docx_path: str, md: str, title: str = "Research Report") -> str:
        os.makedirs(os.path.dirname(docx_path), exist_ok=True)
        doc = Document()
        t = doc.add_paragraph()
        run = t.add_run(title)
        run.bold = True
        run.font.size = Pt(18)
        meta = doc.add_paragraph()
        meta.add_run(f"Generated: {_now()}").italic = True
        doc.add_paragraph("")
        for line in md.splitlines():
            line = line.rstrip()
            if not line:
                doc.add_paragraph("")
                continue
            if line.startswith("# "):
                doc.add_heading(line[2:].strip(), level=1)
            elif line.startswith("## "):
                doc.add_heading(line[3:].strip(), level=2)
            elif line.startswith("### "):
                doc.add_heading(line[4:].strip(), level=3)
            elif re.match(r"^\s*[-*]\s+", line):
                p = doc.add_paragraph(style="List Bullet")
                p.add_run(re.sub(r"^\s*[-*]\s+", "", line).strip())
            else:
                doc.add_paragraph(line)
        doc.save(docx_path)
        return docx_path
    
    
    @tool
    def t_web_search(query: str, k: int = 6) -> str:
        """Search the web and return results as a JSON string.

        Args:
            query: The search query.
            k: Maximum number of results to return.
        """
        return json.dumps(web_search(query, k), ensure_ascii=False)


    @tool
    def t_fetch_url_text(url: str) -> str:
        """Fetch a URL and extract its main text, returned as a JSON string.

        Args:
            url: The web page URL to fetch.
        """
        return json.dumps(fetch_url_text(url), ensure_ascii=False)


    @tool
    def t_list_pdfs() -> str:
        """List the paths of uploaded PDFs as a JSON string."""
        pdf_dir = "/content/pdfs"
        if not os.path.isdir(pdf_dir):
            return json.dumps([])
        paths = [os.path.join(pdf_dir, f) for f in os.listdir(pdf_dir) if f.lower().endswith(".pdf")]
        return json.dumps(sorted(paths), ensure_ascii=False)


    @tool
    def t_read_pdf_text(pdf_path: str, max_pages: int = 30) -> str:
        """Extract text from a PDF, returned as a JSON string.

        Args:
            pdf_path: Path to the PDF file.
            max_pages: Maximum number of pages to read.
        """
        return json.dumps(read_pdf_text(pdf_path, max_pages=max_pages), ensure_ascii=False)


    @tool
    def t_extract_pdf_images(pdf_path: str, max_pages: int = 10) -> str:
        """Extract embedded images from a PDF and return their paths as JSON.

        Args:
            pdf_path: Path to the PDF file.
            max_pages: Maximum number of pages to scan.
        """
        imgs = extract_pdf_images(pdf_path, max_pages=max_pages)
        return json.dumps(imgs, ensure_ascii=False)


    @tool
    def t_vision_analyze_image(image_path: str, question: str) -> str:
        """Analyze an image with a vision model and return the answer as JSON.

        Args:
            image_path: Path to the image file.
            question: The question to ask about the image.
        """
        return json.dumps(vision_analyze_image(image_path, question), ensure_ascii=False)


    @tool
    def t_write_markdown(path: str, content: str) -> str:
        """Write Markdown content to a file and return its path.

        Args:
            path: Destination file path.
            content: Markdown content to write.
        """
        return write_markdown(path, content)


    @tool
    def t_write_docx_from_markdown(docx_path: str, md_path: str, title: str = "Research Report") -> str:
        """Convert a Markdown file into a DOCX report and return its path.

        Args:
            docx_path: Destination DOCX path.
            md_path: Source Markdown file path.
            title: Report title.
        """
        with open(md_path, "r", encoding="utf-8") as f:
            md = f.read()
        return write_docx_from_markdown(docx_path, md, title=title)

    We implement the full output layer by generating Markdown reports and converting them into polished DOCX documents. We expose all core capabilities as explicit tools that the agent can reason about and invoke step by step. We ensure that every transformation from raw data to final report remains deterministic and inspectable.
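    The Markdown-to-DOCX converter is driven by a simple per-line classification: heading prefixes map to heading levels, bullet markers to list paragraphs, everything else to plain paragraphs. That classification can be isolated and tested on its own, as in this sketch (the `md_line_kind` helper name is ours):

```python
import re

def md_line_kind(line: str):
    # Classify a Markdown line the way the DOCX converter above does:
    # ("heading", level, text), ("bullet", text), or ("para", text).
    for level, prefix in ((1, "# "), (2, "## "), (3, "### ")):
        if line.startswith(prefix):
            return ("heading", level, line[len(prefix):].strip())
    m = re.match(r"^\s*[-*]\s+(.*)$", line)
    if m:
        return ("bullet", m.group(1).strip())
    return ("para", line)

print(md_line_kind("## Findings"))  # → ('heading', 2, 'Findings')
```

    Checking `"# "` before `"## "` is safe here because the prefix includes the trailing space, so a level-2 heading never matches the level-1 test.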

    model = OpenAIModel(model_id="gpt-5")


    agent = CodeAgent(
        tools=[
            t_web_search,
            t_fetch_url_text,
            t_list_pdfs,
            t_read_pdf_text,
            t_extract_pdf_images,
            t_vision_analyze_image,
            t_write_markdown,
            t_write_docx_from_markdown,
        ],
        model=model,
        add_base_tools=False,
        additional_authorized_imports=["json", "re", "os", "math", "datetime", "time", "textwrap"],
    )


    SYSTEM_INSTRUCTIONS = """
    You are a Swiss Army Knife Research Agent.
    """
    
    
    def run_research(topic: str):
        os.makedirs("/content/report", exist_ok=True)
        prompt = f"""{SYSTEM_INSTRUCTIONS.strip()}

    Research question:
    {topic}

    Steps:
    1) List available PDFs (if any) and decide which are relevant.
    2) Do a web search for the topic.
    3) Fetch and extract the text of the best sources.
    4) If PDFs exist, extract text and images.
    5) Visually analyze figures.
    6) Write a Markdown report and convert it to DOCX.
    """
        return agent.run(prompt)


    topic = "Build a research brief on the most reliable design patterns for tool-using agents (2024-2026), focusing on evaluation, citations, and failure modes."
    out = run_research(topic)
    print(out[:1500] if isinstance(out, str) else out)


    try:
        from google.colab import files
        files.download("/content/report/report.md")
        files.download("/content/report/report.docx")
    except Exception as e:
        print("Download skipped:", str(e))

    We assemble the complete research agent and define a structured execution plan for multi-step reasoning. We guide the agent to search, analyze, synthesize, and write using a single coherent prompt. We demonstrate how the agent produces a finished research artifact that can be reviewed, shared, and reused immediately.
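    The prompt layout in `run_research` (instructions, question, then an explicit numbered plan) generalizes to any step list. This sketch shows that assembly pattern in isolation (the `build_prompt` helper name is ours, not part of the tutorial):

```python
def build_prompt(system: str, topic: str, steps: list) -> str:
    # Mirror the run_research layout: system instructions first,
    # then the research question, then an explicit numbered plan.
    numbered = "\n".join(f"{i}) {s}" for i, s in enumerate(steps, 1))
    return f"{system.strip()}\n\nResearch question:\n{topic}\n\nSteps:\n{numbered}"

p = build_prompt("You are a research agent.",
                 "Agent design patterns",
                 ["List PDFs", "Web search", "Write report"])
print(p)
```

    Keeping the plan as an explicit numbered list (rather than prose) makes it easy for a code-executing agent to track which step it is on and for a reviewer to audit the trace afterward.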

    In conclusion, we demonstrated how a well-designed tool-using agent can function as a reliable research assistant rather than a conversational toy. We showcased how explicit tools, disciplined prompting, and step-by-step execution allow the agent to search the web, analyze documents and visuals, and generate traceable, citation-aware reports. This approach offers a practical blueprint for building trustworthy research agents that emphasize evaluation, evidence, and failure awareness, capabilities increasingly essential for real-world AI systems.

