
Build an Autonomous Wet-Lab Protocol Planner and Validator Using Salesforce CodeGen for Agentic Experiment Design and Safety Optimization

By Naveed Ahmad | 07/11/2025


In this tutorial, we build a Wet-Lab Protocol Planner & Validator that acts as an intelligent agent for experimental design and execution. We design the system in Python and integrate Salesforce's CodeGen-350M-mono model for natural language reasoning. We structure the pipeline into modular components: ProtocolParser for extracting structured information, such as steps, durations, and temperatures, from textual protocols; InventoryManager for validating reagent availability and expiry; SchedulePlanner for generating timelines and parallelization; and SafetyValidator for identifying biosafety or chemical hazards. The LLM is then used to generate optimization suggestions, effectively closing the loop between perception, planning, validation, and refinement.

import re, json, pandas as pd
from datetime import datetime, timedelta
from collections import defaultdict
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch


MODEL_NAME = "Salesforce/codegen-350M-mono"
print("Loading CodeGen model (30 seconds)...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)
print("✓ Model loaded!")

We begin by importing the essential libraries and loading the Salesforce CodeGen-350M-mono model locally for lightweight, API-free inference. We initialize both the tokenizer and model with float16 precision and automatic device mapping to ensure compatibility and speed on Colab GPUs.
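Before moving on, a quick sanity check (an optional addition, not part of the original notebook) confirms that the tokenizer and model are wired together correctly; the prompt string is arbitrary:

# Optional sanity check (assumption: not in the original tutorial) - run a tiny
# greedy generation to confirm the model loaded and device mapping works.
test_inputs = tokenizer("def add(a, b):", return_tensors="pt").to(model.device)
test_out = model.generate(**test_inputs, max_new_tokens=16, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(test_out[0], skip_special_tokens=True))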

class ProtocolParser:
    def read_protocol(self, text):
        steps = []
        lines = text.split('\n')
        for i, line in enumerate(lines, 1):
            step_match = re.search(r'^(\d+)\.\s+(.+)', line.strip())
            if step_match:
                num, name = step_match.groups()
                context = "\n".join(lines[i:min(i + 4, len(lines))])
                duration = self._extract_duration(context)
                temp = self._extract_temp(context)
                safety = self._check_safety(context)
                steps.append({
                    'step': int(num), 'name': name, 'duration_min': duration,
                    'temp': temp, 'safety': safety, 'line': i, 'details': context[:200]
                })
        return steps

    def _extract_duration(self, text):
        text = text.lower()
        if 'overnight' in text: return 720
        match = re.search(r'(\d+)\s*(?:hour|hr|h)(?:s)?(?!\w)', text)
        if match: return int(match.group(1)) * 60
        match = re.search(r'(\d+)\s*(?:min|minute)(?:s)?', text)
        if match: return int(match.group(1))
        match = re.search(r'(\d+)-(\d+)\s*(?:min|minute)', text)
        if match: return (int(match.group(1)) + int(match.group(2))) // 2
        return 30

    def _extract_temp(self, text):
        text = text.lower()
        if '4°c' in text or '4 °c' in text or '4°' in text: return '4C'
        if '37°c' in text or '37 °c' in text: return '37C'
        if '-20°c' in text or '-80°c' in text: return 'FREEZER'
        if 'room temp' in text or 'rt' in text or 'ambient' in text: return 'RT'
        return 'RT'

    def _check_safety(self, text):
        flags = []
        text_lower = text.lower()
        if re.search(r'bsl-[23]|biosafety', text_lower): flags.append('BSL-2/3')
        if re.search(r'caution|corrosive|hazard|toxic', text_lower): flags.append('HAZARD')
        if 'sharp' in text_lower or 'needle' in text_lower: flags.append('SHARPS')
        if 'dark' in text_lower or 'light-sensitive' in text_lower: flags.append('LIGHT-SENSITIVE')
        if 'flammable' in text_lower: flags.append('FLAMMABLE')
        return flags
    
    
class InventoryManager:
    def __init__(self, csv_text):
        from io import StringIO
        self.df = pd.read_csv(StringIO(csv_text))
        self.df['expiry'] = pd.to_datetime(self.df['expiry'])

    def check_availability(self, reagent_list):
        issues = []
        for reagent in reagent_list:
            reagent_clean = reagent.lower().replace('_', ' ').replace('-', ' ')
            matches = self.df[self.df['reagent'].str.lower().str.contains(
                '|'.join(reagent_clean.split()[:2]), na=False, regex=True
            )]
            if matches.empty:
                issues.append(f"❌ {reagent}: NOT IN INVENTORY")
            else:
                row = matches.iloc[0]
                if row['expiry'] < datetime.now():
                    issues.append(f"⚠️  {reagent}: EXPIRED on {row['expiry'].date()} (lot {row['lot']})")
                elif (row['expiry'] - datetime.now()).days < 30:
                    issues.append(f"⚠️  {reagent}: Expires soon ({row['expiry'].date()}, lot {row['lot']})")
                if row['quantity'] < 10:
                    issues.append(f"⚠️  {reagent}: LOW STOCK ({row['quantity']} {row['unit']} remaining)")
        return issues

    def extract_reagents(self, protocol_text):
        reagents = set()
        patterns = [
            r'\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\s+(?:antibody|buffer|solution)',
            r'\b([A-Z]{2,}(?:-[A-Z0-9]+)?)\b',
            r'(?:add|use|prepare|dilute)\s+([a-z-]+\s*(?:antibody|buffer|substrate|solution))',
        ]
        for pattern in patterns:
            matches = re.findall(pattern, protocol_text, re.IGNORECASE)
            reagents.update(m.strip() for m in matches if len(m) > 2)
        return list(reagents)[:15]

We define the ProtocolParser and InventoryManager classes to extract structured experimental details and verify reagent inventory. We parse each protocol step for duration, temperature, and safety markers, while the inventory manager validates stock levels, expiry dates, and reagent availability through fuzzy matching.
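To see the two classes in isolation, here is a minimal sketch (toy protocol and inventory strings of our own, not from the original tutorial) that parses a single step and checks one reagent:

# Minimal sketch (assumption: toy data, not from the original tutorial) showing
# how the parser and inventory manager are meant to be used together.
toy_protocol = "1. Blocking step\n  - Add blocking buffer\n  - Incubate 1 hour at room temperature"
toy_inventory = "reagent,quantity,unit,expiry,lot\nblocking buffer,5,mL,2024-01-01,BB001"
toy_steps = ProtocolParser().read_protocol(toy_protocol)
toy_inv = InventoryManager(toy_inventory)
print(toy_steps[0]['duration_min'], toy_steps[0]['temp'])   # expected: 60 RT
print(toy_inv.check_availability(['blocking buffer']))      # expired + low-stock warnings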

class SchedulePlanner:
    def make_schedule(self, steps, start_time="09:00"):
        schedule = []
        current = datetime.strptime(f"2025-01-01 {start_time}", "%Y-%m-%d %H:%M")
        day = 1
        for step in steps:
            end = current + timedelta(minutes=step['duration_min'])
            if step['duration_min'] > 480:
                day += 1
                current = datetime.strptime(f"2025-01-0{day} 09:00", "%Y-%m-%d %H:%M")
                end = current
            schedule.append({
                'step': step['step'], 'name': step['name'][:40],
                'start': current.strftime("%H:%M"), 'end': end.strftime("%H:%M"),
                'duration': step['duration_min'], 'temp': step['temp'],
                'day': day, 'can_parallelize': step['duration_min'] > 60,
                'safety': ', '.join(step['safety']) if step['safety'] else 'None'
            })
            if step['duration_min'] <= 480:
                current = end
        return schedule

    def optimize_parallelization(self, schedule):
        parallel_groups = []
        idle_time = 0
        for i, step in enumerate(schedule):
            if step['can_parallelize'] and i + 1 < len(schedule):
                next_step = schedule[i + 1]
                if step['temp'] == next_step['temp']:
                    saved = min(step['duration'], next_step['duration'])
                    parallel_groups.append(
                        f"✨ Steps {step['step']} & {next_step['step']} can overlap → Save {saved} min"
                    )
                    idle_time += saved
        return parallel_groups, idle_time
    
    
class SafetyValidator:
    RULES = {
        'ph_range': (5.0, 11.0),
        'temp_limits': {'4C': (2, 8), '37C': (35, 39), 'RT': (20, 25)},
        'max_concurrent_instruments': 3,
    }

    def validate(self, steps):
        risks = []
        for step in steps:
            ph_match = re.search(r'ph\s*(\d+\.?\d*)', step['details'].lower())
            if ph_match:
                ph = float(ph_match.group(1))
                if not (self.RULES['ph_range'][0] <= ph <= self.RULES['ph_range'][1]):
                    risks.append(f"⚠️  Step {step['step']}: pH {ph} OUT OF SAFE RANGE")
            if 'BSL-2/3' in step['safety']:
                risks.append(f"🛡️  Step {step['step']}: BSL-2 cabinet REQUIRED")
            if 'HAZARD' in step['safety']:
                risks.append(f"🧤 Step {step['step']}: Full PPE + chemical hood REQUIRED")
            if 'SHARPS' in step['safety']:
                risks.append(f"💉 Step {step['step']}: Sharps container + needle safety")
            if 'LIGHT-SENSITIVE' in step['safety']:
                risks.append(f"🌑 Step {step['step']}: Work in dark/amber tubes")
        return risks

We implement the SchedulePlanner and SafetyValidator to design efficient experiment timelines and enforce lab safety standards. We dynamically generate daily schedules, identify parallelizable steps, and validate potential risks, such as unsafe pH levels, hazardous chemicals, or biosafety-level requirements.
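As a quick illustration (hand-built step dictionaries, not part of the original run), the planner and validator can be exercised on their own before wiring them into the agent loop:

# Minimal sketch (assumption: hand-built steps, not from the original tutorial)
# exercising the planner and validator in isolation.
toy = [
    {'step': 1, 'name': 'Coating', 'duration_min': 720, 'temp': '4C',
     'safety': ['BSL-2/3'], 'line': 1, 'details': 'coating buffer (pH 9.6)'},
    {'step': 2, 'name': 'Blocking', 'duration_min': 60, 'temp': 'RT',
     'safety': [], 'line': 5, 'details': 'blocking buffer'},
]
print(SchedulePlanner().make_schedule(toy)[0])   # overnight step pushed to Day 2
print(SafetyValidator().validate(toy))           # BSL-2 cabinet requirement flagged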

def llm_call(prompt, max_tokens=200):
    try:
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512).to(model.device)
        outputs = model.generate(
            **inputs, max_new_tokens=max_tokens, do_sample=True,
            temperature=0.7, top_p=0.9, pad_token_id=tokenizer.eos_token_id
        )
        return tokenizer.decode(outputs[0], skip_special_tokens=True)[len(prompt):].strip()
    except Exception:
        return "Batch similar temperature steps together. Pre-warm instruments."


def agent_loop(protocol_text, inventory_csv, start_time="09:00"):
    print("\n🔬 AGENT STARTING PROTOCOL ANALYSIS...\n")
    parser = ProtocolParser()
    steps = parser.read_protocol(protocol_text)
    print(f"📄 Parsed {len(steps)} protocol steps")
    inventory = InventoryManager(inventory_csv)
    reagents = inventory.extract_reagents(protocol_text)
    print(f"🧪 Identified {len(reagents)} reagents: {', '.join(reagents[:5])}...")
    inv_issues = inventory.check_availability(reagents)
    validator = SafetyValidator()
    safety_risks = validator.validate(steps)
    planner = SchedulePlanner()
    schedule = planner.make_schedule(steps, start_time)
    parallel_opts, time_saved = planner.optimize_parallelization(schedule)
    total_time = sum(s['duration'] for s in schedule)
    optimized_time = total_time - time_saved
    opt_prompt = f"Protocol has {len(steps)} steps, {total_time} min total. Key bottleneck optimization:"
    optimization = llm_call(opt_prompt, max_tokens=80)
    return {
        'steps': steps, 'schedule': schedule, 'inventory_issues': inv_issues,
        'safety_risks': safety_risks, 'parallelization': parallel_opts,
        'time_saved': time_saved, 'total_time': total_time,
        'optimized_time': optimized_time, 'ai_optimization': optimization,
        'reagents': reagents
    }

We assemble the agent loop, integrating perception, planning, validation, and revision into a single, coherent flow. We use CodeGen for reasoning-based optimization to refine step sequencing and suggest practical improvements for efficiency and parallel execution.
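If you want to inspect the raw LLM output outside the loop, llm_call can be invoked directly; the prompt below uses made-up step counts and durations purely for illustration:

# Minimal sketch (assumption: toy prompt values, not from the original tutorial) -
# call the LLM helper on its own to inspect the raw suggestion text.
suggestion = llm_call("Protocol has 6 steps, 1100 min total. Key bottleneck optimization:", max_tokens=60)
print(suggestion)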

def generate_checklist(results):
    md = "# 🔬 WET-LAB PROTOCOL CHECKLIST\n\n"
    md += f"**Total Steps:** {len(results['schedule'])}\n"
    md += f"**Estimated Time:** {results['total_time']} min ({results['total_time']//60}h {results['total_time']%60}m)\n"
    md += f"**Optimized Time:** {results['optimized_time']} min (save {results['time_saved']} min)\n\n"
    md += "## ⏱️ TIMELINE\n"
    current_day = 1
    for item in results['schedule']:
        if item['day'] > current_day:
            md += f"\n### Day {item['day']}\n"
            current_day = item['day']
        parallel = " 🔄" if item['can_parallelize'] else ""
        md += f"- [ ] **{item['start']}-{item['end']}** | Step {item['step']}: {item['name']} ({item['temp']}){parallel}\n"
    md += "\n## 🧪 REAGENT PICK-LIST\n"
    for reagent in results['reagents']:
        md += f"- [ ] {reagent}\n"
    md += "\n## ⚠️ SAFETY & INVENTORY ALERTS\n"
    all_issues = results['safety_risks'] + results['inventory_issues']
    if all_issues:
        for risk in all_issues:
            md += f"- {risk}\n"
    else:
        md += "- ✅ No critical issues detected\n"
    md += "\n## ✨ OPTIMIZATION TIPS\n"
    for tip in results['parallelization']:
        md += f"- {tip}\n"
    md += f"- 💡 AI Suggestion: {results['ai_optimization']}\n"
    return md


def generate_gantt_csv(schedule):
    df = pd.DataFrame(schedule)
    return df.to_csv(index=False)

We create output generators that transform the results into human-readable Markdown checklists and Gantt-compatible CSVs. We ensure that every execution produces clear summaries of reagents, time savings, and safety or inventory alerts for streamlined lab operations.
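If you prefer files over console prints (an optional addition, not in the original tutorial), both generators can be written straight to disk, e.g., for download from Colab; the helper name and filenames below are our own:

# Optional sketch (assumption: arbitrary filenames, not from the original tutorial) -
# persist the checklist and Gantt table so they can be downloaded or versioned.
def save_outputs(results, checklist_path="protocol_checklist.md", gantt_path="protocol_gantt.csv"):
    with open(checklist_path, "w", encoding="utf-8") as f:
        f.write(generate_checklist(results))
    with open(gantt_path, "w", encoding="utf-8") as f:
        f.write(generate_gantt_csv(results['schedule']))
    return checklist_path, gantt_path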

SAMPLE_PROTOCOL = """ELISA Protocol for Cytokine Detection


1. Coating (Day 1, 4°C overnight)
  - Dilute capture antibody to 2 μg/mL in coating buffer (pH 9.6)
  - Add 100 μL per well to 96-well plate
  - Incubate at 4°C overnight (12-16 hours)
  - BSL-2 cabinet required


2. Blocking (Day 2)
  - Wash plate 3× with PBS-T (200 μL/well)
  - Add 200 μL blocking buffer (1% BSA in PBS)
  - Incubate 1 hour at room temperature


3. Sample Incubation
  - Wash 3× with PBS-T
  - Add 100 μL diluted samples/standards
  - Incubate 2 hours at room temperature


4. Detection Antibody
  - Wash 5× with PBS-T
  - Add 100 μL biotinylated detection antibody (0.5 μg/mL)
  - Incubate 1 hour at room temperature


5. Streptavidin-HRP
  - Wash 5× with PBS-T
  - Add 100 μL streptavidin-HRP (1:1000 dilution)
  - Incubate 30 minutes at room temperature
  - Work in dark


6. Development
  - Wash 7× with PBS-T
  - Add 100 μL TMB substrate
  - Incubate 10-15 minutes (monitor color development)
  - Add 50 μL stop solution (2M H2SO4) - CAUTION: corrosive
"""


SAMPLE_INVENTORY = """reagent,quantity,unit,expiry,lot
capture antibody,500,μg,2025-12-31,AB123
blocking buffer,500,mL,2025-11-30,BB456
PBS-T,1000,mL,2026-01-15,PT789
detection antibody,8,μg,2025-10-15,DA321
streptavidin HRP,10,mL,2025-12-01,SH654
TMB substrate,100,mL,2025-11-20,TM987
stop solution,250,mL,2026-03-01,SS147
BSA,100,g,2024-09-30,BS741"""


results = agent_loop(SAMPLE_PROTOCOL, SAMPLE_INVENTORY, start_time="09:00")
print("\n" + "="*70)
print(generate_checklist(results))
print("\n" + "="*70)
print("\n📊 GANTT CSV (first 400 chars):\n")
print(generate_gantt_csv(results['schedule'])[:400])
print("\n🎯 Time Savings:", f"{results['time_saved']} minutes via parallelization")

We conduct a comprehensive test run using a sample ELISA protocol and a reagent inventory dataset. We visualize the agent's outputs, optimized schedule, parallelization gains, and AI-suggested improvements, demonstrating how our planner functions as a self-contained, intelligent lab assistant.

Finally, we demonstrated how agentic AI principles can improve reproducibility and safety in wet-lab workflows. By parsing free-form experimental text into structured, actionable plans, we automated protocol validation, reagent management, and temporal optimization in a single pipeline. The integration of CodeGen enables on-device reasoning about bottlenecks and safety conditions, allowing for self-contained, data-secure operations. We concluded with a fully functional planner that generates Gantt-compatible schedules, Markdown checklists, and AI-driven optimization suggestions, establishing a strong foundation for autonomous laboratory planning systems.

