Agentic AI Engineer

I build the infrastructure
AI products run on.

View Artifacts abhinandan@abhinandan.one

Building at the edge with Artificial Intelligence

I join teams early, while they are still figuring out the process, the architecture, and whether the whole thing will actually work. So I don't just write code; I make product and engineering calls that have to hold up later.

Most of my work sits between language models and real software: understanding where model reasoning breaks, building systems that fail safely when it does, and shipping things that still work when users hit them in messy ways.

I care a lot about the boring parts: latency, error surfaces, cost, retries, observability, and all the small details that never show up in a demo but decide whether the product actually survives in production.

AI products shipped to production

agents in a single pipeline

700K+

LLM calls monitored in production

Artifacts that I have built

All artifacts →

Build Artifact #2

Health Voice

A clinical voice scribe that runs on one Mac. A nurse talks, it transcribes on-device, figures out who said what, pulls the medical terms, drafts a SOAP note, and files it to FHIR after a human signs off.

MLX Whisper (large-v3-turbo + base.en), silero VAD, speechbrain ECAPA (speaker ID)

Jun 26, 2026 · Open artifact ->

Build Artifact #1

Enterprise RAG

A full enterprise document search pipeline built on semantic RAG with role-based access control (RBAC).

Qdrant, Elasticsearch, Supabase (Postgres + Storage)

Jun 25, 2026 · Open artifact ->

Projects I have crafted

PyTorch, TRL, DAPO, PRM, PEFT / LoRA, Qwen3-8B, Ollama, FastMCP

Reimplementation of the ICLR 2026 AgentFlow paper as a local Qwen3-8B Planner, Executor, Verifier, Memory loop. Grammar-constrained JSON planning, Tavily search, and a sandboxed Python + SymPy executor.
Swapped the paper's outcome-only GRPO for DAPO plus a learned Process Reward Model (Qwen3-0.6B regression head, trained on 531 DeepSeek-judged step labels) to get dense per-step credit. TRL ships no dynamic-sampling stage, so I wrote one.
Full pipeline on a single A40: trajectory collection, step judging, PRM training, 300-step DAPO LoRA on Qwen3-8B (bf16), GGUF export, Ollama serving.
Evaluation is leakage-free and quantization-matched: trained in bf16, scored on the served GGUF. GPQA-Diamond moved from 40.0% to 45.0% (n=100), a directional cross-domain gain from a planner trained only on AIME math. AIME24 held flat (n=30).
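
The dynamic-sampling stage mentioned above can be sketched roughly as follows. This is an illustration of the general DAPO idea, not the project's code: prompt groups whose sampled rewards are all identical carry no learning signal (every group-relative advantage is zero), so they are dropped and sampling continues until the batch is full. The names `has_signal`, `fill_batch`, `sample_group`, and `advantages` are hypothetical, and the advantage formula assumes plain group normalization rather than the PRM-based per-step credit described above.

```python
from typing import Callable, List, Tuple

Group = Tuple[str, List[float]]  # (prompt, rewards of its sampled completions)


def has_signal(rewards: List[float]) -> bool:
    """A group is informative only if its rewards differ; otherwise
    every group-relative advantage is zero."""
    return max(rewards) > min(rewards)


def fill_batch(
    prompts: List[str],
    sample_group: Callable[[str], List[float]],
    batch_size: int,
    max_attempts: int = 1000,
) -> List[Group]:
    """Sample prompt groups until `batch_size` informative groups are kept.

    `sample_group` is a hypothetical callable that rolls out several
    completions for a prompt and returns their scalar rewards.
    """
    batch: List[Group] = []
    attempts = 0
    while len(batch) < batch_size and attempts < max_attempts:
        prompt = prompts[attempts % len(prompts)]
        rewards = sample_group(prompt)
        attempts += 1
        if has_signal(rewards):  # drop all-correct and all-wrong groups
            batch.append((prompt, rewards))
    return batch


def advantages(rewards: List[float]) -> List[float]:
    """Group-normalized advantages: (r - mean) / std within one group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]
```

The `max_attempts` cap is a design choice for the sketch: without it, a policy that solves (or fails) every prompt would loop forever looking for a mixed group.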

OpenAI SDK, Anthropic SDK, LangGraph, OpenTelemetry

Enforces budgets on an agent before it acts. Decimal-precise caps on cost, tokens, wall time, and tool calls, checked pre-flight, so a runaway loop halts before the next expensive call rather than after it.
Per-tool circuit breakers, and a verifier retry loop that feeds corrections back to the agent under the same shared budget.
OpenTelemetry GenAI spans on every protected call. Failures return typed RunResult objects instead of raising, so callers can branch on the reason.
Adapters for LangGraph and the OpenAI Agents SDK. Existing agents wrap without touching their code.
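
A minimal sketch of the pre-flight idea described above, under stated assumptions: the cap is checked before the call runs, arithmetic uses `Decimal` so repeated small charges add up exactly, and a failure comes back as a typed result instead of an exception. `Budget`, `StopReason`, `guarded_call`, and the fields of `RunResult` here are illustrative names, not the library's actual API.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Callable, Optional


class StopReason(Enum):
    OK = "ok"
    COST_EXCEEDED = "cost_exceeded"
    TOOL_CALLS_EXCEEDED = "tool_calls_exceeded"


@dataclass
class RunResult:
    """Typed outcome; callers branch on `reason` instead of catching."""
    reason: StopReason
    value: Optional[str] = None


@dataclass
class Budget:
    max_cost: Decimal
    max_tool_calls: int
    spent: Decimal = Decimal("0")
    tool_calls: int = 0

    def check(self, estimated_cost: Decimal) -> StopReason:
        """Pre-flight: would the NEXT call push past a cap?"""
        if self.tool_calls + 1 > self.max_tool_calls:
            return StopReason.TOOL_CALLS_EXCEEDED
        if self.spent + estimated_cost > self.max_cost:
            return StopReason.COST_EXCEEDED
        return StopReason.OK

    def charge(self, cost: Decimal) -> None:
        self.spent += cost
        self.tool_calls += 1


def guarded_call(
    budget: Budget, estimated_cost: Decimal, tool: Callable[[], str]
) -> RunResult:
    """Run `tool` only if the budget allows it; never raise on a cap."""
    reason = budget.check(estimated_cost)
    if reason is not StopReason.OK:
        return RunResult(reason)  # halted before the expensive call
    value = tool()
    budget.charge(estimated_cost)
    return RunResult(StopReason.OK, value)
```

With a cap of `Decimal("0.30")` and three calls estimated at `Decimal("0.10")` each, all three pass and the fourth is refused. The same sums in binary floating point can drift past the cap one call early, which is the reason for `Decimal` in this sketch.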

FAISS, SentenceTransformers, PyTorch, SQLite, Pydantic

Semantic cache for LLM agents. Embedding retrieval proposes candidates, then a learned pairwise classifier decides whether reuse is safe. "Approve this refund" never returns the cached answer for "deny this refund."
Ships a pretrained classifier (v2) trained on 16,576 labeled pairs across 9 domains. At equal recall, its precision is 30 points higher than a tuned cosine-similarity baseline.
FAISS index, WAL-backed SQLite persistence, implicit bad-hit detection from downstream signals, gated retraining, CI across Python 3.11 through 3.14.
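
The two-stage lookup described above can be sketched like this. It is a toy under loud assumptions: a bag-of-words `toy_embed` stands in for SentenceTransformers, a linear scan stands in for FAISS, and `toy_verifier` is a hand-written rule standing in for the learned pairwise classifier. All class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]
VOCAB = ["approve", "deny", "this", "refund", "please"]  # toy vocabulary


def toy_embed(text: str) -> List[float]:
    """Stand-in embedding: word counts over a tiny fixed vocabulary."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def toy_verifier(new_query: str, cached_query: str) -> bool:
    """Stand-in for the pairwise classifier: refuse reuse when the two
    queries disagree on a polarity word."""
    polarity = {"approve", "deny"}
    new_words = set(new_query.lower().split())
    old_words = set(cached_query.lower().split())
    return (new_words & polarity) == (old_words & polarity)


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Entry:
    query: str
    vector: Vector
    answer: str


class TwoStageCache:
    def __init__(
        self,
        embed: Callable[[str], Vector],
        is_safe_reuse: Callable[[str, str], bool],
        top_k: int = 3,
        min_similarity: float = 0.5,
    ) -> None:
        self.embed = embed
        self.is_safe_reuse = is_safe_reuse
        self.top_k = top_k
        self.min_similarity = min_similarity
        self.entries: List[Entry] = []

    def put(self, query: str, answer: str) -> None:
        self.entries.append(Entry(query, self.embed(query), answer))

    def get(self, query: str) -> Optional[str]:
        qv = self.embed(query)
        # Stage 1: similarity search proposes candidates.
        scored = sorted(
            ((cosine(qv, e.vector), e) for e in self.entries),
            key=lambda pair: pair[0],
            reverse=True,
        )
        for sim, entry in scored[: self.top_k]:
            if sim < self.min_similarity:
                break
            # Stage 2: the verifier decides whether reuse is safe.
            if self.is_safe_reuse(query, entry.query):
                return entry.answer
        return None
```

In this toy, "deny this refund" scores about 0.67 cosine similarity against a cached "approve this refund", so a similarity threshold alone would return the wrong answer; the second stage is what rejects it.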

AsyncIO, LiteLLM, Pydantic

Multi-agent pipeline framework for Python 3.11+ with no required dependencies. Sequential, parallel, conditional, and retryable steps share one typed StepContext.
Lifecycle events, flat execution traces, human review gates, and JSON checkpoint/resume for runs that outlive the process.
Optional LiteLLM-backed Agent with structured Pydantic outputs. Shipped through v0.5.0 on PyPI.
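
One way to picture composable steps over a shared context, using only the standard library. This is a sketch of the pattern, not the framework's API: `StepContext` here is a simplified stand-in, and `sequential`, `parallel`, `conditional`, and `retryable` are hypothetical combinator names.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Dict, List


@dataclass
class StepContext:
    """Shared state passed to every step, plus a flat execution trace."""
    data: Dict[str, Any] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)


Step = Callable[[StepContext], Awaitable[None]]


def sequential(*steps: Step) -> Step:
    async def run(ctx: StepContext) -> None:
        for step in steps:
            await step(ctx)
    return run


def parallel(*steps: Step) -> Step:
    async def run(ctx: StepContext) -> None:
        await asyncio.gather(*(step(ctx) for step in steps))
    return run


def conditional(predicate: Callable[[StepContext], bool], step: Step) -> Step:
    async def run(ctx: StepContext) -> None:
        if predicate(ctx):
            await step(ctx)
    return run


def retryable(step: Step, attempts: int = 3) -> Step:
    async def run(ctx: StepContext) -> None:
        for attempt in range(1, attempts + 1):
            try:
                await step(ctx)
                return
            except Exception as exc:
                ctx.trace.append(f"attempt {attempt} failed: {exc}")
                if attempt == attempts:
                    raise  # out of retries, surface the last error
    return run


# Example steps (illustrative only).
async def fetch(ctx: StepContext) -> None:
    ctx.data["doc"] = "hello"
    ctx.trace.append("fetch")


async def count(ctx: StepContext) -> None:
    ctx.data["length"] = len(ctx.data["doc"])
    ctx.trace.append("count")


async def upper(ctx: StepContext) -> None:
    ctx.data["upper"] = ctx.data["doc"].upper()
    ctx.trace.append("upper")


pipeline = sequential(fetch, parallel(count, upper))
```

Running `asyncio.run(pipeline(StepContext()))` executes `fetch` first and then `count` and `upper` concurrently. Because every combinator returns a `Step`, they nest freely.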

AsyncIO, OpenAI SDK, Anthropic SDK, LangChain, Typer

Scores agents on pass rate over repeated runs rather than a single exact-match assertion. Agents are stochastic, so one green run is a sample of size one.
Traces tool calls, step counts, and timing. Behavioral assertions collect and raise at the end: call ordering, argument schemas, latency bounds.
Adapters for OpenAI, Anthropic, and LangChain. Typer CLI emits JSON reports that gate CI.
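
A rough sketch of pass-rate scoring over repeated runs, as described above. The names `PassRateReport` and `score`, the default run count, and the threshold are assumptions for illustration, not the tool's real interface.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class PassRateReport:
    runs: int
    passes: int
    threshold: float

    @property
    def pass_rate(self) -> float:
        return self.passes / self.runs

    @property
    def ok(self) -> bool:
        """True when the observed pass rate meets the gate."""
        return self.pass_rate >= self.threshold

    def to_json(self) -> str:
        """Machine-readable report a CI job could parse and gate on."""
        return json.dumps(
            {
                "runs": self.runs,
                "passes": self.passes,
                "pass_rate": self.pass_rate,
                "threshold": self.threshold,
                "ok": self.ok,
            }
        )


def score(
    agent: Callable[[], str],
    check: Callable[[str], bool],
    runs: int = 20,
    threshold: float = 0.8,
) -> PassRateReport:
    """Run a stochastic agent `runs` times and count passing outputs."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passes = sum(1 for _ in range(runs) if check(agent()))
    return PassRateReport(runs=runs, passes=passes, threshold=threshold)
```

An agent that answers correctly 6 times out of 8 gets a pass rate of 0.75: it fails a 0.8 gate and passes a 0.7 gate, whereas a single exact-match assertion would have reported either green or red depending on which run it happened to sample.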

Where I've worked

Built Browzer's Chrome MV3 recorder + CDP-native browser automation agent, achieving 95%+ precise AX/DOM element capture with cross-iframe support, obstruction checks, and real mouse/key/upload execution.
Built a smart streaming ReAct loop across FastAPI + extension with SSE tool execution, multi-tab orchestration, safe parallelism, abort/continue, and audit logs.
Cut automation LLM spend by roughly 67% using compact recording traces, context-window compression, prompt caching, and model-routing across GPT-5, Claude Sonnet & Haiku.
Shipped a zero-LLM replay engine: recordings run as variable-driven tool-call templates, with a stateful AI fallback that resumes mid-run on failure.
Shipped self-healing docs that auto-repair on UI drift: Haiku→Sonnet diff triage, LLM-free replay of intact steps, and a CDP agent that fixes only what changed.

Shipped core features of an AI-powered real estate platform using Next.js, Nest.js, GraphQL, Redis, and GCP.
Built the AI knowledge base service using FastAPI, LangChain, and vector retrieval pipelines, powering customer-facing search workflows.
Developed document-ingestion pipelines using Google Cloud Vision, XLSX processing, and BullMQ workers, enabling automated extraction of customer data from spreadsheets and scanned records.
Automated containerized CI/CD infrastructure via Docker, GitHub Actions, and Nginx for reverse proxy/load balancing.

Built a LangChain + pgvector knowledge base powering AI-assisted document search and retrieval workflows, improving query accuracy by 15%.
Developed scalable data-ingestion pipelines using bulk CSV processing and Celery workers, reducing processing time by 40%.
Engineered a production PDF generation system transforming structured AI outputs and dynamic JSON reports into enterprise-grade documents.
Automated deployment of AI services using Docker, GitHub Actions, and AWS EC2, establishing reliable CI/CD workflows for production environments.

Received a personal offer from the CEO to join HeroUI (prev. NextUI) after making open-source contributions.
Resolved 10+ bugs & delivered 7+ feature enhancements in core components including Calendar, Table and Pagination.

Achievements

Top 1% TypeScript Engineer Globally (Algora)
International Youth Math Challenge Gold Honour (IYMC)
Amazon ML Summer School 2025 (Amazon)
