Engineering leader and AI systems architect.
I build and lead teams that ship production AI.
12 years building and leading engineering teams in regulated enterprise environments. I hire, mentor, and grow engineers, and I stay close enough to the architecture to make real technical decisions, not just oversee them. Based in San Francisco.
12 years building engineering teams and production software in regulated environments. Operational discipline isn't optional when systems serve 50K+ daily users in HIPAA-compliant healthcare.
For the past few years my focus has been autonomous AI systems — production RAG pipelines, LLM evaluation frameworks, and observability tooling at scale. Outside of work I built and operate a fully autonomous multi-agent system: 8+ LangGraph agents handling real-time data, LLM-driven decisions, and autonomous failure recovery. The architecture problems it forced me to solve are the most technically demanding of my career.
I've managed teams of 8-15 engineers with 90% retention, hired 10+ people, and served as interim Engineering Manager during leadership transitions. My approach: clear ownership, high trust, and staying close enough to the architecture to lead it credibly rather than just oversee it.
Orchestrating 8+ autonomous agents with shared state and graceful degradation
AI pair programming for daily development
Terminal-based AI coding agent
GPT-4o, embeddings, and fine-tuned models in production
Local LLM inference for privacy-sensitive workloads
AI-native project management and issue tracking
Workflow automation and AI integrations
Containerized deployments at scale
Vector database powering RAG pipelines
LLM application framework for chains and agents
Production multi-agent systems with LangGraph — concurrent execution, shared state management, and graceful degradation.
RAG pipelines, evaluation frameworks, and observability layers. Multi-provider orchestration across OpenAI, Anthropic, and Groq.
Custom tooling for tracing agent workflows, benchmarking model performance, and monitoring LLM calls in production.
Persistent context-aware decision making — structured knowledge persistence, retrieval-augmented reasoning, and adaptive behavior.
HIPAA-compliant platforms, audit trails, zero-downtime deployments, and security-first architecture at enterprise scale.
Teams of 8-15 engineers with 90% retention. Hiring, mentoring, and serving as interim Engineering Manager.
I'm looking for Head of AI Engineering, Engineering Manager, or Staff Engineer roles at companies where autonomous systems or AI infrastructure is central to the product — not peripheral.
The work I want: own architecture, grow engineers, ship AI systems that matter. I work best when technical decisions and people decisions sit in the same role.
STERIS CORP · Dec 2021 – Present
STERIS CORP · Oct 2017 – Nov 2021
Contributed bug fixes for terminal UI stability and session management.
Contributed Linux desktop integration fixes including native file handling and window management.
Contributed fixes for tab management and window state handling.
4-app AI platform • 100% local • Privacy-focused
Privacy-focused AI platform with Personal AI Assistant (multi-model chat & image generation), Code Documentation Generator (auto-docs for any codebase), Local RAG System (semantic search with citations), and AI Image Classifier (auto-tagging & face recognition).
Architected production multi-agent AI system with 8+ LangGraph agents featuring real-time data ingestion, LLM-driven decision making, autonomous failure recovery, and self-monitoring. Built production RAG pipelines and LLM evaluation framework from scratch with multi-provider orchestration (OpenAI, Anthropic, Groq). Led implementation of multi-agent AI pipelines for automated code review, testing, and documentation. Implemented AI observability layer with custom agent workflow tracing and LLM call monitoring for real-time visibility into production AI decision quality.
Privacy-focused 4-app AI platform: Personal AI Assistant (multi-model chat & image generation), Code Documentation Generator (auto-docs for any codebase), Local RAG System (semantic search with citations), and AI Image Classifier (auto-tagging & face recognition). 100% local, fully containerized. Demonstrates production-ready AI architecture with real-world applicability for enterprise privacy requirements.
Framework for evaluating LLM outputs with the LLM-as-judge pattern, consistency testing, and RAG evaluation. Includes 6 evaluators (Accuracy, Consistency, Latency, Cost, LLM-as-Judge, RAG), multi-provider runners (Ollama, OpenAI, Anthropic), and a CLI for automated testing. Addresses a critical production AI challenge: measuring LLM performance at scale. Used for real-world model selection and optimization workflows.
Lightweight observability layer for AI/LLM applications. Decorator-based tracing (@trace, @trace_workflow), web dashboard with timeline visualization, and framework integrations for Ollama, CrewAI, and LangChain. SQLite/PostgreSQL storage.
Complete Arch Linux setup with GNOME for Microsoft Surface Pro devices. Includes custom kernel patches, hardware optimization scripts, and configuration for touch/stylus support. Addresses Surface-specific challenges like firmware, battery management, and type cover integration.
Chrome extension for exporting Claude AI conversations to markdown, JSON, or text format. Built to preserve important AI interactions for documentation and knowledge management workflows.
Developer-focused tab management Chrome extension for organizing browser sessions, saving workspace states, and quickly restoring development environments. Reduces context-switching overhead for multi-project workflows.
Chrome extension that converts web page selections to clean markdown format. Preserves formatting, links, and code blocks for seamless documentation workflows.
Personal journaling application with stylus and touch support for a natural writing experience. Features daily entries, mood tracking, and searchable history for mindful reflection and personal growth.
Engineering Leader & AI Systems Architect
shalin.dev@proton.me | 415-490-7852 | San Francisco, CA
shalinbhatt.dev | github.com/shalin-dev | linkedin.com/in/shalinkb
Engineering leader with 12 years building and leading teams in regulated enterprise environments — healthcare platforms serving 50K+ daily users, HIPAA-compliant, 99.95% uptime.
Built production RAG pipelines, LLM evaluation frameworks, and observability tooling for AI applications at scale. Outside of work, built and operate a fully autonomous multi-agent system — 8+ specialized agents on LangGraph with real-time data ingestion, LLM-driven decision making, and autonomous failure recovery.
Managed teams of 8-15 engineers with 90% retention. Hired 10+ engineers. Served as interim Engineering Manager during leadership transitions, reporting directly to VP.
Seeking Head of AI Engineering, Engineering Manager, or Staff-level roles where autonomous systems or AI infrastructure is central to the product.
Platform engineering for $3B healthcare company — 50K+ daily users, 500+ enterprise installations, 99.95% uptime
AI/ML Infrastructure
Engineering Leadership
Platform Engineering
Privacy-first 4-app AI platform — personal AI assistant, RAG system with semantic search, code documentation generator, and image classifier. 100% local, fully containerized.
Tech: React • TypeScript • FastAPI • Ollama • ChromaDB • Docker
Production LLM evaluation framework with 6 evaluators (accuracy, consistency, latency, cost, LLM-as-judge, RAG) and multi-provider support — built for real model selection at scale.
Tech: Python • Sentence Transformers • PyTorch • OpenAI API • Ollama
Lightweight observability layer for LLM applications — decorator-based tracing, web dashboard with timeline visualization, and framework integrations.
Tech: Python • FastAPI • SQLAlchemy • PostgreSQL • LangChain