Principal Engineer · Conversational AI

Raju Roopani

I build real-time conversational AI and agentic systems at planetary scale — from the Meetings AI layer to the Agent Extensibility Platform inside Microsoft Teams, serving 300M+ daily users.

Open to work
Summary

Principal Engineer with 14+ years of distributed systems and platform engineering at planetary scale; the last 3+ focused on real-time conversational AI and agentic systems. At Microsoft Teams I architected the Meetings AI layer — real-time RAG over meeting context (transcript, calendar, task systems), context-window optimization, and streaming LLM integration delivering sub-200ms AI assistance to millions of daily participants. Full-stack delivery from React/TypeScript frontend to distributed Python/Java backend, with deep ML/Eng collaboration patterns. Looking to apply real-time RAG and conversational AI product-engineering depth to a focused conversational-AI company.

300M+ · Daily active users served
85ms · p95 agent latency (from 320ms)
3× · 3rd-party agent reliability
40% · Cost per request reduction
Core Competencies

What I go deep on

Conversational AI · Real-Time RAG · Agent Orchestration · LLM Integration & Streaming UI · A2A Protocol · Model Context Protocol (MCP) · Human-in-the-Loop · No-Code-Adjacent Platforms · Full-Stack (React + Python) · Distributed Systems · Performance Optimization · ML / Eng Collaboration · Production AI at Scale
Experience

Work history

Principal Software Engineer · AI Tech Lead · Microsoft
Jul 2018 — Present · Redmond, WA
Teams Platform
  • Real-Time Conversational AI — Teams Meetings AI: Built the real-time conversational AI layer for Teams meetings — a hybrid RAG retrieval pipeline over meeting context (transcript, calendar, task systems), context-window optimization, and streaming UI integration with the LLM tool-use loop. Sub-200ms perceived latency for AI assistance to millions of daily participants.
  • Full-Stack No-Code-Adjacent Platform — Modern Project Online: Built collaborative no-code-style platform interfaces for non-technical PMs — React/TypeScript with Fluent UI Data Grid optimization, GraphQL APIs, dynamic UI configuration. 30% page-load reduction via virtualization, memoization, and batch update strategies for millions of enterprise users.
  • Agent SDK for Less-Technical Developers: Designed and shipped the Teams Agent SDK — manifest-based declarative agent configuration with opinionated defaults for HITL, retrieval, and error handling. Adopted by 3rd-party developers of varying technical depth. 3rd-party agent reliability +3×; developer support tickets −60%.
  • Multi-Agent Orchestration Runtime: Architected and delivered a real-time distributed multi-agent orchestration runtime supporting Agent-to-Agent (A2A) protocol, Model Context Protocol (MCP), HITL, and partial-failure isolation — serving 300M+ daily active users.
  • Latency, Cost & Quality Optimization: Reduced p95 agent runtime latency from 320ms to 85ms via event-driven coordination, retrieval caching, connection pooling, and tiered model routing. Cut cost per request by 40%.
  • ML / Eng Collaboration: Built the integration patterns — eval pipelines, observability instrumentation, progressive rollout — that let AI Research ship model improvements without product regression. Cross-functional TL across Engineering, Product, Design, and AI Research.
Member of Technical Staff · Platform Engineering · Envestnet Yodlee Infotech
Apr 2016 — Jul 2018 · Bengaluru, India
  • High-Throughput REST APIs: Designed and developed REST APIs aggregating financial data across banks and third-party providers, processing 10M+ transactions per day on event-driven Kafka pipelines using Java Spring Boot.
  • Framework-Agnostic Integration Platform: Built integrations bridging hundreds of heterogeneous external systems (100+ bank APIs with varying schemas and custom auth) into a unified runtime.
  • OAuth 2.0 + PSD2 Compliance: Architected and implemented an OAuth 2.0 client framework in Java Spring MVC and Hibernate, enabling PSD2-compliant integration with EU Open Banking APIs across multiple European markets.
Software Engineer · Build & Automation Tooling · ARM Embedded Technologies
Jan 2012 — Apr 2016 · Bengaluru, India
  • Build Systems & Automation Tooling: Designed and built automation tooling for Physical IP modeling and validation flows, including complex verification algorithms, efficient build pipelines, and reusable tooling frameworks. Reduced manual verification effort by ~60% and accelerated hardware design-to-tape-out cycles.
Education
B.E. — Electronics & Communications Engineering · Osmania University
2007 — 2011
Engineering Deep Dive

Engineering decisions behind the Teams agentic AI platform

A walkthrough of how I designed, built, and shipped real-time conversational AI inside Microsoft Teams — and the calls I made (and a few I'd remake) along the way.

This isn't a list of frameworks. It's the story of one program I led from 0→1 to GA inside Microsoft Teams — what we were trying to do, the load-bearing decisions I made along the way, and the few I'd revisit if I were starting over. Where the decisions mattered I've tried to show the reasoning, not just the outcome.
01 What the system does, and why it matters

Microsoft Teams serves 300M+ daily active users, and meetings are the highest-bandwidth knowledge exchange in most enterprises. Before this program, that bandwidth was lossy: transcripts existed but were post-hoc artifacts, action items walked out the door with whoever happened to be writing them down, and most of the context of the meeting died with the meeting itself. The system I led has two tightly coupled pieces.

Piece 1 · The Meetings AI layer

Real-time conversational AI that runs alongside a live meeting. It listens to the streaming transcript, retrieves relevant context (calendar, prior meetings, task systems, chat history), and offers in-meeting assistance — answering questions, drafting follow-ups, surfacing related decisions — with sub-200ms perceived latency. Under the hood it's a streaming LLM tool-use loop wrapped around a hybrid RAG pipeline reading a corpus that is literally growing as the meeting happens.

Piece 2 · The Agent Extensibility Platform

A runtime, SDK, and set of APIs that lets internal and external developers build agents that live inside Teams: at-mentionable, multi-agent, with the right primitives for context propagation, turn-taking, human-in-the-loop intervention, and partial-failure isolation. It speaks A2A (agent-to-agent) and MCP (Model Context Protocol) as first-class protocols. At GA it backed agents reaching the full 300M+ DAU surface.

Why it mattered. The difference between "AI that helps you" and "AI that lives where you already work" is enormous in enterprise. The latter compounds — your tools get smarter together — and Teams is the only surface where a meeting-aware AI can be born without forcing users to switch context.

02 My role

Tech lead and lead architect across both pieces. IC role, not management. Across the program I worked with ~12–15 engineers (plus a separate AI Research org) and was personally on the hook for:

  • The orchestration runtime architecture: turn-taking protocol, context propagation, failure model, tenancy story.
  • The Agent SDK surface: what it feels like for a third-party developer to build an agent that behaves correctly under the runtime's constraints.
  • The real-time RAG pipeline: what retrieval shape actually works over a corpus that doesn't exist when the meeting starts.
  • The ML / Eng integration contract: the boundary between what AI Research owns (models, prompts, evals) and what Engineering owns (latency, reliability, rollout) — and the substrate that lets both sides ship independently.
  • Binding architectural decisions where teams disagreed.

Concretely: I wrote the load-bearing design docs, prototyped the hardest pieces (runtime turn-taking, streaming RAG) myself before handing them off, reviewed PRs across the runtime, and spent a lot of time saying no to features so we could ship a thing that worked.

03 Where I reused what existed — and why

The fastest way to ship a bad AI system was to over-build. The default move every quarter was to reach for what already existed and spend our build budget on the things only we could build.

Reuse

Foundation models — Azure OpenAI & Anthropic Claude

Both sat behind a thin internal interface. I never seriously entertained training our own. Model quality changes month to month, and if you commit to one model in your architecture you pay for it for years. The layer was deliberately thin: swap providers, route by task complexity (tiered routing later cut cost/request 40%), unify telemetry. It explicitly did not try to be a "framework"; those grow tentacles.
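A minimal sketch of what a "deliberately thin" model layer can look like, written under stated assumptions rather than taken from the production code. `ModelProvider`, `FakeProvider`, `ModelGateway`, and `CallRecord` are hypothetical names, and `FakeProvider` stands in for real adapters that would wrap the Azure OpenAI or Anthropic SDKs. The layer does only three things: look up a provider by name, make one narrow `complete` call, and record uniform telemetry.

```python
import time
from dataclasses import dataclass, field
from typing import Protocol


class ModelProvider(Protocol):
    """Hypothetical narrow interface that every provider adapter implements."""

    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class FakeProvider:
    """Stand-in adapter; a real one would call a vendor SDK inside complete()."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


@dataclass
class CallRecord:
    provider: str
    latency_ms: float
    prompt_chars: int


@dataclass
class ModelGateway:
    """Thin layer: look up a provider, make one call, record uniform telemetry."""

    providers: dict[str, ModelProvider]
    telemetry: list[CallRecord] = field(default_factory=list)

    def complete(self, provider_name: str, prompt: str) -> str:
        provider = self.providers[provider_name]  # unknown names fail loudly (KeyError)
        start = time.perf_counter()
        result = provider.complete(prompt)
        elapsed_ms = (time.perf_counter() - start) * 1000
        self.telemetry.append(CallRecord(provider.name, elapsed_ms, len(prompt)))
        return result


if __name__ == "__main__":
    gateway = ModelGateway({"small": FakeProvider("small"), "frontier": FakeProvider("frontier")})
    print(gateway.complete("small", "Summarize the last five minutes"))
    print(gateway.telemetry[0])
```

In this shape, routing by task complexity sits in front of the gateway as a choice of `provider_name`. Keeping that decision outside the layer is one way to stop it from growing into a framework.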

Reuse

Vector retrieval — managed services on Azure

Hosting our own index early would have been a multi-quarter distraction for marginal gain. Meeting traffic is bursty and dense, and a managed service gave us cold starts in seconds. We revisited this only when retrieval latency on the long tail started showing up in evals.

Reuse

Speech-to-text — Azure Speech Services

ASR is a domain of its own with a dedicated org. I had zero interest in building expertise we didn't need to own.

Reuse

MCP & A2A as protocol standards

A build/buy decision in disguise. Teams is big enough that we could have invented our own protocols and forced adoption. I deliberately didn't — developers were already converging on MCP, and "Teams works with the protocol you already speak" is a far stronger pitch. It cost some optimization headroom and gave us a much larger ecosystem on day one.

Reuse, then discard

AutoGen for the earliest prototypes

We used it to learn what multi-agent orchestration should feel like before committing to our own runtime. We discarded it before GA, but designing from a blank page without that grounding would have led to a worse design. The cost of throwing away the prototype was much smaller than the cost of skipping the learning.

04 Where I built from scratch — and why

Four pieces, in roughly the order we admitted we had to build them.

Build · the call I argued hardest for

The orchestration runtime

AutoGen, LangGraph, and the open frameworks at the time were single-process, single-tenant, and batch-shaped. We needed distributed, multi-tenant, low-latency, and meeting-shaped: agents joining and leaving as participants do, turn-taking that respects who has the floor, idempotent retries (participants re-trigger under flaky networks), and partial-failure isolation so one misbehaving 3rd-party agent couldn't take down the meeting. There was no honest "buy" option — so we built one, event-driven, with a turn-taking protocol I wrote up as a small internal spec and a failure model that treated partial degradation as the normal case.
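Two properties from that design, idempotent retries and partial-failure isolation, can be illustrated with a small asyncio sketch. It is an illustration under assumptions, not the runtime itself: `Dispatcher` and the agent functions are hypothetical, turn-taking is omitted, and the in-memory `_inflight` map stands in for whatever durable deduplication store a distributed deployment would need.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable

AgentHandler = Callable[[str], Awaitable[str]]


@dataclass
class Dispatcher:
    """Hypothetical sketch: fan one meeting event out to agents, isolating failures."""

    agents: dict[str, AgentHandler]
    timeout_s: float = 0.5
    _inflight: dict[str, asyncio.Task] = field(default_factory=dict, init=False, repr=False)

    async def _call_one(self, name: str, handler: AgentHandler, text: str) -> tuple[str, str]:
        try:
            return name, await asyncio.wait_for(handler(text), timeout=self.timeout_s)
        except asyncio.TimeoutError:
            return name, "degraded: timeout"
        except Exception as exc:  # one misbehaving agent must not sink the meeting
            return name, f"degraded: {type(exc).__name__}"

    async def _fan_out(self, text: str) -> dict[str, str]:
        results = await asyncio.gather(
            *(self._call_one(name, handler, text) for name, handler in self.agents.items())
        )
        return dict(results)

    async def dispatch(self, event_id: str, text: str) -> dict[str, str]:
        # Idempotency: a retry (or a concurrent duplicate) with the same event id
        # awaits the original fan-out instead of starting a new one.
        task = self._inflight.get(event_id)
        if task is None:
            task = asyncio.create_task(self._fan_out(text))
            self._inflight[event_id] = task
        return await task


async def notes_agent(text: str) -> str:
    return f"noted: {text}"


async def slow_agent(text: str) -> str:
    await asyncio.sleep(5)
    return "too late"


async def broken_agent(text: str) -> str:
    raise RuntimeError("bad third-party agent")


async def main() -> None:
    d = Dispatcher({"notes": notes_agent, "slow": slow_agent, "broken": broken_agent}, timeout_s=0.1)
    first = await d.dispatch("evt-1", "ship Friday")
    retry = await d.dispatch("evt-1", "ship Friday")
    print(first)
    print(retry is first)


if __name__ == "__main__":
    asyncio.run(main())
```

Because each agent runs under its own timeout and exception boundary, a slow or crashing agent degrades only its own slot in the result. A production version would also need to evict old entries from the dedup map.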

Build

A real-time RAG pipeline over the live transcript

Off-the-shelf RAG assumes a static corpus. We needed retrieval over a corpus that grew as the meeting progressed, where "relevance" depends on speaker, topic shifts, and time — a customer mentioned in minute 3 might be what the agent needs in minute 27. We built a streaming chunker respecting semantic boundaries (utterance + speaker + topic), a hybrid retrieval layer combining embedding similarity with structured filters (calendar links, task IDs, participant identity), and a context-window manager that knew which retrieved items were still load-bearing vs. could be evicted.
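A toy sketch of the chunking and retrieval steps. The assumptions are deliberate: topic labels are taken as given from a hypothetical upstream segmenter, `embed` is a bag-of-words stand-in for a real embedding model, and "hybrid" follows the paragraph's own meaning (embedding similarity plus structured filters, here a participant filter). The context-window manager is not shown.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utterance:
    speaker: str
    topic: str  # assumed to come from a hypothetical upstream topic segmenter
    text: str
    minute: int


@dataclass
class Chunk:
    speaker: str
    topic: str
    text: str
    minute: int  # minute of the chunk's first utterance


class StreamingChunker:
    """Close the open chunk on a speaker change, a topic change, or a size limit."""

    def __init__(self, max_words: int = 60) -> None:
        self.max_words = max_words
        self.open: list[Utterance] = []
        self.closed: list[Chunk] = []

    def push(self, utt: Utterance) -> None:
        if self.open:
            last = self.open[-1]
            words = sum(len(u.text.split()) for u in self.open)
            if utt.speaker != last.speaker or utt.topic != last.topic or words >= self.max_words:
                self.flush()
        self.open.append(utt)

    def flush(self) -> None:
        if self.open:
            first = self.open[0]
            text = " ".join(u.text for u in self.open)
            self.closed.append(Chunk(first.speaker, first.topic, text, first.minute))
            self.open = []


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_search(chunks: list[Chunk], query: str,
                  speakers: Optional[set[str]] = None, k: int = 3) -> list[Chunk]:
    """Apply the structured filter (participant identity) first, then rank by similarity."""
    q = embed(query)
    candidates = [c for c in chunks if speakers is None or c.speaker in speakers]
    scored = [(cosine(q, embed(c.text)), c) for c in candidates]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


if __name__ == "__main__":
    chunker = StreamingChunker(max_words=40)
    for utt in [
        Utterance("Ana", "pricing", "Contoso wants a volume discount on the renewal.", 3),
        Utterance("Ana", "pricing", "They asked for numbers by Friday.", 4),
        Utterance("Raj", "hiring", "We still need two backend engineers.", 5),
    ]:
        chunker.push(utt)
    chunker.flush()
    for chunk in hybrid_search(chunker.closed, "What did Contoso ask for?", speakers={"Ana"}):
        print(chunk.minute, chunk.text)
```

Filtering before ranking keeps the structured signal authoritative: a similar-sounding chunk from a participant outside the filter is never a candidate. The `minute` field is the kind of metadata a context-window manager could use when deciding what to evict.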

Build · the piece I'm proudest of

The Agent SDK manifest schema

Existing SDKs were code-first — written for engineers comfortable instantiating an Agent class. Our 3rd-party population was broader, including PMs and customer-org IT admins. The SDK had to be declarative: a manifest where you specify intent, retrieval surfaces, tool permissions, and HITL defaults, and the runtime fills in correct behavior. When we measured it, reliability went up 3× and support tickets dropped 60% — almost entirely because the right defaults were locked into the manifest shape rather than left as opportunities for misuse.
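A sketch of how a declarative manifest can lock defaults in at load time. The schema (`name`, `intent`, `retrieval`, `tools`, `read_only`, `hitl_opt_out`) and the default retrieval surface are hypothetical, not the actual Teams Agent SDK manifest format. The conservative choice illustrated here: a tool is assumed to write unless it is declared read-only, and writing tools require human confirmation unless explicitly opted out.

```python
from dataclasses import dataclass

KNOWN_KEYS = {"name", "intent", "retrieval", "tools", "hitl_opt_out"}
DEFAULT_RETRIEVAL = ("meeting_transcript",)  # hypothetical default surface


@dataclass(frozen=True)
class ToolPolicy:
    name: str
    writes: bool
    requires_confirmation: bool


@dataclass(frozen=True)
class AgentConfig:
    name: str
    intent: str
    retrieval: tuple[str, ...]
    tools: tuple[ToolPolicy, ...]


def load_manifest(manifest: dict) -> AgentConfig:
    """Validate a declarative manifest and fill in conservative defaults."""
    unknown = set(manifest) - KNOWN_KEYS
    if unknown:
        raise ValueError(f"unknown manifest keys: {sorted(unknown)}")
    for key in ("name", "intent"):
        if not manifest.get(key):
            raise ValueError(f"missing required key: {key}")

    opt_out = set(manifest.get("hitl_opt_out", []))
    tools = []
    for tool in manifest.get("tools", []):
        # Conservative default: a tool is assumed to write unless declared read-only.
        writes = not tool.get("read_only", False)
        needs_human = writes and tool["name"] not in opt_out
        tools.append(ToolPolicy(tool["name"], writes, needs_human))

    stray = opt_out - {t.name for t in tools}
    if stray:
        raise ValueError(f"hitl_opt_out names undeclared tools: {sorted(stray)}")

    retrieval = tuple(manifest.get("retrieval") or DEFAULT_RETRIEVAL)
    return AgentConfig(manifest["name"], manifest["intent"], retrieval, tuple(tools))


if __name__ == "__main__":
    config = load_manifest({
        "name": "follow-up-drafter",
        "intent": "Draft follow-up emails and tasks after a meeting",
        "tools": [{"name": "create_task"}, {"name": "search_docs", "read_only": True}],
    })
    for tool in config.tools:
        print(tool)
```

Rejecting unknown keys and stray opt-outs is the other half of keeping defaults "locked into the manifest shape": a typo fails at load time instead of silently changing runtime behavior.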

Build

The evaluation & observability harness

Model providers' tooling stops at the model. We needed full-loop eval — retrieval quality, end-to-end p95 latency, agent behavior under turn-taking pressure, regression detection when AI Research rolled new prompts or models, and progressive rollout gates that could halt before regressions reached production. We built it — partly because it didn't exist, mostly because owning it made the AI Research / Engineering contract concrete: Research could change anything as long as it cleared the eval harness.
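One way to make "Research could change anything as long as it cleared the eval harness" mechanical is a gate that compares a candidate's metrics to the baseline and advances the rollout only when nothing regresses past a tolerance. The sketch below is illustrative: metric names, tolerances, and rollout stages are hypothetical, and p95 uses the nearest-rank definition.

```python
from dataclasses import dataclass

ROLLOUT_STAGES = (1, 5, 25, 100)  # hypothetical percent-of-traffic stages


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile, using integer ceil to avoid float rounding."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


@dataclass(frozen=True)
class Gate:
    metric: str
    higher_is_better: bool
    tolerance: float  # allowed relative regression, e.g. 0.02 means 2%


def regressions(baseline: dict[str, float], candidate: dict[str, float],
                gates: list[Gate]) -> list[str]:
    """Return a description of every metric that regressed beyond its tolerance."""
    failures = []
    for gate in gates:
        base, cand = baseline[gate.metric], candidate[gate.metric]
        if gate.higher_is_better:
            regressed = cand < base * (1 - gate.tolerance)
        else:
            regressed = cand > base * (1 + gate.tolerance)
        if regressed:
            failures.append(f"{gate.metric}: {base:g} -> {cand:g}")
    return failures


def next_stage(current_pct: int, failures: list[str]) -> int:
    """Advance one stage when clean; drop to 0% (halt) on any regression."""
    if failures:
        return 0
    later = [s for s in ROLLOUT_STAGES if s > current_pct]
    return later[0] if later else current_pct


if __name__ == "__main__":
    gates = [Gate("retrieval_recall_at_5", True, 0.02), Gate("p95_latency_ms", False, 0.10)]
    baseline = {"retrieval_recall_at_5": 0.80, "p95_latency_ms": 85.0}
    candidate = {"retrieval_recall_at_5": 0.81, "p95_latency_ms": p95([70, 80, 88, 90, 120])}
    failures = regressions(baseline, candidate, gates)
    print(failures, next_stage(5, failures))
```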

05 Tradeoffs I made — and lived with

Streaming UI vs. complete-response UI

Streaming tokens as the model produced them dropped perceived latency from ~3s to ~200ms. The cost: a UI that now had to handle partial states, mid-stream retraction, and graceful degradation when the stream dropped. In a meeting — where the AI competes with humans speaking — feeling instant matters far more than feeling complete.

// I still think that was right.
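The partial-state handling behind that tradeoff can be sketched as a reducer that turns stream events into UI snapshots. It is written in Python for consistency with the other sketches here, even though the production UI was React/TypeScript. The event kinds (`token`, `retract`, `done`) and the `StreamView` type are hypothetical. A dropped connection, or a stream that ends without `done`, leaves the partial text visible and marks it `degraded` rather than discarding it.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class StreamView:
    """What the UI renders at one instant."""

    tokens: list[str] = field(default_factory=list)
    status: str = "streaming"  # becomes "complete" or "degraded"

    @property
    def text(self) -> str:
        return "".join(self.tokens)

    def snapshot(self) -> "StreamView":
        return StreamView(list(self.tokens), self.status)


def render_stream(events: Iterable[tuple[str, str]]) -> Iterator[StreamView]:
    """Yield an independent UI snapshot after every (kind, payload) event."""
    view = StreamView()
    try:
        for kind, payload in events:
            if kind == "token":
                view.tokens.append(payload)
            elif kind == "retract":
                # Withdraw the last N tokens already shown (payload holds N).
                n = int(payload)
                del view.tokens[max(0, len(view.tokens) - n):]
            elif kind == "done":
                view.status = "complete"
            yield view.snapshot()
            if view.status == "complete":
                break
    except ConnectionError:
        pass  # dropped stream: fall through and keep the partial text
    if view.status != "complete":
        view.status = "degraded"
        yield view.snapshot()


def flaky_stream() -> Iterator[tuple[str, str]]:
    yield ("token", "The team agreed ")
    yield ("token", "to ship on Friday")
    yield ("retract", "1")
    yield ("token", "to ship on Monday")
    raise ConnectionError("stream dropped before 'done'")


if __name__ == "__main__":
    for frame in render_stream(flaky_stream()):
        print(f"{frame.status:<9} {frame.text!r}")
```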

Tiered model routing

Routing simple intents to a smaller model and reserving the frontier model for harder calls cut cost/request 40%. The cost: a routing classifier that could itself be wrong, and a test surface that grew with the number of tiers. The mitigation was the eval harness — every routing decision scored offline before it shipped.

// Without evals first, this would have been a bad call.
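A minimal sketch of a router that escalates when unsure, plus the kind of offline scoring that could gate a router change before it ships. The keyword heuristic in `classify` is a toy stand-in for a learned classifier, and the tier names, relative costs, and 0.75 threshold are hypothetical. Nothing here reproduces the 40% figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_call: float  # hypothetical relative cost units


SMALL = Tier("small", 1.0)
FRONTIER = Tier("frontier", 10.0)
SIMPLE_HINTS = ("summarize", "who said", "when is", "remind")


def classify(query: str) -> tuple[Tier, float]:
    """Toy stand-in for a learned router: returns (tier, confidence)."""
    q = query.lower()
    if not any(hint in q for hint in SIMPLE_HINTS):
        return FRONTIER, 0.9
    # Long queries that merely mention a simple keyword are less clearly simple.
    return SMALL, (0.9 if len(q.split()) <= 12 else 0.6)


def route(query: str, min_confidence: float = 0.75) -> Tier:
    tier, confidence = classify(query)
    # Fail toward quality: an unsure router escalates to the frontier tier.
    return tier if confidence >= min_confidence else FRONTIER


def offline_score(labeled: list[tuple[str, str]]) -> dict[str, float]:
    """Score routing decisions against labeled intents before a router change ships."""
    routed = [(route(query), want) for query, want in labeled]
    n = len(routed)
    return {
        "accuracy": sum(tier.name == want for tier, want in routed) / n,
        # Sending a hard query to the small model is the error that hurts answers.
        "under_routed": sum(tier is SMALL and want == "frontier" for tier, want in routed),
        "cost_vs_all_frontier": sum(t.cost_per_call for t, _ in routed) / (FRONTIER.cost_per_call * n),
    }


if __name__ == "__main__":
    labeled = [
        ("Summarize the decisions so far", "small"),
        ("Who said we'd slip the launch?", "small"),
        ("Draft a negotiation strategy for the Contoso renewal", "frontier"),
    ]
    print(offline_score(labeled))
```

`under_routed` is tracked separately from accuracy because the two error directions cost differently: over-routing only wastes money, while under-routing degrades answers.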

HITL on by default in the SDK

Every agent the SDK produced had a human-in-the-loop confirmation step by default for any action writing to a customer system. It made agents feel slower and chattier — the tradeoff was conservative-by-default for enterprise customers with no patience for AI acting incorrectly on their behalf. The escape hatch (explicit opt-out per action) was available, but you had to ask for it.

// I'd make the same call again.
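The runtime side of that default can be sketched as a gate in front of every action. `Action`, `HitlGate`, and the `confirm` callback are hypothetical names. In a real product the callback would surface an approval prompt to the user; here it is any function that returns a boolean.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Action:
    name: str
    writes_customer_data: bool
    summary: str  # what the human sees before approving


@dataclass
class HitlGate:
    """Write actions need a human 'yes' unless that action was explicitly opted out."""

    confirm: Callable[[Action], bool]  # e.g. shows an approval prompt; here any callable
    opted_out: frozenset[str] = frozenset()
    audit: list[tuple[str, str]] = field(default_factory=list)

    def run(self, action: Action, execute: Callable[[], str]) -> str:
        needs_human = action.writes_customer_data and action.name not in self.opted_out
        if needs_human and not self.confirm(action):
            self.audit.append((action.name, "declined"))
            return "declined"
        result = execute()
        self.audit.append((action.name, "confirmed" if needs_human else "auto"))
        return result


if __name__ == "__main__":
    gate = HitlGate(confirm=lambda action: True, opted_out=frozenset({"post_recap"}))
    gate.run(Action("create_task", True, "Create 3 follow-up tasks in the tracker"), lambda: "created")
    gate.run(Action("post_recap", True, "Post the meeting recap"), lambda: "posted")
    print(gate.audit)
```

The opt-out is a per-action allowlist rather than a global switch, which keeps "you had to ask for it" true at the granularity of individual actions.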

Embracing MCP & A2A early

This cost some optimization (an internal protocol could pack data more efficiently) and locked us into a moving target — the standards evolved under us. The win was ecosystem: external developers showed up speaking our protocols on day one. The cost showed up later as compatibility work when the standards shifted.

// Net positive, but with eyes open.

Building our own runtime

The decision I'd defend most strongly and second-guess most often. For: no off-the-shelf framework fit our shape, so we'd have forked one anyway. Against: we now own and maintain a runtime forever. I keep watching the open-source space — if a community-maintained runtime ever catches up to our needs, the right move is to migrate and stop carrying the maintenance load.

// Correct at the time. Still watching.
06 What I'd do differently in hindsight

Three things, in increasing order of "I should have known better."

The eval harness should have come first, not third

We shipped the orchestration runtime and the first agents to GA before a real evaluation harness was in place, and caught a few embarrassing regressions in production that an eval suite would have flagged on commit. The classic mistake: eval feels like infrastructure investment, and infrastructure loses to features under shipping pressure. I'd now treat the eval harness as a P0 deliverable in the same release as the first real model integration, not a fast-follow.

We over-built the multi-tenant story too early

I had a real bias toward "design for the platform we'll be in two years." The runtime carried complexity — tenant isolation primitives, per-tenant rate limits, multi-tenant config surfaces — before any tenant needed it. If I were doing it again, I'd ship a single-tenant runtime first, get one use case completely solid, and only then generalize. We could have shipped two months earlier and learned more from real usage than from the extra abstraction.

I should have watched the open-source space longer before committing to our own runtime

The decision to build was right at the moment we made it — but I made it earlier than I had to. Another quarter of watching what the community shipped might have let us base our runtime on something rather than starting from a blank page. The kind of decision where being right too early is almost as expensive as being wrong.

A pattern across all three: delay the irreversible decisions, accelerate the reversible ones. I knew this in principle. Shipping under pressure made it harder to live by than I'd like to admit.
Closing note

The pieces of this program I'm proudest of aren't the technologies. They're the contracts: the eval harness as the boundary between AI Research and Engineering, the manifest as the boundary between the platform and 3rd-party developers, the turn-taking protocol as the boundary between agents. Each one took an ambiguous, negotiation-heavy interface and turned it into something a machine could check — and that's mostly what let us ship at scale without the program collapsing under coordination cost. Happy to go deeper on any of this in conversation.

Stack

Tools & technologies

Conversational AI

Real-Time RAG · Context Engineering · LLM Integration · Streaming UI · Conversation Workflows · Agent Orchestration · A2A · MCP · HITL

LLM Frameworks

AutoGen · Azure OpenAI · OpenAI API · Anthropic Claude API · LangChain & LlamaIndex (familiarity)

Languages

Python (AutoGen, Azure OpenAI SDK, Anthropic SDK, OpenAI API) · TypeScript · C#/.NET · Java (Spring Boot) · Node.js

Frontend

React · TypeScript · Fluent UI · GraphQL · Streaming UI · Perf (virtualization, memoization, batch updates)

Backend & Distributed

REST · GraphQL · gRPC · WebSockets · Microservices · Event-Driven Architecture · Fault Tolerance · Idempotency · Horizontal Scalability · Partial-Failure Isolation

Cloud, Data & Messaging

Azure (Service Bus, Functions, Cosmos DB, OpenAI) · Docker · Apache Kafka · MySQL · MongoDB · Oracle