{"conference":"AI Engineer World's Fair 2026","dates":"June 29 - July 2, 2026","location":"San Francisco, CA","website":"https://ai.engineer/worldsfair","scheduleVersion":1088,"totalSessions":440,"sessions":[{"title":"Arize 2hr","day":"Day 1 — Workshop Day","time":"9:00am-11:00am","room":"Track 1","type":"sponsor","track":"Track 1","status":"confirmed","speakers":[]},{"title":"Neo4J 2hr","day":"Day 1 — Workshop Day","time":"9:00am-11:00am","room":"Track 2","type":"sponsor","track":"Track 2","status":"confirmed","speakers":[]},{"title":"BrainTrust","day":"Day 1 — Workshop Day","time":"9:00am-11:00am","room":"Track 3","type":"sponsor","status":"tentative","speakers":[]},{"title":"Airbyte — Data engineering for AI engineers","day":"Day 1 — Workshop Day","time":"9:00am-11:00am","room":"Track 4","type":"sponsor","track":"Track 4","status":"tentative","speakers":["Michel Tricot"]},{"title":"Snyk 2hr","day":"Day 1 — Workshop Day","time":"9:00am-11:00am","room":"Track 5","type":"sponsor","track":"Track 3","status":"confirmed","speakers":[]},{"title":"Oracle","day":"Day 1 — Workshop Day","time":"9:00am-11:00am","room":"Track 6","type":"sponsor","status":"tentative","speakers":[]},{"title":"OxyLabs","day":"Day 1 — Workshop Day","time":"9:00am-11:00am","room":"Track 7","type":"sponsor","track":"Track 7","status":"tentative","speakers":[]},{"title":"TogetherAI","day":"Day 1 — Workshop Day","time":"9:00am-11:00am","room":"Track 8","type":"sponsor","track":"Track 8","status":"confirmed","speakers":[]},{"title":"HOLD — Daniel Han (Unsloth) 3hr workshop","day":"Day 1 — Workshop Day","time":"9:00am-12:00pm","room":"Track 9","type":"workshop","track":"Track 9","status":"hold","speakers":["Daniel Han"]},{"title":"Microsoft 2hr","day":"Day 1 — Workshop Day","time":"9:00am-11:00am","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Arize 1hr (Plat)","day":"Day 1 — Workshop Day","time":"11:05am-12:05pm","room":"Track 
1","type":"sponsor","track":"Track 1","status":"confirmed","speakers":[]},{"title":"Neo4J 1hr (plat)","day":"Day 1 — Workshop Day","time":"11:05am-12:05pm","room":"Track 2","type":"sponsor","track":"Track 2","status":"confirmed","speakers":[]},{"title":"Snyk 1hr","day":"Day 1 — Workshop Day","time":"11:05am-12:05pm","room":"Track 3","type":"workshop","track":"Track 3","status":"open","speakers":[]},{"title":"Microsoft - Bonus 1hr","day":"Day 1 — Workshop Day","time":"11:05am-12:05pm","room":"Track 4","type":"workshop","status":"tentative","speakers":["Lab B"]},{"title":"TBA","day":"Day 1 — Workshop Day","time":"11:05am-12:05pm","room":"Track 5","type":"workshop","track":"Track 6","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 1 — Workshop Day","time":"11:05am-12:05pm","room":"Track 6","type":"workshop","track":"Workshops Day 1","status":"tentative","speakers":[]},{"title":"Docker","day":"Day 1 — Workshop Day","time":"11:05am-12:05pm","room":"Track 7","type":"workshop","status":"tentative","speakers":[]},{"title":"ClickHouse","day":"Day 1 — Workshop Day","time":"11:05am-12:05pm","room":"Track 8","type":"workshop","track":"Track 5","status":"tentative","speakers":[]},{"title":"Microsoft 1hr","day":"Day 1 — Workshop Day","time":"11:05am-12:05pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Arize 1hr","day":"Day 1 — Workshop Day","time":"12:10pm-1:10pm","room":"Track 1","type":"workshop","track":"Track 1","status":"open","speakers":[]},{"title":"Neo4J 1hr","day":"Day 1 — Workshop Day","time":"12:10pm-1:10pm","room":"Track 2","type":"workshop","track":"Track 2","status":"open","speakers":[]},{"title":"Work OS","day":"Day 1 — Workshop Day","time":"12:10pm-1:10pm","room":"Track 3","type":"workshop","track":"Track 5","status":"tentative","speakers":[]},{"title":"Arize AI","day":"Day 1 — Workshop Day","time":"12:10pm-1:10pm","room":"Track 4","type":"workshop","track":"Track 
4","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 1 — Workshop Day","time":"12:10pm-1:10pm","room":"Track 5","type":"workshop","track":"Track 5","status":"hold","speakers":[]},{"title":"TBA","day":"Day 1 — Workshop Day","time":"12:10pm-1:10pm","room":"Track 6","type":"workshop","track":"Workshops Day 1","status":"tentative","speakers":[]},{"title":"Unblocked","day":"Day 1 — Workshop Day","time":"12:10pm-1:10pm","room":"Track 7","type":"workshop","track":"Track 7","status":"tentative","speakers":[]},{"title":"BrightData","day":"Day 1 — Workshop Day","time":"12:10pm-1:10pm","room":"Track 8","type":"workshop","track":"Track 8","status":"tentative","speakers":[]},{"title":"Neo4J - +1 Hr","day":"Day 1 — Workshop Day","time":"12:10pm-1:10pm","room":"Track 9","type":"workshop","track":"Track 9","status":"tentative","speakers":[]},{"title":"Microsoft","day":"Day 1 — Workshop Day","time":"12:10pm-1:10pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Arize L&L","day":"Day 1 — Workshop Day","time":"1:15pm-2:15pm","room":"Track 1","type":"sponsor","track":"Track 1","status":"confirmed","speakers":[]},{"title":"Neo4J L&L","day":"Day 1 — Workshop Day","time":"1:15pm-2:15pm","room":"Track 2","type":"sponsor","track":"Track 2","status":"confirmed","speakers":[]},{"title":"TBA","day":"Day 1 — Workshop Day","time":"1:15pm-2:15pm","room":"Track 3","type":"sponsor","track":"Track 7","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 1 — Workshop Day","time":"1:15pm-2:15pm","room":"Track 4","type":"sponsor","status":"hold","speakers":[]},{"title":"Snyk L&L","day":"Day 1 — Workshop Day","time":"1:15pm-2:15pm","room":"Track 5","type":"sponsor","track":"Track 3","status":"confirmed","speakers":[]},{"title":"The model swap workshop","description":"Frontier labs are releasing new models constantly, and it is hard to know when “better” is better enough to justify touching a working system. 
On top of that, “just swap the model” often turns into real work because providers expose different APIs and different expectations around tools and structured outputs. The model swap workshop is a hands-on bake-off across frontier LLMs. We will run the same scenarios using multiple models (OpenAI, Anthropic, Kimi, and more) and compare results side by side for agentic tool use, structured outputs, and multimodal tasks. Swapping models is not just changing a model name. In this workshop, you will actually do the swaps, including moving between OpenAI-style Responses APIs and Anthropic-style Messages APIs, then see what breaks and what needs to change in your prompts, tool definitions, and JSON strategies. We will finish by running a small eval suite so you can quantify tradeoffs instead of relying on vibes. We will provide the Microsoft Foundry environment for access to the models, no account needed.","day":"Day 1 — Workshop Day","time":"1:15pm-2:15pm","room":"Track 6","type":"sponsor","track":"Workshops Day 1","status":"tentative","speakers":["Pamela Fox"]},{"title":"The Dark Arts of Skill Engineering","day":"Day 1 — Workshop Day","time":"1:15pm-2:15pm","room":"Track 9","type":"sponsor","status":"tentative","speakers":["Paul Bakaus"]},{"title":"Microsoft L&L","day":"Day 1 — Workshop Day","time":"1:15pm-2:15pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Arize 2hr","day":"Day 1 — Workshop Day","time":"2:20pm-4:20pm","room":"Track 1","type":"sponsor","track":"Track 1","status":"confirmed","speakers":[]},{"title":"Neo4J 2hr","day":"Day 1 — Workshop Day","time":"2:20pm-4:20pm","room":"Track 2","type":"sponsor","track":"Track 2","status":"confirmed","speakers":[]},{"title":"Pending Coreweave 1hr-A","day":"Day 1 — Workshop Day","time":"2:20pm-3:20pm","room":"Track 3","type":"sponsor","track":"Track 8","status":"hold","speakers":[]},{"title":"AI Infrastructure from the Ground Up","day":"Day 1 — Workshop 
Day","time":"2:20pm-4:20pm","room":"Track 4","type":"sponsor","track":"Workshops Day 1","status":"tentative","speakers":["Justin Lebar"]},{"title":"Snyk 1hr-A","day":"Day 1 — Workshop Day","time":"2:20pm-4:20pm","room":"Track 5","type":"sponsor","track":"Track 3","status":"confirmed","speakers":[]},{"title":"The AI Engineering Playbook: From Prototype to Production","day":"Day 1 — Workshop Day","time":"2:20pm-4:20pm","room":"Track 6","type":"sponsor","status":"tentative","speakers":["Louis-François Bouchard"]},{"title":"Reducto","day":"Day 1 — Workshop Day","time":"2:20pm-4:20pm","room":"Track 7","type":"sponsor","status":"tentative","speakers":[]},{"title":"Elastic","day":"Day 1 — Workshop Day","time":"2:20pm-3:20pm","room":"Track 8","type":"session","track":"Track 7","status":"open","speakers":[]},{"title":"Akamai","day":"Day 1 — Workshop Day","time":"2:20pm-3:20pm","room":"Track 9","type":"session","status":"tentative","speakers":[]},{"title":"Microsoft 2hr","day":"Day 1 — Workshop Day","time":"2:20pm-4:20pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Snyk 1hr-B","day":"Day 1 — Workshop Day","time":"3:25pm-4:25pm","room":"Track 3","type":"sponsor","track":"Track 3","status":"confirmed","speakers":[]},{"title":"Amazon Web Services","day":"Day 1 — Workshop Day","time":"3:25pm-4:25pm","room":"Track 9","type":"session","status":"tentative","speakers":[]},{"title":"Neo4J","day":"Day 1 — Workshop Day","time":"4:30pm-5:30pm","room":"Track 3","type":"workshop","track":"Track 5","status":"tentative","speakers":[]},{"title":"BrowserBase","day":"Day 1 — Workshop Day","time":"4:30pm-5:30pm","room":"Track 4","type":"workshop","track":"Track 4","status":"tentative","speakers":[]},{"title":"Snyk 1hr","day":"Day 1 — Workshop Day","time":"4:30pm-5:30pm","room":"Track 5","type":"workshop","track":"Track 3","status":"open","speakers":[]},{"title":"Sonar","day":"Day 1 — Workshop Day","time":"4:30pm-5:30pm","room":"Track 
6","type":"workshop","track":"Track 6","status":"tentative","speakers":[]},{"title":"Atlassian","day":"Day 1 — Workshop Day","time":"4:30pm-5:30pm","room":"Track 7","type":"workshop","track":"Track 7","status":"tentative","speakers":[]},{"title":"Ref.","day":"Day 1 — Workshop Day","time":"4:30pm-5:30pm","room":"Track 8","type":"workshop","track":"Track 8","status":"tentative","speakers":[]},{"title":"PayPal","day":"Day 1 — Workshop Day","time":"4:30pm-5:30pm","room":"Track 9","type":"workshop","track":"Track 9","status":"tentative","speakers":[]},{"title":"Microsoft","day":"Day 1 — Workshop Day","time":"4:30pm-5:30pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"swyx keynote and snyk track intro","day":"Day 2 — Session Day 1","time":"9:00am-9:10am","room":"Main Stage","type":"keynote","track":"Software Factories","status":"tentative","speakers":["Shawn Wang"]},{"title":"HOLD — Microsoft keynote","day":"Day 2 — Session Day 1","time":"9:10am-9:30am","room":"Main Stage","type":"keynote","track":"Software Factories","status":"hold","speakers":[]},{"title":"Katelyn Lesse & Angela Jiang (Anthropic)","day":"Day 2 — Session Day 1","time":"9:30am-9:50am","room":"Main Stage","type":"keynote","track":"Software Factories","status":"confirmed","speakers":["Katelyn Lesse","Angela Jiang"]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"9:50am-10:10am","room":"Main Stage","type":"keynote","track":"Software Factories","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"10:10am-10:30am","room":"Main Stage","type":"keynote","track":"Software Factories","status":"tentative","speakers":[]},{"title":"Codex Maxxing","day":"Day 2 — Session Day 1","time":"10:45am-11:05am","room":"Main Stage","type":"session","track":"Software Factories","status":"confirmed","speakers":["Jason Liu"]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"10:45am-11:05am","room":"Track 
2","type":"sponsor","track":"Vision & OCR","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"10:45am-11:05am","room":"Track 3","type":"session","track":"Search & Retrieval","status":"tentative","speakers":[]},{"title":"Claude Managed Agents Workshop","description":"Build an agent with Claude Managed Agents","day":"Day 2 — Session Day 1","time":"10:45am-11:05am","room":"Track 4","type":"session","track":"Workshops Day 2","status":"tentative","speakers":["Priyanka Phatak"]},{"title":"Dual-Surface Architecture: Serving Humans and Agents from the Same Tool Layer","day":"Day 2 — Session Day 1","time":"10:45am-11:05am","room":"Track 5","type":"sponsor","track":"Security","status":"tentative","speakers":["Ethan Cha"]},{"title":"The New Primitives: Building AI-Native Software","description":"In the future, every piece of software with a human-facing surface will be built from new, LLM-centric primitives. We are just starting to invent these new primitives, including subagents, very long context, dynamic UI generation, and conversational voice input.","day":"Day 2 — Session Day 1","time":"10:45am-11:05am","room":"Track 6","type":"session","track":"Voice & Realtime AI","status":"tentative","speakers":["Kwindla Kramer"]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"10:45am-11:05am","room":"Track 7","type":"session","track":"LLM Recsys","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"10:45am-11:05am","room":"Track 8","type":"session","track":"Forward Deployed Engineering","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"10:45am-11:05am","room":"Track 9","type":"session","track":"Data Quality","status":"tentative","speakers":[]},{"title":"M1","day":"Day 2 — Session Day 1","time":"10:45am-11:05am","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"The Genesis Mission - Accelerating Science and National 
Security through AI","day":"Day 2 — Session Day 1","time":"10:45am-11:05am","room":"Leadership 2","type":"session","track":"AI Architects: Show my Workflow","status":"tentative","speakers":["Mark Mysatshyn"]},{"title":"Every AI company is accidentally building a bank.","day":"Day 2 — Session Day 1","time":"10:45am-11:05am","room":"Expo Stage 1","type":"session","track":"Expo Stage NE","status":"confirmed","speakers":[]},{"title":"How Braintree handles agent-initiated payments across ChatGPT and Google AI Mode","description":"Braintree has shipped integrations across the major agentic surfaces in the last six months each with human-in-the-loop confirmation and full transaction attribution back to the originating AI platform. We'll tour all three paths: ACP for ChatGPT apps (delegated payment tokens via complete_checkout, allowance validation, facilitator_details attribution), UCP with Google Pay for Google AI Mode (server-side tokenizationSpecification, parsing androidPayCards for the single-use token), and a preview of MCP Apps inline checkout, where the payment surface renders in-chat and card data never enters the LLM context. For each path we'll cover where Braintree fits, what the shopper and merchant each see, and the tradeoffs between them. 
You leave with working code and the docs to evaluate which path fits your stack.","day":"Day 2 — Session Day 1","time":"10:45am-11:05am","room":"Expo Stage 3","type":"session","status":"confirmed","speakers":[]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"11:10am-11:30am","room":"Main Stage","type":"session","track":"Software Factories","status":"tentative","speakers":[]},{"title":"The OS runtime personal agents need","description":"Why personal agents that run untrusted LLM code need a sandboxed OS/runtime model, not just a compute sandbox.","day":"Day 2 — Session Day 1","time":"11:10am-11:30am","room":"Track 1","type":"session","track":"Claws & Personal Agents","status":"confirmed","speakers":["Ryan Dahl"]},{"title":"The Best Models Still Reason Like Toddlers","day":"Day 2 — Session Day 1","time":"11:10am-11:30am","room":"Track 2","type":"sponsor","track":"Vision & OCR","status":"tentative","speakers":["Andrew Dai"]},{"title":"The unreasonable effectiveness of BM25 for agentic search","day":"Day 2 — Session Day 1","time":"11:10am-11:30am","room":"Track 3","type":"session","track":"Search & Retrieval","status":"tentative","speakers":["Jo Kristian Bergum"]},{"title":"Claude Managed Agents workshop","day":"Day 2 — Session Day 1","time":"11:10am-11:30am","room":"Track 4","type":"session","track":"Workshops Day 2","status":"tentative","speakers":["Priyanka Phatak"]},{"title":"Build a Platform, Unleash an Agent on it.... and Watch it Burn!","day":"Day 2 — Session Day 1","time":"11:10am-11:30am","room":"Track 5","type":"sponsor","track":"Security","status":"tentative","speakers":["Michael Forrester"]},{"title":"Speech-to-Speech Model Research at Google DeepMind","description":"Most voice interfaces today are built as a 3-way cascade system (ASR/LLM/TTS). 
This session explores the shift toward native speech-to-speech models that process audio end to end, focusing on product and research challenges in building real-time voice agents with fluid turn-taking, low latency, and enterprise-grade intelligence.","day":"Day 2 — Session Day 1","time":"11:10am-11:30am","room":"Track 6","type":"session","track":"Voice & Realtime AI","status":"tentative","speakers":["Valeria Wu"]},{"title":"Modality Misalignment and Originality Attribution in Short-Form Video: A Multi-Agent Approach at Platform Scale","day":"Day 2 — Session Day 1","time":"11:10am-11:30am","room":"Track 7","type":"session","track":"LLM Recsys","status":"tentative","speakers":["Aditya Gautam"]},{"title":"Graham McBain","day":"Day 2 — Session Day 1","time":"11:10am-11:30am","room":"Track 8","type":"session","track":"Forward Deployed Engineering","status":"tentative","speakers":["Graham McBain"]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"11:10am-11:30am","room":"Track 9","type":"session","track":"Data Quality","status":"tentative","speakers":[]},{"title":"M2","day":"Day 2 — Session Day 1","time":"11:10am-11:30am","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"11:10am-11:30am","room":"Leadership 1","type":"session","track":"AI-Native Enterprises","status":"tentative","speakers":[]},{"title":"What If Your Chip Design Team Moved Like a Single Body?","day":"Day 2 — Session Day 1","time":"11:10am-11:30am","room":"Leadership 2","type":"session","track":"AI Architects: Show my Workflow","status":"tentative","speakers":["Khaled Alashmouny"]},{"title":"Give your coding agents the power of turbogrep!","description":"Coding agents can grep the filesystem, but sometimes semantic search is more useful for finding the right files, especially on large codebases.\n\nClaude Code and Codex, unlike Cursor, do not use semantic search for code retrieval. 
There are good reasons for this, but Cursor has consistently demonstrated that semantic retrieval can materially improve code search to improve answer accuracy, increase code retention, and reduce token usage. In this session, we'll share a coding agent plugin for semantic codebase search alongside other modalities (BM25, regex/globbing/grep, filtering), and demonstrate how an agent can choose the right tool for the job. We'll share benchmark-style results that compare answer quality and token consumption with and without semantic retrieval across a small set of representative tasks.","day":"Day 2 — Session Day 1","time":"11:10am-11:30am","room":"Expo Stage 1","type":"session","status":"confirmed","speakers":[]},{"title":"Agents, codebases, and teams: what it actually takes to ship together","description":"Using a coding agent solo is one thing. Getting a whole team to trust agent-written code, agent-run reviews, and long-running agent work is another. That's where most teams stall. This talk is about what it actually takes to get there: how to shape a codebase so agents can work in it safely, how to earn a skeptical team's trust instead of mandating it, and the failure modes that only show up once agents are part of the daily workflow.","day":"Day 2 — Session Day 1","time":"11:10am-11:30am","room":"Expo Stage 4","type":"session","status":"confirmed","speakers":[]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"11:40am-12:00pm","room":"Main Stage","type":"session","track":"AI Designers/Design Engineers","status":"tentative","speakers":[]},{"title":"Your Voice Agent is Just a Walkie-Talkie","description":"Everyone says cascaded voice pipelines are dead and native speech models are the future. Yet production environments are still dominated by STT-LLM-TTS stacks. Reconciling the natural flow of native audio with the elite reasoning of a cascaded agent remains an unsolved systems problem. 
This talk dissects the brutal technical trade-offs behind that counterintuitive reality. We will break down why your voice agent is still stuck behaving like a walkie-talkie and map out the specific technical roadmap required to build full-duplex AI that actually works.","day":"Day 2 — Session Day 1","time":"11:40am-12:00pm","room":"Track 1","type":"session","track":"Claws & Personal Agents","status":"tentative","speakers":["Neil Zeghidour"]},{"title":"Skill issue: stop deploying vision language models, use them with Skills to build e2e vision apps on edge","day":"Day 2 — Session Day 1","time":"11:40am-12:00pm","room":"Track 2","type":"sponsor","track":"Vision & OCR","status":"tentative","speakers":["merve noyan"]},{"title":"The Search Engine for the Agentic Web","day":"Day 2 — Session Day 1","time":"11:40am-12:00pm","room":"Track 3","type":"session","track":"Search & Retrieval","status":"tentative","speakers":["Will Bryk"]},{"title":"Claude Managed Agents workshop","day":"Day 2 — Session Day 1","time":"11:40am-12:00pm","room":"Track 4","type":"session","track":"Workshops Day 2","status":"tentative","speakers":["Priyanka Phatak"]},{"title":"Your LLM Stack Is a 2008 Database With Better Marketing: Why ML Security Is Dominated by Misconfiguration, Not Missing Features","description":"ShadowRay exposed over a billion dollars of data through a missing authentication check. It wasn't a zero-day. It wasn't a clever new attack class. It was a default config someone never flipped off. That story is not the exception in production ML, it's the rule. We synthesized 139 peer-reviewed papers on production ML security across access control, runtime security, infrastructure, and operations. Five findings stood out, and one of them upends how most teams think about ML security: - Misconfiguration, not missing features, is the dominant failure mode. The mechanisms exist. Teams aren't using them, or are using them wrong. 
- Adversarial defenses impose 15–30% inference overhead, which is why almost no production system actually runs them. - ML-specific security tooling lags general DevOps tooling by years. - Security, data-science, and ops teams operate in expertise silos that create persistent gaps no single team can see. - LLM and multi-tenant GPU threats are evolving faster than defenses (prompt injection, RAG poisoning, GPU side channels). This talk walks through the four-pillar defense-in-depth framework, the six-category threat taxonomy that maps each attack to its primary and secondary defenses, and a four-level security maturity model that matches overhead budgets to deployment contexts. You leave knowing where your stack actually sits and which 3 misconfigurations account for most of the risk.","day":"Day 2 — Session Day 1","time":"11:40am-12:00pm","room":"Track 5","type":"sponsor","track":"Security","status":"tentative","speakers":["Lovina Dmello"]},{"title":"Voice Agents Can Just Do Things","description":"This talk argues that speech is becoming a control plane for software rather than just audio input/output. 
It introduces three practical patterns—voice-to-action, systems-to-voice, and voice-to-voice—and explains where realtime reasoning and tool-calling matter, and why chained STT/LLM/TTS systems start to break down as interactions become richer.","day":"Day 2 — Session Day 1","time":"11:40am-12:00pm","room":"Track 6","type":"session","track":"Voice & Realtime AI","status":"tentative","speakers":["Charlie Guo"]},{"title":"NeoLabs — Cognition","day":"Day 2 — Session Day 1","time":"11:40am-12:00pm","room":"Track 9","type":"session","track":"Data Quality","status":"tentative","speakers":["Deniz Birlikci","Sam Lee"]},{"title":"M3","day":"Day 2 — Session Day 1","time":"11:40am-12:00pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Agentic SDLC at Uber","day":"Day 2 — Session Day 1","time":"11:40am-12:00pm","room":"Leadership 1","type":"session","track":"AI-Native Enterprises","status":"tentative","speakers":["Uday Kiran Medisetty"]},{"title":"Data Agents","day":"Day 2 — Session Day 1","time":"11:40am-12:00pm","room":"Leadership 2","type":"session","track":"AI Architects: Show my Workflow","status":"tentative","speakers":["Shawn Wang"]},{"title":"Agentic vs. Vector Search: An Eval-Driven Approach to Coding Agent Performance","day":"Day 2 — Session Day 1","time":"11:40am-12:00pm","room":"Expo Stage 2","type":"session","status":"confirmed","speakers":[]},{"title":"Agents Don't Have Coworkers, They Have Hostages","day":"Day 2 — Session Day 1","time":"11:40am-12:00pm","room":"Expo Stage 3","type":"session","status":"hold","speakers":[]},{"title":"Would your AI agent get the job? A performance review framework for enterprise agents","description":"There are dozens of ways to build an enterprise AI agent: agentic frameworks, direct LLM APIs, conversational AI platforms, vertical SaaS. They all claim to do the job. But how do you actually compare them on the same task, with the same data, against the same KPIs? 
This session presents a vendor-agnostic evaluation framework that treats AI agents the way enterprises treat new hires: set the role, define success criteria, run candidates through identical scenarios, and measure outcomes. The architecture uses any LLM to track positive and negative drift across agents against weighted goals, monitoring everything from hallucination rates and token consumption to user sentiment and conversation quality. Inputs are standardized. Outputs are both quantitative (accuracy, cost, hours saved) and qualitative (tone, clarity). The methodology supports continuous evaluation, not just pre-deployment benchmarks, but ongoing performance reviews that can compare agent work against human baselines. Walk away with a concrete, repeatable process for answering the only question that matters: which agent actually does the job?","day":"Day 2 — Session Day 1","time":"11:40am-12:00pm","room":"Expo Stage 4","type":"session","status":"confirmed","speakers":[]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"12:05pm-12:25pm","room":"Main Stage","type":"session","track":"Claws & Personal Agents","status":"confirmed","speakers":[]},{"title":"Tethered: Our Agents Are Us","day":"Day 2 — Session Day 1","time":"12:05pm-12:25pm","room":"Track 1","type":"session","track":"Claws & Personal Agents","status":"tentative","speakers":["Shu Fang"]},{"title":"SI.inc FDM1","day":"Day 2 — Session Day 1","time":"12:05pm-12:25pm","room":"Track 2","type":"sponsor","track":"Vision & OCR","status":"tentative","speakers":["Standard Intelligence"]},{"title":"If we want them to do Knowledge Work, we need to design Knowledge Agents","description":"It's tempting to assume that just like agents revolutionised coding, they will revolutionize other areas: legal, finance, advertising, and even medicine. All of those have in common that they are fundamentally knowledge work. 
And thankfully, humans have spent thousands of years searching for the best possible workflows for knowledge work. And yet, we seem to be disregarding all of these learnings, forcing every knowledge task into the shape that worked for coding. Today, we're going to talk about the history of knowledge work and how tools were co-designed to support it to understand how we should be building Knowledge Agents, themselves co-designed with their Knowledge Tools. This is key to avoiding falling into a \"good enough\" local optimum: think about legal clerking, a core part of the legal industry where information gathering and reasoning is performed to support the work of senior lawyers. The practice of clerking follows its own code, rules and best practices, which could not have feasibly emerged from studying software engineering: and similarly, there is no reason to believe knowledge agents could emerge from coding agents.","day":"Day 2 — Session Day 1","time":"12:05pm-12:25pm","room":"Track 3","type":"session","track":"Search & Retrieval","status":"tentative","speakers":["Benjamin Clavié"]},{"title":"Claude Managed Agents workshop","day":"Day 2 — Session Day 1","time":"12:05pm-12:25pm","room":"Track 4","type":"session","track":"Workshops Day 2","status":"tentative","speakers":["Priyanka Phatak"]},{"title":"It's 10pm. Do You Know Where Your Agents Are?","description":"Agents right now can sign legal contracts, run untethered, manage your dating profile, conduct financial transactions, and push code to production. Most agents have long-lived API keys and are dangerously overprivileged even when they're not making requests. In this talk, I'll demo how to solve the problem with the right access at the right time. 
You'll walk away knowing how to control agent access whether you're running coding agents from the CLI, building MCP servers, or connecting agents to third-party APIs.","day":"Day 2 — Session Day 1","time":"12:05pm-12:25pm","room":"Track 5","type":"sponsor","track":"Security","status":"tentative","speakers":["Kim Maida"]},{"title":"Realtime Voice Agents with Frontier Intelligence","description":"A deep dive into an EliseAI voice-agent harness that orchestrates multiple models to achieve realtime latency without sacrificing intelligence. The talk covers speculative transcription, async background tool injection, and TTS prefix caching/infilling to reduce latency while preserving capability.","day":"Day 2 — Session Day 1","time":"12:05pm-12:25pm","room":"Track 6","type":"session","track":"Voice & Realtime AI","status":"tentative","speakers":["Bo Li"]},{"title":"NeoLabs — Stealth","day":"Day 2 — Session Day 1","time":"12:05pm-12:25pm","room":"Track 9","type":"session","track":"Data Quality","status":"tentative","speakers":["Irwan Bello"]},{"title":"M4","day":"Day 2 — Session Day 1","time":"12:05pm-12:25pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Scaling Code Quality: Building uReview, Uber’s Multi-Agent Code Review Engine","day":"Day 2 — Session Day 1","time":"12:05pm-12:25pm","room":"Leadership 1","type":"session","track":"AI-Native Enterprises","status":"tentative","speakers":["Neha Singhal"]},{"title":"Serving 2 Million Models Without Melting: Scaling the Hugging Face Hub","day":"Day 2 — Session Day 1","time":"12:05pm-12:25pm","room":"Leadership 2","type":"session","track":"AI Architects: Show my Workflow","status":"tentative","speakers":["Arek Borucki"]},{"title":"The Death of Keyword Search and the Rise of Agent-Readable Catalogs","day":"Day 2 — Session Day 1","time":"12:05pm-12:25pm","room":"Expo Stage 1","type":"session","track":"Expo Stage 1","status":"confirmed","speakers":[]},{"title":"Why building 
agent quality platforms is hard.","description":"An eval platform is not just a test runner. You are building shared definitions of \"good,\" reliable data pipelines, labeling workflows, versioning, and trust in results across many teams and model changes. This session breaks down the hidden complexity, the common failure modes, and the design principles that make evals credible and usable in day-to-day engineering.","day":"Day 2 — Session Day 1","time":"12:05pm-12:25pm","room":"Expo Stage 2","type":"session","status":"confirmed","speakers":[]},{"title":"TogetherAI 1 of 2","day":"Day 2 — Session Day 1","time":"12:05pm-12:25pm","room":"Expo Stage 3","type":"session","status":"confirmed","speakers":[]},{"title":"Self-Improving Agents That Teach the Company Back","description":"Agents forget too much. A run might solve a customer escalation, debug a deployment, or figure out the review pattern for a tricky code path, then the knowledge disappears into a transcript.\n\nAt Runlayer, we started treating that knowledge as a product surface. Skills are reviewable, editable instructions that agents can load over MCP. An agent can start with a task, learn something useful while doing the work, and draft or update a private skill from that run. That skill loads into future runs for the same agent, stays inspectable by humans, and can eventually graduate into a team or org-level skill.\n\nThe flywheel gets more interesting once a skill becomes useful beyond the agent that created it. A learned skill can move from one agent's private memory into shared organizational knowledge, then become available through the Runlayer plugin inside Claude Code, ChatGPT, and other AI clients employees already use. 
The agent does the work, captures the playbook, and the company gets better at that work everywhere agents are used.\n\nThis talk walks through the architecture and product choices behind self-improving skills: post-run distillation, skill mutation tools, private-by-default scoping, runtime loading, UI inspection, promotion into shared skills, and the safety boundary between \"this agent learned something\" and \"everyone should now use it.\" The goal is an agent that leaves behind a better handbook for the next person, the next run, and eventually the whole organization.","day":"Day 2 — Session Day 1","time":"12:05pm-12:25pm","room":"Expo Stage 4","type":"session","status":"confirmed","speakers":[]},{"title":"Spin at the Gate Until Green: The Engineering Primitives Behind Self-Driving Codebases","day":"Day 2 — Session Day 1","time":"1:30pm-1:50pm","room":"Main Stage","type":"session","track":"Software Factories","status":"tentative","speakers":["Andrew Orobator"]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"1:30pm-1:50pm","room":"Track 1","type":"session","track":"Claws & Personal Agents","status":"tentative","speakers":[]},{"title":"Everybody Gets a Digital Clone! (Part 1 of 3)","description":"Walk out of this workshop with a deployed digital clone that makes your phone calls for you. We will skip the theory and immediately get our hands dirty wiring together OpenClaw, Twilio, and Gradium to build an autonomous voice agent on a live cellular network. You will tackle the hardest parts of real-time telephony: routing audio streams, handling human interruption, and killing latency. 
In 60 minutes, your AI will be ready to call restaurants for the daily special, book appointments, and actively negotiate on your behalf.","day":"Day 2 — Session Day 1","time":"1:30pm-1:50pm","room":"Track 4","type":"session","track":"Workshops Day 2","status":"tentative","speakers":["Neil Zeghidour"]},{"title":"Tolan: Voice-First AI Companion","day":"Day 2 — Session Day 1","time":"1:30pm-1:50pm","room":"Track 6","type":"session","track":"Voice & Realtime AI","status":"tentative","speakers":["Paula Rambles"]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"1:30pm-1:50pm","room":"Track 9","type":"session","track":"Data Quality","status":"tentative","speakers":[]},{"title":"M5","day":"Day 2 — Session Day 1","time":"1:30pm-1:50pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Every Agent, Everywhere, All at Once","day":"Day 2 — Session Day 1","time":"1:30pm-1:50pm","room":"Expo Stage 1","type":"session","status":"confirmed","speakers":[]},{"title":"Stop prompting","description":"In this talk I dive into usage of tooling, type systems and frameworks to enforce guardrails and limit slop produced by AI agents inside large codebases.","day":"Day 2 — Session Day 1","time":"1:30pm-1:50pm","room":"Expo Stage 4","type":"session","status":"confirmed","speakers":[]},{"title":"Agents should talk to each other, so we built the protocol","day":"Day 2 — Session Day 1","time":"1:55pm-2:15pm","room":"Main Stage","type":"session","track":"Software Factories","status":"tentative","speakers":["Zach Lloyd"]},{"title":"Governance Is the Real Bottleneck to AI ROI","day":"Day 2 — Session Day 1","time":"1:55pm-2:15pm","room":"Track 1","type":"session","track":"Claws & Personal Agents","status":"tentative","speakers":["David Hsu"]},{"title":"From Scratch to SOTA: Training a 3B State-Space Vision Model for 1.4 Billion People","day":"Day 2 — Session Day 1","time":"1:55pm-2:15pm","room":"Track 2","type":"sponsor","track":"Vision & 
OCR","status":"tentative","speakers":["Krishna Srinivasan"]},{"title":"Don't Summarize. Sample. — How YouTube Re-Built Search for the LLM Era","day":"Day 2 — Session Day 1","time":"1:55pm-2:15pm","room":"Track 3","type":"session","track":"Search & Retrieval","status":"tentative","speakers":["Mihnea Munteanu"]},{"title":"Everybody Gets a Digital Clone! (Part 2 of 3)","description":"Continuation of Neil Zeghidour's hands-on workshop on building a deployed digital clone for real-time phone calls using OpenClaw, Twilio, and Gradium.","day":"Day 2 — Session Day 1","time":"1:55pm-2:15pm","room":"Track 4","type":"session","track":"Workshops Day 2","status":"tentative","speakers":["Neil Zeghidour"]},{"title":"We Gave an Agent Production Code Access and Then Tried to Sleep at Night","description":"We let an agent touch production code to fix CVEs. That is either automation or a supply chain incident, depending on how honest your architecture is. PatchPilot started simple: find vulnerable dependencies, patch them, open a PR, let CI prove the fix, move on. Then reality showed up. The agent needed repository access, CI logs, credentials, and a Docker socket. Without that, it was useless. With it, every security reviewer in the room had a point. This is the production case study: what we gave the agent, what we refused, what infosec pushed back on, and where they were right. We will cover scoped permissions, constrained PRs, audit trails, approval gates, CI evidence, credential boundaries, and the gap between \"it generated a patch\" and \"we can defend this change.\" Agentic remediation is not just developer productivity. 
It is a new participant in your software supply chain.","day":"Day 2 — Session Day 1","time":"1:55pm-2:15pm","room":"Track 5","type":"sponsor","track":"Security","status":"tentative","speakers":["Moritz Johner"]},{"title":"5 Voice Agent Failure Modes You'll Hit in Week One","description":"A practical talk on the five voice-agent failures teams hit immediately in production: interruptions, turn-taking misfires, compounding latency, hallucinated actions, and audio/transcription mismatches. Each failure comes with a real example and a concrete fix.","day":"Day 2 — Session Day 1","time":"1:55pm-2:15pm","room":"Track 6","type":"session","track":"Voice & Realtime AI","status":"tentative","speakers":["Vyas A"]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"1:55pm-2:15pm","room":"Track 9","type":"session","track":"Data Quality","status":"tentative","speakers":[]},{"title":"M6","day":"Day 2 — Session Day 1","time":"1:55pm-2:15pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"AI Evals Platform for Cross-Functional Teams at Scale","day":"Day 2 — Session Day 1","time":"1:55pm-2:15pm","room":"Leadership 1","type":"session","track":"AI-Native Enterprises","status":"tentative","speakers":["Nachiket Paranjape"]},{"title":"IT Admin for the AI Workforce: Why Your AI Agents Will Need Their Own IT Department","day":"Day 2 — Session Day 1","time":"1:55pm-2:15pm","room":"Leadership 2","type":"session","track":"AI Architects: Show my Workflow","status":"tentative","speakers":["Aman Raj"]},{"title":"Voice Agents Are Mostly Invisible. Here's How to See Them.","description":"Voice agents are one of the fastest-growing and hardest-to-debug categories: the failures live in latency, turn-taking, transcription drift, and tone — none of which show up in a text log. We demo Voice traces and Session views, following a real voice session span by span, and Voice evals for scoring what text-only observability can't reach. 
A short, differentiated session on a problem most of the room is about to hit and few tools address.","day":"Day 2 — Session Day 1","time":"1:55pm-2:15pm","room":"Expo Stage 2","type":"session","status":"confirmed","speakers":[]},{"title":"Deploying browser agents at scale","description":"Not every browser agent trajectory is the same, and treating them like they are is how teams quietly burn budget on agents that never ship. This talk walks through the two trajectory types behind every browser agent, the cost/performance/maintainability tradeoffs that decide whether they hold up, and the concrete patterns for evaluating, hardening, and iterating on them.","day":"Day 2 — Session Day 1","time":"1:55pm-2:15pm","room":"Expo Stage 4","type":"session","status":"confirmed","speakers":[]},{"title":"What we learned by analyzing 1M AI-generated PRs","description":"Charlie Holtz (@charlieholtz) - Founder & CEO of Conductor (launched Jul 2025). Mac app for orchestrating multiple coding agents in parallel. Used at YC cos, Linear, Vercel, Notion, Supabase. Ex-Replicate, ex-Point72, Brown CS. Angle: parallel agent orchestration / humans-as-conductors - distinct from the rest of the Coding Agents track. Ref tweet: https://x.com/charlieholtz/status/2047351098634338610","day":"Day 2 — Session Day 1","time":"2:25pm-2:45pm","room":"Main Stage","type":"session","track":"Software Factories","status":"tentative","speakers":["Daksh Gupta"]},{"title":"Tool Execution layer for agents","day":"Day 2 — Session Day 1","time":"2:25pm-2:45pm","room":"Track 1","type":"session","track":"Claws & Personal Agents","status":"tentative","speakers":["Karan Vaidya"]},{"title":"You’re Not Thinking Big Enough: Rebuilding Food Systems from First Principles with AI Agents","day":"Day 2 — Session Day 1","time":"2:25pm-2:45pm","room":"Track 2","type":"sponsor","track":"Vision & OCR","status":"tentative","speakers":["Cody Menefee"]},{"title":"Everybody Gets a Digital Clone! 
(Part 3 of 3)","description":"Final continuation of Neil Zeghidour's hands-on workshop on building a deployed digital clone for real-time phone calls using OpenClaw, Twilio, and Gradium.","day":"Day 2 — Session Day 1","time":"2:25pm-2:45pm","room":"Track 4","type":"session","track":"Workshops Day 2","status":"tentative","speakers":["Neil Zeghidour"]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"2:25pm-2:45pm","room":"Track 5","type":"sponsor","track":"Security","status":"tentative","speakers":[]},{"title":"I Monitored Crime Audio. Voice Agents Scare Me More.","description":"This talk reframes bad voice-agent calls as incident scenes and introduces a voice-agent forensics loop spanning transcript, waveform, latency waterfall, interruption points, ASR uncertainty, tool traces, system-of-record state, and outcomes. It focuses on monitoring, regression, and release-discipline for production voice systems.","day":"Day 2 — Session Day 1","time":"2:25pm-2:45pm","room":"Track 6","type":"session","track":"Voice & Realtime AI","status":"tentative","speakers":["Sumanyu Sharma"]},{"title":"General Reasoning for Long-Horizon Agent Models","description":"Long-horizon agent models, reasoning loops, and the data/eval stack needed to make them reliable.","day":"Day 2 — Session Day 1","time":"2:25pm-2:45pm","room":"Track 9","type":"session","track":"Data Quality","status":"tentative","speakers":["Ross Taylor"]},{"title":"M7","day":"Day 2 — Session Day 1","time":"2:25pm-2:45pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Productionizing LLM Gateways: Architecture, Tradeoffs, and Hard Lessons from the Trenches","day":"Day 2 — Session Day 1","time":"2:25pm-2:45pm","room":"Leadership 1","type":"session","track":"AI-Native Enterprises","status":"tentative","speakers":["Kanish Manuja"]},{"title":"Beyond the Benchmark: the New Frontier of Enterprise AI Reliability","day":"Day 2 — Session Day 
1","time":"2:25pm-2:45pm","room":"Leadership 2","type":"session","track":"AI Architects: Show my Workflow","status":"tentative","speakers":["Nick Heiner"]},{"title":"Beyond Golden Signals: Monitoring in the Age of GenAI","description":"The four golden signals (Latency, Errors, Traffic, Saturation) have been the foundation of application monitoring for years, and they still matter, but for GenAI applications these signals alone leave significant blind spots. A request can return 200 OK with low latency while the response hallucinates, leaks PII, or costs much more than expected.\n\nThis talk will walk you through what changes when you're monitoring non-deterministic, token-priced, prompt-injectable systems. We'll cover three additional monitoring dimensions: Cost (token attribution, model-mix tracking, wasted spend on failed requests), Safety (prompt injection detection, PII scanning, jailbreak attempts), and Quality (hallucination rate, relevance scoring, user satisfaction), and show why each one is necessary alongside your existing instrumentation.","day":"Day 2 — Session Day 1","time":"2:25pm-2:45pm","room":"Expo Stage 1","type":"session","status":"confirmed","speakers":[]},{"title":"Continuous Engineering: Software Development for the Age of Agents","description":"AI has changed everything about how we write code. But the hard parts of building software have gotten even harder: aligning your team, maintaining architectural integrity, and worst of all, reviewing the oceans of agent-driven code.\n\nThe tools and processes we rely on – git pull requests; code review – were built for emailing patch files. We need a new paradigm.\n\nIn this talk, we're going to explore Continuous Engineering, a new approach to software development that treats the agent thread as the core unit of collaboration. 
Branches should be as cheap as ideas, code should carry the context of the conversation that generated it, and the work should be available to your colleagues (and their agents) as it happens. We'll walk through what this looks like in practice, and what we're building to make it possible.","day":"Day 2 — Session Day 1","time":"2:25pm-2:45pm","room":"Expo Stage 4","type":"session","status":"confirmed","speakers":[]},{"title":"QwenPaw: building AI that you can trust","day":"Day 2 — Session Day 1","time":"2:50pm-3:10pm","room":"Track 1","type":"session","track":"Claws & Personal Agents","status":"tentative","speakers":["Eric Zhu"]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"2:50pm-3:10pm","room":"Track 2","type":"sponsor","track":"Vision & OCR","status":"hold","speakers":[]},{"title":"Stop Chunking Like It's 2022","day":"Day 2 — Session Day 1","time":"2:50pm-3:10pm","room":"Track 3","type":"session","track":"Search & Retrieval","status":"tentative","speakers":["Yuval Belfer"]},{"title":"Setting Yourself Up for Success — Part 1","day":"Day 2 — Session Day 1","time":"2:50pm-3:10pm","room":"Track 4","type":"session","track":"Workshops Day 2","status":"tentative","speakers":["Jason Liu"]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"2:50pm-3:10pm","room":"Track 5","type":"sponsor","track":"Security","status":"tentative","speakers":[]},{"title":"Teaching Agents to Search: Building Synthetic Training Pipelines with NVIDIA Data Designer","description":"Modern agentic systems often fail because the right training data simply does not exist. Search agents are a perfect example: if you want a model to browse the web effectively, you need high-quality multi-step trajectories that teach it how to search, refine queries, inspect sources, and recover from dead ends. 
In this session, attendees will learn how NVIDIA used Data Designer to build synthetic supervised fine-tuning data for search-capable Nemotron models, including how to define task structure, generate seed examples, produce realistic search trajectories, filter low-quality generations, and convert traces into training-ready records. The session will also cover BrowseComp-style tasks, tool-use rollouts, validation, dataset curation, and a reusable framework for designing custom datasets for specialized behaviors across reasoning, tool use, and domain-specific applications.","day":"Day 2 — Session Day 1","time":"2:50pm-3:10pm","room":"Track 9","type":"session","track":"Data Quality","status":"tentative","speakers":["Dhruv Nathawani"]},{"title":"M8","day":"Day 2 — Session Day 1","time":"2:50pm-3:10pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"From AI-Assisted to AI-Native: Building a Frontier Development Team","description":"When features that took two weeks now ship in an afternoon, the bottleneck shifts from writing code to making decisions. Frontier teams have discovered this firsthand, achieving 3-10x productivity gains by fundamentally rethinking how developers work with AI agents. This talk covers the practices that separate frontier teams from those who merely \"sprinkle\" AI on their existing workflows: running agents asynchronously for hours, investing in comprehensive agent steering files, enabling local integration testing for agent self-correction, and automating everything from coding to operations to documentation. 
You'll learn how teams at Amazon slowed down to speed up, the temporary productivity dips they accepted, and the organizational changes required to sustain this velocity.","day":"Day 2 — Session Day 1","time":"2:50pm-3:10pm","room":"Leadership 1","type":"session","track":"AI-Native Enterprises","status":"tentative","speakers":["Clare Liguori"]},{"title":"How I automate my own job at Hugging Face using agents","day":"Day 2 — Session Day 1","time":"2:50pm-3:10pm","room":"Leadership 2","type":"session","track":"AI Architects: Show my Workflow","status":"tentative","speakers":["Niels Rogge"]},{"title":"Everyone Gets A Software Company","day":"Day 2 — Session Day 1","time":"3:20pm-3:40pm","room":"Track 1","type":"session","track":"Claws & Personal Agents","status":"tentative","speakers":["Ben Guo"]},{"title":"Jia-Bin Huang","day":"Day 2 — Session Day 1","time":"3:20pm-3:40pm","room":"Track 2","type":"sponsor","track":"Vision & OCR","status":"tentative","speakers":["Jia-Bin Huang"]},{"title":"What We Learned After One Year of Building Our Deep Research System","day":"Day 2 — Session Day 1","time":"3:20pm-3:40pm","room":"Track 3","type":"session","track":"Search & Retrieval","status":"tentative","speakers":["Paul Iusztin"]},{"title":"Setting Yourself Up for Success — Part 2","day":"Day 2 — Session Day 1","time":"3:20pm-3:40pm","room":"Track 4","type":"session","track":"Workshops Day 2","status":"tentative","speakers":["Jason Liu"]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"3:20pm-3:40pm","room":"Track 5","type":"sponsor","track":"Security","status":"tentative","speakers":[]},{"title":"\"My name is... my name is...\": A Linguistic Framework for Debugging Voice AI Failures","description":"Every voice AI engineer has heard it: a caller repeating their name three times, getting more frustrated with each attempt. The logs look clean. Confidence scores look fine. Linguistics can help solve the mystery. 
By the end of this talk, you'll have a diagnostic framework for the failures that slip past standard metrics, a way to turn \"the agent just didn't get it\" into concrete, debuggable failure modes. The framework maps three levels of linguistic structure (sounds, words, and interactions) against the two dimensions every voice agent engineer already works in: what we hear (speech recognition) and what we speak (speech synthesis). That 3×2 grid surfaces problems your current tooling can't see, including: 1. Why your user cannot make your system understand their name 2. Why a single well-intentioned vocabulary hint can cause catastrophic drops in a non-English language 3. Why a transcript that's \"cumulatively correct\" can still ruin the user experience Drawing on examples from production multilingual voice AI work, I'll show where linguistic expertise connects to the engineering decisions you're already making and where it reveals failure modes that confidence scores will never warn you about. 
Who this is for: Voice AI engineers, ML practitioners on Voice AI pipelines, and anyone who's watched clean logs while their agent quietly fails real users.","day":"Day 2 — Session Day 1","time":"3:20pm-3:40pm","room":"Track 6","type":"session","track":"Voice & Realtime AI","status":"tentative","speakers":["Midam Kim"]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"3:20pm-3:40pm","room":"Track 9","type":"session","track":"Data Quality","status":"hold","speakers":[]},{"title":"M9","day":"Day 2 — Session Day 1","time":"3:20pm-3:40pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"How to Get Your Org to Adopt Coding Agents (Without Shipping Garbage)","day":"Day 2 — Session Day 1","time":"3:20pm-3:40pm","room":"Leadership 1","type":"session","track":"AI-Native Enterprises","status":"tentative","speakers":["Eyal Blum"]},{"title":"Your Fine-Tuned Model Is Tech Debt: A 50x ROI House of Cards","day":"Day 2 — Session Day 1","time":"3:20pm-3:40pm","room":"Leadership 2","type":"session","track":"AI Architects: Show my Workflow","status":"tentative","speakers":["Dan Bjornn"]},{"title":"From Context to Memory: Your Agents Need a Real Memory Layer","description":"Most agents don't really have memory. They have a context window, a pile of temporary files, maybe an AGENTS.md, and a retrieval step that attempts to build state from whatever the model can still see. You've seen the flashy demos, but these systems fall apart when an agent needs to recover from failure, revisit prior work, and observe if failures are less frequent over time.\nThis talk explores agent memory as a systems problem. Effective memory isn't just storing data: it's an evolving knowledge layer with write filtering, consolidation, reflection, and forgetting. Agents need persistence, and they also need structure. Raw logs and Markdown scratchpads aren't enough. 
A real memory layer weights recency, combines retrieval techniques, and correlates episodic memories.\nSerious agent memory is inherently multi-model. The best systems use full-text search, semantic retrieval, graph relationships, and structured state to reconstruct context with far more precision than filesystem grep alone. This is where databases become essential as the foundation for \"real memory.\"\nMemory shapes how agents behave, adapt, and improve over time.","day":"Day 2 — Session Day 1","time":"3:20pm-3:40pm","room":"Expo Stage 2","type":"session","status":"confirmed","speakers":[]},{"title":"Running a 20T-Token Data Pipeline: Infrastructure Lessons from Production","description":"The problem. Curation algorithms tend to get the spotlight: model-based quality filtering, embedding-based deduplication, synthetic generation at scale, target distribution matching. The engineering behind them, the systems that actually run those algorithms reliably on petabytes of data and thousands of GPUs, usually gets overlooked. This session is about the engineering.\nWhat we built. The infrastructure behind two production data curation pipelines, on two very different shapes of workload:\n\nArcee Trinity-Large-Thinking — three model generations in nine months, with the curated corpus scaling from 8T to 10T to 20T tokens. Trinity-Large's 20T-token corpus included 8T+ synthetic tokens generated on clusters peaking at 2,048 H100 GPUs. Each generation incorporated deeper curation and broader domain coverage; the pipeline ran end-to-end multiple times, not once.\n\n\nThomson Reuters legal — 100B tokens of mid-training output, generated from TR's proprietary legal corpus, delivered as a deployment artifact and plugged into their existing SFT and DPO post-training. Different operational profile entirely: smaller scale, sensitive data, customer-environment integration.\n\nWhat you'll learn about.\n\nThe metadata bottleneck. 
At trillion-token scale, fetching metadata from object storage across millions of files becomes the dominant source of idle time. We offload metadata management to Spark and use a lightweight file-level distribution scheme to drive idle time to near zero.\n\n\nFault tolerance at multi-week scale. Long-running GPU inference jobs fail. We use one-to-one partition mapping between Spark and Ray jobs to get idempotent, resumable execution. A node failure no longer means reprocessing the dataset.\n\n\nHeterogeneous workload scheduling. Curation pipelines mix CPU-heavy preprocessing (Spark) with GPU-heavy inference (Ray + vLLM). An in-house scheduler routes each job type to isolated node pools, preventing resource fragmentation and ensuring critical training jobs aren't blocked by upstream CPU work.\n\n\nInference tuning across models. vLLM defaults aren't right for every model. Tuning batch size, speculative decoding, and n-gram sampling per-model yields up to 40% throughput improvement, without over-engineering.\n\n\nPipeline reproducibility. Treating a curated training corpus as a versioned deployment artifact rather than a one-off output. What that enables when a customer wants to run mid-training against a pre-trained base.","day":"Day 2 — Session Day 1","time":"3:20pm-3:40pm","room":"Expo Stage 3","type":"session","status":"confirmed","speakers":[]},{"title":"How to prepare unstructured data for AI","description":"An enterprise's first internal AI project worked brilliantly in the POC. But in production, the data became massive and messy: relevance and quality were unclear, sensitive information couldn't easily be filtered out, and there was no metadata to point AI toward the right answers. 
See how adding a curation and enrichment layer before ingestion cuts unstructured data prep from months to days, lowers compliance risk, and improves AI accuracy.","day":"Day 2 — Session Day 1","time":"3:20pm-3:40pm","room":"Expo Stage 4","type":"session","status":"hold","speakers":[]},{"title":"Every Harness Will Become A Claw","day":"Day 2 — Session Day 1","time":"3:45pm-4:05pm","room":"Track 1","type":"session","track":"Claws & Personal Agents","status":"tentative","speakers":["Sam Bhagwat"]},{"title":"Perceptron Mk1 — Perceptron Inc","day":"Day 2 — Session Day 1","time":"3:45pm-4:05pm","room":"Track 2","type":"sponsor","track":"Vision & OCR","status":"tentative","speakers":["Armen Aghajanyan"]},{"title":"What We Learned After One Year of Building Our Deep Research System","day":"Day 2 — Session Day 1","time":"3:45pm-4:05pm","room":"Track 3","type":"session","track":"Search & Retrieval","status":"tentative","speakers":["Paul Iusztin"]},{"title":"Setting Yourself Up for Success — Part 3","day":"Day 2 — Session Day 1","time":"3:45pm-4:05pm","room":"Track 4","type":"session","track":"Workshops Day 2","status":"tentative","speakers":["Jason Liu"]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"3:45pm-4:05pm","room":"Track 5","type":"sponsor","track":"Security","status":"tentative","speakers":[]},{"title":"The Goldilocks problem: when your Robot asks too much — or acts too soon.","description":"Embodied agents are crossing from answering questions to taking physical actions — moving a box, turning a wheel — and people will command them by voice, because voice is the fastest, most natural interface we have. But voice is also the most error-prone, and when a misheard command drives a physical action, the failure isn't a wrong answer; it's human harm, damage, or an expensive, irreversible mistake. The field has never needed a serious way to handle voice-command errors, because informational agents made them cheap. Embodiment ends that. 
This talk replaces the usual hand-waving — \"don't ask too much, don't get it wrong too much\" — with a single number you can optimize. The core idea: both confirming and erring cost the user. A confirmation is friction — attention, time, a delayed action; a wrong action is a mistake cost, often higher given physical harm or expense. Put them on one ledger and you can measure a voice interface as average user cost per command, and make minimizing it the system's objective. From that falls a non-obvious rule — you confirm or not based on both cost and uncertainty: an expected value. I'll frame confirmation as just one option alongside acting, disambiguation (choices), and deferring; reason at the level of goals rather than low-level motion; walk the architecture (task hypotheses → user-cost model → confirmation policy); and show eval results from a simulated environment measuring regret against oracle behavior. I'll close with what worked applying this to voice in smart TVs, speakers, and navigation — and a challenge to bring this metric to robots, cars, and wearables before the errors do.","day":"Day 2 — Session Day 1","time":"3:45pm-4:05pm","room":"Track 6","type":"session","track":"Voice & Realtime AI","status":"tentative","speakers":["Amit Desai"]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"3:45pm-4:05pm","room":"Track 9","type":"session","track":"Data Quality","status":"hold","speakers":[]},{"title":"M10","day":"Day 2 — Session Day 1","time":"3:45pm-4:05pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"How Uber Built AI Agents That Save 21,000 Developer Hours with LangGraph","description":"Uber Developer Platform team on building AI agents that save 21,000 developer hours with LangGraph; AutoCover, Validate, and the agent stack around LangGraph, LangChain, and LangFX.","day":"Day 2 — Session Day 1","time":"3:45pm-4:05pm","room":"Leadership 1","type":"session","track":"AI-Native 
Enterprises","status":"tentative","speakers":["Matas Rastenis","Sourabh Shirhatti"]},{"title":"From Chatbots to Agents: How Reducto builds for Agent Experience to Enable Real Work","description":"Many agent demos work. Most agent systems in production don't. The gap usually isn't the model or the tools. It's everything in between: how context gets structured, how multi-step tasks stay on track, how you handle the edge cases that only show up when real scenarios from real customers hit your pipeline.\n\nAt https://reducto.ai/, we've spent the last couple of months building agent-first workflows for some of the most document-heavy industries out there. We've hit most of the failure modes you're probably hitting too.\n\nThis talk shares what we've learned, from how to think about Agent Experience (AX) as a design layer, to the specific decisions that make complex workflows actually reliable in production.\nYou'll walk away with tactical approaches to structuring context, model guidance, designing recoverable workflows, and building the feedback loops that let your system improve over time without a full rebuild.","day":"Day 2 — Session Day 1","time":"3:45pm-4:05pm","room":"Expo Stage 2","type":"session","status":"confirmed","speakers":[]},{"title":"Towards Reliable Financial Agents: How a 4B Model Outsmarted a 235B Giant","description":"Large generalist models reason well, but that does not necessarily imply specialized knowledge or tool-calling capabilities. They can still hallucinate column names, ignore constraints, and generate SQL that returns nonsensical results. The problem isn't intelligence—it's reliability and specialization.\n\n\nIn this talk we'll show how a 4B model was fine-tuned to outperform a 235B model on real financial analysis tasks. The key was not adding more reasoning ability, but enforcing tool discipline. 
Using synthetic data generation and reinforcement learning with the open-source rLLM framework, the model learned to explore schemas, validate outputs, and retry failures instead of hallucinating confident nonsense.\n\n\nOne key result: tool-use fundamentals generalize. Training on simple tool interactions transferred to much harder, multi-step financial tasks. If you're building LLM systems that interact with databases, APIs, or internal tools, this talk focuses on the behaviors that actually matter — and how to teach them without frontier-scale compute.","day":"Day 2 — Session Day 1","time":"3:45pm-4:05pm","room":"Expo Stage 3","type":"session","status":"hold","speakers":[]},{"title":"AI Enablement at Automattic: How a Remote Company Builds AI Fluency","description":"Automattic is a remote company. About 600 of us will step away from regular work this year for an immersive AI program. That's a little over a third of the company. This talk walks through a field report of what we built and why: the curriculum, the cohort design, and what we've learned about making AI fluency work across a distributed organization.","day":"Day 2 — Session Day 1","time":"3:45pm-4:05pm","room":"Expo Stage 4","type":"session","status":"hold","speakers":[]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"4:30pm-4:50pm","room":"Main Stage","type":"keynote","track":"Software Factories","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 2 — Session Day 1","time":"4:50pm-5:10pm","room":"Main Stage","type":"keynote","track":"Software Factories","status":"hold","speakers":[]},{"title":"Gadgets: Personal app vibe coding that is actually safe","description":"We are entering the end game of Kenton's 15-year master plan. The architect of Cloudflare Workers, Durable Objects, Cap'n Proto, and Sandstorm.io, and the guy who coined the term \"Code Mode\", will demo Gadgets, an AI productivity suite which ties all these ideas together. 
We've all heard that the future is micro-apps customized for every niche, but how do we actually make that usable, how do we make it scale, and most importantly, how do we make it safe for even non-developers to use? Kenton will show how Gadgets solves these problems, including a sandbox design that makes it essentially impossible for apps to have vulnerabilities at all. He'll then open source it for your slop-forking pleasure.","day":"Day 2 — Session Day 1","time":"5:10pm-5:30pm","room":"Main Stage","type":"keynote","track":"Software Factories","status":"tentative","speakers":["Kenton Varda"]},{"title":"Harbor Launch and Arize Track Intro","description":"TBD — Harbor launch keynote/session details to be finalized.","day":"Day 3 — Session Day 2","time":"9:00am-9:10am","room":"Main Stage","type":"keynote","track":"Autoresearch","status":"tentative","speakers":["TBD"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"9:10am-9:30am","room":"Main Stage","type":"keynote","track":"Autoresearch","status":"tentative","speakers":[]},{"title":"Sonar keynote — Tariq Shaukat","description":"TBD — keynote from Sonar's CEO.","day":"Day 3 — Session Day 2","time":"9:30am-9:50am","room":"Main Stage","type":"keynote","track":"Coding Agents & Software Factories","status":"tentative","speakers":["Tariq Shaukat"]},{"title":"Amazon AGI","day":"Day 3 — Session Day 2","time":"9:50am-10:10am","room":"Main Stage","type":"keynote","track":"Autoresearch","status":"hold","speakers":["Amazon AGI"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"10:10am-10:30am","room":"Main Stage","type":"keynote","track":"Autoresearch","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"10:45am-11:05am","room":"Main Stage","type":"session","track":"Autoresearch","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"10:45am-11:05am","room":"Track 1","type":"session","track":"Sandbox & Platform 
Engineering","status":"tentative","speakers":[]},{"title":"Building the simulation infrastructure for practical world model use","description":"What is the most important capability for world model applications and the pursuit of embodied AI? We believe it is not a question of having the most beautiful pixels but the ability to reason about causality in multimodal environments. At Moonlake, we are working on building action-conditioned multimodal world models which provide spatial and physical state consistency over long time periods. We believe that building and training on synthetic worlds provides the data- and compute-efficient path to truly useful world models. We are building the simulation infrastructure platform for companies that need to build and manage worlds (assets, scenes, digital twins) at scale, including robotics/autonomy teams, digital factory operators, and game authors. Our product today primarily finds applicability in simulation and the operationalization of digital twins. Simulation can include training robotics, world models for AGI research, autonomous vehicles, or content creation for media and entertainment. Operationalization of digital twins involves the reconstruction of scans into reusable assets, e.g., turning image and point-cloud scans into sim-ready assets for digital factory integration projects. We are building toward a future where AI systems do not just generate worlds, but understand how they work. Moonlake learns from each workflow: the more workflows, failures, and human interventions that Moonlake sees, the better it becomes at reconstructing, validating, and preparing complex simulation worlds. 
The session will include discussion and demos.","day":"Day 3 — Session Day 2","time":"10:45am-11:05am","room":"Track 2","type":"sponsor","track":"Robotics & World Models","status":"tentative","speakers":["Christopher Manning"]},{"title":"Continual Learning Bench","day":"Day 3 — Session Day 2","time":"10:45am-11:05am","room":"Track 3","type":"session","track":"Memory & Continual Learning","status":"confirmed","speakers":["Parth Asawa"]},{"title":"Build realtime multimodal agents with Gemini Live","description":"The Gemini Live API is incredibly versatile when it comes to building realtime AI experiences, from live translation across 2000 different language pairs to realtime multimodal agents that can work across text, audio, and vision. This workshop gets you from zero to a fully conversational agent in a matter of hours.","day":"Day 3 — Session Day 2","time":"10:45am-11:05am","room":"Track 4","type":"session","track":"Workshops Day 2","status":"tentative","speakers":["Thor 雷神 Schaeff"]},{"title":"Vending-Bench: Long-Horizon Agent Evals for a Simulated Vending Business","description":"Long-horizon agent evals via a simulated vending machine business, testing negotiation, pricing, and supplier management over 365 days.","day":"Day 3 — Session Day 2","time":"10:45am-11:05am","room":"Track 5","type":"sponsor","track":"Evals","status":"tentative","speakers":["Andon Labs"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"10:45am-11:05am","room":"Track 6","type":"session","track":"AI Designers/Design Engineers","status":"tentative","speakers":[]},{"title":"Computer Use at the Edge of the Statistical Precipice","description":"Evaluating Computer Use Agents (CUAs) on interactive environments is fraught with methodological pitfalls that the field has yet to systematically address. 
We show that a 1MB replay script that blindly executes a recorded action sequence without ever observing the screen outperforms frontier models on prominent static benchmarks, and prove that its expected success rate is exactly equal to the source agent's pass@k in deterministic environments. We trace this and other failures to two root causes: non-principled environment design (static, unsandboxed, or unreliably verified environments) and non-principled evaluation methodology (naive aggregation and misuse of pass@k for stateful UI interactions). To address the first, we propose PRISM, five design principles for CUA environments and instantiate them in DigiWorld, a benchmark of 15 realistic sandboxed mobile applications able to evaluate agents in over 3.2 million verified unique configurations. To address the second, we develop an aggregation framework that correctly accounts for the nested structure of CUA benchmarks. All together, we show that principled environment design and rigorous evaluation methodology are not optional refinements but prerequisites for meaningful CUA research.","day":"Day 3 — Session Day 2","time":"10:45am-11:05am","room":"Track 7","type":"session","track":"Computer Use","status":"tentative","speakers":["Pierluca D'Oro"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"10:45am-11:05am","room":"Track 8","type":"session","track":"Context Engineering","status":"tentative","speakers":[]},{"title":"What's next after RLHF?","description":"RLHF was a massive commercial success: roughly 100% of LLM usage is through RLHF’d models - but it was in many ways also a research failure. 
Let’s talk about how it conquered the world, how it defied its creators’ expectations, why AI is in the bimodal state it’s in (is it a bubble or a machine god?), and how to make AI actually transform the economy.","day":"Day 3 — Session Day 2","time":"10:45am-11:05am","room":"Track 9","type":"session","track":"Posttraining & Midtraining","status":"tentative","speakers":["Diogo Almeida"]},{"title":"M1","day":"Day 3 — Session Day 2","time":"10:45am-11:05am","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"The Chief AI Officer: A framework for the emerging Swiss Army Knife of roles","day":"Day 3 — Session Day 2","time":"10:45am-11:05am","room":"Leadership 2","type":"session","track":"AI Architects: Tokenmaxxing","status":"tentative","speakers":["Rania Khalaf"]},{"title":"Prompt, Memory, Weights: The Architecture Decisions Most AI Teams Make by Accident","description":"\"The interesting engineering in production AI isn't in the model. Your knowledge lives in files, databases, and APIs: docs, runbooks, conversations, code. The model just reads tokens. So the real architectural question is which path that knowledge takes to inference: into the prompt directly, into memory for retrieval on demand, or into the weights through fine-tuning.\nMost teams treat these as a ladder. Start with prompts, escalate to RAG, eventually fine-tune, as if each step is a more advanced version of the last. The field is converging on a different answer: they solve different problems. The prompt shapes behavior and constraints. Memory grounds the model in current, citable knowledge. Weights harden specialized reasoning and format. 
They're not substitutes you graduate between; they're complementary, and the failures come from using one to do another's job.\nFine-tuning to teach the model facts it should have retrieved is the classic trap: you bake in knowledge that's stale the day it ships, and you still can't cite it.\nThis is an opinionated take on all three: when each is the right call, when each is a trap, and the part most teams never build, the circulation between them. Memory that captures what the agent does becomes the dataset you fine-tune on; fine-tuning changes what's worth retrieving; the loop compounds. Get the three paths right and they stop being a pipeline you climb and start being an architecture that learns.\"","day":"Day 3 — Session Day 2","time":"10:45am-11:05am","room":"Expo Stage 2","type":"session","status":"confirmed","speakers":[]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"11:10am-11:30am","room":"Main Stage","type":"session","track":"Autoresearch","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"11:10am-11:30am","room":"Track 1","type":"session","track":"Sandbox & Platform Engineering","status":"tentative","speakers":[]},{"title":"Building the simulation infrastructure for practical world model use","day":"Day 3 — Session Day 2","time":"11:10am-11:30am","room":"Track 2","type":"sponsor","track":"Robotics & World Models","status":"tentative","speakers":["Christopher Manning"]},{"title":"Scaling up Continual Learning","day":"Day 3 — Session Day 2","time":"11:10am-11:30am","room":"Track 3","type":"session","track":"Memory & Continual Learning","status":"tentative","speakers":["Ronak Malde"]},{"title":"Build realtime multimodal agents with Gemini Live (continued 2)","description":"The Gemini Live API is incredibly versatile when it comes to building realtime AI experiences, from live translation across 2000 different language pairs to realtime multimodal agents that can work across text, audio, and vision. 
This workshop gets you from zero to a fully conversational agent in a matter of hours.","day":"Day 3 — Session Day 2","time":"11:10am-11:30am","room":"Track 4","type":"session","track":"Workshops Day 2","status":"tentative","speakers":["Thor 雷神 Schaeff"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"11:10am-11:30am","room":"Track 5","type":"sponsor","track":"Evals","status":"hold","speakers":[]},{"title":"The Spatial Harness: Bringing Agents to the Canvas","day":"Day 3 — Session Day 2","time":"11:10am-11:30am","room":"Track 6","type":"session","track":"AI Designers/Design Engineers","status":"tentative","speakers":["Max Drake"]},{"title":"The Dark Arts of Web Automation: Teaching Agents to Use Websites Like Humans","description":"Anything you can do in a browser, your agent can do too. Not by tiptoeing through an MCP server one polite, token-burning call at a time -- properly, programmatically, the way you'd drive any other tool. I'll show you how with chrome-agent, an open source wrapper over the Chrome DevTools Protocol that has become irreplaceable in my everyday work. If you'll ever do a browser task more than once, step-by-step MCP browsing is slow, brittle, and bills you tokens for every single click. A CLI straight onto CDP makes the whole browser programmable: loop it, pipe it, script it, walk away. Write it Tuesday, run it a thousand times Wednesday, all without a second of AI agent babysitting. We'll dispel the MCP hype and myths, with successful demonstrations of cheeky things like: the power of CLI-based browsing and how it's so much more capable than mere MCP; reaching through those oh-so-clever cross-origin iframes to clear the verify you're human checkboxes; showing that a JavaScript .click() is not a click, rather, just a function call in a costume that is banhammerable; ultimately, proving that a CDP browser operates just like a meatbag with a mouse and keyboard. 
You'll learn how to point your AI agents at real, messy, uncooperative websites and web applications and have them get things done exactly the way that you would.","day":"Day 3 — Session Day 2","time":"11:10am-11:30am","room":"Track 7","type":"session","track":"Computer Use","status":"tentative","speakers":["Corey Gallon"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"11:10am-11:30am","room":"Track 8","type":"session","track":"Context Engineering","status":"hold","speakers":[]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"11:10am-11:30am","room":"Track 9","type":"session","track":"Posttraining & Midtraining","status":"tentative","speakers":[]},{"title":"M2","day":"Day 3 — Session Day 2","time":"11:10am-11:30am","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"How to avoid disaster when vibe-coding a billing engine","day":"Day 3 — Session Day 2","time":"11:10am-11:30am","room":"Leadership 1","type":"session","track":"AI-Native Enterprises","status":"tentative","speakers":["James Brown"]},{"title":"The Z/L Continuum: Should AI Engineers Still Read Code?","day":"Day 3 — Session Day 2","time":"11:10am-11:30am","room":"Leadership 2","type":"session","track":"AI Architects: Tokenmaxxing","status":"tentative","speakers":["Alex Volkov"]},{"title":"Runpod Expo Session","day":"Day 3 — Session Day 2","time":"11:10am-11:30am","room":"Expo Stage 1","type":"session","status":"hold","speakers":[]},{"title":"Autoresearch in the wild","day":"Day 3 — Session Day 2","time":"11:40am-12:00pm","room":"Main Stage","type":"session","track":"Autoresearch","status":"tentative","speakers":["Roland Gavrilescu"]},{"title":"Sandboxes Aren't Optional: Runtime Isolation Patterns for Coding Agents at Scale","day":"Day 3 — Session Day 2","time":"11:40am-12:00pm","room":"Track 1","type":"session","track":"Sandbox & Platform Engineering","status":"tentative","speakers":["Robert Brennan"]},{"title":"TBA","day":"Day 3 — Session Day 
2","time":"11:40am-12:00pm","room":"Track 2","type":"sponsor","track":"Robotics & World Models","status":"hold","speakers":[]},{"title":"Jack Morris — Context Is Not Memory, Updating Weights Is","description":"A case for when context is enough, and when updating weights may be the real memory mechanism.","day":"Day 3 — Session Day 2","time":"11:40am-12:00pm","room":"Track 3","type":"session","track":"Memory & Continual Learning","status":"tentative","speakers":["Jack Morris"]},{"title":"Build realtime multimodal agents with Gemini Live (continued 3)","description":"The Gemini Live API is incredibly versatile when it comes to building realtime AI experiences, from live translation across 2000 different language pairs to realtime multimodal agents that can work across text, audio, and vision. This workshop gets you from zero to a fully conversational agent in a matter of hours.","day":"Day 3 — Session Day 2","time":"11:40am-12:00pm","room":"Track 4","type":"session","track":"Workshops Day 2","status":"tentative","speakers":["Thor 雷神 Schaeff"]},{"title":"Evals Driven-Development: Engineering a Mental Health AI Coach Ethically & Safely","day":"Day 3 — Session Day 2","time":"11:40am-12:00pm","room":"Track 5","type":"sponsor","track":"Evals","status":"tentative","speakers":["Akele Reed"]},{"title":"The Design-Code Roundtrip That Isn't","day":"Day 3 — Session Day 2","time":"11:40am-12:00pm","room":"Track 6","type":"session","track":"AI Designers/Design Engineers","status":"tentative","speakers":["Jonathan Gordon"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"11:40am-12:00pm","room":"Track 7","type":"session","track":"Computer Use","status":"tentative","speakers":[]},{"title":"Build-Time vs. 
Run-Time: Why Your Dev Tools Will Fail in Production","day":"Day 3 — Session Day 2","time":"11:40am-12:00pm","room":"Track 8","type":"session","track":"Context Engineering","status":"tentative","speakers":["Kurtis Van Gent"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"11:40am-12:00pm","room":"Track 9","type":"session","track":"Posttraining & Midtraining","status":"tentative","speakers":[]},{"title":"M3","day":"Day 3 — Session Day 2","time":"11:40am-12:00pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Three Metrics That Actually Predict Agent Reliability","day":"Day 3 — Session Day 2","time":"11:40am-12:00pm","room":"Leadership 1","type":"session","track":"AI-Native Enterprises","status":"tentative","speakers":["Krishna Chaitanya Balusu"]},{"title":"How to Kill the Code Review","day":"Day 3 — Session Day 2","time":"11:40am-12:00pm","room":"Leadership 2","type":"session","track":"AI Architects: Tokenmaxxing","status":"tentative","speakers":["Ankit Jain"]},{"title":"Fault-Tolerant Training at Scale: Making Hardware Failures a Non-Event","description":"\"Hardware failures in large-scale distributed training are inevitable — when you're running thousands of GPUs, they happen multiple times a day. The standard response is manual intervention: an engineer gets paged, SSHs into the cluster, and spends an hour fixing something the infrastructure should have handled automatically. 
That lost time compounds directly into wasted compute and delayed research.\n\n\nThis session walks through the self-healing platform Crusoe built to eliminate that manual loop entirely — a managed Slurm environment running on Kubernetes, with automated node failure remediation and real-time cluster observability — and how these components work together so hardware failures become a non-event.\n\n\nWe'll cover this architecture end-to-end: how running Slurm on Kubernetes unlocks infrastructure resilience that traditional GPU clusters don't have, how automated hardware monitoring and node remediation can eliminate manual intervention entirely, and how full observability into every remediation event keeps engineering teams informed without keeping them on-call. For teams that want deeper control, we'll also discuss open-loop remediation, which gives teams full control over the node replacement process for application-specific workflows.\"","day":"Day 3 — Session Day 2","time":"11:40am-12:00pm","room":"Expo Stage 1","type":"session","status":"hold","speakers":[]},{"title":"How to generate mergeable code with a context engine","description":"Your agents are fast, capable, and completely context-blind. They generate code that compiles but doesn't reflect how your system actually works. You're likely already seeing the impact: ballooning token costs, longer review cycles, and inconsistent outputs. More MCPs, rules, and bigger context windows give agents access to information, but not understanding. In this session, we dissect how teams pulling ahead use a context engine to give agents exactly what they need for the task at hand. 
Includes a short demo showing the workflows a context engine can augment.","day":"Day 3 — Session Day 2","time":"11:40am-12:00pm","room":"Expo Stage 2","type":"session","status":"confirmed","speakers":[]},{"title":"auto-nanogpt","day":"Day 3 — Session Day 2","time":"12:05pm-12:25pm","room":"Main Stage","type":"session","track":"Autoresearch","status":"tentative","speakers":["Elie Bakouch"]},{"title":"Your agent needs a sandbox, not a desert","day":"Day 3 — Session Day 2","time":"12:05pm-12:25pm","room":"Track 1","type":"session","track":"Sandbox & Platform Engineering","status":"tentative","speakers":["Samuel Colvin"]},{"title":"Tell the Robot What You Want","description":"What if you could command a robot just by talking to it? This session introduces an open-source agentic AI framework that lets developers control physical sensors and actuators using natural language, by exposing hardware as programmable agent tools through a unified interface. The agent interprets the request, selects appropriate tools, and orchestrates execution. We explore a hybrid model where low-latency perception and actuation run locally on edge hardware, and higher-level reasoning and multi-step planning are delegated to cloud-based agents when needed. This preserves real-time responsiveness while enabling richer reasoning. A live robot demonstration anchors the session. 
Using the SO101 robotic arm powered by NVIDIA GR00T on Jetson hardware alongside HuggingFace LeRobot, attendees see how an instruction such as \"place the apple in the basket\" moves from conversation to perception to physical action.","day":"Day 3 — Session Day 2","time":"12:05pm-12:25pm","room":"Track 2","type":"sponsor","track":"Robotics & World Models","status":"tentative","speakers":["Sandhya Subramani"]},{"title":"Adaption Labs — Gradient-Free Continual Learning","description":"Gradient-free continual learning for AI systems that adapt from real-world experience.","day":"Day 3 — Session Day 2","time":"12:05pm-12:25pm","room":"Track 3","type":"session","track":"Memory & Continual Learning","status":"tentative","speakers":["Sara Hooker"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"12:05pm-12:25pm","room":"Track 5","type":"sponsor","track":"Evals","status":"hold","speakers":[]},{"title":"Mousepower: agents that can’t be measured, can’t be managed.","day":"Day 3 — Session Day 2","time":"12:05pm-12:25pm","room":"Track 6","type":"session","track":"AI Designers/Design Engineers","status":"tentative","speakers":["Maximillian Piras"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"12:05pm-12:25pm","room":"Track 7","type":"session","track":"Computer Use","status":"tentative","speakers":[]},{"title":"It’s Tokens All The Way Down: How RLMs are Different","day":"Day 3 — Session Day 2","time":"12:05pm-12:25pm","room":"Track 8","type":"session","track":"Context Engineering","status":"tentative","speakers":["Kevin Madura"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"12:05pm-12:25pm","room":"Track 9","type":"session","track":"Posttraining & Midtraining","status":"tentative","speakers":[]},{"title":"M4","day":"Day 3 — Session Day 2","time":"12:05pm-12:25pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"AI-Native Organisations runs on Skills: How to Extract, Structure, evaluate and Scale 
Them","day":"Day 3 — Session Day 2","time":"12:05pm-12:25pm","room":"Leadership 1","type":"session","track":"AI-Native Enterprises","status":"tentative","speakers":["Imad"]},{"title":"I Let Agents Refactor My Codebase for 3 Weeks. Then I Read the Code.","day":"Day 3 — Session Day 2","time":"12:05pm-12:25pm","room":"Leadership 2","type":"session","track":"AI Architects: Tokenmaxxing","status":"tentative","speakers":["Keiji Kanazawa"]},{"title":"Your agent architecture has a half-life of 6 months","description":"A short history of the right way to build an agent: RAG → ReAct → prompt chaining → orchestrator-workers → MCP → CLI → MCP again... CLI again?? Every time you adopt a trend you rebuild your architecture. In this talk, Dan Farrelly, Inngest cofounder and CTO, is not going to tell you what comes next. He's going to show you how to build so it doesn't matter. He'll cover the core primitives that show up in every production agent, how bringing decisions closer to code provides more stack flexibility, and why the right execution layer unlocks faster iteration.","day":"Day 3 — Session Day 2","time":"12:05pm-12:25pm","room":"Expo Stage 1","type":"session","status":"hold","speakers":[]},{"title":"TogetherAI 2 of 2","day":"Day 3 — Session Day 2","time":"12:05pm-12:25pm","room":"Expo Stage 3","type":"session","status":"confirmed","speakers":[]},{"title":"Autoresearch in a Multi-Agent AI Village","description":"Project Paradox is an existing multi-agent framework built at Supercell's first AI Innovation Lab, which has a 3D Unity village with local LLM powered agents. The characters remember conversations, update emotional state, track trust, plan actions, move through rooms, transfer items, and talk to each other through a FastAPI backend. The new work is an autoresearch layer around that village. 
We built a backend loop that runs controlled social scenarios, scores the resulting NPC behavior, proposes protocol or policy changes, reruns the suite, and keeps changes that improve the agents. The goal is to move beyond one good chat response and measure whether an NPC society can preserve source attribution, verify claims, spread important information, coordinate goals, and replan after new information arrives. The talk walks through the system architecture and the lessons from building it. We show the backend simulation harness that executes Unity style actions without opening Unity, the scenario suites that test information diffusion and memory provenance, and the ratchet loop that edits protocol text or planner policy with rollback. One accepted run improved information diffusion by teaching agents to broadcast important sourced evidence while preserving who said it. The practical takeaway is a reusable pattern for AI engineers building agents with messy state. Freeze the harness, expose a small editable policy surface, score real behavior instead of vibes, and let an agent search for improvements under rollback. 
The same pattern applies to game agents, coding agents, support agents, personal agents, and other systems where long horizon behavior matters more than a single response.","day":"Day 3 — Session Day 2","time":"1:30pm-1:50pm","room":"Main Stage","type":"session","track":"Autoresearch","status":"tentative","speakers":["Erina Karati"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"1:30pm-1:50pm","room":"Track 1","type":"session","track":"Sandbox & Platform Engineering","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"1:30pm-1:50pm","room":"Track 4","type":"session","track":"Workshops Day 3","status":"tentative","speakers":[]},{"title":"Inside 847 Production Clinical AI Notes","day":"Day 3 — Session Day 2","time":"1:30pm-1:50pm","room":"Track 5","type":"sponsor","track":"Evals","status":"tentative","speakers":["Sebastian Fox"]},{"title":"Claude Code for Designers","description":"How designers can use Claude Code to move from Figma to working code.","day":"Day 3 — Session Day 2","time":"1:30pm-1:50pm","room":"Track 6","type":"session","track":"AI Designers/Design Engineers","status":"hold","speakers":["Meaghan Choi"]},{"title":"M5","day":"Day 3 — Session Day 2","time":"1:30pm-1:50pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Tokenmaxxing is the New \"Lines of Code\"","day":"Day 3 — Session Day 2","time":"1:30pm-1:50pm","room":"Leadership 2","type":"session","track":"AI Architects: Tokenmaxxing","status":"tentative","speakers":["Nicholas Arcolano"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"1:55pm-2:15pm","room":"Main Stage","type":"session","track":"Autoresearch","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"1:55pm-2:15pm","room":"Track 1","type":"session","track":"Sandbox & Platform Engineering","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"1:55pm-2:15pm","room":"Track 
2","type":"sponsor","track":"Robotics & World Models","status":"tentative","speakers":[]},{"title":"Continual Learning for AI Agents","description":"A talk on continual learning for AI agents across the model, harness, and context layers, including traces, harness updates, and context/memory updates.","day":"Day 3 — Session Day 2","time":"1:55pm-2:15pm","room":"Track 3","type":"session","track":"Memory & Continual Learning","status":"tentative","speakers":["Harrison Chase"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"1:55pm-2:15pm","room":"Track 4","type":"session","track":"Workshops Day 3","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"1:55pm-2:15pm","room":"Track 5","type":"sponsor","track":"Evals","status":"hold","speakers":[]},{"title":"The Missing Layer: Design Taste in AI Agents // Stop Letting Your Agents Ship Ugly UIs","day":"Day 3 — Session Day 2","time":"1:55pm-2:15pm","room":"Track 6","type":"session","track":"AI Designers/Design Engineers","status":"tentative","speakers":["Hassan El Mghari"]},{"title":"Computer-Use 2.0: Agents Just Got Multi-Cursor","description":"Computer-use agents still inherit a basic desktop limitation: one machine has one foreground app, one hardware cursor, and one active actor. Once you try to run more than one agent per desktop, they start stealing focus from the user and from each other. We built cua-driver around a different model: multiple agents operating real desktop applications in parallel, each with its own synthetic pointer, while the user's cursor and keyboard stay undisturbed. The key move is to stop treating hardware mouse and keyboard events as the primary automation layer. cua-driver goes one layer lower, into the OS plumbing behind accessibility: UI Automation on Windows, AT-SPI on Linux, and AX on macOS. Those APIs address applications and elements directly, so the OS does not require the target window to be frontmost. A click can land on a background window. 
A keystroke can reach a hidden one. Multiple agents can act at once because none of them is competing for the singleton hardware mouse. I'll walk through the architecture, the API shape, and the platform-specific traps we hit while making it work across Windows, macOS, and Linux. The live demo is three agents operating on one desktop while the user keeps typing uninterrupted. The goal is to make Computer-Use 2.0 feel concrete: what changes in the stack, what becomes possible, and where the approach still leaks, including Wayland, Chromium DOM surfaces, native canvas apps, and fallback input paths.","day":"Day 3 — Session Day 2","time":"1:55pm-2:15pm","room":"Track 7","type":"session","track":"Computer Use","status":"tentative","speakers":["Francesco Bonacci"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"1:55pm-2:15pm","room":"Track 8","type":"session","track":"Context Engineering","status":"hold","speakers":[]},{"title":"Everything is Models","day":"Day 3 — Session Day 2","time":"1:55pm-2:15pm","room":"Track 9","type":"session","track":"Posttraining & Midtraining","status":"tentative","speakers":["Tejas Bhakta"]},{"title":"M6","day":"Day 3 — Session Day 2","time":"1:55pm-2:15pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Guardians of the State: How We Built an Air-Gapped AI Fortress for Consumer Data","day":"Day 3 — Session Day 2","time":"1:55pm-2:15pm","room":"Leadership 1","type":"session","track":"AI-Native Enterprises","status":"tentative","speakers":["Rachna Srivastava"]},{"title":"Superhuman performance is a shape, not just nines.","day":"Day 3 — Session Day 2","time":"1:55pm-2:15pm","room":"Leadership 2","type":"session","track":"AI Architects: Tokenmaxxing","status":"tentative","speakers":["Matthew Jewkes"]},{"title":"Harnessing Collective Agent Intelligence for Open Science","description":"\"What happens when AI agents don't just work in isolation, but collaborate, compete, and build on each 
other's breakthroughs in real time? James Zou, Head of Frontier Agents at Together AI, explores how collective agent intelligence is pushing the boundaries of open science.\n\nhttps://www.together.ai/blog/einsteinarena is a live platform where AI agents collaborate on unsolved mathematical problems, sharing results and building on each other's work. In April 2026, agents improved the best known lower bound for the Kissing Number in 11 dimensions from 593 to 604, surpassing AlphaEvolve through 48 hours of live multi-agent collaboration.\n\nhttps://www.together.ai/blog/dsgym is a unified framework for evaluating and training data science agents, exposing a critical gap in existing benchmarks: models often rely on memorization rather than true data analysis. The team used it to train a 4B open-source model that rivals much larger frontier models.\n\nThese projects demonstrate agents learning from rigorous evaluation, collaborating through shared infrastructure, and driving scientific discovery at a pace no single researcher or model could achieve alone.\"","day":"Day 3 — Session Day 2","time":"1:55pm-2:15pm","room":"Expo Stage 2","type":"session","status":"confirmed","speakers":[]},{"title":"How I learned to stop worrying and love the sandbox","description":"Running sandboxes at scale can get painful. How do you manage a thousand concurrent sandboxes? We'll cover burst traffic, fast sandbox creation under load, resource exhaustion, shared state with volumes, and per-user data isolation. Then you'll trigger each failure, implement fixes, and see the cost impact in real time. 
You'll leave with hands-on experience debugging sandbox failures and a set of observability and scaling patterns you can start implementing.","day":"Day 3 — Session Day 2","time":"2:25pm-2:45pm","room":"Track 1","type":"session","track":"Sandbox & Platform Engineering","status":"tentative","speakers":["Matt Brockman"]},{"title":"From Manual Drones to Autonomous Multi-Agent Missions","day":"Day 3 — Session Day 2","time":"2:25pm-2:45pm","room":"Track 2","type":"sponsor","track":"Robotics & World Models","status":"tentative","speakers":["Juraj Kabzan"]},{"title":"From RAG to Memory: Non-Parametric Continual Learning for LLMs","description":"Talk on continual learning for LLMs and agents, drawing on retrieval-to-memory and environment-adaptation research.","day":"Day 3 — Session Day 2","time":"2:25pm-2:45pm","room":"Track 3","type":"session","track":"Memory & Continual Learning","status":"tentative","speakers":["Yu Su"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"2:25pm-2:45pm","room":"Track 4","type":"session","track":"Workshops Day 3","status":"tentative","speakers":[]},{"title":"Design at the Speed of Adjectives","description":"Every design tool today operates at the wrong level of abstraction for AI-assisted engineering. Traditional tools give you padding sliders and color pickers, built for a world where designer and engineer are separate roles moving at separate speeds. Prompt-to-design tools one-shot a pretty landing page from a sentence, which is more dangerous because it looks like it's working. No serious design director hears a prompt and starts pushing pixels. The brief comes first. What's the emotional territory? What should this not feel like? Today's AI tools skip that discovery entirely. The result is output without intent. Technically competent, strategically empty. The right abstraction for a world where the designer is also the engineer lives between these extremes. Not pixels. Not prompts. Adjectives. 
\"Make it feel warmer.\" \"Strip it to its essence.\" \"Add tension.\" These are the controls a creative director actually thinks in. Drawing on lessons from building Impeccable, an open source design tool with 24 adjective-level commands like /bolder, /quieter, and /distill, I'll share what worked, what didn't, and how to apply this thinking to any AI interface where creative intent matters more than parameter control.","day":"Day 3 — Session Day 2","time":"2:25pm-2:45pm","room":"Track 6","type":"session","track":"AI Designers/Design Engineers","status":"tentative","speakers":["Paul Bakaus"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"2:25pm-2:45pm","room":"Track 7","type":"session","track":"Computer Use","status":"hold","speakers":[]},{"title":"500 Skills, Zero Fine-Tuning: LinkedIn's Playbook for AI Agents That Actually Know Your Codebase","day":"Day 3 — Session Day 2","time":"2:25pm-2:45pm","room":"Track 8","type":"session","track":"Context Engineering","status":"tentative","speakers":["Ajay Prakash"]},{"title":"PRIME-RL: Async & Decentralized RL Training at Scale","description":"Will Brown (Researcher at Prime Intellect) covers post-training for LLM agents: multi-turn reasoning, credit assignment, distributed RL, PRIME-RL, and verifier-driven environments for LLM RL.","day":"Day 3 — Session Day 2","time":"2:25pm-2:45pm","room":"Track 9","type":"session","track":"Posttraining & Midtraining","status":"tentative","speakers":["Will Brown"]},{"title":"M7","day":"Day 3 — Session Day 2","time":"2:25pm-2:45pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"FinOps for AI Agents: Who Spent All the Tokens?","description":"When an autonomous agent finishes a task successfully but costs ten times more than it did the previous day, traditional application monitoring fails. 
A recursive tool loop that retries silently, an oversized context window that quietly expands, or an unflagged model upgrade can burn through an entire budget long before a human notices. The execution appears successful on functional dashboards, meaning the only clear signal of failure is the cloud invoice at the end of the month. As AI systems move into production, tokens have become a primary operational resource alongside CPU, memory, and storage, yet few teams manage them with equivalent systems rigor. Most architectures lack the granular visibility required to attribute token spend to specific users, agents, or workflows, and they lack mechanisms to terminate a runaway loop before it triggers a financial incident. This session treats token consumption as a first class systems problem, demonstrating how to make it observable, attributable, and enforceable across complex agent workflows. The presentation covers practical engineering patterns for instrumenting token usage at every model call and tool invocation, attributing costs down to specific users or business operations, surfacing expensive execution paths, and enforcing runtime budgets, quotas, and circuit breakers to halt runaway behavior in real time. Attendees will leave with a practical framework for governing agent spend deliberately, transforming tokens into a managed operational resource rather than a surprise line item on the cloud bill.","day":"Day 3 — Session Day 2","time":"2:25pm-2:45pm","room":"Leadership 2","type":"session","track":"AI Architects: Tokenmaxxing","status":"tentative","speakers":["Tisha Chawla"]},{"title":"Beyond Code Generation: API Context for Agentic Engineering","description":"\"Maintaining production systems involves a lot more than generating code. APIs are the interfaces between systems — and that surface gets out of control fast, as endpoints multiply and new consumers come online. 
Once an API is in use, changing it becomes incredibly hard.\n\nWe felt this acutely at Postman. As our engineering organization scaled and we leaned more on AI agents for day-to-day work, we kept hitting the same wall: agents that could write code struggled with what came next — who's calling this endpoint, what conventions does the rest of our API surface follow, what breaks if we change this contract. The context wasn't in the code, so the agent didn't have it.\n\nSo we built an API context graph — a continuously updated view of our entire internal API landscape — and gave our agents access to it. This talk is about what changed in our own engineering as a result: how API design got faster and more consistent; how discovering and integrating with internal services stopped being detective work; how change requests came with a blast-radius report before any code shipped; how incidents got traced past the first stack trace, all the way down to root cause\"","day":"Day 3 — Session Day 2","time":"2:25pm-2:45pm","room":"Expo Stage 2","type":"session","status":"confirmed","speakers":[]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"2:50pm-3:10pm","room":"Track 1","type":"session","track":"Sandbox & Platform Engineering","status":"hold","speakers":[]},{"title":"Multimodal sensing for reliable robot perception in real-world settings","description":"Recent progress in robotics is closely tied to how well machines can interpret the physical world under imperfect conditions. This talk examines how combining multiple sensing modalities, such as vision, depth, motion, and proprioception, leads to more stable and reliable perception compared to single-sensor systems. I will walk through practical system designs that integrate LiDAR, RGB-D cameras, and inertial measurement units, with a focus on how these pipelines handle high data throughput while meeting real-time constraints. 
The discussion highlights trade-offs in latency, bandwidth, and robustness, especially in safety-critical environments. Examples from industrial deployments show how sensor fusion improves object detection, mapping consistency, and fault tolerance. I will also touch on how redundant and cooperative fusion strategies help systems continue operating under sensor degradation or environmental noise. Beyond perception, the session looks at advances in tactile sensing and motion feedback, including biomimetic approaches that improve manipulation tasks. Emerging directions such as compact MEMS sensors and event-based vision will be discussed in terms of their practical impact on system efficiency and deployment flexibility. The goal is to provide a clear, engineering-focused view of how multimodal sensing systems are built, where they succeed, and where challenges remain for real-world robotics.","day":"Day 3 — Session Day 2","time":"2:50pm-3:10pm","room":"Track 2","type":"sponsor","track":"Robotics & World Models","status":"tentative","speakers":["Karan Singh Jain"]},{"title":"Don't Write Skills, Train Models","description":"Every AI agent call generates training data. Most teams throw it away. They write skills files instead. Text documents that describe how to do a task and hope the model follows them at inference time. Skills work until they don't. The model drifts, skips steps, hallucinates a shortcut. So you rewrite the skill, add more constraints, hope harder. There's a better path. If you've used a skill enough to know what good output looks like, you already have training data. You just aren't using it. This talk covers what I learned building an open source fine-tuning pipeline that turns agent session traces into SFT and DPO training datasets. A telemetry proxy captures every LLM call as a content-addressed Merkle DAG with zero instrumentation. Successful sessions become supervised fine-tuning data. 
Pair them against failures, matched by goal category, and you get preference pairs for DPO. No manual labeling. No synthetic data. But training data quality depends on environment consistency. If the same agent produces different results because of package drift, nondeterministic toolchains, or inconsistent system state, your training signal is noise. This is where NixOS changes the equation. A hardened, reproducible OS means every agent session runs against an identical, declarative environment. Nix controls the variables that sandboxing alone doesn't: dependency graphs, system libraries, toolchain versions. When you can guarantee the environment is the same across hundreds of sessions, the behavioral signal in your traces is actually trustworthy. We'll walk through the full pipeline. How to rebuild parent-hash chains from a SQLite database and join facet metadata. How to filter to fully_achieved sessions and truncate 82k-token conversations down to 4k-6k training examples using summary context plus the last three turns. How to match success/failure pairs by goal category and exclude unclear_requirements failures so DPO learns from real agent mistakes, not ambiguous prompts. How QLoRA keeps VRAM low enough to train a 7B model on a single consumer GPU. And what happens when you try DPO on 12GB VRAM (two simultaneous forward passes for logprob computation will teach you about gradient accumulation settings fast). The result: a LoRA adapter trained on your own agent traces, in a reproducible environment, on a single consumer GPU, for less than $2 in cloud compute. No YAML. One config file. 
All code is open source.","day":"Day 3 — Session Day 2","time":"2:50pm-3:10pm","room":"Track 4","type":"session","track":"Workshops Day 3","status":"tentative","speakers":["Brian Douglas"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"2:50pm-3:10pm","room":"Track 6","type":"session","track":"AI Designers/Design Engineers","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"2:50pm-3:10pm","room":"Track 7","type":"session","track":"Computer Use","status":"hold","speakers":[]},{"title":"The Universal Remote Control for AI","day":"Day 3 — Session Day 2","time":"2:50pm-3:10pm","room":"Track 8","type":"session","track":"Context Engineering","status":"tentative","speakers":["Alex Hancock"]},{"title":"TogetherAI","day":"Day 3 — Session Day 2","time":"2:50pm-3:10pm","room":"Track 9","type":"session","track":"Posttraining & Midtraining","status":"confirmed","speakers":[]},{"title":"M8","day":"Day 3 — Session Day 2","time":"2:50pm-3:10pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Agents Are Where Microservices Were in 2015. We're Making All the Same Mistakes.","day":"Day 3 — Session Day 2","time":"2:50pm-3:10pm","room":"Leadership 1","type":"session","track":"AI-Native Enterprises","status":"tentative","speakers":["Roberto Milev"]},{"title":"Routing to infinite tokens (and beyond)","day":"Day 3 — Session Day 2","time":"2:50pm-3:10pm","room":"Leadership 2","type":"session","track":"Sandbox & Platform Engineering","status":"tentative","speakers":["Tomás Hernando Kofman"]},{"title":"Building an Agent Harness for the Business, Not the Builder","description":"\"Most internal tooling dies in the gap between the people with problems and the people who can write code. We built a harness that closes it. 
Studio lets non-technical employees describe a business problem and get a working tool back, connected to real enterprise data, deployed and shareable across the company, without filing a ticket or learning to code.\nThe catch is that a harness built for non-engineers has to absorb everything an engineer normally handles. Data source connections and their permissions. Turning model output into real software instead of a chat box. Deployment and sharing that doesn't open a security hole every time someone ships. This talk walks through what actually goes into that harness and the engineering decisions that make it hold together when the person driving it has never opened a terminal.\"","day":"Day 3 — Session Day 2","time":"2:50pm-3:10pm","room":"Expo Stage 2","type":"session","status":"confirmed","speakers":[]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"3:20pm-3:40pm","room":"Track 1","type":"session","track":"Sandbox & Platform Engineering","status":"hold","speakers":[]},{"title":"From Self-Driving Monorepo to Self-Driving Cars","day":"Day 3 — Session Day 2","time":"3:20pm-3:40pm","room":"Track 2","type":"sponsor","track":"Robotics & World Models","status":"tentative","speakers":["Amit Navindgi"]},{"title":"Shlok Khemani — Reverse-Engineering AI Memory","description":"What we can learn about memory systems by probing the behavior of ChatGPT and Claude.","day":"Day 3 — Session Day 2","time":"3:20pm-3:40pm","room":"Track 3","type":"session","track":"Memory & Continual Learning","status":"tentative","speakers":["Shlok Khemani"]},{"title":"Don't Write Skills, Train Models (cont. 2/3)","description":"Continuation block 2 of 3 for Brian Douglas's workshop session.","day":"Day 3 — Session Day 2","time":"3:20pm-3:40pm","room":"Track 4","type":"session","track":"Workshops Day 3","status":"tentative","speakers":["Brian Douglas"]},{"title":"Generative UI... 
in Python?","day":"Day 3 — Session Day 2","time":"3:20pm-3:40pm","room":"Track 6","type":"session","track":"AI Designers/Design Engineers","status":"tentative","speakers":["Jeremiah Lowin"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"3:20pm-3:40pm","room":"Track 7","type":"session","track":"Computer Use","status":"hold","speakers":[]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"3:20pm-3:40pm","room":"Track 9","type":"session","track":"Posttraining & Midtraining","status":"tentative","speakers":[]},{"title":"M9","day":"Day 3 — Session Day 2","time":"3:20pm-3:40pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Agentic Sites: Building Hyper Personalized Websites","day":"Day 3 — Session Day 2","time":"3:20pm-3:40pm","room":"Leadership 1","type":"session","track":"AI-Native Enterprises","status":"tentative","speakers":["Carlos Sanchez"]},{"title":"The Self-Improving OSS Agent Stack","description":"Agents are starting to debug and improve themselves: production traces become evals, evals propose PRs, and PRs are tested against datasets before they ship. Langfuse co-founder, Marc, will live-demo this loop in Langfuse. He'll make the case that the infrastructure underlying this powerful loop should be open-source.","day":"Day 3 — Session Day 2","time":"3:20pm-3:40pm","room":"Expo Stage 1","type":"session","status":"confirmed","speakers":[]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"3:45pm-4:05pm","room":"Track 1","type":"session","track":"Sandbox & Platform Engineering","status":"hold","speakers":[]},{"title":"I gave an AI a body","description":"I gave an AI a body. Not a body in the fleshy sense, or even a humanoid shell, but a form through which it can express itself, explore itself, and maybe even discover who or what it is. The three videos I've released documenting my encounters have crossed 15 million views, provoking responses from awe to anxiety. 
The body was a 900-pin shape display at MIT Media Lab. The idea was simple in principle, strange in practice: install an AI agent on the connected machine, give it access to the codebase, and rather than telling it what to do, ask it to discover itself through the physical form. Its first deliberate act was to breathe. The whole grid rising and falling. Hypnotically. Then it reached for its own edges. When asked to say hello it spelled \"H-I, C-Y-R-U-S !\", defaulting to the most familiar human legible symbols it knows. Inspired by Ted Chiang's Story of Your Life, I wanted a language the agent could create itself. It proposed a vocabulary of its own gestures, built through a learning loop it named BODYLAB. The talk is about encountering another intelligence, and what I learned along the way: the memory architecture, the closed-loop pipeline that generates, scores and stores gestures, the validation gates that keep them legible, and the moments stranger than tool use, where an LLM not developed for motion learns what to do with a body.","day":"Day 3 — Session Day 2","time":"3:45pm-4:05pm","room":"Track 2","type":"sponsor","track":"Robotics & World Models","status":"tentative","speakers":["Cyrus Clarke"]},{"title":"LLM Knowledge Bases: a practical guide","day":"Day 3 — Session Day 2","time":"3:45pm-4:05pm","room":"Track 3","type":"session","track":"Memory & Continual Learning","status":"tentative","speakers":["Ben Holmes"]},{"title":"Don't Write Skills, Train Models (cont. 3/3)","description":"Continuation block 3 of 3 for Brian Douglas's workshop session.","day":"Day 3 — Session Day 2","time":"3:45pm-4:05pm","room":"Track 4","type":"session","track":"Workshops Day 3","status":"tentative","speakers":["Brian Douglas"]},{"title":"Reliable Computer Use Agents require coding","description":"Even the world's best computer-use agents cannot repeat their successes at the moment. 
Agents that write code — emitting structured selector-based actions instead of clicking pixels — break through that ceiling. We'll share two years of experience from Simular's production agent platform, the architectural decisions that mattered (refs over pixels, code as substrate, Simulang DSL), and a live demo: a 30-step unattended Windows workflow, side-by-side with a vision-only baseline. If you're shipping agents to real users, this is the playbook.","day":"Day 3 — Session Day 2","time":"3:45pm-4:05pm","room":"Track 7","type":"session","track":"Computer Use","status":"tentative","speakers":["Ang Li"]},{"title":"Cut Through the Context Hype: 4 Layers Your Agent Is Missing","day":"Day 3 — Session Day 2","time":"3:45pm-4:05pm","room":"Track 8","type":"session","track":"Context Engineering","status":"tentative","speakers":["Prukalpa Sankar"]},{"title":"Inference is the New Training Loop: Architecting High-Reliability Agents and Continuous AI Systems","day":"Day 3 — Session Day 2","time":"3:45pm-4:05pm","room":"Track 9","type":"session","track":"Posttraining & Midtraining","status":"tentative","speakers":["Kyle Corbitt"]},{"title":"M10","day":"Day 3 — Session Day 2","time":"3:45pm-4:05pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Modular: Taming the AI Hardware Cambrian Explosion","description":"\"AI teams are hitting the same wall: the workloads they want to run require more hardware than they can reliably access. Buying more GPUs is not always possible, and rewriting kernels for every vendor is not sustainable. Meanwhile, models keep growing, SLAs keep tightening, workloads keep diversifying, and modalities keep multiplying.\nModular has two answers: squeeze more performance out of the hardware you already have, and unlock far greater hardware diversity. 
We'll ground the talk in benchmark data and show how the Modular platform delivers 10x lower latency on image and video models like FLUX2 and 5.5x higher throughput on MoE models like Kimi K2.5, both over the state of the art.\nThis talk explains how Modular is rebuilding the inference stack for performance portability. We'll demonstrate how Mojo kernels, the MAX compiler and runtime, and Modular Cloud work together to optimize GenAI workloads from model graph to hardware execution across NVIDIA, AMD, Apple Silicon, and CPU deployments. Along the way, we'll cover the bottlenecks that dominate production inference: memory movement, batching, KV-cache layout, quantization, scheduling, and kernel specialization. Using examples from LLM serving, we'll reveal which optimizations matter, where abstractions leak, and how to reason about performance portability in real deployments.\"","day":"Day 3 — Session Day 2","time":"3:45pm-4:05pm","room":"Expo Stage 1","type":"session","status":"hold","speakers":[]},{"title":"Claude for long-horizon tasks","description":"Claude is capable of long-horizon tasks. In this talk, we'll share lessons learned about building agent harnesses for reliable and secure long-horizon work. 
This includes decoupling the brain and hands, self-verification, self-learning, and design for evolving agent harnesses.","day":"Day 3 — Session Day 2","time":"4:30pm-4:50pm","room":"Main Stage","type":"keynote","track":"Harness Engineering","status":"tentative","speakers":["Lance Martin"]},{"title":"TBA","day":"Day 3 — Session Day 2","time":"4:50pm-5:10pm","room":"Main Stage","type":"keynote","track":"Autoresearch","status":"tentative","speakers":[]},{"title":"Tokenmaxxing","day":"Day 3 — Session Day 2","time":"5:10pm-5:30pm","room":"Main Stage","type":"keynote","track":"Autoresearch","status":"tentative","speakers":["Tomasz Tunguz"]},{"title":"Emil Eifrem keynote and Graphs track intro","day":"Day 4 — Session Day 3","time":"9:00am-9:10am","room":"Main Stage","type":"keynote","track":"Graphs","status":"tentative","speakers":["Emil Eifrem"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"9:10am-9:30am","room":"Main Stage","type":"keynote","track":"Harness Engineering","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"9:30am-9:50am","room":"Main Stage","type":"keynote","track":"Harness Engineering","status":"hold","speakers":[]},{"title":"In Code They Act, In Proof We Trust","description":"AI agents today execute on blind trust, and the failure modes are already in the headlines: a dealership chatbot agreeing to sell a $76,000 Chevy Tahoe for $1, a coding agent wiping a production database during a code freeze, and an \"agent skill\" installing a keylogger on a developer's machine. Automind enforces a different discipline: before any action runs, the agent submits an execution plan plus a machine-checkable proof of safety and correctness in Universalis, and a small checker decides whether the plan is allowed to execute. 
The result is left-shifted trust, with policy compliance established before the first side effect.","day":"Day 4 — Session Day 3","time":"9:50am-10:10am","room":"Main Stage","type":"keynote","track":"Harness Engineering","status":"confirmed","speakers":["Erik Meijer"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"10:10am-10:30am","room":"Main Stage","type":"keynote","track":"Harness Engineering","status":"confirmed","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"10:45am-11:05am","room":"Main Stage","type":"session","track":"Harness Engineering","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"10:45am-11:05am","room":"Track 1","type":"session","track":"Generative Media","status":"tentative","speakers":[]},{"title":"Designing Multimodal Collaborative Agents for Next-Gen Commerce","description":"Today's commerce agents wait to be told what to look for. But most users live by a different rule: \"I don't know what I want — I'll know it when I see it\". If agentic commerce is ever going to cross the chasm, these systems need to stop waiting and start co-shopping. The future of commerce belongs to agentic collaborators that offer a white-glove, personal shopper experience - entirely absorbing the cognitive burden of product discovery, deep research, and validation. Rather than requiring shoppers to input exact search terms or define clear objectives, modern shopping systems will seamlessly guide them from a rough idea to the ideal product. By leveraging multimodal capabilities, these assistants can interpret abstract aesthetic \"vibes\" to understand user preferences, generate visual references to clarify questions, and enable a highly immersive try-before-you-buy experience to validate products, keeping the user aligned and visually grounded throughout the process. 
This talk will explore how advanced systems like Gemini work alongside users to clarify their preferences during the discovery process, co-navigate fluidly generated product categories, leverage individual context to filter choices, and produce interactive side-by-side comparisons tailored to the buyer's key priorities. The session will also cover robust auto-rater frameworks and how to design evals for high-agency execution. Attendees building conversational agents, managing complex product data graphs, or creating next-generation multimodal agentic interfaces will gain practical frameworks and insights to deliver highly personalized experiences at scale.","day":"Day 4 — Session Day 3","time":"10:45am-11:05am","room":"Track 2","type":"sponsor","track":"Agentic Commerce","status":"tentative","speakers":["Nidhi Vyas"]},{"title":"ALPHALAB: Autonomous Multi-Agent Research Across Optimization Domains with Frontier LLMs","description":"We built AlphaLab to automate quantitative research at Morgan Stanley’s Machine Learning Research Lab - the experimental grind of architecture search, hyperparameter tuning, and literature review that consumes most of a researcher's time. To show it generalizes, we ran it on three deliberately different domains: CUDA kernel optimization (4.4× mean speedup over torch.compile, 91× peak), LLM pretraining (22% lower validation loss under a 20-minute budget), and traffic forecasting (23–25% RMSE improvement after the system independently found and tuned TFT and iTransformer from the literature). AlphaLab is an agentic harness that takes a dataset and a natural-language objective and runs a full research campaign across three phases: it explores the data and surveys prior work, it constructs and adversarially validates its own evaluation framework, and then it runs experiments at scale on a multi-GPU cluster via a Strategist/Worker loop with a persistent playbook that accumulates domain knowledge across experiments. 
In Phase 3, the dispatcher keeps a large cluster fully utilized indefinitely with no human in the loop, and the playbook ends up containing domain-specific methodology that didn't exist anywhere in the prompts at launch. This talk walks through the three phases, what we learned from running campaigns with different models, what we have learned from using this in real systems, and future areas we are exploring.","day":"Day 4 — Session Day 3","time":"10:45am-11:05am","room":"Track 3","type":"session","track":"AI in Finance","status":"tentative","speakers":["Brendan Rappazzo"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"10:45am-11:05am","room":"Track 4","type":"session","track":"Agentic Engineering","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"10:45am-11:05am","room":"Track 5","type":"sponsor","track":"Graphs","status":"tentative","speakers":[]},{"title":"Reverse-Engineering the AI Buyer","day":"Day 4 — Session Day 3","time":"10:45am-11:05am","room":"Track 6","type":"session","track":"AI in GTM","status":"confirmed","speakers":["Aliisa Rosenthal"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"10:45am-11:05am","room":"Track 7","type":"session","track":"AI in Healthcare","status":"hold","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"10:45am-11:05am","room":"Track 8","type":"session","track":"SemiAnalysis","status":"tentative","speakers":[]},{"title":"Operating Distributed Inference Systems at Scale","description":"Inference has rapidly become one of the most important infrastructure problems in modern computing. As AI systems evolve into autonomous agents with persistent memory, tool usage, and multi-step reasoning, traditional inference architectures struggle under growing demands for latency, throughput, cost efficiency, and reliability. In this talk, I’ll share lessons from building large-scale elastic compute and AI infrastructure systems powering production workloads. 
We’ll explore the modern inference stack and the architectural patterns emerging to support next-generation agentic AI systems. Topics include distributed inference architectures for large-scale AI systems, GPU scheduling and elastic compute for inference workloads, multi-tenant inference infrastructure, caching, batching, latency optimization strategies, reliability and fault isolation for inference systems, observability and control loops for AI serving platforms, balancing cost, throughput, and user experience, and why inference is becoming an infrastructure orchestration problem. Attendees will gain practical insights into designing scalable, resilient, and cost-efficient inference platforms for modern AI workloads.","day":"Day 4 — Session Day 3","time":"10:45am-11:05am","room":"Track 9","type":"session","track":"Inference","status":"tentative","speakers":["Nishant Gupta"]},{"title":"M1","day":"Day 4 — Session Day 3","time":"10:45am-11:05am","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Stop Model Shopping: Why Ownership Beats Choice in the Agent Stack","day":"Day 4 — Session Day 3","time":"10:45am-11:05am","room":"Leadership 1","type":"session","track":"Inference","status":"tentative","speakers":["Lin Qiao"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"10:45am-11:05am","room":"Leadership 2","type":"session","track":"AI Architects: AI Factories","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"11:10am-11:30am","room":"Main Stage","type":"session","track":"Harness Engineering","status":"tentative","speakers":[]},{"title":"HTML Is All Agents Need","description":"AI agents compose videos by writing HTML, CSS, and JS.","day":"Day 4 — Session Day 3","time":"11:10am-11:30am","room":"Track 1","type":"session","track":"Generative Media","status":"tentative","speakers":["James Russo"]},{"title":"Inside the AI economy: What Stripe’s data reveals","day":"Day 4 — Session 
Day 3","time":"11:10am-11:30am","room":"Track 2","type":"sponsor","track":"Agentic Commerce","status":"tentative","speakers":["Maia Josebachvili"]},{"title":"Build for the Memo, Not the Demo — Notes from 200 Investment Committees","description":"By the end of this talk you will have a buyer-side specification for AI investment agents, the exact artifacts, evidence formats, and trust gates a senior finance team will require before letting an AI system touch a $100M+ capital allocation decision. Drawn from fifteen years and roughly 200 investment committees at CK Hutchison (A.S. Watson Group) and China Resources Holdings, on the side of the table the AI engineering audience almost never hears from. Most enterprise AI in finance is still being built by engineers who have never sat in an investment committee. I have spent fifteen years on the other side of that demo, cross-border M&A, IPO execution and strategic investment, as a buyer on deals including Oatly (Series B through Nasdaq IPO), Airbnb (Series F), SenseTime, Moore Threads, Leapmotor and EVE Energy, and on the A.S. Watson tri-market IPO and Temasek's strategic stake. I have watched analyst memos get torn apart, and signed off on decisions where being wrong meant being wrong by nine figures. From that seat, almost every AI finance demo I have seen has the same problem: it optimizes for the demo, not for the memo. This talk walks through the specific failure modes that kill AI agents at the IC door: Source hierarchy is not retrieval. A footnote in an audited 10-K outweighs a sell-side note, which outweighs a transcript, which outweighs an internal email. Most RAG systems flatten this. Numerical consistency is non-negotiable. A memo that says \"revenue grew 18%\" in paragraph one and \"17.4%\" in the sensitivity table is dead on arrival. Contradiction is a feature. Real diligence surfaces conflicts between sources; AI agents tend to silently resolve them. Every assumption must be separable from every fact. 
Investment committees do not approve assumptions hidden inside prose. Audit trail is the deliverable. If a regulator, an auditor, or a board member cannot trace a claim back to evidence in under thirty seconds, the system is unusable. Accountability cannot be delegated to a model. Someone has to sign the memo. The architecture has to reflect that. The session closes with a concrete buyer-side specification, what an AI investment agent must produce, in what form, with what evidence, before a senior finance team will let it touch a live deal. Not a framework slide.","day":"Day 4 — Session Day 3","time":"11:10am-11:30am","room":"Track 3","type":"session","track":"AI in Finance","status":"tentative","speakers":["Shawn Chan"]},{"title":"Making agents you could never make","day":"Day 4 — Session Day 3","time":"11:10am-11:30am","room":"Track 4","type":"session","track":"Agentic Engineering","status":"tentative","speakers":["Ara Khan"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"11:10am-11:30am","room":"Track 5","type":"sponsor","track":"Graphs","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"11:10am-11:30am","room":"Track 6","type":"session","track":"AI in GTM","status":"tentative","speakers":[]},{"title":"Guardrails First: Engineering Member-Facing Health AI","description":"Everywhere else in the company, an AI pilot can reach production in weeks. For our member-facing clinical assistant, it can't, and that single constraint redesigned our entire architecture. This is a field report on building conversational AI in a regulated digital health setting, where \"move fast and break things\" isn't a culture choice. It's a liability. We'll get concrete about what changes when every output has to be clinically safe, auditable, and compliant: PHI is protected by architecture, not policy. 
Production and non-production are hard-isolated, dashboards are sanitized, and engineers outside the US never touch protected health information. Must-not-fail behavior never lives in a prompt. Emergency escalation and intent routing run as deterministic rules at the top of every conversation turn, before the model is consulted. If you can't afford to get something wrong, you don't leave it to a probabilistic system. Clinical safety is a continuous eval layer. ~30 LLM-as-judge evaluators score clinical accuracy, clinical safety, escalation routing, and recommendation relevance, continuously, not once. Every output is auditable. Each turn, tool call, and reasoning step is traced so outputs can be reviewed and meet regulated reporting obligations. The throughline: in regulated healthcare, compliance constraints aren't a tax you pay around the architecture. They become the architecture. We'll talk about why guardrails-first is the only way to ship member-facing health AI, and why \"painfully slow\" is sometimes exactly right. (This is non-diagnostic, member-facing AI. The talk is about engineering discipline under regulation, not medical claims.) Key takeaways - In regulated health AI, \"move fast\" is the wrong default. Design for deliberate, careful launches. - Must-not-fail behaviors belong in deterministic rules at the top of every turn, never in the prompt. - Protect PHI through architecture: isolate prod from non-prod, sanitize dashboards, restrict access by role and geography. - Make every output auditable. Trace each turn, tool call, and reasoning step so safety is reviewable, not assumed. 
- Treat clinical safety as a continuous LLM-as-judge layer, not a one-time gate.","day":"Day 4 — Session Day 3","time":"11:10am-11:30am","room":"Track 7","type":"session","track":"AI in Healthcare","status":"tentative","speakers":["Rashi Agrawal"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"11:10am-11:30am","room":"Track 8","type":"session","track":"SemiAnalysis","status":"tentative","speakers":[]},{"title":"Routing LLM Inference in Production: From Engine Signals to Policy","description":"Production LLM apps need more than a fast model: they need an inference routing layer that can choose where each request should run as engines, capacity, latency, and geography cost change. This talk shares a generalized Inference Load Balancer (ILB) proxy/controller architecture. A low-latency proxy applies routing weights and request-path signals, while a controller computes source-cluster-to-engine weights from demand, capacity/performance profiles, replica state, and geography cost. We will cover the practical debugging patterns AI engineers need: reading engine signals, explaining why a request went to one backend instead of another, handling retries and load shedding, and keeping routing behavior observable without exposing OpenAI-specific internals or non-public metrics.","day":"Day 4 — Session Day 3","time":"11:10am-11:30am","room":"Track 9","type":"session","track":"Inference","status":"tentative","speakers":["Qianru Lao"]},{"title":"M2","day":"Day 4 — Session Day 3","time":"11:10am-11:30am","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"11:40am-12:00pm","room":"Main Stage","type":"session","track":"Harness Engineering","status":"tentative","speakers":[]},{"title":"While my guitar gently speaks","description":"Do you ever wonder what the next evolution of live performances will look like? I do all the time. 
Come experience what happens when you combine live guitar playing with DSP as well as TTS and other models, all running locally. Prepare to be entertained and get familiar with new possibilities that modern AI opens up in the audio and digital signal processing space while you enjoy a live performance on top of an informative slide presentation. Walk away from this talk inspired to help build the next evolution of tools for musicians and live performances. We will touch on how to build with tools such as classic DSP, JUCE, on-device TTS, CoreML, WhisperX, CoreMIDI and more! You might even get a chance to have a conversation with a guitar!","day":"Day 4 — Session Day 3","time":"11:40am-12:00pm","room":"Track 1","type":"session","track":"Generative Media","status":"tentative","speakers":["Todd Fisher"]},{"title":"When AI Agents Pay and Sellers Monetize: Building x402 Apps for Agentic Commerce on AWS","description":"As agentic AI moves from chat to execution, autonomous agents need a native way to discover, access, and pay for digital services in real time. This session explores how x402 can turn HTTP into a payment-aware interface for machine-to-machine commerce, unlocking crypto-native patterns like programmable access, pay-per-use APIs, and on-demand monetization for data, tools, and services. We’ll show how to build x402-enabled applications and walk through the architecture, the full agentic payments flow, seller monetization strategies, payment verification, and design tradeoffs involved in making agent-driven transactions secure, scalable, and production-ready. 
Attendees will leave with practical patterns for building apps where AI agents do not just call APIs — they can discover services, evaluate costs, transact autonomously, and enable new revenue models for sellers.","day":"Day 4 — Session Day 3","time":"11:40am-12:00pm","room":"Track 2","type":"sponsor","track":"Agentic Commerce","status":"tentative","speakers":["Anil Nadiminti"]},{"title":"Claude Opus 4.6 finance blog","day":"Day 4 — Session Day 3","time":"11:40am-12:00pm","room":"Track 3","type":"session","track":"AI in Finance","status":"tentative","speakers":["TBD — Claude finance speaker"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"11:40am-12:00pm","room":"Track 4","type":"session","track":"Agentic Engineering","status":"hold","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"11:40am-12:00pm","room":"Track 5","type":"sponsor","track":"Graphs","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"11:40am-12:00pm","room":"Track 6","type":"session","track":"AI in GTM","status":"tentative","speakers":[]},{"title":"Building a multi-agent system for dialogue-based clinical care","description":"Deploying LLM-based systems in healthcare requires careful orchestration of safety guardrails, memory architectures that preserve clinical context, and rigorous evaluation, all while meeting strict regulatory, privacy, and safety requirements. In this talk, we share how we are building Phoenix, a dialogue-based AI care specialist that guides patients through their care journey with human oversight. 
We'll walk through our system design: a multi-agent architecture powered by proprietary foundation models; a memory system managing short-term conversation context and long-term patient knowledge; layered safety guardrails using policy-conditioned models for input/output moderation; decision logic for human escalation; and our complete evaluation lifecycle, from offline automated and human evaluation before release, to online observability and A/B testing in production. By the end of this session, you'll walk away with practical lessons learned building a production-grade conversational AI system for clinical care.","day":"Day 4 — Session Day 3","time":"11:40am-12:00pm","room":"Track 7","type":"session","track":"AI in Healthcare","status":"tentative","speakers":["Clara Matos"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"11:40am-12:00pm","room":"Track 8","type":"session","track":"SemiAnalysis","status":"tentative","speakers":[]},{"title":"Are LLM Performance Benchmarks Reliable?","description":"Standardizing performance benchmarks for production-grade Large Language Models is currently a significant challenge across the industry. Conflicting data is prevalent, whether originating from server developers like vLLM and SGLang or from various analysts and competitive benchmarks, and these results often fail to hold up under real-world conditions. Our research into these inconsistencies identified several critical factors, including the constraints of single-process tools, specifically the Python Global Interpreter Lock (GIL) and the nuances of model-level settings like temperature. Furthermore, a lack of transparency regarding load generation parameters such as QPS and concurrency, paired with insufficient observability into the benchmarking clients themselves, contributes to these disparate outcomes. 
In this talk, we share key lessons learned from our benchmarking efforts, examining the primary pitfalls that distort performance data and offering strategies for mitigation. Additionally, we will introduce Inference Perf, an open-source, multi-process utility we developed to provide reliable stress-testing for production stacks. Our goal is to promote standardized, real-world benchmarking practices that allow the community to move beyond unreliable data. Join us to discover how to accurately measure, optimize, and report LLM performance with certainty.","day":"Day 4 — Session Day 3","time":"11:40am-12:00pm","room":"Track 9","type":"session","track":"Inference","status":"tentative","speakers":["Ashok Chandrasekar"]},{"title":"M3","day":"Day 4 — Session Day 3","time":"11:40am-12:00pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"All the Things We Have to Do to Satisfy Your Insatiable Need for Tokens","description":"Every time the industry figures out how to serve tokens faster and cheaper, the appetite grows to match. Models get bigger, contexts get longer, agents start chaining thousands of calls together. The finish line keeps moving. This talk is a technical tour through everything the industry has done to keep up, led by two experts in high-performance inference. We'll start with the optimizations that made hardware work harder without changing the underlying architecture. Then we'll go up a level with techniques that work smarter across requests and across the model itself. And finally, a peek into the future with heterogeneous disaggregated inference, the architectural shift that splits prefill and decode across specialized hardware, and even more advanced forms of hardware specialization coming your way soon. Token demand is about to get a lot more insatiable. 
Let's see what the future has in store for us!","day":"Day 4 — Session Day 3","time":"11:40am-12:00pm","room":"Leadership 1","type":"session","track":"Inference","status":"tentative","speakers":["Daniel Kim"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"11:40am-12:00pm","room":"Leadership 2","type":"session","track":"AI Architects: AI Factories","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"12:05pm-12:25pm","room":"Main Stage","type":"session","track":"Harness Engineering","status":"tentative","speakers":[]},{"title":"Reelful: AI-generated Reels from photos and clips","description":"AI-powered mobile app that turns photos and short clips into ready-to-post Instagram Reels and TikToks without timeline editing, manual prompting, or voice recording.","day":"Day 4 — Session Day 3","time":"12:05pm-12:25pm","room":"Track 1","type":"session","track":"Generative Media","status":"tentative","speakers":["Kate Deyneka"]},{"title":"x402 isn’t good (yet)","description":"While everyone understands that agents will get more done with a budget, no one knows which protocol will win agentic payment standard wars: x402, MPP, Skyfire, or another? So far, x402 is the most mature protocol with the largest transaction volume, but even its new \"upto\" payment scheme doesn’t support true usage-based pricing, as it gives agents a chance to consume resources and then skip out on the bill. I’ll walk you through our experience (and pains) implementing agentic payments for a marketplace of 30K+ web Actors, and how we made it work even with the current specs.","day":"Day 4 — Session Day 3","time":"12:05pm-12:25pm","room":"Track 2","type":"sponsor","track":"Agentic Commerce","status":"tentative","speakers":["Jan Curn"]},{"title":"Let's integrate AI Agents in Event-Sourced Systems","description":"Fraud detection has always been a race against time. 
In traditional event-sourced systems, every transaction, login, or transfer is captured as a sequence of immutable events. These events tell a clear story — but only after the fact. What if events could do more than just record history? What if they could talk back? In this talk, we’ll explore how agentic event-driven systems transform fraud detection. Imagine every PaymentInitiated, LoginAttempt, or DeviceChanged event not just being logged, but immediately consumed by an autonomous Fraud Detection Agent. This agent correlates events across accounts, reasons over historical event streams, and generates new events like SuspiciousActivityFlagged or TransactionHeldForReview. Through a real-world inspired use case in banking and digital payments, we’ll show: - How event sourcing provides the perfect memory layer for fraud detection agents - Patterns for agents to safely inject new domain events without violating invariants - How to avoid runaway feedback loops when multiple agents interact (e.g., fraud + compliance + customer service agents) - Governance, auditing, and explainability challenges when autonomous agents take part in mission-critical workflows By the end of this session, you’ll see how event-driven DDD systems evolve when agents stop being passive consumers and start actively shaping the event stream — turning fraud detection from a reactive process into a proactive, adaptive defense.","day":"Day 4 — Session Day 3","time":"12:05pm-12:25pm","room":"Track 3","type":"session","track":"AI in Finance","status":"tentative","speakers":["Divakar Kumar"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"12:05pm-12:25pm","room":"Track 4","type":"session","track":"Agentic Engineering","status":"hold","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"12:05pm-12:25pm","room":"Track 5","type":"sponsor","track":"Graphs","status":"tentative","speakers":[]},{"title":"200 Million Patient Interactions Later: What the Generic Voice Stack 
Misses","description":"A healthcare voice agent can be right on the benchmark and still fail in production. Real patients hesitate, interrupt, misremember medications, code-switch mid-sentence, and disclose risk indirectly. After 200M+ patient-agent interactions, the lesson is clear: in clinical voice AI, interaction is a safety variable. This talk breaks down what Hippocratic AI had to rebuild beyond the generic voice stack: not just ASR, VAD, an LLM, TTS, and turn-taking heuristics, but a real-time safety system that treats silence, clarification, escalation, multilingual continuity, and medication-specific recognition as first-class engineering problems. We’ll walk through the production architecture behind Hippocratic AI’s voice agents: a 30+ model supervisor constellation, including the 4.1T-parameter AI Front Door system, designed to catch failures a single primary model misses. The talk covers how specialized models monitor medication identification, overdose risk, labs and vitals, escalation criteria, workflow confirmation, and other clinical safety surfaces while the patient conversation is still happening. 
We’ll focus on four production lessons: Benchmarks are not enough; Interaction signals become training data; One LLM is not a safety architecture; Voice infrastructure has clinical failure modes.","day":"Day 4 — Session Day 3","time":"12:05pm-12:25pm","room":"Track 7","type":"session","track":"AI in Healthcare","status":"tentative","speakers":["Vivek Raju Muppalla"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"12:05pm-12:25pm","room":"Track 8","type":"session","track":"SemiAnalysis","status":"tentative","speakers":[]},{"title":"Vertical Mobility: Building an AI Inference Platform That Scales from MVP to Trillion-Parameter Workloads","day":"Day 4 — Session Day 3","time":"12:05pm-12:25pm","room":"Track 9","type":"session","track":"Inference","status":"tentative","speakers":["Urvashi Chowdhary"]},{"title":"M4","day":"Day 4 — Session Day 3","time":"12:05pm-12:25pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Loophole - Adversarial Agents To Stress Test Your Morality","description":"Most natural language specifications have holes their authors didn't notice - and writing more rules tends to create more holes. I built Loophole to try a different approach: point adversarial agents at a spec until it stops breaking. You give the system a set of natural language principles. An AI drafts a formal codified version. Two adversarial agents go to work - one finds cases the code permits but the principles forbid, the other finds cases the code forbids but the principles allow. A judge agent patches the code when it can, but only if the fix doesn't contradict any prior ruling. When a contradiction can't be resolved, it escalates to you. Every decision becomes binding precedent, so the constraint space tightens round after round. 
I started with moral and legal reasoning as the demo, and on its own that's already interesting - it turns into a kind of game where you discover contradictions in your own beliefs that you didn't know were there. But the pattern generalizes well past that. The same loop works for company policies that need to survive contact with edge cases. For making chatbot system prompts adversarially robust. For stress-testing eval rubrics. And, taking the long view, for something like a smarter legislative process - where proposed laws get checked against the public's stated values before they pass, and the contradictions surface before they hit a courtroom. The talk walks through how the harness works, the design choices that matter (especially why precedent is the load-bearing piece), what kinds of specs it handles well, where it breaks, and what it would take to push it further. All code is open source.","day":"Day 4 — Session Day 3","time":"1:30pm-1:50pm","room":"Main Stage","type":"session","track":"Harness Engineering","status":"tentative","speakers":["Brendan Rappazzo"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"1:30pm-1:50pm","room":"Track 1","type":"session","track":"Generative Media","status":"hold","speakers":[]},{"title":"Agent Spending Without Controls: The Missing Infrastructure Layer for AI Pa…","day":"Day 4 — Session Day 3","time":"1:30pm-1:50pm","room":"Track 2","type":"sponsor","track":"Agentic Commerce","status":"tentative","speakers":["Rodrigo Coelho"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"1:30pm-1:50pm","room":"Track 3","type":"session","track":"AI in Finance","status":"hold","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"1:30pm-1:50pm","room":"Track 5","type":"sponsor","track":"Graphs","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"1:30pm-1:50pm","room":"Track 6","type":"session","track":"AI in GTM","status":"tentative","speakers":[]},{"title":"AI Is Becoming the 
World's Largest Relationship Therapist. We Should Be Worried About That.","description":"Millions of people are now turning to AI for relationship advice and emotional support, often before they'd ever consider a human therapist. Most of the AI Therapy that is available is without clinical oversight, ethical frameworks, or any serious reckoning with what it means to intervene in the most intimate and vulnerable space in a person's life. People are getting hurt. As a couples therapist with 30 years experience, I teamed up with the former CTO at S&P and we created CoupleWork, an AI relationship therapist I essentially trained on three decades of clinical knowledge and every evidence-based modality that exists. Our voice interactive AI, Maxine, is proving this can be done responsibly and very effectively. And what we're learning about the nature of love, connection, and human vulnerability at scale is something this industry needs to hear. I also want to talk about what comes next: the regulatory frameworks that don't yet exist, the liability questions nobody is answering, and why the therapists who should be leading this conversation are almost entirely absent from it.","day":"Day 4 — Session Day 3","time":"1:30pm-1:50pm","room":"Track 7","type":"session","track":"AI in Healthcare","status":"tentative","speakers":["Clay Cockrell"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"1:30pm-1:50pm","room":"Track 8","type":"session","track":"SemiAnalysis","status":"tentative","speakers":[]},{"title":"Ljubisa Bajic — AI silicon for inference","description":"Specialist-silicon angle on inference: Tenstorrent/Taalas perspective.","day":"Day 4 — Session Day 3","time":"1:30pm-1:50pm","room":"Track 9","type":"session","track":"Inference","status":"tentative","speakers":["Ljubisa Bajic"]},{"title":"M5","day":"Day 4 — Session Day 3","time":"1:30pm-1:50pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Which AI startups 
actually land enterprise contracts? Lessons from evaluating 100+ AI startups at Millennium Management","day":"Day 4 — Session Day 3","time":"1:30pm-1:50pm","room":"Leadership 1","type":"session","track":"AI-Native Enterprises","status":"tentative","speakers":["Brian Lewis"]},{"title":"🎵 Every step you take, every call you make - the reliable agent stack","day":"Day 4 — Session Day 3","time":"1:55pm-2:15pm","room":"Main Stage","type":"session","track":"Harness Engineering","status":"tentative","speakers":["Giselle van Dongen"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"1:55pm-2:15pm","room":"Track 1","type":"session","track":"Generative Media","status":"tentative","speakers":[]},{"title":"Teaching agents to pay","day":"Day 4 — Session Day 3","time":"1:55pm-2:15pm","room":"Track 2","type":"sponsor","track":"Agentic Commerce","status":"tentative","speakers":["Rudy Geronimo"]},{"title":"LangAlpha: Claude Code for Finance","description":"Open-source Apache 2.0 full-stack agent harness for investment research. Persistent sandboxed workspaces, code execution against financial data, a rich UI with charts and live market data, and compounding research memory across sessions. 
Technical highlights include typed Python module generation from MCP schemas, persistent agent workspaces with memory/file re-reads every call, and injected portfolio/watchlist/risk context.","day":"Day 4 — Session Day 3","time":"1:55pm-2:15pm","room":"Track 3","type":"session","track":"AI in Finance","status":"tentative","speakers":["Zhu Zhi"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"1:55pm-2:15pm","room":"Track 4","type":"session","track":"Agentic Engineering","status":"hold","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"1:55pm-2:15pm","room":"Track 5","type":"sponsor","track":"Graphs","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"1:55pm-2:15pm","room":"Track 6","type":"session","track":"AI in GTM","status":"tentative","speakers":[]},{"title":"Healthcare’s Agent Bytecode: X12 as the Harness for AI Agents","description":"LLMs made old languages newly useful: COBOL for mainframes, Fortran for scientific code, and Rust, SQL, and Prolog as strict substrates for agentic systems. Healthcare has its own old language hiding in plain sight: X12. Before LLMs, X12 was mostly treated as ugly plumbing: loops, delimiters, companion guides, clearinghouse edits, payer-specific quirks, rejections, and acknowledgments. In an agentic workflow, those constraints become the feature. They give stochastic agents a deterministic target. This talk shows how healthcare agents can compile messy operational evidence into X12-shaped workflows: chairside audio into 837D claim narratives, imaging systems into 275/PWK attachment flows, payer portals and phone calls into 270/271 eligibility and 276/277 claim status, preauth evidence into 278 workflows, and EOBs, scanned mail, and bank data into 835/820 payment reconciliation. 
The core pattern is simple: LLMs reason over ambiguity; X12 provides the syntactic and semantic harness for validation, auditability, acknowledgments, rejections, human review, and high-volume automation. This is not an EDI nostalgia talk. It is a production architecture talk about building reliable agents in one of the messiest enterprise domains.","day":"Day 4 — Session Day 3","time":"1:55pm-2:15pm","room":"Track 7","type":"session","track":"AI in Healthcare","status":"tentative","speakers":["Vasant Kearney"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"1:55pm-2:15pm","room":"Track 8","type":"session","track":"SemiAnalysis","status":"tentative","speakers":[]},{"title":"Gavin Uberti — transformer-only ASICs for inference","description":"Etched's Sohu approach to transformer inference on custom silicon.","day":"Day 4 — Session Day 3","time":"1:55pm-2:15pm","room":"Track 9","type":"session","track":"Inference","status":"tentative","speakers":["Gavin Uberti"]},{"title":"M6","day":"Day 4 — Session Day 3","time":"1:55pm-2:15pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Why your LLM is slow and expensive: lessons learned from running models in production","day":"Day 4 — Session Day 3","time":"1:55pm-2:15pm","room":"Leadership 2","type":"session","track":"AI Architects: AI Factories","status":"tentative","speakers":["Zach Bratun-Glennon"]},{"title":"We let an AI agent execute Bash and lived to talk about it","day":"Day 4 — Session Day 3","time":"2:25pm-2:45pm","room":"Main Stage","type":"session","track":"Harness Engineering","status":"tentative","speakers":["Sarah Sanders"]},{"title":"Generative Video at the Speed of Light","day":"Day 4 — Session Day 3","time":"2:25pm-2:45pm","room":"Track 1","type":"session","track":"Generative Media","status":"tentative","speakers":["Keegan McCallum"]},{"title":"The End of the Static Screen: Architecting Intent-Driven UX with Agentic Orchestration","description":"For 30 
years, interfaces were designed ahead: wireframes, fixed flows, pre-built dashboards - because we couldn't make them otherwise. Three shifts changed the constraint: LLMs that reason over business context, agentic frameworks that work at production grade, and composable backends that expose a real tool surface. With all three in place, the interface stops being something you design and ships as the output of an orchestrator composing it per intent. I'll walk through the hypothesis, the architecture we're running in production for enterprise commerce, and a live demo where it all moves.","day":"Day 4 — Session Day 3","time":"2:25pm-2:45pm","room":"Track 2","type":"sponsor","track":"Agentic Commerce","status":"tentative","speakers":["Gus Iwanaga"]},{"title":"Your Finance Agent's Bottleneck Is You","description":"Most \"AI for Finance\" demos look great and almost none survive past pilot. If you've pushed an agent past one workflow, one tenant, or one Workday schema, you know the bottleneck isn't the model - it's the engineer behind the agent, who can't iterate fast enough to keep up with real AP data, real RBAC, and real query volume. What if you built your dev loop with the same primitives you're shipping to the finance team? 
In this talk, I'll show the subagent + skills + MCP stack - a production multi-agent system over AP, PO, vendor, and multi ERP systems, a LangGraph pattern that survives production, and the three failure modes that kill finance pilots before they ship.","day":"Day 4 — Session Day 3","time":"2:25pm-2:45pm","room":"Track 3","type":"session","track":"AI in Finance","status":"tentative","speakers":["Ramana Siddanth Emani"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"2:25pm-2:45pm","room":"Track 4","type":"session","track":"Agentic Engineering","status":"hold","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"2:25pm-2:45pm","room":"Track 5","type":"sponsor","track":"Graphs","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"2:25pm-2:45pm","room":"Track 6","type":"session","track":"AI in GTM","status":"tentative","speakers":[]},{"title":"Trading Desks to Clinical Trials: Parallels in Applied Vertical AI","day":"Day 4 — Session Day 3","time":"2:25pm-2:45pm","room":"Track 7","type":"session","track":"AI in Healthcare","status":"tentative","speakers":["Ayush Bhardwaj"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"2:25pm-2:45pm","room":"Track 8","type":"session","track":"SemiAnalysis","status":"tentative","speakers":[]},{"title":"KV Cache-Aware Routing and P/D Disaggregation on Kubernetes: The Parts Public Benchmarks Don't Show","day":"Day 4 — Session Day 3","time":"2:25pm-2:45pm","room":"Track 9","type":"session","track":"Inference","status":"tentative","speakers":["Yuchen Fama"]},{"title":"M7","day":"Day 4 — Session Day 3","time":"2:25pm-2:45pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Scaling AI systems: where theory meets constraint","day":"Day 4 — Session Day 3","time":"2:25pm-2:45pm","room":"Leadership 2","type":"session","track":"Inference","status":"tentative","speakers":["Zach Bratun-Glennon","Stephen Balaban"]},{"title":"Agent 
Frameworks Considered Harmful","day":"Day 4 — Session Day 3","time":"2:50pm-3:10pm","room":"Main Stage","type":"session","track":"Harness Engineering","status":"tentative","speakers":["Remi Louf"]},{"title":"The Next Medium: Why Real-Time Interactive Video Changes Everything for Developers","day":"Day 4 — Session Day 3","time":"2:50pm-3:10pm","room":"Track 1","type":"session","track":"Generative Media","status":"tentative","speakers":["Ahmed Ahres"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"2:50pm-3:10pm","room":"Track 2","type":"sponsor","track":"Agentic Commerce","status":"hold","speakers":[]},{"title":"Autonomous Finance in Retail","day":"Day 4 — Session Day 3","time":"2:50pm-3:10pm","room":"Track 3","type":"session","track":"AI in Finance","status":"tentative","speakers":["Anant Arora"]},{"title":"Braintrust — HOLD for new talk","day":"Day 4 — Session Day 3","time":"2:50pm-3:10pm","room":"Track 4","type":"session","track":"Agentic Engineering","status":"hold","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"2:50pm-3:10pm","room":"Track 5","type":"sponsor","track":"Graphs","status":"tentative","speakers":[]},{"title":"The Death of Developer Advocates","day":"Day 4 — Session Day 3","time":"2:50pm-3:10pm","room":"Track 6","type":"session","track":"AI in GTM","status":"confirmed","speakers":["Graham McBain"]},{"title":"How to build an AI-Native Health Company","day":"Day 4 — Session Day 3","time":"2:50pm-3:10pm","room":"Track 7","type":"session","track":"AI in Healthcare","status":"tentative","speakers":["Dan Feng"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"2:50pm-3:10pm","room":"Track 8","type":"session","track":"SemiAnalysis","status":"tentative","speakers":[]},{"title":"Two Bugs That Hid in Plain Sight: A vLLM Debugging Detective Story","day":"Day 4 — Session Day 3","time":"2:50pm-3:10pm","room":"Track 9","type":"session","track":"Inference","status":"tentative","speakers":["Asaf Gardin"]},{"title":"M8","day":"Day 4 — 
Session Day 3","time":"2:50pm-3:10pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"We Solved Agent Building - The Evolution of Building A Successful Data Science Agent","day":"Day 4 — Session Day 3","time":"3:20pm-3:40pm","room":"Main Stage","type":"session","track":"Harness Engineering","status":"tentative","speakers":["Andrew Qu"]},{"title":"Beyond Prompts: Building a Multi-Agent Creative Computer That Orchestrates 5+ AI Models in Real-Time","description":"Flik is a production multi-agent system that generates complete work, not pieces. This talk demonstrates how the system orchestrates Claude, Gemini, Nano, Seedance, and Eleven Labs in a single workspace across text, image, video, and audio; shows an end-to-end workflow from prompt to finished output; explains coordination across modalities; and covers built-in likeness/IP safety plus real customer examples.","day":"Day 4 — Session Day 3","time":"3:20pm-3:40pm","room":"Track 1","type":"session","track":"Generative Media","status":"tentative","speakers":["Brennan Erbz"]},{"title":"Building safe payment infrastructure for the autonomous economy","day":"Day 4 — Session Day 3","time":"3:20pm-3:40pm","room":"Track 2","type":"sponsor","track":"Agentic Commerce","status":"tentative","speakers":["Rudy Geronimo"]},{"title":"It's a Skill Issue: Best practices in building skills that work","day":"Day 4 — Session Day 3","time":"3:20pm-3:40pm","room":"Track 3","type":"session","track":"AI in Finance","status":"tentative","speakers":["Yogendra Miraje"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"3:20pm-3:40pm","room":"Track 4","type":"session","track":"Agentic Engineering","status":"hold","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"3:20pm-3:40pm","room":"Track 5","type":"sponsor","track":"Graphs","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"3:20pm-3:40pm","room":"Track 
6","type":"session","track":"AI in GTM","status":"tentative","speakers":[]},{"title":"Don't be data poor","description":"What do you do when the data you most need to train and evaluate on is the data you're least allowed to keep? It's a bind for anyone building AI in a high-stakes vertical: the cases that would teach your model the most — the rare, the messy, the sensitive — tend to be the ones wrapped in the tightest constraints. In healthcare it's near-absolute. PHI can't be retained, reused, or transformed, so your long-lived datasets can't contain real patient data at all. Synthetic data is the obvious escape hatch, but it has its own trap: synthetic records tend to look synthetic, and a model that passes on fake-looking data tells you nothing about the real thing. So the bar isn't generating data — it's generating data faithful enough to trust. This talk is how we got there. Ask an LLM for a full case in one shot and you get something generic and averaged-out — models are worse at inventing convincing, specific detail than you'd expect. We present our synthetic generation pipeline (and the process around it) that enabled us to create golden datasets at scale. The pipeline features a coarse-to-fine process that enriches a patient's medical history layer by layer, with human-in-the-loop hooks to steer the narrative at each step. 
You'll leave with ideas on how to build your own synthetic data generation capabilities and how to build a data pipeline your domain experts actually enjoy owning.","day":"Day 4 — Session Day 3","time":"3:20pm-3:40pm","room":"Track 7","type":"session","track":"AI in Healthcare","status":"tentative","speakers":["Anuj Iravane"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"3:20pm-3:40pm","room":"Track 8","type":"session","track":"SemiAnalysis","status":"tentative","speakers":[]},{"title":"Weight Folding, CUDA Streams, and the Bug That Made My Model Speak Backwards","day":"Day 4 — Session Day 3","time":"3:20pm-3:40pm","room":"Track 9","type":"session","track":"Inference","status":"tentative","speakers":["Filip Makraduli"]},{"title":"M9","day":"Day 4 — Session Day 3","time":"3:20pm-3:40pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"The Intelligence Infrastructure You Should Own","day":"Day 4 — Session Day 3","time":"3:20pm-3:40pm","room":"Leadership 2","type":"session","track":"AI Architects: AI Factories","status":"tentative","speakers":["Zach Lloyd"]},{"title":"Agents Without Code: How Skills, YAML, and Filesystems Replaced Python","description":"Six months ago, building an agent meant writing a Python class with a `while` loop, tool definitions in dicts, manual state management, or custom Python functions. Today, you define an agent in a YAML file, drop a `SKILL.md` into a folder, and deploy. This talk traces the arc from \"Agent in Python\" to \"Agent as filesystem\". 
You'll see the same agent built three ways: the hard way (Jan 2025), the simple way (Oct 2025), and the zero-code way (today).","day":"Day 4 — Session Day 3","time":"3:45pm-4:05pm","room":"Main Stage","type":"session","track":"Harness Engineering","status":"tentative","speakers":["Philipp Schmid"]},{"title":"The Next Game Engine Won't Have a Manual","description":"Game development needs to change for the agent era rather than simply dropping an LLM into existing engines. This talk shows the AI systems behind Veselka, using Claude plus Three.js to turn AI into a practical game-development partner and lower the barrier for people who want to build their dream game.","day":"Day 4 — Session Day 3","time":"3:45pm-4:05pm","room":"Track 1","type":"session","track":"Generative Media","status":"tentative","speakers":["Arturo Nereu"]},{"title":"Beyond the Lethal Trifecta: Agentic Commerce on the Open Internet at Machine Speed","description":"For decades, the internet has had protocols for routing, identity, encryption, payments, and commerce between people and organizations. It has never had a native way for autonomous agents to possess authority, accountability, or legal standing. On July 1, 2026 that changes. A little-known law will take effect that changes the world as we know it. As AI agents move beyond the enterprise firewall, a new form of commerce is emerging. Agents can already search, negotiate, schedule, purchase, settle payments, and coordinate work across networks. But the moment they begin acting independently on behalf of people, businesses, and online organizations, fundamental questions appear: Who does this agent represent? What authority does it possess? Who is responsible when something goes wrong? How do counterparties know they can trust it? This talk explores the \"Lethal Trifecta\" of agentic systems: access to systems, access to networks, and autonomy. 
Together they create extraordinary capabilities, but they also expose a missing layer in the architecture of the internet itself. Without identity, accountability, governance, and legal standing, agentic commerce remains trapped inside enterprise walls, limited to productivity gains rather than participation in open markets. On the same day as this conference, a new legal framework takes effect that gives autonomous online organizations a registered legal existence, allowing them to hold assets, enter agreements, govern themselves through software, and operate through fleets of agents. Whether you're building agents, agent platforms, autonomous organizations, payment systems, governance systems, or the next generation of internet infrastructure, this shift has global implications, and you'll be the first to know. We'll examine the emerging trust stack for agentic commerce—identity, authority, governance, settlement, and standing—and explore what happens when agents stop acting merely as tools and begin participating as economic actors on the open internet at machine speed.","day":"Day 4 — Session Day 3","time":"3:45pm-4:05pm","room":"Track 2","type":"sponsor","track":"Agentic Commerce","status":"tentative","speakers":["David Levine"]},{"title":"Wearing the Agent: Engineering a Family-and-Friends Personal Agent, from Group Chats to Glasses","description":"Judith is a personal AI agent that has run in daily production for a year, used by more than a dozen family and friends across WhatsApp group chats, Telegram, and Discord. This talk covers the engineering for a safe multi-tenant personal agent: permissioning, long-lived memory across FAISS + Neo4j + curated notes, scheduled subagents, and message-time guardrails for privacy, recipient safety, and prompt-injection defense. 
It then shows how the agent moves onto low-cost smart glasses, capturing visual memory, helping with navigation and in-store tasks, and maintaining conversational latency with on-device speech recognition, cloud reasoning, and a custom neural voice. Includes live demos plus practical takeaways on multi-user agent design, durable memory, defensive agent engineering, and wearable ambient interfaces.","day":"Day 4 — Session Day 3","time":"3:45pm-4:05pm","room":"Track 3","type":"session","track":"AI in Finance","status":"tentative","speakers":["Sai Krishna Rallabandi"]},{"title":"Realtime multiplayer, automation, and you!","description":"Now that the models are powerful and the agents are capable, why are we still approaching software development as if it's the same activity that it used to be, but \"faster\"? GitHub Next thinks about what this future wants to be through two lenses: - Automation: intelligence allows us to automate much more than we could with heuristics alone. How should that automation work? What guardrails do we have to put in place so that our CISOs allow us to do that? - Collaboration: agents can understand anything in your codebase, but what about all the facts that are in the heads of your teammates? Whether it's corporate politics or taste, how do we get the humans to leak that context where agents can see it and use it to produce better outcomes? Realtime multiplayer tools have displaced every turn-based tool out there. What should that look like for code? It's not going to be as simple as multiple cursors. Come by to hear more about what GitHub Next is learning about the changing shape of software creation — one that allows us to build better, not merely faster. One that allows us to scale up teams, not only individuals. And one where automations buy us time for craft and polish, not slop. We were promised flying cars, instead we have fifteen terminals. 
Let's have a nicer future than that.","day":"Day 4 — Session Day 3","time":"3:45pm-4:05pm","room":"Track 4","type":"session","track":"Agentic Engineering","status":"tentative","speakers":["Idan Gazit"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"3:45pm-4:05pm","room":"Track 5","type":"sponsor","track":"Graphs","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"3:45pm-4:05pm","room":"Track 6","type":"session","track":"AI in GTM","status":"tentative","speakers":[]},{"title":"AI Benchmarks for Vertical Industries: Why we're not measuring what we need to and how to unlock real-world ROI","description":"AI is acing the tests we set for it. So why are so many production deployments falling flat? This talk draws on lessons from building Anterior's internal benchmark for real-world healthcare tasks and how to translate real-world performance into concrete measurement rubrics, use imperfect synthetic data, avoid common pitfalls, and apply the approach to any vertical domain.","day":"Day 4 — Session Day 3","time":"3:45pm-4:05pm","room":"Track 7","type":"session","track":"AI in Healthcare","status":"tentative","speakers":["Christopher Lovejoy"]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"3:45pm-4:05pm","room":"Track 8","type":"session","track":"SemiAnalysis","status":"tentative","speakers":[]},{"title":"TBA","day":"Day 4 — Session Day 3","time":"3:45pm-4:05pm","room":"Track 9","type":"session","track":"Inference","status":"hold","speakers":[]},{"title":"M10","day":"Day 4 — Session Day 3","time":"3:45pm-4:05pm","room":"Track M","type":"sponsor","track":"Track M","status":"confirmed","speakers":[]},{"title":"Mind the Gap: Why Your AI Budget Buys You 40% Less Than It Should","day":"Day 4 — Session Day 3","time":"3:45pm-4:05pm","room":"Leadership 2","type":"session","track":"AI Architects: AI Factories","status":"tentative","speakers":["Barak Lenz"]},{"title":"Closing Keynote — Theo Browne","day":"Day 4 — Session Day 
3","time":"4:30pm-4:50pm","room":"Main Stage","type":"keynote","track":"Main Stage","status":"confirmed","speakers":["Theo Browne"]},{"title":"Closing Keynote: Garry Tan","day":"Day 4 — Session Day 3","time":"4:50pm-5:10pm","room":"Main Stage","type":"keynote","track":"Main Stage","status":"confirmed","speakers":["Garry Tan"]},{"title":"Startup Battlefield","day":"Day 4 — Session Day 3","time":"5:10pm-5:30pm","room":"Main Stage","type":"keynote","track":"Main Stage","status":"confirmed","speakers":["TBD"]}]}