# AI Engineer World's Fair 2025

Main website: https://ai.engineer/
Basic llms info: https://ai.engineer/llms-full.txt
llms-full.txt including speakers: https://ai.engineer/llms-full.txt

## Overview

**June 3–5, 2025 • San Francisco**

The AI Engineer World's Fair is the largest technical conference for engineers working in AI today. Returning for its third year, this event is where the leading AI labs, founders, VPs of AI, and engineers gather to share what they're building and what's next.

- ~3,000 attendees: Founders, VPs of AI, AI Engineers
- ~150 launches and talks from top speakers
- ~100 practical workshops and expo sessions
- ~50 top DevTools and employers represented in the Expo

Organized by the team behind the AI Engineer Summit.

**[Buy Tickets](https://ti.to/software-3/ai-engineer-worlds-fair-2025?source={{UTM_SOURCE}}) | [Watch 2023/2024/2025 Talks](https://youtube.com/@aidotengineer) | [Subscribe to Newsletter](https://ai.engineer/newsletter)**

## Schedule

- June 3: Workshops + exclusive Speaker Dinner
- June 4: MCP, Tiny Teams, LLM RecSys, GraphRAG, Agent Reliability, Infrastructure, AI PM, Voice, AI in Fortune 500, AI Architects
- June 5: Reasoning + RL, SWE-Agents, Evals, Retrieval + Search, Security, Generative Media, AI Design, Robotics/Autonomy, AI in Fortune 500 (Day 2), AI Architects (Day 2)

### Tuesday, June 3 – Workshop Day + Evening Expo & Reception

- Exclusive hands-on workshops across 5 tracks, instructed by industry-leading companies, founders, and engineers.
- Topics span all levels of experience and specialties in AI Engineering.
- **Evening Welcome Reception** (4:00–7:00pm): Held in the Grand Assembly & Expo Hall. Open to all ticketholders.

### Wednesday & Thursday, June 4–5 – Conference Days

- 18 tracks of talks, panels, and demos.
- Keynotes from the biggest and most consequential labs and companies.
- High-value hallway track and facilitated networking.
- Workshops and exclusive access for "Conference + Workshop Pass" holders.
## Tracks

======================================================================
--- Track: AI ARCHITECTS (June 4-5) ---
======================================================================

Session Title: Does AI Actually Boost Developer Productivity? (Stanford / 100k Devs Study)
Description: Forget vendor hype: Is AI actually boosting developer productivity, or just shifting bottlenecks? Stop guessing. Our study at Stanford cuts through the noise, analyzing real-world productivity data from nearly 100,000 developers across hundreds of companies. We reveal the hard numbers: while the average productivity boost is significant (~20%), the reality is complex – some teams even see productivity decrease with AI adoption. The crucial insights lie in why this variance occurs. Discover which company types, industries, and tech stacks achieve dramatic gains versus minimal impact (or worse). Leave with the objective, data-driven evidence needed to build a winning AI strategy tailored to your context, not just follow the trend.
Speaker: Simon Obstbaum (Researcher, former CTO @ Crunchyroll)
Format: Talk

------------------------------------

Session Title: Does AI Actually Boost Developer Productivity? (Stanford / 100k Devs Study)
Description: Forget vendor hype: Is AI actually boosting developer productivity, or just shifting bottlenecks? Stop guessing. Our study at Stanford cuts through the noise, analyzing real-world productivity data from nearly 100,000 developers across hundreds of companies. We reveal the hard numbers: while the average productivity boost is significant (~20%), the reality is complex – some teams even see productivity decrease with AI adoption. The crucial insights lie in why this variance occurs. Discover which company types, industries, and tech stacks achieve dramatic gains versus minimal impact (or worse). Leave with the objective, data-driven evidence needed to build a winning AI strategy tailored to your context, not just follow the trend.
Speaker: Yegor Denisov-Blanch (Developer Productivity Researcher at Stanford University)
Format: Talk

======================================================================
--- Track: AI PRODUCT MANAGEMENT (TBA) ---
======================================================================

Session Title: Everything is ugly so go build something that isn't
Description: We're in an awkward adolescent phase of AI product (design). But what if this chaotic moment is actually our greatest opportunity? Enter the rebuilding revolution. In this talk, we'll explore how the current state of AI interfaces offers a once-in-a-career chance to rethink fundamental UX patterns, with practical guidance on avoiding common pitfalls that plague first-generation AI products. Learn how to balance technical constraints with user needs, identify which conventional wisdom to keep versus discard, and ship AI experiences that actually delight users rather than frustrate them.
Speaker: Raiza Martin (CEO & Co-Founder Huxe || Previously NotebookLM)
Format: Talk

------------------------------------

Session Title: Build AI PMs for PMs to Replace PMs
Description: If you’ve ever been blocked by vague specs, shifting goals, or chasing “vibes,” things have only gotten messier in the age of AI. What if the PM were an AI—and it understood the product, the customers, the market, the design, and, most importantly, you? At Reforge, we built AI agents that analyze user feedback at scale, perform real-time market analysis, write specs, model feature impact, and run continuous user research, pushing us to rethink what “product work” actually looks like. In this talk, we’ll explore what happens when engineers collaborate with AI PMs instead of humans: evaluation-driven backlogs grounded in real user data, ruthless and precise feature scoping, and product decisions that iterate as fast as the models powering them.
You’ll learn the user behaviors and engineering patterns behind feedback analysis, synthetic users, AI-native surveys, and the metrics we use to measure impact before a feature ships—along with the cultural shifts teams need to embrace to make this future a reality. In this new era, the teams who win won’t just adopt AI—they’ll architect workflows where human intuition and machine intelligence ship product side by side.
Speaker: Chun Jiang (VP Product)
Format: Talk

------------------------------------

Session Title: Shipping Products When You Don’t Know What They Can Do
Description: A customer recently asked me: “Hey, can I tag your AI agent in a Google Doc comment?” The honest answer: I have no idea! We never designed our agents to handle Google Doc comments, but we tried it anyway… and it worked! The agent performed beautifully, the customer was thrilled, and I was left bewildered. Welcome to Product Management for AI agents, where roadmaps are fuzzy and we only learn the boundaries of our products after they’re released. When a product doesn’t follow predefined requirements but instead learns and improvises at runtime, PMs must give up control and lean into uncertainty, curiosity, experimentation, and fast feedback loops. This talk is a field guide for Product/Engineering teams navigating this new reality. We’ll cover how to write specs for affordances instead of features, how to use AI evals as a product development tool, and how to perform User Acceptance Testing on undocumented emergent behavior. Most importantly, we’ll explore how to build trust with customers even when the answer is, truthfully, “I don’t know.” If you’re managing AI-native products in 2025 the same way you managed web apps in 2020, you might find yourself A/B testing an agent that decided to go off and do C, D, and E all by itself!
Speaker: Ben Stein (CEO and Founder)
Format: Talk

------------------------------------

Session Title: The Billable Hour is Dead; Long Live the Billable Hour?
Description: If software was eating the world before, knowledge work will soon be devoured by AI. In corporate America there are thousands of hours spent on rote tasks every day by employees, consultants, and lawyers alike. But is AI really capable of replacing work in the real world yet? Productivity estimates from GenAI range from 1.5% (NBER) to 96% (☝ us!). In this talk we'll share war stories of where the answer is yes (and no) and how we reduced human time spent on tasks from days to minutes in high-impact situations. The path from promise to actual product, used in real world settings, from our experience, is still unmapped. Learn what we built, how we built it - with code - and how we got stakeholder buy-in to deploy it.
Speaker: Kevin Madura (Director, Advanced Technologies)
Format: Talk

------------------------------------

Session Title: Shipping AI That Works: An Evaluation Framework for PMs
Description: GenAI is reshaping the product landscape, creating huge opportunities (along with new expectations) for product managers. Yet while prompt engineering and model tuning get the spotlight, one critical skill can get overlooked: rigorous evaluation. This talk will help PMs move beyond gut-feel “vibe checks” to adopt concrete, repeatable evaluation strategies for LLM-powered products. I'll break down essential eval methodologies, from human feedback and code-based checks to cutting-edge LLM-based evaluations.
Drawing on real-world examples, I'll share a practical framework PMs can use to:

- Confidently evaluate AI-driven features
- Ground decisions in real, repeatable data
- Build trust and delight through consistent quality

Speaker: Aman Khan (Director of Product)
Format: Workshop

------------------------------------

Session Title: From Hunch to Handoff: How AI PMs Can Help Turn Ideas Into Shippable Features Quickly
Description: "We should add AI to this!" Great, but how do you know if your idea will actually work? The gap between AI concept and engineering reality is where most promising features die. In this talk, we will reveal a rapid validation framework developed through working with dozens of product teams—including within Workday's AI product efforts. We'll share a three-step process that starts with lightweight prototyping, builds a relevant evaluation suite, and creates the right artifacts for successful engineering handoffs. You'll see how leading teams use this approach to explore what's possible, establish practical quality benchmarks, and align cross-functional stakeholders before writing a single line of production code. Eliza Cabrera (Principal PM, Workday) and Jeremy Silva (Product Lead, Freeplay) will share the playbook they use to turn “we should add AI here” hunches into AI features customers actually use and trust. Attendees will leave with a field-tested framework, real examples from enterprise teams, and ready-to-use templates that let AI PMs guide ideas from first spark to successful release—cheaply, quickly, and with confidence.
Speaker: Jeremy Silva (Product Lead)
Format: Talk

------------------------------------

Session Title: Why your product needs an AI product manager, and why it should be you
Description: So you've built another cool demo. Now what? You have hype, but not impact. You have kudos but no users. Ultimately you have a demo, but not a product.
The unique uncertainty of AI technology demands a new approach – beyond traditional product management. You need an AI Product Manager. This talk explains why this role is essential for building real AI products, using real case studies from the Incubator for Artificial Intelligence in the UK Government. More importantly, it reveals why your technical depth makes you uniquely suited to step into this critical leadership gap. Discover why you could be the ideal candidate to be the AI Product Manager your product needs, and how to step into that role.
Speaker: James Lowe (Head of AI Engineering)
Format: Talk

------------------------------------

Session Title: The Billable Hour is Dead; Long Live the Billable Hour?
Description: If software was eating the world before, knowledge work will soon be devoured by AI. In corporate America there are thousands of hours spent on rote tasks every day by employees, consultants, and lawyers alike. But is AI really capable of replacing work in the real world yet? Productivity estimates from GenAI range from 1.5% (NBER) to 96% (☝ us!). In this talk we'll share war stories of where the answer is yes (and no) and how we reduced human time spent on tasks from days to minutes in high-impact situations. The path from promise to actual product, used in real world settings, from our experience, is still unmapped. Learn what we built, how we built it - with code - and how we got stakeholder buy-in to deploy it.
Speaker: Mo Bhasin (Director of AI Products)
Format: Talk

======================================================================
--- Track: AI IN ACTION (TBA) ---
======================================================================

Session Title: tba
Description: Sarah Guo shares her insights on the evolving landscape of AI investment and innovation.
Speaker: Sarah Guo (Founder)
Format: Keynote

------------------------------------

Session Title: tba
Description: Simon Willison discusses open-source AI tools and how they empower developers and researchers.
Speaker: Simon Willison (AI Engineer)
Format: Keynote

------------------------------------

Session Title: Ambient Agents
Description: Harrison Chase demonstrates the power of LangChain in creating sophisticated applications with Large Language Models.
Speaker: Harrison Chase (CEO)
Format: Keynote

------------------------------------

Session Title: tba
Description: Omar Khattab discusses the principles and applications of DSPy, and his research at Databricks.
Speaker: Omar Khattab (Creator of DSPy / Research Scientist)
Format: Talk

------------------------------------

Session Title: tba
Description: Micah Hill-Smith presents an in-depth analysis of the current AI landscape, highlighting key trends and future projections.
Speaker: Micah Hill-Smith (CEO)
Format: Talk

------------------------------------

Session Title: AI Engineering with the Google Gemini 2.5 Model Family
Description: Hands-on workshop on learning to use Gemini 2.5 Pro in combination with agentic tooling and MCP servers.
Speaker: Philipp Schmid (AI Developer Experience)
Format: Workshop

------------------------------------

Session Title: tba (remote only)
Description: Justin Junyang Lin discusses the latest developments and contributions from Alibaba's Qwen team to the open-source AI community.
Speaker: Justin Junyang Lin (Core Maintainer)
Format: Talk

------------------------------------

Session Title: Prompt Engineering is Dead - Everything is a Spec
Description: [!!Subject to change!!] Large models are trained through mountains of data and learned reward functions, yet - quis custodiet ipsos custodes? - what exactly are those amorphous blobs of data and rewards trying to specify?
Building LLMs in any domain demands both clarity of thought and the skill to communicate those thoughts precisely - not only to other humans but to the models themselves. Without either, we risk unpleasant surprises. This talk dives into:

- Why prompt spaghetti and data gumbo inevitably collapse at scale, unleashing behaviors we never intended - while a rigorously versioned spec keeps safety, personality, and UX firmly aligned, and makes incidents easier and faster to diagnose and fix.
- How OpenAI’s public Model Spec provides a clear template, complemented by emerging “dev tools” that turn hazy human intent into precise, human-and-machine-readable policy.
- How deliberative alignment training teaches models to first read and reason about the spec, boosting robustness without inflating context windows.
- Practical tactics for catching ambiguity, untangling contradictions, and preserving global consistency. Plus, techniques for verifying that deployed models truly follow the contract we crafted.

Resources: Model Spec (2025-04-11) and Deliberative Alignment, Guan et al., 2024.
Speaker: Sean Grove (Member of Technical Staff)
Format: Keynote

------------------------------------

Session Title: The Billable Hour is Dead; Long Live the Billable Hour?
Description: If software was eating the world before, knowledge work will soon be devoured by AI. In corporate America there are thousands of hours spent on rote tasks every day by employees, consultants, and lawyers alike. But is AI really capable of replacing work in the real world yet? Productivity estimates from GenAI range from 1.5% (NBER) to 96% (☝ us!). In this talk we'll share war stories of where the answer is yes (and no) and how we reduced human time spent on tasks from days to minutes in high-impact situations. The path from promise to actual product, used in real world settings, from our experience, is still unmapped.
Learn what we built, how we built it - with code - and how we got stakeholder buy-in to deploy it.
Speaker: Kevin Madura (Director, Advanced Technologies)
Format: Talk

------------------------------------

Session Title: Embeddings ARE NOT All You Need: Understanding Tradeoffs in Multimodal Search
Description: Multimodal search demos show impressive capabilities, but understanding how to scale these systems while balancing cost and performance is where things get tricky. Through three real-world implementations, from wildlife stock footage to breaking news to sports highlights, I'll demonstrate how HNSW-indexed vector embeddings and more traditional caption-based approaches each excel in different domains. You'll see how querying pooled image embeddings struggles with spatiotemporal relationships that simple JSON tagging handles effortlessly. You'll see how traditional computer vision techniques applied as preprocessing steps enable video understanding models to more accurately answer “was the athlete over this line?”. I'll share implementation patterns for these approaches with animated architecture diagrams. We’ll cover why chunking and preprocessing strategies impact both accuracy and index size. By the end of this talk, you'll understand the nuanced technical tradeoffs between embeddings, metadata filtering, and hybrid retrieval for your specific multimodal search challenges. Most importantly, you'll get some real-world costs around indexing and continued operation.
Speaker: Randall Hunt (CTO at Caylent)
Format: Talk

------------------------------------

Session Title: Operationalizing AI Interpretability with Neural Programming Interfaces (NPIs)
Description: Mechanistic interpretability is a frontier field that aims to reverse engineer neural networks. At Goodfire, we're operationalizing the latest in interpretability research by building Ember: the universal platform for neural programming.
Ember decodes the neurons of an AI model to give direct, programmable access to its internal representations. In this talk, we'll share more about what's unlocked by moving beyond black-box inputs and outputs, including entirely new ways to apply, train, and align AI models. We're excited for a future in which neural programming allows users to discover new knowledge hidden in their model, precisely shape its behaviors, and improve its performance.
Speaker: Mark Bissell (Applied Interpretability Research)
Format: Talk

------------------------------------

Session Title: How LLMs work for Web Devs: GPT in 600 lines of Vanilla JS
Description: Don't be intimidated. Modern AI can feel like magic, but underneath the hood are principles that web developers can understand, even if you don't have a machine learning background. In this workshop, we'll explore a complete GPT-2 inference implementation built entirely in Vanilla JS. This JavaScript translation of the popular "Spreadsheets-are-all-you-need" approach will let you debug and step through a real LLM line by line without the overhead of learning a new language, framework, or even IDE. All the major LLMs, including ChatGPT, Claude, DeepSeek, and Llama, inherit from GPT-2's architecture, making this exploration a solid foundation to understand modern AI systems and comprehend the latest research. While we won't have time to cover *everything*, you'll gain the essential knowledge to understand the key concepts that matter when building with LLMs, including how they:

- Convert raw text into meaningful tokens
- Represent semantic meaning through vector embeddings
- Train neural networks through gradient descent
- Generate text with sampling algorithms like top-k, top-p, and temperature

This intense but beginner-friendly workshop is designed specifically for web developers diving into ML and AI for the first time. It’s your "missing AI degree" in just two hours.
You'll walk away with an intuitive mental model of how Transformers work that you can apply immediately to your own LLM-powered projects.
Speaker: Ishan Anand (AI Consultant and educator)
Format: Workshop

------------------------------------

Session Title: Case Study + Deep Dive: Telemedicine Support Agents with LangGraph/MCP
Description: Workshop/walkthrough of a Stride/Avila Science partnership to build agentic telemedicine support.
Speaker: Dan Mason (Principal, Head of AI)
Format: Workshop

------------------------------------

Session Title: The Billable Hour is Dead; Long Live the Billable Hour?
Description: If software was eating the world before, knowledge work will soon be devoured by AI. In corporate America there are thousands of hours spent on rote tasks every day by employees, consultants, and lawyers alike. But is AI really capable of replacing work in the real world yet? Productivity estimates from GenAI range from 1.5% (NBER) to 96% (☝ us!). In this talk we'll share war stories of where the answer is yes (and no) and how we reduced human time spent on tasks from days to minutes in high-impact situations. The path from promise to actual product, used in real world settings, from our experience, is still unmapped. Learn what we built, how we built it - with code - and how we got stakeholder buy-in to deploy it.
Speaker: Mo Bhasin (Director of AI Products)
Format: Talk

------------------------------------

Session Title: tba
Description: Ben Dunphy discusses the programs and initiatives at AI Engineer fostering the next generation of AI talent.
Speaker: Ben Dunphy (Cofounder)
Format: Talk

------------------------------------

Session Title: tba
Description: swyx shares insights from curating Latent Space and his work as an AI Engineer.
Speaker: swyx (Curator)
Format: Talk

======================================================================
--- Track: AI IN FORTUNE 500 (June 4-5) ---
======================================================================

Session Title: The Billable Hour is Dead; Long Live the Billable Hour?
Description: If software was eating the world before, knowledge work will soon be devoured by AI. In corporate America there are thousands of hours spent on rote tasks every day by employees, consultants, and lawyers alike. But is AI really capable of replacing work in the real world yet? Productivity estimates from GenAI range from 1.5% (NBER) to 96% (☝ us!). In this talk we'll share war stories of where the answer is yes (and no) and how we reduced human time spent on tasks from days to minutes in high-impact situations. The path from promise to actual product, used in real world settings, from our experience, is still unmapped. Learn what we built, how we built it - with code - and how we got stakeholder buy-in to deploy it.
Speaker: Kevin Madura (Director, Advanced Technologies)
Format: Talk

------------------------------------

Session Title: Does AI Actually Boost Developer Productivity? (Stanford / 100k Devs Study)
Description: Forget vendor hype: Is AI actually boosting developer productivity, or just shifting bottlenecks? Stop guessing. Our study at Stanford cuts through the noise, analyzing real-world productivity data from nearly 100,000 developers across hundreds of companies. We reveal the hard numbers: while the average productivity boost is significant (~20%), the reality is complex – some teams even see productivity decrease with AI adoption. The crucial insights lie in why this variance occurs. Discover which company types, industries, and tech stacks achieve dramatic gains versus minimal impact (or worse). Leave with the objective, data-driven evidence needed to build a winning AI strategy tailored to your context, not just follow the trend.
Speaker: Simon Obstbaum (Researcher, former CTO @ Crunchyroll)
Format: Talk

------------------------------------

Session Title: Does AI Actually Boost Developer Productivity? (Stanford / 100k Devs Study)
Description: Forget vendor hype: Is AI actually boosting developer productivity, or just shifting bottlenecks? Stop guessing. Our study at Stanford cuts through the noise, analyzing real-world productivity data from nearly 100,000 developers across hundreds of companies. We reveal the hard numbers: while the average productivity boost is significant (~20%), the reality is complex – some teams even see productivity decrease with AI adoption. The crucial insights lie in why this variance occurs. Discover which company types, industries, and tech stacks achieve dramatic gains versus minimal impact (or worse). Leave with the objective, data-driven evidence needed to build a winning AI strategy tailored to your context, not just follow the trend.
Speaker: Yegor Denisov-Blanch (Developer Productivity Researcher at Stanford University)
Format: Talk

------------------------------------

Session Title: The Billable Hour is Dead; Long Live the Billable Hour?
Description: If software was eating the world before, knowledge work will soon be devoured by AI. In corporate America there are thousands of hours spent on rote tasks every day by employees, consultants, and lawyers alike. But is AI really capable of replacing work in the real world yet? Productivity estimates from GenAI range from 1.5% (NBER) to 96% (☝ us!). In this talk we'll share war stories of where the answer is yes (and no) and how we reduced human time spent on tasks from days to minutes in high-impact situations. The path from promise to actual product, used in real world settings, from our experience, is still unmapped. Learn what we built, how we built it - with code - and how we got stakeholder buy-in to deploy it.
Speaker: Mo Bhasin (Director of AI Products)
Format: Talk

======================================================================
--- Track: AGENT RELIABILITY (June 4) ---
======================================================================

Session Title: Scaling AI agents without breaking reliability
Description: As AI agents move from prototypes to production, developers are running into new challenges with orchestration, failure handling, and infrastructure. This session will unpack lessons from teams already building real-world systems and share how to design for reliability from the start.
Speaker: Preeti Somal (SVP Engineering)
Format: Talk

------------------------------------

Session Title: Want reliable agents? Generate better (synthetic) data.
Description: You don’t need a bigger model. You need better synthetic data. In this talk, we’ll demonstrate how you can build reliable, domain-specific agents, without manually labeling thousands of samples. Using agentic data pipelines, you can finally achieve 9s of accuracy—e.g., 95%+ accuracy on domain-specific tasks like Text-to-SQL. We'll show how, with just a handful of high-quality examples, you can generate, validate, and iteratively expand your dataset. This workflow keeps humans in the loop where they add the most value (spotting inconsistencies, defining edge cases, etc.) while letting LLMs do what they do best (finding patterns, generating diverse examples, enforcing structure, etc.). The result is faster iteration and higher-quality training data—without tedious and expensive manual labeling. Building reliable agents also requires good evaluation data sets. Evaluations are critical for measuring alignment with your training objectives. Again, using agentic pipelines, you can continuously assess model performance, identify failure modes, and generate more synthetic data to further improve your model.
With this proven approach, you can train smaller models to be experts in your domain with a small amount of data in a relatively short amount of time.
Speaker: Sharon Zhou (CEO)
Format: Talk

------------------------------------

Session Title: AI Automation that actually works: $100M impact on messy data with zero surprises
Description: We will review the different kinds of automation use cases, and the approach we used, that will drive over $100M of expected annual impact by deploying AI for business-critical initiatives. We will discuss what kinds of automation initiatives become possible because of Gen AI. These were not tenable before because of the amount of customization required per customer or per scenario, and the kind of data involved in these workflows. Previously, these workflows were driven manually, which was both error-prone and required expensive training. To replace or augment these manual business-critical processes, automation _has_ to cross a very high bar of reliability. We will share how we addressed the inherent non-determinism of Gen AI to create a predictable system that doesn’t have any surprising failure modes. We’ll also discuss how we worked with our existing data that was spread across various systems without an expensive centralisation and clean-up effort.
Speaker: Tanmai Gopal (CEO, Co-founder)
Format: Talk

------------------------------------

Session Title: 12 Factor Agents - Principles of Reliable LLM Applications
Description: Hi, I'm Dex. I've been hacking on AI agents for a while. I've tried every agent framework out there, from the plug-and-play crew/langchains to the "minimalist" smolagents of the world to the "production grade" langgraph, griptape, etc. I've talked to a lot of really strong founders who are all building really impressive things with AI. Most of them are rolling the stack themselves. I don't see a lot of frameworks in production customer-facing agents.
I've been surprised to find that most of the products out there billing themselves as "AI Agents" are not all that agentic. A lot of them are mostly deterministic code, with LLM steps sprinkled in at just the right points to make the experience truly magical. Agents, at least the good ones, don't follow the "here's your prompt, here's a bag of tools, loop until you hit the goal" pattern. Rather, they are comprised of mostly just software. So, I set out to answer: What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers?
Speaker: Dexter Horthy (Founder)
Format: Talk

------------------------------------

Session Title: Agents vs Workflows: Why Not Both?
Description: One current hot debate: should you make your top-level abstraction a ReAct-type agent running in a loop, or should you make it a structured workflow graph? OpenAI is launching their new framework and throwing shade on workflow graph approaches. TBH we think this whole debate is kinda dumb. We've seen a lot of folks be able to structure the problem in a way that a workflow graph makes a lot of sense. We also see a ton of agents where you need to run the core bit in a loop for a long time. You can also give your agents structured workflow graphs as a tool. You can use structured workflow graphs as a handoff mechanism between agents. What we've seen from the community is frankly that folks need to tinker with multiple approaches and combine primitives in interesting ways. We'll share a couple stories where teams ended up with workflow graph based approaches, a couple where teams ended up with agent based approaches, and a couple where a blended approach made sense.
Speaker: Sam Bhagwat (Co-founder)
Format: Talk

======================================================================
--- Track: AUTONOMY+ROBOTICS (TBA) ---
======================================================================

Session Title: tba
Description: Danielle Persczyk presents cutting-edge AI research and its applications from Amazon.
Speaker: Danielle Persczyk (Research Scientist)
Format: Keynote

------------------------------------

Session Title: Teaching Cars to Think: Language Models and Autonomous Vehicles
Description: This session explores Waymo's latest research on the End-to-End Multimodal Model for Autonomous Driving (EMMA) and advanced sensor simulation techniques. Jyh-Jing Hwang will demonstrate how multimodal large language models like Gemini could improve autonomous driving through unified end-to-end architectures that process raw sensor data directly into driving decisions. The presentation will showcase EMMA's state-of-the-art performance in trajectory planning, 3D object detection, and road graph understanding, as well as another Drive&Gen research approach to sensor simulation for evaluating an end-to-end motion planning model. Attendees will gain insights into the benefits of co-training across multiple autonomous driving tasks and the potential of controlled video generation for testing under various environmental conditions. More on EMMA here: https://waymo.com/blog/2024/10/introducing-emma
Speaker: Jyh-Jing Hwang (Research Scientist & TLM)
Format: Talk

------------------------------------

Session Title: What Is a Humanoid Foundation Model? An Introduction to GR00T N1
Description: Foundation models don’t just write or draw anymore—they’re starting to move. GR00T N1 is NVIDIA’s open Vision-Language-Action (VLA) foundation model for humanoid robots. Built with a dual-system architecture, it combines a System 2 module for high-level reasoning with a System 1 module for real-time, fluid motor control.
It’s trained end-to-end on an impressive mix of data—from human videos to robot trajectories to synthetic simulations—and deployed on a full-sized humanoid robot performing bimanual manipulation tasks in the real world. This talk is a high-level, beginner-friendly overview of GR00T N1: - What makes a robot foundation model different from an LLM or vision model - How GR00T’s architecture is inspired by cognitive systems - Why grounding language, vision, and action together unlocks new generalist capabilities If you’ve ever wondered how large-scale AI is crossing over into the physical world, this session will get you up to speed—no robotics PhD required. Speaker: Annika Brundyn (GenAI Architect) Format: Talk ------------------------------------ Session Title: What Is a Humanoid Foundation Model? An Introduction to GR00T N1 Description: Foundation models don’t just write or draw anymore—they’re starting to move. GR00T N1 is NVIDIA’s open Vision-Language-Action (VLA) foundation model for humanoid robots. Built with a dual-system architecture, it combines a System 2 module for high-level reasoning with a System 1 module for real-time, fluid motor control. It’s trained end-to-end on an impressive mix of data—from human videos to robot trajectories to synthetic simulations—and deployed on a full-sized humanoid robot performing bimanual manipulation tasks in the real world. This talk is a high-level, beginner-friendly overview of GR00T N1: - What makes a robot foundation model different from an LLM or vision model - How GR00T’s architecture is inspired by cognitive systems - Why grounding language, vision, and action together unlocks new generalist capabilities If you’ve ever wondered how large-scale AI is crossing over into the physical world, this session will get you up to speed—no robotics PhD required.
Speaker: Aastha Jhunjhunwala (Solutions Architect) Format: Talk ------------------------------------ Session Title: Real-time Experiments with an AI Co-Scientist Description: The sheer volume of data and complexity of modern scientific challenges necessitate tools that go beyond mere analysis. The vision of an "AI Co-scientist" – a true collaborative partner in the lab – requires sophisticated engineering to bridge the gap between powerful AI reasoning and the dynamic reality of physical experiments. This talk dives into the engineering required to build robust AI Co-scientists for hands-on research. We will explore scalable architectures, such as multi-agent systems leveraging foundation models like Gemini for complex reasoning, hypothesis refinement (inspired by the "generate, debate, evolve" paradigm described in recent AI Co-scientist research), and intelligent tool use. The core focus will be on the engineering challenges and solutions for integrating diverse, real-time empirical data streams – visual data from cameras, quantitative readings from sensors, positional feedback from actuators, and instrument outputs – directly into the AI's reasoning loop. I will illustrate this with concrete, technically detailed examples in chemistry (adaptive reaction monitoring), robotics (vision-guided assembly with SO Arm 100 and LeRobot library), and synthetic biology (real-time bacterial growth monitoring & interpretation). We'll discuss engineering strategies for handling data heterogeneity, latency, noise, and enabling the AI to interpret, correlate, and act upon live experimental feedback. Finally, we will touch upon how thoughtful engineering of these AI Co-scientists can contribute to democratizing access to advanced scientific capabilities. 
Speaker: Stefania Druga (AI Research Scientist ) Format: Talk ====================================================================== --- Track: DESIGN ENGINEERING (TBA) --- ====================================================================== Session Title: UX Design Principles for (Semi) Autonomous Multi-Agent Systems Description: Autonomous or semi-autonomous multi-agent systems (MAS) involve exponentially complex configurations (system config, agent configs, task management and delegation, etc.). These present unique interface design challenges for both developer tooling and end-user experiences. In this session, I'll explore UX design principles for multi-agent systems, addressing critical questions: What is the true configuration space for autonomous MAS? How can users arrive at the correct mental model of an MAS's capabilities, if at all? How can we improve trust and safety through techniques like cost-aware action delegation? What makes agent actions observable? How do we enable seamless interruptibility? Attendees will gain actionable insights to create more transparent, trustworthy, and user-centered multi-agent applications, illustrated through real-world implementations in AutoGen Studio - a low code developer tool built on AutoGen (44k stars on GitHub, MIT license) and similar tools. Speaker: Victor Dibia (Principal Research Engineer) Format: Talk ------------------------------------ Session Title: AI and Game Theory: A Case Study on NYT's Connections Description: This session will examine the interplay between human intuition and artificial intelligence in puzzle-solving, using the popular New York Times Connections game as a practical case study. We'll investigate how gameplay can be systematically evaluated through AI algorithms, exploring machine learning strategies such as clustering, semantic mapping, and natural language processing. 
Attendees will gain insights into building AI-driven puzzle solvers, learn methods for quantitatively assessing gameplay complexity, and discuss the potential impacts of AI on puzzle game design and player engagement. Speaker: Shafik Quoraishee (AI Game Engineer) Format: Talk ------------------------------------ Session Title: AI and Human Whiteboarding Partnership Description: Covid sent everybody home and created the space of virtual whiteboards. At first the experience reused the physical constraints, but soon it became better than a physical whiteboard thanks to virtual-native concepts like copy-paste and keyboard input. The next step in this evolution is to integrate AI into the workflow. We've tried a lot of things with Excalidraw and ended up landing on turning a prompt into a diagram. Come to the talk to understand how it fits into the workflow and how we implemented it. Speaker: Christopher Chedeau (Frenchy Front-end Engineer) Format: Talk ------------------------------------ Session Title: Good design hasn’t changed with AI Description: Bad designs are still bad. AI doesn’t make them good. The novelty of AI makes the bad things tolerable, for a short time. Building great designs and experiences with AI follows the same first principles as pre-AI. When people use software, they want it to feel responsive, safe, accessible and delightful. We’ll go over the big and small details that go into software that people want to use, not software they are forced to use. Speaker: John Pham (Head of Design) Format: Talk ====================================================================== --- Track: EVALS (June 5) --- ====================================================================== Session Title: tba Description: Ankur Goyal discusses the future of talent and work in the AI era with Braintrust.
Speaker: Ankur Goyal (CEO) Format: Talk ------------------------------------ Session Title: tba Description: Sarah Sachs discusses how Notion is leveraging AI to enhance productivity and collaboration tools. Speaker: Sarah Sachs (AI Lead) Format: Talk ------------------------------------ Session Title: Beyond Benchmarks: Strategies for Evaluating LLMs in Production Description: Accuracy scores and leaderboard metrics look impressive—but production-grade AI requires evals that reflect real-world performance, reliability, and user happiness. Traditional benchmarks rarely help you understand how your LLM will perform when embedded in complex workflows or agentic systems. How can you realistically and adequately measure reasoning quality, agent consistency, MCP integration, and user-focused outcomes? In this practical, example-driven talk, we'll go beyond standard benchmarks and dive into tangible evaluation strategies using various open-source frameworks like GuideLLM and lm-eval-harness. You'll see concrete examples of how to create custom eval suites tailored to your use case, integrate human-in-the-loop feedback effectively, and implement agent reliability checks that reflect production conditions. Walk away with actionable insights and best practices for evaluating and improving your LLMs, ensuring they meet real-world expectations—not just leaderboard positions! Speaker: Taylor Jordan Smith (Senior Developer Advocate) Format: Workshop ------------------------------------ Session Title: Embeddings ARE NOT All You Need: Understanding Tradeoffs in Multimodal Search Description: Multimodal search demos show impressive capabilities but understanding how to scale these systems while balancing cost and performance is where things get tricky. 
Through three real-world implementations from wildlife stock footage to breaking news to sports highlights, I'll demonstrate how HNSW-indexed vector embeddings and more traditional caption-based approaches each excel in different domains. You'll see how querying pooled image embeddings struggles with spatiotemporal relationships that simple JSON tagging handles effortlessly. You'll see how traditional computer vision techniques applied as preprocessing steps enable video understanding models to more accurately answer “was the athlete over this line?”. I'll share implementation patterns for these approaches with animated architecture diagrams. We’ll cover why chunking and preprocessing strategies impact both accuracy and index size. By the end of this talk, you'll understand the nuanced technical tradeoffs between embeddings, metadata filtering, and hybrid retrieval for your specific multimodal search challenges. Most importantly, you'll get some real-world costs around indexing and continued operation. Speaker: Randall Hunt (CTO at Caylent) Format: Talk ------------------------------------ Session Title: Testing the Un-Testable: Monitoring AI Products in the Wild Description: Evals are straightforward—like unit tests, they confirm your model got specific test cases right. But in the real world, your AI encounters millions of unpredictable interactions each day. How do you gauge user trust, identify frustrations, and adapt when there's no single "correct" output? Diving through endless logs and manually adding one eval at a time won't cut it. In this session, we'll explore how leading teams are moving beyond static evals, leveraging semantic analytics, LLM teachers, and AI-powered monitoring to deeply understand user experiences at scale—building AI products that don’t just pass tests, but genuinely resonate with users.
Speaker: Ben Hylak (Co-Founder) Format: Talk ------------------------------------ Session Title: Benchmarks Are Memes: How What We Measure Shapes AI—and Us Description: Benchmarks shape more than just AI models—they shape our future. The things we choose to measure become self-fulfilling prophecies, guiding AI toward specific abilities and, ultimately, defining humanity’s evolving role in the AI era. Today’s benchmarks have propelled incredible progress, but now we have an exciting opportunity: thoughtfully designing benchmarks around what genuinely matters to us—cooperation, creativity, education, and meaningful human experiences. In this talk, we’ll explore how benchmarks function as powerful cultural memes, influencing not only technical outcomes but societal direction. Drawing on practical examples we have seen at Every while consulting in industries like finance, journalism, and education, and even on personally making AI play Diplomacy, we’ll uncover what makes a benchmark impactful, approachable, and inspiring. You’ll see our engaging new AI Diplomacy benchmark demo, illustrating vividly how thoughtful evaluation design can excite both engineers and the wider community. You’ll hopefully walk away inspired and equipped to define benchmarks intentionally, helping steer AI toward outcomes that truly matter. Speaker: Alex Duffy (Head of AI) Format: Talk ------------------------------------ Session Title: How to look at your data; what to look for, how to measure Description: By the end of this talk, you'll understand what it takes to apply clustering techniques and data analysis to understand what is the valuable work that your AI application is doing through analyzing conversation histories and how to create generative evals to benchmark your newly discovered superpowers.
Speaker: Jason Liu (Principal) Format: Talk ------------------------------------ Session Title: Scoring models beat LLM judges any day of the week Description: Do you wish your LLM judge was highly accurate, rapid fast, data-tunable, and continuously integrated across your stack? This talk describes a new architecture for AI metrics based on foundation scoring models and a set of integrations that run above them. This architecture was inspired by decades of AI and machine learning development in Google Search, reinvented for the modern LLM stack by our team over the past year. We will share the history of that trajectory and then dive into the technical details: the use of encoder models trained specifically for scoring to enable higher accuracy and lower latency and the deployment of auto-generated tunable metric trees that can combine a wide variety of soft and hard signals across your stack into a human-calibrated score. We’ll end with various examples of how these scoring models are being deployed across the whole stack from online control flows for agents, to reward models for algorithms like RL, to rankers for inference time scaling methods like ensemble generation. Speaker: David Karam (CEO) Format: Workshop ------------------------------------ Session Title: How to look at your data; what to look for, how to measure Description: By the end of this talk, you'll understand what it takes to apply clustering techniques and data analysis to understand what is the valuable work that your AI application is doing through analyzing conversation histories and how to create generative evals to benchmark your newly discovered superpowers. 
Speaker: Jeff Huber (CEO) Format: Talk ====================================================================== --- Track: GENERATIVE MEDIA (June 5) --- ====================================================================== Session Title: Magic Editor Under the Hood: Weaving Generative AI into a Billion-User App Description: Go behind the scenes of Google Photos' Magic Editor. Explore the engineering feats required to integrate complex CV and cutting-edge generative AI models into a seamless mobile experience. We'll discuss optimizing massive models for latency/size, the crucial interplay with graphics rendering (OpenGL/Halide), and the practicalities of turning research concepts into polished features people actually use. Speaker: Kelvin Ma (Software Engineer ) Format: Talk ------------------------------------ Session Title: General Intelligence is Multimodal Description: Talking about Luma AI, our mission, and how our ML infrastructure enables SOTA multimodal model development Speaker: Keegan McCallum (Head of ML infrastructure at Luma AI) Format: Talk ------------------------------------ Session Title: LLMs Come Alive: Breathing Life into LLMs with Real-Time Animation Description: Creating AI agents that communicate naturally requires more than advanced language models—it demands believable, real-time animation that feels alive. While conversational agents excel at generating text, synchronizing expressive, Disney-level animation with real-time emotional feedback remains largely unexplored. To bridge this gap, we developed a multimodal AI architecture capable of driving real-time character animation directly from conversational context, vision-based emotion detection, and dynamically updated user profiles. 
In this session, we'll share our technical journey and engineering decisions, covering: Real-Time Neural-driven Animation Pipeline: Translating LLM-generated responses into precise visemes, gestures, and expressions using a transformer-based controller, guided dynamically by lightweight vision models capturing user emotions. Flexible, Provider-Agnostic LLM Integration: Using LangChain orchestration to dynamically switch between local models, AWS Bedrock, OpenAI APIs, or private deployments—carefully balancing latency, capability, and cost trade-offs. Hybrid Memory & User Profile Engine: Architecting a GDPR-compliant user profile system combining structured (SQL) and unstructured (NoSQL) data, gathering user interactions and preferences for sub-10 ms personalization lookups that dynamically influence conversations and animation. Scalable, Secure Serverless Infrastructure: Docker-based deployment on AWS ECS with OAuth-secured REST APIs, optimized for auto-scaling to seamlessly handle thousands of concurrent interactions. We'll present practical benchmarks, actionable heuristics ("async memory prefetch," "prompt-tuning vs. LoRA for personalization"), and lessons learned. Attendees will walk away with a playbook for adapting open models into hyper‑personalized, scalable roleplay experiences. Speaker: Colin Brady (Chief Creative and Technology Officer, Member, Academy of Motion Picture Arts and Sciences) Format: Talk ------------------------------------ Session Title: The State of Generative Media Today Description: Generative AI is reshaping the creative landscape, enabling the production of images, audio, and video with unprecedented speed and sophistication. This session offers an in-depth exploration of the current state of generative media, highlighting cutting-edge models, platforms, and tools that are transforming the industry. 
Speaker: Gorkem Yurtseven (CTO ) Format: Talk ------------------------------------ Session Title: LLMs Come Alive: Breathing Life into LLMs with Real-Time Animation Description: Creating AI agents that communicate naturally requires more than advanced language models—it demands believable, real-time animation that feels alive. While conversational agents excel at generating text, synchronizing expressive, Disney-level animation with real-time emotional feedback remains largely unexplored. To bridge this gap, we developed a multimodal AI architecture capable of driving real-time character animation directly from conversational context, vision-based emotion detection, and dynamically updated user profiles. In this session, we'll share our technical journey and engineering decisions, covering: Real-Time Neural-driven Animation Pipeline: Translating LLM-generated responses into precise visemes, gestures, and expressions using a transformer-based controller, guided dynamically by lightweight vision models capturing user emotions. Flexible, Provider-Agnostic LLM Integration: Using LangChain orchestration to dynamically switch between local models, AWS Bedrock, OpenAI APIs, or private deployments—carefully balancing latency, capability, and cost trade-offs. Hybrid Memory & User Profile Engine: Architecting a GDPR-compliant user profile system combining structured (SQL) and unstructured (NoSQL) data, gathering user interactions and preferences for sub-10 ms personalization lookups that dynamically influence conversations and animation. Scalable, Secure Serverless Infrastructure: Docker-based deployment on AWS ECS with OAuth-secured REST APIs, optimized for auto-scaling to seamlessly handle thousands of concurrent interactions. We'll present practical benchmarks, actionable heuristics ("async memory prefetch," "prompt-tuning vs. LoRA for personalization"), and lessons learned. 
Attendees will walk away with a playbook for adapting open models into hyper‑personalized, scalable roleplay experiences. Speaker: Tejas Rajurkar (AI Engineer Lead, AMGI Studios) Format: Talk ====================================================================== --- Track: GRAPHRAG (June 4) --- ====================================================================== Session Title: HybridRAG: A Fusion of Graph and Vector Retrieval to Enhance Data Interpretation Description: Interpreting complex information from unstructured text data poses significant challenges to Large Language Models (LLM), with difficulties often arising from specialized terminology and the multifaceted relationships between entities in document architectures. Conventional Retrieval Augmented Generation (RAG) methods face limitations in capturing these nuanced interactions, leading to suboptimal performance. In our talk, we introduce a novel approach integrating Knowledge Graph-based RAG (GraphRAG) with VectorRAG, designed to refine question-answering (Q&A) systems for more effective information extraction from complex texts. Our approach employs a dual retrieval strategy that harnesses both knowledge graphs and vector databases, enabling the generation of precise and contextually appropriate answers, thereby setting a new standard for LLMs in processing sophisticated data. Speaker: Mitesh Patel (Developer Advocate Manager) Format: Talk ------------------------------------ Session Title: Practical GraphRAG - Making LLMs smarter with Knowledge Graphs Description: RAG has become one standard architecture component for GenAI applications to address hallucinations and integrate factual knowledge. While vector search over text is common, knowledge graphs represent a proven advancement by leveraging advanced RAG patterns to access and integrate interconnected factual information, complementing the language skills of LLMs. 
This talk explores GraphRAG challenges, implementation patterns, and real-world agentic examples with Google's ADK, demonstrating how this approach delivers more trustworthy and explainable GenAI solutions with enhanced reasoning capabilities. Speaker: Jesús Barrasa (AI Field CTO) Format: Talk ------------------------------------ Session Title: Embeddings ARE NOT All You Need: Understanding Tradeoffs in Multimodal Search Description: Multimodal search demos show impressive capabilities but understanding how to scale these systems while balancing cost and performance is where things get tricky. Through three real-world implementations from wildlife stock footage to breaking news to sports highlights, I'll demonstrate how HNSW-indexed vector embeddings and more traditional caption-based approaches each excel in different domains. You'll see how querying pooled image embeddings struggles with spatiotemporal relationships that simple JSON tagging handles effortlessly. You'll see how traditional computer vision techniques applied as preprocessing steps enable video understanding models to more accurately answer “was the athlete over this line?”. I'll share implementation patterns for these approaches with animated architecture diagrams. We’ll cover why chunking and preprocessing strategies impact both accuracy and index size. By the end of this talk, you'll understand the nuanced technical tradeoffs between embeddings, metadata filtering, and hybrid retrieval for your specific multimodal search challenges. Most importantly, you'll get some real-world costs around indexing and continued operation.
Speaker: Randall Hunt (CTO at Caylent) Format: Talk ------------------------------------ Session Title: Leveraging Multi-Agent AI and Network Knowledge Graphs for Change Management and Network Testing Description: Traditional ticketing and testing workflows for change management and network operations often operate independently and lack critical real-world context and adaptive decision-making capabilities. This fragmented approach results in delayed resolutions, repeated incidents, escalations, and dissatisfied stakeholders. This session explores an innovative solution leveraging the synergy of natural language processing from IT Service Management (ITSM) systems, sophisticated Multi-agent reasoning, and dynamic context derived from live knowledge network graphs. Attendees will gain insights into an end-to-end architecture where natural language intents from ITSM tickets seamlessly integrate with AI agents specifically trained for complex workflow tasks, supported by continuous network knowledge-graph ingestion pipelines. Through a detailed production case study, we will demonstrate how AI-powered reasoning combined with dynamic network knowledge graph contexts significantly improves critical validation and workflow interactions. The showcased results will highlight dramatic improvements in ticket resolution efficiency, accuracy of network testing, and overall execution quality, delivering tangible value to both technical teams and business stakeholders. Speaker: Ola Mabadeje (Product Leader) Format: Talk ------------------------------------ Session Title: When Vectors Break Down: Graph-Based RAG for Dense Enterprise Knowledge Description: Enterprise knowledge bases are filled with "dense mapping," thousands of documents where similar terms appear repeatedly, causing traditional vector retrieval to return the wrong version or irrelevant information. 
When our customers kept hitting this wall with their RAG systems, we knew we needed a fundamentally different approach. In this talk, I'll share Writer's journey developing a graph-based RAG architecture that achieved 86.31% accuracy on the RobustQA benchmark while maintaining sub-second response times, significantly outperforming vector approaches. I'll survey the key techniques behind this performance leap and why graph-based approaches excel with complex enterprise information structures like product documentation, financial documents, and technical specifications that challenge traditional RAG systems. You'll learn about using specialized LLMs to build semantic relationships, how compression techniques efficiently handle concentrated enterprise data patterns, and how infusing key data points in the memory layer of the LLM lowers hallucination. The presentation will provide practical insights into identifying when graph-based approaches make sense for your organization's specific data challenges, helping you make informed architectural decisions for your next enterprise RAG system. Speaker: Sam Julien (Director of Developer Relations ) Format: Talk ------------------------------------ Session Title: Wisdom Discovery at Scale: Code Less KAG with n8n MultiAI Agents Description: "Wisdom Discovery at Scale: Code Less KAG with n8n MultiAI Agents" Speaker: Chin Keong Lam (AI Engineer & Co-Founder ) Format: Talk ------------------------------------ Session Title: Practical GraphRAG - Making LLMs smarter with Knowledge Graphs Description: RAG has become one standard architecture component for GenAI applications to address hallucinations and integrate factual knowledge. While vector search over text is common, knowledge graphs represent a proven advancement by leveraging advanced RAG patterns to access and integrate interconnected factual information, complementing the language skills of LLMs. 
This talk explores GraphRAG challenges, implementation patterns, and real-world agentic examples with Google's ADK, demonstrating how this approach delivers more trustworthy and explainable GenAI solutions with enhanced reasoning capabilities. Speaker: Stephen Chin (VP of Developer Relations) Format: Talk ------------------------------------ Session Title: Practical GraphRAG - Making LLMs smarter with Knowledge Graphs Description: RAG has become one standard architecture component for GenAI applications to address hallucinations and integrate factual knowledge. While vector search over text is common, knowledge graphs represent a proven advancement by leveraging advanced RAG patterns to access and integrate interconnected factual information, complementing the language skills of LLMs. This talk explores GraphRAG challenges, implementation patterns, and real-world agentic examples with Google's ADK, demonstrating how this approach delivers more trustworthy and explainable GenAI solutions with enhanced reasoning capabilities. Speaker: Michael Hunger (VP of Product Innovation) Format: Talk ------------------------------------ Session Title: Agentic Insights through Graph Analytics Description: Advanced GraphRAG techniques apply graph ML and algorithms, wrapped into a tidy agent. Speaker: Andreas Kollegger (GenAI Lead for Developer Relations) Format: Workshop ------------------------------------ Session Title: Beyond Documents: Implementing Knowledge Graphs in Legal Agents Description: Structured Representations are pretty important in the law, where the relationships between clauses, documents, entities, and multiple parties matter. Structured Representation means Structured Context Injection. Better Context, Less Hallucinations. We walk through a couple of case studies of systems that we’ve built in production for legal use-cases - from recursive contractual clause retrieval, to HITL legal reasoning news agents. 
You'll gain insights into how structured representations significantly improve the effectiveness and reliability of legal agents. Speaker: Tom Smoker (Technical Founder ) ====================================================================== --- Track: INFRASTRUCTURE (June 4) --- ====================================================================== Session Title: The infrastructure for the singularity Description: We're at an inflection point where AI agents are transitioning from experimental tools to practical coworkers. This new world will demand new infrastructure for RL training, test-time scaling, and deployment. This is why Morph Labs developed Infinibranch last year, and we are excited to finally unveil what's next. Speaker: Jesse Han (Founder) Format: Keynote ------------------------------------ Session Title: Containing Agent Chaos Description: AI agents promise breakthroughs but often deliver operational chaos. Building reliable, deployable systems with unpredictable LLMs feels like wrestling fog – testing outputs alone is insufficient when the underlying workflow is opaque and flaky. How do we move beyond fragile prototypes? This talk, from the creator of Docker, argues the solution lies *outside* the model: engineering **reproducible execution workflows** built on rigorous architectural discipline. Learn how **containerization**, applied not just to deployment but to *each individual step* of an agent's workflow, provides the essential **isolation and environmental consistency** needed. Discover how combining this granular container approach with patterns like immutable state management allows us to **contain agent chaos**, unlock effective testing, simplify debugging, and bring essential control and predictability back to building powerful AI agents you can actually ship with confidence. 
Speaker: Solomon Hykes (CEO || Creator of Docker) Format: Keynote ------------------------------------ Session Title: The Web Browser Is All You Need Description: (PLACEHOLDER) With the rise of MCP servers, A2A, and our trusty friend, OpenAPI, it turns out the web browser may be the default MCP server for the rest of the internet. In this talk, we'll walk through how a web browsing tool is probably the only tool you'll need to enable production AI Agents. Speaker: Paul Klein IV (Founder) Format: Talk ------------------------------------ Session Title: AX is the only Experience that Matters Description: If you’re building devtools for humans, you’re building for the past. Already a quarter of Y Combinator’s latest batch used AI to write 95% or more of their code. AI agents are scaling at an exponential rate and soon, they’ll outnumber human developers by orders of magnitude. The real bottleneck isn’t intelligence. It’s tooling. Terminals, local machines, and dashboards weren’t built for agents. They make do… until they can’t. In this talk, I’ll share how we killed the CLI at Daytona, rebuilt our infrastructure from first principles, and what it takes to build devtools that agents can actually use. Because in an agent-native future, if agents can’t use your tool, no one will. Speaker: Ivan Burazin (CEO) Format: Talk ------------------------------------ Session Title: Flipping the Inference Stack: Why GPUs Bottleneck Real-Time AI at Scale Description: AI inference today is stuck in a loop: throw more GPUs at the problem, scale horizontally, rinse and repeat. But that playbook is hitting a wall. Latency, cost, and energy grids are all suffering, and the capability for real-time AI at scale looks further and further away.
In this talk, AI hardware expert and founder Gavin Uberti will break down why the current approach to inference is masking deep inefficiencies, and how rethinking the hardware stack from the ground up (starting with inference-first chips) is the only way to unlock real-time AI at scale. Speaker: Gavin Uberti (CEO) Format: Talk ------------------------------------ Session Title: Embeddings ARE NOT All You Need: Understanding Tradeoffs in Multimodal Search Description: Multimodal search demos show impressive capabilities but understanding how to scale these systems while balancing cost and performance is where things get tricky. Through three real-world implementations from wildlife stock footage to breaking news to sports highlights, I'll demonstrate how HNSW-indexed vector embeddings and more traditional caption-based approaches each excel in different domains. You'll see how querying pooled image embeddings struggle with spatiotemporal relationships that simple JSON tagging handles effortlessly. You'll see how traditional computer vision techniques applied as preprocessing steps enable video understanding models to more accurately answer “was the athlete over this line?”. I'll share implementation patterns for these approaches with animated architecture diagrams. We’ll cover why chunking and preprocessing strategies impact both accuracy and index size. By the end of this talk, you'll understand the nuanced technical tradeoffs between embeddings, metadata filtering, and hybrid retrieval for your specific multimodal search challenges. Most importantly, you'll get some real world costs around indexing and continued operation. Speaker: Randall Hunt (CTO at Caylent) Format: Talk ------------------------------------ Session Title: Hacking the Inference Pareto Frontier for Cheaper and Faster Tokens Without Breaking SLAs Description: Your model works! It aces the evals! It even passes the vibe check! All that’s required is inference, right? 
Oops, you’ve just stepped into a minefield: -Not low-latency enough? Choppy experience. Users churn from your app. -Not cheap enough? You’re losing money on every query. -Not high enough output quality? Your system can’t be used for that application. A model and the inference system around it form a “token factory” associated with a Pareto frontier— a curve representing the best possible trade-offs between cost, throughput, latency and quality, outside of which your LLM system cannot be applied successfully. Outside of the Pareto frontier? You’re back to square one. That is, unless you’re able to change the shape of the Pareto frontier. In this session, we’ll introduce NVIDIA Dynamo, a datacenter-scale distributed inference framework as well as the bleeding-edge techniques it enables to hack the Pareto frontier of your inference systems, including: -Disaggregation - separating phases of LLM generation to make them more efficient -Speculation - predicting multiple tokens per cycle -KV routing, storage, and manipulation - ensuring that we don’t redo work that has already been done -Pipelining improvements for agents - accelerating our workflows using information about the agent By the end of the talk, we’ll understand how the Pareto frontier limits where models can be applied, the intuition behind how inference techniques can be used to modify it, as well as the mechanics of how these techniques work. Speaker: Kyle Kranen US (Engineering Manager - Deep Learning Algorithms ) Format: Talk ------------------------------------ Session Title: Introduction to LLM serving with SGLang Description: Do you want to learn how to serve models like DeepSeek and Qwen with SOTA speeds on launch day? SGLang is an open-source fast serving framework for LLMs and VLMs that generates trillions of tokens per day at companies like xAI, AMD, and Meituan. 
This workshop guides AI engineers who are familiar with serving models using frameworks like vLLM, Ollama, and TensorRT-LLM through deploying and optimizing their first model with SGLang, as well as providing guidance on when SGLang is the appropriate tool for LLM workloads. Speaker: Yineng Zhang (Inference lead at SGLang) Format: Workshop ------------------------------------ Session Title: Geopolitics of AI Infrastructure Description: As AI reshapes the global balance of power, the infrastructure behind it—chips, data centers, power, and supply chains—has become a new arena for geopolitical competition. This talk explores how nations are racing to secure critical AI hardware, control compute capacity, and assert influence over the technologies and talent that define the future. Speaker: Dylan Patel (Founder, CEO, Chief Analyst - SemiAnalysis) Format: Talk ------------------------------------ Session Title: Introduction to LLM serving with SGLang Description: Do you want to learn how to serve models like DeepSeek and Qwen with SOTA speeds on launch day? SGLang is an open-source fast serving framework for LLMs and VLMs that generates trillions of tokens per day at companies like xAI, AMD, and Meituan. This workshop guides AI engineers who are familiar with serving models using frameworks like vLLM, Ollama, and TensorRT-LLM through deploying and optimizing their first model with SGLang, as well as providing guidance on when SGLang is the appropriate tool for LLM workloads. Speaker: Philip Kiely (Head of Developer Relations) Format: Workshop ------------------------------------ Session Title: Building Hyperbolic: The On-Demand AI Cloud for GPUs, Inference, and AI Services Description: AI moves fast. Legacy cloud can’t keep up. This session breaks down how Hyperbolic is redefining what developers should expect from AI infrastructure. 
We’ll cover how to instantly spin up low-cost GPUs, serve cutting-edge models with serverless inference, and deploy AI services at scale without the DevOps, rate limits, or pricing surprises. Whether you're training, fine-tuning, or just shipping fast, this is the new standard for building with AI. Speaker: Dr. Jasper Zhang, PhD (CEO) Format: Talk ====================================================================== --- Track: LEADERSHIP: ARCHITECTS (TBA) --- ====================================================================== Session Title: The AI Engineer’s Guide to Raising VC Description: A no fluff, all tactics discussion. More AI engineers should build startups, the world needs more software. But there’s a way to raise VC and it’s hard to do it if you’ve never seen it done. We are going to walk through the exact playbook to raise your first round of funding. We will show you real pitch decks, real cold emails and real term sheets so when you go out to raise your first round of funding, you are set up to do it. Every AI Engineer should be equipped to start their own company and this session makes sure raising $$$ is not going to be the blocker. Speaker: Dani Grant (CEO) Format: Talk ------------------------------------ Session Title: The AI Engineer’s Guide to Raising VC Description: A no fluff, all tactics discussion. More AI engineers should build startups, the world needs more software. But there’s a way to raise VC and it’s hard to do it if you’ve never seen it done. We are going to walk through the exact playbook to raise your first round of funding. We will show you real pitch decks, real cold emails and real term sheets so when you go out to raise your first round of funding, you are set up to do it. Every AI Engineer should be equipped to start their own company and this session makes sure raising $$$ is not going to be the blocker.
Speaker: Chelcie Taylor (Investor ) Format: Talk ------------------------------------ Session Title: From Hype to Habit: How We’re Building an AI-First SaaS Company—While Still Shipping the Roadmap Description: What does it really take to move a modern SaaS company from AI experimentation to becoming truly AI-first? At Sprout Social, we’re in the midst of that transformation—rearchitecting strategy, systems, teams, and incentives to put AI at the heart of how we think, build, and deliver value. This is a story in motion: a behind-the-scenes look at how we’re evolving from isolated AI feature experiments to an AI-native operating model. I’ll share what we’re learning as we navigate the innovation dilemma—integrating disruptive AI capabilities without breaking what already works or our roadmap. That includes rethinking how we define success, how we hire, reward, grow talent, and how we handle legal and ethical complexity without slowing down. We’ll explore the real-world tensions between rapid innovation, value delivery, making progress on Responsible AI, all while elevating internal AI fluency, and engaging with the broader AI ecosystem to stay at the edge. This isn’t a playbook from the finish line—it’s a candid reflection from deep inside the journey. My goal is to help other leaders chart their own AI path with greater clarity, confidence, and care. Speaker: Rossella Blatt Vital (VP of Engineering - AI) Format: Talk ------------------------------------ Session Title: Building Applications with AI Agents Description: Generative AI has dramatically shortened the distance between ideas and implementation, enabling faster prototyping and deployment than ever before. But while language models can streamline individual tasks, true transformation comes from combining these capabilities into intelligent, autonomous systems—AI agents. This talk explores how to build and deploy foundation model-enabled agent systems that go beyond simple prompt chaining or chatbots. 
Drawing from real-world implementations and the latest research, it offers a clear and practical path to designing both single-agent and multi-agent systems capable of handling complex workflows with minimal oversight. Attendees will gain a deeper understanding of the core design principles behind agentic systems, the architectural trade-offs involved in orchestrating multiple agents, and the strategies required to develop tailored solutions that enhance efficiency and innovation. Whether just beginning or scaling up, participants will leave with actionable insights to navigate the rapidly evolving world of AI autonomy. Speaker: Michael Albada (Principal Applied Scientist) Format: Talk ------------------------------------ Session Title: Structuring a modern AI team Description: You've been given an AI mandate but don't have additional headcount, what next? Re-skilling, up-skilling and team augmentation become essential to delivering on a new mandate. In this talk we'll cover strategies to structure cross functional AI teams with domain experts, software engineers and ML engineers. We'll cover key skills and milestones that each traditional role can contribute to in unique ways. Speaker: Denys Linkov (Head of ML) Format: Talk ------------------------------------ Session Title: The Evolution of Software Development with AI Description: Artificial Intelligence is a rapidly advancing technology revolutionizing the world, especially in software creation. As AI advances, it handles more intricate tasks, significantly reshaping how we write software. The journey through the evolution of programming with AI can be categorized into seven distinct stages. Each stage represents a unique milestone in how AI integrates with and enhances software development. 
Speaker: Brett Kotch (AI For Technology) Format: Talk ------------------------------------ Session Title: AI That Pays: Lessons from Revenue Cycle Description: While much of the AI innovation in healthcare has centered on clinical and patient-facing applications, Revenue Cycle Management (RCM) remains an underexplored yet critical domain. Given the growing financial pressures facing providers, rethinking how healthcare gets paid is essential to ensuring access and sustainability. Together, these factors make RCM an opportune area for AI disruption. This session explores how the combination of vast structured and unstructured data, often rule-based workflows, and direct financial opportunity can drive meaningful outcomes. We’ll also share practical lessons from our journey evolving a traditional machine learning mindset to incorporate the latest advances in Generative AI, and how that shift is reshaping what's possible in healthcare operations. Speaker: Nathan Wan (Head of AI) Format: Talk ------------------------------------ Session Title: Monetizing AI: From Zero to Profit Description: As AI continues to transform industries, companies are faced with the critical challenge of effectively monetizing AI-driven products in a way that captures value, ensures customer adoption, and scales revenue sustainably. Unlike traditional SaaS models, AI-powered products have unique complexities - such as fluctuating usage patterns, variable compute costs, and evolving customer demands, making conventional pricing strategies unhelpful to the growth of an AI product-led startup. In this session, Alvaro Morales, CEO and co-founder of Orb, will explore why the often overlooked monetization aspect of AI is critical for businesses. He’ll share real-world examples and data to demonstrate how adaptive pricing models can drive cost savings, enhance customer experience, and reduce operational bottlenecks.
Alvaro will lead a live demo, showcasing how engineers can simulate AI pricing strategies and subsequently integrate them with a simple plug-and-play solution. He’ll also share how real-world revenue simulations enable companies to test and refine pricing before implementing — reducing risk, boosting adoption, and unlocking new revenue streams. As a quick example, cloud software development platform Replit was looking to adopt a usage-based pricing model for a new product, but their existing billing system couldn't support the new model, and building a new billing system would delay the launch timeline. In order to get things done, they turned to Orb, which enabled them to make pricing changes up to the last minute. After the launch, Orb became the single source of truth for both Replit and its customers - providing usage alerts to notify Replit when users hit cost thresholds and provide insights into user spend and payment methods. Key takeaways: The challenge of AI monetization – Why traditional subscription-based SaaS pricing models don’t work for AI-powered products. Precision pricing – Exploring how usage-based, tiered, and hybrid pricing models can maximize revenue potential. Revenue simulation for AI pricing – Leveraging real-time data to test, adjust and optimize pricing strategies. Avoiding common pricing pitfalls – Identifying mistakes that can lead to revenue leakage and customer churn. This session is designed for AI executives, product leaders, and engineering teams looking for actionable strategies to build adaptive, scalable pricing models that drive long-term growth and profitability. Speaker: Alvaro Morales (CEO) Format: Talk ====================================================================== --- Track: LEADERSHIP: FORTUNE 500 (TBA) --- ====================================================================== Session Title: tba Description: Clay Bavor explores the future of human-computer interaction powered by advanced AI. 
Speaker: Clay Bavor (Co-founder) Format: Keynote ------------------------------------ Session Title: From Copilot to Colleague: Building Trustworthy Productivity Agents for High-Stakes Work Description: This keynote will explore what it takes to move from basic generative assistants to fully agentic AI—systems that don’t just suggest but plan, act, and adapt—all within the structured, high-trust environments where professionals actually work. Speaker: Joel Hron (CTO) Format: Talk ------------------------------------ Session Title: tba Description: Ben Kus discusses how Box is leveraging AI to transform content management and collaboration. Speaker: Ben Kus (CTO) Format: Talk ------------------------------------ Session Title: How agents will unlock the $500B promise of AI Description: AI agents are on the cusp of revolutionizing work as we know it. The number of use cases software can tackle is set to explode as AI handles tasks requiring real judgment. But to cross the gap between an interesting AI prototype and an essential business tool, you need agents built by developers with real guardrails and security. This means blending AI assistance with traditional coding in a multimodal approach that maximizes efficiency and control. The future isn't about dropping in an LLM — it requires integrating any model, any data, any system to deliver results. Companies utilizing this approach can finally turn their slice of the $500B+ of total AI investment into real business results. Speaker: David Hsu (CEO) Format: Talk ------------------------------------ Session Title: CIOs and Industry Leaders: Do You Trust Your AI’s Inferences? Description: Enterprise AI adoption is accelerating, but with it comes a hard question: Do we trust the model’s decisions? In this 18-minute talk, I’ll explore the invisible risks behind automated decision-making in safety-critical and revenue-sensitive environments. 
Drawing on case studies across manufacturing, telecom, and industrial IoT, I’ll highlight how explainability, traceability, and robust guardrails drive adoption and protect enterprise value. Attendees will walk away with: • A 3-step framework for operationalizing AI trust • Real-world lessons from building guardrails in on-prem and hybrid systems • Tools and techniques for debugging and explaining inferences at scale • A blueprint for building trust between models, engineers, and executive stakeholders Speaker: Hariharan Ganesan (Sr. Solutions Architect) Format: Talk ------------------------------------ Session Title: How Intuit uses LLMs to explain taxes to millions of taxpayers Description: I will talk about how Intuit uses LLMs to explain tax situations to TurboTax users. Users want explanations of their tax situations - this drives confidence in the product. Over the course of the last two tax years, Intuit has used Anthropic’s and OpenAI’s models to develop GenAI-powered explanations. This includes designing a complex system with prompt-engineered solutions and both LLM- and human-powered evaluations to ensure the high quality bar that our users expect when filing taxes with us. I will cover the GenAI development lifecycle at scale, including development, evaluations, scaling, and security evaluations. We also developed a fine-tuned version of Claude Haiku, which I will cover in the presentation. We also expanded into tax question answering powered by RAG, including GraphRAG, and I will cover those developments too. Speaker: Jaspreet Singh (Senior Staff Software Engineer) Format: Talk ------------------------------------ Session Title: Make your LLM app a Domain Expert: How to Build an LLM-Native Expert System Description: Vertical AI is a multi-trillion-dollar opportunity.
But you can't build a domain-expert application simply by grabbing the latest LLMs off-the-shelf: you need a system for codifying latent insights from domain experts and using that to drive development of your application. In this talk, we'll describe the system we've built at Anterior which has enabled us to achieve SOTA clinical reasoning and serve health insurance providers covering 50 million American lives. We'll share: - how and why to encode domain-specific failure modes as an ontology - a practical system for converting domain expertise into quantifiable eval metrics - how we structure work and collaboration between our clinicians, engineers, and PMs - our eval-driven AI iteration process and how this can be adapted to any industry Speaker: Christopher Lovejoy (Head of Clinical AI) Format: Talk ------------------------------------ Session Title: Accelerating Investment Operations: How BlackRock Builds Custom Knowledge Apps at Scale. Description: Investment Operations teams are the backbone of asset and investment management firms. Their day-to-day work not only enables portfolio managers to respond swiftly to market events but also ensures that complex, unstructured data flows seamlessly across the organization. In this talk, we introduce a modular, Kubernetes-native AI framework purpose-built to scale custom Knowledge Apps across the enterprise. Designed with speed, flexibility, and compliance in mind, the framework empowers teams to launch production-grade document extraction applications in minutes instead of months, unlocking new levels of automation and efficiency for investment management workflows. We’ll also share how this framework has helped BlackRock streamline document extraction processes, generate investment signals, reduce operational overhead, and accelerate the delivery of high-impact business use cases—all while maintaining the robustness and control required in a regulated industry.
Speaker: Vaibhav Page (Principal Engineer ) Format: Talk ------------------------------------ Session Title: Quantitative Research in the era of Agentic AI Description: Agents and vibe coding have already disrupted software engineering, but what about domains such as systematic quant trading? Can we build an agent to help find alpha - and ultimately to predict the future of financial markets? In this talk, we'll walk you through the journey of creating and deploying the Alpha Assistant - a semi-autonomous research coding agent ("outer loop") specifically designed for Man Group's Quantitative Research teams. Learn about what goes into building a domain-specific coding agent in a niche, technical, and scientific domain. We'll touch on key themes such as agentic interface design (and why we moved *away* from a Claude Code-esque CLI interface), what we've learnt applying the "bitter lesson for agents", and what full autonomy for quant research might look like. Speaker: Matthew Hertz (Head of Machine Learning Technology) Format: Talk ------------------------------------ Session Title: How to Build Planning Agents without losing control Description: Planning agents help solve complex tasks by breaking them into steps. They work across enterprise systems where data lives in many places. These agents are powerful but can be hard to control. This session shows how to use blueprints as guardrails for these agents. I will explain techniques to ensure agents follow the right plan. I will cover evaluation methods to verify agents stay aligned with user goals. Speaker: Yogendra Miraje (Lead AI Engineer) Format: Talk ------------------------------------ Session Title: The Rise of Open Models in the Enterprise Description: This year kicked off with the DeepSeek-R1 news cycle breaking out of our AI Engineering bubble into the mainstream tech and business world. 
Leaders at the highest levels of the largest enterprises started asking how open source models could enhance and accelerate their AI strategy. Open source models promise increased ownership of AI systems: control over performance and price, improved uptime and reliability, better compliance, and flexible hosting options. How are these promises playing out after months of implementation? In this talk, I’ll draw on hundreds of conversations with AI leaders at enterprise companies to discuss what has — and hasn’t — changed about enterprise AI strategy in a world where open-source models compete on the frontier of intelligence. Speaker: Amir Haghighat (CTO) Format: Talk ------------------------------------ Session Title: Machines of Buying & Selling Grace Description: How to go beyond browser automation to truly agentic commerce, where AI can buy, sell and negotiate on behalf of users and merchants. Speaker: Adam Behrens (CEO) Format: Talk ------------------------------------ Session Title: Accelerating Investment Operations: How BlackRock Builds Custom Knowledge Apps at Scale. Description: Investment Operations teams are the backbone of asset and investment management firms. Their day-to-day work not only enables portfolio managers to respond swiftly to market events but also ensures that complex, unstructured data flows seamlessly across the organization. In this talk, we introduce a modular, Kubernetes-native AI framework purpose-built to scale custom Knowledge Apps across the enterprise. Designed with speed, flexibility, and compliance in mind, the framework empowers teams to launch production-grade document extraction applications in minutes instead of months, unlocking new levels of automation and efficiency for investment management workflows. 
We’ll also share how this framework has helped BlackRock streamline document extraction processes, generate investment signals, reduce operational overhead, and accelerate the delivery of high-impact business use cases—all while maintaining the robustness and control required in a regulated industry. Speaker: Infant Vasanth (Senior Director of Engineering) Format: Talk ------------------------------------ Session Title: CIOs and Industry Leaders: Do You Trust Your AI’s Inferences? Description: Enterprise AI adoption is accelerating, but with it comes a hard question: Do we trust the model’s decisions? In this 18-minute talk, I’ll explore the invisible risks behind automated decision-making in safety-critical and revenue-sensitive environments. Drawing on case studies across manufacturing, telecom, and industrial IoT, I’ll highlight how explainability, traceability, and robust guardrails drive adoption and protect enterprise value. Attendees will walk away with: • A 3-step framework for operationalizing AI trust • Real-world lessons from building guardrails in on-prem and hybrid systems • Tools and techniques for debugging and explaining inferences at scale • A blueprint for building trust between models, engineers, and executive stakeholders Speaker: Sahil Yadav (Chief Product Officer) Format: Talk ====================================================================== --- Track: MCP (June 4) --- ====================================================================== Session Title: tba Description: Jerome Swannack discusses Anthropic's approach to developing safe and reliable AI systems. Speaker: Jerome Swannack (MTS, MCP) Format: Keynote ------------------------------------ Session Title: tba Description: Den Delimarsky presents the MCP Auth spec and its role in standardized, secure AI interactions. 
Speaker: Den Delimarsky (MCP Auth) Format: Keynote ------------------------------------ Session Title: A2A & MCP: Automating Business Processes with LLMs Description: Ever wished your webhooks could think for themselves? Join us to discover how A2A agents can transform passive webhook endpoints into intelligent workflow processors. In this session, we'll show you how to build a system that automatically spawns AI Agents to handle incoming webhooks. Using Google's Agent-to-Agent framework and MCP, you'll learn how to create dynamic AI agents that respond to events, communicate with external services, and make decisions based on content analysis. See the future of workflow automation where webhooks don't just trigger actions—they trigger intelligence! Speaker: Damien Murphy (Founding Engineer) Format: Workshop ------------------------------------ Session Title: Observable tools - the state of MCP observability Description: AI Engineers deserve observable tools! MCP getting adoption means that less and less of your agent's code is running under your control, and this has DX and observability challenges, let's fix that! Join Alex Volkov from Weights & Biases and Steve Manuel from mcp.run on this recap of the current state of MCP observability, including the observable.tools initiative, a recap of where the field stands and what to look forward to + a practical example of MCP tool usage evaluation framework from mcp.run! Speaker: Alex Volkov (AI Evangelist) Format: Talk ------------------------------------ Session Title: Observable tools - the state of MCP observability Description: AI Engineers deserve observable tools! MCP getting adoption means that less and less of your agent's code is running under your control, and this has DX and observability challenges, let's fix that!
Join Alex Volkov from Weights & Biases and Steve Manuel from mcp.run on this recap of the current state of MCP observability, including the observable.tools initiative, a recap of where the field stands and what to look forward to + a practical example of MCP tool usage evaluation framework from mcp.run! Speaker: Steve Manuel (CEO) Format: Talk ------------------------------------ Session Title: MCP is all you need Description: Everyone is talking about agents, and right after that, they’re talking about agent-to-agent communications. Not surprisingly, various nascent, competing protocols are popping up to handle it. But maybe all we need is MCP — the OG of GenAI communication protocols (it's from way back in 2024!). Last year, Jason Liu gave the second most watched AIE talk — “Pydantic is all you need”. This year, I (the creator of Pydantic) am continuing the tradition by arguing that MCP might be all we need for agent-to-agent communications. What I’ll cover: - Misusing Common Patterns: MCP was designed for desktop/IDE applications like Claude Code and Cursor. How can we adapt MCP for autonomous agents? - Many Common Problems: MCP is great, but what can go wrong? How can you work around it? Can the protocol be extended to solve these issues? - Monitoring Complex Phenomena: How does observability work (and not work) with MCP? - Multiple Competing Protocols: A quick run-through of other agent communication protocols like A2A and AGNTCY, and probably a few more by June 😴 - Massive Crustaceans Party: What might success look like if everything goes to plan? Speaker: Samuel Colvin (Founder of Pydantic) Format: Talk ------------------------------------ Session Title: The rise of the agentic economy on the shoulders of MCP Description: Thanks to MCP and all the MCP server directories, agents can now autonomously discover new tools and other agents.
This lays down the foundation for the future agentic economy, where businesses will sell to autonomous agents (B2A) and eventually agents will sell to other agents (A2A). But one key part is still missing: agents do not have a standard way to subscribe to external services and pay for them. In this talk, we’ll show how to give agents full autonomy to discover and pay for new external MCP-enabled services, even if those services don’t support it, using a little-known MCP server nesting capability. We’ll also cover how to monetize AI agents and the B2A/A2A business models. Speaker: Jan Curn (CEO) Format: Talk ====================================================================== --- Track: ONLINE TRACK (TBA) --- ====================================================================== Session Title: Does AI Actually Boost Developer Productivity? (Stanford / 100k Devs Study) Description: Forget vendor hype: Is AI actually boosting developer productivity, or just shifting bottlenecks? Stop guessing. Our study at Stanford cuts through the noise, analyzing real-world productivity data from nearly 100,000 developers across hundreds of companies. We reveal the hard numbers: while the average productivity boost is significant (~20%), the reality is complex – some teams even see productivity decrease with AI adoption. The crucial insights lie in why this variance occurs. Discover which company types, industries, and tech stacks achieve dramatic gains versus minimal impact (or worse). Leave with the objective, data-driven evidence needed to build a winning AI strategy tailored to your context, not just follow the trend. Speaker: Simon Obstbaum (Researcher, former CTO @ Crunchyroll) Format: Talk ------------------------------------ Session Title: Does AI Actually Boost Developer Productivity? (Stanford / 100k Devs Study) Description: Forget vendor hype: Is AI actually boosting developer productivity, or just shifting bottlenecks? Stop guessing. 
Our study at Stanford cuts through the noise, analyzing real-world productivity data from nearly 100,000 developers across hundreds of companies. We reveal the hard numbers: while the average productivity boost is significant (~20%), the reality is complex – some teams even see productivity decrease with AI adoption. The crucial insights lie in why this variance occurs. Discover which company types, industries, and tech stacks achieve dramatic gains versus minimal impact (or worse). Leave with the objective, data-driven evidence needed to build a winning AI strategy tailored to your context, not just follow the trend. Speaker: Yegor Denisov-Blanch (Developer Productivity Researcher at Stanford University) Format: Talk ====================================================================== --- Track: RAG (TBA) --- ====================================================================== Session Title: Embeddings ARE NOT All You Need: Understanding Tradeoffs in Multimodal Search Description: Multimodal search demos show impressive capabilities, but understanding how to scale these systems while balancing cost and performance is where things get tricky. Through three real-world implementations, from wildlife stock footage to breaking news to sports highlights, I'll demonstrate how HNSW-indexed vector embeddings and more traditional caption-based approaches each excel in different domains. You'll see how querying pooled image embeddings struggles with spatiotemporal relationships that simple JSON tagging handles effortlessly. You'll see how traditional computer vision techniques applied as preprocessing steps enable video understanding models to more accurately answer “was the athlete over this line?”. I'll share implementation patterns for these approaches with animated architecture diagrams. We’ll cover why chunking and preprocessing strategies impact both accuracy and index size.
By the end of this talk, you'll understand the nuanced technical tradeoffs between embeddings, metadata filtering, and hybrid retrieval for your specific multimodal search challenges. Most importantly, you'll get some real world costs around indexing and continued operation. Speaker: Randall Hunt (CTO at Caylent) Format: Talk ====================================================================== --- Track: REASONING+RL (TBA) --- ====================================================================== Session Title: The infrastructure for the singularity Description: We're at an inflection point where AI agents are transitioning from experimental tools to practical coworkers. This new world will demand new infrastructure for RL training, test-time scaling, and deployment. This is why Morph Labs developed Infinibranch last year, and we are excited to finally unveil what's next. Speaker: Jesse Han (Founder) Format: Keynote ------------------------------------ Session Title: Verified Superintelligence Description: Christian Szegedy discusses approaches to building and verifying superintelligent AI systems. Speaker: Christian Szegedy (Co-founder) Format: Keynote ------------------------------------ Session Title: Training Agentic Reasoners Description: This talk will be a technical deep dive into RL for agentic reasoning via multi-turn tool calling, similar to OpenAI's o3 and Deep Research. In particular, we'll cover: - When, why, and how - GRPO vs PPO vs etc - Designing environments and rewards - Survey of recent research highlights - Results on example tasks - Overview of open-source ecosystem (libraries, compute requirements, tradeoffs, etc.) Speaker: Will Brown (Research Engineering Lead) Format: Talk ------------------------------------ Session Title: How to Train Your Agent: Building Reliable Agents with RL Description: Have you ever launched an awesome agentic demo, only to realize no amount of prompting will make it reliable enough to deploy in production? 
Agent reliability is a famously difficult problem to solve! In this talk we’ll learn how to use GRPO to help your agent learn from its successes and failures and improve over time. We’ve seen dramatic results with this technique, such as an email assistant agent whose success rate jumped from 74% to 94% after replacing o4-mini with an open source model optimized using GRPO. We’ll share case studies as well as practical lessons learned around the types of problems this works well for and the unexpected pitfalls to avoid. Speaker: Kyle Corbitt (CEO) Format: Talk ------------------------------------ Session Title: DeepCoder: A Fully Open-Source 14B Coder at O3-Level Description: This talk covers the full training recipe to achieve O3-level for coding. Speaker: Michael Zhi Yu Luo (UC Berkeley, PhD) Format: Talk ------------------------------------ Session Title: What Reinforcement Learning with Verifiable Rewards Changed Description: Reinforcement learning with verifiable rewards (RLVR) came onto the field with a storm after the DeepSeek R1 model showed that training reasoning models was accessible to the entire AI industry. Next it became table stakes for high scores in math and code, but quickly it's shifting to opening the doors on new types of models entirely -- those enabling tool use, reasoning, code execution, and everything to come together to new experiences. This talk describes how RLVR emerged in such a sudden way, what we can glean about AI research generally, and how RLVR changed the AI models we will use forever. Speaker: Nathan Lambert (Senior Research Scientist at Ai2 & Founder of Interconnects.ai) Format: Talk ------------------------------------ Session Title: Inside ARC Prize, Scaling Reasoning, and Dynamic Evals Description: ARC Prize Foundation is building the North Star for AGI—rigorous, open benchmarks that track reasoning progress in modern AI.
We'll cover how we've evaluated frontier models since GPT-3.5 and share a preview of ARC-AGI-3: a dynamic, game-like benchmark launching next year to test general intelligence. Speaker: Greg Kamradt (President) Format: Talk ====================================================================== --- Track: RECSYS (TBA) --- ====================================================================== Session Title: Teaching Gemini to Speak YouTube: Adapting LLMs for Video Recommendations to 2B+ DAU Description: YouTube recommendations drive nearly two-thirds of the platform's staggering 5 billion daily watch hours for 2 billion+ DAU. Traditionally powered by large embedding models (LEMs), we're undertaking a fundamental shift: rebuilding our recommendation stack using foundation models like Gemini. This talk dives into our engineering journey adapting general-purpose LLMs (Gemini) for the highly specialized, dynamic, and massive-scale task of YouTube recommendations. We'll start with a critical first step: creating a "language" for YouTube videos. Learn how we developed 'SemanticID', a novel tokenization scheme that distills multimodal video features (text, audio, frames) into discrete tokens representable by an LLM. Our paper (Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations) is a landmark work in this space. We then adapt the base Gemini checkpoint to understand sequences of these video tokens alongside natural language, effectively teaching it the grammar of user watch behavior. A key insight: unlike static LLM training, YouTube's corpus evolves so rapidly (millions of new videos daily) that daily retraining is non-negotiable to maintain recommendation quality. Now we can prompt LRM with user history and context to generate personalized candidate recommendations, achieving the biggest engagement wins on YouTube in the last ~decade. 
There’s a lot of attention on the LLM-led transformation of Search (with AI Overviews, Perplexity, ChatGPT-Search, etc.). However, across large consumer apps, it’s the recommendation systems & feeds that drive most consumer engagement, not just search (e.g., YouTube recs drive 67% of WatchTime). This talk is about the LLM-led transformation of recommendations & feeds – building a recommendation engine on top of Gemini. Speaker: Devansh Tandon (Principal Product Manager) Format: Talk ====================================================================== --- Track: RETRIEVAL+SEARCH (TBA) --- ====================================================================== Session Title: Scaling Enterprise-Grade RAG Systems: Lessons from the Legal Frontier Description: In domains like law, compliance, and tax, building enterprise-grade RAG means very large scale, spiky workloads, a focus on accuracy, and non-negotiable privacy. In this talk, we'll share war stories and battle scars of how Harvey has built the world's most advanced AI agents for the legal profession on top of a highly optimized retrieval architecture. We'll cover how to get better retrieval via both sparse and dense retrieval methods, why domain-specific reranking is essential, and how to handle ambiguity in real-world queries. We'll also touch on how LanceDB's search engine enables this architecture by delivering low-latency, high-throughput retrieval across millions of documents of varying sizes without compromising privacy. This solid foundation enables Harvey to build a product that brings highly accurate answers to hundreds of law firms and professional services firms across 45 countries.
Speaker: Aman Kishore (Former Founder) Format: Talk ------------------------------------ Session Title: Evaluating AI Search: A Practical Framework for Augmented AI Systems Description: AI search is becoming the front door to information, whether through Retrieval-Augmented Generation (RAG), Search-Augmented Generation (SAG), or custom agents that synthesize answers on top of indexed content. As users rely more heavily on these systems, evaluating their quality becomes mission-critical. But traditional metrics like precision and recall don’t capture the full picture. In this talk, we introduce a practical evaluation framework for AI-powered search, across three dimensions: - Are the retrieved sources relevant to the query? - Are the sources faithfully used in the generated answer? - And is the final answer complete? We’ll share lessons from working with search companies and present early findings from a new benchmark evaluating popular augmented AI systems across these dimensions. Rather than ranking winners and losers, we explore where different systems excel or break down, and how these tradeoffs inform product decisions. This talk is for AI engineers and product teams who want to build trusted, high-quality AI search experiences, and need a way to measure if it’s actually working. Speaker: Julia Neagu (CEO) Format: Talk ------------------------------------ Session Title: Scaling Enterprise-Grade RAG Systems: Lessons from the Legal Frontier Description: In domains like law, compliance, and tax, building enterprise-grade RAG means very large scale, spiky workloads, a focus on accuracy, and non-negotiable privacy. In this talk, we'll share war stories and battle scars of how Harvey has built the world's most advanced AI agents for the legal profession on top of a highly optimized retrieval architecture.
We'll cover how to get better retrieval via both sparse and dense retrieval methods, why domain-specific reranking is essential, and how to handle ambiguity in real-world queries. We'll also touch on how LanceDB's search engine enables this architecture by delivering low-latency, high-throughput retrieval across millions of documents of varying sizes without compromising privacy. This solid foundation enables Harvey to build a product that brings highly accurate answers to hundreds of law firms and professional services firms across 45 countries. Speaker: Chang She (CEO) Format: Talk ------------------------------------ Session Title: Evaluating AI Search: A Practical Framework for Augmented AI Systems Description: AI search is becoming the front door to information, whether through Retrieval-Augmented Generation (RAG), Search-Augmented Generation (SAG), or custom agents that synthesize answers on top of indexed content. As users rely more heavily on these systems, evaluating their quality becomes mission-critical. But traditional metrics like precision and recall don’t capture the full picture. In this talk, we introduce a practical evaluation framework for AI-powered search, across three dimensions: - Are the retrieved sources relevant to the query? - Are the sources faithfully used in the generated answer? - And is the final answer complete? We’ll share lessons from working with search companies and present early findings from a new benchmark evaluating popular augmented AI systems across these dimensions. Rather than ranking winners and losers, we explore where different systems excel or break down, and how these tradeoffs inform product decisions. This talk is for AI engineers and product teams who want to build trusted, high-quality AI search experiences, and need a way to measure if it’s actually working.
Speaker: Deanna Emery (Founding AI Researcher) Format: Talk ------------------------------------ Session Title: Information Retrieval from the Ground Up Description: Vector search is only a feature. Search engines and information retrieval have retaken their position as the foundation of RAG. This workshop takes you through decades of research, what has been working for a long time, and how it got better with Machine Learning. Speaker: Philipp Krenn (Code and conference monkey) Format: Workshop ------------------------------------ Session Title: Ingest, Chunk, Retrieve: How We Built an AI Sales Rep that Trains Herself Description: AI agents and digital workers are becoming an essential tool for teams of all sizes and across all industries. However, training these agents to become experts in your product, business, and customers remains a significant challenge. But what if onboarding a digital worker was as simple as uploading your pitch deck? At 11x, we built Alice, an AI sales rep that writes outbound emails with the nuance and context of a top-performing human - because she learns like one too. In this talk, we'll share how we built a RAG system that lets users train Alice on their internal materials: PDFs, websites, call recordings, and more. We'll walk through our ingestion flow, OCR and chunking pipeline, and explain how we leveraged different technologies and vendors to support a wide range of file types. We'll also discuss how we leveraged Pinecone and other vector embedding technologies to drive relevant, high-performing messaging. Finally, I'll share what we’ve learned running this system in production across 300+ customers and over 1M prospect interactions each month.
Speaker: Satwik Singh (Member of Technical Staff) Format: Talk ------------------------------------ Session Title: Scaling Enterprise-Grade RAG Systems: Lessons from the Legal Frontier Description: In domains like law, compliance, and tax, building enterprise-grade RAG means very large scale, spiky workloads, a focus on accuracy, and non-negotiable privacy. In this talk, we'll share war stories and battle scars of how Harvey has built the world's most advanced AI agents for the legal profession on top of a highly optimized retrieval architecture. We'll cover how to get better retrieval via both sparse and dense retrieval methods, why domain-specific reranking is essential, and how to handle ambiguity in real-world queries. We'll also touch on how LanceDB's search engine enables this architecture by delivering low-latency, high-throughput retrieval across millions of documents of varying sizes without compromising privacy. This solid foundation enables Harvey to build a product that brings highly accurate answers to hundreds of law firms and professional services firms across 45 countries. Speaker: Calvin Qi (Tech Lead Manager) Format: Talk ------------------------------------ Session Title: Ingest, Chunk, Retrieve: How We Built an AI Sales Rep that Trains Herself Description: AI agents and digital workers are becoming an essential tool for teams of all sizes and across all industries. However, training these agents to become experts in your product, business, and customers remains a significant challenge. But what if onboarding a digital worker was as simple as uploading your pitch deck? At 11x, we built Alice, an AI sales rep that writes outbound emails with the nuance and context of a top-performing human - because she learns like one too. In this talk, we'll share how we built a RAG system that lets users train Alice on their internal materials: PDFs, websites, call recordings, and more.
We'll walk through our ingestion flow, OCR and chunking pipeline, and explain how we leveraged different technologies and vendors to support a wide range of file types. We'll also discuss how we leveraged Pinecone and other vector embedding technologies to drive relevant, high-performing messaging. Finally, I'll share what we’ve learned running this system in production across 300+ customers and over 1M prospect interactions each month. Speaker: Sherwood Callaway (Tech Lead, Alice) Format: Talk ------------------------------------ Session Title: Building AI Agents that actually automate Knowledge Work Description: Agents are all the rage in 2025, and every single b2b SaaS startup/incumbent promises AI agents that can "automate work" in some way. But how do you actually build this? The answer is twofold: 1. really, really good tools 2. carefully tailored agent reasoning over these tools, ranging from assistant-based to automation-based UXs. The main goal of this talk is to give a practical overview of agent architectures that can automate real-world work, with a focus on document-centric tasks. Learn the core building blocks of best-in-class "tools" for processing, manipulating, and indexing/retrieving documents, from PDFs to Excel spreadsheets. Also learn the range of agent architectures suited for different tasks, from chat assistant-based UXs with high human-in-the-loop, to automation UXs that rely on encoding a business process into an end-to-end task solver. These architectures have to be generalizable but also highly accurate as agents get increasingly better at reasoning and code-writing. Speaker: Jerry Liu (CEO) Format: Talk ------------------------------------ Session Title: Layering every technique in RAG, one query at a time Description: Start with the simplest Search - in-memory embeddings with relevance ranking.
End with the most complex planet-scale Search - 70+ corpus mix of token, embeddings, and knowledge graphs, all jointly retrieved, custom ranked, joint re-ranked, and then LLM-processed, at 160,000 queries per second in under 200 ms. This talk will be a fun “one query at a time” survey of all techniques in RAG in incremental complexity, showing the limits of each technique and what the next layered one opens up in terms of capabilities to handle ever-more complex queries in RAG. You’ll learn why queries like [falafel] are notoriously hard to Search over, why chunking your documents can be disastrous, how you can sometimes get away with a simple BM25, and how some Search problems are so hard to solve that you’re better off punting the problem to the LLM or the UX. Brought to you by the team that worked on 50+ Search products, in the context of Google.com and custom Enterprise Search. Speaker: David Karam (CEO) Format: Talk ------------------------------------ Session Title: How to build Enterprise-aware agents Description: While LLMs have demonstrated impressive reasoning capabilities, their out-of-the-box reasoning is akin to hiring a brilliant but brand-new employee who doesn’t have the enterprise context of “how things are done at this company”. In this talk, I'll introduce “Workflow Search” as a paradigm to build enterprise-aware agents that can balance predictability on common tasks and flexibility on unforeseen tasks. Speaker: Chau Tran (Technical Lead) Format: Talk ------------------------------------ Session Title: Building a Smarter AI Agent with Neural RAG Description: RAG quality for AI agents is critical, and traditional keyword-based search engines consistently underperform in agentic or multi-step tasks, where semantic grounding and contextual nuance matter most. In this talk, Will Bryk, CEO of Exa, will live-code two AI agent applications–one using traditional keyword search RAG and one using neural network RAG via vector search.
He’ll then evaluate both applications based on task performance, relevance, and latency. With a live demo (no theory or pre-baked applications), the audience will get a firsthand look at the practical differences between keyword and semantic systems in production, and learn embedding strategies, indexing trade-offs, hybrid retrieval techniques, prompt tuning, and more. Speaker: Will Bryk (CEO) Format: Talk ====================================================================== --- Track: SWE AGENTS (TBA) --- ====================================================================== Session Title: tba Description: Greg Brockman shares OpenAI's vision and progress towards achieving AGI. Speaker: Greg Brockman (President) Format: Keynote ------------------------------------ Session Title: tba Description: Boris Cherny discusses advancements and applications of Claude Code by Anthropic. Speaker: Boris Cherny (Claude Code) Format: Talk ------------------------------------ Session Title: Your Coding Agent Just Got Cloned And Your Brain Isn't Ready Description: Will the future engineer code alongside a single coding agent, or will they spend their day orchestrating many agents? Traditional development rewards synchronous focus. This session dives into the significant mindshift required to move from sequential coding to orchestrating parallel agents. We are the builders of "Jules", Google's massively parallel asynchronous coding agent (to be opened up in May). We'll share real-world insights from building Jules and explore how to rewire your brain for this powerful new "post-IDE" development paradigm. 
Speaker: Rustin Banks (Product Manager, AI Coding) Format: Talk ------------------------------------ Session Title: Ship Agents that Ship: A Hands-On Workshop for SWE Agent Builders Description: Coding agents are transforming how software gets built, tested, and deployed, but engineering teams face a critical challenge: how to embrace this automation wave without sacrificing trust, control, or reliability. In this 110-minute workshop, you’ll go beyond toy demos and build production-minded AI agents using Dagger, the programmable delivery engine designed for real CI/CD and AI-native workflows. Whether you're debugging failures, triaging pull requests, generating tests, or shipping features, you'll learn how to orchestrate autonomous agents that live in and around your codebase: from your laptop to your CI platform. We’ll guide you through: - Building real-world agents with Dagger and popular LLMs (GPT, Claude, etc.) - Programming agent environments using real languages (Go, Python, TypeScript) - Executing agent workflows locally and in GitHub Actions, so you can bring them to production - Using a composable runtime that ensures isolation, determinism, traceability, and repeatability - Designing agents that automate and enhance debugging, test generation, code review, bug fixing, and feature implementation By the end of the workshop, you’ll walk away ready to build your own army of autonomous agents, working collaboratively across your codebase, locally and in CI, accelerating development without ceding control. Let’s build agents that don’t just talk, they ship! Speaker: Kyle Penfound (Solutions Engineer at Dagger) Format: Workshop ------------------------------------ Session Title: Ship Production Software in Minutes, Not Months Description: Planning, coding, testing, monitoring—the endless cycle spans 10+ tools that fragment our focus and slow delivery to a crawl. Vibe coding doesn't work when you've got 10TB of code.
If you just sighed, you're one of many professional software engineers trapped in the traditional software development lifecycle (SDLC) that was designed before AI could parallelize your entire workflow. But what if you could orchestrate multiple AI agents on tasks beyond just generating code, while you focus on the creative decisions that matter? In this talk, I'll demonstrate how real enterprise organizations are changing their entire SDLC—going from understanding, planning, coding, and testing all the way to incident response—using AI agents. You'll witness the next evolution of software engineering—where AI doesn't just generate code, but orchestrates the entire development lifecycle. Speaker: Eno Reyes (CTO) Format: Talk ------------------------------------ Session Title: Vibe Coding, with Confidence Description: Everyone wants to vibe code, even large enterprises. But how can we ensure that the generated code is well-grounded in the dev team's code and software development standards? In this talk, Itamar will present how to use various tools and agents, including MCP and A2A, to achieve precisely that. Speaker: Itamar Friedman (CEO) Format: Talk ------------------------------------ Session Title: Production software keeps breaking and it will only get worse. Here’s how Traversal is fixing it. Description: Software is eating the world. AI is eating software. AI-powered SWE means a whole lot more software is going to be written that powers mission-critical systems in the coming years, with hardly any of it written by humans. Hence, when these software systems inevitably break, it’s going to be next to impossible to troubleshoot them. Towards addressing this issue, we’ll do a product launch of Traversal’s AI, a significant step towards self-healing software systems. We will showcase how it is already used to autonomously troubleshoot production incidents in some of the most complex enterprise environments.
Speaker: Anish Agarwal (CEO and Co-founder) Format: Talk ------------------------------------ Session Title: Production software keeps breaking and it will only get worse. Here’s how Traversal is fixing it. Description: Software is eating the world. AI is eating software. AI-powered SWE means a whole lot more software is going to be written that powers mission critical systems in the coming years, with hardly any of it written by humans. Hence, when these software systems inevitably break, it’s going to be next to impossible to troubleshoot them. Towards addressing this issue, we’ll do a product launch of Traversal’s AI, a significant step towards self-healing software systems. We will showcase how it is already used to autonomously troubleshoot production incidents in some of the most complex enterprise environments. Speaker: Raaz Dwivedi (Chief Scientist) Format: Talk ------------------------------------ Session Title: Software Development Agents: What Works and What Doesn't Description: The adoption of AI into software development has been bumpy. While autocomplete tools like Copilot have gone mainstream, autonomous agents like Devin and OpenHands have generated both enthusiasm and skepticism. Some engineers claim they generate a 10x productivity boost; others that they just create noise and tech debt. The difference between the enthusiasts and the skeptics is that the enthusiasts have reasonable expectations for what these agents can do, and have both practical and intuitive knowledge for how to use them effectively. In this session, we'll talk about what tasks are appropriate for today's software agents, what tasks they might start to succeed at in 2025, and what tasks are best left to humans no matter how good they get. Session Outline: Learn how to use software development agents like OpenHands (fka OpenDevin) effectively, without creating noise and tech debt. 
Speaker: Robert Brennan (CEO) Format: Talk ------------------------------------ Session Title: Production software keeps breaking and it will only get worse. Here’s how Traversal is fixing it. Description: Software is eating the world. AI is eating software. AI-powered SWE means a whole lot more software is going to be written that powers mission critical systems in the coming years, with hardly any of it written by humans. Hence, when these software systems inevitably break, it’s going to be next to impossible to troubleshoot them. Towards addressing this issue, we’ll do a product launch of Traversal’s AI, a significant step towards self-healing software systems. We will showcase how it is already used to autonomously troubleshoot production incidents in some of the most complex enterprise environments. Speaker: Raj Agrawal (CTO, Cofounder) Format: Talk ------------------------------------ Session Title: Ship Agents that Ship: A Hands-On Workshop for SWE Agent Builders Description: Coding agents are transforming how software gets built, tested, and deployed, but engineering teams face a critical challenge: how to embrace this automation wave without sacrificing trust, control, or reliability. In this 110-minute workshop, you’ll go beyond toy demos and build production-minded AI agents using Dagger, the programmable delivery engine designed for real CI/CD and AI-native workflows. Whether you're debugging failures, triaging pull requests, generating tests, or shipping features, you'll learn how to orchestrate autonomous agents that live in and around your codebase: from your laptop to your CI platform. We’ll guide you through: - Building real-world agents with Dagger and popular LLMs (GPT, Claude, etc.)
- Programming agent environments using real languages (Go, Python, TypeScript) - Executing agent workflows locally and in GitHub Actions, so you can bring them to production - Using a composable runtime that ensures isolation, determinism, traceability, and repeatability - Designing agents that automate and enhance debugging, test generation, code review, bug fixing, and feature implementation By the end of the workshop, you’ll walk away ready to build your own army of autonomous agents, working collaboratively across your codebase, locally and in CI, accelerating development without ceding control. Let’s build agents that don’t just talk, they ship! Speaker: Jeremy Adams (Head of Ecosystem) Format: Workshop ------------------------------------ Session Title: Post-Training Open Models with RL for Autonomous Coding Description: The models and techniques to build fully autonomous coding agents - not just coding copilots - are already here. In this talk, former Google DeepMind staff research scientist, now CEO of Reflection Misha Laskin will present new research on post-training open weight LLMs for autonomous SWE tasks. He’ll focus on how scaling LLMs with Reinforcement Learning improves the autonomous coding capabilities of LLMs, and provide insight on the technical challenges required to train such systems at scale. Speaker: Misha Laskin (CEO & co-founder) Format: Talk ------------------------------------ Session Title: Beyond the Prototype: Using AI to Write High-Quality Code Description: In this case study-based keynote, Josh Albrecht, CTO of Imbue, examines the critical engineering challenges in building AI coding systems that create more than just prototypes. Drawing from Imbue's research developing Sculptor, an experimental coding agent environment, Josh shares key insights into the fundamental technical obstacles encountered when evolving AI-assisted coding from toy applications to more robust software systems.
The session will explore approaches to core challenges like safely executing code, managing context across large codebases, automating test generation, and creating systems that can identify potential pitfalls in AI-generated code. Attendees will gain practical insights into the technical underpinnings of next-generation coding agents that aim to handle complex software engineering challenges such as architecting larger systems, increasing meaningful test coverage, and designing systems that are easy to debug—moving us closer to AI systems that can help create maintainable software. Speaker: Josh Albrecht (CTO and Co-founder) Format: Talk ------------------------------------ Session Title: Production software keeps breaking and it will only get worse. Here’s how Traversal is fixing it. Description: Software is eating the world. AI is eating software. AI-powered SWE means a whole lot more software is going to be written that powers mission critical systems in the coming years, with hardly any of it written by humans. Hence, when these software systems inevitably break, it’s going to be next to impossible to troubleshoot them. Towards addressing this issue, we’ll do a product launch of Traversal’s AI, a significant step towards self-healing software systems. We will showcase how it is already used to autonomously troubleshoot production incidents in some of the most complex enterprise environments. Speaker: Matthew Schoenbauer (Founding Engineer) Format: Talk ====================================================================== --- Track: SECURITY (June 5) --- ====================================================================== Session Title: Fuzzing in the GenAI Era Description: "Evaluation" is one of those concepts that every AI practitioner vaguely knows is important, but few practitioners truly understand. Is "eval" the dataset for measuring the quality of your AI system? Is "eval" the measure, the metric of quality?
Is "eval" the process of human annotation and scoring? Or is "eval" a third-party dataset run once to benchmark a model? To mitigate this cacophony, this talk will provide an opinionated and principled perspective on what we actually mean when we say “evaluation”, beyond the traditional for-loop-over-a-static dataset. In particular, this perspective draws heavy inspiration from *fuzzing*, i.e. bombarding AI with simulated, unexpected user inputs to uncover corner cases at scale. This factors into sub-problems regarding: - Quality Metric. What are the actual criteria we, as humans, are using to determine if an AI system is producing good or bad responses? How do we elicit these criteria before the human SME can articulate them? How do we, as efficiently as possible, operationalize these criteria with an automated *Judge*? - Stimuli Generation. Given a metric, how do we know, with confidence, that an AI system is performing well with respect to the metric? What data is representative and sufficient for discovering all potential bugs of an AI system? And how do we generate this complex, diverse, faithful data at scale? We will discuss in detail the philosophy, technology, and case studies behind both problems of Quality Metric and Stimuli Generation, and how they interact in concert. Speaker: Leonard Tang (Founder & CEO) Format: Talk ------------------------------------ Session Title: The Unofficial Guide to Apple’s Private Cloud Compute Description: In October 2024, Apple released a new private AI technology onto millions of devices called “Private Cloud Compute”. It brings the same level of privacy and security a local device offers but on an “untrusted” remote server. This talk discusses how Private Cloud Compute represents a paradigm shift in confidential computing and explores the core advancements that made it possible to become mainstream.
We’ll explore its novel architecture that allows developers to run sensitive, multi-tenant workloads with cryptographically provable privacy guarantees at scale and at reasonable cost. Attendees will leave with an understanding of how to leverage this technology for data and AI applications where privacy and security are paramount. Speaker: Jonathan Mortensen (CEO) Format: Talk ------------------------------------ Session Title: How to defend your sites from AI bots Description: Constantly seeing CAPTCHAs? It used to be easy to tell the humans from the droids, but what else can we do when synthetic clients make up nearly half of all web requests? Rotating IPs, spoofed browsers, and agents acting on behalf of real users - are we doomed to forever be solving puzzles? In this talk, we’ll explore user agents, HTTP fingerprints, and IP reputation signals that make humans and agents stand out from scrapers, build a realistic threat model, and dig into the behaviors that reveal the LLM-mimicry. Leave with AX- and UX-safe code, benchmarks, and tools to help you take back control. Speaker: David Mytton (Founder) Format: Talk ------------------------------------ Session Title: How we hacked YC Spring 2025 batch’s AI agents Description: We hacked 7 of the 16 publicly accessible YC X25 AI agents. This allowed us to leak user data, execute code remotely, and take over databases, all within 30 minutes each. In this session, we'll walk through the common mistakes these companies made and how you can mitigate these security concerns before your agents put your business at risk. Speaker: Rene Brandel (CEO) Format: Talk ------------------------------------ Session Title: How to Secure Agents using OAuth Description: We all know sharing passwords is bad (unless you want free TV), so why are we sharing API keys with AI? We shouldn't, and that’s why we need to talk about OAuth. In this talk, we will give a brief intro to OAuth. Then we will talk about the state of authorization in MCP.
We will show how an MCP client uses OAuth to authenticate a user and securely access private resources and tools hosted by an MCP server. Then we’ll look at ways autonomous agents can use OAuth on their own behalf, talking to other agents and MCP servers directly. We’ll learn how to use OAuth to build agents that humans and machines can trust. Speaker: Jared Hanson (Co-Founder) Format: Talk ====================================================================== --- Track: TINY TEAMS (June 4) --- ====================================================================== Session Title: tba Description: Jared Palmer dives into the tools and frameworks for creating dynamic, AI-enhanced web applications. Speaker: Jared Palmer (VP of AI) Format: Keynote ------------------------------------ Session Title: Tiny Teams Description: Sean reached out on X, happy to do a talk on how to build a tiny team Speaker: Grant Lee (CEO) Format: Talk ------------------------------------ Session Title: Using OSS models to build AI apps with millions of users Description: In this talk, Hassan will go over how he builds open source AI apps that get millions of users like roomGPT.io (2.9 million users), restorePhotos.io (1.1 million users), Blinkshot.io (1 million visitors), and LlamaCoder.io (1.4 million visitors). He'll go over his journey in AI, demo some of the apps that he's built, and dig into his tech stack and code to explain how he builds these apps from scratch. He’ll also go over how to market them and go over his top tips and tricks for building great full-stack AI applications quickly and efficiently. This talk will start from first principles and give you a glimpse into Hassan’s workflow of idea -> working app -> many users. Attendees should come out of this session equipped with the resources to build impressive AI applications and understand some of the behind the scenes of how they’re built and marketed. 
This will hopefully serve as an educational and inspirational talk that encourages builders to go build cool things. Speaker: Hassan El Mghari (DevRel lead) Format: Talk ------------------------------------ Session Title: The New Lean Startup Description: In this session, I will be presenting a case study of Oleve's journey, revealing how we've scaled a profitable multi-product portfolio with a tiny team. I'll walk you through the emergence of "tiny teams," our two-track engineering methodology that has become our blueprint, as well as an inside look at our technical alpha – specifically how we've engineered deterministic AI agents to deliver magical and reliable consumer experiences to millions. You'll learn how we've built internal tools to grow leanly and created operating playbooks to scale operations without traditional headcount requirements. I'll also share our approach to scrappy infrastructure innovation and how our investment in internal tooling has served as a critical force multiplier. Finally, I'll give an overview of parts of the profitable portfolio playbook that keeps us lean, adaptable, and profitable across multiple product lines. 
Structure of talk: - the tiny teams revolution - the two-track engineering approach - technical alpha: deterministic ai agents at scale - scrappy infrastructure innovation - internal tooling as a multiplier - the profitable portfolio playbook Speaker: Sid Bendre (Co-Founder) Format: Talk ====================================================================== --- Track: VOICE (June 4) --- ====================================================================== Session Title: Building Effective Voice Agents Description: How to build production voice applications and learnings from working with customers along the way Speaker: Anoop Kotha (Applied AI) Format: Talk ------------------------------------ Session Title: Milliseconds to Magic: Real‑Time Workflows using the Gemini Live API and Pipecat Description: The Gemini Live API GA is now powered by Google's best cost-effective thinking model, Gemini 2.5 Flash. We will do a deep dive on the capabilities that the Gemini Live API combined with Pipecat unlocks for devs, with special focus on session management, turn detection, tool use (including async function calls), proactivity, multilinguality and integration with telephony and other infra. We will demo some of the more innovative capabilities. We will also talk through some customer use cases - especially how customers can use Pipecat to extend these realtime multimodal capabilities to client side applications such as customer support agents, gaming agents, tutoring agents etc. In addition, we also have an experimental version of the Live API, powered by Google's native audio offering, that can be tried in an experimental capacity. This experimental model can communicate with seamless, emotive, steerable, multilingual dialogue and enhances use cases where more natural voices can be a big differentiator. 
Speaker: Shrestha Basu Mallick (Product lead, Gemini Developer API) Format: Talk ------------------------------------ Session Title: What we can learn from self driving in autonomous voice agents Description: The reliability challenges facing voice & chat AI deployment today mirror those that the autonomous vehicle industry confronted years ago. This talk explores how evaluation methodologies developed for self-driving cars can be transferred to create autonomous, self-improving evaluation systems for conversational AI. Drawing from my experience building evaluation infrastructure at Waymo and now developing Coval, an enterprise-grade reliability platform for conversational agents, I'll demonstrate how systematic testing infrastructure is not just a technical requirement but a competitive advantage in the rapidly evolving AI landscape. Speaker: Brooke Hopkins (Founder) Format: Talk ------------------------------------ Session Title: Building voice agents with OpenAI Description: We'll walk through the differences between chained and speech-to-speech powered voice agents, how to approach them, and best practices, then transform a text-based agent into our first voice-enabled agent Speaker: Dominik Kundel (Developer Experience) Format: Workshop ------------------------------------ Session Title: Your realtime AI is ngmi Description: Sean DuBois of OpenAI and Pion, and Kwindla Hultman Kramer of Daily and Pipecat, will talk about why you have to design realtime AI systems from the network layer up. Most people who build realtime AI apps and frameworks get it wrong. They build from either the model out or the app layer down. But unless you start with the network layer and build up, you'll never be able to deliver realtime audio and video streams reliably. And perhaps even worse, you'll get core primitives wrong: interruption handling, conversation state management, asynchronous function calling. 
Sean and Kwin agree on most things: old-school realtime systems people against the rest of the world. But they disagree on some important things, too, and will argue about those things live on stage. Do you need to give developers "thick" client-side realtime SDKs? Can you build truly great vendor neutral APIs? (You'll be surprised which of them argues which side, on that topic.) Speaker: Sean DuBois (WebRTC and Realtime API) Format: Talk ------------------------------------ Session Title: Building the Voice-First Future: Omnipresent Agents that Listen, Talk and Act Description: We’re entering a world where talking to machines feels as natural as talking to people. Voice is about to become the dominant interface for technology - ambient, always-on, and human by default. To get there, we need infrastructure that can orchestrate voice, tools, memory, real-time reasoning and telephony. This talk explores the vision for voice and how we're making it work at scale. Speaker: Jordan Dearsley (CEO) Format: Talk ------------------------------------ Session Title: Your realtime AI is ngmi Description: Sean DuBois of OpenAI and Pion, and Kwindla Hultman Kramer of Daily and Pipecat, will talk about why you have to design realtime AI systems from the network layer up. Most people who build realtime AI apps and frameworks get it wrong. They build from either the model out or the app layer down. But unless you start with the network layer and build up, you'll never be able to deliver realtime audio and video streams reliably. And perhaps even worse, you'll get core primitives wrong: interruption handling, conversation state management, asynchronous function calling. Sean and Kwin agree on most things: old-school realtime systems people against the rest of the world. But they disagree on some important things, too, and will argue about those things live on stage. Do you need to give developers "thick" client-side realtime SDKs? Can you build truly great vendor neutral APIs? 
(You'll be surprised which of them argues which side, on that topic.) Speaker: Kwindla Kramer (CEO) Format: Talk ------------------------------------ Session Title: Milliseconds to Magic: Real‑Time Workflows using the Gemini Live API and Pipecat Description: The Gemini Live API GA is now powered by Google's best cost-effective thinking model, Gemini 2.5 Flash. We will do a deep dive on the capabilities that the Gemini Live API combined with Pipecat unlocks for devs, with special focus on session management, turn detection, tool use (including async function calls), proactivity, multilinguality and integration with telephony and other infra. We will demo some of the more innovative capabilities. We will also talk through some customer use cases - especially how customers can use Pipecat to extend these realtime multimodal capabilities to client side applications such as customer support agents, gaming agents, tutoring agents etc. In addition, we also have an experimental version of the Live API, powered by Google's native audio offering, that can be tried in an experimental capacity. This experimental model can communicate with seamless, emotive, steerable, multilingual dialogue and enhances use cases where more natural voices can be a big differentiator. Speaker: Kwindla Kramer (CEO) Format: Talk ## Expo Open across all 3 days. Featuring 30+ booths and demo areas showcasing the most relevant and forward-thinking AI infrastructure and developer tools. Meet the engineers and founders behind: Microsoft, AWS, MongoDB, Neo4j, Hasura, Galileo, Sourcegraph, LlamaIndex, Temporal, Baseten, Elastic, Orb, Gitpod, Freeplay, Dagger, Traceloop, Pydantic, Arize, Arcjet, Zed, Modal, Agentuity, Weights & Biases, Fly.io, Sierra, Vellum, GenSx, Redis, Langbase, Twilio, Descope, SuperAnnotate, Unstructured, Baz, VESSL AI, Riza, Tambo, Sentry, Xpander, Thomson Reuters, ElevenLabs, Pomerium, Daytona, Polar Signals, Vercel, Ampersand, Together AI, Distributional, and many more. 
**[Buy Tickets](https://ti.to/software-3/ai-engineer-worlds-fair-2025?source={{UTM_SOURCE}}) | [Watch 2023/2024/2025 Talks](https://youtube.com/@aidotengineer)** ## AI Architects Track Invite-only track for AI executives (VPs, CTOs, Heads of AI at enterprises with >1000 employees). - Closed-door briefings and roundtables - Topics include technical org design, model building, FMOps, evals, inference optimization, build/buy decisions - Exclusive access to premium lunches and networking in the View Lounge ## Side Events (2024 Examples) A full week of satellite events hosted by our partners: - Hackathons (e.g., AI21, GenLab x AIEWF) - Deep Tech Week launch parties - RAG++ pre-party, AI DevTools nights - Rooftop happy hours and after-parties - Demo Days, quality conferences, founders dinners If you are organizing an event around June 1–8, email **sponsorships@ai.engineer** to be added to the official calendar. ## Venue & Hotel **Marriott Marquis, San Francisco** 780 Mission St, San Francisco, CA 94103 - Yerba Buena Ballroom: Keynotes, expo, and large sessions - Golden Gate Ballroom: Dedicated space for workshops - View Lounge: Reserved for AI Architects and Leadership Track networking **Discounted rates:** - Marriott Marquis: $399/night (May 29–Jun 7) - Beacon Grand (10-min walk): $289/night with group code `0601AEWF` Book Marriott Marquis (sold out) | [Book Beacon Grand](https://www.beacongrand.com/) (use group code: 0601AEWF) ## Sponsors ### Community Partners - Data Council, Hall C, Ai LA, Open Web Foundation, SF Python, SF Node, SF Java, Weaviate, Prompt Engineering, R meetup, Hypergrowth, Vector DAO, GenAI Collective, Cambrian ML, Ai Tinkerers, CodingNomads, Ai Product Builders, Ai Salon, Ai Makers SF, OpenSourceGrill, Ai Breakfast Club, Ai Stack, Nexus Events, Seattle Ai Society, RVC, Ai Happy Hour, SF Ai, FourthBrain, Ai Comic Books, Ai Engineer Foundation ### Presenting Sponsor Microsoft ### Innovation Partner AWS ### Track Sponsors Neo4j, Braintrust, Hasura ### 
Platinum Sponsors Graphite, Daily, Windsurf, MongoDB, AugmentCode, WorkOS ### Gold Sponsors Neo4j, Hasura, Galileo, Sourcegraph, LlamaIndex, Temporal, Baseten, Elastic, Orb, Gitpod, Freeplay, Dagger, Traceloop, Pydantic, Arize, Arcjet, Zed, Modal, Agentuity ### Silver Sponsors Weights & Biases, Fly.io, Sierra, Vellum, GenSx, Redis, Langbase, Twilio, Descope, SuperAnnotate, Unstructured, Baz, VESSL AI, Riza, Tambo, Sentry, Xpander, Thomson Reuters, ElevenLabs, Pomerium, Daytona, Polar Signals, Vercel, Ampersand, Together AI, Distributional ### Supporters Circle ## Testimonials > "The most insightful and exciting conference I ever attended. High signal, deeply technical, and community-focused." > — Yanick J. S. > "By far the best AI conference I've ever attended." > — Dedy Kredo > "Reminded me of the early Twitter dev scene—a spark for a decade of innovation." > — Eric Ryan > "Months of effort distilled into powerful 20-minute talks." > — Yubrew > "You could feel the buzz and optimism everywhere." > — Eric Ness ## Stay Updated - **[Buy Tickets](https://ti.to/software-3/ai-engineer-worlds-fair-2025?source={{UTM_SOURCE}})** Early bird discounts available until sell-out. - **[Watch Talks](https://youtube.com/@aidotengineer)** Browse sessions from 2023, 2024, and upcoming 2025. - **[Subscribe to Newsletter](https://ai.engineer/newsletter)** Get notified about speakers, tickets, livestreams, and community events. - **[Follow on X](https://twitter.com/aiDotEngineer)** Live updates, real-time speaker quotes, and behind-the-scenes moments. - **[Subscribe on YouTube](https://www.youtube.com/@aiengineer)** Access full talk recordings and curated playlists from every year. 
## Contact & Connect - [Sponsor Inquiry](mailto:sponsorships@ai.engineer) - [Volunteer](https://ai.engineer/volunteer) - [Jobs](https://ai.engineer/jobs) - [Scholarships](https://ai.engineer/scholarships) - [Code of Conduct](https://ai.engineer/code-of-conduct) - [About](https://ai.engineer/about) - [What is an AI Engineer?](https://ai.engineer/what-is-an-ai-engineer) **Copyright 2025 Software 3.0 LLC** **Note:** The 2025 tracks are subject to change. Check the website for the latest updates. **[Apply to Speak](https://sessionize.com/ai-engineer-worlds-fair-2025)**