Philip Kiely & Pankaj Gupta - From model weights to API endpoint with TensorRT-LLM

YBB Salons 12 - 15

GPUs & Inference: TensorRT-LLM is the highest-performance model serving framework, but it can have a steep learning curve when you’re just getting started. We run TensorRT and TensorRT-LLM in production and have seen both the incredible performance gains they offer and the hurdles to overcome in getting them up and running. In this workshop, participants will learn how to start using TensorRT-LLM, including selecting a model to optimize, building an engine for it with TensorRT-LLM, setting batch sizes and sequence lengths, and running it on a cloud GPU.

Philip Kiely

Philip Kiely is a software developer and author based out of Chicago. Originally from Clive, Iowa, he graduated from Grinnell College with honors in Computer Science. Philip joined Baseten in January 2022 and works across documentation, technical content, and developer experience. Outside of work, he's a lifelong martial artist, a voracious reader, and, unfortunately, a Bears fan.

Philip Kiely, Head of Developer Relations

Pankaj Gupta

Pankaj Gupta is a co-founder of Baseten, where he leads model performance. Pankaj has spent his career making systems faster and more efficient, from optimizing data processing libraries at Twitter to search infrastructure at Uber and media processing at Adobe. A graduate of IIT Delhi, Pankaj now lives in the Bay Area, where he enjoys gardening and evening walks around his neighborhood.

Pankaj Gupta, Co-Founder

* Expo sessions include talks, workshops, and facilitated discussions led by expo partners and organizer-curated speakers in the Expo Arena breakout rooms