Aparna Dhinakaran - LLM Evals That Work IRL

Video Available!

With nearly two-thirds of enterprise developers planning production deployments of large language models this year, LLM evaluation has never been more important. It is also an area where confusion reigns, starting with ambiguity around what “LLM evals” even means. LLM model evaluation, which quantifies general fitness (e.g., rankings on the Hugging Face leaderboard), is often conflated with task-specific LLM system evaluation. And while many foundation model providers offer their own evals, AI engineers building LLM systems designed to plug into many models or tools need rigorous, objective techniques for evaluating both the different foundation models and their own systems.

In this session, Arize AI founder Aparna Dhinakaran will release research onstage and walk attendees through real-life examples of building an LLM eval from scratch. Building on multiple research pieces that have garnered millions of views across social platforms, the session will dive into techniques for building robust LLM evals and, ultimately, for better understanding the limits of LLM capabilities.

Want to build your own LLM task evals for a specific use case using open source tools? Want to see the latest research on which foundation models your company should be using for specific use cases? You won’t want to miss this session!

Buy Tickets

Both Early Bird and General Admission tickets are now sold out.
Please join us online for the free livestream.
