Aparna Dhinakaran - LLM Evals That Work IRL

With nearly two-thirds of enterprise developers planning production deployments of large language models this year, LLM evaluation has never been more important. It is also an area where confusion reigns, starting with ambiguity around what “LLM evals” even means. Often, LLM model evaluation, which quantifies general fitness (e.g. on the Hugging Face leaderboard), is conflated with task-specific LLM system evaluation. And while many foundation model providers offer their own evals, AI engineers building LLM systems designed to plug into many models or tools need a way to objectively and rigorously evaluate both different foundation models and their own systems. In this session, Arize AI founder Aparna Dhinakaran will release research onstage and walk attendees through real-life examples of building an LLM eval from scratch. The session builds on multiple research pieces that have garnered millions of views across social platforms, diving into techniques for building robust LLM evals and ultimately gaining a better understanding of the limits of LLM capabilities. Want to build your own LLM task evals for a specific use case with open source tools? Want to see the latest research on which foundation models your company should be using for specific use cases? You won’t want to miss this session!
Aparna Dhinakaran is the Co-Founder and Chief Product Officer at Arize AI, a pioneer and early leader in machine learning (ML) observability. A frequent speaker at top conferences and thought leader in the space, Dhinakaran was recently named to the Forbes 30 Under 30. Before Arize, Dhinakaran was an ML engineer and leader at Uber, Apple, and TubeMogul (acquired by Adobe). During her time at Uber, she built several core ML infrastructure platforms, including Michelangelo. She has a bachelor’s from Berkeley’s Electrical Engineering and Computer Science program, where she published research with Berkeley’s AI Research group. She is on a leave of absence from the Computer Vision Ph.D. program at Cornell University.
We have now sold out of Early Bird tickets; General Admission has also sold out.
Please join us online for the free livestream.