“There is a large class of problems that are easy to imagine and build demos for, but extremely hard to make products out of. For example, self-driving: It’s easy to demo a car self-driving around a block, but making it into a product takes a decade.” - Karpathy
This talk is about practical patterns for integrating large language models (LLMs)
into systems and products. We’ll draw from academic research, industry resources,
and practitioner know-how, and try to distill them into key ideas and practices.
There are seven key patterns. I’ve also organized them along two axes: improving
performance vs. reducing cost/risk, and closer to the data vs. closer to the user.
- Evals: To measure performance
- Retrieval-augmented generation (RAG): To add recent, external knowledge
- Fine-tuning: To get better at specific tasks
- Caching: To reduce latency & cost
- Guardrails: To ensure output quality
- Defensive UX: To anticipate & manage errors gracefully
- Collect user feedback: To build our data flywheel