Designing with Feedback: Building Better LLM Prompts Through Evaluation

Description

In this talk, we’ll explore how thoughtful evaluation is transforming the way we design, test, and refine prompts for large language models at Preply. From manual review to automated scoring systems, we’ll walk through the tools and frameworks we use to ensure prompt quality and reliability at scale. Learn how evaluators—both human and model-based—help us iterate faster, uncover failure modes, and drive continuous improvement in our development process.
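To give a flavour of what a model-based evaluator looks like in practice, here is a minimal sketch of an "LLM-as-judge" scorer in Python. It uses the OpenAI client as an illustrative backend; the model name, rubric, and the evaluate_answer helper are assumptions for this example, not necessarily the setup used at Preply.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical judging rubric: score an answer 1-5 and explain why.
JUDGE_PROMPT = """You are an evaluator. Score the assistant's answer to the user's
question on a 1-5 scale for correctness and helpfulness.
Respond with JSON: {{"score": <int>, "reason": "<one sentence>"}}

Question: {question}
Answer: {answer}
"""

def evaluate_answer(question: str, answer: str) -> dict:
    """Ask a judge model to grade an answer produced by a prompt under test."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; any capable judge model works
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, answer=answer),
        }],
        temperature=0,  # keep scoring as repeatable as possible
        response_format={"type": "json_object"},  # ask for parseable JSON
    )
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    # Compare two prompt variants on the same test case.
    question = "How do I reverse a list in Python?"
    variants = {
        "v1": "Use list.reverse() to reverse in place, or reversed(lst) for an iterator.",
        "v2": "Lists cannot be reversed.",
    }
    for name, answer in variants.items():
        result = evaluate_answer(question, answer)
        print(name, result["score"], result["reason"])
```

Running the same scorer over a fixed test set for each prompt variant turns prompt changes into comparable numbers, which is what makes faster iteration and failure-mode discovery possible.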

Lightning talk ⚡️ Intermediate ⭐⭐ Track: AI, ML, Big Data, Python

Tags: LLM, GenAI, Python
