AI Assistant in Adobe Experience Platform: Evaluation and Continual Improvement
AI Assistant in Adobe Experience Platform represents a leap forward in building enterprise-grade applications in the Generative AI era. This article provides a behind-the-scenes account of how we approach evaluation and continual improvement, as detailed in our research paper, Evaluation and Continual Improvement for an Enterprise AI Assistant.
Image credits: Adobe Stock
Problems
Enterprise users often face significant friction when trying to extract insights from their data. Conversational AI assistants, as illustrated in the figure below, promise to simplify this process, but delivering a reliable, precision-oriented enterprise-grade solution comes with unique challenges: fragmented data sources, evolving customer needs, and the risk of AI-generated errors that erode user trust.
[Figure: A conversational AI assistant helping enterprise users extract insights from their data]
As we delved deeper into this project, we encountered a critical question: How do we effectively evaluate and improve an AI assistant that’s constantly evolving in a dynamic enterprise environment? This challenge is far from trivial. Enterprise AI assistants deal with sensitive customer data, need to adapt to shifting user bases, and must balance complex metrics while maintaining privacy and security. Traditional evaluation methods fall short in this context, often providing incomplete or misleading feedback.
Our Approach
To address these issues, we’ve developed a novel framework for evaluation and continual improvement. At its core is the observation that “not all errors are the same”. We have adopted a “severity-based” error taxonomy that aligns our metrics with real user experiences (see the table below and the illustrative sketch that follows it):
- Severity 0 errors: These are the most insidious — answers that look correct but are wrong, potentially eroding user trust.
- Severity 1 errors: Incorrect answers that users can’t recover from, leading to frustration.
- Severity 2 errors: Errors that users can overcome through rephrasing, causing minor annoyance.
[Table: Severity-based error taxonomy and its impact on user experience]
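To make the taxonomy concrete, here is a minimal sketch of how such a severity scale might be represented in code. The names (ErrorSeverity, user_impact) are illustrative assumptions, not part of the production system.

```python
from enum import IntEnum

class ErrorSeverity(IntEnum):
    """Severity-based error taxonomy (illustrative only; lower value = more severe)."""
    SEV_0 = 0  # Answer looks correct but is wrong -> silently erodes user trust
    SEV_1 = 1  # Incorrect answer the user cannot recover from -> frustration
    SEV_2 = 2  # Error the user can overcome by rephrasing -> minor annoyance

def user_impact(severity: ErrorSeverity) -> str:
    """Map a severity level to the user experience it degrades."""
    return {
        ErrorSeverity.SEV_0: "silent erosion of trust",
        ErrorSeverity.SEV_1: "unrecoverable failure",
        ErrorSeverity.SEV_2: "recoverable friction",
    }[severity]

print(user_impact(ErrorSeverity.SEV_0))  # -> silent erosion of trust
```

Encoding the scale this way makes it straightforward to aggregate error counts by severity and to weight Severity 0 and 1 errors more heavily when prioritizing fixes.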
This taxonomy allows us to prioritize improvements that have the most significant impact on user experience and trust. It’s part of a comprehensive approach that includes:
- Prioritizing the metrics most directly impacted by production changes
- Allocating human evaluators efficiently
- Collecting both end-to-end and component-wise metrics (see the sketch after this list)
- Making system-wide improvements across all components
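To illustrate the distinction between end-to-end and component-wise metrics from the list above, here is a simplified sketch. The data structures, pipeline stage names, and example query are hypothetical, not a description of the production implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ComponentResult:
    """Outcome of a single pipeline stage (e.g. routing, retrieval, generation)."""
    name: str
    correct: bool

@dataclass
class ConversationTurn:
    """One user query, with per-component outcomes and the end-to-end judgment."""
    query: str
    components: list[ComponentResult] = field(default_factory=list)
    answer_correct: bool = False  # end-to-end: did the user get a correct answer?

def component_accuracy(turns: list[ConversationTurn], component: str) -> float:
    """Component-wise metric: accuracy of one stage, measured in isolation."""
    results = [c.correct for t in turns for c in t.components if c.name == component]
    return sum(results) / len(results) if results else float("nan")

def end_to_end_accuracy(turns: list[ConversationTurn]) -> float:
    """End-to-end metric: fraction of turns that produced a correct final answer."""
    return sum(t.answer_correct for t in turns) / len(turns) if turns else float("nan")

# A hypothetical turn where retrieval succeeded but the final answer was still wrong:
turns = [
    ConversationTurn(
        query="How many audiences did I create last week?",
        components=[ComponentResult("retrieval", True), ComponentResult("generation", False)],
        answer_correct=False,
    ),
]
print(component_accuracy(turns, "retrieval"), end_to_end_accuracy(turns))  # 1.0 0.0
```

Component-wise numbers help localize where the pipeline fails, while the end-to-end number reflects what the user actually experiences; the two can diverge, as in the example above.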
The impact of this framework on our customers has been substantial. By prioritizing fixes according to error severity, we’re delivering more reliable and trustworthy AI assistance. Our human-centered approach ensures that improvements align with real user needs and pain points, as illustrated in the table below.
[Table: Improvements aligned with real user needs and pain points]
What’s Next
We’re just getting started. Our focus now is on making AI Assistant in Adobe Experience Platform even more proactive, meeting users in their natural workflows, and expanding coverage. We’re also improving our evaluation framework along a few key dimensions:
1. Adding proactive evaluations over samples that are representative of production queries. This allows us to forecast the impact of new features and improvements on error rates.
2. Formalizing error-severity definitions by breaking subjective determinations down into a series of less-subjective questions that a human annotator answers (sketched after this list). This has helped improve the consistency of our severity labels.
3. Scaling evaluation with “LLM-as-judge” annotations, as the sketch below also illustrates. This is an extremely active area of research, and we are working to incorporate these methods, especially for tasks that do not require domain expertise to annotate.
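To show how items 2 and 3 above could fit together, here is a minimal sketch in which an LLM judge answers a series of less-subjective yes/no questions and a simple rule maps those answers to an error severity. The question set, the call_llm placeholder, and the decision rule are assumptions made for illustration; they are not our production prompts or logic.

```python
# Illustrative decomposition of a severity judgment into yes/no questions (item 2),
# answered by an LLM judge instead of a human annotator (item 3).
QUESTIONS = [
    "Does the response contain a factual or computational error?",
    "Could a typical user notice the error from the response alone?",
    "Could the user recover by rephrasing the question?",
]

def call_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM judging service; should return 'yes' or 'no'."""
    raise NotImplementedError("Wire this up to your LLM provider of choice.")

def judge_severity(query: str, response: str, reference: str) -> int | None:
    """Return an error severity (0, 1, or 2), or None if no error is detected."""
    answers = []
    for question in QUESTIONS:
        prompt = (
            f"User question: {query}\n"
            f"Assistant response: {response}\n"
            f"Reference answer: {reference}\n"
            f"Answer strictly 'yes' or 'no'. {question}"
        )
        answers.append(call_llm(prompt).strip().lower().startswith("yes"))

    has_error, noticeable, recoverable = answers
    if not has_error:
        return None       # no error detected
    if recoverable:
        return 2          # minor annoyance: user can rephrase and move on
    return 1 if noticeable else 0  # unnoticed wrong answers are the most severe
```

In our experience, decomposing the judgment this way improves the consistency of human annotations, and the same question structure can be reused when delegating the labeling to an LLM judge.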
To learn more about our work and the impact we’re seeing, read the full paper here and follow Adobe Experience Cloud on LinkedIn for updates on our latest innovations.
Start using AI Assistant in Adobe Experience Platform today and supercharge the productivity of your marketing teams. AI Assistant is now available in Real-Time CDP, Journey Optimizer, and Customer Journey Analytics! For more details on getting access, visit the Access AI Assistant in Experience Platform page.
Authors
Akash V. Maharaj, Kun Qian, Uttaran Bhattacharya, Sally Fang, Horia Galatanu, Manas Garg, Rachel Hanessian, Nishant Kapoor, Ken Russell, Shivakumar Vaithyanathan, and Yunyao Li
Guang-jie Ren, Huong Vu, and Namita Krishnan also contributed to this article.