So... My AI App Has Been Lying to Users (And How I Fixed It)
Hi my name is Chris and I build productivity apps and this is a DEEP DIVE on how I build (and iterate on) AI systems
---
Braintrust (the tool I use for AI evals): https://braintrustdata.link/Uww5Jin
---
My apps and socials: https://chrisraroque.com
Timestamps:
0:00 - Intro / What we are covering
1:52 - Overview of AI evals (and why you need them)
4:22 - My ACTUAL AI eval workflow
5:14 - Attempt 1 (split search and calculation system)
6:43 - Attempt 2 (mini agent)
7:55 - Attempt 3 (swapping search providers)
9:33 - Trying to squeeze more out of Exa
10:51 - My new AI system for Amy (using Exa)
11:23 - Braintrust (what i use to run AI evals)
14:23 - Common AI eval mistakes (that i made)
15:37 - Writing good test cases
16:22 - How to IMPROVE your AI system with user feedback
17:37 - A summary of what I learned from my experiments
18:33 - Final thoughts and thank you :)
#appdevelopment #dayinthelife #softwareengineer #startup #softwaredev #indieappdeveloper #dayinthelifecoding #codewithme #buildinpublic #vlog
Chris Raroque discusses improving the accuracy of his AI calorie tracking app through testing and experimentation.
The video provides meaningful insights into AI optimization with practical advice.
Any app developer or AI manager who's interested in improving AI accuracy and testing should watch.
Those not engaged with AI technologies or uninterested in technical deep dives may skip.
The video provides valuable insights into AI testing with transparency about method and outcomes.
Title accurately reflects content without exaggerated claims.
- Separating search and reasoning tasks in AI can improve control and accuracy.
- Switching search providers significantly impacted the accuracy of AI output.
- Complex solutions often underperformed compared to simpler ones.
- Continuous evaluation is vital for maintaining AI system integrity.
- Testing with real user data highlighted international data disparities.
- Braintrust provides vital tools for comprehensive AI testing.
- New AI setups revealed differences in speed and cost.
- Experimentation confirmed necessity for specific test cases.
- Third-party AI tools' updates can unpredictably boost performance.
- DIY eval systems are crucial for deploying effective AI solutions.
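The first takeaway, separating search from reasoning, can be sketched roughly as follows. This is a minimal illustration and not the code from the video: `NutritionFacts`, `fake_search`, `calculate_calories`, and `estimate` are hypothetical names, and the calorie figures are placeholder values standing in for whatever a real search provider would return.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class NutritionFacts:
    name: str
    kcal_per_100g: float


def fake_search(query: str) -> NutritionFacts:
    """Hypothetical stand-in for a search-provider call (placeholder data)."""
    table = {
        "banana": NutritionFacts("banana", 89.0),
        "cooked white rice": NutritionFacts("cooked white rice", 130.0),
    }
    return table[query.lower()]


def calculate_calories(facts: NutritionFacts, grams: float) -> float:
    """Deterministic arithmetic step, kept separate from the lookup."""
    return round(facts.kcal_per_100g * grams / 100.0, 1)


def estimate(query: str, grams: float,
             search: Callable[[str], NutritionFacts]) -> float:
    """Look up facts with the given search function, then calculate."""
    return calculate_calories(search(query), grams)


if __name__ == "__main__":
    print(estimate("banana", 120, fake_search))
```

Because the search function is passed in as a parameter, a different provider can be swapped in without touching the calculation step, and each half can be tested on its own.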
1. Set up an eval system for AI testing.
2. Consider switching search providers based on latest data performance.
3. Regularly update test cases based on user feedback.
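As a rough picture of what "set up an eval system" can mean at its smallest, here is a sketch of a hand-rolled harness: a list of test cases, a scorer, and a pass rate. It does not use Braintrust or any real provider API. `EvalCase`, `within_tolerance`, `run_eval`, and `stub_system` are hypothetical names, the 10% tolerance is an assumed threshold, and the expected calorie values are placeholders rather than data from the video.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    query: str
    expected_kcal: float


def within_tolerance(predicted: float, expected: float,
                     tolerance: float = 0.10) -> bool:
    """Pass if the prediction is within a relative tolerance of the expected value."""
    return abs(predicted - expected) <= tolerance * expected


def run_eval(system: Callable[[str], float], cases: List[EvalCase],
             tolerance: float = 0.10) -> float:
    """Run every case through the system and return the fraction that pass."""
    passed = sum(
        within_tolerance(system(case.query), case.expected_kcal, tolerance)
        for case in cases
    )
    return passed / len(cases)


# Placeholder test cases; real ones would come from user feedback.
CASES = [
    EvalCase("banana, 1 medium", 105.0),
    EvalCase("egg, 1 large", 72.0),
    EvalCase("oat milk latte, 12 oz", 130.0),
]


def stub_system(query: str) -> float:
    """Hypothetical AI system under test, stubbed with fixed answers."""
    answers = {
        "banana, 1 medium": 110.0,
        "egg, 1 large": 70.0,
        "oat milk latte, 12 oz": 200.0,
    }
    return answers[query]


if __name__ == "__main__":
    print(f"pass rate: {run_eval(stub_system, CASES):.2f}")
```

Running the same `CASES` against two candidate systems gives a like-for-like pass rate, which is the kind of comparison needed when deciding whether to switch search providers. New cases drawn from user reports can be appended to the list over time.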
- Basic understanding of AI models
- Knowledge of app development concepts
- Familiarity with performance testing methodologies
- evals: Test cases used to evaluate AI system improvements.
- Braintrust: Platform used for running evals and scoring AI performance.
Used for evaluating and scoring AI effectiveness.
Initial search AI model used in the app.
Used as a reasoning model in testing.
Improved search provider for accuracy tests.
Mentioned as a source for nutrition databases.
vlog
positive
intermediate
moderate
App developers, AI enthusiasts, productivity tool users