So... My AI App Has Been Lying to Users (And How I Fixed It)

Name: So... My AI App Has Been Lying to Users (And How I Fixed It)
Uploaded: 2026-04-07T15:03:00.000000Z
Duration: 18 min 59 s
Channel: Chris Raroque
Description: Chris Raroque discusses improving the accuracy of his AI calorie tracking app through testing and experimentation.

10.7K views

595

Hi my name is Chris and I build productivity apps 👋 and this is a DEEP DIVE on how I build (and iterate on) AI systems

Braintrust (the tool I use for AI evals): https://braintrustdata.link/Uww5Jin

My apps and socials: https://chrisraroque.com

Timestamps:
0:00 – Intro / What we are covering
1:52 – Overview of AI evals (and why you need them)
4:22 – My ACTUAL AI eval workflow
5:14 – Attempt 1 (split search and calculation system)
6:43 – Attempt 2 (mini agent)
7:55 – Attempt 3 (swapping search providers)
9:33 – Trying to squeeze more out of Exa
10:51 – My new AI system for Amy (using Exa)
11:23 – Braintrust (what i use to run AI evals)
14:23 – Common AI eval mistakes (that i made)
15:37 – Writing good test cases
16:22 – How to IMPROVE your AI system with user feedback
17:37 – A summary of what I learned from my experiments
18:33 – Final thoughts and thank you :)

#appdevelopment #dayinthelife #softwareengineer #startup #softwaredev #indieappdeveloper #dayinthelifecoding #codewithme #buildinpublic #vlog

AI accuracy challenges

AI evaluation system

AI model testing

Search provider impact

AI cost-performance optimization

TL;DR

Chris Raroque discusses improving the accuracy of his AI calorie tracking app through testing and experimentation.

Watch Score

The video provides meaningful insights into AI optimization with practical advice.

Clickbait: 2/10

Sentiment: positive

Should watch

Any app developer or AI manager who's interested in improving AI accuracy and testing should watch.

Can skip

Those not engaged with AI technologies or uninterested in technical deep dives may skip.

Quality (9/10)

The video provides valuable insights into AI testing with transparency about method and outcomes.

Clickbait (2/10)

Title accurately reflects content without exaggerated claims.

Sponsorship Detected

Braintrust: ~30s

Summary

Chris Raroque's AI-based calorie tracking app, Amy, had accuracy problems that led subscribers to cancel. Accuracy is crucial because the app pulls data from nutrition databases and often gets it wrong, especially for international products. Raroque shows how he improves accuracy with "evals": iterative tests built from real production data. He uses Braintrust to run the tests and get objective accuracy scores.

Across several attempts, such as separating search from reasoning, he compares different models and search providers. Some changes, like using Gemini 3 for reasoning, initially made the app worse, while switching the search provider from Perplexity to Exa improved accuracy. Despite some failed experiments, Exa emerged as the more effective search provider.

Raroque reflects on what worked and what did not, emphasizing that testing has to be continuous because data and model performance change over time. This ongoing process has increased the app's accuracy while keeping it cost-efficient and fast. He also stresses the importance of a robust eval system: choosing test cases carefully and making sure the scoring judgments are reliable. Finally, he encourages viewers to build their own eval systems and shares the tools and insights he uses to manage AI performance.
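The eval loop described in the summary can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Raroque's code and not Braintrust's API: the names `EvalCase`, `within_tolerance`, and `run_eval`, the 10% tolerance, and every calorie figure are made up for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    query: str            # what a user logged (hypothetical example input)
    expected_kcal: float  # hand-checked reference value (made up here)


def within_tolerance(predicted: float, expected: float, tolerance: float = 0.10) -> bool:
    """Pass when the prediction is within a relative tolerance of the reference."""
    return abs(predicted - expected) <= tolerance * expected


def run_eval(system: Callable[[str], float], cases: list[EvalCase]) -> float:
    """Run every case through the system and return the fraction that pass."""
    passed = sum(within_tolerance(system(c.query), c.expected_kcal) for c in cases)
    return passed / len(cases)


# Illustrative cases only; the numbers are placeholders, not nutrition data.
CASES = [
    EvalCase("item A", 105.0),
    EvalCase("item B", 139.0),
    EvalCase("item C", 130.0),
]

# Two stand-in "systems" that return canned answers instead of calling a model.
BASELINE_ANSWERS = {"item A": 100.0, "item B": 200.0, "item C": 90.0}
CANDIDATE_ANSWERS = {"item A": 110.0, "item B": 140.0, "item C": 150.0}


def baseline(query: str) -> float:
    return BASELINE_ANSWERS[query]


def candidate(query: str) -> float:
    return CANDIDATE_ANSWERS[query]


if __name__ == "__main__":
    print(f"baseline accuracy:  {run_eval(baseline, CASES):.2f}")
    print(f"candidate accuracy: {run_eval(candidate, CASES):.2f}")
```

Scoring both variants against the same fixed cases is what makes the comparison objective: a change is kept only if its score goes up.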

Key Takeaways

Separating search and reasoning tasks in AI can improve control and accuracy.
Switching search providers significantly impacted the accuracy of AI output.
Complex solutions often underperformed compared to simpler ones.
Continuous evaluation is vital for maintaining AI system integrity.
Testing with real user data highlighted international data disparities.
Braintrust provides vital tools for comprehensive AI testing.
New AI setups revealed differences in speed and cost.
Experimentation confirmed the need for specific test cases.
Third-party AI tools’ updates can unpredictably boost performance.
DIY eval systems are crucial for deploying effective AI solutions.
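One way to picture the first two takeaways is a pipeline in which search and reasoning are separate, swappable steps. The sketch below is an assumption-laden illustration: `make_pipeline`, the stub providers, and the snippet format are hypothetical, and no real Exa, Perplexity, or Gemini API is called.

```python
import re
from typing import Callable

SearchFn = Callable[[str], list[str]]          # query -> text snippets
ReasonFn = Callable[[str, list[str]], float]   # query + snippets -> kcal estimate


def make_pipeline(search: SearchFn, reason: ReasonFn) -> Callable[[str], float]:
    """Compose a search step and a reasoning step into one callable system."""
    def pipeline(query: str) -> float:
        snippets = search(query)
        return reason(query, snippets)
    return pipeline


# Hypothetical stand-ins for two search providers; they return canned snippets.
def stub_search_a(query: str) -> list[str]:
    return [f"{query}: 250 kcal per serving"]


def stub_search_b(query: str) -> list[str]:
    return [f"{query}: 265 kcal per serving", f"{query}: 265 kcal (label)"]


def stub_reason(query: str, snippets: list[str]) -> float:
    """Stand-in for a reasoning model: average every 'N kcal' figure found."""
    values = [
        float(match)
        for snippet in snippets
        for match in re.findall(r"(\d+(?:\.\d+)?) kcal", snippet)
    ]
    if not values:
        raise ValueError(f"no calorie figure found for {query!r}")
    return sum(values) / len(values)


# Swapping providers is a one-argument change; the reasoning step is untouched.
system_a = make_pipeline(stub_search_a, stub_reason)
system_b = make_pipeline(stub_search_b, stub_reason)
```

Because each variant is just a function from query to estimate, both can be scored against the same eval cases to see which provider actually helps.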

Action Items

1. Set up an eval system for AI testing.
2. Consider switching search providers based on current eval results.
3. Regularly update test cases based on user feedback.
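The third action item could look something like the following sketch, which folds verified user corrections into the test set. The report fields (`query`, `corrected_kcal`, `verified`) and the function name `merge_feedback` are assumptions made for illustration; the video summary does not specify how feedback is stored.

```python
def merge_feedback(cases: dict[str, float], feedback: list[dict]) -> dict[str, float]:
    """Return a new test set with verified user corrections added or updated.

    `cases` maps a normalized query to its reference calorie value.
    Unverified reports are skipped so bad corrections do not pollute the evals.
    """
    updated = dict(cases)  # copy so the original test set is left unchanged
    for report in feedback:
        if not report.get("verified", False):
            continue
        key = report["query"].strip().lower()
        updated[key] = float(report["corrected_kcal"])
    return updated


# Placeholder data; the values are illustrative, not real nutrition facts.
existing_cases = {"item a": 105.0}
user_feedback = [
    {"query": " Brand X oat biscuit ", "corrected_kcal": 95, "verified": True},
    {"query": "item a", "corrected_kcal": 900, "verified": False},
]

new_cases = merge_feedback(existing_cases, user_feedback)
```

Only reports that someone has checked become test cases, which keeps the reference values trustworthy as the set grows.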

Prerequisites

Basic understanding of AI models
Knowledge of app development concepts
Familiarity with performance testing methodologies

Key Definitions

evals: Repeatable test cases, scored objectively, used to check whether a change improves an AI system.
Braintrust: Platform used for running evals and scoring AI performance.

Mentioned Resources

Braintrust (tool)

Used for evaluating and scoring AI effectiveness.

Perplexity Sonar (tool)

Initial search AI model used in the app.

Gemini 3 Flash (tool)

Used as a reasoning model in testing.

Exa (tool)

Search provider that improved accuracy in tests.

MyFitnessPal (website)

Mentioned as a source for nutrition databases.

Content Analysis

Type: vlog
Sentiment: positive
Difficulty: intermediate
Complexity: moderate
Target Audience: App developers, AI enthusiasts, productivity tool users

#ai testing, #app development, #ai accuracy, #calorie tracking app, #productivity tools, #braintrust, #search providers, #eval system, #ai experimentation, #model optimization