This is the story of one of our most successful implementations — a Customer Intake Agent for a regional logistics company we'll call "RapidRoute" (name changed for confidentiality).
RapidRoute has 45 employees and handles approximately 90–120 inbound customer emails per day. Before our engagement, two team members spent their entire mornings processing this inbox — a combined 5–6 hours of manual, repetitive work daily.
Week 1: Discovery and Architecture Design
Days 1–2: Process Mapping. We started by shadowing RapidRoute's intake team for a full morning, documenting every email type they received, every decision they made, and every action they took. The output was a decision tree with 7 primary email categories and 14 possible actions.
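A decision tree like this boils down to a category-to-actions map. The sketch below is purely illustrative: the category and action names are our assumptions, not RapidRoute's actual taxonomy, but it preserves the 7-category, 14-action shape.

```python
# Illustrative category-to-actions map (names are hypothetical; only the
# 7-category / 14-action shape matches RapidRoute's real decision tree).
EMAIL_CATEGORIES = {
    "quote_request":   ["create_lead", "draft_quote_reply"],
    "shipment_status": ["lookup_shipment", "draft_status_reply"],
    "booking_change":  ["update_booking", "notify_dispatch"],
    "invoice_query":   ["lookup_invoice", "escalate_to_billing"],
    "complaint":       ["create_case", "draft_acknowledgement", "escalate_to_manager"],
    "new_customer":    ["create_account", "draft_welcome_reply"],
    "other":           ["flag_for_human_review"],
}

def actions_for(category: str) -> list[str]:
    """Look up actions for a category; unknown categories go to a human."""
    return EMAIL_CATEGORIES.get(category, ["flag_for_human_review"])
```

Making "unknown routes to a human" the default, rather than an error, is what lets the tree fail safe when the classifier returns something unexpected.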
Days 3–4: Data Architecture. We designed the Dataverse schema — the tables, columns, and relationships that the agent would create and update. We created an InboundRequest table, a Classification lookup table, and linked both to the existing Dynamics 365 Account and Contact tables.
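In Python terms, the schema looked roughly like the dataclasses below. Field names here are illustrative assumptions; the real Dataverse columns follow RapidRoute's own naming conventions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical mirror of the Dataverse schema; real column names differ.
@dataclass
class Classification:
    classification_id: str
    name: str                 # e.g. "Shipment Status"

@dataclass
class InboundRequest:
    request_id: str
    subject: str
    body: str
    received_at: datetime
    classification_id: Optional[str] = None  # lookup to Classification
    account_id: Optional[str] = None         # lookup to Dynamics 365 Account
    contact_id: Optional[str] = None         # lookup to Dynamics 365 Contact
    drafted_reply: Optional[str] = None      # filled in when a reply is drafted
```

Keeping the lookups optional matters: an email can land in InboundRequest before classification completes, and before it is matched to an existing Account or Contact.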
Day 5: AI Model Selection. We chose a two-model architecture: GPT-4o for classification (fast, cheap, excellent at structured output) and Claude 3.5 Sonnet for response drafting (more nuanced, better at sensitive customer communications). We built and tested prompts for both.
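The classification side of that architecture hinges on a strict output contract plus defensive parsing of whatever the model returns. A sketch of what that contract and parser might look like (the prompt wording and category names are our illustration, not the production prompt):

```python
import json

# Category list and prompt are illustrative assumptions, not production values.
CATEGORIES = ["quote_request", "shipment_status", "booking_change",
              "invoice_query", "complaint", "new_customer", "other"]

SYSTEM_PROMPT = (
    "Classify the customer email into exactly one category from: "
    + ", ".join(CATEGORIES)
    + '. Respond with JSON only: {"category": "...", "confidence": 0.0-1.0}'
)

def parse_classification(raw: str) -> tuple[str, float]:
    """Validate the model's JSON; fall back to 'other' on any surprise."""
    try:
        data = json.loads(raw)
        category = data["category"]
        confidence = float(data["confidence"])
        if category not in CATEGORIES:
            raise ValueError(category)
        return category, confidence
    except (json.JSONDecodeError, KeyError, ValueError, TypeError):
        return "other", 0.0
```

The fallback to `("other", 0.0)` means malformed model output degrades to a human-review case instead of crashing the flow.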
Week 2: Build, Test, Deploy
Days 6–8: Core Agent Build. We built the Power Automate flow that triggers on every new email to the shared inbox. The flow calls the GPT-4o classification endpoint, writes the result to Dataverse, and then conditionally calls Claude for response drafting if the email type warrants it.
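Stripped of the Power Automate plumbing, the flow's core logic is: classify, persist, and draft only when warranted. A minimal Python sketch with the two model calls injected as callables, so the control flow is visible without API keys (the draft-worthy set is an assumption):

```python
import json
from typing import Callable

# Email types that warrant a drafted reply (an assumption for illustration).
DRAFT_WORTHY = {"quote_request", "shipment_status", "complaint"}

def process_email(body: str,
                  classify_fn: Callable[[str], str],   # stands in for GPT-4o
                  draft_fn: Callable[[str], str],      # stands in for Claude
                  ) -> dict:
    """Classify an email, then conditionally draft a reply."""
    category = json.loads(classify_fn(body))["category"]
    record = {"category": category, "drafted_reply": None}
    if category in DRAFT_WORTHY:
        record["drafted_reply"] = draft_fn(body)
    # In the real flow, `record` is written to the Dataverse InboundRequest
    # table via a Dataverse connector step.
    return record
```

In production, `classify_fn` and `draft_fn` are HTTP actions against the respective model endpoints; the conditional call is what keeps the more expensive drafting model off the simple cases.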
Days 9–11: Testing and Refinement. We ran 200 historical emails through the agent and reviewed outputs. Classification accuracy was 89% on day one. We refined the prompts over two days to reach 96% accuracy on our test set.
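The accuracy numbers came from comparing agent output against hand labels; a per-category error count then shows where prompt refinement should focus. A sketch of that evaluation:

```python
from collections import Counter

def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of predictions matching the hand-labelled category."""
    assert len(predictions) == len(labels)
    return sum(p == l for p, l in zip(predictions, labels)) / len(labels)

def errors_by_category(predictions: list[str], labels: list[str]) -> Counter:
    """Count misclassifications per true category to target prompt fixes."""
    return Counter(l for p, l in zip(predictions, labels) if p != l)
```

Breaking errors out by true category is the practical step: two days of prompt work go much further when aimed at the two or three categories producing most of the mistakes.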
Days 12–14: Deployment and Handoff. We deployed to production with a "review queue" — all agent actions were logged and surfaced in a Dataverse-backed Power Apps review dashboard for the first two weeks. This gave RapidRoute's team confidence to verify outputs before fully trusting the agent.
Results After 30 Days
| Metric | Before Agent | After Agent |
|---|---|---|
| Daily intake time per person | 2.5–3 hours | 20–30 minutes |
| Average response time | 5.8 hours | 18 minutes |
| Emails requiring human review | 100% | ~14% |
| Team hours freed per week | — | ~20 hours |
Build cost and ROI: The engagement was completed in our standard 3-week sprint. Ongoing API costs (GPT-4o + Claude) run approximately $85–120/month at RapidRoute's email volume, and the payback period was under 3 weeks.
Lessons Learned
Start with high-confidence cases. We had the agent auto-respond only to the simplest, most clear-cut email types in the first two weeks. As accuracy was confirmed, we expanded autonomy. This graduated approach builds organizational trust in the system.
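That graduated autonomy reduces to a small gating rule: auto-respond only when the category is on the approved list and classification confidence clears a floor; expanding autonomy just means growing the approved set. The threshold and set below are assumptions for illustration, not RapidRoute's values.

```python
# Graduated-autonomy gate (values are illustrative assumptions).
AUTO_RESPOND = {"shipment_status"}   # start with the clearest email type
CONFIDENCE_FLOOR = 0.9

def route(category: str, confidence: float) -> str:
    """Send to auto-respond only for approved, high-confidence cases."""
    if category in AUTO_RESPOND and confidence >= CONFIDENCE_FLOOR:
        return "auto_respond"
    return "review_queue"
```

Everything else lands in the review queue, which is exactly the dashboard described above.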
Human review queues are a feature, not a compromise. The review dashboard wasn't just a safety net — it became RapidRoute's primary way of monitoring the agent and catching edge cases. It's now a permanent part of their workflow.
Two models beat one. Using GPT-4o for classification and Claude for drafting was more cost-effective than using one premium model for everything, and produced better results for each task. Model selection at the task level is a discipline worth investing in.