How PharmEasy improved chat closure by 95% while reducing LLM cost by 30%
PharmEasy's Easybot handles customer-support conversations involving operational actions: order status, refunds, cancellations, and more. With Divyam, PharmEasy moved from a static single-model path to an intelligent multi-model routing layer.
Digital healthcare platform · Tool-using support workflow · 6 LLMs in routing pool
- 95% chat closure uplift
- 30% cost reduction
- No statistically significant quality dip
PharmEasy is one of India's leading digital healthcare platforms, serving customers across medicine delivery and healthcare commerce. To support customers at scale, PharmEasy built Easybot, a tool-using AI support agent designed to handle operational customer requests through natural conversation.
Unlike a generic chatbot, Easybot is a ReAct agent endowed with eight tools. Between a request and a response, the agent, guided by an LLM, can make any number of tool calls to check order status, retrieve refund details, raise tickets, process cancellations, and more. Support quality in such a system depends not just on fluent text, but on whether the right operational action is taken at the right moment.
Customer: My order #PE-48821 was supposed to arrive yesterday. Any update?
Tool call: get_order_stats(order_id: "PE-48821")
Easybot: Your order is delayed: high courier demand in your area caused a one-day slip. I've raised a priority ticket on your behalf. Updated delivery: tomorrow by 6 PM.
Customer: What if it still doesn't arrive tomorrow?
Tool call: get_refund_details(order_id: "PE-48821")
Easybot: If delivery misses the revised date, you're eligible for a full refund to your original payment method. I can initiate that for you; just reply here.
Meet Easybot
Easybot is a ReAct agent that handles customer-support conversations end-to-end. Each conversation is a sequence of turns; within each turn the agent can make any number of tool calls: order status, refund lookups, delayed-order ticketing, cancellations, wrong-update handling, and FAQ fallback. Quality means the right action taken in the right order, not just a fluent reply.
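The turn structure described above can be sketched as a small loop: the model keeps choosing tool calls until it produces a customer-facing reply. This is an illustrative sketch, not PharmEasy's code; `ScriptedLLM`, `run_turn`, and the stub tool below are hypothetical names.

```python
# Hypothetical sketch of a ReAct-style turn loop like Easybot's: within one
# turn the LLM may emit any number of tool calls before its final reply.

def run_turn(llm, tools, messages):
    """Loop: let the model call tools until it produces a final reply."""
    while True:
        step = llm.next_step(messages)                 # model picks the next action
        if step["type"] == "reply":
            return step["text"]                        # customer-facing answer
        result = tools[step["tool"]](**step["args"])   # execute the requested tool
        messages.append({"role": "tool", "name": step["tool"], "content": result})

class ScriptedLLM:
    """Stand-in LLM that replays a fixed plan (for illustration only)."""
    def __init__(self, plan):
        self.plan = iter(plan)
    def next_step(self, messages):
        return next(self.plan)

def get_order_status(order_id):
    return f"order {order_id}: delayed by courier demand"

llm = ScriptedLLM([
    {"type": "tool", "tool": "get_order_status", "args": {"order_id": "PE-48821"}},
    {"type": "reply", "text": "Your order is delayed; updated delivery tomorrow."},
])
reply = run_turn(llm, {"get_order_status": get_order_status},
                 [{"role": "user", "content": "Any update on PE-48821?"}])
```

The point of the sketch is the interleaving: quality depends on which tools are called, in what order, with what arguments, before the final text is produced.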
The Challenge
Static model choice is too limiting for a production support workflow
Most LLM applications begin with a single model. PharmEasy's Easybot was bound to GPT-4.1-mini: a capable model, but one with a single behavior profile. And a production support workflow does not have a single behavior profile.
The report that documented the engagement explicitly tracks four distinct dimensions of quality:
Tool-call accuracy: were the right tools called in the right order with the right arguments?
Post tool-call relevance: did the agent's response after a tool call actually address the user's need?
Cost: what is the LLM inference cost per conversation?
Frustration: a conversation-level signal for whether the customer experience broke down
Production data also revealed significant variation across models in Chat Closure Rate, the metric that most directly reflects whether a customer's issue was resolved. Across production slices, Chat Closure Rate varied from 8.6% to 24.2% depending on the model used. That spread is large enough to have a real business impact.
The problem is straightforward: a single model has a single cost-quality profile. Some turns are simple; some are complex. Some benefit from lower-cost models; some require stronger accuracy. Optimizing a production AI system requires more than picking the best single model and binding the application to it indefinitely.
The Solution
A smarter layer beneath Easybot, not a rebuild
The key insight behind the PharmEasy engagement is how Divyam integrates. The report states clearly: Divyam Router doesn't interfere with Easybot's architecture. It only decouples the ReAct agent from the LLM it was bound to.
That decoupling enables Easybot to benefit from the quality and cost diversity across a range of LLMs. Given a quality target, the Divyam Router performs an arbitrage across those LLMs; in PharmEasy's case, across six candidates: GPT-4.1, GPT-4.1-mini, GPT-4.1-nano, Gemini 1.5 Flash, Gemini 2.0 Flash, and Gemini 2.5 Flash.
Divyam does not force a business to replace one model with another. It creates a smarter system underneath: one that can use a portfolio of models, including the incumbent where appropriate, and route each turn to the model that best fits the cost-quality objective.
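One way to picture the arbitrage is as a constrained pick per turn: among models predicted to meet the quality target, choose the cheapest; if none qualifies, fall back to the strongest. The pool names come from the case study, but the `Router` class, the quality predictor, and all numbers below are invented for illustration; Divyam's actual routing logic is not described at this level.

```python
# Illustrative sketch of turn-level routing: Easybot keeps calling one
# interface while this layer picks a model per turn. The scoring heuristic
# and all quality/cost numbers are invented, not Divyam's real internals.

POOL = ["gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano",
        "gemini-1.5-flash", "gemini-2.0-flash", "gemini-2.5-flash"]

class Router:
    def __init__(self, predict_quality, cost, quality_target):
        self.predict_quality = predict_quality  # est. quality of a model on this turn
        self.cost = cost                        # relative cost per model
        self.target = quality_target

    def pick(self, turn):
        """Cheapest model whose predicted quality meets the target,
        falling back to the strongest model otherwise."""
        ok = [m for m in POOL if self.predict_quality(m, turn) >= self.target]
        if not ok:
            return max(POOL, key=lambda m: self.predict_quality(m, turn))
        return min(ok, key=lambda m: self.cost[m])

# Invented numbers, loosely shaped like the eval table in this case study:
quality = {"gpt-4.1": 0.95, "gpt-4.1-mini": 0.88, "gpt-4.1-nano": 0.83,
           "gemini-1.5-flash": 0.79, "gemini-2.0-flash": 0.86, "gemini-2.5-flash": 0.87}
cost = {"gpt-4.1": 8.0, "gpt-4.1-mini": 1.6, "gpt-4.1-nano": 0.4,
        "gemini-1.5-flash": 0.3, "gemini-2.0-flash": 0.4, "gemini-2.5-flash": 1.2}

router = Router(lambda m, turn: quality[m], cost, quality_target=0.85)
choice = router.pick(turn={"text": "where is my order?"})
```

With these toy numbers the cheapest model clearing the 0.85 bar is a Gemini Flash variant, which mirrors the production routing mix reported below.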
Evaluated safely in shadow mode
Divyam operated in shadow mode during the initial evaluation window (09/11–09/22). In shadow mode, Divyam makes routing recommendations but does not act on them; every live request continued to be served by GPT-4.1-mini. This allowed PharmEasy to measure the potential impact of routing without any risk to live traffic.
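The shadow-mode contract described above is simple to express: every live request still goes to the baseline, and the router's recommendation is only logged for offline comparison. The function and class names here are illustrative, not Divyam's API.

```python
# Minimal sketch of shadow mode: the routing decision is recorded but never
# acted on; live traffic is always served by the baseline model.

def handle_request(turn, baseline_llm, router, shadow_log):
    recommendation = router.pick(turn)          # routing decision, not acted on
    shadow_log.append({"turn": turn, "routed_to": recommendation})
    return baseline_llm.complete(turn)          # live traffic: baseline only

class Fixed:
    """Stand-in for both a model client and a router (illustration only)."""
    def __init__(self, name): self.name = name
    def complete(self, turn): return f"{self.name} reply"
    def pick(self, turn): return "gemini-2.0-flash"

log = []
reply = handle_request({"text": "refund status?"},
                       baseline_llm=Fixed("gpt-4.1-mini"),
                       router=Fixed("router"),
                       shadow_log=log)
```

Because the recommendation never touches the response path, the logged decisions can later be replayed and scored offline with zero risk to customers.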
The evaluation was not reduced to a single generic score. Divyam wrote a custom tool-call accuracy eval specific to Easybot's behavior: to be called accurate, the same tools must be called in the same order with the same arguments. GPT-4.1 served as the frame of reference, at 100% accuracy. This grounded the evaluation in the actual behavior of a tool-using support system.
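The accuracy criterion stated above is strict: a candidate trace counts as accurate only if it calls the same tools, in the same order, with the same arguments as the reference model's trace. A minimal sketch of that check, with hypothetical trace shapes:

```python
# Sketch of the strict tool-call accuracy eval described in the text:
# exact match on tool name, order, and arguments against a reference trace.

def tool_calls_match(candidate, reference):
    """True only if both traces agree on tools, order, and arguments."""
    if len(candidate) != len(reference):
        return False
    return all(c["tool"] == r["tool"] and c["args"] == r["args"]
               for c, r in zip(candidate, reference))

def accuracy(candidate_traces, reference_traces):
    """Fraction of conversations whose tool calls exactly match the reference."""
    hits = sum(tool_calls_match(c, r)
               for c, r in zip(candidate_traces, reference_traces))
    return hits / len(reference_traces)

# Toy example: two conversations, one exact match and one wrong argument.
ref = [[{"tool": "get_order_status", "args": {"order_id": "PE-48821"}}],
       [{"tool": "get_refund_details", "args": {"order_id": "PE-48821"}}]]
cand = [ref[0],
        [{"tool": "get_refund_details", "args": {"order_id": "PE-0000"}}]]
score = accuracy(cand, ref)
```

Under this definition the reference model (GPT-4.1 in the engagement) scores 100% by construction, which is why it anchors the comparison.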
How Divyam routed Easybot traffic across 6 LLMs
- Gemini 2.0 Flash: 61.2%
- Gemini 1.5 Flash: 25.7%
- GPT-4.1: 11.4%
- Other: 1.7%
~87% of traffic routes to cost-efficient Gemini models. The remaining ~11% is deliberately retained on GPT-4.1, the most accurate model, for turns where accuracy demands it. That is what makes the Divyam layer powerful: not a wholesale model swap, but a better production system.
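The ~87% figure is just the sum of the two Gemini Flash shares; a quick check of the reported mix:

```python
# Sanity check on the routing distribution reported above.
mix = {"gemini-2.0-flash": 0.612, "gemini-1.5-flash": 0.257,
       "gpt-4.1": 0.114, "other": 0.017}

gemini_share = mix["gemini-2.0-flash"] + mix["gemini-1.5-flash"]  # 0.869, i.e. ~87%
total = sum(mix.values())                                          # shares sum to 1.0
```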
The Results
95% chat closure uplift and 30% lower cost in production
The most important outcome from the PharmEasy engagement was that Divyam.AI enabled Easybot to operate as a multi-model system rather than a single-model workflow. Using Divyam's routed mixture of models, PharmEasy saw a 95% uplift in Chat Closure Rate and a 30% reduction in cost, both customer-confirmed production metrics.
22.55% projected cost savings, with no statistically significant quality dip
Within the shadow-mode window covered by the evaluation report, Divyam projected a 22.55% cost reduction against GPT-4.1-mini with no statistically significant dip in either measured quality metric. Statistical significance was assessed using two-sided hypothesis tests at the 5% significance level (95% confidence).
Tool-call accuracy: 85.92% (baseline) vs. 85.09% (Divyam), not significant (p = 0.71)
Post tool-call relevance: 77.88% (baseline) vs. 76.88% (Divyam), not significant (p = 0.71)
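The report does not specify which test was used; a standard choice for comparing two accuracy rates like these is a two-sided two-proportion z-test. A sketch with illustrative sample sizes (the report's n is not given):

```python
# Sketch of a two-sided two-proportion z-test, the standard way to ask
# whether two accuracy rates differ at the 5% significance level.
# Sample sizes below are invented; the report does not state its n.
from math import erf, sqrt

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value for H0: p1 == p2, using the pooled z-test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Normal CDF via erf: P(Z <= z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Rates shaped like the accuracy comparison above (85.9% vs 85.1%),
# with an assumed 1,000 conversations per arm:
p = two_proportion_p_value(x1=859, n1=1000, x2=851, n2=1000)
significant = p < 0.05
```

With these toy numbers the p-value lands well above 0.05, matching the report's "not significant" conclusion: a sub-one-point gap in accuracy is indistinguishable from noise at this scale.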
A routed system can outperform static assignment
The eval benchmark told a clear story. Measured against a controlled dataset, the Divyam-routed system delivered a better cost-quality operating point than any single static baseline:
| Model | Tool-Call Accuracy | Cost (USD, eval batch) | Cost vs. Baseline |
|---|---|---|---|
| GPT-4o-mini | 85.41% | 0.1133 | −62.5% |
| GPT-4.1-mini (baseline) | 88.11% | 0.3021 | baseline |
| GPT-4.1-nano | 83.24% | 0.0755 | −75.0% |
| Gemini 1.5 Flash | 78.92% | 0.0566 | −81.3% |
| Divyam Router | 85.95% | 0.0691 | −77.1% |
Accuracy is measured against GPT-4.1's tool calls as ground truth (100%). GPT-4o-mini could not serve as the reference frame: if it were defined as 100% accurate, the router could never measure above it. Cost savings are relative to GPT-4.1-mini, the production baseline; costs are total LLM spend across the evaluation batch, shown for relative comparison. The Divyam-routed system achieved higher accuracy than GPT-4o-mini at 61% of its cost (39% cheaper), a better cost-quality operating point than any single static baseline.
Before Divyam:
- Fixed: a single model for every turn
- Static: no cost-quality optimization
- Manual: new model adoption requires new experiments

With Divyam:
- Dynamic: turn-level model routing across 6 LLMs
- 30% lower inference cost with no quality loss
- Automatic: new models adopted immediately, no experiments needed
What's Next
Continuous optimization as the LLM landscape evolves
With Divyam's routing layer active in production, PharmEasy is no longer tied to a fixed model choice. As new, more capable models enter the market, the routing layer can evaluate and adopt them immediately, without engineering effort or manual re-evaluation.
That is a compounding advantage. Every model release is an opportunity to improve cost, quality, or both. With a static single-model setup, capturing that improvement requires a new experiment cycle. With Divyam, it happens continuously and automatically.
See what Divyam can do for your AI system
Join teams like PharmEasy and MakeMyTrip that are cutting LLM costs without sacrificing quality.