How PharmEasy improved chat closure by 95% while reducing LLM cost by 30%
PharmEasy's Easybot handles customer-support conversations involving operational actions: order status, refunds, cancellations, and more. With Divyam, PharmEasy moved from a static single-model path to an intelligent multi-model routing layer.
Digital healthcare platform · Tool-using support workflow · 6 LLMs in routing pool
- 95% chat closure uplift
- 30% cost reduction
- No statistically significant quality dip
PharmEasy is one of India's leading digital healthcare platforms, serving customers across medicine delivery and healthcare commerce. To support customers at scale, PharmEasy built Easybot, a tool-using AI support agent designed to handle operational customer requests through natural conversation.
Unlike a generic chatbot, Easybot is a ReAct agent endowed with eight tools. Between a request and a response, the agent, guided by an LLM, can make any number of tool calls to check order status, retrieve refund details, raise tickets, process cancellations, and more. Support quality in such a system depends not just on fluent text, but on whether the right operational action is taken at the right moment.
Customer: My order #PE-48821 was supposed to arrive yesterday. Any update?
Tool call: get_order_stats(order_id: "PE-48821")
Easybot: Your order is delayed: high courier demand in your area caused a one-day slip. I've raised a priority ticket on your behalf. Updated delivery: tomorrow by 6 PM.
Customer: What if it still doesn't arrive tomorrow?
Tool call: get_refund_details(order_id: "PE-48821")
Easybot: If delivery misses the revised date, you're eligible for a full refund to your original payment method. I can initiate that for you; just reply here.
Meet Easybot
Easybot is a ReAct agent that handles customer-support conversations end-to-end. Each conversation is a sequence of turns; within each turn the agent can make any number of tool calls: order status, refund lookups, delayed-order ticketing, cancellations, wrong-update handling, and FAQ fallback. Quality means the right action taken in the right order, not just a fluent reply.
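The turn structure described above can be sketched as a small loop: the model keeps choosing tool calls until it produces a customer-facing reply. This is an illustrative sketch, not PharmEasy's code; `ScriptedLLM`, `run_turn`, and the stub tool below are hypothetical names.

```python
# Hypothetical sketch of a ReAct-style turn loop like Easybot's: within one
# turn the LLM may emit any number of tool calls before its final reply.

def run_turn(llm, tools, messages):
    """Loop: let the model call tools until it produces a final reply."""
    while True:
        step = llm.next_step(messages)                 # model picks the next action
        if step["type"] == "reply":
            return step["text"]                        # customer-facing answer
        result = tools[step["tool"]](**step["args"])   # execute the requested tool
        messages.append({"role": "tool", "name": step["tool"], "content": result})

class ScriptedLLM:
    """Stand-in LLM that replays a fixed plan (for illustration only)."""
    def __init__(self, plan):
        self.plan = iter(plan)
    def next_step(self, messages):
        return next(self.plan)

def get_order_status(order_id):
    return f"order {order_id}: delayed by courier demand"

llm = ScriptedLLM([
    {"type": "tool", "tool": "get_order_status", "args": {"order_id": "PE-48821"}},
    {"type": "reply", "text": "Your order is delayed; updated delivery tomorrow."},
])
reply = run_turn(llm, {"get_order_status": get_order_status},
                 [{"role": "user", "content": "Any update on PE-48821?"}])
```

The point of the sketch is the interleaving: quality depends on which tools are called, in what order, with what arguments, before the final text is produced.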
The Challenge
Static model choice is too limiting for a production support workflow
Most LLM applications begin with a single model. PharmEasy's Easybot was bound to GPT-4.1-mini: a capable model, but one with a single behavior profile. And a production support workflow does not have a single behavior profile.
The report that documented the engagement explicitly tracks four distinct dimensions of quality:
Tool-call accuracy: were the right tools called in the right order with the right arguments?
Post tool-call relevance: did the agent's response after a tool call actually address the user's need?
Cost: what is the LLM inference cost per conversation?
Frustration: a conversation-level signal for whether the customer experience broke down
Production data also revealed significant variation across models in Chat Closure Rate, the metric that most directly reflects whether a customer's issue was resolved. Across production slices, Chat Closure Rate varied from 8.6% to 24.2% depending on the model used. That spread is large enough to have a real business impact.
The problem is straightforward: a single model has a single cost-quality profile. Some turns are simple; some are complex. Some benefit from lower-cost models; some require stronger accuracy. Optimizing a production AI system requires more than picking the best single model and binding the application to it indefinitely.
The Solution
A smarter layer beneath Easybot, not a rebuild
The key insight behind the PharmEasy engagement is how Divyam integrates. The report states clearly: Divyam Router doesn't interfere with Easybot's architecture. It only decouples the ReAct agent from the LLM it was bound to.
That decoupling enables Easybot to benefit from the quality and cost diversity across a range of LLMs. Given a quality target, the Divyam Router performs an arbitrage across those LLMs; in PharmEasy's case, across six candidates: GPT-4.1, GPT-4.1-mini, GPT-4.1-nano, Gemini 1.5 Flash, Gemini 2.0 Flash, and Gemini 2.5 Flash.
Divyam does not force a business to replace one model with another. It creates a smarter system underneath: one that can use a portfolio of models, including the incumbent where appropriate, and route each turn to the model that best fits the cost-quality objective.
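One way to picture the arbitrage is as a constrained pick per turn: among models predicted to meet the quality target, choose the cheapest; if none qualifies, fall back to the strongest. The pool names come from the case study, but the `Router` class, the quality predictor, and all numbers below are invented for illustration; Divyam's actual routing logic is not described at this level.

```python
# Illustrative sketch of turn-level routing: Easybot keeps calling one
# interface while this layer picks a model per turn. The scoring heuristic
# and all quality/cost numbers are invented, not Divyam's real internals.

POOL = ["gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano",
        "gemini-1.5-flash", "gemini-2.0-flash", "gemini-2.5-flash"]

class Router:
    def __init__(self, predict_quality, cost, quality_target):
        self.predict_quality = predict_quality  # est. quality of a model on this turn
        self.cost = cost                        # relative cost per model
        self.target = quality_target

    def pick(self, turn):
        """Cheapest model whose predicted quality meets the target,
        falling back to the strongest model otherwise."""
        ok = [m for m in POOL if self.predict_quality(m, turn) >= self.target]
        if not ok:
            return max(POOL, key=lambda m: self.predict_quality(m, turn))
        return min(ok, key=lambda m: self.cost[m])

# Invented numbers, loosely shaped like the eval table in this case study:
quality = {"gpt-4.1": 0.95, "gpt-4.1-mini": 0.88, "gpt-4.1-nano": 0.83,
           "gemini-1.5-flash": 0.79, "gemini-2.0-flash": 0.86, "gemini-2.5-flash": 0.87}
cost = {"gpt-4.1": 8.0, "gpt-4.1-mini": 1.6, "gpt-4.1-nano": 0.4,
        "gemini-1.5-flash": 0.3, "gemini-2.0-flash": 0.4, "gemini-2.5-flash": 1.2}

router = Router(lambda m, turn: quality[m], cost, quality_target=0.85)
choice = router.pick(turn={"text": "where is my order?"})
```

With these toy numbers the cheapest model clearing the 0.85 bar is a Gemini Flash variant, which mirrors the production routing mix reported below.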
Evaluated safely in shadow mode
Divyam operated in shadow mode during the initial evaluation window (09/11–09/22). In shadow mode, Divyam makes routing recommendations but does not act on them; every live request continued to be served by GPT-4.1-mini. This allowed PharmEasy to measure the potential impact of routing without any risk to live traffic.
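The shadow-mode contract described above is simple to express: every live request still goes to the baseline, and the router's recommendation is only logged for offline comparison. The function and class names here are illustrative, not Divyam's API.

```python
# Minimal sketch of shadow mode: the routing decision is recorded but never
# acted on; live traffic is always served by the baseline model.

def handle_request(turn, baseline_llm, router, shadow_log):
    recommendation = router.pick(turn)          # routing decision, not acted on
    shadow_log.append({"turn": turn, "routed_to": recommendation})
    return baseline_llm.complete(turn)          # live traffic: baseline only

class Fixed:
    """Stand-in for both a model client and a router (illustration only)."""
    def __init__(self, name): self.name = name
    def complete(self, turn): return f"{self.name} reply"
    def pick(self, turn): return "gemini-2.0-flash"

log = []
reply = handle_request({"text": "refund status?"},
                       baseline_llm=Fixed("gpt-4.1-mini"),
                       router=Fixed("router"),
                       shadow_log=log)
```

Because the recommendation never touches the response path, the logged decisions can later be replayed and scored offline with zero risk to customers.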
The evaluation was not reduced to a single generic score. Divyam wrote a custom tool-call accuracy eval specific to Easybot's behavior: to be called accurate, the same tools must be called in the same order with the same arguments. GPT-4.1 served as the frame of reference, at 100% accuracy. This grounded the evaluation in the actual behavior of a tool-using support system.
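The accuracy criterion stated above is strict: a candidate trace counts as accurate only if it calls the same tools, in the same order, with the same arguments as the reference model's trace. A minimal sketch of that check, with hypothetical trace shapes:

```python
# Sketch of the strict tool-call accuracy eval described in the text:
# exact match on tool name, order, and arguments against a reference trace.

def tool_calls_match(candidate, reference):
    """True only if both traces agree on tools, order, and arguments."""
    if len(candidate) != len(reference):
        return False
    return all(c["tool"] == r["tool"] and c["args"] == r["args"]
               for c, r in zip(candidate, reference))

def accuracy(candidate_traces, reference_traces):
    """Fraction of conversations whose tool calls exactly match the reference."""
    hits = sum(tool_calls_match(c, r)
               for c, r in zip(candidate_traces, reference_traces))
    return hits / len(reference_traces)

# Toy example: two conversations, one exact match and one wrong argument.
ref = [[{"tool": "get_order_status", "args": {"order_id": "PE-48821"}}],
       [{"tool": "get_refund_details", "args": {"order_id": "PE-48821"}}]]
cand = [ref[0],
        [{"tool": "get_refund_details", "args": {"order_id": "PE-0000"}}]]
score = accuracy(cand, ref)
```

Under this definition the reference model (GPT-4.1 in the engagement) scores 100% by construction, which is why it anchors the comparison.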
How Divyam routed Easybot traffic across 6 LLMs
- Gemini 2.0 Flash: 61.2%
- Gemini 1.5 Flash: 25.7%
- GPT-4.1: 11.4%
- Other: 1.7%
~87% of traffic routes to cost-efficient Gemini models. The remaining ~11% is deliberately retained on GPT-4.1, the most accurate model, for turns where accuracy demands it. That is what makes the Divyam layer powerful: not a wholesale model swap, but a better production system.
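The ~87% figure is just the sum of the two Gemini Flash shares; a quick check of the reported mix:

```python
# Sanity check on the routing distribution reported above.
mix = {"gemini-2.0-flash": 0.612, "gemini-1.5-flash": 0.257,
       "gpt-4.1": 0.114, "other": 0.017}

gemini_share = mix["gemini-2.0-flash"] + mix["gemini-1.5-flash"]  # 0.869, i.e. ~87%
total = sum(mix.values())                                          # shares sum to 1.0
```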
The Results
95% chat closure uplift and 30% lower cost in production
The most important outcome from the PharmEasy engagement was that Divyam.AI enabled Easybot to operate as a multi-model system rather than a single-model workflow. Using Divyam's routed mixture of models, PharmEasy saw a 95% uplift in Chat Closure Rate and a 30% reduction in cost, both customer-confirmed production metrics.
22.55% projected cost savings, with no statistically significant quality dip
Within the shadow-mode window covered by the evaluation report, Divyam projected a 22.55% cost reduction against GPT-4.1-mini with no statistically significant dip in either measured quality metric. Statistical significance was assessed using two-sided hypothesis tests at the 5% significance level (95% confidence).
Tool-call accuracy: 85.92% (baseline) vs. 85.09% (Divyam), not significant (p = 0.71)
Post tool-call relevance: 77.88% (baseline) vs. 76.88% (Divyam), not significant (p = 0.71)
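The report does not specify which test was used; a standard choice for comparing two accuracy rates like these is a two-sided two-proportion z-test. A sketch with illustrative sample sizes (the report's n is not given):

```python
# Sketch of a two-sided two-proportion z-test, the standard way to ask
# whether two accuracy rates differ at the 5% significance level.
# Sample sizes below are invented; the report does not state its n.
from math import erf, sqrt

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value for H0: p1 == p2, using the pooled z-test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Normal CDF via erf: P(Z <= z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Rates shaped like the accuracy comparison above (85.9% vs 85.1%),
# with an assumed 1,000 conversations per arm:
p = two_proportion_p_value(x1=859, n1=1000, x2=851, n2=1000)
significant = p < 0.05
```

With these toy numbers the p-value lands well above 0.05, matching the report's "not significant" conclusion: a sub-one-point gap in accuracy is indistinguishable from noise at this scale.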
A routed system can outperform static assignment
The eval benchmark told a clear story. Measured against a controlled dataset, the Divyam-routed system delivered a better cost-quality operating point than any single static baseline:
| Model | Tool-Call Accuracy | Cost (USD, eval batch) | Cost vs. Baseline |
|---|---|---|---|
| GPT-4o-mini | 85.41% | 0.1133 | −62.5% |
| GPT-4.1-mini (baseline) | 88.11% | 0.3021 | baseline |
| GPT-4.1-nano | 83.24% | 0.0755 | −75.0% |
| Gemini 1.5 Flash | 78.92% | 0.0566 | −81.3% |
| Divyam Router | 85.95% | 0.0691 | −77.1% |
Accuracy is measured against GPT-4.1's tool calls as ground truth (100%). GPT-4o-mini could not serve as the reference frame: if it were defined as 100% accurate, the router could never measure above it. Cost savings are relative to GPT-4.1-mini, the production baseline; costs are total LLM spend across the evaluation batch, shown for relative comparison. The Divyam-routed system achieved higher accuracy than GPT-4o-mini at 61% of its cost (39% cheaper), a better cost-quality operating point than any single static baseline.
Before Divyam:
- Fixed: a single model for every turn
- Static: no cost-quality optimization
- Manual: new model adoption requires new experiments

With Divyam:
- Dynamic: turn-level model routing across 6 LLMs
- 30% lower inference cost with no quality loss
- Automatic: new models adopted immediately, no experiments needed
What's Next
Continuous optimization as the LLM landscape evolves
With Divyam's routing layer active in production, PharmEasy is no longer tied to a fixed model choice. As new, more capable models enter the market, the routing layer can evaluate and adopt them immediately, without engineering effort or manual re-evaluation.
That is a compounding advantage. Every model release is an opportunity to improve cost, quality, or both. With a static single-model setup, capturing that improvement requires a new experiment cycle. With Divyam, it happens continuously and automatically.
See what Divyam can do for your AI system
Join teams like PharmEasy and MakeMyTrip that are cutting LLM costs without sacrificing quality.