
Divyam.AI's Performance vis-a-vis Microsoft and Nvidia Routers

Intelligence on tap — but which tap should you turn?

Divyam.AI Research
· 22 min read

Today you can choose from a crowd of models, each flexing its intelligence and capability. You have intelligence on tap, but which tap should you turn? Striking the right balance of power and proportionality when choosing your AI toolset plays a huge role in the success of your AI deployments. For LLMs this challenge is especially crucial, as they often claim the bulk of your AI expenditure. Divyam.AI addresses exactly this challenge by helping you optimize the cost-performance balance of your GenAI deployments.

In this article, we present to you a comparative study of Divyam's Router (the DAI Router in the diagram below) capabilities vis-a-vis industry titans: Microsoft Model Router and NVIDIA LLM Router.

[Architecture diagram: a client app calls the Divyam Gateway with an API key. The Divyam AI Router draws on a new-models feed, a global knowledge base, and performance/cost configs, and is continuously re-trained, fine-tuned, A/B-tested, and improved with reinforcement feedback before production deployment.]

To understand the comparison, let us dig into the principle on which Divyam's Router works.


The science behind Divyam's Router

Suppose you want to assess the mental abilities and knowledge-based skills of thousands of students. You would design a test questionnaire, have every student take it, and rank them by performance. Institutions have been doing this for decades using a psychometric framework called Item Response Theory (IRT), which has been around since 1968!

IRT is a family of psychometrically grounded statistical models that takes a response matrix (i.e., how each student answered every question in the questionnaire) as input, and estimates the "skill" possessed by each student along with the "difficulty" of each question and the "skill" needed to solve it.

To draw a parallel, now consider each LLM as a student and the evaluation benchmarks as the test. Divyam extends the IRT family to estimate the skill demanded and difficulty posed by a hitherto unseen prompt, and combines that with the estimated skill of each LLM to produce an ex-ante performance estimate for every model on that prompt.
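To make the underlying idea concrete (this is only the classical starting point, not Divyam's proprietary extension), a one-parameter IRT model (the Rasch model) can be fit to a binary response matrix with a few lines of NumPy. All names and numbers here are our illustration:

```python
import numpy as np

def fit_rasch(responses, n_iters=500, lr=0.05):
    """Fit a 1-parameter IRT (Rasch) model by gradient ascent.

    responses: (n_students, n_items) binary matrix, 1 = correct answer.
    Returns (ability, difficulty); P(correct) = sigmoid(ability - difficulty).
    """
    n_students, n_items = responses.shape
    ability = np.zeros(n_students)
    difficulty = np.zeros(n_items)
    for _ in range(n_iters):
        logits = ability[:, None] - difficulty[None, :]
        p = 1.0 / (1.0 + np.exp(-logits))   # predicted P(correct)
        resid = responses - p               # gradient of the log-likelihood
        ability += lr * resid.sum(axis=1)
        difficulty -= lr * resid.sum(axis=0)
        difficulty -= difficulty.mean()     # anchor the scale (identifiability)
    return ability, difficulty

# Toy example: 3 "students" (LLMs) answering 4 "questions" (benchmark items).
R = np.array([[1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
theta, b = fit_rasch(R)
```

In this toy run the student who answered the most items gets the highest ability estimate, and the item nobody solved gets the highest difficulty, which is exactly the signal a router needs per prompt.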


Comparison of Routers

Routers are models trained to select the best large language model (LLM) to respond to a given prompt in real time. A router draws on a combination of pre-existing models to deliver high performance while saving compute costs where possible, all packaged as a single model deployment.

The Divyam LLM Router employs a proprietary algorithm that assesses the skill required by, and the difficulty of, each prompt, and on that basis routes it to the best-suited of the available models.
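The routing objective can be sketched abstractly: given an ex-ante accuracy estimate for each candidate model on a prompt, pick the cheapest model that clears a configurable quality bar. The `route` policy, the `quality_floor` parameter, and the toy accuracy estimator below are our placeholders, not Divyam's actual algorithm; the dollar figures are the whole-benchmark costs from the table later in this article:

```python
from typing import Callable

# Whole-benchmark run costs copied from the table below; a real router
# would use live per-token pricing instead.
MODELS = {
    "gpt-4.1":      {"cost": 11.71},
    "gpt-4.1-mini": {"cost": 2.35},
    "gpt-4.1-nano": {"cost": 0.88},
    "o4-mini":      {"cost": 5.50},
}

def route(prompt: str,
          predict_accuracy: Callable[[str, str], float],
          quality_floor: float = 0.75) -> str:
    """Cheapest model whose predicted accuracy clears the floor.

    Falls back to the single most accurate model if none clears it.
    """
    scored = {m: predict_accuracy(prompt, m) for m in MODELS}
    eligible = [m for m, acc in scored.items() if acc >= quality_floor]
    if eligible:
        return min(eligible, key=lambda m: MODELS[m]["cost"])
    return max(scored, key=scored.get)

# Toy estimator: pretend longer prompts are harder for every model.
def toy_estimator(prompt: str, model: str) -> float:
    base = {"gpt-4.1": 0.80, "o4-mini": 0.82,
            "gpt-4.1-mini": 0.78, "gpt-4.1-nano": 0.62}[model]
    return base - 0.002 * len(prompt.split())
```

With this sketch, an easy prompt is served by the cheapest model that still clears the bar, while a hard prompt falls back to the strongest model; tightening or loosening `quality_floor` is the knob that trades accuracy for cost savings.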

Dataset

For our comparative study we chose the MMLU-Pro benchmark, which tests the reasoning and knowledge capabilities of an LLM via multiple-choice questions spanning 14 subjects, such as Math, Physics, Computer Science, and Philosophy. Each question offers 10 possible answers, and the LLM, given a 5-shot prompt, must pick the sole correct one. A randomly chosen 20% sample (2,406 of 12,032 questions) serves as the test set on which we report performance.
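The 20% holdout can be reproduced with a seeded sample; the seed and the use of Python's `random` module here are our choices for illustration, not the study's actual procedure:

```python
import random

ids = list(range(12_032))          # one id per MMLU-Pro question
rng = random.Random(42)            # seed is our assumption, not the study's
test_ids = rng.sample(ids, k=round(len(ids) * 0.20))
print(len(test_ids))               # 2406
```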

LLM Performance

In the table below, we present the performance of a set of contemporary LLMs on MMLU-Pro. From the table, we can see that o4-mini has the best accuracy for this benchmark. Our subsequent tests will take o4-mini as the basis for our relative comparisons.

LLM Accuracy (%) Cost ($)
gpt-4.1 79.47 11.71
gpt-4.1-mini 78.55 2.35
gpt-4.1-nano 61.68 0.88
o4-mini 81.67 5.5
gemini-1.5-pro 71.61 5.01
gemini-2.0-flash 78.1 0.87
gemini-1.5-flash 63.8 0.33
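As we read the graphs that follow, the "Accuracy Drop %" and "Cost Savings %" axes are relative to a baseline model. Using the table above with o4-mini as the baseline, the conversion is a two-liner:

```python
# Accuracy (%) and cost ($) per model, copied from the table above.
RESULTS = {
    "gpt-4.1":          (79.47, 11.71),
    "gpt-4.1-mini":     (78.55, 2.35),
    "gpt-4.1-nano":     (61.68, 0.88),
    "o4-mini":          (81.67, 5.50),
    "gemini-1.5-pro":   (71.61, 5.01),
    "gemini-2.0-flash": (78.10, 0.87),
    "gemini-1.5-flash": (63.80, 0.33),
}

BASE_ACC, BASE_COST = RESULTS["o4-mini"]

def relative(model):
    """Accuracy drop (pp) and cost savings (%) vs the o4-mini baseline."""
    acc, cost = RESULTS[model]
    return round(acc - BASE_ACC, 2), round(100 * (1 - cost / BASE_COST), 1)

print(relative("gemini-2.0-flash"))  # (-3.57, 84.2)
```

For example, always answering with gemini-2.0-flash would give up 3.57 accuracy points but save 84.2% of the cost relative to o4-mini; a router's job is to beat such static picks on this trade-off.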

Results with Microsoft Model Router

The Microsoft Model Router (MS Router) is packaged as a single Azure AI Foundry model that you deploy. Notably, Model Router cannot be fine-tuned on your data.

The candidate LLMs come from a fixed, pre-existing set, namely gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, and o4-mini. Notably, one cannot add to or remove from this list.

Unlike Microsoft Model Router, Divyam routes your queries to the right LLM based on your preference for (a) cost optimization (b) performance optimization.

The Cost Savings vs Accuracy graph below presents router performance with selection limited to the MS Router's set of LLMs. Divyam's Quality Optimization parameters were tuned to match the accuracy range of MS Router; the multiple Divyam points on the graph correspond to different Quality Optimization settings. This tuning is unique to Divyam and is not possible with MS Router.

[Graph: Divyam v/s MS Router, o4-mini accuracy as baseline. X-axis: Cost Savings %; Y-axis: Accuracy Drop %. Divyam points at accuracy drops of -0.3, -1.0, -1.9, and -5.1; MS Router at -5.4.]

You can see that, for comparable accuracy (Divyam at -5.1, MS Router at -5.4), Divyam's cost savings (close to 60%) are almost double MS Router's (close to 35%).

Key Finding

For comparable accuracy, Divyam's cost savings (~60%) are nearly double those of Microsoft Router (~35%) when limited to the same set of LLMs.

Adding more LLMs to the mix

Whereas MS Router is stuck with its choice of LLMs, nothing restricts Divyam from adding the right set of LLMs for a customer. After adding 3 Gemini models to the Divyam Router (alongside the ones Microsoft Model Router was already routing to), we see a clear uptick in the cost-performance Pareto frontier.

[Graph: Divyam v/s MS Router, o4-mini accuracy as baseline, with Gemini models added. X-axis: Cost Savings %; Y-axis: Accuracy Drop %. Divyam points at accuracy drops of -0.6, -1.7, and -5.0; MS Router at -5.4.]

You can see from the graph above that Divyam does even better on both cost savings and accuracy compared to MS Router. For the same relative accuracy (Divyam at -5%, MS Router at -5.4%), Divyam's cost savings (around 84%) are nearly 3 times MS Router's (around 35%).

Key Finding

With Gemini models added, Divyam achieves 84% cost savings — nearly 3x that of Microsoft Router's 35% — at comparable accuracy.

Want to see how Divyam.AI optimizes your LLM cost-performance balance?

Book a Demo

Results with NVIDIA Router

The NVIDIA LLM Router can be configured with one of 3 router models: 1) task-router, 2) complexity-router, 3) intent-router. Each, in turn, is powered by a (pre-LLM-era) language model, Microsoft/DeBERTa-v3-base, which contains 86M parameters.

We consider the task-router and the intent-router unsuitable for our purpose and focus only on the complexity-router. The complexity-router classifies each prompt into one of 6 pre-defined classes (e.g., "Domain") and routes all prompts in a class to a single, configurable LLM. In our setup, all queries classified as "Domain" are routed to the LLM, while everything else goes to the SLM. The 4 NVIDIA points in the graph correspond to different SLM choices paired with the same LLM, gpt-4.1.
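The class-to-model policy described above amounts to a lookup table with a default. The sketch below mirrors that structure; gpt-4.1 as the LLM matches our setup, while the SLM choice and class names are illustrative placeholders (our runs varied the SLM across 4 options):

```python
LLM = "gpt-4.1"           # the single large model in our runs
SLM = "gemini-2.0-flash"  # placeholder; we tried several SLMs here

# In the complexity-router style, only one class gets the large model;
# every other complexity class falls through to the SLM default.
ROUTE_TABLE = {
    "Domain": LLM,
}

def route_by_class(complexity_class: str) -> str:
    """Map a predicted complexity class to a model name."""
    return ROUTE_TABLE.get(complexity_class, SLM)
```

The limitation is visible in the code: routing granularity is capped by the number of classes, and every prompt in a class shares one model regardless of how hard that particular prompt is.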

We tuned Divyam's Quality Optimizer to "Priority Cost Saving" and "Priority Accuracy", represented as the 2 Divyam points in the graph.

All comparisons are relative to GPT 4.1 performance as the baseline.

[Graph: Divyam v/s NVIDIA, GPT 4.1 accuracy as baseline. X-axis: Cost Savings %; Y-axis: Accuracy Drop %. Divyam points at +1.3 and -0.2; NVIDIA points at -0.2, -0.9, -1.8, and -18.1.]

From the graph above you can see that, for the same range of cost savings, Divyam's accuracy drop (-0.2%) beats NVIDIA's (-18.1%) by nearly 18 percentage points when tuned for cost saving. When tuned for accuracy, Divyam (+1.3) even surpasses GPT 4.1 itself.

Key Finding

For the same cost savings range, Divyam's accuracy drop (-0.2%) beats NVIDIA's (-18.1%) by nearly 18 percentage points. When tuned for accuracy, Divyam even surpasses the GPT 4.1 baseline.


Divyam's MMLU-Pro Router Performance

The table below goes a level deeper into the results above. It shows how Divyam varies the number of dimensions (D) of LLM ability used to pick the model most likely to answer each prompt correctly, and lists the distribution of LLMs chosen across the test-set prompts.

D = 5, Cost Optimized: -0.16% accuracy, 82.22% cost savings
LLM distribution: gemini-2.0-flash 76.93%, gpt-4.1 10.43%, llama-70b-instruct 6.23%, gpt-4o-mini 5.86%, gemini-1.5-pro 0.29%, gemini-1.5-flash 0.25%

D = 3, Accuracy Optimized: +1.31% accuracy, 16.93% cost savings
LLM distribution: gpt-4.1 81.71%, gemini-2.0-flash 12.34%, llama-70b-instruct 4.07%, gpt-4o-mini 1.41%, gemini-1.5-flash 0.29%, gemini-1.5-pro 0.17%
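A routing distribution like the cost-optimized one above lets you back-of-envelope the blended cost: weight each model's full-benchmark cost by its routing share. Prices for llama-70b-instruct and gpt-4o-mini do not appear in our earlier table, so the figures below for those two are purely hypothetical placeholders:

```python
COST = {  # $ per full benchmark run; first two rows match the earlier table
    "gemini-2.0-flash":   0.87,
    "gpt-4.1":            11.71,
    "llama-70b-instruct": 0.60,   # hypothetical placeholder
    "gpt-4o-mini":        0.35,   # hypothetical placeholder
    "gemini-1.5-pro":     5.01,
    "gemini-1.5-flash":   0.33,
}

DIST = {  # cost-optimized (D = 5) routing shares from the card above
    "gemini-2.0-flash":   0.7693,
    "gpt-4.1":            0.1043,
    "llama-70b-instruct": 0.0623,
    "gpt-4o-mini":        0.0586,
    "gemini-1.5-pro":     0.0029,
    "gemini-1.5-flash":   0.0025,
}

# Approximate blended cost of the routed run, then savings vs always-gpt-4.1.
blended = sum(DIST[m] * COST[m] for m in DIST)
savings = 1 - blended / COST["gpt-4.1"]
```

Even with placeholder prices for two models, the estimate lands in the low-80s percent savings range, consistent with the 82.22% reported in the card.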

For a similar test, the NVIDIA LLM Router insights are depicted in the below graphs.

[Chart: Distribution of highest complexity metric, NVIDIA complexity-router classification of 2,406 MMLU-Pro prompts: domain_knowledge 2,150; computer_ci 119; number_of_few_shots 84; reasoning 55; contextual_knowledge 2; creativity_scope 1.]

[Chart: Distribution of predicted models, NVIDIA complexity-router assignments for 2,406 prompts: nemotron-super-49b-v1 2,150; llama-3.1-19b-instruct 199; minitron-16-22b-instruct 55; llama-3.1-8b-instruct 2.]

Conclusion

We see that Divyam's Router yields a better Pareto frontier than both Microsoft Router and NVIDIA Router, even though the philosophies of LLM choice differ in the two comparisons. Divyam's ability to prioritize cost and accuracy through separate and combined parameters is unique and yields better results in both cases. Moreover, Divyam spans vendor boundaries in the industry and can easily incorporate LLMs from all segments.

Stay tuned for more experimental results on cost-performance ratios and deeper tests confirming Divyam's low running costs.

Ready to Scale Your AI?

See how Divyam can help your team ship AI to production with confidence.

Book a Demo