Surfing the LLM Waves: Continuously Benefitting from the Ecosystem Advances

On September 17, 2024, OpenAI announced o1-preview, which heralded the era of reasoning Large Language Models – models that not only auto-regressively generate output tokens, but also ponder over them at inference time (via intermediate thinking tokens) to ensure quality. This model enjoys a good performance rating (Artificial Analysis Quality Index: 86), but comes at a high cost (input: $15.00/mt; output: $60.00/mt – where the output tokens also include thinking tokens, and mt abbreviates million tokens). On January 20, 2025, DeepSeek R1 was announced. It delivers an even more impressive performance (Artificial Analysis Quality Index: 89) at a ~20-25x lower price (input: $0.75/mt; output: $2.40/mt on DeepInfra). Shortly thereafter, on January 31, 2025, OpenAI followed suit with o3-mini, which matches the DeepSeek R1 quality (Artificial Analysis Quality Index: 89) at an intermediate price point (input: $1.10/mt; output: $4.40/mt).
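To make the price gap concrete, here is a back-of-the-envelope sketch of the per-request cost at the quoted prices. The token counts in the example are illustrative assumptions, not measurements.

```python
# Per-request cost arithmetic at the prices quoted above (a sketch; token counts are assumed).
PRICES_PER_MTOK = {                 # (input $/mt, output $/mt)
    "o1-preview":  (15.00, 60.00),
    "deepseek-r1": ( 0.75,  2.40),  # via DeepInfra
    "o3-mini":     ( 1.10,  4.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request; output_tokens includes the thinking tokens."""
    in_price, out_price = PRICES_PER_MTOK[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 1,000-token prompt that elicits 4,000 output tokens (mostly reasoning).
for model in PRICES_PER_MTOK:
    print(f"{model}: ${request_cost(model, 1_000, 4_000):.4f}")
```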
If you are an application developer who benefits from the reasoning capability, should you migrate from o1-preview to DeepSeek R1, and then again to o3-mini? In an intensely competitive field such as frontier LLMs, such potentially disruptive events occur frequently – e.g., when a new frontier LLM arrives, or when a provider, such as Groq, slashes the cost of access. In the future, as fine-tuning becomes commoditised, we surmise such events will occur even more frequently. Irrespective of these events that cause a step-jump, the quality and price of every provider or LLM change with time (a phenomenon christened LLMflation by a16z: for the same performance, LLM inference cost gets 10x cheaper every year). This raises the question: must we migrate continuously?
The question of migration is even more nuanced. Two frontier LLMs with equivalent overall performance may perform differently on different tasks: e.g., while o3-mini and DeepSeek R1 share an Artificial Analysis Quality Index of 89, on the quantitative reasoning task (MATH-500 benchmark), DeepSeek R1 scores 97%, whereas o3-mini scores 92%. This makes the migration decision further contingent on the nature of the application.
An application developer, thus, needs a mechanism for continuous migration – which enables her to decouple the choice of the provider/LLM from the application logic.
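To illustrate one way such decoupling can look in practice (not necessarily Divyam’s implementation): many providers expose OpenAI-compatible endpoints, so the provider and model can be injected as configuration rather than baked into the application. The base URLs and model identifiers below are illustrative – check your provider’s documentation.

```python
# A minimal sketch of decoupling: the application calls a single chat() helper,
# while the provider endpoint and model name live in configuration.
from openai import OpenAI

PROVIDERS = {
    "o1-preview":  dict(base_url="https://api.openai.com/v1",
                        model="o1-preview"),
    "deepseek-r1": dict(base_url="https://api.deepinfra.com/v1/openai",   # illustrative
                        model="deepseek-ai/DeepSeek-R1"),
}

def chat(prompt: str, choice: str, api_key: str) -> str:
    cfg = PROVIDERS[choice]
    client = OpenAI(base_url=cfg["base_url"], api_key=api_key)
    resp = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

Swapping providers then becomes a configuration change (or, later, a per-prompt decision made by a router) rather than an application rewrite.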
As an aside, in the world of finance, a trader would need to re-allocate her portfolio in response to (predicted) movements in the asset prices. A quantitative trader offloads this task to an algorithm.
At Divyam, we believe that an algorithmic, continuous, fractional migration is feasible – one where the migration decision is offloaded to a router at per-prompt granularity.
To study the efficacy of routers, we conducted an experiment at Divyam. Specifically, we took the MT-Bench dataset, which contains 80 two-turn conversations between a human expert and an LLM. With Divyam’s evaluation harness, we replayed these conversations to both o1-preview and DeepSeek-R1-Distill-Llama-70B (input: $0.23/mt; output: $0.69/mt on DeepInfra; almost equal performance to o1-preview on MATH-500 and HumanEval), and used o1-preview to judge the resulting responses. The prompt template for the judge follows the best practices listed in the landmark Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena paper (note that we allow the application developer to plug in her own eval instead). The result shows that 48 out of the 80 conversations (60%) elicit an equivalent or better response if we use the cheaper alternative in place of o1-preview – which amounts to slashing a $100 bill to $42.4 (a ~2.4x reduction) – at the expense of a slight reduction in quality on half the conversations (note that this is a function of willingness-to-pay, a knob that the application developer can tune to suit her appetite for the tradeoff). We present a visual summary below:

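To make the savings arithmetic explicit, here is a minimal sketch. The judge verdicts and the ~4% effective per-conversation cost ratio are illustrative placeholders chosen to be consistent with the figures reported above; they are not the raw experimental data (actual ratios depend on realised token counts).

```python
# Sketch of the savings arithmetic: what fraction of the original bill remains
# after routing the "equivalent or better" conversations to the cheaper model?
def remaining_bill(verdicts: list[bool], cheap_cost_ratio: float) -> float:
    """verdicts[i] is True if the cheaper model's response was judged equivalent or better.

    cheap_cost_ratio: effective per-conversation cost of the cheaper model
    relative to the expensive one (assumed ~0.04 here for illustration).
    """
    routed = sum(verdicts) / len(verdicts)        # share sent to the cheap model
    return (1 - routed) + routed * cheap_cost_ratio

# 48 of 80 conversations routed to the cheaper model:
fraction = remaining_bill([True] * 48 + [False] * 32, cheap_cost_ratio=0.04)
print(f"${100 * fraction:.1f} of every $100")     # ≈ $42.4
```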
This, however, is only an upper bound. To operationalise this insight, one needs to actually build a router. While a detailed discussion of routing algorithms is deferred to a later blogpost, we illustrate the intuition behind a conceptually simple routing algorithm: k-Nearest Neighbours (kNN). kNN builds an “atlas” of all the prompts in the application log, and remembers the quality that each LLM yielded on them. Presented with a new prompt, kNN simply places it on that atlas, looks up the k nearest neighbours around it, and routes to the LLM that yielded the highest average quality in this neighbourhood. The following figure (left panel) visualises the atlas of the MT-Bench prompts. This atlas was obtained by first embedding each prompt into a 384-dimensional space with the “all-MiniLM-L12-v2” Sentence Transformer, then projecting them onto the plane with t-SNE – a dimension-reduction algorithm – and, lastly, colouring each conversation according to the most performant LLM for it. The right panel segments the atlas as per the routing decision: if a prompt maps to a red region, the kNN router (with k=3) routes it to DeepSeek; if it falls in a green region, it goes to o1-preview.

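For concreteness, the following is a minimal sketch of such a kNN router, assuming the same “all-MiniLM-L12-v2” embeddings as the atlas. The training records (prompts and per-LLM quality scores) and the usage example are placeholders; in practice they would come from the application log and the evaluation harness.

```python
# A minimal kNN router along the lines described above (a sketch, not Divyam's implementation).
import numpy as np
from sentence_transformers import SentenceTransformer

class KNNRouter:
    def __init__(self, k: int = 3, embedder: str = "all-MiniLM-L12-v2"):
        self.k = k
        self.model = SentenceTransformer(embedder)
        self.embeddings = None   # (n_prompts, 384) "atlas" of logged prompts
        self.scores = None       # dict: LLM name -> (n_prompts,) quality scores

    def fit(self, prompts: list[str], scores: dict[str, list[float]]):
        self.embeddings = self.model.encode(prompts, normalize_embeddings=True)
        self.scores = {m: np.asarray(s, dtype=float) for m, s in scores.items()}
        return self

    def route(self, prompt: str) -> str:
        q = self.model.encode([prompt], normalize_embeddings=True)[0]
        sims = self.embeddings @ q                 # cosine similarity on normalised vectors
        neighbours = np.argsort(-sims)[: self.k]   # indices of the k nearest logged prompts
        # Route to the LLM with the highest mean quality in this neighbourhood.
        return max(self.scores, key=lambda m: self.scores[m][neighbours].mean())

# Illustrative usage with made-up quality scores (1.0 = best possible response):
router = KNNRouter(k=3).fit(
    prompts=["Prove that sqrt(2) is irrational", "Write a haiku about autumn"],
    scores={"deepseek-r1": [1.0, 0.4], "o1-preview": [0.9, 0.9]},
)
print(router.route("Show that sqrt(3) is irrational"))
```

Cosine similarity on normalised embeddings keeps the lookup a single matrix–vector product; for larger application logs, an approximate-nearest-neighbour index would replace the brute-force search.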
At Divyam, we built an agentic workflow, where agents specialising in evaluation, routing, etc. collaborate to facilitate continuous and fractional (i.e., per-prompt) migration – allowing the application developer to direct 100% focus on application development, devoid of any distraction posed by migration. This workflow requires a low-touch integration with the application, and can be deployed on the client’s infrastructure.