
One API, Every Provider: Introducing divyam-llm-interop

A minimal, open-source library for provider-agnostic LLM request and response translation.

4 min read

Every time a new model drops (a better one, a cheaper one, one that finally handles your edge cases), the same question comes up: how much will it cost to switch?

Not in dollars. In engineering time.

OpenAI has two API formats: the Chat Completions API and the newer Responses API. Google's Gemini speaks its own dialect. Route a request through a different provider and you're not just swapping a model name. You're reconciling different request shapes, different response structures, different parameter semantics. For every new provider you want to evaluate, someone writes an adapter. For every adapter, someone maintains it.

That plumbing work has nothing to do with your application. It shouldn't live inside your application.
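To make the mismatch concrete, here is roughly how the same request looks in OpenAI's two formats. The shapes follow the public API documentation but are simplified for illustration:

```python
# The same request, expressed in two OpenAI API formats (simplified).

# Chat Completions API: conversation goes in "messages",
# token limit is "max_tokens".
chat_completions_request = {
    "model": "gpt-4.1",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of India?"},
    ],
    "max_tokens": 100,
}

# Responses API: conversation goes in "input", the system prompt can move
# to "instructions", and the token limit is "max_output_tokens".
responses_request = {
    "model": "gpt-4.1",
    "instructions": "You are a helpful assistant.",
    "input": [
        {"role": "user", "content": "What is the capital of India?"},
    ],
    "max_output_tokens": 100,
}

# Same intent, different keys: this is the gap every adapter has to bridge.
diverging_keys = set(chat_completions_request) ^ set(responses_request)
print(sorted(diverging_keys))
```

And that is only one provider; Gemini's request shape diverges again.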

Introducing divyam-llm-interop

divyam-llm-interop is a minimal, open-source Python library that solves exactly this problem. It provides a unified interface for translating AI model requests and responses across providers, keeping request and response semantics consistent regardless of which model you're talking to.

The library is available on PyPI:

pip install divyam-llm-interop

How it works

The primary API is ChatTranslator, a single class that handles both request and response translation for text-based chat interactions. You tell it the source model and format, the target model and format, and it does the rest.

Translating a request

Here's a concrete example: you have a request written for Gemini 1.5 Pro using the Chat Completions API format, and you want to send it to GPT-4.1 using the Responses API format. One call handles the entire translation:

from divyam_llm_interop.translate.chat.api_types import ModelApiType
from divyam_llm_interop.translate.chat.translate import ChatTranslator
from divyam_llm_interop.translate.chat.types import ChatRequest, Model

# Translate gemini-1.5-pro Chat Completions API request to a gpt-4.1
# Responses API request
translator = ChatTranslator()
chat_request = ChatRequest(body={
    "model": "gemini-1.5-pro",
    "messages": [
        {
            "role": "system",
            "content": (
                "You are a highly knowledgeable trivia assistant. "
                "Provide clear, accurate answers across history, geography, "
                "science, pop culture, and general knowledge. "
                "When explaining, keep it concise unless asked otherwise."
            )
        },
        {
            "role": "user",
            "content": "What is the capital of India?"
        }
    ],
    "temperature": 0.7,
    "top_p": 1.0,
    "max_tokens": 100000,
    "presence_penalty": 0.5
})
source = Model(name="gemini-1.5-pro", api_type=ModelApiType.COMPLETIONS)
target = Model(name="gpt-4.1", api_type=ModelApiType.RESPONSES)
translated = translator.translate_request(chat_request, source, target)

Your application logic (the system prompt, the user message, the parameters) stays exactly as it was. The library handles the structural translation between formats.
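Conceptually, the translation is a structural rewrite of the request body. The hand-rolled sketch below shows the kind of mapping involved; it is illustrative only, and the field mapping is a simplification based on the public Responses API, not divyam-llm-interop's exact output:

```python
# A hand-rolled sketch of the kind of rewrite request translation performs.
# Illustrative only; the library's actual output may differ.

def completions_to_responses_sketch(body: dict) -> dict:
    """Convert a Chat Completions-style request body to a
    Responses-style one (simplified)."""
    system_parts = [m["content"] for m in body["messages"]
                    if m["role"] == "system"]
    out = {
        "model": body["model"],
        # Non-system turns become the "input" list.
        "input": [m for m in body["messages"] if m["role"] != "system"],
    }
    if system_parts:
        # System prompts typically map to "instructions".
        out["instructions"] = "\n".join(system_parts)
    if "max_tokens" in body:
        # The token-limit parameter is renamed.
        out["max_output_tokens"] = body["max_tokens"]
    # Sampling parameters present in both formats pass through unchanged.
    for key in ("temperature", "top_p"):
        if key in body:
            out[key] = body[key]
    return out

sketch = completions_to_responses_sketch({
    "model": "gpt-4.1",
    "messages": [
        {"role": "system", "content": "Be concise."},
        {"role": "user", "content": "What is the capital of India?"},
    ],
    "temperature": 0.7,
    "max_tokens": 100,
})
```

Multiply this by every provider pair and parameter, and the appeal of a maintained translation layer is clear.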

Translating a response

The same translator works in reverse. If your application expects a Chat Completions-style response but your model returned a Responses API-shaped one, a single call normalizes it:

from divyam_llm_interop.translate.chat.api_types import ModelApiType
from divyam_llm_interop.translate.chat.translate import ChatTranslator
from divyam_llm_interop.translate.chat.types import ChatResponse, Model

# Translate Responses API response to Chat Completions API response.
translator = ChatTranslator()

# Response body, as would be returned by an LLM call.
chat_response = ChatResponse(body={
    "id": "resp_abc123",
    "object": "response",
    "model": "gpt-4.1",
    "created": 1733400000,
    "output": [
        {
            "role": "assistant",
            "content": [
                {
                    "type": "output_text",
                    "text": "The capital of India is New Delhi."
                }
            ]
        }
    ],
    "usage": {
        "input_tokens": 35,
        "output_tokens": 10,
        "total_tokens": 45
    },
    "metadata": {
        "temperature": 0.7,
        "top_p": 1.0,
        "presence_penalty": 0.5
    }
})

source = Model(name="gpt-4.1", api_type=ModelApiType.RESPONSES)
target = Model(name="gpt-4.1", api_type=ModelApiType.COMPLETIONS)
translated = translator.translate_response(chat_response, source, target)
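The normalization here is again a structural rewrite: Responses-style `output` items collapse into a Chat Completions-style `choices` list, and usage fields are renamed. The sketch below is hand-rolled for illustration and is not the library's exact output:

```python
# A hand-rolled sketch of response normalization: a Responses-style "output"
# becomes a Chat Completions-style "choices" list. Illustrative only.

def responses_to_completions_sketch(body: dict) -> dict:
    # Concatenate the output_text parts of each output item.
    text = "".join(
        part["text"]
        for item in body["output"]
        for part in item.get("content", [])
        if part.get("type") == "output_text"
    )
    usage = body.get("usage", {})
    return {
        "id": body["id"],
        "object": "chat.completion",
        "model": body["model"],
        "created": body["created"],
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": "stop",
        }],
        # Usage fields are renamed between the two formats.
        "usage": {
            "prompt_tokens": usage.get("input_tokens"),
            "completion_tokens": usage.get("output_tokens"),
            "total_tokens": usage.get("total_tokens"),
        },
    }

normalized = responses_to_completions_sketch({
    "id": "resp_abc123",
    "object": "response",
    "model": "gpt-4.1",
    "created": 1733400000,
    "output": [{
        "role": "assistant",
        "content": [{"type": "output_text",
                     "text": "The capital of India is New Delhi."}],
    }],
    "usage": {"input_tokens": 35, "output_tokens": 10, "total_tokens": 45},
})
```

Your application keeps reading `choices[0].message.content` regardless of which API shape the model actually returned.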

Why this matters for intelligent routing

Routing requests to the best model for each prompt (the core of what Divyam's Model Router does) requires this kind of translation to work transparently. When the router decides that a particular prompt is better handled by a different provider or a different API generation, your application shouldn't notice. The interop layer absorbs the format differences, so the routing decision and your application logic stay cleanly separated.

divyam-llm-interop is that layer, made available as a standalone open-source library under the Apache 2.0 license.

Open source and open to contributions

The library is open source and welcomes contributions. If you're working with LLM APIs across providers and want to help expand coverage, the repository is on GitHub. The contributing guide covers everything from forking and branching to code style checks and running the test suite.

Key Takeaways
  • The problem: every LLM provider and API generation has a different request/response format. Switching models means rewriting adapters.
  • divyam-llm-interop: a minimal, provider-agnostic Python library for translating requests and responses across models and API types.
  • One class: ChatTranslator handles both request translation and response translation via translate_request and translate_response.
  • The foundation for routing: clean interop is what allows a model router to switch providers transparently, without your application logic changing.
  • Open source: available on PyPI under Apache 2.0. Contributions welcome.