How to Create an OpenAI-Compatible Wrapper for Ollama
As large language models become more integral to modern applications, developers often face the challenge of switching between hosted providers like OpenAI and local solutions such as Ollama. The OpenAI Python client has emerged as a standard for interacting with chat models, providing a simple interface and wide community support. Because Ollama exposes an OpenAI-compatible API endpoint, you can switch between OpenAI and Ollama models by changing only the base URL and API key, while keeping the same interface.
Installing Ollama
To get started with Ollama, install it using the following command:
curl -fsSL https://ollama.com/install.sh | sh
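If the installation succeeds, the ollama binary should be on your PATH and report its version:

ollama --version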
Once installed, you can pull models. For example, to pull the llama3.2 model:
ollama pull llama3.2
A complete list of available models can be found at https://ollama.com/search. To test a model interactively, try:
ollama run llama3.2
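To confirm which models are available locally, list them with:

ollama list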
Installing the OpenAI Python Client
Install the official OpenAI Python client with pip:
pip install openai
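To confirm the client is installed, you can print its version (the examples below assume the 1.x client interface):

python -c "import openai; print(openai.__version__)"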
Creating an Ollama Wrapper with the OpenAI Client (Python)
You can create a class that points to the local Ollama server but uses the same interface as the OpenAI client. This lets you interact with Ollama models just like OpenAI models:
from openai import OpenAI

class ChatModel:
    def __init__(self, base_url, key):
        # The same client class works against any OpenAI-compatible endpoint
        self.client = OpenAI(
            base_url=base_url,
            api_key=key,
        )

    def chat_completion(self, model, messages):
        # Send the conversation history and return the full response object
        response = self.client.chat.completions.create(
            model=model,
            messages=messages
        )
        return response

BASE_URL = "http://localhost:11434/v1"  # Default local URL for Ollama

chat_model = ChatModel(base_url=BASE_URL, key="fake-key")  # Key is required but not used by Ollama

messages = [
    {"role": "system", "content": "You are a Jetson-based assistant."},
    {"role": "user", "content": "How can I optimize GPU usage on a Jetson Nano?"},
    {"role": "assistant", "content": "Use TensorRT for inference and disable services you don't need."},
    {"role": "user", "content": "Got it, thanks!"}
]

response = chat_model.chat_completion(model="llama3.2", messages=messages)
print(response.choices[0].message.content)
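The wrapper can also stream tokens as they are generated, which is useful for interactive applications. Below is a minimal sketch of a hypothetical chat_completion_stream method (an addition, not part of the wrapper above); it relies on the standard stream=True option of chat.completions.create, which Ollama's OpenAI-compatible endpoint also supports:

    def chat_completion_stream(self, model, messages):
        # Request an iterable stream of chunks instead of one full response
        stream = self.client.chat.completions.create(
            model=model,
            messages=messages,
            stream=True
        )
        for chunk in stream:
            # Some chunks carry no text (e.g. the final one), so guard against None
            content = chunk.choices[0].delta.content
            if content:
                print(content, end="", flush=True)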
Using the Same Wrapper with OpenAI (Python)
To switch to OpenAI's hosted models, reuse the same ChatModel class and simply change the base URL and provide a valid OpenAI API key:
BASE_URL = "https://api.openai.com/v1" # OpenAI API URL
chatModel = ChatModel(base_url=BASE_URL, key="your-openai-key")
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who won the World Series in 2020?"},
{"role": "assistant", "content": "The LA Dodgers won in 2020."},
{"role": "user", "content": "Where was it played?"}
]
response = chatModel.chat_completion(model="gpt-4o", messages=messages)
print(response.choices[0].message.content)
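Hardcoding API keys in source code is risky; a common pattern is to read the key from an environment variable instead. A small sketch, assuming the key is exported as OPENAI_API_KEY:

import os

# Raises KeyError if OPENAI_API_KEY is not set, failing fast instead of sending a bad key
chat_model = ChatModel(
    base_url="https://api.openai.com/v1",
    key=os.environ["OPENAI_API_KEY"]
)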
Setting Model Parameters (Python)
The chat.completions.create method supports several parameters to fine-tune model behavior. Some commonly used parameters include:

max_completion_tokens: Limits the number of tokens in the response.
temperature: Controls randomness (higher values yield more randomness).
top_p: Uses nucleus sampling (an alternative to temperature).
stop: Specifies sequences that trigger the model to stop generating.
Example, inside the chat_completion method of the wrapper:

        response = self.client.chat.completions.create(
            model=model,
            messages=messages,
            max_completion_tokens=200,
            temperature=0.7
        )
Refer to the official OpenAI documentation for a complete list of parameters.
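Rather than editing the method each time you need a new parameter, one option is to forward arbitrary keyword arguments to the API call. This **kwargs passthrough is an addition to the wrapper shown earlier, not part of the original:

    def chat_completion(self, model, messages, **kwargs):
        # Extra parameters (temperature, top_p, stop, ...) pass straight through
        return self.client.chat.completions.create(
            model=model,
            messages=messages,
            **kwargs
        )

response = chat_model.chat_completion(
    model="llama3.2",
    messages=messages,
    temperature=0.7,
    max_completion_tokens=200
)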
Using Ollama with OpenAI’s JavaScript Library
First, install the OpenAI package:
npm install openai
Then, you can interact with Ollama by pointing the OpenAI library to the local Ollama server:
import OpenAI from 'openai'

const openai = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama', // required but unused
})

const completion = await openai.chat.completions.create({
  model: 'llama3.2',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }],
})

console.log(completion.choices[0].message.content)
This approach keeps the same workflow, allowing you to switch between Ollama and OpenAI with minimal changes to your application.