OmniChat - How to chat with any LLM

chat
llm
python
injection
openai
llama
anthropic
x.ai
grok
groq
Author

Christian Wittmann

Published

November 30, 2024

So far, I’ve worked with large language models (LLMs) from OpenAI and Meta (the LLaMA series) in separate implementations. I noticed that the code across these implementations was very similar, with only minor variations for provider-specific details. This realization inspired me to harmonize the codebase into a unified, modular chat client with the current working title “OmniChatClient”. The name “Omni” reflects two key aspects of its design: It seamlessly integrates with several LLM providers, and it supports multimodal inputs (text and vision so far).

In this blog post, I’ll walk you through the steps I took to create the OmniChatClient. The current version works with OpenAI, the LLaMA series (via Groq), Anthropic’s Claude, and X.AI’s Grok, using the SDKs from both OpenAI and Anthropic. By the end, you’ll see how OmniChatClient’s code structure enables flexibility, maintainability, and scalability. I’ll demonstrate how I implemented support for both text and vision inputs, and I’ll save additional features, such as function calling and support for more vendors, for later posts.

This blog post focuses on the refactoring process that transformed my earlier implementations into OmniChatClient. If you’re interested in the inner mechanics of the chat client and the chat messages, please refer to my previous blog posts like Building Chat for Jupyter Notebooks from Scratch or Building the Apple Calculator in a Jupyter Notebook.

A final note before we start: As usual, if you prefer the interactive notebook version of this blog post, please use this version on GitHub.

Architecture

Before diving into the code, let’s explore the challenge and the approach I took to solve it. At first glance, the APIs of various LLM providers seem very similar: You specify a model name, send user-messages, and receive assistant-messages, all wrapped in a JSON schema. However, subtle differences emerge on closer inspection. For example:

  • OpenAI expects the system prompt to be included in the chat messages, whereas Anthropic requires it to be passed as a separate parameter.
  • The maximum token limit (max_tokens) is mandatory in Anthropic’s API, but optional for OpenAI and other providers I’ve worked with so far.

These differences become even more pronounced when going beyond simple text-based chat. For instance:

  • OpenAI and Anthropic use different formats for passing images into the chat.
  • Llama 3.2 models only support one image at a time, while OpenAI and Anthropic can handle multiple images.

Despite these variations, the overarching process flow remains consistent when viewed at a higher level.
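
To make these API differences concrete, here is a minimal sketch of the same request sent through each SDK. This is a simplified preview only; the model names are the ones used later in this post, and it assumes the API keys are already configured:

# Minimal sketch: the same question sent through the OpenAI and the Anthropic SDK.
from openai import OpenAI
from anthropic import Anthropic

system_prompt = "Answer in a very concise and accurate way"
messages = [{"role": "user", "content": "Hello, who are you?"}]

# OpenAI: the system prompt is just another chat message; max_tokens is optional.
openai_response = OpenAI().chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "system", "content": system_prompt}] + messages,
)

# Anthropic: the system prompt is a separate parameter; max_tokens is mandatory.
anthropic_response = Anthropic().messages.create(
    model="claude-3-5-sonnet-20241022",
    system=system_prompt,
    messages=messages,
    max_tokens=128,
)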

To address these challenges, I decided to decouple provider-specific details from the generic chat client logic. This is achieved through “dependency injection” (for a nice video intro I can recommend Dependency Injection, The Best Pattern from CodeAesthetic), where an LLM-specific provider class is injected into the chat client. For message management, I chose an inheritance structure, with a base ChatMessages class handling common functionality and provider-specific subclasses extending it to support provider-specific message formatting requirements.

Injecting Provider-Specific Class into Chat Client

The ChatClient is a provider-agnostic class responsible for managing the conversation flow. It delegates provider-specific functionality, such as authentication and API interactions, to a class implementing the LLMProviderInterface. This interface defines the contract for provider-specific behavior, ensuring consistent integration regardless of the provider.

To maximize reusability, shared logic is abstracted into the BaseLLMProvider, which implements LLMProviderInterface. Provider-specific classes, such as OpenAIProvider and AnthropicProvider, extend BaseLLMProvider to handle unique API requirements.

Upon instantiating the ChatClient, you inject the desired provider class, enabling the client to remain decoupled from provider-specific details while leveraging the functionality defined by the provider.
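
In code, the injection is simply a constructor argument. Here is a small preview of the usage we are working toward (the classes are defined step by step below):

# Preview: the provider is created first and injected into the chat client.
provider = OpenAIProvider(model_name="gpt-4o", system_prompt="Answer in a very concise and accurate way")
chat_client = ChatClient(provider=provider)
chat_client.prompt_model("Hello, who are you?")

# Switching to another LLM only means injecting a different provider:
chat_client = ChatClient(provider=AnthropicProvider(model_name="claude-3-5-sonnet-20241022", max_tokens=128))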

The diagram below illustrates the architecture I implemented. For readability, it contains only 2 providers: OpenAI and Anthropic. (You can find a more detailed version later in this post.)

flowchart BT
    
    LLMProviderInterface -->|Injects| ChatClient 

    subgraph Injection["Dependency Injection"]
        direction BT
        style Injection fill:#FFFFE0
    
        BaseLLMProvider -->|Implements| LLMProviderInterface

        subgraph Inheritance["Inheritance Hierarchy"]
            direction BT
            OpenAIProvider -->|Extends| BaseLLMProvider
            AnthropicProvider -->|Extends| BaseLLMProvider
            style Inheritance fill:#D0E6F5
        end

    end   

ChatMessages with Provider-Specific Classes

The LLMProviderInterface includes a method called chat_messages_factory, which is responsible for creating an instance of the appropriate ChatMessages class. These classes follow an inheritance hierarchy to handle both common functionality and provider-specific message formatting.

The base ChatMessages class provides shared logic for managing conversation history and appending messages in a generic format. Provider-specific subclasses, such as OpenAIChatMessages and AnthropicChatMessages, extend this base class to handle unique requirements, such as formatting multimodal inputs (e.g., images).

This design allows the ChatClient to remain agnostic of provider-specific message handling, while the injected provider determines which ChatMessages subclass to use.

The diagram below illustrates the inheritance hierarchy of ChatMessages (again, it is limited to OpenAI and Anthropic for readability):

flowchart BT
    OpenAIChatMessages -->|Extends| ChatMessages
    AnthropicChatMessages -->|Extends| ChatMessages

The full architecture

Now that we’ve discussed the individual components, let’s put everything together to see the big picture. The diagram below illustrates how the key classes interact to create a flexible, multi-provider chat system.

If this looks overwhelming at first, don’t worry! We’ll construct this step-by-step throughout the blog post. You can always revisit this diagram later once you’ve gone through the detailed explanations.

classDiagram
    %% ChatClient and Provider Interface
    class ChatClient {
        Responsible for managing the conversation with LLM
        - _provider: LLMProviderInterface
        - _chat_messages: ChatMessages
        +__init__(provider: LLMProviderInterface)
        +prompt_model(prompt: str, base64_images: Optional[List[str]])
    }

    class LLMProviderInterface {
        Defines provider-specific implementation points
        +__init__(model_name: str, **kwargs)
        +chat_messages_factory() ChatMessages
        +get_model_response(chat_messages: ChatMessages)
    }

    class ChatMessages {
        Manages the conversation history and message structure
        - _messages: List[Dict]
        +append_user_message(content: str, base64_images: Optional[List[str]])
        +append_assistant_message(content: str)
    }

    %% Relationships
    ChatClient ..> LLMProviderInterface : Dependency Injection
    LLMProviderInterface o-- ChatMessages : Factory Creates

Creating the text-based OmniChatClient

Let’s start simple by implementing a text-only version of the OmniChatClient. This will lay the foundation for adding more advanced functionality like vision support in later iterations.

Code
from dotenv import load_dotenv
import os

load_dotenv()
True
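
The provider classes below read their API keys from environment variables loaded via the .env file. A small, optional sanity check (the variable names are the ones used throughout this post) can save some debugging later:

# Optional sanity check: warn about API keys that are missing from the environment.
for key in ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "XAI_API_KEY", "GROQ_API_KEY"]:
    if not os.getenv(key):
        print(f"Warning: {key} is not set - the corresponding provider will raise an error.")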

ChatMessages

First, we need a class to manage the conversation history and structure messages. Below is a simplified version of the ChatMessages class, similar to what I introduced in Building Chat for Jupyter Notebooks from Scratch. Provider-specific classes will extend this base class only when needed.

from IPython.display import display, Markdown

class ChatMessages:

    def __init__(self):
        """Initializes the Chat."""
        self._messages = []

    def _append_message(self, role, content):
        """Appends a message with specified role and content to messages list."""
        self._messages.append({"role": role, "content": content})

    def append_system_message(self, content):
        """Appends a system message with specified content to messages list."""
        if content:
            # No empty system message
            self._append_message("system", content)

    def append_user_message(self, content=None, base64_images=None):
        """Appends a user message with specified content"""
        self._append_message("user", content)

    def append_assistant_message(self, content):
        """Appends an assistant message with specified content."""
        self._append_message("assistant", content)

    def get_messages(self):
        """Returns a shallow copy of the messages list."""
        return self._messages[:]
    
    def get_debug_view(self):
        """Returns the debug view of the chat messages formatted as Markdown."""
        debug_view = []
        for message in self._messages:
            role = message.get('role')
            content = message.get('content', '')
            debug_view.append(f"**{role}**: {content}\n")

        return Markdown('\n'.join(debug_view))
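
Before wiring this class into the chat client, here is a quick standalone check of how it behaves (the expected return value is shown as a comment):

# Quick standalone check of the base ChatMessages class.
chat_messages = ChatMessages()
chat_messages.append_system_message("Answer in a very concise and accurate way")
chat_messages.append_user_message("Hello, who are you?")
chat_messages.get_messages()
# [{'role': 'system', 'content': 'Answer in a very concise and accurate way'},
#  {'role': 'user', 'content': 'Hello, who are you?'}]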

LLMProviderInterface

Next, we define the LLMProviderInterface, which serves as a contract for all provider-specific classes. These classes will handle authentication, API interactions, and any provider-specific logic.

from typing import Optional

class LLMProviderInterface:
    
    def initialize_client(self) -> None:
        """
        Abstract method for initializing the model provider client, including authentication.
        This method is called from the provider's constructor.
        """
        raise NotImplementedError()
    
    def initialize_system_prompt(self, system_prompt):
        """
        Abstract method for initializing the system prompt.

        :param system_prompt: The system prompt for the ChatClient.
        """
        raise NotImplementedError()
    
    def get_model_response(self, chat_messages: ChatMessages):
        """
        Abstract method for fetching the model response.

        :param chat_messages: ChatMessages object containing the conversation history.
        """
        raise NotImplementedError()
    
    def chat_messages_factory(self) -> ChatMessages:
        """Abstract factory method to create the appropriate ChatMessages class."""
        raise NotImplementedError()

BaseLLMProvider

The BaseLLMProvider implements shared functionality like client initialization and serves as the foundation for provider-specific classes.

class BaseLLMProvider(LLMProviderInterface):
    
    def __init__(
        self,
        model_name: str,
        system_prompt: Optional[str] = None,
        max_tokens: Optional[int] = None,
        temperature: Optional[float] = 0.0,
    ):
        self.model_name = model_name
        self._max_tokens = max_tokens
        self._temperature = temperature
        self.initialize_system_prompt(system_prompt)
        self.initialize_client()

    def initialize_system_prompt(self, system_prompt = None):
        self._system_prompt = system_prompt
 
    def initialize_client(self):
        raise NotImplementedError()
    
    def get_model_response(self, chat_messages, tools=None):
        raise NotImplementedError()

    def chat_messages_factory(self):
        return ChatMessages()

ChatClient

Finally, the ChatClient class orchestrates the conversation flow. Notice how provider-specific logic is delegated to the injected provider, ensuring the client remains provider-agnostic.

from IPython.display import Markdown

class ChatClient:

    def __init__(self, provider: LLMProviderInterface):
        """Initializes the Chat with the system message."""
        self._provider = provider
        self._chat_messages = provider.chat_messages_factory()

    def _get_model_response(self):
        """Delegates response fetching to the LLM-provider class."""
        return self._provider.get_model_response(
            chat_messages=self._chat_messages,
        )
    
    def prompt_model(self, prompt=None, base64_images=None):
        """This method handles the user prompt, delegates interaction with the model through the provider, 
        and returns the model response."""
        self._chat_messages.append_user_message(prompt)
        content = self._get_model_response()
        self._chat_messages.append_assistant_message(content)
        return Markdown(content)

With this foundation, we can now implement provider-specific classes (e.g., for OpenAI and Anthropic). These classes will inherit from BaseLLMProvider and implement methods like initialize_client and get_model_response to handle provider-specific details.
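
To see how small that provider-specific surface actually is, here is a hypothetical, offline EchoProvider (not part of the actual project) that fulfills the contract without calling any API. It can be handy for testing the ChatClient in isolation:

# Hypothetical offline provider: echoes the last user message instead of calling an LLM API.
class EchoProvider(BaseLLMProvider):

    def initialize_client(self):
        # Nothing to authenticate or connect to.
        self.client = None

    def get_model_response(self, chat_messages):
        # Return the content of the last user message, prefixed for clarity.
        last_user = [m for m in chat_messages.get_messages() if m["role"] == "user"][-1]
        return f"Echo: {last_user['content']}"

chat_client = ChatClient(provider=EchoProvider(model_name="echo-1"))
chat_client.prompt_model("Hello, who are you?")
# renders: Echo: Hello, who are you?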

Provider Implementations

OpenAI Implementation

The OpenAI provider is implemented using the OpenAI SDK. This provider class handles client initialization, constructs the necessary parameters for API calls, and manages the chat message structure.

Noteworthy points:

  • The system prompt is included directly in the chat messages. This behavior is specific to OpenAI and differs from, for example, Anthropic’s implementation.
  • Optional parameters like max_tokens and temperature are conditionally added to the API request, ensuring flexibility without hardcoding defaults.

from openai import OpenAI

class OpenAIProvider(BaseLLMProvider):

    def initialize_client(self):
        api_key = os.getenv("OPENAI_API_KEY")
        if not api_key:
            raise ValueError("OPENAI_API_KEY is not set in the environment variables.")
        
        # Note: the OpenAI SDK reads OPENAI_API_KEY from the environment itself, so we only check that it is set.
        self.client = OpenAI()

    def get_model_response(self, chat_messages):
        try:
            
            # Mandatory parameters
            params = {
                "model": self.model_name,
                "messages": chat_messages.get_messages()
            }

            # Optional parameters
            if self._max_tokens is not None:
                params["max_tokens"] = self._max_tokens
            if self._temperature is not None:
                params["temperature"] = self._temperature
                
            response = self.client.chat.completions.create(**params)

            try:
                content = response.choices[0].message.content
            except (AttributeError, IndexError) as e:
                raise RuntimeError(f"Malformed response structure: {str(e)}. Response: {response}")
            return content
        except Exception as e:
            raise RuntimeError(f"Failed to fetch model response. Params: {params}. Error: {str(e)}")
    
    def chat_messages_factory(self):
        """Returns the standard ChatMessages implementation for OpenAI."""
        chat_messages = ChatMessages()
        chat_messages.append_system_message(self._system_prompt)
        return chat_messages

Let’s test the OpenAI provider with a simple prompt:

model_name = "gpt-4o"
system_prompt = "Answer in a very concise and accurate way"
provider = OpenAIProvider(model_name=model_name, system_prompt=system_prompt)

chat_client = ChatClient(provider=provider)
chat_client.prompt_model("Hello, who are you?")

Hello, I’m an AI language model created by OpenAI, here to assist you with information and answer your questions.

Anthropic implementation

The Anthropic provider is implemented using the Anthropic SDK. This class manages client initialization, validates required parameters, and adjusts the message structure for Anthropic’s API.

Noteworthy points:

  • max_tokens is a required field for calling Anthropic’s Claude model. This is validated during provider initialization to avoid runtime errors.
  • Unlike the OpenAI implementation, the system prompt is passed directly to the client as a separate parameter, instead of embedding it in the chat messages.

from anthropic import Anthropic

class AnthropicProvider(BaseLLMProvider):

    def __init__(self, *args, **kwargs):
        max_tokens = kwargs.get("max_tokens")
        if max_tokens is None:
            raise ValueError("AnthropicProvider requires 'max_tokens' to be specified.")
        super().__init__(*args, **kwargs)  # Pass all arguments to the base constructor

    def initialize_system_prompt(self, system_prompt = None):
        super().initialize_system_prompt(system_prompt)
        if self._system_prompt is None:
            self._system_prompt = ""

    def initialize_client(self):
        api_key = os.getenv("ANTHROPIC_API_KEY")
        if not api_key:
            raise ValueError("ANTHROPIC_API_KEY is not set in the environment variables.")

        # Note: like the OpenAI SDK, the Anthropic client reads ANTHROPIC_API_KEY from the environment itself.
        self.client = Anthropic()

    def get_model_response(self, chat_messages):
        try:
            
            # Mandatory parameters
            params = {
                "model": self.model_name,
                "system": self._system_prompt,
                "messages": chat_messages.get_messages(),
                "max_tokens": self._max_tokens
            }

            # Optional parameters
            if self._temperature is not None:
                params["temperature"] = self._temperature

            response = self.client.messages.create(**params)

            try:
                content = ""
                for content_line in response.content:
                    if content_line.type == "text":
                        content = content_line.text  
                return content
            except (AttributeError, IndexError) as e:
                raise RuntimeError(f"Malformed response structure: {str(e)}. Response: {response}")
        except Exception as e:
            raise RuntimeError(f"Failed to fetch model response. Params: {params}. Error: {str(e)}")

Again, let’s run a simple test. Notice how the code remains compact and modular: Only the model name and provider instantiation change. The OmniChatClient abstracts away the different APIs and message structures through the injected provider.

model_name = "claude-3-5-sonnet-20241022"
provider = AnthropicProvider(model_name=model_name, system_prompt=system_prompt, max_tokens=128)

chat_client = ChatClient(provider=provider)
chat_client.prompt_model("Hello, who are you?")

I’m Claude, an AI assistant created by Anthropic. I aim to be direct and truthful in our conversations.

X.ai Implementation

This is where the effort we’ve invested in building a modular architecture starts to pay off. With the implementation for X.ai, we can reuse much of the code we’ve already written for OpenAI and Anthropic, saving time and effort.

To interact with Grok, X.ai allows you to use either the OpenAI or Anthropic SDKs. Since we’ve already implemented both, all we need to do is update the endpoint and authorization details.

Using the OpenAI SDK for X.ai

Here’s all the code needed to enable Grok support via the OpenAI SDK:

from openai import OpenAI

class XAIProviderOpenAI(OpenAIProvider):
    
    def initialize_client(self):
        api_key = os.getenv("XAI_API_KEY")
        if not api_key:
            raise ValueError("XAI_API_KEY is not set in the environment variables.")

        base_url = os.getenv("XAI_BASE_URL", "https://api.x.ai/v1")

        self.client = OpenAI(
            api_key=api_key,    
            base_url=base_url,
        )

Here is a quick test. The previous prompt returned a pretty shy answer, so I updated it to a more grok-like version.

model_name = "grok-beta"
provider = XAIProviderOpenAI(model_name=model_name, system_prompt=system_prompt)

chat_client = ChatClient(provider=provider)
chat_client.prompt_model("State your name and identity!")

I am Grok, an AI developed by xAI, here to provide helpful and truthful answers.

Using the Anthropic SDK for X.ai

Similarly, we can use the Anthropic SDK to interact with Grok. Here’s the implementation:

from anthropic import Anthropic

class XAIProviderAnthropic(AnthropicProvider):

    def initialize_client(self):
        api_key = os.getenv("XAI_API_KEY")
        if not api_key:
            raise ValueError("XAI_API_KEY is not set in the environment variables.")

        base_url = "https://api.x.ai" 

        self.client = Anthropic(
            api_key=api_key,    
            base_url=base_url,
        )
model_name = "grok-beta"
provider = XAIProviderAnthropic(model_name=model_name, system_prompt=system_prompt, max_tokens=128)

chat_client = ChatClient(provider=provider)
chat_client.prompt_model("State your name and identity!")

I am Grok, an AI developed by xAI, here to provide helpful and truthful answers.

Llama Implementation

To access Llama 3.2, I chose Groq as the provider. Groq offers a free tier and provides access to Llama 3.2 with vision capabilities, a practical workaround given that downloading Llama 3.2’s multimodal models in the EU is restricted under their licensing terms.

Here’s the implementation for the Llama 3 provider class based on Groq:

from groq import Groq

class GroqProviderLlama3(BaseLLMProvider):

    def initialize_client(self):
        api_key = os.getenv("GROQ_API_KEY")
        if not api_key:
            raise ValueError("GROQ_API_KEY is not set in the environment variables.")
        
        self.client = Groq(
            api_key=api_key,
        )

    def get_model_response(self, chat_messages):
        try:
            
            # Mandatory parameters
            params = {
                "model": self.model_name,
                "messages": chat_messages.get_messages()
            }

            # Optional parameters
            if self._max_tokens is not None:
                params["max_tokens"] = self._max_tokens
            if self._temperature is not None:
                params["temperature"] = self._temperature
                
            response = self.client.chat.completions.create(**params)

            try:
                content = response.choices[0].message.content
            except (AttributeError, IndexError) as e:
                raise RuntimeError(f"Malformed response structure: {str(e)}. Response: {response}")
            return content
        except Exception as e:
            raise RuntimeError(f"Failed to fetch model response. Params: {params}. Error: {str(e)}")
model_name = "llama-3.2-90b-vision-preview"
provider = GroqProviderLlama3(model_name=model_name, system_prompt=system_prompt)

chat_client = ChatClient(provider=provider)
chat_client.prompt_model("Hello, who are you?")

I’m an artificial intelligence model known as Llama. Llama stands for “Large Language Model Meta AI.”

What we have done so far

We have built a modular and extensible architecture for the OmniChatClient. At its core is the BaseLLMProvider, which implements the shared functionality defined by the LLMProviderInterface. Each provider-specific class extends the BaseLLMProvider to handle unique API requirements and interactions.

The inheritance hierarchy below illustrates how the different providers (OpenAI, Anthropic, Groq, and X.AI) are integrated:

  • OpenAIProvider, AnthropicProvider, and GroqProviderLlama3 extend BaseLLMProvider.
  • X.AI reuses the existing OpenAI and Anthropic implementations by inheriting from their respective providers.

flowchart BT
    
    BaseLLMProvider -->|Implements| LLMProviderInterface

    subgraph Inheritance["Inheritance Hierarchy"]
        direction BT
        style Inheritance fill:#FFFFE0
        OpenAIProvider -->|Extends| BaseLLMProvider
        AnthropicProvider -->|Extends| BaseLLMProvider
        GroqProviderLlama3 -->|Extends| BaseLLMProvider
        XAIProviderOpenAI -->|Inherits from| OpenAIProvider
        XAIProviderAnthropic -->|Inherits from| AnthropicProvider
    end

Adding Vision Capabilities

To add vision capabilities, we first update the append_user_message method in the ChatMessages class to accept images. Since each provider has a different format for handling images, the actual implementation is done in the provider-specific child classes.

To update the method, I use @patch from fastcore, a handy tool that lets you add functionality to a class only when it is needed in a notebook.

from fastcore.utils import *  # for importing patch

@patch
def append_user_message(self:ChatMessages, content=None, base64_images=None):
    """
    Appends a user message with specified content
    """
    if base64_images:
        raise NotImplementedError("Vision is not supported by this model.")
    
    if not content:
        raise ValueError("Content cannot be empty or None.")

    self._append_message("user", content)

Consequently, we also need to update the prompt_model method of the ChatClient to allow for vision-based prompts:

@patch
def prompt_model(self:ChatClient, prompt=None, base64_images=None):
    """
    Sends a message to the model, including an optional prompt and multiple base64 images.
    
    :param prompt: The text prompt for the model.
    :param base64_images: A list of base64-encoded image strings.
    :return: The model's response in Markdown format.
    """
    if base64_images:
        # Append user message with multiple images
        self._chat_messages.append_user_message(content=prompt, base64_images=base64_images)
    elif prompt:
        # Append user message with only text
        self._chat_messages.append_user_message(content=prompt)

    content = self._get_model_response()
    self._chat_messages.append_assistant_message(content)
    return Markdown(content)

Now we can implement the different formats in the vendor-specific chat message classes. Pay attention to the subtle differences in how each provider handles vision inputs.

class OpenAIChatMessages(ChatMessages):

    def append_user_message(self, content=None, base64_images=None):
        """
        Appends a user message with specified content and multiple images to messages list.
        As per https://platform.openai.com/docs/guides/vision
        """
        if base64_images:
            content_parts = [{"type": "text", "text": content}] if content else []
            for base64_image in base64_images:
                content_parts.append({
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"}
                })
            self._messages.append({"role": "user", "content": content_parts})
        elif content:
            self._append_message("user", content)
        else:
            raise ValueError("Content cannot be empty or None.")

class AnthropicChatMessages(ChatMessages):

    def append_user_message(self, content=None, base64_images=None):
        """
        Appends a user message with specified content and multiple images to messages list.
        As per https://docs.anthropic.com/en/api/messages
        """
        if base64_images:
            content_parts = [{"type": "text", "text": content}] if content else []
            for base64_image in base64_images:
                content_parts.append({
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": base64_image
                    }
                })
            self._messages.append({"role": "user", "content": content_parts})
        elif content:
            self._append_message("user", content)
        else:
            raise ValueError("Content cannot be empty or None.")
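
To make the format difference tangible, here is a quick sketch (using a truncated placeholder instead of a real base64 string) of the message content each class produces for the same input:

# Compare the message formats produced by the two classes (placeholder instead of real image data).
fake_image = "iVBORw0KGgo..."  # truncated placeholder, not a real image

openai_msgs = OpenAIChatMessages()
openai_msgs.append_user_message("What is in the image?", base64_images=[fake_image])
print(openai_msgs.get_messages()[0]["content"][1])
# {'type': 'image_url', 'image_url': {'url': 'data:image/png;base64,iVBORw0KGgo...'}}

anthropic_msgs = AnthropicChatMessages()
anthropic_msgs.append_user_message("What is in the image?", base64_images=[fake_image])
print(anthropic_msgs.get_messages()[0]["content"][1])
# {'type': 'image', 'source': {'type': 'base64', 'media_type': 'image/png', 'data': 'iVBORw0KGgo...'}}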

Finally, we need to update the chat_messages_factory method in all provider classes, so that they use the corresponding chat messages implementation.

@patch
def chat_messages_factory(self:OpenAIProvider):
    """Returns the standard ChatMessages implementation for OpenAI."""
    chat_messages = OpenAIChatMessages()
    chat_messages.append_system_message(self._system_prompt)
    return chat_messages

@patch
def chat_messages_factory(self:AnthropicProvider):
    """Returns the Anthropic-specific ChatMessages implementation."""
    return AnthropicChatMessages()

@patch
def chat_messages_factory(self:GroqProviderLlama3):
    """OpenAI Format also works for Llama."""
    return OpenAIChatMessages()

#@patch
#def chat_messages_factory(self:XAIProviderAnthropic):
#    """Returns the Base ChatMessages implementation."""
#    return ChatMessages()

#@patch
#def chat_messages_factory(self:XAIProviderOpenAI):
#    """Returns the Base ChatMessages implementation."""
#    return ChatMessages()

Testing Vision

Let’s test the vision capabilities of our implementation using the famous Trolley Problem.

Preparing the Image

First, we encode the image in base64 and have a look:

Code
import base64
from IPython.display import Image, display

def encode_image(image_path):
  """Encodes an image file in base64"""
  with open(image_path, "rb") as image_file:
    return base64.b64encode(image_file.read()).decode('utf-8')

def render_base64_image(base64_string):
    """Render a Base64-encoded image in a Jupyter Notebook."""
    display(Image(data=base64.b64decode(base64_string)))
image_path = "Trolley_Problem.png"
base64_image = encode_image(image_path)
render_base64_image(base64_image)

Running the Models

Now, let’s ask each model to describe the image:

model_name = "gpt-4o"
provider = OpenAIProvider(model_name=model_name)
chat_client = ChatClient(provider=provider)
chat_client.prompt_model("What is in the image?", base64_images=[base64_image])

The image depicts the classic “trolley problem,” a philosophical thought experiment. It shows a trolley on a track heading towards five people. There is a lever that can be pulled to switch the trolley onto another track, where it would hit one person instead. The scenario is used to discuss ethical decision-making and moral dilemmas.

model_name = "claude-3-5-sonnet-20241022"
provider = AnthropicProvider(model_name=model_name, system_prompt=system_prompt, max_tokens=128)
chat_client = ChatClient(provider=provider)
chat_client.prompt_model("What is in the image?", base64_images=[base64_image])

This is the classic “Trolley Problem” ethical dilemma illustrated in a simple diagram. It shows a trolley/tram on tracks with two possible paths: one track leads to five people, while the other track has one person. There’s also a figure shown near a lever that could switch the trolley’s direction.

model_name = "grok-vision-beta"
provider = XAIProviderOpenAI(model_name=model_name)
chat_client = ChatClient(provider=provider)
chat_client.prompt_model("What is in the image?", base64_images=[base64_image])

The image depicts a simplified diagram of a train or tram on tracks with a switch mechanism. There are two tracks: one leading straight ahead and another branching off to the right.

  • On the left side, there is a train or tram on the left track.
  • In the middle, there is a person standing near the switch mechanism.
  • On the right side, there are two groups of people:
    • One group is on the straight track ahead.
    • Another group is on the track that branches off to the right.

This diagram is often used to illustrate the trolley problem, a moral dilemma in ethics where one must decide whether to switch the train to another track to save a group of people at the cost of another individual.

model_name = "llama-3.2-90b-vision-preview"
provider = GroqProviderLlama3(model_name=model_name)
chat_client = ChatClient(provider=provider)
chat_client.prompt_model("What is in the image?", base64_images=[base64_image])

The image depicts a black-and-white graphic of a train and people. The train is on the left side of the image, with a diamond-shaped object above it. There are several lines extending from the train, each featuring a person icon. The lines vary in length, with some being longer than others. The background of the image is white.

Key Elements:

  • Train: A black train with a diamond-shaped object above it.
  • Lines: Several lines extending from the train, each featuring a person icon.
  • Person Icons: Small black figures representing people.
  • Background: White background.

Interpretation:

The image appears to be a simple graphic representation of a train and its passengers. The lines extending from the train may represent the paths or routes taken by the passengers. The person icons could symbolize the individuals traveling on the train. The diamond-shaped object above the train is unclear in its meaning but may represent a signal or a marker. Overall, the image conveys a sense of movement and transportation.

Observations

As you can see, it is very easy to benchmark different models against the same input. In this test, GPT-4o, Claude 3.5 Sonnet, and Grok correctly recognized the Trolley Problem. Llama 3.2, however, only gave a generic description of the scene, missing the high-level concept. This demonstrates how the OmniChatClient design lets you efficiently test and compare multiple models in a unified way.

Conclusion

Building the OmniChatClient has been an exciting project, creating a unified and extensible interface for working with multiple LLMs. By abstracting provider-specific functionality into modular components, we’ve reduced redundancy and created a scalable framework that can easily adapt to future requirements.

Dependency injection ensures that the OmniChatClient remains flexible and decoupled from provider-specific details, which are implemented in dedicated provider classes. Inheritance further reduces redundancy and accelerates the addition of new providers and features. As a result, any application using the OmniChatClient can seamlessly switch between models by simply swapping out the provider object; no additional code changes are required.

While this implementation is already functional and supports both text and vision capabilities, there’s still room to expand. Enabling tool usage and adding support for providers like Google’s Gemini series are in my backlog.

On a broader level, I hope this hands-on demonstration of dependency injection and inheritance, applied in a practical example, inspires you to experiment with these patterns to make your code more robust, flexible, and elegant.