This is a Jupyter notebook

Cookbook: LiteLLM (Proxy) + Langfuse OpenAI Integration + `@observe` Decorator

We want to share a stack that's commonly used by the Langfuse community to quickly experiment with 100+ models from different providers without changing code. This stack includes:

LiteLLM Proxy (GitHub), which standardizes 100+ model provider APIs on the OpenAI API schema. It removes the complexity of direct API calls by centralizing interactions with these APIs through a single endpoint. Because it is open source, you can also self-host the LiteLLM Proxy.
Langfuse OpenAI SDK Wrapper (Python, JS) to natively instrument calls to all these 100+ models via the OpenAI SDK. This automatically captures token counts, latencies, streaming response times (time to first token), API errors, and more.
Langfuse: OSS LLM observability (see the Langfuse docs for a full overview).

This cookbook is an end-to-end guide to set up and use this stack. As we'll use Python in this example, we will also use the @observe decorator to create nested traces. More on this below.

Let's dive right in!

Install dependencies

!pip install "litellm[proxy]" langfuse openai

Setup environment

import os
from langfuse.openai import openai
 
# Get keys for your project from the project settings page
# https://cloud.langfuse.com
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com" # 🇪🇺 EU region
# os.environ["LANGFUSE_HOST"] = "https://us.cloud.langfuse.com" # 🇺🇸 US region
 
# Your openai key
os.environ["OPENAI_API_KEY"] = ""
 
# Test connection to Langfuse, not recommended for production as it is blocking
openai.langfuse_auth_check()

Setup LiteLLM Proxy

In this example, we'll use gpt-3.5-turbo directly from OpenAI, and llama3 and mistral via Ollama on our local machine.

Steps

Create a litellm_config.yaml to configure which models are available (see the LiteLLM Proxy docs). We'll use gpt-3.5-turbo, and llama3 and mistral via Ollama in this example. Make sure to replace <openai_key> with your OpenAI API key.

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: <openai_key>
  - model_name: ollama/llama3
    litellm_params:
      model: ollama/llama3
  - model_name: ollama/mistral
    litellm_params:
      model: ollama/mistral

Ensure that you have installed Ollama and pulled the llama3 (8b) and mistral (7b) models: ollama pull llama3 && ollama pull mistral
Run the following CLI command to start the proxy: litellm --config litellm_config.yaml
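If you would rather not paste the key into the file by hand, the config from the first step can also be generated from Python. The following is a minimal, stdlib-only sketch, not part of the original notebook: `build_litellm_config` is a hypothetical helper name, and the `os.environ/OPENAI_API_KEY` value assumes LiteLLM's convention of resolving `os.environ/<VAR>` entries from the proxy's environment at start-up. Check the LiteLLM docs for your version before relying on it.

```python
from pathlib import Path


def build_litellm_config(models, api_key_refs=None):
    """Render a LiteLLM `model_list` config as YAML text.

    models: model identifiers, used as both `model_name` and `model`.
    api_key_refs: optional mapping of model -> value to write as `api_key`.
    """
    api_key_refs = api_key_refs or {}
    lines = ["model_list:"]
    for model in models:
        lines.append(f"  - model_name: {model}")
        lines.append("    litellm_params:")
        lines.append(f"      model: {model}")
        if model in api_key_refs:
            lines.append(f"      api_key: {api_key_refs[model]}")
    return "\n".join(lines) + "\n"


config_text = build_litellm_config(
    ["gpt-3.5-turbo", "ollama/llama3", "ollama/mistral"],
    # Assumption: LiteLLM reads `os.environ/<VAR>` from the proxy's environment.
    api_key_refs={"gpt-3.5-turbo": "os.environ/OPENAI_API_KEY"},
)

# Write the same structure as the hand-written config above.
Path("litellm_config.yaml").write_text(config_text)
print(config_text)
```

The generated file has the same three entries as the hand-written config, so the `litellm --config litellm_config.yaml` command works unchanged.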

The LiteLLM Proxy should now be running on http://0.0.0.0:4000

To verify the connection you can run litellm --test
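From inside the notebook, you can also check which models the proxy serves. The sketch below is an illustration under assumptions: `list_proxy_models` is a hypothetical helper, it relies on the proxy exposing the OpenAI-compatible `/v1/models` route, and it sends no `Authorization` header, so it only works as written when the proxy runs without a master key.

```python
import json
import urllib.request


def list_proxy_models(base_url, opener=urllib.request.urlopen):
    """Return the model ids served by an OpenAI-compatible endpoint.

    `opener` is injectable so the function can be exercised without a network.
    """
    url = base_url.rstrip("/") + "/v1/models"
    with opener(url, timeout=5) as response:
        payload = json.load(response)
    # OpenAI-style list responses keep the entries under "data".
    return [entry["id"] for entry in payload["data"]]


# Usage (requires the proxy to be running):
# print(list_proxy_models("http://0.0.0.0:4000"))
```

If the proxy is up and the config was loaded, the returned list should contain the `model_name` values from litellm_config.yaml.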

Log single LLM Call via Langfuse OpenAI Wrapper

The Langfuse SDK offers a wrapper function around the OpenAI SDK, automatically logging all OpenAI calls as generations to Langfuse.

For more details, please refer to our documentation.

from langfuse.openai import openai
 
# Set PROXY_URL to the URL of your LiteLLM Proxy (by default: http://0.0.0.0:4000)
PROXY_URL = "http://0.0.0.0:4000"
 
system_prompt = "You are a very accurate calculator. You output only the result of the calculation."
 
# Configure the OpenAI client to use the LiteLLM proxy
client = openai.OpenAI(base_url=PROXY_URL)
 
gpt_completion = client.chat.completions.create(
  model="gpt-3.5-turbo",
  name="gpt-3.5", # optional name of the generation in langfuse
  messages=[
      {"role": "system", "content": system_prompt},
      {"role": "user", "content": "1 + 1 = "}],
)
print(gpt_completion.choices[0].message.content)
 
llama_completion = client.chat.completions.create(
  model="ollama/llama3",
  name="llama3", # optional name of the generation in langfuse
  messages=[
      {"role": "system", "content": system_prompt},
      {"role": "user", "content": "3 + 3 = "}],
)
print(llama_completion.choices[0].message.content)
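Because every model sits behind the same endpoint, switching models only means changing the `model` string. The following sketch sends one prompt to several models in a loop. `compare_models` is a hypothetical helper, not part of the Langfuse or LiteLLM APIs, and it assumes the client is the Langfuse-wrapped OpenAI client from above, since the `name` argument is only accepted by that wrapper.

```python
def compare_models(client, models, system_prompt, user_prompt):
    """Send the same prompt to each model and collect the replies by model name."""
    replies = {}
    for model in models:
        completion = client.chat.completions.create(
            model=model,
            name=f"compare-{model}",  # generation name in Langfuse (wrapper-only kwarg)
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
        )
        replies[model] = completion.choices[0].message.content
    return replies


# Usage with the client defined above (requires the proxy to be running):
# print(compare_models(client, ["gpt-3.5-turbo", "ollama/llama3"], system_prompt, "2 + 2 = "))
```

Each call should show up in Langfuse as its own generation, named after the model that produced it.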


Trace nested LLM Calls via Langfuse OpenAI Wrapper and `@observe` decorator

With the Langfuse @observe() decorator, we can automatically capture execution details of any Python function, such as inputs, outputs, and timings. The decorator gives you in-depth observability with minimal code, especially when non-LLM calls are involved, for example knowledge retrieval (RAG) or API calls (agents).

For more details on how to use this decorator and customize your tracing, refer to our documentation.

Let's have a look at a simple example which uses all three models we have set up in the LiteLLM Proxy:

from langfuse.decorators import observe
from langfuse.openai import openai
 
@observe()
def rap_battle(topic: str):
    client = openai.OpenAI(
        base_url=PROXY_URL,
    )
 
    # Note the f-string: without the `f` prefix, {topic} is sent literally
    messages = [
        {"role": "system", "content": "You are a rap artist. Drop a fresh line."},
        {"role": "user", "content": f"Kick it off, today's topic is {topic}, here's the mic..."}
    ]
 
    # First model (gpt-3.5-turbo) starts the rap
    gpt_completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        name="rap-gpt-3.5-turbo", # add custom name to Langfuse observation
        messages=messages,
    )
    first_rap = gpt_completion.choices[0].message.content
    messages.append({"role": "assistant", "content": first_rap})
    print("Rap 1:", first_rap)
 
    # Second model (ollama/llama3) responds
    llama_completion = client.chat.completions.create(
        model="ollama/llama3",
        name="rap-llama3",
        messages=messages,
    )
    second_rap = llama_completion.choices[0].message.content
    messages.append({"role": "assistant", "content": second_rap})
    print("Rap 2:", second_rap)
 
    # Third model (ollama/mistral) adds the final touch
    mistral_completion = client.chat.completions.create(
        model="ollama/mistral",
        name="rap-mistral",
        messages=messages,
    )
    third_rap = mistral_completion.choices[0].message.content
    messages.append({"role": "assistant", "content": third_rap})
    print("Rap 3:", third_rap)
    
    return messages
 
# Call the function
rap_battle("typography")


Learn more

Check out the docs to learn more about all components of this stack: LiteLLM Proxy, the Langfuse OpenAI integration, and the Langfuse @observe() decorator.

If you do not want to capture traces via the OpenAI SDK Wrapper, you can also log requests directly from the LiteLLM Proxy to Langfuse. For more details, refer to the LiteLLM docs.
