Multi-Modality
Langfuse supports rendering Multi-Modal traces that include both text and images. We follow OpenAI's format convention.
How to trace Multi-Modal content in Langfuse?
To use our Multi-Modal trace support, your trace or observation `input`/`output` should include a list of messages comprising the conversation so far. Each message should contain a `role` (`system`, `user`, or `assistant`) and `content`. To display multi-modal content, you can pass a combination of text and image URLs. We plan to extend support to base64 images, file attachments (e.g., PDFs), and audio soon. The `content` property of the messages follows the OpenAI convention.
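For example, a conversation in this format can be attached directly to a trace. The following is a minimal sketch using the Langfuse Python SDK's `langfuse.trace()` method; it assumes credentials are configured via environment variables, and the trace name and assistant output shown are purely illustrative.

```python
from langfuse import Langfuse

# Assumes LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY (and optionally LANGFUSE_HOST)
# are set as environment variables.
langfuse = Langfuse()

# The conversation so far, following the OpenAI message format.
messages = [
    {
        "role": "system",
        "content": "You are an AI trained to describe and interpret images.",
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's happening in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
        ],
    },
]

# Log the conversation as the trace input and the model reply as the output.
trace = langfuse.trace(
    name="describe-image",  # illustrative trace name
    input=messages,
    output={"role": "assistant", "content": "A dog catching a frisbee in a park."},
)

langfuse.flush()  # ensure events are sent before the process exits
```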
Visual Representation in Langfuse
Content Format
| Content | Type | Description |
| --- | --- | --- |
| Default: Text content | string | The text contents of the message. |
| Multi-Modal: Array of content parts | array | An array of content parts, each with a defined type of either `text` or `image_url`. You can pass multiple images by adding multiple `image_url` content parts. |
Content Examples
```json
{
  "content": [
    {
      "role": "system",
      "content": "You are an AI trained to describe and interpret images. Describe the main objects and actions in the image."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's happening in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg"
          }
        }
      ]
    }
  ]
}
```
Content Part Types
| Property | Type | Description |
| --- | --- | --- |
| type | string | Type of the content part, either `text` or `image_url`. |
| text | string | Text content of the message (for `text` content parts). |
| image_url | object | Object with a `url` pointing to the image (for `image_url` content parts). |
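Putting the pieces together, here is a short sketch using Langfuse's OpenAI drop-in integration (`from langfuse.openai import openai`), which captures the multi-modal `messages` as the generation input automatically. The model name is illustrative; any vision-capable model can be used.

```python
# Drop-in replacement for the OpenAI SDK; calls made through it are traced in Langfuse.
from langfuse.openai import openai

response = openai.chat.completions.create(
    model="gpt-4o",  # illustrative; use any vision-capable model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's happening in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```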
For more details and examples, refer to our OpenAI cookbook.