Multi-Modality
Langfuse supports rendering Multi-Modal traces that include both text and images. We follow OpenAI's format convention.
How to trace Multi-Modal content in Langfuse?
To use our Multi-Modal trace support, your trace or observation `input`/`output` should include a list of messages comprising the conversation so far. Each message should contain a `role` (`system`, `user`, or `assistant`) and `content`. To display multi-modal content, you can pass a combination of text and image URLs. We plan to extend support to base64 images, file attachments (e.g., PDFs), and audio soon. The `content` property of the messages follows the OpenAI convention.
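For example, a conversation in this format can be attached directly to a trace. The following is a minimal sketch using the Langfuse Python SDK's `langfuse.trace()` method; it assumes credentials are configured via environment variables, and the trace name and assistant output shown are purely illustrative.

```python
from langfuse import Langfuse

# Assumes LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY (and optionally LANGFUSE_HOST)
# are set as environment variables.
langfuse = Langfuse()

# The conversation so far, following the OpenAI message format.
messages = [
    {
        "role": "system",
        "content": "You are an AI trained to describe and interpret images.",
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's happening in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
        ],
    },
]

# Log the conversation as the trace input and the model reply as the output.
trace = langfuse.trace(
    name="describe-image",  # illustrative trace name
    input=messages,
    output={"role": "assistant", "content": "A dog catching a frisbee in a park."},
)

langfuse.flush()  # ensure events are sent before the process exits
```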
Visual Representation in Langfuse
Content Format
| Content | Type | Description |
| --- | --- | --- |
| Default: Text content | string | The text contents of the message. |
| Multi-Modal: Array of content parts | array | An array of content parts, each with a defined type of either `text` or `image_url`. You can pass multiple images by adding multiple `image_url` content parts. |
Content Examples
```json
{
  "content": [
    {
      "role": "system",
      "content": "You are an AI trained to describe and interpret images. Describe the main objects and actions in the image."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's happening in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg"
          }
        }
      ]
    }
  ]
}
```
Content Part Types
| Property | Type | Description |
| --- | --- | --- |
| type | string | Type of the content part, either `text` or `image_url`. |
| text | string | Text content of the message (for `text` content parts). |
| image_url | object | Object with a `url` pointing to the image (for `image_url` content parts). |
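Putting the pieces together, here is a short sketch using Langfuse's OpenAI drop-in integration (`from langfuse.openai import openai`), which captures the multi-modal `messages` as the generation input automatically. The model name is illustrative; any vision-capable model can be used.

```python
# Drop-in replacement for the OpenAI SDK; calls made through it are traced in Langfuse.
from langfuse.openai import openai

response = openai.chat.completions.create(
    model="gpt-4o",  # illustrative; use any vision-capable model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's happening in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```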
For more details and examples, refer to our OpenAI cookbook.