Qwen-Omni supports multiple input modalities, including video, audio, image, and text. It can output audio and text.
Overview and billing
Compared with Qwen-VL, Qwen-Omni can:
Understand the visual and audio information in video files.
Understand audio input.
Output audio.
Qwen-Omni also delivers strong visual and audio understanding.
Open source
| Name | Context window (tokens) | Maximum input (tokens) | Maximum output (tokens) | Free quota |
| --- | --- | --- | --- | --- |
| qwen2.5-omni-7b | 32,768 | 30,720 | 2,048 | 1 million tokens (regardless of modality), valid for 180 days after activation |
The open-source Qwen-Omni models cannot be used after the free quota is exhausted. Please stay tuned for updates.
Usage notes
Input
Input modalities
The following input combinations are supported: text only, image + text, audio + text, and video + text. Do not put multiple non-text modalities in a single user message.
Input method
Image, audio, and video files can be provided as Base64-encoded data or as public URLs. The following sample codes use public URLs. To use Base64-encoded local files, see Input Base64 encoded local file.
Output
Currently, Qwen-Omni only supports streaming output.
Output modalities
The output can include text and audio. Use the modalities parameter to control the output types.

| Output modality | modalities parameter | Response style |
| --- | --- | --- |
| Text | ["text"] (default) | Relatively formal. |
| Text + audio | ["text","audio"] | Casual, and guides the user to communicate further. When the output modalities include audio, do not set a system message. |
The output audio is Base64 encoded and requires decoding, see Parse Base64 encoded audio output.
Audio languages
Currently, output audio only supports Mandarin Chinese and English.
Audio voices
The audio parameter controls the voice and the file format (only "wav" is supported). Example: audio={"voice": "Chelsie", "format": "wav"}.
Valid values for voice: ["Ethan", "Chelsie"].
| Voice name | Voice effect |
| --- | --- |
| Ethan | |
| Chelsie | |
Get started
Prerequisites
Qwen-Omni supports only the OpenAI-compatible method. You must first obtain an API key and set it as an environment variable. If you use the OpenAI SDK, you must also install the SDK. We recommend installing the latest version by following this topic; otherwise, your requests may fail.
You must have OpenAI Python SDK version 1.52.0 or later, or OpenAI Node.js SDK version 4.68.0 or later.
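You can verify your environment before making the first call. A minimal sketch (DASHSCOPE_API_KEY is the environment variable used throughout this topic; the version requirement follows the note above):

import os
import openai

# Confirm the API key is visible to this process
assert os.getenv("DASHSCOPE_API_KEY"), "DASHSCOPE_API_KEY is not set"
# Confirm the Python SDK meets the minimum version (1.52.0 or later)
print(openai.__version__)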
Text input
Qwen-Omni can accept plain text as input. Currently, only streaming output is supported.
OpenAI compatible
Python

import os
from openai import OpenAI

client = OpenAI(
    # If environment variables are not configured, replace the line below with: api_key="sk-xxx" using your Model Studio API Key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen2.5-omni-7b",
    messages=[{"role": "user", "content": "Who are you"}],
    # Set output data modalities, currently supports two types: ["text","audio"], ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Chelsie", "format": "wav"},
    # stream must be set to True, otherwise an error will occur
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // If environment variables are not configured, replace the line below with: apiKey: "sk-xxx" using your Model Studio API Key
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen2.5-omni-7b",
    messages: [
        { role: "user", content: "Who are you?" }
    ],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Chelsie", format: "wav" }
});
for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

curl

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen2.5-omni-7b",
    "messages": [
        {
            "role": "user",
            "content": "Who are you?"
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "modalities": ["text", "audio"],
    "audio": {"voice": "Chelsie", "format": "wav"}
}'
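Each streamed chunk carries an incremental delta. If you want the complete text reply rather than raw deltas, you can accumulate them as they arrive. A minimal sketch, assuming a request like the Python example above but with modalities=["text"]:

full_text = ""
for chunk in completion:
    if chunk.choices and chunk.choices[0].delta.content:
        # Each delta carries the next fragment of the reply
        full_text += chunk.choices[0].delta.content
print(full_text)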
Image+Text input
Qwen-Omni can accept multiple images in a single request. The requirements for input images are as follows (a pre-flight check is sketched at the end of this section):
The size of a single image file must not exceed 10 MB.
The number of images is limited by the model's maximum input: the total token count of all images must be less than the model's maximum input.
The width and height of an image must both be greater than 10 pixels, and the aspect ratio must not exceed 200:1 or 1:200.
Learn about supported image formats.
Currently, only streaming output is supported.
The following sample codes use public image URLs. To use local files, see Input Base64 encoded local file.
OpenAI compatible
Python

import os
from openai import OpenAI

client = OpenAI(
    # If environment variables are not configured, replace the line below with: api_key="sk-xxx" using your Model Studio API Key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen2.5-omni-7b",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                    },
                },
                {"type": "text", "text": "What scene is depicted in this image?"},
            ],
        },
    ],
    # Set output data modalities, currently supports two types: ["text","audio"], ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Chelsie", "format": "wav"},
    # stream must be set to True, otherwise an error will occur
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // If environment variables are not configured, replace the line below with: apiKey: "sk-xxx" using your Model Studio API Key
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen2.5-omni-7b",
    messages: [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": { "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg" },
                },
                { "type": "text", "text": "What scene is depicted in this image?" }
            ]
        }
    ],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Chelsie", format: "wav" }
});
for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

curl

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen2.5-omni-7b",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                    }
                },
                {
                    "type": "text",
                    "text": "What scene is depicted in this image?"
                }
            ]
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "modalities": ["text", "audio"],
    "audio": {"voice": "Chelsie", "format": "wav"}
}'
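A minimal pre-flight check for the image limits listed above. This is a sketch assuming Pillow (pip install pillow); the file name is illustrative:

import os
from PIL import Image

def check_image(path):
    # Size limit: a single image file must not exceed 10 MB
    assert os.path.getsize(path) <= 10 * 1024 * 1024, "image exceeds 10 MB"
    with Image.open(path) as img:
        w, h = img.size
        # Width and height must both be greater than 10 pixels
        assert w > 10 and h > 10, "image dimensions too small"
        # Aspect ratio must not exceed 200:1 or 1:200
        assert max(w / h, h / w) <= 200, "aspect ratio out of range"

check_image("dog_and_girl.jpeg")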
Audio+Text input
Qwen-Omni can accept only one audio file at a time, with a size limit of 10 MB and a duration limit of 3 minutes. Currently, only streaming output is supported. A pre-flight check for these limits is sketched at the end of this section.
The following sample codes use public audio URLs. To use local files, see Input Base64 encoded local file.
OpenAI compatible
Python

import os
from openai import OpenAI

client = OpenAI(
    # If environment variables are not configured, replace the line below with: api_key="sk-xxx" using your Model Studio API Key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen2.5-omni-7b",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
                        "format": "wav",
                    },
                },
                {"type": "text", "text": "What is this audio saying"},
            ],
        },
    ],
    # Set output data modalities, currently supports two types: ["text","audio"], ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Chelsie", "format": "wav"},
    # stream must be set to True, otherwise an error will occur
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // If environment variables are not configured, replace the line below with: apiKey: "sk-xxx" using your Model Studio API Key
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen2.5-omni-7b",
    messages: [
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": { "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav", "format": "wav" },
                },
                { "type": "text", "text": "What is this audio saying" }
            ]
        }
    ],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Chelsie", format: "wav" }
});
for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

curl

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen2.5-omni-7b",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
                        "format": "wav"
                    }
                },
                {
                    "type": "text",
                    "text": "What is this audio saying"
                }
            ]
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "modalities": ["text", "audio"],
    "audio": {"voice": "Chelsie", "format": "wav"}
}'
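A pre-flight check for the audio limits. This is a sketch assuming the soundfile package, which is also used later in this topic; the file name is illustrative:

import os
import soundfile as sf

def check_audio(path):
    # Size limit: 10 MB
    assert os.path.getsize(path) <= 10 * 1024 * 1024, "audio exceeds 10 MB"
    # Duration limit: 3 minutes
    assert sf.info(path).duration <= 180, "audio longer than 3 minutes"

check_audio("cherry.wav")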
Video+Text input
Qwen-Omni can accept video as an image sequence or a video file (it can understand audio in the video). Currently, only streaming output is supported.
Image sequence
At least 4 and at most 80 images. A frame-extraction sketch follows the code samples below.
Video file
Only one video file, with a size limit of 150 MB and a duration limit of 40 seconds. A pre-flight check is sketched at the end of this section.
Visual and audio information in video files are billed separately.
The following sample codes use public video URLs. To use local files, see Input Base64 encoded local file.
Image sequence
OpenAI compatible
Python

import os
from openai import OpenAI

client = OpenAI(
    # If environment variables are not configured, replace the line below with: api_key="sk-xxx" using your Model Studio API Key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen2.5-omni-7b",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": [
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg",
                    ],
                },
                {"type": "text", "text": "Describe the specific process in this video"},
            ],
        }
    ],
    # Set output data modalities, currently supports two types: ["text","audio"], ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Chelsie", "format": "wav"},
    # stream must be set to True, otherwise an error will occur
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // If environment variables are not configured, replace the line below with: apiKey: "sk-xxx" using your Model Studio API Key
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen2.5-omni-7b",
    messages: [{
        role: "user",
        content: [
            {
                type: "video",
                video: [
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
                ]
            },
            {
                type: "text",
                text: "Describe the specific process in this video"
            }
        ]
    }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Chelsie", format: "wav" }
});
for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

curl

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen2.5-omni-7b",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": [
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
                    ]
                },
                {
                    "type": "text",
                    "text": "Describe the specific process in this video"
                }
            ]
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "modalities": ["text", "audio"],
    "audio": {
        "voice": "Chelsie",
        "format": "wav"
    }
}'
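If you have a raw video but want to send it as an image sequence, you must extract between 4 and 80 frames yourself. A minimal sketch, assuming OpenCV (pip install opencv-python); the file names are illustrative:

import cv2

def extract_frames(video_path, num_frames=8):
    # Sample num_frames evenly spaced frames and save them as JPEG files
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    paths = []
    for i in range(num_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * total // num_frames)
        ok, frame = cap.read()
        if not ok:
            break
        path = f"frame_{i}.jpg"
        cv2.imwrite(path, frame)
        paths.append(path)
    cap.release()
    return paths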
Video file (Qwen-Omni can understand audio in the video)
OpenAI compatible
Python

import os
from openai import OpenAI

client = OpenAI(
    # If environment variables are not configured, replace the line below with: api_key="sk-xxx" using your Model Studio API Key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen2.5-omni-7b",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
                    },
                },
                {"type": "text", "text": "What is the content of the video?"},
            ],
        },
    ],
    # Set output data modalities, currently supports two types: ["text","audio"], ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Chelsie", "format": "wav"},
    # stream must be set to True, otherwise an error will occur
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // If environment variables are not configured, replace the line below with: apiKey: "sk-xxx" using your Model Studio API Key
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen2.5-omni-7b",
    messages: [
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": { "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4" },
                },
                { "type": "text", "text": "What is the content of the video?" }
            ]
        }
    ],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Chelsie", format: "wav" }
});
for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

curl

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen2.5-omni-7b",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
                    }
                },
                {
                    "type": "text",
                    "text": "What is the content of the video"
                }
            ]
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "modalities": ["text", "audio"],
    "audio": {"voice": "Chelsie", "format": "wav"}
}'
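A pre-flight check for the video-file limits. This is a sketch assuming OpenCV, as above; the file name is illustrative:

import os
import cv2

def check_video(path):
    # Size limit: 150 MB
    assert os.path.getsize(path) <= 150 * 1024 * 1024, "video exceeds 150 MB"
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)
    cap.release()
    # Duration limit: 40 seconds
    assert fps > 0 and frames / fps <= 40, "video longer than 40 seconds"

check_video("1.mp4")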
Multi-round conversation
When using the multi-round conversation feature of Qwen-Omni, take note of the following:
Assistant message
Assistant messages added to the messages array can only contain text.
User message
A user message can only contain text and one type of non-text data. In a multi-round conversation, you can put different types of data in different user messages. A sketch of carrying the streamed reply into the next round follows the code samples below.
OpenAI compatible
Python

import os
from openai import OpenAI

client = OpenAI(
    # If environment variable is not configured, replace the following line with: api_key="sk-xxx", using Model Studio API Key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen2.5-omni-7b",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3",
                        "format": "mp3",
                    },
                },
                {"type": "text", "text": "What is this audio saying"},
            ],
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": "This audio is saying: Welcome to Alibaba Cloud"}],
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": "Can you introduce this company?"}],
        },
    ],
    # Set the modality of output data, currently supporting two types: ["text","audio"], ["text"]
    modalities=["text"],
    # stream must be set to True, otherwise an error will occur
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // If environment variables are not configured, replace the following line with: apiKey: "sk-xxx", using Model Studio API Key
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen2.5-omni-7b",
    messages: [
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3",
                        "format": "mp3",
                    },
                },
                { "type": "text", "text": "What is this audio saying" },
            ],
        },
        {
            "role": "assistant",
            "content": [{ "type": "text", "text": "This audio is saying: Welcome to Alibaba Cloud" }],
        },
        {
            "role": "user",
            "content": [{ "type": "text", "text": "Can you introduce this company?" }]
        }
    ],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text"]
});
for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

curl

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen2.5-omni-7b",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    }
                },
                {
                    "type": "text",
                    "text": "What is this audio saying"
                }
            ]
        },
        {
            "role": "assistant",
            "content": [
                {
                    "type": "text",
                    "text": "This audio is saying: Welcome to Alibaba Cloud"
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Can you introduce this company?"
                }
            ]
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "modalities": ["text"]
}'
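As noted above, assistant messages in the history can only contain text. A minimal sketch of carrying the streamed reply into the next round; it assumes text-only output and reuses the client from the Python example above:

messages = [{"role": "user", "content": [{"type": "text", "text": "Can you introduce this company?"}]}]
completion = client.chat.completions.create(
    model="qwen2.5-omni-7b",
    messages=messages,
    modalities=["text"],
    stream=True,
)
reply = ""
for chunk in completion:
    if chunk.choices and chunk.choices[0].delta.content:
        reply += chunk.choices[0].delta.content
# Append the assistant reply as text only, then add the next user turn
messages.append({"role": "assistant", "content": [{"type": "text", "text": reply}]})
messages.append({"role": "user", "content": [{"type": "text", "text": "What products does it offer?"}]})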
Parse Base64 encoded audio output
Qwen-Omni streams its audio output as Base64-encoded data. You can maintain a string variable during generation, append each chunk's Base64 data to it, and decode the full string after generation completes. Alternatively, you can decode and play each chunk's Base64 data in real time.
Python

# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio

import os
import base64
import numpy as np
import soundfile as sf
from openai import OpenAI

client = OpenAI(
    # If environment variables are not configured, replace the following line with: api_key="sk-xxx" using Model Studio API Key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen2.5-omni-7b",
    messages=[{"role": "user", "content": "Who are you"}],
    # Set the output data modality, currently supports two types: ["text","audio"], ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Chelsie", "format": "wav"},
    # stream must be set to True, otherwise an error will occur
    stream=True,
    stream_options={"include_usage": True},
)

# Method 1: Decode after generation is complete
audio_string = ""
for chunk in completion:
    if chunk.choices:
        if hasattr(chunk.choices[0].delta, "audio"):
            try:
                audio_string += chunk.choices[0].delta.audio["data"]
            except Exception:
                # Chunks without audio data carry the text transcript instead
                print(chunk.choices[0].delta.audio["transcript"])
    else:
        print(chunk.usage)
wav_bytes = base64.b64decode(audio_string)
audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
sf.write("audio_assistant_py.wav", audio_np, samplerate=24000)

# Method 2: Decode while generating (comment out the Method 1 code when using Method 2)
# # Initialize PyAudio
# import pyaudio
# import time
# p = pyaudio.PyAudio()
# # Create the audio stream
# stream = p.open(format=pyaudio.paInt16,
#                 channels=1,
#                 rate=24000,
#                 output=True)
# for chunk in completion:
#     if chunk.choices:
#         if hasattr(chunk.choices[0].delta, "audio"):
#             try:
#                 audio_string = chunk.choices[0].delta.audio["data"]
#                 wav_bytes = base64.b64decode(audio_string)
#                 audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
#                 # Play the audio data directly
#                 stream.write(audio_np.tobytes())
#             except Exception:
#                 print(chunk.choices[0].delta.audio["transcript"])
# time.sleep(0.8)
# # Clean up resources
# stream.stop_stream()
# stream.close()
# p.terminate()

Node.js

// Preparations before running:
// Common to Windows/Mac/Linux:
// 1. Make sure Node.js is installed (version >= 14 recommended)
// 2. Run the following command to install the necessary dependencies:
//    npm install openai wav
//
// If you want to use real-time playback (Method 2), you also need:
// Windows:
//    npm install speaker
// Mac:
//    brew install portaudio
//    npm install speaker
// Linux (Ubuntu/Debian):
//    sudo apt-get install libasound2-dev
//    npm install speaker

import OpenAI from "openai";
// Method 1 dependencies (npm install wav); node:fs is a built-in Node.js module
import { createWriteStream } from 'node:fs';
import { Writer } from 'wav';

const openai = new OpenAI(
    {
        // If environment variables are not configured, replace the following line with: apiKey: "sk-xxx" using Model Studio API Key
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen2.5-omni-7b",
    messages: [
        {
            "role": "user",
            "content": "Who are you?"
        }
    ],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Chelsie", format: "wav" }
});

// Method 1: Decode after generation is complete
async function convertAudio(audioString, audioPath) {
    try {
        // Decode the Base64 string into a Buffer
        const wavBuffer = Buffer.from(audioString, 'base64');
        // Create a WAV file write stream
        const writer = new Writer({
            sampleRate: 24000, // Sample rate
            channels: 1,       // Mono
            bitDepth: 16       // 16-bit depth
        });
        // Create the output file stream and pipe the writer into it
        const outputStream = createWriteStream(audioPath);
        writer.pipe(outputStream);
        // Write the PCM data and end writing
        writer.write(wavBuffer);
        writer.end();
        // Wait for the file write to complete
        await new Promise((resolve, reject) => {
            outputStream.on('finish', resolve);
            outputStream.on('error', reject);
        });
        // Extra wait to ensure the audio file is complete
        await new Promise(resolve => setTimeout(resolve, 800));
        console.log(`Audio file has been successfully saved as ${audioPath}`);
    } catch (error) {
        console.error('Error occurred during processing:', error);
    }
}

let audioString = "";
for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        if (chunk.choices[0].delta.audio) {
            if (chunk.choices[0].delta.audio["data"]) {
                audioString += chunk.choices[0].delta.audio["data"];
            }
        }
    } else {
        console.log(chunk.usage);
    }
}
// Run the conversion
convertAudio(audioString, "audio_assistant_mjs.wav");

// Method 2: Play in real time while generating
// Install the necessary components per the system instructions above
// import Speaker from 'speaker'; // Audio playback library
// // Create a speaker instance (configuration matches the WAV file parameters)
// const speaker = new Speaker({
//     sampleRate: 24000, // Sample rate
//     channels: 1,       // Number of channels
//     bitDepth: 16,      // Bit depth
//     signed: true       // Signed PCM
// });
// for await (const chunk of completion) {
//     if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
//         if (chunk.choices[0].delta.audio) {
//             if (chunk.choices[0].delta.audio["data"]) {
//                 const pcmBuffer = Buffer.from(chunk.choices[0].delta.audio.data, 'base64');
//                 // Write directly to the speaker for playback
//                 speaker.write(pcmBuffer);
//             }
//         }
//     } else {
//         console.log(chunk.usage);
//     }
// }
// speaker.on('finish', () => console.log('Playback complete'));
// speaker.end(); // Call based on the actual stream end condition
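The spoken reply also comes with a text transcript, delivered through the same audio field. A minimal Python sketch (same request as the Python example above; it assumes delta.audio behaves as the dict shown in that example) that collects both:

audio_string = ""
transcript = ""
for chunk in completion:
    if chunk.choices and hasattr(chunk.choices[0].delta, "audio"):
        delta_audio = chunk.choices[0].delta.audio
        # A chunk carries either Base64 audio data or a transcript fragment
        audio_string += delta_audio.get("data", "")
        transcript += delta_audio.get("transcript", "")
print(transcript)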
Input Base64 encoded local file
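All examples below follow the same pattern: read the local file, Base64-encode it, and embed it in a data URL. A generic helper (hypothetical, for illustration; the examples below inline the same logic):

import base64

def to_data_url(path, mime=""):
    # Returns a string like data:image/png;base64,... (mime may be empty, as in the audio and video examples)
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"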
Image
Using eagle.png saved locally as an example.
Python

import os
import base64
from openai import OpenAI

client = OpenAI(
    # If environment variables are not configured, replace the following line with: api_key="sk-xxx" using Model Studio API Key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# Base64 encoding function
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image("eagle.png")
completion = client.chat.completions.create(
    model="qwen2.5-omni-7b",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
                {"type": "text", "text": "What scene is depicted in this image?"},
            ],
        },
    ],
    # Set the output data modality, currently supports two types: ["text","audio"], ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Chelsie", "format": "wav"},
    # stream must be set to True, otherwise an error will occur
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // If environment variables are not configured, replace the following line with: apiKey: "sk-xxx" using Model Studio API Key
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const encodeImage = (imagePath) => {
    const imageFile = readFileSync(imagePath);
    return imageFile.toString('base64');
};
const base64Image = encodeImage("eagle.png");
const completion = await openai.chat.completions.create({
    model: "qwen2.5-omni-7b",
    messages: [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": { "url": `data:image/png;base64,${base64Image}` },
                },
                { "type": "text", "text": "What scene is depicted in this image?" }
            ]
        }
    ],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Chelsie", format: "wav" }
});
for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
Audio
Using welcome.mp3 saved locally as an example.
Python

import os
import base64
from openai import OpenAI

client = OpenAI(
    # If environment variables are not configured, replace the following line with: api_key="sk-xxx" using Model Studio API Key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# Base64 encoding function
def encode_audio(audio_path):
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode("utf-8")

base64_audio = encode_audio("welcome.mp3")
completion = client.chat.completions.create(
    model="qwen2.5-omni-7b",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": f"data:;base64,{base64_audio}",
                        "format": "mp3",
                    },
                },
                {"type": "text", "text": "What is this audio saying"},
            ],
        },
    ],
    # Set the output data modality, currently supports two types: ["text","audio"], ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Chelsie", "format": "wav"},
    # stream must be set to True, otherwise an error will occur
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // If environment variables are not configured, replace the following line with: apiKey: "sk-xxx" using Model Studio API Key
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const encodeAudio = (audioPath) => {
    const audioFile = readFileSync(audioPath);
    return audioFile.toString('base64');
};
const base64Audio = encodeAudio("welcome.mp3");
const completion = await openai.chat.completions.create({
    model: "qwen2.5-omni-7b",
    messages: [
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": { "data": `data:;base64,${base64Audio}`, "format": "mp3" },
                },
                { "type": "text", "text": "What is this audio saying" }
            ]
        }
    ],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Chelsie", format: "wav" }
});
for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
Video
Video file
Using spring_mountain.mp4 saved locally as an example.
Python

import os
import base64
from openai import OpenAI

client = OpenAI(
    # If environment variables are not configured, replace the following line with: api_key="sk-xxx" using Model Studio API Key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# Base64 encoding function
def encode_video(video_path):
    with open(video_path, "rb") as video_file:
        return base64.b64encode(video_file.read()).decode("utf-8")

base64_video = encode_video("spring_mountain.mp4")
completion = client.chat.completions.create(
    model="qwen2.5-omni-7b",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {"url": f"data:;base64,{base64_video}"},
                },
                {"type": "text", "text": "What is she singing"},
            ],
        },
    ],
    # Set the output data modality, currently supports two types: ["text","audio"], ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Chelsie", "format": "wav"},
    # stream must be set to True, otherwise an error will occur
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // If environment variables are not configured, replace the following line with: apiKey: "sk-xxx" using Model Studio API Key
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const encodeVideo = (videoPath) => {
    const videoFile = readFileSync(videoPath);
    return videoFile.toString('base64');
};
const base64Video = encodeVideo("spring_mountain.mp4");
const completion = await openai.chat.completions.create({
    model: "qwen2.5-omni-7b",
    messages: [
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": { "url": `data:;base64,${base64Video}` },
                },
                { "type": "text", "text": "What is she singing" }
            ]
        }
    ],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Chelsie", format: "wav" }
});
for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
Image sequence
Using football1.jpg, football2.jpg, football3.jpg and football4.jpg saved locally as examples.
Python

import os
import base64
from openai import OpenAI

client = OpenAI(
    # If environment variables are not configured, replace the following line with: api_key="sk-xxx" using Model Studio API Key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# Base64 encoding function
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image_1 = encode_image("football1.jpg")
base64_image_2 = encode_image("football2.jpg")
base64_image_3 = encode_image("football3.jpg")
base64_image_4 = encode_image("football4.jpg")
completion = client.chat.completions.create(
    model="qwen2.5-omni-7b",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": [
                        f"data:image/jpeg;base64,{base64_image_1}",
                        f"data:image/jpeg;base64,{base64_image_2}",
                        f"data:image/jpeg;base64,{base64_image_3}",
                        f"data:image/jpeg;base64,{base64_image_4}",
                    ],
                },
                {"type": "text", "text": "Describe the specific process of this video"},
            ],
        }
    ],
    # Set the output data modality, currently supports two types: ["text","audio"], ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Chelsie", "format": "wav"},
    # stream must be set to True, otherwise an error will occur
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // If environment variables are not configured, replace the following line with: apiKey: "sk-xxx" using Model Studio API Key
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const encodeImage = (imagePath) => {
    const imageFile = readFileSync(imagePath);
    return imageFile.toString('base64');
};
const base64Image1 = encodeImage("football1.jpg");
const base64Image2 = encodeImage("football2.jpg");
const base64Image3 = encodeImage("football3.jpg");
const base64Image4 = encodeImage("football4.jpg");
const completion = await openai.chat.completions.create({
    model: "qwen2.5-omni-7b",
    messages: [{
        role: "user",
        content: [
            {
                type: "video",
                video: [
                    `data:image/jpeg;base64,${base64Image1}`,
                    `data:image/jpeg;base64,${base64Image2}`,
                    `data:image/jpeg;base64,${base64Image3}`,
                    `data:image/jpeg;base64,${base64Image4}`
                ]
            },
            {
                type: "text",
                text: "Describe the specific process of this video"
            }
        ]
    }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Chelsie", format: "wav" }
});
for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
Error codes
If a call fails and an error message is returned, see Error messages.