> ## Documentation Index > Fetch the complete documentation index at: https://docs.bigmodel.cn/llms.txt > Use this file to discover all available pages before exploring further. # GLM-4-Voice ## 概览 GLM-4-Voice 是智谱推出的首个端到端语音模型。它能够直接理解和生成中英文语音，实现实时语音对话，并可根据用户指令灵活调整语音的情感、语调、语速和方言等特性，使语音交互更加自然生动。 }> 80 元 / 百万 Tokens }> 音频、文本 }> 音频 }> 8K }> 4K ## 推荐场景 AI 通过虚拟角色（如游戏角色、虚拟偶像）与用户进行情感化对话，虚拟角色可以设定为特定性格、背景和声音，实现全天候陪伴。 AI 导游与用户进行实时语音交互，为用户提供详细的历史背景、文化意义和建筑特点，通过语音描述帮助用户规划游览路线，解答用户关于景点的疑问。 AI 英语老师通过模拟真实场景（如点餐、问路）与用户进行对话练习，解答用户关于语法规则的疑问，实时纠正用户发音、学习日常表达和语法知识，并提供改进建议。 AI 辅导老师与学生通过详细讲解课程内容，为学生提供课程讲解、作业辅导和学习建议，涵盖多个学科（如数学、历史、科学），解答学生在作业中遇到的问题，通过多轮对话帮助学生理解难点。 ## 使用资源 [接口文档](/api-reference/%E6%A8%A1%E5%9E%8B-api/%E5%AF%B9%E8%AF%9D%E8%A1%A5%E5%85%A8)：API 调用方式 ## 详细介绍 }> 凭借其实时语音对话功能，GLM-4-Voice 为用户提供高效流畅的沟通体验。GLM-4-Voice具备情感表达、方言生成和语速调节的能力，同时支持中英文双语。它的应用场景广泛，覆盖虚拟角色互动、智慧教育、智能旅游、儿童陪伴等多个领域。通过灵活的语音输入和输出能力，GLM-4-Voice 能够为用户提供高效且个性化的服务体验。在企业应用方面，GLM-4-Voice 可针对不同垂直行业定制专业的场景解决方案，帮助开发者以较低成本快速适应和融入大模型时代。 # 调用示例 **安装 SDK** ```bash theme={null} # 安装最新版本 pip install zai-sdk # 或指定版本 pip install zai-sdk==0.2.3 ``` **验证安装** ```python theme={null} import zai print(zai.__version__) ``` **调用示例** ```python theme={null} import wave import base64 from zai import ZhipuAiClient def save_audio_as_wav(audio_data, filepath): """保存音频数据为 WAV 文件（模型返回的语音用）""" with wave.open(filepath, 'wb') as wav_file: wav_file.setnchannels(1) wav_file.setsampwidth(2) wav_file.setframerate(44100) wav_file.writeframes(audio_data) print(f"Audio saved to {filepath}") def get_base64_from_wav(wav_path): """将 WAV 文件转为 Base64 编码字符串""" with open(wav_path, "rb") as f: audio_bytes = f.read() return base64.b64encode(audio_bytes).decode("utf-8") client = ZhipuAiClient(api_key="your_api_key") # 请填写您自己的 APIKey input_wav_path = "your_voice.wav" # 你的 WAV 文件路径 base64_voice = get_base64_from_wav(input_wav_path) response = client.chat.completions.create( model="glm-4-voice", messages=[ { "role": "user", "content": [ { "type": "text", "text": "你好，这是我的语音输入测试，请慢速复述一遍" }, { "type": "input_audio", "input_audio": { "data": base64_voice, "format": "wav" } } ] } ], stream=False ) print(response.choices[0].message.content) # 解析并保存模型返回的语音 try: audio_data = response.choices[0].message.audio['data'] decoded_data = base64.b64decode(audio_data) save_audio_as_wav(decoded_data, "output.wav") except Exception as e: print("处理音频失败：", e) ``` **安装 SDK** **Maven** ```xml theme={null} ai.z.openapi zai-sdk 0.3.5 ``` **Gradle (Groovy)** ```groovy theme={null} implementation 'ai.z.openapi:zai-sdk:0.3.5' ``` **调用示例** ```java theme={null} import ai.z.openapi.ZhipuAiClient; import ai.z.openapi.service.model.ChatCompletionCreateParams; import ai.z.openapi.service.model.ChatCompletionResponse; import ai.z.openapi.service.model.ChatMessage; import ai.z.openapi.service.model.ChatMessageRole; import ai.z.openapi.service.model.InputAudio; import ai.z.openapi.service.model.MessageContent; import java.io.File; import java.io.IOException; import java.nio.file.Files; import java.util.Arrays; import java.util.Base64; import java.util.Collections; public class GLM4VoiceExample { public static void main(String[] args) throws IOException { ZhipuAiClient client = ZhipuAiClient.builder().ofZHIPU().apiKey("API_KEY").build(); File audioFile = new File("your_path.asr.wav"); byte[] audioBytes = Files.readAllBytes(audioFile.toPath()); String base64 = Base64.getEncoder().encodeToString(audioBytes); ChatCompletionCreateParams request = ChatCompletionCreateParams.builder().model("glm-4-voice") .messages(Collections.singletonList(ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content( Arrays.asList(MessageContent.builder().type("text").text("你好，这是我的语音输入测试").build(), MessageContent.builder().type("input_audio").inputAudio(InputAudio.builder() .data(base64).format("wav").build()).build())).build())).build(); ChatCompletionResponse response = client.chat().createChatCompletion(request); if (response.isSuccess()) { Object reply = response.getData().getChoices().get(0).getMessage().getContent(); System.out.println(reply); } else { System.err.println("错误: " + response.getMsg()); } } } ``` ```python theme={null} import zhipuai import wave import base64 def get_base64_from_wav(wav_path): """将 WAV 文件转为 Base64 编码字符串""" with open(wav_path, "rb") as f: audio_bytes = f.read() return base64.b64encode(audio_bytes).decode("utf-8") zhipuai.api_key = "your_api_key" # 请填写您自己的 APIKey input_wav_path = "your_voice.wav" base64_voice = get_base64_from_wav(input_wav_path) response = zhipuai.model_api.invoke( model="glm-4-voice", prompt="你好，这是我的语音输入测试", audio_data=base64_voice, audio_format="wav" ) print(response) ``` ```json theme={null} { "id": "20250605132035222ead927d794645", "object": "chat.completion", "created": 1749187238, "model": "glm-4-voice", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "你好！我听到了你的语音输入。有什么我可以帮助你的吗？", "audio": { "data": "707hTvovBW8zH3FPxH/1sCvgTXB/kJPQtJCqMIEgcCBUcDRQBZ...", "expires_at": 1749187238, "id": "f8d4bf4b-a376-48e6-8c81-54bb6a9a31d0" } }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 107, "completion_tokens": 340, "total_tokens": 447 }, "request_id": "20250605132035222ead927d794645" } ``` ## 用户并发权益 API 调用会受到速率限制，当前我们限制的维度是请求并发数量（在途请求任务数量）。不同等级的用户并发保障如下。 | V0 | V1 | V2 | V3 | | :- | :- | :- | :- | | 5 | 10 | 15 | 20 |