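Asynchronous Parsing

Product Overview

The Zhipu File Parsing API is a unified file parsing solution for developers and enterprises. It combines multi-format parsing, intelligent content extraction, and flexible result output in a single service. The API supports mainstream office documents (PDF, Word, Excel, PPT), structured and unstructured data files (CSV, MD, TXT), and common image formats (JPG, PNG, and others). It extracts text, tables, images, and layout structure from a file and produces standardized output that can feed directly into downstream business systems or LLM pipelines.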

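Use Cases

LLM pre-parsing
Knowledge base construction and management
OCR and scanned document processing
Industry-specific solutions

LLM pre-parsing: parse complex documents such as PDF, Word, and PPT into structured text or Markdown. This reduces manual cleanup, and the output can be used directly as LLM input to improve question answering and reasoning. Typical applications: intelligent Q&A systems, chat with documents, and content generation.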

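Capabilities

Integrated parsing services

Choose among three parsing services through a single API

Multi-format file support

Covers mainstream document and image formats

Multiple output modes

• Download link: images + Markdown file + JSON file with layout information
• Plain text: suited to LLM input

Flexible file size limits

Depending on the service, files of up to 100 MB are supported

Download validity

Download links for parsing results are valid for 24 hours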
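
Because download links stop working after 24 hours, a client that stores links may want to check their age before reusing them. The following is a minimal sketch; the helper name `is_link_usable` is hypothetical, and it assumes the 24-hour window starts when the link was issued to the client, which this page does not state explicitly.

```python
from datetime import datetime, timedelta, timezone

# 24-hour validity as stated on this page.
LINK_TTL = timedelta(hours=24)


def is_link_usable(issued_at: datetime, now: datetime) -> bool:
    """Return True while the link is younger than the 24-hour validity window.

    Assumption: the window is measured from the moment the link was issued.
    """
    return now - issued_at < LINK_TTL


if __name__ == "__main__":
    issued = datetime(2025, 1, 1, 8, 0, tzinfo=timezone.utc)
    print(is_link_usable(issued, issued + timedelta(hours=5)))
    print(is_link_usable(issued, issued + timedelta(hours=30)))
```

When the check fails, the FAQ on this page applies: call the parsing API again to obtain a new link.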

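Service Comparison

Service	Supported formats	Max file size	Output	Billing	Key strengths
Prime	pdf, docx, doc, xls, xlsx, ppt, pptx, png, jpg, jpeg, csv, txt, md, html, bmp, gif, webp, heic, eps, icns, im, pcx, ppm, tiff, xbm, heif, jp2	PDF/DOC/DOCX/PPT ≤ 100 MB; XLS/XLSX/CSV ≤ 10 MB; PNG/JPG/JPEG ≤ 20 MB	Images + Markdown file + JSON file with layout information	Postpaid, billed per parsed page; 0.12 yuan/page after discount	Handles complex layouts such as two-column, three-column, and mixed text and graphics; high-precision parsing of figures, formulas, and tables; strong multimodal capability for demanding parsing scenarios
Expert	pdf	≤ 100 MB	Images + Markdown file	Billed per page; limited-time 40% off, 0.012 yuan/page after discount	Excels at parsing PDFs and images; high accuracy on tables and formulas; stable across domains, balancing accuracy and cost
Lite	pdf, docx, doc, xls, xlsx, ppt, pptx, png, jpg, jpeg, csv, txt, md	≤ 50 MB	Plain text (no images)	Billed per call; currently free	Parses common office documents; solid basic structuring with fast parsing; low cost, suited to batch processing and lightweight tasks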
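
The size limits in the table can be checked on the client before uploading. The sketch below transcribes those limits into a lookup. The names `size_limit_mb` and `within_limit` are hypothetical, and treating 1 MB as 1024 × 1024 bytes is an assumption, since the table does not define the unit. Formats for which the table states no limit (for example pptx under Prime) yield `None` rather than a guessed value.

```python
from pathlib import Path
from typing import Optional

MB = 1024 * 1024  # assumption: the table does not say whether MB is binary or decimal

_LITE_FORMATS = ("pdf", "docx", "doc", "xls", "xlsx", "ppt", "pptx",
                 "png", "jpg", "jpeg", "csv", "txt", "md")

# Limits in MB, transcribed from the comparison table.
LIMITS_MB = {
    "prime": {
        **dict.fromkeys(("pdf", "doc", "docx", "ppt"), 100),
        **dict.fromkeys(("xls", "xlsx", "csv"), 10),
        **dict.fromkeys(("png", "jpg", "jpeg"), 20),
    },
    "expert": {"pdf": 100},
    "lite": dict.fromkeys(_LITE_FORMATS, 50),
}


def size_limit_mb(tool_type: str, extension: str) -> Optional[int]:
    """Return the documented limit in MB, or None if the table lists none."""
    return LIMITS_MB[tool_type].get(extension.lower().lstrip("."))


def within_limit(tool_type: str, filename: str, size_bytes: int) -> bool:
    """Check a file size against the documented limit for its extension."""
    limit = size_limit_mb(tool_type, Path(filename).suffix)
    if limit is None:
        raise ValueError(f"no documented limit for {filename!r} with {tool_type!r}")
    return size_bytes <= limit * MB


if __name__ == "__main__":
    print(size_limit_mb("prime", "xlsx"))
    print(within_limit("lite", "report.pdf", 12 * MB))
```

Raising on an undocumented combination keeps the helper from silently approving an upload the service might reject.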

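Parsing Time

Parsing time depends heavily on factors such as the structural complexity of the document. Treat the time observed for the actual task as authoritative.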

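Resources

API documentation: API call reference

How to Use the API

Call the create endpoint to start a parsing task and obtain a task_id.
Save the task_id.
Poll the result endpoint with that task_id to retrieve the parsing result.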
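
The polling step can be sketched independently of any SDK. In the sketch below, `wait_for_result` and its `fetch` parameter are hypothetical names: `fetch` stands for whatever call returns the result payload as a dict (an HTTP request or an SDK call). The status values `processing` and `succeeded`, and the defaults of 100 attempts at 3-second intervals, mirror the Java example on this page; treating every other status as a failure is an assumption.

```python
import time
from typing import Callable, Dict


def wait_for_result(
    fetch: Callable[[], Dict],
    max_retry: int = 100,
    interval_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch() until the task succeeds, fails, or the retry budget runs out."""
    for _ in range(max_retry):
        result = fetch()
        status = str(result.get("status", "")).lower()
        if status == "succeeded":
            return result
        if status != "processing":
            # Assumption: any status other than processing/succeeded is terminal.
            raise RuntimeError(f"parsing task ended with status {status!r}: {result}")
        sleep(interval_s)
    raise TimeoutError(f"task still processing after {max_retry} attempts")


if __name__ == "__main__":
    # Simulated responses stand in for real API calls.
    fake = iter([{"status": "processing"}, {"status": "succeeded", "content": "hello"}])
    print(wait_for_result(lambda: next(fake), interval_s=0.01)["content"])
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.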

Fields

Field	Description
file	Local file to be parsed
tool_type	Parsing service to use: `lite, expert, prime`
file_type	File type: `PDF, DOCX, DOC, XLS, XLSX, PPT, PPTX, PNG, JPG, JPEG, CSV, TXT, MD, HTML, EPUB, BMP, GIF, WEBP, HEIC, EPS, ICNS, IM, PCX, PPM, TIFF, XBM, HEIF, JP2`
taskId	ID of the file parsing task
format_type	Format of the returned result: `text, download_link`
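
The result endpoint takes the task ID and the format type as path segments, so it can help to validate both before building the URL. This is a minimal sketch: the helper name `result_url` is hypothetical, while the base URL and the two allowed `format_type` values come from the cURL example and the field table on this page.

```python
from urllib.parse import quote

# Base path taken from the cURL example on this page.
RESULT_BASE_URL = "https://open.bigmodel.cn/api/paas/v4/files/parser/result"
FORMAT_TYPES = ("text", "download_link")


def result_url(task_id: str, format_type: str = "text") -> str:
    """Build the result URL for a task, rejecting unknown format types."""
    if not task_id:
        raise ValueError("task_id must not be empty")
    if format_type not in FORMAT_TYPES:
        raise ValueError(f"format_type must be one of {FORMAT_TYPES}")
    # Percent-encode the ID so it cannot alter the path structure.
    return f"{RESULT_BASE_URL}/{quote(task_id, safe='')}/{format_type}"


if __name__ == "__main__":
    print(result_url("your-task-id", "download_link"))
```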

Examples

For the parameters used in the examples, see the field table above and the corresponding API documentation.

cURL

Create a file parsing task

curl --location --request POST 'https://open.bigmodel.cn/api/paas/v4/files/parser/create' \
--header  'Authorization: Bearer YOUR_API_KEY' \
--form 'file=@example-file' \
--form 'tool_type="prime"' \
--form 'file_type="PDF"'

Retrieve the parsing result asynchronously

curl --request GET \
--url https://open.bigmodel.cn/api/paas/v4/files/parser/result/{taskId}/{format_type} \
--header 'Authorization: Bearer YOUR_API_KEY'

Python

# Install the latest version
pip install zai-sdk

# Or pin a specific version
pip install zai-sdk==0.2.3

from zai import ZhipuAiClient

client = ZhipuAiClient(api_key="YOUR_API_KEY")
# Upload the file and start a parsing task.
# The response carries the task_id.
with open('example.pdf', 'rb') as f:
    response = client.file_parser.create(file=f, file_type='pdf', tool_type='lite')
task_id = getattr(response, "task_id", None)

# Fetch the extracted content: format_type = text / download_link
# text mode returns at most 1M of text content; download_link responds faster
res_response = client.file_parser.content(task_id=task_id, format_type="download_link")

print(res_response.json())  # recommended with the new SDK
print(res_response.content.decode('utf-8'))  # decoding the byte stream (older usage) is still supported

Python (legacy)

Update the SDK to 2.1.5.20250825

# Install the latest version
pip install zhipuai

# Or pin a specific version
pip install zhipuai==2.1.5.20250825

from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR_API_KEY")
# Upload the file and start a parsing task.
# The response carries the task_id.
with open('example.pdf', 'rb') as f:
    response = client.file_parser.create(file=f, file_type='pdf', tool_type='lite')
print(response)

# Fetch the extracted content
response = client.file_parser.content(task_id="your task_id", format_type="text")
print(response.content.decode('utf-8'))

Java

Install the SDK (Maven)

<dependency>
    <groupId>ai.z.openapi</groupId>
    <artifactId>zai-sdk</artifactId>
    <version>0.3.5</version>
</dependency>

import ai.z.openapi.ZhipuAiClient;
import ai.z.openapi.service.fileparsing.FileParsingDownloadReq;
import ai.z.openapi.service.fileparsing.FileParsingDownloadResponse;
import ai.z.openapi.service.fileparsing.FileParsingResponse;
import ai.z.openapi.service.fileparsing.FileParsingUploadReq;
import ai.z.openapi.utils.StringUtils;

public class FileParsingExample {

    public static void main(String[] args) {
        // Initialize the client
        ZhipuAiClient client = ZhipuAiClient.builder().ofZHIPU()
             .apiKey("YOUR_API_KEY")
             .build();

        try {
            // Example 1: create a parsing task
            System.out.println("=== Create file parsing task example ===");
            String filePath = "your file path";
            String taskId = createFileParsingTaskExample(client, filePath, "pdf", "lite");

            // Example 2: retrieve the parsing result
            System.out.println("\n=== Retrieve parsing result example ===");
            getFileParsingResultExample(client, taskId);

        } catch (Exception e) {
            System.err.println("Exception occurred: " + e.getMessage());
            e.printStackTrace();
        }
    }

    /**
    * Example: create a parsing task (upload a file and parse it)
    *
    * @param client ZhipuAiClient instance
    * @param filePath path of the local file to parse
    * @param fileType file type, e.g. pdf or docx
    * @param toolType parsing service: lite, prime, or expert
    * @return taskId of the parsing task, or null if creation failed
    */
    private static String createFileParsingTaskExample(ZhipuAiClient client, String filePath, String fileType, String toolType) {
        if (StringUtils.isEmpty(filePath)) {
            System.err.println("Invalid file path.");
            return null;
        }
        try {
            FileParsingUploadReq uploadReq = FileParsingUploadReq.builder()
                    .filePath(filePath)
                    .fileType(fileType)  // Supported: pdf, docx, etc.
                    .toolType(toolType) // Parsing service: lite, prime, expert
                    .build();

            System.out.println("Uploading file and creating parsing task...");
            FileParsingResponse response = client.fileParsing().createParseTask(uploadReq);
            if (response.isSuccess()) {
                if (null != response.getData().getTaskId()) {
                    String taskId = response.getData().getTaskId();
                    System.out.println("Parsing task created, TaskId: " + taskId);
                    return taskId;
                } else {
                    System.err.println("Failed to create parsing task: " + response.getData().getMessage());
                }
            } else {
                System.err.println("Failed to create parsing task: " + response.getMsg());
            }
        } catch (Exception e) {
            System.err.println("File parsing task error: " + e.getMessage());
        }
        // null signals that creation failed
        return null;
    }

    /**
    * Example: retrieve the parsing result
    *
    * @param client ZhipuAiClient instance
    * @param taskId parsing task ID
    */
    private static void getFileParsingResultExample(ZhipuAiClient client, String taskId) {
        if (taskId == null || taskId.isEmpty()) {
            System.err.println("Invalid task ID; cannot retrieve the parsing result.");
            return;
        }

        try {
            int maxRetry = 100;      // Poll at most 100 times
            int intervalMs = 3000;  // Wait 3 seconds between polls
            for (int i = 0; i < maxRetry; i++) {
                FileParsingDownloadReq downloadReq = FileParsingDownloadReq.builder()
                        .taskId(taskId)
                        .formatType("text")
                        .build();

                FileParsingDownloadResponse response = client.fileParsing().getParseResult(downloadReq);

                if (response.isSuccess()) {
                    String status = response.getData().getStatus();
                    System.out.println("Current task status: " + status);

                    if ("succeeded".equalsIgnoreCase(status)) {
                        System.out.println("Parsing result retrieved.");
                        System.out.println("Parsed content: " + response.getData().getContent());
                        System.out.println("Download link: " + response.getData().getParsingResultUrl());
                        return;
                    } else if ("processing".equalsIgnoreCase(status)) {
                        System.out.println("Parsing in progress, please wait...");
                        Thread.sleep(intervalMs);
                    } else {
                        System.out.println("Parsing task failed, status: " + status + ", message: " + response.getData().getMessage());
                        return;
                    }
                } else {
                    System.err.println("Failed to retrieve parsing result: " + response.getMsg());
                    return;
                }
            }
            System.out.println("Timed out waiting; query the parsing result again later.");
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can observe the interruption
            Thread.currentThread().interrupt();
            System.err.println("Interrupted while waiting for the parsing result.");
        } catch (Exception e) {
            System.err.println("Error while retrieving parsing result: " + e.getMessage());
        }
    }
}

Response examples

Response to creating a file parsing task

{
    "message": "任务创建成功",
    "success": true,
    "task_id": "task_id"
}

Response to retrieving the parsing result

{
    "status": "succeeded",
    "message": "结果获取成功",
    "content": "parsed result text",
    "task_id": "your task_id",
    "parsing_result_url": "download url"
}
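
A client typically turns this payload into a typed object before acting on it. The sketch below is one way to do that, assuming only the five fields shown in the response example; `ParseResult` and `parse_result` are hypothetical names, and treating `content` and `parsing_result_url` as optional is an assumption, since the page does not say which fields are present for each `format_type`.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParseResult:
    status: str
    message: str
    task_id: str
    content: Optional[str] = None             # assumed optional
    parsing_result_url: Optional[str] = None  # assumed optional


def parse_result(raw: str) -> ParseResult:
    """Decode a result payload; only `status` is treated as required."""
    data = json.loads(raw)
    return ParseResult(
        status=data["status"],
        message=data.get("message", ""),
        task_id=data.get("task_id", ""),
        content=data.get("content"),
        parsing_result_url=data.get("parsing_result_url"),
    )


if __name__ == "__main__":
    sample = '{"status": "succeeded", "message": "ok", "task_id": "t1", "content": "parsed result text"}'
    result = parse_result(sample)
    print(result.status, result.content)
```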

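Notes

File size limits: stay within the maximum file size of the chosen service; larger files cause parsing to fail.
Choose the service that fits the scenario: for complex documents, pick the service designed for them.
Save results promptly after downloading: download links expire after 24 hours.
For LLM processing: fetching the plain-text output directly is recommended.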

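FAQ

Q: Can the parsing result keep the original images?
A: Prime and Expert keep images (packaged for download); Lite does not.
Q: What should I do when a download link has expired?
A: Call the parsing API again to generate a new link.
Q: Why is my complex PDF parsed poorly?
A: Lite is not suited to complex layouts or OCR scenarios; use Prime or Expert instead.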
