Merge Data (V2)
Private Preview

You can use this interface to add fields with or without data to an existing collection.

If you choose to add fields with data, upload the data file to an Alibaba Cloud OSS bucket or a Zilliz Cloud Stage and make sure that the data file shares the same merge key as the source collection. The merged data is stored in a new collection that you specify.
If you choose to add fields without data, the new fields are created with null values.

You need to be a project admin or above to perform this operation. For details, refer to Merge Data.
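
The `mergeField` description on this page compares the merge to a relational left join on a shared key. The following is a minimal in-memory sketch of that semantics only, not the service's implementation. The `left_join_merge` helper, the row layout, and the sample field names are hypothetical, and the sketch assumes that keys are unique in the incoming data.

```python
from __future__ import annotations

from typing import Any


def left_join_merge(
    source_rows: list[dict[str, Any]],
    new_rows: list[dict[str, Any]],
    merge_field: str,
    new_fields: list[str],
) -> list[dict[str, Any]]:
    """Keep every source row and attach the new fields when the key matches."""
    # Index the incoming rows by the shared key (assumes unique keys).
    by_key = {row[merge_field]: row for row in new_rows}
    merged = []
    for row in source_rows:
        match = by_key.get(row[merge_field], {})
        out = dict(row)
        for name in new_fields:
            # Rows without a match, or fields added without data, become None (null).
            out[name] = match.get(name)
        merged.append(out)
    return merged


source = [{"id": 1, "title": "a"}, {"id": 2, "title": "b"}]
incoming = [{"id": 1, "my_field1": "x"}, {"id": 3, "my_field1": "z"}]

with_data = left_join_merge(source, incoming, "id", ["my_field1"])
without_data = left_join_merge(source, [], "id", ["my_field1"])
```

In this sketch, `with_data` keeps both source rows: the row with `id` 1 receives its value, the row with `id` 2 receives `None`, and the incoming row with `id` 3 is dropped because it has no counterpart in the source. `without_data` shows the case of adding a field with no data at all.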

POST

/v2/etl/merge

Base URL

The Base URL for this API is in the following format:

https://api.cloud.zilliz.com.cn

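📘 Notes

The rate limit for control-plane APIs is up to 20 requests per second per user per endpoint.

If you encounter any problems while using this API, contact Zilliz Cloud support.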

export BASE_URL="https://api.cloud.zilliz.com.cn"
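
The note above caps control-plane calls at 20 requests per second per user per endpoint. A client that submits many merge jobs could space its calls on the client side. The following is a minimal sketch: the `MinIntervalThrottle` class is hypothetical, and how the server enforces the limit (window type, error returned on excess) is not documented here.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Space calls at least 1/max_per_second apart (conservative client-side limiter)."""

    def __init__(
        self,
        max_per_second: float = 20.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None  # no call made yet

    def wait(self) -> float:
        """Block until the next call is allowed and return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            # Re-read the clock in case the sleep overshot.
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self._interval
        return delay
```

Calling `wait()` immediately before each request keeps a single-threaded client at or below the configured rate. The injectable `clock` and `sleep` arguments exist only to make the class easy to test.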

Parameters

Authorization string header required

The authentication token. It should be an API key with appropriate permissions.

Example: Bearer {{TOKEN}}

Request Body application/json

USE STAGE

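clusterId string required

The ID of the cluster that hosts the target collection of this operation.

Example: in00-xxxxxxxxxxxxxxxxxx

dbName string required

The name of the database that contains the target collection of this operation.

collectionName string required

The name of the target collection of this operation.

destDbName string required

The name of the database in which the new collection is to be created.

destCollectionName string required

The name of the collection to create. This collection stores the merged data.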

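dataSource object

The data to merge with the specified collection. Upload the data file in PARQUET format to an Alibaba Cloud OSS bucket or a Zilliz Cloud Stage as the data source, and provide the URL of the data file and the access credentials, if any.

dataSource.type string

The type of the data source.

Example: stage

dataSource.stageName string

The name of the Zilliz Cloud stage. This parameter is valid only when type is set to stage. For details on creating a stage, refer to the documentation of the Create Stage operation.

dataSource.dataPath string

The URL of the PARQUET file to merge with the specified collection.

Example: path/to/your/data.parquet

mergeField string

The merge operation is similar to a left join in a relational database: the merge field serves as the shared key between the source collection and the PARQUET file. Provide the name of the shared key as the merge field. The merge field must exist in both the source collection and the PARQUET file. In most cases, you can use the primary key as the merge field.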

newFields array

The schemas of the fields to add. The value should be an array of field schemas.

newFields[] object

The schema of a field to add.

newFields[].fieldName string

Name of the current field to add.

newFields[].dataType string

Data type of the current field to add.

Enum values:

newFields[].params object

Extra settings for the current field to add.

newFields[].params.maxLength integer

The maximum length of a VARCHAR field. This parameter is available only when dataType is set to VARCHAR.
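
The stage-based request body described above can also be assembled programmatically. The following is a minimal sketch, assuming the field nesting shown in the curl example on this page. `new_field` and `build_stage_merge_body` are hypothetical helpers, and the check that `maxLength` accompanies only VARCHAR fields mirrors the parameter description rather than any documented server-side validation.

```python
from __future__ import annotations

from typing import Any, Optional


def new_field(
    field_name: str, data_type: str, max_length: Optional[int] = None
) -> dict[str, Any]:
    """Build one newFields entry; maxLength is accepted only for VARCHAR."""
    field: dict[str, Any] = {"fieldName": field_name, "dataType": data_type}
    if max_length is not None:
        if data_type != "VARCHAR":
            raise ValueError("maxLength applies only when dataType is VARCHAR")
        field["params"] = {"maxLength": max_length}
    return field


def build_stage_merge_body(
    *,
    cluster_id: str,
    db_name: str,
    collection_name: str,
    dest_db_name: str,
    dest_collection_name: str,
    stage_name: str,
    data_path: str,
    merge_field: str,
    new_fields: list[dict[str, Any]],
) -> dict[str, Any]:
    """Assemble the JSON body for a merge that reads from a Zilliz Cloud Stage."""
    return {
        "clusterId": cluster_id,
        "dbName": db_name,
        "collectionName": collection_name,
        "destDbName": dest_db_name,
        "destCollectionName": dest_collection_name,
        "dataSource": {
            "type": "stage",
            "stageName": stage_name,
            "dataPath": data_path,
        },
        "mergeField": merge_field,
        "newFields": new_fields,
    }


body = build_stage_merge_body(
    cluster_id="in00-xxxxxxxxxxxxxxxxxx",
    db_name="my_database",
    collection_name="my_collection",
    dest_db_name="my_database",
    dest_collection_name="my_merged_collection",
    stage_name="my_stage",
    data_path="path/to/your/data.parquet",
    merge_field="id",
    new_fields=[new_field("my_field1", "VARCHAR", max_length=512)],
)
```

Keyword-only arguments keep the five similarly named string parameters from being passed in the wrong order.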

USE OBJECT STORAGE

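clusterId string required

The ID of the cluster that hosts the target collection of this operation.

Example: in00-xxxxxxxxxxxxxxxxxx

dbName string required

The name of the database that contains the target collection of this operation.

collectionName string required

The name of the target collection of this operation.

destDbName string required

The name of the database in which the new collection is to be created.

destCollectionName string required

The name of the collection to create. This collection stores the merged data.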

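dataSource object

dataSource.type string

The type of the data source. To use an Alibaba Cloud OSS bucket, set this to oss.

dataSource.dataPath string

The URL of the PARQUET file to merge with the specified collection.

dataSource.credential object

The access credentials for the bucket that stores the data file to merge. This parameter is valid only when type is set to oss.

dataSource.credential.accessKey string

The access key for the bucket that stores the data file to merge.

dataSource.credential.secretKey string

The secret key for the bucket that stores the data file to merge.

mergeField string

The merge operation is similar to a left join in a relational database: the merge field serves as the shared key between the source collection and the PARQUET file. Provide the name of the shared key as the merge field. The merge field must exist in both the source collection and the PARQUET file. In most cases, you can use the primary key as the merge field.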

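newFields array

The schemas of the fields to add. The value should be an array of field schemas.

newFields[] object

The schema of a field to add.

newFields[].fieldName string

Name of the current field to add.

newFields[].dataType string

Data type of the current field to add.

Enum values:

newFields[].params object

Extra settings for the current field to add.

newFields[].params.maxLength integer

The maximum length of a VARCHAR field. This parameter is available only when dataType is set to VARCHAR.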
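
For the object-storage variant, the bucket credentials are better read from the environment than hard-coded into a script. The following is a minimal sketch, assuming that `credential` nests inside `dataSource` as the parameter order above suggests. The helper names and the environment variable names `OSS_ACCESS_KEY` and `OSS_SECRET_KEY` are hypothetical, and the data path is passed through unchanged because this page does not specify the URL format.

```python
from __future__ import annotations

import os
from typing import Any, Mapping

ACCESS_KEY_VAR = "OSS_ACCESS_KEY"  # hypothetical variable name
SECRET_KEY_VAR = "OSS_SECRET_KEY"  # hypothetical variable name


def build_oss_data_source(
    data_path: str, env: Mapping[str, str] = os.environ
) -> dict[str, Any]:
    """Build the dataSource object for an OSS bucket from environment variables."""
    missing = [name for name in (ACCESS_KEY_VAR, SECRET_KEY_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "type": "oss",
        "dataPath": data_path,
        "credential": {
            "accessKey": env[ACCESS_KEY_VAR],
            "secretKey": env[SECRET_KEY_VAR],
        },
    }


def redact(data_source: dict[str, Any]) -> dict[str, Any]:
    """Return a copy that is safe to log, with credential values masked."""
    safe = dict(data_source)
    if "credential" in safe:
        safe["credential"] = {key: "***" for key in safe["credential"]}
    return safe
```

The returned dictionary can replace the `dataSource` value in the request body. `redact` leaves the original object untouched, so the real credentials are still sent while only the masked copy reaches the logs.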

export TOKEN="YOUR_API_KEY"

curl --request POST \
--url "${BASE_URL}/v2/etl/merge" \
--header "Authorization: Bearer ${TOKEN}" \
--header "Content-Type: application/json" \
-d '{
    "clusterId": "in00-xxxxxxxxxxxxxxx",
    "dbName": "my_database",
    "collectionName": "my_collection",
    "destDbName": "my_database",
    "destCollectionName": "my_merged_collection",
    "dataSource": {
        "type": "stage",
        "stageName": "my_stage",
        "dataPath": "/path/to/your/data.parquet"
    },
    "mergeField": "id",
    "newFields": [
        {
            "fieldName": "my_field1",
            "dataType": "VARCHAR",
            "params": {
                "maxLength": 512
            }
        }
    ]
}'
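
The same call can be issued from Python's standard library. The following is a minimal sketch that mirrors the curl command above. `build_merge_request` and `submit_merge` are hypothetical helper names, and nothing is sent over the network until `submit_merge` is called explicitly.

```python
import json
import urllib.request

BASE_URL = "https://api.cloud.zilliz.com.cn"


def build_merge_request(
    token: str, body: dict, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        url=f"{base_url}/v2/etl/merge",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_merge(token: str, body: dict, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_merge_request(token, body)
    # urlopen raises urllib.error.HTTPError for non-2xx status codes.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending makes it possible to inspect the URL, headers, and payload before any job is created.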

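Response 200 - application/json

SUCCESS

code integer

The response code.

Example: 0

data array

The response payload, which contains the ID of the data-merge job created by this operation.

data[] object

A created data-merge job.

data[].jobId string

The ID of the data-merge job created.

FAILURE

Returns an error message.

code integer

The response code.

message string

The error description.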

SUCCESS

{
    "code": 0,
    "data": {
        "jobId": "job-xxxxxxxxxxxxxxxxxxxxx"
    }
}
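
A caller usually needs the job ID from a successful response and a clear error otherwise. The following is a minimal sketch, assuming that a `code` of 0 indicates success, as in the example value. The response schema lists `data` as an array while the sample above shows an object, so the hypothetical `extract_job_ids` helper accepts either shape.

```python
from __future__ import annotations


class MergeError(Exception):
    """Raised when the response carries a non-zero code."""

    def __init__(self, code, message: str) -> None:
        super().__init__(f"merge request failed (code={code}): {message}")
        self.code = code
        self.message = message


def extract_job_ids(response: dict) -> list[str]:
    """Return the job IDs from a response, or raise MergeError on failure."""
    code = response.get("code")
    if code != 0:  # assumption: 0 means success, as in the example value
        raise MergeError(code, response.get("message", ""))
    data = response.get("data")
    # Accept both a single object and an array of objects.
    items = data if isinstance(data, list) else [data]
    return [item["jobId"] for item in items if item and "jobId" in item]


job_ids = extract_job_ids({"code": 0, "data": {"jobId": "job-xxxxxxxxxxxxxxxxxxxxx"}})
```

Returning a list in both cases lets the calling code stay the same whichever shape the service returns.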