StructArray
StructArray 用于有序存放元素数据类型为 Struct 的 Array。Array 中的每个 Struct 共享同一个 Schema,可包含多个向量和标量字段。
下面是某个包含 Struct Array 字段的 Collection 中提取的一条 Entity 记录。
{
'id': 0,
'title': 'Walden',
'title_vector': [0.1, 0.2, 0.3, 0.4, 0.5],
'author': 'Henry David Thoreau',
'year_of_publication': 1845,
// highlight-start
'chunks': [
{
'text': 'When I wrote the following pages, or rather the bulk of them...',
'text_vector': [0.3, 0.2, 0.3, 0.2, 0.5],
'chapter': 'Economy',
},
{
'text': 'I would fain say something, not so much concerning the Chinese and...',
'text_vector': [0.7, 0.4, 0.2, 0.7, 0.8],
'chapter': 'Economy'
}
]
// hightlight-end
}
在该示例中,chunks 是一个 Struct Array 字段。其中的每个 Struct 元素都有由相同的字段组成,包括 text、text_vector 和 chapter。
适用场景
从自动驾驶到多模态检索,现代 AI 应用越来越依赖嵌套且异构的数据。传统扁平数据模型很难表示复杂关系,例如「一个文档包含多个带注释的 Chunk」或「一个驾驶场景包含多个观测到的操作」。这正是 Milvus 中 StructArray 数据类型的适用场景。
要快速判断 StructArray 字段是否适合您的应用场景,请考虑以下问题:
-
您的数据是否具有层级结构,例如一个文档包含多个带注释的 Chunk。
-
搜索结果是否应该是文档本身,而不是 Chunk,就像上面的示例一样。
-
搜索结果是否包含大量重复 Entity,并且很难通过分组、去重和重排等技术获得最终结果。
如果您对上述问题的回答为“是”,则应使用 StructArray。
使用限制
-
数据类型
当您创建 Collection 时,您可以在定义 Array 字段时将它的元素数据类型指定为 Struct。但是您不可以为已有 Collection 添加 Struct Array 字段。另外,Zilliz Cloud 也不支持用 Struct 类型定义 Collection 的字段。
Array 中的 Struct 元素有着相同的 Schema。您需要在创建 Array 字段前定义该 Schema。
Struct 中可以包含向量和标量字段。目前,您可以在 Struct 的 Schema 定义中使用如下类型的字段:
字段类型
数据类型
Vector
FLOAT_VECTORScalar
VARCHARINT8/16/32/64FLOATDOUBLEBOOLEAN您需要确保 Collection Schema 和 Struct Schema 中定义的向量字段的数量不超过您的集群支持的数量上限。更多内容,可以参考使用限制。
-
Nullable 与默认值
Struct Array 字段不可为 Null 也不接受任何默认值。
-
Function
不支持使用 Function 从标量字段派生向量字段。
-
索引类型和相似度类型
Collection 中的所有向量字段都需要建立索引。对于 Struct Array 字段中的向量字段,Zilliz Cloud 使用 EmbeddingList 来组织各 Struct 元素中相同字段的向量,并为每个 EmbeddingList 创建索引。
您可以使用
AUTOINDEX或HNSW作为索引类型,并使用下表中列出的索引类型为 EmbeddingList 创建索引。索引类型
相似度类型
备注
AUTOINDEXMAX_SIM_COSINE适用于如下类型的 EmbeddingList:
- FLOAT_VECTOR
MAX_SIM_IPMAX_SIM_L2Struct Array 中的标量字段尚不支持索引。
-
Upsert 数据
不支持使用 Upsert 的合并模式更新 Struct 元素中的字段。但是您仍旧可以使用覆盖模式来更新 Struct Array 字段。
-
标量过滤
不支持在 Search 和 Query 中使用针对 Struct 元素中的标量字段的过滤表达式。
添加 Struct Array
在 Zilliz Cloud clusters 中创建 Struct Array 字段,需要先在 Collection 中创建一个 Array 类型的字段,并将其元素的数据类型设置为 Struct。具体流程如下:
-
在向 Collection Schema 中添加 Array 字段时,将该字段的数据类型设置为
DataType.ARRAY。 -
然后将该字段的
element_type属性设置为DataType.STRUCT。 -
为 Struct 元素创建 Schema,并向其中添加需要的字段。然后在 Struct Array 字段的
struct_schema属性中引用该 Schema。 -
设置 Struct Array 字段的
max_capacity属性,用于指定每条 Entity 中可以包含的 Struct 元素的最大数量。 -
(可选)您还可以为 Struct 元素中的字段设置
mmap.enabled属性,以便实现冷热数据存储的平衡。
如下示例演示了如何定义一个包含了 Struct Array 字段的 Collection Schema。
- Python
- Java
- Go
- NodeJS
- cURL
from pymilvus import MilvusClient, DataType
client = MilvusClient(
uri="https://{cluster-id}.{region}.vectordb.zilliz.com.cn:19530",
token="YOUR_CLUSTER_TOKEN"
)
schema = client.create_schema()
# add the primary field to the collection
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True, auto_id=True)
# add some scalar fields to the collection
schema.add_field(field_name="title", datatype=DataType.VARCHAR, max_length=512)
schema.add_field(field_name="author", datatype=DataType.VARCHAR, max_length=512)
schema.add_field(field_name="year_of_publication", datatype=DataType.INT64)
# add a vector field to the collection
schema.add_field(field_name="title_vector", datatype=DataType.FLOAT_VECTOR, dim=5)
# highlight-start
# Create a struct schema
struct_schema = client.create_struct_field_schema()
# add a scalar field to the struct
struct_schema.add_field("text", DataType.VARCHAR, max_length=65535)
struct_schema.add_field("chapter", DataType.VARCHAR, max_length=512)
# add a vector field to the struct with mmap enabled
struct_schema.add_field("text_vector", DataType.FLOAT_VECTOR, mmap_enabled=True, dim=5)
# reference the struct schema in an Array field with its
# element type set to \`DataType.STRUCT\`
schema.add_field("chunks", datatype=DataType.ARRAY, element_type=DataType.STRUCT,
struct_schema=struct_schema, max_capacity=1000)
# highlight-end
import io.milvus.v2.common.DataType;
import io.milvus.v2.service.collection.request.AddFieldReq;
import io.milvus.v2.service.collection.request.CreateCollectionReq;
CreateCollectionReq.CollectionSchema collectionSchema = CreateCollectionReq.CollectionSchema.builder()
.build();
collectionSchema.addField(AddFieldReq.builder()
.fieldName("id")
.dataType(DataType.Int64)
.isPrimaryKey(true)
.autoID(true)
.build());
collectionSchema.addField(AddFieldReq.builder()
.fieldName("title")
.dataType(DataType.VarChar)
.maxLength(512)
.build());
collectionSchema.addField(AddFieldReq.builder()
.fieldName("author")
.dataType(DataType.VarChar)
.maxLength(512)
.build());
collectionSchema.addField(AddFieldReq.builder()
.fieldName("year_of_publication")
.dataType(DataType.Int64)
.build());
collectionSchema.addField(AddFieldReq.builder()
.fieldName("title_vector")
.dataType(DataType.FloatVector)
.dimension(5)
.build());
Map<String, String> params = new HashMap<>();
params.put("mmap_enabled", "true");
collectionSchema.addField(AddFieldReq.builder()
.fieldName("chunks")
.dataType(DataType.Array)
.elementType(DataType.Struct)
.maxCapacity(1000)
.addStructField(AddFieldReq.builder()
.fieldName("text")
.dataType(DataType.VarChar)
.maxLength(65535)
.build())
.addStructField(AddFieldReq.builder()
.fieldName("chapter")
.dataType(DataType.VarChar)
.maxLength(512)
.build())
.addStructField(AddFieldReq.builder()
.fieldName("text_vector")
.dataType(DataType.FloatVector)
.dimension(VECTOR_DIM)
.typeParams(params)
.build())
.build());
// go
import { MilvusClient, DataType } from "@zilliz/milvus2-sdk-node";
const milvusClient = new MilvusClient("https://{cluster-id}.{region}.vectordb.zilliz.com.cn:19530");
const schema = [
{
name: "id",
data_type: DataType.INT64,
is_primary_key: true,
auto_id: true,
},
{
name: "title",
data_type: DataType.VARCHAR,
max_length: 512,
},
{
name: "author",
data_type: DataType.VARCHAR,
max_length: 512,
},
{
name: "year_of_publication",
data_type: DataType.INT64,
},
{
name: "title_vector",
data_type: DataType.FLOAT_VECTOR,
dim: 5,
},
// highlight-start
{
name: "chunks",
data_type: DataType.ARRAY,
element_type: DataType.STRUCT,
fields: [
{
name: "text",
data_type: DataType.VARCHAR,
max_length: 65535,
},
{
name: "chapter",
data_type: DataType.VARCHAR,
max_length: 512,
},
{
name: "text_vector",
data_type: DataType.FLOAT_VECTOR,
dim: 5,
mmap_enabled: true,
},
],
max_capacity: 1000,
},
// highlight-end
];
# restful
SCHEMA='{
"autoID": true,
"fields": [
{
"fieldName": "id",
"dataType": "Int64",
"isPrimary": true
},
{
"fieldName": "title",
"dataType": "VarChar",
"elementTypeParams": { "max_length": "512" }
},
{
"fieldName": "author",
"dataType": "VarChar",
"elementTypeParams": { "max_length": "512" }
},
{
"fieldName": "year_of_publication",
"dataType": "Int64"
},
{
"fieldName": "title_vector",
"dataType": "FloatVector",
"elementTypeParams": { "dim": "5" }
}
],
"structArrayFields": [
{
"name": "chunks",
"description": "Array of document chunks with text and vectors",
"elementTypeParams":{
"max_capacity": 1000
},
"fields": [
{
"fieldName": "text",
"dataType": "VarChar",
"elementTypeParams": { "max_length": "65535" }
},
{
"fieldName": "chapter",
"dataType": "VarChar",
"elementTypeParams": { "max_length": "512" }
},
{
"fieldName": "text_vector",
"dataType": "FloatVector",
"elementTypeParams": {
"dim": "5",
"mmap_enabled": "true"
}
}
]
}
]
}'
上述高亮部分演示了如何在 Collection Schema 中添加一个 Struct Array 字段。
设置索引参数
您需要为每个向量字段创建索引,无论该向量字段在 Collection Schema 中,还是在 Struct Array 字段中的 Struct 元素中。
如需为一个 EmbeddingList 创建索引,您需要将其索引类型设置为 AUTOINDEX ,然后使用 MAX_SIM_COSINE 作为相似度类型,以便让 Zilliz Cloud clusters 度量两个 EmbedingList 的相似度。
- Python
- Java
- Go
- NodeJS
- cURL
# Create index parameters
index_params = client.prepare_index_params()
# Create an index for the vector field in the collection
index_params.add_index(
field_name="title_vector",
index_type="AUTOINDEX",
metric_type="L2",
)
# highlight-start
# Create an index for the vector field in the element Struct
index_params.add_index(
field_name="chunks[text_vector]",
index_type="AUTOINDEX",
metric_type="MAX_SIM_COSINE",
)
# highlight-end
import io.milvus.v2.common.IndexParam;
List<IndexParam> indexParams = new ArrayList<>();
indexParams.add(IndexParam.builder()
.fieldName("title_vector")
.indexType(IndexParam.IndexType.AUTOINDEX)
.metricType(IndexParam.MetricType.L2)
.build());
indexParams.add(IndexParam.builder()
.fieldName("chunks[text_vector]")
.indexType(IndexParam.IndexType.AUTOINDEX)
.metricType(IndexParam.MetricType.MAX_SIM_COSINE)
.build());
// go
await milvusClient.createCollection({
collection_name: "books",
fields: schema,
});
const indexParams = [
{
field_name: "title_vector",
index_type: "AUTOINDEX",
metric_type: "L2",
},
// highlight-start
{
field_name: "chunks[text_vector]",
index_type: "AUTOINDEX",
metric_type: "MAX_SIM_COSINE",
},
// highlight-end
];
# restful
INDEX_PARAMS='[
{
"fieldName": "title_vector",
"indexName": "title_vector_index",
"indexType": "AUTOINDEX",
"metricType": "L2"
},
{
"fieldName": "chunks[text_vector]",
"indexName": "chunks_text_vector_index",
"indexType": "AUTOINDEX",
"metricType": "MAX_SIM_COSINE"
}
]'
创建 Collection
当 Schema 和索引参数都准备就绪后,您就可以创建一个带有 Struct Array 字段的 Collection 了。
- Python
- Java
- Go
- NodeJS
- cURL
client.create_collection(
collection_name="my_collection",
schema=schema,
index_params=index_params
)
import io.milvus.v2.service.collection.request.CreateCollectionReq;
CreateCollectionReq requestCreate = CreateCollectionReq.builder()
.collectionName("my_collection")
.collectionSchema(collectionSchema)
.indexParams(indexParams)
.build();
client.createCollection(requestCreate);
// go
await milvusClient.createCollection({
collection_name: "my_collection",
fields: schema,
indexes: indexParams,
});
# restful
curl -X POST "https://{cluster-id}.{region}.vectordb.zilliz.com.cn:19530/v2/vectordb/collections/create" \
-H "Content-Type: application/json" \
-H "Request-Timeout: 10" \
-d "{
\"collectionName\": \"my_collection\",
\"description\": \"A collection for storing book information with struct array chunks\",
\"schema\": $SCHEMA,
\"indexParams\": $INDEX_PARAMS
}"
插入数据
在 Collection 创建成功后,您可以参考如下代码向该 Collection 中插入数据。
- Python
- Java
- Go
- NodeJS
- cURL
# Sample data
data = {
'title': 'Walden',
'title_vector': [0.1, 0.2, 0.3, 0.4, 0.5],
'author': 'Henry David Thoreau',
'year_of_publication': 1845,
'chunks': [
{
'text': 'When I wrote the following pages, or rather the bulk of them...',
'text_vector': [0.3, 0.2, 0.3, 0.2, 0.5],
'chapter': 'Economy',
},
{
'text': 'I would fain say something, not so much concerning the Chinese and...',
'text_vector': [0.7, 0.4, 0.2, 0.7, 0.8],
'chapter': 'Economy'
}
]
}
# insert data
client.insert(
collection_name="my_collection",
data=[data]
)
import com.google.gson.Gson;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import io.milvus.v2.service.vector.request.InsertReq;
import io.milvus.v2.service.vector.response.InsertResp;
Gson gson = new Gson();
JsonObject row = new JsonObject();
row.addProperty("title", "Walden");
row.add("title_vector", gson.toJsonTree(Arrays.asList(0.1f, 0.2f, 0.3f, 0.4f, 0.5f)));
row.addProperty("author", "Henry David Thoreau");
row.addProperty("year_of_publication", 1845);
JsonArray structArr = new JsonArray();
JsonObject struct1 = new JsonObject();
struct1.addProperty("text", "When I wrote the following pages, or rather the bulk of them...");
struct1.add("text_vector", gson.toJsonTree(Arrays.asList(0.3f, 0.2f, 0.3f, 0.2f, 0.5f)));
struct1.addProperty("chapter", "Economy");
structArr.add(struct1);
JsonObject struct2 = new JsonObject();
struct2.addProperty("text", "I would fain say something, not so much concerning the Chinese and...");
struct2.add("text_vector", gson.toJsonTree(Arrays.asList(0.7f, 0.4f, 0.2f, 0.7f, 0.8f)));
struct2.addProperty("chapter", "Economy");
structArr.add(struct2);
row.add("chunks", structArr);
InsertResp insertResp = client.insert(InsertReq.builder()
.collectionName("my_collection")
.data(Collections.singletonList(row))
.build());
// go
{
id: 0,
title: "Walden",
title_vector: [0.1, 0.2, 0.3, 0.4, 0.5],
author: "Henry David Thoreau",
"year-of-publication": 1845,
chunks: [
{
text: "When I wrote the following pages, or rather the bulk of them...",
text_vector: [0.3, 0.2, 0.3, 0.2, 0.5],
chapter: "Economy",
},
{
text: "I would fain say something, not so much concerning the Chinese and...",
text_vector: [0.7, 0.4, 0.2, 0.7, 0.8],
chapter: "Economy",
},
],
},
];
await milvusClient.insert({
collection_name: "books",
data: data,
});
# restful
curl -X POST "https://{cluster-id}.{region}.vectordb.zilliz.com.cn:19530/v2/vectordb/entities/insert" \
-H "Content-Type: application/json" \
-H "Request-Timeout: 10" \
-d '{
"collectionName": "my_collection",
"data": [
{
"title": "Walden",
"title_vector": [0.1, 0.2, 0.3, 0.4, 0.5],
"author": "Henry David Thoreau",
"year_of_publication": 1845,
"chunks": [
{
"text": "When I wrote the following pages, or rather the bulk of them...",
"text_vector": [0.3, 0.2, 0.3, 0.2, 0.5],
"chapter": "Economy"
},
{
"text": "I would fain say something, not so much concerning the Chinese and...",
"text_vector": [0.7, 0.4, 0.2, 0.7, 0.8],
"chapter": "Economy"
}
]
}
]
}'
还需要更多数据?
import json
import random
from typing import List, Dict, Any
# Real classic books (title, author, year)
BOOKS = [
("Pride and Prejudice", "Jane Austen", 1813),
("Moby Dick", "Herman Melville", 1851),
("Frankenstein", "Mary Shelley", 1818),
("The Picture of Dorian Gray", "Oscar Wilde", 1890),
("Dracula", "Bram Stoker", 1897),
("The Adventures of Sherlock Holmes", "Arthur Conan Doyle", 1892),
("Alice's Adventures in Wonderland", "Lewis Carroll", 1865),
("The Time Machine", "H.G. Wells", 1895),
("The Scarlet Letter", "Nathaniel Hawthorne", 1850),
("Leaves of Grass", "Walt Whitman", 1855),
("The Brothers Karamazov", "Fyodor Dostoevsky", 1880),
("Crime and Punishment", "Fyodor Dostoevsky", 1866),
("Anna Karenina", "Leo Tolstoy", 1877),
("War and Peace", "Leo Tolstoy", 1869),
("Great Expectations", "Charles Dickens", 1861),
("Oliver Twist", "Charles Dickens", 1837),
("Wuthering Heights", "Emily Brontë", 1847),
("Jane Eyre", "Charlotte Brontë", 1847),
("The Call of the Wild", "Jack London", 1903),
("The Jungle Book", "Rudyard Kipling", 1894),
]
# Common chapter names for classics
CHAPTERS = [
"Introduction", "Prologue", "Chapter I", "Chapter II", "Chapter III",
"Chapter IV", "Chapter V", "Chapter VI", "Chapter VII", "Chapter VIII",
"Chapter IX", "Chapter X", "Epilogue", "Conclusion", "Afterword",
"Economy", "Where I Lived", "Reading", "Sounds", "Solitude",
"Visitors", "The Bean-Field", "The Village", "The Ponds", "Baker Farm"
]
# Placeholder text snippets (mimicking 19th-century prose)
TEXT_SNIPPETS = [
"When I wrote the following pages, or rather the bulk of them...",
"I would fain say something, not so much concerning the Chinese and...",
"It is a truth universally acknowledged, that a single man in possession...",
"Call me Ishmael. Some years ago—never mind how long precisely...",
"It was the best of times, it was the worst of times...",
"All happy families are alike; each unhappy family is unhappy in its own way.",
"Whether I shall turn out to be the hero of my own life, or whether that station...",
"You will rejoice to hear that no disaster has accompanied the commencement...",
"The world is too much with us; late and soon, getting and spending...",
"He was an old man who fished alone in a skiff in the Gulf Stream..."
]
def random_vector() -> List[float]:
return [round(random.random(), 1) for _ in range(5)]
def generate_chunk() -> Dict[str, Any]:
return {
"text": random.choice(TEXT_SNIPPETS),
"text_vector": random_vector(),
"chapter": random.choice(CHAPTERS)
}
def generate_record(record_id: int) -> Dict[str, Any]:
title, author, year = random.choice(BOOKS)
num_chunks = random.randint(1, 5) # 1 to 5 chunks per book
chunks = [generate_chunk() for _ in range(num_chunks)]
return {
"title": title,
"title_vector": random_vector(),
"author": author,
"year_of_publication": year,
"chunks": chunks
}
# Generate 1000 records
data = [generate_record(i) for i in range(1000)]
# Insert the generated data
client.insert(collection_name="my_collection", data=data)
针对 Struct Array 字段进行向量搜索
您可以如同在 Collection 中的向量字段上进行相似性搜索一样在 Struct Array 字段中的向量字段上进行相同操作。
值得注意的是,在搜索请求中指定向量字段名称(anns_field)时,需要按下方代码中的方式拼接 Struct Array 字段的名称及需要搜索的 Struct 元素中的向量字段名称。并使用 EmbeddingList 对象组织查询向量。
Zilliz Cloud 提供了 EmbeddingList 对象,帮助您在针对 Struct Array 字段中的 EmbeddingList 进行相似性搜索时组织查询向量。每个 EmbeddingList 需要包含至少一个向量,并返回一组结果。
EmbeddingList 仅可用于 search() 请求中。但不支持 Range Search 或 Grouping Search。也不支持在 search_iterator() 中使用。
- Python
- Java
- Go
- NodeJS
- cURL
from pymilvus.client.embedding_list import EmbeddingList
# each query embedding list triggers a single search
embeddingList1 = EmbeddingList()
embeddingList1.add([0.2, 0.9, 0.4, -0.3, 0.2])
embeddingList2 = EmbeddingList()
embeddingList2.add([-0.2, -0.2, 0.5, 0.6, 0.9])
embeddingList2.add([-0.4, 0.3, 0.5, 0.8, 0.2])
# a search with a single embedding list
results = client.search(
collection_name="my_collection",
data=[ embeddingList1 ],
anns_field="chunks[text_vector]",
search_params={"metric_type": "MAX_SIM_COSINE"},
limit=3,
output_fields=["chunks[text]"]
)
import io.milvus.v2.service.vector.request.data.EmbeddingList;
import io.milvus.v2.service.vector.request.data.FloatVec;
EmbeddingList embeddingList1 = new EmbeddingList();
embeddingList1.add(new FloatVec(new float[]{0.2f, 0.9f, 0.4f, -0.3f, 0.2f}));
EmbeddingList embeddingList2 = new EmbeddingList();
embeddingList2.add(new FloatVec(new float[]{-0.2f, -0.2f, 0.5f, 0.6f, 0.9f}));
embeddingList2.add(new FloatVec(new float[]{-0.4f, 0.3f, 0.5f, 0.8f, 0.2f}));
Map<String, Object> params = new HashMap<>();
params.put("metric_type", "MAX_SIM_COSINE");
SearchResp searchResp = client.search(SearchReq.builder()
.collectionName("my_collection")
.annsField("chunks[text_vector]")
.data(Collections.singletonList(embeddingList1))
.searchParams(params)
.limit(3)
.outputFields(Collections.singletonList("chunks[text]"))
.build());
// go
const embeddingList1 = [[0.2, 0.9, 0.4, -0.3, 0.2]];
const embeddingList2 = [
[-0.2, -0.2, 0.5, 0.6, 0.9],
[-0.4, 0.3, 0.5, 0.8, 0.2],
];
const results = await milvusClient.search({
collection_name: "books",
data: embeddingList1,
anns_field: "chunks[text_vector]",
search_params: { metric_type: "MAX_SIM_COSINE" },
limit: 3,
output_fields: ["chunks[text]"],
});
# restful
embeddingList1='[[0.2,0.9,0.4,-0.3,0.2]]'
embeddingList2='[[-0.2,-0.2,0.5,0.6,0.9],[-0.4,0.3,0.5,0.8,0.2]]'
curl -X POST "https://{cluster-id}.{region}.vectordb.zilliz.com.cn:19530/v2/vectordb/entities/search" \
-H "Content-Type: application/json" \
-H "Request-Timeout: 10" \
-d "{
\"collectionName\": \"my_collection\",
\"data\": [$embeddingList1],
\"annsField\": \"chunks[text_vector]\",
\"searchParams\": {\"metric_type\": \"MAX_SIM_COSINE\"},
\"limit\": 3,
\"outputFields\": [\"chunks[text]\"]
}"
在上述代码示例中,chunks[text_vector] 指代了 Struct 元素中的 text_vector 字段。您可以使用该格式来定义 anns_field 和 output_fields。
执行成功后,搜索结果为一个包含了三个与查询 EmbeddingList 最相似的三个 Entity。
搜索结果
# [
# [
# {
# 'id': 461417939772144945,
# 'distance': 0.9675756096839905,
# 'entity': {
# 'chunks': [
# {'text': 'The world is too much with us; late and soon, getting and spending...'},
# {'text': 'All happy families are alike; each unhappy family is unhappy in its own way.'}
# ]
# }
# },
# {
# 'id': 461417939772144965,
# 'distance': 0.9555778503417969,
# 'entity': {
# 'chunks': [
# {'text': 'Call me Ishmael. Some years ago—never mind how long precisely...'},
# {'text': 'He was an old man who fished alone in a skiff in the Gulf Stream...'},
# {'text': 'When I wrote the following pages, or rather the bulk of them...'},
# {'text': 'It was the best of times, it was the worst of times...'},
# {'text': 'The world is too much with us; late and soon, getting and spending...'}
# ]
# }
# },
# {
# 'id': 461417939772144962,
# 'distance': 0.9469035863876343,
# 'entity': {
# 'chunks': [
# {'text': 'Call me Ishmael. Some years ago—never mind how long precisely...'},
# {'text': 'The world is too much with us; late and soon, getting and spending...'},
# {'text': 'He was an old man who fished alone in a skiff in the Gulf Stream...'},
# {'text': 'Call me Ishmael. Some years ago—never mind how long precisely...'},
# {'text': 'The world is too much with us; late and soon, getting and spending...'}
# ]
# }
# }
# ]
# ]
您也可以在搜索请求中的 data 参数中添加多个查询 EmbeddingList,以便通过该请求中发起多个相似性搜索。
- Python
- Java
- Go
- NodeJS
- cURL
# a search with multiple embedding lists
results = client.search(
collection_name="my_collection",
data=[ embeddingList1, embeddingList2 ],
anns_field="chunks[text_vector]",
search_params={"metric_type": "MAX_SIM_COSINE"},
limit=3,
output_fields=["chunks[text]"]
)
print(results)
Map<String, Object> params = new HashMap<>();
params.put("metric_type", "MAX_SIM_COSINE");
SearchResp searchResp = client.search(SearchReq.builder()
.collectionName("my_collection")
.annsField("chunks[text_vector]")
.data(Arrays.asList(embeddingList1, embeddingList2))
.searchParams(params)
.limit(3)
.outputFields(Collections.singletonList("chunks[text]"))
.build());
List<List<SearchResp.SearchResult>> searchResults = searchResp.getSearchResults();
for (int i = 0; i < searchResults.size(); i++) {
System.out.println("Results of No." + i + " embedding list");
List<SearchResp.SearchResult> results = searchResults.get(i);
for (SearchResp.SearchResult result : results) {
System.out.println(result);
}
}
// go
const results2 = await milvusClient.search({
collection_name: "books",
data: [embeddingList1, embeddingList2],
anns_field: "chunks[text_vector]",
search_params: { metric_type: "MAX_SIM_COSINE" },
limit: 3,
output_fields: ["chunks[text]"],
});
# restful
curl -X POST "https://{cluster-id}.{region}.vectordb.zilliz.com.cn:19530/v2/vectordb/entities/search" \
-H "Content-Type: application/json" \
-H "Request-Timeout: 10" \
-d "{
\"collectionName\": \"my_collection\",
\"data\": [$embeddingList1, $embeddingList2],
\"annsField\": \"chunks[text_vector]\",
\"searchParams\": {\"metric_type\": \"MAX_SIM_COSINE\"},
\"limit\": 3,
\"outputFields\": [\"chunks[text]\"]
}"
此时,搜索结果中会返回与每一个查询 EmbeddingList 最相似的三个 Entity。
搜索结果
# [
# [
# {
# 'id': 461417939772144945,
# 'distance': 0.9675756096839905,
# 'entity': {
# 'chunks': [
# {'text': 'The world is too much with us; late and soon, getting and spending...'},
# {'text': 'All happy families are alike; each unhappy family is unhappy in its own way.'}
# ]
# }
# },
# {
# 'id': 461417939772144965,
# 'distance': 0.9555778503417969,
# 'entity': {
# 'chunks': [
# {'text': 'Call me Ishmael. Some years ago—never mind how long precisely...'},
# {'text': 'He was an old man who fished alone in a skiff in the Gulf Stream...'},
# {'text': 'When I wrote the following pages, or rather the bulk of them...'},
# {'text': 'It was the best of times, it was the worst of times...'},
# {'text': 'The world is too much with us; late and soon, getting and spending...'}
# ]
# }
# },
# {
# 'id': 461417939772144962,
# 'distance': 0.9469035863876343,
# 'entity': {
# 'chunks': [
# {'text': 'Call me Ishmael. Some years ago—never mind how long precisely...'},
# {'text': 'The world is too much with us; late and soon, getting and spending...'},
# {'text': 'He was an old man who fished alone in a skiff in the Gulf Stream...'},
# {'text': 'Call me Ishmael. Some years ago—never mind how long precisely...'},
# {'text': 'The world is too much with us; late and soon, getting and spending...'}
# ]
# }
# }
# ],
# [
# {
# 'id': 461417939772144663,
# 'distance': 1.9761409759521484,
# 'entity': {
# 'chunks': [
# {'text': 'It was the best of times, it was the worst of times...'},
# {'text': 'It is a truth universally acknowledged, that a single man in possession...'},
# {'text': 'Whether I shall turn out to be the hero of my own life, or whether that station...'},
# {'text': 'He was an old man who fished alone in a skiff in the Gulf Stream...'}
# ]
# }
# },
# {
# 'id': 461417939772144692,
# 'distance': 1.974656581878662,
# 'entity': {
# 'chunks': [
# {'text': 'It is a truth universally acknowledged, that a single man in possession...'},
# {'text': 'Call me Ishmael. Some years ago—never mind how long precisely...'}
# ]
# }
# },
# {
# 'id': 461417939772144662,
# 'distance': 1.9406685829162598,
# 'entity': {
# 'chunks': [
# {'text': 'It is a truth universally acknowledged, that a single man in possession...'}
# ]
# }
# }
# ]
# ]
在上述示例中,embeddingList1 包含了一个向量,而 embeddingList2 包含了两个向量。每个 EmbeddingList 都会触发一次独立的相似性搜索并返回 topK 个与之最相似的 Entity。
针对 Struct Array 字段进行标量过滤
您可以使用 Element Filter 和 Match 系列运算符,对 StructArray 中的标量子字段进行标量过滤。有关上述两类运算符的更多详细信息和示例,请参阅 StructArray 操作符。
Element Filter
这是 Entity 级别的过滤条件,用于检查某个 Entity 的 StructArray 字段中是否至少有一个元素满足谓词。例如,以下 Element Filter 会返回 text 子字段中至少有一个 Chunk 以 "Red" 开头的 Entity。
element_filter(chunks, $[text] LIKE "Red%")
您几乎可以在谓词中使用所有比较、范围和算术运算符。该谓词会逐元素求值,并且可以使用逻辑运算符组合针对同一元素的多个条件。有关详细信息,请参阅 基本操作符。
如果 Filtered Search 或 Query 请求中存在多个标量过滤表达式,请将 Element Filter 表达式放在所有 Entity 级别过滤表达式之后,如下所示。
# correct
id > 0 && element_filter(chunks, $[x] > 1)
# incorrect, resulting errors
element_filter(chunks, $[x] > 1) && id > 0
Match 系列运算符
Match 系列运算符也适用于 StructArray 字段。它们不只是检查是否存在满足条件的元素,还可以指定必须有多少个元素(或多大比例的元素)满足元素谓词。
-
MATCH_ANY(chunks, $[text] LIKE "Red%")返回
text子字段中至少有一个 Chunk 以"Red"开头的 Entity;从语义上看,这等价于element_filter。 -
MATCH_ALL(chunks, $[text] LIKE "Red%")返回所有 Chunk 的
text子字段都以"Red"开头的 Entity。 -
MATCH_LEAST(chunks, $[text] LIKE "Red%", k)返回
text子字段中至少有k个 Chunk 以"Red"开头的 Entity。 -
MATCH_MOST(chunks, $[text] LIKE "Red%", k)返回
text子字段中至多有k个 Chunk 以"Red"开头的 Entity。 -
MATCH_EXACT(chunks, $[text] LIKE "Red%", k)返回
text子字段中恰好有k个 Chunk 以"Red"开头的 Entity。
后续步骤
支持 Struct Array 对于 Zilliz Cloud 来说是一个大的跨越,提升了其处理复杂数据结构的能力。为了更好地理解 Struct Array 的使用场景,并最大化该特性带来的收益,建议您阅读使用 Struct Array 进行 Schema 设计。