Version: BYOC Developer Guide

vLLM Ranker
Contact sales to enable BYOC

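vLLM Ranker uses the vLLM inference framework to improve search relevance through semantic reranking. Instead of relying on vector similarity alone, it reorders search results by having a reranking model score each result's text against the query.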

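vLLM Ranker is especially valuable in applications where precision and context are critical, such as:

Technical documentation search that requires a deep understanding of concepts
Research databases where semantic relationships matter more than keyword matches
Customer support systems that need to match user questions with relevant solutions
E-commerce search that must understand product attributes and user intent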

Prerequisites

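Before implementing vLLM Ranker in Zilliz Cloud, make sure you have:

A Zilliz Cloud collection with a VARCHAR field that contains the text to be reranked

A running vLLM service with reranking capabilities. For detailed setup instructions, refer to the official vLLM documentation. To verify that the vLLM service is available, you can send a request like the following: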

# Replace YOUR_VLLM_ENDPOINT_URL with the actual URL (e.g., http://<service-ip>:<port>/v1/rerank)
# Replace 'BAAI/bge-reranker-base' if you deployed a different model

curl -X 'POST' \
  'YOUR_VLLM_ENDPOINT_URL' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "BAAI/bge-reranker-base",
  "query": "What is the capital of France?",
  "documents": [
    "The capital of Brazil is Brasilia.",
    "The capital of France is Paris.",
    "Horses and cows are both animals"
  ]
}'

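A successful response returns the documents ranked by relevance score, in a format similar to the rerank API response of vLLM's OpenAI-compatible server.

For more server parameters and options, refer to the vLLM documentation on the OpenAI-compatible server.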
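If you prefer to check the service from Python instead of cURL, the following is a minimal sketch that uses only the standard library. It sends the same request body as the cURL example and sorts the returned entries by score. The response shape it reads (a `results` list whose entries carry `relevance_score` and `document.text`) is an assumption based on vLLM's rerank API, so check it against the vLLM version you run. `rerank` and `ranked_documents` are hypothetical helper names.

```python
import json
import urllib.request


def rerank(url, query, documents, model="BAAI/bge-reranker-base", timeout=10.0):
    """POST a rerank request to a vLLM server and return the decoded JSON body."""
    payload = json.dumps(
        {"model": model, "query": query, "documents": documents}
    ).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def ranked_documents(body):
    """Return (relevance_score, text) pairs, highest score first.

    Assumption: each entry in body["results"] has "relevance_score" and
    "document": {"text": ...}; adjust the keys if your vLLM version differs.
    """
    pairs = [(r["relevance_score"], r["document"]["text"]) for r in body["results"]]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)
```

For example, `ranked_documents(rerank("http://<service-ip>:<port>/v1/rerank", "What is the capital of France?", ["The capital of Brazil is Brasilia.", "The capital of France is Paris."]))` would be expected to list the Paris sentence first when the service is working.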

Create a vLLM Ranker function

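To use vLLM Ranker in your Zilliz Cloud application, create a Function object that specifies how reranking should be performed. You then pass this function to a Zilliz Cloud search request to rerank the results.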

from pymilvus import MilvusClient, Function, FunctionType

# Connect to your Zilliz Cloud cluster
client = MilvusClient(
    uri="YOUR_CLUSTER_ENDPOINT",  # Replace with your cluster endpoint
    token="YOUR_CLUSTER_TOKEN"    # Replace with your cluster token (API key)
)

# Create a vLLM Ranker function
vllm_ranker = Function(
    name="vllm_semantic_ranker",    # Choose a descriptive name
    input_field_names=["document"],  # Field containing text to rerank
    function_type=FunctionType.RERANK,  # Must be RERANK
    params={
        "reranker": "model",        # Specifies model-based reranking
        "provider": "vllm",         # Specifies vLLM service
        "queries": ["AI Research Progress", "What is AI"],  # One query string per search query
        "endpoint": "http://localhost:8080",  # vLLM service address
        "max_client_batch_size": 32,              # Optional: batch size
        "truncate_prompt_tokens": 256,  # Optional: Use last 256 tokens
    }
)
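Because the `params` dictionary is untyped, a typo or a missing key only surfaces when the search runs. The sketch below is a hypothetical helper, `build_vllm_ranker_params`, that assembles the dictionary from the keys shown above and runs a few local sanity checks. It is not part of pymilvus, and its checks are assumptions about reasonable values, not the server's actual validation rules.

```python
def build_vllm_ranker_params(queries, endpoint, max_client_batch_size=None,
                             truncate_prompt_tokens=None):
    """Assemble a params dict for a vLLM Ranker Function (hypothetical helper)."""
    # A bare string would otherwise be split into characters by list().
    if isinstance(queries, str) or not queries or not all(
        isinstance(q, str) and q for q in queries
    ):
        raise ValueError("queries must be a non-empty list of non-empty strings")
    if not endpoint.startswith(("http://", "https://")):
        raise ValueError("endpoint must be an http(s) URL")

    params = {
        "reranker": "model",  # required: model-based reranking
        "provider": "vllm",   # required: use a vLLM service
        "queries": list(queries),
        "endpoint": endpoint,
    }
    # Optional keys are included only when set, so defaults apply otherwise.
    for key, value in (("max_client_batch_size", max_client_batch_size),
                       ("truncate_prompt_tokens", truncate_prompt_tokens)):
        if value is not None:
            if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
                raise ValueError(f"{key} must be a positive integer")
            params[key] = value
    return params
```

You would then pass the result to the constructor, for example `Function(name="vllm_semantic_ranker", input_field_names=["document"], function_type=FunctionType.RERANK, params=build_vllm_ranker_params(["AI Research Progress", "What is AI"], "http://localhost:8080", 32, 256))`.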

vLLM Ranker-specific parameters

The following parameters are specific to vLLM Ranker:

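Parameter	Required?	Description	Value / Example
`reranker`	Yes	Must be set to `"model"` to enable model-based reranking.	`"model"`
`provider`	Yes	The model service provider to use for reranking.	`"vllm"`
`queries`	Yes	List of query strings that the reranking model uses to calculate relevance scores. The number of query strings must exactly match the number of queries in your search operation (even when you search with query vectors instead of text); otherwise, an error is returned.	`["search query"]`
`endpoint`	Yes	The address of your vLLM service.	`"http://localhost:8080"`
`max_client_batch_size`	No	Because the model service may not be able to process all data at once, this sets the batch size used when sending data to the model service across multiple requests.	`32` (default)
`truncate_prompt_tokens`	No	If set to an integer k, only the last k tokens of the prompt are used (left truncation). Defaults to None (no truncation).	`256`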
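The two optional parameters can be pictured as follows. This is a conceptual illustration only, not the code that Zilliz Cloud or vLLM actually runs: documents are split into consecutive batches of at most `max_client_batch_size`, and left truncation keeps only the last k tokens. Plain Python lists stand in for real tokenizer output, and `split_into_batches` and `left_truncate` are hypothetical names.

```python
def split_into_batches(documents, max_client_batch_size=32):
    """Split documents into consecutive batches of at most max_client_batch_size items."""
    if max_client_batch_size <= 0:
        raise ValueError("max_client_batch_size must be a positive integer")
    return [
        documents[start:start + max_client_batch_size]
        for start in range(0, len(documents), max_client_batch_size)
    ]


def left_truncate(tokens, truncate_prompt_tokens=None):
    """Keep only the last k tokens (left truncation); None disables truncation."""
    if truncate_prompt_tokens is None:
        return list(tokens)
    if truncate_prompt_tokens <= 0:
        # tokens[-0:] would return everything, so reject non-positive k explicitly.
        raise ValueError("truncate_prompt_tokens must be a positive integer")
    return list(tokens[-truncate_prompt_tokens:])
```

With the default batch size of 32, for instance, 70 candidate documents would be sent in three requests of 32, 32, and 6 documents. A prompt shorter than k tokens is left unchanged by left truncation.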

📘Notes

For general parameters shared by all model rankers (for example, `provider` and `queries`), refer to Create Model Ranker.

Use in standard vector search

To apply vLLM Ranker to a standard vector search:

# Execute search with vLLM reranking
results = client.search(
    collection_name="your_collection",
    data=["AI Research Progress", "What is AI"],  # Search queries
    anns_field="dense_vector",                   # Vector field to search
    limit=5,                                     # Number of results to return
    output_fields=["document"],                  # Include text field for reranking
    ranker=vllm_ranker,                         # Apply vLLM reranking
    consistency_level="Bounded"
)
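`client.search` returns one list of hits per search query, so the two queries above yield two ranked lists. The sketch below, a hypothetical helper named `top_documents`, flattens them into `(id, score, text)` tuples. It assumes each hit exposes `id`, `distance`, and `entity` keys, as pymilvus hits typically do, and that `distance` carries the reranked score when a ranker is applied; verify both against your pymilvus version. The text field is available under `entity` only because it is listed in `output_fields`.

```python
def top_documents(results, text_field="document"):
    """Return, for each search query, a list of (id, score, text) tuples in ranked order."""
    return [
        # Assumed hit keys: "id", "distance" (score), "entity" (output fields).
        [(hit["id"], hit["distance"], hit["entity"][text_field]) for hit in hits]
        for hits in results  # one list of hits per search query
    ]
```

For example, `for rank, (pk, score, text) in enumerate(top_documents(results)[0], start=1): print(rank, pk, score, text)` would print the reranked hits for the first query, "AI Research Progress".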

Apply to hybrid search

vLLM Ranker can also be used with hybrid search to combine dense and sparse retrieval:

from pymilvus import AnnSearchRequest

# Configure dense vector search
dense_search = AnnSearchRequest(
    data=["AI Research Progress", "What is AI"],
    anns_field="dense_vector",
    param={},
    limit=5
)

# Configure sparse vector search  
sparse_search = AnnSearchRequest(
    data=["AI Research Progress", "What is AI"],
    anns_field="sparse_vector", 
    param={},
    limit=5
)

# Execute hybrid search with vLLM reranking
hybrid_results = client.hybrid_search(
    collection_name="your_collection",
    reqs=[dense_search, sparse_search],         # Multiple search requests
    ranker=vllm_ranker,                         # Apply vLLM reranking to combined results
    limit=5,                                    # Final number of results
    output_fields=["document"]
)
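Because the ranker's `queries` list must contain exactly as many strings as each search request has queries, it can help to check the counts before calling `hybrid_search`. The following is a minimal, hypothetical helper that applies that documented rule locally; the server still performs its own check and reports its own error.

```python
def check_query_alignment(ranker_queries, *search_data):
    """Raise ValueError unless every search batch has len(ranker_queries) queries."""
    expected = len(ranker_queries)
    for position, data in enumerate(search_data):
        if len(data) != expected:
            raise ValueError(
                f"search request {position} has {len(data)} queries, "
                f"but the ranker has {expected} query string(s)"
            )
```

For example, `check_query_alignment(ranker_queries, dense_queries, sparse_queries)`, where the three names stand for the lists you passed to the ranker's `queries` parameter and to the `data` argument of each `AnnSearchRequest`, returns silently when the counts agree.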
