This document explains how to configure upstream endpoints for Large Language Models (LLMs) within your gateway’s routing configuration.
Each endpoint within a cluster is defined by an id and can contain an llm_meta block for custom behavior. For example:
```yaml
clusters:
  - name: "my_llm_cluster"
    endpoints:
      - id: "provider-1-main"
        socket_address:
          domains:
            - api.deepseek.com
        llm_meta:
          # ... other LLM-specific configuration goes here ...
      - id: "provider-2-fallback"
        socket_address:
          domains:
            - api.openai.com/v1
        llm_meta:
          # ... other LLM-specific configuration goes here ...
```
## llm_meta Configuration Fields

The llm_meta block holds all the configuration specific to how the gateway should treat this LLM endpoint.
### fallback

Type: boolean
Description: When true and this endpoint fails, the gateway attempts the next available endpoint in the cluster. When false and this endpoint fails, processing stops and the last error is returned to the client.

### api_key

Type: string
Description: The API key used to authenticate requests to this endpoint.

### retry_policy
Type: object
Description: An object that defines the retry strategy to use if a request to this endpoint fails.
The retry_policy object contains the following fields:
#### name

Type: string
Description: The name of the built-in retry policy to use.

#### config

Type: object
Description: An object containing settings specific to the named policy.
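Taken together, a minimal llm_meta block combining these fields might look like the following sketch (the API key and policy values here are illustrative placeholders):

```yaml
llm_meta:
  # Try the next endpoint in the cluster if this one fails.
  fallback: true
  # Credential for this provider (placeholder value).
  api_key: "your_api_key_here"
  # Retry twice with no delay between attempts.
  retry_policy:
    name: "CountBased"
    config:
      times: 2
```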
Here are the built-in retry policies you can specify by name.

### CountBased
This policy retries a fixed number of times with no delay between attempts.
Example:
```yaml
retry_policy:
  name: "CountBased"
  config:
    times: 2
```
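With times: 2, a failing request is retried twice after the initial attempt (up to three requests in total), with no delay between attempts.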
### ExponentialBackoff
This policy retries a specified number of times, increasing the delay between each subsequent retry. This is the recommended strategy for handling rate limits and transient network issues.
Example:
```yaml
retry_policy:
  name: "ExponentialBackoff"
  config:
    times: 3
    initialInterval: 200ms
    maxInterval: 5s
    multiplier: 2.0
```
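Assuming standard exponential backoff semantics, the delays before the three retries here would be roughly 200ms, 400ms, and 800ms (each previous delay times multiplier), with any single delay capped at maxInterval (5s).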
### NoRetry

This is the default retry policy. It performs only the initial attempt and does not retry if that attempt fails.
This policy does not require a config block.
Example:
```yaml
retry_policy:
  name: "NoRetry"
```
## Complete Example

This example shows a cluster with two endpoints. The first (deepseek-primary) uses an ExponentialBackoff retry policy and will fall back to the next endpoint on failure. The second (openai-fallback) is the final endpoint; it uses a simple CountBased retry policy with fallback disabled.
```yaml
clusters:
  - name: deepseek_cluster
    lb_policy: lb # No need to enable load balancing for this cluster.
    endpoints:
      # --- Primary Endpoint ---
      - id: deepseek-primary
        socket_address:
          domains:
            - api.deepseek.com
        llm_meta:
          # If all retries fail, move to the next endpoint.
          fallback: true
          # Your API key for this endpoint.
          api_key: "your_deepseek_api_key_here"
          # Use a robust retry strategy for the primary endpoint.
          retry_policy:
            name: ExponentialBackoff
            config:
              times: 3
              initialInterval: 200ms
              maxInterval: 8s
              multiplier: 2.5
      # --- Fallback Endpoint ---
      - id: openai-fallback
        socket_address:
          domains:
            - api.openai.com/v1
        llm_meta:
          # This is the last resort; do not fall back further.
          fallback: false
          # Your API key for this endpoint.
          api_key: "your_openai_api_key_here"
          # Use a simpler, faster retry for the fallback.
          retry_policy:
            name: CountBased
            config:
              times: 1
```
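With this configuration, a request first goes to deepseek-primary, which retries up to three times with exponential backoff. If all of those attempts fail, fallback: true sends the request on to openai-fallback, which retries once more; if that also fails, fallback: false stops the process and the last error is returned to the client.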