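A common requirement in practice is to route requests to different model services based on the domain name used to reach the gateway. This topic describes how to use the Gateway API on the ACK inference gateway to route requests for different domain names to different model services.
Prerequisites
- The qwen and deepseek model services are deployed in the cluster, and the following two InferencePool resources are declared for them: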
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: qwen-inference-pool
  namespace: default
spec:
  selector:
    app: qwen
  targetPortNumber: 8000
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: deepseek-inference-pool
  namespace: default
spec:
  selector:
    app: deepseek
  targetPortNumber: 8000
Procedure
1. Create a Gateway resource to provision the gateway.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: inference-gateway
spec:
  gatewayClassName: ack-gateway
  infrastructure:
    parametersRef:
      group: gateway.envoyproxy.io
      kind: EnvoyProxy
      name: custom-proxy-config
  listeners:
  - allowedRoutes:
      namespaces:
        from: Same
    name: http-llm
    port: 8080
    protocol: HTTP
2. Create two HTTPRoute resources to add two HTTP routes to the gateway.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: qwen-inference-route
  namespace: default
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  hostnames:
  - "qwen.test"
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: qwen-inference-pool
      weight: 1
    matches:
    - path:
        type: PathPrefix
        value: /v1
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: deepseek-inference-route
  namespace: default
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  hostnames:
  - "deepseek.test"
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: deepseek-inference-pool
      weight: 1
    matches:
    - path:
        type: PathPrefix
        value: /v1
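The two HTTPRoutes set different values in the hostnames field, so the gateway can tell the domains apart and send each one to its own InferencePool.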
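The effect of the two hostnames entries can be pictured with a small lookup. The sketch below is only a conceptual model of the mapping declared above, not how the gateway implements matching. The function name pool_for_host is hypothetical, and the port stripping and case folding reflect the usual Gateway API hostname matching behavior as an assumption about this gateway.

```shell
# Conceptual model only: map a request's Host header to the InferencePool
# that the two HTTPRoutes above would select. pool_for_host is a hypothetical name.
pool_for_host() {
  # Drop an optional ":port" suffix (assumption: the port is ignored when matching).
  _host="${1%%:*}"
  # Hostnames are compared case-insensitively, so normalize to lowercase.
  _host="$(printf '%s' "$_host" | tr '[:upper:]' '[:lower:]')"
  case "$_host" in
    qwen.test)     echo "qwen-inference-pool" ;;
    deepseek.test) echo "deepseek-inference-pool" ;;
    *)             echo "no-matching-route"; return 1 ;;
  esac
}
```

For example, pool_for_host qwen.test prints the name of the qwen pool, while a host that neither route lists makes the function return a non-zero status.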
3. Test the routing.
export GATEWAY_HOST=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
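If the Gateway has not been assigned an address yet, the lookup above returns an empty string and the later curl commands fail in a confusing way. A minimal guard, assuming GATEWAY_HOST was exported as shown, could look like the following. The helper name require_gateway_host is hypothetical, and the kubectl wait line in the comment assumes the Gateway reports the standard Programmed condition.

```shell
# Hypothetical helper: fail early if the gateway address lookup returned nothing.
# Optionally wait for the Gateway first (assumes it reports a Programmed condition):
#   kubectl wait --for=condition=Programmed gateway/inference-gateway --timeout=120s
require_gateway_host() {
  if [ -z "${GATEWAY_HOST:-}" ]; then
    echo "GATEWAY_HOST is empty; the Gateway may not have an address yet" >&2
    return 1
  fi
  echo "Gateway address: ${GATEWAY_HOST}"
}
```

Calling require_gateway_host before the test requests makes a missing address visible immediately.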
Send a request with qwen.test in the Host header:
curl -XPOST $GATEWAY_HOST:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Host: qwen.test" \
  -d '{
    "model": "qwen",
    "messages": [{"role": "user", "content": "你是谁?"}],
    "temperature": 0.7
  }'
Expected output:
{"id":"chatcmpl-30793fc0-35bc-470e-b622-73c44312f690","object":"chat.completion","created":1761530271,"model":"qwen","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"我是来自阿里云的大规模语言模型,我叫通义千问。","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":22,"total_tokens":39,"completion_tokens":17,"prompt_tokens_details":null},"prompt_logprobs":null}
Send a request with deepseek.test in the Host header:
curl -XPOST $GATEWAY_HOST:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Host: deepseek.test" \
  -d '{
    "model": "deepseek",
    "messages": [{"role": "user", "content": "你是谁?"}],
    "temperature": 0.7
  }'
Expected output:
{"id":"chatcmpl-56027d40-0183-402c-8525-7858213573db","object":"chat.completion","created":1761530292,"model":"deepseek","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"您好!我是由中国的深度求索(DeepSeek)公司开发的智能助手DeepSeek-R1。如您有任何任何问题,我会尽我所能为您提供帮助。\n</think>\n\n您好!我是由中国的深度求索(DeepSeek)公司开发的智能助手DeepSeek-R1。如您有任何任何问题,我会尽我所能为您提供帮助。","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":8,"total_tokens":81,"completion_tokens":73,"prompt_tokens_details":null},"prompt_logprobs":null}
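To confirm that routing really depends on the domain name, it can also help to send a request whose Host header matches neither route. The sketch below prints only the HTTP status code. The helper name http_status_for_host and the unknown.test domain are hypothetical, the request body is a placeholder, and the expectation of a 404 for an unmatched host is an assumption based on common Gateway API behavior rather than something this topic verifies.

```shell
# Hypothetical helper: print only the HTTP status code returned for a given Host header.
# Assumes GATEWAY_HOST is exported as in step 3 and the listener port is 8080.
http_status_for_host() {
  curl -s -o /dev/null -w '%{http_code}' \
    -XPOST "${GATEWAY_HOST}:8080/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Host: $1" \
    -d '{"model": "qwen", "messages": [{"role": "user", "content": "ping"}]}'
}

# Usage (requires a reachable gateway):
#   http_status_for_host qwen.test      # should succeed if the qwen route works
#   http_status_for_host unknown.test   # assumption: typically 404, since no route matches
```

Printing just the status code keeps the comparison between matched and unmatched domains easy to read.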