Routing to Different Model Services by Domain Name with the ACK Inference Gateway

2025-10-27

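Summary: This article describes how to use the Gateway API on the ACK inference gateway to configure routing rules based on domain names, so that requests are dispatched by domain name to different model services such as qwen and deepseek. It includes the full procedure and test examples.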

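In practice, a gateway often needs to route requests to different model services depending on the domain name in the request. This article briefly shows how to configure that kind of domain-based routing on the ACK inference gateway with the Gateway API.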

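Prerequisites

The qwen and deepseek model services are already deployed in the cluster, and the following two InferencePool resources have been declared for them: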

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: qwen-inference-pool
  namespace: default
spec:
  selector:
    app: qwen
  targetPortNumber: 8000
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: deepseek-inference-pool
  namespace: default
spec:
  selector:
    app: deepseek
  targetPortNumber: 8000

Procedure

1. Create a Gateway resource to provision the gateway.

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: inference-gateway
spec:
  gatewayClassName: ack-gateway
  infrastructure:
    parametersRef:
      group: gateway.envoyproxy.io
      kind: EnvoyProxy
      name: custom-proxy-config
  listeners:
    - allowedRoutes:
        namespaces:
          from: Same
      name: http-llm
      port: 8080
      protocol: HTTP

2. Create two HTTPRoute resources to add two HTTP routes to the gateway.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: qwen-inference-route
  namespace: default
spec:
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: inference-gateway
  hostnames:
    - "qwen.test"
  rules:
    - backendRefs:
        - group: inference.networking.x-k8s.io
          kind: InferencePool
          name: qwen-inference-pool
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /v1
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: deepseek-inference-route
  namespace: default
spec:
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: inference-gateway
  hostnames:
    - "deepseek.test"
  rules:
    - backendRefs:
        - group: inference.networking.x-k8s.io
          kind: InferencePool
          name: deepseek-inference-pool
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /v1

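The two HTTPRoutes declare different values in their hostnames fields, which is how the gateway tells requests for the two domain names apart.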
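
To make the matching concrete, the following shell sketch models how the Host header selects a backend under the two routes above. The function name route_for_host is hypothetical and exists only for illustration. It covers exact hostname matches only, and the "no-route" result stands in for whatever the gateway returns when no HTTPRoute matches (typically an HTTP 404, which is an assumption about the gateway's behaviour here).

```shell
# Hypothetical helper: models which InferencePool a Host header would select
# under the two HTTPRoutes above. It is an illustration, not gateway code.
route_for_host() {
  case "$1" in
    qwen.test)     echo "qwen-inference-pool" ;;
    deepseek.test) echo "deepseek-inference-pool" ;;
    # No HTTPRoute lists this hostname, so no backend is selected.
    *)             echo "no-route"; return 1 ;;
  esac
}

route_for_host "qwen.test"          # prints: qwen-inference-pool
route_for_host "deepseek.test"      # prints: deepseek-inference-pool
route_for_host "other.test" || true # prints: no-route
```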

3. Test the routing.

export GATEWAY_HOST=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
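
If the Gateway has not been assigned an address yet, the jsonpath query above returns an empty string and the curl commands that follow fail in a confusing way. A minimal guard, assuming a hypothetical helper named require_gateway_host, might look like this:

```shell
# Hypothetical helper: fail early when the Gateway has no address yet.
require_gateway_host() {
  if [ -z "${1:-}" ]; then
    echo "GATEWAY_HOST is empty; the Gateway may not have an address yet" >&2
    return 1
  fi
  echo "Using gateway address: $1"
}

# Check the value exported above before sending any requests.
require_gateway_host "${GATEWAY_HOST:-}" || echo "Retry after the Gateway reports an address."
```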

Send a request with the qwen.test domain name:

curl -XPOST $GATEWAY_HOST:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Host: qwen.test" \
  -d '{
    "model": "qwen",
    "messages": [{"role": "user", "content": "你是谁？"}],
    "temperature": 0.7
  }'

Expected output:

{"id":"chatcmpl-30793fc0-35bc-470e-b622-73c44312f690","object":"chat.completion","created":1761530271,"model":"qwen","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"我是来自阿里云的大规模语言模型，我叫通义千问。","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":22,"total_tokens":39,"completion_tokens":17,"prompt_tokens_details":null},"prompt_logprobs":null}

Send a request with the deepseek.test domain name:

curl -XPOST $GATEWAY_HOST:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Host: deepseek.test" \
  -d '{
    "model": "deepseek",
    "messages": [{"role": "user", "content": "你是谁？"}],
    "temperature": 0.7
  }'

Expected output:

{"id":"chatcmpl-56027d40-0183-402c-8525-7858213573db","object":"chat.completion","created":1761530292,"model":"deepseek","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"您好！我是由中国的深度求索（DeepSeek）公司开发的智能助手DeepSeek-R1。如您有任何任何问题，我会尽我所能为您提供帮助。\n</think>\n\n您好！我是由中国的深度求索（DeepSeek）公司开发的智能助手DeepSeek-R1。如您有任何任何问题，我会尽我所能为您提供帮助。","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":8,"total_tokens":81,"completion_tokens":73,"prompt_tokens_details":null},"prompt_logprobs":null}
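
Comparing the "model" field of each response with the Host header that was sent is a quick way to confirm that the request reached the intended service. The sketch below uses a hypothetical helper named response_model, and it assumes compact JSON with no whitespace around the colon, as in the responses shown above. A JSON-aware tool would be more robust for other output formats.

```shell
# Hypothetical helper: print the value of the first "model" field in a
# chat-completion response read from stdin. Assumes compact JSON with no
# whitespace around the colon, as in the responses above.
response_model() {
  grep -o '"model":"[^"]*"' | head -n 1 | cut -d '"' -f 4
}

# Sample input shaped like the responses above; not real gateway output.
echo '{"id":"example","object":"chat.completion","model":"qwen","choices":[]}' | response_model
# prints: qwen
```

In practice, the output of either curl command from step 3 can be piped into response_model in the same way.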
