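Introduction
This article shows how to wrap an asynchronous API as a tool that an agent can call and execute during a chat. The steps are:
- Understand how agents work and get familiar with the Modelscope-Agent framework
- Test the API and learn how it behaves
- Wrap the API in a tool and test the result
- Call the tool through an agent
- Use the API in AgentFabric
The API added in this article is Doodle Painting (sketch to image).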
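Doodle Painting capability
Doodle Painting is an image-to-image capability: given a rough sketch and a text description, it generates a detailed image.
For example:
The interface used here is DashScope's Doodle Painting API (https://help.aliyun.com/zh/dashscope/developer-reference/tongyi-wanxiang-api-for-doodle?spm=a2c4g.11186623.0.0.4e534393H1eB3n)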
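Environment setup
Environment:
- System: any operating system works; the ModelScope image is recommended
- LLM: qwen-max on DashScope (requires registering an account; a free token quota is available)
- Framework: Modelscope-Agent
Other:
Getting familiar with the framework
- Get familiar with agents
- Get familiar with the Modelscope-Agent code base:
https://github.com/modelscope/modelscope-agent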
API experiments
- Submit an asynchronous image-generation request (the prompt "绿色的猫" means "a green cat")
curl --location 'https://dashscope.aliyuncs.com/api/v1/services/aigc/image2image/image-synthesis' \
--header 'X-DashScope-Async: enable' \
--header 'Authorization: Bearer <your dashscope api token>' \
--header 'Content-Type: application/json' \
--header 'X-DashScope-OssResourceResolve: enable' \
--data '{
    "input": {
        "prompt": "绿色的猫",
        "sketch_image_url": "http://synthesis-source.oss-accelerate.aliyuncs.com/lingji/datasets/QuickDraw_sketches_final/cat/4503626191994880.png"
    },
    "model": "wanx-sketch-to-image-lite"
}'
Response:
{"output":{"task_status":"PENDING","task_id":"76a71d5b-8fc5-4d47-8ef8-c16af80951f3"},"request_id":"1ad6a3f4-8a80-9118-b805-4515376a9404"}
- Query the task status
curl -X GET \
--header 'Authorization: Bearer <your dashscope api token>' \
https://dashscope.aliyuncs.com/api/v1/tasks/76a71d5b-8fc5-4d47-8ef8-c16af80951f3
Response:
{"request_id":"5441c445-ec10-963e-9c74-8907e507d1e2","output":{"task_id":"76a71d5b-8fc5-4d47-8ef8-c16af80951f3","task_status":"SUCCEEDED","submit_time":"2024-07-02 23:07:03.292","scheduled_time":"2024-07-02 23:07:03.317","end_time":"2024-07-02 23:07:15.401","results":[{"url":"https://dashscope-result-hz.oss-cn-hangzhou.aliyuncs.com/1d/db/20240702/96f6710c/0b3c9685-1683-4843-87b6-f0ce9bfe8972-1.png?Expires=1720019235&OSSAccessKeyId=LTAI5tQZd8AEcZX6KZV4G8qL&Signature=kdVTIwCb9OTr6V0vTRnnqWqpt4Q%3D"}],"task_metrics":{"TOTAL":1,"SUCCEEDED":1,"FAILED":0}},"usage":{"image_count":1}}
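Both responses carry the same `output` object, so a tool only needs to read `task_id`, `task_status` and, once the task has finished, `results[].url`. The following is a minimal sketch of that extraction, assuming only the fields visible in the two sample responses above. The helper name `summarize_task` is hypothetical, and the result URL in the second sample is replaced by a placeholder.

```python
import json
from typing import List, Tuple


def summarize_task(raw: str) -> Tuple[str, str, List[str]]:
    """Return (task_id, task_status, result_urls) from a task response.

    Only fields seen in the sample responses are read; missing fields
    fall back to empty values instead of raising.
    """
    output = json.loads(raw).get('output', {})
    urls = [item['url'] for item in output.get('results', []) if 'url' in item]
    return output.get('task_id', ''), output.get('task_status', ''), urls


# Trimmed copies of the two sample responses (the result URL is a placeholder).
pending = ('{"output":{"task_status":"PENDING",'
           '"task_id":"76a71d5b-8fc5-4d47-8ef8-c16af80951f3"},'
           '"request_id":"1ad6a3f4-8a80-9118-b805-4515376a9404"}')
done = ('{"output":{"task_id":"76a71d5b-8fc5-4d47-8ef8-c16af80951f3",'
        '"task_status":"SUCCEEDED",'
        '"results":[{"url":"https://example.com/result-1.png"}]}}')

print(summarize_task(pending))
print(summarize_task(done))
```

The first call yields an empty URL list, which is the signal that the status endpoint has to be queried again later.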
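Adding a new tool to Modelscope-Agent
- How registering a new tool works:
- register_tool: registers the tool at the framework level under a unique name
- description, name and parameters follow OpenAI's tool calling format, which makes it easier for the LLM to generate the tool arguments
- call: the entry-point function that actually executes the tool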
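To make the registration idea concrete, here is a minimal sketch of a name-based registry implemented as a class decorator. It is not the framework's implementation: `TOOL_REGISTRY`, `register_tool_sketch` and `EchoTool` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Type

# Hypothetical registry; the real framework keeps its own mapping.
TOOL_REGISTRY: Dict[str, Type] = {}


def register_tool_sketch(name: str) -> Callable[[Type], Type]:
    """Class decorator that stores a tool class under a unique name."""

    def decorator(cls: Type) -> Type:
        if name in TOOL_REGISTRY:
            raise ValueError(f'Tool `{name}` is already registered')
        cls.name = name
        TOOL_REGISTRY[name] = cls
        return cls

    return decorator


@register_tool_sketch('echo')
class EchoTool:
    description = 'Returns its input unchanged'
    parameters = [{
        'name': 'text',
        'description': 'Text to echo',
        'required': True,
        'type': 'string'
    }]

    def call(self, params: str, **kwargs) -> str:
        return params


# An agent can now look the tool up by name and run it.
tool = TOOL_REGISTRY['echo']()
print(tool.call('hello'))
```

Rejecting duplicate names is one way to keep each identifier unique; whether the framework raises or overwrites in that case is not covered by this article.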
import os
import time
import json

import requests
from modelscope_agent.constants import BASE64_FILES, LOCAL_FILE_PATHS, ApiNames
from modelscope_agent.tools.base import BaseTool, register_tool
from modelscope_agent.utils.utils import get_api_key, get_upload_url
from requests.exceptions import RequestException, Timeout

MAX_RETRY_TIMES = 3
WORK_DIR = os.getenv('CODE_INTERPRETER_WORK_DIR', '/tmp/ci_workspace')


@register_tool('sketch_to_image')
class SketchToImage(BaseTool):
    # "Call the sketch_to_image API to generate an image from an image plus text"
    description = '调用sketch_to_image api通过图片加文本生成图片'
    name = 'sketch_to_image'
    parameters: list = [{
        'name': 'input.sketch_image_url',
        # "Relative path of the image uploaded by the user"
        'description': '用户上传的照片的相对路径',
        'required': True,
        'type': 'string'
    }, {
        'name': 'input.prompt',
        # "Detailed description of the features the generated image should have"
        'description': '详细描述了希望生成的图像具有什么样的特点',
        'required': True,
        'type': 'string'
    }]

    def call(self, params: str, **kwargs) -> str:
        pass
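Since `name`, `description` and `parameters` are meant to line up with OpenAI's tool calling format, the flat parameter list can be mapped onto a function schema. The sketch below shows one possible mapping; `to_openai_function` is a hypothetical helper rather than a framework function, and keeping the dotted names (such as `input.prompt`) as flat property names is an assumption.

```python
from typing import Any, Dict, List


def to_openai_function(name: str, description: str,
                       parameters: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build an OpenAI-style tool definition from a flat parameter list."""
    properties: Dict[str, Any] = {}
    required: List[str] = []
    for param in parameters:
        properties[param['name']] = {
            'type': param.get('type', 'string'),
            'description': param.get('description', ''),
        }
        if param.get('required'):
            required.append(param['name'])
    return {
        'type': 'function',
        'function': {
            'name': name,
            'description': description,
            'parameters': {
                'type': 'object',
                'properties': properties,
                'required': required,
            },
        },
    }


schema = to_openai_function(
    'sketch_to_image',
    'Generate an image from a sketch plus a text prompt',
    [
        {'name': 'input.sketch_image_url', 'type': 'string',
         'description': 'Relative path of the uploaded sketch', 'required': True},
        {'name': 'input.prompt', 'type': 'string',
         'description': 'What the generated image should look like', 'required': True},
    ],
)
print(schema['function']['parameters']['required'])
```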
- Add the core Doodle Painting flow to the tool:
a. Parse the input arguments and generate the image URL
    def _parse_input(self, *args, **kwargs):
        kwargs = super()._parse_files_input(*args, **kwargs)

        restored_dict = {}
        for key, value in kwargs.items():
            if '.' in key:
                # Split keys by "." and create nested dictionary structures
                keys = key.split('.')
                temp_dict = restored_dict
                for k in keys[:-1]:
                    temp_dict = temp_dict.setdefault(k, {})
                temp_dict[keys[-1]] = value
            else:
                # if the key does not contain ".", directly store the
                # key-value pair into restored_dict
                restored_dict[key] = value
        kwargs = restored_dict

        image_path = kwargs['input'].pop('sketch_image_url', None)
        if image_path and image_path.endswith(('.jpeg', '.png', '.jpg')):
            # Generate image_url and set it in kwargs['input'],
            # reusing DashScope's public OSS
            if LOCAL_FILE_PATHS not in kwargs:
                image_path = f'file://{os.path.join(WORK_DIR,image_path)}'
            else:
                image_path = f'file://{kwargs["local_file_paths"][image_path]}'
            image_url = get_upload_url(
                model='wanx-sketch-to-image-lite',
                file_to_upload=image_path,
                api_key=os.environ.get('DASHSCOPE_API_KEY', ''))
            kwargs['input']['sketch_image_url'] = image_url
        else:
            # "Please upload an image in a supported format first"
            raise ValueError('请先上传一张正确格式的图片')

        kwargs['model'] = 'wanx-sketch-to-image-lite'
        # "Tool parameters for sketch-to-image:"
        print('草图生图的tool参数:', kwargs)
        return kwargs
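The key step in `_parse_input` is turning the flat, dot-separated argument names produced by the LLM into the nested JSON body the API expects. That step can be isolated into a small standalone function, sketched here under the name `unflatten` (a hypothetical helper, not part of the framework):

```python
from typing import Any, Dict


def unflatten(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {'a.b': 1, 'c': 2} into {'a': {'b': 1}, 'c': 2}."""
    nested: Dict[str, Any] = {}
    for key, value in flat.items():
        parts = key.split('.')
        node = nested
        # Walk (and create) intermediate dictionaries for every part but the last.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


flat_args = {
    'input.sketch_image_url': 'sketch.png',
    'input.prompt': '绿色的猫',
    'model': 'wanx-sketch-to-image-lite',
}
print(unflatten(flat_args))
```

Keys without a dot pass through unchanged, which is why `model` ends up at the top level next to `input`.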
b. Call the asynchronous generation endpoint
    def call(self, params: str, **kwargs) -> str:
        params = self._verify_args(params)
        if isinstance(params, str):
            return 'Parameter Error'

        if BASE64_FILES in kwargs:
            params[BASE64_FILES] = kwargs[BASE64_FILES]

        remote_parsed_input = self._parse_input(**params)
        remote_parsed_input = json.dumps(remote_parsed_input)
        url = kwargs.get(
            'url',
            'https://dashscope.aliyuncs.com/api/v1/services/aigc/image2image/image-synthesis'
        )
        try:
            self.token = get_api_key(ApiNames.dashscope_api_key, **kwargs)
        except AssertionError:
            raise ValueError('Please set valid DASHSCOPE_API_KEY!')

        retry_times = MAX_RETRY_TIMES
        headers = {
            'Content-Type': 'application/json',
            'Authorization': f'Bearer {self.token}',
            'X-DashScope-Async': 'enable'
        }
        # Ask DashScope to resolve the uploaded OSS resource
        headers['X-DashScope-OssResourceResolve'] = 'enable'

        while retry_times:
            retry_times -= 1
            try:
                response = requests.request(
                    'POST', url=url, headers=headers, data=remote_parsed_input)
                if response.status_code != requests.codes.ok:
                    response.raise_for_status()
                origin_result = json.loads(response.content.decode('utf-8'))
                # Keep the submit response so the polling step can read task_id
                self.final_result = origin_result
                return self._get_dashscope_image_result()
            except Timeout:
                # Only timeouts are retried
                continue
            except RequestException as e:
                raise ValueError(
                    f'Remote call failed with error code: {e.response.status_code}, '
                    f'error message: {e.response.content.decode("utf-8")}')

        raise ValueError(
            'Remote call max retry times exceeded! Please try to use local call.'
        )
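The retry logic in `call` follows a common pattern: retry only on timeouts, fail immediately on any other error, and give up after a fixed number of attempts. Below is a network-free sketch of that pattern. `call_with_retry` and `flaky_request` are hypothetical names, and the built-in `TimeoutError` stands in for `requests.exceptions.Timeout`, which is a different exception class.

```python
from typing import Callable, TypeVar

T = TypeVar('T')


def call_with_retry(request: Callable[[], T], max_retries: int = 3) -> T:
    """Call `request`, retrying only when it raises TimeoutError."""
    for _ in range(max_retries):
        try:
            return request()
        except TimeoutError:
            continue  # transient failure: try again
    raise RuntimeError('max retry times exceeded')


attempts = {'count': 0}


def flaky_request() -> str:
    """Simulated request that times out twice and then succeeds."""
    attempts['count'] += 1
    if attempts['count'] < 3:
        raise TimeoutError('simulated timeout')
    return 'ok'


result = call_with_retry(flaky_request)
print(result, attempts['count'])
```

Any exception other than `TimeoutError` propagates on the first attempt, mirroring how the tool converts a `RequestException` into a `ValueError` without retrying.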
c. Poll the task-result endpoint synchronously
    def _get_dashscope_result(self):
        if 'task_id' in self.final_result['output']:
            task_id = self.final_result['output']['task_id']
        get_url = f'https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}'
        get_header = {'Authorization': f'Bearer {self.token}'}
        retry_times = MAX_RETRY_TIMES
        while retry_times:
            retry_times -= 1
            try:
                response = requests.request(
                    'GET', url=get_url, headers=get_header)
                if response.status_code != requests.codes.ok:
                    response.raise_for_status()
                origin_result = json.loads(response.content.decode('utf-8'))
                get_result = origin_result
                return get_result
            except Timeout:
                continue
            except RequestException as e:
                raise ValueError(
                    f'Remote call failed with error code: {e.response.status_code}, '
                    f'error message: {e.response.content.decode("utf-8")}')

        raise ValueError(
            'Remote call max retry times exceeded! Please try to use local call.'
        )

    def _get_dashscope_image_result(self):
        try:
            result = self._get_dashscope_result()
            while True:
                result_data = result
                output = result_data.get('output', {})
                task_status = output.get('task_status', '')

                if task_status == 'SUCCEEDED':
                    print('任务已完成')  # "Task completed"
                    # Return only the url part of the result so the image
                    # renders more reliably
                    output_url = result['output']['results'][0]['url']
                    return f'![IMAGEGEN]({output_url})'

                elif task_status == 'FAILED':
                    # Default message: "Task failed, please retry"
                    raise Exception(output.get('message', '任务失败,请重试'))
                else:
                    # Keep polling: wait 0.5 seconds, then query again
                    time.sleep(0.5)
                    result = self._get_dashscope_result()
                    print(f'Running:{result}')
        except Exception as e:
            raise Exception('get Remote Error:', str(e))
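The polling loop above runs until the task reports `SUCCEEDED` or `FAILED`. A variant with an upper bound on the total wait time is sketched below; the deadline is an addition for illustration and not part of the original tool. `poll_until_done` and its parameters are hypothetical, and the status fetcher is injected so the example runs without network access.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(fetch: Callable[[], Dict[str, Any]],
                    interval: float = 0.5,
                    max_wait: float = 60.0) -> str:
    """Poll `fetch` until the task finishes and return the first result URL."""
    deadline = time.monotonic() + max_wait
    while True:
        output = fetch().get('output', {})
        status = output.get('task_status', '')
        if status == 'SUCCEEDED':
            return output['results'][0]['url']
        if status == 'FAILED':
            raise RuntimeError(output.get('message', 'task failed'))
        if time.monotonic() >= deadline:
            raise TimeoutError(f'task not finished after {max_wait} seconds')
        time.sleep(interval)


# Simulated sequence of status responses (the URL is a placeholder).
responses = iter([
    {'output': {'task_status': 'PENDING'}},
    {'output': {'task_status': 'RUNNING'}},
    {'output': {'task_status': 'SUCCEEDED',
                'results': [{'url': 'https://example.com/cat.png'}]}},
])
url = poll_until_done(lambda: next(responses), interval=0.01)
print(f'![IMAGEGEN]({url})')
```

Bounding the wait keeps a stuck task from blocking the chat turn indefinitely, at the cost of having to pick a sensible `max_wait` for the model in use.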
- Add test cases to make sure the tool works as intended
import os

import pytest
from modelscope_agent.tools.dashscope_tools.sketch_to_image import SketchToImage

from modelscope_agent.agents.role_play import RolePlay  # NOQA

IS_FORKED_PR = os.getenv('IS_FORKED_PR', 'false') == 'true'


@pytest.mark.skipif(IS_FORKED_PR, reason='only run modelscope-agent main repo')
def test_sketch_to_image():
    # The image is expected in the default work_dir
    # Prompt: "a green cat"
    params = """{'input.sketch_image_url': 'sketch.png', 'input.prompt': '绿色的猫'}"""

    sketch_to_image = SketchToImage()
    res = sketch_to_image.call(params)
    assert (res.startswith('![IMAGEGEN](http'))


@pytest.mark.skipif(IS_FORKED_PR, reason='only run modelscope-agent main repo')
def test_sketch_to_image_role():
    # "You play a painter who calls the tool with descriptions that are as
    # rich as possible to draw pictures in various styles."
    role_template = '你扮演一个绘画家,用尽可能丰富的描述调用工具绘制各种风格的图画。'

    llm_config = {'model': 'qwen-max', 'model_server': 'dashscope'}

    # input tool args
    function_list = ['sketch_to_image']

    bot = RolePlay(
        function_list=function_list, llm=llm_config, instruction=role_template)

    # '[Uploaded file "sketch.png"], I want a cat with green ears wearing earrings'
    response = bot.run('[上传文件 "sketch.png"],我想要一只绿色耳朵带耳环的猫')
    text = ''
    for chunk in response:
        text += chunk
    print(text)
    assert isinstance(text, str)
- Additionally, register the tool in modelscope_agent/tools/base.py
# Excerpt of register_map with the new entry added
register_map = {
    'sketch_to_image': 'SketchToImage',
    'amap_weather': 'AMAPWeather',
    'storage': 'Storage',
    'web_search': 'WebSearch',
    'image_gen': 'TextToImageTool',
    'image_gen_lite': 'TextToImageLiteTool',
    # ...
- Add the class to the lazy-loading import structure: modelscope_agent/tools/__init__.py
import sys

from ..utils import _LazyModule
from .contrib import *  # noqa F403

_import_structure = {
    'amap_weather': ['AMAPWeather'],
    'code_interpreter': ['CodeInterpreter'],
    'contrib': ['AliyunRenewInstanceTool'],
    'dashscope_tools': [
        'ImageEnhancement', 'TextToImageTool', 'TextToImageLiteTool',
        'ParaformerAsrTool', 'QWenVL', 'SambertTtsTool', 'StyleRepaint',
        'WordArtTexture', 'SketchToImage'
    ],
    # ...
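The `_import_structure` mapping lets the package expose class names without importing every tool module up front. The following sketch illustrates the general idea of lazy loading with standard-library modules. `LazyNamespace` is a hypothetical class and not the framework's `_LazyModule`, whose internals are not shown in this article.

```python
import importlib
from typing import Any, Dict, List


class LazyNamespace:
    """Resolve exported names to their modules only on first access."""

    def __init__(self, import_structure: Dict[str, List[str]]):
        # Invert {module: [names]} into {name: module} for quick lookup.
        self._name_to_module = {
            name: module
            for module, names in import_structure.items()
            for name in names
        }
        self.loaded: List[str] = []

    def __getattr__(self, name: str) -> Any:
        # Called only when normal attribute lookup fails.
        name_to_module = self.__dict__.get('_name_to_module', {})
        if name not in name_to_module:
            raise AttributeError(name)
        module = importlib.import_module(name_to_module[name])
        self.loaded.append(module.__name__)
        value = getattr(module, name)
        setattr(self, name, value)  # cache so later lookups skip __getattr__
        return value


# Standard-library modules stand in for the tool modules here.
tools = LazyNamespace({'math': ['sqrt'], 'json': ['dumps']})
print(tools.loaded)      # nothing resolved through the namespace yet
print(tools.sqrt(9.0))
print(tools.loaded)      # only the module that was actually needed
```

With this pattern, a new tool only has to be listed under its module name to be importable from the package, which is why `'SketchToImage'` is appended to the `dashscope_tools` list.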
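- Submit the code to the repository as a pull request
Using the tool in AgentFabric
AgentFabric overview:
Adding the Doodle Painting capability to the app:
- Edit tool_config.json in the code so that the new tool can be called: apps/agentfabric/config/tool_config.json
- Add and use the newly created tool in AgentFabric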
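As an illustration of the first step, the sketch below adds an entry for the new tool to a configuration dictionary. The article does not show the contents of tool_config.json, so the field names (`name`, `is_active`, `use`), the existing sample entry and the helper `add_tool` are all assumptions; check them against the entries already present in the file before relying on them.

```python
import json
from typing import Any, Dict

# Hypothetical excerpt of tool_config.json; field names are assumptions.
existing: Dict[str, Any] = json.loads('''
{
  "image_gen": {"name": "Wanx Image Generation", "is_active": true, "use": true}
}
''')


def add_tool(config: Dict[str, Any], key: str, display_name: str) -> Dict[str, Any]:
    """Return a copy of `config` with one more tool entry."""
    updated = dict(config)
    # The key must match the name passed to register_tool.
    updated[key] = {'name': display_name, 'is_active': True, 'use': False}
    return updated


updated = add_tool(existing, 'sketch_to_image', 'Sketch to Image')
print(json.dumps(updated, indent=2, ensure_ascii=False))
```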
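Summary
- With external tools, an agent can quickly extend an LLM's capabilities to a much wider range of application scenarios.
- Calling a tool correctly depends on the LLM understanding instructions and generating the right arguments, so the tool's description and parameters must be defined accurately.
- Tool calls inside the agent are synchronous. For asynchronous APIs that require waiting, there are several options: one is to wait inside a single tool call until the result is ready and then return it; another is to expose multiple tools so that the agent can query the result for you.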
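This article implements the first option, waiting inside a single tool call. For contrast, the second option can be sketched as two cooperating tools, one that submits the job and one that queries it. Everything in this sketch is hypothetical: the function names, the in-memory `_TASKS` store and the simulated status progression exist only to show how the responsibilities are split.

```python
import itertools
from typing import Dict

# Simulated backend: task_id -> number of queries left before success.
_TASKS: Dict[str, int] = {}
_ids = itertools.count(1)


def submit_sketch_task(prompt: str) -> str:
    """Tool 1: start the job and return a task id immediately."""
    task_id = f'task-{next(_ids)}'
    _TASKS[task_id] = 2  # pretend the job needs two more queries to finish
    return task_id


def query_sketch_task(task_id: str) -> str:
    """Tool 2: report the current status; the agent calls it again if needed."""
    if task_id not in _TASKS:
        return 'UNKNOWN'
    if _TASKS[task_id] > 0:
        _TASKS[task_id] -= 1
        return 'RUNNING'
    return 'SUCCEEDED'


task_id = submit_sketch_task('a green cat')
statuses = [query_sketch_task(task_id) for _ in range(3)]
print(task_id, statuses)
```

The trade-off is that the agent has to carry the task id across turns and decide when to query again, whereas the single-tool approach hides the waiting from the LLM entirely.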