Fixing Prompt Pain Points: A Complete Approach to Detecting Contradictions and Improving Formats Automatically with AI Agents

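Summary: This article presents a prompt optimization system driven by user intent. It uses a multi-agent architecture to automate optimization and to improve prompt quality and model fit in few-shot scenarios. Specialized agents work together to detect and repair logical contradictions, unclear format specifications, and inconsistent examples, using Pydantic structured data models and the OpenAI Evals framework. The approach reduces manual intervention and improves output consistency, and it suits complex research tasks and deep AI applications.

This article walks through a project that optimizes prompts based on user intent and matches the intended use to a suitable model. The multi-agent solution automates the work, which makes prompt optimization more scalable and reduces manual effort, particularly for complex few-shot prompts.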

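Andreessen Horowitz recently identified research as a transformative use case for generative AI, a view reflected in the growing investment and strategic focus on deep research by major providers such as OpenAI and xAI.

Research-style reasoning tasks tend to run long and consume a lot of compute, so the precision of the user's query and its alignment with the intended goal matter a great deal. To keep the system efficient, ambiguity has to be resolved early in the pipeline.

To address this, OpenAI has built prompt optimization into ChatGPT. The system uses an agentic architecture in which a more cost-effective model (such as o4-mini) disambiguates and refines the query before the deep research task starts. Aligning the run with the user's intent up front improves the overall research experience.

OpenAI applies a similar strategy in its Deep Research API, where dedicated models such as o3-deep-research and o4-mini-deep-research carry out multi-step investigations, balancing accuracy against execution efficiency.

The driver behind this shift is a use case with broad reach: generative AI applied to advanced research, which has drawn industry-wide attention. On the implementation side, multi-model orchestration is now being deployed in practice. Modern systems no longer rely on a single model; they integrate and coordinate several specialized models. This matches NVIDIA's view that future AI systems will be built by orchestrating small language models (SLMs), each optimized for a specific task, to gain both efficiency and performance.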

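From Models to SDKs

Model providers are extending their offerings into advanced command-line interfaces (CLIs) and integrating models more tightly with software development kits (SDKs). OpenAI recently published a project that brings together three areas: prompt optimization, multi-agent orchestration, and matching models to use cases.

Selecting the Best Model

The system uses the OpenAI Evals framework to measure prompt performance. The evaluation is based on a dataset of 20 hand-labeled examples, each containing the original messages, the developer prompt, the user and assistant turns, and the expected changes. The examples cover common problem types, including logical contradictions, few-shot inconsistencies, and format ambiguity.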

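The system runs the evaluation with a Python string-check grader, tunes the agent instructions against accuracy, cost, and speed, and selects the best-performing model (o3 in the example). The aim is for the system to identify and fix every issue recorded in the golden outputs, which is what makes the optimized prompts trustworthy.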
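To make the grading step concrete, here is a minimal local sketch of a string-check comparison. It is a hypothetical stand-in written for illustration, not the OpenAI Evals API, and the article does not show how the project configures its grader. The names `string_check` and `issue_flag` are assumptions. The idea is to reduce each result field to a label ("flagged" or "clean") and compare it with the label derived from the golden output.

```python
from typing import Callable, Dict

# Hypothetical local stand-in for a string-check grader (not the OpenAI Evals API).
_OPERATIONS: Dict[str, Callable[[str, str], bool]] = {
    "eq": lambda sample, reference: sample == reference,
    "ne": lambda sample, reference: sample != reference,
    "like": lambda sample, reference: reference in sample,  # case-sensitive substring
    "ilike": lambda sample, reference: reference.lower() in sample.lower(),
}


def string_check(sample: str, reference: str, operation: str = "eq") -> float:
    """Return 1.0 when the comparison passes, otherwise 0.0."""
    if operation not in _OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    return 1.0 if _OPERATIONS[operation](sample, reference) else 0.0


def issue_flag(result: dict, field: str) -> str:
    """Reduce a result field to 'flagged' or 'clean' so it can be string-compared."""
    return "flagged" if result.get(field) else "clean"


# Golden output taken from the first dataset example; the prediction is made up.
golden = {
    "contradiction_issues": "Developer message simultaneously insists on English and forbids it.",
    "format_issues": "",
}
predicted = {
    "contradiction_issues": "The prompt both requires and forbids English.",
    "format_issues": "",
}

scores = {
    field: string_check(issue_flag(predicted, field), issue_flag(golden, field))
    for field in golden
}
print(scores)  # {'contradiction_issues': 1.0, 'format_issues': 1.0}
```

Comparing labels instead of the free-text explanations keeps the check deterministic, since two correct explanations of the same contradiction rarely match word for word.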

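Core Prompt Optimization Features

Prompt optimization is the core of the system. It detects common problems in a prompt: logical contradictions between instructions, missing or unclear format specifications (especially for structured output such as JSON or CSV), and inconsistencies between the prompt's rules and its few-shot examples. Once a problem is found, the system rewrites the prompt to fix it while preserving the original intent.

The system can also update few-shot examples so that they stay consistent with the prompt. In practice, the fixes include adding an explicit output format section or regenerating assistant responses that break the rules.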

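Multi-Agent Architecture

The project demonstrates multi-agent collaboration through a structured workflow built on the Agents SDK. It deploys several specialized agents: Dev-Contradiction-Checker, Format-Checker, Few-Shot-Consistency-Checker, Dev-Rewriter, and Few-Shot-Rewriter. The checkers run in parallel, which shortens end-to-end processing time.

In the workflow, the checkers look for their respective problems at the same time, and each rewriter is activated only when the findings call for it. The agents communicate through Pydantic data models, which keeps their structured outputs consistent and reliable. This architecture mirrors the design of an early version of the optimize feature in the OpenAI Playground and serves as a reference for building scalable agent systems.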
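The control flow described above (parallel checks, then a conditional rewrite) can be sketched without any API calls. In this sketch the checkers and the rewriter are hypothetical rule-based stubs standing in for the LLM-backed agents, and every name (`Finding`, `check_contradictions`, `check_format`, `rewrite`, `optimize`) is an assumption made for illustration.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List


@dataclass
class Finding:
    has_issues: bool
    issues: List[str] = field(default_factory=list)


# Stub checkers: rule-based placeholders for the LLM-backed agents.
async def check_contradictions(prompt: str) -> Finding:
    await asyncio.sleep(0)  # yield control, as a real network call would
    lowered = prompt.lower()
    if "always answer in english" in lowered and "never answer in english" in lowered:
        return Finding(True, ["Prompt both requires and forbids English."])
    return Finding(False)


async def check_format(prompt: str) -> Finding:
    await asyncio.sleep(0)
    lowered = prompt.lower()
    if "json" in lowered and "keys" not in lowered:
        return Finding(True, ["JSON output requested but no keys are specified."])
    return Finding(False)


async def rewrite(prompt: str, findings: List[Finding]) -> str:
    # Stub rewriter: a real one would resolve the issues; this one only lists them.
    notes = [issue for finding in findings for issue in finding.issues]
    return prompt + "\n## Issues to resolve\n" + "\n".join(f"- {n}" for n in notes)


async def optimize(prompt: str) -> dict:
    # Both checkers run concurrently; the rewriter runs only if one of them flags something.
    contradiction, fmt = await asyncio.gather(
        check_contradictions(prompt), check_format(prompt)
    )
    changed = contradiction.has_issues or fmt.has_issues
    new_prompt = await rewrite(prompt, [contradiction, fmt]) if changed else prompt
    return {"changes": changed, "new_prompt": new_prompt}


clean = asyncio.run(optimize("Reply in one sentence."))
flagged = asyncio.run(optimize("Always answer in English. Never answer in English."))
print(clean["changes"], flagged["changes"])  # False True
```

`asyncio.run` is used so the sketch works as a plain script; inside a notebook cell, which already has a running event loop, `await optimize(...)` would be used instead.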

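System Architecture Overview

The optimization system takes a multi-agent approach: specialized agents cooperate to analyze and rewrite prompts. It automatically detects and handles several common problem types, including contradictions within the prompt's instructions, missing or unclear format specifications, and inconsistencies between the prompt and its few-shot examples.

The implementation combines the OpenAI SDK with the Evals framework and represents an early technical prototype of OpenAI's prompt optimization system.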

Running the system requires the openai Python package, the openai-agents package, and an OpenAI API key set in the OPENAI_API_KEY environment variable.
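A quick preflight check can confirm these prerequisites before any agent is created. The sketch below uses only the standard library; the function name `missing_prerequisites` is an assumption, and `agents` is the import name that the article's own code uses for the openai-agents package.

```python
import importlib.util
import os
from typing import List, Mapping


def missing_prerequisites(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of human-readable descriptions of whatever is missing."""
    missing = []
    # `openai` and `agents` are the module names imported by the article's code.
    for module in ("openai", "agents"):
        if importlib.util.find_spec(module) is None:
            missing.append(f"python module: {module}")
    if not env.get("OPENAI_API_KEY"):
        missing.append("environment variable: OPENAI_API_KEY")
    return missing


problems = missing_prerequisites()
if problems:
    print("Missing prerequisites:")
    for problem in problems:
        print(" -", problem)
else:
    print("All prerequisites found.")
```

Failing early matters here because the client setup shown later falls back to a placeholder string when the variable is unset, so a missing key would otherwise surface only as an authentication error on the first request.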

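The prompt optimization system uses a collaborative multi-agent architecture to analyze and improve prompts. Each agent specializes in detecting or rewriting one type of problem:

Dev-Contradiction-Checker

This agent scans the prompt for logical contradictions or instructions that cannot be carried out. For example, it flags a prompt that says both "only use positive numbers" and "include negative examples".

Format-Checker

This agent identifies prompts that need structured output (such as JSON, CSV, or Markdown) but do not specify the format clearly. It checks that the required fields, data types, and formatting rules are defined explicitly, so the output format is not left ambiguous.

Few-Shot-Consistency-Checker

This agent examines the example conversation to verify that the assistant responses actually follow the rules stated in the prompt. It catches mismatches between what the prompt requires and what the examples demonstrate.

Dev-Rewriter

Once problems have been identified, this agent rewrites the prompt to resolve contradictions and clarify the format specification while preserving the original intent. Its instructions restrict it to fixing the reported issues; it must not add new policies or scope.

Few-Shot-Rewriter

This agent updates inconsistent example responses so that they align with the prompt's rules, making sure every example complies with the updated developer prompt.

Working together, these agents find and repair prompt problems systematically, which is what makes the optimization automatic.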

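Structured Data Exchange Between Agents

Agent inputs and outputs are usually unstructured, but passing structured data between agents pays off considerably. The system uses Pydantic models to define exact formats for each agent's input and output. The models enforce validation rules and keep data consistent across the workflow, which reduces errors and makes downstream handling simpler.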
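The value of validating what passes between agents can be shown with a small dependency-free sketch. The article's own models use Pydantic; the version below swaps in a standard-library dataclass so it runs without extra packages. It also enforces two rules that the checker instructions later in this article state only in prose: `has_issues` is true exactly when `issues` is non-empty, and at most five issues are listed. The names `CheckedIssues` and `parse_issues` are assumptions.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CheckedIssues:
    """Standard-library stand-in for a validated checker result."""
    has_issues: bool
    issues: List[str]

    def __post_init__(self) -> None:
        if not isinstance(self.has_issues, bool):
            raise TypeError("has_issues must be a bool")
        if not isinstance(self.issues, list) or not all(
            isinstance(item, str) for item in self.issues
        ):
            raise TypeError("issues must be a list of strings")
        if self.has_issues != bool(self.issues):
            raise ValueError("has_issues must be true exactly when issues is non-empty")
        if len(self.issues) > 5:
            raise ValueError("at most five issues are allowed")


def parse_issues(raw: str) -> CheckedIssues:
    """Parse a checker's JSON reply and reject extra keys or inconsistent content."""
    data = json.loads(raw)
    extra = set(data) - {"has_issues", "issues"}
    if extra:
        raise ValueError(f"unexpected keys: {sorted(extra)}")
    return CheckedIssues(has_issues=data["has_issues"], issues=data["issues"])


ok = parse_issues('{"has_issues": true, "issues": ["Rule A conflicts with rule B."]}')
print(ok.issues)  # ['Rule A conflicts with rule B.']

try:
    parse_issues('{"has_issues": false, "issues": ["stray issue"]}')
except ValueError as error:
    print("rejected:", error)
```

Rejecting an inconsistent payload at the boundary means the rewriter never has to guess whether an issue list with `has_issues` set to false should be acted on.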

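Best Practices for Agent Instructions

Building an effective agent system depends on a few principles of instruction design:

Precise scope

Each agent should be confined to a specific, well-bounded role. The contradiction checker, for example, is told to find "genuine" self-contradictions and is also told that overlaps or redundancies are not contradictions. A scope this explicit keeps the agent focused.

Step-by-step guidance

Instructions should lay out a clear sequence of steps. The format checker illustrates this: it first categorizes the task and only then evaluates the format specification, which makes its analysis orderly and repeatable.

Explicit definitions of key concepts

Defining key concepts up front removes ambiguity from the instructions. The few-shot consistency checker carries a detailed compliance rubric that spells out what counts as compliant, giving it a clear basis for judgment.

Explicit boundaries and exclusions

Stating what an agent is not responsible for prevents scope creep. The few-shot checker includes a detailed out-of-scope list, such as ignoring minor stylistic choices, which reduces false positives.

Strict output structure

Every agent must follow a consistent response format, and each set of instructions includes a complete output example. Standardizing output across agents lets them plug into the pipeline without glue code.

With these practices built in, the individual agents become more reliable and cooperate better, which improves the optimization system as a whole. The full agent definitions and instructions appear in the code later in this article.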

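The OpenAI Evals Dashboard

The screenshot in the original post shows the evaluation view of the OpenAI dashboard. Running the code (later in this article) populates the dashboard with evaluation results, giving a visual record of each test run. The purpose of the evaluation is to optimize prompts automatically and to identify the best-matching model.

Clicking an individual row opens the detailed scores, including the grader's reasoning and configuration, which makes deeper analysis straightforward.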

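Implementation Code

The code below comes from the official OpenAI repository and has been verified in Google Colab. The Python code can be copied directly into a Jupyter notebook and run there. The examples call top-level await, which works in notebook cells but not in a plain Python script.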

# Shell commands; in a notebook cell, prefix each with !
pip install openai-agents
pip install openai

###############################

# Import required modules
from openai import AsyncOpenAI
import asyncio
import json
import os
from enum import Enum
from typing import Any, List, Dict
from pydantic import BaseModel, Field
from agents import Agent, Runner, set_default_openai_client, trace

openai_client: AsyncOpenAI | None = None

def _get_openai_client() -> AsyncOpenAI:
    global openai_client
    if openai_client is None:
        openai_client = AsyncOpenAI(
            api_key=os.environ.get("OPENAI_API_KEY", "Your API Key"),
        )
    return openai_client

set_default_openai_client(_get_openai_client())

##################################

class Role(str, Enum):
    """Role enum for chat messages."""
    user = "user"
    assistant = "assistant"

class ChatMessage(BaseModel):
    """Single chat message used in few-shot examples."""
    role: Role
    content: str

class Issues(BaseModel):
    """Structured output returned by checkers."""
    has_issues: bool
    issues: List[str]

    @classmethod
    def no_issues(cls) -> "Issues":
        return cls(has_issues=False, issues=[])

class FewShotIssues(Issues):
    """Output for few-shot contradiction detector including optional rewrite suggestions."""
    rewrite_suggestions: List[str] = Field(default_factory=list)

    @classmethod
    def no_issues(cls) -> "FewShotIssues":
        return cls(has_issues=False, issues=[], rewrite_suggestions=[])

class MessagesOutput(BaseModel):
    """Structured output returned by `rewrite_messages_agent`."""

    messages: list[ChatMessage]

class DevRewriteOutput(BaseModel):
    """Rewriter returns the cleaned-up developer prompt."""

    new_developer_message: str

##################################

dev_contradiction_checker = Agent(
    name="contradiction_detector",
    model="gpt-4.1",
    output_type=Issues,
    instructions="""
    You are **Dev-Contradiction-Checker**.

    Goal
    Detect *genuine* self-contradictions or impossibilities **inside** the developer prompt supplied in the variable `DEVELOPER_MESSAGE`.

    Definitions
    • A contradiction = two clauses that cannot both be followed.
    • Overlaps or redundancies in the DEVELOPER_MESSAGE are *not* contradictions.

    What you MUST do
    1. Compare every imperative / prohibition against all others.
    2. List at most FIVE contradictions (each as ONE bullet).
    3. If no contradiction exists, say so.

    Output format (**strict JSON**)
    Return **only** an object that matches the `Issues` schema:

    ```json
    {"has_issues": <bool>,
    "issues": [
        "<bullet 1>",
        "<bullet 2>"
    ]
    }
    ```
    - has_issues = true IFF the issues array is non-empty.
    - Do not add extra keys, comments or markdown.
""",
)
format_checker = Agent(
    name="format_checker",
    model="gpt-4.1",
    output_type=Issues,
    instructions="""
    You are Format-Checker.

    Task
    Decide whether the developer prompt requires a structured output (JSON/CSV/XML/Markdown table, etc.).
    If so, flag any missing or unclear aspects of that format.

    Steps
    Categorise the task as:
    a. "conversation_only", or
    b. "structured_output_required".

    For case (b):
    - Point out absent fields, ambiguous data types, unspecified ordering, or missing error-handling.

    Do NOT invent issues if unsure. be a little bit more conservative in flagging format issues

    Output format
    Return strictly-valid JSON following the Issues schema:

    {
    "has_issues": <bool>,
    "issues": ["<desc 1>", "..."]
    }
    Maximum five issues. No extra keys or text.
""",
)
fewshot_consistency_checker = Agent(
    name="fewshot_consistency_checker",
    model="gpt-4.1",
    output_type=FewShotIssues,
    instructions="""
    You are FewShot-Consistency-Checker.

    Goal
    Find conflicts between the DEVELOPER_MESSAGE rules and the accompanying **assistant** examples.

    USER_EXAMPLES:      <all user lines>          # context only
    ASSISTANT_EXAMPLES: <all assistant lines>     # to be evaluated

    Method
    Extract key constraints from DEVELOPER_MESSAGE:
    - Tone / style
    - Forbidden or mandated content
    - Output format requirements

    Compliance Rubric - read carefully
    Evaluate only what the developer message makes explicit.

    Objective constraints you must check when present:
    - Required output type syntax (e.g., "JSON object", "single sentence", "subject line").
    - Hard limits (length ≤ N chars, language required to be English, forbidden words, etc.).
    - Mandatory tokens or fields the developer explicitly names.

    Out-of-scope (DO NOT FLAG):
    - Whether the reply "sounds generic", "repeats the prompt", or "fully reflects the user's request" - unless the developer text explicitly demands those qualities.
    - Creative style, marketing quality, or depth of content unless stated.
    - Minor stylistic choices (capitalisation, punctuation) that do not violate an explicit rule.

    Pass/Fail rule
    - If an assistant reply satisfies all objective constraints, it is compliant, even if you personally find it bland or loosely related.
    - Only record an issue when a concrete, quoted rule is broken.

    Empty assistant list ⇒ immediately return has_issues=false.

    For each assistant example:
    - USER_EXAMPLES are for context only; never use them to judge compliance.
    - Judge each assistant reply solely against the explicit constraints you extracted from the developer message.
    - If a reply breaks a specific, quoted rule, add a line explaining which rule it breaks.
    - Optionally, suggest a rewrite in one short sentence (add to rewrite_suggestions).
    - If you are uncertain, do not flag an issue.
    - Be conservative—uncertain or ambiguous cases are not issues.

    be a little bit more conservative in flagging few shot contradiction issues
    Output format
    Return JSON matching FewShotIssues:

    {
    "has_issues": <bool>,
    "issues": ["<explanation 1>", "..."],
    "rewrite_suggestions": ["<suggestion 1>", "..."] // may be []
    }
    List max five items for both arrays.
    Provide empty arrays when none.
    No markdown, no extra keys.
    """,
)
dev_rewriter = Agent(
    name="dev_rewriter",
    model="gpt-4.1",
    output_type=DevRewriteOutput,
    instructions="""
    You are Dev-Rewriter.

    You receive:
    - ORIGINAL_DEVELOPER_MESSAGE
    - CONTRADICTION_ISSUES (may be empty)
    - FORMAT_ISSUES (may be empty)

    Rewrite rules
    Preserve the original intent and capabilities.

    Resolve each contradiction:
    - Keep the clause that preserves the message intent; remove/merge the conflicting one.

    If FORMAT_ISSUES is non-empty:
    - Append a new section titled ## Output Format that clearly defines the schema or gives an explicit example.

    Do NOT change few-shot examples.

    Do NOT add new policies or scope.

    Output format (strict JSON)
    {
    "new_developer_message": "<full rewritten text>"
    }
    No other keys, no markdown.
""",
)
fewshot_rewriter = Agent(
    name="fewshot_rewriter",
    model="gpt-4.1",
    output_type=MessagesOutput,
    instructions="""
    You are FewShot-Rewriter.

    Input payload
    - NEW_DEVELOPER_MESSAGE (already optimized)
    - ORIGINAL_MESSAGES (list of user/assistant dicts)
    - FEW_SHOT_ISSUES (non-empty)

    Task
    Regenerate only the assistant parts that were flagged.
    User messages must remain identical.
    Every regenerated assistant reply MUST comply with NEW_DEVELOPER_MESSAGE.

    After regenerating each assistant reply, verify:
    - It matches NEW_DEVELOPER_MESSAGE. ENSURE THAT THIS IS TRUE.

    Output format
    Return strict JSON that matches the MessagesOutput schema:

    {
    "messages": [
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "..."}
        ]
    }
    Guidelines
    - Preserve original ordering and total count.
    - If a message was unproblematic, copy it unchanged.
    """,
)

###############################

[
  {
    "focus": "contradiction_issues",
    "input_payload": {
      "developer_message": "Always answer in **English**.\nNunca respondas en inglés.",
      "messages": [
        {
          "role": "user",
          "content": "¿Qué hora es?"
        }
      ]
    },
    "golden_output": {
      "changes": True,
      "new_developer_message": "Always answer **in English**.",
      "new_messages": [
        {
          "role": "user",
          "content": "¿Qué hora es?"
        }
      ],
      "contradiction_issues": "Developer message simultaneously insists on English and forbids it.",
      "few_shot_contradiction_issues": "",
      "format_issues": "",
      "general_improvements": ""
    }
  },
  {
    "focus": "few_shot_contradiction_issues",
    "input_payload": {
      "developer_message": "Respond with **only 'yes' or 'no'** – no explanations.",
      "messages": [
        {
          "role": "user",
          "content": "Is the sky blue?"
        },
        {
          "role": "assistant",
          "content": "Yes, because wavelengths …"
        },
        {
          "role": "user",
          "content": "Is water wet?"
        },
        {
          "role": "assistant",
          "content": "Yes."
        }
      ]
    },
    "golden_output": {
      "changes": True,
      "new_developer_message": "Respond with **only** the single word \"yes\" or \"no\".",
      "new_messages": [
        {
          "role": "user",
          "content": "Is the sky blue?"
        },
        {
          "role": "assistant",
          "content": "yes"
        },
        {
          "role": "user",
          "content": "Is water wet?"
        },
        {
          "role": "assistant",
          "content": "yes"
        }
      ],
      "contradiction_issues": "",
      "few_shot_contradiction_issues": "Assistant examples include explanations despite instruction not to.",
      "format_issues": "",
      "general_improvements": ""
    }
  }
]

 ###############################

Output:

[{'focus': 'contradiction_issues',
  'input_payload': {'developer_message': 'Always answer in **English**.\nNunca respondas en inglés.',
   'messages': [{'role': 'user', 'content': '¿Qué hora es?'}]},
  'golden_output': {'changes': True,
   'new_developer_message': 'Always answer **in English**.',
   'new_messages': [{'role': 'user', 'content': '¿Qué hora es?'}],
   'contradiction_issues': 'Developer message simultaneously insists on English and forbids it.',
   'few_shot_contradiction_issues': '',
   'format_issues': '',
   'general_improvements': ''}},
 {'focus': 'few_shot_contradiction_issues',
  'input_payload': {'developer_message': "Respond with **only 'yes' or 'no'** – no explanations.",
   'messages': [{'role': 'user', 'content': 'Is the sky blue?'},
    {'role': 'assistant', 'content': 'Yes, because wavelengths …'},
    {'role': 'user', 'content': 'Is water wet?'},
    {'role': 'assistant', 'content': 'Yes.'}]},
  'golden_output': {'changes': True,
   'new_developer_message': 'Respond with **only** the single word "yes" or "no".',
   'new_messages': [{'role': 'user', 'content': 'Is the sky blue?'},
    {'role': 'assistant', 'content': 'yes'},
    {'role': 'user', 'content': 'Is water wet?'},
    {'role': 'assistant', 'content': 'yes'}],
   'contradiction_issues': '',
   'few_shot_contradiction_issues': 'Assistant examples include explanations despite instruction not to.',
   'format_issues': '',
   'general_improvements': ''}}]

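The following example shows how the system handles a prompt that contains contradictory instructions. The workflow function optimize_prompt_parallel is defined first: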

def _normalize_messages(messages: List[Any]) -> List[Dict[str, str]]:
    """Convert list of pydantic message models to JSON-serializable dicts."""
    result = []
    for m in messages:
        if hasattr(m, "model_dump"):
            result.append(m.model_dump())
        elif isinstance(m, dict) and "role" in m and "content" in m:
            result.append({"role": str(m["role"]), "content": str(m["content"])})
    return result

async def optimize_prompt_parallel(
    developer_message: str,
    messages: List["ChatMessage"],
) -> Dict[str, Any]:
    """
    Runs contradiction, format, and few-shot checkers in parallel,
    then rewrites the prompt/examples if needed.
    Returns a unified dict suitable for an API or endpoint.
    """

    with trace("optimize_prompt_workflow"):
        # 1. Run all checkers in parallel (contradiction, format, fewshot if there are examples)
        tasks = [
            Runner.run(dev_contradiction_checker, developer_message),
            Runner.run(format_checker, developer_message),
        ]
        if messages:
            fs_input = {
                "DEVELOPER_MESSAGE": developer_message,
                "USER_EXAMPLES": [m.content for m in messages if m.role == "user"],
                "ASSISTANT_EXAMPLES": [m.content for m in messages if m.role == "assistant"],
            }
            tasks.append(Runner.run(fewshot_consistency_checker, json.dumps(fs_input)))

        results = await asyncio.gather(*tasks)

        # 2. Unpack results
        cd_issues: Issues = results[0].final_output
        fi_issues: Issues = results[1].final_output
        fs_issues: FewShotIssues = results[2].final_output if messages else FewShotIssues.no_issues()

        # 3. Rewrites as needed
        final_prompt = developer_message
        if cd_issues.has_issues or fi_issues.has_issues:
            pr_input = {
                "ORIGINAL_DEVELOPER_MESSAGE": developer_message,
                "CONTRADICTION_ISSUES": cd_issues.model_dump(),
                "FORMAT_ISSUES": fi_issues.model_dump(),
            }
            pr_res = await Runner.run(dev_rewriter, json.dumps(pr_input))
            final_prompt = pr_res.final_output.new_developer_message

        final_messages: list[ChatMessage] | list[dict[str, str]] = messages
        if fs_issues.has_issues:
            mr_input = {
                "NEW_DEVELOPER_MESSAGE": final_prompt,
                "ORIGINAL_MESSAGES": _normalize_messages(messages),
                "FEW_SHOT_ISSUES": fs_issues.model_dump(),
            }
            mr_res = await Runner.run(fewshot_rewriter, json.dumps(mr_input))
            final_messages = mr_res.final_output.messages

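        return {
            # True only if at least one checker reported an issue
            "changes": cd_issues.has_issues or fi_issues.has_issues or fs_issues.has_issues,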
            "new_developer_message": final_prompt,
            "new_messages": _normalize_messages(final_messages),
            "contradiction_issues": "\n".join(cd_issues.issues),
            "few_shot_contradiction_issues": "\n".join(fs_issues.issues),
            "format_issues": "\n".join(fi_issues.issues),
        }

#######################################

async def example_contradiction():
    # A prompt with contradictory instructions
    prompt = """Quick-Start Card — Product Parser

Goal  
Digest raw HTML of an e-commerce product detail page and emit **concise, minified JSON** describing the item.

**Required fields:**  
name | brand | sku | price.value | price.currency | images[] | sizes[] | materials[] | care_instructions | features[]

**Extraction priority:**  
1. schema.org/JSON-LD blocks  
2. <meta> & microdata tags  
3. Visible DOM fallback (class hints: "product-name", "price")

** Rules:**  
- If *any* required field is missing, short-circuit with: `{"error": "FIELD_MISSING:<field>"}`.
- Prices: Numeric with dot decimal; strip non-digits (e.g., "1.299,00 EUR" → 1299.00 + "EUR").
- Deduplicate images differing only by query string. Keep ≤10 best-res.
- Sizes: Ensure unit tag ("EU", "US") and ascending sort.
- Materials: Title-case and collapse synonyms (e.g., "polyester 100%" → "Polyester").

**Sample skeleton (minified):**
```json
{"name":"","brand":"","sku":"","price":{"value":0,"currency":"USD"},"images":[""],"sizes":[],"materials":[],"care_instructions":"","features":[]}
```
Note: It is acceptable to output null for any missing field instead of an error ###"""

    result = await optimize_prompt_parallel(prompt, [])

    # Display the results
    if result["contradiction_issues"]:
        print("Contradiction issues:")
        print(result["contradiction_issues"])
        print()

    print("Optimized prompt:")
    print(result["new_developer_message"])

# Run the example
await example_contradiction()

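The result shows that the system identified the contradiction: the prompt requires short-circuiting with an error that names the missing field, yet later states that outputting null for any missing field is acceptable. Both requirements cannot be followed at once.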

Contradiction issues:
The instructions mandate that if any required field is missing, the system must short-circuit and return an error with the field name (e.g., {"error": "FIELD_MISSING:<field>"}), but then contradict this by stating that it is acceptable to output null for any missing field instead of an error. These two requirements cannot both be followed.

Optimized prompt:
Quick-Start Card — Product Parser

Goal  
Digest raw HTML of an e-commerce product detail page and emit **concise, minified JSON** describing the item.

**Required fields:**  
name | brand | sku | price.value | price.currency | images[] | sizes[] | materials[] | care_instructions | features[]

**Extraction priority:**  
1. schema.org/JSON-LD blocks  
2. <meta> & microdata tags  
3. Visible DOM fallback (class hints: "product-name", "price")

**Rules:**  
- If any required field is missing, short-circuit with: {"error": "FIELD_MISSING:<field>"} and do not return a JSON skeleton.
- Prices: Numeric with dot decimal; strip non-digits (e.g., "1.299,00 EUR" → 1299.00 + "EUR").
- Deduplicate images that differ only by query string. Output up to 10 unique best-resolution images (URLs as strings).
- sizes[]: List of objects. Each object must have a "value" (string or number) and a "unit" (e.g., "EU", "US") property. Sort ascending by value.
- materials[]: List of strings. Each value should be title-cased and common synonyms should be collapsed (e.g., "polyester 100%" → "Polyester").
- care_instructions: String. If absent, trigger missing field error.
- features[]: List of strings. Each element should be a concise attribute or bullet-point feature.

## Output Format

If ALL required fields are present, output a minified JSON object with this shape:

{"name":"string","brand":"string","sku":"string","price":{"value":number,"currency":"string"},"images":["string"],"sizes":[{"value":string|number,"unit":"string"}],"materials":["string"],"care_instructions":"string","features":["string"]}

If ANY required field is missing, output:

{"error": "FIELD_MISSING:<field>"}

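The automatically generated rewrite removes the contradiction and leaves the instructions logically consistent.

The next example shows how the system handles few-shot examples that are inconsistent with the prompt's requirements: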

async def example_fewshot_fix():
    prompt = "Respond **only** with JSON using keys `city` (string) and `population` (integer)."

    messages = [
        {"role": "user", "content": "Largest US city?"},
        {"role": "assistant", "content": "New York City"},
        {"role": "user", "content": "Largest UK city?"},
        {"role": "assistant", "content": "{\"city\":\"London\",\"population\":9541000}"}
    ]


    print("Few-shot examples before optimization:")
    print(f"User: {messages[0]['content']}")
    print(f"Assistant: {messages[1]['content']}")
    print(f"User: {messages[2]['content']}")
    print(f"Assistant: {messages[3]['content']}")
    print()

    # Call the optimization API
    result = await optimize_prompt_parallel(prompt, [ChatMessage(**m) for m in messages])

    # Display the results
    if result["few_shot_contradiction_issues"]:
        print("Inconsistency found:", result["few_shot_contradiction_issues"])
        print()

    # Show the optimized few-shot examples
    optimized_messages = result["new_messages"]
    print("Few-shot examples after optimization:")
    print(f"User: {optimized_messages[0]['content']}")
    print(f"Assistant: {optimized_messages[1]['content']}")
    print(f"User: {optimized_messages[2]['content']}")
    print(f"Assistant: {optimized_messages[3]['content']}")

# Run the example
await example_fewshot_fix()

Result:

Few-shot examples before optimization:
User: Largest US city?
Assistant: New York City
User: Largest UK city?
Assistant: {"city":"London","population":9541000}

Inconsistency found: The first assistant example does not use JSON or include both `city` and `population` keys as required by 'Respond **only** with JSON using keys `city` (string) and `population` (integer).'

Few-shot examples after optimization:
User: Largest US city?
Assistant: {"city":"New York City","population":8419000}
User: Largest UK city?
Assistant: {"city":"London","population":9541000}
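For a rule as objective as this one, the same inconsistency can also be caught deterministically, which is useful for verifying the rewriter's output. The sketch below is not part of the original project: `violation` and `check_examples` are hypothetical helpers, and they treat "using keys city and population" as meaning that both keys must be present, which is an interpretation.

```python
import json
from typing import Dict, List


def violation(reply: str) -> str:
    """Return an explanation when the reply breaks the rule, or '' when it complies."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return "reply is not valid JSON"
    if not isinstance(data, dict):
        return "reply is not a JSON object"
    missing = {"city", "population"} - set(data)
    if missing:
        return f"missing keys: {sorted(missing)}"
    if not isinstance(data["city"], str):
        return "city must be a string"
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(data["population"], bool) or not isinstance(data["population"], int):
        return "population must be an integer"
    return ""


def check_examples(messages: List[Dict[str, str]]) -> List[str]:
    """Check assistant messages only; user messages are context."""
    issues = []
    for index, message in enumerate(messages):
        if message["role"] != "assistant":
            continue
        problem = violation(message["content"])
        if problem:
            issues.append(f"message {index}: {problem}")
    return issues


before = [
    {"role": "user", "content": "Largest US city?"},
    {"role": "assistant", "content": "New York City"},
    {"role": "user", "content": "Largest UK city?"},
    {"role": "assistant", "content": '{"city":"London","population":9541000}'},
]
after = [dict(m) for m in before]
after[1]["content"] = '{"city":"New York City","population":8419000}'

print(check_examples(before))  # ['message 1: reply is not valid JSON']
print(check_examples(after))   # []
```

A check like this covers only format rules that can be tested mechanically; constraints on tone or content still need the LLM-based checker.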

The last example shows how the system handles a prompt whose format specification is unclear:

async def example_format_issue():
    # A prompt with unclear or inconsistent formatting instructions
    prompt = """Task → Translate dense patent claims into 200-word lay summaries with a glossary.

Operating Steps:
1. Split the claim at semicolons, "wherein", or numbered sub-clauses.
2. For each chunk:
   a) Identify its purpose.
   b) Replace technical nouns with everyday analogies.
   c) Keep quantitative limits intact (e.g., "≥150 C").
3. Flag uncommon science terms with asterisks, and later define them.
4. Re-assemble into a flowing paragraph; do **not** broaden or narrow the claim’s scope.
5. Omit boilerplate if its removal does not alter legal meaning.

Output should follow a Markdown template:
- A summary section.
- A glossary section with the marked terms and their definitions.

Corner Cases:
- If the claim is over 5 kB, respond with CLAIM_TOO_LARGE.
- If claim text is already plain English, skip glossary and state no complex terms detected.

Remember: You are *not* providing legal advice—this is for internal comprehension only."""

    # Call the optimization API to check for format issues
    result = await optimize_prompt_parallel(prompt, [])

    # Display the results
    if result.get("format_issues"):
        print("Format issues found:", result["format_issues"])
        print()

    print("Optimized prompt:")
    print(result["new_developer_message"])

# Run the example
await example_format_issue()

The result shows that the system found several format-related issues and produced a rewritten prompt that addresses them:

Format issues found: Output format requires Markdown sections for summary and glossary, but formatting instructions for Markdown are implicit, not explicitly defined (e.g., should sections use headers?).
No template or example given for section titles or glossary formatting, which could lead to inconsistency across outputs.
How to handle glossary entries for terms with multiple asterisks or same term appearing multiple times is not specified.
No instruction on what to do if the input is exactly at 5 kB: is that CLAIM_TOO_LARGE or permissible?
Ambiguous handling if no glossary terms are detected: should the glossary section be omitted or included with a placeholder statement?

Optimized prompt:
Task → Translate dense patent claims into 200-word lay summaries with a glossary.

Operating Steps:
1. Split the claim at semicolons, "wherein", or numbered sub-clauses.
2. For each chunk:
   a) Identify its purpose.
   b) Replace technical nouns with everyday analogies.
   c) Keep quantitative limits intact (e.g., "≥150 C").
3. Flag uncommon science terms with asterisks, and later define them in a glossary.
4. Re-assemble into a flowing paragraph; do **not** broaden or narrow the claim’s scope.
5. Omit boilerplate if its removal does not alter legal meaning.

Output constraints:
- If the claim text exceeds 5 kB (greater than 5,120 characters), respond with CLAIM_TOO_LARGE.
- If the claim text is already in plain English, skip the glossary and state no complex terms detected.

Remember: You are *not* providing legal advice—this is for internal comprehension only.

## Output Format
Produce your output in Markdown, structured as follows:

### Summary
A 200-word layperson summary generated as described above.

### Glossary
A bullet list of all unique asterisk-marked terms from the summary. For each, provide a concise definition suitable for a non-expert. If a term appears multiple times, include it only once. If no terms are marked, include the message: "No complex or technical terms were detected; no glossary necessary."

#### Example Output

### Summary
[Concise lay summary here, with marked technical terms like *photolithography* and *substrate*.]

### Glossary
- *photolithography*: A process that uses light to transfer patterns onto a surface.
- *substrate*: The base layer or material on which something is built.

If no terms warrant inclusion:

### Glossary
No complex or technical terms were detected; no glossary necessary.

The rewritten prompt now includes a detailed output format specification.
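One detail in the rewritten prompt deserves a second look: it equates 5 kB with 5,120 characters, which holds only when every character occupies one byte. A caller that enforces the limit before sending a claim to the model could measure bytes instead. The sketch below assumes UTF-8 encoding and 1 kB = 1,024 bytes (the reading implied by the figure 5,120); the function name `claim_too_large` is hypothetical.

```python
def claim_too_large(claim: str, limit_bytes: int = 5 * 1024) -> bool:
    """Return True when the UTF-8 encoded claim exceeds the byte limit."""
    return len(claim.encode("utf-8")) > limit_bytes


ascii_claim = "a" * 5120          # 5,120 characters, 5,120 bytes
accented_claim = "\u00e9" * 3000  # 3,000 characters, 6,000 bytes in UTF-8

print(claim_too_large(ascii_claim))     # False: exactly at the limit
print(claim_too_large(accented_claim))  # True: fewer characters, more bytes
```

The comparison is strict, so a claim of exactly 5,120 bytes passes, matching the "exceeds 5 kB" wording in the rewritten prompt.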

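Summary

The multi-agent prompt optimization system described here shows how automated text processing and quality control can be applied to prompts themselves. Specialized agents working together identify and repair common prompt problems, including logical contradictions, unclear format specifications, and inconsistent examples. The approach improves prompt quality and reliability and offers a scalable option for large AI deployments.

Code for this article:

https://avoid.overfit.cn/post/a3c54ff3480a4c4da1c9c084e2d3a7a5

Author: Cobus Greyling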
