
Andrew Ng's Letter: Agentic Design Patterns, Part 2: Reflection

source link: https://zhuanlan.zhihu.com/p/689492556

By Andrew Ng, founder of DeepLearning.AI and a global leader in AI education and research

Dear friends,

Last week, I described four design patterns for AI agentic workflows that I believe will drive significant progress this year: Reflection, Tool Use, Planning, and Multi-agent Collaboration. Instead of having an LLM generate its final output directly, an agentic workflow prompts the LLM multiple times, giving it opportunities to build step by step to higher-quality output. In this letter, I'd like to discuss Reflection. For a design pattern that's relatively quick to implement, I've seen it lead to surprising performance gains.

You may have had the experience of prompting ChatGPT/Claude/Gemini, receiving unsatisfactory output, delivering critical feedback to help the LLM improve its response, and then getting a better response. What if you automate the step of delivering critical feedback, so the model automatically criticizes its own output and improves its response? This is the crux of Reflection.

Take the task of asking an LLM to write code. We can prompt it to generate the desired code directly to carry out some task X. After that, we can prompt it to reflect on its own output, perhaps as follows:

Here’s code intended for task X: [previously generated code]
Check the code carefully for correctness, style, and efficiency, and give constructive criticism for how to improve it.

Sometimes this causes the LLM to spot problems and come up with constructive suggestions. Next, we can prompt the LLM with context that includes (i) the previously generated code and (ii) the constructive feedback, and ask it to use the feedback to rewrite the code. This can lead to a better response. Repeating the criticism/rewrite process might yield further improvements. This self-reflection process allows the LLM to spot gaps and improve its output on a variety of tasks including producing code, writing text, and answering questions.
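As a rough sketch, the criticize-then-rewrite loop described above might look like this in Python. Here `complete` is a stand-in stub for whatever LLM API you use; its canned replies (and the `mean` example) are invented purely so the control flow runs end to end:

```python
def complete(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM API here.
    # These canned replies are for illustration only.
    if "constructive criticism" in prompt:
        return "Division by zero when the list is empty; handle that case."
    if "Rewrite the code" in prompt:
        return "def mean(xs):\n    return sum(xs) / len(xs) if xs else 0.0"
    return "def mean(xs): return sum(xs) / len(xs)"

def reflect_and_rewrite(task: str, rounds: int = 2) -> str:
    # Step 1: generate a first draft directly.
    draft = complete(f"Write code for this task: {task}")
    for _ in range(rounds):
        # Step 2: ask the model to criticize its own output.
        critique = complete(
            f"Here is code intended for this task: {task}\n{draft}\n"
            "Check the code carefully for correctness, style, and efficiency, "
            "and give constructive criticism for how to improve it."
        )
        # Step 3: ask it to rewrite the code using that feedback.
        draft = complete(
            f"Task: {task}\nPrevious code:\n{draft}\n"
            f"Feedback:\n{critique}\n"
            "Rewrite the code using this feedback."
        )
    return draft
```

A real implementation would replace `complete` with an actual API call and might stop early once the critique raises no further issues, rather than running a fixed number of rounds.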

And we can go beyond self-reflection by giving the LLM tools that help evaluate its output; for example, running its code through a few unit tests to check whether it generates correct results on test cases or searching the web to double-check text output. Then it can reflect on any errors it found and come up with ideas for improvement.
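One way to sketch this kind of tool-assisted evaluation is to execute candidate code against a few unit tests and collect the failures as feedback text for the next critique/rewrite round. The `candidate` string and tests below are made-up examples, not model output:

```python
import traceback

def run_tests(code: str, tests: list[str]) -> list[str]:
    """Exec the candidate code, run each test, and return failure reports."""
    namespace: dict = {}
    exec(code, namespace)  # caution: only run generated code in a sandbox
    failures = []
    for test in tests:
        try:
            exec(test, namespace)
        except Exception:
            failures.append(f"{test} failed:\n{traceback.format_exc()}")
    return failures

# Invented example: a draft that forgets the empty-list case.
candidate = "def mean(xs):\n    return sum(xs) / len(xs)\n"
unit_tests = ["assert mean([2, 4]) == 3", "assert mean([]) == 0"]

feedback = run_tests(candidate, unit_tests)
# `feedback` now describes the failing empty-list test, ready to be
# pasted into the next critique/rewrite prompt.
```

The same pattern applies to other tools: whatever the checker produces (test failures, search results, linter output) becomes part of the context for the model's next reflection step.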

Further, we can implement Reflection using a multi-agent framework. I've found it convenient to create two different agents, one prompted to generate good outputs and the other prompted to give constructive criticism of the first agent's output. The resulting discussion between the two agents leads to improved responses.
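A minimal sketch of such a two-agent setup, with a generator role and a critic role taking turns; `chat` is again a placeholder stub rather than a real chat-completion call, and its replies are invented for illustration:

```python
CODER = "You are a coder: write or revise code for the given task."
REVIEWER = ("You are a reviewer: point out problems with the code, "
            "or reply APPROVED if it looks good.")

def chat(system: str, message: str) -> str:
    # Placeholder for a real chat-completion API call.
    if system == REVIEWER:
        return "APPROVED" if '"""' in message else "Missing a docstring."
    if "Missing a docstring" in message:
        return 'def mean(xs):\n    """Mean of xs."""\n    return sum(xs) / len(xs)'
    return "def mean(xs): return sum(xs) / len(xs)"

def generate_with_critic(task: str, max_turns: int = 3) -> str:
    # The generator drafts; the critic reviews; the generator revises.
    code = chat(CODER, task)
    for _ in range(max_turns):
        review = chat(REVIEWER, code)
        if review.strip() == "APPROVED":
            break
        code = chat(CODER, f"Task: {task}\nCode:\n{code}\nFeedback: {review}")
    return code
```

In practice the two "agents" can be the same underlying model with different system prompts; the benefit comes from separating the generating role from the criticizing role.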

Reflection is a relatively basic type of agentic workflow, but I've been delighted by how much it improved my applications' results in a few cases. I hope you will try it in your own work. If you're interested in learning more about Reflection, I recommend these papers:

  • "Self-Refine: Iterative Refinement with Self-Feedback," Madaan et al. (2023)
  • "Reflexion: Language Agents with Verbal Reinforcement Learning," Shinn et al. (2023)
  • "CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing," Gou et al. (2024)

I’ll discuss the other agentic design patterns in future letters.

Keep learning!

Andrew

P.S. New JavaScript short course! Learn to build full-stack web applications that use RAG in “JavaScript RAG Web Apps with LlamaIndex,” taught by Laurie Voss, VP of Developer Relations at LlamaIndex and co-founder of npm.

  • Build a RAG application for querying your own data.
  • Develop tools that interact with multiple data sources and use an agent to autonomously select the right tool for a given query.
  • Create a full-stack web app step by step that lets you chat with your data.
  • Dig further into production-ready techniques like how to persist your data, so you don’t need to reindex constantly.

Sign up here!



