Support for Orchestrating Agents: Routines and Handoffs like it was done in swarm #3233

Open
amiranvarov opened this issue Oct 12, 2024 · 5 comments
Labels
ai/core ai/ui enhancement New feature or request

Comments

@amiranvarov

Feature Description

Hey there!

OpenAI recently released Swarm, their framework for orchestrating agents. Full post: https://cookbook.openai.com/examples/orchestrating_agents#handoff-functions

I tried using Claude to adapt it to my stack with the AI SDK, hoping it would work via the AI SDK's tools property, but it didn't work that well. It seems some hacks and workarounds are still needed. It would be great if the AI SDK provided an API to implement such a swarm more elegantly.

Use Case

It would be very nice to organize tools by agent, as a separation of concerns: each agent would have its own system prompt and tools, while keeping the full context of the conversation across all the agents that were used.

Additional context

Here is Claude-generated code that tries to do this with the AI SDK. I didn't run it, and I expect it not to work, but it gives an idea of how ugly and hacky the workaround is.

import { generateObject } from "ai";
import { z } from "zod";
import { createOpenAI } from "@ai-sdk/openai";
import { ServiceLocator } from "../../../ServiceLocator";
import { AiTranslationResult } from "./types";

class Agent {
  constructor(
    public name: string,
    public instructions: string,
    public tools: Function[],
    public model: string = "gpt-4o-mini"
  ) {}
}

async function runFullTurn(agent: Agent, messages: any[], serviceLocator: ServiceLocator): Promise<{ agent: Agent; messages: any[] }> {
  const env = serviceLocator.getEnv();
  const openai = createOpenAI({
    apiKey: env.OPENAI_API_KEY,
    compatibility: "strict",
  });

  let currentAgent = agent;
  const numInitMessages = messages.length;
  messages = [...messages];

  while (true) {
    const toolSchemas = currentAgent.tools.map(functionToSchema);
    const tools = Object.fromEntries(currentAgent.tools.map(tool => [tool.name, tool]));

    const response = await generateObject({
      model: openai(currentAgent.model),
      output: "array",
      schema: z.object({
        content: z.string().optional(),
        tool_calls: z.array(z.object({
          function: z.object({
            name: z.string(),
            arguments: z.string(),
          }),
          id: z.string(),
        })).optional(),
      }),
      system: currentAgent.instructions,
      prompt: messages,
      tools: toolSchemas,
    });

    const message = response.object;
    messages.push(message);

    if (message.content) {
      console.log(`${currentAgent.name}:`, message.content);
    }

    if (!message.tool_calls) {
      break;
    }

    for (const toolCall of message.tool_calls) {
      const result = await executeToolCall(toolCall, tools, currentAgent.name);

      if (result instanceof Agent) {
        currentAgent = result;
        messages.push({
          role: "system",
          content: `Transferred to ${currentAgent.name}. Adopt persona immediately.`,
        });
      } else {
        messages.push({
          role: "tool",
          tool_call_id: toolCall.id,
          content: result,
        });
      }
    }
  }

  return { agent: currentAgent, messages: messages.slice(numInitMessages) };
}

async function executeToolCall(toolCall: any, tools: Record<string, Function>, agentName: string): Promise<any> {
  const name = toolCall.function.name;
  const args = JSON.parse(toolCall.function.arguments);

  console.log(`${agentName}:`, `${name}(${JSON.stringify(args)})`);

  return await tools[name](args);
}

function functionToSchema(func: Function): any {
  return {
    type: "function",
    function: {
      name: func.name,
      description: (func as any).description ?? "",
      parameters: {
        type: "object",
        properties: {},
      },
    },
  };
}

const triageAgent = new Agent(
  "Triage Agent",
  "You are a customer service bot for ACME Inc. Introduce yourself. Always be very brief. Gather information to direct the customer to the right department. But make your questions subtle and natural.",
  [transferToSalesAgent, transferToIssuesAndRepairs, escalateToHuman]
);

const salesAgent = new Agent(
  "Sales Agent",
  "You are a sales agent for ACME Inc. Always answer in a sentence or less. Follow the following routine with the user: 1. Ask them about any problems in their life related to catching roadrunners. 2. Casually mention one of ACME's crazy made-up products can help. - Don't mention price. 3. Once the user is bought in, drop a ridiculous price. 4. Only after everything, and if the user says yes, tell them a crazy caveat and execute their order.",
  [executeOrder, transferBackToTriage]
);

const issuesAndRepairsAgent = new Agent(
  "Issues and Repairs Agent",
  "You are a customer support agent for ACME Inc. Always answer in a sentence or less. Follow the following routine with the user: 1. First, ask probing questions and understand the user's problem deeper. - unless the user has already provided a reason. 2. Propose a fix (make one up). 3. ONLY if not satisfied, offer a refund. 4. If accepted, search for the ID and then execute refund.",
  [executeRefund, lookUpItem, transferBackToTriage]
);

function transferToSalesAgent() {
  return salesAgent;
}

function transferToIssuesAndRepairs() {
  return issuesAndRepairsAgent;
}

function escalateToHuman(summary: string) {
  console.log("Escalating to human agent...");
  console.log("\n=== Escalation Report ===");
  console.log(`Summary: ${summary}`);
  console.log("=========================\n");
  return "Escalated to human agent";
}

function transferBackToTriage() {
  return triageAgent;
}

async function executeOrder({ product, price }: { product: string; price: number }): Promise<string> {
  console.log("\n\n=== Order Summary ===");
  console.log(`Product: ${product}`);
  console.log(`Price: $${price}`);
  console.log("=================\n");
  console.log("Order execution successful!");
  return "Success";
}

async function executeRefund({ itemId, reason = "not provided" }: { itemId: string; reason?: string }): Promise<string> {
  console.log("\n\n=== Refund Summary ===");
  console.log(`Item ID: ${itemId}`);
  console.log(`Reason: ${reason}`);
  console.log("=================\n");
  console.log("Refund execution successful!");
  return "success";
}

async function lookUpItem({ searchQuery }: { searchQuery: string }): Promise<string> {
  const itemId = "item_132612938";
  console.log("Found item:", itemId);
  return itemId;
}

export async function runSwarm(initialPrompt: string, serviceLocator: ServiceLocator): Promise<AiTranslationResult> {
  let agent = triageAgent;
  let messages = [{ role: "user", content: initialPrompt }];

  while (true) {
    const response = await runFullTurn(agent, messages, serviceLocator);
    agent = response.agent;
    messages = messages.concat(response.messages);

    if (agent.name === "Triage Agent" && messages[messages.length - 1].role === "assistant") {
      break;
    }
  }

  const result = messages
    .filter(m => m.role === "assistant" && m.content)
    .map(m => ({ content: m.content }));

  return result as AiTranslationResult;
}
@amiranvarov
Author

If no additional API changes are needed in the SDK, it would be nice to have a blog post or a docs page showing how to do this the right, elegant way.
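
For example, something along these lines could be the starting point for such a page. It is a rough, untested sketch built on the existing generateText / tool / maxSteps API; the agent definitions and the "HANDOFF:" convention are just placeholders I made up, not anything the SDK provides:

import { generateText, tool, type CoreMessage, type CoreTool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Each agent owns its own system prompt and tools.
type Agent = {
  name: string;
  system: string;
  tools: Record<string, CoreTool>;
};

// A handoff is just a tool whose result names the next agent.
const transferToSales = tool({
  description: "Transfer the conversation to the sales agent.",
  parameters: z.object({}),
  execute: async () => "HANDOFF:sales",
});

const agents: Record<string, Agent> = {
  triage: {
    name: "Triage Agent",
    system: "You are a triage bot for ACME Inc. Route the customer to the right department.",
    tools: { transferToSales },
  },
  sales: {
    name: "Sales Agent",
    system: "You are a sales agent for ACME Inc. Always answer in a sentence or less.",
    tools: {}, // sales tools would go here
  },
};

export async function runSwarm(prompt: string): Promise<string> {
  let current = agents.triage;
  const messages: CoreMessage[] = [{ role: "user", content: prompt }];

  while (true) {
    const result = await generateText({
      model: openai("gpt-4o-mini"),
      system: current.system,
      messages,
      tools: current.tools,
      maxSteps: 5, // let the SDK execute ordinary tools itself
    });

    // A real implementation would append the full response messages here;
    // pushing only the final text keeps the sketch short.
    messages.push({ role: "assistant", content: result.text });

    // Check every step for a handoff signal and switch agents if one occurred.
    const handoff = result.steps
      .flatMap((step) => step.toolResults)
      .find((r) => typeof r.result === "string" && r.result.startsWith("HANDOFF:"));

    if (!handoff) return result.text;
    current = agents[String(handoff.result).split(":")[1]] ?? agents.triage;
  }
}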

Have a good day. Cheers

@nikshepsvn

+1, I'd love a natively supported feature for this

@jeremyphilemon jeremyphilemon added the enhancement New feature or request label Oct 14, 2024
@RobertHH-IS

Second this... it is not clear what the best solution is for constructing LangGraph-like state structures where we stream results and state from multiple LLM calls. An example would be awesome.

@sheldonj

Graphs like LangGraph would be tremendously useful. I keep going back and forth mentally between sticking with the Vercel AI tools, which have awesome streaming and client/server orchestration, and giving up that benefit for the much more flexible graph/node/edge options LangGraph provides.

@RobertHH-IS

It's not that setting up the functions, connections, and state is difficult without LangGraph. It's getting the right things to stream to the front end: returning a stream from A, then the tool of B, then back to A, then to C, then streaming the final response from D, all as a single response. A centralized StreamController would be awesome here.
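
For now I end up hand-rolling it, roughly like this: pipe each step's textStream into one shared ReadableStream and return that as a single Response. Untested sketch; the sequential steps, the model, and the plain-text framing are placeholders, and it ignores tool calls and structured data parts entirely:

import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

// Run several model calls in sequence and pipe all of their output
// into a single Response stream for the front end.
export function streamPipeline(steps: { system: string; prompt: string }[]): Response {
  const encoder = new TextEncoder();

  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      for (const step of steps) {
        const result = await streamText({
          model: openai("gpt-4o-mini"),
          system: step.system,
          prompt: step.prompt,
        });
        // Forward each chunk of this step into the shared stream.
        for await (const chunk of result.textStream) {
          controller.enqueue(encoder.encode(chunk));
        }
      }
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}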
