Support for Orchestrating Agents: Routines and Handoffs like it was done in swarm #3233

Open
amiranvarov opened this issue Oct 12, 2024 · 5 comments
Labels
ai/core ai/ui enhancement New feature or request

Comments

@amiranvarov

Feature Description

Hey there!

OpenAI recently released Swarm, their framework for orchestrating agents. Full post: https://cookbook.openai.com/examples/orchestrating_agents#handoff-functions

I tried using Claude to adapt it to my stack with the AI SDK, hoping it would work via the AI SDK's tools property, but it didn't work that well. It seems some hacks and workarounds are still needed. It would be great if the AI SDK provided an API to implement such a swarm more elegantly.

Use Case

It would be very nice to organize tools by agent, as a separation of concerns: each agent would have its own system prompt and tools, while keeping the full context of the conversation across all the agents that were used.

Additional context

Here is Claude-generated code that tries to do this with the AI SDK. I didn't run it, and I expect it not to work, but it gives an idea of how ugly and hacky the workaround is.

import { generateObject } from "ai";
import { z } from "zod";
import { createOpenAI } from "@ai-sdk/openai";
import { ServiceLocator } from "../../../ServiceLocator";
import { AiTranslationResult } from "./types";

class Agent {
  constructor(
    public name: string,
    public instructions: string,
    public tools: Function[],
    public model: string = "gpt-4o-mini"
  ) {}
}

async function runFullTurn(agent: Agent, messages: any[], serviceLocator: ServiceLocator): Promise<{ agent: Agent; messages: any[] }> {
  const env = serviceLocator.getEnv();
  const openai = createOpenAI({
    apiKey: env.OPENAI_API_KEY,
    compatibility: "strict",
  });

  let currentAgent = agent;
  const numInitMessages = messages.length;
  messages = [...messages];

  while (true) {
    const toolSchemas = currentAgent.tools.map(functionToSchema);
    const tools = Object.fromEntries(currentAgent.tools.map(tool => [tool.name, tool]));

    const response = await generateObject({
      model: openai(currentAgent.model),
      output: "array",
      schema: z.object({
        content: z.string().optional(),
        tool_calls: z.array(z.object({
          function: z.object({
            name: z.string(),
            arguments: z.string(),
          }),
          id: z.string(),
        })).optional(),
      }),
      system: currentAgent.instructions,
      prompt: messages,
      tools: toolSchemas,
    });

    const message = response.object;
    messages.push(message);

    if (message.content) {
      console.log(`${currentAgent.name}:`, message.content);
    }

    if (!message.tool_calls) {
      break;
    }

    for (const toolCall of message.tool_calls) {
      const result = await executeToolCall(toolCall, tools, currentAgent.name);

      if (result instanceof Agent) {
        currentAgent = result;
        messages.push({
          role: "system",
          content: `Transferred to ${currentAgent.name}. Adopt persona immediately.`,
        });
      } else {
        messages.push({
          role: "tool",
          tool_call_id: toolCall.id,
          content: result,
        });
      }
    }
  }

  return { agent: currentAgent, messages: messages.slice(numInitMessages) };
}

async function executeToolCall(toolCall: any, tools: Record<string, Function>, agentName: string): Promise<any> {
  const name = toolCall.function.name;
  const args = JSON.parse(toolCall.function.arguments);

  console.log(`${agentName}:`, `${name}(${JSON.stringify(args)})`);

  return await tools[name](args);
}

function functionToSchema(func: Function): any {
  return {
    type: "function",
    function: {
      name: func.name,
      description: (func as any).description ?? "",
      parameters: {
        type: "object",
        properties: {},
      },
    },
  };
}

const triageAgent = new Agent(
  "Triage Agent",
  "You are a customer service bot for ACME Inc. Introduce yourself. Always be very brief. Gather information to direct the customer to the right department. But make your questions subtle and natural.",
  [transferToSalesAgent, transferToIssuesAndRepairs, escalateToHuman]
);

const salesAgent = new Agent(
  "Sales Agent",
  "You are a sales agent for ACME Inc. Always answer in a sentence or less. Follow the following routine with the user: 1. Ask them about any problems in their life related to catching roadrunners. 2. Casually mention one of ACME's crazy made-up products can help. - Don't mention price. 3. Once the user is bought in, drop a ridiculous price. 4. Only after everything, and if the user says yes, tell them a crazy caveat and execute their order.",
  [executeOrder, transferBackToTriage]
);

const issuesAndRepairsAgent = new Agent(
  "Issues and Repairs Agent",
  "You are a customer support agent for ACME Inc. Always answer in a sentence or less. Follow the following routine with the user: 1. First, ask probing questions and understand the user's problem deeper. - unless the user has already provided a reason. 2. Propose a fix (make one up). 3. ONLY if not satisfied, offer a refund. 4. If accepted, search for the ID and then execute refund.",
  [executeRefund, lookUpItem, transferBackToTriage]
);

function transferToSalesAgent() {
  return salesAgent;
}

function transferToIssuesAndRepairs() {
  return issuesAndRepairsAgent;
}

function escalateToHuman(summary: string) {
  console.log("Escalating to human agent...");
  console.log("\n=== Escalation Report ===");
  console.log(`Summary: ${summary}`);
  console.log("=========================\n");
  return "Escalated to human agent";
}

function transferBackToTriage() {
  return triageAgent;
}

async function executeOrder({ product, price }: { product: string; price: number }): Promise<string> {
  console.log("\n\n=== Order Summary ===");
  console.log(`Product: ${product}`);
  console.log(`Price: $${price}`);
  console.log("=================\n");
  console.log("Order execution successful!");
  return "Success";
}

async function executeRefund({ itemId, reason = "not provided" }: { itemId: string; reason?: string }): Promise<string> {
  console.log("\n\n=== Refund Summary ===");
  console.log(`Item ID: ${itemId}`);
  console.log(`Reason: ${reason}`);
  console.log("=================\n");
  console.log("Refund execution successful!");
  return "success";
}

async function lookUpItem({ searchQuery }: { searchQuery: string }): Promise<string> {
  const itemId = "item_132612938";
  console.log("Found item:", itemId);
  return itemId;
}

export async function runSwarm(initialPrompt: string, serviceLocator: ServiceLocator): Promise<AiTranslationResult> {
  let agent = triageAgent;
  let messages = [{ role: "user", content: initialPrompt }];

  while (true) {
    const response = await runFullTurn(agent, messages, serviceLocator);
    agent = response.agent;
    messages = messages.concat(response.messages);

    if (agent.name === "Triage Agent" && messages[messages.length - 1].role === "assistant") {
      break;
    }
  }

  const result = messages
    .filter(m => m.role === "assistant" && m.content)
    .map(m => ({ content: m.content }));

  return result as AiTranslationResult;
}
@amiranvarov
Author

If no additional API changes are needed in the SDK, it would be nice to have a blog post or a docs page showing how to do this the right, elegant way.
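
For example, something along these lines could be the starting point for such a page. It is a rough, untested sketch built on the existing generateText / tool / maxSteps API; the agent definitions and the "HANDOFF:" convention are just placeholders I made up, not anything the SDK provides:

import { generateText, tool, type CoreMessage, type CoreTool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Each agent owns its own system prompt and tools.
type Agent = {
  name: string;
  system: string;
  tools: Record<string, CoreTool>;
};

// A handoff is just a tool whose result names the next agent.
const transferToSales = tool({
  description: "Transfer the conversation to the sales agent.",
  parameters: z.object({}),
  execute: async () => "HANDOFF:sales",
});

const agents: Record<string, Agent> = {
  triage: {
    name: "Triage Agent",
    system: "You are a triage bot for ACME Inc. Route the customer to the right department.",
    tools: { transferToSales },
  },
  sales: {
    name: "Sales Agent",
    system: "You are a sales agent for ACME Inc. Always answer in a sentence or less.",
    tools: {}, // sales tools would go here
  },
};

export async function runSwarm(prompt: string): Promise<string> {
  let current = agents.triage;
  const messages: CoreMessage[] = [{ role: "user", content: prompt }];

  while (true) {
    const result = await generateText({
      model: openai("gpt-4o-mini"),
      system: current.system,
      messages,
      tools: current.tools,
      maxSteps: 5, // let the SDK execute ordinary tools itself
    });

    // A real implementation would append the full response messages here;
    // pushing only the final text keeps the sketch short.
    messages.push({ role: "assistant", content: result.text });

    // Check every step for a handoff signal and switch agents if one occurred.
    const handoff = result.steps
      .flatMap((step) => step.toolResults)
      .find((r) => typeof r.result === "string" && r.result.startsWith("HANDOFF:"));

    if (!handoff) return result.text;
    current = agents[String(handoff.result).split(":")[1]] ?? agents.triage;
  }
}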

Have a good day. Cheers

@nikshepsvn

+1, I'd love a natively supported feature for this

@jeremyphilemon jeremyphilemon added the enhancement New feature or request label Oct 14, 2024
@RobertHH-IS

Second this... it is not clear what the best solution is for constructing LangGraph-like state structures where we stream results and state from multiple LLM calls. An example would be awesome.

@sheldonj

Graphs like LangGraph would be tremendously useful. I keep going back and forth mentally between sticking with the Vercel AI tools, which have awesome streaming and client/server orchestration, and giving up that benefit for the much more flexible graph/node/edge options LangGraph provides.

@RobertHH-IS

It's not that setting up the functions, connections, and state is difficult without LangGraph. It's getting the right things to stream to the front end: returning a stream from A, then the tool of B, then back to A, then to C, then streaming the final response from D, all as a single response. A centralized StreamController would be awesome here.
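
For now I end up hand-rolling it, roughly like this: pipe each step's textStream into one shared ReadableStream and return that as a single Response. Untested sketch; the sequential steps, the model, and the plain-text framing are placeholders, and it ignores tool calls and structured data parts entirely:

import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

// Run several model calls in sequence and pipe all of their output
// into a single Response stream for the front end.
export function streamPipeline(steps: { system: string; prompt: string }[]): Response {
  const encoder = new TextEncoder();

  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      for (const step of steps) {
        const result = await streamText({
          model: openai("gpt-4o-mini"),
          system: step.system,
          prompt: step.prompt,
        });
        // Forward each chunk of this step into the shared stream.
        for await (const chunk of result.textStream) {
          controller.enqueue(encoder.encode(chunk));
        }
      }
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}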
