Development

ChatGPT API function trees

By Matt Tew

Over the years, I've built myself a personal productivity app that connects to the third-party apps and communication tools I use, with the intention of bringing some sanity to my workflow. Recently, I've been working on a personal AI assistant that can access all of this varied data. The challenge is finding a balance between providing the functions it needs and keeping token usage down.

Function calling: the basics

When using the ChatGPT API, it is possible to provide functions to the chat that ChatGPT is able to call in order to access data or perform actions within the application calling the API.

ChatGPT doesn't actually execute the functions. Instead, given enough detail about the functions available to it, it has been fine-tuned to identify when it is suitable to use one of them.

When it does so, instead of returning a text response, it will return an indication that it would like to call the required function. The calling application then executes the function and provides the data back to the context, where ChatGPT continues to prepare its response.

For example, we can provide a function definition that allows ChatGPT to find out the current time. To do this, we add the following function definition to the functions property of our chat completion request:


{
    "name": "get_current_time",
    "description": "Returns the current time in UTC",
    "parameters": {
        "type": "object",
        "properties": {}
    }
}

We can also define parameters that ChatGPT is able to construct and provide as part of the function call, but none are necessary for this example.
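As an illustration, a version of this function with a parameter might look like the following. (The `timezone` parameter here is a hypothetical extension for illustration, not something the example above needs.)

```json
{
    "name": "get_current_time",
    "description": "Returns the current time in the given timezone",
    "parameters": {
        "type": "object",
        "properties": {
            "timezone": {
                "type": "string",
                "description": "An IANA timezone name, e.g. Australia/Sydney"
            }
        },
        "required": ["timezone"]
    }
}
```

ChatGPT would then construct JSON arguments matching this schema when it decides to call the function.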

Now when we ask "What is the time?", ChatGPT will respond with the following via the function_call property of the response message:

{
  "name": "get_current_time",
  "arguments": "{}"
}

We check for the presence of this function call in the response, and if we find it, we generate the time on the server. We then add a message to the context containing the value and send it back to the API.

From there the agent responds with the answer to our original question.
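For reference, the message containing the function result uses the function role, with the function's name and the return value as the content. Assuming the server generated the timestamp shown below, the message we append looks something like:

```json
{
  "role": "function",
  "name": "get_current_time",
  "content": "2024-05-21T09:30:00Z"
}
```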

Functions and token usage

For my personal assistant, I have a bunch of functions I want it to be able to call - from checking my current tasks, to being able to send email on my behalf.

However, the function definitions we provide are included in the token usage for the request. The more function definitions we add, the more tokens are used. This affects both a) the cost of each request, and b) the amount of data we are able to store in the context and send with each request.

A common way to tackle this issue is to allow ChatGPT to spawn sub-agents to work out what needs to be done (à la AutoGPT). This, however, generally uses more tokens overall, and would make the response too slow for my needs.

Function trees

The solution I have found for this is to create a tree of functions that the ChatGPT agent is able to discover and navigate through as needed.

For example, instead of initially providing a whole list of functions that the agent is able to call, we first group the functions together, and provide functions for each group to allow the agent to discover further functions.

The following code (in C#, using Betalgo.OpenAI) is an example of providing these:


var defaultFunctions = new List<FunctionDefinition>();

defaultFunctions.Add(
  new FunctionDefinitionBuilder(
    "get_task_functions", 
    "Provides functions you can call that are related to tasks, including the ability to get the current task, list tasks, reply to communication based tasks, resolve (complete) tasks and submerge (postpone) tasks.")
  .Validate()
  .Build()
);

defaultFunctions.Add(
  new FunctionDefinitionBuilder(
    "get_media_functions", 
    "Provides functions you can call that are related to Matt's media, such as movies, tv shows, music, books and video games.")
  .Validate()
  .Build()
);

var request = new ChatCompletionCreateRequest
{
  Messages = messages,
  Functions = defaultFunctions,
  Model = Model,
};

...

Now the agent has the ability to discover functions related to both tasks and media that it doesn't initially have access to. For example, if I were to ask the bot "Please recommend a new TV show to watch from my watchlist", the agent is smart enough to know it needs to call get_media_functions to discover what it can do related to media.

In this case, we return a new message, but this time populate the function definition list with our media-related functions.


// This block runs inside the request/response loop; each `continue`
// below loops back and sends the updated request to the API again
if (response.FunctionCall != null)
{
  Logger.LogInformation($"Invoking {response.FunctionCall.Name} with params: {response.FunctionCall.Arguments}");

  // Need to add the message with the function call into the context
  response.Content ??= string.Empty;
  response.Role = StaticValues.ChatMessageRoles.Assistant;
  request.Messages.Add(response);

  // Get the function call from the message (or delta in case of chunks)
  var functionCall = response.FunctionCall;

  // If our request included the default functions, then look for a call for a function group
  if (request.Functions == defaultFunctions)
  {

    // Sometimes using this method, the agent will hallucinate a function name
    // with a prefix of our function discovery method (e.g.
    // get_media_functions.get_media) - which is odd, since "." is an invalid
    // character for a function name.
    if (functionCall.Name.Contains("."))
    {
      messages.Add(ChatMessage.FromSystem($"Called function name {functionCall.Name} is invalid. Do not guess at the function name. Call the relevant function group function first to get other functions you can call."));
      functionCall.Name = functionCall.Name.Substring(functionCall.Name.IndexOf(".") + 1);
      continue;
    }

    switch (functionCall.Name)
    {
      case "get_media_functions":
        // We need to return some response to the function call in addition to
        // providing the new functions. Here we simply list the new function
        // names from the definitions in YAML format (YAML uses fewer tokens
        // than JSON)

        var yaml = "- get_queued_media\n- remove_queued_media";

        messages.Add(ChatMessage.FromFunction(yaml, functionCall.Name));

        // Set the functions to our predefined media functions, and loop back to send the request again
        request.Functions = mediaFunctions;
        continue;
      ...
    }
  } else if (request.Functions == mediaFunctions) {
    // Call the requested media function and return the response
    ...
  }
}

This is to illustrate the general concept. In reality, you'd probably want to build helper functions to avoid long lists of else if statements.
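For example, one approach is a pair of lookup dictionaries: one mapping each discovery function to the group of functions it unlocks, and one mapping leaf functions to their handlers. This is a rough sketch; `taskFunctions`, `GetQueuedMedia`, and `RemoveQueuedMedia` are hypothetical names, not part of the Betalgo library.

```csharp
// Map each discovery function to the group of functions it unlocks
var functionGroups = new Dictionary<string, List<FunctionDefinition>>
{
    ["get_task_functions"] = taskFunctions,
    ["get_media_functions"] = mediaFunctions,
};

// Map each leaf function to a handler that takes the JSON arguments
// and returns a result to feed back into the context
var handlers = new Dictionary<string, Func<string, string>>
{
    ["get_queued_media"] = args => GetQueuedMedia(args),
    ["remove_queued_media"] = args => RemoveQueuedMedia(args),
};

if (functionGroups.TryGetValue(functionCall.Name, out var group))
{
    // Discovery call: list the newly available function names in YAML
    // and swap the request's function definitions to the new group
    var yaml = string.Join("\n", group.Select(f => $"- {f.Name}"));
    messages.Add(ChatMessage.FromFunction(yaml, functionCall.Name));
    request.Functions = group;
}
else if (handlers.TryGetValue(functionCall.Name, out var handler))
{
    // Leaf call: execute it and return the result to the context
    var result = handler(functionCall.Arguments);
    messages.Add(ChatMessage.FromFunction(result, functionCall.Name));
}
```

Adding a new function group then only requires a new dictionary entry rather than another branch.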

In my testing, this works very well and the agent is consistently and accurately able to discover the relevant function group and then go on to call the correct function in order to generate a response.

The further advantage of this method is that the function definitions do not need to remain part of the context, meaning that as we switch between function groups only the currently provided function definitions count towards token usage.
