State of “Function Calling” in LLM

Minyang Chen
5 min readMar 8, 2024

Last year, OpenAI introduced a new function-calling capability in the Chat Completions API, which you can find here. With this feature, you can describe functions in an API call, and the model will intelligently choose to output a JSON object containing the arguments to call. This is a reliable way to connect ChatGPT’s capabilities with external tools and APIs, which is good news for everyone.
One of the prerequisites for function calling is that the LLM model must be fine-tuned to detect when a function needs to be called, depending on the user prompt, and to respond with JSON that adheres to the function signature. In short, two API calls are necessary to obtain the final result. Please refer to the diagram below for illustration purposes.

Figure.1 Function Calling Flow

Function calling is important for following reasons

1.Data Privacy: application own all outputs generated from their requests and their API data will not be used for training.

2.Connectivity: Unlimited connectivity with existing services,external tools and APIs.

One drawback is that this method requires making two API calls, which could be expensive if you are using a ChatGPT or LLM provider that charges by the number of API calls.

Additionally, when making function calls, you also need to pass in the intent functions that the LLM allows you to use. In open-source models implementation, several techniques have been applied to accomplish function calling.

1. Function Calling with OpenAI Python Client

Here’s an good example from LLama-cpp-python on their implementation support of function calling. First, run this function calling fine-tuned model (abetlen/functionary-7b-v1-GGUF).

import openai
import json


client = openai.OpenAI(
api_key = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", # can be anything
base_url = "http://100.64.159.73:8000/v1" # NOTE: Replace with IP address and port of your llama-cpp-python server
)

# Example dummy function hard coded to return the same weather
# In production, this could be your backend API or an external API
def get_current_weather(location, unit="fahrenheit"):
"""Get the current weather in a given location"""
if "tokyo" in location.lower():
return json.dumps({"location": "Tokyo", "temperature": "10", "unit": "celsius"})
elif "san francisco" in location.lower():
return json.dumps({"location": "San Francisco", "temperature": "72", "unit": "fahrenheit"})
elif "paris" in location.lower():
return json.dumps({"location": "Paris", "temperature": "22", "unit": "celsius"})
else:
return json.dumps({"location": location, "temperature": "unknown"})

def run_conversation():
# Step 1: send the conversation and available functions to the model
messages = [{"role": "user", "content": "What's the weather like in San Francisco, Tokyo, and Paris?"}]
tools = [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
},
"required": ["location"],
},
},
}
]
response = client.chat.completions.create(
model="gpt-3.5-turbo-1106",
messages=messages,
tools=tools,
tool_choice="auto", # auto is default, but we'll be explicit
)
response_message = response.choices[0].message
tool_calls = response_message.tool_calls
# Step 2: check if the model wanted to call a function
if tool_calls:
# Step 3: call the function
# Note: the JSON response may not always be valid; be sure to handle errors
available_functions = {
"get_current_weather": get_current_weather,
} # only one function in this example, but you can have multiple
messages.append(response_message) # extend conversation with assistant's reply
# Step 4: send the info for each function call and function response to the model
for tool_call in tool_calls:
function_name = tool_call.function.name
function_to_call = available_functions[function_name]
function_args = json.loads(tool_call.function.arguments)
function_response = function_to_call(
location=function_args.get("location"),
unit=function_args.get("unit"),
)
messages.append(
{
"tool_call_id": tool_call.id,
"role": "tool",
"name": function_name,
"content": function_response,
}
) # extend conversation with function response
second_response = client.chat.completions.create(
model="functionary-7b-v1-GGUF,
messages=messages,
) # get a new response from the model where it can see the function response
return second_response
print(run_conversation())

Note: this implement two calls and using tools — nice work!

2. Function Calling v2

Similar to above example, except the internal model fine-tuned instruction is different. the client code also openai-compatible. First run the model using vLLM. For more details see here on MeetKai/functionary gitrepo:

python3 server_vllm.py --model "meetkai/functionary-small-v2.2" --host 0.0.0.0
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="functionary")

client.chat.completions.create(
model="meetkai/functionary-small-v2.2",
messages=[{"role": "user",
"content": "What is the weather for Istanbul?"}
],
tools=[{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA"
}
},
"required": ["location"]
}
}
}],
tool_choice="auto"
)

One requirement for running this model is high vram requirement— author recommend (Any GPU with 24GB VRAM).

Instead of running in vLLM server, the model also support running model locally with Llamacpp-python. see example below :

from llama_cpp import Llama
from llama_cpp.llama_tokenizer import LlamaHFTokenizer

# We should use HF AutoTokenizer instead of llama.cpp's tokenizer because we found that Llama.cpp's tokenizer doesn't give the same result as that from Huggingface. The reason might be in the training, we added new tokens to the tokenizer and Llama.cpp doesn't handle this successfully
llm = Llama.from_pretrained(
repo_id="meetkai/functionary-small-v2.2-GGUF",
filename="functionary-small-v2.2.q4_0.gguf",
chat_format="functionary-v2",
tokenizer=LlamaHFTokenizer.from_pretrained("meetkai/functionary-small-v2.2-GGUF"),
n_gpu_layers=-1
)

messages = [
{"role": "user", "content": "what's the weather like in Hanoi?"}
]
tools = [ # For functionary-7b-v2 we use "tools"; for functionary-7b-v1.4 we use "functions" = [{"name": "get_current_weather", "description":..., "parameters": ....}]
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g., San Francisco, CA"
}
},
"required": ["location"]
}
}
}
]

result = llm.create_chat_completion(
messages = messages,
tools=tools,
tool_choice="auto",
)

print(result["choices"][0]["message"])

Please note, there is a catch here: You need to swith to use HF Tokenizer to make above code work due some incompatability with LLamaCpp tokenizer implementation.

3. Function Calling Generation Model

This model can run in oLLama for event registration and works well for generation of a JSON result object. What I like about is the interactive follow up questions. here’s how it can be run it locally.

ollama run calebfahlgren/natural-functions

Insert following to system promt

System Prompt
You are a helpful assistant with access to the following functions. Use them if required -

{
"name": "create_event",
"description": "Create a new event in the calendar",
"parameters": {
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "The title of the event"
},
"date": {
"type": "string",
"format": "date",
"description": "The date of the event"
},
"time": {
"type": "string",
"format": "time",
"description": "The time of the event"
},
"location": {
"type": "string",
"description": "The location of the event (optional)"
}
},
"required": [
"title",
"date",
"time"
]
}
}

next, you can ask question in the chat window

 User> can you create a party for March Break at 5:00 PM?
Agent> Sure, I can help with that. What time do you want the event to start? and would you like to provide a location?

User> 5PM at grandpa's house
Agent> {"name":"create_event":"arguments":{"title":"Mark break Party","date":"2023-03-12","time":"5","location":"grandpa house"}

The limitation is that this model only generate a json output. it doesn’t make function call. so you have to add second step to make function calling.

In Summary

Function calling is a great tool to be add into the LLM integration stack.

I did actually create a implementation of this on this project here: https://github.com/PavAI-Research/pavai-workspace.git for LlamaCPP.

here’s the full code implemenation: https://github.com/PavAI-Research/pavai-workspace/blob/main/genai_at_work/chatllamacpp.py

Hope you like it and found helpful.

Have a nice day!

--

--

Minyang Chen

Enthusiastic in AI, Cloud, Big Data and Software Engineering. Sharing insights from my own experiences.