Local LLM for Function Calling with Ollama, NexusRaven and JS/TS

Calvin · 4 min read

In a previous post, I discussed using OpenAI to call functions, a useful feature for integrating AI into existing projects without local setup. This post explores running Function Calling locally with NexusRaven, a model trained specifically for this task. A local LLM is preferable when you want to keep your data on-premise rather than sending it to the cloud. However, running NexusRaven locally requires a powerful computer with a good GPU for reasonable performance.

NexusRaven is a model trained specifically to perform function calling. It is trained on a large dataset of Python code and can generate code that calls functions in a serial, nested, or parallel manner.


First, install Ollama from the official download page. My computer is a MacBook Pro with an M1 chip, so I downloaded the Apple Silicon version. After downloading, follow the installation instructions to install Ollama.

Next, install NexusRaven using the ollama command line tool. Open a terminal and run the following command:

ollama pull nexusraven

Then run the following command to start the server and get the API port:

ollama serve

The default port is 11434.

Running the Model

Now that the Ollama server with the NexusRaven model is running on the local machine, we can invoke the model using the following TypeScript code:

import { Ollama } from "@langchain/community/llms/ollama";

/** Prepare LLM model with tools */
const model = new Ollama({
  baseUrl: "http://localhost:11434",
  model: "nexusraven",
  temperature: 0.001,
});

The official documentation suggests setting the temperature very low (0.001) to achieve optimal results; the temperature parameter controls the randomness of the output.
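
To see why such a low temperature makes the output nearly deterministic, consider how sampling typically scales logits before the softmax. This is a generic illustration of temperature scaling, not NexusRaven or Ollama internals:

```typescript
// Illustrative sketch: logits are divided by the temperature before the
// softmax, so a tiny temperature makes the top-scoring token dominate.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const logits = [2.0, 1.0, 0.5];
console.log(softmaxWithTemperature(logits, 1.0));   // probability spread over tokens
console.log(softmaxWithTemperature(logits, 0.001)); // ~[1, 0, 0]: near-deterministic
```

With temperature 0.001 the best token gets essentially all of the probability mass, which is what you want when the model must emit an exact function call rather than creative text.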

Next, prepare the prompt in the correct format. For the best outcome, refer to the official documentation (https://github.com/nexusflowai/NexusRaven/blob/main/docs/prompting_readme.md) for the proper prompt structure. In essence, you provide function descriptions using Python-like syntax.

import { PromptTemplate } from "@langchain/core/prompts";
import type { LLM } from "@langchain/core/language_models/llms";

async function generateResponse(model: LLM, question: string) {
  const prompt = PromptTemplate.fromTemplate(`<human>:
    <func_start>def hello_world(n : int)<func_end>
    Prints hello world to the user.
    n (int) : Number of times to print hello world.
    <func_start>def hello_universe(n : int)<func_end>
    Prints hello universe to the user.
    n (int) : Number of times to print hello universe.
    User Query: Question: {question}
    Please pick a function from the above options that best answers the user query and fill in the appropriate arguments.<human_end>`);
  const response = await model.invoke(await prompt.format({ question }));
  return response;
}

Putting it all together, we can now call the generateResponse function to get the result. Here is an example of calling the function with the question "Please print hello universe 31 times."
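
Under the hood, `prompt.format` simply substitutes the `{question}` placeholder into the template before the text is sent to the model. A dependency-free sketch of that step (`fillTemplate` is a hypothetical helper for illustration, not part of LangChain):

```typescript
// Minimal stand-in for PromptTemplate.format: replace {placeholder}
// occurrences with the supplied values, leaving unknown keys untouched.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_, key) => values[key] ?? `{${key}}`);
}

const template = "User Query: Question: {question}";
console.log(fillTemplate(template, { question: "Please print hello universe 31 times." }));
// prints: User Query: Question: Please print hello universe 31 times.
```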

Hello, this is demo of local LLM using Ollama and NexusRaven.
✔ Please print hello universe 31 times.
[11.08s] LLM
result Call: hello_universe(n=31)
Thought: The function call `hello_universe(n=31)` answers the question "Please print hello universe 31 times." because it passes the value 31 as an argument to the `hello_universe` function, which is defined in the options provided.

The `hello_universe` function takes a single argument `n`, which represents the number of times to print the message "Hello universe". In this case, we are passing the value 31 as the argument `n`, so the function will print the message "Hello universe" 31 times.

Therefore, the function call `hello_universe(n=31)` answers the question by calling the `hello_universe` function with the argument `n=31`, which will result in the message being printed 31 times.

The result is a call to the function hello_universe(n=31).
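
Note that the model only returns the call as text; your application still has to parse it and dispatch to a real implementation. A minimal sketch of that last step, where the regex and the `registry` object are my own assumptions rather than part of NexusRaven or Ollama:

```typescript
// Map of callable tools; the signatures mirror the Python-style
// descriptions given in the prompt.
const registry: Record<string, (n: number) => void> = {
  hello_world: (n) => { for (let i = 0; i < n; i++) console.log("Hello world"); },
  hello_universe: (n) => { for (let i = 0; i < n; i++) console.log("Hello universe"); },
};

// Extract the function name and keyword argument from a line such as
// "Call: hello_universe(n=31)". Returns null if no call is found.
function parseCall(output: string): { name: string; n: number } | null {
  const match = output.match(/Call:\s*(\w+)\(n=(\d+)\)/);
  return match ? { name: match[1], n: Number(match[2]) } : null;
}

const parsed = parseCall("Call: hello_universe(n=31)");
if (parsed && registry[parsed.name]) {
  registry[parsed.name](parsed.n); // prints "Hello universe" 31 times
}
```

A production version would need to handle malformed output, unknown function names, and the serial/nested/parallel call forms that NexusRaven can also emit.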