There’s something even more stressful than answering customer questions: answering the same question over and over again. But what if you had an assistant who could handle all of that while you take a break? Can you imagine it?
Building an Assistant
A boring life without an assistant








Imagine we’re working as assistants in a shipping company. Our job is to answer every question that clients or potential clients might have, such as:
- Can I send a package from Mexico to Canada?
- How much does it cost to ship from Italy to the United States?
- What services do you provide?
- What’s the status of my package?
- If I send a package tomorrow, when will it arrive?
…and many more.
Wouldn’t it be great if we could build an assistant that could handle all of these questions for us? An assistant that we can “teach” with all the information it needs, and then let it take care of the responses—while we relax a little.
That’s exactly what we’re going to do here: build one of these assistants. While working on this post, I couldn’t help but think of Frankenstein’s monster—we’ll bring our assistant to life step by step. Hopefully, unlike Frankenstein’s creation, ours won’t try to kill us in the end!
So, what will we use to build this assistant? We’ll be working with an LLM (Large Language Model), LLM-related tools, and some Python code. But before we dive in, let’s start at the beginning: what exactly is an LLM?
🧠Anatomy of the Monster’s Mind: Large Language Models (LLMs)
An LLM (Large Language Model) is a type of artificial intelligence trained to understand and generate human language.
In simple terms, it’s a model that can chat like a person — but how exactly does it do that?
Let’s take a model like ChatGPT-4 as an example.

You’ve probably noticed that it can hold surprisingly natural conversations. Sometimes it even feels like you’re talking to a real human (and honestly, I liked that second joke too!).
But then you might wonder — how is this even possible? It’s just software, right? How can it “think” and keep a coherent conversation?
I’ll try to explain it in a simple, big-picture way. We won’t go too deep here (I plan to write more detailed posts later), but this will give you a solid idea of how an LLM actually works.
⚙️ How It Works (in Simple Terms)
An LLM uses something called a Language Model to predict the next word (or token) that makes the most sense in a given context.
Each time it predicts a word, it adds it to the ongoing conversation — and then predicts the next one, and the next one, and so on — until it reaches the end of a logical sentence.
For example, if you say:
“Hi, how are you doing today?”
ChatGPT’s language model might predict that the best continuation is something like:
“Hey! I’m doing great… ”
It doesn’t just generate the full answer in one go — it builds it token by token, predicting the next most likely piece of text over and over again until it reaches a special “end of sentence” token that tells it to stop.
🧩 What’s a “Token”?
A token is a small chunk of text. It can be:
- A whole word (cat)
- Part of a long word (clever → cle + ver)
- Or even a special symbol that marks structure in the text.
Using tokens instead of full words helps the model work with a manageable vocabulary and handle any word in a language — even new or rare ones.
It also allows the model to include special markers, like:
<Human> Hi, how are you doing?
<AI Assistant> Hey! I’m doing great… so far? <end of sentence>
So, to summarize:
- You write something.
- The model reads it as tokens.
- It predicts the next most likely token.
- Then it repeats this process again and again, building up a full, coherent response — word by word, token by token — until it “decides” the thought is complete.
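The loop above can be sketched in a few lines of Python. This is only a toy stand-in: the lookup table plays the role of the trained model, which in reality scores every token in its vocabulary at each step.

```python
# Toy stand-in for a language model: a lookup table that maps the
# last token to the "most likely" next token. (Hypothetical data --
# a real LLM computes a probability over its whole vocabulary.)
NEXT_TOKEN = {
    "<start>": "Hey!",
    "Hey!": "I'm",
    "I'm": "doing",
    "doing": "great",
    "great": "<end>",
}

def generate(tokens):
    """Repeatedly predict the next token until the end marker appears."""
    tokens = list(tokens)
    while tokens[-1] != "<end>":
        tokens.append(NEXT_TOKEN[tokens[-1]])  # predict, append, repeat
    return tokens

print(" ".join(generate(["<start>"])))
```

The structure is the important part: one predicted token at a time, each prediction fed back in as context for the next.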
That’s the basic idea behind how the “brain” of our digital Frankenstein — the LLM — actually works.
Inside the Monster’s Brain: Understanding Neural Networks
A Language Model is built using a Neural Network, and a neural network is a special kind of machine learning model — one that’s inspired by how the human brain works.
But wait, what’s Machine Learning exactly?
Machine Learning (ML) is a field of Artificial Intelligence (AI) where we teach computers to learn from data, instead of manually programming every single rule.
🍕 A Simple Example
Let’s make it more concrete with an example.
Suppose we want to write a function that predicts whether a person prefers Pizza or Pasta, based on two simple questions:
- Do you like cheese?
- Do you like bread?
Here’s our first (hand-coded) version:
def predict_pizza_preference(like_cheese: bool, like_bread: bool) -> float:
    if like_cheese and like_bread:
        return 0.85  # 85% -> very likely prefers Pizza
    elif like_cheese and not like_bread:
        return 0.65  # 65% -> cheese lover likely leans Pizza
    elif like_bread and not like_cheese:
        return 0.4   # 40% -> bread lover but no cheese, leans Pasta
    else:
        return 0.3   # 30% -> doesn't like either, mild Pasta preference

This looks fine… until reality surprises us.
Some time ago, I met someone who hates cheese (any kind of it!) but still prefers Pizza over Pasta.
Our little function would always get this case wrong — and no matter how many people like this we meet, we’d have to keep updating our rules manually.
What if, instead, we collect data from many people — their cheese/bread preferences and whether they like Pizza or Pasta — and then let the computer learn by itself how to make the best prediction?
That’s exactly what Machine Learning does.
🧠 Enter Neural Networks
A Neural Network works in the same spirit.
It learns patterns automatically from data — but it’s inspired by how our brain processes information.
A neural network is made of neurons connected to each other and organized into layers:
- Input Layer — receives the input values (like “likes cheese?” and “likes bread?”).
- Hidden Layers — process these inputs and find patterns.
- Output Layer — produces the final prediction.
The neurons in each layer are connected to all the neurons in the next layer. Here’s a picture of what that looks like:

In our example:
- The input layer has 2 neurons (for cheese and bread).
- The output layer has 1 neuron (a single number showing the preference for Pizza over Pasta).
- Let’s add a hidden layer with 2 neurons to process the data in between.
So a picture of our Neural Network looks like this:

🔢 Converting Inputs to Numbers
Since our inputs are booleans, we’ll represent them with numbers:
- X1 = 1 if the person likes cheese, 0 otherwise.
- X2 = 1 if the person likes bread, 0 otherwise.
Also, let’s label our hidden layer neurons:
- Neuron A → “Pizza Lover”
- Neuron B → “Pasta Lover”
So when the “Pizza Lover” neuron’s value is high, it means this person is a Pizza lover, and when the “Pasta Lover” neuron’s value is high, it means this person is a Pasta lover. Now our hidden layer neurons have a meaning.
If you check the previous picture again, you are going to see connections between each neuron, each connection between neurons has a weight, which says how important that connection is.
For example, maybe cheese matters more for Pizza lovers, and bread matters more for Pasta lovers.
Now we assign weights to represent how each input influences each neuron:
| Connection | Description | Weight | Note |
|---|---|---|---|
| W1,A | Cheese → Pizza Lover | 0.9 | Cheese is very important for Pizza Lover |
| W2,A | Bread → Pizza Lover | 0.3 | Bread is less important for Pizza Lover |
| W1,B | Cheese → Pasta Lover | 0.5 | Cheese is less important for Pasta Lover |
| W2,B | Bread → Pasta Lover | 1.0 | Bread is the base of Pasta |
Let’s update our picture. Now our Neural Network looks like this:

If you look carefully, you can see that the connections into the Output layer also have weights. Let’s assign values for these final connections:
| Connection | Description | Weight |
|---|---|---|
| W1,C | Pizza Lover → Output | 0.8 |
| W2,C | Pasta Lover → Output | -0.6 |
These values make sense: since the output measures how much this person prefers Pizza over Pasta, the “Pizza Lover” signal should push the output up (positive weight) while the “Pasta Lover” signal pushes it down (negative weight).
⚙️ Calculating the Hidden Layer
Let’s test this with a person who likes cheese but doesn’t like bread (X1=1, X2=0).
We calculate each hidden neuron:
Pizza Lover = X1 * W1,A + X2 * W2,A = 1 * 0.9 + 0 * 0.3 = 0.9
Pasta Lover = X1 * W1,B + X2 * W2,B = 1 * 0.5 + 0 * 1.0 = 0.5
To keep values between 0 and 1, neural networks use a sigmoid function (σ).
Let’s apply it (we’ll skip the math details for now):
Pizza Lover = σ(0.9) = 0.71 → Y1
Pasta Lover = σ(0.5) = 0.62 → Y2
🧾 The Output Layer
Finally, the output neuron combines these values.
And we calculate the output:
Output = σ(Y1 * W1,C + Y2 * W2,C)
= σ(0.71 * 0.8 + 0.62 * -0.6)
= σ(0.196)
≈ 0.55
🍝 The Result
The final number 0.55 means this person leans slightly toward Pizza: roughly a 55% preference for Pizza over Pasta.
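The forward pass we just did by hand can be checked with a few lines of Python, using the weights from the tables above:

```python
import math

def sigmoid(z):
    """Squashes any number into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Inputs: likes cheese (1), doesn't like bread (0)
x1, x2 = 1, 0

# Weights from the tables above
w1a, w2a = 0.9, 0.3   # inputs -> "Pizza Lover" neuron
w1b, w2b = 0.5, 1.0   # inputs -> "Pasta Lover" neuron
w1c, w2c = 0.8, -0.6  # hidden neurons -> output

y1 = sigmoid(x1 * w1a + x2 * w2a)    # "Pizza Lover" activation
y2 = sigmoid(x1 * w1b + x2 * w2b)    # "Pasta Lover" activation
output = sigmoid(y1 * w1c + y2 * w2c)

print(round(y1, 2), round(y2, 2), round(output, 2))
```

Running this reproduces the hidden-layer values 0.71 and 0.62, and a final output of about 0.55.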
Bringing the Brain to Life: Training Frankenstein’s Mind
Wait a second — did I just trick you? It looks almost the same as before — we just replaced the if statements with weights!
Well, the real difference is that these weights can be learned automatically through a process called training.
When a neural network is in its training phase, it’s learning how to set its own weights.
In our previous examples, we manually chose weights that made sense for our pizza vs pasta scenario, but that’s not how real neural networks learn.
Remember when I mentioned interviewing a group of people — asking if they like cheese and bread, and whether they prefer pizza or pasta? The idea is to let the neural network learn patterns from this data instead of us hardcoding them.
Step 1: Set Initial Weights
To start, we give each connection an initial weight value, usually something neutral like 0.5.
Here’s an example setup:
| Connection | Description | Weight |
|---|---|---|
| W1,A | Cheese → Pizza Lover | 0.5 |
| W2,A | Bread → Pizza Lover | 0.5 |
| W1,B | Cheese → Pasta Lover | 0.5 |
| W2,B | Bread → Pasta Lover | 0.5 |
| W1,C | Pizza Lover → Output | 0.5 |
| W2,C | Pasta Lover → Output | -0.5 |
At this point, both “cheese” and “bread” are considered equally important in deciding whether someone prefers pizza.
But don’t worry — the network will adjust these values as it learns.
Step 2: Pass Data Through the Network
Let’s take one example person:
They like cheese but don’t like bread, and we know from our survey that they always prefer pizza.
So our inputs are X1 = 1 (likes cheese) and X2 = 0 (doesn’t like bread).
Now we calculate what happens as the data flows through the network:
Pizza Lover = σ(X1 * W1,A + X2 * W2,A) = σ(1*0.5 + 0*0.5) = σ(0.5) = 0.62
Pasta Lover = σ(X1 * W1,B + X2 * W2,B) = σ(1*0.5 + 0*0.5) = σ(0.5) = 0.62
Output Layer = σ(Y1 * W1,C + Y2 * W2,C) = σ(0.62*0.5 + 0.62*-0.5) = σ(0) = 0.5
Our network predicts 50% — meaning this person is only slightly more likely to prefer pizza.
But we know they actually always choose pizza, so the output should’ve been closer to 0.7 or higher.
Step 3: Adjust the Weights (Learning)
Since our prediction was wrong, we have an error.
The network now adjusts its weights slightly — increasing some (like W1,A and W1,C) and decreasing others.
We don’t change them drastically — just a little bit at a time.
That’s because the goal isn’t to make the network perfect for this one person, but to make it generalize well for everyone in our training set.
This process repeats again and again, for many people in the dataset.
Each time, the network updates its weights a tiny bit, slowly learning what really matters — cheese, bread, or maybe both.
(I won’t go deeper into the details of the training process here; I’d rather leave that for another post.)
After many iterations, the network ends up with a set of weights that truly represent the relationships in the data.
At that point, our neural network is ready to predict new cases — just like a well-trained brain (or perhaps… our own little digital Frankenstein’s mind 🧟♂️).
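To make the training loop concrete, here is a minimal sketch. Note the simplifications: the survey data is made up, and we train a single neuron instead of the full network with a hidden layer, so the update rule stays readable. The loop is the important part: predict, measure the error, nudge the weights, repeat.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical survey data: (likes_cheese, likes_bread) -> 1 = prefers Pizza, 0 = Pasta
data = [
    ((1, 1), 1), ((1, 0), 1), ((0, 1), 0), ((0, 0), 0),
    ((1, 1), 1), ((1, 0), 1), ((0, 1), 0),
]

w1, w2, b = 0.5, 0.5, 0.0  # Step 1: neutral starting weights
lr = 0.5                   # learning rate: how big each adjustment is

for epoch in range(200):
    for (x1, x2), target in data:
        # Step 2: pass the data through the network
        pred = sigmoid(w1 * x1 + w2 * x2 + b)
        # Step 3: adjust the weights a little, in proportion to the error
        error = pred - target  # positive -> prediction was too high
        w1 -= lr * error * x1
        w2 -= lr * error * x2
        b -= lr * error

# Cheese lover, no bread: the learned weights now predict a strong Pizza preference
print(round(sigmoid(w1 * 1 + w2 * 0 + b), 2))
```

After training, the prediction for a cheese-lover is well above 0.5, even though we never hand-coded that rule; the weights were learned from the data.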
From Simple Brain to Sharp Mind: The Magic of Transformer Attention
Now that we know how a basic neural network looks, it’s time to see how things get a bit more interesting in a Language Model.
Let’s start with a simple prompt:
“Paris is the capital of”
As we’ve mentioned before, the job of a Language Model is to predict the next token.
In this case, the most likely token is obviously “France.”
But how does the model actually predict that word?
Let’s simplify things and imagine that each token is just a word. So our input looks like this:
Paris
is
the
capital
of
The first step is to turn each token into numbers — or more precisely, into embeddings.
Embeddings are vectors of many dimensions, so after embedding, our prompt might look something like this:
| Token | Embedding (simplified example) |
|---|---|
| Paris | [0.8, 0.2, 0.9] |
| is | [0.7, 0.3, 0.8] |
| the | [0.1, 0.7, -0.8] |
| capital | [0.2, 0.5, -0.2] |
| of | [-0.7, 0.1, 0.08] |
Because each word is now a vector, we can represent them as arrows pointing in different directions in a coordinate space.

Each direction encodes meaning — or in other words, the semantics of the word.
For example, the vector for “Paris” might point in a similar direction to other vectors related to cities or France.
So we can think of vector directions as representing relationships between concepts.
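A standard way to measure how aligned two vectors are is cosine similarity. The embedding values in the table above are made up, so they don’t encode real semantics, but they are enough to show the mechanics:

```python
import math

def cosine_similarity(a, b):
    """1.0 = same direction, 0 = unrelated, negative = opposite directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings from the table above
paris = [0.8, 0.2, 0.9]
is_ = [0.7, 0.3, 0.8]
the = [0.1, 0.7, -0.8]

print(round(cosine_similarity(paris, is_), 2))  # nearly same direction
print(round(cosine_similarity(paris, the), 2))  # pointing away
```

In a real model, vectors for related concepts (“Paris”, “France”, “city”) would score high against each other in exactly this way.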

But here’s the problem:
If every word always keeps the same vector, we lose part of the meaning that depends on context — the meaning that comes from the whole sentence.
Let’s see that with an example using ChatGPT-4.
Prompt 1:
“Tell me something about Harry in Windsor Castle.”

Here, the model clearly knows we’re talking about Prince Harry.
Now let’s slightly change it:
Prompt 2:
“Tell me something about Harry in Hogwarts Castle.”

This time, it’s a completely different answer — now the model knows we mean Harry Potter.
So what happened here?
The word “Harry” is the same in both sentences, but the context words — “Windsor” vs “Hogwarts” — completely change its meaning.
This is where attention comes in.
After converting each token into embeddings, the model calculates attention — figuring out how much each word influences the others.
It’s like every word is sending information to every other word in the prompt.
Thanks to this, the vector for “Harry” changes depending on the surrounding context.
Before attention, “Harry” might just mean “a male name.”
After attention, the model knows which Harry we’re talking about — the Prince or the Wizard. In other words, attention allows words to understand each other: after applying attention, we end up with two different vectors, one for Harry Potter and one for Prince Harry.

It’s the step where our model’s brain really comes to life — when our Frankenstein’s creature starts thinking about meaning, not just memorizing patterns.
Of course, in real life, a Language Model works with hundreds or thousands of dimensions, far beyond what we can visualize — but the idea is the same.
The Mind of the Monster: Where Knowledge Takes Shape
Let’s continue with another example.
This time, our prompt is:
“Can you compare Windsor Castle vs. Hogwarts?”
And ChatGPT gives us an answer full of details about both castles.

So, what’s really happening here? The model isn’t just predicting random words — it’s recalling facts about these two places. That means, in some way, it actually has knowledge.
But where does this knowledge live?
After the attention mechanism, the next step in the model is the Multilayer Perceptron (MLP). This layer works much like the hidden layers we talked about earlier in the Pizza vs. Pasta example. It’s made up of neurons, connections, and — most importantly — weights.
These weights are where the model’s knowledge truly resides. They store the relationships and meanings that the model has learned. You can imagine each weight as a tiny adjustment that helps the model move in the right direction toward the correct fact.
It’s as if every vector (representing a word or concept) carries attributes — like “real” or “fictional” — that connect it with other vectors. Together, all these relationships form the web of knowledge inside the model’s brain.
When Frankenstein Remembers: The Spark of Memory in LLMs
Let’s try another example. The next one is actually the same chat, even though I’m showing it in two separate pictures.


“What is Hogwarts?”
After ChatGPT answered, I asked a new question:
“Tell me something about Harry.”
And right away, it knew exactly which Harry I was talking about!
It seems like the model remembered our previous conversation — but that’s actually an illusion.
What really happens is that the LLM processes the entire chat every time you send a new message.
So when I asked “Tell me something about Harry,” the model’s input wasn’t just that one sentence — it included all the previous messages, including my first question about Hogwarts.
That’s why it “remembers” what we’re talking about — not because it truly recalls the past, but because it constantly re-reads and re-processes the whole conversation to stay on track.
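This kind of “memory” can be sketched as nothing more than a growing list that gets re-sent on every turn. The `fake_llm` below is a hypothetical stand-in that only reports how much context it received; the real API calls appear later in this post.

```python
history = []

def fake_llm(messages):
    # Hypothetical stand-in for a real model call: it just reports
    # how many messages of context it was given.
    return f"(answer based on {len(messages)} message(s) of context)"

def ask(user_message):
    """Each turn, the *entire* history is what the model actually sees."""
    history.append({"role": "user", "content": user_message})
    reply = fake_llm(list(history))  # the LLM re-reads everything
    history.append({"role": "assistant", "content": reply})
    return reply

ask("What is Hogwarts?")
print(ask("Tell me something about Harry."))
```

By the second question the model receives three messages, not one, which is the only reason it can tell which Harry we mean.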
Connecting the Brain to the World
So, we have a Language Model (LLM) trained with a massive amount of data — it seems to know a little about everything, right?
But what happens when we need it to handle specific, constantly changing information?
Let’s go back to our demo idea for this post:
“Imagine we’re working as assistants in a shipping company.
Our job is to answer questions from clients, such as:
- Can I send a package from Mexico to Canada?
- How much does it cost to ship from Italy to the United States?
- What services do you provide?
- What’s the status of my package?
- If I send a package tomorrow, when will it arrive?”
To answer these questions, our assistant needs live company data — shipping prices, services, routes, package tracking, and so on.
And here’s the problem: this information changes all the time.
New routes open, new services appear, and people send new packages every day.
So we can’t possibly include all this information in the model’s original training.
Once the model is trained, that data is frozen in time — it doesn’t automatically know about new events or updates.
🧰 Enter the Tools
That’s where tools come in.
Tools are like small functions or APIs that we connect to the LLM so it can access real-time information.
Let’s see an example.
Someone asks our assistant:
“I sent a package a week ago with tracking number 1234. Can you tell me when it will be delivered?”
To answer that, the LLM needs data stored in the company’s internal system — information it could never have seen during training.
So we give it a tool that can fetch this data, like this function:
def get_package_tracking_info(tracking_number):
    """
    Retrieves package tracking information from the company system.
    """
    return {
        "status": "On the way",
        "estimated_delivery": "11/02/2025"
    }

The LLM itself doesn’t know how this function works internally — it just knows that a tool called get_package_tracking_info exists, and that if it passes a tracking number, it gets back the current package details.
So, when a user asks about their package, the model might reason like this:
“I need to use the get_package_tracking_info tool with the tracking number 1234 to find out where this package is.”
Then we (or the system) actually run that function and get the result:
{
    "status": "On the way",
    "estimated_delivery": "11/02/2025"
}
Finally, we send this result back to the LLM, and it transforms it into a natural answer for the user:
“Your package is on its way and is expected to arrive on November 2, 2025.”
🧩 Putting It All Together
That’s how the LLM uses tools: it doesn’t “know” everything by itself — instead, it knows how to ask for the right tool to get the answer.
So, to make our shipping assistant fully functional, we’ll need to provide it with a set of tools for:
- Checking routes
- Calculating shipping costs
- Listing services
- Tracking packages
And now… it’s time to start coding our assistant! 🚀
It’s Alive! — Building Our Own AI Assistant
FINALLY, it’s time to get our hands dirty. We’re going to build our assistant.
But what exactly are we building?
We’re creating a chat interface where a user can ask questions about our company’s services, package status, shipping options, and more. The assistant will be written in Python and powered by an LLM, which will generate the answers. Of course, the LLM needs access to our shipping company’s data (packages, routes, services, tracking information), so we’ll expose that data through tools.
But how does the assistant actually answer each question?
How does a message travel through the system—from the user input to the final answer?
Let’s walk through an example conversation:

Now let’s break down how each message travels through the architecture:

Step-by-step explanation
1. User sends a message
The user types something in the UI. For example:
“Hi, I need some help…”
Our assistant sends this message to the LLM. But it doesn’t send only the message—we also attach important metadata, including:
- The list of tools the assistant can use
- A description of each tool
- The expected parameters
- A general system instruction (system prompt)
Here’s an example of how we describe a tool:
tracking_function = {
    "name": "get_package_tracking_info",
    "description": """
    Get detailed tracking information for a package using its tracking number.
    This function searches for a package in the tracking system and returns comprehensive information
    including current status, origin, destination, weight, selected services, last update time, and estimated delivery date.
    Available services include: "Pick from home", "Drop in home", and "Express".
    If this function returns result.success = false, it means the package does not exist.
    """,
    "parameters": {
        "type": "object",
        "properties": {
            "tracking_number": {
                "type": "string",
                "description": "Tracking number of the package (e.g., 'TRK123456789')",
            }
        },
        "required": ["tracking_number"],
        "additionalProperties": False
    }
}
As you can see, we provide:
- What the tool does
- What parameters it expects
- A description of each parameter
We include similar descriptions for all the tools we want the LLM to use. The LLM uses this information while predicting the next tokens.
2. System Prompt
We also include a system prompt that defines the assistant’s role. For example:
“You are a helpful assistant for a package shipping company called ‘DemoLivery AI’. You help customers with their shipping needs and provide package tracking information.”
You can see the full system prompt here
3. LLM generates the next response
The LLM predicts the most likely next tokens, which means it generates the most appropriate response.
In our example, the LLM realizes it needs the tracking number and responds asking for it.
4. User provides the tracking number
The user replies:
“My tracking number is TRK963852741”
We send this message again to the LLM including the entire chat history, because that’s how the assistant “remembers” the conversation.
The LLM now says it wants to use the tool get_package_tracking_info with the parameter TRK963852741.
5. The assistant calls the tool
The assistant executes the function get_package_tracking_info(tracking_number="TRK963852741").
The tool returns something like:
{
    "success": true,
    "tracking_number": "TRK963852741",
    "package_info": {
        "origin": {"country": "Canada", "city": "Vancouver"},
        "destination": {"country": "United Kingdom", "city": "Manchester"},
        "weight": 9.1,
        "services": ["Pick from home"],
        "status": "In Transit",
        "last_update": "2024-01-15 12:45:00",
        "estimated_delivery": "2024-01-18"
    }
}
6. Send tool result back to the LLM
We send this result (plus the whole history again) to the LLM.
Now the LLM has everything it needs to craft a final response.
It returns something like:
“Your package is currently in transit to Manchester, UK. The estimated delivery date is January 18th.”
Perfect.
Code Reference
You can find all the code here:
https://github.com/freddyDOTCMS/Package-Shipping-Assistant
And we also have a full tutorial explaining how the assistant works:
https://github.com/freddyDOTCMS/Package-Shipping-Assistant/blob/main/docs/TUTORIAL.md
Defining the Data Format
In a real company, all this shipping information would live inside the company’s internal systems. Every package sent would be stored in a database, and the company would expose an API you could call to retrieve details about any package.
Then we would build tools that call that API so the assistant can access the data when needed.
But for this demo, we’re keeping things simple.
All the data will live in a single JSON-like Python file, and we’ll define tools that read this file and return the information the assistant needs.
You can find the data file here:
https://github.com/freddyDOTCMS/Package-Shipping-Assistant/blob/main/data.py
In this file, we have several data structures. Let’s explain each one.
1. all_destinations
This is a nested dictionary containing all the shipping routes our company supports.
- The top-level keys are origin countries
- Each origin contains:
- a list of origin cities
- a list of possible destinations, each with:
- destination country and cities
- cost per pound
- departure day
- pricing for each service type
Here’s an example entry:
all_destinations = {
    "United States": {
        "cities": ["New York", "Los Angeles", ...],
        "destinations": [
            {
                "country": "Canada",
                "cities": ["Toronto", "Vancouver", ...],
                "per_lb": 8.50,
                "departure_day": "Monday",
                "pricing": {
                    "Pick from home": {"base": 45.00},
                    "Drop in home": {"base": 35.00},
                    "Express": {"base": 75.00}
                }
            },
            # ... more destinations
        ]
    },
    # ... more origin countries
}
This means, for example:
- You can ship a package from the United States to Canada
- You can see the price for each available service (Pick from home, Drop in home, Express)
- You can see the day packages depart for that route
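With this structure, checking whether a route exists is just a dictionary walk. Here is a sketch with a trimmed, hypothetical copy of the data; `can_ship` is a helper invented for illustration, not a function from the repo:

```python
# Trimmed, hypothetical copy of the all_destinations structure from data.py
all_destinations = {
    "United States": {
        "cities": ["New York", "Los Angeles"],
        "destinations": [
            {
                "country": "Canada",
                "cities": ["Toronto", "Vancouver"],
                "per_lb": 8.50,
                "departure_day": "Monday",
                "pricing": {
                    "Pick from home": {"base": 45.00},
                    "Drop in home": {"base": 35.00},
                    "Express": {"base": 75.00},
                },
            },
        ],
    },
}

def can_ship(origin_country, destination_country):
    """Return the route entry if the route exists, otherwise None."""
    origin = all_destinations.get(origin_country)
    if origin is None:
        return None
    for dest in origin["destinations"]:
        if dest["country"] == destination_country:
            return dest
    return None

route = can_ship("United States", "Canada")
print(route["departure_day"] if route else "No such route")
```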
2. tracking_packages
This is a list of simulated packages currently being tracked.
Each item represents one package that is still in transit or waiting to be delivered.
Here is one example package:
tracking_packages = [
    {
        "tracking_number": "TRK123456789",
        "origin": {"country": "United States", "city": "New York"},
        "destination": {"country": "Canada", "city": "Toronto"},
        "weight": 15.5,
        "services": ["Pick from home", "Express"],
        "status": "In Transit",
        "last_update": "2024-01-15 14:30:00",
        "estimated_delivery": "2024-01-18"
    },
    # ... more packages
]
Each entry includes:
- Tracking number
- Origin and destination
- Weight
- Selected services
- Current status
- Last update timestamp
- Estimated delivery date
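Looking a package up by its tracking number is then a linear scan over this list. A sketch, using a hypothetical one-entry copy of the data (`find_package` is an illustrative helper, not the repo’s actual function):

```python
# Hypothetical one-entry copy of tracking_packages from data.py
tracking_packages = [
    {
        "tracking_number": "TRK123456789",
        "origin": {"country": "United States", "city": "New York"},
        "destination": {"country": "Canada", "city": "Toronto"},
        "weight": 15.5,
        "services": ["Pick from home", "Express"],
        "status": "In Transit",
        "last_update": "2024-01-15 14:30:00",
        "estimated_delivery": "2024-01-18",
    },
]

def find_package(tracking_number):
    """Return the matching package dict, or None if it doesn't exist."""
    for pkg in tracking_packages:
        if pkg["tracking_number"] == tracking_number:
            return pkg
    return None

print(find_package("TRK123456789")["status"])
```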
You can explore the full data structure here:
https://github.com/freddyDOTCMS/Package-Shipping-Assistant/blob/main/data.py
Understanding the Tools
As we mentioned before, our assistant needs tools.
Tools are simply functions that provide the data the LLM needs.
In a real production system, these tools would call internal company APIs to retrieve package details, prices, routes, and more.
But for this demo, our tools simply read from the JSON-like data file we created and return the necessary information.
We’ve built a set of functions that cover everything our assistant needs.
1. get_package_tracking_info
This tool returns detailed tracking information for a package, based on its tracking number.
Function signature:
def get_package_tracking_info(tracking_number):
Example response:
{
    "success": true,
    "tracking_number": "TRK123456789",
    "package_info": {
        "origin": {"country": "United States", "city": "New York"},
        "destination": {"country": "Canada", "city": "Toronto"},
        "weight": 15.5,
        "services": ["Pick from home", "Express"],
        "status": "In Transit",
        "status_description": "<calculated_text>",
        "last_update": "2024-01-15 14:30:00",
        "estimated_delivery": "2024-01-18"
    }
}
Here’s an example (image) where the assistant uses this tool:

2. calculate_shipping_price
This tool calculates the final shipping price based on:
- Origin country
- Destination country
- Package weight
- Selected services
Function signature:
def calculate_shipping_price(origin_country, destination_country, package_weight_lb, selected_services):
Example response:
{
    "success": true,
    "origin_country": "France",
    "destination_country": "Germany",
    "package_weight_lb": 10,
    "selected_services": ["Express"],
    "price_breakdown": {
        "base_service_costs": {"Express": 56.0},
        "total_base_cost": 56.0,
        "weight_cost": 50.0,
        "per_lb_price": 5.0,
        "total_cost": 106.0
    },
    "summary": {
        "services": "Express",
        "weight": "10 lbs",
        "total_price": "$106.00"
    }
}
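From the price_breakdown in this example we can infer the pricing formula: the total is the sum of the selected services’ base costs plus the weight times the per-pound rate. A sketch of that arithmetic (`estimate_total` is an illustrative helper; the real implementation lives in functions.py and may differ):

```python
def estimate_total(base_service_costs, weight_lb, per_lb_price):
    """total = sum of selected services' base costs + weight * per-lb rate."""
    total_base_cost = sum(base_service_costs.values())
    weight_cost = weight_lb * per_lb_price
    return total_base_cost + weight_cost

# Reproduce the example response: Express base $56.00, 10 lb at $5.00/lb
print(estimate_total({"Express": 56.0}, 10, 5.0))  # -> 106.0
```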
Example usage inside the assistant:

3. get_route_information
This tool returns all the details for a specific shipping route, including:
- All destination cities reachable from an origin
- Pricing information
- Available services and descriptions
Example response (formatted):
{
    "success": true,
    "origin": {
        "country": "France",
        "city": "",
        "available_cities": ["Paris", "Marseille", ...]
    },
    "destinations": [
        {
            "country": "United States",
            "cities": ["New York", "Los Angeles", ...],
            "per_lb": 12.0,
            "departure_day": "Tuesday",
            "pricing": {
                "Pick from home": {"base": 83.0},
                "Drop in home": {"base": 73.0},
                "Express": {"base": 123.0}
            }
        },
        {
            "country": "Canada",
            "cities": ["Toronto", "Vancouver", ...],
            "per_lb": 11.5,
            "departure_day": "Monday",
            "pricing": {
                "Pick from home": {"base": 81.0},
                "Drop in home": {"base": 71.0},
                "Express": {"base": 121.0}
            }
        }
        // ... more routes
    ],
    "total_destinations": 8
}
Example inside the assistant:

Tool Definitions in Code
You can see all the tool function definitions here:
https://github.com/freddyDOTCMS/Package-Shipping-Assistant/blob/main/functions.py
How We Send Tools to the LLM
For the LLM to use a tool, we must send a full description of:
- The tool name
- A detailed description
- The expected parameters
- The type of each parameter
- Required fields
- Whether additional fields are allowed
For example, the metadata for get_package_tracking_info looks like:
tracking_function = {
    "name": "get_package_tracking_info",
    "description": """
    Get detailed tracking information for a package using its tracking number…
    """,
    "parameters": {
        "type": "object",
        "properties": {
            "tracking_number": {
                "type": "string",
                "description": "Tracking number of the package (e.g., 'TRK123456789')"
            }
        },
        "required": ["tracking_number"],
        "additionalProperties": False
    }
}
This entire structure becomes part of the context we send to the LLM.
The model uses it to understand:
- when to call the tool
- which parameters it needs
- how to format the tool call
You can view the full list of tool descriptions here.
And the exact format sent to the LLM is shown here.
As you’ll notice, not all of the functions are used; we’ll explain why later.
Introducing Ollama: Our Local LLM Engine
To power our assistant, we need an LLM—the “brain” of the system.
For this demo, we will use Ollama, a free and open-source LLM runner that works locally on your computer. This makes it perfect for development and experimentation.
You can download Ollama here:
👉 https://ollama.com/
Once installed, start the Ollama server by running:
ollama serve
You can find additional setup details in the README:
👉 https://github.com/freddyDOTCMS/Package-Shipping-Assistant/blob/main/README.md
How to Talk to Ollama: Connection & Messaging
Now that we have our local LLM engine running, we need to connect to it from Python, send user messages, receive responses, and maintain a full conversation.
Let’s go step by step.
1. Connecting to Ollama
We use the OpenAI-compatible client library, pointing it to Ollama’s local server:
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # Ollama doesn't require auth, but the field is mandatory
)
This tells the OpenAI client to communicate with Ollama instead of OpenAI’s cloud API.
2. Sending Messages to Ollama
LLMs don’t just receive the last user message—they need the entire conversation history every time.
We also include a system prompt, which tells the model how it should behave.
Here is the structure:
[
    {
        "role": "system",
        "content": "You are a helpful assistant for a package shipping company …"
    },
    {
        "role": "user",
        "content": "Hi, I want to send a package from France to Germany"
    }
]
- system → instructions for the assistant
- user → messages from the customer
- assistant → (LLM responses)
- tool → (messages returned by tools)
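To see all four roles together, here is an illustrative conversation list. The contents, tool name, and call ID below are invented for this sketch; they only show the shape each role takes.

```python
# Illustration only: one conversation containing all four roles.
# Contents, tool name, and IDs are invented for this sketch.
conversation = [
    {"role": "system", "content": "You are a shipping assistant."},
    {"role": "user", "content": "How much to ship 10 lb to Germany?"},
    {
        "role": "assistant",
        "content": None,  # no text: the model asked for a tool instead
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "calculate_shipping_price",
                "arguments": '{"package_weight_lb": 10}'
            }
        }]
    },
    {"role": "tool", "content": "24.50", "tool_call_id": "call_1"},
    {"role": "assistant", "content": "Shipping that package costs $24.50."},
]

print([m["role"] for m in conversation])
```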
You can view the full system prompt here:
👉 https://github.com/freddyDOTCMS/Package-Shipping-Assistant/blob/main/main.py#L9
Example Conversation
User:
“I want to send a package from France to Germany.”
Assistant:
“You can send a package using Pick from home, Drop in home, or Express… Which one do you want?”
Now if the user replies:
“I would like Express”
We must send the full conversation, including the previous assistant message:
[
{ "role": "system", "content": system_prompt },
{ "role": "user", "content": "Hi, I want to send a package from France to Germany" },
{ "role": "assistant", "content": "You can send a package … Which service would you like to choose?" },
{ "role": "user", "content": "I would like \"Express\"" }
]
3. Sending Messages from Python
Here is the code that prepares and sends the chat:
messages = [
    {"role": "system", "content": system_prompt}
] + history + [
    {"role": "user", "content": message}
]

response = client.chat.completions.create(
    model="llama3.2",
    messages=messages,
    tools=llm_tools
)
- We combine the system prompt + history + new user message
- We send all this to Ollama
- The model returns a response with either:
- a normal assistant message, or
- a tool call
4. Handling Tool Calls
Sometimes the LLM decides it needs to call a tool, such as calculate_shipping_price.
When that happens, the response looks like this (simplified; the fields match the accessors used in the code below):
{
    "finish_reason": "tool_calls",
    "message": {
        "tool_calls": [
            {
                "function": {
                    "name": "calculate_shipping_price",
                    "arguments": {
                        "origin_country": "France",
                        "destination_country": "Germany",
                        "package_weight_lb": 10,
                        "selected_services": ["Express"]
                    }
                }
            }
        ]
    }
}
When this happens, we must run the function in Python, then send the result back to the LLM.
Here’s the code:
if response.choices[0].finish_reason == "tool_calls":
    function_name = response.choices[0].message.tool_calls[0].function.name
    function_arguments = response.choices[0].message.tool_calls[0].function.arguments

    # Load the function by name
    func = getattr(functions, function_name)

    # Parse JSON arguments
    if isinstance(function_arguments, str):
        function_arguments = json.loads(function_arguments)

    # Execute the function
    function_result = func(**function_arguments)

    # Add tool call + tool result to conversation
    messages.append({
        "role": "assistant",
        "content": None,
        "tool_calls": response.choices[0].message.tool_calls
    })
    messages.append({
        "role": "tool",
        "content": str(function_result),
        "tool_call_id": response.choices[0].message.tool_calls[0].id
    })

    # Call the LLM again, now with tool result added
    response = client.chat.completions.create(
        model="llama3.2",
        messages=messages
    )
Flow summary:
- LLM asks to run a function
- We extract the function name & arguments
- Run the function in Python
- Add the result back into the conversation
- Ask the LLM to generate the final user-friendly message
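The dispatch step above can be sketched in a self-contained way. This version uses a plain dict as the function registry instead of the repo’s `functions` module, and a dummy pricing rule; both are assumptions made purely so the sketch runs on its own.

```python
import json

# Sketch of the tool-dispatch step. The registry dict stands in for
# the repo's `functions` module, and the pricing rule is a dummy
# (base fee + per-pound rate + per-service surcharge).
def calculate_shipping_price(origin_country, destination_country,
                             package_weight_lb, selected_services):
    return 5.0 + 0.5 * package_weight_lb + 2.0 * len(selected_services)

registry = {"calculate_shipping_price": calculate_shipping_price}

# Arguments arrive from the LLM as a JSON string.
raw_arguments = ('{"origin_country": "France", "destination_country": "Germany", '
                 '"package_weight_lb": 10, "selected_services": ["Express"]}')

# Look up the function by name and call it with the parsed arguments.
func = registry["calculate_shipping_price"]
result = func(**json.loads(raw_arguments))
print(result)  # 5.0 + 0.5*10 + 2.0*1 = 12.0
```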
5. Adding a UI with Gradio
To make everything interactive, we use Gradio, a super simple UI library.
One line creates a full chat interface:
gr.ChatInterface(chat, type="messages").launch()
The chat function:
- receives the chat history and the last message
- handles all logic described above
- sends queries to the LLM
- returns the assistant’s answer
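Here is a hypothetical skeleton of that `chat` callback. With `type="messages"`, Gradio passes the new message plus the history as a list of `{"role", "content"}` dicts; the real implementation (which calls the LLM and handles tool calls) is in the linked `main.py`, so this sketch just echoes to stay runnable offline.

```python
# Hypothetical skeleton of the `chat` callback Gradio expects when
# type="messages". The system prompt and the stub reply are invented;
# the real logic lives in the repo's main.py.
def chat(message, history):
    messages = [{"role": "system", "content": "You are a shipping assistant."}]
    messages += history  # history is a list of {"role", "content"} dicts
    messages.append({"role": "user", "content": message})
    # The real code would send `messages` to the LLM here and handle
    # any tool calls; this sketch returns a canned reply instead.
    return f"(stub reply to: {message})"

print(chat("Hi!", []))
```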
You can see the entire implementation here:
👉 https://github.com/freddyDOTCMS/Package-Shipping-Assistant/blob/main/main.py#L52
What Went Wrong: Our Assistant’s Delirious Moments
Everything looks cool so far, right? It feels like we already have an assistant ready for production — one that will save us tons of work.
Well… not really. Our assistant is still far from production-ready. Why?
Because sometimes it completely loses its mind. Let’s walk through a few chats where the assistant went full delirious mode.
I Just Said “Hi”…

All I did was say hi, man!
But if we check the logs, we can see the assistant tried to call the get_route_information tool with empty origin and destination.
That’s why we got that weird answer.

“How Much Does It Cost?”
Now let’s try asking for the cost.
But what happens if we ask without enough information?

Let’s check the logs:

As you can see, it first calls get_route_information with Mexico as the origin — which is fine.
But after we get the result and call the LLM again… everything gets messy.
To be honest, I’m still not sure what happened here — maybe a bug in the code, maybe the LLM hallucinated, or maybe I should try a different model. No idea yet.
WHAT?!

If you check the data, the route between Mexico and Canada is allowed.
So what happened here?
Let’s investigate:

Again, it tries to call get_route_information but without origin or destination, even though both were clearly mentioned in the message.
This happens a lot, unfortunately.
Here’s another example:

Adding More Tools Makes It Worse
Something else I’ve noticed:
The more tools you add, the more delirious the assistant becomes.
That’s why I excluded a couple of tools from the demo.
Because once you add, say, 10 more tools — which is not even that many — the hallucinations get even worse.
What’s Next?
I know what you’re thinking:
“You didn’t really explain how an LLM works… at least not in detail. I actually have more questions now.”
And honestly… you’re right.
You might also be thinking:
“Wouldn’t all of this work better using MCP (Model Context Protocol)?”
Yep — probably right again.
Or maybe:
“What about using RAG (Retrieval-Augmented Generation) to reduce the assistant’s delirious moments?”
Maybe! That’s definitely something worth trying.
So the real question becomes:
How do we keep improving this assistant until it’s truly production-ready?
To be honest, this is just the beginning.
My plan is to continue iterating, improving, experimenting, and learning as we go.
After all, that’s the whole point of this project, right?
So stay tuned — things are just getting interesting.
Reference
https://dotcms.udemy.com/course/llm-engineering-master-ai-and-large-language-models
