A Journey into Advanced Data Structures
Posted on Thu 13 February 2025 in posts
Introduction
Practice makes perfect. So let's practice some data structures.
We will begin with a stack. Then move on to a queue. In the future, we will follow the book by Peter Brass titled "Advanced Data Structures". Ideally, I would like to implement these in
- Rust
- C
- Python
The languages are ordered in increasing order of familiarity and proficiency. This post will have implementation in C. Later on I hope to add Rust and Python code as well. All code will be available here and here.
Stack
Stack is simple: you have a collection of items, say, numbers, which you pile up on top of each other. To get a number back, you grab the last number you put in. This is LIFO: Last In, First Out.
Pointers
We will use pointers here quite extensively. Pointers are extremely powerful.
A brief note on pointers: these are simply memory addresses of you variables. Example:
int a = 10; // an integer
int *p = &a; // a pointer to an integer
Implementing a stack: nodes
Here we will treat a stack as a collection of nodes. What are nodes? Well, it's up to us to define, so let's do just that:
typedef struct node {
void *data; // pointer to any data of any type
int idx; // we'll use it to keep track of the size of the stack
struct node *next; // pointer to the next node in stack
} node_t;
From this we can create a stack:
typedef struct stack_t__ {
void *data;
struct node *next;
} __stack_t;
Note that the strange naming with multiple underscores is a way to avoid name conflict with DARWIN's (macOS) implementation of stack in the C standard library.
Stack methods
Each stack has fundamentally two methods:
- Push: add item onto the stack;
- Pop: remove the top item from the stack.
We will additionally add a couple of fun routines:
- Create a node: this will create a new node to be added to the stack;
- Create a stack: this will create a stack (empty by default);
- Display a stack: this will pretty-print all the items in the stack;
- Check if stack is empty.
Here are all of their signatures:
//fundamental routines
void push(__stack_t *st, void *data, size_t size);
void *pop(__stack_t *st);
//helper routines
node_t *new_node(void *data, size_t size);
__stack_t *create_stack(void);
void display_stack(__stack_t *st);
int is_empty_stack(__stack_t *st);
Note that pop
is a void *
-pointer function. It will return the data item or
the node with the data item (design choice).
We will implement a stack as a linked list. What's a linked list? A collection of nodes which point from one to the next:
// Linked list:
// [DATA | NEXT ] -> [DATA | NEXT ] -> [DATA | NULL ] // NULL here is a terminating pointer
Creating a stack
To create a stack in our implementation we will allocate required memory space
and set its next
attribute to null. This is our entry point. The remaining
nodes will carry the data values as a linked list.
__stack_t *create_stack(void) {
__stack_t *tmp = (__stack_t *)malloc(sizeof(__stack_t));
tmp->next = NULL;
return tmp;
}
Once we allocate the memory for our main stack pointer, we simply return it.
Creating a node
Creating a node is also pretty straightforward. We allocate the required amount
of memory and then copy the data to it using memcpy
.
node_t *new_node(void *data, size_t size) {
node_t *tmp = (node_t *)malloc(sizeof(node_t));
tmp->data = malloc(size);
memcpy(tmp->data, data, size);
tmp->idx = 0;
tmp->next = NULL;
return tmp;
}
Pushing and Popping
No onto the most important methods of stack: push
and pop
.
For the former, we simply "insert" the new node between the entry point st
and
its current next
node.
void push(__stack_t *st, void *data, size_t size) {
node_t *tmp = new_node(data, size);
tmp->next = st->next;
if (st->next != NULL) {
tmp->idx = st->next->idx + 1;
}
st->next = tmp;
return;
}
Note that we also keep track of the size of the stack. This is done to be able to check the size if we wanted.
Finally, for the pop
method, we would grab the entrypoint's next and return
it. At the same time we overwrite the old next
with a next
's next
.
void *pop(__stack_t *st) {
node_t *item = st->next;
st->next = st->next->next;
return item;
}
Helper routines
Below are the helper routines we can use in this implementation.
First function simply tells us if the stack is empty.
The second one is just a pretty display utility.
int is_empty_stack(__stack_t *st) {
if (st->next == NULL) {
return 1;
}
return 0;
}
void display_stack(__stack_t *st) {
node_t *runner = (node_t *)malloc(sizeof(node_t));
runner = st->next;
while (runner != NULL) {
int *data = (int *)runner->data;
printf("[%d|%d]", runner->idx, *data); // assume int for simplicity
runner = runner->next;
printf(" -> ");
}
printf("NULL\n");
free(runner);
}
Queue
The difference between a queue and a stack is in the way we insert and remove data. Queue is a First In First Out type of data structure.
For this we will use the doubly-linked list strategy.
A doubly-linked list is simply a linked list where each node points both "forward" and "backward".
Our "entrypoint" is an extra node which will point to the last inserted node and to the first inserted node.
Creating a node: queue edition
Let's define a data type node
for the queue structure.
typedef struct nodeq_t {
struct nodeq_t *next;
struct nodeq_t *prev;
void *data;
int idx;
} nodeq_t;
We will have two initialization routines: one for the queue (entrypoint node):
nodeq_t *create_queue(void) {
nodeq_t *tmp = (nodeq_t *)malloc(sizeof(nodeq_t));
tmp->next = tmp;
tmp->prev = tmp;
return tmp;
}
and another for the actual data node:
nodeq_t *new_node_q(void *data, size_t size) {
nodeq_t *tmp = (nodeq_t *)malloc(sizeof(nodeq_t));
tmp->data = malloc(size);
memcpy(tmp->data, data, size);
tmp->idx = 0;
tmp->next = NULL;
tmp->prev = NULL;
return tmp;
}
One could say that these two routines are redundant but I think this distinction between the queue-specific and node-specific routine is useful. It helps isolate logic between the whole queue and each individual node.
Enqueueing and Dequeuing
The two fundamental routines for queues are enqueue and deque.
The first one adds and the other one removes the data.
For adding the node, we simply insert the new node in between the entrypoint's
next
and that next
's next
.
For removing, we act on the entrypoint's prev
and overwrite it.
void enqueue(nodeq_t *qu, void *data, size_t size) {
nodeq_t *node = new_node_q(data, size);
node->next = qu->next;
qu->next = node;
node->next->prev = node;
node->prev = qu;
}
void *deque(nodeq_t *qu) {
nodeq_t *node = qu->prev;
qu->prev = qu->prev->prev;
qu->prev->next = qu;
return node;
}
Additional functions
For completeness, I'm adding two more listings for extra routines:
- a helper to display the queue
void display_queue(nodeq_t *qu) {
nodeq_t *runner = (nodeq_t *)malloc(sizeof(nodeq_t));
runner = qu;
while (runner->next != qu) {
runner = runner->next;
int *data = (int *)runner->data;
printf("[%d]", *data);
printf(" <-> ");
}
free(runner);
}
- a helper to determine if the queue is empty;
int is_empty_queue(nodeq_t *qu) {
if (qu->next == qu) {
return 1;
}
return 0;
}
Concluding Remarks
There are probably more ways to implement queues and stacks. A variety of ways are described in the "Advanced Data Structures" book, including linked list based implementation.
Note that to avoid memory leaks, we would need to carefully track our use of memory allocation and possibly create a way to destroy the data structures safely.
Since these implementations are for learning purposes, I will defer most of memory management to the OS. However, in a more serious production environment this could be a serious concern.
Next, I will work on some tree based structures.