A Journey into Advanced Data Structures

Posted on Thu 13 February 2025 in posts

Introduction

Practice makes perfect. So let's practice some data structures.

We will begin with a stack. Then move on to a queue. In the future, we will follow the book by Peter Brass titled "Advanced Data Structures". Ideally, I would like to implement these in

  • Rust
  • C
  • Python

The languages are ordered in increasing order of familiarity and proficiency. This post will have implementation in C. Later on I hope to add Rust and Python code as well. All code will be available here and here.

Stack

Stack is simple: you have a collection of items, say, numbers, which you pile up on top of each other. To get a number back, you grab the last number you put in. This is LIFO: Last In, First Out.

Pointers

We will use pointers here quite extensively. Pointers are extremely powerful.

A brief note on pointers: these are simply memory addresses of you variables. Example:

int a = 10; // an integer
int *p = &a; // a pointer to an integer

Implementing a stack: nodes

Here we will treat a stack as a collection of nodes. What are nodes? Well, it's up to us to define, so let's do just that:

typedef struct node {
  void *data; // pointer to any data of any type
  int idx; // we'll use it to keep track of the size of the stack
  struct node *next; // pointer to the next node in stack
} node_t;

From this we can create a stack:

typedef struct stack_t__ {
  void *data;
  struct node *next;
} __stack_t;

Note that the strange naming with multiple underscores is a way to avoid name conflict with DARWIN's (macOS) implementation of stack in the C standard library.

Stack methods

Each stack has fundamentally two methods:

  1. Push: add item onto the stack;
  2. Pop: remove the top item from the stack.

We will additionally add a couple of fun routines:

  1. Create a node: this will create a new node to be added to the stack;
  2. Create a stack: this will create a stack (empty by default);
  3. Display a stack: this will pretty-print all the items in the stack;
  4. Check if stack is empty.

Here are all of their signatures:

//fundamental routines
void push(__stack_t *st, void *data, size_t size);
void *pop(__stack_t *st);

//helper routines
node_t *new_node(void *data, size_t size);
__stack_t *create_stack(void);
void display_stack(__stack_t *st);
int is_empty_stack(__stack_t *st);

Note that pop is a void *-pointer function. It will return the data item or the node with the data item (design choice).

We will implement a stack as a linked list. What's a linked list? A collection of nodes which point from one to the next:

// Linked list:

// [DATA | NEXT ] -> [DATA | NEXT ] -> [DATA | NULL ] // NULL here is a terminating pointer

Creating a stack

To create a stack in our implementation we will allocate required memory space and set its next attribute to null. This is our entry point. The remaining nodes will carry the data values as a linked list.

__stack_t *create_stack(void) {
  __stack_t *tmp = (__stack_t *)malloc(sizeof(__stack_t));
  tmp->next = NULL;
  return tmp;
}

Once we allocate the memory for our main stack pointer, we simply return it.

Creating a node

Creating a node is also pretty straightforward. We allocate the required amount of memory and then copy the data to it using memcpy.

node_t *new_node(void *data, size_t size) {
  node_t *tmp = (node_t *)malloc(sizeof(node_t));
  tmp->data = malloc(size);
  memcpy(tmp->data, data, size);
  tmp->idx = 0;
  tmp->next = NULL;
  return tmp;
}

Pushing and Popping

No onto the most important methods of stack: push and pop.

For the former, we simply "insert" the new node between the entry point st and its current next node.

void push(__stack_t *st, void *data, size_t size) {
  node_t *tmp = new_node(data, size);
  tmp->next = st->next;
  if (st->next != NULL) {
    tmp->idx = st->next->idx + 1;
  }
  st->next = tmp;
  return;
}

Note that we also keep track of the size of the stack. This is done to be able to check the size if we wanted.

Finally, for the pop method, we would grab the entrypoint's next and return it. At the same time we overwrite the old next with a next's next.

void *pop(__stack_t *st) {
  node_t *item = st->next;
  st->next = st->next->next;
  return item;
}

Helper routines

Below are the helper routines we can use in this implementation.

First function simply tells us if the stack is empty.

The second one is just a pretty display utility.

int is_empty_stack(__stack_t *st) {
  if (st->next == NULL) {
    return 1;
  }
  return 0;
}

void display_stack(__stack_t *st) {
  node_t *runner = (node_t *)malloc(sizeof(node_t));
  runner = st->next;
  while (runner != NULL) {
    int *data = (int *)runner->data;
    printf("[%d|%d]", runner->idx, *data); // assume int for simplicity
    runner = runner->next;
    printf(" -> ");
  }
  printf("NULL\n");
  free(runner);
}

Queue

The difference between a queue and a stack is in the way we insert and remove data. Queue is a First In First Out type of data structure.

For this we will use the doubly-linked list strategy.

A doubly-linked list is simply a linked list where each node points both "forward" and "backward".

Our "entrypoint" is an extra node which will point to the last inserted node and to the first inserted node.

Creating a node: queue edition

Let's define a data type node for the queue structure.

typedef struct nodeq_t {
  struct nodeq_t *next;
  struct nodeq_t *prev;
  void *data;
  int idx;
} nodeq_t;

We will have two initialization routines: one for the queue (entrypoint node):

nodeq_t *create_queue(void) {
  nodeq_t *tmp = (nodeq_t *)malloc(sizeof(nodeq_t));
  tmp->next = tmp;
  tmp->prev = tmp;
  return tmp;
}

and another for the actual data node:

nodeq_t *new_node_q(void *data, size_t size) {
  nodeq_t *tmp = (nodeq_t *)malloc(sizeof(nodeq_t));
  tmp->data = malloc(size);
  memcpy(tmp->data, data, size);
  tmp->idx = 0;
  tmp->next = NULL;
  tmp->prev = NULL;

  return tmp;
}

One could say that these two routines are redundant but I think this distinction between the queue-specific and node-specific routine is useful. It helps isolate logic between the whole queue and each individual node.

Enqueueing and Dequeuing

The two fundamental routines for queues are enqueue and deque.

The first one adds and the other one removes the data.

For adding the node, we simply insert the new node in between the entrypoint's next and that next's next.

For removing, we act on the entrypoint's prev and overwrite it.

void enqueue(nodeq_t *qu, void *data, size_t size) {
  nodeq_t *node = new_node_q(data, size);
  node->next = qu->next;
  qu->next = node;
  node->next->prev = node;
  node->prev = qu;
}
void *deque(nodeq_t *qu) {
  nodeq_t *node = qu->prev;
  qu->prev = qu->prev->prev;
  qu->prev->next = qu;
  return node;
}

Additional functions

For completeness, I'm adding two more listings for extra routines:

  • a helper to display the queue
void display_queue(nodeq_t *qu) {
  nodeq_t *runner = (nodeq_t *)malloc(sizeof(nodeq_t));
  runner = qu;
  while (runner->next != qu) {
    runner = runner->next;
    int *data = (int *)runner->data;
    printf("[%d]", *data);

    printf(" <-> ");
  }
  free(runner);
}
  • a helper to determine if the queue is empty;
int is_empty_queue(nodeq_t *qu) {
  if (qu->next == qu) {
    return 1;
  }
  return 0;
}

Concluding Remarks

There are probably more ways to implement queues and stacks. A variety of ways are described in the "Advanced Data Structures" book, including linked list based implementation.

Note that to avoid memory leaks, we would need to carefully track our use of memory allocation and possibly create a way to destroy the data structures safely.

Since these implementations are for learning purposes, I will defer most of memory management to the OS. However, in a more serious production environment this could be a serious concern.

Next, I will work on some tree based structures.