Advanced Data Structures. File Indexing with RB Trees
Posted on Sat 22 March 2025 in posts
Introduction
This will be a short post. I my attempts learning more about data structures, I wanted to try and implement something that I could potentially use.
Recently, I discovered that my computer had a large file hidden somewhere. As I was looking for a quick and easy way to find the file, I thought "Why not write a small command tool that would give me information about a directory structure?"
So I decided to do several things at once:
- I wanted to write a program in C that would display the information about files inside a given directory;
- I wanted to use a fun data structure to store the information about the file system
As a result, a small tool I called file_indexer
.
File Indexer
The tool can be found here The tool is very small, it can interactively process commands such as
list <PATH>
- list files inside<PATH>
, e.g.list ./
info <PATH>
- show info about<PATH>
, e.g.info ./
exit
- exit the program.
The data structure to store the info
output is
Red-Black Tree. It's an
interesting data structure to implement, after many iterations over the Peter
Brass book, and chatting with ChatGPT, I was able to get the rotations right.
Rotations of the tree are probably the hardest part of this structure.
The main idea is that once an element is inserted, key properties of the tree may be violated and hence the node needs to be corrected. In hindsight, this is probably an overkill in terms of implementation complexity and a simple heap would have done just fine.
Learnings
Over the course of the work I learned a lot about C, data structures, and various ways we can extract information about the file system.
I think this project also taught me that it is more interesting to rely less on the various tools that help you code like AI based IDE helpers. That is, inside your IDE ideally should only be your own head and thoughts. However, AI proves an indispensable companion when prototyping, discussing various concepts, and ideating.