People often come up to me on the street and ask: ‘Aleks, could you tell us about Red-Black trees?’
I mean, no one really does that, but sure, sure I can.
Right, let’s start from the beginning. Red-black tree is a data structure. It’s a type of tree. Trees, if you remember, or totally-know-but-forgot-just-now, look like this:
They have a root, the root has children connected by edges. A unit of a tree is called a node, if a node doesn’t have children - it’s called a leaf. In a tree a node can have as many children as it god-damn pleases. And then there’s a binary tree.
Binary trees look like this:
As you have totally noticed, in a binary tree a node can’t have more than two children. Very responsible parenting, if you ask me.
Now, binary trees can be search trees. In a binary search tree, typically, every child to the left is smaller than its parent, every child to the right is greater than its parent. Like so:
Trees can also be balanced, a balanced tree is a type of tree where its subtrees don’t differ in height by more than one node. In the drawing below the left tree is a balanced, the right isn’t:
Trees can also be self-balancing, which means that they automatically balance themselves on each insertion and deletion. If you’re feeling slightly lost - totally fine, what happens is - after insertion the tree looks at its nodes and might decide to re-order them by swapping nodes around or moving them from one branch to another.
So, what is a red-black tree? It’s a kind of a self-balancing binary search tree. Typically a node in a tree might contain: a pointer to one or more children, some data. Red-black trees, however, also contain a bit that represents colour (red or black). The colour is the magic thingamawhatsit that helps with self-balancing.
Here’s a set of rules that makes red-black trees work:
- Nodes can be either black or red.
- The root of a red-black tree is always black.
- Null leaves are always black.
- If a node is red - both of it’s children are black.
- Any path from a node you pick should contain the same number of black nodes.
To get rid of potential confusion raised by #3 - leaves in red-black trees actually have two implied null children that are always colored black, but don’t worry about it too much.
Should uh… should we see how the this stuff would work in practice?
Insertion is straightforward, kind of. Let’s take a look at a tree we have here:
To quickly illustrate the bit about null leaves, here’s what the above tree would look like with null children (but, like I said, don’t worry about it).
Now let’s add ‘34’ to it. All the usual binary search tree rules apply and we end up with the node a-a-a-all the way there.
Now we need to sort out the colour. A newly inserted node is by default red. Then we implement the rules above. If the parent of the newly inserted node is black - all is cool, we can congratulate ourselves and have a sip of that whiskey we’ve been saving for a special occasion.
In our case, though, it isn’t, we have a red node with a red child, a total no-no. So what we do now is look at its uncle. This guy:
You might be wondering - why aren’t we checking the grandparent, maybe its color is wrong? Well, because the grandparent won’t ever have a color discrepancy, since its color has been sorted when our child’s parent was inserted. Anyway, back to the uncle. There are two possibilities - the uncle is red or black (or null, which also counts as black).
If the uncle is null or black, we do a restructure. We grab our three nodes - the child, the parent and the grandparent and order them:
We then grab the middle value node and make it the parent of the other two, recolor it to black and its children to red. Boom! You’re done, the rest of the tree should already adhere to the five rules above:
As you can imagine, there’s actually four variations of this scenario.
But conceptually the process stays the same.
And then there’s the case where the uncle is red. In which case we don’t restructure anything, but we recolor the hell out of this tree. We leave the child red, the child’s parent and uncle become black, the grandparent becomes red, unless it’s the root node, then it remains black. If the grandparent does change color to red - we recursively sort out potential color rule violations between the grandparent and its parent and its parent and… you get the point. All the way to the root. And then we’re done.
If you’re curious - insertion into red-black trees has a O(logN) complexity.
So first let’s think about what happens when a node in a binary search tree is deleted. There are a few scenarios: it might be a leaf, might have one or two children.
If it doesn’t have children - we delete it, put a null node in its place and go have a PJ party.
If it has a single child - we delete it and put that child into its place.
If it has two children - replace the node either with its in-order successor or in-order predecessor, then recursively do the same thing for the successor/predecessor until you end up with a single child or a null node. Then you’re done.
Now let’s add color to the mix.
If you’re deleting a node that’s red - the black node property (#5) is still satisfied, all is well with the world, we just delete it like we would with a plain binary search tree and replace it with its black child. If the deleted node was black - there’s some more work to be done. We empty the node of the value and associate it with the node that will replace it. Leaving it that way isn’t comme il faut, so here’s what we can do.
If the empty black node is replaced by a red node - we re-colour the node to black and forget anything ever happened.
If the new node is also black, congratulations, you now have a double black node. If it’s a double black root node - we discard the empty black node, it shouldn’t mess with the black node property. No one will ever know.
Otherwise we start looking at siblings. If the sibling of the double black node is also black, but has a red child - we color that child black and restructure. If the sibling is black and its children are black too - we recolor.
If the sibling is red - restructure and recolor.
If you’re still curious - deletion from red-black trees has a O(logN) complexity.
If all of this sounds somewhat confusing - here’s a pretty good place to mess around with red-black trees and see how they work.