Saturday, May 4, 2019

Lazy BST

Hello everyone,

I am bringing you another lab this week, this one working closely with binary search trees. This lab involves the implementation of two types of binary search trees; a general binary search trees and a lazy deletion binary search trees. As always feel free to leave a comment at the end and without further adieu, lets get into this.

What is a Binary Search Tree?

A binary search tree is an abstract data structure that stores items in a "sorted" format. Binary search trees are made up of nodes, with each node containing two pointers, a left child pointer and a right child pointer, and the data.

Source: GeeksforGeeks.com

Each node has an ordering condition imposed: the left sub tree of a parent node is less than the data in the parent node, while the right sub tree of a parent node is greater than the data in the parent node.
Based on the programming language you are implementing a binary search trees in, it is assumed that the data being inserted into the tree is comparable.

What makes binary search trees useful is that it allows for fast operations, such as searching, because searching through a binary search tree involves you going down either the left subtree or the right subtree over and over again, each time reducing the search space by at most half.

The Lab

The lab itself is essentially the implementation of a regular binary search tree (class name = BST) and a "lazy" binary search tree (class name = LazyBST). LazyBST is built off of BST, but with a few minor additions.

The LazyBST class utilizes a deletion technique known as "lazy deletion," which is essentially marking a node as deleted instead of going through the process of actually deleting the node and restructuring the tree. The advantage of lazy deletion is that it allows for fast deletion of nodes, but the downside is that it increases search and insert times (since the nodes marked as "deleted" are not actually gone) and wastes memory (again, because the marked nodes are not gone).

The Implementation

Starting off with BST, almost all of BST's methods will be utilizing recursion to traverse through the BST and do things like inserting nodes, deleting nodes, deleting the entire tree, etc. BST will include its own dedicated node class, which will store the pointers to the left and right children and the user data. BST's protected members include a size member, which represents the amount of nodes in the tree, and a pointer to the root node. BST also have methods such as insert(), remove(), find(), findMin(), getters, and much more; which uses recursion.

LazyBST is essentially BST, but has a few more things to take care off. LazyBST's nodes now contain a "is_deleted" flag in order to implement "lazy deletion." LazyBST itself also has a "real_size" member, which stores the real amount of nodes within LazyBST (valid and "deleted" nodes) while "size" stores the amount of valid nodes. You can essentially recycle the methods within BST, but you have to modify them to keep the lazy deletion in mind.

Lab Questions

Q: What changes will you have to make to turn LazyBST into a subclass of BST?
A: In LazyBST, we would have to have its node contain a flag of sorts to allow for lazy deletion and implement another "size" member that tells us the actually number of nodes within a LazyBST object. The methods inherited from BST will need to be overloaded in LazyBST such that it not only checks for null pointers, but also checks for marked nodes. To deallocate some memory after marking many nodes as "deleted," we would also require a garbage collection system of sorts.

Q: What is the tradeoff in using LazyBST? What are you trying to minimize?
A: With LazyBST, we are trying to minimize the time for deleting nodes within the tree, but we end up having to increasing the amount of time it takes to find nodes, insert nodes, and also waste memory.

Q: Does Laziness help change the Big-O of any of the other common operations?
A: I would say laziness does not really help change the Big-O of other common operations, since most operations involves "splitting" the tree in half in each iteration, resulting in a worst case O(log N).

Q: BST<T>::find() returns a reference to the payload. But remember you need to pass it the payload in order to find() it. Why do you need to invoke find() on the tree if you already have the payload in your hands?
A: Looking into this, I believe the reason why we need to invoke find on the tree, even if we already have what we want, is because we want to change something about the element that is stored within the tree itself. The payload we have in our hands may contain the bare minimum to find the data we want to change that is stored in the tree itself (I believe this occurs with pairs within C++ STL).

Conclusion

Overall, this lab is mainly about implementing the binary search tree abstract data type. Most of the methods within both BST and LazyBST involves the use of recursion. The methods themselves usually have a worst case runtime of O(log N) since some operations involves splitting the tree in half in each iteration. LazyBST is essentially BST, but requires a "deleted" flag in each of its nodes and the methods need to be modified to handle the situation where it lands on a "deleted" node.

That is pretty much all for this week. As always please leave a comment and thank you for reading!

1 comment: