Solving Graph and Tree Problems

The post covers some of the important aspects of solving Graph problems. The graph in this post refers not only to Graph but some of its specialized variants like Binary Tree, Trie, etc as well. Here, I will be covering some of the important aspects of handling these questions in interviews.

Graph and its specialized subtypes

Let’s go through these in detail-

Graph is a collection of nodes with edges between some of them. A graph can either be directed or undirected. Directed edges are like a one-way street (i.e source → destination), undirected edges are a two-way street. In the undirected graph, an edge like (source — destination) would be stored twice. A graph can also have cycles, and a graph without cycles is known as an acyclic graph.

The graph might consist of multiple isolated subgraphs. This means we can’t always traverse a graph with one node. Graph can be represented either as adjacency matrix or adjacency list (more in-depth details here).

Here, is a reference Implementation of Graph using adjacency list. Most of the interview problems can be solved using adjacency list implementation. But, be careful, and choose the data structure based on the problem you are trying to solve. This is very important for Graph problems.

Traversal / Search

  • DFS: Go deep first, before we go wide. Use recursion or stack to implement it.
  • BFS: Visits the nodes in increasing order of their distance from the starting node, i.e. we go wide before we go deep. Use the Queue to implement it.

Question: What happens if we don’t check of visited nodes?

If this is NOT done, we might get stuck in an infinite loop.

Prefer DFS if we want to visit every node in the Graph. Both DFS and BFS work but, DFS is definitely simpler to implement. However, if we want to find the shortest path (or just any path) between 2 nodes, BFS is generally better.

Tree is a directed Graph without cycles. Unlike Graph, tree has a start node known as root, which can be used to visit the entire tree. If we can perform an operation at the root node and if we can do the same at other children nodes then it exhibits Optimal Substructure property.

In a binary tree, each node has up to two children. In a ternary tree, each node can have up to three children. If you are using the tree to represent a bunch of phone numbers then it will be a 10-ary tree with each node having up to 10 children (one for each digit). Trees may or may NOT have links back to their parent nodes.

  • Binary Search Tree: all left descendants <= root < all right descendants. The definition of the BST can vary slightly with respect to equality. Under some definition, the tree can’t have duplicate values and in others, the duplicate values can be on either side. So, better clarify always.
  • Heap / Complete Binary Tree: Every level of the tree is fully filled, except perhaps the last level. And it gets filled from left to right.
  • Full Binary Tree: Every node has either zero or two children. That is, no node has only one child.
  • Balanced Tree: Balanced enough to ensure O(log n) times for insert and find, If the size of the left and right subtree is EXACTLY the same then we call it a PERFECT balanced tree. RED-BLACK trees and AVL trees are two balanced trees. Balanced trees are rare in the interviews.

Traversal:

Tree being a subtype of Graph also has two strategies (BFS and DFS) for traversal. There is only one major exception — Tree is connected graph and we are given root node, so there is no risk of traversing the same node multiple times and hence we don’t need to track the visited node (notice in the DFS/BFS implementation of the Graph).

In graph DFS traversal, we don’t care the order in which children node get picked but that’s NOT the case for trees. In tree, choosing one node over the other generates specialized versions of DFS traversal. Below list covers these specific DFS traversals.

  • Pre-order: root-left-right. If you need to explore the roots before inspecting any leaves, pick pre-order because you will encounter all the roots before all of the leaves. Pre-order traversal is used to create a copy of the tree. Root gets visited first.
  • Post-order: left-right-root. If you need to explore or perform some operation on all the leaves first before exploring nodes then this approach is ideal. It can be used to generate postfix notation of the tree. Root gets visited last.
  • In-order: left-root-right. If you know that the tree has an inherent sequence in the nodes, and you want to flatten the tree back into its original sequence, then an in-order traversal should be used. The tree would be flattened in the same way it was created. Sorts the elements if tree is BST.

Question: Reverse sort a BST?

Performing in-order will sort the elements in ascending order. Now this means, to reverse the order, we need to reverse the traversal order as well. Use reverse In-order, i.e reverse(left-root-right) = right-root-left.

Question: Remove all subtrees with the value of 0?

We need to do something at the leaf nodes first. Use post-order traversal.

Question: DFS vs BFS which is more efficient?

Both have the same time cost, i.e. visit each node once. DFS uses memory proportional to the depth or height of the tree, while BFS uses memory proportional to the breadth of the tree. Space travel doubles each time it gets one level deeper. If the tree is almost balanced, depth will be O(lg n) but breadth will be O(N). Keep these in mind and then decide which is a more efficient algorithm for a given problem. But, this holds only if you can solve a problem with either of the approaches.

Use recursion as the de facto approach to solve tree problems, and keep below points in mind:

  • If you want to maintain global state, use class level (global scoped) variable/objects.
  • If you want to pass certain details downward, use method arguments.
  • If you want to pass info upwards, use return value. This is key to solve problems which ask to find the sum, count, min/max for the given condition.

Binary Heap is a complete binary tree(i.e., totally filled other than the rightmost elements on the last level) where each node is either smaller or larger than its children. In min-heap, the root is the smallest and in max-heap root node is largest.

2 Key log n operations of min-heap:

  • Insert: Insert at the bottom. Insert at the rightmost spot to maintain the complete tree property. Then fix the tree property by swapping the new element with its parent, until we find an appropriate spot for the element. Bubble up the minimum.
  • Extract Min: For min-heap, minimum sits at the top. The problem is how to remove it. Remove the root and swap it to the last element in the heap. Then we bubble down this element, swapping it with one of its children until the min-heap property is restored.

So, in both operations, we first maintain the complete binary tree property and then Heapify (i.e. either bubble up or down).

Heap can be implemented by using either an Array or Binary Tree. Java PriorityQueue is an implementation of Heap.

One of the classic problems is finding the top k elements. Below is the implementation in Java. To find top k, we are using min-heap i.e kth element will be sitting at the root.

The MOST confusing aspect of Priority Queue is — which implementation (i.e. min or max) to be used for a given problem. The trick, I use to resolve this confusion is to remember that the poll()/remove() method removes the head or the first element sitting at the head of the priority queue. Just check, if you need to keep minimum or maximum element? If you want to retain the minimum value then use max-Heap.

Tries (Prefix Tree) is a variant of the n-ary tree in which characters are stored at each node. Each path down the tree may represent a word. Trie is an infix of the word reTRIEval and pronounced as TREE but to distinguish it from the tree some people also pronounce it as TRY. As the name suggests, it is an efficient retrieval or searching data structure.

Very commonly a trie is used to store the entire ENGLISH language for quick prefix lookup. Each node consists of at max 26 children and edges connect each parent node to its children. These 26 pointers are nothing but pointers for each of the 26 letters of the English alphabet. Strings are stored in a top to bottom manner on the basis of their prefix in a trie.

While implementing Trie be careful to not put any character at the root node. Implement add/insert and search method in such a way that it starts one level below the root level.

HashMap/HashTable can quickly look up whether a string is a valid word, but it can’t tell us if a string is the prefix of any valid word. A trie can do this very quickly in O(K) time, where K is the length of the string. This is actually the same RUNTIME as hashtable will take.

We often refer to hash table lookups being O(1) time, but if you look closely just like Trie it also need to go through each character.

My final thought about all these data structures is that, be a bit careful and seek clarification when necessary as Graph problems are confusing and the possibility of the missing corner or edge cases is high.

Happy interview!

Thank you for reading! If you enjoyed it, please clap 👏 for it.

Distributed System Engineer| ML/DL Enthusiast | Java, Python, and Go Programmer