Introduction to Algorithms I. All materials credit to Princeton.
Course Introduction
Algorithms + Data Structures = Programs.
Union-Find
< Dynamic Connectivity
- Connections are reflexive (p is connected to p), symmetric (if p is connected to q, then q is connected to p) and transitive (if p is connected to q and q to r, then p is connected to r).
- Connected components: maximal sets of objects that are mutually connected.
< Quick-Find
- Data structure: Integer array id[]. p and q are connected iff (if and only if) they have the same id.
- Find: check if id[p] == id[q].
- Union: merge components containing p and q, namely change all entries whose id equals id[p] to id[q].
- Implementation & Analysis: Quick-Find is a slow one for UF because the Union operation is O(N). Considering another O(N) for initialization, in total it takes \(N^2\)(quandratic) array accesses to process a sequence of N union commands on N objects.12345678910111213141516171819202122232425// Eager approach for Union-Findpublic class QuickFindUF {private int[] id;public QuickFindUf(int len) {id = new int[len];for (int i = 0; i < len; i++) {id[i] = i;}}public boolean isConnected(int p, int q) {return id[p] == id[q];}public void union(int p, int q) {int pid = id[p];int qid = id[q];int len = id.length;for (int i = 0; i < len; i++) {if (id[i] == pid) {id[i] = qid;}}}}
< Quick-Union
- Data structure: Integer array id[], but id[i] refers to the parent of i.
- Find: check if p and q have the same root.
- Union: merge components containing p and q, namely set the id of p’s root to the id of q’s root.
- Implementation & Analysis: Quick-Union is still a slow one for UF. Since the tree might get very tall, Find can be really expensive and the Union will reach O(N) in the worst case because it includes Find operation.123456789101112131415161718192021222324// lazy approachpublic class QuickUnionUF {private int[] id;public QuickUnionUF(int len) {id = new int[len];for (int i = 0; i < len; i++) {id[i] = i;}}private int getRoot(int i) {while (i != id[i]) {i = id[i];}return i;}public boolean isConnected(int p, int q) {return getRoot(p) == getRoot(q);}public void union(int p, int q) {int i = root(p);int j = root(q);id[i] = j;}}
< Improvements
Weighted Quick-Union: In Quick-Union, to avoid tall trees, we can track the size of each tree and always link the root of the smaller tree to the root of the larger one.
-> data structure: Same as Quick-Union, an array storing the parent. Need additional int array sz[] to store the number of objects in each tree.
-> Find: Same as Quick-Union, checking if the two items have the same root.
-> Union: Need to compare the size of the two trees before merging to make sure the smaller one is linked to the larger one. Don’t forget to update the sz[] afterwards.
-> Implementation & Analysis: Tree is balanced by introducing sz[], and the Find takes time proportional to depth (at most \(lgN\) for N objects) of p and q, and Union takes constant time for given roots.1234567891011121314151617181920212223242526272829303132333435public class WeightedQuickUnionUF {private int[] id;private int[] sz;public WeightedQuickUnionUF(int len) {id = new int[len];sz = new int[len];for (int i = 0; i < len; i++) {id[i] = i;sz[i] = 1; // only itself in its tree}}private int getRoot(int i) {while (i != id[i]) {i = id[i];}return i;}public boolean isConnected(int p, int q) {return root(p) == root(q);}public void union(int p, int q) {int i = getRoot(p);int j = getRoot(q);if (i == j) {return;}if (sz[i] < sz[j]) {id[i] = j;sz[j] += sz[i];} else {id[j] = i;sz[i] += sz[j];}}}-> Provement for proposition “Depth of any node x is at most lgN“:
The depth of x increases by 1 only when its tree T1 is merged into a larger tree T2. Since T2 is at least as large as T1, the size of the tree containing x at least doubles on every such merge. A tree can double at most \(\lg N\) times before reaching size N, so x's depth can increase at most \(\lg N\) times.
- Path Compression: flatten the tree by adding just one line of code to Weighted Quick-Union: while searching for the root, point each examined node to its grandparent.

```java
public class WQUPC {
    private int[] id;
    private int[] sz;

    public WQUPC(int len) {
        id = new int[len];
        sz = new int[len];
        for (int i = 0; i < len; i++) {
            id[i] = i;
            sz[i] = 1;
        }
    }

    public int getRoot(int i) {
        while (i != id[i]) {
            id[i] = id[id[i]]; // point the current node to its grandparent (path halving)
            i = id[i];
        }
        return i;
    }

    public boolean isConnected(int p, int q) { ... }

    public int union(int p, int q) { ... }
}
```
Analysis of Algorithms
< Introduction
- Reasons to analyze: predict performance, compare algorithms, provide guarantees and understand the theoretical basis. More practically, avoid performance bugs.
- Intrinsic estimates: lg(thousand) is about 10, lg(million) is about 20, lg(billion) is about 30.
< Observations
- Timing a program:
-> Measuring the running time: manually, or use code to automatically record the elapsed execution time.
-> Empirical analysis: use measured running times to predict the timing at other input sizes. Usually we do data analysis to find the relationship between running time T(N) and input size N, which is assumed to be linear in a log-log plot, i.e. \(\lg(T(N)) = b \lg N + c\). In particular, the slope b of the log-log plot can be estimated by taking the lg of the ratio \(T(2N)/T(N)\) (the doubling hypothesis).
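A minimal sketch of such a doubling-ratio experiment; the workload countTriples and the timing helper timeTrial are hypothetical placeholders, not course code:

```java
import java.util.Random;

public class DoublingRatio {
    // Hypothetical workload: count triples that sum to zero (deliberately cubic).
    private static int countTriples(int[] a) {
        int count = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++)
                for (int k = j + 1; k < a.length; k++)
                    if (a[i] + a[j] + a[k] == 0) count++;
        return count;
    }

    // Time one trial of the workload on a random input of size n (seconds).
    private static double timeTrial(int n) {
        Random rnd = new Random();
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = rnd.nextInt(2000000) - 1000000;
        long start = System.currentTimeMillis();
        countTriples(a);
        return (System.currentTimeMillis() - start) / 1000.0;
    }

    public static void main(String[] args) {
        double prev = timeTrial(500);
        for (int n = 1000; n <= 4000; n += n) {
            double time = timeTrial(n);
            double ratio = time / prev;               // T(2N) / T(N)
            double b = Math.log(ratio) / Math.log(2); // estimated log-log slope
            System.out.printf("%6d %8.2f  ratio %5.2f  b ~ %4.2f%n", n, time, ratio, b);
            prev = time;
        }
    }
}
```

For a cubic workload the printed ratio approaches 8 and the estimated slope b approaches 3.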
< Mathematical models
We can simply list all the operations in the program and their corresponding counts as a function of the input size N in a table. But there are ways to simplify the calculation; for example, we can focus only on the most expensive part.
- Cost model: Select the most time-consuming operation to estimate. Among frequencies such as \(N + 2\), \(N(N - 1)\) and \(\frac{1}{2}(N + 1)(N + 2)\), the cost model is the largest one, \(N(N - 1)\).
- Tilde notation: Ignore the lower-order terms. \(N + 2 \rightarrow \sim N\), \(N(N - 1) \rightarrow \sim N^2\), \(\frac{1}{2}(N + 1)(N + 2) \rightarrow \sim \frac{1}{2}N^2\).
< Order-of-growth classifications
The order-of-growth classifications can be represented by a small set of functions:
1, \(\lg N\), \(N\), \(N \lg N\), \(N^2\), \(N^3\) and \(2^N\), whose growth rates are easy to compare in a log-log plot.
Take a look at binary search: the Princeton version uses at most \(1 + \lg N\) compares to search in a sorted array of size N.
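A minimal iterative binary search sketch over a sorted int array:

```java
// Binary search: return the index of key in the sorted array a[], or -1 if not found.
public static int binarySearch(int[] a, int key) {
    int lo = 0, hi = a.length - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2; // avoids the overflow risk of (lo + hi) / 2
        if      (key < a[mid]) hi = mid - 1;
        else if (key > a[mid]) lo = mid + 1;
        else return mid;
    }
    return -1;
}
```

Each iteration halves the search interval, which is where the \(1 + \lg N\) compare bound comes from.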
< Theory of algorithms
- Big Theta: used to classify algorithms; takes the exact order of growth (the term with the maximum exponent).
- Big Oh: used to develop upper bounds; the actual cost is smaller than or equal to the bound.
- Big Omega: used to develop lower bounds; the actual cost is larger than or equal to the bound.
- Tilde: provides an approximate model. Many people mistakenly treat Big O as an approximate model.
Bags, Stacks and Queues
< Separate interface and implementation
- Client: program using operations defined in interface;
- Interface: description of data type, basic operations;
- Implementation: actual code making operations happen;
- Disciplined design: create modular, reusable libraries.
- Performance: try to optimize the implementations.
- Prefer compile-time errors to run-time errors, so that we can have some confidence that the code will work for any client.
< Stack
- LIFO: Last in first out.
Implementation Detail:
-> Underflow: pop from an empty stack. Throw exception.
-> Overflow: resize array in array version stack.
-> Null items: Insertion of null items allowed?
-> Loitering: holding a reference to an object when it is no longer needed.

```java
public String pop() {
    return s[--N]; // the popped item's reference still lives in the array (loitering)
}

public String pop() {
    String item = s[--N];
    s[N] = null;   // avoid loitering: the garbage collector will do the rest
    return item;
}
```
Resizing-array stack: in the array implementation, every grow/shrink needs to copy the existing items to new places, which is too expensive to do on every operation.
-> Grow: Double the size when full.
-> Shrink: halve the size when 1/4 full. (Halving when 1/2 full would cause thrashing on alternating push/pop.)
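A minimal resizing-array sketch following these grow/shrink rules; the class and field names are illustrative:

```java
// Resizing-array stack of strings: double on full, halve when one-quarter full.
public class ResizingArrayStackOfStrings {
    private String[] s = new String[1]; // start with capacity 1
    private int n = 0;                  // number of items on the stack

    public void push(String item) {
        if (n == s.length) resize(2 * s.length);              // grow: double when full
        s[n++] = item;
    }

    public String pop() {
        String item = s[--n];
        s[n] = null;                                          // avoid loitering
        if (n > 0 && n == s.length / 4) resize(s.length / 2); // shrink at 1/4 full
        return item;
    }

    private void resize(int capacity) {
        String[] copy = new String[capacity];
        for (int i = 0; i < n; i++) copy[i] = s[i];
        s = copy;
    }
}
```

Amortized over a sequence of operations, each push/pop touches the array only a constant number of times.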
< Queue
- FIFO: First in first out.
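A minimal linked-list queue sketch (names illustrative); a resizing-array version analogous to the stack above is also possible:

```java
// FIFO queue of strings backed by a singly linked list with head and tail pointers.
public class LinkedQueueOfStrings {
    private Node first; // least recently added (dequeue here)
    private Node last;  // most recently added (enqueue here)

    private class Node {
        String item;
        Node next;
    }

    public boolean isEmpty() {
        return first == null;
    }

    public void enqueue(String item) {
        Node oldLast = last;
        last = new Node();
        last.item = item;
        if (isEmpty()) first = last;
        else           oldLast.next = last;
    }

    public String dequeue() {
        String item = first.item;
        first = first.next;
        if (isEmpty()) last = null; // avoid loitering on the stale tail reference
        return item;
    }
}
```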
Elementary Sorts
< Comparable
- total order
-> Antisymmetry: if \(v \le w\) and \(w \le v\), then \(v = w\).
-> Transitivity: if \(v \le w\) and \(w \le x\), then \(v \le x\).
-> Totality: either \(v \le w\) or \(w \le v\) or both.
- Callback: reference to executable code.
-> Java: Interfaces;
-> C++: class-type functors (function objects);
-> C: function pointers;
-> C#: delegates;
-> Python, JavaScript: first-class functions.
- User-defined comparable types: implement the Comparable interface to support sorting.

```java
public class Date implements Comparable<Date> {
    private final int month, day, year;

    public Date(int m, int d, int y) { // constructor
        month = m;
        day = d;
        year = y;
    }

    public int compareTo(Date that) { // returns negative / 0 / positive
        if (this.year < that.year) return -1;
        if (this.year > that.year) return +1;
        if (this.month < that.month) return -1;
        if (this.month > that.month) return +1;
        if (this.day < that.day) return -1;
        if (this.day > that.day) return +1;
        return 0;
    }
}
```
< Selection Sort
- Idea: Scan from starting index i to the right and select the index j of the minimum value, then swap(a, i, j).

```java
private void selectionSort(Comparable[] a) {
    for (int i = 0; i < a.length; i++) {
        int index = i; // index of the smallest item in a[i..n-1]
        for (int j = i + 1; j < a.length; j++) {
            if (less(a[j], a[index])) {
                index = j;
            }
        }
        swap(a, i, index);
    }
}
```
< Insertion Sort
- Idea: For each i, swap a[i] to the left past every larger item, keeping a[0..i-1] sorted.

```java
private void insertionSort(Comparable[] a) {
    for (int i = 0; i < a.length; i++) {
        for (int j = i; j > 0 && less(a[j], a[j - 1]); j--) {
            swap(a, j, j - 1);
        }
    }
}
```
< Shell Sort
- Idea: insertion sort with decreasing increments.

```java
private void shellSort(Comparable[] a) {
    int h = 1;
    while (h < a.length / 3) {
        h = 3 * h + 1; // 1, 4, 13, 40, ... increments
    }
    while (h >= 1) {
        // h-sort the array: insertion sort with stride h
        for (int i = h; i < a.length; i++) {
            for (int j = i; j >= h && less(a[j], a[j - h]); j -= h) {
                swap(a, j, j - h);
            }
        }
        h = h / 3;
    }
}
```
< Bubble Sort
- Idea: On pass i, repeatedly swap smaller values forward from the tail so that the minimum of a[i..n-1] bubbles into position i.

```java
private void bubbleSort(Comparable[] a) {
    for (int i = 0; i < a.length; i++) {
        for (int j = a.length - 1; j > i; j--) {
            if (less(a[j], a[j - 1])) {
                swap(a, j, j - 1);
            }
        }
    }
}
```
Integer Sort
These methods can only be applied to integer collections or objects that can be mapped to integers.
< Bucket Sort
- Idea: Count each item of the original collection in an auxiliary bucket array, then output the indices that have non-zero counts, in order.

```java
private void bucketSort(int[] a) {
    int[] bucket = new int[999999]; // assumes all keys lie in [0, 999999)
    for (int i = 0; i < a.length; i++) {
        bucket[a[i]]++;
    }
    int j = 0, k = 0;
    while (j < a.length && k < bucket.length) {
        while ((bucket[k]--) > 0) {
            a[j++] = k; // write index k once per occurrence
        }
        k++;
    }
}
```
< Radix Sort
- Idea: Sort by each digit, from the units digit to the tens, to the hundreds, etc.

```java
// LSD radix sort; uses java.util.Arrays.
private void radixSort(int[] A, int radix, int maxLength) {
    int[] temp = new int[A.length];
    int[] bucket = new int[radix];
    for (int i = 0, divide = 1; i < maxLength; i++) {
        Arrays.fill(bucket, 0);
        System.arraycopy(A, 0, temp, 0, A.length);
        for (int j = 0; j < A.length; j++) {
            int subKey = (temp[j] / divide) % radix; // current digit
            bucket[subKey]++;
        }
        for (int j = 1; j < radix; j++) {
            bucket[j] = bucket[j] + bucket[j - 1];   // prefix sums give end positions
        }
        for (int k = A.length - 1; k >= 0; k--) {    // place from the back to keep stability
            int subKey = (temp[k] / divide) % radix;
            bucket[subKey]--;
            A[bucket[subKey]] = temp[k];
        }
        divide *= radix;
    }
}
```
Merge Sort
< Basic
- Idea: divide and conquer. Recursively sort the two halves, then merge the ordered subsequences into one ordered sequence.

```java
private void mergeSort(Comparable[] A, int start, int end) {
    if (start >= end)
        return;
    int mid = (start + end) / 2;
    mergeSort(A, start, mid);
    mergeSort(A, mid + 1, end);
    merge(A, start, mid, end);
}

private void merge(Comparable[] A, int start, int mid, int end) {
    int left = start;
    int right = mid + 1;
    int index = start;
    Comparable[] temp = new Comparable[A.length];
    // The first half of A is in order, and so is the second half.
    while (left <= mid && right <= end) {
        if (less(A[left], A[right])) {
            temp[index++] = A[left++];
        } else {
            temp[index++] = A[right++];
        }
    }
    // Collect the rest of A into temp. Only one of these whiles can execute.
    while (left <= mid) {
        temp[index++] = A[left++];
    }
    while (right <= end) {
        temp[index++] = A[right++];
    }
    // Copy the merged sequence in temp back to A.
    for (index = start; index <= end; index++) {
        A[index] = temp[index];
    }
}
```
< Improvement
- Insertion sort: When subarray is of small size, use insertion to get them sorted.
- Check before merge: when a[mid] <= a[mid + 1], the two subarrays are already in order, so there is no need to merge.

```java
private void mergeSort(Comparable[] A, int start, int end) {
    if (start >= end)
        return;
    if (end <= start + CUTOFF - 1) { // CUTOFF might be around 7
        insertSort(A, start, end);
        return;
    }
    int mid = (start + end) / 2;
    mergeSort(A, start, mid);
    mergeSort(A, mid + 1, end);
    if (!less(A[mid + 1], A[mid])) // subarrays already in order: skip the merge
        return;
    merge(A, start, mid, end);
}
```
- Switch a[] and aux[]: eliminate the time spent copying between a[] and aux[] by switching the roles of the two arrays in the arguments passed to merge() and sort() at each level of recursion (see the sketch below).
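A minimal sketch of this role-switching variant, assuming the same less() helper as above; the method signatures here are illustrative, not the course's exact code:

```java
// mergeSort(src, dst, lo, hi) leaves dst[lo..hi] sorted, using src as the merge source.
private void mergeSort(Comparable[] src, Comparable[] dst, int lo, int hi) {
    if (hi <= lo) return;
    int mid = lo + (hi - lo) / 2;
    mergeSort(dst, src, lo, mid);      // roles are swapped in the recursive calls
    mergeSort(dst, src, mid + 1, hi);
    merge(src, dst, lo, mid, hi);      // merge from src into dst: no copy-back step
}

private void merge(Comparable[] src, Comparable[] dst, int lo, int mid, int hi) {
    int i = lo, j = mid + 1;
    for (int k = lo; k <= hi; k++) {
        if      (i > mid)              dst[k] = src[j++];
        else if (j > hi)               dst[k] = src[i++];
        else if (less(src[j], src[i])) dst[k] = src[j++];
        else                           dst[k] = src[i++];
    }
}

public void mergeSort(Comparable[] a) {
    Comparable[] aux = a.clone();      // aux starts as a copy, so both arrays hold the data
    mergeSort(aux, a, 0, a.length - 1);
}
```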
< Comparator
- Comparable and Comparator interfaces: the former is about sorting with a type's natural order, the latter about sorting with an alternative order. A Comparator acts as an external comparator; if none is specified, the type's own Comparable (natural order) is used for the comparison.

```java
public class Point2D {
    public final Comparator<Point2D> POLAR_ORDER = new PolarOrder();
    private final double x, y;
    ...

    private static int ccw(Point2D a, Point2D b, Point2D c) {
        double area2 = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
        if (area2 < 0) return -1;      // clockwise
        else if (area2 > 0) return +1; // counter-clockwise
        else return 0;                 // collinear
    }

    private class PolarOrder implements Comparator<Point2D> {
        public int compare(Point2D q1, Point2D q2) {
            double dy1 = q1.y - y;
            double dy2 = q2.y - y;
            if (dy1 == 0 && dy2 == 0) { ... }
            else if (dy1 >= 0 && dy2 < 0) return -1;
            else if (dy2 >= 0 && dy1 < 0) return +1;
            else return -ccw(Point2D.this, q1, q2);
        }
    }
}
```
< Stability
- Maintain the original order for equal value.
- Useless? No. Suppose we sort by field A first and then by field B: when two records have equal B values, only a stable sort guarantees they keep the relative order already established by the sort on A.
- Stable: Insertion, Merge
- Unstable: Selection, Shell, Quick
Quick Sort
< Basic
- Idea: Move a pivot to the position where everything on its left-hand side is no larger and everything on its right-hand side is no smaller, then recurse on both sides.

```java
private void quickSort(Comparable[] A, int start, int end) {
    if (start >= end)
        return; // recursion ending condition
    int left = start, right = end;
    Comparable pivot = A[(start + end) / 2]; // take some item as the pivot; here, the middle one
    while (left <= right) {
        while (left <= right && less(A[left], pivot))
            left++;  // from left to right, find the first item not smaller than the pivot
        while (left <= right && greater(A[right], pivot))
            right--; // from right to left, find the first item not greater than the pivot
        if (left <= right) {
            swap(A, left, right); // exchange the out-of-place pair
            left++;
            right--;              // continue with the rest of the range
        }
    }
    quickSort(A, start, right);
    quickSort(A, left, end);
}
```
< Improvement
- Insertion sort: When subarray is of small size, use insertion to get them sorted.
- Median as pivot: sample 3 items to estimate the median of the list, and swap that median to the front to serve as the pivot (see the sketch below).
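A minimal median-of-3 helper sketch, reusing the less()/swap() helpers assumed above; the name medianOfThree is illustrative:

```java
// Choose the median of a[lo], a[mid], a[hi] and move it to a[lo] to serve as the pivot.
private void medianOfThree(Comparable[] a, int lo, int hi) {
    int mid = lo + (hi - lo) / 2;
    if (less(a[mid], a[lo])) swap(a, mid, lo);
    if (less(a[hi], a[lo]))  swap(a, hi, lo);
    if (less(a[hi], a[mid])) swap(a, hi, mid);
    // Now a[lo] <= a[mid] <= a[hi]; move the median to the front.
    swap(a, lo, mid);
}
```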
< 3-way QuickSort
- Idea: Quicksort goes quadratic on arrays with many equal keys unless partitioning stops on equal keys. 3-way partitioning handles this by keeping two extra pointers lt and gt and putting every item equal to the pivot in place in a single pass (see the sketch below).
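A minimal sketch of 3-way (Dijkstra) partitioning, assuming the same swap() helper and a compareTo-based comparison:

```java
// After partitioning: a[lo..lt-1] < pivot, a[lt..gt] == pivot, a[gt+1..hi] > pivot.
private void quickSort3Way(Comparable[] a, int lo, int hi) {
    if (hi <= lo) return;
    int lt = lo, gt = hi, i = lo + 1;
    Comparable pivot = a[lo];
    while (i <= gt) {
        int cmp = a[i].compareTo(pivot);
        if      (cmp < 0) swap(a, lt++, i++); // smaller: grow the left region
        else if (cmp > 0) swap(a, i, gt--);   // larger: grow the right region
        else              i++;                // equal to the pivot: leave it in the middle
    }
    quickSort3Way(a, lo, lt - 1); // keys strictly smaller than the pivot
    quickSort3Way(a, gt + 1, hi); // keys strictly greater than the pivot
}
```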
< Usage
- Quick-Selection: select the kth smallest (or largest) item without fully sorting, in expected linear time (see the sketch below).
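A minimal quick-select sketch using a standard partition routine; the less()/swap() helpers are assumed as before, and k is 0-based:

```java
// Return the k-th smallest element; each partition discards one side of the array.
private Comparable select(Comparable[] a, int k) {
    int lo = 0, hi = a.length - 1;
    while (lo < hi) {
        int j = partition(a, lo, hi);
        if      (j < k) lo = j + 1;
        else if (j > k) hi = j - 1;
        else            return a[k];
    }
    return a[k];
}

// Place a[lo] at its final position j, with smaller items to its left, and return j.
private int partition(Comparable[] a, int lo, int hi) {
    Comparable pivot = a[lo];
    int i = lo, j = hi + 1;
    while (true) {
        while (less(a[++i], pivot)) if (i == hi) break;
        while (less(pivot, a[--j])) if (j == lo) break;
        if (i >= j) break;
        swap(a, i, j);
    }
    swap(a, lo, j);
    return j;
}
```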
Priority Queues
< Binary heaps
- binary tree: empty or node with links to left and right binary trees.
- complete tree: perfectly balanced, except for the bottom level.
- height of complete tree: \(\lfloor \lg N \rfloor\) with \(N\) nodes (so a one-node tree has height 0).
- binary heap: each parent has a key no smaller than its children's keys.

```java
// Array-based max-heap construction (0-indexed: children of i are 2i + 1 and 2i + 2).
private void createMaxHeap(Comparable[] A, int end) {
    for (int i = (end - 1) / 2; i >= 0; i--) { // start from the parent of the very last child
        int fatherIndex = i;
        while (2 * fatherIndex + 1 <= end) {   // while the node has at least a left child
            int biggerIndex = 2 * fatherIndex + 1;
            if (biggerIndex < end) {           // if it also has a right child
                if (less(A[biggerIndex], A[biggerIndex + 1]))
                    biggerIndex++;             // select the bigger of left and right
            }
            if (less(A[fatherIndex], A[biggerIndex])) { // if the father is smaller than the bigger child
                swap(A, fatherIndex, biggerIndex);
                fatherIndex = biggerIndex;     // continue sifting down after the swap
            } else {
                break;                         // the father is already the greatest
            }
        }
    }
}
```
< HeapSort
- Idea: Repeatedly take the root of the max-heap (the maximum), swap it to the end, and rebuild the heap on the remaining prefix.

```java
private void maxHeapSort(Comparable[] A) {
    int size = A.length;
    for (int i = 0; i < size; i++) {
        createMaxHeap(A, size - 1 - i); // rebuild the heap over A[0 .. size-1-i]
        swap(A, 0, size - 1 - i);       // put the max root at the end of that range
    }
}

private void createMaxHeap(Comparable[] A, int end) { ... }
```
Binary Search Trees(BSTs)
< Basic
- Definition: A BST is a binary tree with an order where each node’s key is larger than all keys in its left subtrees and smaller than all keys in its right subtree.
- Node: 4 fields - key, value, reference to left and right subtree.
- Improvement: Add one more field “int count” to track the size of each subtree at the corresponding root node. Help with rank() and select();
< Operations
Floor: get the largest key that is smaller than or equal to the given key.

```java
public Key floor(Key key) {
    Node x = floor(root, key);
    if (x == null) {
        return null;
    }
    return x.key;
}

private Node floor(Node x, Key key) {
    if (x == null) {
        return null;
    }
    int cmp = key.compareTo(x.key);
    if (cmp == 0) {
        return x;
    } else if (cmp < 0) {
        return floor(x.left, key);    // the floor must be in the left subtree
    } else {
        Node t = floor(x.right, key); // the floor could be in the right subtree
        if (t != null) {
            return t;
        } else {
            return x;
        }
    }
}
```
Ceiling: get the smallest key that is larger than or equal to the given key (the mirror image of floor).
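A minimal ceiling sketch, mirroring the floor code above:

```java
public Key ceiling(Key key) {
    Node x = ceiling(root, key);
    if (x == null) {
        return null;
    }
    return x.key;
}

private Node ceiling(Node x, Key key) {
    if (x == null) {
        return null;
    }
    int cmp = key.compareTo(x.key);
    if (cmp == 0) {
        return x;
    } else if (cmp > 0) {
        return ceiling(x.right, key);  // the ceiling must be in the right subtree
    } else {
        Node t = ceiling(x.left, key); // a smaller candidate might exist on the left
        if (t != null) {
            return t;
        } else {
            return x;
        }
    }
}
```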
Rank: The number of keys that are smaller than the given one.
```java
private int rank(Key key, Node x) {
    if (x == null) {
        return 0;
    }
    int cmp = key.compareTo(x.key);
    if (cmp < 0) {
        return rank(key, x.left);
    } else if (cmp > 0) {
        return 1 + size(x.left) + rank(key, x.right);
    } else {
        return size(x.left);
    }
}
```
In-order traversal: traverse the left subtree, visit the current root, then traverse the right subtree; this outputs the keys in ascending order (see the sketch below).
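A minimal in-order traversal sketch that collects the keys into a java.util Queue (the course uses its own Queue type; LinkedList stands in here):

```java
public Iterable<Key> keys() {
    java.util.Queue<Key> q = new java.util.LinkedList<>();
    inorder(root, q);
    return q;
}

private void inorder(Node x, java.util.Queue<Key> q) {
    if (x == null) return;
    inorder(x.left, q);  // all smaller keys first
    q.add(x.key);        // then the current key
    inorder(x.right, q); // then all larger keys
}
```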
< Deletion
- Lazy version: simply set the value of that node to null. But this is unsatisfying because memory usage keeps growing.
DeleteMin: Go left all the way until a node with null left link, and replace that node with its right link. Update the count.
```java
private Node deleteMin(Node x) {
    if (x.left == null) {
        return x.right;
    }
    x.left = deleteMin(x.left);
    x.count = 1 + size(x.left) + size(x.right);
    return x;
}
```
Hibbard deletion:
-> [0 children]: Set its parent’s corresponding link to null.
-> [1 child]: Set its parent’s corresponding link to its only child.
-> [2 children]: find the node, then put the minimum key of its right subtree (its successor) in its spot.

```java
public void delete(Key key) {
    root = delete(root, key);
}

private Node delete(Node x, Key key) { // recursive code!
    if (x == null) {
        return null;
    }
    int cmp = key.compareTo(x.key);
    if (cmp < 0) {
        x.left = delete(x.left, key);
    } else if (cmp > 0) {
        x.right = delete(x.right, key);
    } else {
        if (x.right == null) {
            return x.left;
        } else if (x.left == null) {
            return x.right;
        } else {
            Node t = x;
            x = min(t.right);             // successor: minimum of the right subtree
            x.right = deleteMin(t.right); // remove the successor from the right subtree
            x.left = t.left;
        }
    }
    x.count = 1 + size(x.left) + size(x.right);
    return x;
}
```
Balanced Search Trees
< 2-3 search trees
- Variable number of keys per node
-> 2-node: one key, two children. left < key < right.
-> 3-node: two keys, three children. left < key1 < mid < key2 < right.
- Insertion at the bottom:
-> 2-node: Simply add the key into it and yields a 3-node.
-> 3-node: add the key to make a temporary 4-node, then split it by passing the middle key up into the parent node; continue passing up if necessary.
- Perfect balance: every path from the root to a null link has the same length.
< red-black BSTs
Idea: represent 2-3 search trees as a BST with two types of links. Null links are black.
```java
private static final boolean RED = true;
private static final boolean BLACK = false;

private class Node {
    Key key;
    Value val;
    Node left, right;
    boolean color; // color of the link from the parent to this node
}
```
Left-leaning: force red links to lean left so that a red link serves as the internal “glue” of a 3-node.
- Rule:
-> No node has two red links connected to it (that would represent a 4-node).
-> Every path from root to null link has the same number of black links.
-> Red links lean left.
Search: the implementation is the same as for an ordinary BST, but it runs faster because of the better balance.
```java
public Value get(Key key) {
    Node x = root;
    while (x != null) {
        int cmp = key.compareTo(x.key);
        if (cmp < 0) {
            x = x.left;
        } else if (cmp > 0) {
            x = x.right;
        } else {
            return x.val;
        }
    }
    return null;
}
```
Elementary operations:
-> Rotate Left: when a right-leaning red link occurs, rotate it to the left.

```java
private Node rotateLeft(Node h) {
    assert isRed(h.right);
    Node x = h.right;
    h.right = x.left;
    x.left = h;
    x.color = h.color;
    h.color = RED;
    return x;
}
```
-> Rotate Right: when a node has consecutive red links, rotate the upper red link to the right.
```java
private Node rotateRight(Node h) {
    assert isRed(h.left);
    Node x = h.left;
    h.left = x.right;
    x.right = h;
    x.color = h.color;
    h.color = RED;
    return x;
}
```
-> Color flip: when two red links are connected to the same parent node, flip the colors to split this 4-node.
```java
private void flipColors(Node h) {
    assert !isRed(h);
    assert isRed(h.left);
    assert isRed(h.right);
    h.color = RED;
    h.left.color = BLACK;
    h.right.color = BLACK;
}
```
Insertion at the bottom:
-> 2-node: Color new link red and rotate if right leaning.
-> 3-node: Color the new link red, then rotate and/or flip colors on the way back up to the root to restore balance. Pay attention to the order of the three fix-up cases.

```java
private Node put(Node h, Key key, Value val) {
    if (h == null) {
        return new Node(key, val, RED);
    }
    int cmp = key.compareTo(h.key);
    if (cmp < 0) {
        h.left = put(h.left, key, val);
    } else if (cmp > 0) {
        h.right = put(h.right, key, val);
    } else {
        h.val = val;
    }
    // Fix-ups on the way up. Note: separate ifs, NOT else-if!
    if (isRed(h.right) && !isRed(h.left)) {
        h = rotateLeft(h);  // lean a right-leaning red link to the left
    }
    if (isRed(h.left) && isRed(h.left.left)) {
        h = rotateRight(h); // balance a temporary 4-node with two consecutive left reds
    }
    if (isRed(h.left) && isRed(h.right)) {
        flipColors(h);      // split the 4-node, passing the red link up
    }
    return h;
}
```
Geometric Search
< 1D range search
- Range search: Find all keys between k1 and k2.
- Range count: Number of keys between k1 and k2.
BST RC: Simply use rank to get the range count.
```java
public int size(Key lo, Key hi) {
    if (contains(hi)) {
        return rank(hi) - rank(lo) + 1;
    } else {
        return rank(hi) - rank(lo);
    }
}
```
BST RS (see the sketch after the steps below):
-> Recursively find all keys in the left subtree;
-> Check the key in the current node;
-> Recursively find all keys in the right subtree.
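A minimal recursive sketch of this 1D range search, collecting matches into a java.util Queue (the course uses its own Queue type):

```java
public Iterable<Key> range(Key lo, Key hi) {
    java.util.Queue<Key> q = new java.util.LinkedList<>();
    range(root, q, lo, hi);
    return q;
}

private void range(Node x, java.util.Queue<Key> q, Key lo, Key hi) {
    if (x == null) return;
    int cmplo = lo.compareTo(x.key);
    int cmphi = hi.compareTo(x.key);
    if (cmplo < 0) range(x.left, q, lo, hi);     // the left subtree may hold keys >= lo
    if (cmplo <= 0 && cmphi >= 0) q.add(x.key);  // the current key lies in [lo, hi]
    if (cmphi > 0) range(x.right, q, lo, hi);    // the right subtree may hold keys <= hi
}
```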
< 2D orthogonal range search
- Range search: Find all keys lying in a 2D range.
- Range count: Number of keys lying in a 2D range.
- Data structure: BST using x and y coordinates alternatively.
- 2D tree RS:
-> Check if the current node is in the range (rectangle);
-> Recursively check all keys in the left/bottom subtree;
-> Recursively check all keys in the right/top subtree.
- Nearest neighbour search:
-> Check distance from current point to the query point;
-> Recursively check left/bottom;
-> Recursively check right/top. As the search gets closer to the query point, whole subtrees can be pruned.
< Interval search trees
- BST with each node storing an interval. Use the left endpoint as the key and store the maximum right endpoint of each subtree in its root.
- Insertion: use the left endpoint as the key to insert, and update the max endpoint along the search path (see the sketch after the search code below).
- Search:
-> If interval in node intersects query interval, return it.
-> Else if left subtree is null, go right.
-> Else if max endpoint in left subtree is less than lo, go right.
-> Else go left.

```java
private Interval get(Key lo, Key hi) {
    Node x = root;
    while (x != null) {
        if (x.interval.intersects(lo, hi)) {
            return x.interval;
        } else if (x.left == null) {
            x = x.right;
        } else if (x.left.max < lo) {
            x = x.right;
        } else {
            x = x.left;
        }
    }
    return null;
}
```
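A minimal insertion sketch for the interval tree described above; the node layout (numeric lo/hi endpoints plus a max field) is an illustrative assumption, not the course's exact API:

```java
// Illustrative node: stores an interval keyed on its left endpoint lo,
// plus the maximum right endpoint of the whole subtree.
private class IntervalNode {
    double lo, hi; // the interval [lo, hi]
    double max;    // largest hi in this subtree
    IntervalNode left, right;
    IntervalNode(double lo, double hi) { this.lo = lo; this.hi = hi; this.max = hi; }
}

private IntervalNode put(IntervalNode x, double lo, double hi) {
    if (x == null) return new IntervalNode(lo, hi);
    if (lo < x.lo) x.left = put(x.left, lo, hi);  // BST key = left endpoint
    else           x.right = put(x.right, lo, hi);
    if (hi > x.max) x.max = hi;                   // update max along the search path
    return x;
}
```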
< Rectangle intersection
- Find all intersections among a set of N orthogonal rectangles.
- Sweep vertical line method: process events at the rectangles' x-coordinates while maintaining, in an interval search tree, the y-intervals of the rectangles currently intersecting the sweep line. A left endpoint triggers an interval search for that rectangle's y-interval (reporting intersections) followed by an insertion; a right endpoint triggers removal of that y-interval.
Hash Tables
< Hash function
- Idea: Save items in key-index table, where index is a hash function of key.
Hash in user-defined types: make use of prime numbers and built-in hashCode() for Java classes.
```java
public final class Transaction implements Comparable<Transaction> {
    private final String who;
    private final Date when;
    private final double amount;
    ...

    public int hashCode() {
        int hash = 17;                                   // nonzero starting constant
        hash = 31 * hash + who.hashCode();               // 31: small prime multiplier
        hash = 31 * hash + when.hashCode();
        hash = 31 * hash + ((Double) amount).hashCode();
        return hash;
    }
}
```
Collision: two distinct keys hashing to the same index.
- Separate chaining: maintain a linked list (chain) of the keys that collide at each index.
- Linear probing: on a collision, find the next empty slot (see the sketch below).
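A minimal linear-probing symbol table sketch following these ideas; the fixed capacity M is an illustrative choice, and resizing/deletion are omitted:

```java
// Linear probing: on a collision, scan forward (wrapping around) for the next empty slot.
public class LinearProbingST<Key, Value> {
    private static final int M = 30001;          // illustrative fixed table size
    private Key[] keys = (Key[]) new Object[M];  // unchecked cast: Java's generic-array workaround
    private Value[] vals = (Value[]) new Object[M];

    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % M; // strip the sign bit, then take mod M
    }

    public void put(Key key, Value val) {
        int i;
        for (i = hash(key); keys[i] != null; i = (i + 1) % M) {
            if (keys[i].equals(key)) break;       // key already present: overwrite its value
        }
        keys[i] = key;
        vals[i] = val;
    }

    public Value get(Key key) {
        for (int i = hash(key); keys[i] != null; i = (i + 1) % M) {
            if (keys[i].equals(key)) return vals[i];
        }
        return null;
    }
}
```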