Connections are reflexive (p->p), symmetric (p->q implies q->p) and transitive (p->q and q->r imply p->r).
Connected components: Maximal sets of objects that are mutually connected.
< Quick-Find
Data structure: Integer array id[]. p and q are connected iff(if and only if) they have the same id.
Find: check if id[p] == id[q].
Union: merge components containing p and q, namely change all entries whose id equals id[p] to id[q].
Implementation & Analysis: Quick-Find is slow because the Union operation is O(N). Counting another O(N) pass for initialization, it takes \(N^2\) (quadratic) array accesses in total to process a sequence of N union commands on N objects.
// Eager approach for Union-Find
public class QuickFindUF {
    private int[] id;

    public QuickFindUF(int len) {
        id = new int[len];
        for (int i = 0; i < len; i++) {
            id[i] = i;
        }
    }

    public boolean isConnected(int p, int q) {
        return id[p] == id[q];
    }

    public void union(int p, int q) {
        int pid = id[p];
        int qid = id[q];
        int len = id.length;
        for (int i = 0; i < len; i++) {
            if (id[i] == pid) {
                id[i] = qid;
            }
        }
    }
}
< Quick-Union
Data structure: Integer array id[], but id[i] refers to the parent of i.
Find: check if p and q have the same root.
Union: merge components containing p and q, namely set the id of p’s root to the id of q’s root.
Implementation & Analysis: Quick-Union is still slow. Since the tree might get very tall, Find can be expensive, and Union also reaches O(N) in the worst case because it includes Find operations.
// Lazy approach for Union-Find
public class QuickUnionUF {
    private int[] id;

    public QuickUnionUF(int len) {
        id = new int[len];
        for (int i = 0; i < len; i++) {
            id[i] = i;
        }
    }

    private int getRoot(int i) {
        while (i != id[i]) {
            i = id[i];
        }
        return i;
    }

    public boolean isConnected(int p, int q) {
        return getRoot(p) == getRoot(q);
    }

    public void union(int p, int q) {
        int i = getRoot(p);
        int j = getRoot(q);
        id[i] = j;
    }
}
< Improvements
Weighted Quick-Union: In Quick-Union, to avoid tall trees, track the size of each tree and always link the root of the smaller tree to the root of the larger one. -> Data structure: Same as Quick-Union, an array storing the parent, plus an additional int array sz[] storing the number of objects in each tree. -> Find: Same as Quick-Union, check whether the two items have the same root. -> Union: Compare the sizes of the two trees before merging so that the smaller one is linked to the larger one. Don't forget to update sz[] afterwards. -> Implementation & Analysis: The trees stay balanced thanks to sz[], so Find takes time proportional to the depth of p and q (at most \(\lg N\) for N objects), and Union takes constant time given the roots.
public class WeightedQuickUnionUF {
    private int[] id;
    private int[] sz;

    public WeightedQuickUnionUF(int len) {
        id = new int[len];
        sz = new int[len];
        for (int i = 0; i < len; i++) {
            id[i] = i;
            sz[i] = 1; // each tree initially contains only its root
        }
    }

    private int getRoot(int i) {
        while (i != id[i]) {
            i = id[i];
        }
        return i;
    }

    public boolean isConnected(int p, int q) {
        return getRoot(p) == getRoot(q);
    }

    public void union(int p, int q) {
        int i = getRoot(p);
        int j = getRoot(q);
        if (i == j) {
            return;
        }
        if (sz[i] < sz[j]) {
            id[i] = j;
            sz[j] += sz[i];
        } else {
            id[j] = i;
            sz[i] += sz[j];
        }
    }
}
-> Proof of the proposition "Depth of any node x is at most \(\lg N\)": The depth of x increases by 1 exactly when its tree T1 is merged into a larger tree T2. Since the size of T2 is at least that of T1, the size of the tree containing x at least doubles at each such merge. Starting from size 1, a tree in a universe of N objects can double at most \(\lg N\) times, so the depth of x can increase at most \(\lg N\) times.
Path Compression: Flatten the tree by adding just one line of code to Weighted Quick-Union: while following parent links during Find, point each examined node at its grandparent.
public class WQUPC {
    private int[] id;
    private int[] sz;

    public WQUPC(int len) {
        id = new int[len];
        sz = new int[len];
        for (int i = 0; i < len; i++) {
            id[i] = i;
            sz[i] = 1;
        }
    }

    public int getRoot(int i) {
        while (i != id[i]) {
            id[i] = id[id[i]]; // point current node at its grandparent
            i = id[i];
        }
        return i;
    }

    public boolean isConnected(int p, int q) {...}

    public void union(int p, int q) {...}
}
Analysis of Algorithms
< Introduction
Reasons to analyze: predict performance, compare algorithms, provide guarantees and understand the theoretical basis. More practically, avoid performance bugs.
Intrinsic estimates: lg(thousand) is about 10, lg(million) is about 20, lg(billion) is about 30.
< Observations
Timing a program: -> Measuring the running time: manually, or with code that traces the elapsed execution time. -> Empirical analysis: Use measured running times to predict the timing at other input scales. Usually we fit the relation between running time T(N) and input size N, which is assumed to be linear in a log-log plot: \(\lg(T(N)) = b \lg N + c\). In particular, the slope b in the log-log plot can be estimated by taking the lg of the ratio T(2t)/T(t).
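The doubling estimate above can be sketched in a few lines. This is a hypothetical helper (class and method names are illustrative) that turns two timings T(N) and T(2N) into the exponent b of the power-law fit:

```java
// Hypothetical sketch: estimate the exponent b in T(N) ~ a * N^b
// from two timings, using b = lg(T(2N) / T(N)).
public class DoublingEstimate {
    public static double exponent(double timeN, double time2N) {
        return Math.log(time2N / timeN) / Math.log(2); // lg of the ratio
    }
}
```

For instance, if the running time quadruples when the input doubles, the estimated exponent is 2, suggesting a quadratic algorithm.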
< Mathematical models
We can simply list all the operations in the program and their corresponding time as a function of input N in a table. But we have more choices to simplify the calculations. For example, we can only care about the most expensive part.
Cost model: Use some basic operation as a proxy for running time, typically the most frequent one. Among frequencies such as N + 2, N(N - 1) and 1/2(N + 1)(N + 2), the cost model is based on the largest, N(N - 1).
Tilde notation: Ignore the lower order terms. N + 2\(\rightarrow\) ~N, N(N - 1)\(\rightarrow\) ~\(N^2\), 1/2(N + 1)(N + 2)\(\rightarrow\) ~\(\frac{1}{2}N^2\).
< Order-of-growth classifications
The order-of-growth classifications can be represented by a small set of functions: 1, \(\lg N\), \(N\), \(N \lg N\), \(N^2\), \(N^3\) and \(2^N\), whose log-log plots are straight lines with increasing slopes. Take a look at binary search: the Princeton version uses at most \(1 + \lg N\) compares to search in a sorted array of size N.
// Princeton version
public static int binarySearch(int[] a, int key) {
    int lo = 0, hi = a.length - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (key < a[mid]) {
            hi = mid - 1;
        } else if (key > a[mid]) {
            lo = mid + 1;
        } else {
            return mid;
        }
    }
    return -1;
}
// My version
public static int binarySearch2(int[] a, int key) {
    int left = 0, right = a.length - 1;
    while (left < right) {
        int mid = (left + right) / 2;
        if (key > a[mid]) {
            left = mid + 1;
        } else {
            right = mid;
        }
    }
    if (a[left] == key) {
        return left;
    } else {
        return -1;
    }
}
< Theory of algorithms
Big Theta: Used to classify algorithms, take the exact order of growth. (term with maximum exponent)
Big Oh: Used to develop the upper bounds, where the actual cost should be smaller or equal to the term.
Big Omega: Used to develop the lower bounds, where the actual cost should be larger or equal to the term.
Tilde: Provides an approximate model. Many people mistakenly treat Big O as an approximate model.
Bags, Stacks and Queues
< Separate interface and implementation
Client: program using operations defined in interface;
Interface: description of data type, basic operations;
Implementation: actual code making operations happen;
Prefer compile-time error and avoid run-time error, so that we can have some confidence that it will work for any client.
< Stack
LIFO: Last in first out.
Implementation Detail: -> Underflow: pop from an empty stack. Throw exception. -> Overflow: resize array in array version stack. -> Null items: Insertion of null items allowed? -> Loitering: Holding a reference to an object when it is no longer needed?
public String pop() {
    return s[--N]; // Bad: the popped item still has a live reference.
}

public String pop() {
    String item = s[--N];
    s[N] = null; // Avoid loitering: the garbage collector will do the rest.
    return item;
}
Resizing-array stack: In the array implementation, every grow/shrink needs to copy existing items to a new array, which is too expensive to do on every operation. -> Grow: double the size when full. -> Shrink: halve the size when one-quarter full. (Halving at 1/2 full would thrash when pushes and pops alternate at the boundary.)
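The grow/shrink rules above can be sketched as follows; this is a minimal illustrative version (names are my own) with the loitering fix from the earlier snippet folded in:

```java
// Resizing-array stack sketch: double when full, halve when one-quarter full.
public class ResizingArrayStack {
    private String[] s = new String[1];
    private int n = 0;

    public void push(String item) {
        if (n == s.length) resize(2 * s.length);   // grow: double when full
        s[n++] = item;
    }

    public String pop() {
        String item = s[--n];
        s[n] = null;                               // avoid loitering
        if (n > 0 && n == s.length / 4) resize(s.length / 2); // shrink at 1/4 full
        return item;
    }

    public boolean isEmpty() { return n == 0; }

    private void resize(int capacity) {
        String[] copy = new String[capacity];
        for (int i = 0; i < n; i++) copy[i] = s[i];
        s = copy;
    }
}
```

Amortized cost stays constant: each item is copied at most a constant number of times between resizes.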
< Queue
FIFO: First in first out.
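A linked-list sketch of the FIFO discipline (illustrative names): enqueue at the tail, dequeue at the head, both in constant time.

```java
// Linked-list queue sketch: first in, first out.
public class LinkedQueue {
    private Node first, last;

    private static class Node {
        String item;
        Node next;
    }

    public boolean isEmpty() { return first == null; }

    public void enqueue(String item) {
        Node oldLast = last;
        last = new Node();
        last.item = item;
        if (first == null) first = last;  // queue was empty
        else oldLast.next = last;
    }

    public String dequeue() {
        String item = first.item;
        first = first.next;
        if (first == null) last = null;   // avoid loitering on the last node
        return item;
    }
}
```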
Elementary Sorts
< Comparable
total order -> Antisymmetry: if \(v \le w\) and \(w \le v\), then \(v = w\). -> Transitivity: if \(v \le w\) and \(w \le x\), then \(v \le x\). -> Totality: either \(v \le w\) or \(w \le v\) or both.
User-defined comparable types: Implement the Comparable interface for sort
public class Date implements Comparable<Date> {
    private final int month, day, year;

    public Date(int m, int d, int y) { // constructor
        month = m;
        day = d;
        year = y;
    }

    public int compareTo(Date that) { // returns - / 0 / +
        if (this.year < that.year) return -1;
        if (this.year > that.year) return +1;
        if (this.month < that.month) return -1;
        if (this.month > that.month) return +1;
        if (this.day < that.day) return -1;
        if (this.day > that.day) return +1;
        return 0;
    }
}
< Selection Sort
Idea: Scan from starting index i to right and select the index j of minimum value. Then swap(a, i, j).
private void selectionSort(Comparable[] a) {
    for (int i = 0; i < a.length; i++) {
        int index = i;
        for (int j = i + 1; j < a.length; j++) {
            if (less(a[j], a[index])) {
                index = j;
            }
        }
        swap(a, i, index);
    }
}
< Insertion Sort
Idea: Repeatedly swap a[i] to the left past each larger item, where a[0 ~ i-1] is already sorted.
private void insertionSort(Comparable[] a) {
    for (int i = 0; i < a.length; i++) {
        for (int j = i; j > 0 && less(a[j], a[j - 1]); j--) {
            swap(a, j, j - 1);
        }
    }
}
< Shell Sort
Idea: insertion sort with decreasing increment.
private void shellSort(Comparable[] a) {
    int h = 1;
    while (h < a.length / 3) {
        h = 3 * h + 1; // Knuth's increment sequence: 1, 4, 13, 40, ...
    }
    while (h >= 1) {
        for (int i = h; i < a.length; i++) {
            for (int j = i; j >= h && less(a[j], a[j - h]); j -= h) {
                swap(a, j, j - h);
            }
        }
        h = h / 3;
    }
}
< Bubble Sort
Idea: In pass i, sweep from the tail toward the front, swapping adjacent out-of-order pairs so the smallest remaining value bubbles into the finished position i.
private void bubbleSort(Comparable[] a) {
    for (int i = 0; i < a.length; i++) {
        for (int j = a.length - 1; j > i; j--) {
            if (less(a[j], a[j - 1])) {
                swap(a, j, j - 1);
            }
        }
    }
}
Integer Sort
These methods can only be applied to integer collections or objects that can be mapped to integers.
< Bucket Sort
Idea: Mark each item of the original collection in an auxiliary bucket array, then output the indices that have marks, in order.
private void bucketSort(int[] a) {
    int[] bucket = new int[999999]; // assumes all keys fit in this range
    for (int i = 0; i < a.length; i++) {
        bucket[a[i]]++;
    }
    int j = 0, k = 0;
    while (j < a.length && k < bucket.length) {
        while ((bucket[k]--) > 0) {
            a[j++] = k;
        }
        k++;
    }
}
< Radix Sort
Idea: Sort with each digit, from unit digit to tens, to hundreds, etc.
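The digit-by-digit idea can be sketched as an LSD radix sort for non-negative ints. This is an illustrative version (names are my own) using a stable counting pass per decimal digit:

```java
// LSD radix sort sketch for non-negative ints: counting sort on each digit,
// least significant first. Stability of each pass makes the whole sort correct.
public class RadixSort {
    public static void sort(int[] a) {
        int max = 0;
        for (int v : a) max = Math.max(max, v);
        int[] aux = new int[a.length];
        for (int exp = 1; max / exp > 0; exp *= 10) {
            int[] count = new int[10 + 1];
            for (int v : a) count[(v / exp) % 10 + 1]++;           // digit frequencies
            for (int d = 0; d < 10; d++) count[d + 1] += count[d]; // frequencies -> indices
            for (int v : a) aux[count[(v / exp) % 10]++] = v;      // stable distribute
            System.arraycopy(aux, 0, a, 0, a.length);
        }
    }
}
```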
Merge Sort
< Basic
Idea: Recursively sort the two halves of the array, then merge them.

private void mergeSort(Comparable[] A, int start, int end) {
    if (start >= end) return;
    int mid = (start + end) / 2;
    mergeSort(A, start, mid);
    mergeSort(A, mid + 1, end);
    merge(A, start, mid, end);
}

private void merge(Comparable[] A, int start, int mid, int end) {
    int left = start;
    int right = mid + 1;
    int index = start;
    Comparable[] temp = new Comparable[A.length];
    // The first half of A is in order, and so is the second half.
    while (left <= mid && right <= end) {
        if (less(A[left], A[right])) {
            temp[index++] = A[left++];
        } else {
            temp[index++] = A[right++];
        }
    }
    // Collect the rest of A into temp. Only one of these loops can execute.
    while (left <= mid) {
        temp[index++] = A[left++];
    }
    while (right <= end) {
        temp[index++] = A[right++];
    }
    // Copy the merged sequence in temp back to A.
    for (index = start; index <= end; index++) {
        A[index] = temp[index];
    }
}
< Improvement
Insertion sort: When subarray is of small size, use insertion to get them sorted.
Check before merge: When a[mid] <= a[mid + 1], the two subarrays are already in order, so there is no need to merge.
private void mergeSort(Comparable[] A, int start, int end) {
    if (start >= end) return;
    if (end <= start + CUTOFF - 1) { // CUTOFF might be around 7
        insertionSort(A, start, end);
        return;
    }
    int mid = (start + end) / 2;
    mergeSort(A, start, mid);
    mergeSort(A, mid + 1, end);
    if (!less(A[mid + 1], A[mid])) { // subarrays already in order
        return;
    }
    merge(A, start, mid, end);
}
Switch a[] and aux[]: Eliminate the time to copy from a[] to aux[] by switching the roles of the two arrays passed into merge() and sort() at each level of the recursion.
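The role-switching trick can be sketched like this (an illustrative version on ints, names are my own): sort(dst, src, ...) leaves the halves sorted in src, so merge can write straight into dst with no copy-back pass. The only copy is the initial clone.

```java
// Merge sort without the copy-back pass: the source and destination arrays
// swap roles at each level of the recursion.
public class MergeNoCopy {
    public static void sort(int[] a) {
        int[] aux = a.clone();            // aux must start as an exact copy
        sort(aux, a, 0, a.length - 1);    // sort from aux into a
    }

    // Leaves dst[lo..hi] sorted, reading from src; roles swap in the recursion.
    private static void sort(int[] src, int[] dst, int lo, int hi) {
        if (lo >= hi) return;
        int mid = lo + (hi - lo) / 2;
        sort(dst, src, lo, mid);          // sorts src[lo..mid]
        sort(dst, src, mid + 1, hi);      // sorts src[mid+1..hi]
        merge(src, dst, lo, mid, hi);     // merge the halves of src into dst
    }

    private static void merge(int[] src, int[] dst, int lo, int mid, int hi) {
        int left = lo, right = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if (left > mid) dst[k] = src[right++];
            else if (right > hi) dst[k] = src[left++];
            else if (src[right] < src[left]) dst[k] = src[right++];
            else dst[k] = src[left++];
        }
    }
}
```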
< Comparator
Comparable and Comparator interfaces: the former sorts using a type's natural order, the latter sorts using an alternative order. A Comparator works like an external comparator; if none is specified, the type's own Comparable implementation is used for comparison.
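A small example of the two orders side by side, using `java.util.Arrays.sort`: the natural (Comparable) String order is case-sensitive, while a Comparator can impose a case-insensitive alternative order.

```java
import java.util.Arrays;
import java.util.Comparator;

public class ComparatorDemo {
    public static void main(String[] args) {
        String[] a = { "banana", "Cherry", "apple" };
        Arrays.sort(a); // natural order: all uppercase letters sort before lowercase
        System.out.println(Arrays.toString(a)); // [Cherry, apple, banana]
        Arrays.sort(a, Comparator.comparing(String::toLowerCase)); // alternative order
        System.out.println(Arrays.toString(a)); // [apple, banana, Cherry]
    }
}
```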
< Stability
Useless? No! Think about sorting first by field A and then by field B: how do we guarantee that the order already established on field A is preserved when two B values are equal? Only a stable algorithm can.
Stable: Insertion, Merge
Unstable: Selection, Shell, Quick
Quick Sort
< Basic
Idea: Move the pivot to a position such that everything on its left is smaller than it and everything on its right is greater.
private void quickSort(Comparable[] A, int start, int end) {
    if (start >= end) return; // recursion ending condition
    int left = start, right = end;
    Comparable pivot = A[(start + end) / 2]; // take some item as pivot; here, the middle one
    while (left <= right) {
        // From left to right, find the first item not smaller than the pivot.
        while (left <= right && less(A[left], pivot)) left++;
        // From right to left, find the first item not greater than the pivot.
        while (left <= right && greater(A[right], pivot)) right--;
        if (left <= right) {
            swap(A, left, right); // swap the out-of-place pair
            left++;
            right--; // continue scanning inwards
        }
    }
    quickSort(A, start, right);
    quickSort(A, left, end);
}
< Improvement
Insertion sort: When subarray is of small size, use insertion to get them sorted.
Median as pivot: Take sample 3 items to estimate the median of the list, and swap the median to the front as pivot.
< 3-way QuickSort
Idea: Quicksort goes quadratic unless partitioning stops on equal keys, so in arrays where equal keys abound, put all items equal to the pivot in place at once by maintaining two extra pointers lt and gt.
< Usage
Quick-Selection: Select the kth max/min item.
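Quick-select can be sketched by reusing quicksort's partition but recursing into only the side containing index k, which gives linear expected time. An illustrative version on ints (k = 0 selects the minimum); note that it reorders the input array:

```java
// Quick-select sketch: partition, then narrow to the side containing index k.
public class QuickSelect {
    // Returns the k-th smallest element (k = 0 for the minimum).
    public static int select(int[] a, int k) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int j = partition(a, lo, hi);
            if (j < k)      lo = j + 1;   // k-th smallest lies to the right
            else if (j > k) hi = j - 1;   // k-th smallest lies to the left
            else            return a[k];
        }
        return a[k];
    }

    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo], i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < pivot) if (i == hi) break;
            while (pivot < a[--j]) if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j); // pivot lands in its final position j
        return j;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```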
Priority Queues
< Binary heaps
binary tree: empty or node with links to left and right binary trees.
complete tree: perfectly balanced, except possibly for the bottom level.
height of a complete tree: \(\lfloor \lg N \rfloor\) with \(N\) nodes (so a one-node tree indeed has height 0).
binary heap (max-heap): each parent has a key no smaller than its children's.
private void createMaxHeap(Comparable[] A, int end) {
    for (int i = (end - 1) / 2; i >= 0; i--) { // start from the parent of the very last child
        int fatherIndex = i;
        while (2 * fatherIndex + 1 <= end) { // while the node has at least a left child
            int biggerIndex = 2 * fatherIndex + 1;
            if (biggerIndex < end) { // if it also has a right child
                if (less(A[biggerIndex], A[biggerIndex + 1])) biggerIndex++; // pick the bigger child
            }
            if (less(A[fatherIndex], A[biggerIndex])) { // if the father is smaller than the bigger child
                swap(A, fatherIndex, biggerIndex);
                fatherIndex = biggerIndex; // continue sinking after the swap
            } else {
                break; // father is already the greatest
            }
        }
    }
}
< HeapSort
Idea: Repeatedly take the root of the max-heap, swap it to the end, and rebuild the heap on the remaining items.
private void maxHeapSort(Comparable[] A) {
    int size = A.length;
    for (int i = 0; i < size; i++) {
        createMaxHeap(A, size - 1 - i);
        swap(A, 0, size - 1 - i); // put the max root at the end
    }
}

private void createMaxHeap(Comparable[] A, int end) {...}
Binary Search Trees(BSTs)
< Basic
Definition: A BST is a binary tree in symmetric order: each node's key is larger than all keys in its left subtree and smaller than all keys in its right subtree.
Node: 4 fields - key, value, reference to left and right subtree.
Improvement: Add one more field, int count, to track the size of the subtree rooted at each node. This helps with rank() and select().
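A sketch of the count field in action (illustrative names, int keys for brevity): put() maintains count on the way back up the recursion, and rank() reads it to count keys smaller than a given key without visiting them.

```java
// BST node with a subtree-size field, supporting size() and rank().
public class SizedBST {
    private Node root;

    private static class Node {
        int key, count = 1;       // count = number of nodes in this subtree
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private int size(Node x) { return x == null ? 0 : x.count; }

    public int size() { return size(root); }

    public void put(int key) { root = put(root, key); }

    private Node put(Node x, int key) {
        if (x == null) return new Node(key);
        if (key < x.key) x.left = put(x.left, key);
        else if (key > x.key) x.right = put(x.right, key);
        x.count = 1 + size(x.left) + size(x.right); // update on the way up
        return x;
    }

    // Number of keys strictly smaller than key.
    public int rank(int key) { return rank(root, key); }

    private int rank(Node x, int key) {
        if (x == null) return 0;
        if (key < x.key) return rank(x.left, key);
        if (key > x.key) return 1 + size(x.left) + rank(x.right, key);
        return size(x.left);
    }
}
```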
< Operations
Floor: get the largest key that is smaller than or equal to the given key.
Hibbard deletion: -> [0 children]: Set its parent's corresponding link to null. -> [1 child]: Set its parent's corresponding link to its only child. -> [2 children]: Replace the node with its successor, the minimum key in its right subtree, then delete that successor from the right subtree.
< 2-3 search trees
Variable number of keys per node -> 2-node: one key, two children. left < key < right. -> 3-node: two keys, three children. left < key1 < mid < key2 < right.
Insertion at the bottom: -> 2-node: Simply add the key into it, yielding a 3-node. -> 3-node: Add the key, making it a temporary 4-node, then split it by passing the middle key up into the parent node, and continue passing up if necessary.
Perfect balance: Every path from root to null link has the same length.
< red-black BSTs
Idea: represent 2-3 search trees as a BST with two types of links. Null links are black.
Left-leaning: Force red link leaning to the left as valid glue of a 3-node.
Rule: -> No node has two red links connected to it (which would mean a 4-node). -> Every path from the root to a null link has the same number of black links. -> Red links lean left.
Search: The implementation is the same as for an elementary BST, but it runs faster because of better balance.
public Val get(Key key) {
    Node x = root;
    while (x != null) {
        int cmp = key.compareTo(x.key);
        if (cmp < 0) {
            x = x.left;
        } else if (cmp > 0) {
            x = x.right;
        } else {
            return x.val;
        }
    }
    return null;
}
Elementary Operations -> Rotate Left: When a right-leaning red link occurs, rotate it left.
Insertion at the bottom: -> 2-node: Color new link red and rotate if right leaning. -> 3-node: Color new link red and rotate and/or flip along the way to the root to keep balance. Pay attention to the order to cope with 3 kinds of situation that break the balance.
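The elementary operations above can be sketched as follows, in the usual left-leaning red-black formulation (field and class names here are illustrative): rotations reorient a red link without breaking symmetric order, and a color flip splits a temporary 4-node by passing the red link up.

```java
// Red-black elementary operations sketch.
public class RedBlackOps {
    static final boolean RED = true, BLACK = false;

    static class Node {
        int key;
        Node left, right;
        boolean color; // color of the link from the parent to this node
        Node(int key, boolean color) { this.key = key; this.color = color; }
    }

    static boolean isRed(Node x) { return x != null && x.color == RED; }

    // Orient a right-leaning red link to lean left.
    static Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        x.color = h.color; // x takes over h's link color
        h.color = RED;     // the rotated link stays red
        return x;
    }

    // Orient a left-leaning red link to (temporarily) lean right.
    static Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    // Split a temporary 4-node by recoloring: pass the red link up.
    static void flipColors(Node h) {
        h.color = RED;
        h.left.color = BLACK;
        h.right.color = BLACK;
    }
}
```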
BST range search: -> Recursively find all keys in the left subtree; -> check the key in the current node; -> recursively find all keys in the right subtree.
< 2D orthogonal range search
Range search: Find all keys lying in a 2D range.
Range count: Number of keys lying in a 2D range.
Data structure: BST using x and y coordinates alternatively.
2D-tree range search: -> Check if the current node is in the range (rectangle); -> recursively check all keys in left/bottom subtrees; -> recursively check all keys in right/top subtrees.
Nearest neighbour search: -> Check the distance from the current point to the query point; -> recursively check left/bottom; -> recursively check right/top. Subtrees that cannot contain a closer point are pruned from the search.
< Interval search trees
BST with node storing interval. Use left endpoint as key and store max endpoint of each subtree at root.
Insertion: Use left endpoint as key to insert and update max endpoint along the search path.
Search: -> If interval in node intersects query interval, return it. -> Else if left subtree is null, go right. -> Else if max endpoint in left subtree is less than lo, go right. -> Else go left.
private Interval get(Key lo, Key hi) {
    Node x = root;
    while (x != null) {
        if (x.interval.intersects(lo, hi)) {
            return x.interval;
        } else if (x.left == null) {
            x = x.right;
        } else if (x.left.max < lo) {
            x = x.right;
        } else {
            x = x.left;
        }
    }
    return null;
}
< Rectangle intersection
Find all intersections among a set of N orthogonal rectangles.
Sweep-line method: Sweep a vertical line across the plane, triggered by the rectangles' endpoint x-coordinates, and maintain the y-intervals of rectangles currently intersecting the line in an interval search tree. A left endpoint triggers an interval search for that y-interval followed by an insert; a right endpoint triggers removal of the interval.
Hash Tables
< Hash function
Idea: Save items in key-index table, where index is a hash function of key.
Hash in user-defined types: combine the fields using small prime numbers and the built-in hashCode() of Java classes.
public final class Transaction implements Comparable<Transaction> {
    private final String who;
    private final Date when;
    private final double amount;

    public int hashCode() {
        int hash = 17; // nonzero starting constant
        hash = 31 * hash + who.hashCode();    // 31: small prime
        hash = 31 * hash + when.hashCode();
        hash = 31 * hash + ((Double) amount).hashCode();
        return hash;
    }
}