1. The i th order statistic of a set of n elements is the i th smallest element. Regardless of the parity of n, medians occur at i = ⌊(n + 1)/2⌋ (the lower median) and i = ⌈(n+1)/2⌉ (the upper median). Here, we consistently use the phrase “the median” to refer to the lower median.
2. We formally specify the selection problem as follows:
Input: A set A of n (distinct) numbers and an integer i , with 1 ≤ i ≤ n.
Output: The element x ∈ A that is larger than exactly i-1 other elements of A.
3. we can find both the minimum and the maximum using at most 3⌊n/2⌋
comparisons. We do so by maintaining both the minimum and maximum elements
seen thus far. We compare pairs of elements from the input first with each other, and then we compare the smaller with the current minimum and the larger to the current maximum, at a cost of 3 comparisons for every 2 elements. How we set up initial values for the current minimum and maximum depends on whether n is odd or even. If n is odd, we set both the minimum and maximum to the value of the first element, and then we process the rest of the elements in pairs. Then we perform 3⌊n/2⌋ comparisons. If n is even, we perform 1 comparison on the first 2 elements to determine the initial values of the minimum and maximum, and then process the rest of the elements in pairs as in the case for odd n. We perform 1 initial comparison followed by 3(n - 2)/2 comparisons, for a total of 3n/2-2. Thus, in either case, the total number of comparisons is at most 3⌊n/2⌋.
4. As in quicksort, randomSelect partition the input array recursively. But unlike quicksort, which recursively processes both sides of the partition, it works on only one side of the partition. Quicksort has an expected running time of θ(n lg n), the expected running time of randomSelect is θ(n), assuming that the elements are distinct:
int randomSelect(int[] arr, int start, int end, int i ) {
// if the array only has one element
if (start==end) return arr[start];
int q = randomPartition(arr, start, end);
int k = q - start + 1;
// if pivot is the answer
if ( k == i ) return arr[q];
// if i < k , then the answer is the ith smallest one in left subarray
if ( i < k ) return randomSelect(arr, start, q - 1, i);
// if i > k , then the answer is the i-k th smallest one in right subarray
else return randomSelect(arr, q + 1, end , i - k );
}
The worst-case running time for randomSelect is θ(n^2), even to find the minimum, because we could be extremely unlucky and always partition around
the largest remaining element, and partitioning takes θ(n) time. However, randomSelect can find any order statistic, and in particular the median, in
expected linear time, assuming that the elements are distinct.
5. The algorithm select finds the desired element in O(n) time in worst case by recursively partitioning the input array and guaranteeing a good split upon partitioning the array. The select algorithm determines the i th smallest of an input array of n > 1 distinct elements by executing the following steps. (If n = 1, then select merely returns its only input value as the i th smallest.)
1) Divide the n elements of the input array into ⌊n/5⌋ groups of 5 elements each and at most one group made up of the remaining n mod 5 elements.
2) Find the median of each of the ⌊n/5⌋ groups by first insertion-sorting the elements of each group (of which there are at most 5) and then picking the median from the sorted list of group elements.
3) Use select recursively to find the median x of the ⌊n/5⌋ medians found in
step 2). (If there are an even number of medians, then x is the lower median.)
4) Partition the input array around the median-of-medians x. Let k be one more than the number of elements on the low side of the partition, so that x is the kth smallest element and there are n-k elements on the high side of the partition.
5) If i = k, then return x. Otherwise, use select recursively to find the i th
smallest element on the low side if i < k, or the (i-k)th smallest element on
the high side if i > k.
At least half of the medians found in step 2) are greater than or equal to the median-of-medians x.Thus, at least half of the ⌊n/5⌋ groups contribute at least 3 elements that are greater than x, except for the one group that has fewer than 5 elements if 5 does not divide n exactly, and the one group containing x itself. Discounting these two groups, it follows that the number of elements greater than x is at least 3n/10 - 6. Similarly, at least 3n/10 - 6 elements are less than x. Thus, in the worst case, step 5) calls select recursively on at most 7n/10 + 6 elements.
6. Implementation:
public class MedianOfMediansSelector { private int[] arr; public MedianOfMediansSelector( int[] arr) { this.arr = arr; } public int select(int index) { if (index < 0 && index >= arr.length) throw new IllegalArgumentException("Index out of boundary"); int end = arr.length; int start = 0; int pivotIndex = choosePivot(start, end); pivotIndex = partition(start, end, pivotIndex); while (pivotIndex != index) { if( pivotIndex > index ) { end = pivotIndex; pivotIndex = choosePivot(start, end); pivotIndex = partition(start, end, pivotIndex); } else { start = pivotIndex + 1; pivotIndex = chooser.choosePivot(start, end); pivotIndex = partition(start, end, pivotIndex); } } return arr[index]; } private int partition(int start, int end , int pivotIndex){ //swap with the partition element with the first one int temp = arr[start]; arr[start] = arr[pivotIndex]; arr[pivotIndex] = temp; int pivot = arr[start]; int i=start+1,j=end-1; while( i <= j ) { while ( i <= j && arr[i] <= pivot ) i ++; if (i > j) break; while (arr[j] > pivot ) j--; if ( i > j) break; temp = arr[i]; arr[i] = arr[j]; arr[j] = temp; } i--; arr[start] = arr[i]; arr[i] = pivot; return i; } private int choosePivot(int start, int end) { if ( start >= end ) throw new IllegalArgumentException("end should bigger than start!"); if ( end > arr.length ) throw new IllegalArgumentException("end should not bigger than arry length!"); if ( end - start <= 2) { return start; } // sort elements in each block int BLOCK_SIZE = 5; int i = BLOCK_SIZE; int num = 1; for ( ; i < end ; i += BLOCK_SIZE , num ++) { insertionSort(i - BLOCK_SIZE, i); } i -= BLOCK_SIZE; insertionSort(i, end); // choose medium of each group int medium = BLOCK_SIZE/2; int[] mediums = new int[num]; int r = ( i + end ) / 2; // medium item in last block i = 0 ; for ( int j = medium; i < num - 1 ; j += BLOCK_SIZE, i ++ ) { mediums[i] = arr[j]; } mediums[i] = arr[r]; MedianOfMediansSelector selector = new MedianOfMediansSelector (mediums); int pivot = selector.select(num/2); for ( i = start; i < end; i ++) { if ( arr[i] == pivot) return i; } throw new IllegalStateException("Medium of Medium didn't find a correct pivot!"); } private void insertionSort(int start, int end) { int temp; for ( int i = start + 1; i < end ; i ++ ) { for ( int j = i ; j > 0 ; j --) { if ( arr[j] < arr[j-1] ) { temp = arr[j]; arr[j] = arr[j-1]; arr[j-1] = temp; } else { break; } } } } }
相关推荐
9. **中位数和有序统计(Medians and Order Statistics)**:ch9 Medians and Order Statistics.pdf 解释了如何在未排序的数据中找到中位数和其他特定位置的元素,这对于处理大数据集特别有用。 这些主题覆盖了算法...
9 Medians and Order Statistics 183 9.1 Minimum and maximum 184 9.2 Selection in expected linear time 185 9.3 Selection in worst-case linear time 189 III Data Structures Introduction 197 10 Elementary ...
9 Medians and Order Statistics 213 9.1 Minimum and maximum 214 9.2 Selection in expected linear time 215 9.3 Selection in worst-case linear time 220 III Data Structures Introduction 229 10 Elementary ...
Chapter 9 - Medians and Order Statistics Part III - Data Structures Chapter 10 - Elementary Data Structures Chapter 11 - Hash Tables Chapter 12 - Binary Search Trees Chapter 13 - Red-Black ...
Chapter 9 - Medians and Order Statistics Part III - Data Structures Chapter 10 - Elementary Data Structures Chapter 11 - Hash Tables Chapter 12 - Binary Search Trees Chapter 13 - Red-Black ...
Chapter 9 - Medians and Order Statistics Part III - Data Structures Chapter 10 - Elementary Data Structures Chapter 11 - Hash Tables Chapter 12 - Binary Search Trees Chapter 13 - Red-Black ...
Chapter 9 - Medians and Order Statistics Part III - Data Structures Chapter 10 - Elementary Data Structures Chapter 11 - Hash Tables Chapter 12 - Binary Search Trees Chapter 13 - Red-Black ...
Chapter 9 - Medians and Order Statistics Part III - Data Structures Chapter 10 - Elementary Data Structures Chapter 11 - Hash Tables Chapter 12 - Binary Search Trees Chapter 13 - Red-Black ...
Chapter 9: Medians and Order Statistics Lecture Notes 9-1 Solutions 9-10 Chapter 11: Hash Tables Lecture Notes 11-1 Solutions 11-16 Chapter 12: Binary Search Trees Lecture Notes 12-1 Solutions 12-15 ...
第九章 中值与顺序统计(Medians and Order Statistics) 第三部分(Part III) 数据结构(Data Structures) 第十章 基本的数据结构(Elementary Data Structures) 第十一章 散列表(Hash Tables) 第十二章 ...
第九章 中值与顺序统计(Medians and Order Statistics) 第三部分(Part III) 数据结构(Data Structures) 第十章 基本的数据结构(Elementary Data Structures) 第十一章 散列表(Hash Tables) 第十二章 二叉...
第九章 中值与顺序统计(Medians and Order Statistics) 第三部分(Part III) 数据结构(Data Structures) 第十章 基本的数据结构(Elementary Data Structures) 第十一章 散列表(Hash Tables) 第十二章 二叉...
第九章 中值与顺序统计(Medians and Order Statistics) 第三部分(Part III) 数据结构(Data Structures) 第十章 基本的数据结构(Elementary Data Structures) 第十一章 散列表(Hash Tables) 第十...
第九章 中值与顺序统计(Medians and Order Statistics) 第三部分(Part III) 数据结构(Data Structures) 第十章 基本的数据结构(Elementary Data Structures) 第十一章 散列表(Hash Tables) 第十...
6. 中位数和顺序统计(Medians and Order Statistics):讨论了线性时间选择算法和最坏情况下的线性选择算法,这部分内容对于理解如何在数据集中找到关键的中位数等统计值至关重要。 7. 基本数据结构(Elementary ...
第九章 中值与顺序统计(Medians and Order Statistics) 第三部分(Part III) 数据结构(Data Structures) 第十章 基本的数据结构(Elementary Data Structures) 第十一章 散列表(Hash Tables) 第十...