Quickselect vs Countingselect

General discussion about C/C++

Moderators: Darobat, RecursiveS, Dante Shamest, Bugdude, Wizard

Quickselect vs Countingselect

Postby Sasstraliss » Sat Jun 02, 2012 3:44 am

Quickselect, based on quicksort, and counting select, based on counting sort.

Each is capable of finding the kth smallest element in an unsorted/sorted list/array.


This is some example code for a QuickSelect algorithm. This doesn't include the partition function.

Code: Select all
// return the kth smallest item
int quickSelect(int items[], int first, int last, int k) {
    int pivot = partition(items, first, last);
    if (k < pivot-first) {
        return quickSelect(items, first, pivot, k);
    } else if (k > pivot) {
        return quickSelect(items, pivot+1, last, k-pivot);
    } else {
        return items[k];
    }
}




Counting select, based on the counting sort algorithm, looks something like this.

Code: Select all
// return the kth smallest item
int countingSelect(int items[], int first, int last, int k) {
    int counts[cap];
    for (int c = 0; c < cap; c++) {
        counts[c] = 0;
    }
    for (int i = first; i < last; i++) {
        counts[items[i]] += 1;
    }
    int c = 0;
    while (k >= 0) {
        k -= counts[c++];
    }
    return c-1;
}




I wish to write a set of guidelines to decide which of the two algorithms is most appropriate for specific situations or data sets. I need to think of situations, and than implement guidelines as to which algorithm is the better choice.

For this, I need a little help distinguishing which algorithm has specific strengths in certain areas/data sets and varieties, etc.

Thanks in advance.
Sasstraliss
 
Posts: 1
Joined: Sat Jun 02, 2012 3:42 am

Re: Quickselect vs Countingselect

Postby exomo » Sat Jun 02, 2012 7:48 am

The first one seems to modify the input data. But this can easily be fixed by working on a copy of the items. Time should be similar to QuickSort in O(n*log n).

I think the second one looks "faster" (something like O(n+cap) ), but at the price of unpredictable and possibly very high memory allocation. I don't know how you get the value of cap, but I guess it's something like the maximum value of the array. And you have to keep in mind that numbers in the array might be negative, when using them as index for another array.
You just should run some tests on different sets of data. I would guess the second algorithm is faster as long as the values inside the array are "small enough" and the size of the array is "big enough", on small arrays with big values you waste too much time going through the count array. So basically you have to test on different array sizes and different value ranges to find out what "big enough" and "small enough" are.
Who needs a signature anyway.
User avatar
exomo
 
Posts: 881
Joined: Fri Sep 26, 2003 12:30 pm
Location: germany->baden


Return to General

Who is online

Users browsing this forum: No registered users and 2 guests