Mergesort question

Discuss all kind of algorithms and data structures from their mathematical and programming sides.

Moderators: Darobat, RecursiveS, Dante Shamest, Bugdude, Wizard

Mergesort question

Postby Darryl » Tue Jul 08, 2008 1:03 pm

Lately I've gotten a new quad core system and so have decided to explore optimizing various algorithms for it.

In looking at sorting algorithms I noticed something odd about merge sorts

From the Wikipedia on mergesort, it list the algorithm as such:
Conceptually, a merge sort works as follows:

1. If the list is of length 0 or 1, then it is sorted. Otherwise:
2. Divide the unsorted list into two sublists of about half the size.
3. Sort each sublist recursively by re-applying merge sort.
4. Merge the two sublists back into one sorted list


Now my question is, why go through steps 1 to 3 recursively until you're down to lists of length one? Why not just start by treating each element as a sorted list of one and start with item 4 and begin merging? Am I missing somethings for surely something so simple couldn't have been overlooked?
User avatar
Darryl
 
Posts: 1342
Joined: Wed Sep 01, 2004 10:50 am
Location: Cayman Islands

Postby Alvaro » Tue Jul 08, 2008 1:33 pm

Can you write pseudo-code for your proposed algorithm? I think you can actually get a working algorithm that is basically a non-recursive version of MergeSort. It would be interesting to compare their empirical performance.
User avatar
Alvaro
Moderator
 
Posts: 5185
Joined: Mon Sep 22, 2003 4:57 pm
Location: NY, USA

Postby Alvaro » Tue Jul 08, 2008 1:48 pm

This works with random-access iterators:
[syntax="cpp"]#include <algorithm>

template <typename It>
void DarrylSort(It begin, It end) {
unsigned n=end-begin;
for(unsigned block_size=1; block_size<n; block_size*=2) {
for(unsigned block_begin=0; block_begin+block_size<n; block_begin += 2*block_size) {
std::inplace_merge(begin+block_begin,
begin+block_begin+block_size,
begin+std::min(block_begin+2*block_size, n));
}
}
}

[/syntax]

This is less cache friendly than the recursive version, though, since it does multiple passes over the entire data set.
User avatar
Alvaro
Moderator
 
Posts: 5185
Joined: Mon Sep 22, 2003 4:57 pm
Location: NY, USA

Postby Darryl » Tue Jul 08, 2008 1:59 pm

I just found what I was looking for. It's called a bottom up mergesort.

Once I had a name, it was easy to find the pros and cons.

"The advantage of top down is that it is absurdly simple."

"The advantage of bottom up is that it doesn't require auxilliary storage for the algorithm (ignoring space needed for merging). "

Here is a pseudo code implementation.

Code: Select all
Input: array a[] indexed from 0 to n-1.

m = 1
while m <= n do
    i = 0
    while i < n-m do
        merge subarrays a[i..i+m-1] and a[i+m .. min(i+2*m-1,n-1)] in-place.
        i = i + 2 * m
    m = m * 2


*** EDIT ***

I should had refresh the page before posting...the two are almost identical.

I didn't know there was a std::inplace_merge function... learn something new everyday. (I think that's one of the hard things about programming is not know what exist)
User avatar
Darryl
 
Posts: 1342
Joined: Wed Sep 01, 2004 10:50 am
Location: Cayman Islands

Postby devil_slayer » Tue Jul 08, 2008 2:38 pm

Now, instead of just using your quad-core, you might wanna look at GPU programming. Then again, why not use both? :)
User avatar
devil_slayer
 
Posts: 489
Joined: Wed Oct 01, 2003 3:44 am
Location: Warsaw, POLAND

Postby devil_slayer » Tue Jul 08, 2008 2:48 pm

Ooh, check out http://www.cs.mu.oz.au/498/notes/node38.html! Note the speedup on N processors (bottom of the page).

Also see this for some examples of algorithms on GPU...
User avatar
devil_slayer
 
Posts: 489
Joined: Wed Oct 01, 2003 3:44 am
Location: Warsaw, POLAND


Return to Algorithms & Data Structures

Who is online

Users browsing this forum: No registered users and 1 guest