Recording distributions: appropriate container?

Discuss all kind of algorithms and data structures from their mathematical and programming sides.

Moderators: Darobat, RecursiveS, Dante Shamest, Bugdude, Wizard

Recording distributions: appropriate container?

Postby DannyBoy » Thu Nov 17, 2005 9:32 am

I am recording integer values that fall in the interval [-lim,+lim]. One possible way of doing this is to have use an array of size (2*lim+1). Each time integer x arises I increment the appropriate element in the array. However, it is likely that the actual interval is much smaller than [-lim,+lim], but the actual size of the interval isn't known a priori. I would like to use a container that is only as large as it needed to be, as the potential interval is large and I need to use many of these containers. Is the best way to have a vector that I resize when necessary? (I thought that constantly resizing vectors would be pretty expensive, though.) Is there an efficient alternative?

Thanks. :D
User avatar
DannyBoy
 
Posts: 1160
Joined: Fri Feb 13, 2004 12:56 pm
Location: In the Billiard Room with the Lead Pipe

Postby Alvaro » Thu Nov 17, 2005 10:17 am

The other option is using something like std::map<int, unsigned>, or hash_map<int. unsigned> (if your library provides hash_map). These would tend to work better if your data is sparse. Probably hash_map is the best solution all around.
User avatar
Alvaro
Moderator
 
Posts: 5185
Joined: Mon Sep 22, 2003 4:57 pm
Location: NY, USA

Postby Fuso » Fri Dec 16, 2005 7:12 pm

Doubling the capacity of the vector whenever it fills up is done with O(1) amortized time complexity. To add a capacity c/2 amount of elements to an array already containing c/2 elements includes creating a new array of twice the size and copying all elements from the old array to the new array. This is done using c operations, hence, the mean time complexity for this part is O(1) per addition, it has a (small) constant factor effect on the add operation, amortized.

What is most effective quite depends on how dense the keys are. Creating a hash map that maps the keys without a compression function, you end up with O(N) space complexity, where N is the maximum key to use. If you knew about how many elements there would be, you should create a hash map of a few times that size (pref. 10 times at least) and compress, but ensure that you have bucket functions.
User avatar
Fuso
 
Posts: 19
Joined: Wed Jul 06, 2005 3:44 am
Location: Sweden


Return to Algorithms & Data Structures

Who is online

Users browsing this forum: No registered users and 0 guests

cron