## Recording distributions: appropriate container?



I am recording integer values that fall in the interval [-lim,+lim]. One possible way of doing this is to use an array of size (2*lim+1): each time an integer x arises, I increment the appropriate element of the array. However, the actual interval is likely much smaller than [-lim,+lim], and its size isn't known a priori. I would like a container that is only as large as it needs to be, since the potential interval is large and I need many of these containers. Is the best approach a vector that I resize when necessary? (I thought constantly resizing vectors would be pretty expensive, though.) Is there a more efficient alternative?

Thanks.

DannyBoy

Posts: 1160
Joined: Fri Feb 13, 2004 12:56 pm
Location: In the Billiard Room with the Lead Pipe

The other option is something like std::map<int, unsigned>, or hash_map<int, unsigned> (if your library provides hash_map). These tend to work better when your data is sparse. A hash_map is probably the best solution all around.

Alvaro
Moderator

Posts: 5185
Joined: Mon Sep 22, 2003 4:57 pm
Location: NY, USA

Doubling the capacity of the vector whenever it fills up gives O(1) amortized time per insertion. Adding c/2 more elements to an array that already holds c/2 elements involves creating a new array of twice the size and copying every existing element across. That takes on the order of c operations for c/2 insertions, so the mean cost per addition is O(1): the reallocations add only a small constant factor, amortized.

What is most effective depends largely on how dense the keys are. If you build a hash map that maps keys directly, without a compression function, you end up with O(N) space complexity, where N is the maximum key in use. If you know roughly how many elements there will be, create the hash table with several times that many buckets (preferably 10 times at least) and compress keys into that range, making sure collisions within a bucket are handled.

Fuso

Posts: 19
Joined: Wed Jul 06, 2005 3:44 am
Location: Sweden
