7

How to initialize HashMap in Java

 3 years ago
source link: https://marco.dev/java-hashmap
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

No argument constructor …

In Java is a good practice to initialize the initial capacity of collections and maps.

Many developers (me included) have the habit to declare a new Collection or Map using the no-argument constructor, e.g.:

Map exampleMap = new HashMap();

With this instruction, Java initializes a new HashMap object with the attribute loadFactor at 0.75 and the DEFAULT_INITIAL_CAPACITY at 16.

The HashMap stores internally his values in an array of HashMap$Node objects (at least until when the size doesn’t become too big).

The initialization doesn’t create yet the array, it will be instantiated only with the first insert in the map (e.g. using put), Java will create the internal array with something like: Node<K,V>[] tab = (Node<K,V>[])new Node[16].

… it will grow

Every time an entry is added into the Map, the HashMap instance checks that the number of values contained in the bucket array is not more than his capacity multiplied the load factor (default at 0.75). In our case : 16 (capacity) * 0.75 (load factor) = 12 (threshold).

What happens when the 13th value is inserted in the array? The number of entries in the array is more than the threshold and the HashMap instance calls the method: final Node<K,V>[] resize().

This method creates a new array of Node with a capacity of the current store (16) * 2: (Node<K,V>[])new Node[32]

The values of the current bucket array are ‘transferred’ in the new array, the new threshold is also multiplied * 2.

The table shows how the size of the bucket array grows adding new entries.

The rehashing is done in resize() requires computational power and should be avoided if possible.

# of inserts .put(K,V) resize() calls bucket array size Threshold 0 0 null 0 1 1 16 12 13 2 32 24 25 3 64 48 49 4 128 96 97 5 256 192 193 6 512 384 385 7 1 024 768 769 8 2 048 1 536 1 537 9 4 096 3 072 … … … … 98 305 15 262 144 196 608 … … … …

defining the initial size in the constructor

You have a defined number of entries or you know what should be the number of values that the map will contain, then it’s recommended to set the ‘initial capacity’ accordingly.

Example you will have 100 entries and not one more? Is new HashMap(100) the optimum size for your map?

Unfortunately, no.

If the initial threshold is calculated using the following algorithm:

/**
* Returns a power of two size for the given target capacity.
*/
static final int tableSizeFor(int cap) {
    int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

… the result is 128.

When you start to insert elements in your Map, HashMap resizes the Map and recalculates the threshold: threshold (128) * DEFAULT_LOAD_FACTOR (0.75) = new threshold (96) . With a threshold of 96 and the Map will be re-hashed when you insert the 97th element.

If you want to optimize the size of the HashMap you can specify the load factor in the initialization: new HashMap(100, 1f) This will create a new HashMap with a threshold of 128, after the initialization, the threshold will be still at 128 (128 * 1 = 128).

To have a threshold of 100 you need a factor of 0.78125 (new HashMap(100, 0.78125f). A less suited alternative to avoid the re-hashing is to initialize the Map with a size of 129: new HashMap(129). This would generate a table with a threshold of 192 (256*0.75);

General tip from the code source

The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations.

If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

JDK version used for the tests

OpenJDK version 14

Acknowledgments

Thanks to Bernhard (https://github.com/oxc) for pointing out some mistakes in the original publication.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK