How does a HashMap works internally in Java ??

Points To Remember

HashMap implements the Map Interface.
It works on the principles of hashing.
It stores and retrieves data in a key value pair.
HashMap can have only one key as a null element.
HashMap stores both key and value in form of Entry object Map.Entry.
Initial capacity of HashMap is 16. (Since jdk 1.7 we can have HashMap with initial capacity as 0).
Default load Factor of a HashMap is 0.75, after this a HashMap gets rehashed.

Q1 : How does HashMap work internally ?

We add data to the HashMap in key-value pair, by method put(key, value). When we do this, the hashCode() method is called upon the key to return a hashcode, this hashcode is an integer value 16 digit long by default. This hashcode is then used by the HashMap's internal hashing method to find a bucket location to store the Entry object. At this bucket location both key and value is saved in the bucket. Since we know that different objects can result into the same hashcode, thus all the keys that result into the same hashcode are stored in the same bucket in form of a linklist. This linklist however is not the same as java.util.LinkedList, it is a much simpler implementation of list as compared to java.util.LinkedList.

This second hash function is provided by the implementation of HashMap and cannot be overridden by the developer. This is a static method and looks like

static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

Each hashmap has an Array and in that Array places the entries in a position according to the keys hashcode (e.g. position = hashcode % arraysize).
If more than one entry ends up in the same bucket those entries are stored in a linked list. Thus the bucket-metaphor: each array-entry is a "bucket" where you just dump in all the matching keys.We use array to get the desired "constant time" behaviour since we need to Access random positions in this list.
Whereas within a bucket you have to traverse all elements until finding the desired key anyways, so you can use a linked list as it is easier to append to (no resize needed).
If key is null, then Null keys always map to hash 0, thus index 0.
If both the hashCode() and equals() return the same value for key then, its value is overridden if it exists.

This also Shows the Need for a good hashfunction, because if all keys hash to only a few values you will get Long linked lists to search and a lot of (fast to access) empty buckets.

Q2 : How do we retrieve values from a HashMap ?

We retrieve from HashMap using the get(key) method. As stated above, first the hashcode is found by the hashCode() method and then this hashcode is applied to the static hash() method of the hashmap to find the bucket location. Once the bucket location is found, that is actually a link list. We traverse from first element of the list to the last element applying equals() method to find the value corresponding to the key.

Q3 : What will happen if two keys have same hashcode ?

Yes, this is also a main concern that two different objects can produce same hashcode, thus this is the main reason why this hashode is again rehashed using internal hashing algorithm of the HashMap. When we get two objects with same hashcode, their bucket location will be same. Therefore both the key value pairs will be stored in the same bucket in the form of a linklist.

For example, in the above shown image the keys "John Smith" and "Sandra Dee" produces the same hashcode(). So same hashcode will point to the same bucket location, e.g in our case, lets say that bucket location is 152. Now in case we search for "Sandra Dee", we will apply equals() method to all the keys in the link list of this bucket until one of them return true. This returned value will be the value corresponding to that key.Image 2 above, shows how key-value elements are stored in an Entry object, how keys with same hashcode are stored in a singly linkedlist and how an array of these bucket locations are stored in an array.

Q4 : Why Strings, Integers and other Wrapper classes considered as good keys ?

Any class that implements the hashCode() and equals() methods of the java.lang.Object are considered as good keys. Using immutable, final object with proper equals() and hashcode() implementation would act as perfect Java HashMap keys and improve performance of Java HashMap by reducing collision. Immutability also allows caching there hashcode of different keys which makes overall retrieval process very fast, if unequal object return different hashcode than chances of collision will be less which subsequently improve performance of HashMa. Thus String and various wrapper classes e.g. Integer very good keys in Java HashMap.

Ekiras

Search This Blog

How does a HashMap works internally in Java ??

Labels

Comments

Post a Comment