Tuesday, September 9, 2014

Do you think you know Maps in Java?

Hi to All,
One thing that i’ve encountered in the past time is the actual usage of the Map interface in Java, and it’s many implementations, or maybe the correct way to put it: “Unknown basics of Map usage”.

Most of us probably know that we use Map for the common Key-Value Pair,
and we use it because it has the efficient “get” method of O(1) right?
But what are we really talking about?
You might notice that “Map” is actually an Interface and not an implementation, So saying these things about Efficiency and Complexity is really not possible about the Interface itself.

What are we going to talk about?
  1. Map and some HashMap implementation basics
  2. Collisions
  3. Hashing and Rehashing
  4. TreeMap
  5. LinkedHashMap
  6. ConcurrentHashMap
  7. Lets meet Multimap
  8. Conclusion (For the lazy ones that don’t want to read all of my hard effort :) )

So, lets elaborate more about Map and actual usage cases of Implementations you might have heard of:

Map and Some HashMap implementation basics

The Implementation of HashMap exist since Java 1.2, one of our basic building blocks of programing in Java, but how much do we really know about it’s way of implementation,
so let’s talk about some basic elements of it and about the fundamental API of Map Interface:

  • Map.Entry Interface
First of all there is a Very important inner interface that we should know that takes case of the Key-Value structure in the Map: Map.Entry.
It is the entity that stores our value in each Node (And yes, the actual class is Node, it implements the interface) and implements the way that it stores the same Equal Keyed Objects in a Linked List manner.

  • Map Main Methods (These are not the only ones, but the ones i want to mention)
The elaboration is mainly by the Map implementation of HashMap.
Inserts the wanted value in the correct place in the Map. In case the key exists by it’s hashCode, then the equals method is checked in the linked lists values, and replaced in case it’s true, and added if false.
returns the value associated to the hashCode of the given Key, if absent then null is returned.
returns if there is a mapping between the Key and some value in the Map.
Basically it uses “get” and checks if the result is null or not (So it’s the same efficiency wise).
removes the entire mapping of the Key in the Map.
removes the the element key only if it’s value returns true for the equals() method of the mapped value in the Map.
returns the distinct KeySet of the Map, actually it returns an inner type that implements Set that is called KeySet.
getting a Collection that represents all the values of the Map sequentially.

(Like most of the Map implementations, HashMap extends the Abstract class AbstractMap)

How is the data stored you ask?
The data in a HashMap is store as simple Array of Node values, that each one of them might hold a LinkedList like construct of values that have the same hashCode value.
And in case we get to a state the we exceed the size of the predefined array, the HashMap reallocated the space and creates a new array (Everything is made by the default values of the HashMap or the defined ones by the programmer in the creation of the map).

Collisions

One of the first things i remember that they told us at class is that the efficiency of “get” in Hash Maps is defined by the amount of collisions with Keys you have, that at the worst case are O(n).
That is pretty bad is you think of it, that every usage of “get” of “containsKey” will take you O(n) right?
But this is not exactly the case in the actual implementation of HashMap,
it all depends on the relation between the implementation of hashCode() method of the Key object and equals(), after trying to “put” a key-value to the map, first the hashCode is checked, and if it is present, then the equals method of the Key is checked, if true the value is replaced, in case of false, a new node is created and appended to the “LinkedList” of nodes in the current bucket.
This is the case that we will get the “Worst case” of O(n) during all of the operations of Map.

Hashing And Rehashing
The hash process and the actual resolve of the bucket (index in the array) that the key should be under is resolved in an inner hash() method that was implemented by the JDK writers, and takes the hashCode of the Object, and calculates the exact space in the array that the key is located at.
A Rehashing process occurs when the size of the Map is changed, and we need to resize the inner array.
The most important thing about hashing in all sorts of Maps you’ll use is actually the implementation of the Key object, it very important for the correct functionality of any sort of Map for the Keys to be Immutable!
If not, you can run into some unpleasant situations, that you will lose values in the Map because the keys hashCode() method is the one responsible for finding the value in the Map.
Using primitive types like Integer, Float, Double and etc are preferable, but sometimes you’ll need something more complex.

  • I've encountered some other useful Map Implementations during my time of coding:

What happens when we need to keep all of our elements sorted and to get the best efficiency possible with all of the actions (Because we know that sorting is a pain while trying to get good performance),
We can use the TreeMap data structure that does all of the operations in (log n) time complexity.
I've used is when I had to keep a data structure that my main importance was to retrieve all of the elements in sorted order each time i need the values.

Another situation I needed a different kind of data structure was when i needed to keep the insertion order of my elements, so the right tool for the Job is LinkedHashMap.
Every time i got the collection of values, they were in the insertion order, but still kept the efficiency of O(1)in the main operations.

Visual Comparison Table

Implementation
Elements Order
No guaranteed order, but will be the same over time.
Sorting according the natural ordering of a Comparator of the elements
Insertion order of the elements to the Map
put, get, remove, containsKey
O(1)
O(log n)
O(1)
Base Interfaces
Map
Map, NavigableMap, SortedMap
Map
Null Values/ Keys
allowed
only values allowed to be null
allowed
Implementation structure
Buckets (Array of key-value nodes)
Red-Black Tree
Double-Linked Buckets
Is Synchronized
No
No
No


What about multithreaded work you say? None of the above is Thread-Safe, so there is a great solution for that as well, the ConcurrentHashMap implementation.
I won’t elaborate much about it (This post is getting too long), but simply explaining, it works much like the HashMap, but with another layer of abstraction from above, you can think of it as many HashMaps in one big HashMap, and a Locking ,mechanism for each little HashMap, so we don’t have to lock the whole map in every operation we make on the Map.
(It is much more complex than I explained here, but if needed i’ll write another post about that alone).
There are more possibilities of synchronization of Maps in java via the Collections class,
Using:  Collections.synchronizedMap() and etc. (Again, i won’t elaborate here).

Lets Meet MultiMap
Wait a second, thats not what they have promised us! Shouldn’t it just collide and save the other values under the same Hash Code in the Map?
What happens if we want to actually save the other values?

There is a solution, by Google, i really hope that most of you know the libraryGuava”.
(You can find the reference by Maven dependency - current stable version of 18).

We will need this when we don’t want to rely on the equals method of the key and we know that for the same key we have many valid values, and we want to work with collections of these values.
Yes, as a stupid programer I found myself implementing it in the past without looking for ready solutions.
Using the MultiMap implementation of Guava makes life easy.
There are many implementation given by factories, just try the class Multimaps for assistance.

Conclusion

Now we understand more about the way that HashMap works, and about the time Complexity of all of the actions.
The main this you should take in consideration is that probably for every problem that you have there is a more appropriate Map implementation than the Naive HashMap, and you should get to know them.
You can read about some of them in the Java Tutorial, or to search for your own special need at our good friend “Google”.

Thank you very much for reading,
hope this post was helpful for you,
If any more elaboration is needed please write a comment :)

No comments:

Post a Comment