Tuesday, September 9, 2014

Do you think you know Maps in Java?

Hi to All,
One thing that i’ve encountered in the past time is the actual usage of the Map interface in Java, and it’s many implementations, or maybe the correct way to put it: “Unknown basics of Map usage”.

Most of us probably know that we use Map for the common Key-Value Pair,
and we use it because it has the efficient “get” method of O(1) right?
But what are we really talking about?
You might notice that “Map” is actually an Interface and not an implementation, So saying these things about Efficiency and Complexity is really not possible about the Interface itself.

What are we going to talk about?
  1. Map and some HashMap implementation basics
  2. Collisions
  3. Hashing and Rehashing
  4. TreeMap
  5. LinkedHashMap
  6. ConcurrentHashMap
  7. Lets meet Multimap
  8. Conclusion (For the lazy ones that don’t want to read all of my hard effort :) )

So, lets elaborate more about Map and actual usage cases of Implementations you might have heard of:

Map and Some HashMap implementation basics

The Implementation of HashMap exist since Java 1.2, one of our basic building blocks of programing in Java, but how much do we really know about it’s way of implementation,
so let’s talk about some basic elements of it and about the fundamental API of Map Interface:

  • Map.Entry Interface
First of all there is a Very important inner interface that we should know that takes case of the Key-Value structure in the Map: Map.Entry.
It is the entity that stores our value in each Node (And yes, the actual class is Node, it implements the interface) and implements the way that it stores the same Equal Keyed Objects in a Linked List manner.

  • Map Main Methods (These are not the only ones, but the ones i want to mention)
The elaboration is mainly by the Map implementation of HashMap.
Inserts the wanted value in the correct place in the Map. In case the key exists by it’s hashCode, then the equals method is checked in the linked lists values, and replaced in case it’s true, and added if false.
returns the value associated to the hashCode of the given Key, if absent then null is returned.
returns if there is a mapping between the Key and some value in the Map.
Basically it uses “get” and checks if the result is null or not (So it’s the same efficiency wise).
removes the entire mapping of the Key in the Map.
removes the the element key only if it’s value returns true for the equals() method of the mapped value in the Map.
returns the distinct KeySet of the Map, actually it returns an inner type that implements Set that is called KeySet.
getting a Collection that represents all the values of the Map sequentially.

(Like most of the Map implementations, HashMap extends the Abstract class AbstractMap)

How is the data stored you ask?
The data in a HashMap is store as simple Array of Node values, that each one of them might hold a LinkedList like construct of values that have the same hashCode value.
And in case we get to a state the we exceed the size of the predefined array, the HashMap reallocated the space and creates a new array (Everything is made by the default values of the HashMap or the defined ones by the programmer in the creation of the map).

Collisions

One of the first things i remember that they told us at class is that the efficiency of “get” in Hash Maps is defined by the amount of collisions with Keys you have, that at the worst case are O(n).
That is pretty bad is you think of it, that every usage of “get” of “containsKey” will take you O(n) right?
But this is not exactly the case in the actual implementation of HashMap,
it all depends on the relation between the implementation of hashCode() method of the Key object and equals(), after trying to “put” a key-value to the map, first the hashCode is checked, and if it is present, then the equals method of the Key is checked, if true the value is replaced, in case of false, a new node is created and appended to the “LinkedList” of nodes in the current bucket.
This is the case that we will get the “Worst case” of O(n) during all of the operations of Map.

Hashing And Rehashing
The hash process and the actual resolve of the bucket (index in the array) that the key should be under is resolved in an inner hash() method that was implemented by the JDK writers, and takes the hashCode of the Object, and calculates the exact space in the array that the key is located at.
A Rehashing process occurs when the size of the Map is changed, and we need to resize the inner array.
The most important thing about hashing in all sorts of Maps you’ll use is actually the implementation of the Key object, it very important for the correct functionality of any sort of Map for the Keys to be Immutable!
If not, you can run into some unpleasant situations, that you will lose values in the Map because the keys hashCode() method is the one responsible for finding the value in the Map.
Using primitive types like Integer, Float, Double and etc are preferable, but sometimes you’ll need something more complex.

  • I've encountered some other useful Map Implementations during my time of coding:

What happens when we need to keep all of our elements sorted and to get the best efficiency possible with all of the actions (Because we know that sorting is a pain while trying to get good performance),
We can use the TreeMap data structure that does all of the operations in (log n) time complexity.
I've used is when I had to keep a data structure that my main importance was to retrieve all of the elements in sorted order each time i need the values.

Another situation I needed a different kind of data structure was when i needed to keep the insertion order of my elements, so the right tool for the Job is LinkedHashMap.
Every time i got the collection of values, they were in the insertion order, but still kept the efficiency of O(1)in the main operations.

Visual Comparison Table

Implementation
Elements Order
No guaranteed order, but will be the same over time.
Sorting according the natural ordering of a Comparator of the elements
Insertion order of the elements to the Map
put, get, remove, containsKey
O(1)
O(log n)
O(1)
Base Interfaces
Map
Map, NavigableMap, SortedMap
Map
Null Values/ Keys
allowed
only values allowed to be null
allowed
Implementation structure
Buckets (Array of key-value nodes)
Red-Black Tree
Double-Linked Buckets
Is Synchronized
No
No
No


What about multithreaded work you say? None of the above is Thread-Safe, so there is a great solution for that as well, the ConcurrentHashMap implementation.
I won’t elaborate much about it (This post is getting too long), but simply explaining, it works much like the HashMap, but with another layer of abstraction from above, you can think of it as many HashMaps in one big HashMap, and a Locking ,mechanism for each little HashMap, so we don’t have to lock the whole map in every operation we make on the Map.
(It is much more complex than I explained here, but if needed i’ll write another post about that alone).
There are more possibilities of synchronization of Maps in java via the Collections class,
Using:  Collections.synchronizedMap() and etc. (Again, i won’t elaborate here).

Lets Meet MultiMap
Wait a second, thats not what they have promised us! Shouldn’t it just collide and save the other values under the same Hash Code in the Map?
What happens if we want to actually save the other values?

There is a solution, by Google, i really hope that most of you know the libraryGuava”.
(You can find the reference by Maven dependency - current stable version of 18).

We will need this when we don’t want to rely on the equals method of the key and we know that for the same key we have many valid values, and we want to work with collections of these values.
Yes, as a stupid programer I found myself implementing it in the past without looking for ready solutions.
Using the MultiMap implementation of Guava makes life easy.
There are many implementation given by factories, just try the class Multimaps for assistance.

Conclusion

Now we understand more about the way that HashMap works, and about the time Complexity of all of the actions.
The main this you should take in consideration is that probably for every problem that you have there is a more appropriate Map implementation than the Naive HashMap, and you should get to know them.
You can read about some of them in the Java Tutorial, or to search for your own special need at our good friend “Google”.

Thank you very much for reading,
hope this post was helpful for you,
If any more elaboration is needed please write a comment :)

Thursday, August 28, 2014

Defining Maven Compiler Plugin

Hi again,
After a long time away (in South America), when i decided to try to open another Simple Maven Project in Java,
I remembered one of the most annoying things in Maven when opening a new project,
Each time i find myself forgetting the way to define the relevant compiler version in the POM.xml.

"Why should i do that" you say? Because the default value of the compiler is Java 1.5, and now days, when the newest Java version is 8, 
I think we would like to get some new features to the language :)

So, straight forward, this is the way to do that: 
(That's for Lazy me to be able to copy/paste)



A little elaboration:
This all should be under the "<build>" tag, as a plugin, and the relevant attributes are: source and target.
They are regularly set to the same version, in our case we'll set them to the newest JDK version "1.8".

And for "dessert", even if it's a little application that does the stupidest thing, you might want to close is as a Runnable Jar File,
Every common IDE (Like Eclipse or IDEA) gives you the option to export your product, but, aren't we using Maven? so let's do that correctly...

We would do that via the "Maven Assembly Plugin" (The second plugin under the build tag).
You should only define the way you want the jar to be packed (With or without the dependencies), and the Main Class you'd like to run at the start up of the application.

If you want more information please read at the official Maven Site (You'll see much more useful plugins there).
Maven Compiler Plugin
Maven Assembly Plugin

Thank you all for reading,
Demi Ben-Ari

Friday, January 10, 2014

How to download the content of a whole File Server in Windows

Hi to all,
I have run into to a problem lately while trying to download the content of an external Yum repository of Linux RPMs for Offline use (The Cloudera Hadoop repository),
it's nice to just right click and download the file when we're talking about a couple of files,
but what will you do when we're talking about hundreds of files that take up to 19 GB??

Yes, my home PC is running windows, so the most convenient way I've found to download all of the repository is via "Wget", you've probably know the command in Linux OS, but there is an implementation in Windows as well.
It could be found at Wget 32 bit.
Just download it and lets start working:

First, you'll need to find the source directory you want to download,
and in the script, you should change the **Wanted Source URL** with the actual source you want to download.

Second, put the "wget.exe" file in the wanted directory, and put the Batch File shown below in the same place.

Here's the script:


Then just run the batch file with double click or via the windows command line.
The process might take a while, but eventually all the files will be downloaded.

Take to notice that you won't need all the files you've downloaded, because it actually takes everything,
so i've added some flags to Wget that will filter some of the unneeded files, but you can't adjust the command with more properties that you can read in the Wget Usage.


But i'll explain the flags I've used briefly: (for example the URL: http://hostname/aaa/bbb/ccc/ddd/)
wget – recursively download all files from certain given directory
--execute="robots = off": Specify whether the norobots convention is respected by Wget
--mirror: Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps ftp directory listings
--convert-links: After the download is complete, convert the links in the document to make them suitable for local viewing
--no-parent: Do not ever ascend to the parent directory when retrieving recursively
--wait=5: Wait the specified number of seconds between the retrievals.
-R: Specify comma-separated lists of file name suffixes or patterns to accept or reject

So for conclusion, you'll end up with all of the files in your local machine,
ready for transfer for offline use.

Hope this helps you to avoid a lot of hard labor of downloading all of your files manually. (And to tell you the truth it might be more efficient than my prior post of Downloading an eclipse update site),

Demi Ben-Ari

Tuesday, December 17, 2013

What I Discovered while trying to Pass Default JVM Parameters

Hello everyone,
The next post is in fact because of a problem my friend encountered while trying to run the JVisualvm and JConsole on a Windows Platform, That are both *.exe files that come with the JDK, and eventually run a JVM,
Just for some background, he was trying to run them on a Virtual VMWare station,
that for some odd reason was launching the wanted application but showing only a window with a Black background with no actual action that you can do with it, but to close it.

I thought that it might be a problem with the Graphics Card, because the implementation of the JVM tries to draw the GUI via the graphics card by default unless you tell it to do something else.
You can do it by passing the JVM parameter: "-Dsun.java2d.noddraw=true".

(You can read about more Java2D parameters at the next link)
But this is all fine and well but the problem is that there was no way to pass JVM parameters via the command line to *.exe file.

Solution:
The work around to this problem is the default JVM Parameters that you can pass to all the JVM's that will be launched in your machine
(No matter if an executable file like "exe" runs them):
1)      "_JAVA_OPTIONS"
or
2)      "JAVA_TOOL_OPTIONS"
(I saw another option you can read about it in this link - that will work on an IBM JVM, "IBM_JAVA_OPTIONS", but I've tried it only on an Oracle Hotspot JVM – 1.7u45)
And if you really want to go deep, the actual call that parses the JVM parameters in the JVM implementation is in the next link, it's part of the openJDK, in the method "parse_vm_init_args" at line 1734, where you can see that the "JAVA_TOOL_OPTIONS" is being parsed first, and the   "_JAVA_OPTIONS" second.

The usage is pretty simple, you put an environment variable in your OS (No matter if it's windows or Linux) with the wanted value,
 (For our example:  "_JAVA_OPTIONS =-Dsun.java2d.noddraw=true")
and when the JVM launches it will pick up the wanted value, you can see it in the next example when I ran it in the cmd:



What you can see is an environment variable being set to the wanted value, and a java program being run with no JVM parameters, and then the JVM while lunching picking up the wanted value.
While running the same application you can see the passed parameters via the JVisualVM application:



If you'd like to read some more about other kind of JVM parameters that you can pass at startup, you can read the next post in a blog I saw.

I wrote a little java program that let's you list out all the parameters passed to the JVM:


So, for conclusion,
When you'd like to pass default parameters to the JVM without mentioning them literally,
or to pass them to an executable that runs a JVM, you can do it in the methods mentioned above,
by the way, the workaround I mentioned help my friend out J
Hope this help some else,
Demi Ben-Ari.