Programing Excavation: September 2014

Friday, September 26, 2014

What are you Logging about?

Every application today needs some kind of way to report out it’s actions or current state. The main reasons that we would need there reportings are:

Debugging
Monitoring the Applications (components) Actions
Offline and Backwards Analytics

In this case we are talking about all of these actions in a JVM based environment.

Mostly when we use the phrase “To Log” in programming it means that we want to output information from our application out a user / administrator / logs monitor.

Two things that actually define a “Log File” that is a lot of Rows of:

Time Stamp - some kind representation of it.
Message - some kind of information about the event that occurred.

What are we going to do?

We are going to go over some of the most famous and commonly used Logging Frameworks in Java, get to know the Pros and Cons of some of them, Show some code samples and examples (you can find the full example project on GitHub), finally get to a Conclusion which is the best one to use?.

What will we be covering:

Logging Common Phrases
Console logging
Log4J
Logback
Log4J 2
Log4J2 vs Logback - Performance
SLF4J
Conclusion

(We are not going to cover the “java.util.logging”)

Logging Common Phrases

Logger -

Probably this will be the Object (I do hope it’s an interface) that you’ll be using to do all of your “Logging” through.

Logging level -

The logging frameworks have almost the same names for the logging levels, in some you’ll find them as method names and in some you’ll pass them as a parameter to the “log()” method with the message to be printed, but eventually the list is pretty the same (the order is from the most weak to the strongest): Trace, Debug, Info, Warn, Error, Fatal.

Root Logger -

This is a definition of the top most logger, all of the setting applied to it, most of the time, will propagate to it’s child loggers.

Appender -

Appenders are mostly definitions of pipes to different kinds of output methods to the logging framework, some of the most common are: stdout, strerr, FileAppender, RollingFileAppender, SocketAppender and more, you can understand their uses by their names.

Log File -

In the most stupid way, the actual file that you are writing all of your data to. It has the characteristics we’ve mentioned above in each row, “time stamp + message”, and sometime people name the log file as something associated with the Data or Timestamp (Like: “2014-09-26-06:45-Application.log”) to easily parse and search for log file later.

Console logging

The “Hardcore” or “Old School” Way! or, as i like to call it: “The Worst Way”.

This method is used mostly for small programs or at development time, you shouldn’t use it extensively because it makes your code a nightmare to maintain, and in a large scale application, when you are watching the console where you ran the application and you suddenly see the next line that someone wrote popping every second:

“$$$ ### I’m Here, This is not suppose to Happen ### $$$”.

or the uninformative one:

“%%%% ------ Yeah -------- %%%%”

You probably want to kill the person who did it. (If say you haven’t seen something like this in the code, you are Lying!!!)

And eventually if you want to change something after, you find yourself doing search and replace over “System.out”....and changing a lot of code.

A bit about Threading, there is a difference between System.out and System.err that you would probably not notice in the documentation, the System.err ensures that the Order of the messages that you print out via multiple thread with be like a FIFO queue because all of the things you write to the Standard Err is immediately shown to the user, when the System.out does not ensure the order. (Both of them access the “write()” method, that inside has a Synchronized block, but it’s about the implementation of the JVM and the Operation system behind that matters).

In short, (it’s a joke, i didn’t make it short at all)

try to avoid this method.

Log4J (Versions 1.2.X)

The veteran, it is here for long, since 2001, and he is very widely used in the world.

He is very flexible and convenient because you use one interface in you code to log all of you messages, and you can control the logging level and the output methods via a configuration file.

Lets see the example of a small Java application that i wrote to see some of the usages and the advantages and problems.

Worker.java

AbstractWorker.java

Log4JWorker.java

Log4Main.java

The explanation is in the code so i’ll write some points:

Double checking because of String concatenation.
Logger is being taken via a Factory by the class name.
There is a log level control by the methods being printed.
Log4J is a Thread safe framework.

To not lengthen this post more, there are some more important configuration settings explained in the Next Post.

So, how do we watch all of these Logs?

There is an open source project called “ChainSaw”, that is written in Java, and lets you see the logs in a Graphical user interface.

The thing is that the last commit to this project was around 6 years ago.

Logback

As they say in their website, “Logback is intended as a successor to the popular log4j project, picking up where log4j leaves off”. (You can read their Article about the reasons to prefer logback over log4j)

I agree with some parts, and i’ll point out some nice features that i liked about “Logback” that were missing in Log4J:

Automatic reloading of configuration files.
Automatic compression of old archived log files.
Conditional processing of configuration files in the XML - used for the same configuration file for different environments.
Implementation of Logback for the Android environment.
logback-classic speaks SLF4J natively (We will talk about that a bit later).
Log4J Translator - for configurations in case of migration

A small snippet of the logback.xml configuration file:

For more elaboration please read the Logback Manual.

Viewing logs:

Just like “Chainsaw” for log4j, we have “Lilith”, one of the differences is that this project is active today too.

(They have mentioned that Mac users might have problems with installations but they have an explanation for fixing it on their website)

Configuration of the path to the logback file:

Log4J 2

It is the contender of logback, and the newer version of “Log4j 1.2.X”

In their website they have a Benchmark of Performance saying that they are much better (faster) then logback.

I won’t talk much about the features of Log4J 2 because it’s mostly eye to eye with logback.

For more elaboration about Usage and configuration. Here you can find the “Log4J 2” user manual, and the Wiki pages as well.

Logback vs Log4J 2

And now for the interesting part that will convince us (or maybe not) about “Who is the winner?”

I didn’t do the comparison myself, but i’ll let you read a Great Post by Tal Weiss from “Takipi” (I really recommend reading their Blog), he did a benchmark of performance between the two and a couple of more frameworks, and the conclusion was that Log4J 2 is the winner regarding the performance.

SLF4J

After talking about implementations and benchmarks, eventually we get to the point (almost).

SLF4J, Simple Logging Facade for Java, is a logging implementation that allows the User (the programmer) to plug-in the desired logging system at deployment time (and development of course).

The bindings are made through the the connectors loaded to the Java Classpath.

SLF4J has a connector Jar for every major Logging framework.

The benchmark made by Takipi was implemented via SLF4J as well with all of the different implementations.

In the “logging-example” project i’ve implemented the same code as the Log4JWoker, you can view it Here, and download the project and run it yourself.

One of the coolest things you can notice is that you can avoid the Double check in case if the string concatenation.

Notice that the “{}” inside the message is a placeholder for the parameters you pass to the logging method, and is evaluated only if the proper logging level is enabled.

You’ll see if you run it that there is a clash of Multiple Bindings and the Log output is:

And you can see that the after the clash SLF4J runs with it’s default implementation of “Logback”.

Lidalia:

In logback and other frameworks the possibility for logging via the “log()” method, and providing the “Level” as a parameter was deprecated. Lidalia supplies a decorator for SLF4J, that allows you to access exactly that.

The list of the supported implementations of connectors is:

(and you can implement one of you own if you would like)

Conclusion:

We’ve stated and compared the different logging frameworks available in Java, and my recommendation is to use SLF4J as the facade to the logging issue, and leaving the decision to which underlying framework use to your personal taste or performance constraints.

This will leave your code clean and maintainable, and you’ll be able to use a general interface of logging over any implementation you’ll choose.

I would recommend using one the “Log4J 2” or the default “Logback” implementations, it really depends on your specific needs and tastes.

Some useful SLF4J and Log4J2 connector links:

SLF4J - Log4J 2 - Configurations

Maven repository - Log4J2 connector

Thank you very much for reading,

If any more elaboration is needed please write a comment :)

Log4J Configurations

Log4J configurations

Log4J Logger is defined via one of two kinds of files: Properties file, XML file.

(Commonly called “log4j.properties” or “log4j.xml”).

Lets see examples of the two:

log4j.properties

log4j.xml

Some explanation about the “Conversation Pattern”.

File Location and it’s Hierarchy

The framework has a hierarchy of search for the configuration file, at first it searches an environment variable, if it’s passed to it, if not, it searches in the Java Classpath (So most of the times you’ll find it in some “resources” directory inside the Jar or the WEB-INF in web applications),

And if after that it’s not found, A Configurator creates the configuration as default and runs the application with and error to the console saying that it didn’t find a configuration file.

How Run Log4j with external configuration

You’ll need to pass a variable to you JVM once you run it like this:

Configuration is made the same way for “log4j.xml”.

Tuesday, September 9, 2014

Do you think you know Maps in Java?

Hi to All,

One thing that i’ve encountered in the past time is the actual usage of the Map interface in Java, and it’s many implementations, or maybe the correct way to put it: “Unknown basics of Map usage”.

Most of us probably know that we use Map for the common Key-Value Pair,
and we use it because it has the efficient “get” method of O(1) right?

But what are we really talking about?
You might notice that “Map” is actually an Interface and not an implementation, So saying these things about Efficiency and Complexity is really not possible about the Interface itself.

What are we going to talk about?

Map and some HashMap implementation basics
Collisions
Hashing and Rehashing
TreeMap
LinkedHashMap
ConcurrentHashMap
Lets meet Multimap
Conclusion (For the lazy ones that don’t want to read all of my hard effort :) )

So, lets elaborate more about Map and actual usage cases of Implementations you might have heard of:

Map and Some HashMap implementation basics

The Implementation of HashMap exist since Java 1.2, one of our basic building blocks of programing in Java, but how much do we really know about it’s way of implementation,
so let’s talk about some basic elements of it and about the fundamental API of Map Interface:

Map.Entry Interface

First of all there is a Very important inner interface that we should know that takes case of the Key-Value structure in the Map: Map.Entry.

It is the entity that stores our value in each Node (And yes, the actual class is Node, it implements the interface) and implements the way that it stores the same Equal Keyed Objects in a Linked List manner.

Map Main Methods (These are not the only ones, but the ones i want to mention)

The elaboration is mainly by the Map implementation of HashMap.

V put(K key, V value) -

Inserts the wanted value in the correct place in the Map. In case the key exists by it’s hashCode, then the equals method is checked in the linked lists values, and replaced in case it’s true, and added if false.

V get(Object key)-

returns the value associated to the hashCode of the given Key, if absent then null is returned.

boolean containsKey(Object key) -

returns if there is a mapping between the Key and some value in the Map.

Basically it uses “get” and checks if the result is null or not (So it’s the same efficiency wise).

V remove(Object key) -

removes the entire mapping of the Key in the Map.

boolean remove(Object key, Object value)-

removes the the element key only if it’s value returns true for the equals() method of the mapped value in the Map.

Set keySet() -

returns the distinct KeySet of the Map, actually it returns an inner type that implements Set that is called KeySet.

Collection values() -

getting a Collection that represents all the values of the Map sequentially.

(Like most of the Map implementations, HashMap extends the Abstract class AbstractMap)

How is the data stored you ask?

The data in a HashMap is store as simple Array of Node values, that each one of them might hold a LinkedList like construct of values that have the same hashCode value.

And in case we get to a state the we exceed the size of the predefined array, the HashMap reallocated the space and creates a new array (Everything is made by the default values of the HashMap or the defined ones by the programmer in the creation of the map).

Collisions

One of the first things i remember that they told us at class is that the efficiency of “get” in Hash Maps is defined by the amount of collisions with Keys you have, that at the worst case are O(n).

That is pretty bad is you think of it, that every usage of “get” of “containsKey” will take you O(n) right?

But this is not exactly the case in the actual implementation of HashMap,

it all depends on the relation between the implementation of hashCode() method of the Key object and equals(), after trying to “put” a key-value to the map, first the hashCode is checked, and if it is present, then the equals method of the Key is checked, if true the value is replaced, in case of false, a new node is created and appended to the “LinkedList” of nodes in the current bucket.

This is the case that we will get the “Worst case” of O(n) during all of the operations of Map.

Hashing And Rehashing

The hash process and the actual resolve of the bucket (index in the array) that the key should be under is resolved in an inner hash() method that was implemented by the JDK writers, and takes the hashCode of the Object, and calculates the exact space in the array that the key is located at.

A Rehashing process occurs when the size of the Map is changed, and we need to resize the inner array.

The most important thing about hashing in all sorts of Maps you’ll use is actually the implementation of the Key object, it very important for the correct functionality of any sort of Map for the Keys to be Immutable!

If not, you can run into some unpleasant situations, that you will lose values in the Map because the keys hashCode() method is the one responsible for finding the value in the Map.

Using primitive types like Integer, Float, Double and etc are preferable, but sometimes you’ll need something more complex.

I've encountered some other useful Map Implementations during my time of coding:

TreeMap

What happens when we need to keep all of our elements sorted and to get the best efficiency possible with all of the actions (Because we know that sorting is a pain while trying to get good performance),
We can use the TreeMap data structure that does all of the operations in (log n) time complexity.

I've used is when I had to keep a data structure that my main importance was to retrieve all of the elements in sorted order each time i need the values.

LinkedHashMap

Another situation I needed a different kind of data structure was when i needed to keep the insertion order of my elements, so the right tool for the Job is LinkedHashMap.

Every time i got the collection of values, they were in the insertion order, but still kept the efficiency of O(1)in the main operations.

Visual Comparison Table

Implementation	HashMap	TreeMap	LinkedHashMap
Elements Order	No guaranteed order, but will be the same over time.	Sorting according the natural ordering of a Comparator of the elements	Insertion order of the elements to the Map
put, get, remove, containsKey	O(1)	O(log n)	O(1)
Base Interfaces	Map	Map, NavigableMap, SortedMap	Map
Null Values/ Keys	allowed	only values allowed to be null	allowed
Implementation structure	Buckets (Array of key-value nodes)	Red-Black Tree	Double-Linked Buckets
Is Synchronized	No	No	No

ConcurrentHashMap

What about multithreaded work you say? None of the above is Thread-Safe, so there is a great solution for that as well, the ConcurrentHashMap implementation.
I won’t elaborate much about it (This post is getting too long), but simply explaining, it works much like the HashMap, but with another layer of abstraction from above, you can think of it as many HashMaps in one big HashMap, and a Locking ,mechanism for each little HashMap, so we don’t have to lock the whole map in every operation we make on the Map.

(It is much more complex than I explained here, but if needed i’ll write another post about that alone).

There are more possibilities of synchronization of Maps in java via the Collections class,
Using: Collections.synchronizedMap() and etc. (Again, i won’t elaborate here).

Lets Meet MultiMap

Wait a second, thats not what they have promised us! Shouldn’t it just collide and save the other values under the same Hash Code in the Map?

What happens if we want to actually save the other values?

There is a solution, by Google, i really hope that most of you know the library “Guava”.

(You can find the reference by Maven dependency - current stable version of 18).

We will need this when we don’t want to rely on the equals method of the key and we know that for the same key we have many valid values, and we want to work with collections of these values.

Yes, as a stupid programer I found myself implementing it in the past without looking for ready solutions.

Using the MultiMap implementation of Guava makes life easy.

There are many implementation given by factories, just try the class Multimaps for assistance.

Conclusion

Now we understand more about the way that HashMap works, and about the time Complexity of all of the actions.

The main this you should take in consideration is that probably for every problem that you have there is a more appropriate Map implementation than the Naive HashMap, and you should get to know them.

You can read about some of them in the Java Tutorial, or to search for your own special need at our good friend “Google”.

Thank you very much for reading,

hope this post was helpful for you,

If any more elaboration is needed please write a comment :)