
Thursday, August 20, 2015

Ramping Up a Cassandra Node on AWS EC2


Hi All,
After getting close to the space limit of our Cassandra cluster, we've decided to add more nodes to it.
What might sound like a trivial task, "Because Cassandra is a scalable and robust database!", is not quite trivial, at least without knowing a few things first.
So I'll tell you about the process and the things that I've learned along the way.




Let's lay out the main subjects that we're going to cover:
1) Launching the DataStax AMI Cassandra EC2 Servers.
2) Updating packages and configuring the new node.
3) Maintenance after adding node(s)
4) The insights I've gained from the process -
This is the most important one, so feel free to jump right to it, because that's the part I'd most like you to comment on :)


So, Let's Start...

Launching the DataStax AMI Cassandra EC2 Servers:
1) Enter the AWS console, go to the EC2 section, and press "Launch Instance".
2) Go to "Community AMIs" on the left and search for "DataStax" in the search box, and choose the "DataStax Auto-Clustering AMI 2.5.1-hvm". (Note: be sure that you are choosing the "2.5.1-hvm".
There is a newer AMI, but it's the one we are using in our cluster, so you better launch instances with the same AMI.
3) Choose the instance type you would like, we are using, i2.2xlarge, launch the same instance type like the other ones in your cluster.
4) Choose the amount of Instances you'd like to launch ("Number of instances").
5) Set the availability zone under "Subnet" to "a" (the one that your Cassandra cluster is on).
6) In "Advanced Details", under "User Data", choose "As text" and insert the following line:
--clustername **Cassandra-Cluster-Name** --totalnodes **Amount of nodes you'd like** --version community --opscenter no
The last flag, "--opscenter no", means we are launching nodes without the OpsCenter application. If you set it to yes, or leave the flag out, OpsCenter will be installed on one of the nodes (it's not needed when you are adding nodes to an existing cluster).
7) Press Add Storage (Nothing to change there) =>
    and then "Tag Instance", you can add a name e.g: "cassandra_" =>
    Press "Configure Security Group", the select "Select an existing security group" and choose: "vpc-secured" =>
    Press "Review and Launch" =>
    Press "Launch"
As a matter of fact, this whole part is the same as launching a new cluster; just leave out the "--opscenter no" from the command in step 6.
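For reference, the same launch can also be scripted with the AWS CLI instead of going through the console. A rough sketch (the AMI, subnet, security group and key pair IDs are placeholders for your own, and the user data is the line from step 6):

    aws ec2 run-instances \
        --image-id ami-xxxxxxxx \
        --instance-type i2.2xlarge \
        --count 1 \
        --subnet-id subnet-xxxxxxxx \
        --security-group-ids sg-xxxxxxxx \
        --key-name my-key-pair \
        --user-data "--clustername **Cassandra-Cluster-Name** --totalnodes **Amount of nodes you'd like** --version community --opscenter no"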

Updating packages and configuring the new node:
1) run "sudo apt-get update" - updating repository listing of the versions.
2) run "sudo apt-cache policy $SERVICE" - To see all of the possibilities and the currently installed version. ($SERVICE => cassandra / datastax-agent)
3) run "sudo apt-get upgrade $SERVICE" -  To upgrade the specific service, and all of it's dependencies. ($SERVICE => cassandra / datastax-agent)
This will upgrade all the underlying packages like java and other things. 
4) run "sudo apt-get install datastax-agent=5.1.3"
5) run "sudo apt-get install cassandra=2.1.7" (our DataStax agent version - check if changed on the other nodes before)
Our cassandra version is: 2.1.7, but any new Image you will launch will come with the newest cassandra: 2.2.0 let's say, so you'll need to downgrade to the relevant cassandra and datastax-agent versions.
6) Clearing the data from the new added node (Just in case there is something in the data directories - DataStax Docs).
 "sudo rm -rf /var/lib/cassandra/commitlog/"
 "sudo rm -rf /var/lib/cassandra/data/"
 "sudo rm -rf /var/lib/cassandra/saved_caches/"
Cassandra.yaml changes (the file is located at /etc/cassandra/cassandra.yaml; a consolidated excerpt of all the changes below appears after step 13):
- For our example, let's say the IP of the new server is 123.0.0.10.
7) Change the value of "auto_bootstrap" to "true".
If this option has been set to false, you must set it to true.
This option is not listed in the default cassandra.yaml configuration file and defaults to true.
Details:
auto_bootstrap (Default: true) This setting has been removed from default configuration.
It makes new (non-seed) nodes automatically migrate the right data to themselves.
When initializing a fresh cluster without data, add auto_bootstrap: false
8) Change the seeds to the running seed servers (I've chosen 3 random nodes from the cluster: 123.0.0.1, 123.0.0.2, 123.0.0.3):
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "123.0.0.1,123.0.0.2,123.0.0.3"
9) Change the "listen_address" to the server's Private IP.
eg: "listen_address: 123.0.0.10"
10)  Change the "broadcast_rpc_address" to the server's Private IP.
eg: "broadcast_rpc_address: 123.0.0.10"
11) Add the address of the opscenterd server to /var/lib/datastax-agent/conf/address.yaml:
"stomp_interface: 123.0.0.100"
("123.0.0.100" is the private IP address of the OpsCenter server in our cluster)
12) We are ready to start the newly added node; run:
"sudo service cassandra start"
To check the status of the added node, run "nodetool status" (see the sample output after the notes below).
When the new node changes from UJ => UN, the node is active and running and will accept read requests.
You will also see it via the "netstat -an | grep 9042" command: the server will be listening on port 9042, which is the client port of Cassandra.
13) After the bootstrap process is over, change the value of "auto_bootstrap" to "false".
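For clarity, here is a consolidated sketch of what the relevant settings in /etc/cassandra/cassandra.yaml look like on the new node after steps 7-10 (and before it is started), using the example IPs from above; in the real file these keys are scattered throughout, they are only shown together here:

    auto_bootstrap: true
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "123.0.0.1,123.0.0.2,123.0.0.3"
    listen_address: 123.0.0.10
    broadcast_rpc_address: 123.0.0.10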
Notes:
1) Don't add the new node(s) to the "seeds" section of cassandra.yaml; the new node needs to receive data from the old seed nodes, not vice versa.
2) Some information about the bootstrap process:
During the bootstrap process, while the new Cassandra node is receiving its keys, it already accepts write requests and is part of the ring,
but it will not serve any read requests until the bootstrap process is over.
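To illustrate the UJ => UN transition from step 12, the output of "nodetool status" looks roughly like the sketch below (the load values, host IDs and rack names are illustrative); the new node shows up as UJ (Up/Joining) while it bootstraps and switches to UN (Up/Normal) once it is done:

    Datacenter: us-east
    ===================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address      Load       Tokens  Owns    Host ID         Rack
    UN  123.0.0.1    760 GB     256     ?       f1b2c3d4-...    1a
    UN  123.0.0.2    755 GB     256     ?       9c4d5e6f-...    1a
    UJ  123.0.0.10   120 GB     256     ?       07aa8b9c-...    1a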
Maintenance after adding node(s):
Run a cleanup process on the previously existing nodes to remove the old, now-irrelevant keys (the ones that were transferred to the new node/nodes).
Cassandra doesn't remove keys that no longer belong to a node automatically.
Run the command: "nohup nodetool cleanup > cleanup.log &"
We run it in the background and with nohup because it's a long task and you don't want to depend on the ssh connection to the server. (DataStax Docs)
Checking if the cleanup process is running:
"ps -ef | grep cleanup"
You'll see a process if the cleanup is still running. 
Useful tools / methods to check about the processes that are running:
1) "nodetool netstats" - See all of the files transferred from the other servers of the cassandra ring.
2) "nodetool compactionstats" - See all of the running compaction tasks, and Cleanup are shown too.
3) "nodetool tpstats" - See all of the statuses and pending tasks in the node. (The relevant property is: "CompactionExecutor")
4) "netstat -an | grep 7000" - Show the connections to the other nodes in the cluster during the "Streaming" process during the sync of the new node - Cassandra nodes communicate with each other via port 7000) 

The sync process for our node type (i2.2xlarge) took around 1 day for ~1TB of moved data,
and the cleanup took about the same time.

The insights I've gained from the process:
1) You need to consider which AWS instance types you run your Cassandra cluster on.
In terms of cost effectiveness we found the i2.2xlarge to be the best fit, because it has 2 SSD disks of 800GB each, which is quite a lot, but:
"With great storage comes great..." - network transfer when you need to rebalance the cluster :)
You don't actually add nodes every day, so you might say it's a process you rarely do,
but think about the issue of node failure:
Cassandra handles it well thanks to its robustness, but it needs to transfer all of the failed node's tokens to another node, so rebalancing the cluster puts a heavy load on the seed nodes as well as on the new node.
My take is that you are better off having more, smaller nodes than a few very large ones.

2) Out of the box, Cassandra seems like a simple database to administer, and the default configuration is probably good enough for most users,
but once you need to do maintenance on your cluster, you need deeper knowledge of what you are doing.

If you know more things and want to share from your experience, I would love to hear about it,
and even more so if you think I'm doing something wrong :)

Any questions about things that are not clear enough are more than welcome!

Hope you'll learn from this post as much as I've learned

P.S.
Here are some useful links that I used during the process:
1) Adding nodes to an existing cluster - DataStax
2) Planning an Amazon EC2 cluster - DataStax





Thursday, March 19, 2015

Write Batch Size Error - spark-cassandra-connector



The Use Case:
The data currently sits on Amazon’s S3 storage system, and most of it is time-series data.
We compute and analyze our data using an “Apache Spark” cluster.
While trying to migrate the data to a more efficient storage model (for both reads and writes), we tried out the Apache Cassandra database, and while running the first benchmark we encountered a lot of failures during the “write” part on the Spark nodes.
I'm using the DataStax Java API spark-cassandra-connector.
(You can find some old but useful examples in the DataStax Blog and in the DataStax JavaAPI Documentation)
I started off with using a 2 node Cassandra Cluster, running on 2 c3.2xlarge machines on AWS EC2. I used the DataStax Community AMI to create the nodes and connected two of them into a cluster.

I’ve written a Spark Java application that reads the data from S3 and writes out the RDDs to the Cassandra cluster.
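To give an idea of what that application looks like, here is a minimal sketch of the write path using the connector's Java API (the Event bean, the S3 path, the connection host and the parsing are placeholders of mine; the keyspace and table names are the ones appearing in the error below):

    import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
    import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class S3ToCassandraJob {

        // Placeholder bean for one time-series record; mapToRow maps its fields to the table columns.
        public static class Event implements java.io.Serializable {
            private String id;
            private long ts;
            private double value;
            public String getId() { return id; }
            public void setId(String id) { this.id = id; }
            public long getTs() { return ts; }
            public void setTs(long ts) { this.ts = ts; }
            public double getValue() { return value; }
            public void setValue(double value) { this.value = value; }
        }

        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("s3-to-cassandra-benchmark")
                    .set("spark.cassandra.connection.host", "10.0.0.1"); // one of the Cassandra nodes

            JavaSparkContext sc = new JavaSparkContext(conf);

            // Read the raw time-series files from S3 and parse each line into an Event.
            JavaRDD<String> lines = sc.textFile("s3n://my-bucket/time-series/");
            JavaRDD<Event> events = lines.map(line -> parseLine(line));

            // Write the RDD out to the Cassandra table.
            javaFunctions(events)
                    .writerBuilder("test", "some_cassandra_table", mapToRow(Event.class))
                    .saveToCassandra();

            sc.stop();
        }

        private static Event parseLine(String line) {
            Event e = new Event();
            // Fill the fields from the raw line here (parsing omitted).
            return e;
        }
    }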


During the "runJob at RDDFunctions.scala:24" stage there were a lot of failures like:


“java.io.IOException: Failed to write 273 batches to test.some_cassandra_table.
……”


Eventually the Spark application would fail because of these failures, so I tried to find the cause of the problem. I went to one of the Cassandra nodes and checked the node's log, located at
“/var/log/cassandra/system.log”, and saw many warning messages of the following type coming in all the time:
WARN  [SharedPool-Worker-132] 2015-03-19 07:44:43,229 BatchStatement.java:243 - Batch of prepared statements for [test.some_cassandra_table] is of size 5264, exceeding specified threshold of 5120 by 144.


After looking for a solution to my problem on Google and coming up with nothing except variations of the same StackOverflow answer on many sites, which did not help much, I started looking into the meaning of the log messages.
I found the definition of the “batch size threshold” on the Cassandra nodes at “/etc/cassandra/cassandra.yaml”:


batch_size_warn_threshold_in_kb: 5


Which comes out to 5 * 1024 = 5120 bytes, and that’s the value in the logs.


The next step was trying to figure out how to change the write batch size of the DataStax spark-cassandra-connector, and I found the following documentation reference: Link


The “Tuning” paragraph mentions the “spark.cassandra.output.batch.size.rows” parameter, which you can set on the SparkConf while creating the JavaSparkContext. I changed it to 5120, but it didn’t have the desired effect; it multiplied the batch size, making it much larger.
Its default is “auto”, and the outcome with “auto” was much better.


So, frustrated with the outcome, I went on to read the source code of the Cassandra connector, in the Scala class “WriteConf.scala”, and found another parameter that really made a difference: when the first parameter is left at its “auto” default, WriteConf goes straight to the value of the second parameter, “spark.cassandra.output.batch.size.bytes”.

I changed that value to 8192 (instead of the default 1024 * 16 = 16384), making the batch size smaller, and things started working fine.
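In code, the change amounts to setting this parameter on the SparkConf before creating the JavaSparkContext; a minimal sketch (the connection host is a placeholder):

    SparkConf conf = new SparkConf()
            .setAppName("s3-to-cassandra-benchmark")
            .set("spark.cassandra.connection.host", "10.0.0.1")
            // Leave spark.cassandra.output.batch.size.rows at its "auto" default and
            // cap the batch by size instead, below the 5120-byte warn threshold.
            .set("spark.cassandra.output.batch.size.bytes", "8192");
    JavaSparkContext sc = new JavaSparkContext(conf);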


Although the log message is only a warning, the writes were succeeding some of the time and failing at other times; after changing the parameter, they did not fail anymore.


And if you are asking why the threshold is 5K, I found the following explanation: Link.
Quote (the key reasoning comes from Patrick McFadden):


"Yes that was in bytes. Just in my own experience, I don't recommend more
than ~100 mutations per batch. Doing some quick math I came up with 5k as
100 x 50 byte mutations.


Totally up for debate."


So as we can see, we can tune these things a bit more, depending on the queries we want to run and on the actual environment.
In the future I’ll play a bit more with these configurations, but for now it’s enough for me to continue the benchmark with no errors :)


I felt that someone else must have had the same problem, but I didn’t find anything written about it on the internet.
If anyone has a better solution or some other insights, I would love to hear them; please comment on this post.

I hope this might help someone else.