Tuesday, July 5, 2011

Tomcat clustering on Amazon

This blog post is actually from a coworker of mine, Jason Bennett. For our work, we had to get Tomcat clustering running on Amazon and had a real time of it. So, for the benefit of others, he posted his experience. (Thanks, Jason!)

Tomcat session replication on Amazon Web Services (AWS) is not much different from replication in any machine cluster, with the major exception that Amazon does not allow multicast traffic, the usual basis of Tomcat replication. Tomcat does, however, support static membership, where each cluster member is listed explicitly in the configuration, and that works just fine in Amazon’s cloud.

First, read the Tomcat session replication documentation and follow the basic configuration (under For The Impatient). Note that the FarmWarDeployer is not necessary if you don’t need to deploy an application once and have it pushed across the cluster. Set the NioReceiver port if the default does not suit you, and ensure that this port is allowed under your Amazon security configuration. This last point is very important! Most of the problems you will encounter in setting up replication are Amazon configuration problems, not Tomcat problems. If you are running two Tomcats on the same box (for some reason), the receiver ports must be different on the two instances.
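As a rough sketch of the receiver setting described above, the Receiver element from the basic configuration looks something like the following. The port value 4000 matches the log output shown later in this post. The other attribute values are the documentation defaults as best I recall them, so treat them as assumptions and check them against the documentation for your Tomcat version.

```xml
<!-- Goes inside the <Channel> element of the <Cluster> configuration in server.xml. -->
<!-- port: the TCP port other cluster members connect to. Allow it in your Amazon
     security configuration. A second Tomcat on the same box needs a different
     value here (for example 4001). -->
<!-- autoBind: if the port is taken, Tomcat tries the next ports up to this many
     times, so the port actually bound can differ from the one configured. -->
<Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
          address="auto"
          port="4000"
          autoBind="100"
          selectorTimeout="5000"
          maxThreads="6"/>
```

Because static members are addressed by a fixed host and port, it is probably worth confirming in the startup log that the receiver bound to the port you configured rather than a neighboring one.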

Next, review the Cluster Basics in the documentation and ensure that you have followed ALL of them. Double-check that the clocks on your machines are synchronized, that your two Tomcats are accessed via a single URL, and that your web.xml file is marked as distributable.
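For the web.xml point, the marker in question is the distributable element. A minimal sketch, with the namespace and version attributes of web-app left out for brevity:

```xml
<web-app>
  <!-- Declares that the application may run in a clustered container.
       Without this element Tomcat does not replicate the application's sessions. -->
  <distributable/>

  <!-- ... your existing context parameters, filters, and servlets follow ... -->
</web-app>
```

Remember that everything you store in the session must also be serializable, which is another item on the Cluster Basics list.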

Finally, we need to configure the static cluster. Refer to the static membership example in the documentation and paste it within the <Channel> tag. Make the port attribute match the value in the other machine’s NioReceiver, the host attribute match that other machine in your cluster, and the uniqueId attribute differ from every other static entry in this file. Add one entry for each other member of your cluster. More than likely, you will paste the template and change only the host and uniqueId values.
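To make this concrete, here is a sketch of a static entry, assuming it lives in the server.xml of a machine other than 10.10.40.101 and points at that machine. The host, port, and uniqueId values are taken from the log output shown further down; which machine holds this file is an assumption made for illustration. The documentation's own example also sets domain and securePort attributes, which are left out here.

```xml
<!-- Goes inside the <Channel> element in server.xml. -->
<Interceptor className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">
  <!-- One Member element per OTHER machine in the cluster.
       host/port: where that machine's NioReceiver is listening.
       uniqueId: 16 byte values, different for every Member in this file. -->
  <Member className="org.apache.catalina.tribes.membership.StaticMember"
          host="10.10.40.101"
          port="4000"
          uniqueId="{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1}"/>
</Interceptor>
```

The server.xml on 10.10.40.101 would carry the mirror image: a Member entry pointing back at this machine's address and receiver port, with its own uniqueId.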

At this point, start up your servers. You should see a log entry like this:

org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:org.apache.catalina.tribes.membership.MemberImpl[tcp://10.10.40.101:4000,10.10.40.101,4000, alive=0, securePort=-1, UDP Port=-1, id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 }, payload={}, command={}, domain={115 98 45 98 117 105 108 100 101 ...(15)}, ]

Your first server should have an entry that says “no other members found” while your remaining servers will connect with a string like the above.

If replication fails, check the following:
1. Make sure that you’ve allowed the required ports in the Amazon config. This is your most likely source of problems, especially if you got it to work outside of Amazon.
2. Double-check your configuration. Make sure that the ports are not being used elsewhere, and that your static configuration port matches the NioReceiver port on the OTHER machine.
3. Make sure you’re accessing both machines via the same URL. If you use different URLs, the sessions will be different.

If you still cannot get the replication to work, try setting up two Tomcat instances on a single machine. Getting that setup to work should prove the concept, meaning any issues across machines are network problems or configuration typos.

Good luck, and remember that it can work!