Amazon Web Services (EC2 & S3) – The Future of Data Centre Computing? Part 2

Since my last blog entry on AWS, there have been a number of interesting developments. Firstly, I said before that there were no documented success stories on EC2; well, now there are! This is a clear indication that the EC2 community is alive and kicking and that, finally, people are starting to find a purpose for EC2. Maybe the most interesting of these, from a business potential point of view, is gumiyo.com, an ‘end to end mobile commerce platform’ connecting buyers with sellers on mobile and hand-held devices or on the web. Up to this point, most ventures into EC2 consisted of social networking or media sharing applications, so it is nice to see something that has a real dollar value being rolled out on EC2.

We, DSI, are not in the business of social networking or media sharing. They have their place in the world and connect and enrich many millions of people’s lives, but we deal in data, generally PHI (Protected Health Information) or other personally identifiable data. This type of data generally falls under HIPAA, SAS 70, or some such regulation, and thus has incredibly stringent security and accessibility requirements. With this in mind, and following on from the work we have done so far with EC2, we have four topics for investigation:

  • Scalability
  • Reliability
  • Security
  • Integration

Scalability and Reliability

For me, scalability and reliability go hand-in-hand. Any user of a system must be confident that the system will respond in a reliably consistent manner; how can we guarantee this level of service? Well, generally we are contractually obliged to meet predefined SLAs (Service Level Agreements). Often these SLAs can be very difficult and expensive to meet; and what about when the unexpected happens? Server failure? Periods of increased traffic? How can we confront these challenges and still come out on top? Do we purchase a few hundred thousand dollars’ worth of hardware and leave it sitting there, just in case the unexpected happens? Unused computing power is a very expensive method of heating a room. ;-)

Ok, back to EC2! We mentioned before that EC2 does not have multicast support. This meant that we could not use the out-of-the-box Tomcat clustering features on our Spring JPetstore application. This led me to look at Terracotta Sessions from the good people of Terracotta, who have open sourced their JVM clustering solution. Their solutions include:

  • Terracotta DSO (Distributed Shared Objects)
  • Terracotta Sessions
  • and Terracotta Spring

The solution of interest to us was Terracotta Sessions, using a Terracotta server as the backbone to our Tomcat clustering solution. We wanted our deployment topology to look something like:

[Figure: EC2 Topology]

We wanted to use an Apache HTTP Server configured for load balancing across our clustered Tomcat instances running on EC2. This would allow us to expose a familiar URL to users of the system and not one that was dependent on the IP or resolved name of a running EC2 instance. This also satisfies one of the prerequisites to calling a system scalable and reliable: load balancing.

Terracotta Sessions: Installation and Configuration

Download the Linux distribution of Terracotta Sessions and FTP it to a running EC2 instance. Extract the TAR to your folder of choice. Now you’re ready to start configuring and running Terracotta Sessions.

Terracotta comes bundled with its own JRE, and relies upon its classes being loaded at the bootclasspath level for any JVM wishing to join the cluster. If you wish Terracotta to use a preinstalled JRE, there are a couple of things you will need to do:

  1. Create a soft, or symbolic, link from the TC jre directory to your preinstalled JRE/JDK directory. This can be achieved by running ln -s target_dir link_name, where target_dir is the existing JRE/JDK directory and link_name is the link to create, e.g. ln -s /usr/local/apps/jdk1.5.0_10 jre
  2. You will need to run the make-boot-jar.sh script, which allows Terracotta to create a boot JAR specific to your JDK. This JAR can then be prepended to the bootclasspath in the Java options of the Tomcat instance to be clustered. The make-boot-jar.sh file can be found in $TERRACOTTA_HOME/sessions/bin and the resulting JAR file can be found in $TERRACOTTA_HOME/common/lib/dso-boot
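
The first step above can be sketched as a small shell helper. This is only a sketch under stated assumptions: link_jre is a hypothetical name, and it assumes the bundled JRE lives in a jre directory directly under the Terracotta install directory, which you should check against your own installation.

```shell
#!/bin/sh
# link_jre (hypothetical helper): replace Terracotta's jre directory with a
# symbolic link to a preinstalled JRE/JDK.
# Assumption: the bundled JRE sits at <terracotta home>/jre.
link_jre() {
    tc_home="$1"    # Terracotta install directory
    jdk_dir="$2"    # preinstalled JRE/JDK directory

    if [ ! -d "$jdk_dir" ]; then
        echo "no such JRE/JDK directory: $jdk_dir" >&2
        return 1
    fi

    if [ -L "$tc_home/jre" ]; then
        # An earlier link exists; drop it so it can be recreated.
        rm -f "$tc_home/jre"
    elif [ -d "$tc_home/jre" ]; then
        # Keep the bundled JRE rather than deleting it.
        mv "$tc_home/jre" "$tc_home/jre.bundled" || return 1
    fi

    # ln -s takes the existing directory first, then the name of the link.
    ln -s "$jdk_dir" "$tc_home/jre"
}
```

Called as link_jre "$TERRACOTTA_HOME" /usr/local/apps/jdk1.5.0_10, this would be followed by running $TERRACOTTA_HOME/sessions/bin/make-boot-jar.sh to build the boot JAR for that JDK.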

We are nearly ready to start the Terracotta server. Now we need to provide it with a configuration file to tell it what to cluster. An example of a Tomcat config file can be found at $TERRACOTTA_HOME/sessions/config-samples. Take a copy of this and, in the ‘web-applications’ section, add the name of your application as deployed under Tomcat. In our case, we added jpetstore. Start your Terracotta server by calling $TERRACOTTA_HOME/sessions/bin/start-tc-server.sh -f path/to/config/file/tc-config.xml. This will start the server on port 9520 and DSO on port 9510. JVMs wishing to join the cluster will do so by communicating with the DSO on port 9510.
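
A quick pre-flight check can catch a forgotten application entry before the server is started. The helper below is a hypothetical sketch, not part of Terracotta; it assumes each entry in the web-applications section is written as a web-application element containing the application name, so adjust the pattern if your sample config differs.

```shell
#!/bin/sh
# config_lists_webapp (hypothetical helper): succeed if the Terracotta config
# file names the given web application.
# Assumption: each entry in the web-applications section is written as
#   <web-application>NAME</web-application>
config_lists_webapp() {
    config_file="$1"
    app_name="$2"

    if [ ! -f "$config_file" ]; then
        echo "config file not found: $config_file" >&2
        return 2
    fi

    grep -q "<web-application>[[:space:]]*${app_name}[[:space:]]*</web-application>" "$config_file"
}

# Intended use before starting the server:
#   config_lists_webapp path/to/config/file/tc-config.xml jpetstore &&
#     "$TERRACOTTA_HOME/sessions/bin/start-tc-server.sh" -f path/to/config/file/tc-config.xml
```

The function returns 0 when the application is listed, 1 when it is not, and 2 when the config file itself is missing, so a wrapper script can tell the cases apart.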

The Terracotta documentation details how to integrate a Tomcat server into the Terracotta cluster. Basically, it means adding Java options to the catalina.sh file, e.g.


# Terracotta Sessions setup (added to catalina.sh)
TC_INSTALL_DIR="/home/jason/terracotta-2.2.1"
# Boot JAR generated earlier by make-boot-jar.sh
DSO_BOOT_JAR="${TC_INSTALL_DIR}/common/lib/dso-boot/dso-boot-hotspot_linux_150_10.jar"

# Prepend the Terracotta boot JAR to the bootclasspath
JAVA_OPTS="${JAVA_OPTS} -Xbootclasspath/p:${DSO_BOOT_JAR}"
JAVA_OPTS="${JAVA_OPTS} -Dtc.install-root=${TC_INSTALL_DIR}"
# Address and DSO port of the running Terracotta server
JAVA_OPTS="${JAVA_OPTS} -Dtc.config=192.168.2.1:9510"
JAVA_OPTS="${JAVA_OPTS} -Dwebserver.log.name=tomcat_1"
JAVA_OPTS="${JAVA_OPTS} -Dcom.sun.management.jmxremote"
export JAVA_OPTS
echo "Using JAVA_OPTS: ${JAVA_OPTS}"

Two options are important here: DSO_BOOT_JAR and -Dtc.config.
DSO_BOOT_JAR refers to the boot JAR that was created by running the make-boot-jar.sh script earlier in the process, and -Dtc.config tells the Terracotta client to contact the running DSO process that was started earlier and download its configuration settings.
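
When several Tomcat instances need the same settings, the options can be generated rather than copied by hand. The following is a sketch only: tc_java_opts is a hypothetical helper, and the flags it prints are simply the ones from the catalina.sh excerpt above, with the log name varied per node.

```shell
#!/bin/sh
# tc_java_opts (hypothetical helper): print the Terracotta-related Java options
# for one Tomcat node, failing early if the boot JAR is missing.
tc_java_opts() {
    install_dir="$1"   # Terracotta install directory
    boot_jar="$2"      # boot JAR produced by make-boot-jar.sh
    tc_server="$3"     # host:port of the running DSO process, e.g. 192.168.2.1:9510
    node_name="$4"     # per-node log name, e.g. tomcat_1

    if [ ! -f "$boot_jar" ]; then
        echo "boot JAR not found: $boot_jar" >&2
        return 1
    fi

    printf '%s %s %s %s\n' \
        "-Xbootclasspath/p:${boot_jar}" \
        "-Dtc.install-root=${install_dir}" \
        "-Dtc.config=${tc_server}" \
        "-Dwebserver.log.name=${node_name}"
}
```

In catalina.sh this could be used as JAVA_OPTS="${JAVA_OPTS} $(tc_java_opts "$TC_INSTALL_DIR" "$DSO_BOOT_JAR" 192.168.2.1:9510 tomcat_2)". Because the options are joined with spaces, the sketch assumes none of the paths contain spaces.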

On starting the Tomcat server you should see a message similar to


Terracotta, version 2.2 as of 20061203-151234

If you see this, then your Tomcat server is now part of the Terracotta cluster.
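
With more than one instance it is handy to script this check. The sketch below assumes the startup output ends up in a log file you can read (for a Tomcat started with catalina.sh start this is usually logs/catalina.out, but treat that path as an assumption); joined_cluster is a hypothetical helper name.

```shell
#!/bin/sh
# joined_cluster (hypothetical helper): succeed if a Tomcat log contains the
# Terracotta startup banner shown above.
joined_cluster() {
    log_file="$1"

    # Distinguish "no log yet" (2) from "log present, banner absent" (1).
    [ -f "$log_file" ] || return 2

    grep -q "Terracotta, version" "$log_file"
}

# Example:
#   joined_cluster "$CATALINA_HOME/logs/catalina.out" && echo "node is in the cluster"
```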

I hadn’t intended this blog to become a technical how-to on setting up clustered Tomcat servers on EC2 with Apache HTTP Server load balancing, but I felt that the steps to getting Terracotta working warranted a mention, as some of the documentation is a little disjointed. If anyone has any further questions on getting Terracotta up and running or on how to configure Apache load balancing, then post a comment and I would be more than happy to try and help.

Ok! You got that working; what next?

I tested the Tomcat cluster by starting a user session on JPetstore, adding some items to my shopping cart and then shutting down each Tomcat instance in sequence. You can tell which instance is currently serving your requests by looking at the logs. The sequence of events goes something like:

  1. Add item to shopping cart
  2. Shut down server_1, which was processing the request. Two Tomcats remain in the cluster.
  3. Add item to shopping cart.
  4. Shut down server_2, which was processing the request. One Tomcat remains in the cluster.
  5. Add item to shopping cart.
  6. Start server_1 and shut down server_3.
  7. All items are still listed in the shopping cart, now being processed by server_1.
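
The final check in the sequence above can be automated. The sketch below only covers the verification step: cart_contains_all is a hypothetical helper that reads a cart page on standard input and confirms that every expected item ID appears in it. The URL and item IDs in the usage comment are made up for illustration and are not taken from our deployment.

```shell
#!/bin/sh
# cart_contains_all (hypothetical helper): read a cart page on stdin and
# succeed only if every item ID passed as an argument appears in it.
# Matching is a plain substring test, so choose IDs that are not prefixes
# of one another.
cart_contains_all() {
    page=$(cat)
    for item in "$@"; do
        case "$page" in
            *"$item"*) ;;    # item is still in the cart
            *) echo "missing from cart: $item" >&2
               return 1 ;;
        esac
    done
}

# Example (hypothetical URL and item IDs; the cookie jar keeps the same session
# across requests made before and after each shutdown):
#   curl -s -b cookies.txt -c cookies.txt "http://www.example.com/jpetstore/cart" \
#     | cart_contains_all EST-4 EST-6
```

Running the same check after each shutdown, with the same cookie jar, shows whether the session survived the move to another Tomcat instance.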

This proves, for us, that a scalable, reliable platform can be built on EC2 at a very low cost. In fact, the infrastructure that we just put together could be rolled into production with no associated licensing costs. Terracotta Sessions is distributed with a license that allows you to cluster session state in up to four Tomcat instances for free. (This is actually incorrect, as Ari mentions in one of the comments: Terracotta is now OSS, and as such is free for any number of nodes, but users can pay Terracotta for 7×24 support.)


I think we can cross the topics Scalability and Reliability off our todo list. Next up are Security and Integration. Topics like tunnelling, VPNs, VLANs and DMZs all pop up their heads, and right now, we don’t have any firm answers on these. The data to be processed needs to reside somewhere, and we have discussed the possibility of creating a localized database instance on EC2 and then synchronizing the data with the primary data in the blue network. We have a lot of work to do before we get to this stage, and that will be the topic of the next blog entry.

Jay

You may also be interested in:

Introduction and Performance Comparison of an EC2 instance and a 2.0 Ghz Dual Core Centrino Laptop

and

Setting up a Gnome Desktop Environment on EC2 and Access Remotely Using FreeNx

  1. #1 by Orion Letizi on February 15, 2007 - 4:15 pm

    I just published a blog entry about using Terracotta on EC2. It’s interesting that you had the same idea and published it in the same day.

  2. #2 by Orion Letizi on February 15, 2007 - 4:16 pm

    er… week.

  3. #3 by jay on February 15, 2007 - 4:27 pm

    Ah, great minds think alike ;-)
    Like you, I am really, really impressed by EC2 and when we got a fully clustered and load balanced app up and running, we were like, wow, this thing has real potential!
    Kudos to you guys for making Terracotta available. It is a seriously cool product, and without it I think I would have given up long ago on the clustered/load balanced idea.
    Your DSO Shared Work Queue has given me some fresh ideas on how EC2 might be used. I thought of something similar but using a different job distribution mechanism, but it looks like DSO is worth further consideration.
    Keep up the good work.

  4. #4 by ARI ZILKA on February 15, 2007 - 4:38 pm

    FYI,

    Terracotta _used_ to be free for 4 nodes. It is now OSS, which means it is free for any number of nodes. People who want 7×24 pay us a yearly fee but we take our .org support offering very seriously.

    Enjoy,

    –Ari

  5. #5 by jay on February 15, 2007 - 4:47 pm

    That’s great news! Thanks for clearing that up. I’ll amend the blog to reflect this.

    Cheers,
    Jay

  6. #6 by Gokul.S.Kartha on February 21, 2007 - 4:13 am

    Hi
    Looks good, I was hunting for a tool like this to get my Tomcat cluster up on EC2…

    Regards
    Gokul.S.Kartha
    Software Engineer
    Device Driven(India)

  7. #7 by Anoop on February 23, 2007 - 9:15 am

    Hi,
    That’s a very neat tool. But could you explain how we could use one of the Terracotta products to cluster the 2nd level cache of Hibernate using OSCache in EC2? I was able to do it without any trouble in the company network. Unfortunately EC2 doesn’t support multicast (using JGroups), which is used by OSCache to cluster the cache. I have an alternative to use JMS instead of JGroups, but was wondering if Terracotta would solve this problem.

    Regards,
    Anoop

  8. #8 by Jay on February 23, 2007 - 10:04 am

    Hi Anoop,

    I think what you are looking for here is Terracotta DSO. You will need to get your hands on the OSCache source code to make this work properly and to be able to set root objects, distributed methods, locks and transient fields and their initialization properties. A good starting point might be to read Kirk Pepperdine’s (of Java Performance Tuning fame) white paper on DSO. He actually does a brief comparison with JMS.

    This is something that is on my radar to play around with in the coming weeks. We also use Hibernate extensively, and for now, are using OSCache as the 2nd level cache. I have a good enough feeling that we will be able to get it to work, and when I do, I will blog about the experience.

    Keep me posted on your progress in getting this working.

    Jay

  9. #9 by Anoop on February 27, 2007 - 11:43 am

    Sure Jay, but I hope you would be putting it onto your blog soon for all the wonderful Java community out there.

    Anoop,
    Technical Project Manager,
    Device Driven (India),
    http://www.devicedriven.com

  10. #10 by Jay on February 28, 2007 - 4:03 pm

    I had started to look into this when two things became evident:
    1. Have a look at Terracotta’s Common Use Cases. At the very bottom of the page, as TC say themselves, ‘Terracotta is currently in the lab with Hibernate and will pick up our heads and show the community what we are thinking once we have an alpha.’ So we may have to wait a while longer for an update on this. Maybe Ari could give some more information as to when Hibernate might be supported.
    2. When looking through the OSCache source code, the AbstractConcurrentReadCache (foundation of their caching mechanism) extends java.util.AbstractMap. This class is not supported by Terracotta. Go to the TC unsupported classes web page for more information.
    I guess this leaves the topic of Hibernate caching in the capable hands of the Terracotta guys.
    Maybe something like Tangosol Coherence might be worth a look. Has anyone had any experience with this?

    Jay

  11. #11 by kenwimer on March 13, 2007 - 2:42 pm

    Good article. I’ll have to give Terracotta a go if EC2 opens up their beta again.

  12. #12 by Steve on April 5, 2007 - 11:22 pm

    Good stuff!

    A version of EhCache clustered with a config module is in trunk. I’ve also seen a replacement for JBoss Cache clustered with Terracotta that performs really well. I suspect it would be quite easy to cluster OSCache with Terracotta as well. If one needs help with this kind of stuff just hit the forums, mailing lists, and/or IRC channels! Clustering of Hibernate proxies is in dev right now and should be in trunk in the next few weeks.

    Best part about being open source is how fast products move along and get integrated.

    Steve

  13. #13 by Jay on April 10, 2007 - 3:33 pm

    Hi Steve,

    When you say ‘A version of EhCache clustered with a config module is in trunk’, what exactly do you mean? I haven’t heard the term ‘in trunk’ before, so I’m not quite sure what you mean by it. Is there a link you could provide us with which might provide some more information?

    As for getting OSCache working, I looked at this before, but it looks like OSCache is backed by one of the Terracotta unsupported classes – ‘……the AbstractConcurrentReadCache (foundation of their caching mechanism) extends java.util.AbstractMap. This class is not supported by Terracotta.’

    Do you think this is no longer a limiting factor in getting OSCache up and running with Terracotta?

    Jay

  14. #14 by Jay on April 10, 2007 - 3:51 pm

    Just did a little checking myself, and looking at the latest nightly build of Terracotta 2.3, in the release notes you can see the following:

    Additional java.util Classes Supported

    * CDV-51
    o java.util.AbstractList
    o java.util.AbstractMap

    So with AbstractMap now being supported, I think I’ll pick up where I left off on getting OSCache working with Terracotta. :-)

    Also, the release note indicates support for IBatis integration:

    iBATIS

    * CDV-44 Cluster iBATIS Generated POJOs from O/R mapping engine

    Great work guys! Keep it up.

    I’ll post here as soon as I get something working on the OSCache and Terracotta side of things.

    Jay

  15. #15 by Taylor on June 1, 2007 - 1:20 am

    Just wanted to leave a quick note to clarify – “trunk” is our development line, so you can download from our download page – http://www.terracotta.org/confluence/display/orgsite/Download by selecting the “trunk” tab.

    More details on EHCache support are here – http://www.terracotta.org/confluence/display/integrations/EHCache

    Also, OSCache has been clustered by us for a long time, but we have never had any effort to wrap it up into a Config Module ala EHCache. A forum user had success with OSCache (and describes how to get it working) here – http://forums.terracotta.org/forums/posts/list/197.page

  16. #16 by Jay on June 5, 2007 - 9:40 am

    Thanks for clarifying this Taylor and for providing more information on EHCache and OSCache integration with Terracotta. Once I get a bit of free time, I will certainly be looking at how we can leverage these cool technologies.

    Keep up the good work!

    Jay

  17. #17 by Paul M on November 6, 2007 - 10:49 am

    Just a followup to the comment about JGroups not working on EC2. JGroups has got TCP as well, which after some false starts I was able to get working on EC2. I am yet to see how different the performance is on the new larger instances.
    The network between nodes is the bottleneck on EC2.

    http://blog.vmdatamine.com/2007/11/pentaho-cluster-installing-jgroups-on.html

    Have Fun
    Paul

  18. #18 by Flora Bringard on August 25, 2011 - 1:14 pm

    Really appreciate you sharing this blog.Really looking forward to read more. Awesome.

