Following on from our last blog entry on Amazon’s [tag]Elastic Compute Cloud[/tag] ([tag]EC2[/tag]) platform, in this entry we are going to explore two of the remaining topics on our original five-point roadmap. So far we have covered:
Next up, we want to tackle what are probably the two most important topics when it comes to making EC2 a viable platform for commercial business:
Much of the data that we deal with on a day-to-day basis is of a sensitive nature. What counts as sensitive data depends on your business, but I think there can be no arguing that protected health information and financial data are of an extremely sensitive nature.
Persisting data of this nature to a file, a database or any other persistent medium inside the Amazon network was never really an option for us. Storing this type of data outside of our corporate LAN, or that of one of our partners, was never going to make the legal folks all that happy, and it caused some uncomfortable shuffling among those who were aware of the potential risks.
Amazon have pretty much completely indemnified themselves from any responsibility for any breach of the security provisions that they have in place for EC2/S3. With that in mind, I just don’t know if we can ‘trust’ Amazon just yet. I have trust in quotes because I don’t want it to be taken in the literal sense of the word. We trust Amazon as much as we trust any third party that we deal with. In fact, who doesn’t trust Amazon? The number of parcels arriving here daily is a clear indicator of the level of trust that the general public have in Amazon. The difference in this scenario is that Amazon do not seem to have a clear duty of care to our sensitive data. Therein lies the problem. I’m not a legal expert; I read license agreements when necessary and happily bow to the expertise of those of a legal frame of mind. Indeed, if someone would like to comment on the position of Amazon Web Services and our sensitive data, I would be very much obliged.
But I digress. Let’s get back to the topic at hand: security and integration of EC2 with our internal data and services. Once we focused on the problem, the solution became obvious. It was a technology that we use daily to provide secure communication with remote processes: [tag]SSH Tunneling[/tag], or Port Forwarding as it is sometimes known. If we could tunnel into our corporate LAN, we would be able to transfer what would normally be unsecured TCP traffic over a secured channel.
The diagram below will give a better idea of what we were proposing:
For obvious reasons, we could not allow direct access from the EC2 network into our LAN. We can see from the diagram that we employ an [tag]OpenBSD[/tag] proxy which simply brokers all incoming [tag]SSH[/tag] tunnels, through our internal firewall, to an OpenBSD gateway. From this gateway, we can port forward to pretty much any service we choose. OpenBSD was chosen for its secure-by-default nature, i.e. an administrator does not need to lock down services, and for its excellent reputation when it comes to security.
Of course, when we first came up with this idea, we wanted to ensure that SSH best practices were followed at all times. Let’s walk through what was put in place:
- Each EC2 instance runs within a security group. Each group has a dedicated firewall. A white-list was created that allowed only specific access to the instances over the standard SSH port 22.
- A rule was added to the external DSI firewall to allow access to our proxy from the external EC2 IP address.
- A rule was added to the internal DSI firewall to allow access to our gateway server from the proxy.
- Non-root user accounts were created on both the proxy and gateway servers.
- A public/private key pair was created for each user.
- A passphrase was set on each key pair for increased security.
- A keychain was implemented to manage the ssh-agent process.
- Root logins were disabled.
- SSH protocol version 2 (SSH2) is used in preference to the original SSH1 protocol.
If you do not implement public-key authentication, an SSH connection will fall back to password authentication when it is created. This would make scripting the tunnel creation difficult, because something would have to supply the password each time.
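As a rough sketch of how parts of the checklist above might translate into commands and configuration, consider the shell functions below. They are not our actual setup: the function names, the key file path and the key comment are hypothetical, and the `sshd_config` directives shown are simply the standard OpenSSH ones that correspond to "no root logins", "SSH2 only" and "public-key authentication". Recent OpenSSH releases speak only protocol 2 and treat the `Protocol` directive as deprecated, so that line matters only on older servers.

```shell
#!/bin/sh
# Sketch only: all names and paths are hypothetical, not taken from our servers.

# Print sshd_config directives matching the checklist, for the proxy and
# gateway servers. Review before appending to /etc/ssh/sshd_config.
sshd_hardening_fragment() {
    cat <<'EOF'
# Refuse direct root logins
PermitRootLogin no
# SSH protocol 2 only (deprecated and ignored by recent OpenSSH releases)
Protocol 2
# Allow key-based authentication
PubkeyAuthentication yes
EOF
}

# Generate a key pair for one user. ssh-keygen prompts for the passphrase.
#   $1 = key file (for example ~/.ssh/id_rsa_ec2), $2 = key comment
generate_user_key() {
    ssh-keygen -t rsa -b 4096 -f "$1" -C "$2"
}

# Load the key into ssh-agent via keychain so later scripts can reuse it
# without asking for the passphrase again.
#   $1 = key file
load_key_with_keychain() {
    eval "$(keychain --eval "$1")"
}
```

Calling `sshd_hardening_fragment` only prints the directives, so they can be reviewed first. The other two functions prompt for the passphrase, so they are intended to be run once, interactively, before any scripted tunnel is opened.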
Noel Keating, a Senior Developer here in DSI, did some great work in implementing these best practices.
Putting It All Together
With all this in place, we were then able to script the creation of necessary tunnels from EC2 into our LAN. An example of a command to create a tunnel for the purposes of JDBC access to one of our database servers would be:
ssh -T -l jbloggs -g -L 1111:localhost:1111 22.214.171.124 ssh -N -g -l sbloggs -L 1111:anotherdb.decaresystems.ie:1521 126.96.36.199 &
The net effect of this would be that the EC2 instance on which this command was executed would have JDBC access to the database server through its own local port 1111, using a JDBC URL like jdbc:oracle:thin:@localhost:1111:SOMEDB
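To make the two hops in that command easier to see, here is a minimal sketch of how the tunnel creation could be wrapped in a shell function. The function name `open_tunnel`, the `DRY_RUN` switch and the example host names are hypothetical; the ssh flags are the same ones used in the command above.

```shell
#!/bin/sh
# Sketch only: open_tunnel and DRY_RUN are hypothetical names.
#
# Arguments:
#   $1 local port       $2 proxy user    $3 proxy host
#   $4 gateway user     $5 gateway host  $6 target as host:port
#
# Hop 1: EC2 instance -> proxy, forwarding the local port to the same port
#        on the proxy (-T: no pseudo-terminal, -g: allow other hosts to
#        connect to the forwarded port).
# Hop 2: run on the proxy: proxy -> gateway, forwarding that port to the
#        target service (-N: run no remote command, only forward).
open_tunnel() {
    port=$1 proxy_user=$2 proxy_host=$3 gw_user=$4 gw_host=$5 target=$6
    set -- ssh -T -l "$proxy_user" -g -L "$port:localhost:$port" "$proxy_host" \
           ssh -N -g -l "$gw_user" -L "$port:$target" "$gw_host"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Print the command instead of running it
        echo "$*"
    else
        # Run the tunnel in the background, as in the original command
        "$@" &
    fi
}
```

Running `DRY_RUN=1 open_tunnel 1111 jbloggs proxy.example.com sbloggs gateway.example.com db.example.com:1521` prints the command that would be executed, which is a convenient way to check the forwarding chain before opening a real connection.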
What Does This Mean For Us?
The fact that we can access data over JDBC isn’t really all that impressive. But let’s put this in context: we are accessing data on our LAN, over a secure channel, from a platform that offers as much processing power as we could conceivably need.
What does this mean in the long run?
Many processes that are now standard have been put in place to enhance our development process, from continuous builds and performance management to the hosting of test and user acceptance environments. Two major issues consistently pop up when a new project is introduced or an existing project moves through its various phases: a) provisioning of hardware and b) managing multiple processes running on the hardware that we have.
We now have an environment where we do not have to consider rack space, power outlets or air-conditioning. We can provision dedicated hardware when we need to. Gone are the headaches of ‘we need to bring down process A so that we can run process B’, and all this can be done in a very cost-effective manner.
Today, we are running one of our build cycles on EC2, accessing our CVS repository over SSH. These builds are being deployed to application servers also running on the EC2 platform, in a separate security group. We hope to build this out so that one of our application suites is running across multiple EC2 instances, with the necessary routing of requests being taken care of by Apache HTTP Server redirects. One of the downsides of using EC2 is that you are limited to 1.7GB of physical RAM per instance. A full application suite probably requires something in the region of 16GB, so in this case it is simply not possible to run the full suite on one instance.
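To put a rough number on that limit, the arithmetic can be sketched as a small shell function. It assumes that memory is the only constraint and that the suite can be split freely across instances, neither of which is guaranteed in practice; the function name `min_instances` is hypothetical. Sizes are passed in tenths of a gigabyte because shell arithmetic is integer-only.

```shell
#!/bin/sh
# Sketch only: lower bound on instance count from memory alone.
#   $1 = memory required by the suite, in tenths of a GB (16GB  -> 160)
#   $2 = memory per instance, in tenths of a GB          (1.7GB -> 17)
min_instances() {
    # Integer ceiling division: (a + b - 1) / b
    echo $(( ($1 + $2 - 1) / $2 ))
}

min_instances 160 17   # prints 10
```

So a suite needing around 16GB would have to be spread over at least ten 1.7GB instances, before allowing for the operating system and other overhead on each one.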
In the near future, we hope to move more builds to EC2, implement a continuous performance management process and begin hosting more and more test environments. With the availability of this on-demand processing power, we aim to further increase our code quality, reduce server downtime and further enable our teams to do what they do best: write quality software.