
Wednesday, November 19, 2014

Building a Bitnami Tomcat Image using Docker

I am a long-time fan of Bitnami's prepackaged stacks. If you want to, for example, quickly stand up a new Drupal instance, Bitnami allows you to do this - using either a machine image with the stack pre-installed or a binary installer that you can run on the appropriate type of OS.

When I first learned about Docker, I thought of Bitnami and how it seemed a natural fit for them to offer Docker image versions of their stacks. It turns out that they are in the process of doing exactly that. However, at the time of this writing, they don't have these available, so I decided to build my own. What follows is a step by step recipe for taking the Bitnami Tomcat 7 installer and building a Docker image that captures the result of a successful install.


Step 0 - Create a VM and install Docker

I did this in a single step using Digital Ocean's ability to select OS / application combos - in this case Docker 1.3.1 on Ubuntu 14.04 (64 bit). To keep Bitnami's installer from complaining about memory (in Step 4) you are going to need at least a 2 GB VM. If you want to run multiple stacks side-by-side on the same VM, you are going to need at least 4GB.
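If you want to check the memory up front, before the installer fails midway through Step 4, something like the following will do (a minimal sketch using the standard Linux "free" command):

```shell
# Check that the VM has enough memory for the Bitnami installer;
# the unattended install in Step 4 complains below roughly 2 GB.
total_mb=$(free -m | awk '/^Mem:/ {print $2}')
if [ "$total_mb" -lt 2048 ]; then
    echo "Warning: only ${total_mb} MB of RAM; the installer may refuse to run"
else
    echo "OK: ${total_mb} MB of RAM available"
fi
```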

Step 1 - Download the Bitnami Tomcat Installer onto Your VM

The easiest way to do this is use 'wget' on the VM:

root@vm:~# mkdir bitnami; cd bitnami
root@vm:~/bitnami# wget https://bitnami.com/redirect/to/45854/bitnami-tomcatstack-7.0.57-0-linux-x64-installer.run
root@vm:~/bitnami# chmod +x *.run

Note that Bitnami is always updating their downloads so, by the time you read this, the installer above may not be available. Just use the appropriate installer for your OS. Obviously you can also choose to use an earlier or later version of Tomcat.

Also note that I've saved the installer under a new directory (which we will reference in Step 3) and made it executable.

Step 2 - Download/Pull the Base Docker Image

Working with Docker is like baking sourdough bread; you need a little something to start with. I chose to use Docker's base Ubuntu image because (a) I really don't care which OS I'm running Tomcat on, and (b) I've used Bitnami's Tomcat stack on Ubuntu before and never had any problems.

root@vm:~# docker pull ubuntu

You should see a brief flurry of activity ending with:

Status: Downloaded newer image for ubuntu:latest

Step 3 - Start a Container

First I'll show you the command, then I'll explain the options:

root@vm:~# docker run --cap-add=ALL -i -p 80:80 -t -v /root/bitnami:/bitnami ubuntu /bin/bash

--cap-add=ALL: When it starts, Tomcat tries to set some capabilities (i.e. establish the privilege to do one or more "superuser like" things). By default Docker does not allow processes within a container to do this. This option allows processes within the container to set any capability they want. This is a sloppy and dangerous thing to do. I should dig into the Tomcat code and figure out exactly which capabilities it is requesting and grant only those capabilities (see the "principle of least privilege").
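For reference, a tighter version of the run command would grant only the specific capabilities Tomcat needs. I haven't determined what that minimal set is, so treat the list below as a starting guess to be refined, not a verified answer:

```shell
# Hypothetical least-privilege variant of the run command below.  The
# capability list is a guess; trim or extend it based on what the Tomcat
# startup actually fails on.
docker run --cap-add=SETUID --cap-add=SETGID --cap-add=SETPCAP \
    -i -p 80:80 -t -v /root/bitnami:/bitnami ubuntu /bin/bash
```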

-v /root/bitnami:/bitnami: This option bind mounts "/root/bitnami" on the VM to "/bitnami" in the container. This will allow us to access the installer file from inside the container.

-p 80:80: By default the Apache web server listens on port 80. This option maps port 80 of the container to port 80 on our VM. Obviously you can map the container port to any free port on your VM (e.g. 8080 using "-p 8080:80").

-i, -t: These two options connect you to the shell running inside the container.

ubuntu: This option specifies the image to run in the container. In this case it is the default Ubuntu image that we pulled in Step 2.

/bin/bash: This option tells Docker to run a bash shell inside the container.

At this point you should find yourself at a container-level prompt like:

root@d10f70897ce3:/# 


Step 4 - Run the Bitnami Installer

Next we want to run the Tomcat installer to install Apache, Tomcat, and MySQL into our container:

/bitnami/bitnami-tomcatstack-7.0.57-0-linux-x64-installer.run --mode unattended

This command will take a couple of minutes to complete, so be patient. If all goes well you should return to the container-level prompt, where you can poke around a bit to check things out. A "ps -ef" should show you the Apache, MySQL, and Tomcat processes; there should be an "/opt/tomcatstack-7.0.57-0" directory; and so on. You can test whether Apache is up and accessible by browsing to "http://<your VM address>/". You should see the welcome page for the Bitnami Tomcat stack.
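Spelled out, those checks look something like this (run the first two inside the container; the process and directory names are what the 7.0.57-0 installer produces and may differ slightly for other versions):

```shell
# Inside the container: the stack's processes and install directory.
ps -ef | grep -E 'httpd|mysqld|tomcat'
ls /opt/tomcatstack-7.0.57-0

# From the VM: Apache should answer on the mapped port.
curl -s http://localhost/ | head -n 5
```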

Note that the way in which we installed Apache, MySQL, and Tomcat is extremely unsafe. For example, there is no password for the Tomcat manager application. Under this configuration it should only be a matter of minutes before someone installs something unpleasant onto Tomcat. The Bitnami installer supports a number of command-line options for setting the MySQL password, the Tomcat manager password, etc. You can play around with these to get the configuration you want. This is where Docker shines; you can quickly re-run Steps 3 and 4 to experiment with different configurations. One thing to be aware of is that Docker saves containers after you exit them so, to avoid confusion, you should probably "docker rm <container-id>" on any containers you are no longer interested in.
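I won't list the installer's options here, since the installer can do that itself. The experiment loop looks roughly like this (whatever option names "--help" reports are the ones to use; I'm not vouching for any particular flag):

```shell
# See which configuration options (passwords, ports, etc.) this
# installer version supports.
/bitnami/bitnami-tomcatstack-7.0.57-0-linux-x64-installer.run --help

# Between experiments, clean up containers you no longer need.
docker ps -a               # list all containers, including exited ones
docker rm <container-id>   # remove the ones you are done with
```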

Step 5 - Snapshot the Container

Now that you have a container running a configuration of the Tomcat stack that you are happy with, it is time to snapshot that container and create a Docker image. Since we started the Apache, MySQL, and Tomcat processes from the bash shell that we launched on container startup, exiting the shell will cause these processes to terminate. I confess to being somewhat superstitious, however, so I prefer to shut down these processes in the "proper" manner:

root@d10f70897ce3:/# /opt/tomcatstack-7.0.57-0/ctlscript.sh stop

After this completes you can simply exit the bash shell to exit the container and return to your VM-level shell. At this point we can snapshot the container and create a new image using the "docker commit" command like so:

root@vm:~# docker commit -m="Some pithy comment." d10f70897ce3 mybitnami/tomcat:v1

The resulting image should be viewable through the "docker images" command.

Step 6 - Launching the Image

Launching our newly created image is simply a matter of starting a container using that image:

root@vm:~# docker run --cap-add=ALL -d -p 80:80 mybitnami/tomcat:v1 /bin/sh -c "/opt/tomcatstack-7.0.57-0/ctlscript.sh start; tail -F /opt/tomcatstack-7.0.57-0/apache-tomcat/logs/catalina-daemon.out"

This looks a little intimidating, so let's break it down. The "--cap-add=ALL" option was covered in Step 3. We still need this because Tomcat still sets the same capabilities. The "-d" option simply tells Docker to run the container in the background. We've eliminated the "-i" and "-t" options because we don't need to interact directly with the container. The "-p 80:80" option specifies the same port mapping, and we've eliminated the "-v" option because we no longer need to access any host files from the container. What makes this step look complicated is the in-line shell script at the end. What we are telling Docker to do is run the following commands in a shell:

/opt/tomcatstack-7.0.57-0/ctlscript.sh start
tail -F /opt/tomcatstack-7.0.57-0/apache-tomcat/logs/catalina-daemon.out

Docker will run a shell that executes "ctlscript.sh start" thus starting Apache, MySQL, and Tomcat. It will then run the "tail" command on the main Tomcat log file, blocking on additional writes to this file. What this means is that the shell process that is the parent or grandparent of all the Apache, MySQL, and Tomcat processes will continue to run, thus keeping the whole tree of processes alive.

There are a number of ways we can monitor our container at this point. We can view a top-like display of the processes in the container via:

root@vm:~# docker top <container ID>

We can look at the container's STDOUT and STDERR using:

root@vm:~# docker logs <container ID>

Step 7 - Stopping the Container

To stop the container running our Tomcat stack we can send the SIGTERM signal to the root process of the container (our shell running "tail") via:

root@vm:~# docker stop <container ID>

This should cause all of the server processes to shut down cleanly. As I mentioned, I'm a bit superstitious about these things so I would prefer a mechanism that invoked "ctlscript.sh stop" before exiting the container. I've spent enough time investigating to determine that this is a subject for another post.
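For what it's worth, the mechanism I have in mind is a small wrapper script, baked into the image in place of the in-line shell script from Step 6, that traps SIGTERM and runs "ctlscript.sh stop" before exiting. A sketch (the script name and location are my own invention, and I haven't battle-tested this):

```shell
# Hypothetical wrapper to bake into the image and use as the container's
# command.  "docker stop" sends SIGTERM to PID 1; the trap below turns
# that into a clean "ctlscript.sh stop".
cat > /tmp/start-stack.sh <<'EOF'
#!/bin/bash
STACK=/opt/tomcatstack-7.0.57-0

shutdown() {
    "$STACK/ctlscript.sh" stop
    exit 0
}
trap shutdown TERM INT

"$STACK/ctlscript.sh" start
# Block, but stay responsive to signals: a foreground "tail" would delay
# trap handling, so background it and wait on it instead.
tail -F "$STACK/apache-tomcat/logs/catalina-daemon.out" &
wait $!
EOF
chmod +x /tmp/start-stack.sh
```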

Some Questions


Why Not Use an Existing Tomcat Image?
If you are familiar with Docker you are probably aware that there are plenty of existing images that run Tomcat. Why not simply use one of these? Firstly, none of these images (that I am aware of) include an integrated Apache or, more importantly, MySQL. Secondly, I am working with an application that I built using the Bitnami stack and I'm comfortable dinking with this stack. It is less work for me to build an image of my existing system than it is to switch to a new system.

Why Not Use "docker build"?
Steps 3-5 could have been replaced using the "docker build" command and a Dockerfile. However, at the time of this writing, the containers used during the "docker build" command do not allow their processes to request capabilities. A

RUN bitnami-tomcatstack-7.0.57-0-linux-x64-installer.run

command will fail with the following error:

set_caps: failed to set capabilities
check that your kernel supports capabilities
set_caps(CAPS) failed for user 'tomcat'

Service exit with a return value of 4

when Tomcat tries to run for the first time. This issue is being tracked by Docker here: https://github.com/docker/docker/issues/1916.
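For the record, the Dockerfile version of Steps 3-5 would look roughly like this; it is the RUN step that dies with the error above:

```dockerfile
# This build fails (at the time of writing) because the build container
# will not let the installer's Tomcat grant itself capabilities.
FROM ubuntu
ADD bitnami-tomcatstack-7.0.57-0-linux-x64-installer.run /bitnami/
RUN chmod +x /bitnami/bitnami-tomcatstack-7.0.57-0-linux-x64-installer.run && \
    /bitnami/bitnami-tomcatstack-7.0.57-0-linux-x64-installer.run --mode unattended
```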

Why Use Docker At All?
At the beginning of this post I pointed out that Bitnami stacks exist in machine image form for most popular systems. I can go to AWS and, in less time and with less effort, create a new VM that is functionally equivalent to the Docker container that I have created here. Some points:
  • My Bitnami Tomcat stack Docker image is just a building block. Next I'm going to install a webapp on Tomcat, a database on MySQL, etc. Then I'm going to snapshot that. Again, I could do the same with AWS, but I can't run an AMI anywhere besides AWS. I can take my Docker images and run them on anything with a compatible kernel.
  • When saved as a TAR file my docker image is approximately 800 MB. Most VM images are far larger than this. Lighter is faster.
  • Bitnami does a great job with integration but nothing is ever quite exactly the way you want it. The dink-->test-->dink-some-more cycle in Steps 3 and 4 is much faster using containers on an individual VM than using multiple VMs.
  • If, for whatever reason, I wanted to run multiple instances/versions of my stack it would probably be much cheaper to run them side-by-side in separate containers on the same (larger) VM than it would be to run them each in their own (smaller) VMs. This cost difference is even greater if I decide that I need to make my stacks available at a static IP address and/or given DNS name.
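The TAR file mentioned in the second point comes from "docker save"; moving the image to another host is just a save on one side and a load on the other:

```shell
# On the build VM: serialize the image (about 800 MB in my case).
docker save mybitnami/tomcat:v1 > tomcat-v1.tar

# On any host with a compatible kernel and Docker: restore and verify it.
docker load < tomcat-v1.tar
docker images
```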

Friday, March 23, 2012

Cloud Broker Overload



'That's a great deal to make one word mean,' Alice said in a thoughtful tone.

'When I make a word do a lot of work like that,' said Humpty Dumpty, 'I always pay it extra.'
- Through the Looking Glass

“Cloud brokers” are a hot topic, thanks in part to their inclusion in the NIST Cloud Computing Reference Architecture [1]. NIST’s definition derives, in part, from a 2009 Gartner report [2]. As Ben Kepes points out [3], these definitions of cloud broker are at odds with the accepted meanings of the word “broker”. Ben also makes the point that the issue is more fundamental than what names we use to call the various actors in a multi-provider scenario. The article suggests the term “service intermediary” as more descriptive of the kinds of things that companies like enStratus and RightScale actually do – where “service intermediary” is defined as an actor that does service intermediation and/or service aggregation but doesn’t do service arbitration. Although I agree with much of Ben’s article, I think it misses the main problem with the NIST definition.


The Boat Analogy

Suppose I wanted to buy a boat. For various reasons, I decide to use a boat broker. I expect the broker to (among other things) introduce me to the parties selling boats and help me work through the process of buying the boat. The interaction pattern is three-way. The seller, the broker, and I are all aware of each other's existence and expect different things from one another. For example, if the engine seized the day after I bought the boat, it is doubtful that I would hold the broker responsible.

Suppose that, instead of buying a boat, I simply wanted to rent one. Now, instead of seeking out a broker, I would look for a boat charterer. In contrast to my dealings with the broker and the seller, my interactions with the chartering company are two-way. The chartering company may or may not own the boat. I don’t know and, ultimately, I don’t care. All I care about is that the boat is made available for my use over a specific period of time. Any problems with the boat are the responsibility of the chartering company – regardless of who owns the boat.

The main problem with the NIST definition is that it lumps “brokers” and “charterers” together and, in so doing, masks the significant differences in the interactions and expectations of the parties involved.


It’s the Relationships

The first step to unraveling this hairball is to stop focusing on the functional aspects of what (for argument’s sake) I will simply call “the intermediary”. Whether the intermediary simply arbitrates requests amongst (nearly) identical back-end providers or synthesizes an aggregation of different providers to create a new service is not as important as whether or not the consumer does or doesn’t have a contractual relationship with these back-end providers.

Regardless of how many back-end services an intermediary uses and regardless of how imaginatively it might use them, if the consumer doesn’t have a contractual relationship with those back-end providers, their interactions with that intermediary are no different than those of any other cloud provider. While the intermediary may have more fodder for excuses (“our storage provider failed in exactly such a way as to expose a heretofore unknown bug in our billing provider”), an SLA is an SLA and, if the intermediary fails to meet their SLA, the consumer is entitled to whatever compensation is specified in the service contract.

If you squint at the NIST definition you can infer that the distinction it draws between “given services” and services that “are not fixed” is a reference to the visibility (or lack thereof) between the consumer and the back-end services. If this is the case, this distinction needs to be made explicit and unbundled from the definitions of intermediation, aggregation, and arbitrage.


Functional and Business Relationships

Most of the discussion around cloud brokers tends to focus on the functional relationships (i.e. who sends requests to whom and how are the results processed). Above, I point out the importance of the business relationships (i.e. who has contracts with whom). Obviously both sets of relationships are important. What makes multi-party cloud scenarios interesting is that the two sets of relationships are independent of one another. This can lead to a fair number of different scenarios.

Take, for example, the “punch out” scenario found in many enterprise purchase portals. The consumer (an employee) has both business and functional relationships with the intermediary (their employer). At some point there is an SSO exchange and the consumer is redirected from the intermediary to the provider (the supplier’s website). Although the consumer now has a functional relationship with the provider (in that they are sending requests and receiving responses from the supplier’s site) they do not have a business relationship with the provider (i.e. they aren’t asked for their credit card). Behind the scenes, there are both functional and business relationships between the employer and the supplier (the order information is sent back to the portal and the supplier expects to be paid by the employer).

If we confine our considerations to a cloud consumer, a single intermediary, and a single cloud provider, and then further restrict ourselves to only those cases in which the consumer has, at a minimum, a functional relationship with the intermediary and a business relationship with at least one other party, I figure there are 26 possible scenarios (you may want to check me on this). Granted, many of these combinations may not have a workable business case, but here are some discrete examples:

Jamcracker
  • consumer has business and functional relationships with intermediary (Jamcracker)
  • consumer has business and functional relationships with the cloud provider (e.g. WebEx)
  • intermediary and cloud provider have business and functional relationships
SpotCloud
  • consumer has business and functional relationships with intermediary (SpotCloud)
  • consumer has no business or functional relationship with cloud provider
  • intermediary and cloud provider have business and functional relationships
Akamai
  • consumer has functional but no business relationship with intermediary (Akamai)
  • consumer has functional and business relationships with the cloud provider
  • intermediary and cloud provider have business and functional relationships
Again, the danger with calling all these scenarios “cloud broker scenarios” is that you will mask important differences in their characteristics and behavior. This creates both confusion and misunderstanding.


The Taxonomy Challenge

Obviously we can’t simply give each of the possible multi-party scenarios a unique name; there are too many to remember. What we have is the classic problem of taxonomy. The scenarios are distinguished along a number of different axes and it is difficult to tell which axis is “the most important”.

While I don’t have a complete answer to this problem, it seems to me that it makes the most sense to do the “top level split” around the existence or non-existence of any business relationship between the consumer and the back-end provider(s). Although it pains me to admit it, the industry is coalescing around the term “cloud broker” to refer to scenarios in which there is no business relationship between the consumer and the provider (exactly the opposite of how the term is used in the real world). This leaves the term “service intermediary” to refer to those scenarios in which there is a business relationship between the consumer and the cloud provider.

When describing new things it is easy to fall into the trap of wasting time arguing about their names. Regardless of what terms people use, it would be helpful if we consistently used the same, separate names to refer to the top-level cases I outlined above. “Broker” and “intermediary” are as good as any others.


Final Digression

I suspect that the term “cloud broker”, as it is currently used, derives from an older term – “message broker”. This makes sense because “message broker” is misapplied in exactly the same way as “cloud broker”. “Message broker” is commonly used to refer to an architectural pattern in which you use an intermediary to minimize or eliminate the producer’s and consumer’s awareness of one another.


References

[1] NIST SP 500-292, “NIST Cloud Computing Reference Architecture”, http://collaborate.nist.gov/twiki-cloud-computing/pub/CloudComputing/ReferenceArchitectureTaxonomy/NIST_SP_500-292_-_090611.pdf

[2] Gartner, “Gartner Says Cloud Consumers Need Brokerages to Unlock the Potential of Cloud Services”, http://www.gartner.com/it/page.jsp?id=1064712

[3] Diversity, “NIST Decides to Redefine the English Language, Broker != Service Intermediary”, http://www.diversity.net.nz/nist-decides-to-redefine-the-english-language-broker-service-intermediary/2011/09/12/