Running a CouchDB 2 Cluster in Production on AWS with Docker

Things are heating up in the CouchDB universe now that CouchDB 2 is an out-of-the-box multi-master database that can scale to store a lot of data!

Unfortunately, there is still a bit of a shortage in documentation when it comes to how to use CouchDB 2 in production. The point of this tutorial is to take you step by step through the process of setting up a CouchDB cluster in production using AWS and Docker. We’ve used a similar setup for Quizster, a digital dropbox and grading system, and it is working great!

The setup below uses open source software, so it can easily be adapted to work on the Google Cloud Platform, Azure or any other hosting provider, i.e. there is no vendor lock-in. Moreover, because we are using open source software, you can also set up a local environment to develop against! (VirtualBox and Vagrant are great for this.)

Why are we going to use Docker?

Keeping up to date with the latest version of a database can be a real drag. One of the latest trends is to just stand up a new server and migrate your data over each time you need to upgrade. In some cases, this is the best option, but by using Docker, we also have the option of just issuing a docker pull when a new CouchDB Docker image is released and then recreating the container. This way, we don’t need to worry about whether our distro has the latest CouchDB binary and don’t have to fight our way out of dependency hell. Moreover, we can easily stand up a new server, install Docker on this server and then bam, run a Docker image for CouchDB! Docker also has some nice built-in functionality for handling restarts when your servers are rebooted or CouchDB just crashes.

Our initial design was pretty ambitious and used Docker Swarm with AWS’s network file system, called EFS. The advantage of this design was that you could stand up a cluster of Docker Swarm nodes and then just use docker service scale to add more CouchDB nodes. The deal breaker, however, was that running CouchDB on top of EFS made the database over 10 times slower! In addition, Docker Swarm doesn’t appear to allow routing to a swarm node based on task slot. So, we decided to drop Docker Swarm in favor of a design where our CouchDB images are statically bound to specific servers. (Managing persistent storage with Docker Swarm is a known issue, and nothing has really emerged yet to solve this problem.)

Here is what we are going to do:

Create two EC2 instances on AWS, both running Docker. Each node will be located in a different availability zone (physical location).
Run an instance of the CouchDB image on each EC2 instance
Run a simple script to connect the CouchDB nodes
Use a load balancer to distribute traffic to each node according to load and availability. The load balancer will also be used to serve database traffic over SSL.

Note: AWS has a free tier, but it isn’t going to cover all the costs incurred by following the steps in this tutorial. Fortunately, AWS charges by the hour so you can easily follow this tutorial and then destroy all the pieces without incurring much of a cost. If you were to continue to use this setup in one of the cheaper regions, e.g. in the US West region, you’d be looking at a monthly bill of about $26 ($16 for the load balancer + $10 for the EC2 servers). This is pretty darn good for a production ready 2-node CouchDB cluster!

I’ll assume you have little to no AWS experience. If this assumption is wrong, then please feel free to skip around.

Step 1 — Create an AWS account

Create a free AWS account

Step 2 — Import Your Public SSH Key

Overview: like most modern hosting providers, AWS encourages users to connect to their servers via SSH keys instead of using passwords as passwords are a lot easier to crack.

Search for the EC2 service

Select Key Pairs

Click Import Key Pair. You’ll then need to paste in your public SSH key and click Import. On Mac/Linux based systems, this text is found in ~/.ssh/id_rsa.pub
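If you do not have a key pair yet, the following is a minimal sketch of how you might create one and print the public half for pasting into the Import Key Pair dialog. The function name ensure_ssh_key is hypothetical, and the 4096-bit RSA key type is an assumption rather than anything AWS requires.

```shell
#!/bin/sh
# Sketch: print the public key, generating a key pair first if none exists.
# ensure_ssh_key is a hypothetical helper name, not part of any tool.

ensure_ssh_key() {
  key="${1:-$HOME/.ssh/id_rsa}"   # private key path; the public key is "$key.pub"
  if [ ! -f "${key}.pub" ]; then
    mkdir -p "$(dirname "$key")"
    # ssh-keygen prompts for a passphrase; its status messages go to stderr
    # so that only the public key ends up on standard output.
    ssh-keygen -t rsa -b 4096 -f "$key" >&2
  fi
  cat "${key}.pub"
}
```

Calling ensure_ssh_key with no arguments uses ~/.ssh/id_rsa, the path mentioned above, and leaves an existing key untouched.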

Step 3 — Create Security Groups

Overview: security groups allow your servers to communicate with each other in a private cloud while exposing specific ports to the world. We are going to create 2 security groups as this configuration will give us a lot of flexibility to make changes in the future.

From the EC2 dashboard, click Security Groups

Click Create Security Group

Enter a name and description of ssh and specify an inbound rule on port 22 from anywhere.

Adding this rule simplifies our setup, but it exposes port 22 to the whole internet, so any machine can attempt to SSH into our servers (it would still need our private SSH key to get in). Therefore, after you have completed this tutorial, you should remove the port 22 rule and set up a VPN instead.

Repeat the steps above to create a new security group, except call this new group couchdb-load-balancer and create a rule to allow inbound connections on port 443 from anywhere.

When you are done, you should have 3 security groups: default (created by AWS), ssh and couchdb-load-balancer.
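The same two groups could also be created from a terminal. The sketch below only prints the AWS CLI commands so that you can review them before running anything. It assumes the AWS CLI is installed and configured and that you are working in the default VPC (which is what allows --group-name to be used); the helper name security_group_commands is hypothetical.

```shell
#!/bin/sh
# Sketch: print AWS CLI commands that create a security group and open
# one inbound TCP port to the world. Nothing is executed here.

security_group_commands() {
  name="$1"   # group name, also used as the description
  port="$2"   # inbound TCP port to open
  echo "aws ec2 create-security-group --group-name ${name} --description ${name}"
  echo "aws ec2 authorize-security-group-ingress --group-name ${name} --protocol tcp --port ${port} --cidr 0.0.0.0/0"
}

security_group_commands ssh 22
security_group_commands couchdb-load-balancer 443
```

Piping the output to sh would run the commands; reviewing them first is the safer habit, especially for the rule that opens port 22.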

Step 4 — Create The 1st EC2 Instance

Return to the EC2 Dashboard and then click Launch Instance

Select Ubuntu (you can of course select almost any other OS that runs docker, but this tutorial is tailored for Ubuntu)

Select t2.nano and click Review and Launch

On the next screen, click Edit security groups

Select the ssh and default security groups and click Review and Launch

Then click Launch

Choose the key pair that you imported above and click Launch Instances

Click View Instances. Select the instance and make a note of the Public DNS and Private IP. We’ll refer to this Public DNS as DB1-PUBLIC-DNS and this Private IP as DB1-PRIVATE-IP.

Note: if you ever stop and then start this instance, the Public DNS will change.

Step 5 — Install Docker and Run the CouchDB Container

SSH into the EC2 instance

$ ssh ubuntu@DB1-PUBLIC-DNS

Download and run scripts to configure Ubuntu and Docker

$ git clone https://github.com/redgeoff/docker-ce-vagrant
$ cd docker-ce-vagrant
$ sudo ./ubuntu.sh # Select "keep the local version ... "
$ sudo ./docker.sh

Create a directory for hosting your DB files

$ mkdir /home/ubuntu/common

Run a CouchDB Docker Container and make sure to replace DB1-PRIVATE-IP accordingly.

Notes:

Docker only has to download the image once and then will just run the container on all subsequent starts/restarts.
The --restart always parameter ensures that your CouchDB node will automatically restart if it crashes or when the server is rebooted.
All the nodes in your cluster must use the same values. The hashed password value above will result in the password admin. You can use the [couch-hash-pwd](https://github.com/redgeoff/couch-hash-pwd) utility to generate this hash. For example, if your password is mypassword, you can use couch-hash-pwd -p mypassword
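As a rough illustration of the notes above, here is a sketch of what a run command for one node might look like. It is not the original command: it assumes the official couchdb image (the one pulled in Step 12) and the settings named in Step 7. The 4369 and 9100 port mappings, the /opt/couchdb/data mount target, the NODENAME variable, and the mysecret and mycookie values are assumptions, and HASHED-PASSWORD is a placeholder for the hash generated by couch-hash-pwd. The script only prints the command.

```shell
#!/bin/sh
# Sketch: print (not run) a docker run command for one CouchDB node.
# couchdb_run_command is a hypothetical helper; mysecret and mycookie are
# placeholder values that must be identical on every node.

couchdb_run_command() {
  private_ip="$1"        # e.g. DB1-PRIVATE-IP
  hashed_password="$2"   # e.g. the output of couch-hash-pwd
  cat <<EOF
sudo docker run -d --name couchdb \\
  --restart always \\
  -p 5984:5984 -p 5986:5986 -p 4369:4369 -p 9100:9100 \\
  -v /home/ubuntu/common/data:/opt/couchdb/data \\
  -e COUCHDB_USER='admin' \\
  -e COUCHDB_PASSWORD='${hashed_password}' \\
  -e COUCHDB_SECRET='mysecret' \\
  -e NODENAME='${private_ip}' \\
  couchdb -setcookie mycookie
EOF
}

couchdb_run_command "DB1-PRIVATE-IP" "HASHED-PASSWORD"
```

Printing the command first makes it easy to compare the two nodes side by side: only the NODENAME value should differ between them.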

Enable CORS so that your application can communicate with the database from another domain/subdomain.

$ curl -sL https://deb.nodesource.com/setup_8.x | sudo -E bash -
$ sudo apt-get install -y nodejs build-essential
$ sudo npm install npm -g
$ sudo npm install -g add-cors-to-couchdb
$ add-cors-to-couchdb http://localhost:5984 -u admin -p admin
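One way to sanity-check the CORS step is to send a request with an Origin header and look for an Access-Control-Allow-Origin header in the response. The sketch below assumes that CouchDB answers such a request with that header once CORS is enabled; has_cors_header is a hypothetical helper and http://example.com is just a sample origin.

```shell
#!/bin/sh
# Sketch: succeed if the HTTP response headers on standard input include
# an Access-Control-Allow-Origin header (header names are case-insensitive).

has_cors_header() {
  grep -qi '^access-control-allow-origin:'
}

# Demonstration with canned headers; no network access is needed.
printf 'HTTP/1.1 200 OK\r\nAccess-Control-Allow-Origin: http://example.com\r\n\r\n' \
  | has_cors_header && echo "CORS header present"
```

Against a live node you might run curl -s -i -H "Origin: http://example.com" http://localhost:5984/ | has_cors_header and check the exit status.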

Step 6 — Create Another EC2 Instance

Overview: we are now going to create another EC2 instance and then run another CouchDB docker container. Most of the steps are the same as before. (An alternative route, that isn’t covered by this tutorial, is to create an Amazon Machine Image (AMI) of the 1st EC2 instance and then use this AMI to create other instances — this is a good option if you are going to be spinning up many nodes).

Return to the EC2 Dashboard and select Instances

Select the 1st instance and then select Launch More Like This

Click the Configure Instance tab at the top of the page and be sure to select a different subnet/zone. Why? Well, we want our two CouchDB nodes to be located in different physical locations, also known as Availability Zones in the AWS world. This way, if there is something like a natural disaster in one zone, we won’t lose any data as our other node will remain intact. (Note: AWS works its magic to make sure that it is super fast to transfer data between different availability zones, but the data transfer between regions is a lot slower. Therefore, you should not attempt to run a cluster of nodes across different AWS regions).

Click Review and Launch, then Launch, select your SSH key and click Launch Instances.

Make a note of the Public DNS and Private IP of this new instance and repeat Step 5 to update Ubuntu, install docker and run the CouchDB container. In the docker run command, be sure to use the Private IP of your 2nd EC2 instance.

Step 7 — Create the Cluster

SSH into either EC2 instance and run the following commands. Be sure to replace DB1-PRIVATE-IP and DB2-PRIVATE-IP accordingly. This script connects the 2 nodes and creates system databases.

$ git clone https://gist.github.com/redgeoff/5099f46ae63acbd8da1137e2ed436a7c create-cluster
$ cd create-cluster
$ chmod +x ./create-cluster.sh
$ ./create-cluster.sh admin admin 5984 5986 "DB1-PRIVATE-IP DB2-PRIVATE-IP"
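To give a feel for what connecting the nodes and creating the system databases involves, here is a sketch that prints the kind of curl calls a script like this might make on CouchDB 2. It is not the contents of the gist. It assumes each node is named couchdb@ followed by its private IP, that nodes are registered through the node-local port (5986) of the first node, and that the system databases are _users, _replicator and _global_changes. The function name cluster_commands is hypothetical.

```shell
#!/bin/sh
# Sketch: print the curl calls that join nodes to the first node and create
# the system databases. Arguments mirror the create-cluster.sh call above:
# user, password, clustered port, node-local port, space-separated IP list.

cluster_commands() {
  user="$1"; password="$2"; port="$3"; local_port="$4"
  first=""
  for ip in $5; do
    if [ -z "$first" ]; then
      first="$ip"   # the first IP acts as the coordinating node
      continue
    fi
    # Register each remaining node with the first node.
    echo "curl -X PUT http://${user}:${password}@${first}:${local_port}/_nodes/couchdb@${ip} -d {}"
  done
  # Create the system databases through the clustered port.
  for db in _users _replicator _global_changes; do
    echo "curl -X PUT http://${user}:${password}@${first}:${port}/${db}"
  done
}

cluster_commands admin admin 5984 5986 "DB1-PRIVATE-IP DB2-PRIVATE-IP"
```

With two nodes this prints one registration call followed by three database creation calls, which is a handy checklist when troubleshooting a cluster that did not form.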

You can then use curl http://admin:admin@localhost:5984/_membership to ensure that your cluster has been configured correctly. In the all_nodes entry, you should see both your values for DB1-PRIVATE-IP and DB2-PRIVATE-IP. If you don’t, double check the parameters in your docker run command. Note: COUCHDB_USER, COUCHDB_PASSWORD, COUCHDB_SECRET and the value used after setcookie must be the same on every node. See Node Management for more info on how to troubleshoot the cluster.
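The membership check can also be scripted. The following sketch reads a _membership response from standard input and reports whether every expected IP shows up in all_nodes. It assumes node names of the form name@IP and the compact JSON that CouchDB returns; check_membership is a hypothetical helper, and the sample IPs are made up.

```shell
#!/bin/sh
# Sketch: verify that each expected IP appears in the all_nodes list of a
# _membership response read from standard input.

check_membership() {
  # Keep only the all_nodes array so cluster_nodes cannot hide a missing node.
  all_nodes=$(tr -d '\n' | sed -n 's/.*"all_nodes" *: *\[\([^]]*\)].*/\1/p')
  for ip in "$@"; do
    case "$all_nodes" in
      *"@${ip}\""*) ;;                          # node is connected
      *) echo "missing: ${ip}"; return 1 ;;
    esac
  done
  echo "ok"
}

# Demonstration with a canned response; no running cluster is needed.
sample='{"all_nodes":["couchdb@10.0.0.1","couchdb@10.0.0.2"],"cluster_nodes":["couchdb@10.0.0.1","couchdb@10.0.0.2"]}'
printf '%s' "$sample" | check_membership 10.0.0.1 10.0.0.2
```

On a real node you might run curl -s http://admin:admin@localhost:5984/_membership | check_membership DB1-PRIVATE-IP DB2-PRIVATE-IP. A node that is listed in cluster_nodes but absent from all_nodes is known to the cluster but not currently connected.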

Step 8 — Import an SSL Certificate

I highly recommend that you buy an SSL certificate if you do not already have one as transferring database data over an insecure connection just isn’t going to cut it in production. If you don’t have an SSL certificate and wish to purchase one, there is a great deal for $42/yr for the AlphaSSL Wildcard Certificate. If you wish to proceed without SSL, skip this step.

Click on the cube in the top-left corner of the page and search for the Certificate Manager

Click Getting Started

Click Import Certificate

Enter the certificate details, click Review and Import and then click Import.

Step 9 — Set Up a Load Balancer

On the EC2 Dashboard, select Load Balancers.

Click Create Load Balancer

Select Application Load Balancer

Specify HTTPS and port 443. If you wish to proceed without SSL (not recommended) then you can use HTTP and port 80.

Select all the availability zones and click Next: Configure Security Settings.

Choose an existing certificate and then click Next: Configure Security Groups.

Select the couchdb-load-balancer and default security groups and then click Next: Configure Routing.

Configure the routing and click Next: Register Targets

Select both your EC2 instances and click Add to registered.

Then click Create.

Step 10 — Configure the DNS

Overview: we are going to set up DNS routing via AWS’s awesome Route 53 service as it can dynamically map to our load balancer.

Click on the cube in the top-left corner and search for Route 53

Click Get started now

Click Create Hosted Zone

Enter the hosted zone details

Check the Alias box, click on the Alias Target and select your load balancer.

Make a note of the name servers listed in the NS record of your hosted zone.

Visit the domain registrar with which you have registered your domain name, e.g. GoDaddy, Google Domains, AWS, etc., and point your domain to these name servers. You’ll probably have to wait a few minutes until the DNS switches over.

Step 11 — Relax

Spin up Fauxton by visiting https://db.mydomain.com/_utils and log in with admin/admin. It’s time to relax!

(Note: if the DNS is slow to propagate, you can access your database via the Public DNS for your load balancer, e.g. https://LOAD-BALANCER-PUBLIC-DNS/_utils. Just click through the SSL warning displayed by your browser)

Step 12 — Update When A New Version of CouchDB Is Available

One of the coolest things about this setup is that you can update to the latest version of CouchDB just by running the following on all your boxes:

$ sudo docker pull couchdb
$ sudo docker rm couchdb --force
$ sudo docker run -d --name couchdb ... # See docker run above

And, this can be done one node at a time, because the CouchDB API maintains backwards compatibility. Of course, having a backup is always a best practice in case something unexpected happens.
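When upgrading one node at a time, it can help to confirm that a node reports the new version before moving on to the next one. The sketch below extracts the version field from the welcome document that CouchDB serves at its root URL. The helper name couchdb_version is hypothetical, and the sample response is illustrative rather than captured from a real server.

```shell
#!/bin/sh
# Sketch: print the "version" field of a CouchDB welcome document read from
# standard input. Assumes the field appears once in the response.

couchdb_version() {
  sed -n 's/.*"version" *: *"\([^"]*\)".*/\1/p'
}

# Demonstration with a canned response; no running server is needed.
printf '%s\n' '{"couchdb":"Welcome","version":"2.1.1","vendor":{"name":"The Apache Software Foundation"}}' \
  | couchdb_version
# prints 2.1.1
```

On a node you might run curl -s http://localhost:5984/ | couchdb_version right after the new container starts.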

If you enjoyed this tutorial, please like it and share it. And, if you have any feedback, please leave it below.

About the Author

Geoff Cox is the creator of MSON, a new declarative programming language that can be used to generate an app from JSON. He’s been self-employed for the greater part of the last 15 years and loves taking on ambitious, yet wife-maddening, projects like creating a database and distributed data syncing system. You can reach him @redgeoff7 or at github.