SSH Tunneling to AWS EC2 and Connecting to DocumentDB with Python

Written by avr27 | Published 2024/03/14

TL;DR: SSH tunneling provides a secure pathway for accessing AWS DocumentDB from outside the VPC. By routing the connection through an EC2 instance, it avoids exposing the database directly to the internet, which enhances data security. This article explains why SSH tunnels are necessary, provides code examples for implementation, and emphasizes the importance of securing remote access to AWS DocumentDB.

Why is an SSH tunnel into AWS EC2 needed?

Before I explain why it's needed, let me share why I had to do it. The answer is simple: to test things locally in our ML codebase.

Now, coming to why it's needed:

Amazon DocumentDB is a managed database service that is designed to be secure. In practice, this means a DocumentDB cluster is deployed privately inside an Amazon Virtual Private Cloud (Amazon VPC). In simple terms, I like to think of a VPC as your own private network inside AWS. DocumentDB can be accessed directly only by EC2 instances or other AWS services running in the same VPC, or from other VPCs that have been granted access (for example, through VPC peering). It is not reachable from the public internet.

SSH tunneling is needed when we want to access DocumentDB resources from outside the cluster's VPC, in this case from our local machine. To do that, you typically go through a bastion host (an EC2 instance in the same VPC) using SSH. This extra layer of security ensures that your database connection is not directly exposed to the internet, reducing the risk of unauthorized access.


What is SSH Tunneling?

SSH tunneling, also known as "port forwarding," is a technique used to secure and encrypt communication between two computer systems over an unsecured network, such as the Internet. It involves creating a secure channel (tunnel) through which data can be transferred between a local and a remote machine.

In simple terms, it works a bit like a VPN, except that it forwards a single port rather than all of your network traffic. In our context:

  • The local machine is the one running your Python script.
  • The remote machine is an EC2 instance in your AWS environment.

The SSH tunnel allows secure communication between your local machine and the EC2 instance, providing a secure pathway for data to travel. Once the tunnel is established, you can use it to connect to DocumentDB securely, as if it were running on your local machine.
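For comparison, the same kind of tunnel can be opened by hand with the OpenSSH client's local port forwarding option (-L [bind_address:]port:host:hostport). The sketch below only assembles that command as an argument list. The helper name build_ssh_tunnel_command is hypothetical, and the host names are the same placeholders used in the configuration further down. Running the list with subprocess.run would keep the tunnel open until you stop it.

```python
from typing import List


def build_ssh_tunnel_command(
    ssh_host: str,
    ssh_user: str,
    key_path: str,
    remote_host: str,
    remote_port: int,
    local_port: int,
) -> List[str]:
    """Build an OpenSSH command that forwards 127.0.0.1:local_port
    through ssh_host to remote_host:remote_port (local port forwarding)."""
    return [
        "ssh",
        "-i", key_path,  # private key used to authenticate to the EC2 instance
        "-N",            # don't run a remote command, only forward the port
        "-L", f"127.0.0.1:{local_port}:{remote_host}:{remote_port}",
        f"{ssh_user}@{ssh_host}",
    ]


if __name__ == "__main__":
    cmd = build_ssh_tunnel_command(
        ssh_host="ec2-x-x-x-x.region.compute.amazonaws.com",  # placeholder
        ssh_user="ec2-user",
        key_path="ec2-host-key-pair.pem",                     # placeholder
        remote_host="replica_db_name.*.*.docdb.amazonaws.com",  # placeholder
        remote_port=27017,
        local_port=3000,
    )
    print(" ".join(cmd))
    # To actually open the tunnel: subprocess.run(cmd, check=True)
```

Passing an argument list (rather than one shell string) means no shell ever sees the placeholder wildcards in the host name.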


Code

Before diving into the code, you will need a few important constants. I suggest storing them as environment variables so that secrets such as the database password stay out of your source code.

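import os

# SSH tunnel configuration (read from environment variables)
SSH_HOST = os.environ["SSH_HOST"]  # e.g. ec2-x-x-x-x.region.compute.amazonaws.com
SSH_USER = os.environ.get("SSH_USER", "ec2-user")
SSH_KEY_PATH = os.environ["SSH_KEY_PATH"]  # path to your ec2-host-key-pair.pem file
LOCAL_BIND_PORT = int(os.environ.get("LOCAL_BIND_PORT", "3000"))  # any free local port

# MongoDB server configuration
MONGO_HOST = os.environ["MONGO_HOST"]  # e.g. replica_db_name.*.*.docdb.amazonaws.com
MONGO_PORT = int(os.environ.get("MONGO_PORT", "27017"))
MONGO_USERNAME = os.environ["MONGO_USERNAME"]
MONGO_PASSWORD = os.environ["MONGO_PASSWORD"]

MONGO_DB_NAME = os.environ["MONGO_DB_NAME"]
MONGO_COLLECTION_NAME = os.environ["MONGO_COLLECTION_NAME"]

# db parameters dict: the client connects to the local end of the tunnel,
# not to MONGO_HOST directly
DB_PARAMS = {
    "host": "127.0.0.1",
    "port": LOCAL_BIND_PORT,
    "username": MONGO_USERNAME,
    "password": MONGO_PASSWORD,
}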

  • SSH_HOST: the public DNS name (or public IP address) of your EC2 instance, which runs in the same VPC as your DocumentDB cluster.
  • SSH_KEY_PATH: the path to your key pair's .pem file. It is used to authenticate your SSH connection to the EC2 instance.

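NOTE: Before running the code, add an inbound rule to your EC2 instance's security group that allows SSH (port 22) from your IP address. The DocumentDB cluster's security group must also allow inbound traffic on the database port (27017 by default) from the EC2 instance.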
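If the security group rule is missing, the tunnel usually fails with a timeout that is easy to misread. A quick pre-flight check is to attempt a plain TCP connection to port 22 first. This is a minimal sketch using only the standard library. can_reach is a hypothetical helper name, and a successful connect only shows that the port is open, not that your key will be accepted.

```python
import socket


def can_reach(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, timed out, host not resolvable, etc.
        return False
```

For example, can_reach(SSH_HOST) returning False before you start the tunnel points at the security group (or a wrong host name) rather than at your Python code.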

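from pymongo import MongoClient
from sshtunnel import SSHTunnelForwarder

# Forward 127.0.0.1:LOCAL_BIND_PORT on this machine, through the EC2
# instance, to MONGO_HOST:MONGO_PORT inside the VPC
tunnel = SSHTunnelForwarder(
    (SSH_HOST, 22),
    ssh_username=SSH_USER,
    ssh_pkey=SSH_KEY_PATH,
    remote_bind_address=(MONGO_HOST, MONGO_PORT),
    local_bind_address=("127.0.0.1", LOCAL_BIND_PORT),
)

# start the tunnel
tunnel.start()
client = None
try:
    # get mongo client; it connects to the local end of the tunnel.
    # directConnection=True stops the driver from discovering the replica
    # set members by their VPC-internal host names, which are not
    # reachable from this machine.
    # If TLS is enabled on the cluster (the DocumentDB default), you will
    # likely also need tls=True, tlsCAFile=<path to the AWS CA bundle> and
    # tlsAllowInvalidHostnames=True, since the certificate won't match 127.0.0.1.
    client = MongoClient(directConnection=True, **DB_PARAMS)

    # do something
    db = client[MONGO_DB_NAME]
    collection = db[MONGO_COLLECTION_NAME]

    some_query = {}  # replace with your own filter; {} matches every document
    documents = list(collection.find(some_query))
    print(documents)
finally:
    # close the client and stop the tunnel, even if a query fails
    if client is not None:
        client.close()
    tunnel.stop()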
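If you would rather pass a single connection string to MongoClient than keyword arguments, the same settings can be expressed as a URI. A minimal sketch follows. build_tunnel_uri is a hypothetical helper. The username and password are percent-escaped with urllib.parse.quote_plus, which PyMongo asks for when credentials contain characters such as @ or :. directConnection is a standard connection-string option; add TLS options only if your cluster uses TLS.

```python
from urllib.parse import quote_plus


def build_tunnel_uri(username: str, password: str, local_port: int,
                     host: str = "127.0.0.1") -> str:
    """Build a MongoDB URI that points at the local end of the SSH tunnel."""
    # Escape credentials so characters like '@', ':' or '/' don't break the URI
    user = quote_plus(username)
    pwd = quote_plus(password)
    # directConnection=true: talk only to this host, no replica set discovery
    return f"mongodb://{user}:{pwd}@{host}:{local_port}/?directConnection=true"


if __name__ == "__main__":
    uri = build_tunnel_uri("your_mongodb_username", "p@ss:word", 3000)
    print(uri)
    # client = MongoClient(uri)  # while the tunnel is running
```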

How does this work?

Here is a simple picture to describe it:

[Figure: A simplified overview of SSH tunneling]

The figure presents a simplified overview of SSH tunneling. The secure connection over the untrusted network is established between an SSH client and an SSH server. This SSH connection is encrypted, protects confidentiality and integrity, and authenticates communicating parties.

The SSH connection is used by the application (our Python code) to reach the application server (the DocumentDB server). With tunneling enabled, the application connects to a port on the local host (here, 127.0.0.1:3000) that the SSH client listens on. The SSH client forwards the application's traffic through its encrypted tunnel to the SSH server (the EC2 instance). The SSH server then opens the connection to the actual application server (DocumentDB), which is usually on the same machine or in the same network as the SSH server; in our case, it is in the same VPC. The traffic that crosses the internet is thus encrypted, and the application only needs to point at the local port instead of the cluster endpoint.
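To make the forwarding step concrete, here is a toy local port forwarder built only from the standard library. It is a deliberate simplification: it relays raw TCP bytes between a local listening port and a target host and port, with no SSH, encryption, or authentication, so it only illustrates the "listen locally, forward elsewhere" half of the picture. In the real setup, sshtunnel carries these bytes inside the encrypted SSH connection, and it is the EC2 instance, not your machine, that opens the connection to DocumentDB. All names here are hypothetical.

```python
import socket
import threading


def _pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # signal EOF to the other side
        except OSError:
            pass


def _handle(client: socket.socket, target_host: str, target_port: int) -> None:
    """Connect to the target and relay traffic in both directions."""
    with client, socket.create_connection((target_host, target_port)) as upstream:
        # target -> local app runs in a background thread
        t = threading.Thread(target=_pipe, args=(upstream, client), daemon=True)
        t.start()
        _pipe(client, upstream)  # local app -> target runs here
        t.join()                 # wait for the reply direction to finish


def start_forwarder(target_host: str, target_port: int,
                    local_port: int = 0) -> int:
    """Listen on 127.0.0.1:local_port and forward each connection to the
    target. Returns the actual local port (useful when local_port=0)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", local_port))
    server.listen()

    def accept_loop() -> None:
        while True:
            client, _ = server.accept()
            threading.Thread(target=_handle,
                             args=(client, target_host, target_port),
                             daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return server.getsockname()[1]
```

The application never learns the target's address; it only talks to the local port, which is the property the SSH tunnel relies on.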





Written by avr27 | Learning NLP & ML Engineering.
Published by HackerNoon on 2024/03/14