SSH Tunneling — The black magic for data science

As a data scientist, you will often want to expose a local service to the world. That could be a Jupyter or RStudio service where you run your experiments, a TensorBoard service where you monitor training, or a cool chatbot service that you have just built. If you have ever struggled to configure NAT, VPN, and firewall rules to get remote access to your service, you are not alone. It took me more than 2 years to get to know the SSH black magic: the magic that allows you to connect to any service on your local server from anywhere, regardless of your network settings.

Contrary to the common belief that SSH is just a secure shell (as its name misleadingly suggests), SSH is much more than that. In addition to providing a shell, SSH is an important transport protocol on which SFTP and SCP are built, and over which Git and SVN can run. SSH is also a connection forwarder, which is the source of all the magic in this post.

In this post, I assume that you have a Server that runs multiple services, such as Jupyter on port 8888 and RStudio on port 8787. Now you want to access these services from a Client, which may be a laptop that connects to the internet from an unknown network. Depending on how much control you have over the Server’s local network, there are several solutions to this problem. If you don’t care about SSH and just want a dead simple solution that works in any case, I suggest you go straight to Scenario 3.

Scenario 1: You have control over the Gateway Router.

In this scenario, your Server is behind a Gateway Router that has a global (public) IP and that you fully control. This setting is very common if you live in a Scandinavian country, where every household router has a global IP. This scenario also applies if your server is an EC2 virtual machine with a public IP, since you can control its firewall (i.e., its security group). The following figure demonstrates the scenario.

What people usually do: assume that your gateway has the public IP 193.71.x.x and your server has the local IP 192.168.1.2. The usual approach is to set up port forwarding in the gateway’s NAT table, as in the figure below. As you can see, traffic to ports 22, 8888, and 8787 of the router is forwarded to ports 22, 8888, and 8787 of the server (192.168.1.2), respectively. Almost every (wireless) router has a NAT table. If your server is an EC2 instance, you can add rules for inbound traffic in its security group.

Scenario 1 NAT solution: on the router, set up the NAT table to forward external traffic to the server. On the client, you can access Jupyter through the link ‘193.71.x.x:8888’.

What you could do: the NAT solution works fine until you get tired of modifying the NAT table (or the security group) every time you spin up a new service. With SSH local port forwarding, the only port you need to forward is the SSH port. Once you can ssh to your server, any other traffic can go through this “SSH tunnel”, as demonstrated in the following figure.

Scenario 1 SSH tunneling solution: from the client, set up SSH Local Port Forwarding. On the client, you can access Jupyter through ‘localhost:8888’.

Assume that you have forwarded the SSH port to your server, so that from the client you can ssh to the server by running $ ssh 193.71.x.x. Then you can run the following magic command on your client to get access to your Jupyter:

ssh -L 8888:localhost:8888 193.71.x.x

The command basically says: “when I access localhost:8888 on the client, please forward the connection to port 8888 of the SSH server”. If you want to know what is going on, add the option -v. If you do not need the usual SSH shell, add the option -N. You can now type localhost:8888 in your client browser and get access to the Jupyter service on the server. The same approach can be used for RStudio or any other service. If you have Windows on your client, you can set up a PuTTY SSH tunnel as explained here.
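
Since this post assumes both Jupyter (8888) and RStudio (8787) on the server, a single ssh command can carry both forwards by repeating -L. The following is a minimal sketch rather than a definitive recipe: the helper name local_forward_cmd is hypothetical, 193.71.x.x is the placeholder address used above, and the script only prints the command so that you can inspect it before running it.

```shell
#!/bin/sh
# Sketch: build one ssh command that forwards several local ports at once.
# SERVER is the placeholder address from the text, not a real host.
SERVER="193.71.x.x"

# Hypothetical helper: print an ssh command with one -L flag per port.
# Each port is forwarded to the same port number on the server side.
local_forward_cmd() {
    cmd="ssh -N"                      # -N: tunnel only, no remote shell
    for port in "$@"; do
        cmd="$cmd -L $port:localhost:$port"
    done
    echo "$cmd $SERVER"
}

# Jupyter on 8888 and RStudio on 8787, as assumed in this post.
local_forward_cmd 8888 8787
```

Running the printed command keeps the tunnel open until you press Ctrl+C; localhost:8888 and localhost:8787 on the client then reach Jupyter and RStudio on the server.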

Scenario 2: You do not have control over the Gateway Router.

In this scenario, your server is inside a local network that you have no control over.

Scenario 2: Server is behind an uncontrollable Gateway

What people usually do: If your server is located in the intranet of a big organization, you can ask your Network Administrator to give you VPN access. However, you usually do not want to share your VPN access with a customer just to let them try your cool chatbot. Furthermore, what if your server is at home and you are not serious enough to set up your own VPN? Some of you may be thinking about TeamViewer, which is a decent option. However, coding over TeamViewer for an extended period is painful and sometimes unstable.

What you could do: If you have a RemoteServer that has a public IP (like an EC2 instance), you can use SSH Remote Port Forwarding (Reverse Tunnelling). The following figure demonstrates what happens:

Scenario 2 SSH Remote Port Forwarding: on the target Server, open an SSH Reverse Tunnel to the RemoteServer. Traffic to the RemoteServer will be forwarded accordingly to the target Server.

In this approach, we open a Reverse Tunnel so that traffic to the RemoteServer can be forwarded to the target server. To set up the Reverse Tunnel, run the following command on your target server:

ssh -R 8888:localhost:8888 193.72.x.x

The command basically says: “when anyone accesses port 8888 of the remote server (193.72.x.x in this case), please forward the connection to localhost:8888 as seen from my machine (the target server)”. Now, on any client, you can open 193.72.x.x:8888 in your browser and get access to the Jupyter service running on the target server. Of course, you must be able to connect to port 8888 of the RemoteServer from your client, which can be done with the methods mentioned in Scenario 1 above. Note that you must add the option GatewayPorts yes to the /etc/ssh/sshd_config file on the RemoteServer; otherwise the forwarded port only listens on the RemoteServer’s loopback interface. You can check the details here.
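
Because a missing GatewayPorts setting is easy to overlook, a quick check on the RemoteServer can save some debugging. The sketch below is only an illustration: gateway_ports_enabled is a hypothetical helper, it assumes the setting lives directly in /etc/ssh/sshd_config, and it ignores Include files and Match blocks that a real sshd configuration may use.

```shell
#!/bin/sh
# Sketch: check whether an sshd_config-style file enables GatewayPorts.
# Hypothetical helper; the default path is the one named in the text.
gateway_ports_enabled() {
    config="${1:-/etc/ssh/sshd_config}"
    # Match an uncommented "GatewayPorts yes" line, ignoring case and
    # leading whitespace. Commented-out lines (starting with #) do not match.
    grep -Eiq '^[[:space:]]*GatewayPorts[[:space:]]+yes([[:space:]]|$)' \
        "$config" 2>/dev/null
}

if gateway_ports_enabled; then
    echo "GatewayPorts is enabled"
else
    echo "GatewayPorts is not enabled in /etc/ssh/sshd_config"
fi
```

If the check fails, add the line GatewayPorts yes to the file and restart the SSH daemon on the RemoteServer before opening the reverse tunnel again.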

If you feel like this approach is just too much to be feasible, you are correct. Luckily, there is a third-party service that does all the hard work for you, so you don’t even have to know about the existence of SSH.

Scenario 3: You want to expose your local service to the world, NOW!

Ngrok is the service you are looking for. Conceptually, ngrok does what SSH remote port forwarding does: it forwards requests that arrive at its remote server to your target server. It then gives you a link through which you can access your service from anywhere.

Scenario 3: just use ngrok (or other alternatives)

Using ngrok is dead simple. You can download ngrok here, and then run the following command on your server:

./ngrok http 8888

You then will see something like this:

A typical ngrok output

You can see that the link https://1faa4894.ngrok.io is now forwarded to your localhost:8888. You can open that link from anywhere and get access to the service on port 8888 of your server (Jupyter in this case). Note that the default server is in the US, so if you are in another region, like the EU, you should add the option -region eu.

Long story short:

SSH has magic that lets you expose services on your local machine to the world, regardless of your network settings. However, if you don’t care about magic, then go to https://ngrok.com/ and get your problem solved.

Note: all visualizations in this post are hosted on GitHub.