Using Jupyter/TensorBoard In Any Cloud With One Command

Written by casperdcl | Published 2022/04/26
Tech Story Tags: jupyter-notebook | terraform | machine-learning | data-science | cloud-computing | aws | azure | gcp

TL;DR: One command to deploy an ML-ready Jupyter server and sync results with your preferred cloud compute provider. You can self-provision hardware & software environments in the cloud with Terraform Provider Iterative - it's easier than you think!

Using Terraform Provider Iterative for bespoke Machine Learning & Data Science on AWS, Azure, GCP and K8s

Jupyter, TensorFlow/Keras, and TensorBoard are widely used in many Data Science & Machine Learning projects. Unfortunately, there are two main pain points often faced by scientists and engineers alike:

  • Installing and maintaining your entire software stack (including GPU drivers and dependencies) is difficult and time-consuming, and
  • You are limited by the hardware you own (laptop or desktop specs).

Partial workarounds include the free Binder service and the notebook-based Google Colab, but both impose their own strict limits on resources (CPU, GPU, RAM, disk storage, and even uptime).

Instead, this short guide covers how to deploy an ML-ready Jupyter server and sync results with your preferred cloud compute provider. There are only 3 requirements:

  1. Download the terraform CLI tool and say goodbye to ClickOps (free; see the install sketch below),
  2. Get an ngrok account for port forwarding convenience (free), and
  3. Have cloud credentials, of course! (AWS, Azure, GCP, or Kubernetes.)
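
For step 1, Terraform is a single binary. For example, on macOS with Homebrew you can install it from HashiCorp's official tap (see terraform.io for other platforms):

brew tap hashicorp/tap
brew install hashicorp/tap/terraform
terraform version   # verify the install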

⚙ No ClickOps? Show me some code then…

To get a snazzy, ML-ready Jupyter workspace, download the accompanying code from this GitHub repository:

git clone https://github.com/iterative/blog-tpi-jupyter
cd blog-tpi-jupyter
terraform init               # Set up local dependencies

If you like, have a look at the main.tf file: it contains all the config options you could want, such as custom hardware specs and spot price bidding. The default is an AWS EC2 G4 instance (with an NVIDIA Tesla T4 16 GB GPU, costing around $0.15/hour at the time of writing).
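For a rough idea of what's inside, here's an illustrative TPI task definition; the values below are examples rather than the repo's exact config, so treat main.tf as the source of truth:

resource "iterative_task" "jupyter" {
  cloud   = "aws"          # or: azure, gcp, k8s
  machine = "g4dn.xlarge"  # example machine type: an NVIDIA T4 GPU instance
  spot    = 0              # 0 = bid the current spot price; -1 = on-demand
  timeout = 24*60*60       # example safety net: self-destruct after 24h

  storage {
    workdir = "shared"     # local directory synced up to the cloud bucket
    output  = "shared"     # synced back down on terraform destroy
  }

  environment = { NGROK_TOKEN = "" }  # forwarded to the instance

  script = <<-END
    #!/bin/bash
    # ...install the ML stack & serve Jupyter/TensorBoard (see the sketch below)...
  END
}

Next, we need some environment variables: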

export NGROK_TOKEN="..."     # Sign up for free at https://ngrok.com
export TF_LOG_PROVIDER=INFO  # (optional) Increase verbosity
export AWS_ACCESS_KEY_ID="..."         # assuming AWS cloud provider
export AWS_SECRET_ACCESS_KEY="..."

See the authentication docs for other cloud providers.
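
For example, the equivalent variables for Azure, GCP, and Kubernetes look like this (names as per the TPI authentication docs at the time of writing; double-check against the current docs):

export AZURE_CLIENT_ID="..."           # Azure service principal
export AZURE_CLIENT_SECRET="..."
export AZURE_SUBSCRIPTION_ID="..."
export AZURE_TENANT_ID="..."

export GOOGLE_APPLICATION_CREDENTIALS="key.json"  # GCP service account key file

export KUBECONFIG="$HOME/.kube/config"            # Kubernetes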

Now time for magic! 🎩

terraform apply

In just a few minutes ⏱ this simple command:

  • creates a cloud storage bucket,
  • uploads your local shared working directory to the bucket,
  • makes an auto-scaling group which in turn provisions the desired accelerated compute instance,
  • installs an ML software stack (CUDA drivers, Python & TensorFlow/Keras) on the instance, and
  • serves Jupyter Lab, Notebook, and TensorBoard over an ngrok tunnel (sketched below).
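
The last two steps are driven by the script block in main.tf. Here's a heavily simplified sketch of what such a startup script can look like; it's illustrative only, and the real script in the repo also handles drivers and housekeeping:

#!/bin/bash
# Illustrative sketch -- the actual script lives in main.tf's script block.
pip3 install tensorflow jupyterlab tensorboard
# launch the servers on the instance...
jupyter lab --ip=0.0.0.0 --no-browser &
tensorboard --logdir=shared/logs --host=0.0.0.0 &
# ...and expose Jupyter publicly via ngrok (CLI shown for brevity;
# on ngrok v2 the first command is `ngrok authtoken`)
ngrok config add-authtoken "$NGROK_TOKEN"
ngrok http 8888 &
# (a second tunnel for TensorBoard's port 6006 is set up similarly)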

To see the logs (including the server URLs) at any point, simply run:

terraform refresh

⏳ If the URLs at the bottom of the output are blank (urls = []), the instance isn’t ready yet. Wait a few minutes before running terraform refresh again. Eventually you’ll see:

Outputs:

urls = [
  "Jupyter Lab: https://8c62-54-173-120-3.ngrok.io/lab?token=...",
  "Jupyter Notebook: https://8c62-54-173-120-3.ngrok.io/tree?token=...",
  "TensorBoard: https://6d52-54-173-120-3.ngrok.io",
]
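
Rather than rerunning the command by hand, a quick shell loop can poll until the URLs appear (purely a convenience; not part of the repo):

until terraform refresh | grep -q 'ngrok.io'; do
  echo "instance not ready yet; retrying in 30s..."
  sleep 30
done
terraform output urls   # print the URLs once more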

Finally, when done experimenting, download the shared working directory, delete the cloud storage, and terminate the cloud instance with one simple command:

terraform destroy

But Why?

This workflow uses Terraform Provider Iterative (TPI) under the hood, which brings a few distinct advantages:

  • 💰 Lower cost: use your preferred cloud provider's existing pricing, including on-demand per-second billing and bulk discounts.
  • 🔄 Auto-recovery: spot/preemptible instances are cheap but unreliable. TPI reliably and automatically respawns such interrupted instances, caching & restoring the working directory in the cloud even while you are offline.
  • 👓 Custom spec: full control over hardware & software requirements via a single main.tf config file, including machine types (CPU, GPU, RAM, storage) & images.

TL;DR summary

You can self-provision DS & ML hardware & software environments in the cloud with TPI.

⚡️ It's easier than you think:

git clone https://github.com/iterative/blog-tpi-jupyter
cd blog-tpi-jupyter
export NGROK_TOKEN="..."     # Sign up for free at https://ngrok.com
export TF_LOG_PROVIDER=INFO  # (optional) Increase verbosity
terraform init    # Set up local dependencies
terraform apply   # Create cloud resources & upload "shared" workdir
terraform refresh # Get Jupyter & TensorBoard URLs (rerun if blank)
# *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
# Click on the printed URLs and have fun!
# *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
# When done, download "shared" workdir & terminate cloud resources:
terraform destroy


Written by casperdcl | Computational Physicist | Python, C++, CUDA | Git, CI/CD, Cloud Infra for Data Science & Machine Learning
Published by HackerNoon on 2022/04/26