Running Jupyter Notebooks in Grid Engine with Ngrok

source link: https://chanind.github.io/python/2022/10/31/jupyter-sge-ngrok.html

Oct 31, 2022

A lot of universities use Oracle Grid Engine (aka Sun Grid Engine, or SGE) for high-performance computing. This system lets you submit jobs that request a given number of CPUs and GPUs and a given amount of memory to run machine learning (ML) and other compute-intensive tasks. This is great when you’ve built a pipeline to train an ML model and just need a lot of power to run the training, but it is awkward for experimentation and development since all you get is a command prompt.

On the other end of the development cycle, there’s Jupyter, which lets you write Python code in an interactive notebook, mixing text and images in with executable code. Development and experimentation in Jupyter are a joy since you can easily print interactive tables of data to the screen, draw images, or embed interactive TensorBoards. Basically anything that can be displayed in a web browser can be turned into a Jupyter widget. If we can combine Jupyter with Grid Engine, we get the power of Grid Engine with the development ease of Jupyter.

The issue is that usually Grid Engine jobs don’t have ports open to the outside or directly allow ssh access to the running job, so running Jupyter inside of a Grid Engine session is difficult. Fortunately that’s where Ngrok comes in. Ngrok is a tool which can forward a service running on a local machine and give you a web URL where you can access that service from the internet. This is perfect since it solves the problem of letting you easily access a Jupyter notebook that’s running inside of a Grid Engine session.

Initial setup

First, sign up for a free Ngrok account at ngrok.com. After you sign in, find the link to download the ngrok client for Linux and copy the URL of this link.

Next, ssh into Grid Engine and install ngrok in your home directory. This should look something like below:

# paste the linux download URL for ngrok here, it may be different than what's below
wget https://bin.equinox.io/c/bNyj1mQVY4c/ngrok-v3-stable-linux-amd64.tgz
tar -xvzpf ngrok-v3-stable-linux-amd64.tgz

At this point, you should have an executable called ngrok in your home directory.

Next, copy your auth token from the ngrok website (it should be a long pseudo-random string like c8132179a3cE725B4e267_51F32179C3eE725B4E267) and run the following command:

./ngrok config add-authtoken <your token here>

At this point, ngrok should be good to go! Next, make sure you have Jupyter installed:

pip install notebook

It’s a good idea to set a password for jupyter since we’re going to make it accessible on the internet in the next section, and you don’t want random strangers on the internet to be able to run code in your notebook.

jupyter notebook password

Running Jupyter + Ngrok in an interactive session

Next, start an interactive session in Grid Engine, something like the following (resource names such as tmem and gpu are site-specific, so check what your cluster expects):

qrsh -l tmem=10G,h_rt=2:00:00,gpu=true -now no -verbose

Once your session has started, you need to run both ngrok and Jupyter on the same port. The specific port number doesn’t matter much - you just don’t want to pick a number that someone else on the same machine might also be using. Below I’m using port 7923, but change this to whatever number you prefer (numbers in the 7000-9999 range tend to be good choices).
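If you’d rather check than guess, here is a rough sketch of how to test whether a port is already taken on the compute node. The helper names port_in_use and first_free_port are made up for this example. The check relies on bash’s /dev/tcp feature, which is assumed to be enabled in your bash build (if it isn’t, every port will look free), and it only sees listeners reachable on 127.0.0.1, so treat the result as a hint rather than a guarantee.

```shell
#!/usr/bin/env bash

# Hypothetical helper: exit status 0 if something accepts TCP connections on
# 127.0.0.1 at the given port, non-zero otherwise.
port_in_use() {
  # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connection. The
  # subshell keeps the file descriptor from leaking into the calling shell.
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Hypothetical helper: print the first port in [start, end] that nothing is
# listening on, or return 1 if every port in the range is taken.
first_free_port() {
  local port="$1" end="$2"
  while [ "$port" -le "$end" ]; do
    if ! port_in_use "$port"; then
      echo "$port"
      return 0
    fi
    port=$((port + 1))
  done
  return 1
}

# Example: look for a free port starting from the one used in this post.
first_free_port 7923 7999
```

Whatever number this prints is the one to use for both Jupyter and ngrok in the command below. Another user could still grab the port between the check and the launch, in which case just pick a different one.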

(trap 'kill 0' SIGINT; jupyter notebook --no-browser --port 7923 & ~/ngrok http 7923)

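The command above runs Jupyter and ngrok in parallel inside a subshell. The trap runs kill 0 when you press Ctrl+C, which signals every process in the subshell’s process group, so both programs stop together.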

Now, ngrok should display a URL on the screen (something like https://aba4-128-90-27-382.eu.ngrok.io). Open it in your browser, enter the Jupyter password you set earlier, and, voilà, you should see your Jupyter notebook running inside of your Grid Engine interactive shell! And that’s it, you’ve got Jupyter running inside of Grid Engine.
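If the ngrok screen isn’t visible (say, because you started it in the background), the URL can usually be recovered from ngrok’s local web interface. The sketch below assumes that interface is listening on its default address, 127.0.0.1:4040, and that the /api/tunnels response contains a public_url field. The helper name extract_public_url is made up for this example, and it uses grep instead of a real JSON parser to avoid extra dependencies.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read the JSON returned by ngrok's /api/tunnels endpoint
# on stdin and print the first https public_url found in it.
extract_public_url() {
  grep -o '"public_url" *: *"https://[^"]*"' | head -n 1 | cut -d '"' -f 4
}

# Usage, on the same machine where ngrok is running:
#   curl -s http://127.0.0.1:4040/api/tunnels | extract_public_url
```

If nothing is printed, ngrok is probably not running yet or its web interface is on a different port.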

If you have any improvements to this technique, let me know!

