
Integrate Jupyter Notebooks with GitHub

7 May 2021 10:20am, by

Jupyter Notebook is a web-based development tool that makes it easier for developers to manage projects. Its user-friendly interface includes interactive elements for creating and sharing live documents that contain code, visuals, equations, and even narrative text.

I’ve already written about how to install Jupyter Notebook in my piece “Jupyter Notebooks: The Web-Based Dev Tool You’ve Been Seeking,” so you should read through that tutorial to get Jupyter up and running.

Thing is, with a default Jupyter installation, you miss out on GitHub integration. And given how so many developers depend on the likes of GitHub, this is a feature that is sorely missed.

Fortunately, a developer has created an extension that makes it possible to use Jupyter with GitHub. Unfortunately, GitHub has changed since that extension was written, so there’s one caveat to using this tool (I’ll explain later). Even with that caveat, this extension is a good way to keep your Jupyter Notebooks in sync with a GitHub repository (otherwise, all of those notebooks remain on your local machine).

Let’s get these two pieces of technology connected.

Before you start this process, make sure you’ve taken care of getting Jupyter installed. Make sure you don’t launch a notebook yet. We’ll do that in a bit.

Installing the Extension

You’ve already installed the necessary dependencies for Jupyter (Python and pip). You now need to install the Jupyter GitHub extension. Log into the machine that hosts Jupyter and open a terminal window. From the CLI, issue the following commands:

alias pip=pip3

pip install git+https://github.com/sat28/githubcommit.git

jupyter serverextension enable --py githubcommit

jupyter nbextension install --py githubcommit --user

jupyter nbextension enable githubcommit --user --py

The above commands will install the extension and make sure it is available for all notebooks.
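Before moving on, it doesn’t hurt to confirm that Jupyter actually registered the extension. Here’s a minimal sketch, assuming the classic Notebook tooling (the same nbextension and serverextension subcommands used above) is on your PATH. The helper name mentions_githubcommit is my own, and the check only looks for the extension’s name in the listings, so treat a hit as a hint rather than proof.

```shell
# Succeeds if the text on stdin mentions the extension by name.
mentions_githubcommit() {
  grep -qi 'githubcommit'
}

# Only query Jupyter if it is actually on the PATH.
if command -v jupyter >/dev/null 2>&1; then
  # The listings may write to stderr, so merge the streams before searching.
  for kind in nbextension serverextension; do
    if jupyter "$kind" list 2>&1 | mentions_githubcommit; then
      echo "$kind: githubcommit found"
    else
      echo "$kind: githubcommit NOT found"
    fi
  done
else
  echo "jupyter not found on PATH"
fi
```

If either line reports NOT found, re-run the matching enable command from above.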

Install the Remaining Dependencies

You probably already have git installed, but on the off-chance you don’t, issue the command (I’m demonstrating on Ubuntu Desktop 21.04):

sudo apt-get install git -y

If you’re using a Red Hat-based distribution, that command would be:

sudo dnf install git -y
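If you’re not sure which of those two cases applies to you, the check can be scripted. This is a rough sketch that only knows about the two package managers mentioned above; the function name git_install_command is made up for illustration, and the script only prints the command rather than running it.

```shell
# Print the install command that matches this system's package manager.
# Returns 1 if neither apt-get nor dnf is available.
git_install_command() {
  if command -v apt-get >/dev/null 2>&1; then
    echo "sudo apt-get install git -y"
  elif command -v dnf >/dev/null 2>&1; then
    echo "sudo dnf install git -y"
  else
    echo "no apt-get or dnf found; install git manually" >&2
    return 1
  fi
}

# Only suggest an install when git is actually missing.
if command -v git >/dev/null 2>&1; then
  echo "git is already installed: $(git --version)"
else
  cmd=$(git_install_command) && echo "git is missing; run: $cmd"
fi
```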

Generate SSH Keys

You’ll also need SSH keys (so you can clone the necessary repository). For this, run the command:

ssh-keygen

Make sure to accept the defaults and give the key a unique, strong passphrase.

Once you’ve generated the key, view the public key with the command:

less ~/.ssh/id_rsa.pub

Copy the contents of that key and head over to your GitHub account. Go to Settings > SSH and GPG keys and click New SSH Key. In the resulting window, paste the SSH key you just generated, give it a name, and click Add SSH Key (Figure 1).

Figure 1: Adding an SSH key to GitHub.
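One thing to watch for: depending on your OpenSSH version, accepting the defaults may produce a key file other than id_rsa (newer releases default to Ed25519, which lands in id_ed25519.pub). Here’s a small sketch that prints whichever public key it finds first. The function name find_public_key is hypothetical, and the file names it tries are just the usual OpenSSH defaults.

```shell
# Print the path of the first public key found in the given directory
# (default: ~/.ssh). Returns 1 if none of the usual names exist.
find_public_key() {
  key_dir=${1:-"$HOME/.ssh"}
  for name in id_ed25519.pub id_ecdsa.pub id_rsa.pub; do
    if [ -f "$key_dir/$name" ]; then
      echo "$key_dir/$name"
      return 0
    fi
  done
  echo "no public key found in $key_dir" >&2
  return 1
}

# Usage: show the key so it can be pasted into GitHub.
# cat "$(find_public_key)"
```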

Clone the Repository

We need to clone the extension repository, with the command:

git clone git@github.com:sat28/githubcommit.git

You will be asked for the passphrase for the SSH key you just created. When the clone finishes, you’ll have a new directory named githubcommit.

With the repository cloned, let’s make sure Git knows who we are. Issue the following two commands:

git config --global user.email EMAIL

git config --global user.name NAME

Where EMAIL is your email address and NAME is your name.

Create a GitHub Access Token

Next, you need to create a GitHub access token. Go to your GitHub account and then to Settings > Developer Settings > Personal Access Tokens. Click Generate New Token and then, in the resulting window, give it a name and check the boxes for repo and write:packages. Scroll to the bottom and click Generate token. You’ll then need to copy that access token to your clipboard.

Configure the Extension

Change into the githubcommit folder with the command:

cd githubcommit

Open the env.sh configuration file with the command:

nano env.sh

In that file, you must configure the following values:

  • REPONAME is the name of a GitHub repository you’ll use for this.
  • BRANCH is the repository branch (probably “main”).
  • USERNAME is your GitHub username.
  • EMAIL is the email address associated with your GitHub account.
  • ATOKEN is the access token you just created.

Save and close the file.
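It’s easy to miss one of those values, so here’s a quick sanity check you can run afterward. This is only a sketch: it assumes the placeholders appear in the file literally as named in the list above, and check_placeholders is a name I made up, not something the extension ships with.

```shell
# Report any placeholder names from the list above that are still in the file.
# Returns 0 when none remain, 1 when at least one is left or the file is missing.
check_placeholders() {
  file=$1
  [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
  # -w matches whole words only, so EMAIL will not match inside a longer name;
  # -n prints the line number of each leftover placeholder.
  if grep -nwE 'REPONAME|BRANCH|USERNAME|EMAIL|ATOKEN' "$file"; then
    return 1
  fi
  return 0
}

# Usage:
# check_placeholders ~/githubcommit/env.sh && echo "env.sh looks filled in"
```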

Source the env.sh File and Launch a Notebook

The next step is to source the env.sh file with the command:

source ~/githubcommit/env.sh

You will be prompted for your SSH key passphrase. Once you’ve entered it, the repository you configured in the env.sh file will be cloned to your local drive (in your home directory). When that completes, change into the new directory that was cloned from your GitHub repository (in my case it was named newstack).
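If you’re wondering why the file is sourced rather than simply executed, the short version is that sourcing runs it in your current shell, so anything it exports is still there when you launch the notebook from the same terminal. I’m assuming that is what the extension relies on. The sketch below shows the difference with a throwaway file; DEMO_SETTING is a purely illustrative name and has nothing to do with the real env.sh.

```shell
# Throwaway stand-in for env.sh; DEMO_SETTING is an illustrative name only.
demo_env=$(mktemp)
echo 'export DEMO_SETTING=from-env-file' > "$demo_env"

# Running the file in a child shell leaves the current shell untouched.
sh "$demo_env"
echo "after sh:     ${DEMO_SETTING:-unset}"

# Sourcing it ("." is the portable spelling of "source") sets the variable
# in this shell, where programs started afterward inherit it.
. "$demo_env"
echo "after source: ${DEMO_SETTING:-unset}"
sh -c 'echo "child sees:   ${DEMO_SETTING:-unset}"'

rm -f "$demo_env"
```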

From within that directory, launch the notebook with the command:

jupyter notebook --ip 0.0.0.0

Your notebook should open to reveal all of the files from your GitHub repository (Figure 2).

Figure 2: Jupyter Notebook with files from the newstack repository I created.

If you create a new file or open one, you’ll now see a GitHub logo in your notebook (Figure 3).

Figure 3: The GitHub logo now shows up in the Jupyter Notebook.

Here’s where the caveat comes into play. You should be able to click that button and commit any new code to the connected GitHub repository. Unfortunately, the button doesn’t currently work. I suspect this is because of GitHub’s recent changes to authentication, so it’s on the developer to fix the problem.

Fortunately, I have a workaround.

After you’ve created all of your new files and are finished working in Jupyter Notebook, they’ll all be found in the directory pulled down from your configured GitHub repository. In my case, I used a test repository I created on GitHub, named newstack.

Change into that folder and then go through the usual process with git:

git add .

git commit -m "Added new files"

git push

Once you’ve taken care of the above steps, all of your new files will be uploaded to the GitHub repository.

The second caveat is that Jupyter doesn’t automatically pull down any new files created by other teammates or from within GitHub itself. For that, go back to the terminal window and issue the command:

git pull

In other words, to push and pull new content to and from GitHub, you must use the command line (at least until the developer fixes the issue). Even so, being able to integrate Jupyter Notebooks with GitHub makes the tool even more useful.
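If you find yourself typing those commands a lot, they can be wrapped in a couple of shell functions. Treat this as a sketch rather than a finished tool: the names commit_if_changed and sync_notebooks are my own, the default commit message is just the one used above, and the whole thing stops at the first git command that fails (for example, if the pull hits a conflict).

```shell
# Stage everything in the current repository and commit only if something
# is actually staged. The commit message defaults to "Added new files".
commit_if_changed() {
  msg=${1:-"Added new files"}
  git add . || return 1
  # "git diff --cached --quiet" exits 0 when nothing is staged.
  if git diff --cached --quiet; then
    echo "Nothing to commit"
  else
    git commit -m "$msg"
  fi
}

# Pull teammates' changes, commit local work, then push.
# $1 is the directory cloned from GitHub; $2 is an optional commit message.
sync_notebooks() {
  repo_dir=$1
  (
    # The subshell keeps the cd from changing the caller's directory.
    cd "$repo_dir" || exit 1
    git pull && commit_if_changed "$2" && git push
  )
}

# Usage (newstack is the repository name from my example):
# sync_notebooks ~/newstack "Added new notebooks"
```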

Hopefully, sometime soon, the developer will resolve the issue with the push and pull of content. Until then, you can get around the issues with a little manual git pull and git push.

The New Stack is a wholly owned subsidiary of Insight Partners. TNS owner Insight Partners is an investor in the following companies: MADE, Bit.
