Cloning a GitHub repo into Drive from Google Colab
I recently discovered Google Colab, an amazing tool for data scientists to experiment and learn. I’m no data scientist, but I’ve been using Google Colab to learn machine learning. Since I have a few repos on my GitHub account that contain some Python notebooks, I figured I’d clone them and use Google Colab’s GPU to train some models, especially since my MacBook Pro was in repair at an Apple Store and all I had to work with was my iPad Pro.
Unfortunately, this wasn’t as easy a feat as I thought, hence this post.
The Problem
The reason why cloning a GitHub repository with Google Colab is not as straightforward as one may think is twofold:
By default, you don’t have an easily accessible File System
The GitHub integration is not that good
This means that if you clone a GitHub repo directly, it ends up in a directory that you won’t easily find from outside Colab (if at all). Also, when it comes time to push everything back to the remote, the push simply fails, because there is no way to enter your credentials.
So I was facing these two problems. I had managed to clone my repo, as you would, into the content folder that Colab provides for each notebook. I could see my ipynb files listed right there, and yet I was unable to open them. And if I tried to push back to the remote, it failed because Colab never prompted me for my GitHub credentials. Look, I’m sure you could solve the credentials problem some other way, but I was in a hurry, so my solution may not be the best or the most secure. But let me explain how I solved both issues, so that I can now clone any repo to my Google Drive, edit my ipynb files from Colab, and push everything back to GitHub, even from my iPad Pro.
Accessing Google Drive from Google Colab
Because the directories that you can access from Colab by default are not the ones on your Drive, it is very hard (if possible at all) to get to those files later. However, if you clone a GitHub repo into your Drive folder, you can access it anytime. Of course, you could clone into that folder from an actual computer with access to a terminal, and later open the files in Colab (or anywhere else). But as I mentioned, I found myself without a computer (although, what’s a computer, right?) and had to figure out how to clone my repos using my iPad.
For this, start by mounting Google Drive in Google Colab, which already has git installed, so at least that is covered. The process is simple: execute the following code in any Colab notebook and follow the link displayed in the output. You then log in with your Google account, copy the provided key, and paste it back into Colab:
from google.colab import drive
drive.mount('/content/drive')
Of course, you can choose a different path to mount Drive on, but once it is mounted, all your files are accessible from the notebook.
So this is great: it means I can change directory into the repos folder I have on Drive, create a new folder, and clone my repo there.
However, as I said, it is not as straightforward as you may imagine at the beginning. At least it wasn’t for me.
I’ve used shell commands in a Python notebook, and I’ve also used magic commands. I would not have imagined that a simple cd command would require magic to work, but apparently, in Python notebooks (or at least in Google Colab) it does.
And remember when I mentioned that Colab never asked for my GitHub credentials? This means that we have two things to solve when cloning our repository, even after Drive is mounted in Colab’s “file system”.
Cloning the repository
Cloning the repo is a simple `git clone` away, right? Wrong, sort of.
Sure you can cd into any folder now, right? Wrong, sort of.
Changing Directory
Both commands require a bit more than you may be used to, even if, like me, you have lightly used Python notebooks before. It turns out that a simple `!cd /drive/My\ Drive/repos/SomeFolder` is not enough: each `!` command runs in its own temporary subshell, so the directory change is thrown away as soon as the command finishes. To change the notebook’s own working directory, the command has to be executed with magic, meaning it uses a percent sign instead of an exclamation mark. This is the command that you need to execute:
%cd drive/My\ Drive/repos/[Your_Folder]
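The difference between the two prefixes can be reproduced in plain Python. The sketch below is only an illustration, not how Colab implements its magics: the helper names `cd_in_subshell` and `cd_in_process` are made up for this example, and it assumes a Unix-like shell such as the one Colab runs on.

```python
import os
import shlex
import subprocess


def cd_in_subshell(target):
    """Run `cd` in a child shell, much like `!cd` does, and report what happened."""
    before = os.getcwd()
    # The child shell changes its own directory, prints it, and then exits.
    result = subprocess.run(
        f"cd {shlex.quote(target)} && pwd",
        shell=True,
        capture_output=True,
        text=True,
        check=True,
    )
    child_dir = result.stdout.strip()
    after = os.getcwd()  # the calling process never moved
    return before, child_dir, after


def cd_in_process(target):
    """Change the directory of the current process, roughly what `%cd` achieves."""
    os.chdir(target)
    return os.getcwd()
```

Calling `cd_in_subshell` on any existing folder returns the same path for `before` and `after`, even though the child shell did move into the folder. Only `cd_in_process` leaves the caller somewhere new.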
Cloning the repo
Et voilà! Now execute !pwd (without magic) and you will see that you are in your folder. Now you can simply clone as you always have. Except that if you do, you will eventually need to change the remote’s URL to include your credentials before you can push, so you may as well clone with a URL that already includes them. Say that your repo lives at https://github.com/you/your_repo.git; your URL will then have to look like this: https://your_username:your_password@github.com/you/your_repo.git.
So yeah, try not to share your Python notebook anywhere public, and keep that Drive folder private to you. Also, your username and password have to be URL encoded, meaning that if they include special characters, you need to write them differently: you know, how a space becomes %20 and # (which my password by no means includes) becomes %23. Cloning one of my repos, then, is a matter of running `!git clone` with that credential-bearing URL.
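As a minimal sketch of the encoding step, the standard library’s `urllib.parse.quote` can build that URL for you. The helper name `credentialed_url` and the sample values are hypothetical. Note also that GitHub stopped accepting account passwords for Git over HTTPS in 2021, so today the password slot would hold a personal access token instead.

```python
from urllib.parse import quote


def credentialed_url(repo_url, username, secret):
    """Insert percent-encoded credentials into an https:// Git URL."""
    prefix = "https://"
    if not repo_url.startswith(prefix):
        raise ValueError("expected an https:// URL")
    # safe="" makes quote() encode every reserved character,
    # including "/" and "@", which would otherwise break the URL.
    user = quote(username, safe="")
    password = quote(secret, safe="")
    return f"{prefix}{user}:{password}@{repo_url[len(prefix):]}"


# Hypothetical values, for illustration only.
url = credentialed_url(
    "https://github.com/you/your_repo.git", "you", "pass word#1"
)
print(url)
```

In a Colab cell you could then run `!git clone {url}`, since IPython substitutes Python variables written in curly braces inside `!` commands. Reading the secret with `getpass.getpass()` rather than typing it into a cell keeps it out of the saved notebook, although it will still end up in the cloned repo’s `.git/config` as part of the remote URL.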
Opening a Python Notebook
Now you can see all those ipynb files listed right there, so close... and yet so far. Because there is no double-clicking or right-clicking to open.
But they are now in your Drive folder, so all you have to do is open drive.google.com, navigate to that folder, select the file, and choose Open with > Colab.
If you are on an iPad, as I was, doing so from the Drive app won’t work. Instead, open drive.google.com in the browser and request the desktop site (in Safari by long-pressing the reload button, or in Chrome by tapping the ellipsis in the top right corner). Do this only once you are in the folder that contains your ipynb file, because otherwise you won’t be able to navigate into folders.
And done. Now you can work on that file, edit it, and execute git commands as you always have, including `!git push origin [your_branch]` to send everything over to the remote, from Colab, from your iPad, while your MacBook Pro sits in a dark room in some random Apple Store, alone, missing you.