In this post, we are going to talk about the basics of Colab. Code for this post is available here.

These basics include:

Using Colab environment as a command line

If you put an exclamation mark in front of a command, Colab treats it as a terminal command. For example, !ls lists the files in the current directory. Here are a few more examples:

  1. !wget "http://datasets.d2.mpi-inf.mpg.de/MPIIGaze/MPIIGaze.tar.gz" for downloading the file.
  2. !tar -xzvf MPIIGaze.tar.gz for extracting the downloaded archive.
  3. !pip install pydrive2 for installing Python libraries.
  4. !apt-get install tar for installing command line dependencies.
  5. !rm MPIIGaze.tar.gz for removing the file MPIIGaze.tar.gz from the Google Colab server.
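
The ! prefix only works in notebook cells. If you need the same thing from inside regular Python code (for example, in a function), one option is the standard subprocess module. The following is a minimal sketch, not something Colab requires; the helper name run_command is hypothetical.

```python
import subprocess

def run_command(args):
    """Run a command given as a list of arguments and return its stdout as text."""
    # check=True raises CalledProcessError if the command exits with an error.
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout

# Rough equivalent of running !ls in a notebook cell.
listing = run_command(["ls"])
print(listing)
```

Passing the command as a list of arguments avoids shell quoting problems with file names that contain spaces.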

Reading a file from Google Drive into the Colab environment

To read files from Google Drive in Colab, you first need to grant Colab access to your Drive. You need a Google account to do this. Use the following code snippet to authenticate and access your Drive. When you run it, a window will pop up asking you to grant access.

import os
# PyDrive2 (installed with "pip install pydrive2") is imported as pydrive2
from pydrive2.auth import GoogleAuth
from pydrive2.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

Once you have given Colab access, specify the folder whose contents you want to read. You can access any folder or file in your Google Drive by its ID. To find the ID, right-click the folder or file and choose Get shareable link. The link will look something like https://drive.google.com/open?id=1Z2X3Y4Z5X6Y7Z8X9Y0Z1X2Y3Z4X5Y6Z7. The ID is the part after id=, which is 1Z2X3Y4Z5X6Y7Z8X9Y0Z1X2Y3Z4X5Y6Z7 in this example.

Alternatively, you can look up a folder by its name instead of its ID, which is what the following code snippet does. It searches your Google Drive for a folder named test_folder (the example assumes it sits in the root of your Drive) and lists the files inside it. For your convenience, I have already created a test_folder, located [here](https://github.com/ROZBEH/rozbeh.github.io/tree/main/posts_script/colab_post/test_folder). If you'd like to follow the example below, please upload this folder with its contents to your Drive.

# Query to search for a specific folder by its name
folder_query = "title = 'test_folder' and mimeType = 'application/vnd.google-apps.folder' and trashed=false"
folder_list = drive.ListFile({'q': folder_query}).GetList()

# Iterate through the found folders and print the name of the files in that folder
for folder in folder_list:
    file_query = f"'{folder['id']}' in parents and trashed=false"
    file_list = drive.ListFile({'q': file_query}).GetList()

    for file in file_list:
        print('File Name: %s, File ID: %s' % (file['title'], file['id']))
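
If you prefer to start from the folder ID rather than the folder name, the ID can be pulled out of a shareable link and dropped into the same kind of query. The sketch below assumes the link has the open?id=... form shown earlier; the helper names folder_id_from_link and files_in_folder_query are hypothetical.

```python
from urllib.parse import urlparse, parse_qs

def folder_id_from_link(link):
    """Return the value of the id query parameter of a Drive share link."""
    query = parse_qs(urlparse(link).query)
    ids = query.get("id")
    if not ids:
        raise ValueError(f"No id parameter found in {link!r}")
    return ids[0]

def files_in_folder_query(folder_id):
    """Build the query string that lists the non-trashed files of a folder."""
    return f"'{folder_id}' in parents and trashed=false"

link = "https://drive.google.com/open?id=1Z2X3Y4Z5X6Y7Z8X9Y0Z1X2Y3Z4X5Y6Z7"
folder_id = folder_id_from_link(link)
file_query = files_in_folder_query(folder_id)
print(file_query)
```

The resulting file_query can be passed to drive.ListFile({'q': file_query}).GetList() in the same way as in the snippet above.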

After this snippet runs, the matching folders are stored in folder_list and the files of the last matching folder are stored in file_list. To work with the files themselves, download them from Google Drive to a local folder on the Colab server. With the code below they end up in /root/data/, which is what ~/data expands to on Colab. You can change this location through the local_download_path variable. The following code snippet downloads all the files in test_folder to that local folder and exports Google Docs, Sheets, and Slides files as .docx, .xlsx, and .pptx.

local_download_path = os.path.expanduser('~/data')
if not os.path.exists(local_download_path):
  os.makedirs(local_download_path)
for f in file_list:
  print('title: %s, id: %s' % (f['title'], f['id']))
  fname = os.path.join(local_download_path, f['title'])
  print('downloading to {}'.format(fname))
  f_ = drive.CreateFile({'id': f['id']})
  # Check the MIME type and set a proper extension if needed
  mimetype = f_['mimeType']
  if mimetype == 'application/vnd.google-apps.document':
      fname += '.docx'
      f_.GetContentFile(fname, mimetype='application/vnd.openxmlformats-officedocument.wordprocessingml.document')
  elif mimetype == 'application/vnd.google-apps.spreadsheet':
      fname += '.xlsx'
      f_.GetContentFile(fname, mimetype='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
  elif mimetype == 'application/vnd.google-apps.presentation':
      fname += '.pptx'
      f_.GetContentFile(fname, mimetype='application/vnd.openxmlformats-officedocument.presentationml.presentation')
  else:
      # For other file types, try downloading directly
      try:
          f_.GetContentFile(fname)
      except Exception as e:
          print(f"Error downloading file {fname}: {e}")
          continue  # skip the success message for files that failed

  print(f"Downloaded '{fname}'")
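
The if/elif chain above could also be written as a lookup table, which makes it easier to add more export formats later. This is only a sketch of that idea; the names EXPORT_FORMATS and export_target are hypothetical, and the MIME types are the ones used in the snippet above.

```python
# Maps a Google-native MIME type to (file extension, export MIME type).
EXPORT_FORMATS = {
    'application/vnd.google-apps.document': (
        '.docx',
        'application/vnd.openxmlformats-officedocument.wordprocessingml.document'),
    'application/vnd.google-apps.spreadsheet': (
        '.xlsx',
        'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'),
    'application/vnd.google-apps.presentation': (
        '.pptx',
        'application/vnd.openxmlformats-officedocument.presentationml.presentation'),
}

def export_target(fname, mimetype):
    """Return (local file name, export MIME type or None for a direct download)."""
    extension, export_mimetype = EXPORT_FORMATS.get(mimetype, ('', None))
    return fname + extension, export_mimetype

print(export_target('/root/data/report', 'application/vnd.google-apps.document'))
print(export_target('/root/data/dev_brown.txt', 'text/plain'))
```

Inside the download loop, you would call f_.GetContentFile(name, mimetype=export_mimetype) when export_mimetype is not None, and f_.GetContentFile(name) otherwise.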

Once you download the files, you can read them using the following code snippet. Let’s say we are reading the file dev_brown.txt from the folder test_folder in the Google Drive.

# The with statement closes the file automatically after reading
with open("/root/data/dev_brown.txt", "r") as text_file:
    lines = text_file.readlines()
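
Keep in mind that readlines() leaves the trailing newline on every line. Here is a minimal sketch of stripping it. To keep the example runnable without dev_brown.txt, it reads a temporary stand-in file with made-up contents; the helper name read_clean_lines is hypothetical.

```python
import os
import tempfile

def read_clean_lines(path):
    """Read a text file and return its lines without trailing newlines."""
    with open(path, "r") as text_file:
        return [line.rstrip("\n") for line in text_file]

# Stand-in file with made-up contents; on Colab you would pass
# "/root/data/dev_brown.txt" instead.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\n")
    sample_path = tmp.name

clean_lines = read_clean_lines(sample_path)
print(clean_lines)
os.remove(sample_path)
```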

Reading a file from your local machine into the Colab environment

If you have a file on your local machine and want to upload it to the Colab environment, use the following code snippet.

from google.colab import files
uploaded = files.upload()
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
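
The uploaded dictionary maps each file name to the file's contents as bytes. If the files are text, you can decode them directly. The sketch below uses a stand-in dictionary with a made-up file so it runs outside Colab, and it assumes the files are UTF-8 encoded; the helper name decode_uploads is hypothetical.

```python
# Stand-in for the dictionary returned by files.upload(); the file name and
# contents here are made up for illustration.
uploaded = {"notes.txt": b"hello\nworld\n"}

def decode_uploads(uploaded, encoding="utf-8"):
    """Map each uploaded file name to its contents decoded as text."""
    return {name: data.decode(encoding) for name, data in uploaded.items()}

texts = decode_uploads(uploaded)
for name, text in texts.items():
    print(name, "has", len(text.splitlines()), "lines")
```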

Downloading a file to your local machine

If you want to save files from the Colab environment to your local machine, use the following commands.

from google.colab import files
with open('example.txt', 'w') as f:
  f.write('some content')
files.download('example.txt')
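
files.download takes a single file, so if you want to save a whole folder, one approach is to zip it first. The following is a sketch under that assumption; the results folder and the helper name zip_folder are hypothetical, and the folder is created here only so the example is runnable.

```python
import os
import shutil
import tempfile

def zip_folder(folder, archive_base):
    """Zip the contents of folder into archive_base + '.zip' and return its path."""
    return shutil.make_archive(archive_base, "zip", root_dir=folder)

# Hypothetical results folder with one example file.
work_dir = tempfile.mkdtemp()
results_dir = os.path.join(work_dir, "results")
os.makedirs(results_dir)
with open(os.path.join(results_dir, "example.txt"), "w") as f:
    f.write("some content")

archive_path = zip_folder(results_dir, os.path.join(work_dir, "results"))
print(archive_path)
# On Colab, you could then call files.download(archive_path).
```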

Using GPU

If you would like to use a GPU in your Python code, go to Runtime in the menu, choose Change runtime type, and then select GPU under Hardware accelerator.
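
After changing the runtime type, you may want to confirm that a GPU is actually visible. One way, sketched below, is to call the nvidia-smi tool, assuming it is present on GPU runtimes; the helper name gpu_available is hypothetical. Deep learning libraries offer their own checks as well.

```python
import shutil
import subprocess

def gpu_available():
    """Return True if nvidia-smi exists and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    # "nvidia-smi -L" prints one line per GPU it can see.
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout

print("GPU available:", gpu_available())
```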

Notes

  • If you want to connect Colab to your local machine, follow the instructions at https://research.google.com/colaboratory/local-runtimes.html

  • It is not possible to connect to a workspace on Google Colab while other code is still running in the same workspace. Wait for the previous run to finish, and then run your code.
  • You have 12 hours to finish your task on Colab. After that, all of your files and results on the Colab server are deleted.