Jupyter and Google Sheets for Analytics

I hate spreadsheets

Since I’ve been leading our team’s OKRs, I’ve been collecting check-in surveys from my colleagues. I currently use Google Forms for the surveys - I’ve tried others as well, but Google Forms seem to work best1.

Since the data of the Google Form is automatically transferred to Google Sheets, I could’ve directly used Google Sheets’ Excel-like features to get some insights.

Initially I tried to do that. I could do trivial things like =SUM(), getting the mean values, and transforming them into graphs. But as soon as the queries became complicated, the functions became longer and fiddlier, and it proved to be a challenging task. There was a lot of googling, transposing, reformatting to get one thing done.

Even when I could display some graphs with basic functions, there was overhead to restrict everyone from accessing the graphs. I would have to publish individual sheets. I would need to hide the orginal surveys because emails would be on them.

So there were two things I needed to solve:

  1. Use programming to get insight
  2. Make it accessible to others without exposing the identity of the survey responder

Jupyter Notebook with Google Sheets

Python is a great language for any data wrangling and analysis. It has many libraries, such as pandas, SciPy and numpy. It also has a library for accessing Google products and converting them to df - a special data format in pandas. Python also works well with Jupyter Notebook, which is an open source web application that documents and displays code. Notebook files generated by Jupyter can also be displayed in Github without any configuration.

The generated IPython file becomes static once pushed into the repo. The import statements and output of print(x) would be displayed. The x = y will remain as strings on a screen. Even if a Github user forked the repo, they wouldn’t be able to replicate the output, unless they had all the files necessary.

With that in mind, I’ve developed a small Jupyter Notebook that imports Google Sheets contents. This was originally created as a private repository for work, so results have been removed entirely from the public version of the repo.

The code in the Jupyter Notebook:

  1. Authenticates a GCP Service User to access Google Sheets
  2. Gets the data from Google Sheets, which is imported as df - the data format from the pandas library
  3. Hashes and removes the email from the df
  4. Creates graphics from the data
  5. Displays and writes out the images as SVG, using the Orca (image generation) server

You can read more about how to use it in the repo.

Requirements

You need to have Python3, a GCP account and some data in your spreadsheet.


  1. Survey Monkey is hard to use for recurring surveys in the free-tier and SmartSheets, provided by the company, is a nightmare to use. Forms are incredibly hard to make, especially without the feature of cloning columns and it’s form settings. Google Forms on the other hand gives me a nice UI to manage the survey contents. It also shows analytics (which only worked the first time round - because I wasn’t going to create a new survey for every OKR check-in), with the capabilities of spreadsheets, which is automatically updated when new responses come through.