Since I’ve been leading our team’s OKRs, I’ve been collecting check-in surveys from my colleagues. I currently use Google Forms for the surveys - I’ve tried others as well, but Google Forms seem to work best1.
Since the data of the Google Form is automatically transferred to Google Sheets, I could’ve directly used Google Sheets’ Excel-like features to get some insights.
Initially I tried to do that.
I could do trivial things like =SUM()
, getting the mean
values, and transforming them into graphs. But as soon as the queries became complicated, the functions became longer and fiddlier, and it proved to be a challenging task. There was a lot of googling, transposing, reformatting to get one thing done.
Even when I could display some graphs with basic functions, there was overhead to restrict everyone from accessing the graphs. I would have to publish individual sheets. I would need to hide the orginal surveys because emails would be on them.
So there were two things I needed to solve:
Python is a great language for any data wrangling and analysis. It has many libraries, such as pandas
, SciPy
and numpy
. It also has a library for accessing Google products and converting them to df
- a special data format in pandas
. Python also works well with Jupyter Notebook, which is an open source web application that documents and displays code. Notebook files generated by Jupyter can also be displayed in Github without any configuration.
The generated IPython file becomes static once pushed into the repo. The import
statements and output of print(x)
would be displayed. The x = y
will remain as strings on a screen. Even if a Github user forked the repo, they wouldn’t be able to replicate the output, unless they had all the files necessary.
With that in mind, I’ve developed a small Jupyter Notebook that imports Google Sheets contents. This was originally created as a private repository for work, so results have been removed entirely from the public version of the repo.
The code in the Jupyter Notebook:
df
- the data format from the pandas
librarydf
You can read more about how to use it in the repo.
You need to have Python3, a GCP account and some data in your spreadsheet.