What is a Gitpod?

Webinars about software can be made compelling by coding live. Engagement can be increased further if the audience can code along with the presenter. Getting all this in place, though, can be tricky and even self-defeating: you can lose the audience if you impose prerequisites such as software setup on them, or if you waste the first several minutes of the webinar on that setup.

Looking for a solution to this, I was reminded of Cloud9, which I’d dabbled with a couple of years earlier. Back then it was a startup that provided an excellent online IDE. I recalled that it had a cloning feature, so perhaps I could get everything working in Cloud9 and distribute a clone. It turned out that Cloud9 had been bought by Amazon and made into an AWS service. “Fine!”, I thought - however, there seemed to be some integration problems and it was not straightforward. I would probably have been able to make it work eventually, possibly by distributing an AWS AMI with the algo implemented using Cloud9 and then placing it free in the AMI Marketplace.

However, I then saw a Register article about Visual Studio Codespaces, which provides an online IDE on top of Azure. I experimented with it too, but there were some difficulties, such as with cloning and sharing.

So Amazon AWS was complicated and Microsoft Azure didn’t quite cut it - what about Google Cloud? The same article also mentioned, in passing, Theia - a framework for online IDEs. I checked this out: it combines with Gitpod to do what I needed. As their site says, they “…eliminate(s)… friction by providing prebuilt, ready-to-code dev environments with a single click”. These instances do indeed run on Google Cloud.

What’s A Gitpod?

Gitpod provides a Theia IDE, using VS Code as the editor, whose contents it keeps track of. It also provides a full operating system environment in which to run the code developed there.

Gitpod has many features and these are evolving over time - see their page. Fundamentally, though, a Gitpod is a Docker instance of Ubuntu 20.04 Focal Fossa. The user is gitpod, who is not in sudoers, which complicates installation. Since it is a Docker instance, the parts of an installation that need root privileges can be put into the associated Dockerfile. A lot can be done without much Dockerfile configuration; what belongs there and what belongs on the command line or in code depends on the circumstances.

Although the Gitpod has a user account gitpod with home directory /home/gitpod, the data that is part of the saved and shareable workspace is only the initial Docker instance plus the contents of the workspace. When the Gitpod is cloned or shut down, it retains only the changes in its workspace. The workspace is everything in the hierarchy starting at /workspace/your-workspace-name.
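The rule above can be sketched as a small path check. This is only an illustration of the rule as described here, not anything Gitpod itself provides: the function name survives_restart and the example workspace /workspace/myproject are made up for the sketch, and the check is purely textual (it does not resolve symlinks or ".." segments).

```python
from pathlib import PurePosixPath


def survives_restart(path: str, workspace: str = "/workspace/myproject") -> bool:
    """Return True if `path` lies inside the workspace hierarchy.

    Hypothetical helper: it encodes the rule that only files under
    /workspace/<your-workspace-name> are kept when a Gitpod is stopped
    or cloned. The comparison is textual only.
    """
    candidate = PurePosixPath(path)
    root = PurePosixPath(workspace)
    # The workspace directory itself, or anything below it, is retained.
    return candidate == root or root in candidate.parents


for example in ("/workspace/myproject/data/trades.db", "/home/gitpod/.cache/pip"):
    print(example, "->", "kept" if survives_restart(example) else "lost")
```

Anything written to the gitpod user's home directory falls on the "lost" side of this check, which is why installation output that lands there has to be recreated on each start.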

Gitpod and GitHub

Gitpod requires and integrates with GitHub. You must first create a (free) GitHub account. Any GitHub repo can be used to produce a Gitpod by forming a simple URL: for a GitHub repo like github.com/mygroup/myproject, the URL https://gitpod.io/#https://github.com/mygroup/myproject will create a Gitpod containing a hierarchy /workspace/myproject under which the GitHub project will be checked out.
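As a minimal sketch, the URL scheme just described amounts to prefixing the repo URL with https://gitpod.io/#. The helper name gitpod_url is made up for this illustration; the example repo is the one used in the text.

```python
def gitpod_url(github_repo: str) -> str:
    """Build the Gitpod launch URL for a GitHub repo.

    Hypothetical helper: accepts either "github.com/group/project" or a
    full https:// URL and prefixes it as described in the text.
    """
    repo = github_repo.strip().rstrip("/")
    # Add the scheme if only "github.com/..." was given.
    if not repo.startswith(("http://", "https://")):
        repo = "https://" + repo
    return "https://gitpod.io/#" + repo


print(gitpod_url("github.com/mygroup/myproject"))
# prints https://gitpod.io/#https://github.com/mygroup/myproject
```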

Gitpod supplies a Google Chrome extension. This causes a green “Gitpod” button to appear in GitHub to the right of the “Clone or download” button in the upper-right of a repo. This is convenient, but in fact all it does is direct the user to the appropriate URL as above.

How It Works

When you hit the Gitpod link, it creates a Docker container around your GitHub code and activates it in a Google Cloud Ubuntu instance. This is presented to you in your browser within the Theia IDE and VS Code editor. This takes a minute or so (or perhaps longer if you have an extensive Dockerfile). The IDE provides a Terminal interface. At this point you have all the facilities that any Ubuntu account owner would have (but no sudo), plus whatever you installed with your Dockerfile.

The Dockerfile

There is something of a chicken-and-egg situation with the Dockerfile. I’ll take a moment to describe this since it shows the typical way you can work with Gitpods.

You create a Gitpod from some GitHub repo. You can just as well create a new repo and use that. However, if you need specific base software for your project, such as a database or web server, for practical purposes you need root (the superuser) to install it (it is possible to install such software as a normal user, but this is cumbersome and time-consuming and not normal practice). Normally in Ubuntu this would be done using sudo - however, the gitpod user is not in sudoers, so sudo can’t be used.

The solution is the Dockerfile. In GitHub it is placed in your workspace - i.e. if your workspace (and the name of your GitHub project) is my_workspace, the Dockerfile will be in /workspace/my_workspace. It is named .gitpod.Dockerfile and must be referred to in the .gitpod.yml (which we’ll come to). Within the Dockerfile we can set up any software installation required, such as sudo apt install nginx or similar, since within the Dockerfile the user gitpod is in sudoers. Any user can be specified to execute the commands here. See the Dockerfile docs.

Typically it is during development in your Gitpod that you realize you need additional software and that the effort to install it locally is too great. However, at this point you are in a running Docker instance - the Dockerfile has already been executed - so what to do? The easiest approach is to commit and push your new .gitpod.yml and .gitpod.Dockerfile back to your repo, close and abandon your current Gitpod, and just start another one (e.g. by clicking on the “Gitpod” button that the Gitpod Chrome extension created)! This is so easy, cheap and quick that it becomes part of the process of using Gitpod: just throw it away and do it again. Note, however, that there’s a limit of 4 simultaneously running Gitpods. You may need to go to Gitpod (perhaps log in), then to https://gitpod.io/workspaces/, and click “Stop” on one of the other Workspaces.

[Screenshot: Stop Workspace]
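The commit-and-push step of this throw-away cycle can be sketched as a short script run from the Gitpod terminal. The function names and the commit message are made up for this illustration, and it assumes the two config files sit in the current directory of a Git checkout with a push remote already configured. By default it only prints the commands rather than running them.

```python
import subprocess
from typing import List

# The two Gitpod config files named in the text.
CONFIG_FILES = [".gitpod.yml", ".gitpod.Dockerfile"]


def config_push_commands(message: str) -> List[List[str]]:
    """Return the git commands that publish the Gitpod config files."""
    return [
        ["git", "add", *CONFIG_FILES],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def push_config(message: str, dry_run: bool = True) -> None:
    """Print each command; actually run it only when dry_run is False."""
    for cmd in config_push_commands(message):
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing git command.
            subprocess.run(cmd, check=True)


push_config("Add nginx to the Gitpod image")  # dry run: prints only
```

After the push succeeds, the current Gitpod can be closed and a fresh one started from the repo, which rebuilds the image from the updated Dockerfile.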

Gitpod Stop and Start

Between start-ups of a Gitpod, all state other than that in the workspace itself is lost. I need to be very clear on what happens here and why it’s useful: you can do a whole lot of development in your Gitpod, including various runs of systems that gather data - for example, you can extract and store trading data into a PostgreSQL database. As long as it lives in the workspace, this data is retained between starts. So you can execute a set of time-consuming operations as the gitpod user, have the software being used (like a database) store the resulting state, and that state will be available to every copy of that Gitpod that is distributed. This is a very powerful and flexible solution.

There are some processes whose data is not retained, however: those that store it outside the workspace. An example is the setup of Python prerequisites based on locally created libraries. In Getting a Cryptalgo Running I give an example for the Python module preparation of talib. This is done using the .gitpod.yml file. In this file you can specify operations to be performed on every startup, e.g.

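tasks:
  # init runs once, when the workspace is first created
  - init: echo "Not doing anything with 'init' right now!"
    # command runs on every start of the workspace
    command: |
      cd /workspace/gitpod/ta-lib-master && python3 setup.py install && cd /workspace/gitpod/cryptalgo
      export LD_LIBRARY_PATH=/workspace/gitpod/.local/lib
# build the workspace image from the custom Dockerfile in the repo
image:
  file: .gitpod.Dockerfile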

Distribution

As I mentioned, a great feature of Gitpods - perhaps the killer feature - is the ease with which they can be distributed. You simply go to the top-right user icon and choose “Share Workspace Snapshot”.

[Screenshot: Workspace Snapshot]

The URL created can now be shared around. Whoever clicks on it gets a new Ubuntu instance, created by Docker, that puts them in the same state as the developer who created the Gitpod. It’s the immediate sharing of software in a defined runtime state.