At Bugout, we are heavy users of the GitHub API. We primarily use it for data analysis. GitHub’s personal access tokens have worked really well for this purpose.
Last week, that changed. We released Thumbsup, a web service which summarizes GitHub issues and Stack Overflow questions. We use Thumbsup to enrich search results on our search engine, Bugout. You can also use it directly to summarize long GitHub issues like this one.
We built Thumbsup as a GitHub App. In the process, we perceived a lack of good guides on setting up GitHub Apps. We hope this blog post can help anyone else taking the same journey.
Thumbsup is written in Python as a Flask service. We have released the project under the Apache 2 license and you can view the code on GitHub at https://github.com/simiotics/thumbsup. Even if you are using a different tech stack, the code may help you as a guide to the semantics of creating a GitHub App.
Why not personal access tokens?
Every time you request a Thumbsup summary, Thumbsup makes two GitHub API requests: one to get the issue itself, and a second to get the comments. Thumbsup can be used freely by anyone via Bugout or Thumbsup itself, which means that we cannot predict how much of our rate limit it will consume. If we used the same personal access tokens for Thumbsup that we use for our dataset builders, we would give up control over how quickly we blew through the rate limit we need for data gathering.
Additionally, exposing personal access tokens through a public application like Thumbsup is dangerous because you could potentially attach too many permissions to the access tokens. This could, for example, allow Thumbsup users to view private issues in our GitHub organization. This is a security hole we would rather avoid.
For these reasons, it was clear that using personal access tokens to authenticate Thumbsup’s GitHub API requests was a bad idea. We had to change, but what to change to?
There are four different ways to identify yourself to the GitHub API:
- If you want to access public resources, you can make unauthenticated API calls but are bound by a rate limit of 60 requests per hour.
- You can use a personal access token, which authenticates you as your GitHub user. This expands your rate limit to 5000 requests per hour.
- You can use a GitHub App, which starts with its own rate quota of 5000 requests per hour. Once a GitHub App has 20 users, it receives an additional 50 requests per hour per user up to a maximum of 12,500 requests per hour.
- You can use a GitHub OAuth app, which acts on behalf of users who authorize it and assumes their rate limit (5000 requests per hour).
Personal access tokens have already been ruled out, and we needed a higher rate limit than unauthenticated requests would give us. That left us with a dilemma.
GitHub App or GitHub OAuth App?
GitHub has a long discussion of the difference between a GitHub App and an OAuth App. The TL;DR version:
- Create an OAuth app if you want to authenticate or act on behalf of individual GitHub users. Examples: Any app that allows you to “Login with GitHub”, GitHub Desktop.
- Create a GitHub App if you want to operate at the organization or repository level. Examples: Dependabot, TravisCI, codecov.
In our case, Thumbsup does not need any organization or repository permissions (it currently works only with public resources), so neither of these conditions applies. However, authentication is a little simpler with GitHub Apps. With GitHub Apps, you can also set up personal installations which are not available to other users. These personal installations are very useful for services that work with public GitHub resources. This is why we elected to build Thumbsup as a GitHub App.
If Thumbsup grows to the point that it is making thousands of requests per hour against the GitHub API, we may have to add an OAuth flow. We could also potentially solve the problem by caching GitHub responses on our server, or by creating a few different GitHub Apps and cycling through their access tokens on the server. Our policy on this (in terms of timing) is that problems of scale should be dealt with at one-tenth scale.
Authentication with GitHub Apps
Authenticating against the GitHub REST API with a personal access token is easy. All you have to do is add the following header to your API requests:
Authorization: token ${PERSONAL_ACCESS_TOKEN}
Authenticating with a GitHub App adds an extra layer of indirection. The App itself only has permission to get information about its installations. You have to authenticate as an installation to access other resources.
You first use a JWT for the GitHub App to generate an access token for the installation.
This means you have to generate the JWT to make the access token request. The GitHub API docs show how to do this using a simple Ruby script. If you are using Python, you can see how Thumbsup generates the JWT, as well. The key is that you will need the app id and the private key for the app, both available from the application page on GitHub.
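As a rough Python sketch of the JWT step (not necessarily how Thumbsup does it): the claims are the issued-at time, an expiry at most 10 minutes out, and the app id as issuer, signed with RS256 using the app's private key. The function names here are our own, and the signing step assumes the third-party PyJWT package is installed with its cryptography extra.

```python
import time


def build_app_jwt_claims(app_id, now=None):
    """Build the claims for a GitHub App JWT (hypothetical helper name)."""
    issued_at = int(time.time() if now is None else now)
    return {
        # Backdate by 60 seconds to tolerate clock drift between machines.
        "iat": issued_at - 60,
        # GitHub rejects app JWTs that expire more than 10 minutes out.
        "exp": issued_at + 9 * 60,
        # The issuer is the app id from the application page on GitHub.
        "iss": str(app_id),
    }


def sign_app_jwt(claims, private_key_pem):
    """Sign the claims with RS256. Assumes PyJWT is available."""
    import jwt  # third-party: pip install "pyjwt[crypto]"

    return jwt.encode(claims, private_key_pem, algorithm="RS256")
```

You would call sign_app_jwt(build_app_jwt_claims(app_id), private_key_pem) with the contents of the private key file and store the result as GITHUB_APP_JWT.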
You will also need the id of the installation that you want to generate an access token for. You can view all the installations of your app using:
curl \
-H "Accept: application/vnd.github.machine-man-preview+json" \
-H "Authorization: Bearer ${GITHUB_APP_JWT}" \
https://api.github.com/app/installations
Finally, assuming you have stored the JWT token as GITHUB_APP_JWT and the installation id as INSTALLATION_ID, you can make the POST request for an access token:
curl -X POST \
-H "Accept: application/vnd.github.machine-man-preview+json" \
-H "Authorization: Bearer ${GITHUB_APP_JWT}" \
https://api.github.com/app/installations/${INSTALLATION_ID}/access_tokens
The response body is a JSON object whose “token” key is the access token you need.
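The same request can be sketched in Python using only the standard library. This is a minimal illustration, not Thumbsup's actual code: the function names are made up for this example, and error handling is left out.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def installation_token_request(app_jwt, installation_id):
    """Build the POST request that asks for an installation access token."""
    url = f"{GITHUB_API}/app/installations/{installation_id}/access_tokens"
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Accept": "application/vnd.github.machine-man-preview+json",
            "Authorization": f"Bearer {app_jwt}",
        },
    )


def token_from_response(body):
    """Pull the access token out of the JSON response body."""
    return json.loads(body)["token"]


def fetch_installation_token(app_jwt, installation_id):
    """Send the request and return the token (performs a network call)."""
    request = installation_token_request(app_jwt, installation_id)
    with urllib.request.urlopen(request) as response:
        return token_from_response(response.read())
```

Splitting request construction from the network call keeps the pieces easy to test without hitting the GitHub API.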
Now that you have the access token, you can use it to make requests against the GitHub API the same way you would a personal access token, by adding the following header to your REST API calls:
Authorization: token ${ACCESS_TOKEN}
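To make this concrete, here is a hedged Python sketch of the two requests a Thumbsup-style summary needs: one for the issue and one for its comments, both carrying the installation access token in the Authorization header. The helper name and its parameters are hypothetical; only the REST endpoints come from the GitHub API.

```python
import urllib.request


def issue_requests(access_token, owner, repo, issue_number):
    """Build authenticated requests for an issue and for its comments."""
    base = f"https://api.github.com/repos/{owner}/{repo}/issues/{issue_number}"
    # The token goes in the Authorization header, as with a personal access token.
    headers = {"Authorization": f"token {access_token}"}
    return (
        urllib.request.Request(base, headers=headers),
        urllib.request.Request(f"{base}/comments", headers=headers),
    )
```

Each request can then be passed to urllib.request.urlopen, and the JSON bodies parsed with json.loads.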
There is one major difference between this installation access token and a personal access token. The installation access token expires after an hour. If you are building an application, like Thumbsup, which is always operational, you have to figure out how to refresh this access token.
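The access token response also carries an "expires_at" timestamp alongside "token". One way to decide when to refresh, sketched here under the assumption that you keep that timestamp around, is to compare it against the current time with a safety margin. The function names and the five-minute margin are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_expires_at(expires_at):
    """Parse a UTC timestamp such as "2020-01-01T12:00:00Z"."""
    parsed = datetime.strptime(expires_at, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def needs_refresh(expires_at, now=None, margin=timedelta(minutes=5)):
    """Return True if the token expires within the margin (or already has)."""
    now = datetime.now(timezone.utc) if now is None else now
    return parse_expires_at(expires_at) - now <= margin
```

A simpler alternative, which needs no expiry bookkeeping at all, is to regenerate the token on a fixed schedule that is comfortably shorter than an hour.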
Refreshing the access token
Thumbsup runs behind gunicorn in production. This means that requests are served by several separate worker processes. It is easy enough to regenerate the access token before it expires; in Thumbsup, we do this using a systemd timer (here’s the associated service). The only fiddly bit is making sure the gunicorn worker processes have access to the latest access token. We do this by having the access token generator write the token atomically to a specific file.
Each worker can now read the current token from that file before it calls the GitHub API.
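A common way to get an atomic write in Python, shown here as a sketch rather than as Thumbsup's exact implementation, is to write the new token to a temporary file in the same directory and then rename it over the target. The rename replaces the old file in one step, so a worker reading the file sees either the old token or the new one, never a partial write. The function names are hypothetical.

```python
import os
import tempfile


def write_token_atomically(token, path):
    """Write the token to path so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".token-")
    try:
        with os.fdopen(fd, "w") as tmp_file:
            tmp_file.write(token)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())
        # Swap the finished file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_token(path):
    """Read the current token; workers call this before each GitHub request."""
    with open(path) as token_file:
        return token_file.read().strip()
```

Note that mkstemp creates the file readable only by its owner, so the token generator and the workers need to run as the same user, or the permissions need to be adjusted.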
If you are building your own GitHub Apps, we hope this guide has been of some help to you.
We invite you to try out the alpha version of Bugout: https://alpha.bugout.dev
Thumbsup is live at: https://thumbsup.bugout.dev
The Thumbsup source code is available on GitHub: https://github.com/simiotics/thumbsup