My Take on Managing Certbot with Ansible

Intro

Given a server that hosts a few personal projects, what is the most efficient way to let those projects use HTTPS? Let’s Encrypt (referred to as LE from here on) is of course the answer. However, deploying certbot, which automates obtaining and renewing TLS certificates, turned out to be less easy than it seems at first glance, so I’d like to share my experience.

The Setup

  • A virtual machine with Debian 10 (“Buster”) on board;
  • HTTP server: nginx;
  • Ansible is used to maintain the machine and test the changes on a local one before applying them to production.

The First Challenge: Picking a Challenge

There are multiple challenge types that LE supports:

  • HTTP
  • DNS
  • TLS-ALPN

Just a few words about each of them.

HTTP

Requires an HTTP server that is running and responds to requests for the host being verified. Certbot places a file with a random name in the .well-known/acme-challenge directory, and the LE server requests that file over HTTP. If the file is found and its content is correct, the challenge is passed.
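To make the mechanics concrete, here is a minimal Python sketch of where the challenge file lives and which URL the validation server requests. The webroot path /var/www/letsencrypt and the token value are made up for illustration; certbot picks the real token itself.

```python
from pathlib import PurePosixPath

# Fixed path prefix used by the HTTP challenge.
ACME_PREFIX = ".well-known/acme-challenge"


def challenge_url(domain: str, token: str) -> str:
    """URL the validation server requests over plain HTTP."""
    return f"http://{domain}/{ACME_PREFIX}/{token}"


def challenge_file(webroot: str, token: str) -> PurePosixPath:
    """Where the file must be placed if the web server serves `webroot`
    as the document root for the domain (an assumption of this sketch)."""
    return PurePosixPath(webroot) / ACME_PREFIX / token


# Hypothetical values, for illustration only.
print(challenge_url("example.com", "TOKEN123"))
print(challenge_file("/var/www/letsencrypt", "TOKEN123"))
```

In practice this means the web server only has to serve one directory over plain HTTP for the challenge to work.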

DNS

Requires creating a TXT record with a random value for the _acme-challenge subdomain of the domain being verified. Assuming the domain being verified is example.com, the ACME server queries the TXT record for _acme-challenge.example.com, and if its content equals the expected string, the challenge is passed.
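As a rough illustration, the record name and value can be computed like this. The sketch follows my reading of RFC 8555 (the value is the unpadded base64url encoding of the SHA-256 digest of the key authorization); the key authorization string below is a placeholder, since the real one is derived from the challenge token and the ACME account key.

```python
import base64
import hashlib


def dns_record_name(domain: str) -> str:
    """TXT record name for the DNS challenge.

    A wildcard such as *.example.com is validated via the base domain.
    """
    name = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{name}"


def dns_record_value(key_authorization: str) -> str:
    """Unpadded base64url of the SHA-256 digest of the key authorization."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# Placeholder key authorization, for illustration only.
print(dns_record_name("example.com"))
print(dns_record_value("token.account-key-thumbprint"))
```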

TLS-ALPN

Requires a server that implements the ACME TLS-ALPN protocol, which likely means “a custom piece of software running on port 443 of the host that the domain being verified points to”.

So, TLS-ALPN is not an option: we want nginx to listen on port 443, and it likely won’t implement TLS-ALPN in the near future. That leaves us two options.

The DNS challenge is really convenient in some cases because it doesn’t require any interference with the HTTP server configuration. It’s also the only option for wildcard certificates and the only decent option if you have more than one server terminating SSL/TLS for your application. However, it requires access to change DNS records, which is not something you usually want to grant to the server where the application is hosted. Also, DNS providers don’t always allow restricting API access to a single subdomain (or a few of them) and a specific record type. That means keeping an API token that can change the whole DNS zone on the server where the application runs, which obviously poses a security risk, so let’s explore the only remaining option. If you know any DNS providers other than Amazon Route 53 that allow restricting API access with enough granularity to make the described challenge secure, please drop a comment. If you’re looking for one, check out Route 53.

The HTTP challenge doesn’t need anything special: it just requires an HTTP server on the host that the domain being verified points to, which we’ll have there anyway. Sounds like a win, doesn’t it? At first glance, it does. Let’s declare this kind of challenge the best fit for our use case and try to use it.

The Second Challenge: Learning Concepts

When I switched from the DNS challenge, which I ran on a separate host before deploying the certificate to the target host, to the HTTP challenge, which wouldn’t require any “separate host,” I started from my own assumptions. It took me almost a full day to understand that I had gotten things somewhat wrong and had to start over. Here are the main conclusions:

  1. If you don’t want to maintain several configurations for your web server, you’ll have to get creative.
  2. If you want to sign your own CSR, you’d better drop this idea.
  3. If you want your certificates where you want them to be, you’d better drop this idea, embrace certbot’s locations, and just symlink to them.

More details on these points below.

The Flow

Certbot issues a certificate and saves the configuration it was issued with. Later, when invoked from cron or another scheduler, it goes through these “saved configurations,” checks each certificate’s expiration date and, if it is less than 30 days away, renews the certificate for another 90 days. So we only have to run it once to issue the certificate properly, set up the cron job (which the Debian package maintainers have already done for you), sit back, and relax.
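The renewal decision boils down to a date comparison. Here is a small Python sketch of that rule; it is not certbot’s actual code, and the function and constant names are mine.

```python
from datetime import datetime, timedelta, timezone

# Renew once the certificate has less than this much validity left.
RENEW_BEFORE = timedelta(days=30)


def needs_renewal(not_after: datetime, now: datetime,
                  renew_before: timedelta = RENEW_BEFORE) -> bool:
    """True if the certificate expires in less than `renew_before`."""
    return not_after - now < renew_before


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(needs_renewal(now + timedelta(days=29), now))  # inside the window
print(needs_renewal(now + timedelta(days=60), now))  # still plenty of time
```

Because the scheduler runs this check regularly, a failed renewal attempt simply gets retried on the next run while the old certificate is still valid.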

Now, to the issues.

Race Condition

This is the only one I expected.

You probably want to configure nginx to serve your website over HTTPS and redirect requests that come over HTTP to HTTPS. If you use Ansible or any other configuration management tool, you likely have a template where all these things are put together, e.g. in a single file. When you run a playbook (recipe, whatever it’s called) for the first time, it will leave you with a broken nginx, because nginx won’t start until the certificates are there. However, in order to get the certificates, you need a web server that serves the files required for the challenge over HTTP. Do you see the circular dependency here?

I see two ways to break this circle:

  1. Until a certificate exists, deploy only the HTTP part of the configuration, run nginx, pass the challenge, get the certificate, deploy the rest of the configuration (for HTTPS), and reload nginx. That would be the way to go if I were to do it all manually. I didn’t want to break the configuration into pieces without a reason coming from the configuration itself, though.
  2. If there’s no certificate from certbot yet, put a self-signed “stub” certificate in place just so that nginx can start, issue a real certificate, replace the stub with the real certificate, and reload nginx. This is what I ended up doing. Having a little trick to work around someone’s (cough certbot’s cough) tricky behavior is (arguably) better than adjusting my processes to play along with that tricky behavior.
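The second approach can be sketched in Python roughly as follows. This is an illustration of the idea rather than the code from my role: the function names are made up, and it assumes the usual fullchain.pem and privkey.pem file names in certbot’s live directory. The command is only built here, not executed.

```python
from pathlib import Path


def stub_needed(live_dir: Path) -> bool:
    """True if certbot has not produced a certificate in `live_dir` yet."""
    required = ("fullchain.pem", "privkey.pem")
    return not all((live_dir / name).exists() for name in required)


def self_signed_command(domain: str, key_path, cert_path, days: int = 1) -> list:
    """Build an openssl command that creates a throwaway self-signed cert."""
    return [
        "openssl", "req", "-x509",
        "-newkey", "rsa:2048",
        "-nodes",                      # do not encrypt the private key
        "-keyout", str(key_path),
        "-out", str(cert_path),
        "-days", str(days),
        "-subj", f"/CN={domain}",
    ]


# Hypothetical paths, for illustration only.
live = Path("/etc/letsencrypt/live/example.com")
if stub_needed(live):
    print(" ".join(self_signed_command(
        "example.com", "/etc/nginx/ssl/privkey.pem", "/etc/nginx/ssl/fullchain.pem")))
```

The stub only has to be valid enough for nginx to start; browsers never need to trust it, because it is replaced as soon as the real certificate is issued.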

Don’t Use Your Own CSR

I totally didn’t expect this one.

As I needed a self-signed certificate at some point, I thought it would be nice to use the same CSR for both the self-signed certificate and the real one, the only difference being that in the former case I sign it myself and in the latter case LE signs it for me once I pass the challenge. While this works in general, there’s a huge pitfall nobody warns about: the flow with custom CSRs is considered “fire and forget.” Certbot doesn’t want to manage such certificates. It can only issue them once, and it doesn’t create a “saved configuration” for them. Since there’s no configuration, the certbot certificates command doesn’t list them, they aren’t checked when the scheduler asks certbot to do its job, and they are never renewed. I know neither why it works this way nor why it isn’t mentioned anywhere in the docs.

I only saw this mentioned explicitly once, in a thread on the LE community forum.

Don’t Cross The Path of Certbot

I thought it would be convenient to pass a filename to certbot and get a certificate in the location I wanted; however, it just doesn’t work. Well, it kind of works, but not really. Certbot has convenient options for explicitly setting the file names to write certificates to; however, it doesn’t always respect them and doesn’t put them into the renewal configuration. I found a similar complaint on the LE community forums, which makes me think it doesn’t work for other people either, not just for me.

The only advice out there was to also override the config directory, which I don’t want to do because it would break “out of the box” renewal: I install certbot from a Debian package, and the package maintainer also provides a cron job and a logrotate config that is used instead of the built-in log rotation. If I overrode the config dir, I would have to set up my own cron job that uses the same non-default config path, and keep log rotation in mind.

Nope, I’m going to avoid that; hence, I have to use the default paths and place symlinks to them in “my place.” The only thing to remember here is that, for predictability, I advise specifying the certificate name: it more or less guarantees the location within certbot’s config dir, as the real certificates will reside in /etc/letsencrypt/live/CERTNAME.
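Here is a minimal Python sketch of that approach: request the certificate under an explicit name with the webroot plugin, then symlink certbot’s files into a directory of your choice. The helper names, the webroot, and the target directory are hypothetical, and the flags needed for first-time account registration are left out.

```python
import os
from pathlib import Path

LETSENCRYPT_LIVE = Path("/etc/letsencrypt/live")


def certbot_command(cert_name: str, domains: list, webroot: str) -> list:
    """Build a certbot invocation that pins the certificate name."""
    cmd = ["certbot", "certonly", "--non-interactive",
           "--webroot", "--webroot-path", webroot,
           "--cert-name", cert_name]
    for domain in domains:
        cmd += ["-d", domain]
    return cmd


def link_certificate(cert_name: str, target_dir, live_root=LETSENCRYPT_LIVE) -> dict:
    """Point target_dir/{fullchain,privkey}.pem at certbot's live files.

    Any existing file (e.g. a self-signed stub) is replaced atomically.
    """
    live_dir = Path(live_root) / cert_name
    target_dir = Path(target_dir)
    links = {}
    for filename in ("fullchain.pem", "privkey.pem"):
        link = target_dir / filename
        tmp = target_dir / (filename + ".tmp")
        if tmp.is_symlink() or tmp.exists():
            tmp.unlink()               # clean up a leftover from a failed run
        tmp.symlink_to(live_dir / filename)
        os.replace(tmp, link)          # rename the symlink over the old file
        links[filename] = link
    return links


# Hypothetical values, for illustration only.
print(" ".join(certbot_command("example.com", ["example.com"], "/var/www/letsencrypt")))
```

Since the links point into the live directory, which certbot keeps up to date on renewal, they don’t need to be touched again after the first run.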

Will There Be Code?

This journey was less trivial than I expected it to be, so I decided to publish the result as an Ansible role. It likely won’t be useful to you as is (I don’t have plans to grow, thoroughly test, and maintain it), but there’s a chance you’ll pick up a few ideas from it. Also, once it’s published, it’s easier for me to reuse it across my own projects, so here it is:

Alexander Kurilo
Systems Architect