Prevent Let's Encrypt failed authorizations with Ansible
It happens once every few years. For whatever reason I request another VPS at a service provider, provision the machine with Ansible and deploy a few services, usually as Docker containers. And then? They don’t work, unfortunately. In 99% of the cases I forgot to update the DNS records, and with Traefik and Let’s Encrypt I hit the rate limit while investigating, so I can’t obtain any new certificate for the next hour. With a few lines of Ansible, this won’t happen again. Hopefully.
The goal is quite simple: with Ansible I can deploy any service, like a static site or web application, in “one click” (or with a single command in my terminal). I just need to be sure the cert renewal process will succeed.
The condition to assert is simple: is the host’s IP address equal to the resolved address of the FQDN for the service I want to deploy? Please note this is a simple and straightforward approach, and it will be different in cases where you work with clusters, CDNs etc. But then I’m sure you have your DNS management at a more professional level than me, a guy who just wants to run his own things.
My choice here is to deploy the container and configure it completely, but make sure the container is in a stopped state. You could manage it differently, by failing the deployment and letting you correct the DNS settings first, but then you might be blocked by DNS TTL before you can configure your container.
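For comparison, the stricter alternative could be sketched with the built-in assert module. This is only an illustration of the approach I did not take. It assumes two facts, service_ip and host_ip, that already hold the resolved addresses of the service and the host, and a variable myservice_fqdn with the service's domain name.

```yaml
# Sketch: abort the play instead of deploying a stopped container.
# Assumes service_ip, host_ip and myservice_fqdn are already defined.
- name: Stop the deployment when the service does not resolve to this host
  ansible.builtin.assert:
    that:
      - service_ip == host_ip
    fail_msg: >-
      '{{ myservice_fqdn }}' resolves to '{{ service_ip }}',
      but this host is located at '{{ host_ip }}'.
      Fix the DNS records and run the playbook again.
    success_msg: "DNS for '{{ myservice_fqdn }}' points to this host."
```

With this variant nothing after the assert runs for the affected host, so the container is neither created nor configured until the DNS records are correct.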
Another way might be to make Ansible configure the DNS settings for you, but that feels too risky if it goes wrong. I also haven’t seen any Ansible module to interact with Digital Ocean’s networking API, and you would still have to deal with DNS TTLs.
DNS queries with Ansible
You can query domains with Ansible using the community plugin dig. Note that it requires dnspython on the local controller node, so I had to install the package python3-dnspython on my Ubuntu laptop first.
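A quick way to check that the lookup works on the controller is a tiny playbook like the one below. It is a minimal sketch: it assumes the community.general collection (which ships the dig lookup) is installed, and example.com is just a placeholder domain.

```yaml
# Sketch: smoke test for the dig lookup, run on the controller only.
# example.com is a placeholder; use any domain you know resolves.
- name: Check that the dig lookup works
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Resolve the A record of a test domain
      ansible.builtin.debug:
        msg: "{{ lookup('community.general.dig', 'example.com.') }}"
```

If dnspython is missing, this task fails with an error from the lookup plugin instead of printing an address.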
Then the Ansible logic is quite simple:
- Query the DNS A record for the service you’re deploying
- Query the DNS A record for the host machine you’re deploying to
- Compare both records. If they are equal, set a variable for the container state to started, otherwise make it stopped
- Configure the container with the state from above
In Ansible YAML the tasks look like this:
- name: Query DNS lookup for domain {{ myservice_fqdn }}
  set_fact:
    service_ip: "{{ lookup('dig', myservice_fqdn) }}"
- name: Query DNS lookup for host
  set_fact:
    host_ip: "{{ lookup('dig', ansible_host) }}"
# Start the container only when the service resolves to this host
- name: Set container service state based on DNS configuration
  set_fact:
    container_state: "{{ 'started' if service_ip == host_ip else 'stopped' }}"
# fail + ignore_errors prints a visible warning but lets the play continue
- name: Check DNS configuration for {{ myservice_fqdn }} is correct
  fail:
    msg: >
      Warning: Service is configured at '{{ myservice_fqdn }}'
      but it resolves to '{{ service_ip }}'.
      This does not match the host '{{ ansible_host }}'
      which is located at '{{ host_ip }}'.
      The container state is set to '{{ container_state }}'
      to prevent it from starting
      and flooding Let's Encrypt renewal requests.
  when: service_ip != host_ip
  ignore_errors: True
- name: Create the container
  docker_container:
    name: "{{ docker_container }}"
    image: "{{ docker_image }}"
    pull: yes
    state: "{{ container_state }}"
    restart_policy: unless-stopped
    networks_cli_compatible: yes
    networks:
      - name: "{{ traefik_docker_network }}"
    labels:
      traefik.enable: "true"
      traefik.http.routers.my.entrypoints: "websecure"
      traefik.http.routers.my.rule: "Host(`{{ myservice_fqdn }}`)"
      traefik.http.routers.my.tls: "true"
      traefik.http.routers.my.tls.certresolver: "le"
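One caveat: the host lookup assumes ansible_host is a hostname. If your inventory stores a plain IPv4 address there, an A record query for it will most likely not return that address. A possible variant of the host task, as a sketch under that assumption (the regular expression is a deliberately loose IPv4 check, not a full validation):

```yaml
# Sketch: skip the DNS query when ansible_host already looks like an IPv4 address.
- name: Determine the host address
  ansible.builtin.set_fact:
    host_ip: >-
      {{ ansible_host
      if ansible_host is match('^[0-9]+([.][0-9]+){3}$')
      else lookup('community.general.dig', ansible_host) }}
```

The rest of the tasks stay the same, since they only compare service_ip with host_ip.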
All tests passed. Now I have to wait a few years to ascertain this was the right and only thing needed to prevent me from making these mistakes again.