Docker network and DNS issues with Traefik HTTP routing (Python, Django, FastAPI)

I work with a web development agency, and we run a Docker Compose setup for developing typical Django and FastAPI applications. The setup consists of frontend, backend, database, authentication, and proxy containers. Unfortunately, Docker networking appears to block part of the traffic, but only on NixOS. :worried:

The Docker Compose Setup

A simplified version of the setup would look like this:

# docker-compose.yml
services:
  auth:
    image: ghcr.io/zitadel/zitadel:stable
  backend:
    build:
      context: ./backend
  db:
    image: postgres:17
  frontend:
    build:
      context: ./frontend
  proxy:
    image: traefik:v2.10
    command:
    - --configFile=/traefik/config/static.yaml
    ports:
    - "443:8443"

The proxy setup is inspired by the Traefik proxy setup for local development from the Docker documentation. The Traefik configuration routes requests for *.localhost:443 (or *.localhost:80) from your local web browser to the services running with Docker Compose.

# Traefik providers configuration
http:
  routers:
    auth:
      rule: |
        Host(`{{env "AUTH_DOMAIN"}}`) ||
        HostRegexp(`{subdomain:[a-z]+}.{{env "AUTH_DOMAIN"}}`)
      service: auth
    backend:
      rule: Host(`{{env "BACKEND_DOMAIN"}}`)
      service: backend
    frontend:
      rule: Host(`{{env "FRONTEND_DOMAIN"}}`)
      service: frontend
  services:
    auth:
      loadBalancer:
        servers:
        # h2c is the scheme for unencrypted HTTP/2
        - url: "h2c://auth:8080"
        passHostHeader: true
    backend:
      loadBalancer:
        servers:
        - url: "http://backend:8000/"
    frontend:
      loadBalancer:
        servers:
        - url: "http://frontend:3000/"
tls:
  certificates:
  - certFile: /traefik/tls/cert.pem
    keyFile: /traefik/tls/key.pem

As a developer, you bring up the setup by running docker compose up and can then simply navigate between https://www.localhost, https://backend.localhost and https://auth.localhost, as if they were deployed on the Internet, which is very handy!

Works on other systems but not on NixOS

This has proven to be a rock-solid setup on macOS and Ubuntu Linux developer laptops, and it even runs on Windows in WSL. It all works: DNS resolution, network traffic, SSL certificates. With NixOS, unfortunately, we’re only about 98% of the way there.

$ docker compose up
...

The backend container, which runs a Python web application (either Django or FastAPI) and needs to make an authentication request to the auth container, fails to connect through the proxy. With FastAPI (using fastapi-zitadel-auth) you get a

httpx.ConnectTimeout

… with Django (using django-allauth) you get a

requests.exceptions.ConnectTimeout

Interestingly, the various tools behave very differently inside the backend container: cURL, for example, ignores /etc/hosts and only uses the resolver defined in /etc/resolv.conf, while ping and nc (netcat) honor /etc/hosts. Either way, the traffic seems to reach the proxy service, which translates the request and tries to forward it to the target service. That forwarding step is what times out.
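Because the tools disagree on name resolution, it can help to look at DNS resolution and the TCP connection as two separate steps. The following is only a minimal sketch using the Python standard library, meant to be run inside the backend container; the function name probe and the hostname and port in the usage note are assumptions for illustration, not part of the actual setup.

```python
import socket
import time


def probe(host: str, port: int, timeout: float = 3.0) -> dict:
    """Resolve `host`, then try a TCP connect to each address, reporting both steps."""
    result = {"host": host, "port": port, "addresses": [], "dns_error": None, "connect": {}}

    # Step 1: name resolution through the system resolver (as Python sees it).
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result["dns_error"] = str(exc)
        return result

    # De-duplicate the resolved addresses while keeping resolver order.
    addresses = list(dict.fromkeys(info[4][0] for info in infos))
    result["addresses"] = addresses

    # Step 2: plain TCP connect to each address, timing each attempt.
    for address in addresses:
        start = time.monotonic()
        try:
            with socket.create_connection((address, port), timeout=timeout):
                outcome = "connected"
        except socket.timeout:
            # Typical when packets are silently dropped somewhere on the path.
            outcome = "timeout"
        except OSError as exc:
            # For example "connection refused" when nothing listens on the port.
            outcome = f"error: {exc}"
        result["connect"][address] = (outcome, round(time.monotonic() - start, 3))
    return result
```

Calling, for example, print(probe("auth.localhost", 443)) would show whether the name resolves at all and whether the connection attempt ends in "connected", "timeout", or an immediate error. A timeout after a successful lookup points toward packet filtering rather than DNS.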

Straightforward Docker setup

Docker is set up on my laptop running NixOS unstable the obvious way:

{ pkgs, ... }: {
  environment.systemPackages = [ pkgs.gnome-boxes pkgs.vagrant ];
  virtualisation = {
    docker.enable = true;
    docker.extraPackages = [ pkgs.docker-buildx ];
    libvirtd.enable = true;
  };
  users.users.peter.extraGroups = [ "docker" "networkmanager" "wheel" ];
}

I originally tried to use rootless Docker, but that caused additional networking issues. I also need to use Docker rather than Podman so that everyone on the team uses the same setup. (Note: I kept libvirt, vagrant and boxes in the code snippet above just in case someone recognizes any known side effects.)

Any ideas why the network traffic might be blocked?

I found a temporary workaround to make things work for development:

$ sudo iptables -F

This seems to be equivalent to dropping the shield entirely. :shield: :open_mouth: The man page says:

-F, --flush [chain]
Flush the selected chain (all the chains in the table if none is given). This is equivalent to deleting all the rules one by one.
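Before flushing everything, it might be more telling to list only the rules that can actually discard packets. The sketch below assumes the text comes from iptables -S (the rule-listing flag); the helper name blocking_rules and the sample rules are made up for illustration and are not taken from the affected machine.

```python
def blocking_rules(rules_text: str) -> list[str]:
    """Return the lines of `iptables -S` output that drop or reject packets."""
    found = []
    for raw in rules_text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        # Chain policy lines look like: -P FORWARD DROP
        if tokens[0] == "-P" and len(tokens) >= 3 and tokens[2] == "DROP":
            found.append(" ".join(tokens))
        # Rule lines carry their target after -j, e.g.: -A some-chain -j DROP
        elif "-j" in tokens:
            idx = tokens.index("-j")
            if idx + 1 < len(tokens) and tokens[idx + 1] in ("DROP", "REJECT"):
                found.append(" ".join(tokens))
    return found


# Hypothetical sample input, invented for this example.
SAMPLE = """\
-P INPUT ACCEPT
-P FORWARD DROP
-A INPUT -j example-fw
-A example-fw -i lo -j ACCEPT
-A example-fw -j DROP
"""

for rule in blocking_rules(SAMPLE):
    print(rule)
```

On a real system the input could be collected with subprocess.run(["iptables", "-S"], capture_output=True, text=True).stdout, run as root. Comparing the result before and after docker compose up might narrow down which chain gets in the way.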

From a user perspective, the fact that flushing the rules helps suggests that the default Docker networking setup on NixOS is too restrictive. It would be good to get the Docker experience on par with mainstream Linux distros and macOS. :penguin: :green_apple:

System safety is one important goal, but the “Docker promise” of uniform behavior across distros and platforms is another. Currently, NixOS breaks this promise. A Compose setup suggested by the official Docker documentation doesn’t work, because traffic on the Docker gateway is partly blocked. :construction: