Something is footgunning around DNS lookups

Hi there,

This is my first install of NixOS (GNOME edition), and it instantly delivers something special: a problem with DNS lookups… but maybe it's something else.

(Possibly related: DNS Lookup Problems)

Since my first boot I can only visit websites some of the time, let's say ddg.gg (DuckDuckGo) or YouTube.
At first the browser says “not available”.
5 minutes later it works and I can watch videos etc.
Another 5 minutes later YouTube stops working, but ddg.gg still works, or the other way around. URLs start or stop working randomly over time… Sometimes a website loads, sometimes not. The browser has behaved like this since my first install 2 days ago.

First I made sure my router works properly: my other machines can visit the same websites on the same network at the same time without any outages. The NixOS machine is connected via cable (Ethernet). No VPN here.

I added

   networking.nameservers = [ "1.1.1.1" "9.9.9.9" ];

to /etc/nixos/configuration.nix.

My /etc/resolv.conf now is:

# Generated by resolvconf
search Speedport_W_723V_1_41_000
nameserver 192.168.2.1
nameserver fe80::1%enp2s0
nameserver 1.1.1.1
nameserver 9.9.9.9
options edns0

No change in behaviour.
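A guess at why adding nameservers changed nothing: the router (192.168.2.1) is still listed first. As far as I understand glibc's stub resolver, it asks the nameservers in order and only moves on after a timeout or an error rcode such as SERVFAIL, NOTIMP or REFUSED; a NOERROR reply is accepted even when it carries no usable A record. glibc also reads at most three nameserver lines (MAXNS = 3), so 9.9.9.9 is never used here. Below is a simplified Python model of that behaviour, not glibc's code; the `ask` function and its replies are hypothetical stand-ins for real queries.

```python
from __future__ import annotations

from dataclasses import dataclass

# glibc only reads the first three "nameserver" lines (MAXNS = 3).
MAXNS = 3

# Error rcodes after which this model tries the next server
# (roughly what glibc does for SERVFAIL, NOTIMP and REFUSED).
RETRY_RCODES = {"SERVFAIL", "NOTIMP", "REFUSED"}


@dataclass
class Reply:
    rcode: str          # e.g. "NOERROR"
    answers: list[str]  # addresses found in the ANSWER section


def effective_nameservers(resolv_conf: str) -> list[str]:
    """Return the nameservers a glibc-style stub resolver would actually use."""
    servers = []
    for line in resolv_conf.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers[:MAXNS]


def lookup(servers, ask):
    """Ask servers in order; ask(server) returns a Reply, or None on timeout."""
    for server in servers:
        reply = ask(server)
        if reply is None or reply.rcode in RETRY_RCODES:
            continue  # timeout or error rcode: try the next server
        return server, reply.answers  # NOERROR is final, even with no answers
    return None, []


RESOLV_CONF = """\
# Generated by resolvconf
search Speedport_W_723V_1_41_000
nameserver 192.168.2.1
nameserver fe80::1%enp2s0
nameserver 1.1.1.1
nameserver 9.9.9.9
options edns0
"""


def ask(server):
    # Hypothetical replies: the router answers NOERROR without a usable A record.
    if server == "192.168.2.1":
        return Reply("NOERROR", [])
    return Reply("NOERROR", ["31.15.64.162"])


servers = effective_nameservers(RESOLV_CONF)
print(servers)               # ['192.168.2.1', 'fe80::1%enp2s0', '1.1.1.1']
print(lookup(servers, ask))  # ('192.168.2.1', [])
```

In this model the public servers are only reached if the router times out or returns an error rcode, which would be consistent with the later fix that leaves the router out of resolv.conf entirely.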

Then I started pinging and digging. After NixOS starts, I open a terminal and ping ddg.gg successfully. Then I open Firefox and it can't connect to ddg.gg. Then I go back to the console and ping ddg.gg fails. 10 minutes later it works again…

The fascinating thing is that dig resolves the correct IP address while ping fails:

[water@nixos:~]$ ping blog.fefe.de
ping: blog.fefe.de: Name or service not known

[water@nixos:~]$ dig blog.fefe.de
;; Warning: Message parser reports malformed message packet.

; <<>> DiG 9.18.24 <<>> blog.fefe.de
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60199
;; flags: qr aa rd ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;blog.fefe.de.			IN	A

;; ANSWER SECTION:
.			0	CLASS1232 OPT	10 8 tFncTcYkF8Q=

;; ADDITIONAL SECTION:
blog.fefe.de.		78467	IN	A	31.15.64.162

;; Query time: 0 msec
;; SERVER: 192.168.2.1#53(192.168.2.1) (UDP)
;; WHEN: Tue Mar 12 16:31:33 CET 2024
;; MSG SIZE  rcvd: 69

[water@nixos:~]$ ping blog.fefe.de
ping: blog.fefe.de: Name or service not known

[water@nixos:~]$ ping 31.15.64.162
PING 31.15.64.162 (31.15.64.162) 56(84) bytes of data.
64 bytes from 31.15.64.162: icmp_seq=1 ttl=53 time=20.7 ms
64 bytes from 31.15.64.162: icmp_seq=2 ttl=53 time=20.6 ms
64 bytes from 31.15.64.162: icmp_seq=3 ttl=53 time=20.8 ms
64 bytes from 31.15.64.162: icmp_seq=4 ttl=53 time=21.2 ms
64 bytes from 31.15.64.162: icmp_seq=5 ttl=53 time=20.4 ms
64 bytes from 31.15.64.162: icmp_seq=6 ttl=53 time=20.5 ms
64 bytes from 31.15.64.162: icmp_seq=7 ttl=53 time=20.6 ms
64 bytes from 31.15.64.162: icmp_seq=8 ttl=53 time=20.7 ms
64 bytes from 31.15.64.162: icmp_seq=9 ttl=53 time=20.7 ms
64 bytes from 31.15.64.162: icmp_seq=10 ttl=53 time=20.9 ms
64 bytes from 31.15.64.162: icmp_seq=11 ttl=53 time=20.6 ms
64 bytes from 31.15.64.162: icmp_seq=12 ttl=53 time=20.6 ms
64 bytes from 31.15.64.162: icmp_seq=13 ttl=53 time=20.5 ms
^C
--- 31.15.64.162 ping statistics ---
13 packets transmitted, 13 received, 0% packet loss, time 12022ms
rtt min/avg/max/mdev = 20.376/20.676/21.150/0.190 ms

Seems like something is footgunning around here, but what exactly??
I'm not really into networking, so I have no clue where to look now.

Any help appreciated

🙂

The dig error message ";; Warning: Message parser reports malformed message packet." does not appear as long as ping works.

After a reboot it looks OK:

[water@nixos:~]$ dig blog.fefe.de

; <<>> DiG 9.18.24 <<>> blog.fefe.de
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28360
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;blog.fefe.de.			IN	A

;; ANSWER SECTION:
blog.fefe.de.		83318	IN	A	31.15.64.162

;; Query time: 11 msec
;; SERVER: 192.168.2.1#53(192.168.2.1) (UDP)
;; WHEN: Wed Mar 13 14:36:58 CET 2024
;; MSG SIZE  rcvd: 57

But then blog.fefe.de FAILs in the browser and dig shows:

[water@nixos:~]$ dig blog.fefe.de
;; Warning: Message parser reports malformed message packet.

; <<>> DiG 9.18.24 <<>> blog.fefe.de
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51817
;; flags: qr aa rd ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;blog.fefe.de.			IN	A

;; ANSWER SECTION:
.			0	CLASS1232 OPT	10 8 jYkbCemBDVA=

;; ADDITIONAL SECTION:
blog.fefe.de.		83186	IN	A	31.15.64.162

;; Query time: 0 msec
;; SERVER: 192.168.2.1#53(192.168.2.1) (UDP)
;; WHEN: Wed Mar 13 14:39:18 CET 2024
;; MSG SIZE  rcvd: 69
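Reading the two dig outputs side by side: in the good reply the A record sits in the ANSWER section and the EDNS OPT pseudo-record (UDP size 512) in ADDITIONAL. In the broken reply the two are swapped: the OPT record, carrying an 8-byte DNS cookie (EDNS option 10, that is the "10 8 jYkbCemBDVA=" part), sits in ANSWER and the A record in ADDITIONAL. RFC 6891 places the OPT record in the additional data section, which is presumably why dig calls the packet malformed, and a stub resolver looking for the A record in the answer section finds none, which would fit ping's "Name or service not known". The sketch below rebuilds both replies from the values dig printed and lists the record types per section. It is a minimal parser for illustration only (no bounds checking, no name decoding), and the byte layout is my reconstruction, not a capture.

```python
from __future__ import annotations

import base64
import socket
import struct

TYPE_NAMES = {1: "A", 41: "OPT"}


def encode_name(name: str) -> bytes:
    """Encode a domain name as uncompressed DNS labels."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def skip_name(msg: bytes, off: int) -> int:
    """Return the offset just past a (possibly compressed) name."""
    while True:
        length = msg[off]
        if length == 0:
            return off + 1
        if (length & 0xC0) == 0xC0:  # compression pointer: 2 bytes, ends the name
            return off + 2
        off += 1 + length


def sections(msg: bytes) -> dict[str, list[str]]:
    """List the record types found in each section of a DNS message."""
    _id, _flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", msg[:12])
    off = 12
    for _ in range(qdcount):
        off = skip_name(msg, off) + 4  # skip QTYPE and QCLASS
    result = {}
    for section, count in (("answer", ancount), ("authority", nscount), ("additional", arcount)):
        types = []
        for _ in range(count):
            off = skip_name(msg, off)
            rtype, _rclass, _ttl, rdlength = struct.unpack("!HHIH", msg[off:off + 10])
            types.append(TYPE_NAMES.get(rtype, str(rtype)))
            off += 10 + rdlength
        result[section] = types
    return result


def a_record(ttl: int) -> bytes:
    # Name is a compression pointer to offset 12 (the question name).
    return struct.pack("!HHHIH", 0xC00C, 1, 1, ttl, 4) + socket.inet_aton("31.15.64.162")


def opt_record(udp_size: int, options: bytes = b"") -> bytes:
    # Root name, TYPE 41 (OPT); the CLASS field carries the UDP payload size.
    return b"\x00" + struct.pack("!HHIH", 41, udp_size, 0, len(options)) + options


question = encode_name("blog.fefe.de") + struct.pack("!HH", 1, 1)  # A, IN
cookie = base64.b64decode("jYkbCemBDVA=")
cookie_option = struct.pack("!HH", 10, len(cookie)) + cookie  # option 10 = COOKIE

# Good reply (flags qr rd ra): A in ANSWER, OPT in ADDITIONAL.
good = (struct.pack("!6H", 28360, 0x8180, 1, 1, 0, 1)
        + question + a_record(83318) + opt_record(512))
# Broken reply (flags qr aa rd ad): OPT in ANSWER, A in ADDITIONAL.
broken = (struct.pack("!6H", 51817, 0x8520, 1, 1, 0, 1)
          + question + opt_record(1232, cookie_option) + a_record(83186))

print(len(good), sections(good))
# 57 {'answer': ['A'], 'authority': [], 'additional': ['OPT']}
print(len(broken), sections(broken))
# 69 {'answer': ['OPT'], 'authority': [], 'additional': ['A']}
```

The rebuilt replies come out at 57 and 69 bytes, the same sizes dig reports in its MSG SIZE lines, so the reconstruction is at least plausible.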

Could it be something on the router? But then why do all the other machines have no DNS problems?

Any advice or hints?

I switched my config so resolv.conf now looks like this:

nameserver 1.1.1.1
nameserver 8.8.8.8

This skips the router's DNS server, which seemingly produces the malformed message packets (only sometimes! And only for the NixOS machine).

This way Firefox now has stable internet access, but the problem persists in the background. It may lie in this old router (from German Telekomik), but I really would like to know. I checked the firmware and settings on it and everything seems perfectly OK…

Hey silent folks,

It's been a while. I really like NixOS and its way of setting up environments, especially the nix-shell way of running separate and isolated shell environments. But using NixOS feels quite unstable to me; there are a lot of things I can't get a grasp on. The problem mentioned above is still not solved (no help at all).

For highly skilled system engineers this is probably the perfect OS for administering working environments. But I am just a tiny Rust developer coding all day long, and I just can't afford to spend that much time hacking everything together to make it work properly. I'm just not enough into that, sorry folks. But I would love to see this evolve and become handier in the future.

As a Rust dev I also do frontend stuff using trunk. But trunk on NixOS is still unstable: it needs to execute a downloaded binary (dart) to build sass/scss, and prebuilt binaries like that can't be executed on NixOS out of the box… so trunk FAILs 😦 and that got me to the point of switching back to an Arch distro now. This eats too much time.

Maybe I'll throw NixOS into a VM next time and see how it goes.

Some other issues I have encountered so far:
· the NixOS Plasma edition crashes while booting from the ISO. Some systemd failure (no screenshot because I'm not using a VM right now); tried on a ThinkPad E15.
· nix-env --list-generations returns nothing, even after I generated a ton of generations. They only appear on the boot screen.
· icons of installed software are sometimes not shown properly in GNOME.
· after logging back in from suspend, the KEYUP from pressing the Enter key after entering the password is somehow discarded, so linefeeds instantly fill my IDE as if Enter were still pressed.

All this little bug powder sprinkled everywhere, yeah yeah yeah

But rustup worked fine so far, a big 😃 for that.

I would say cya, but I'm not sure anyone reads this…

So what.

Hey, I have exactly the same issue as you. It only happens with my home router and with my NixOS devices. Unfortunately, I don't know enough about networking to know what the issue is or how to debug it.

I have seen some people reporting similar stuff:

But no solution so far 🙃

I spent my morning chasing down this exact issue, and I wanted to document the journey and the steps to resolve it for any future Google/DDG visitors who run into this very strange situation. It's so poorly documented online that I must assume this is a pretty extreme edge case that is either hardware/driver or local modem related.

The issue: I completed the installation and configuration of a shiny new NixOS server in my lab and went to work on some day-job application development on it for the very first time. In the course of my work I ran into an odd issue where my curl requests sometimes failed with a curl (6) "Could not resolve host" error. On a cursory examination I quickly determined that this error would occur even when run in a loop against the same domain, and whether or not it would trigger was a complete mystery.

I decided to take my application and all of its requisite framework out of the picture, so I started running the curl commands directly from my shell. I was happy (?) to learn that this wasn't related to the application itself, but flummoxed that it was happening at all.

This led me to use dig to try to see what the issue was. I discovered that if I ran dig on a valid domain, it always worked on the first run without any issues. If I immediately re-ran it on the same domain, it would sort of work, but the output would contain the line ;; Warning: Message parser reports malformed message packet.

Unsatisfied with this level of detail, I ran tcpdump on port 53 to look at the actual packets, and I discovered that on the first request for a domain my home modem would return a full DNS record with no reported errors, and it resolved just fine.

First response:
12:36:57.987152 IP 192.168.9.1.53 > 192.168.9.4.39419: 62343 4/0/1 A x.x.x.x, A x.x.x.x, A x.x.x.x, A x.x.x.x (111)

Broken response:
12:37:00.385447 IP 192.168.9.1.53 > 192.168.9.4.54451: 28718- 1/0/1 OPT UDPsize=1232 [COOKIE 907885592f2c819d] (75)
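These tcpdump lines seem to show the same fault as the dig outputs earlier in the thread. `4/0/1` and `1/0/1` are the answer/authority/additional record counts, and in the broken reply the single "answer" tcpdump prints is the OPT record with the cookie. As far as I know, the `-` after the query ID in `28718-` is tcpdump's marker for a response without the RA (recursion available) bit, which matches dig's "recursion requested but not available" warning. The sketch below decodes the DNS header flag word (bit layout from RFC 1035; AD and CD from RFC 4035). The two flag words fed into it are reconstructed from the first poster's dig outputs, not taken from this capture, so treat it as an illustration.

```python
import struct

# DNS header flag bits (RFC 1035, section 4.1.1; AD and CD from RFC 4035).
FLAG_BITS = [
    ("qr", 0x8000), ("aa", 0x0400), ("tc", 0x0200), ("rd", 0x0100),
    ("ra", 0x0080), ("ad", 0x0020), ("cd", 0x0010),
]
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def decode_header(header: bytes) -> dict:
    """Decode the 12-byte DNS header: id, flags, rcode and section counts."""
    msg_id, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", header[:12])
    rcode = flags & 0x000F
    return {
        "id": msg_id,
        "flags": [name for name, bit in FLAG_BITS if flags & bit],
        "rcode": RCODES.get(rcode, str(rcode)),
        # QUERY/ANSWER/AUTHORITY/ADDITIONAL as in dig's header line;
        # tcpdump prints the last three, e.g. 1/0/1.
        "counts": (qdcount, ancount, nscount, arcount),
        # A response (QR) to a recursive query (RD) without RA: what dig
        # reports as "recursion requested but not available".
        "no_recursion": bool(flags & 0x8000 and flags & 0x0100 and not flags & 0x0080),
    }


# Flag words matching the two dig outputs (not taken from a capture).
broken_header = struct.pack("!6H", 51817, 0x8520, 1, 1, 0, 1)  # qr aa rd ad
good_header = struct.pack("!6H", 28360, 0x8180, 1, 1, 0, 1)    # qr rd ra

print(decode_header(broken_header)["flags"], decode_header(broken_header)["no_recursion"])
# ['qr', 'aa', 'rd', 'ad'] True
print(decode_header(good_header)["flags"], decode_header(good_header)["no_recursion"])
# ['qr', 'rd', 'ra'] False
```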

Since I've been working on this network and with this hardware for months without issue, this is almost certainly an issue with this build of systemd-resolved coupled with how my modem's DNS resolver cache delivers its answers. Rather than break out the DNS spec, write my own tests for this and waste even more of my workday on this problem, I updated my configuration.nix to use a local resolver cache instead of my modem's.

The solution:

I updated my configuration.nix from the standard/minimal configuration:

networking.networkmanager = {
  enable = true;
};

It now uses the following:

networking.networkmanager = {
  enable = true;
  useDnsmasq = true;
};
networking.resolvconf.useLocalResolver = true;

This completely cuts the modem and its dubious resolver cache out of the picture and uses a local cache instead (assuming the dnsmasq cache has a longer TTL than the modem's, which is true by default in my case; you might need to extend your dnsmasq cache lifetime if the problem persists after a rebuild+switch). I have had no further trouble with dig, curl, or other DNS lookups since I made this update.
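A toy model of the TTL caveat, under this poster's own hypothesis: the modem answers a fresh lookup correctly and then serves broken replies from its cache until its TTL expires, while a local cache keeps the first good answer. `FakeModem`, `LocalCache`, the TTL values and the integer clock are all hypothetical; this is not how dnsmasq is implemented, it only shows why a local TTL shorter than the modem's would let broken replies through again.

```python
from __future__ import annotations


class FakeModem:
    """Hypothetical modem: fresh lookups are fine, cached replies are broken."""

    def __init__(self, ttl: int):
        self.ttl = ttl
        self.expires: dict[str, int] = {}

    def query(self, name: str, now: int) -> str:
        if name in self.expires and now < self.expires[name]:
            return "BROKEN"  # served from the modem's own cache
        self.expires[name] = now + self.ttl
        return "OK"


class LocalCache:
    """Caches only good answers, for a fixed local TTL."""

    def __init__(self, upstream: FakeModem, ttl: int):
        self.upstream = upstream
        self.ttl = ttl
        self.expires: dict[str, int] = {}

    def query(self, name: str, now: int) -> str:
        if name in self.expires and now < self.expires[name]:
            return "OK"  # answered locally; the modem never sees this query
        answer = self.upstream.query(name, now)
        if answer == "OK":
            self.expires[name] = now + self.ttl
        return answer


def failures(resolver, times) -> int:
    """Count lookups of one name that did not return a good answer."""
    return sum(resolver.query("blog.fefe.de", t) != "OK" for t in times)


times = range(0, 600, 10)  # one lookup every 10 s for 10 minutes
print(failures(FakeModem(ttl=300), times))                       # 58
print(failures(LocalCache(FakeModem(ttl=300), ttl=300), times))  # 0
print(failures(LocalCache(FakeModem(ttl=300), ttl=60), times))   # 48
```

With the local TTL at least as long as the modem's, every repeat lookup is answered locally and the next upstream query only happens once the modem's entry has expired too; with a shorter local TTL the cache goes back to the modem while it is still serving its broken cached reply.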
