Occasional DNS problems

Hi! I love NixOS with all my heart, but ever since installing it I have had occasional internet connection problems, which I assume are related to DNS. Several times a day I lose access to some domains, but not all of them. Sometimes I can ping google.com but not www.google.com, and sometimes it is the other way around.

❯ ping www.google.com
ping: connect: Network is unreachable

I do not know much about networking, but here is what I can tell about my system during these connection losses:

my default gateway is my router

❯ ip r
default via 192.168.1.1 dev wlp3s0 proto dhcp src 192.168.1.16 metric 600 

I can ping my router

❯ ping -c 1 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=5.05 ms

when I dig a domain name, I normally get an IP address in the answer section, but during connection losses the answer section has no address record and the address shows up in the additional section instead

❯ dig www.google.com
;; Warning: Message parser reports malformed message packet.

; <<>> DiG 9.18.19 <<>> www.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39867
;; flags: qr; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; QUESTION SECTION:
;www.google.com.			IN	A

;; ANSWER SECTION:
.			0	CLASS1232 OPT	10 8 kDobbyoGKuE=

;; ADDITIONAL SECTION:
www.google.com.		218	IN	A	74.125.131.147

;; Query time: 48 msec
;; SERVER: 192.168.1.1#53(192.168.1.1) (UDP)
;; WHEN: Tue Nov 21 18:38:03 MSK 2023
;; MSG SIZE  rcvd: 71

and then I can ping the returned IP with no problem

❯ ping -c 1 74.125.131.147
PING 74.125.131.147 (74.125.131.147) 56(84) bytes of data.
64 bytes from 74.125.131.147: icmp_seq=1 ttl=112 time=15.8 ms

here is my resolv.conf

❯ cat /etc/resolv.conf
# Generated by resolvconf
search Home
nameserver 192.168.1.1
nameserver 1.1.1.1
options edns0

any help will be appreciated

Another thing to keep an eye on is the current DNS server reported by resolvectl status and whether you can observe any difference in reliability between the two nameservers:

dig @1.1.1.1 example.com
dig @192.168.1.1 example.com
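Since the failures are intermittent, a single query per server may not show much. Here is a rough sketch of a helper that asks one server for several names and tallies how many came back with an IPv4 address. The function name `probe` and the 2-second timeout are my own choices for illustration, not anything standard:

```shell
# probe SERVER NAME...
# Asks SERVER for the A record of each NAME and prints a tally.
# A lookup only counts as "ok" if dig's short output contains an IPv4 address,
# so empty or odd-looking replies are counted as failures.
probe() {
  server=$1
  shift
  ok=0
  fail=0
  for name in "$@"; do
    if dig +short +time=2 +tries=1 "@$server" "$name" A 2>/dev/null |
      grep -Eq '^[0-9]+(\.[0-9]+){3}$'; then
      ok=$((ok + 1))
    else
      fail=$((fail + 1))
    fi
  done
  echo "$server ok=$ok fail=$fail"
}
```

Running `probe 192.168.1.1 google.com www.google.com github.com` and then the same with `1.1.1.1` while the problem is happening should make any difference between the two nameservers easier to spot.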

It looks like a flaky DNS server; your local router at 192.168.1.1 might be failing in some way.

things to try

Log into the local router and see what its DNS settings are. Make a note of them, or change them to 1.1.1.1 or another reliable DNS server.

You are on wifi, and packet loss caused by low signal or interference can cause temporary name resolution problems. This is especially problematic for UDP, where only the wifi MAC layer retransmits lost frames, whereas TCP keeps retrying on its own.

You can try connecting directly to the router via Ethernet, with wifi disabled. Do you still get the same problem?

Or try wavemon:

nix-shell -p wavemon

This will give you the information you need about signal strength, packet loss and so on.
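To check the UDP versus TCP point above, dig can be told which transport to use with `+notcp` and `+tcp`. A minimal sketch, assuming the router also answers DNS over TCP (not all do); the function names `resolves` and `compare_transports` are made up for this example:

```shell
# resolves FLAG SERVER NAME
# Succeeds if dig, using the given transport flag (+notcp or +tcp),
# gets at least one IPv4 address for NAME from SERVER.
resolves() {
  dig "$1" +short +time=2 +tries=1 "@$2" "$3" A 2>/dev/null |
    grep -Eq '^[0-9]+(\.[0-9]+){3}$'
}

# compare_transports SERVER NAME
# Prints one line per transport saying whether the lookup worked.
compare_transports() {
  for flag in +notcp +tcp; do
    if resolves "$flag" "$1" "$2"; then
      echo "$flag ok"
    else
      echo "$flag fail"
    fi
  done
}
```

If `compare_transports 192.168.1.1 www.google.com` keeps reporting `+notcp fail` alongside `+tcp ok` during an outage, that would point towards lost UDP packets. If both fail or both succeed, the wifi theory looks less likely.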

I’ve seen poor wireless drivers, or access points that don’t adhere to the 802.11 standards, cause havoc with Linux wireless drivers… there is a lot that can go wrong. If you think your wireless card is faulty or has poor driver support, go and get yourself a Linux-compatible USB wifi dongle. They cost around $5 and are great for ruling out driver problems or faulty wifi hardware.

good luck!

trying to run resolvectl I get this

❯ resolvectl status
Failed to get global data: Unit dbus-org.freedesktop.resolve1.service not found.

and trying to start the service I get this

❯ sudo systemctl enable systemd-resolved.service
Failed to enable unit: Unit file systemd-resolved.service does not exist.

could this be the source of the problem ?

there definitely is a difference between the two. I could not tell if it is something important

❯ dig @192.168.1.1 github.com
;; Warning: Message parser reports malformed message packet.

; <<>> DiG 9.18.19 <<>> @192.168.1.1 github.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17934
;; flags: qr; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; QUESTION SECTION:
;github.com.			IN	A

;; ANSWER SECTION:
.			0	CLASS1232 OPT	10 8 p0U52hzRGcE=

;; ADDITIONAL SECTION:
github.com.		25	IN	A	140.82.121.4

;; Query time: 91 msec
;; SERVER: 192.168.1.1#53(192.168.1.1) (UDP)
;; WHEN: Wed Nov 22 02:34:47 MSK 2023
;; MSG SIZE  rcvd: 67

❯ dig @1.1.1.1 github.com

; <<>> DiG 9.18.19 <<>> @1.1.1.1 github.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34139
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;github.com.			IN	A

;; ANSWER SECTION:
github.com.		11	IN	A	140.82.121.4

;; Query time: 65 msec
;; SERVER: 1.1.1.1#53(1.1.1.1) (UDP)
;; WHEN: Wed Nov 22 02:34:53 MSK 2023
;; MSG SIZE  rcvd: 55
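One difference that stands out in the two outputs above: in the reply from 192.168.1.1, the EDNS OPT record (the `CLASS1232 OPT` line) sits in the ANSWER section and the real A record is pushed into ADDITIONAL, while 1.1.1.1 reports OPT in its own pseudosection with the A record in ANSWER. That would fit a router that mishandles EDNS queries, and `options edns0` is set in the resolv.conf shown earlier. This is only a guess from the output, but it can be checked by repeating the query with dig's `+noedns` flag. A small sketch, where the helper name `looks_mangled` is made up for illustration:

```shell
# looks_mangled
# Reads dig output on stdin and succeeds if it shows either the
# "malformed message packet" warning or an OPT record printed as an
# ordinary record (e.g. "CLASS1232 OPT"), as in the router's replies above.
looks_mangled() {
  grep -Eq -e 'malformed message packet' -e 'CLASS[0-9]+[[:space:]]+OPT'
}

# Example usage (left commented out so nothing is queried on load);
# 2>&1 is there in case the warning is printed on stderr:
#   dig @192.168.1.1 github.com 2>&1 | looks_mangled && echo "mangled with EDNS"
#   dig +noedns @192.168.1.1 github.com 2>&1 | looks_mangled || echo "clean without EDNS"
```

If the reply is clean with `+noedns`, the router's handling of EDNS is the likely culprit, and using a different nameserver or dropping `edns0` from the resolver options would be the next thing to try.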

after digging into the GUI of my router, I found this

Primary DNS Server: 185.245.187.157
Secondary DNS Server: 185.245.187.190,185.245.187.195

Is there a way to resolve a domain name without asking my router to do it, before I change its settings? Is this what dig @<dns_server_ip> <domain_name> is supposed to do?

Looking at the charts, I couldn’t see any difference between the times of connection loss and the times when everything is working fine. To be honest, I don’t think it’s a packet loss issue, as that would affect all domains, not just some. It is also worth noting that I do not observe similar connection problems on any of my other devices connected to the same network.

They are assigned by your ISP, and that’s what your router is using to resolve things.

Depending on the garbage level of your router, you may be able to override them, so that all clients getting DHCP from the router get your DNS settings.

If not, you’ll have to override them on each client.

Try a different router if you want to keep using your ISP’s DNS servers… it might clear the problem up…
I recommend OpenWrt or pfSense… I think someone did a Nix-built version of it.

Lots of things to try. I very much doubt it is the resolver on the NixOS client, but sometimes it can be… especially if the resolver is configured esoterically or a security feature is enabled that the upstream server has trouble with.

DNS is the largest distributed database on the planet… and sometimes it amazes me that it works at all at the scale it is at…

Your ISP’s DNS could be overloaded or unhealthy in some way. ISPs are known to disable DNS to combat resource exhaustion or malware attacks on their networks, or to block certain DNS traffic depending on where you are… I call this DNS shenanigans…

Get yourself a VPN, either make one or get one… then you can skip your ISP’s infrastructure (apart from shunting packets); most ISPs can actually get that one almost right :-).

If you have the privilege of being able to swap providers, and you’re not subject to a monopoly where you live, then see what other options are available. You may find that the problem ‘goes away’.

99% of the ISP’s clients will be using a Windows DNS resolver. I am very surprised that MS has not done the triple-E treatment on the DNS protocol… maybe they have and I didn’t get the memo! :-) LOL!

Troubleshooting without access is difficult…, but maybe I can get a tmate session and take a look :-).

Intermittent networking problems are difficult to reproduce and troubleshoot…, they can be caused by dynamic changes on the network, load, time of day, the cycles of the sun and the moon (no kidding).

Networking is very difficult; things get real when you step outside the safety of your machine…, and all developers could do with knowing a bit more about the network than just opening and closing a socket.