To provoke this bug, at least the following must be true.
A reply must be a CNAME.
The target of the CNAME must not exist for the queried RR (a No Data reply).
The No Data reply must include an SOA record in the NS section.
The query must take place over TCP.
The result is the caching of a CNAME whose target is the
name of the SOA record, instead of the No data record.
If there is a RR at this target, then a subsequent query for that
RRtype will get a qrong reply.
Thanks to the testers extraordinaire at Pi-Hole for spotting this
and providing enough information to chase it down.
If DNS is happening over TCP, the query is handled by a forked
process. Of ipset ot nftset is configured, this might include
inserting addresses in the *sets. Before this update, that
was done by the forked process using handles inherited from the
parent "master" process.
This is inherently racy. If the master process or another
child process tries to do updates at the same time, the
updates can clash and fail.
To see this, you need a busy server doing lots of DNS
queries over TCP, and ipset or nftset configured.
Going forward, we use the already established pipe to send the
updates from the child back to the master process, which
serialises them.
Signed-off-by: Rob Gill <rrobgill@protonmail.com>
At the moment if a misformatted query is reported by the upstream server
it is not clear from the log.
Other error codes from RFC1035 (server failure, not implemented,
refused) are logged with text, but format error is logged merely as "1".
Such that an upstream reporting a format error is presently logged as eg:
Apr 20 12:01:55 dnsmasq[3023]: reply error is 1
After this patch they are logged informatively, eg:
Apr 20 12:48:40 dnsmasq[3023]: reply error is FORMERR
This is a two line fix, FORMERR is already defined in dns-protocol.h.
This should just work in all cases now. If the normal chain-of-trust exists into
the delegated domain then whether the domain is signed or not, DNSSEC
validation will function normally. In the case the delgated domain
is an "overlay" on top of the global DNS and no NS and/or DS records
exist connecting it to the global dns, then if the domain is
unsigned the situation will be handled by synthesising a
proof-of-non-existance-of-DS for the domain and queries will be
answered unvalidated; this action will be logged. A signed domain
without chain-of-trust can be validated if a suitable trust-anchor
is provided using --trust-anchor.
Thanks to Uwe Kleine-König for prompting this change, and contributing
valuable insights into what could be improved.
We can't answer and shouldn't forward non-QUERY DNS requests.
This patch fixes handling such requests from TCP connections; before
the connection would be closed without reply.
It also changes the RCODE in the answer from REFUSED to NOTIMP and
provides clearer logging.
Timeouts for TCP connections to non-responive servers are very long.
This in not appropriate for DNS connections.
Set timeouts for connection setup, sending data and recieving data.
The timeouts for connection setup and sending data are set at 5 seconds.
For recieving the reply this is doubled, to take into account the
time for usptream to actually get the answer.
Thanks to Petr Menšík for pointing out this problem, and finding a better
and more portable solution than the one in place heretofore.
A relatively common situation is that the reply to a downstream query
will fit in a UDP packet when no DNSSEC RRs are present, but overflows
when the RRSIGS, NSEC ect are added. This extends the automatic
move from UDP to TCP to downstream queries which get truncated replies,
in the hope that once stripped of the DNSSEC RRs, the reply can be returned
via UDP, nwithout making the downstream retry with TCP.
If the downstream sets the DO bit, (ie it wants the DNSSEC RRs, then
this path is not taken, since the downstream will have to get a truncated
repsonse and retry to get a correct answer.
Heretofore, when a validating the result of an external query triggers
a DNSKEY or DS query and the result of that query is truncated, dnsmasq
has forced the whole validation process to move to TCP by returning a
truncated reply to the original requestor. This forces the original
requestor to retry the query in TCP mode, and the DNSSEC subqueries
also get made via TCP and everything works.
Note that in general the actual answer being validated is not large
enough to trigger truncation, and there's no reason not to return that
answer via UDP if we can validate it successfully. It follows that
a substandard client which can't do TCP queries will still work if the
answer could be returned via UDP, but fails if it gets an artifically
truncated answer and cannot move to TCP.
This patch teaches dnsmasq to move to TCP for DNSSEC queries when
validating UDP answers. That makes the substandard clients mentioned
above work, and saves a round trip even for clients that can do TCP.
Add the relevant information to the metrics and to the output of
dump_cache() (which is called when dnsmasq receives SIGUSR1).
Hence, users not collecting metrics will still be able to
troubleshoot with SIGUSR1. In addition to the current usage,
dump_cache() contains the information on the highest usage
since it was last called.
A NXDOMAIN answer recieved over TCP by a child process would
be correctly sent back to the master process which would then
fail to insert it into the cache.
the compiler padding it with an extra 8 bytes.
Use the F_KEYTAG flag in a a cache record to discriminate between
an arbitrary RR stored entirely in the addr union and one
which has a point to block storage.
If there are multiple cache records with the same name but different
F_REVERSE and/or F_IMMORTAL flags, the code added in fe9a134b could
concievable break the REVERSE-FORWARD-IMMORTAL order invariant.
Reproducing this is damn near impossible, but it is responsible
for rare and otherwise inexplicable reversion between 2.87 and 2.88
which manifests itself as a cache internal error. All observed
cases have depended on DNSSEC being enabled, but the bug could in
theory manifest itself without DNSSEC
Thanks to Timo van Roermund for reporting the bug and huge
efforts to isolate it.