Some of our Openstack users run quite large number of dnsmasq instances
on single host. They started hitting default limit of inotify socket
number on single system after upgrade to more recent version. System
defaults of sysctl fs.inotify.max_user_instances is 128. They reached
limit of 116 dnsmasq instances, then more instances failed to start.
I was surprised they have any use case for such high number of
instances. They use one dnsmasq for one virtual network.
I found simple way to avoid hitting low system limit. They do not use
resolv.conf for name server configuration or any dhcp hosts or options
directory. Created inotify socket is never used in that case. Simple
patch attached.
I know we can raise inotify system limit. I think better is to not waste
resources that are left unused.
The current logic is naive in the case that there is more than
one RRset in an answer (Typically, when a non-CNAME query is answered
by one or more CNAME RRs, and then then an answer RRset.)
If all the RRsets validate, then they are cached and marked as validated,
but if any RRset doesn't validate, then the AD flag is not set (good) and
ALL the RRsets are cached marked as not validated.
This breaks when, eg, the answer contains a validated CNAME, pointing
to a non-validated answer. A subsequent query for the CNAME without do
will get an answer with the AD flag wrongly reset, and worse, the same
query with do will get a cached answer without RRSIGS, rather than
being forwarded.
The code now records the validation of individual RRsets and that
is used to correctly set the "validated" bits in the cache entries.
Adds option to delay replying to DHCP packets by one or more seconds.
This provides a workaround for a PXE boot firmware implementation
that has a bug causing it to fail if it receives a (proxy) DHCP
reply instantly.
On Linux it looks up the exact receive time of the UDP packet with
the SIOCGSTAMP ioctl to prevent multiple delays if multiple packets
come in around the same time.
The nasty code with static variable in retry_send() which
avoids looping forever needs to be called on success of the syscall,
to reset the static variable.