it's DNS, stupid!
you’ve done it. the project is complete. you’re ready to update those DNS records to show the world your work.
aside from the recommendations at the end of this article, there is nothing you can do to make the DNS changes happen faster. you’re at the mercy of DNS TTL and DNS caching.
this delay is almost always unacceptable to someone connected to the project, so much so that they embark on a crusade to fix DNS.
once the change is live, the ONLY solution is to wait for records to expire in caches around the globe.
fight!
before you attempt anything, realise what you’re designing against.
- resolvers lie about honouring TTLs.
  - public resolvers generally respect them, but may clamp them to their own minimums and sometimes maximums.
  - ISP resolvers are notorious for raising short TTLs to cut their own traffic, so your 60s TTL can become their 1800s.
- browsers keep their own internal DNS cache.
- OS-level caches, language runtimes (the JVM famously caches resolved addresses) and apps holding long-lived pooled connections all add layers that TTL doesn’t reach.
- TTL is a maximum-staleness promise, not a schedule. propagation delay after a change is somewhere between zero and the TTL, per resolver, which you can’t observe or control.
- negative caching: query a name before it exists and the absence gets cached too.
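a toy model of the list above, assuming each resolver simply clamps your published TTL into its own floor and ceiling, then serves its cached copy until that clamped TTL runs out. the `Resolver` fields and the numbers in the example are hypothetical, not any real resolver’s published policy.

```python
from dataclasses import dataclass


@dataclass
class Resolver:
    # hypothetical per-resolver policy; real floors/ceilings vary and are rarely documented
    name: str
    min_ttl: int  # floor the resolver enforces, in seconds
    max_ttl: int  # ceiling the resolver enforces, in seconds
    cached_at: float  # when it last fetched the old record, relative to the change (negative = before)


def effective_ttl(published_ttl: int, resolver: Resolver) -> int:
    """Clamp the TTL you published into the resolver's own [min_ttl, max_ttl] range."""
    return max(resolver.min_ttl, min(published_ttl, resolver.max_ttl))


def stale_for(published_ttl: int, resolver: Resolver, change_at: float = 0.0) -> float:
    """Seconds after the change during which this resolver still serves the old answer."""
    expires_at = resolver.cached_at + effective_ttl(published_ttl, resolver)
    return max(0.0, expires_at - change_at)


if __name__ == "__main__":
    resolvers = [
        Resolver("respects it", 0, 86400, cached_at=-50.0),
        Resolver("isp floor", 1800, 86400, cached_at=-50.0),
        Resolver("cached long ago", 0, 86400, cached_at=-120.0),
    ]
    for r in resolvers:
        print(r.name, stale_for(60, r))
    # respects it 10.0
    # isp floor 1750.0
    # cached long ago 0.0
```

the second resolver is the ISP case: you published 60s, but it keeps serving the old answer for almost half an hour after the change, and you have no way to see which resolvers behave like which.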
friction
the friction is almost always about one thing: the duration between when you change a record and when the world actually sees the change.
TTL exists to reduce query load and latency, but it does that by handing out a copy that resolvers are entitled to keep until it expires. that’s the part people have a problem with.
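a minimal sketch of what “entitled to keep” means in practice: a toy cache (the `TtlCache` class, the zone dict and the fake clock are all hypothetical, chosen so the timing is easy to follow) that ignores changes at the authority until its copy expires. real resolvers layer the clamping and other behaviour from the list above on top of this.

```python
import time
from typing import Callable, Dict, Tuple


class TtlCache:
    """Toy caching resolver: keeps each answer until its TTL runs out, whatever the authority says meanwhile."""

    def __init__(self, upstream: Callable[[str], Tuple[str, int]],
                 clock: Callable[[], float] = time.monotonic):
        self._upstream = upstream  # hypothetical authoritative lookup: name -> (answer, ttl)
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (answer, expires_at)

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh: serve the copy, even if the record has changed upstream
        answer, ttl = self._upstream(name)
        self._entries[name] = (answer, now + ttl)
        return answer


if __name__ == "__main__":
    zone = {"www.example.com": ("192.0.2.1", 300)}
    now = [0.0]  # fake clock we can move by hand
    cache = TtlCache(lambda n: zone[n], clock=lambda: now[0])

    print(cache.resolve("www.example.com"))  # 192.0.2.1, fetched and kept for 300s
    zone["www.example.com"] = ("192.0.2.2", 300)  # you change the record
    now[0] = 299.0
    print(cache.resolve("www.example.com"))  # 192.0.2.1, still the cached copy
    now[0] = 300.0
    print(cache.resolve("www.example.com"))  # 192.0.2.2, copy expired, re-fetched
```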
negative, ghost rider. the pattern is full.
negative caching catches most people.
an NXDOMAIN gets cached for the negative TTL, which RFC 2308 sets to the lesser of the SOA record’s own TTL and its MINIMUM field, so if you query a name before you create it, the “doesn’t exist” answer sticks around.
the classic “i added the record but it’s still not found”.
if the record is created shortly after the JVM (or OS) cached the negative result, your app might ignore it until that negative cache entry expires.
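the negative TTL arithmetic, sketched for a single caching resolver. it assumes the resolver uses the RFC 2308 value as-is (many resolvers also apply their own cap), and the function names are hypothetical.

```python
def negative_ttl(soa_ttl: int, soa_minimum: int) -> int:
    """RFC 2308: a negative answer is cached for the lesser of the SOA record's TTL and its MINIMUM field."""
    return min(soa_ttl, soa_minimum)


def visible_at(first_query_at: float, created_at: float, soa_ttl: int, soa_minimum: int) -> float:
    """Earliest time one resolver can return a record, given when it was first asked for the name.

    Assumes the resolver cached the NXDOMAIN at first_query_at and nothing evicted it early.
    """
    if first_query_at >= created_at:
        return first_query_at  # the record already existed when asked: no negative entry
    # the negative entry has to expire, and the record has to exist
    return max(created_at, first_query_at + negative_ttl(soa_ttl, soa_minimum))


if __name__ == "__main__":
    # hypothetical zone: SOA TTL 3600s, MINIMUM 900s -> negative TTL 900s
    print(visible_at(0, 60, 3600, 900))    # 900: asked at t=0, created at t=60, stuck until t=900
    print(visible_at(0, 1000, 3600, 900))  # 1000: negative entry already gone by creation time
```

the first case is the “i added the record but it’s still not found” story: one curious lookup a minute early costs you fifteen minutes.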
recommendations
so what can you do to make the changes propagate faster on the day? plan.
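preemptively drop the TTL on the records you plan to change to 60 seconds a day or two before the migration, do the cutover, then raise it back. the short TTL only takes effect once caches holding the old, longer TTL have expired, so leave at least one full old TTL between the drop and the cutover. avoid sub-60s TTLs: they multiply your query volume and latency for little gain.
the TTL drop, the cutover and the TTL raise could be separate change windows.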
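the timing behind that plan, as rough arithmetic. a sketch assuming TTL-honouring resolvers (the ISP inflation above still applies) and a busy resolver that re-fetches once per TTL; the dates and function names are made up for illustration.

```python
from datetime import datetime, timedelta


def earliest_cutover(lower_ttl_at: datetime, old_ttl: int) -> datetime:
    """The short TTL only matters once every cache still holding the old TTL has expired."""
    return lower_ttl_at + timedelta(seconds=old_ttl)


def worst_case_visible(cutover_at: datetime, short_ttl: int) -> datetime:
    """Latest time a TTL-honouring resolver can still serve the old answer after the cutover."""
    return cutover_at + timedelta(seconds=short_ttl)


def query_multiplier(old_ttl: int, new_ttl: int) -> float:
    """Rough growth in refresh queries from a busy resolver (one re-fetch per TTL) when the TTL shrinks."""
    return old_ttl / new_ttl


if __name__ == "__main__":
    lowered = datetime(2024, 5, 1, 9, 0)  # hypothetical change window 1: TTL 86400 -> 60
    cutover = earliest_cutover(lowered, old_ttl=86400)
    print(cutover)                          # 2024-05-02 09:00:00, change window 2
    print(worst_case_visible(cutover, 60))  # 2024-05-02 09:01:00
    print(query_multiplier(86400, 60))      # 1440.0
    print(query_multiplier(60, 30))         # 2.0
```

the last line is the sub-60s point in numbers: halving 60s to 30s doubles refresh traffic while shaving at most 30 seconds off the cutover.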
and if the real goal is quick failover rather than a one-off migration, don’t use DNS for it at all. that’s a job for load balancers, anycast and health checks.