Most routing incidents aren’t caused by bad math or broken protocols.
They’re caused by unclear intent — nobody can confidently answer why traffic takes a particular path right now.
In Azure, routing is no longer a background configuration you “set once and forget”. It’s an evolving control surface, shaped by software, scale events, and failure domains. If you’re still treating routing like static infrastructure, you’re designing blind.
The Mental Model
The common assumption
Routing is:
- A deterministic outcome of prefixes
- Mostly static
- Something you “finish” early in a design
Once the tables look right, you move on.
Why this breaks
Azure networks are living systems:
- Routes appear and disappear dynamically
- Control planes make decisions faster than humans can react
- Multiple teams influence traffic paths, often unintentionally
The result is fragile certainty — everything looks fine until a route is withdrawn, a UDR is reused, or a firewall scales differently than expected.
How It Really Works
Think of Azure routing as two distinct software responsibilities:
1. Reachability (dynamic)
This is decided by:
- System routes
- BGP‑learned routes from ExpressRoute, VPN, or Azure Route Server
Azure Route Server doesn’t “optimise” traffic. It injects and withdraws reachability information into the VNet routing plane, based on what its BGP peers advertise. When a peer stops advertising a prefix, Azure stops believing that path exists.
That behaviour matters during failure — not during steady state.
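As a rough illustration of where that reachability comes from, the sketch below declares a single BGP peering between an existing Route Server and an NVA. The resource names, peer IP, ASN, and API version are placeholders assumed for the example, and the resource types should be checked against the current Microsoft.Network template reference before use. Note that no prefixes appear anywhere: what the VNet learns depends entirely on what the peer advertises at runtime.

```bicep
// Hypothetical parameters: none of these values come from a real deployment.
param routeServerName string = 'rs-hub'
param nvaPeerIp string = '10.0.1.4'
param nvaPeerAsn int = 65001

// Azure Route Server is deployed as a Microsoft.Network/virtualHubs resource.
// It is referenced here as an existing resource rather than created.
resource routeServer 'Microsoft.Network/virtualHubs@2023-09-01' existing = {
  name: routeServerName
}

// One BGP peering to the NVA. Prefixes are not configured here:
// they are learned while the peer advertises them and dropped when it stops.
resource nvaPeer 'Microsoft.Network/virtualHubs/bgpConnections@2023-09-01' = {
  parent: routeServer
  name: 'nva-primary'
  properties: {
    peerIp: nvaPeerIp
    peerAsn: nvaPeerAsn
  }
}
```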
2. Intent (static, but explicit)
This is enforced using:
- User Defined Routes (UDRs)
- Subnet‑level scoping
- Next‑hop selection
UDRs don’t care why a destination exists. They simply say: if traffic is headed here, it must go that way.
The key insight:
BGP answers “where can I go?”
UDRs answer “where am I allowed to go?”
Architecture Overview
This isn’t about elegance. It’s about predictability under change.
Real‑World Impact
Designing routing this way changes how you operate:
Failure is signalled, not hidden
When an NVA or SD‑WAN appliance stops advertising routes, Azure drops those paths as soon as the withdrawal or BGP session timeout is processed. There’s no stale static route pretending everything is fine.
Blast radius becomes intentional
Small, scoped route tables mean one bad decision doesn’t rewrite half the VNet.
Routing changes become reviewable
When UDRs and BGP configuration are defined as artefacts, you can reason about diffs, intent, and rollback, not just outcomes.
Incident response gets faster
Engineers debug effective routes, not portal click history.
Implementation Examples
Azure Portal – Where This Actually Bites
Operationally important checks (often missed):
- Is route propagation enabled where you expect dynamic routes?
- Are UDRs unintentionally overriding critical BGP paths?
- Do effective routes differ between subnets that “should” be identical?
Most production routing issues show up here — not during deployment.
Bicep – Making Routing Intent Explicit
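A minimal sketch of what such a route table might look like, assuming a hub firewall or NVA at a placeholder private address. The resource name, parameter names, IP address, API version, and the intent comment are all illustrative assumptions, not values taken from a real environment.

```bicep
param location string = resourceGroup().location

// Hypothetical private IP of the hub firewall / NVA.
param firewallPrivateIp string = '10.0.2.4'

// Example intent note: all internet-bound egress from workload subnets
// must be inspected by the hub firewall before it leaves the VNet.
resource workloadEgressRoutes 'Microsoft.Network/routeTables@2023-09-01' = {
  name: 'rt-workload-egress'
  location: location
  properties: {
    // false keeps gateway/BGP-learned routes propagating to subnets
    // that use this table; only the default route below is pinned.
    disableBgpRoutePropagation: false
    routes: [
      {
        name: 'default-via-firewall'
        properties: {
          addressPrefix: '0.0.0.0/0'
          nextHopType: 'VirtualAppliance'
          nextHopIpAddress: firewallPrivateIp
        }
      }
    ]
  }
}
```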
The value here isn’t the route itself — it’s that intent is now inspectable:
- Why all egress?
- Why this next hop?
- What breaks if it’s removed?
That conversation is the win.
Gotchas & Edge Cases (Where Designs Usually Fail)
UDRs override BGP for the same prefix
When a UDR and a BGP route share a prefix, the UDR wins; a more specific BGP route can still win on longest‑prefix match. If a UDR points to a dead next hop, Azure will happily black‑hole traffic. Dynamic routing will not save you.
Asymmetric routing isn’t theoretical
It happens the moment inbound and outbound paths are controlled by different teams or constructs.
Shared route tables amplify mistakes
Reuse feels efficient until one “small change” alters traffic for dozens of subnets.
Route Server adds complexity you must own
If you don’t actively test route withdrawal scenarios, you’re just hoping it works.
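One way to avoid the shared route table problem is to generate a separate table per subnet, so a change to one cannot silently alter the others. The sketch below does that with a Bicep loop; the subnet names, next-hop addresses, and API version are hypothetical placeholders.

```bicep
param location string = resourceGroup().location

// Hypothetical subnet list; names and next hops are placeholders.
param subnets array = [
  {
    name: 'app'
    nextHop: '10.0.2.4'
  }
  {
    name: 'data'
    nextHop: '10.0.2.4'
  }
]

// One route table per subnet: editing 'rt-app' leaves 'rt-data' untouched.
resource scopedRouteTables 'Microsoft.Network/routeTables@2023-09-01' = [for subnet in subnets: {
  name: 'rt-${subnet.name}'
  location: location
  properties: {
    routes: [
      {
        name: 'default-via-firewall'
        properties: {
          addressPrefix: '0.0.0.0/0'
          nextHopType: 'VirtualAppliance'
          nextHopIpAddress: subnet.nextHop
        }
      }
    ]
  }
}]
```

Each table would still need to be associated with its subnet through the subnet's routeTable property; that step is left out here.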
Best Practices
- Use Azure Route Server only when dynamic reachability actually matters
- Keep UDRs small, scoped, and boring
- Test failures by withdrawing BGP routes — deliberately
- Treat effective routes as a first‑class debugging tool
- Document why a routing decision exists, not just the prefix