Routing as Code with Azure Route Server and Policy‑Based Routing

Treating routing decisions as software artefacts — and owning the operational consequences

Most routing incidents aren’t caused by bad math or broken protocols.
They’re caused by unclear intent — nobody can confidently answer why traffic takes a particular path right now.

In Azure, routing is no longer a background configuration you “set once and forget”. It’s an evolving control surface, shaped by software, scale events, and failure domains. If you’re still treating routing like static infrastructure, you’re designing blind.

The Mental Model

The common assumption

Routing is:

  • A deterministic outcome of prefixes
  • Mostly static
  • Something you “finish” early in a design

Once the tables look right, you move on.

Why this breaks

Azure networks are living systems:

  • Routes appear and disappear dynamically
  • Control planes make decisions faster than humans can react
  • Multiple teams influence traffic paths, often unintentionally

The result is fragile certainty — everything looks fine until a route is withdrawn, a UDR is reused, or a firewall scales differently than expected.

How It Really Works

Think of Azure routing as two distinct software responsibilities:

1. Reachability (dynamic)

This is decided by:

  • System routes
  • BGP‑learned routes from ExpressRoute, VPN, or Azure Route Server

Azure Route Server doesn’t “optimise” traffic. It injects and withdraws reachability information into the VNet routing plane, based on what its BGP peers advertise. When a peer stops advertising a prefix, Azure stops believing that path exists.

That behaviour matters during failure — not during steady state.
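A toy model makes the withdrawal behaviour concrete. This is a minimal Python sketch, not an Azure API. `RouteServerModel` is a hypothetical class that only tracks which peer advertises which prefix. It ignores AS paths, BGP timers, and Route Server limits. The point it shows is that a prefix exists in the VNet only while at least one peer is still advertising it.

```python
import ipaddress
from collections import defaultdict


class RouteServerModel:
    """Hypothetical toy model of the reachability Route Server programs into a VNet.

    Tracks prefix -> set of peer IPs advertising it. Not an Azure SDK type;
    ignores AS paths, timers, and route limits.
    """

    def __init__(self):
        self._adverts = defaultdict(set)

    def advertise(self, peer_ip, prefix):
        self._adverts[ipaddress.ip_network(prefix)].add(peer_ip)

    def withdraw(self, peer_ip, prefix):
        net = ipaddress.ip_network(prefix)
        peers = self._adverts.get(net)
        if peers is None:
            return  # nothing was advertised for this prefix
        peers.discard(peer_ip)
        if not peers:
            # Last advertiser gone: the VNet stops believing this path exists.
            del self._adverts[net]

    def peer_down(self, peer_ip):
        # A dropped BGP session implicitly withdraws everything that peer sent.
        for net in list(self._adverts):
            self.withdraw(peer_ip, str(net))

    def learned_routes(self):
        """Prefix -> sorted list of next-hop peer IPs, as the VNet would see it."""
        return {str(net): sorted(peers) for net, peers in self._adverts.items()}


if __name__ == "__main__":
    rs = RouteServerModel()
    rs.advertise("10.10.0.4", "192.168.0.0/16")  # two NVAs advertise the same prefix
    rs.advertise("10.10.0.5", "192.168.0.0/16")
    rs.peer_down("10.10.0.4")
    print(rs.learned_routes())  # {'192.168.0.0/16': ['10.10.0.5']}
    rs.peer_down("10.10.0.5")
    print(rs.learned_routes())  # {}
```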

2. Intent (static, but explicit)

This is enforced using:

  • User Defined Routes (UDRs)
  • Subnet‑level scoping
  • Next‑hop selection

UDRs don’t care why a destination exists. They simply say: if traffic is headed here, it must go that way.

The key insight:
BGP answers “where can I go?”
UDRs answer “where am I allowed to go?”
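The two questions meet in Azure's route selection, and the order matters. The longest matching prefix wins first. The route source (user-defined, then BGP, then system) only breaks ties between routes with the same prefix. The Python sketch below is a simplified model of that rule. `Route`, `select_route`, and the next-hop labels are hypothetical, and the model ignores next-hop validity and other special cases.

```python
import ipaddress
from dataclasses import dataclass

# Tie-break order for routes with the SAME prefix: user-defined, BGP, system.
SOURCE_PRIORITY = {"User": 0, "BGP": 1, "System": 2}


@dataclass(frozen=True)
class Route:
    prefix: str
    source: str    # "User", "BGP" or "System" (simplified labels)
    next_hop: str  # free-form label for this sketch


def select_route(dst, routes):
    """Return the route this model would pick for dst, or None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r.prefix)]
    if not candidates:
        return None
    # 1) Longest prefix wins. 2) Only on an equal prefix length does the source decide.
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r.prefix).prefixlen,
                       SOURCE_PRIORITY[r.source]),
    )


example_routes = [
    Route("10.10.0.0/16", "System", "VnetLocal"),
    Route("0.0.0.0/0", "System", "Internet"),
    Route("0.0.0.0/0", "User", "VirtualAppliance 10.10.0.4"),
    Route("192.168.0.0/16", "BGP", "VirtualNetworkGateway"),
]

if __name__ == "__main__":
    print(select_route("8.8.8.8", example_routes).next_hop)       # VirtualAppliance 10.10.0.4
    print(select_route("192.168.1.10", example_routes).next_hop)  # VirtualNetworkGateway
```

The second lookup is the important one. The /16 BGP route beats the /0 UDR, so a default-route UDR does not capture that traffic.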

Architecture Overview

flowchart LR
    AppSubnet[App Subnet]
    Firewall[NVA / Firewall]
    RouteServer[Azure Route Server]
    ER[ExpressRoute / VPN]
    Internet[Internet]
    AppSubnet -->|UDR: default route| Firewall
    Firewall --> RouteServer
    RouteServer -->|BGP learned prefixes| Firewall
    Firewall --> ER
    Firewall --> Internet

This isn’t about elegance. It’s about predictability under change.
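You can check the diagram's claim of predictability by tracing a destination hop by hop. The sketch below is a hypothetical Python model, not an Azure tool. `HOP_TABLES` hard-codes the diagram's paths, and the on-premises prefix `192.168.0.0/16` is an assumed example. Route Server does not appear as a hop because it only exchanges routes over BGP and does not forward data traffic.

```python
import ipaddress

# Hypothetical per-hop routing mirroring the diagram. Terminal hops
# ("ER", "Internet") have no table; Route Server is control plane only.
HOP_TABLES = {
    "AppSubnet": {"0.0.0.0/0": "Firewall"},  # UDR: default route to the NVA
    "Firewall": {
        "192.168.0.0/16": "ER",              # BGP-learned on-prem prefix (assumed)
        "0.0.0.0/0": "Internet",
    },
}


def next_hop(table, dst):
    """Longest-prefix match within one hop's table; None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    nets = {ipaddress.ip_network(p): hop for p, hop in table.items()}
    matches = [n for n in nets if addr in n]
    if not matches:
        return None
    return nets[max(matches, key=lambda n: n.prefixlen)]


def trace(dst, start="AppSubnet", tables=HOP_TABLES, max_hops=8):
    """Follow next hops until a terminal hop, a drop, a loop, or the hop budget runs out."""
    path = [start]
    hop = start
    while hop in tables:
        nxt = next_hop(tables[hop], dst)
        if nxt is None:
            return path + ["DROP"]
        if nxt in path or len(path) >= max_hops:
            return path + [nxt, "LOOP"]
        path.append(nxt)
        hop = nxt
    return path


if __name__ == "__main__":
    print(trace("192.168.1.10"))  # ['AppSubnet', 'Firewall', 'ER']
    print(trace("8.8.8.8"))       # ['AppSubnet', 'Firewall', 'Internet']
```

To see which destinations move, run the same trace before and after a change, for example after removing the BGP prefix from the firewall's table.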

Real‑World Impact

Designing routing this way changes how you operate:

  • Failure is signalled, not hidden
    When an NVA or SD‑WAN appliance withdraws its routes, or its BGP session drops and the hold timer expires, Azure removes those paths. There's no stale static route pretending everything is fine.

  • Blast radius becomes intentional
    Small, scoped route tables mean one bad decision doesn’t rewrite half the VNet.

  • Routing changes become reviewable
    When UDRs and BGP configuration are defined as artefacts, you can reason about diffs, intent, and rollback — not just outcomes.

  • Incident response gets faster
    Engineers debug effective routes, not portal click history.
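Reviewability can be as simple as diffing route definitions by name rather than by text. For Bicep deployments, `az deployment group what-if` previews resource-level changes. The Python sketch below only illustrates the same idea at route level. The `{route_name: (prefix, next_hop)}` shape and `diff_route_tables` are hypothetical.

```python
def diff_route_tables(old, new):
    """Compare two route tables given as {route_name: (prefix, next_hop)}.

    Returns (added, removed, changed) so a reviewer sees intent-level changes.
    The input shape is an assumption for this sketch.
    """
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(
        (name, old[name], new[name])
        for name in set(old) & set(new)
        if old[name] != new[name]
    )
    return added, removed, changed


if __name__ == "__main__":
    before = {"force-egress-firewall": ("0.0.0.0/0", "10.10.0.4")}
    after = {
        "force-egress-firewall": ("0.0.0.0/0", "10.10.0.5"),  # next hop moved
        "onprem-via-fw": ("192.168.0.0/16", "10.10.0.4"),     # new route
    }
    added, removed, changed = diff_route_tables(before, after)
    print("added:", added)
    print("removed:", removed)
    for name, old_route, new_route in changed:
        print(f"changed: {name} {old_route} -> {new_route}")
```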

Implementation Examples

Azure Portal – Where This Actually Bites

Operationally important checks (often missed):

  • Is route propagation enabled where you expect dynamic routes?
  • Are UDRs unintentionally overriding critical BGP paths?
  • Do effective routes differ between subnets that “should” be identical?

Most production routing issues show up here — not during deployment.
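These checks can also be scripted. `az network nic show-effective-route-table` returns a NIC's effective routes. The Python sketch below assumes you have already reduced that output to a simplified list of `{"prefix", "source", "next_hop"}` dicts. That shape, the `"BGP"` source label, and both functions are hypothetical. The sketch flags routes that differ between two NICs that should match, and BGP prefixes that a same-prefix UDR overrides.

```python
def _route_set(routes):
    # Compare on (prefix, next hop); the source label is not part of the identity.
    return {(r["prefix"], r["next_hop"]) for r in routes}


def compare_nics(a, b):
    """Return (only_in_a, only_in_b) as sorted (prefix, next_hop) pairs."""
    ra, rb = _route_set(a), _route_set(b)
    return sorted(ra - rb), sorted(rb - ra)


def shadowed_bgp(routes):
    """BGP prefixes overridden by a user-defined route with the same prefix."""
    user = {r["prefix"] for r in routes if r["source"] == "User"}
    return sorted({r["prefix"] for r in routes
                   if r["source"] == "BGP" and r["prefix"] in user})


if __name__ == "__main__":
    nic_web = [
        {"prefix": "0.0.0.0/0", "source": "User", "next_hop": "10.10.0.4"},
        {"prefix": "192.168.0.0/16", "source": "BGP", "next_hop": "VirtualNetworkGateway"},
        {"prefix": "192.168.0.0/16", "source": "User", "next_hop": "10.10.0.4"},
    ]
    nic_api = [
        {"prefix": "0.0.0.0/0", "source": "User", "next_hop": "10.10.0.4"},
        {"prefix": "192.168.0.0/16", "source": "BGP", "next_hop": "VirtualNetworkGateway"},
    ]
    print(compare_nics(nic_web, nic_api))  # ([('192.168.0.0/16', '10.10.0.4')], [])
    print(shadowed_bgp(nic_web))           # ['192.168.0.0/16']
```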

Bicep – Making Routing Intent Explicit

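// Route table for the app subnet: steer default egress through the firewall NVA.
resource routeTable 'Microsoft.Network/routeTables@2022-09-01' = {
  name: 'rt-app-egress'
  location: resourceGroup().location
  properties: {
    // false = BGP-learned routes (e.g. from ExpressRoute or VPN gateways) are still
    // programmed on this subnet; any BGP prefix more specific than 0.0.0.0/0 wins over the UDR below.
    disableBgpRoutePropagation: false
    routes: [
      {
        name: 'force-egress-firewall'
        properties: {
          addressPrefix: '0.0.0.0/0'       // default route: only used when nothing more specific matches
          nextHopType: 'VirtualAppliance'
          nextHopIpAddress: '10.10.0.4'    // firewall/NVA private IP; Azure does not health-check it
        }
      }
    ]
  }
}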

The value here isn’t the route itself — it’s that intent is now inspectable:

  • Why all egress?
  • Why this next hop?
  • What breaks if it’s removed?

That conversation is the win.
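The Bicep above raises one such question directly. Because `disableBgpRoutePropagation` is `false`, any BGP-learned prefix more specific than `0.0.0.0/0` wins over the egress UDR, so that traffic does not reach the firewall. The Python sketch below flags those prefixes. `bypassing_prefixes` is hypothetical and its dict keys mirror the Bicep properties. It assumes the BGP next hop is a gateway rather than the firewall itself, because routes the firewall advertises with itself as next hop would still reach it.

```python
import ipaddress


def bypassing_prefixes(route_table, bgp_prefixes):
    """Return BGP prefixes whose traffic would not follow an appliance UDR.

    route_table mirrors the Bicep properties (disableBgpRoutePropagation, routes).
    bgp_prefixes is an assumed list of BGP-learned prefixes whose next hop is
    NOT the appliance itself.
    """
    if route_table["disableBgpRoutePropagation"]:
        return []  # BGP routes are not programmed on this subnet
    # A UDR only beats a BGP route with the SAME prefix; a shorter UDR such as
    # 0.0.0.0/0 loses to the longer BGP match. More specific UDRs that tile a
    # BGP prefix are ignored in this sketch.
    steered = {
        ipaddress.ip_network(r["addressPrefix"])
        for r in route_table["routes"]
        if r["nextHopType"] == "VirtualAppliance"
    }
    return [p for p in bgp_prefixes if ipaddress.ip_network(p) not in steered]


rt_app_egress = {
    "disableBgpRoutePropagation": False,
    "routes": [
        {"addressPrefix": "0.0.0.0/0", "nextHopType": "VirtualAppliance",
         "nextHopIpAddress": "10.10.0.4"},
    ],
}

if __name__ == "__main__":
    print(bypassing_prefixes(rt_app_egress, ["192.168.0.0/16"]))  # ['192.168.0.0/16']
```

There are two ways to fix it, and each is a design choice: disable propagation on this subnet, or add same-prefix UDRs for the prefixes that must be inspected.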

Gotchas & Edge Cases (Where Designs Usually Fail)

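  • UDRs beat BGP, but only for the same prefix
    Azure picks the longest prefix match first and uses the route source (UDR, then BGP, then system) only to break ties. A 0.0.0.0/0 UDR will not capture traffic for a more specific BGP‑learned prefix. And when a UDR does win, Azure does not check its next hop: if it points to a dead appliance, traffic is black‑holed. Dynamic routing will not save you.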

  • Asymmetric routing isn’t theoretical
    It happens the moment inbound and outbound paths are controlled by different teams or constructs. A stateful firewall that sees only one direction of a flow will drop the return traffic.

  • Shared route tables amplify mistakes
    Reuse feels efficient until one “small change” alters traffic for dozens of subnets.

  • Route Server adds complexity you must own
    If you don’t actively test route withdrawal scenarios, you’re just hoping it works.
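Dead next hops and shared route tables compound each other: when a shared table's next hop dies, every associated subnet is black‑holed at once. The Python sketch below estimates that blast radius. The input shapes and `blackhole_blast_radius` are hypothetical. `healthy_ips` stands in for whatever health signal you run yourself, since Azure does not health-check a UDR next hop.

```python
def blackhole_blast_radius(route_tables, associations, healthy_ips):
    """Map each route table with a dead appliance next hop to the subnets it affects.

    route_tables: {table_name: [(prefix, next_hop_ip), ...]}  (appliance UDRs only)
    associations: {subnet_name: table_name}
    healthy_ips:  set of appliance IPs passing your own health checks
    All shapes are assumptions for this sketch.
    """
    impact = {}
    for subnet, table in associations.items():
        dead = [ip for _, ip in route_tables.get(table, []) if ip not in healthy_ips]
        if dead:
            impact.setdefault(table, []).append(subnet)
    return {table: sorted(subnets) for table, subnets in impact.items()}


if __name__ == "__main__":
    tables = {
        "rt-shared": [("0.0.0.0/0", "10.10.0.4")],
        "rt-app-egress": [("0.0.0.0/0", "10.10.0.5")],
    }
    assoc = {
        "snet-web": "rt-shared",
        "snet-api": "rt-shared",
        "snet-batch": "rt-shared",
        "snet-app": "rt-app-egress",
    }
    # One dead appliance behind a shared table takes out three subnets.
    print(blackhole_blast_radius(tables, assoc, healthy_ips={"10.10.0.5"}))
```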

Best Practices

  • Use Azure Route Server only when dynamic reachability actually matters
  • Keep UDRs small, scoped, and boring
  • Test failures by withdrawing BGP routes — deliberately
  • Treat effective routes as a first‑class debugging tool
  • Document why a routing decision exists, not just the prefix
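The last two practices are easy to enforce in review. The Python sketch below assumes a hypothetical convention where every route definition carries a `why` field next to the route. Azure itself has no such property. The per-table limit `MAX_ROUTES_PER_TABLE` is also an arbitrary team choice, not an Azure quota.

```python
MAX_ROUTES_PER_TABLE = 10  # hypothetical team limit, not an Azure quota


def lint_route_table(table):
    """Return human-readable findings for one route-table definition (hypothetical shape)."""
    findings = []
    routes = table.get("routes", [])
    if len(routes) > MAX_ROUTES_PER_TABLE:
        findings.append(
            f"{table['name']}: {len(routes)} routes exceeds {MAX_ROUTES_PER_TABLE}"
        )
    for r in routes:
        # Every route must say why it exists, not just which prefix it covers.
        if not (r.get("why") or "").strip():
            findings.append(f"{table['name']}/{r['name']}: no documented reason")
    return findings


if __name__ == "__main__":
    table = {
        "name": "rt-app-egress",
        "routes": [
            {"name": "force-egress-firewall", "why": "All egress must be inspected"},
            {"name": "temp-bypass"},
        ],
    }
    for finding in lint_route_table(table):
        print(finding)
```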
🍺
Brewed Insight: If your routing design only works when nothing changes, it’s not a design — it’s a coincidence.
