Segmentation vs Reality: Why East–West Risk Persists

How shared services and incident shortcuts quietly dissolve network boundaries in Azure.

Segmentation looks solid right up until something breaks.

Not a pentest.
Not a red‑team exercise.
A real outage, at 2am, with customers waiting.

That’s when the boundaries you designed start bending, and some of them never snap back. East–west risk persists not because teams ignore segmentation, but because operations reward connectivity long after intent is forgotten.

The Mental Model

Common assumption:
“If environments and tiers are segmented, lateral movement is constrained.”

Why it breaks:
Segmentation is usually designed as a static layout, while real systems are dynamic under pressure. Once compromise is assumed, the question isn’t whether boundaries exist; it’s whether they survive shared services and incident response.

Most don’t.

How It Really Works

In real Azure estates, segmentation fails less often because of missing controls and more often because of where trust accumulates.

Two forces dominate:

  1. Shared services that span environments by design
  2. Operational exceptions made under urgency and never undone

Neither feels risky in isolation. Together, they flatten the estate.

Where teams think enforcement lives

Boundaries are often assumed to be enforced by:

  • VNets and peering layouts
  • Subscription separation
  • “This is prod, that isn’t”

But enforcement actually emerges from who can reach what during normal operations, especially automation and responders. That reach expands steadily unless actively constrained.
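Where that reach should be deliberate rather than emergent, it can be pinned at the subnet edge. Below is a hedged NSG sketch; the subnet names, address ranges, and rule names are illustrative assumptions, not a prescribed layout:

```bicep
// Sketch: pin automation reach to a single environment at the subnet edge.
// All prefixes and names below are assumptions for illustration.
resource prodDeployNsg 'Microsoft.Network/networkSecurityGroups@2023-11-01' = {
  name: 'nsg-prod-deploy'
  location: resourceGroup().location
  properties: {
    securityRules: [
      {
        name: 'allow-prod-build-agents-only'
        properties: {
          priority: 100
          direction: 'Inbound'
          access: 'Allow'
          protocol: 'Tcp'
          sourceAddressPrefix: '10.20.1.0/24' // prod build agent subnet (assumed)
          sourcePortRange: '*'
          destinationAddressPrefix: '10.20.2.0/24' // prod deployment endpoints (assumed)
          destinationPortRange: '443'
        }
      }
      {
        name: 'deny-other-east-west'
        properties: {
          priority: 4096
          direction: 'Inbound'
          access: 'Deny'
          protocol: '*'
          sourceAddressPrefix: 'VirtualNetwork'
          sourcePortRange: '*'
          destinationAddressPrefix: 'VirtualNetwork'
          destinationPortRange: '*'
        }
      }
    ]
  }
}
```

The explicit deny at the bottom is the point: without it, intra-VNet traffic is allowed by default, and “who can reach what” is whatever the default rules permit.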

Real‑World Impact

Example 1: The Shared Build Agent That Becomes the Bridge

A central CI/CD build agent subnet deploys to multiple application VNets across prod and non‑prod. It needs access to:

  • Source artefacts
  • Deployment endpoints
  • Management APIs

From an ops perspective, this is sensible. From a compromise perspective, it’s decisive.

If a single build agent VM or pipeline identity is compromised, segmentation between environments becomes largely theoretical. East–west reach isn’t gained through clever movement; it’s inherited.

This is segmentation collapsing by design convenience, not misconfiguration.
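One way to remove the bridge is to split the shared pool into per-environment agent networks, so compromising a non-prod agent cannot inherit prod reach. A minimal sketch, assuming hypothetical VNet names and address spaces:

```bicep
// Sketch: one agent pool per environment instead of a shared bridge.
// Names, prefixes, and tag values are illustrative assumptions.
resource prodAgentsVnet 'Microsoft.Network/virtualNetworks@2023-11-01' = {
  name: 'vnet-build-agents-prod'
  location: resourceGroup().location
  tags: {
    role: 'build-agents'
    permittedEnvironments: 'prod' // declared reach: prod only
  }
  properties: {
    addressSpace: {
      addressPrefixes: [
        '10.20.0.0/24'
      ]
    }
  }
}

resource nonprodAgentsVnet 'Microsoft.Network/virtualNetworks@2023-11-01' = {
  name: 'vnet-build-agents-nonprod'
  location: resourceGroup().location
  tags: {
    role: 'build-agents'
    permittedEnvironments: 'nonprod' // a compromised agent here inherits no prod reach
  }
  properties: {
    addressSpace: {
      addressPrefixes: [
        '10.30.0.0/24'
      ]
    }
  }
}
```

The cost is duplicated agent infrastructure; the benefit is that the blast radius of any single agent matches one environment, not all of them.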

Example 2: Incident Response Exceptions That Never Roll Back

During a production incident, teams temporarily allow additional east–west access to diagnose or stabilise the system. The change is:

  • Time‑boxed in intent
  • Permanent in practice

No one updates the diagram. No one re‑validates the boundary. The estate now contains a path that only exists because things once went wrong.

Multiply this across incidents, migrations, and staff turnover, and segmentation slowly becomes an archival concept.
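If incident access must exist, it can at least be made discoverable and dated. The sketch below carries the exception as a dedicated NSG rule with the rollback intent written into its description; the NSG name, incident reference, and prefixes are placeholder assumptions:

```bicep
// Sketch: incident access as a separate, named, dated rule rather than an
// anonymous edit. All names and prefixes below are illustrative assumptions.
resource prodAppNsg 'Microsoft.Network/networkSecurityGroups@2023-11-01' existing = {
  name: 'nsg-prod-app'
}

resource incidentRule 'Microsoft.Network/networkSecurityGroups/securityRules@2023-11-01' = {
  parent: prodAppNsg
  name: 'allow-incident-diagnostics'
  properties: {
    // The description is the rollback contract: it names the incident and
    // the removal condition, so the rule is auditable after the fact.
    description: 'INC-0000 (placeholder): diagnostic access, remove at incident review'
    priority: 150
    direction: 'Inbound'
    access: 'Allow'
    protocol: 'Tcp'
    sourceAddressPrefix: '10.40.0.0/24' // responder subnet (assumed)
    sourcePortRange: '*'
    destinationAddressPrefix: '10.20.2.0/24' // prod app tier (assumed)
    destinationPortRange: '22'
  }
}
```

A periodic sweep for rules whose description references a closed incident is then a tractable query, instead of archaeology.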

Implementation Examples

This isn’t about adding more rules. It’s about making segmentation intent explicit and reviewable, especially around shared services.

Encoding segmentation intent for shared services

Below is a deliberately lightweight Bicep example that tags a shared service VNet with its declared blast radius. This does not enforce isolation; it makes erosion visible.

resource sharedServicesVnet 'Microsoft.Network/virtualNetworks@2023-11-01' = {
  name: 'vnet-shared-services'
  location: resourceGroup().location
  tags: {
    role: 'shared-services'
    permittedEnvironments: 'prod,nonprod' // declared blast radius: spans both
    eastWestRisk: 'high'                  // acknowledged risk concentrator
    owner: 'platform-team'                // who answers for this reach
  }
  properties: {
    addressSpace: {
      addressPrefixes: [
        '10.10.0.0/16'
      ]
    }
  }
}

Why this matters operationally:

  • Shared services are explicitly acknowledged as risk concentrators
  • Platform teams can be challenged on blast radius during reviews
  • Drift between “should reach” and “does reach” becomes auditable

Segmentation that can’t be reasoned about programmatically won’t survive growth.
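One way to make that reasoning continuous rather than one-off is an audit policy over the declared-intent tags. A hedged sketch of a subscription-scoped Azure Policy definition that flags VNets missing the `permittedEnvironments` tag from the example above; the policy name and display name are assumptions:

```bicep
// Sketch: audit VNets that declare no segmentation intent.
// Deployed at subscription scope; names below are illustrative assumptions.
targetScope = 'subscription'

resource requireSegmentationTags 'Microsoft.Authorization/policyDefinitions@2021-06-01' = {
  name: 'audit-vnet-segmentation-intent'
  properties: {
    displayName: 'Audit VNets without declared segmentation intent'
    policyType: 'Custom'
    mode: 'Indexed'
    policyRule: {
      if: {
        allOf: [
          {
            field: 'type'
            equals: 'Microsoft.Network/virtualNetworks'
          }
          {
            field: 'tags[\'permittedEnvironments\']'
            exists: 'false'
          }
        ]
      }
      then: {
        effect: 'audit' // surface the gap; don't block deployments
      }
    }
  }
}
```

An `audit` effect is deliberate here: the goal is a reviewable list of untagged VNets, not another rule that gets exempted under pressure.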

What east–west reality often looks like

flowchart LR
  BuildAgent[Shared Build Agent]
  ProdApp[Prod App Tier]
  NonProdApp[Non‑Prod App Tier]
  IncidentAccess[Temporary Incident Access]
  BuildAgent --> ProdApp
  BuildAgent --> NonProdApp
  IncidentAccess --> ProdApp

None of these paths are accidental. That’s the point.

Gotchas & Edge Cases

  • Shared services are rarely treated as first‑class risk domains
    They’re optimised for reliability and speed, not containment.

  • Incident changes bypass architecture review by necessity
    If rollback isn’t operationally enforced, it won’t happen.

  • Segmentation reviews usually happen pre‑deployment
    East–west risk emerges post‑incident and post‑migration.

Best Practices

  • Treat shared services as deliberate blast‑radius expanders, not neutral infrastructure.
  • Assume any access granted during incidents will persist unless actively removed.
  • Regularly review segmentation starting from shared services outward, not from app tiers inward.
  • Prefer fewer boundaries that survive incidents over many that don’t.

This is not about perfection; it’s about survivability under stress.

🍺
Brewed Insight: Segmentation rarely fails at design time.
It fails when convenience and urgency quietly outvote intent and no one is accountable for restoring the boundary.

Learn More