Why Alerts Matter
Setting up dashboards and logging is great—but it won’t help much at 2am if no one’s watching.
Azure Monitor alerts are your line of defence between quiet systems and production chaos. Combined with Action Groups, they make sure the right people (or scripts) get notified and take action ASAP.
In this post, you’ll learn:
- How to create metric and log-based alerts
- How Action Groups work—and what’s “in an alert”
- Bonus tips for noise reduction and targeting alerts by environment or team
What Are Azure Alerts & Action Groups?
Azure Monitor Alerts let you trigger notifications or automated actions when a signal crosses a threshold or matches a query.
There are three common types:
Type | Triggered By | Example |
---|---|---|
Metric alert | Real-time metrics | CPU > 80% for 5 mins |
Log alert | KQL query results from Log Analytics | HTTP 500 errors > 10 in 5 mins |
Activity log alert | Azure resource event | VM deleted or NSG updated |
Action Groups define what happens—like sending email/SMS, calling a webhook, or triggering automation.
Azure Portal Walkthrough
Step 1: Create an Action Group (Notification Recipient)
- Go to Azure Monitor → Alerts → Action Groups
- Click + Create
- Choose:
  - Name & short name (used in alert UI)
  - Resource Group & Region
- Under Notifications, pick:
  - Email, SMS, Push notification, or ITSM
- Under Actions, choose:
  - Webhook, Azure Function, Logic App, Automation Runbook
- Give it a tag like `env = production` or `team = ops`
- Click Review + Create
Step 2: Create a Metric Alert Rule
Let’s alert on something common, like high CPU on a VM:
- Azure Monitor → Alerts → + Create → Alert rule
- Under Scope, choose a VM or App Service
- Under Condition, choose a signal (e.g. `Percentage CPU`)
- Hit More options:
  - Set threshold (e.g. Above 80%)
  - Choose aggregation (e.g. average over last 5 mins)
- Under Actions, select the Action Group you created
- Name it clearly (e.g. `vm-cpu-prod-high`)
- Review + Create
🔎 Step 3: Create a Log-Based Alert (e.g. HTTP 5xx from App Service)
- Same flow → but choose Log Analytics Workspace for Scope
- Condition → Custom log search
- Example KQL query:

```kql
AppRequests
| where ResultCode startswith '5'
| summarize count() by bin(TimeGenerated, 5m)
| where count_ > 10
```

- Set frequency (5m) and lookback (5m)
- Attach your Action Group
- Finish and test
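If you’d rather manage this log alert as code, here’s a minimal Bicep sketch using a scheduled query rule. It’s a sketch under assumptions, not a drop-in template: the rule name is illustrative, the workspace and Action Group resource IDs come in as placeholder parameters, and instead of summarising in the query it lets the rule count matching rows over the 5-minute window. (The metric alert and Action Group equivalents are covered in the next section.)

```bicep
// Placeholder parameters for illustration
param workspaceId string      // resource ID of your Log Analytics workspace
param actionGroupId string    // resource ID of the Action Group to notify
param location string = resourceGroup().location

// Log alert: fire when more than 10 HTTP 5xx responses land in a 5-minute window
resource http5xxAlert 'Microsoft.Insights/scheduledQueryRules@2021-08-01' = {
  name: 'app-5xx-prod-high'
  location: location
  properties: {
    displayName: 'app-5xx-prod-high'
    severity: 2
    enabled: true
    scopes: [
      workspaceId
    ]
    evaluationFrequency: 'PT5M'   // how often the query runs
    windowSize: 'PT5M'            // lookback per evaluation
    criteria: {
      allOf: [
        {
          // The rule counts matching rows, so the query only filters
          query: 'AppRequests | where ResultCode startswith "5"'
          timeAggregation: 'Count'
          operator: 'GreaterThan'
          threshold: 10
          failingPeriods: {
            numberOfEvaluationPeriods: 1
            minFailingPeriodsToAlert: 1
          }
        }
      ]
    }
    actions: {
      actionGroups: [
        actionGroupId
      ]
    }
    autoMitigate: true   // auto-resolve when the condition clears
  }
}
```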
Bicep: Create Alert Rule and Action Group via IaC
Here’s an example setup for a dynamic metric-based alert + action group:
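The snippet below is a minimal sketch under a few assumptions rather than a production-ready template: the VM resource ID comes in as a parameter, the Action Group name, short name, and email address are placeholders, and the dynamic threshold uses Medium sensitivity with 3-out-of-4 failing periods.

```bicep
// Placeholder parameter: resource ID of the VM to monitor
param vmResourceId string

// Action Group: who (or what) gets notified; the email address is a placeholder
resource opsActionGroup 'Microsoft.Insights/actionGroups@2022-06-01' = {
  name: 'ag-ops-prod'
  location: 'Global'
  properties: {
    groupShortName: 'ops'
    enabled: true
    emailReceivers: [
      {
        name: 'ops-email'
        emailAddress: 'ops@example.com'
        useCommonAlertSchema: true
      }
    ]
  }
}

// Dynamic-threshold metric alert on VM CPU, wired to the Action Group above
resource cpuAlert 'Microsoft.Insights/metricAlerts@2018-03-01' = {
  name: 'vm-cpu-prod-high'
  location: 'global'
  properties: {
    description: 'CPU unusually high on production VM'
    severity: 2
    enabled: true
    scopes: [
      vmResourceId
    ]
    evaluationFrequency: 'PT5M'
    windowSize: 'PT5M'
    criteria: {
      'odata.type': 'Microsoft.Azure.Monitor.MultipleResourceMultipleMetricCriteria'
      allOf: [
        {
          criterionType: 'DynamicThresholdCriterion'
          name: 'HighCpu'
          metricNamespace: 'Microsoft.Compute/virtualMachines'
          metricName: 'Percentage CPU'
          operator: 'GreaterThan'
          timeAggregation: 'Average'
          alertSensitivity: 'Medium'
          failingPeriods: {
            numberOfEvaluationPeriods: 4
            minFailingPeriodsToAlert: 3
          }
        }
      ]
    }
    actions: [
      {
        actionGroupId: opsActionGroup.id
      }
    ]
  }
}
```

Deploy it into the resource group that should own the alert, passing the VM’s resource ID. Because the threshold is dynamic, Azure learns the metric’s normal range over time instead of you hard-coding 80%.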
Architecture Overview
You can also add a webhook action that sends alert details (severity, resource ID, timestamp, description) as a structured JSON payload to a monitoring or incident-management system (e.g. PagerDuty, Splunk, ServiceNow, or Slack via middleware).
The receiving endpoint can then trigger custom logic, such as automatically creating tickets or incidents, kicking off a remediation script, or logging and auditing alert events.
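On the Azure side, that just means giving the Action Group a webhook receiver. Here’s a minimal Bicep sketch, assuming a placeholder endpoint URL; `useCommonAlertSchema` opts into the standardised JSON payload most tools expect:

```bicep
// Action Group with a webhook receiver; the service URI is a placeholder
resource incidentActionGroup 'Microsoft.Insights/actionGroups@2022-06-01' = {
  name: 'ag-incident-webhook'
  location: 'Global'
  properties: {
    groupShortName: 'incident'
    enabled: true
    webhookReceivers: [
      {
        name: 'incident-endpoint'
        serviceUri: 'https://example.com/azure-alerts'  // your middleware / ITSM endpoint
        useCommonAlertSchema: true  // send the common alert schema JSON payload
      }
    ]
  }
}
```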
Gotchas & Guidance
- Alerts on ephemeral resources: Avoid alerting on short-lived services unless scoped by tag or app layer.
- Avoid duplicates: Azure doesn’t deduplicate alert rules automatically. Be strict with naming conventions like `app-env-metric-type`.
- Log alerts = delayed reaction (~5m+ latency). Use metric alerts for near real-time scenarios.
- Alerting on App Gateway or Firewall logs? Watch out for high volume, and sample or summarise in KQL.
Best Practices
- Always name alerts like you name functions: clear, consistent, lowercase (`apigw-5xx-prod`, `vm-cpu-east1`)
- Assign owners via tags (e.g. `team = sre`)
- Use severity levels with meaning (e.g. Sev 0 = page ops, Sev 3 = Slack-only)
- Collect and track alerts in a central Log Analytics workspace if you’re doing alert-fatigue analysis
- Enable alert suppression and smart groups to reduce noise