
Level 2 - Alerts, mean time to close scorecard rule

Alerts mean time to close measures how efficiently your team resolves incidents from the time they're opened until they're closed. This metric indicates your team's incident response effectiveness and helps identify areas for improvement in your resolution processes.

About this scorecard rule

The alerts mean time to close rule is part of Level 2 (Proactive) in the business uptime maturity model. It evaluates how quickly your team diagnoses and resolves incidents, which reflects the maturity of your incident management processes.

Why this matters: Faster incident resolution reduces customer impact, minimizes business disruption, and indicates effective monitoring and response procedures. Teams that consistently resolve incidents quickly demonstrate operational excellence.

How this rule works

This rule measures the time between when each incident is opened and when it is closed, then calculates the mean across all incidents in your account. The result reflects the efficiency of your incident response and resolution processes.
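
The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `opened_at` and `closed_at` field names and the sample records are hypothetical, not an actual export format, and the sketch assumes incidents that are still open are left out of the average.

```python
from datetime import datetime, timedelta

def mean_time_to_close(incidents):
    """Return the mean open-to-close duration, or None if nothing has closed."""
    durations = [
        inc["closed_at"] - inc["opened_at"]
        for inc in incidents
        if inc.get("closed_at") is not None  # assumption: open incidents are skipped
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)

# Hypothetical sample data.
incidents = [
    {"opened_at": datetime(2025, 1, 6, 9, 0), "closed_at": datetime(2025, 1, 6, 9, 20)},
    {"opened_at": datetime(2025, 1, 6, 13, 0), "closed_at": datetime(2025, 1, 6, 13, 50)},
    {"opened_at": datetime(2025, 1, 7, 2, 0), "closed_at": None},
]
print(mean_time_to_close(incidents))  # 0:35:00
```

The two closed incidents took 20 and 50 minutes, so the mean is 35 minutes.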

Understanding your score

  • Pass (Green): Average incident resolution time is 30 minutes or less
  • Fail (Red): Average incident resolution time exceeds 30 minutes
  • Target: Resolve most incidents within 30 minutes, consistently

What this means:

  • Passing score: Your team has efficient incident response processes and can quickly diagnose and resolve issues
  • Failing score: Incidents take too long to resolve, potentially indicating process inefficiencies, complex diagnostics, or inadequate tooling
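
Because the score is based on an average, a single long incident can turn a passing score into a failing one. The sketch below applies the 30-minute threshold to made-up durations to show the effect. The function name and sample numbers are illustrative assumptions.

```python
from datetime import timedelta

PASS_THRESHOLD = timedelta(minutes=30)  # from the rule: 30 minutes or less passes

def scorecard_result(durations):
    """Return (mean, "Pass" or "Fail") for a non-empty list of timedeltas."""
    mean = sum(durations, timedelta()) / len(durations)
    return mean, "Pass" if mean <= PASS_THRESHOLD else "Fail"

# Nine 10-minute incidents and one 4-hour incident (made-up numbers).
durations = [timedelta(minutes=10)] * 9 + [timedelta(hours=4)]
mean, result = scorecard_result(durations)
print(mean, result)  # 0:33:00 Fail
```

Nine out of ten incidents closed in 10 minutes, yet the mean is 33 minutes. When a score fails, it is worth checking whether a few outliers are responsible before changing the whole process.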

How to improve incident resolution times

If your score shows slow incident resolution, follow these steps to optimize your incident management process:

1. Analyze current incident patterns

  1. Identify slow-resolving incidents: Review which types of incidents consistently take longer than 30 minutes
  2. Examine common causes: Look for patterns in incident types, affected systems, or time of occurrence
  3. Review resolution steps: Document what actions teams typically take to resolve different incident types
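
One way to find slow-resolving incident types is to group closed incidents by type and compare each group's mean against the 30-minute threshold. The sketch below assumes hypothetical `type` and `minutes_to_close` fields. Adapt the names to whatever your incident export provides.

```python
from collections import defaultdict
from statistics import mean

def slow_incident_types(incidents, threshold_minutes=30):
    """Return (type, mean minutes) pairs above the threshold, slowest first."""
    by_type = defaultdict(list)
    for inc in incidents:
        by_type[inc["type"]].append(inc["minutes_to_close"])
    slow = {t: mean(v) for t, v in by_type.items() if mean(v) > threshold_minutes}
    return sorted(slow.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical sample data.
incidents = [
    {"type": "database", "minutes_to_close": 45},
    {"type": "database", "minutes_to_close": 75},
    {"type": "disk", "minutes_to_close": 10},
    {"type": "disk", "minutes_to_close": 20},
    {"type": "network", "minutes_to_close": 40},
]
for incident_type, avg in slow_incident_types(incidents):
    print(incident_type, avg)
```

With this sample, the database and network types are flagged and the disk type is not.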

2. Optimize alert quality and context

Improve alert information:

  • Add context to alerts: Include relevant metadata, dashboards, and runbook links in alert notifications
  • Use descriptive alert names: Make alert titles clearly indicate the problem and affected system
  • Include baseline comparisons: Show normal vs. current values to help with quick assessment
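
As an illustration of these points, the following sketch builds a plain-text notification body that names the problem and system, compares the current value with a baseline, and links to a runbook and dashboard. The function, its parameters, and the example.com URLs are all hypothetical. This is not a New Relic API.

```python
def format_alert(name, system, current, baseline, unit, runbook_url, dashboard_url):
    """Build a plain-text notification body with baseline context and links."""
    lines = [
        f"{name} on {system}",
        f"Current: {current} {unit} (normal: {baseline} {unit})",
    ]
    if baseline:  # skip the percentage when no usable baseline exists
        change = (current - baseline) / baseline * 100
        lines[1] += f", {change:+.0f}% vs. baseline"
    lines.append(f"Runbook: {runbook_url}")
    lines.append(f"Dashboard: {dashboard_url}")
    return "\n".join(lines)

print(format_alert(
    name="High response time",
    system="checkout-api",
    current=500,
    baseline=200,
    unit="ms",
    runbook_url="https://example.com/runbooks/checkout-latency",
    dashboard_url="https://example.com/dashboards/checkout",
))
```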

Enhance alert routing:

  • Send alerts to the right teams: Ensure alerts reach the people who can actually resolve the issue
  • Use intelligent routing: Route different alert types to appropriate specialists (database, frontend, infrastructure)
  • Provide escalation paths: Clear procedures for when initial responders can't resolve issues
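
A routing table with an explicit escalation chain can be sketched as two lookups. The team names, the `category` field, and the fallback team below are assumptions made for illustration only.

```python
# Hypothetical team names and categories.
ROUTES = {
    "database": "dba-oncall",
    "frontend": "web-oncall",
    "infrastructure": "infra-oncall",
}
ESCALATION = {
    "dba-oncall": "platform-lead",
    "web-oncall": "platform-lead",
    "infra-oncall": "platform-lead",
    "sre-oncall": "platform-lead",
    "platform-lead": "engineering-manager",
}
DEFAULT_TEAM = "sre-oncall"  # fallback for alerts with an unknown category

def route(alert):
    """Pick the first responder for an alert based on its category."""
    return ROUTES.get(alert.get("category"), DEFAULT_TEAM)

def escalation_chain(alert):
    """Return the first responder followed by each escalation step, in order."""
    chain = [route(alert)]
    while chain[-1] in ESCALATION:
        next_step = ESCALATION[chain[-1]]
        if next_step in chain:  # guard against a misconfigured loop
            break
        chain.append(next_step)
    return chain

print(escalation_chain({"category": "database"}))
# ['dba-oncall', 'platform-lead', 'engineering-manager']
```

Keeping the fallback team explicit means an alert with a missing or unknown category still reaches someone instead of being dropped.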

3. Streamline diagnostic processes

Create effective runbooks:

  • Document common issues: Step-by-step resolution procedures for frequent problems
  • Include troubleshooting steps: Logical diagnostic flows that reduce investigation time
  • Link to relevant tools: Direct access to dashboards, logs, and diagnostic utilities

Improve tooling access:

  • Centralize monitoring data: Ensure responders can quickly access all relevant information
  • Use unified dashboards: Create incident-specific views that show all relevant metrics
  • Automate common checks: Reduce manual diagnostic steps with automated health checks
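
Automated health checks can be as simple as a named set of functions that run together and report a status each. In the sketch below the three checks are hypothetical stand-ins. Real probes might ping an HTTP endpoint, read disk usage, or query a queue.

```python
def run_checks(checks):
    """Run each named check and return a status string per check name."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failed"
        except Exception as exc:  # a broken check should not hide the others
            results[name] = f"error: {exc}"
    return results

# Hypothetical stand-ins for real probes.
checks = {
    "api_responds": lambda: True,
    "disk_below_90_percent": lambda: False,
    "queue_reachable": lambda: 1 / 0,  # simulates a probe that raises
}
for name, status in run_checks(checks).items():
    print(f"{name}: {status}")
```

Running all checks at once, and reporting errors instead of stopping at the first one, gives the responder a complete picture in a single step.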

4. Enhance team response capabilities

Improve team readiness:

  • Cross-train team members: Ensure multiple people can handle different types of incidents
  • Document escalation procedures: Clear paths for when issues require additional expertise
  • Conduct incident response training: Regular practice sessions for common scenarios

Optimize response workflows:

  • Standardize communication: Use consistent channels and formats for incident updates
  • Automate routine responses: Use automation for common resolution steps
  • Track resolution progress: Clear visibility into who's working on what and current status

Measuring improvement

Track these metrics to verify your incident resolution improvements:

  • Mean time to close (MTTC): Target a consistent average of 30 minutes or less, matching the passing threshold
  • Resolution time distribution: Monitor the spread of resolution times to identify outliers
  • First-time resolution rate: Percentage of incidents resolved without reopening
  • Escalation frequency: How often incidents require additional expertise or resources
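
The four metrics above can be computed from the same list of closed incidents. This is a minimal sketch that assumes hypothetical `minutes_to_close`, `reopened`, and `escalated` fields and uses a nearest-rank percentile to describe the distribution.

```python
import math
from statistics import median

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

def improvement_metrics(incidents):
    """Summarize a non-empty list of closed incidents."""
    minutes = [inc["minutes_to_close"] for inc in incidents]
    total = len(incidents)
    return {
        "mttc_minutes": sum(minutes) / total,
        "median_minutes": median(minutes),
        "p90_minutes": percentile(minutes, 90),
        "first_time_resolution_rate": sum(not inc["reopened"] for inc in incidents) / total,
        "escalation_frequency": sum(inc["escalated"] for inc in incidents) / total,
    }

# Hypothetical sample data.
incidents = [
    {"minutes_to_close": 10, "reopened": False, "escalated": False},
    {"minutes_to_close": 15, "reopened": False, "escalated": False},
    {"minutes_to_close": 20, "reopened": True, "escalated": False},
    {"minutes_to_close": 25, "reopened": False, "escalated": False},
    {"minutes_to_close": 130, "reopened": False, "escalated": True},
]
print(improvement_metrics(incidents))
```

In this sample the mean is 40 minutes while the median is 20, which points to one outlier (the 130-minute escalated incident) rather than a general slowdown.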

Common scenarios and solutions

Complex incidents requiring deep investigation:

  • Problem: Some issues inherently require longer diagnostic time
  • Solution: Separate complex incidents into their own category and set different SLA expectations, or implement partial resolution acknowledgments

Incidents during off-hours:

  • Problem: Resolution times are slower when fewer experts are available
  • Solution: Improve on-call procedures, create better escalation paths, or enhance automated diagnostic tools

Repeated similar incidents:

  • Problem: Teams spend time re-solving the same types of problems
  • Solution: Invest in permanent fixes for recurring issues, create automated resolution scripts, or improve monitoring to catch root causes

Poor alert context:

  • Problem: Teams spend too much time understanding what's actually wrong
  • Solution: Enhance alert descriptions, include relevant dashboards, and provide direct links to affected systems

Understanding the 30-minute target

The 30-minute target represents a balance between thorough investigation and rapid response:

Why 30 minutes:

  • Customer impact: Most customers notice service degradation within this timeframe
  • Business impact: Longer incidents typically carry disproportionately higher business costs
  • Team efficiency: Indicates well-tuned processes and adequate preparation

When to adjust the target:

  • Lower target (15-20 minutes): High-availability services with strict SLAs
  • Higher target (45-60 minutes): Complex systems requiring deep investigation
  • Different targets by severity: Critical incidents need faster resolution than warnings

Advanced optimization strategies

Incident categorization

Categorize by resolution complexity:

  • Quick fixes: Simple restart or configuration changes (target: under 10 minutes)
  • Standard diagnostics: Typical troubleshooting procedures (target: 15-30 minutes)
  • Complex investigations: Deep technical analysis required (target: 45-60 minutes)
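
Once incidents carry a complexity category, each one can be judged against its own target instead of a single 30-minute threshold. The sketch below uses the upper bound of each range listed above and treats it as inclusive, which is a simplifying assumption. The category keys and field names are hypothetical.

```python
# Upper bound of each target range from the list above, in minutes.
TARGET_MINUTES = {"quick_fix": 10, "standard": 30, "complex": 60}

def within_target_rate(incidents):
    """Return, per category, the share of incidents closed within its target."""
    totals, hits = {}, {}
    for inc in incidents:
        cat = inc["category"]
        totals[cat] = totals.get(cat, 0) + 1
        if inc["minutes_to_close"] <= TARGET_MINUTES[cat]:
            hits[cat] = hits.get(cat, 0) + 1
    return {cat: hits.get(cat, 0) / totals[cat] for cat in totals}

# Hypothetical sample data.
incidents = [
    {"category": "quick_fix", "minutes_to_close": 5},
    {"category": "quick_fix", "minutes_to_close": 12},
    {"category": "standard", "minutes_to_close": 25},
    {"category": "complex", "minutes_to_close": 90},
]
print(within_target_rate(incidents))
# {'quick_fix': 0.5, 'standard': 1.0, 'complex': 0.0}
```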

Automation opportunities

Automate routine responses:

  • Self-healing systems: Automatic restart or failover for common issues
  • Diagnostic automation: Automatic collection of relevant logs and metrics
  • Communication automation: Automatic status updates for stakeholders
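
A self-healing step usually follows a simple pattern: check health, attempt a bounded number of automatic restarts, and hand off to a person if the service is still unhealthy. The sketch below simulates that loop. The `is_healthy` and `restart` functions are hypothetical placeholders for whatever your platform provides.

```python
def auto_remediate(is_healthy, restart, max_attempts=2):
    """Call restart() up to max_attempts times; return (healthy, attempts_used)."""
    attempts = 0
    while not is_healthy() and attempts < max_attempts:
        restart()
        attempts += 1
    return is_healthy(), attempts

# Simulated service that recovers after its second restart.
state = {"restarts": 0}

def is_healthy():
    return state["restarts"] >= 2

def restart():
    state["restarts"] += 1

healthy, attempts = auto_remediate(is_healthy, restart, max_attempts=3)
print(healthy, attempts)  # True 2
```

Capping the number of attempts matters: if the result is still unhealthy after the last restart, the incident should be escalated to a responder rather than retried indefinitely.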

Process optimization

Implement incident commanders:

  • Dedicated coordinators: Assign specific people to manage incident workflow
  • Clear communication: Single point of contact for updates and decisions
  • Resource allocation: Ensure the right people are working on the right problems

Important considerations

  • Balance speed with accuracy: Don't sacrifice proper investigation for faster closure times
  • Consider incident severity: Different types of incidents may require different resolution time targets
  • Account for business context: Weekend incidents may have different urgency than weekday issues
  • Measure meaningful closure: Ensure incidents are actually resolved, not just closed

Next steps

  1. Immediate action: Analyze your current slowest-resolving incident types and implement quick wins
  2. Process improvement: Develop standardized incident response procedures and runbooks
  3. Tool enhancement: Improve alert context and diagnostic tool access
  4. Team development: Invest in training and cross-functional incident response capabilities
  5. Advance to Level 3: Once incident response is optimized, focus on service level attainment

For comprehensive guidance on incident management optimization, see our Alert Quality Management implementation guide.

Copyright © 2025 New Relic Inc.
