The role of an incident commander in IT
Incident commander - why you need one, their responsibilities, and skill sets
Mar 25, 2024 | 13 MINS READ
What is an incident commander?
The incident commander (IC) is the person responsible for every phase of an emergency response, including rapidly setting incident objectives, managing incident operations, allocating resources, and taking responsibility for everyone involved.
For most incidents, a single incident commander performs the command function. The incident commander is chosen based on qualifications and experience, and may have a deputy, who can be selected from the same agency or from a supporting organisation.
The incident commander manages a critical incident by:
Identifying the objectives of the incident response.
Devising a command plan to address the problem.
Monitoring the situation and adjusting as needed.
An incident commander can take several forms:
De facto - Your organization may already have a de facto incident commander in its ranks. This person often works in a support, operations, or engineering role and has greater-than-average customer empathy.
Volunteer - An important factor in a volunteer incident commander model is ensuring that volunteers are appropriately recognized, valued, and potentially even incentivized.
Dedicated - An existing engineering manager, director, or support manager who is dedicated to managing incidents effectively as they occur.
Incident commanders in IT service management are responsible for directing the response to incidents. They lead the incident response team, supervise incident teams, and coordinate with other affected organizations. This includes interacting with stakeholders, collecting system information and data related to the incident, assigning tasks and roles to team members, managing communication between teams and other affected organizations (e.g., service providers), monitoring the progress of the response, and ensuring that the resolution process is conducted efficiently.
What type of teams need an incident commander?
First, let us examine applications in the public sector. The most visible Incident Commanders are the point people in cases that require a local, regional, or national incident response, including police, fire, or Federal Emergency Management Agency (FEMA) responses in national disaster scenarios. The objective of the Incident Commander is to ensure that every incident is resolved as quickly as possible and with minimal impact on operations.
In these situations, Incident Management Teams (IMTs) are rostered groups of personnel qualified in the Incident Command System (ICS), consisting of an Incident Commander, any additional incident leadership required, and personnel qualified for vital ICS positions.
The 5 levels of IMTs
The five levels of IMTs are:
Type 5: Local Village and Township Level - Single-discipline team for initial action and small incidents.
Type 4: City, County, or Fire District Level - Single and/or multi-agency team for expanded incidents.
Type 3: State, Territory, Tribal, or Metropolitan Area Level - Multi-agency/multi-jurisdictional team for extended incidents and multiple operations.
Type 2: National and State Level - National or State team for incidents of regional impact.
Type 1: National and State Level - National or State team for incidents of national importance.
In each case, team members are certified as possessing the required training and experience to fill IMT positions.
As defined by FEMA, an incident is “an occurrence or event, natural or human-caused, that requires an emergency response to protect life or property.”
The U.S. Fire Administration categorizes five types of incidents requiring an Incident Commander:
Type 5 - The incident can be managed with one or two single resources with fewer than six personnel. The incident is contained within the first operational period, often within an hour to a few hours after resources arrive on the scene. Examples include a vehicle fire, an injured person, or a police traffic stop.
Type 4 - Several resources are required to contain the incident, including a Strike Team or Task Force.
Type 3 - A Type 3 Incident Management Team (IMT) or incident command organization manages initial action incidents with a sizable number of resources, extended attack incidents until containment/control is achieved, or expanding incidents until they transition to a Type 1 or 2 IMT. The incident may extend into multiple operational periods.
Type 2 - This type of incident extends beyond the capabilities of local control and is expected to go into multiple operational periods. The incident may require resources from outside the area, including regional and/or national resources, to effectively manage the operations, command, and general staffing. Operations personnel normally do not exceed two hundred per operational period, and the total number of incident personnel does not exceed five hundred.
Type 1 - This type of incident is the most complicated, requiring national resources to cope and operate safely and effectively. The agency administrator will hold briefings as the complexity analysis and delegation of authority are updated. Use of resource advisors at the incident base is recommended. There is a profound impact on the local jurisdiction, necessitating additional staff for office administrative and support functions.
Examples of incidents
Business incidents like cyber-attacks, data breaches, or labor strikes usually do not concern physical safety. Still, they require immediate action because they can impact your company’s bottom line. In these incidents, an IT or DevOps team member is usually named Incident Commander. That person defines the incident action plan and leads the decision-making process when business crises arise.
A major incident is a high-impact, urgent issue that usually affects the entire organization or a significant part of it. A major incident almost always makes a business’s services unavailable, which disrupts the organization's operations and ultimately hurts its financial results.
The stakes of a major incident are greater than ever before. As reported in a study conducted by Information Technology Intelligence Consulting, 98 percent of organizations lose more than $100,000 from an hour of downtime caused by a major incident.
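To make that figure concrete, the short sketch below scales an hourly downtime cost to an outage of any length. The $100,000 default is only the floor cited above; the function name and the assumption that cost grows linearly with time are illustrative, and a real organization would substitute its own figures.

```python
# Illustrative only: assumes downtime cost scales linearly with duration.
HOURLY_COST_USD = 100_000  # floor cited in the study above; real costs vary


def estimated_downtime_cost(minutes: float,
                            hourly_cost_usd: float = HOURLY_COST_USD) -> float:
    """Return a rough cost estimate for an outage lasting `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return hourly_cost_usd * minutes / 60


if __name__ == "__main__":
    # A hypothetical 30-minute outage at the floor rate.
    print(f"${estimated_downtime_cost(30):,.0f}")
```

Even at the floor rate, a half-hour outage lands in the tens of thousands of dollars, which is why shaving minutes off the response matters.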
The incident command system (ICS) is a standard organizational structure that many companies follow to respond quickly to a business crisis and to ensure they have the support needed to manage every aspect of an incident. Like any company project, an incident entails planning, logistics, and a dependable operational process. The difference with an incident is that the stakes are higher and the timeline is shorter.
The stages of the incident management process
The major incident management process predominantly consists of the following stages:
Stage 1: identification - Major incidents can be flagged by IT staff when they come across unusual tickets, or they can be detected by solutions like system monitoring tools that can automatically indicate a network issue and create a ticket to alert the service desk.
Stage 2: containment - Assembling the incident team, scheduling and managing meetings to communicate, and identifying underlying issues.
Stage 3: resolution – Planning and executing activities to fix the problem.
Stage 4: maintenance - Performing a post-incident review, producing clear documentation, and applying metrics to assess downtime and resolution effectiveness over time.
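The four stages above can be sketched as a simple ordered state machine. This is a minimal illustration, not a prescribed implementation: the `Stage` and `MajorIncident` names are hypothetical, and the sketch assumes stages are always visited in order without skipping or going back.

```python
from enum import Enum


class Stage(Enum):
    """The four stages listed above, in order."""
    IDENTIFICATION = 1
    CONTAINMENT = 2
    RESOLUTION = 3
    MAINTENANCE = 4


class MajorIncident:
    """Hypothetical record that tracks which stage an incident is in."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.stage = Stage.IDENTIFICATION
        self.history = [Stage.IDENTIFICATION]

    def advance(self) -> Stage:
        """Move to the next stage; refuse to go past the final one."""
        if self.stage is Stage.MAINTENANCE:
            raise ValueError("incident is already in the final stage")
        self.stage = Stage(self.stage.value + 1)
        self.history.append(self.stage)
        return self.stage


if __name__ == "__main__":
    incident = MajorIncident("Checkout API returning errors")  # made-up title
    while incident.stage is not Stage.MAINTENANCE:
        print(incident.advance().name)
```

Keeping the stage history on the record gives the post-incident review a ready-made outline of how the response progressed.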
A good example of a major IT incident is the 2019 Cloudflare outage.
In this incident, during a routine update of a managed rule for the web application firewall (WAF), CPU usage on the machines dedicated to serving HTTP/HTTPS traffic spiked to nearly 100 percent across Cloudflare's network. The ensuing outage reduced Cloudflare's traffic by 80 percent and affected millions of internet users around the world.
The outage caused Cloudflare customers (and their customers) to encounter a 502 error page when visiting any Cloudflare domain. Up to half of the internet was likely inaccessible for the twenty-seven minutes of the interruption. Websites served through Cloudflare were unreachable, triggering service disruptions for thousands of businesses and millions of users.
The WAF managed rule was deployed, and three minutes later Cloudflare's network operations tools started flagging the drop in traffic. The site reliability engineering team, the London engineering team, and other relevant teams were brought together to troubleshoot and produce a fix. Within twenty-eight minutes they identified the WAF as the cause of the incident, and seven minutes later a global WAF kill was implemented to return traffic levels to normal. Within a total of 80 minutes, Cloudflare was fully confident that it understood the cause of the outage and had a fix in place, so the WAF was re-enabled globally.
Why do teams need an incident commander?
Your chosen incident commander is the chief point of contact and source of truth about your incident. The IC sees the big picture, manages all the moving pieces, knows what actions have been taken before in analogous situations, and leads the planning and management of the next steps.
An incident commander facilitates communication and teamwork during and after a business emergency. Without one, it is easy for teams to duplicate work without realizing it, overlook big-picture concerns, and fail to communicate swiftly and efficiently with system users, internal stakeholders, leadership, and one another. The larger and more complex a business’s technology or team structures are, the more critical this position is to a healthy incident management practice.
Responsibilities of an incident commander
Incident Commanders, especially those dedicated to resolving incidents, can also take on a variety of other supporting tasks, including but not limited to:
Stay updated on the industry
Incident commanders should regularly keep up with developments in the industry, such as major incidents and how they were handled. This ensures they can guide the incident command team members and coordinate effectively by following current best practices.
Know the organization well
The IT incident commander is expected to know the incident command team's responsibilities, functions, and dependencies and understand how they work in their entirety. It would be ideal to keep all the documentation easily accessible. They should also have a thorough comprehension of the organization's incident life cycle.
Plan before incidents occur
It is crucial to be prepared with a strategic plan for potential incidents before they happen. The better organized and documented your process is, the easier it will be for the incident commander and the incident command team members involved to follow it when the most intense, high-stress incidents occur.
Focus on the task at hand
Incident commanders are experts at focusing on the right information. They delegate the troubleshooting to the incident command team members and concentrate on managing the process instead. They know the focus should be on asking the right questions rather than just offering answers. Do you need to loop in another team? Does the most recent hire understand the incident well enough to help? Or do we already have an expert who can advise and investigate? When the right questions are posed, the solutions will start flowing.
Conduct post-mortems
Post-incident analysis is a great way to identify gaps in the incident response team's expertise and work. After resolution, the incident manager must perform a postmortem analysis to identify how the team can improve its incident management process in the long run. The postmortem encompasses:
Documenting and recording of incidents.
Facilitating post-incident review and incident analysis.
Collecting and aggregating data and providing other relevant incident metrics.
Sharing knowledge gained from the incident.
Directing incident process improvement steps across all responder organizations.
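As one example of the incident metrics mentioned in the list above, the sketch below computes a mean time to resolve from detection and resolution timestamps. The function name and the sample timestamps are hypothetical, and this is only one of many metrics a team might track.

```python
from datetime import datetime, timedelta


def mean_time_to_resolve(incidents):
    """Average (resolved - detected) over a list of (detected, resolved) pairs."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(),  # start value so the durations can be summed
    )
    return total / len(incidents)


if __name__ == "__main__":
    # Made-up timestamps for illustration only.
    records = [
        (datetime(2024, 1, 10, 9, 0), datetime(2024, 1, 10, 9, 30)),
        (datetime(2024, 2, 3, 14, 0), datetime(2024, 2, 3, 15, 30)),
    ]
    print(mean_time_to_resolve(records))
```

Tracking a figure like this across postmortems shows whether process changes are actually shortening the response over time.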
What skills are needed for an incident commander?
The skills required to be an effective Incident Commander include:
Problem-solving: An Incident Commander must be analytical and able to recognize and solve complex problems in high-pressure situations. The IC must also think critically and creatively to discover practical solutions, and must be able to either handle the entire incident or delegate tasks as part of a real-time response plan.
Communication: Effective communication is critical when it comes to responding to an incident. You should be an exceptional communicator to deliver your ideas plainly and succinctly to your team.
Decision making: An Incident Commander makes serious decisions that could impact the safety and well-being of others. They weigh the pros and cons of different alternatives and make well-considered decisions.
Proactive listening: It is not just about speaking; ICs are also excellent listeners. They solicit and understand different perspectives, then apply the information they collect to make better decisions.
Adaptability: No two incidents are alike. As an Incident Commander, you must adjust to changing conditions quickly and effectively. As in an emergency operations center, you’ll be the “router” who directs every problem to general staff or IT.
Leadership: As an Incident Commander, you must lead a team of responders. You must inspire and motivate the team while providing clear advice and direction.
Time management: Time is of the essence when managing an incident. That is why ICs prioritize tasks and make the most of every minute of the response.
Previous experience with similar incidents: While it is not strictly necessary, having experience managing similar incidents is a substantial advantage. It will help the IC anticipate potential problems and arrive at effective solutions more quickly.
Tips and best practices for an incident commander
A few tips and best practices for an effective Incident Commander:
Stay updated on the industry - Incident commanders should continuously keep up with developments in the industry, such as major incidents and how they were handled. This ensures they can guide the incident command team members and coordinate effectively by following the latest best practices.
Know the organization well - The IT incident commander is expected to know the incident command team's responsibilities and functions, and to understand how they work as a whole. It would be ideal to keep all the documentation easily accessible. They should also thoroughly understand the organization's incident life cycle.
Plan before incidents occur - It is crucial to have a strategic plan for potential incidents even before they happen. The better organized and documented your process is, the easier it will be for the incident commander and the incident command team members involved to follow it when the most intense, high-stress incidents occur.
Focus on the task at hand - Incident commanders are experts at focusing on the right information. They leave the troubleshooting to the incident command team members and concentrate on managing the process instead. They know the focus should be on asking the right questions rather than just offering answers. When do you need to loop in another team?
Conduct post-mortems - Post-incident analysis is an effective way to identify gaps in the incident response team's knowledge and work. After resolution, the incident manager must perform a postmortem to identify how the team can improve its incident resolution process in the long term.
Twelve best practices for the role are:
Establish command and control
Organize and prioritize resources
Make decisions based on goals
Keep stakeholders informed
Acquire insight into the impact of an incident
Lead with confidence and clarity
Develop incident plans in advance
Maintain communication channels
Utilize automation tools to streamline processes
Monitor resolution progress accurately
Follow up after incident resolution
Document what you have learned and review recommendations
The name “incident commander” comes from first responders, where the commander may also act as the liaison officer. The role changes when applied to IT, but it remains crucial for managing incidents effectively. Incident commanders lead the charge in quickly resolving issues, keeping the impact on business operations minimal. With a mix of leadership, quick thinking, and clear communication, they guide their teams from the moment a problem is identified until everything is back to normal.
Throughout this article, we've seen what it takes to be an incident commander, their challenges, and the essential skills needed. In today's fast-paced digital world, where downtime can significantly affect a company's bottom line, having a skilled incident commander and a solid incident management plan is essential for any business. It's all about being ready to act fast and efficiently to keep services running smoothly for customers.
The need for incident management tools
An incident management tool is essential for incident commanders (or incident managers) in IT for several reasons:
Centralized incident handling: It provides a centralized platform for managing and coordinating responses to IT incidents. This ensures that all relevant information, communication, and actions are consolidated in one place, streamlining the response process.
Real-time communication: It facilitates real-time communication among incident response teams, allowing incident commanders to quickly disseminate information, assign tasks, and coordinate efforts to resolve the incident efficiently.
Resource allocation: The tool enables incident commanders to allocate resources effectively by providing visibility into team availability, skill sets, and workload. This ensures that the right personnel are assigned to tasks based on their expertise and availability.
Incident tracking and documentation: It allows incident commanders to track the status of incidents, monitor progress, and document all actions taken during the response process. This documentation is crucial for post-incident analysis, compliance purposes, and improving future incident response procedures.
Incident prioritization: The tool helps incident commanders prioritize incidents based on their severity, impact on business operations, and other factors. This ensures that critical incidents receive immediate attention and that resources are allocated accordingly.
Automation and orchestration: Some incident management tools offer automation and orchestration capabilities, allowing incident commanders to automate routine tasks, trigger predefined response actions, and orchestrate complex workflows. This accelerates the incident resolution process and reduces manual effort.
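As a rough illustration of the prioritization capability described in the list above, the sketch below orders open incidents by severity first and business impact second. The `Incident` fields, the severity scale (1 is most severe), and the use of affected users as the impact measure are all assumptions made for this example, not features of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    name: str
    severity: int        # hypothetical scale: 1 = most severe
    affected_users: int  # hypothetical stand-in for business impact


def prioritize(incidents):
    """Sort by severity (lowest number first), then by most users affected."""
    return sorted(incidents, key=lambda i: (i.severity, -i.affected_users))


if __name__ == "__main__":
    # Made-up incidents for illustration only.
    queue = [
        Incident("Slow report export", severity=3, affected_users=40),
        Incident("Login outage", severity=1, affected_users=12_000),
        Incident("Payment errors", severity=1, affected_users=30_000),
    ]
    for incident in prioritize(queue):
        print(incident.name)
```

A real tool would likely weigh more factors, but even a simple, explicit ordering rule keeps the team from debating priorities in the middle of a response.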
Overall, an incident management tool empowers incident commanders to effectively lead and orchestrate the response to IT incidents, ensuring timely resolution, minimal disruption to business operations, and continuous improvement of incident response capabilities.