Understanding the Core Purpose of ITSM Processes
Modern IT service management (ITSM) treats every IT function and asset as part of a unified service delivered to the customer. ITSM processes are the operational building blocks of this service-first approach, working together to enable IT teams to manage services. Each ITSM process serves a distinct purpose, but they work best when connected.
While this interconnected structure is the ideal, achieving it requires strict governance and a deep understanding of how information must flow among core components. Many organizations try to solve this with technology, but a tool can only follow rules and can’t apply judgment.
This article explains what the core ITSM processes do, where organizations commonly struggle, and best practices recommended by the ITIL framework.
Summary of key ITSM processes and their purpose
| Process | Purpose |
| --- | --- |
| Incident management | Restore normal service operation as quickly as possible and minimize adverse impacts on business operations. |
| Problem management | Identify and resolve the root causes of incidents and prevent their recurrence. |
| Change management | Ensure that IT changes are controlled and implemented with minimal disruption to services. |
| Service request management | Efficient and standardized fulfillment of routine service requests from users. |
| Knowledge management | Effective capture, storage, and sharing of information to improve efficiency and support informed decision-making. |
Incident management
Incident management is the most visible ITSM process because it directly affects how users experience IT. ITIL 4 defines an incident as an unplanned interruption to a service or a reduction in the quality of a service. It is important to note that incident management is only meant to restore normal service operation as quickly as possible, not to investigate why the disruption happened.
A typical incident management lifecycle
Many mature organizations utilize the incident management practice as a governance checkpoint. In other words, every incident is considered a data point. Patterns in incident volume, category, resolution time, and affected services reveal systemic issues. Analyzed collectively, these trends feed insights across ITSM practices.
Incidents are typically categorized into three types.
Normal incidents have defined impact and urgency levels, typically ranging from P1 (most critical) to P4 (least critical). Each organization defines its own priority matrix based on business needs. These form the bulk of daily service desk work.
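A priority matrix like the one described can be sketched as a simple lookup from impact and urgency to a priority level. This is a minimal sketch; the levels and mapping below are illustrative, since each organization defines its own matrix:

```python
# Illustrative priority matrix: priority = f(impact, urgency).
# The exact levels and mapping are organization-specific assumptions here.
PRIORITY_MATRIX = {
    ("high", "high"): "P1",
    ("high", "medium"): "P2",
    ("medium", "high"): "P2",
    ("high", "low"): "P3",
    ("medium", "medium"): "P3",
    ("low", "high"): "P3",
    ("medium", "low"): "P4",
    ("low", "medium"): "P4",
    ("low", "low"): "P4",
}

def priority(impact: str, urgency: str) -> str:
    """Resolve an incident priority from its impact and urgency levels."""
    return PRIORITY_MATRIX[(impact, urgency)]
```

Encoding the matrix as data rather than nested conditionals makes it easy to audit and adjust as business needs change.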
Major incidents are reserved for business-critical disruptions, such as system-wide outages, service failures affecting large user populations, or events that threaten core operations. They follow a separate procedure, with senior-level involvement, dedicated communication channels, and accelerated escalation paths.
Security incidents involve events that compromise or threaten the confidentiality, integrity, or availability of information or systems. These incidents often require coordination with specialized teams and follow regulatory or compliance-driven response procedures.
Notably, the categorization and prioritization of incidents serve different purposes, but both affect resolution speed. Categorization routes tickets to the right team; prioritization orders them by urgency. Both shape how users perceive service quality, so pay close attention when defining them. Aim for a first-time-right model: in an ideal world, you would see zero recategorizations and zero priority changes.
What can go wrong?
A lot of organizations struggle with incidents that get logged incorrectly, miscategorized, or confused with service requests. As a result, you will see reports that don’t make sense, misrouted tickets, and SLA breaches that erode trust. Automated categorization and triage can address this at the point of logging.
Viewing and applying ticket predictions with Freddy AI
The next chokepoint in the incident workflow is usually related to the quality of the documentation available to service desk agents. Without quality work procedures, these agents deflect tickets to L2 rather than resolving them directly, effectively turning the service desk into a “catch and dispatch” team.
Bear in mind that documentation can’t just be a set of knowledge base articles. Agents also need access to requester information, mapped assets, ticket history, and data from other systems.
Recommendations
Start by training your service desk using real-life examples. Build a clear distinction between incidents and service requests and enforce adherence to it.
Focus on the quality of ticket creation, making sure you have as much information as possible right from the start, and ensure that the service desk has access to the relevant work procedures required to fix incidents ASAP. Don’t forget to establish clear guidelines for ticket closure and closely monitor this process.
If the organization's maturity level allows, focus on a shift-left approach by automating as many simple interactions as possible. Utilize AI solutions as valuable tools for the service desk in speeding up resolution.
Adopt the recommended reporting practices of the Freshservice Benchmarking Report. To start with, focus on the most used metrics in the industry:
SLA adherence: The percentage of incidents resolved within the SLA timeframe
Mean time to restore (MTTR): The average lifespan of incidents
First contact resolution (FCR): The percentage of incidents resolved at first contact
Resolved at Level 1 (RAL1): The percentage of incidents resolved at Level 1
Employee satisfaction (ESAT): The percentage of users who are satisfied with the services
Average first response time (AFRT): The average time needed to pick up tickets
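Two of these metrics can be computed directly from ticket timestamps. Here is a minimal sketch, assuming a simple record shape (the `opened`, `resolved`, and `sla_met` field names are illustrative, not any specific tool's schema):

```python
from datetime import datetime, timedelta

# Illustrative incident records; field names are assumptions for this sketch.
incidents = [
    {"opened": datetime(2024, 1, 1, 9), "resolved": datetime(2024, 1, 1, 11), "sla_met": True},
    {"opened": datetime(2024, 1, 2, 9), "resolved": datetime(2024, 1, 2, 15), "sla_met": False},
    {"opened": datetime(2024, 1, 3, 9), "resolved": datetime(2024, 1, 3, 10), "sla_met": True},
]

# SLA adherence: share of incidents resolved within the SLA timeframe.
sla_adherence = sum(i["sla_met"] for i in incidents) / len(incidents)

# MTTR: average incident lifespan (resolved minus opened).
mttr = sum((i["resolved"] - i["opened"] for i in incidents), timedelta()) / len(incidents)

print(f"SLA adherence: {sla_adherence:.0%}")  # SLA adherence: 67%
print(f"MTTR: {mttr}")                        # MTTR: 3:00:00
```

The same pattern extends to FCR, RAL1, and AFRT once the relevant timestamps and flags are captured at ticket creation, which is another reason ticket quality matters from the start.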
Problem management
While incident management restores service, problem management asks why it broke in the first place. The problem management practice focuses on identifying the root causes behind incidents, fixing what’s broken under the surface, and making sure issues don’t repeat.
Typical phases of the problem management lifecycle
The problem identification process can differ based on the scenario:
Proactive problem management: You don’t see any incidents, and there’s no business impact. Instead, the trigger to investigate might be an event from monitoring tools, a vendor advisory about a potential flaw, or trend analysis revealing a component nearing failure. The goal is to identify and eliminate the root cause before users notice anything.
Reactive problem management: You respond once incidents begin to manifest. That’s a typical scenario after incident resolution, where you perform root cause analysis (RCA). In this case, your enterprise would have noticed a service interruption or at least a degradation of service. RCAs are resource-intensive, so AI-powered analytics tools can help reduce agent workload and streamline the investigation process.
Utilizing AI-powered capabilities through Freddy AI Insights for RCA
Resolving a problem can also follow different approaches:
A workaround reduces the impact of a problem without eliminating the root cause. It's a temporary measure that helps service desk agents restore service faster while the investigation continues.
A permanent fix addresses the root cause directly, preventing the issue from recurring.
Not every problem gets a permanent fix immediately; some require changes that need planning, resources, or scheduled downtime. When a root cause is understood but unresolved, ITIL calls it a known error, meaning that the problem has been analyzed, the cause documented, but the fix hasn't been implemented yet. Known errors stay in that status until resolution, typically with a workaround attached so analysts can respond to related incidents without starting from scratch.
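Conceptually, a known-error record links the documented root cause, the attached workaround, and the incidents it explains. A hypothetical sketch of that structure (the field names are assumptions for illustration, not ITIL-mandated):

```python
from dataclasses import dataclass, field

# Hypothetical known-error record: root cause documented, permanent fix
# pending, workaround attached for analysts handling related incidents.
@dataclass
class KnownError:
    problem_id: str
    root_cause: str
    workaround: str
    status: str = "known_error"  # stays in this status until a permanent fix ships
    linked_incidents: list = field(default_factory=list)

ke = KnownError(
    problem_id="PRB-1042",
    root_cause="Connection pool exhausted under peak load",
    workaround="Restart the app service to release stale connections",
)
ke.linked_incidents.append("INC-2001")  # new related incidents reuse the workaround
```

Keeping the workaround on the record is what lets analysts respond to recurring incidents without restarting the investigation each time.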
Problem management requires the same triage discipline as incident management; the only difference is how the work gets staffed. Incident management typically has dedicated staff resolving tickets continuously, while problem management mostly relies on subject matter experts who have other responsibilities. In such cases, prioritization determines which problems should be given attention first.
What can go wrong?
In practice, this is one of the most underused ITSM processes, often because it lacks clear ownership and resources. Resource constraints also affect RCA quality. Teams often settle for the first plausible cause rather than the actual root cause, which leads to recurring incidents because the real problem was never addressed.
Poor categorization and prioritization are other issues with this practice. Without clear criteria, teams investigate problems in the order they arrive rather than based on business impact.
Recommendations
Ensure that problem management activities have clear ownership. Ideally, assign one or more problem managers who can oversee all the activities and ensure that they are executed properly.
Proper investigations require time and knowledge, so make sure you assign sufficient resources and tools to support the practice.
There are several RCA methods available: 5 Whys, 8Ds, Fishbone, etc. Choose what works best for your organization, but make sure there is a proper method for identifying the actual root causes of incidents.
Sample Fishbone Diagram (Ishikawa) of a service outage incident’s RCA
Similar to incident management, integration with other processes/tools and the availability of information are critical. Proper tools and access to information help with detecting patterns, analyzing vast amounts of data, and linking various elements together.
Monitor key metrics that can tell you how well your problem management practice is performing. Here are some of the most common metrics to consider:
Percentage of problems resolved within SLA targets: How many problems are closed within the agreed timeframe
Percentage of problems with a workaround: How many problems have a valid workaround
First time right RCA: Measure the quality of root cause investigation
Problem backlog: How the rates of closed and created problems compare
People-first AI for exceptional employee experiences
Freddy AI Agent
Freddy AI Copilot
Freddy AI Insights
Change management
A “change” is defined by ITIL 4 as an addition, modification, or removal of anything that can have an impact on IT services. The change management practice ensures that changes to IT services are introduced with minimal disruption and maximum transparency. Although every change carries risk, the goal of the practice is not to prevent change but to manage it in a way that balances speed with stability.
Typical flow of a change ticket
Every change should be assessed for impact and authorized before deployment. Who does the authorizing depends on the risk: a single approver for low-risk changes, or a broader review for high-impact ones.
Most organizations classify changes into three types, each with its own set of characteristics:
Standard changes: Low-cost, low-risk, frequently occurring changes that are preauthorized.
Normal changes: Changes that require assessment and authorization before implementation. These follow a defined workflow: submission, review, approval, scheduling, implementation, and closure. The level of scrutiny depends on risk and impact.
Emergency changes: Changes that bypass the standard process because waiting isn't an option. They are still documented and reviewed, but this is done after implementation rather than before.
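The three change types above can be sketched as distinct workflows: standard changes skip assessment, normal changes follow the full path, and emergency changes defer review until after deployment. The step names below are illustrative:

```python
# Illustrative change workflows keyed by the three change types above.
# Step names are assumptions for this sketch, not a prescribed standard.
WORKFLOWS = {
    # Preauthorized: skip assessment and go straight to scheduling.
    "standard": ["schedule", "implement", "close"],
    # Full assessment and authorization before deployment.
    "normal": ["submit", "review", "approve", "schedule", "implement", "close"],
    # Deploy first; document and review afterwards.
    "emergency": ["implement", "document", "post-review", "close"],
}

def workflow_for(change_type: str) -> list[str]:
    """Return the ordered workflow steps for a given change type."""
    return WORKFLOWS[change_type]
```

Making the workflow explicit per type is also what enables automating standard changes, as recommended below, without weakening controls on normal ones.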
What can go wrong?
Anything! Assessments, approvals, deployments…
It starts with incorrect assessments, usually due to poorly documented requirements, a lack of information about the impact of changes, and/or confusion about which stakeholders should be consulted. Such assessments lead to approvals where reviewers don't fully understand what they're signing off on, or to blanket rejections because reviewers don't trust what's in front of them.
For the same reason, change management is often seen as a slow, bureaucratic process. When teams can't wait for a change to go through all the steps, they log it as a standard or emergency change instead. As a result, the controls meant to catch problems get bypassed entirely. And when those changes fail, they fail in production.
Recommendations
Ensure that you have clear templates for change requests, so stakeholders know exactly what’s happening and why.
Automate standard changes to remove unnecessary overhead, and clearly define the procedures for classifying and declassifying standard changes.
Risk assessment shouldn't stop at technical impact. Change reviewers need to consider business timing, user impact, dependencies on other changes, and rollback complexity.
Like other practices, measure the success of your changes using the following metrics:
Percentage of successful changes: Compare changes that were implemented successfully to the total number of implemented changes.
Emergency change percentage: High rates suggest poor planning, unrealistic deadlines, or teams gaming the system to skip controls.
Change backlog trend: A growing backlog signals resource or process bottlenecks.
Post-change incident rate (24-48 hours): Tie changes directly to downstream stability, not just whether the deployment itself succeeded.
Service request management
Service request (SR) management handles predictable, repeatable, and low-risk user requests for standard services, information, or access. The goal of this practice is to fulfill requests quickly and consistently without burdening the teams that handle incidents and problems. Unlike changes, there's no need for risk assessment in this practice; the service already exists, the process is documented, and fulfillment follows a known path.
Service requests can be for any of the following:
Information
Advice
Standard change
Access to a service
Feedback, compliments, or complaints
These user requests are already agreed to as part of normal service delivery and presented to the user via the service catalog. Each catalog item defines a workflow, typically including what approvals are needed, what tasks get created, and who fulfills them.
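A catalog item's workflow definition can be sketched as a small structure bundling the approvals, fulfillment tasks, and owning teams. The item, fields, and team names below are assumptions for illustration, not any specific product's schema:

```python
# Hypothetical catalog item: a preapproved service with a defined
# fulfillment workflow. All names and fields are illustrative.
catalog_item = {
    "name": "New laptop request",
    "approvals": ["line_manager"],  # who must sign off before fulfillment
    "tasks": [
        {"task": "procure hardware", "team": "it_procurement"},
        {"task": "image and enroll device", "team": "desktop_support"},
    ],
    "sla_hours": 72,  # agreed fulfillment target
}

def fulfillment_teams(item: dict) -> set[str]:
    """All teams involved in fulfilling this catalog item."""
    return {t["team"] for t in item["tasks"]}
```

Because the approvals and tasks are declared up front, a tool can create and route the tasks automatically the moment the request is approved.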
High-level flow of a typical service request
However, do not confuse the practice of service request management with basic ticket handling. Service request management depends on well-defined policies and processes, but once those are defined, most requests, no matter how simple or complex, become prime candidates for automation and self-service. That isn't possible for generic tickets, which are fulfilled differently each time.
What can go wrong?
Trouble starts when the service catalog isn’t user-friendly. If people don’t understand what to request or where to find it, they’ll either pick “Other” or email someone directly. That’s a reporting nightmare because generic requests require manual triage, and fulfillment data becomes meaningless because everything lands in the same bucket.
Then there's the automation gap. Organizations invest in ITSM tools but never configure the workflows that would eliminate repetitive tasks. In such cases, every request still requires manual handling regardless of the request's complexity.
Recommendations
Build a service catalog in plain language, not using IT jargon. Keep it simple and visual as much as possible.
Describe what’s available and what the user needs to do to get it. The smoother the request experience, the fewer tickets get escalated unnecessarily.
Automating SR workflows can speed up fulfillment, improve the employee experience, and lighten the load on agents.
Employee onboarding phases with Freshservice Journeys
Keep an eye on metrics to evaluate how your flows are going and see where you need to step in:
Service request volumes: How many service requests are created in a specific timeframe, ideally split per category
Average fulfillment time: The average time taken to deliver a service request
SLA adherence: The percentage of SRs resolved within agreed service-level agreements.
Knowledge management
The connection between knowledge management and self-service is invaluable. Knowledge management ensures that the right information reaches the right people at the right time; it captures what teams learn and makes that knowledge accessible when it's needed. When the knowledge content is strong, self-service delivers real deflection because users find answers without waiting for an analyst. The service desk handles fewer routine inquiries and focuses only on issues that require human judgment.
ITIL 4 distinguishes between explicit knowledge (documented, transferable) and tacit knowledge (personal expertise, hard to articulate). Most knowledge management focuses on explicit knowledge, but the real challenge is converting tacit knowledge into something shareable.
A typical knowledge management lifecycle
Once tacit knowledge becomes explicit, it needs a structure that people can make sense of. The taxonomy matters less than findability, though. If search works well and articles are tagged properly, users won't browse categories anyway.
Most knowledge bases are organized by audience (end-user vs. analyst), by service (email, HR systems, hardware), or by content type (how-tos, troubleshooting, policies). However, it is perfectly fine to blend these approaches, as following a single KB structure may not suit every use case. What matters is that articles surface when people need them.
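The point about findability over taxonomy can be illustrated with a toy tag-based search: articles surface when query terms match their tags or titles, regardless of which category they live in. The data and scoring below are purely illustrative:

```python
# Toy knowledge-base search: rank articles by overlap between query terms
# and each article's tags/title. Articles and scoring are illustrative.
articles = [
    {"title": "Reset your email password", "tags": {"email", "password", "how-to"}},
    {"title": "VPN troubleshooting guide", "tags": {"vpn", "network", "troubleshooting"}},
]

def search(query: str) -> list[str]:
    """Return article titles ranked by how many query terms they match."""
    terms = set(query.lower().split())
    scored = []
    for article in articles:
        score = len(terms & article["tags"]) + sum(
            term in article["title"].lower() for term in terms
        )
        if score:
            scored.append((score, article["title"]))
    return [title for score, title in sorted(scored, reverse=True)]
```

Even this crude scoring shows why consistent tagging matters: an untagged article only surfaces when its title happens to contain the user's exact words.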
What can go wrong?
A lot of organizations don’t have clear flows, owners, and tooling to ensure that the knowledge base remains up to date. Articles quickly become outdated and less relevant to users, severely degrading the self-service experience. People stop looking for information if they cannot trust the source.
Another major sticking point is the structure and ability to tag articles for effective search. Knowledge bases very commonly become cluttered with tons of categories that make navigation a nightmare for users.
Recommendations
Ensure that you have clear ownership assigned and reinforce a simple structure. Schedule periodic review of articles to ensure relevance and accuracy.
Use templates to ensure consistency. Templates not only help with creation and review but also make it more intuitive for users to quickly find the information they need.
As the knowledge base grows, maintenance becomes even more resource-intensive; consider automation to refine and auto-populate articles.
Monitor the health of your knowledge base with metrics like:
Percentage of outdated articles: The number of articles that have passed the review deadline relative to the total number of articles
Average article rating: User feedback on the article quality
Top queries: The most used search terms in the organization
Average article age: The average number of days since an article was last updated
Time to review: Average days it takes for flagged articles to be reviewed after a reminder is sent
Final thoughts
Every organization runs some version of these processes, whether formalized or not. The important question is whether they're connected.
Connection means three things. First, data should flow automatically. Second, handoffs must have clear owners who are accountable. Third, metrics should expose gaps and reveal where the system is breaking down.
That said, not every connection warrants the same level of investment. Start with the handoffs that cause the most problems when they break. For most organizations, that's incident to problem and change to knowledge. Get those working reliably before optimizing the rest.
Ready to connect your ITSM workflows? Try the Freshservice demo to see it in action.