We've all been there: launch day arrives, and despite countless hours of planning, something goes wrong. Maybe it's a minor bug that slipped through QA, or perhaps it's a critical issue that affects the user experience. Either way, the team is left scratching their heads, wondering, "What went wrong?" But what if I told you that these moments, as uncomfortable as they are, hold the key to future success? Welcome to the world of post-mortems, a practice that can transform setbacks into stepping stones.
A post-mortem is a structured process for analyzing the events and decisions that led to a particular outcome in a project or initiative. It's not a meeting to point fingers or assign blame; it's a collaborative effort to understand the "why" behind the "what happened." In essence, it's a diagnostic tool for your project's health, akin to a medical post-mortem that seeks to understand the cause of death. The difference is that, in this case, the patient can be revived and made stronger than before.
You might be wondering, "Why go through the trouble?" The answer lies in the invaluable insights that a well-executed post-mortem can provide. Here are some key benefits:
A post-mortem has two main purposes. First, it aims to dig deep and identify the root causes that led to a specific issue or set of issues. Understanding the "why" behind the "what" is crucial for any team looking to improve. Second, it serves as a forum for discussing how to prevent similar issues from happening again. The goal isn't necessarily to come up with immediate solutions. Some problems require more in-depth analysis, and given the time constraints of a post-mortem meeting, the focus should be on generating action items and setting up follow-up plans.
In most cases, the person who worked directly on the issue, often referred to as the "agent," should be the one to schedule the post-mortem. If you're in this role and unsure how to proceed, this article aims to answer most of your questions. One critical point to remember is that post-mortems shouldn't be confined to internal team discussions. They should be publicly announced to ensure transparency and collective learning. For example, announcing the post-mortem in a designated Slack channel such as #on-duty is an effective way to keep everyone in the loop.
Determining what incidents should trigger a post-mortem is crucial for its effectiveness. Here are some guidelines:
Timing is of the essence when it comes to post-mortems. The general rule of thumb is to schedule them as soon as possible after resolving the incident. Here's a quick guide:
The reason for this urgency is simple: details fade from memory quickly. The longer you wait, the more likely you are to forget key facts or start rationalizing the issue. That's why it's highly recommended for those involved in the incident to document facts as soon as they can.
Determining who should attend a post-mortem is crucial for its success. Generally, the people involved in the incident, whether engineers or not, should be present. The agent on duty at the time of the incident and other engineers involved in resolving the issue must attend. Additionally, consider involving a representative from the customer success department and either a solution architect or project manager for issues that have impacted or may impact customers. However, try to keep the number of attendees to a maximum of six people, excluding the moderator. Anyone else interested can read the post-mortem meeting notes afterward.
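The attendance guidelines above can be turned into a quick sanity check on an invite list. The following is only a minimal sketch: the `Attendee` type, the role labels, and the `check_attendees` function are all hypothetical names invented for illustration. Because the customer-facing roles are something to consider rather than a hard rule, the sketch reports them as warnings rather than errors.

```python
from dataclasses import dataclass

MAX_ATTENDEES = 6  # excluding the moderator, per the guideline above


@dataclass(frozen=True)
class Attendee:
    name: str
    # Hypothetical role labels: "agent", "engineer", "customer_success",
    # "solution_architect", "project_manager"
    role: str


def check_attendees(attendees, customer_impact):
    """Return a list of warnings about the invite list; empty means no concerns."""
    warnings = []
    roles = {a.role for a in attendees}
    if len(attendees) > MAX_ATTENDEES:
        warnings.append(f"too many attendees: {len(attendees)} > {MAX_ATTENDEES}")
    if "agent" not in roles:
        warnings.append("the agent on duty is missing")
    if customer_impact:
        if "customer_success" not in roles:
            warnings.append("no customer success representative")
        if not roles & {"solution_architect", "project_manager"}:
            warnings.append("no solution architect or project manager")
    return warnings


# Invented example: the moderator is deliberately left off the list.
invitees = [Attendee("Dana", "agent"), Attendee("Eli", "engineer")]
internal_check = check_attendees(invitees, customer_impact=False)
customer_check = check_attendees(invitees, customer_impact=True)
print(internal_check)
print(customer_check)
```

A check like this is most useful when the incident touched customers, since that is when the invite list is easiest to get wrong.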
The moderator's job is to ensure the post-mortem runs smoothly and adheres to a set process. They are responsible for taking notes and publishing them on the relevant Confluence page. The moderator should enforce a code of conduct that includes no blaming, no finger-pointing, and no guessing if facts can be easily collected. They have the authority to remove attendees or postpone the meeting under specific conditions. When choosing a moderator, prefer someone who was not involved in the incident; failing that, anyone familiar with the process will do.
To ensure that action items from the post-mortem are being addressed, schedule biweekly open meetings that anyone interested can attend. In these meetings, those assigned to post-mortem tickets should report on their status.
Writing a post-mortem is an art that combines analytical thinking with effective communication. The goal is to create a document that not only serves as a record of events but also as a learning tool that can help prevent future incidents. Here's a step-by-step guide on how to write a compelling post-mortem:
Start by providing a brief overview of the incident. Include the date, time, and a high-level description of what happened. This sets the stage and gives readers context for what they're about to delve into.
In this section, outline the key details of the incident. This should include:
Create a detailed timeline that chronicles the incident from start to finish. Use precise timestamps and include all significant events, such as when the incident was detected, when the team was alerted, and what steps were taken to resolve it.
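If the timeline is collected from several people, it can help to keep it as structured data so that it stays in order and the key intervals fall out automatically. Here is a rough sketch of that idea. The `TimelineEvent` type, the event labels, and every timestamp below are invented for illustration, and the timestamps are assumed to share one time zone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TimelineEvent:
    at: datetime  # when the event happened (single time zone assumed)
    kind: str     # hypothetical labels: "started", "detected", "alerted", "resolved"
    note: str


def sort_timeline(events):
    """Return the events in chronological order."""
    return sorted(events, key=lambda e: e.at)


def elapsed(events, start_kind, end_kind):
    """Time between the first event of start_kind and the first of end_kind."""
    ordered = sort_timeline(events)
    start = next(e.at for e in ordered if e.kind == start_kind)
    end = next(e.at for e in ordered if e.kind == end_kind)
    return end - start


# Invented example data, entered out of order on purpose.
events = [
    TimelineEvent(datetime(2024, 1, 15, 14, 12), "alerted", "On-duty agent paged"),
    TimelineEvent(datetime(2024, 1, 15, 14, 0), "started", "Deployment rolled out"),
    TimelineEvent(datetime(2024, 1, 15, 14, 55), "resolved", "Rollback completed"),
    TimelineEvent(datetime(2024, 1, 15, 14, 7), "detected", "Error-rate alarm fired"),
]

for event in sort_timeline(events):
    print(f"{event.at:%Y-%m-%d %H:%M}  {event.kind:<9} {event.note}")

time_to_detect = elapsed(events, "started", "detected")
time_to_resolve = elapsed(events, "started", "resolved")
print("Time to detect:", time_to_detect)
print("Time to resolve:", time_to_resolve)
```

Sorting on the timestamp rather than on entry order means notes jotted down by different people can be merged without anyone having to reconstruct the sequence by hand.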
This is the core of the post-mortem. Use the 5 Whys technique or a similar method to drill down into the root cause of the incident. Be thorough but avoid blaming individuals; focus on processes and systems.
List the key takeaways from the incident and the post-mortem discussion. Then, outline the action items, specifying who is responsible for each and setting deadlines.
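Action items are easier to follow up on when each one carries an owner and a deadline in a form that can be queried. The snippet below is a minimal sketch under that assumption. The `ActionItem` type, the `overdue` helper, and the sample items, owners, and dates are all hypothetical; in practice this information would more likely live in your ticketing system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    title: str
    owner: str
    due: date
    done: bool = False


def overdue(items, today):
    """Open items whose deadline has passed, oldest deadline first."""
    late = [item for item in items if not item.done and item.due < today]
    return sorted(late, key=lambda item: item.due)


# Invented example data.
items = [
    ActionItem("Add rollback step to deploy checklist", "alice", date(2024, 2, 1)),
    ActionItem("Alert on error-rate spikes", "bob", date(2024, 1, 20)),
    ActionItem("Document on-duty escalation path", "carol", date(2024, 1, 10), done=True),
]

late = overdue(items, today=date(2024, 2, 5))
for item in late:
    print(f"OVERDUE {item.due}  {item.owner}: {item.title}")
```

A list like the one `overdue` returns could serve as the agenda for a follow-up meeting, since it names exactly who needs to give a status update.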
Wrap up the post-mortem by summarizing the key points and reiterating the action items. This serves as a quick reference for anyone revisiting the document later.
Here's a simplified example of a hypothetical incident where a software deployment caused a service outage:
In this guide, we've covered how to conduct and write post-mortems in a business context: why they are valuable, who should be involved, and how to document and learn from incidents. A well-executed post-mortem is an invaluable tool for continuous improvement. It's not just about identifying what went wrong; it's about creating a culture of accountability and learning that drives your team and your product forward.
If you found this article insightful and want to discuss post-mortems or any other aspect of product management further, I invite you to connect with me on LinkedIn.