The first step in any incident response process is to determine what actually constitutes an incident. Incidents can then be classified by severity, usually with "SEV" definitions, where lower-numbered severities are more urgent. Operational issues can be classified at one of these severity levels, and in general you are able to take riskier moves to resolve a higher-severity issue. Anything above a SEV-3 is automatically considered a "major incident" and gets a more intensive response than a normal incident.

All SEV-2s are major incidents, but not all major incidents need to be SEV-2s. If you require a coordinated response, even for lower-severity issues, trigger our incident response process. The Incident Commander can make a determination on whether full incident response is necessary.

SEV-1: Critical issue that warrants public notification and liaison with executive teams.
- The system is in a critical state and is actively impacting a large number of customers.
- Functionality has been severely impaired for a long time, breaking SLA.
- Customer-data-exposing security vulnerability has come to our attention.

SEV-2: Critical system issue actively impacting many customers' ability to use the product.
- Notification pipeline is severely impaired.
- Incident response functionality (ack, resolve, etc.) is severely impaired.
- Web app is unavailable or experiencing severe performance degradation for most/all users.
- Monitoring of PagerDuty systems for major incident conditions is impaired.
- Any other event that a PagerDuty employee deems to warrant an incident response.

Anything above this line is considered a "Major Incident". Our incident response process should be triggered for any major incident.

SEV-3: Stability or minor customer-impacting issues that require immediate attention from service owners.
- Partial loss of functionality, not affecting the majority of customers.
- Something that is likely to become a SEV-2 if nothing is done.
- No redundancy in a service (failure of one more node will cause an outage).

What to do:
- Liaise with engineers of affected systems to identify the cause.
- If related to a recent deployment, roll back.
- Monitor status and notice if/when it escalates.
- Mention on Slack if you think it has the potential to escalate.
- Trigger incident response if necessary (!ic page).
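The classification rules described here can be sketched in a few lines of Python. This is only an illustration, not PagerDuty tooling: the names `Severity`, `is_major_incident`, and `should_trigger_response` are hypothetical, and the sketch assumes just the three levels discussed here. It encodes two rules from the text: anything above a SEV-3 is a major incident, and a coordinated response can be requested even for a lower-severity issue.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity scale; lower numbers are more urgent."""
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3


def is_major_incident(severity: Severity) -> bool:
    """Anything above SEV-3 (numerically lower) counts as a major incident."""
    return severity < Severity.SEV3


def should_trigger_response(severity: Severity,
                            needs_coordination: bool = False) -> bool:
    """Major incidents always trigger the incident response process.

    Lower-severity issues trigger it only when someone asks for a
    coordinated response (an assumption modelled here as a flag).
    """
    return is_major_incident(severity) or needs_coordination


if __name__ == "__main__":
    for sev in Severity:
        print(sev.name, is_major_incident(sev), should_trigger_response(sev))
```

The sketch only models the default decision. In the process described here, the Incident Commander still determines whether a full incident response is necessary.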