We're talking about triaging bugs in software systems. At a very simple level, this topic comes down to one question: is your software working in the hands of your users? If your app, your game, or your web service isn't working perfectly for every interaction, then it's time to think about bug triage. For many systems, some bugs are tolerable if they happen infrequently, while others are critical and need immediate action if they happen even once. Bug triage is all about making those judgment calls quickly and accurately.

This video is the first in a three-part series on how to approach software bug triage.

Part 1: Bug Triaging Principles
Part 2: Tips for Effective + Efficient Bug Triage
Part 3: Developing a Bug Triage Process with Your Team

Transcript

0:55 It's useful to break bugs down into three categories for our discussion today. First, there are bugs that need immediate action. These require some developer intervention, such as rolling back a recent release or flipping a feature flag, to get the bug out of the hands of users and get back onto a stable version of the software. Second, there are bugs that do not require immediate action now but may
require action in the future if they become more impactful, for instance if they happen more frequently or affect more users. And third, there are bugs that are safe to ignore regardless of their frequency. Of these three categories, it's really the first two that are most interesting for our purposes today, and those are the ones we're going to focus on.

1:42 When it comes to determining which category a given bug falls into, there are really two main workflows: what we call reactive triage and what we call periodic triage. Reactive triage is the scenario where a bug occurs, or something changes with a bug's frequency, and someone on your team has to drop what they're doing and investigate it immediately. These tend to be bugs that are high impact or that affect a critical area of your system. Some examples: a new bug that the system has never seen before; a bug that was previously occurring at some safe, steady-state frequency, but where Bugsnag has now detected an anomalous spike in its frequency; a bug in a critical area of your system; a bug that you previously fixed that has come back in a later release of the software; or a stability score that is off target for your project. These are all concepts we'll talk about more, but the crucial point is that a key category of bug triage is all about reactively jumping into Bugsnag, figuring out what's going on, and checking whether some immediate action is needed to get a bug away from your users.

3:04 When you're thinking about reactive triage in your project and within your team, it's really important to think about which subset of your errors
are going to rise to the level of importance where you want someone on your team to effectively drop what they're doing and triage that bug immediately.

3:23 Once you've made that determination, you can configure Bugsnag, via its alerting and workflow engine, to notify your team through team chat or your on-call alerting system whenever a bug that meets your custom-defined criteria occurs. It's worth pointing out that the alerting and workflow engine is highly configurable: you decide when Bugsnag notifies you and through what means. Here are some examples of how you can use this to your team's advantage. Let's say you have a spike in errors affecting your VIP customers, where you define what it means for a customer to be a VIP. Bugsnag can detect that and automatically open a PagerDuty incident for you, fitting into your team's existing on-call rotation and bug remediation process. Or let's say you work in a monolithic codebase where each team works out of a different Slack channel, but ultimately you all share the same code. You can configure Bugsnag to notify your Slack channel about bugs in your team's part of the monolith, and there are many more possibilities from there.

4:33 Most teams aren't going to triage every single error using a purely reactive workflow. There are going to be bugs that aren't critical enough to require people to drop what they're doing and triage them immediately. Of course, this varies from team to team, but it's generally true. All bugs affecting your system still need to be reviewed and prioritized regularly, though, so an initial target we recommend is to have your team triage your For Review errors once per day. This is especially important to do first
thing in the workday, or after lunch: any time when there may have been a lapse in coverage and new bugs may have crept in, or previously triaged bugs may have come back into the For Review state. We'll talk about all that in greater detail in a moment.

5:25 Let's quickly review the workflow actions available in Bugsnag. When a bug is first detected by Bugsnag, it goes into the Open and For Review workflow states. We'll talk more about the For Review state, because it's really key to triaging. When an error is in the Open state, there are some key workflow actions you can perform on it, and these map back to the three categories of errors we talked about at the beginning: things you want to fix immediately, things you may want to fix in the future, and things that are safe to ignore.

6:00 Starting from left to right here: snoozing an error lets you conditionally reopen it in the future. You would do this if an error is in the category where you want to keep an eye on it but aren't going to fix it right now, and will only address it if it becomes more impactful. You can create an issue to track the work related to an imminent fix of a bug. For example, if you're using Jira, this is equivalent to clicking a button in Bugsnag that creates a Jira ticket, which is then used in your sprint or other work-planning process to track the work of actually making the necessary code or infrastructure changes to remove the bug. You can mark a bug as fixed, which is typically what you would do for those category-one bugs that you've decided to fix right now, once you've taken some action to
remediate the bug. When you mark a bug as fixed, it will only return to the For Review state if it's seen again in a future version of the code. Lastly, you can ignore an open error, which signifies that you're not planning to take any action on it, regardless of how frequently it may occur in the future.

7:22 During error triage, you'll typically take these workflow actions from the error details view inside Bugsnag. You can also take them from the inbox view in Bugsnag, which lets you take workflow actions on more than one error at once. We'll look at some examples of doing this in the product in just a moment.

7:46 A key tip for error triage in Bugsnag is to start your triaging workflow with the For Review filter in the Bugsnag inbox. If we look at the screenshot below, you'll see that we're viewing the Bugsnag inbox and that it's currently filtered to For Review errors. You can see this in two key places: in the filter bar, it says "Status: For Review", and in the left-hand column, "For Review (18)" is shown in an active UI state. This signifies that we're currently filtering for For Review errors and that there are 18 errors that need to be reviewed. The tooltip there gives us a hint: "Open errors that are awaiting triage."

8:26 So if we imagine that we're on the team responsible for the software monitored by this Bugsnag project, what we need to do is look at every one of these errors currently affecting our users and determine its impact. Then we need to determine which of the workflow actions we just discussed, whether fixing, snoozing, creating an issue, or ignoring,
is most appropriate, given the current impact of the bug and the current work on our team's plate.

8:57 Let's take a look at a project in Bugsnag so we can see some of these things in action. If we go to this Photosnap Android project and have a look at the inbox, we can see that it has quite a few open errors. Notice that we're filtered to errors that have occurred only in the past 30 days, so there are likely even more than the 38 open errors shown currently affecting this project. But as we said, let's have a look at the For Review errors: in the last 30 days there are 24 errors that are For Review, so we might start our triage here. Again, we're going to look at every error in the For Review set and figure out the appropriate next step for each one.

9:46 One thing you might want to consider at this point is sorting the inbox, either by total number of events per error, which puts the error with the most events in the past 30 days at the top, or by users affected. You can see this one affected 56 users. It happens to be an Application Not Responding (ANR) error, which is pretty severe, so let's take a look at it.

10:15 Here we are on the error details screen, which gives us an overview of all the information specific to this one defect in the application. We can see again that it affected 56 users and happened a total of 80 times in the past 30 days. We can switch between these tabs to see more about how those 80 occurrences are distributed across specific users. We can see which releases of the
software the bug has occurred in, the OS versions of end-user devices, and so on. If you're new to Bugsnag, it's worth pointing out that we call these pivots, and all the information in them can be used to filter the view of this error down even further. For example, if we look only at OS version 7.1.1, it goes down to 28 events and 24 users affected. The point is that you can use all the information Bugsnag gives you about the frequency of the error and the specific device context in which it has been seen to determine its impact and the next step.

11:29 Once you've determined what makes sense to do, you'd come up here to the error actions we talked about. This is where you would create an issue, mark the error as fixed, snooze it, or ignore it. Let's say we've just shipped a fix for this and don't expect to see it in a future version of the software anymore. Then the next step would be to mark it as fixed. Here it prompts us to add a comment about why we think this has been fixed, and we can say something to the effect of "Fixed in last release." Mark as fixed, and there you go: now you can see that it's fixed. If we go to the comments and activity view, we can see that this was fixed, along with my comment explaining why.

12:19 So you start your triaging workflow with your For Review errors. Your goal should be to get the total number of For Review errors down to zero on a regular basis. When you achieve Bugsnag inbox zero, it means that all of your critical bugs have been addressed and that, for any lower-priority bugs, you've
determined the criteria at which you will take further action on them in the future.

12:46 A common question we get at this point is: when will bugs ever go back into the For Review state? There are a few key situations where this happens. The obvious one is newly introduced bugs: bugs that Bugsnag has never seen before will continue to go into the For Review state for you to triage. But any previously snoozed bugs that have exceeded their snooze thresholds will also go back into the For Review state, and any bugs you've marked as fixed that occur in a new version of your software will also return to the For Review state. The reason is that even though you've looked at these bugs in the past, their context has now changed: they've begun to happen more frequently, or they've happened in a version of your software where you weren't expecting them. In all of these cases, you want to look at them again to determine their current impact and whether you need to take some new action based on this new information.

13:43 So why aim for Bugsnag inbox zero? First and foremost, if you're regularly getting your inbox down to around zero errors for review, then when new errors do come in to be reviewed, your team can be more efficient with their attention. Consider the case where you're not getting anywhere close to inbox zero: when someone comes in to do a periodic review, they may have to sift through several errors that have already been given varying degrees of review, but that isn't clear because an appropriate workflow action hasn't been taken on them. So if you
are getting to inbox zero regularly, the errors your team looks at during the triaging workflow are only those that need to be considered in their current context.

14:28 The other thing about getting close to, or hitting, inbox zero on a daily basis is that it increases the likelihood that your team will engage with the periodic triaging process. The closer that For Review number is to zero, the more likely people are to want to get it all the way down. Consider 1,000 errors to review versus five: one is much more inviting than the other when it comes to someone on the team wanting to go in and do the work of getting those errors triaged. So try to hit inbox zero every day. It's going to make your team more engaged, and it's going to let them spend their time in Bugsnag efficiently.

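The alerting examples in the transcript (page whoever is on call when errors spike for VIP customers; notify a team's chat channel about bugs in its part of a monolith) amount to rules evaluated against each error. The Python sketch below models that idea in isolation. It is not Bugsnag's configuration format or API: `ErrorSnapshot`, `VIP_USERS`, `CODE_OWNERS`, `SPIKE_FACTOR`, and the destination strings are hypothetical, and the simple "more than 3x the usual hourly rate" test stands in for Bugsnag's own spike detection.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical team configuration; none of these names come from Bugsnag.
VIP_USERS: Set[str] = {"user-42", "user-99"}
CODE_OWNERS: Dict[str, str] = {  # source path prefix -> owning team's chat channel
    "app/payments/": "#payments-bugs",
    "app/search/": "#search-bugs",
}
SPIKE_FACTOR = 3.0  # assumption: a "spike" is more than 3x the usual hourly rate


@dataclass
class ErrorSnapshot:
    """Hypothetical summary of one error at the moment alert rules are evaluated."""
    file_path: str
    affected_user_ids: Set[str]
    is_new: bool
    is_regression: bool
    events_last_hour: int
    baseline_events_per_hour: float


def is_spiking(err: ErrorSnapshot) -> bool:
    # Floor the baseline at 1 event/hour, so an error with a near-zero baseline
    # needs more than SPIKE_FACTOR events in an hour to count as a spike.
    return err.events_last_hour > SPIKE_FACTOR * max(err.baseline_events_per_hour, 1.0)


def route_alert(err: ErrorSnapshot) -> List[str]:
    """Return the destinations that should hear about this error (empty list = no alert)."""
    destinations: List[str] = []
    spiking = is_spiking(err)
    # Rule 1: a spike touching any VIP customer pages whoever is on call.
    if spiking and err.affected_user_ids & VIP_USERS:
        destinations.append("oncall:pagerduty")
    # Rule 2: new errors, regressions, and spikes go to the team that owns the code.
    if err.is_new or err.is_regression or spiking:
        for prefix, channel in CODE_OWNERS.items():
            if err.file_path.startswith(prefix):
                destinations.append("chat:" + channel)
                break
    return destinations
```

For example, `route_alert(ErrorSnapshot("app/payments/checkout.py", {"user-42"}, False, False, 40, 5.0))` returns `["oncall:pagerduty", "chat:#payments-bugs"]`, since 40 events is more than three times the baseline of 5 and a VIP user is affected. Errors that match neither rule produce no alert and are left for periodic triage.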

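The workflow actions described in the transcript (create an issue, snooze, mark as fixed, ignore) and the rules that send an error back to For Review (a new error, an exceeded snooze threshold, a fixed error seen in a newer version) can be summarized as a small state machine. The sketch below is a model for reasoning about triage, not Bugsnag's implementation or API: `TrackedError`, `Status`, and the method names are hypothetical, versions are assumed to be integer tuples that compare in release order, creating an issue is assumed to take an error out of For Review while leaving it open, and only an event-count snooze threshold is modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

Version = Tuple[int, ...]  # assumption: releases compare as integer tuples, e.g. (7, 1, 1)


class Status(Enum):
    FOR_REVIEW = "for review"                # open and awaiting triage
    OPEN_WITH_ISSUE = "open, issue created"  # assumption: counts as triaged
    SNOOZED = "snoozed"
    FIXED = "fixed"
    IGNORED = "ignored"


@dataclass
class TrackedError:
    """Hypothetical model of one error's triage state."""
    name: str
    status: Status = Status.FOR_REVIEW  # newly seen errors start out For Review
    issue_url: Optional[str] = None
    snooze_max_events: Optional[int] = None
    events_while_snoozed: int = 0
    fixed_after: Optional[Version] = None  # newest version known to have the bug

    def create_issue(self, url: str) -> None:
        # Fix is imminent; the work is tracked in an external tracker such as Jira.
        self.issue_url = url
        self.status = Status.OPEN_WITH_ISSUE

    def snooze(self, max_events: int) -> None:
        # Keep an eye on it; reopen only if it becomes more impactful.
        self.snooze_max_events = max_events
        self.events_while_snoozed = 0
        self.status = Status.SNOOZED

    def mark_fixed(self, last_affected: Version) -> None:
        # Remediated; reopen only if seen in a version newer than `last_affected`.
        self.fixed_after = last_affected
        self.status = Status.FIXED

    def ignore(self) -> None:
        # No action planned, regardless of future frequency.
        self.status = Status.IGNORED

    def record_event(self, version: Version) -> None:
        """Apply the 'back to For Review' rules when a new occurrence arrives."""
        if self.status is Status.SNOOZED and self.snooze_max_events is not None:
            self.events_while_snoozed += 1
            if self.events_while_snoozed > self.snooze_max_events:
                self.status = Status.FOR_REVIEW  # exceeded its snooze threshold
        elif self.status is Status.FIXED and self.fixed_after is not None:
            if version > self.fixed_after:
                self.status = Status.FOR_REVIEW  # regression in a newer release
        # FOR_REVIEW, OPEN_WITH_ISSUE and IGNORED errors keep their status.
```

Under these assumptions, after `mark_fixed(last_affected=(2, 0))` further events from `(2, 0)` clients leave the error fixed, while the first event from `(2, 1)` sends it back to For Review. An ignored error never returns, which matches the third category of bugs that are safe to ignore regardless of frequency.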

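The periodic triage pass in the walkthrough (filter the inbox to For Review, sort by users affected or by event count, then narrow an error down with a pivot such as OS version) can be sketched over plain in-memory data. `Event`, `InboxError`, `impact`, and `triage_queue` are hypothetical names rather than Bugsnag's API, only one pivot (OS version) is modeled, and the example data is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Event:
    """One occurrence of an error, with a single pivot attribute (OS version)."""
    user_id: str
    os_version: str


@dataclass
class InboxError:
    name: str
    for_review: bool
    events: List[Event] = field(default_factory=list)


def impact(err: InboxError, os_version: Optional[str] = None) -> Tuple[int, int]:
    """Return (users affected, event count), optionally filtered to one OS version."""
    events = [e for e in err.events if os_version is None or e.os_version == os_version]
    return len({e.user_id for e in events}), len(events)


def triage_queue(errors: List[InboxError], os_version: Optional[str] = None) -> List[InboxError]:
    """For Review errors only, most users affected first; ties broken by event count."""
    pending = [e for e in errors if e.for_review]
    return sorted(pending, key=lambda e: impact(e, os_version), reverse=True)


if __name__ == "__main__":
    # Made-up data for illustration only.
    anr = InboxError("ANR", True, [Event("u1", "7.1.1"), Event("u2", "7.1.1"), Event("u2", "8.0")])
    crash = InboxError("crash", True, [Event("u3", "8.0")] * 5)
    done = InboxError("already triaged", False, [Event(f"u{i}", "8.0") for i in range(10)])
    for err in triage_queue([crash, anr, done]):
        print(err.name, impact(err))
```

Run as a script, the example prints `ANR (2, 3)` followed by `crash (1, 5)`: the already-triaged error is skipped even though it has the most users. Swapping the sort key to `(events, users)` gives the sort-by-events view instead, and passing `os_version="7.1.1"` mirrors filtering by a pivot.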

