No matter how good your code is, that beautiful system you've built is going to break. Or someone is going to break it. Either way, sooner or later your system is going to have problems, and someone will have the job of figuring out what is wrong and how to fix it. It might be you, if you're in a "DevOps" culture, or it might be some junior sysadmin on the graveyard shift who got stuck with the pager.
Whoever gets the call, there are a variety of tools and techniques you can apply to your code that make it easier to locate the fault, determine its cause, and identify possible mitigations and fixes.
This talk will go through the basics of why things break and how problems are found and fixed. It will also cover a few small changes to how systems and services are developed that make a big difference to how easily problems are found and how quickly a problematic service gets back on its feet. If you develop software, you owe it to your future self (or the junior sysadmin on the graveyard shift) to catch this talk.
Matt is a beardy Unix guy who has been convincing computers to do things for a very long time. Part developer, part sysadmin, part manager, he has seen a lot of things and, once started on a topic, is very difficult to stop.