Monitoring @ Scale in Salesforce
Have you ever tried to monitor the health of your service? You know, retrieving server stats and application error counts, and plotting cool graphs? Measuring as much as possible is crucial to understanding how your software behaves in production. But what if you had to monitor “the cloud”, with hundreds of thousands of servers and customers? At that scale, alerts can create “noise” and spam your team. Would you be able to answer “what is my users' experience right now?” at any given point in time? This talk presents a case study of how Salesforce approaches monitoring at scale by putting the customer first.
I’m passionate about running software in production: monitoring, scaling, HA, performance, incident management, and all that jazz. I started in software quality, worked as an SRE, moved to software development, and currently run a team of programmers at Salesforce.
I am a lead software developer on the Customer Centric Engineering team at Salesforce, which solves the toughest production problems fast and helps prevent them from recurring.