As of today, I’ve been an SRE for one month. I’d like to start by saying that this first post, and probably the next few, will not be very technically oriented. While I’ve published a little bit of code to the prod environment, I’ve spent the majority of these first four weeks learning the architecture of our infrastructure inch by inch. It’s been a humbling experience to an extent. When I realized just how large my environment is, I also realized that this would not be a quick or easy excursion.
However, after a few weeks of research and whiteboard sessions, I finally had a good handle on our architecture. The next struggle was learning what a Site Reliability Engineer actually does: what the workflow should be, what I should be working on, and what the priority levels are. To quote the great movie ‘Office Space’: “What would you say, you do here?”
Admittedly, I’m still figuring this part out, but I have a much better understanding of it than I did when I started. I’ve come to learn that the SRE role encompasses quite a bit more than just automating tasks. We handle incidents, join bridge calls on high-priority issues, analyze reports, analyze network behavior, and the list goes on. In fact, today I published my first KB article to the knowledge base used by our level 1 and 2 troubleshooting teams. It was an interesting experience writing a KB article from the perspective of someone who would not be using it. No matter though, I received a “Nice Work” shout-out from my manager before he forwarded it to a few other managers, so I must have done something right.
In my next post, I’ll try to discuss something a little more technical. Thank you to anyone who views this, and I hope you enjoyed it.