SRE: Putting The Pieces Together – For The Love of Engineering

To preface, this post is divided into two parts. The first focuses on the “theory” side of Site Reliability Engineering, and the second focuses on creating a PowerShell module, adding existing scripts to it, and interacting with the module via the shell. So, without further ado, let’s get into this.

The Debate

In recent months, now that I’ve started to get acclimated to my new role, I’ve started to focus on learning more about the concepts behind SRE. The fact is that SRE covers a fairly broad spectrum of potential responsibilities. Some roles are geared towards web, some towards systems, some towards network only, and some towards the entire spectrum. With that in mind, there is a recurring claim that I keep running into in the world of SRE: people with a background in Systems (SysAdmin, Desktop Support, etc.) make the best SREs. Out of curiosity, I did a little research on the areas most SREs come from. It boils down to two areas of IT: Systems and Development (Application, Web, etc.).

Question: Do SysAdmins make the best SREs?

Note: Before I get too far into this post, I want to be very clear about the message I’m attempting to convey here. I’m not trying to start a debate or anything silly like that, nor am I trying to show favor towards one area of the IT world or another. However, I do feel that there is some validity to the idea that SysAdmins make the best SREs.

Answer: Yes, SysAdmins do make the best SREs

The reason I feel this is true is not that one area has more talented people than the other. Instead, it comes down to an underlying aspect of both types of roles: troubleshooting (or problem solving). This is not because people with a dev background are bad at troubleshooting. In fact, you can bet that they are also very effective troubleshooters. However, the deciding factor in this case is the fundamental objective of an SRE, which is to manage systems using a programmatic approach. The majority of this is done via scripting as opposed to building applications. Now, back to troubleshooting.

The World of Troubleshooting

It doesn’t matter where you work in IT, every job involves some form of troubleshooting, so let’s look closer at both areas. On one side of the coin you have Systems, where you spend the majority of your time troubleshooting any and all infrastructure-based issues, ranging from server space to application issues to network issues. The list is endless. On the other side of the coin you have developers, who spend the majority of their time writing and debugging applications. The issues they troubleshoot are things like programs not compiling, application slowness, or compatibility issues. Their list is also endless. Both areas deal with incredibly complex issues, and both have to use problem solving and logic to solve them. The reality is that these are two completely different worlds. The troubleshooting scenarios are just different, and neither is superior to the other.

So why do people with a Systems background make the best SREs? It has nothing to do with ability or skill sets. It’s because being an SRE is exactly that: managing systems. The concept of using a programmatic approach isn’t anything new. SysAdmins have always used the shell or written batch files to do things like install a piece of software. The only difference is that SysAdmins are changing their approach to managing their systems. It’s a proactive approach that focuses on automating the prevention, detection, and resolution of issues, rather than the reactive approach of waiting for stuff to break and then fixing it.

The same would apply on the other side of the coin. There’s no question that developers will be the best at solving problems encountered in the development world. I think that a portion of the confusion comes from the word Site. I’m guessing that when this concept emerged, they likely weren’t sure how to differentiate between engineers who focus on network reliability and those who work on server reliability or what have you, so they figured, screw it, let’s just call it Site Reliability Engineering (or something to that effect). If you ask me, they should have gone with Systems Reliability Engineering and Application Reliability Engineering or whatever. Both sound pretty damn cool to me.

Note: If you do happen to be making the jump or considering a move from the developer arena to that of an SRE, do not let this post dissuade you from doing so. However, I would highly recommend taking some time to read up on systems administration, spend some time with end users, and even learn concepts like the OSI model. In fact, I would recommend this to any developer out there. It’s really easy to distance yourself too far from the folks who are using the systems. While you obviously can’t spend all day every day working with end users, spending some time with them keeps them in the back of your head while you’re developing, which will help you ensure a better experience for them.

With that in mind, let’s get nerdy and create a PowerShell module.

Behold. I’ve Created a Module

Note: In addition to accidentally uploading something like three different versions of the same function to get the Firefox version, I’ve also decided that it needs a different name. Therefore, I’ve deleted all three and replaced them with Get-FirefoxDetails, since more functionality will be added later on. My GitHub has been updated with the changes.

Now that we have our baseline of troubleshooting tools, it’s time to build a module. Since we already have the directory structure set up, our next step is to create the module manifest file. This can be done by entering the following command into PowerShell.
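The original command is not reproduced here, so what follows is only a minimal sketch of creating a manifest with New-ModuleManifest. The module name TroubleshootingTools, the temp-directory location, and the author and description values are all placeholders I made up for illustration; swap in your own.

```powershell
# Hypothetical module name and location (assumptions, not from the post).
$moduleName = 'TroubleshootingTools'
$moduleRoot = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath $moduleName

# Make sure the module's root directory exists.
New-Item -ItemType Directory -Path $moduleRoot -Force | Out-Null

# Manifest settings, splatted for readability. All values are placeholders.
$manifestParams = @{
    Path              = Join-Path -Path $moduleRoot -ChildPath "$moduleName.psd1"
    RootModule        = "$moduleName.psm1"   # the module file the manifest points at
    ModuleVersion     = '0.1.0'
    Author            = 'Your Name'
    Description       = 'A collection of troubleshooting functions.'
    # Export whatever the module file exposes. Some PowerShell versions
    # may otherwise generate an empty export list in the manifest.
    FunctionsToExport = '*'
}

# Create the .psd1 file in the root of the module directory.
New-ModuleManifest @manifestParams

# Read the manifest back as a hashtable to confirm it parses.
$manifest = Import-PowerShellDataFile -Path $manifestParams.Path
$manifest.RootModule
$manifest.ModuleVersion
```

Since the manifest is just a data file, you can open the generated .psd1 in any editor afterwards and adjust the fields by hand.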

The Module Manifest file is a data file that is composed of information about the contents of the module as well as information about how to process the module files. This file is saved with the (.psd1) extension and should be stored in the root directory of the module.

Next, we’ll create our module file. The module file will be saved with a (.psm1) extension in the same directory as our manifest file. The module file is where we’ll define the scripts that will be in the module, as well as any assemblies or other components of the module. A colleague introduced me to a neat little tweak that lets you call any of the functions regardless of your working directory. (Thanks G-Squared!)
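The exact tweak isn’t shown here, so take this as one common way to get that behavior rather than the actual code: have the module file dot-source every script it finds relative to its own location via $PSScriptRoot. The Functions subfolder name is an assumption on my part; adjust it to match your directory structure.

```powershell
# Sketch of a module file (.psm1). Assumes the function scripts live in a
# hypothetical 'Functions' subfolder next to this file.

# $PSScriptRoot is the directory containing this .psm1, so the lookup does
# not depend on the caller's current working directory.
$functionDir = Join-Path -Path $PSScriptRoot -ChildPath 'Functions'

# Collect the script files. The result is empty if the folder is missing.
$functionFiles = @(
    Get-ChildItem -Path $functionDir -Filter '*.ps1' -File -ErrorAction SilentlyContinue
)

# Dot-source each script so its functions are defined in the module's scope.
foreach ($file in $functionFiles) {
    . $file.FullName
}
```

Because the paths are resolved from the module’s own location, importing the module from any directory loads the same set of scripts.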

Now that we have our module file and our manifest file, we just have to do a little housekeeping on the troubleshooting scripts so that they can be used by the module. This is done by adding the following to the last line of a given script file.

Note: It’s important to remember that, if you do not add this as the last line of a given function, it will automatically run upon importing the module. I went ahead and updated all of the current scripts to reflect this change. Now let’s go back to the shell, import our module, and check it out.
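For anyone following along, importing and checking a module looks roughly like this. To keep the sketch runnable anywhere, it first builds a throwaway stand-in module in the temp directory. The TroubleshootingTools name and the Get-Example function are placeholders, not the real module contents.

```powershell
# Build a throwaway stand-in module so the example runs anywhere.
# The module name and Get-Example function are hypothetical.
$moduleName = 'TroubleshootingTools'
$moduleRoot = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath $moduleName
New-Item -ItemType Directory -Path $moduleRoot -Force | Out-Null

$moduleFile = Join-Path -Path $moduleRoot -ChildPath "$moduleName.psm1"
Set-Content -Path $moduleFile -Value 'function Get-Example { "module loaded" }'

# Import the module by its folder path. -Force reloads it if it is
# already present in the session, which is handy while editing scripts.
Import-Module -Name $moduleRoot -Force

# Confirm the module is loaded and see what it exports.
Get-Module -Name $moduleName |
    Select-Object -Property Name, ModuleType, ExportedCommands
```

With your own module, you would point Import-Module at the folder that holds your .psd1 and .psm1 files instead.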

We can also view all of the functions that are available by running the following cmdlet.
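The original command isn’t reproduced here, but Get-Command with the -Module parameter is the usual way to do this. The sketch below uses an in-memory stand-in module so it runs on its own; the module name and both function names are made up for illustration.

```powershell
# Stand-in dynamic module. The module and function names are hypothetical.
New-Module -Name TroubleshootingTools -ScriptBlock {
    function Get-Example  { 'example output' }
    function Test-Example { $true }
} | Import-Module

# List every function the module exports.
Get-Command -Module TroubleshootingTools -CommandType Function |
    Select-Object -Property Name, Source
```

Run against the real module, this should list each of the troubleshooting functions that the module exposes.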

OK, let’s test out one of these functions to make sure we’re in business by running Network-Check, except this time, we won’t have to call the full path, filename, and extension to run.

Isn’t that just the coolest thing? It’s also very exciting because now we have the baseline of our PowerShell module, and there is practically no limit to what it could be used for. In essence, this is my favorite part of being an SRE, or rather, of focusing on automation. It opens the door to streamlining and stabilizing the management of an infrastructure, which saves a huge amount of time, which in turn frees you up to work on more fun and amazing things.

As always, thank you to all of you folks who read this, and thanks as well for the great feedback you’ve provided. It has really helped me learn along the way. Additionally, all of the module code and functions are available on my GitHub. Stay tuned for my next post, where I plan on adding more functionality and creating a couple of new functions. Thanks again!