5 Tips for Writing Reusable Python Scripts

You are having a productive work day, checking email, writing some code, sipping coffee;  it is a happy day.   But suddenly, without warning, you hit a snag.  You have to do some manual labor.   When software developers use the term manual, we aren’t typically talking about lifting, heaving, hammering, or lugging.  We are usually referring to doing a repetitive task that can be automated, but isn’t.

Now you have to make a decision of whether to suck it up and do the manual labor, or find some way to automate it.  There is an art to this assessment that I will not get into here, but it is critical to make the right decision. Early on in my career I tried to automate everything, but I have become much better at discerning when it is worth it.  Certainly, if the time it takes to develop and execute the automation of the task is remotely comparable to the time it takes to do the task manually,  most software developers would rather spend their time writing the code. If this is the case, then it is time to write a script.

My personal scripting language of choice these days is Python, mainly because that is just what I am used to, and I love how so much can be done with so little code.   The tendency is to just whip up a single-purpose script that is geared towards the task at hand, without any regard to some future need.  However, I have found that so often, if I have a need for automation one time, I almost always have a need to automate something with an overlapping need.  Rather than just making one-off copies of a script and tweaking it to serve each purpose, it is much better to build a library of reusable building blocks that can be used to piece together a script in the future.  Python is great for this, as long as some foresight is taken prior to scripting.

Here are a few tips that I use and follow when writing Python scripts to make them as reusable as possible.

1. Always check for __main__

This is Python 101, but seriously, you should check for __main__ as your entry point into your script, even if you expect your script to be a quick hack.  For those who may be new to Python, I will walk through why this is needed and how it should be done.

Let’s say, for example, that you want to write a quick script that adds all numbers that are provided as script arguments.  A common first approach would be to write something like this:

Yes, this will work, and will output the correct result.  However, if down the road we want to reuse that handy-dandy add_nums function from another module, we will need to import the add_nums module by doing the following:

When we go ahead and run the use_add_nums.py file, we may expect the output result to just be “6”, but it turns out that it actually outputs “0” followed by “6”.  So where does this extra “0” come from?

When a module is imported, the actual module code is executed, so any statements that are present in the module scope are run. In this case, all the code in the add_nums module is run, including the print out.  To make matters worse, whatever arguments are passed in sys.argv are also forwarded to the module.

This side effect can be avoided by putting the script entry point logic under a condition that checks if the script is the executing script. The Python convention to do this is as follows:

Notice the __name__ property, which returns “__main__” when the script is the entry point and the name of the module (add_nums in this case) when it is not. This little trick let’s us use the same script as both a library of logic AND to run the script.

By convention, often the logic under the if condition is placed in function called main().  There was a proposal to make a magic __main__() function that is automatically called to simplify this, but it was ultimately rejected.

2. Avoid assumptions about current working directory

I have seen so many scripts that only work if executed from a specific directory, usually the directory that the script is in.  This assumption can very quickly run deeply into the script and the logic if not careful.  As long as you recognize that you are making this assumption, and take measures to minimize the impact of it, then it is fine to do it, but there are better ways than others to do it.  For example, let’s say you have a script to recursively find and delete all ‘.pyc’ files under some root directory.  One approach could be something like:

This is problematic because the current directory is assumed and hardcoded into the call to os.walk, reducing the reusability if someone wants to use that utility programatically.  A better approach would be something like this:

Notice that by default it uses the current directory, but allows for an argument to specify.  In addition, if someone wants to import this module to use it, they can provide any root directory.  Of course this could be made even more generic by also inputting the file extension of file we want to delete, but you have to draw the line at some point when programming.  This is simple stuff, but often I see these assumptions inserted into scripts when they can be easily avoided.

3. Avoid assumptions about platform

Try not to embed assumptions if the script is running on Windows, Mac, or Linux.  Python is intended to be cross-platform, so always try to find a Pythonic way to do it for you, whether it by something in the standard libs, or in a third-party library.

For one, avoid the usage of string path slashes “\” or “/” throughout the code. Just get in the habit of using os.path.join().  It gets nasty otherwise.

If you can’t find a library, and have to fallback to using the subprocess module to invoke system commands, you can separate the implementations in a clean way such that you don’t have sys.platform conditions scattered throughout your code base.  One way to do this is to implement the classes/modules for each platform as needed and then do a condition ONCE to assign the actual class for the current platform.   Below is a simple example of this:

What’s nice about this approach is if you just get started with a single implementation Fooand you later learn you need to support a different platform, you can just add another implementation and dynamically assign it as shown above.

4. Limit global variables

This is one of those things that is pretty much a generally-accepted best practice for coding in general, so it is no surprise that it makes the list here.  Global variables greatly coupling between components and make it very difficult to reuse any code or create multiple instances of the running logic.

Many times the global variables are often configuration variables that define how things are to be run, such as a verbose flag to log debug info, or general script inputs.  The logging case is a bit involved and I may tackle that at a different time, so for now, let’s focus on script inputs.

I very often see scripts configured and shared with all the input parameters listed at the top of the script, making it easy to find, edit and use.  Here is an example, similar to our earlier delete_pyc example, but now we will be providing the extension of files to delete:

There is nothing particularly wrong with wanting to have the script inputs listed at the top of the script file, but when this is desired, I typically recommend either to (1) have a separate script file to just execute with parameters, or to (2) restrict usage of global parameters under the __main__.For such a simple case as this, I would probably go with option (2) here as follows:

This may not seem like a big difference, but this small change greatly impacts the ability to reuse this chunk of logic.  To avoid the global parameters, notice that we are passing the options as parameters into the method.  If there are many parameters needed, or several methods that share the same parameters, rather than passing the parameters all around, it is usually a good indication that the methods should be combined to form a class.  Creating a class reduces the need to global variables, and makes code maintenance much easier when parameters are added.

The reusability of these scripts can be improved even further if we strive to follow the single responsibility principle a bit closer.  We will discuss this in the next section.

5. Follow the Single Responsibility Principle

By restricting each logical partition of code to do one primary thing, it makes it much easier to put the pieces together while reusing the components.  If we look back at the delete_files example that we have been working with, one can notice that the delete files is actually doing two fundamentally separate things, that should be separated: (1) it is finding files to delete, and (2) it is deleting files.  If we ever need to find files to do something else with them (print, rename, copy, modify, etc), then we will want to use the exact same code to find, but don’t need the delete.  The way it is now, these two functions of finding and deleting are intertwined and are inseparable.  Let’s pull these out into two separate methods and stitch them together in the __main__.

By inspecting the delete_files method and realizing that it was doing more than one thing, we were able to make the components of our script much more reusable.  Sometimes the only way this is recognized is out of necessity, when a portion of the method is needed to do something else.  Resist the urge to copy-and-paste the code out of one method and into another, and refactor instead into components that only do one thing.

This example highlights the single responsibility principle at a method level, but this can certainly be expanded for classes, modules, packages, etc.  Always good to ask “What does this thing do?”  If the answer to that questions is “It does A, B, some of C, parts of D, etc”, then you may need to break it down.

Conclusion

I touched up on a few of the main things that can be done to make your Python scripts as reusable as possible.  I’m sure there are so many others, so feel free to comment and let me know what other things you have found useful.

Sometimes you have to make certain assumptions about your code, such as platform or current working directory, and that is OK, as long as you are fully aware and have weighed the costs of making that assumption.  As I mentioned earlier, no code is fully generic.

The most important thing is to begin with the end in mind, so that reusability is a consideration from the start.  This impacts how you break out your classes, structure your modules, and implement your code.

Leave a Reply

Your email address will not be published. Required fields are marked *