27 June 2022
I recommend the following approach towards these principles:
Don't try to go all in! Look for the low hanging fruit. See what is easiest to apply in your codebase. The more you get used to these principles, the more you'll see ways of using them.
Some examples of I/O, so we're clear what I'm talking about:
Basically, whatever your program does to interact with the world outside of it.
In Python, you can do I/O wherever you want in your code. It can be in any function deep in the callstack.
What's wrong with doing it this way? Functions that do I/O are hard to reason about, because their behavior depends on what comes from "outside". They're also harder to test, because you have to mock that outside thing the function is reading from.
If you only do I/O in main, it means that your other functions can just take arguments, do stuff with them and return a value.
main function would look something like this:
def main(): # I/O input_value = read_file("some_file") # non-I/O result = do_stuff(input_value) # I/O print(result)
However, sometimes the logic of your program consists of several steps interspered with I/O operations. In that case, do the non-I/O steps and return the result to
main. Then, do the required I/O operation. Then do more work with the previous value and the one you just got. It could look like this:
def main(): # I/O inputValue = read_file("some_file") # non-I/O result = do_stuff(input_value) # I/O more_data = get_value_from_db(result) # non-I/O final_result = do_more_stuff(result, more_data) # I/O print(final_result)
Have you ever written a paper and kept saving it towards the end in new files, just so you could still keep previous versions? Something like
It's the same principle. If you create a new variable, instead of assigning to the old one, then you keep a trace of your results.
If you keep writing to the same variable, it's harder to predict what its value is at a certain time. You have to go look through all the places where its being assigned to and see which is the last one.
This is even more important, if several functions read from and write to the same variable. Doubly so, if a function expects a certain value of a certain type.
def do_stuff(): result = first_step() result = second_step(result) result = third_step(result) return result
def do_stuff(): first_result = first_step() second_result = second_step(first_result) final_result = third_step(second_result) return final_result
if, always cover the
elsecase as well
If you don't cover the
else, you've just created a black hole for your data. The value enteres your
if and, if it matches the
else, it gets lost and you can end up with None somewhere down the line.
When you cover both cases, you are guaranteed a value comes out of the
if clause and your data keeps flowing.
def do_stuff(value): if matches_condition(value): return process_value(value) # if it didn't match the condition, our data went into a black hole
def do_stuff(value): if matches_condition(value): return process_value(value) else: return default_value
Stated another way, this would be:
A function that only works with values it has received as arguments is easier to test. You know its behavior is only determined by the arguments and what it does with them.
The fact that the function returns rather than assigning also makes it easier to test, because we just check its return value.
This also makes your entire codebase easier to reason about. If you're looking at a variable, you know that only code from the function where it's defined can change it.
some_value = 5 def main(): do_stuff() print(some_value) def do_stuff(): some_value = process_value(some_value)
some_value = 5 def main(): result = do_stuff(some_value) print(result) def do_stuff(value): return process_value(value)
Some types in Python are mutable: list and dict. Others are immutable: tuple and string. This principle suggests that you treat all these types as if they're immutable.
Let's take a list as an example. This principle suggests that you always create a new list by starting from a copy of the old one. This guarantees that only one piece of your code is modifying the list.
Imagine the opposite scenario: two functions are working with the same list. Again, it's hard to reason about the contents of the list at a certain time. Also, any of the functions can make a wrong assumption about the contents of the list and cause an error, or, worse, return the wrong result.
def main(): items = list(range(1, 10)) new_items = do_stuff(items) def do_stuff(a_list): for index, value in enumerate(a_list): a_list[index] = value * 2 return a_list
def main(): items = list(range(1, 10)) new_items = do_stuff(items) def do_stuff(a_list): new_list =  for index, value in enumerate(a_list): new_list[index] = value * 2 return new_list
The first piece of code changes the original list that was passed to
do_stuff. If other part of the code relies on the contents of the list to remain unchanged, then that assumption was broken.