18 October 2022
These are some guidelines, not rules, that make Python more like a dataflow language.
Since they are guidelines, you should use them when they make sense. Don't force them on yourself or on others.
Only do I/O in the main function
Some examples of I/O, so we're clear what I'm talking about:
Basically, whatever your program does to interact with the world outside of it.
In Python, you can do I/O wherever you want in your code. It can be in any function, however deep in the call stack.
What's wrong with doing it this way? Functions that do I/O are hard to reason about, because their behavior depends on what comes from "outside". They're also harder to test, because you have to mock that outside thing the function is reading from.
If you only do I/O in main, it means that your other functions can just take arguments, do stuff with them and return a value.
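To make the testing point concrete, here is a minimal sketch comparing the two styles. The function names (`count_lines_in_file`, `count_lines`) and the file name are made up for illustration. The first function opens the file itself, so checking it means faking `open` with the standard library's `unittest.mock`; the second one only needs a string.

```python
from unittest.mock import mock_open, patch


def count_lines_in_file(path):
    """Does I/O itself: the result depends on what is on disk."""
    with open(path, encoding="utf-8") as handle:
        return len(handle.read().splitlines())


def count_lines(text):
    """No I/O: the result depends only on the argument."""
    return len(text.splitlines())


def check_io_version():
    # The file has to be faked by patching the built-in open()
    fake_open = mock_open(read_data="first\nsecond\n")
    with patch("builtins.open", fake_open):
        return count_lines_in_file("ignored.txt")


def check_pure_version():
    # Nothing to fake: just pass a value in
    return count_lines("first\nsecond\n")
```

Both checks arrive at the same number, but only one of them needed a mock to get there.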
So, your main function would look something like this:
def main():
    # I/O
    input_value = read_file("some_file")
    # non-I/O
    result = do_stuff(input_value)
    # I/O
    print(result)
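Here is one way that shape could look with real code, assuming a made-up task of counting the words in a file. The names `count_words` and `path` are mine, purely for illustration.

```python
from pathlib import Path


def count_words(text):
    """Non-I/O step: takes a value, returns a value."""
    return len(text.split())


def main(path):
    # I/O: read from the outside world
    text = Path(path).read_text(encoding="utf-8")
    # non-I/O: plain computation on the value we just read
    word_count = count_words(text)
    # I/O: write to the outside world
    print(word_count)
```

Only `main` touches the file system and the terminal, so `count_words` can be checked with a plain string.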
However, sometimes the logic of your program consists of several steps interspersed with I/O operations. In that case, do the non-I/O steps and return the result to main. Then, do the required I/O operation. Then do more work with the previous value and the one you just got. It could look like this:
def main():
    # I/O
    input_value = read_file("some_file")
    # non-I/O
    result = do_stuff(input_value)
    # I/O
    more_data = get_value_from_db(result)
    # non-I/O
    final_result = do_more_stuff(result, more_data)
    # I/O
    print(final_result)
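A runnable sketch of the same back-and-forth, under some loud assumptions: the "database" is just a dictionary called `FAKE_DB`, and every function name here is hypothetical.

```python
# Hypothetical in-memory stand-in for a real database
FAKE_DB = {"alice": 30, "bob": 25}


def normalize_name(raw_name):
    """Non-I/O step: clean up the raw input."""
    return raw_name.strip().lower()


def get_age_from_db(name, db=FAKE_DB):
    """I/O step (simulated): fetch a value from the outside store."""
    return db.get(name)


def describe(name, age):
    """Non-I/O step: combine the earlier result with the fetched value."""
    if age is None:
        return f"{name}: unknown age"
    else:
        return f"{name}: {age} years old"


def main(raw_name):
    # non-I/O
    name = normalize_name(raw_name)
    # I/O
    age = get_age_from_db(name)
    # non-I/O
    message = describe(name, age)
    # I/O
    print(message)
```

The non-I/O steps never see the database. They only see the values that `main` hands them.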
If you're developing a web app, you can replace main with "your controller".
Have you ever written a paper and kept saving it towards the end in new files, just so you could still keep previous versions? Something like final.doc, final_v2.doc, final_finally.doc.
It's the same principle. If you create a new variable, instead of assigning to the old one, then you keep a trace of your results.
If you keep writing to the same variable, it's harder to predict what its value is at a certain time. You have to look through all the places where it's being assigned to and see which one is the last.
This is even more important if several functions read from and write to the same variable. Doubly so if a function expects a certain value of a certain type.
# Reassigning the same variable
def do_stuff():
    result = first_step()
    result = second_step(result)
    result = third_step(result)
    return result

# A new variable for each step
def do_stuff():
    first_result = first_step()
    second_result = second_step(first_result)
    final_result = third_step(second_result)
    return final_result
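The point about types is easier to see with concrete steps. In this sketch (the function names and the task are made up), the reassigning version has `data` start out as a string and end up as a list, while the second version gives each stage a name that says what it holds.

```python
def unique_words_reassigning(raw_text):
    data = raw_text.strip()    # data is a str here
    data = data.split()        # now it is a list of str
    data = sorted(set(data))   # still a list, but deduplicated and sorted
    return data


def unique_words_traced(raw_text):
    stripped_text = raw_text.strip()
    words = stripped_text.split()
    unique_words = sorted(set(words))
    return unique_words
```

Both return the same thing. In the second one, though, you can stop at any line in a debugger and every earlier result is still there to inspect.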
If you use if, always cover the else case as well
If you don't cover the else, you've just created a black hole for your data.
If the value doesn't match your if condition, your code doesn't provide a substitute for it. A Python function that reaches its end without a return gives back None, so you can end up with None somewhere down the line.
When you cover both cases, you are guaranteed a value comes out of the if clause and data keeps flowing.
def do_stuff(value):
    if matches_condition(value):
        return process_value(value)
    # if it didn't match the condition, our data went into a black hole

def do_stuff(value):
    if matches_condition(value):
        return process_value(value)
    else:
        return default_value
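A small runnable illustration of the black hole, using a made-up age check. The function names are hypothetical.

```python
def label_without_else(age):
    if age >= 18:
        return "adult"
    # no else: the function falls off the end and returns None


def label_with_else(age):
    if age >= 18:
        return "adult"
    else:
        return "minor"
```

With the first version, something like `label_without_else(10).upper()` fails with an AttributeError far away from the place where the value actually went missing. The second version always hands back a string.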
Stated another way, this would be:
A function that only works with values it has received as arguments is easier to test. You know its behavior is only determined by the arguments and what it does with them.
The fact that the function returns rather than assigning also makes it easier to test, because we just check its return value.
This also makes your entire codebase easier to reason about. If you're looking at a variable, you know that only code from the function where it's defined can change it.
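# Reads and writes a global variable
some_value = 5
def main():
    do_stuff()
    print(some_value)
def do_stuff():
    # without this declaration, the assignment below raises UnboundLocalError
    global some_value
    some_value = process_value(some_value)

# Takes an argument and returns a value
some_value = 5
def main():
    result = do_stuff(some_value)
    print(result)
def do_stuff(value):
    return process_value(value)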
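To show how little a test needs in the second style, here is a sketch with a made-up `increment` function standing in for `process_value`.

```python
def increment(counter):
    """Uses only its argument and returns the new value."""
    return counter + 1


def main():
    start = 5
    result = increment(start)
    # start is still 5 here; nothing outside main could have changed it
    print(start, result)
```

Checking `increment` is one line, `assert increment(5) == 6`, with no global state to set up beforehand or reset afterwards.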
Some types in Python are mutable, such as list and dict. Others are immutable, such as tuple and str. This principle suggests that you treat all these types as if they're immutable.
Let's take a list as an example. This principle suggests that you always create a new list by starting from a copy of the old one. This guarantees that only one piece of your code is modifying the list.
Imagine the opposite scenario: two functions are working with the same list. Again, it's hard to reason about the contents of the list at a certain time. Also, any of the functions can make a wrong assumption about the contents of the list and cause an error, or, worse, return the wrong result.
# Changes the list it was given
def main():
    items = list(range(1, 10))
    new_items = do_stuff(items)
def do_stuff(a_list):
    for index, value in enumerate(a_list):
        a_list[index] = value * 2
    return a_list

# Builds and returns a new list
def main():
    items = list(range(1, 10))
    new_items = do_stuff(items)
def do_stuff(a_list):
    new_list = []
    for value in a_list:
        # append, because assigning to an index of an empty list raises IndexError
        new_list.append(value * 2)
    return new_list
The first piece of code changes the original list that was passed to do_stuff. If another part of the code relies on the contents of the list remaining unchanged, then that assumption is broken.
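Two common ways to get a new list without touching the old one, as a short sketch. The function names are made up: one builds the result with a list comprehension, the other starts from a copy.

```python
def double_all(items):
    """Build a new list; the argument is left untouched."""
    return [value * 2 for value in items]


def with_item_added(items, item):
    """Start from a copy of the old list, then change only the copy."""
    new_items = items.copy()
    new_items.append(item)
    return new_items


original = [1, 2, 3]
doubled = double_all(original)
extended = with_item_added(original, 4)
```

After this runs, `original` still holds the three values it started with, so any other code that relies on it sees what it expects.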