18 October 2022
Update
I published this article on Reddit and it received a lot of pushback. I spent some time thinking about why, and I think it comes down to FP zealotry. Some developers discover functional programming and treat it as The One True Way, looking down on everyone who doesn't see the light and making them feel bad.
So, with that in mind, I want to state more clearly my attitude towards the principles below.
Use these principles as something to aim for. But, if you find situations where you don't know how to apply them, don't sweat it.
For example, rather than getting stuck trying to make all your functions pure, it's better to only convert the ones that are obvious.
Use these principles as tools that help you, not as rules that constrain you.
Do I/O only in the main function
Some examples of I/O, so we're clear what I'm talking about: reading and writing files, printing to the screen, reading user input, making network requests, querying a database.
Basically, whatever your program does to interact with the world outside of it.
In Python, you can do I/O wherever you want in your code. It can happen in any function, deep in the call stack.
What's wrong with doing it this way? Functions that do I/O are hard to reason about, because their behavior depends on what comes from "outside". They're also harder to test, because you have to mock that outside thing the function is reading from.
If you only do I/O in main, it means that your other functions can just take arguments, do stuff with them and return a value.
So, your main function would look something like this:
def main():
    # I/O
    input_value = read_file("some_file")
    # non-I/O
    result = do_stuff(input_value)
    # I/O
    print(result)
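The testing benefit is concrete. Here's a minimal sketch, assuming for illustration that do_stuff trims and uppercases its input: the pure part needs no mocks at all, and only read_file would.

def do_stuff(text):
    # Pure: the result depends only on the argument.
    return text.strip().upper()

def test_do_stuff():
    # No files to create, nothing to mock: values in, values out.
    assert do_stuff("  hello ") == "HELLO"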
However, sometimes the logic of your program consists of several steps interspersed with I/O operations. In that case, do the non-I/O steps and return the result to main. Then do the required I/O operation. Then do more work with the previous value and the one you just got. It could look like this:
def main():
    # I/O
    input_value = read_file("some_file")
    # non-I/O
    result = do_stuff(input_value)
    # I/O
    more_data = get_value_from_db(result)
    # non-I/O
    final_result = do_more_stuff(result, more_data)
    # I/O
    print(final_result)
If you're developing a web app, you can replace main with "your controller".
Create new variables instead of reassigning old ones
Have you ever written a paper and kept saving it towards the end in new files, just so you could still keep previous versions? Something like final.doc, final_v2.doc, final_finally.doc.
It's the same principle. If you create a new variable instead of assigning to the old one, you keep a trace of your results.
If you keep writing to the same variable, it's harder to predict what its value is at a given point. You have to look through all the places where it's being assigned and figure out which one ran last.
This is even more important if several functions read from and write to the same variable. Doubly so if one of them expects the variable to hold a value of a certain type.
def do_stuff():
    result = first_step()
    result = second_step(result)
    result = third_step(result)
    return result
Compare that with:

def do_stuff():
    first_result = first_step()
    second_result = second_step(first_result)
    final_result = third_step(second_result)
    return final_result
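A side benefit of the second version, sketched below (first_step and the others are the same placeholders as above): because every intermediate result keeps its own name, a single log line or a debugger stop can show the whole pipeline at once.

import logging

def do_stuff():
    first_result = first_step()
    second_result = second_step(first_result)
    final_result = third_step(second_result)
    # Distinct names mean one log line can show every step of the pipeline.
    logging.debug("steps: %r -> %r -> %r", first_result, second_result, final_result)
    return final_result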
When you use an if, always cover the else case as well
If you don't cover the else, you've just created a black hole for your data. The value enters your if and, if it falls into the uncovered else case, it gets lost; you can end up with None somewhere down the line.
When you cover both cases, you are guaranteed a value comes out of the if clause and your data keeps flowing.
def do_stuff(value):
    if matches_condition(value):
        return process_value(value)
    # if it didn't match the condition, our data went into a black hole
Covering both cases:

def do_stuff(value):
    if matches_condition(value):
        return process_value(value)
    else:
        return default_value
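To make the black hole concrete, here's a small runnable sketch. matches_condition and process_value get stand-in implementations purely for illustration:

def matches_condition(value):
    return value > 0

def process_value(value):
    return value * 2

def do_stuff_no_else(value):
    if matches_condition(value):
        return process_value(value)
    # no else: a non-matching value silently becomes None

result = do_stuff_no_else(-3)  # returns None, and nothing warns you
print(result + 1)              # TypeError, far away from the real cause

The crash happens at the print, not at the if that lost the data, which is exactly what makes this bug annoying to track down.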
Don't read or write global variables
Stated another way, this would be: a function should receive everything it needs as arguments and hand back its result as a return value.
A function that only works with values it has received as arguments is easier to test. You know its behavior is only determined by the arguments and what it does with them.
The fact that the function returns its result, rather than assigning it to an outside variable, also makes it easier to test: we just check the return value.
This also makes your entire codebase easier to reason about. If you're looking at a variable, you know that only code from the function where it's defined can change it.
some_value = 5

def main():
    do_stuff()
    print(some_value)

def do_stuff():
    global some_value  # without this declaration, Python would raise UnboundLocalError
    some_value = process_value(some_value)
Compare with the version that only uses arguments and return values:

some_value = 5

def main():
    result = do_stuff(some_value)
    print(result)

def do_stuff(value):
    return process_value(value)
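The testing payoff, as a sketch (process_value gets a stand-in implementation here): the argument-taking version can be checked in isolation, with no module state to set up first.

def process_value(value):
    # stand-in implementation, just for the sketch
    return value * 2

def do_stuff(value):
    return process_value(value)

def test_do_stuff():
    # No globals to set up: the behavior depends only on the argument.
    assert do_stuff(5) == 10
    assert do_stuff(5) == 10  # same input, same output, every time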
Treat all values as immutable
Some types in Python are mutable: list and dict. Others are immutable: tuple and string. This principle suggests that you treat all these types as if they're immutable.
Let's take a list as an example. This principle suggests that you always create a new list by starting from a copy of the old one. This guarantees that only one piece of your code is modifying the list.
Imagine the opposite scenario: two functions are working with the same list. Again, it's hard to reason about the contents of the list at a certain time. Also, any of the functions can make a wrong assumption about the contents of the list and cause an error, or, worse, return the wrong result.
def main():
    items = list(range(1, 10))
    new_items = do_stuff(items)

def do_stuff(a_list):
    for index, value in enumerate(a_list):
        a_list[index] = value * 2
    return a_list
Here's the same function, building a new list instead:

def main():
    items = list(range(1, 10))
    new_items = do_stuff(items)

def do_stuff(a_list):
    new_list = []
    for value in a_list:
        new_list.append(value * 2)
    return new_list
The first piece of code changes the original list that was passed to do_stuff. If another part of the code relies on the contents of the list remaining unchanged, that assumption is now broken.
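As a closing sketch: idiomatic Python gives you the copying version almost for free with a list comprehension, and passing a tuple instead of a list makes the interpreter enforce the no-mutation guarantee for you.

def do_stuff(a_list):
    # Build a brand-new list; the caller's data is never touched.
    return [value * 2 for value in a_list]

items = (1, 2, 3)            # a tuple can't be mutated by accident
new_items = do_stuff(items)  # works on any iterable
print(items)                 # (1, 2, 3) -- unchanged
print(new_items)             # [2, 4, 6]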