Flow Python

18 October 2022

These are some guidelines, not rules, that make Python more like a dataflow language.

Since they are guidelines, you should use them when they make sense. Don't force them on yourself or on others.

Do I/O only in your main function

Some examples of I/O, so we're clear what I'm talking about: reading or writing a file, querying a database, printing to the terminal.

Basically, whatever your program does to interact with the world outside of it.

In Python, you can do I/O wherever you want in your code. It can be in any function, however deep in the call stack.

What's wrong with doing it this way? Functions that do I/O are hard to reason about, because their behavior depends on what comes from "outside". They're also harder to test, because you have to mock that outside thing the function is reading from.

If you only do I/O in main, it means that your other functions can just take arguments, do stuff with them and return a value.

So, your main function would look something like this:

def main():
    # I/O
    input_value = read_file("some_file")

    # non-I/O
    result = do_stuff(input_value)

    # I/O
    print(result)
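
To make this concrete, here is a minimal sketch of the same shape with a real computation in the middle. The names count_words and notes.txt are made up for illustration. The point is that the non-I/O function can be checked with a plain assert, with no file and no mock.

```python
from pathlib import Path


def count_words(text):
    # non-I/O: the result depends only on the argument
    return len(text.split())


def main():
    # I/O (notes.txt is a hypothetical file name)
    text = Path("notes.txt").read_text()

    # non-I/O
    word_count = count_words(text)

    # I/O
    print(word_count)


# Checking the logic needs neither a file nor a mock:
assert count_words("data keeps flowing") == 3
```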

However, sometimes the logic of your program consists of several steps interspersed with I/O operations. In that case, do the non-I/O steps and return the result to main. Then do the required I/O operation. Then do more work with the previous value and the one you just got. It could look like this:

def main():
    # I/O
    input_value = read_file("some_file")

    # non-I/O
    result = do_stuff(input_value)

    # I/O
    more_data = get_value_from_db(result)

    # non-I/O
    final_result = do_more_stuff(result, more_data)

    # I/O
    print(final_result)
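
As a rough sketch of how the alternating steps might look with real calls, the example below uses the standard sqlite3 module for the database step. The users table, the app.db file and all function names are assumptions made for illustration. The I/O helpers are called only from main, and the two non-I/O functions never see the database connection.

```python
import sqlite3


def parse_user_id(raw_line):
    # non-I/O: turn raw text into a clean integer id
    return int(raw_line.strip())


def build_greeting(user_id, user_name):
    # non-I/O: combine the earlier result with the freshly fetched value
    return f"Hello, {user_name} (user #{user_id})"


def fetch_user_name(connection, user_id):
    # I/O: the only function that talks to the database
    row = connection.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row:
        return row[0]
    else:
        return "stranger"


def main():
    connection = sqlite3.connect("app.db")  # hypothetical database file

    # I/O
    raw_line = input("User id: ")

    # non-I/O
    user_id = parse_user_id(raw_line)

    # I/O
    user_name = fetch_user_name(connection, user_id)

    # non-I/O
    greeting = build_greeting(user_id, user_name)

    # I/O
    print(greeting)
```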

If you're developing a web app, you can replace main with "your controller".

Avoid reassigning to a variable

Have you ever written a paper and kept saving it towards the end in new files, just so you could still keep previous versions? Something like final.doc, final_v2.doc, final_finally.doc.

It's the same principle. If you create a new variable, instead of assigning to the old one, then you keep a trace of your results.

If you keep writing to the same variable, it's harder to predict what its value is at a certain time. You have to go look through all the places where it's being assigned to and see which is the last one.

This is even more important if several functions read from and write to the same variable. Doubly so if a function expects a value of a certain type.

Code

NO

def do_stuff():
  result = first_step()
  result = second_step(result)
  result = third_step(result)
  return result

YES

def do_stuff():
  first_result = first_step()
  second_result = second_step(first_result)
  final_result = third_step(second_result)
  return final_result
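
One concrete benefit, sketched below with made-up function names: when every step gets its own name, each name stands for exactly one value of one type. If the result looks wrong, you can inspect raw, numbers and ordered separately in a debugger instead of guessing which assignment last touched a shared variable.

```python
def parse(raw):
    # "3,1,2" becomes [3, 1, 2]
    return [int(part) for part in raw.split(",")]


def middle_value(raw):
    numbers = parse(raw)        # always a list of int
    ordered = sorted(numbers)   # a new, sorted list; numbers is untouched
    middle = ordered[len(ordered) // 2]
    return middle


assert middle_value("3,1,2") == 2
```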

When using if, always cover the else case as well

If you don't cover the else, you've just created a black hole for your data. If the value doesn't match your if condition, your code doesn't provide a substitute for it. That means you can end up with None somewhere down the line.

When you cover both cases, you are guaranteed a value comes out of the if clause and data keeps flowing.

Code

NO

def do_stuff(value):
  if matches_condition(value):
    return process_value(value)
  # if it didn't match the condition, our data went into a black hole

YES

def do_stuff(value):
  if matches_condition(value):
    return process_value(value)
  else:
    return default_value
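
Here is a small runnable illustration of the black hole, using hypothetical function names. A Python function that reaches its end without a return statement returns None, so the version without else silently hands None to whoever called it.

```python
def shout_no_else(word):
    if word.isalpha():
        return word.upper()
    # falls off the end: Python returns None implicitly


def shout(word):
    if word.isalpha():
        return word.upper()
    else:
        return ""


assert shout_no_else("123") is None
assert shout("123") == ""

# Downstream code can keep treating the result as a string:
assert len(shout("123")) == 0
```

With shout_no_else, that last line would fail with a TypeError, far away from the if that caused it.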

Don't allow your functions to access variables outside of them

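Stated another way: a function should receive its inputs as arguments and deliver its output as a return value, rather than reading or assigning variables defined outside of it.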

A function that only works with values it has received as arguments is easier to test. You know its behavior is only determined by the arguments and what it does with them.

The fact that the function returns rather than assigning also makes it easier to test, because we just check its return value.

This also makes your entire codebase easier to reason about. If you're looking at a variable, you know that only code from the function where it's defined can change it.

Code

NO

some_value = 5

def main():
  do_stuff()
  print(some_value)

def do_stuff():
  global some_value  # without this line, the assignment below raises UnboundLocalError
  some_value = process_value(some_value)

YES

some_value = 5

def main():
  result = do_stuff(some_value)
  print(result)

def do_stuff(value):
  return process_value(value)
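
To show what "easier to test" means in practice, here is a minimal sketch with invented names (scale, SCALE_FACTOR). Because the function reads nothing from outside, it can be checked with any inputs, and the check keeps passing no matter what the module-level value happens to be.

```python
SCALE_FACTOR = 3  # hypothetical module-level constant


def scale(value, factor):
    # Reads nothing from outside: the result depends only on the arguments.
    return value * factor


def main():
    # The outside value enters through main and is passed down explicitly.
    result = scale(5, SCALE_FACTOR)
    print(result)


# Testable with any inputs, independent of SCALE_FACTOR:
assert scale(5, 3) == 15
assert scale(5, 0) == 0
```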

Create a new value rather than changing the old one

Some types in Python are mutable, such as list and dict. Others are immutable, such as tuple and str. This principle suggests that you treat all of them as if they were immutable.

Let's take a list as an example. This principle suggests that you always create a new list by starting from a copy of the old one. This guarantees that only one piece of your code is modifying the list.

Imagine the opposite scenario: two functions are working with the same list. Again, it's hard to reason about the contents of the list at a certain time. Also, any of the functions can make a wrong assumption about the contents of the list and cause an error, or, worse, return the wrong result.

Code

NO

def main():
  items = list(range(1, 10))
  new_items = do_stuff(items)


def do_stuff(a_list):
  for index, value in enumerate(a_list):
    a_list[index] = value * 2
  return a_list

YES

def main():
  items = list(range(1, 10))
  new_items = do_stuff(items)


def do_stuff(a_list):
  new_list = []
  for value in a_list:
    # append, because assigning to an index of an empty list raises IndexError
    new_list.append(value * 2)
  return new_list

The first piece of code changes the original list that was passed to do_stuff. If another part of the code relies on the contents of the list remaining unchanged, then that assumption is now broken.
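
The difference can be checked directly. In this sketch (the function names are invented for the comparison), the in-place version returns the very same list object it was given, so the caller's list changes too. The other version builds a new list with a list comprehension and leaves the original alone.

```python
def double_in_place(a_list):
    for index, value in enumerate(a_list):
        a_list[index] = value * 2
    return a_list


def double_as_new(a_list):
    return [value * 2 for value in a_list]


items = [1, 2, 3]

doubled = double_as_new(items)
assert doubled == [2, 4, 6]
assert items == [1, 2, 3]   # the original is untouched

same = double_in_place(items)
assert same is items        # no new list was created
assert items == [2, 4, 6]   # the caller's list was changed
```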