Functions, Pure Functions, Immutability

November 28, 2017

Today, I’ll be explaining two of the most important concepts in functional programming: pure functions and immutability!

Side Effect’s Gonna Get You All!

You have surely heard the word function in programming before, even if you have only coded in imperative styles. It’s just a named block of code that you can call - maybe with some parameters - from other places, right? So why is a function so different in functional programming?

Well, when doing FP, we try to deal only with pure functions. Take a look at the comparison of an impure function with a pure function below. Can you spot the reasons why each plus_one() function is pure or impure?

# An impure plus_one() function example
x = 4
def plus_one():
  global x
  x += 1

plus_one()
print(x) # Prints "5" here

# A pure plus_one() function example
x = 4
def plus_one(x):
  return x + 1

x = plus_one(x)
print(x) # Also prints "5" here

Simply put, the pure plus_one() function here has no access to the outside world, except for the parameters passed into it when it is called. It also always returns new data instead of making changes to the original data you passed in. We say that a pure function has no side effects. By contrast, the impure plus_one() function relies on data in the outside scope. It not only reads the x variable in the outside world, but also modifies it.

You can tell that impure functions are not encouraged in Python - we even have to explicitly declare the outside variable x as global in order to reassign it inside the function in this case! In other languages like Java or JavaScript, however, it is often much easier to access and mutate data in the outside scope. But why are side effects or impure functions bad?

Well, an impure function depends heavily on the context it is placed in, which makes the function hard to reuse elsewhere. Since it also mutates that context, it’s quite possible that we would obtain a different result every time we call it - the context it relies on changes with every call. This also makes combining multiple functions very problematic. After a while, your code becomes unreadable, even to your future self 3 days later.

In the example, if the x variable is declared 200 lines above and the function definition lies 100 lines after that, it is painful to tell or predict what plus_one() does just by looking at the function call. If the name is right, sure, it adds 1 to something… But to what? Say we have located the origin of the variable x and the function definition. The function may still have been called 30 times earlier, because we got 35 printed out instead of 5… Sigh, guess we have to keep a note of where and how many times we have called this function…
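To make the “how many times has it been called?” problem concrete, here is a small sketch. The names counter, impure_plus_one and pure_plus_one are made up for this illustration and are not part of the example above. It only shows that the impure version gives a different answer on every call, while the pure one does not.

```python
# Hypothetical demo: the names below are invented for illustration only.
counter = 4

def impure_plus_one():
    # Reads and reassigns a variable from the outside scope.
    global counter
    counter += 1
    return counter

def pure_plus_one(n):
    # Depends only on its argument and changes nothing outside itself.
    return n + 1

# Call each function three times in a row.
impure_results = [impure_plus_one() for _ in range(3)]
pure_results = [pure_plus_one(4) for _ in range(3)]

print(impure_results)  # [5, 6, 7] - the result depends on the call history
print(pure_results)    # [5, 5, 5] - same input, same output
```

With the impure version, you need to know the whole call history to predict the next result. With the pure version, the argument tells you everything.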

In a big project, chances are someone will always be changing some code somewhere, e.g. fixing bugs or adding features to the code base. If they unknowingly touch something your impure function relies on, the whole program can break. If you want to control your code rather than be controlled by it, you will see the beauty of pure functions and immutability.

Replace a Pure Function Call with Static Data!

So… A pure function only has access to the data given to it, and spits out a new piece of data, right? If we pass in the same data every time, the pure function should return the same result too - the function has no knowledge of anything else, therefore it’s unable to do anything “creative” on its own. Does this mean that we can just replace the function call with the return value of the call, and our program will still work?

Well, that’s right! This characteristic has a fancy name: referential transparency.

In the plus_one() example, we could just write the whole thing like this.

x = 4
def plus_one(x):
  return x + 1

x = 5 # Here! Since plus_one(x) == 5, we can do this replacement
print(x) # Prints "5"

Think about it! In math, doesn’t every function yield the same result every time for the same input? If not, who could say there’s a definite answer to a problem? Similarly, if we write a purely functional program, it should behave exactly the same under the same circumstances. This means FP code is far easier to test and to reason about.
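As a rough sketch of what “easier to test” can look like, the checks below need no setup at all: no global state to prepare and nothing to reset afterwards. The variable names with_call and with_value are made up for this illustration.

```python
def plus_one(x):
    return x + 1

# Same input, same output, no matter how often or in what order we call it.
assert plus_one(4) == 5
assert plus_one(4) == plus_one(4)

# Referential transparency: the call and its value are interchangeable.
with_call = plus_one(4) * 2   # uses the function call
with_value = 5 * 2            # uses the value the call returns
assert with_call == with_value

print(with_call, with_value)  # 10 10
```

An impure function would need its surrounding state rebuilt before each of these checks, which is exactly the kind of bookkeeping we are trying to avoid.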

Pass by Reference Is Safe under Pure Functions

I was always confused by “pass by reference” and “pass by value” when I first started programming. In general, the latter is far more resource hungry than the former, because the data has to be copied. So in most languages, data objects are usually passed by reference, which means that the variable you assigned some data to acts like a pointer, not the data itself. If you assign another variable to the same piece of data, operations on the first variable also affect the second one because they basically point to the same thing.

# Python variables are references: both names point to the same list
one_list = [1, 2, 3]
another_list = one_list
another_list.append(4)
print(one_list) # [1, 2, 3, 4] - What??? I didn't mean to modify the original list!

Ha, those are some confusing side effects, aren’t they? Unfortunately, in non-functional or hybrid languages like Python, you are very likely to encounter side effects when using built-in functions and methods. It makes sense though, since append() is actually a method of the object another_list points to.
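The same thing happens when a list is handed to a function. Here is a minimal sketch, where add_four is a hypothetical helper made up for this illustration. The function receives a reference to the caller's list, so a mutation inside it leaks out.

```python
def add_four(items):
    # Hypothetical impure helper: append() changes the caller's list in place.
    items.append(4)
    return items

one_list = [1, 2, 3]
result = add_four(one_list)

print(result is one_list)  # True - both names refer to the same list object
print(one_list)            # [1, 2, 3, 4] - the caller's data was changed
```

Nothing in the call add_four(one_list) warns you that one_list will be different afterwards, which is what makes this kind of side effect so easy to miss.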

For JavaScript fans, Facebook has an awesome library out there called “Immutable.js” to help you avoid mutating data. You are unlikely to notice a performance issue, because the library uses structural sharing: a new object created from an existing one reuses the unchanged parts instead of copying everything. It’s very complicated under the hood, but that’s a black box we don’t need to care about. Perhaps I’ll write something about using Immutable.js in the future!

Also, if you are doing hardcore functional programming in a pure functional language like Haskell, it is impossible to sneak side effects into an ordinary function - your functions always return a new piece of data.

A workaround in Python would be something like below.

# Workaround for non-functional list.append function
one_list = [1, 2, 3]
# In Python, "+" on two lists builds a new list and leaves both operands untouched
another_list = one_list + [4]
print(one_list) # Still [1, 2, 3]
print(another_list) # [1, 2, 3, 4] - Now we are satisfied!
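The same idea can be wrapped in a function, and Python has a few other non-mutating tools besides list concatenation. The sketch below is only an illustration: with_item is a hypothetical helper, while sorted() and tuples are standard Python.

```python
def with_item(items, item):
    """Hypothetical pure helper: return a new list, leave the argument alone."""
    return items + [item]

one_list = [3, 1, 2]
another_list = with_item(one_list, 4)

# sorted() returns a new list, whereas list.sort() would reorder one_list in place.
ordered = sorted(one_list)

# Tuples cannot be mutated at all, so "adding" an element always builds a new tuple.
frozen = (1, 2, 3)
extended = frozen + (4,)

print(one_list)      # [3, 1, 2] - untouched by everything above
print(another_list)  # [3, 1, 2, 4]
print(ordered)       # [1, 2, 3]
print(extended)      # (1, 2, 3, 4)
```

Using a tuple where the data should never change is a cheap way to let Python enforce immutability for you.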

Always remember - mutating any data is not cool!

Limitation of Pure Functions

You might have wondered, what about a random number generator? It returns a different number every time! What about a functional version of the print() function? It prints stuff on my screen, and that’s not an abstract, non-physical return value!

You are right. If everything is a pure function, we just can’t do anything useful. Therefore, yes, some side effects are wanted.

The idea of functional programming is not to make everything pure. It is a style of coding, a means to an end. We could make our code 99% pure functional. If 99% of our code is mathematically sound, when the program encounters an error, we can be certain that it happens in the 1% that has useful side effects.
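One way to picture that split is a small sketch like the one below. The function names describe_roll and main are made up for this illustration. The decision logic is pure, and the randomness and printing are pushed into one thin impure function.

```python
import random

def describe_roll(roll):
    """Pure part: the message depends only on the argument."""
    if roll == 6:
        return f"You rolled a {roll}. Jackpot!"
    return f"You rolled a {roll}."

def main():
    """Impure part: randomness and printing live here, and only here."""
    roll = random.randint(1, 6)   # side effect: draws from a random source
    print(describe_roll(roll))    # side effect: writes to the screen

main()
```

The pure describe_roll() can be tested with plain inputs like 3 or 6, while main() stays small enough that there is little room for a bug to hide in it.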

In fact, there is another super important concept in functional programming called the monad, which can be used to deal specifically with the impure stuff we do in our program. With a monad, we can wrap an impure activity to make it look like the pure, static and stable things we normally deal with in functions. However, monads are not only useful for dealing with side effects. They deserve their own post, so I will be talking about them in the future.