Vectorial computing

OpenFisca calculations are all vectorial. That means they operate on arrays rather than single (“scalar”) values.

The practical benefit is that a computation costs almost the same for one entity as for hundreds of thousands. This is what makes it possible to analyse whole datasets and to model reforms accurately. However, to support this feature, you need to follow some constraints when writing formulas.

Formulas always return vectors

Each formula computation in OpenFisca must return a vector.

For instance, for a simulation containing 3 persons whose ages are 41, 42 and 45, executing the following formula:

def formula(persons, period, parameters):
    age = persons('age', period)
    print(age)
    # ... do some computation and return a value

will print array([41, 42, 45]).

This formula code will work the same if there is one Person or three or three million in the modelled situation. Formulas always receive as their first parameter an array of the entity on which they operate (e.g. n Person, Household…) and they should return an array of the same length.

Most of the time, formulas will refer to other variables and NumPy will do the appropriate computation without you even noticing:

def formula(persons, period, parameters):
    tax_rebate = parameters(period).tax_rebate  # let's say this is 500
    eligibility_multiplier = persons('eligibility_multiplier', period)  # and this is [2, 0, 1]: there are three Persons
    return eligibility_multiplier * tax_rebate  # this is [1000, 0, 500]. We've returned a vector, yay!
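
The multiplication above relies on NumPy broadcasting a scalar over a vector. Here is a minimal sketch of that step outside OpenFisca. Plain NumPy values stand in for what parameters(...) and persons(...) would return, and the numbers are the ones assumed in the comments above:

```python
import numpy as np

# Stand-ins for the values a real formula would receive from OpenFisca.
tax_rebate = 500                               # a scalar parameter
eligibility_multiplier = np.array([2, 0, 1])   # one value per Person

# The scalar is broadcast across the vector: one result per Person.
result = eligibility_multiplier * tax_rebate
print(result.tolist())  # [1000, 0, 500]
```

The result has the same length as the input vector, which is what OpenFisca expects a formula to return.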

What happens if you don’t return a vector

As programmers, we work with scalars more often than with vectors. We thus tend to write straightforward code that returns a scalar rather than a one-dimensional vector, and get stuck when the formula has to handle more than one entity:

# THIS IS NOT A VALID OPENFISCA FORMULA
def formula(persons, period, parameters):
    tax_rebate = parameters(period).tax_rebate  # let's say this is worth 500
    rebate_threshold = tax_rebate * persons[0].eligibility_multiplier  # so this is 1000; see how we've accidentally left out other Persons?
    return rebate_threshold  # and this returns 1000. But it's not a vector!

OpenFisca will help you notice this mistake by raising an error:

The formula ‘tax_rebate@2018’ should return a NumPy array; instead it returned ‘1000.0’ of type ‘float’.

In a similar fashion, if you expect a formula to return a boolean and forget that you will actually get an array of boolean values (one for each entity in the situation), you will receive the following safeguard error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
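
This error comes from NumPy itself, not from OpenFisca, and can be reproduced without any formula. The sketch below uses a hypothetical boolean vector named is_eligible:

```python
import numpy as np

is_eligible = np.array([True, False, True])  # one boolean per entity

try:
    if is_eligible:  # Python needs a single True/False here
        pass
    raised = False
except ValueError as error:
    # NumPy refuses to reduce several booleans to one implicitly.
    raised = True
    print(error)
```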

The rest of this page gives practical replacements for situations in which you get such errors.

Control structures

Common control structures such as if...else and switch, and native Python logical operators such as and, or and not, do not work with vectors. They all have vectorial alternatives, however: the semantics stay the same and only the syntax changes.

if / else

Let’s say you want to write a formula that logically reads as:

# THIS IS NOT A VALID OPENFISCA FORMULA
def formula(person, period):
    salary = person('salary', period)
    if salary < 1000:
        return 200
    else:
        return 0

This code does not work: it assumes that there is always exactly one person, and that their salary is provided as a single number, while salary is actually a vector of salaries that could be of any length.

In such a case, apply the comparison to the vector of salaries, which will create a vector of booleans, and then multiply that vector by the amount to grant:

def formula(persons, period):
    condition_salary = persons('salary', period) < 1000
    return condition_salary * 200

What happens is that for every Person in persons, if condition_salary is True (equivalent to 1 in logical algebra), the returned value will be 200. And if condition_salary is False (equivalent to 0), the returned value will be 0.
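
The same mechanism can be checked with plain NumPy. In this sketch the salary values are made up and stand in for what persons('salary', period) would return:

```python
import numpy as np

salary = np.array([800, 1200, 999])   # hypothetical salaries, one per Person
condition_salary = salary < 1000      # boolean vector: True, False, True
amount = condition_salary * 200       # True counts as 1, False as 0
print(amount.tolist())  # [200, 0, 200]
```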

Ternaries

Let’s now write a formula that returns 200 if the Person’s salary is lower than 1000, and 100 otherwise.

The NumPy function where offers a simple syntax to handle these cases.

def formula(persons, period):
    condition_salary = persons('salary', period) < 1000
    return where(condition_salary, 200, 100)

where takes 3 arguments: a vector of boolean values (the “condition”), the value to set for this element in the vector if the condition is met, and the value to set otherwise.

The where function is provided directly by NumPy, which offers many other functions that can be useful in formulas.
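
As a quick check of how where behaves, here is a sketch that calls it through the np namespace on made-up salaries. The formula above assumes where is already imported into the module:

```python
import numpy as np

salary = np.array([800, 1200, 1000])  # hypothetical salaries
# For each element: 200 where the condition holds, 100 elsewhere.
amount = np.where(salary < 1000, 200, 100)
print(amount.tolist())  # [200, 100, 100]
```

A salary of exactly 1000 does not satisfy the strict comparison, so it falls into the "otherwise" branch.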

Multiple conditions

Let’s consider a more complex case, where we want to attribute to a person:

  • 200 if their salary is at most 500;

  • 100 if their salary is strictly more than 500, but at most 1000;

  • 50 if their salary is strictly more than 1000, but at most 1500;

  • 0 otherwise.

We can use the NumPy function select to implement this behaviour:

def formula(person, period):
    salary = person('salary', period)
    return select(
        [salary <= 500, salary <= 1000, salary <= 1500, salary > 1500],
        [200, 100, 50, 0],
        )

If the first condition is met, the first value will be assigned, without considering the other conditions. For instance, if salary = 100, salary <= 500 is true and therefore 200 will be assigned. It doesn’t matter that salary <= 1000 is also true.

If the first condition is not met, only then is the second condition considered, and so on. If no condition is met, 0 will be assigned.
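
To see the "first matching condition wins" rule in action, the sketch below runs the same select call on a handful of made-up salaries, using plain NumPy:

```python
import numpy as np

salary = np.array([100, 500, 750, 1200, 2000])  # hypothetical salaries
amount = np.select(
    [salary <= 500, salary <= 1000, salary <= 1500, salary > 1500],
    [200, 100, 50, 0],
)
print(amount.tolist())  # [200, 200, 100, 50, 0]
```

A salary of 750 satisfies both salary <= 1000 and salary <= 1500, but only the first of the two is used. When no condition matches, NumPy's select falls back to its default argument, which is 0 unless you set it.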

Complex conditions

If no NumPy function helps you express a very specific condition, you can code arbitrary conditions using * instead of and, and + instead of or.

For instance, let’s consider that a person will be granted 200 if either:

  • they are at least 25 years old and make less than 1000 per month;

  • or they are disabled.

def formula(person, period):
    condition_age = person('age', period) >= 25
    condition_salary = person('salary', period) < 1000
    condition_handicap = person('handicap', period)
    condition = condition_age * condition_salary + condition_handicap
    return condition * 200

You should always use NumPy functions such as where and select when they are relevant: logical operations written with arithmetic operators should be a last resort, as they are not very readable.
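
The sketch below replays this condition on plain NumPy arrays. The ages, salaries and disability flags are invented for illustration and stand in for the vectors a formula would receive:

```python
import numpy as np

# Hypothetical data for four persons.
age = np.array([30, 20, 40, 22])
salary = np.array([800, 800, 1500, 2000])
handicap = np.array([False, False, False, True])

condition_age = age >= 25
condition_salary = salary < 1000

# On boolean arrays, * acts as "and" and + acts as "or".
condition = condition_age * condition_salary + handicap
amount = condition * 200
print(amount.tolist())  # [200, 0, 0, 200]

# The final step can also be written with where, which reads more clearly.
amount_where = np.where(condition, 200, 0)
```

Only the first person meets the age and salary test, and the last one qualifies through the disability flag alone.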

Arithmetic operations

Basic arithmetic operations such as + or * behave the same way on vectors as on numbers, so you can use them in OpenFisca formulas. However, some operations must be adapted.

Scalar (won’t work)    Vectorial alternative
min                    min_(x,y)
max                    max_(x,y)
round                  round_(x,y)
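
As an illustration of what these element-wise replacements do, the sketch below uses the NumPy functions minimum, maximum and round. It assumes the OpenFisca helpers min_, max_ and round_ behave like these NumPy functions, and the salary values are made up:

```python
import numpy as np

salary = np.array([800.456, 1200.0, 1500.789])  # hypothetical salaries

capped = np.minimum(salary, 1000)   # element-wise min against a scalar
floored = np.maximum(salary, 1000)  # element-wise max against a scalar
rounded = np.round(salary, 2)       # each element rounded to 2 decimals

print(capped.tolist())   # [800.456, 1000.0, 1000.0]
print(floored.tolist())  # [1000.0, 1200.0, 1500.789]
```

Each call returns a vector of the same length as salary, whereas Python's built-in min and max try to reduce the comparison to a single truth value and fail on arrays.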

Boolean operations

Scalar (won’t work)    Vectorial alternative
not                    not_(x)
and                    x * y
or                     x + y
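
A short sketch of these three replacements on plain NumPy boolean arrays follows. It assumes not_ behaves like NumPy's logical_not, and the variable names is_student and is_employed are hypothetical:

```python
import numpy as np

is_student = np.array([True, False, True, False])
is_employed = np.array([True, True, False, False])

both = is_student * is_employed      # x and y
either = is_student + is_employed    # x or y
neither = np.logical_not(either)     # not (x or y)

print(both.tolist())     # [True, False, False, False]
print(either.tolist())   # [True, True, True, False]
print(neither.tolist())  # [False, False, False, True]
```

This equivalence holds only when both operands are boolean arrays. On integer arrays, + performs ordinary addition and can yield values greater than 1.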

String concatenation

To concatenate strings, replace the + operator and %s-formatted strings with a call to concat(x, y).
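
To give an idea of what element-wise concatenation looks like, the sketch below uses NumPy's char.add as a stand-in. It is not OpenFisca's concat, whose exact behaviour is not described here, and the prefix and code arrays are invented:

```python
import numpy as np

prefix = np.array(["zone_", "zone_", "zone_"])  # hypothetical labels
code = np.array([1, 2, 3])                      # hypothetical numeric codes

# Convert the numbers to strings, then join the two vectors element by element.
labels = np.char.add(prefix, code.astype(str))
print(labels.tolist())  # ['zone_1', 'zone_2', 'zone_3']
```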