OpenFisca calculations are all vectorial. That means they operate on arrays rather than single (“scalar”) values.
The practical benefit is that computing for hundreds of thousands of entities costs almost the same as computing for one. This is how datasets can be analysed and how reforms can be modelled accurately. However, to support this feature, you will need to respect some constraints when writing formulas.
Each formula computation in OpenFisca must return a vector.
For instance, for a simulation containing 3 persons whose ages are 41, 42 and 45, executing the following formula:
def formula(persons, period, parameters):
    age = persons('age', period)
    print(age)
    # ... do some computation and return a value
will print [41 42 45], the string form of the NumPy array array([41, 42, 45]).
This formula code works the same whether there is one Person, three, or three million in the modelled situation. Formulas always receive as their first parameter an array of the entity on which they operate (e.g. n Person, Household…) and they should return an array of the same length.
Most of the time, formulas will refer to other variables and NumPy will do the appropriate computation without you even noticing:
def formula(persons, period, parameters):
    tax_rebate = parameters(period).tax_rebate  # let's say this is 500
    eligibility_multiplier = persons('eligibility_multiplier', period)  # and this is [2, 0, 1]: there are three Persons
    return eligibility_multiplier * tax_rebate  # this is [1000, 0, 500]. We've returned a vector, yay!
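The multiplication above can be tried outside OpenFisca with plain NumPy. The following is a minimal sketch, assuming the parameter is a plain number and the variable is a NumPy array; the values are the ones used in the comments of the formula.

```python
import numpy as np

tax_rebate = 500  # scalar parameter value, as in the formula's comments

# One value per Person: three Persons in this situation.
eligibility_multiplier = np.array([2, 0, 1])

# NumPy broadcasts the scalar over the vector: one result per Person.
result = eligibility_multiplier * tax_rebate
print(result.tolist())  # [1000, 0, 500]

# The same expression works unchanged for a single Person.
single_result = np.array([2]) * tax_rebate
print(single_result.tolist())  # [1000]
```

The result always has the same length as the input vector, which is what a formula is expected to return.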
As programmers, we more often work with scalars than vectors. We thus tend to write straightforward code that returns a scalar rather than a one-dimensional vector holding one value per entity, and get stuck when wanting to loop over it:
# THIS IS NOT A VALID OPENFISCA FORMULA
def formula(persons, period, parameters):
    tax_rebate = parameters(period).tax_rebate  # let's say this is worth 500
    rebate_threshold = tax_rebate * persons[0].eligibility_multiplier  # so this is 1000; see how we've accidentally left out other Persons?
    return rebate_threshold  # and this returns 1000. But it's not a vector!
OpenFisca will help you notice this mistake by raising an error:
The formula ‘tax_rebate@2018’ should return a NumPy array; instead it returned ‘1000.0’ of type ‘float’.
In a similar fashion, if you expect a formula to return a boolean and forget that you will actually get an array of boolean values (one for each entity in the situation), you will receive the following safeguard error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
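This error can be reproduced with NumPy alone. The sketch below uses a hypothetical boolean vector named `is_eligible` (not an OpenFisca variable) and shows that using it in an `if` statement raises the `ValueError` quoted above.

```python
import numpy as np

# Hypothetical boolean vector: one value per Person.
is_eligible = np.array([True, False, True])

message = None
try:
    if is_eligible:  # Python needs one truth value, but there are three
        pass
except ValueError as error:
    message = str(error)

print(message)
```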
The rest of this page gives practical replacements for situations in which you get such errors.
Some usual control structures such as `if...else`, `switch`, and native Python logical operators such as `or` and `not` do not work with vectors. Semantically however, they all have alternatives, and the only change is in syntax.
`if` / `else`
Let’s say you want to write a formula that logically reads as:
# THIS IS NOT A VALID OPENFISCA FORMULA
def formula(person, period):
    salary = person('salary', period)
    if salary < 1000:
        return 200
    else:
        return 0
This code does not work: it assumes that there is always one single person, and that their salary is provided as a number, while `salary` is actually a vector of salaries that could be of any length.
In such a case, apply the comparison to the vector of salaries, which will create a vector of booleans, and then multiply it:
def formula(persons, period):
    condition_salary = persons('salary', period) < 1000
    return condition_salary * 200
What happens is that for every Person in `persons`, if `condition_salary` is `True` (equivalent to `1` in logical algebra), the returned value will be `200`. And if `condition_salary` is `False` (equivalent to `0`), the returned value will be `0`.
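To see the boolean-times-number behaviour concretely, here is a small NumPy sketch. The salary values are made up for illustration.

```python
import numpy as np

# Hypothetical salaries for three Persons.
salary = np.array([800, 1000, 2500])

# Comparing a vector with a number yields a vector of booleans.
condition_salary = salary < 1000
print(condition_salary.tolist())  # [True, False, False]

# True counts as 1 and False as 0 in the multiplication.
amount = condition_salary * 200
print(amount.tolist())  # [200, 0, 0]
```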
Let’s now write a formula that returns `200` if the Person’s salary is lower than `1000`, and `100` otherwise.
The NumPy function `where` offers a simple syntax to handle these cases.
from numpy import where

def formula(persons, period):
    condition_salary = persons('salary', period) < 1000
    return where(condition_salary, 200, 100)
`where` takes 3 arguments: a vector of boolean values (the “condition”), the value to set for this element in the vector if the condition is met, and the value to set otherwise.
This `where` function is provided directly by NumPy. There are many other NumPy functions provided that can be useful.
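The following sketch runs `where` on made-up salaries with plain NumPy, to show which value each element receives.

```python
import numpy as np

# Hypothetical salaries for three Persons.
salary = np.array([800, 1000, 2500])

# 200 where the condition holds, 100 everywhere else.
amount = np.where(salary < 1000, 200, 100)
print(amount.tolist())  # [200, 100, 100]
```

A salary of exactly 1000 does not satisfy `salary < 1000`, so it receives the "otherwise" value.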
Let’s consider a more complex case, where we want to attribute to a person:
- `200` if their salary is at most `500`;
- `100` if their salary is strictly more than `500`, but at most `1000`;
- `50` if their salary is strictly more than `1000`, but at most `1500`;
- `0` otherwise.
We can use the NumPy function `select` to implement this behaviour:
from numpy import select

def formula(person, period):
    salary = person('salary', period)
    return select(
        [salary <= 500, salary <= 1000, salary <= 1500, salary > 1500],
        [200, 100, 50, 0],
    )
If the first condition is met, the first value will be assigned, without considering the other conditions. For instance, if `salary = 100`, `salary <= 500` is true and therefore `200` will be assigned. It doesn’t matter that `salary <= 1000` is also true.
If the first condition is not met, then only the second condition will be considered, and so on. If no condition is met, `0` will be assigned.
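The first-match behaviour can be checked with a short NumPy sketch. The salaries are invented to hit each bracket, including the boundary value 500.

```python
import numpy as np

# Hypothetical salaries, one per Person.
salary = np.array([100, 500, 750, 1200, 3000])

conditions = [salary <= 500, salary <= 1000, salary <= 1500, salary > 1500]
choices = [200, 100, 50, 0]

# For each element, the first condition that is True decides the value.
amount = np.select(conditions, choices)
print(amount.tolist())  # [200, 200, 100, 50, 0]

# Elements matching no condition get the `default` argument, which is 0
# unless specified, so the last condition can be left out here.
amount_with_default = np.select(conditions[:3], choices[:3])
print(amount_with_default.tolist())  # [200, 200, 100, 50, 0]
```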
If no NumPy function helps you express a very specific condition, you can code arbitrary conditions using `*` instead of `and`, and `+` instead of `or`.
For instance, let’s consider that a person will be granted `200` if either:
- they are 25 or older and make less than `1000` per month;
- or they are disabled.
def formula(person, period):
    condition_age = person('age', period) >= 25
    condition_salary = person('salary', period) < 1000
    condition_handicap = person('handicap', period)
    # `*` acts as "and", `+` acts as "or" on boolean vectors
    condition = condition_age * condition_salary + condition_handicap
    return condition * 200
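As an illustration, the same condition can be evaluated on made-up NumPy vectors. The sketch also compares the result with NumPy's element-wise operators `&` and `|`, which this page does not cover; they are shown only as a cross-check and need parentheses because of operator precedence.

```python
import numpy as np

# Hypothetical data for four Persons.
age = np.array([30, 20, 20, 40])
salary = np.array([900, 900, 2000, 2000])
handicap = np.array([False, False, True, False])

# `*` behaves like "and", `+` behaves like "or" on boolean vectors.
condition = (age >= 25) * (salary < 1000) + handicap
amount = condition * 200
print(amount.tolist())  # [200, 0, 200, 0]

# Cross-check with element-wise logical operators.
condition_bitwise = ((age >= 25) & (salary < 1000)) | handicap
print(np.array_equal(condition, condition_bitwise))  # True
```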
You should always use NumPy functions such as `where` and `select` when they are relevant: logical operations using arithmetic operators should be a last resort, as they are not very readable.
Basic arithmetic operations such as `+` or `*` behave the same way on vectors as on numbers, so you can use them in OpenFisca formulas. However, some operations must be adapted.
| Scalar (won’t work) | Vectorial alternative |
|---|---|
| | |
| | |
| | |

| Scalar (won’t work) | Vectorial alternative |
|---|---|
| | |
| | |
| | |
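The rows of the tables above are not reproduced here. As an illustration only, and assuming that element-wise minimum, maximum and rounding are among the operations that need adapting, the sketch below shows Python's built-in `max` failing on vectors and the plain NumPy functions that work element by element. These are NumPy names, not necessarily the helpers OpenFisca recommends.

```python
import numpy as np

# Hypothetical vectors, one value per Person.
x = np.array([1.4, 2.6, 3.7])
y = np.array([2.0, 2.0, 2.0])

# The built-in max compares whole arrays and cannot decide a single winner.
builtin_failed = False
try:
    max(x, y)
except ValueError:
    builtin_failed = True

# NumPy's element-wise counterparts return one result per element.
smallest = np.minimum(x, y)
largest = np.maximum(x, y)
rounded = np.round(x)

print(smallest.tolist())  # [1.4, 2.0, 2.0]
print(largest.tolist())   # [2.0, 2.6, 3.7]
print(rounded.tolist())   # [1.0, 3.0, 4.0]
```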
The `+` operator, as well as formatted `%s` strings for concatenation, should be replaced by a call to `concat(x, y)`.
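For intuition about what an element-wise concatenation produces, here is a sketch using NumPy's `np.char.add` on made-up string vectors. Whether `concat` is implemented this way is an assumption; the example only shows the expected shape of the result, one concatenated string per entity.

```python
import numpy as np

# Hypothetical string vectors, one value per Person.
prefix = np.array(['zone_', 'zone_'])
code = np.array(['a', 'b'])

# Element-wise concatenation: the i-th strings are joined together.
joined = np.char.add(prefix, code)
print(joined.tolist())  # ['zone_a', 'zone_b']
```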