OpenFisca calculations are all vectorial. That means they operate on arrays rather than single (“scalar”) values.
The practical benefit is that computations are almost as expensive for one entity as they are for hundreds of thousands. This is how datasets can be analysed and how reforms can be modelled accurately. However, to support this feature, you will need to apply some constraints on how you write formulas.
Formulas always return vectors
Each formula computation in OpenFisca must return a vector.
For instance, for a simulation containing 3 persons whose ages are 41, 42 and 45, executing the following formula:
    def formula(persons, period, parameters):
        age = persons('age', period)
        print(age)
        ...  # do some computation and return a value

will print array([41, 42, 45]).
This formula code will work the same if there is one Person or three or three million in the modelled situation. Formulas always receive as their first parameter an array of the entity on which they operate (e.g. n Person, Household…) and they should return an array of the same length.
Most of the time, formulas will refer to other variables and NumPy will do the appropriate computation without you even noticing:
    def formula(persons, period, parameters):
        tax_rebate = parameters(period).tax_rebate  # let's say this is 500
        eligibility_multiplier = persons('eligibility_multiplier', period)  # and this is [2, 0, 1]: there are three Persons
        return eligibility_multiplier * tax_rebate  # this is [1000, 0, 500]. We've returned a vector, yay!
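The multiplication in that formula can be reproduced with NumPy alone. The sketch below is only an illustration: plain arrays stand in for what `persons(...)` and `parameters(...)` would return, and the helper name `compute_rebate` is hypothetical, not part of OpenFisca.

```python
import numpy as np


def compute_rebate(eligibility_multiplier, tax_rebate):
    # Multiplying a vector by a scalar yields a vector of the same length:
    # the scalar is applied to every element.
    return eligibility_multiplier * tax_rebate


# The same code handles three persons or a single one.
three_persons = compute_rebate(np.array([2, 0, 1]), 500)
one_person = compute_rebate(np.array([2]), 500)

print(three_persons.tolist())  # [1000, 0, 500]
print(one_person.tolist())     # [1000]
```

In both calls the result has as many elements as there are persons, which is the property a formula is expected to preserve.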
What happens if you don't return a vector
As programmers, we more often work with scalars than vectors. We thus have a tendency to write straightforward code that returns a single scalar rather than a unidimensional vector (in other words, an array with one value per entity), and get stuck when wanting to loop over it:
    # THIS IS NOT A VALID OPENFISCA FORMULA
    def formula(persons, period, parameters):
        tax_rebate = parameters(period).tax_rebate  # let's say this is worth 500
        rebate_threshold = tax_rebate * persons.eligibility_multiplier  # so this is 1000; see how we've accidentally left out other Persons?
        return rebate_threshold  # and this returns 1000. But it's not a vector!
OpenFisca will help you notice this mistake by raising an error:
The formula 'tax_rebate@2018' should return a NumPy array; instead it returned '1000.0' of type 'float'.
In a similar fashion, if you expect a formula to return a boolean and forget that you will actually get an array of boolean values (one for each entity in the situation), you will receive the following safeguard error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
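This second error comes from NumPy itself and can be triggered outside OpenFisca. The sketch below uses a made-up array of salaries in place of a real variable, and simply captures the message so it can be inspected.

```python
import numpy as np

salaries = np.array([800, 1000, 2500])
is_low = salaries < 1000  # an array of booleans, one per person

try:
    if is_low:  # Python asks the whole array for a single truth value
        pass
    error_message = None
except ValueError as error:
    error_message = str(error)

print(error_message)
```

The comparison is fine on its own; the failure only appears once the resulting array is used where Python expects one boolean, as in `if`, `and`, `or` or `not`.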
The rest of this page gives practical replacements for situations in which you get such errors.
Some usual control structures such as if...else and switch, and native Python logical operators such as and, or and not, do not work with vectors. Semantically however, they all have alternatives, and the only change is in syntax.
Let's say you want to write a formula that logically reads as:
    # THIS IS NOT A VALID OPENFISCA FORMULA
    def formula(person, period):
        salary = person('salary', period)
        if salary < 1000:
            return 200
        else:
            return 0
This code does not work: it assumes that there is always one single person, and that their salary is provided as a number, while salary is actually a vector of salaries that could be of any length.
In such a case, apply the comparison to the vector of salaries, which will create a vector of booleans, and then multiply it:
    def formula(persons, period):
        condition_salary = persons('salary', period) < 1000
        return condition_salary * 200
What happens is that for every Person in persons, if condition_salary is True (equivalent to 1 in logical algebra), the returned value will be 200. And if condition_salary is False (equivalent to 0), the returned value will be 0.
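The multiplication of a boolean vector by a number can be checked with plain NumPy. This is a minimal sketch in which a made-up array of salaries replaces the call to `persons('salary', period)`.

```python
import numpy as np

salaries = np.array([800, 1000, 2500])

condition_salary = salaries < 1000  # [True, False, False]
amounts = condition_salary * 200    # True counts as 1, False as 0

print(amounts.tolist())  # [200, 0, 0]
```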
Let's now write a formula that returns 200 if the Person’s salary is lower than 1000, and 100 otherwise.
The NumPy function where offers a simple syntax to handle these cases.
    from numpy import where

    def formula(persons, period):
        condition_salary = persons('salary', period) < 1000
        return where(condition_salary, 200, 100)
where takes 3 arguments: a vector of boolean values (the “condition”), the value to set for this element in the vector if the condition is met, and the value to set otherwise.
The where function is provided directly by NumPy. Many other NumPy functions can be useful as well.
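Here is how `where` behaves on a small array, as a standalone sketch: the salaries are made-up values standing in for an OpenFisca variable.

```python
import numpy as np

salaries = np.array([800, 1000, 2500])

# One output value per person: 200 where the condition holds, 100 elsewhere.
amounts = np.where(salaries < 1000, 200, 100)

print(amounts.tolist())  # [200, 100, 100]
```

Note that both the "if true" and "if false" arguments are computed for the whole vector before `where` picks between them element by element.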
Let's consider a more complex case, where we want to attribute to a person:
- 200 if their salary is less than 500;
- 100 if their salary is strictly more than 500, but less than 1000;
- 50 if their salary is strictly more than 1000, but less than 1500;
- 0 otherwise.
We can use the NumPy function select to implement this behaviour:
    from numpy import select

    def formula(person, period):
        salary = person('salary', period)
        return select(
            [salary <= 500, salary <= 1000, salary <= 1500, salary > 1500],
            [200, 100, 50, 0],
            )
If the first condition is met, the first value will be assigned, without considering the other conditions. For instance, if salary = 100, salary <= 500 is true and therefore 200 will be assigned. It doesn't matter that salary <= 1000 is also true.
If the first condition is not met, then only the second condition will be considered, and so on. If no condition is met, 0 will be assigned.
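The "first matching condition wins" rule and the fallback to 0 can be observed with NumPy alone. In this sketch the salaries are made-up values, and the last bracket is expressed through the `default` argument of `numpy.select` rather than through an explicit `salary > 1500` condition.

```python
import numpy as np

salaries = np.array([100, 500, 750, 1200, 2000])

amounts = np.select(
    [salaries <= 500, salaries <= 1000, salaries <= 1500],
    [200, 100, 50],
    default=0,  # used for elements that match no condition
)

print(amounts.tolist())  # [200, 200, 100, 50, 0]
```

A salary of 100 satisfies all three conditions but still gets 200, because only the first matching condition is used.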
If no NumPy function helps you express a very specific condition, you can code arbitrary conditions using * instead of and, and + instead of or.
For instance, let's consider that a person will be granted 200 if either:
- they are more than 25 and make less than 1000;
- or they are disabled.
    def formula(person, period):
        condition_age = person('age', period) >= 25
        condition_salary = person('salary', period) < 1000
        condition_handicap = person('handicap', period)
        # * acts as "and", + acts as "or" on vectors of booleans
        condition = condition_age * condition_salary + condition_handicap
        return condition * 200
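The boolean algebra used in this formula can be tried on plain NumPy arrays. The values below are made up for three persons. The second expression uses NumPy's element-wise `&` and `|` operators, shown here only to confirm that both spellings give the same result.

```python
import numpy as np

ages = np.array([30, 20, 40])
salaries = np.array([900, 900, 1500])
handicap = np.array([False, True, False])

# On boolean vectors, * behaves like "and" and + behaves like "or".
condition = (ages >= 25) * (salaries < 1000) + handicap

# Equivalent spelling; the parentheses are required because & and |
# bind more tightly than comparisons.
same_condition = ((ages >= 25) & (salaries < 1000)) | handicap

amounts = condition * 200

print(condition.tolist())  # [True, True, False]
print(amounts.tolist())    # [200, 200, 0]
```

The first person qualifies through age and salary, the second through disability only, and the third through neither.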
Basic arithmetic operations such as + or * behave the same way on vectors as on numbers, so you can use them in OpenFisca formulas. However, some operations must be adapted.
|Scalar (won't work)||Vectorial alternative|
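To illustrate why some scalar operations need a vectorial replacement, here is a NumPy-only sketch with made-up salaries. `np.minimum`, `np.maximum` and `np.logical_not` are NumPy's own element-wise functions; that they correspond to the alternatives this page's table refers to is an assumption.

```python
import numpy as np

salaries = np.array([800, 1000, 2500])
ceiling = np.array([1000, 1000, 1000])

# The builtin min compares the two arrays as a whole, which needs a single
# truth value and therefore fails on vectors.
try:
    min(salaries, ceiling)
    builtin_failed = False
except ValueError:
    builtin_failed = True

capped = np.minimum(salaries, ceiling)     # element-wise minimum
excess = np.maximum(salaries - 1000, 0)    # element-wise maximum with a scalar
not_low = np.logical_not(salaries < 1000)  # element-wise "not"

print(capped.tolist())   # [800, 1000, 1000]
print(excess.tolist())   # [0, 0, 1500]
print(not_low.tolist())  # [False, True, True]
```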
The + operator, as well as formatted
%s strings for concatenation should be replaced by a call to