To calculate some legislation variables on people’s situations, you need to create and run a new *Simulation*. Whether situations are described with test cases or data, OpenFisca looks for two kinds of inputs:

- how persons are dispatched in other entities,
- what variables’ values are already known.

Test cases can be expressed in Python, or in JSON when using the Web API (see the specific section of the documentation).

A test case describes persons and other entities with their variables values.It’s the usual solution to define a small number of situations.

Here is an example of test case (in Python):

```
BASIC_TEST_CASE = {
'persons': {'Ari': {}, 'Paul': {}, 'Leila': {}, 'Javier': {}},
'households': {
'h1': {'children': ['Leila'], 'parents': ['Ari', 'Paul']},
'h2': {'parents': ['Javier']}
},
}
```

This test case defines 4 persons, `Ari`

, `Paul`

, `Leila`

and `Javier`

.
They belong to 2 households named `h1`

and `h2`

.
For example, `Ari`

and `Paul`

are parents in `h1`

and have one child, `Leila`

.

You may add information at the *individual* level or at the *group entity* level:

- known variable values,
- and definition periods for those variable values.

Let’s say that we want to add a salary to `Ari`

and a `rent`

to `h1`

.
Here is the updated test case:

```
TEST_CASE = {
'persons': {
'Ari': {
'salary': {'2011-01': 1000}
},
'Paul': {},
'Leila': {},
'Javier': {}
},
'households': {
'h1': {
'children': ['Leila'],
'parents': ['Ari', 'Paul'],
'rent': {'2011-01': 300}
},
'h2': {'parents': ['Javier']}
},
}
```

Where `salary`

and `rent`

names come from the salary and rent variables of the `OpenFisca-Country-Template`

. In this model:

`salary`

is a Person entity variable defined on a monthly basis,`rent`

is a Household entity variable defined on a monthly basis as well.

It’s a Python dictionary object that we will use to build a *Simulation*.

Let’s assume that you want to calculate households’ `housing_allowance`

for the same period.
You have to follow these steps:

- Load a tax and benefits system like OpenFisca-Country-Template.
- Initialise a
*SimulationBuilder*. - Create a
*Simulation*using, for example, the`build_from_entities(...)`

function. - Calculate the housing_allowance and print its value for every test case household.

Which gives:

```
# -*- coding: utf-8 -*-
from openfisca_core.simulation_builder import SimulationBuilder
from openfisca_country_template import CountryTaxBenefitSystem
TEST_CASE = {
# ... whole test case ...
}
tax_benefit_system = CountryTaxBenefitSystem()
simulation_builder = SimulationBuilder()
simulation = simulation_builder.build_from_entities(tax_benefit_system, TEST_CASE)
housing_allowance = simulation.calculate('housing_allowance', '2011-01')
print("households", simulation.household.ids)
print("housing_allowance", housing_allowance)
```

You can build a *Simulation* on multiple data formats.
Any well structured tabular input shoud be fine as long as you are able to iterate over its items in Python.

In the following example, will use the pandas library to do so.

Data sets describe multiple people situations. It could define a whole population. This data could come from a survey with aggregated data, data files extracted from a database, etc.

Here is a survey example. It typically goes from 50 000 households to 500 000.

Here is a minimal example of data (in CSV format):

```
person_id,household_id,person_salary,age
1,a,2694,40
2,a,2720,43
3,b,3865,45
4,b,1300,23
5,c,0,12
6,c,0,14
7,c,2884,44
12,e,1200,38
8,d,3386,27
9,d,2929,28
10,e,0,10
```

As for the *test case* content, you will need the following information:

unique indentifiers for persons and

*group entities*like

`person_id`

and`household_id`

columns information in the CSV exampleif you have multiple entities types (persons, households, ...), you need to know how your persons list is dispatched over your

*group entities*in CSV example, every

`person_id`

is associated with a`household_id`

on the same linethe name of the corresponding variable in your model for every set of values

`person_salary`

values become salary values in`OpenFisca-Country-Template`

modelthe period and entity for every set of values

`person_salary`

and`age`

belong to*Person*entitythe reference period isn’t in the CSV file but it might, for example, come from the CSV creation time and be identical for the whole data set.

Let’s say you are using the country-template, which describes the legislation of a yet to be country.

Let’s also say you have the following `data.csv`

and that you want to calculate income_tax for all persons:

```
person_id,person_salary,person_age
1,2694,40
2,2720,43
3,1865,45
4,1941,23
5,2393,31
6,3008,47
7,2286,23
8,3386,28
9,2929,38
10,3981,37
11,3643,38
12,2078,23
```

- Install the required libraries, by running in your console:

```
$ python --version # Python 3.7.0 or greater should be installed on your computer
$ pip install --upgrade pip openfisca_country_template pandas ipython
$ ipython
```

- Load the
`country-template`

legislation and, then, the content of the`data.csv`

file with the pandas library:

```
In [1]: from openfisca_country_template import CountryTaxBenefitSystem
In [2]: import pandas as pandas
In [3]: tax_benefit_system = CountryTaxBenefitSystem()
In [4]: data = pandas.read_csv('./data.csv') # pandas.DataFrame object
In [5]: length = len(data) # ignores CSV header
```

You can now access the `person_salary`

column values:

```
In [6]: data.person_salary
Out[6]:
0 2694
1 2720
2 1865
3 1941
4 2393
5 3008
6 2286
7 3386
8 2929
9 3981
10 3643
11 2078
Name: person_salary, dtype: int64
```

- Build a simulation according to your data’s length:

```
In [7]: from openfisca_core.simulation_builder import SimulationBuilder
In [8]: simulation = SimulationBuilder().build_default_simulation(tax_benefit_system, length)
```

- Configure the simulation and calculate the income_tax variable for all persons on the same period:

```
In [9]: import numpy as numpy
In [10]: period = '2018-01'
# match data from the 'person_salary' column
# with the 'salary' variable of our yet to be country's tax-benefit system
In [11]: simulation.set_input('salary', period, numpy.array(data.person_salary))
In [12]: income_tax = simulation.calculate('income_tax', period)
```

- You are all set! The
`income_tax`

has been calculated for each person on your`data.csv`

file.

Persons’ order is kept:

```
In [13]: data.person_id.values
Out[13]: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
```

And `income_tax`

is an instance of `numpy.ndarray`

:

```
In [14]: income_tax
Out[14]:
array([404.1 , 408.00003, 279.75 , 291.15002, 358.95 , 451.2 ,
342.90002, 507.90002, 439.35 , 597.15 , 546.45 , 311.7 ],
dtype=float32)
In [15]: income_tax.item(2) # person_id == 3
Out[15]: 279.75
```

So far we’ve seen two ways of populating a Simulation object with data:

- either describe a small population with fine control over input variables and over the relationship between individuals and group entities;
- or provide inputs in bulk, typically using tabular data (CSV, Excel, etc.)

A third possibility exists: you can set up a small-scale situation, as in test cases; and use it to generate a number of “copies” of this situation, in which one or more variables of your choice take on a range of values.

We do this by adding an “axes” entry to a test case:

```
WITH_AXES = {
'persons': {'Ari': {}, 'Paul': {}, 'Leila': {}, 'Javier': {}},
'households': {
'h1': {'children': ['Leila'], 'parents': ['Ari', 'Paul']},
'h2': {'parents': ['Javier']}
},
'axes': [[{'count':10, 'name':'salary', 'min':0, 'max':3000, 'period':'2018-11'}]]
}
simulation_builder = SimulationBuilder()
simulation = simulation_builder.build_from_entities(tax_benefit_system, WITH_AXES)
```

Be careful to note the structure of the “axes” field: an **array of arrays** of axis objects. We will come back to this, but for now let’s investigate the effects of adding this single axis.

As before, `BASIC_TEST_CASE`

describes one household with two parents and one child, and a second household which is in fact a single adult person. However, the specification of an axis results in something quite different now:

```
>>> len(simulation.calculate('salary', '2018-11'))
40
```

We started with a “prototype” situation containing 4 individuals, and we specified an axis which replicates this situation 10 times. And so, as expected, we end up with a simulation containing 4 times 10 individuals.

What happened with respect to the data? It’s easier to represent if we first “reshape” the computed data to reflect this structure of 10 groups of 4 individuals:

```
>>> numpy.reshape(simulation.calculate('salary', '2018-11'),(10,4))
array([[ 0. , 0. , 0. , 0. ],
[ 333.33334, 0. , 0. , 0. ],
[ 666.6667 , 0. , 0. , 0. ],
[1000. , 0. , 0. , 0. ],
[1333.3334 , 0. , 0. , 0. ],
[1666.6666 , 0. , 0. , 0. ],
[2000. , 0. , 0. , 0. ],
[2333.3333 , 0. , 0. , 0. ],
[2666.6667 , 0. , 0. , 0. ],
[3000. , 0. , 0. , 0. ]], dtype=float32)
```

We can see that, for the requested period, the variable `salary`

**of the first individual in each group** is varying in increments from 0 to 3000.

The control provided by an axis is fine-grained and targets one individual. If you wanted to set Javier’s salary instead of Ari’s, you could do so by providing the *index* of Javier in the original situation; since our indices are 0-based, this is 3:

```
WITH_AXES = {
'persons': {'Ari': {}, 'Paul': {}, 'Leila': {}, 'Javier': {}},
'households': {
'h1': {'children': ['Leila'], 'parents': ['Ari', 'Paul']},
'h2': {'parents': ['Javier']}
},
'axes': [[{'count':10, 'index': 3, 'name':'salary', 'min':0, 'max':3000, 'period':'2018-11'}]]
}
```

And the result is now:

```
>>> numpy.reshape(simulation.calculate('salary', '2018-11'),(10,4))
array([[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 333.33334],
[ 0. , 0. , 0. , 666.6667 ],
[ 0. , 0. , 0. , 1000. ],
[ 0. , 0. , 0. , 1333.3334 ],
[ 0. , 0. , 0. , 1666.6666 ],
[ 0. , 0. , 0. , 2000. ],
[ 0. , 0. , 0. , 2333.3333 ],
[ 0. , 0. , 0. , 2666.6667 ],
[ 0. , 0. , 0. , 3000. ]], dtype=float32)
```

Axes are particularly interesting when you want to chart how one variable relates to another, as in this tutorial notebook.

We noted above that the “axes” are in fact an array of arrays, which allows us to use several axes at once : parallel **or** perpendicular axes.

Sets of axes in the inner array are “parallel”. They allow additional variables to be generated in increments. For instance (again take careful note of the position of the square brackets):

```
WITH_PARALLEL_AXES = {
'persons': {'Ari': {}, 'Paul': {}, 'Leila': {}, 'Javier': {}},
'households': {
'h1': {'children': ['Leila'], 'parents': ['Ari', 'Paul']},
'h2': {'parents': ['Javier']}
},
'axes': [[
{'count':10, 'name':'age', 'min':18, 'max':78, 'period':'2018-11'},
{'count':10, 'name':'salary', 'min':0, 'max':3000, 'period':'2018-11'}
]]
}
```

The result should be as follows, with both age and salary changing in lockstep:

```
>>> simulation_builder = SimulationBuilder()
>>> simulation = simulation_builder.build_from_entities(tax_benefit_system, WITH_PARALLEL_AXES)
>>> numpy.reshape(simulation.calculate('age', '2018-11'),(10,4))
array([[18, 0, 0, 0],
[24, 0, 0, 0],
[31, 0, 0, 0],
[38, 0, 0, 0],
[44, 0, 0, 0],
[51, 0, 0, 0],
[58, 0, 0, 0],
[64, 0, 0, 0],
[71, 0, 0, 0],
[78, 0, 0, 0]], dtype=int32)
>>> numpy.reshape(simulation.calculate('salary', '2018-11'),(10,4))
array([[ 0. , 0. , 0. , 0. ],
[ 333.33334, 0. , 0. , 0. ],
[ 666.6667 , 0. , 0. , 0. ],
[1000. , 0. , 0. , 0. ],
[1333.3334 , 0. , 0. , 0. ],
[1666.6666 , 0. , 0. , 0. ],
[2000. , 0. , 0. , 0. ],
[2333.3333 , 0. , 0. , 0. ],
[2666.6667 , 0. , 0. , 0. ],
[3000. , 0. , 0. , 0. ]], dtype=float32)
```

For this to work, the `count`

values of parallel axes must be the same. An error will be raised if they are different.

Sets of axes in the outer array are “perpendicular”, that is, they result in independent variation. For instance, we might have:

```
WITH_PERPENDICULAR_AXES = {
'persons': {'Ari': {}, 'Paul': {}, 'Leila': {}, 'Javier': {}},
'households': {
'h1': {'children': ['Leila'], 'parents': ['Ari', 'Paul']},
'h2': {'parents': ['Javier']}
},
'axes': [
[{'count':4, 'name':'age', 'min':18, 'max':78, 'period':'2018-11'}],
[{'count':4, 'name':'salary', 'min':0, 'max':3000, 'period':'2018-11'}]
]
}
```

Note the difference in nesting: we no longer have an inner set of two axes, but two sets of one axis each. The result is more complex:

```
>>> simulation_builder = SimulationBuilder()
>>> simulation = simulation_builder.build_from_entities(tax_benefit_system, WITH_PERPENDICULAR_AXES)
>>> len(simulation.calculate('salary', '2018-11'))
64
```

Why? Because we are varying `age`

and `salary`

independently, and each axis results in multiplying by 4 our original population of 4, resulting in 4 times 4 times 4 individuals, which is indeed 64.

What does the data look like?

```
>>> numpy.reshape(simulation.calculate('salary', '2018-11'),(16,4))
array([[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[1000., 0., 0., 0.],
[1000., 0., 0., 0.],
[1000., 0., 0., 0.],
[1000., 0., 0., 0.],
[2000., 0., 0., 0.],
[2000., 0., 0., 0.],
[2000., 0., 0., 0.],
[2000., 0., 0., 0.],
[3000., 0., 0., 0.],
[3000., 0., 0., 0.],
[3000., 0., 0., 0.],
[3000., 0., 0., 0.]], dtype=float32)
>>> numpy.reshape(simulation.calculate('age', '2018-11'),(16,4))
array([[18, 0, 0, 0],
[38, 0, 0, 0],
[58, 0, 0, 0],
[78, 0, 0, 0],
[18, 0, 0, 0],
[38, 0, 0, 0],
[58, 0, 0, 0],
[78, 0, 0, 0],
[18, 0, 0, 0],
[38, 0, 0, 0],
[58, 0, 0, 0],
[78, 0, 0, 0],
[18, 0, 0, 0],
[38, 0, 0, 0],
[58, 0, 0, 0],
[78, 0, 0, 0]], dtype=int32)
```

We can see that both the `age`

and the `salary`

variables vary by increments, so that **all combinations are present**: for each `age`

increment, there exists a variant with each `salary`

increment.

This allows us to use OpenFisca for `multivariate observation`

: charting how two variables interact to control a third, as in this example.