Merged
3 changes: 2 additions & 1 deletion .gitignore
@@ -8,4 +8,5 @@ doc/build/
Speed_test.ipynb
*.prof
.DS_Store
vault_migration/
vault_migration/
AGENTS.md
119 changes: 119 additions & 0 deletions CONTINUOUS_EXPERIENCE_CHANGES.md
@@ -0,0 +1,119 @@
# Continuous Experience Refactor (Summary)

This document summarizes the changes applied so far to migrate `soepy` from a
(discrete) full-time/part-time experience state representation to a single
continuous experience stock.

## Model / State Space

- Discrete experience dimensions removed.
- New discrete state vector layout (6 columns):
  - `period`
  - `educ_level`
  - `lagged_choice`
  - `type`
  - `age_youngest_child`
  - `partner`
- Column indices are centralized in `soepy/shared/state_space_indices.py`.

Files:
- `soepy/shared/state_space_indices.py`
- `soepy/solve/create_state_space.py`
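A minimal NumPy sketch of how named column indices replace magic numbers (the old code used `states[:, 6]`). It mirrors the new `define_child_age_update_rule` in `soepy/exogenous_processes/children.py` from this PR; the helper name `child_age_no_birth` and the toy `states` array are illustrative assumptions, and the index values follow the 6-column layout above.

```python
import numpy as np

# Column indices for the 6-column state vector listed above.
PERIOD, EDUC_LEVEL, LAGGED_CHOICE, TYPE, AGE_YOUNGEST_CHILD, PARTNER = range(6)


def child_age_no_birth(states, child_age_max):
    """Hypothetical helper: next-period age of the youngest child if no child arrives.

    -1 encodes "no (young) child"; mirrors define_child_age_update_rule.
    """
    out = np.full(states.shape[0], -1, dtype=np.int32)
    has_kid = states[:, AGE_YOUNGEST_CHILD] != -1
    out[has_kid] = states[has_kid, AGE_YOUNGEST_CHILD] + 1
    # Beyond child_age_max the state is treated as having no young child.
    out[out > child_age_max] = -1
    return out


states = np.array(
    [
        [0, 0, 0, 0, -1, 0],  # no child
        [0, 1, 2, 0, 3, 1],  # youngest child aged 3
        [0, 2, 1, 1, 10, 0],  # youngest child at the age cap
    ]
)
print(child_age_no_birth(states, child_age_max=10))  # [-1  4 -1]
```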

## Continuous Experience Stock

- Experience is represented as a stock `x ∈ [0, 1]`.
- Period-specific scaling uses:
  - `max_exp_years(period) = init_exp_max + max(period, period * pt_increment)`
- Experience accumulation (years added per period):
  - non-employment: `+0`
  - part-time: `+pt_increment`
  - full-time: `+1`
- Expected vs. actual law of motion:
  - `is_expected=True`: `pt_increment = gamma_p_bias`
  - otherwise: `pt_increment = pt_exp_ratio`

Files:
- `soepy/shared/experience_stock.py`

## Interpolation

- Added a minimal 1D linear interpolation helper for values defined on a 1D grid.
- Guideline: avoid `jnp.asarray` inside JAX code; assume inputs are already JAX arrays.

Files:
- `soepy/shared/interpolation.py`
- `AGENTS.md`
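A NumPy sketch of a 1D linear interpolation helper of this kind, assuming an increasing grid and flat extrapolation outside it (both assumptions; the actual helper in `soepy/shared/interpolation.py` is written in JAX and may differ). Trailing axes on `values` are carried along, so one call can interpolate a whole `(n_grid, n_choices + 1)` slice.

```python
import numpy as np


def interp_1d(x, grid, values):
    """Hypothetical helper: linearly interpolate `values` (first axis on `grid`) at `x`.

    Points outside the grid are clipped to its ends (flat extrapolation, an assumption).
    """
    x = np.clip(np.asarray(x, dtype=float), grid[0], grid[-1])
    # Left neighbour index, kept in [0, n_grid - 2] so idx + 1 is always valid.
    idx = np.clip(np.searchsorted(grid, x, side="right") - 1, 0, grid.size - 2)
    x0, x1 = grid[idx], grid[idx + 1]
    w = (x - x0) / (x1 - x0)
    # Give the weight one trailing axis per extra axis of `values`.
    w = np.reshape(w, np.shape(w) + (1,) * (values.ndim - 1))
    return (1.0 - w) * values[idx] + w * values[idx + 1]


grid = np.linspace(0.0, 1.0, 5)
values = np.stack([3.0 * grid + 1.0, grid], axis=1)  # shape (n_grid, 2)
print(interp_1d(0.3, grid, values))  # approximately [1.9, 0.3]
```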

## Continuation Values: Interpolate Then Aggregate

- Implemented the required ordering for the EMAX recursion:
  1. interpolate continuation values on the experience grid
  2. aggregate over child/partner probabilities
- The implementation is intentionally low-dimensional and kept readable by using `vmap`.

Files:
- `soepy/solve/continuous_continuation.py`
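The two-step ordering can be illustrated with a small NumPy function, a hypothetical stand-in for the `vmap`-based code in `soepy/solve/continuous_continuation.py`. It assumes that for a given state and choice the next-period stock `x_next` is known, and that `succ_idx`/`succ_prob` list the child/partner successor states and their probabilities; these argument names are illustrative. Each successor's values are interpolated at `x_next` first, and only then weighted by the transition probabilities.

```python
import numpy as np


def continuation_value(x_next, grid, emax_next, succ_idx, succ_prob):
    """Hypothetical helper: interpolate, then aggregate.

    emax_next: (n_states_next, n_grid) continuation values on the experience grid
    succ_idx:  (n_succ,) indices of child/partner successor states
    succ_prob: (n_succ,) their transition probabilities (summing to one)
    """
    # Step 1: interpolate each successor's values at the next-period stock.
    interpolated = np.array([np.interp(x_next, grid, emax_next[s]) for s in succ_idx])
    # Step 2: aggregate over child/partner outcomes.
    return np.dot(succ_prob, interpolated)


grid = np.linspace(0.0, 1.0, 3)
emax_next = np.array([[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]])
print(continuation_value(0.25, grid, emax_next, [0, 1], np.array([0.3, 0.7])))
# approximately 7.5
```

In JAX, the list comprehension would typically become a `jax.vmap` over the successor indices.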

## EMAX / Solve Output Shape

- Solver now produces `emaxs` with shape:
  - `(n_states, n_grid, n_choices + 1)`
- `n_grid` defaults to `10` via `model_spec.experience_grid_points`.

Files:
- `soepy/solve/solve_python.py`
- `soepy/solve/emaxs.py`

## Wages

- Wage equation is now continuous-experience only:
  - single return to experience (reusing `gamma_f` as the slope)
  - wage depends on `log(exp_years + 1)`
  - expectation bias is handled in experience accumulation (not in wages)

Files:
- `soepy/shared/wages.py`
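A sketch of a continuous-experience log wage evaluated on the grid, assuming the form `log w = const + slope * log(exp_years + 1)` with the stock converted to years via the period's `max_exp_years`. The additive form, the argument names and the state-by-grid layout are illustrative assumptions; the actual equation lives in `soepy/shared/wages.py`.

```python
import numpy as np


def log_wage_on_grid(const, slope, exp_grid, max_years):
    """Hypothetical helper: log wage for every (state, grid point) pair.

    const, slope: (n_states,) state-specific constant and single return to experience
    exp_grid:     (n_grid,) experience stock values in [0, 1]
    max_years:    scalar max_exp_years(period) for the current period
    """
    exp_years = exp_grid * max_years  # stock -> experience years
    return const[:, None] + slope[:, None] * np.log(exp_years + 1.0)[None, :]


const = np.array([2.0, 2.5])
slope = np.array([0.3, 0.4])
out = log_wage_on_grid(const, slope, np.linspace(0.0, 1.0, 10), max_years=6.0)
print(out.shape)  # (2, 10)
```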

## Non-employment / Resources

- `non_employment` functions were updated to broadcast correctly when the wage input
is on the experience grid (`(n_states, n_grid)`).

Files:
- `soepy/shared/non_employment.py`
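The broadcasting issue is the usual NumPy one: a per-state vector of shape `(n_states,)` does not line up with a wage array of shape `(n_states, n_grid)` until it gains a trailing axis. A minimal illustration, with a hypothetical `per_state_transfer` standing in for whatever state-specific term the non-employment functions combine with the wage:

```python
import numpy as np

n_states, n_grid = 4, 10
wage = np.ones((n_states, n_grid))  # wage on the experience grid
per_state_transfer = np.arange(n_states, dtype=float)  # hypothetical (n_states,) term

try:
    wage + per_state_transfer  # NumPy aligns trailing axes: 10 vs 4 -> error
except ValueError as err:
    print("broadcast error:", err)

resources = wage + per_state_transfer[:, None]  # (n_states, 1) broadcasts over the grid
print(resources.shape)  # (4, 10)
```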

## Simulation (Refactor: Continuous Stock)

- Simulation state now carries `Experience_Stock` instead of PT/FT experience.
- `emaxs` and wage/resources are interpolated from the grid to each agent’s stock.
- Initial experience years are drawn from legacy PT/FT share files by convolution,
  then mapped to the stock.
- Initial `lagged_choice` rule:
  - `2` if initial experience years `> 1`, else `0`.

Files:
- `soepy/simulate/constants_sim.py`
- `soepy/simulate/simulate_auxiliary.py`
- `soepy/simulate/simulate_python.py`
- `soepy/exogenous_processes/experience.py`
- `soepy/exogenous_processes/determine_lagged_choice.py`
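The initial-condition steps can be sketched as follows. Convolving the FT and PT year distributions yields the distribution of their sum when the two are treated as independent, which mirrors `gen_prob_init_exp_years_vector` in this PR. Drawing with a NumPy `Generator` and the function names are illustrative assumptions; the `lagged_choice` rule is the one stated above.

```python
import numpy as np


def init_years_distribution(p_ft, p_pt, init_exp_max):
    """Hypothetical helper: distribution over total years 0..2*init_exp_max."""
    p = np.convolve(p_ft, p_pt)[: 2 * init_exp_max + 1]
    return p / p.sum()


def draw_initial_conditions(p_years, n_agents, rng):
    """Hypothetical helper: draw initial years and apply the lagged_choice rule."""
    years = rng.choice(p_years.size, size=n_agents, p=p_years)
    lagged_choice = np.where(years > 1, 2, 0)  # 2 if years > 1, else 0
    return years, lagged_choice


p_years = init_years_distribution([0.5, 0.5], [0.5, 0.5], init_exp_max=1)
print(p_years)  # [0.25 0.5  0.25]
years, lagged_choice = draw_initial_conditions(p_years, 5, np.random.default_rng(0))
```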

## Tests

Added / updated (continuous-only):
- `soepy/test/test_experience_stock.py`
- `soepy/test/test_interpolation.py`
- `soepy/test/test_continuous_continuation.py`
- `soepy/test/test_full_solve_continuous.py` (5-period full solve vs explicit reference DP)
- `soepy/test/test_child_index.py` (child transition indexer consistency)

Adjusted/skipped because they depended on discrete experience regression targets:
- `soepy/test/test_regression.py`
- `soepy/test/test_single_woman.py`

## Notes / Follow-ups

- CI-level checks are intentionally left to the user (`pytest`, `pre-commit`).
- Legacy regression-vault expectations are not comparable after this refactor; they
need to be regenerated under the continuous model if you want regression testing.
40 changes: 12 additions & 28 deletions soepy/exogenous_processes/children.py
@@ -3,34 +3,24 @@
import numpy as np
import pandas as pd

from soepy.shared.constants_and_indices import AGE_YOUNGEST_CHILD


def define_child_age_update_rule(model_spec, states):
    """Defines a vector with the length of the number of states that contains the
    value the state space component `age_kid` should take depending on whether or not
    a child arrives in the period.
    The purpose of this object is to facilitate easy child-parent state look-up
    in the backward induction."""

    # Child arrives is equivalent to new child age = 0
    # If no child arrives we need to specify an update rule

    # Age stays at -1 if no kids so far
    child_age_update_rule = np.full(states.shape[0], -1)
    # Age increases by one, if there is a kid
    child_age_update_rule[states[:, 6] != -1] = states[states[:, 6] != -1][:, 6] + 1
    # Age does not exceed 10. We assume that the moment the youngest child reaches age 10
    # individuals behave as if they do not have children
    child_age_update_rule[child_age_update_rule > model_spec.child_age_max] = -1
    """Define next-period child age under the no-new-child branch."""

    child_age_update_rule = np.full(states.shape[0], -1, dtype=np.int32)

    has_kid = states[:, AGE_YOUNGEST_CHILD] != -1
    child_age_update_rule[has_kid] = states[has_kid][:, AGE_YOUNGEST_CHILD] + 1

    child_age_update_rule[child_age_update_rule > model_spec.child_age_max] = -1
    return child_age_update_rule


def gen_prob_child_vector(model_spec):
    """Generates a vector with length `num_periods` which contains
    the probability to get a child in the corresponding period."""
    """Generate probability of childbirth for each period and lagged choice."""

    # Read data frame with information on probability to get a child
    # in every period
    exog_child_info_df = pd.read_pickle(model_spec.child_info_file_name)

    exog_child_info_df = exog_child_info_df.iloc[
@@ -46,18 +36,12 @@ def gen_prob_child_vector(model_spec):
        0 : min(model_spec.last_child_bearing_period + 1, model_spec.num_periods)
    ]

    # Assert length of array equals num periods
    assert (
        len(prob_child) == model_spec.num_periods
    ), "Probability of childbirth and number of periods length mismatch"

    assert len(prob_child) == model_spec.num_periods
    return prob_child


def gen_prob_child_init_age_vector(model_spec):
    """Generates a list of lists containing the shares of individuals with
    kids aged -1 (no kids), 0, 1, 2, 3, and 4 in the model's first period.
    Shares differ by the level of education of the individuals."""
    """Generate shares of initial child ages by education level."""

    child_age_shares = pd.read_pickle(model_spec.child_age_shares_file_name)

27 changes: 0 additions & 27 deletions soepy/exogenous_processes/determine_lagged_choice.py

This file was deleted.

47 changes: 44 additions & 3 deletions soepy/exogenous_processes/experience.py
@@ -1,9 +1,25 @@
"""This module reads in information on probabilities to have accumulated experience
in part-time and/or full-time work before the model entry/initial age."""
"""Initial experience distributions.

The continuous-experience refactor uses a single experience stock. For simulation and
initial conditions we still need distributions over *experience years*.

We keep the legacy inputs (separate PT and FT experience share files) but expose them
explicitly:

- ``gen_prob_init_exp_component_vector`` reads one share file and returns a
  distribution over years 0..init_exp_max by education.
- ``gen_prob_init_exp_years_vector`` combines PT and FT distributions via
  convolution to obtain a distribution over total experience years
  0..(2*init_exp_max) by education.

No implicit defaults: the model spec must define
``ft_exp_shares_file_name`` and ``pt_exp_shares_file_name``.
"""
import numpy as np
import pandas as pd


def gen_prob_init_exp_vector(model_spec, model_spec_exp_file_key):
def gen_prob_init_exp_component_vector(model_spec, model_spec_exp_file_key):
    """Generates a list of lists containing the shares of individuals with
    ft/pt experience of 0, 1, 2, 3, and 4 years in the model's first period.
    Shares differ by the level of education of the individuals."""
@@ -19,3 +35,28 @@ def gen_prob_init_exp_vector(model_spec, model_spec_exp_file_key):
        init_exp.append(exp_shares_list)

    return init_exp


def gen_prob_init_exp_years_vector(model_spec):
    """Generate distribution over total initial experience years by education."""

    prob_ft = gen_prob_init_exp_component_vector(
        model_spec, model_spec.ft_exp_shares_file_name
    )
    prob_pt = gen_prob_init_exp_component_vector(
        model_spec, model_spec.pt_exp_shares_file_name
    )

    max_years = 2 * model_spec.init_exp_max
    out = []

    for educ_level in range(model_spec.num_educ_levels):
        p_ft = np.asarray(prob_ft[educ_level], dtype=float)
        p_pt = np.asarray(prob_pt[educ_level], dtype=float)

        p = np.convolve(p_ft, p_pt)
        p = p[: max_years + 1]
        p = p / p.sum()
        out.append(p.tolist())

    return out
17 changes: 12 additions & 5 deletions soepy/pre_processing/model_processing.py
@@ -1,6 +1,7 @@
import collections
import copy

import jax.numpy as jnp
import numpy as np
import pandas as pd
import yaml
@@ -84,20 +85,23 @@ def group_parameters(model_params_dict_expanded):

    for category, param in [
        ("const_wage_eq", "gamma_0"),
        ("exp_returns_f", "gamma_f"),
        ("exp_returns_p", "gamma_p"),
        ("exp_returns_p_bias", "gamma_p_bias"),
        ("exp_return", "gamma_1"),
        ("exp_increase_p", "gamma_p"),
    ]:
        model_params_dict_flat[param] = np.zeros(
            len(model_params_dict_expanded["const_wage_eq"]), dtype=float
            len(model_params_dict_expanded[category])
        )

        for educ_ind, educ_type in enumerate(["low", "middle", "high"]):
            model_params_dict_flat[param][educ_ind] = np.array(
                model_params_dict_expanded[category][f"{param}_{educ_type}"],
                dtype=float,
            )

    # Additional part-time increment for mothers of small children (not education-specific).
    model_params_dict_flat["gamma_p_mom"] = float(
        model_params_dict_expanded["exp_increase_p_mom"]["gamma_p_mom"]
    )

    for key_ in list(model_params_dict_expanded["disutil_work"].keys()):
        if "child" in key_:
            model_params_dict_flat[key_] = model_params_dict_expanded["disutil_work"][
@@ -168,6 +172,9 @@ def read_model_spec_init(model_spec_init_dict, model_params):

    model_spec_dict_flat = flatten_model_spec_dict(model_spec_dict_expand)

    # Continuous experience grid (required input).
    model_spec_dict_flat["exp_grid"] = jnp.asarray(model_spec_init["exp_grid"])

    model_spec = dict_to_namedtuple_spec(model_spec_dict_flat)

    return model_spec
5 changes: 4 additions & 1 deletion soepy/pre_processing/tax_and_transfers_params.py
@@ -1,3 +1,4 @@
import jax.numpy as jnp
import numpy as np

M_FACTOR = 4.3
@@ -76,7 +77,9 @@ def create_child_care_costs(model_dict):
    child_care_costs[2, :] = model_dict["TAXES_TRANSFERS"]["child_care_costs"][
        "3_to_6"
    ]
    model_dict["TAXES_TRANSFERS"]["child_care_costs"] = child_care_costs / M_FACTOR
    model_dict["TAXES_TRANSFERS"]["child_care_costs"] = jnp.asarray(
        child_care_costs / M_FACTOR
    )
    return model_dict


27 changes: 27 additions & 0 deletions soepy/shared/constants_and_indices.py
@@ -0,0 +1,27 @@
"""This module contains constants, indices etc. which are constant throughout the project.

State vector layout (columns):

0. period
1. educ_level
2. lagged_choice
3. type
4. age_youngest_child
5. partner
"""
import numpy as np

# Set values of constants used across all modules here
MISSING_INT = -99
INVALID_FLOAT = -99.0
NUM_CHOICES = 3
# Hours worked per month
# Assumption: weekly working hours times 4.5 weeks in a month
HOURS = np.array([0, 18, 38])
PERIOD = 0
EDUC_LEVEL = 1
LAGGED_CHOICE = 2
TYPE = 3
AGE_YOUNGEST_CHILD = 4
PARTNER = 5
N_STATE_VARS = 6