COMPUTATIONAL COGNITIVE SCIENCE

Pulkit Goyal and Joel Vasama

Setup

Some Suggestions

from functools import lru_cache  # cache repeated per-participant model fits
from tqdm import tqdm            # progress bars for the long optimisation loops
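
The code in the rest of these notes also relies on the usual scientific-Python stack; the aliases below match the names (np, plt, st, smf, optimize) used throughout.

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st
import statsmodels.formula.api as smf
from scipy import optimize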

Decision Making

1. Non-Binary Value
❌ Accuracy Maximisation
2. Model Agnostic
❌ Probability Matching
❌ Sample-Based Rule
✔️ Softmax

+ Epsilon-Greedy Error Model

# Epsilon-greedy error model: with probability e respond at random (p = 0.5),
# otherwise follow the wrapped choice rule.
epsilon_greedy = lambda choice_function: lambda e: lambda *args, **kwargs: choice_function(*args, **kwargs) * (1 - e) + 0.5 * e

@epsilon_greedy
def softmax(s1, s2, *std):  # Softmax rule: logistic choice probability from the difference in values
    return 1 / (1 + np.e**(s1 - s2))
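
With these definitions, softmax(e) yields a concrete choice rule; for example:

pure = softmax(0)          # epsilon = 0: plain softmax rule (used as a default further below)
noisy = softmax(0.2)       # epsilon = 0.2: 20% of responses are random
p_pure = pure(0.3, 0.9)    # choice probability for subjective values 0.3 and 0.9
p_noisy = noisy(0.3, 0.9)  # same values, pulled towards 0.5 by the random component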

Scoring

  1. Highest (Bayesian) posterior probability ⇒ BIC.
  2. Highest generalization performance on unseen data ⇒ AIC.
  3. Highest compression of the observed data ⇒ MDL.
✔️ Aim → the best-fitting model, i.e. the highest posterior probability ⇒ BIC
# Bayesian Information Criterion:
# BIC = P * ln(N) - 2 * LL   (P parameters, N observations, LL maximised log-likelihood)
bic = lambda P, N, LL: P*np.log(N) - 2*LL
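
As a quick sanity check with hypothetical numbers, a one-parameter learner can be compared against a zero-parameter random guesser on the same N trials; the lower BIC wins:

N = 300                                    # hypothetical number of trials
bic_random  = bic(0, N, N * np.log(0.5))   # random guessing: P(correct) = 0.5 on every trial
bic_learner = bic(1, N, -190.0)            # one free parameter, hypothetical log-likelihood
print(bic_learner < bic_random)            # True: the learner is preferred despite the extra parameter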

Model Fitting

@lru_cache
def fit_model(participant_id,
              Model,
              model_params,
              choice_function=softmax(0),
              conditions=()):
    ...
def optimise(Model,
             model_search_space=(),
             search_algorithm=lambda *args, **kwargs: optimize.brute(*args, **kwargs, full_output=True, finish=None),
             error_search_space=(0, 1, 0.1),
             choice_function=softmax,
             xic=bic,
             conditions={}):
    ...
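
Neither function body is shown here; as a rough, hypothetical sketch, fit_model presumably accumulates the log-likelihood of one participant's choices under a given parameter setting, and optimise searches the parameter and epsilon grids for the values minimising the resulting BIC. The predict/update interface and trial layout below are assumptions, not the original implementation:

def fit_model_sketch(trials, Model, model_params, choice_function=softmax(0)):
    # Hypothetical: sum the log-likelihood of the observed choices under one parameter setting
    model = Model(*model_params)
    LL = 0.0
    for features, choice, reward in trials:     # assumed trial layout
        v0, v1 = model.predict(features)        # current value estimates for the two options
        p0 = choice_function(v1, v0)            # P(choose option 0) under the rule defined above
        LL += np.log(p0 if choice == 0 else 1 - p0)
        model.update(features, choice, reward)  # learn from the observed feedback
    return LL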

Comparison of Speed/Accuracy of Optimisation Techniques

# Grid Search (scipy.optimize.brute)
bics_rw_slow_2, optimal_model_params_rw_slow_2, optimal_error_rw_slow_2 = optimise(
    RescorlaWagner,
    model_search_space=((0, 1.1, 0.1),),
    error_search_space=(0, 1, 0.1),
    conditions={'slow': 1, 'num_features': 2},
    search_algorithm=lambda *args, **kwargs: optimize.brute(*args, **kwargs, full_output=True, finish=None),
)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [04:05<00:00, 12.94s/it]
# Nelder-Mead Simplex (scipy.optimize.minimize)
bics_rw_slow_2_nm, optimal_model_params_rw_slow_2_nm, optimal_error_rw_slow_2_nm = optimise(
    RescorlaWagner,
    model_search_space=(0.5,),
    error_search_space=0.1,
    conditions={'slow': 1, 'num_features': 2},
    search_algorithm=lambda *args, **kwargs: [optimize.minimize(*args, **kwargs, bounds=((0, 1), (0, 1)), method='Nelder-Mead')[key] for key in ['x', 'fun']]
)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [04:32<00:00, 14.34s/it]
np.mean(abs(bics_rw_slow_2_nm - bics_rw_slow_2))
0.4230812774305918
✔️ Grid Search
  • Simple / Easy to Implement / Available in Standard Libraries
  • Very Low Difference in Optimal Values

Data Pre-Processing

Outlier Detection

fig, ax = plt.subplots(1, 2, figsize=(8, 4))
boxplots([par_correct], ['Accuracy'], ax[0])
boxplots([par_RT], ['Reaction Time'], ax[1])
fig.suptitle('Outlier Detection');
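
boxplots is a small plotting helper not shown in these notes; a minimal stand-in could be:

def boxplots(datasets, labels, ax):
    # Minimal stand-in (assumption, not the original helper): one box per dataset
    ax.boxplot(datasets)
    ax.set_xticklabels(labels)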

Behavioral Analysis

Do participants perform better than chance?

One-Sample $t$-Test Against the 50% Chance Level (Participants Had Two Alternatives).
✔️ Statistically significant → Performance above chance level ($t$ = 9.75, $p$ < 0.05), with an average accuracy of 63.3%.
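
The test itself is one line with SciPy, assuming par_correct holds one mean accuracy per participant (as in the outlier plot above):

t_stat, p_value = st.ttest_1samp(par_correct, popmean=0.5)  # test against the 50% chance level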

Do they improve over trials within each round?

smf.ols(formula='correct ~ trial', ...).fit()
Linear Regression Analysis.
✔️ Statistically significant. Performance is better trial-wise within blocks ($\beta$ = 0.014, $R^2$ = 0.007, $p$ < 0.001), on average 1.4% better each trial after the first.
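
A sketch of the full call, assuming data is the trial-level DataFrame used elsewhere in these notes, with correct and trial columns:

fit = smf.ols(formula='correct ~ trial', data=data).fit()
beta, p = fit.params['trial'], fit.pvalues['trial']  # slope per trial and its p-value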

Do they improve over rounds?

smf.ols(formula='correct ~ task', ...).fit()
Linear Regression Analysis.
Statistically insignificant. Performance doesn't significantly improve over rounds ($p$ = 0.242).

How does the number of observed features affect performance?

smf.ols(formula='correct ~ num_features', ...).fit()
Linear Regression Analysis.
✔️ Statistically significant. Performance decreases with additional features ($\beta$ = -0.043, $R^2$ = 0.005, $p$ < 0.001), with the addition of each feature decreasing accuracy on average by 4.3%.

How does time pressure affect performance?

$t$-Test for Paired Samples.
✔️ Statistically significant. Performance is better in slow tasks than in fast tasks ($df$ = 17, $t$ = -3.5, $p$ < 0.001), with participants performing on average 4.5% better in slow trials.
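
A sketch of the paired test, assuming hypothetical arrays par_correct_slow and par_correct_fast with one mean accuracy per participant in each condition:

# Paired-samples t-test: the same participants under slow vs fast time pressure
t_stat, p_value = st.ttest_rel(par_correct_slow, par_correct_fast)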

Does RT affect performance?

smf.ols(formula='correct ~ num_features * C(slow) * decision_time', ...).fit()
Linear Regression Analysis.
✔️ Statistically significant. Higher reaction time generally resulted in better accuracy ($\beta$ = 0.00005, $R$ = 0.011, $p$ < 0.001), with a 0.5% increase in accuracy for every 100ms.
st.pearsonr(par_correct, par_RT);
Pearson Correlation Analysis.
Statistically insignificant. In contrast to the regression above, a Pearson correlation test suggested no statistically significant linear relationship between accuracy and reaction time ($r$ = -0.43, $p$ = 0.07).

Parameter Fitting

Look at the resulting parameters and try to answer the following questions

Rescorla-Wagner

bics_rw_slow, optimal_model_params_rw_slow, optimal_error_rw_slow = optimise(RescorlaWagner, ((0, 1, 0.1),), conditions={'slow': 1})
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [10:08<00:00, 32.03s/it]
bics_rw_fast, optimal_model_params_rw_fast, optimal_error_rw_fast = optimise(RescorlaWagner, ((0, 1, 0.1),), conditions={'slow': 0})
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [10:09<00:00, 32.07s/it]

Does time pressure lead to slower learning?

Compare learning rate in slow vs fast tasks.

st.ttest_rel(optimal_model_params_rw_slow[:18, 0], optimal_model_params_rw_fast[:18, 0])
$t$-Test for Paired Samples.
Statistically insignificant. Time pressure doesn't significantly affect the learning rate ($p$ = 0.862).

Does time pressure lead to more noisy decisions?

Compare $\epsilon$-error parameter in slow vs fast tasks.

st.ttest_rel(optimal_error_rw_slow[:18], optimal_error_rw_fast[:18])
$t$-Test for Paired Samples.
Statistically insignificant. Time pressure doesn't significantly affect the noisiness of decisions ($p$ = 0.195).

Does time pressure lead to more noisy decisions for a low number of features?
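
optimal_error_rw_fast_2 below is assumed to come from the analogous fit for the fast, two-feature condition, e.g.:

bics_rw_fast_2, optimal_model_params_rw_fast_2, optimal_error_rw_fast_2 = optimise(
    RescorlaWagner,
    model_search_space=((0, 1.1, 0.1),),
    error_search_space=(0, 1, 0.1),
    conditions={'slow': 0, 'num_features': 2},
)
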
st.ttest_rel(optimal_error_rw_slow_2[:18], optimal_error_rw_fast_2[:18])
$t$-Test for Paired Samples.
✔️ Statistically significant. Time pressure affects noisiness of decisions for two features ($df$ = 17, $t$ = -2.667, $p$ = 0.016).

Kalman Filter

bics_kf_slow, optimal_model_params_kf_slow, optimal_error_kf_slow = optimise(KalmanFilter, ((0.1, 1, 0.1), (0.1, 1, 0.1)), conditions={'slow': 1})
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [1:23:57<00:00, 265.11s/it]
bics_kf_fast, optimal_model_params_kf_fast, optimal_error_kf_fast = optimise(KalmanFilter, ((0.1, 1, 0.1), (0.1, 1, 0.1)), conditions={'slow': 0})
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [1:23:01<00:00, 262.18s/it]

Does time pressure lead to slower learning?

Compare $\sigma_y$ in slow vs fast tasks.

st.ttest_rel(optimal_model_params_kf_slow[:18, 0], optimal_model_params_kf_fast[:18, 0])
$t$-Test for Paired Samples.
Statistically insignificant. Time pressure doesn't significantly affect learning ($p$ = 0.810).

Does time pressure lead to more noisy decisions?

Compare $\epsilon$-error parameter in slow vs fast tasks.

st.ttest_rel(optimal_error_kf_slow[:18], optimal_error_kf_fast[:18])
$t$-Test for Paired Samples.
Statistically insignificant. Time pressure doesn't significantly affect the noisiness of decisions ($p$ = 0.129).

Random Model

# Random model: 0 free parameters, P(correct) = 0.5 on every trial, so LL = N * ln(0.5)
# bics_rm = np.array([bic(0, NUM_TRIALS*data['global_task'].nunique(), NUM_TRIALS*data['global_task'].nunique()*np.log(0.5))] * NUM_PARTICIPANTS)
bics_rm_slow = np.array([bic(0, NUM_TRIALS*data[data['slow'] == 1]['global_task'].nunique(), NUM_TRIALS*data[data['slow'] == 1]['global_task'].nunique()*np.log(0.5))] * NUM_PARTICIPANTS)
bics_rm_fast = np.array([bic(0, NUM_TRIALS*data[data['slow'] == 0]['global_task'].nunique(), NUM_TRIALS*data[data['slow'] == 0]['global_task'].nunique()*np.log(0.5))] * NUM_PARTICIPANTS)

Model Comparison

Does time pressure lead to simpler learning strategies?

Increased Time Pressure → Increased Random Decisions

Which model explains the human data best?

Rescorla-Wagner > Random Model >> Kalman Filter
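
One way to read off such a ranking, assuming the BIC arrays fitted above, is to sum BIC over participants and the slow/fast conditions (lower total BIC is better):

totals = {
    'Rescorla-Wagner': np.sum(bics_rw_slow) + np.sum(bics_rw_fast),
    'Random':          np.sum(bics_rm_slow) + np.sum(bics_rm_fast),
    'Kalman Filter':   np.sum(bics_kf_slow) + np.sum(bics_kf_fast),
}
print(sorted(totals, key=totals.get))  # best-fitting model first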

Participant 18

Kalman Filter atypically performs best!
  • Participant 18 was in fact a bot that used a Kalman Filter to learn.
BIC score is not as low as expected for an ideal learner?!
  • Some artificial randomness was added for realism; also, $\epsilon \neq 0$.
Parameters   $\sigma_y$   $\sigma_w$
Actual          0.1          1.0
Recovered       0.1          0.9

Additional Analysis

Does increasing the number of features lead to slower learning in regular (slow) tasks?

smf.mixedlm(formula='learning_rate ~ num_features', groups='participant_id', ...).fit()
Linear Mixed Effects Model Analysis.
Statistically insignificant. The number of features has no significant effect on the learning rate ($\beta$ = -0.072, Marginal $R^2$ = 0.057, $p$ = 0.073), though the marginal trend hints that more features might lead to slower learning.
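
A sketch of the full mixed-model call, assuming a hypothetical long-format DataFrame lr_long with one row per participant and feature condition (columns participant_id, num_features, learning_rate):

mixed = smf.mixedlm('learning_rate ~ num_features',
                    data=lr_long,
                    groups=lr_long['participant_id']).fit()
print(mixed.summary())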

Does increasing the number of features lead to slower learning under time pressure (in fast tasks)?

smf.mixedlm(formula='learning_rate ~ num_features', groups='participant_id', ...).fit()
Linear Mixed Effects Model Analysis.
Statistically insignificant. The number of features doesn't significantly affect learning ($p$ = 0.632).

Does increasing the number of features lead to noisy decisions in regular (slow) tasks?

fig, ax = plt.subplots(1,2, figsize=(16, 4), sharey=True)

fig.suptitle(r'Optimal Error Weight in $\epsilon$-Greedy Error Model Across Participants (Slow Tasks)')
ax[0].bar(PARTICIPANTS - 0.23, optimal_error_rw_slow_2, 0.23, label='2', alpha=0.8)
ax[0].bar(PARTICIPANTS, optimal_error_rw_slow_3, 0.23, label='3', alpha=0.8)
ax[0].bar(PARTICIPANTS + 0.23, optimal_error_rw_slow_4, 0.23, label='4', alpha=0.8)
ax[0].set_xticks(PARTICIPANTS)
ax[0].legend()
ax[0].set_xlabel('Participant ID')
ax[0].set_ylabel('Optimal Error Weight')

boxplots([optimal_error_rw_slow_2, optimal_error_rw_slow_3, optimal_error_rw_slow_4], ('2', '3', '4'), ax[1])
ax[1].set_xlabel('Number of Features')
plt.tight_layout()
smf.mixedlm(formula='epsilon_error ~ num_features', groups='participant_id', ...).fit()
Linear Mixed Effects Model Analysis.
✔️ Statistically significant. Number of features affects noisiness of decisions ($\beta$ = 0.119, Marginal $R^2$ = 0.146, $p$ = 0.005), with the addition of each feature increasing noisiness on average by 0.119.
Tukey's Honestly Significant Difference Test
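A sketch of how the table below can be produced, assuming hypothetical long-format arrays epsilon_errors (one fitted $\epsilon$ per participant and feature condition) and num_features_labels (the matching condition labels):

from statsmodels.stats.multicomp import pairwise_tukeyhsd

tukey = pairwise_tukeyhsd(endog=epsilon_errors, groups=num_features_labels, alpha=0.05)
print(tukey.summary())
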
Multiple Comparison of Means - Tukey HSD, FWER=0.05
group1  group2  meandiff  p-adj    lower    upper   reject
2       3        0.2889   0.0055    0.075   0.5028  True
2       4        0.2389   0.0252    0.025   0.4528  True
3       4       -0.05     0.8396   -0.2639  0.1639  False
Participants make more random decisions when there are more than two features in regular (slow) tasks.

Does increasing the number of features lead to noisy decisions under time pressure (in fast tasks)?

smf.mixedlm(formula='epsilon_error ~ num_features', groups='participant_id', ...).fit()
Linear Mixed Effects Model Analysis.
Statistically insignificant. The number of features has no significant effect on the noisiness of decisions ($\beta$ = 0.086, Marginal $R^2$ = 0.075, $p$ = 0.057), though the marginal trend hints that more features might lead to noisier decisions.

Conclusions

Performance above chance level ($t$ = 9.75, $p$ < 0.05), with an average accuracy of 63.3%.
Performance improves over trials within task ($\beta$ = 0.014, $R^2$ = 0.007, $p$ < 0.001), on average 1.4% better each trial after the first.
Performance doesn't significantly improve over rounds ($p$ = 0.242).
Higher reaction time generally resulted in better accuracy ($\beta$ = 0.00005, $R$ = 0.011, $p$ < 0.001), with a 0.5% increase in accuracy for every 100ms.

But, an additional Pearson's correlation test suggested no statistically significant relationship ($r$ = -0.43, $p$ = 0.07).

Increased Time Pressure → Increased Random Decisions
Rescorla-Wagner > Random Model >> Kalman Filter

Performance is better in slow tasks than in fast tasks (paired samples, $t(17)$ = -3.5, $p$ < 0.001), with participants performing on average 4.5% better in slow trials.
Time pressure significantly affects noisiness of decisions for two features (paired-samples, $t(17)$ = -2.667, $p$ = 0.016).
Performance decreases with additional features ($\beta$ = -0.043, $R^2$ = 0.005, $p$ < 0.001), with the addition of each feature decreasing accuracy on average by 4.3%.
Increasing features might lead to slower learning in regular (slow) tasks ($\beta$ = -0.072, Marginal $R^2$ = 0.057, $p$ = 0.073).
Increasing features leads to noisier decisions in regular (slow) tasks ($\beta$ = 0.119, Marginal $R^2$ = 0.146, $p$ = 0.005), with the addition of each feature increasing noisiness on average by 0.119.
Increasing features might lead to noisier decisions under time pressure (in fast tasks) ($\beta$ = 0.086, Marginal $R^2$ = 0.075, $p$ = 0.057).

Potential Improvements

  • Modelling Improvements
    • More advanced models could be added (Heuristics, Neural Networks, Exemplar-Based Models, Gaussian Processes).
    • Compare different decision-making rules (Accuracy Maximisation, Probability Matching, Sample-Based Rules, Drift Diffusion Models, Resource-Rational Models).
    • Increase search space for parameter fitting.
  • Other Analyses
    • Does doing slow tasks first lead to better performance in subsequent fast tasks?
    • ...

References

Binz, M., Gershman, S. J., Schulz, E. and Endres, D., 2022. Heuristics from bounded meta-learned inference. Psychological Review.

Schulz, E., Speekenbrink, M. and Krause, A., 2018. A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functions. Journal of Mathematical Psychology, 85, pp.1-16.

Wu, C. M., Schulz, E., Gerbaulet, K., Pleskac, T. J. and Speekenbrink, M., 2019. Under pressure: The influence of time limits on human exploration. Proceedings of the 41st Annual Conference of the Cognitive Science Society.

Questions?

Thank You!