COMPUTATIONAL COGNITIVE SCIENCE

Pulkit Goyal and Joel Vasama

Setup

Some Suggestions

from functools import lru_cache  # cache repeated per-participant model fits
from tqdm import tqdm            # progress bars for the long optimisation loops
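
The code in the rest of these notes also relies on the usual scientific-Python stack; the aliases below match the names (np, plt, st, smf, optimize) used throughout.

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st
import statsmodels.formula.api as smf
from scipy import optimize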

Decision Making

1. Non-Binary Value
❌ Accuracy Maximisation
2. Model Agnostic
❌ Probability Matching
❌ Sample-Based Rule
✔️ Softmax

+ Epsilon-Greedy Error Model

# Epsilon-greedy error model: with probability e respond at random (p = 0.5),
# otherwise follow the wrapped choice rule.
epsilon_greedy = lambda choice_function: lambda e: lambda *args, **kwargs: choice_function(*args, **kwargs) * (1 - e) + 0.5 * e

@epsilon_greedy
def softmax(s1, s2, *std):  # Softmax rule: logistic choice probability from the difference in values
    return 1 / (1 + np.e**(s1 - s2))
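
With these definitions, softmax(e) yields a concrete choice rule; for example:

pure = softmax(0)          # epsilon = 0: plain softmax rule (used as a default further below)
noisy = softmax(0.2)       # epsilon = 0.2: 20% of responses are random
p_pure = pure(0.3, 0.9)    # choice probability for subjective values 0.3 and 0.9
p_noisy = noisy(0.3, 0.9)  # same values, pulled towards 0.5 by the random component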

Scoring

  1. Highest (Bayesian) posterior probability ⇒ BIC.
  2. Highest generalization performance on unseen data ⇒ AIC.
  3. Highest compression of the observed data ⇒ MDL.
✔️ Aim → the best-fitting model, i.e. the highest posterior probability ⇒ BIC
# Bayesian Information Criterion:
# BIC = P * ln(N) - 2 * LL   (P parameters, N observations, LL maximised log-likelihood)
bic = lambda P, N, LL: P*np.log(N) - 2*LL
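
As a quick sanity check with hypothetical numbers, a one-parameter learner can be compared against a zero-parameter random guesser on the same N trials; the lower BIC wins:

N = 300                                    # hypothetical number of trials
bic_random  = bic(0, N, N * np.log(0.5))   # random guessing: P(correct) = 0.5 on every trial
bic_learner = bic(1, N, -190.0)            # one free parameter, hypothetical log-likelihood
print(bic_learner < bic_random)            # True: the learner is preferred despite the extra parameter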

Model Fitting

@lru_cache
def fit_model(participant_id,
              Model,
              model_params,
              choice_function=softmax(0),
              conditions=()):
    ...
def optimise(Model,
             model_search_space=(),
             search_algorithm=lambda *args, **kwargs: optimize.brute(*args, **kwargs, full_output=True, finish=None),
             error_search_space=(0, 1, 0.1),
             choice_function=softmax,
             xic=bic,
             conditions={}):
    ...
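
Neither function body is shown here; as a rough, hypothetical sketch, fit_model presumably accumulates the log-likelihood of one participant's choices under a given parameter setting, and optimise searches the parameter and epsilon grids for the values minimising the resulting BIC. The predict/update interface and trial layout below are assumptions, not the original implementation:

def fit_model_sketch(trials, Model, model_params, choice_function=softmax(0)):
    # Hypothetical: sum the log-likelihood of the observed choices under one parameter setting
    model = Model(*model_params)
    LL = 0.0
    for features, choice, reward in trials:     # assumed trial layout
        v0, v1 = model.predict(features)        # current value estimates for the two options
        p0 = choice_function(v1, v0)            # P(choose option 0) under the rule defined above
        LL += np.log(p0 if choice == 0 else 1 - p0)
        model.update(features, choice, reward)  # learn from the observed feedback
    return LL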

Comparison of Speed/Accuracy of Optimisation Techniques

# Grid Search (scipy.optimize.brute)
bics_rw_slow_2, optimal_model_params_rw_slow_2, optimal_error_rw_slow_2 = optimise(
    RescorlaWagner,
    model_search_space=((0, 1.1, 0.1),),
    error_search_space=(0, 1, 0.1),
    conditions={'slow': 1, 'num_features': 2},
    search_algorithm=lambda *args, **kwargs: optimize.brute(*args, **kwargs, full_output=True, finish=None),
)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [04:05<00:00, 12.94s/it]
# Nelder-Mead Simplex (scipy.optimize.minimize)
bics_rw_slow_2_nm, optimal_model_params_rw_slow_2_nm, optimal_error_rw_slow_2_nm = optimise(
    RescorlaWagner,
    model_search_space=(0.5,),
    error_search_space=0.1,
    conditions={'slow': 1, 'num_features': 2},
    search_algorithm=lambda *args, **kwargs: [optimize.minimize(*args, **kwargs, bounds=((0, 1), (0, 1)), method='Nelder-Mead')[key] for key in ['x', 'fun']]
)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [04:32<00:00, 14.34s/it]
np.mean(abs(bics_rw_slow_2_nm - bics_rw_slow_2))
0.4230812774305918
✔️ Grid Search
  • Simple / Easy to Implement / Available in Standard Libraries
  • Very Low Difference in Optimal Values

Data Pre-Processing

Outlier Detection

fig, ax = plt.subplots(1, 2, figsize=(8, 4))
boxplots([par_correct], ['Accuracy'], ax[0])
boxplots([par_RT], ['Reaction Time'], ax[1])
fig.suptitle('Outlier Detection');
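
boxplots is a small plotting helper not shown in these notes; a minimal stand-in could be:

def boxplots(datasets, labels, ax):
    # Minimal stand-in (assumption, not the original helper): one box per dataset
    ax.boxplot(datasets)
    ax.set_xticklabels(labels)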

Behavioral Analysis

Do participants perform better than chance?

One-Sample $t$-Test Against the 50% Chance Level (Participants Had Two Alternatives).
✔️ Statistically significant → Performance above chance level ($t$ = 9.75, $p$ < 0.05), with an average accuracy of 63.3%.
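
The test itself is one line with SciPy, assuming par_correct holds one mean accuracy per participant (as in the outlier plot above):

t_stat, p_value = st.ttest_1samp(par_correct, popmean=0.5)  # test against the 50% chance level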

Do they improve over trials within each round?

smf.ols(formula='correct ~ trial', ...).fit()
Linear Regression Analysis.
✔️ Statistically significant. Performance is better trial-wise within blocks ($\beta$ = 0.014, $R^2$ = 0.007, $p$ < 0.001), on average 1.4% better each trial after the first.
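
A sketch of the full call, assuming data is the trial-level DataFrame used elsewhere in these notes, with correct and trial columns:

fit = smf.ols(formula='correct ~ trial', data=data).fit()
beta, p = fit.params['trial'], fit.pvalues['trial']  # slope per trial and its p-value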

Do they improve over rounds?

smf.ols(formula='correct ~ task', ...).fit()
Linear Regression Analysis.
Statistically insignificant. Performance doesn't significantly improve over rounds ($p$ = 0.242).

How does the number of observed features affect performance?

smf.ols(formula='correct ~ num_features', ...).fit()
Linear Regression Analysis.
✔️ Statistically significant. Performance decreases with additional features ($\beta$ = -0.043, $R^2$ = 0.005, $p$ < 0.001), with the addition of each feature decreasing accuracy on average by 4.3%.

How does time pressure affect performance?

$t$-Test for Paired Samples.
✔️ Statistically significant. Performance is better in slow tasks than in fast tasks ($df$ = 17, $t$ = -3.5, $p$ < 0.001), with participants performing on average 4.5% better in slow trials.
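
A sketch of the paired test, assuming hypothetical arrays par_correct_slow and par_correct_fast with one mean accuracy per participant in each condition:

# Paired-samples t-test: the same participants under slow vs fast time pressure
t_stat, p_value = st.ttest_rel(par_correct_slow, par_correct_fast)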

Does RT affect performance?

smf.ols(formula='correct ~ num_features * C(slow) * decision_time', ...).fit()
Linear Regression Analysis.
✔️ Statistically significant. Higher reaction time generally resulted in better accuracy ($\beta$ = 0.00005, $R$ = 0.011, $p$ < 0.001), with a 0.5% increase in accuracy for every 100ms.
st.pearsonr(par_correct, par_RT);
Pearson Correlation Analysis.
Statistically insignificant. In contrast to the regression above, a Pearson correlation test suggested no statistically significant linear relationship between accuracy and reaction time ($r$ = -0.43, $p$ = 0.07).

Parameter Fitting

Look at the resulting parameters and try to answer the following questions

Rescorla-Wagner

bics_rw_slow, optimal_model_params_rw_slow, optimal_error_rw_slow = optimise(RescorlaWagner, ((0, 1, 0.1),), conditions={'slow': 1})
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [10:08<00:00, 32.03s/it]
bics_rw_fast, optimal_model_params_rw_fast, optimal_error_rw_fast = optimise(RescorlaWagner, ((0, 1, 0.1),), conditions={'slow': 0})
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [10:09<00:00, 32.07s/it]

Does time pressure lead to slower learning?

Compare learning rate in slow vs fast tasks.

st.ttest_rel(optimal_model_params_rw_slow[:18, 0], optimal_model_params_rw_fast[:18, 0])
$t$-Test for Paired Samples.
Statistically insignificant. Time pressure doesn't significantly affect the learning rate ($p$ = 0.862).

Does time pressure lead to more noisy decisions?

Compare $\epsilon$-error parameter in slow vs fast tasks.

st.ttest_rel(optimal_error_rw_slow[:18], optimal_error_rw_fast[:18])
$t$-Test for Paired Samples.
Statistically insignificant. Time pressure doesn't significantly affect the noisiness of decisions ($p$ = 0.195).

Does time pressure lead to more noisy decisions for a low number of features?
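
optimal_error_rw_fast_2 below is assumed to come from the analogous fit for the fast, two-feature condition, e.g.:

bics_rw_fast_2, optimal_model_params_rw_fast_2, optimal_error_rw_fast_2 = optimise(
    RescorlaWagner,
    model_search_space=((0, 1.1, 0.1),),
    error_search_space=(0, 1, 0.1),
    conditions={'slow': 0, 'num_features': 2},
)
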
st.ttest_rel(optimal_error_rw_slow_2[:18], optimal_error_rw_fast_2[:18])
$t$-Test for Paired Samples.
✔️ Statistically significant. Time pressure affects noisiness of decisions for two features ($df$ = 17, $t$ = -2.667, $p$ = 0.016).

Kalman Filter

bics_kf_slow, optimal_model_params_kf_slow, optimal_error_kf_slow = optimise(KalmanFilter, ((0.1, 1, 0.1), (0.1, 1, 0.1)), conditions={'slow': 1})
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [1:23:57<00:00, 265.11s/it]
bics_kf_fast, optimal_model_params_kf_fast, optimal_error_kf_fast = optimise(KalmanFilter, ((0.1, 1, 0.1), (0.1, 1, 0.1)), conditions={'slow': 0})
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [1:23:01<00:00, 262.18s/it]

Does time pressure lead to slower learning?

Compare $\sigma_y$ in slow vs fast tasks.

st.ttest_rel(optimal_model_params_kf_slow[:18, 0], optimal_model_params_kf_fast[:18, 0])
$t$-Test for Paired Samples.
Statistically insignificant. Time pressure doesn't significantly affect learning ($p$ = 0.810).

Does time pressure lead to more noisy decisions?

Compare $\epsilon$-error parameter in slow vs fast tasks.

st.ttest_rel(optimal_error_kf_slow[:18], optimal_error_kf_fast[:18])
$t$-Test for Paired Samples.
Statistically insignificant. Time pressure doesn't significantly affect the noisiness of decisions ($p$ = 0.129).

Random Model

# Random model: 0 free parameters, P(correct) = 0.5 on every trial, so LL = N * ln(0.5)
# bics_rm = np.array([bic(0, NUM_TRIALS*data['global_task'].nunique(), NUM_TRIALS*data['global_task'].nunique()*np.log(0.5))] * NUM_PARTICIPANTS)
bics_rm_slow = np.array([bic(0, NUM_TRIALS*data[data['slow'] == 1]['global_task'].nunique(), NUM_TRIALS*data[data['slow'] == 1]['global_task'].nunique()*np.log(0.5))] * NUM_PARTICIPANTS)
bics_rm_fast = np.array([bic(0, NUM_TRIALS*data[data['slow'] == 0]['global_task'].nunique(), NUM_TRIALS*data[data['slow'] == 0]['global_task'].nunique()*np.log(0.5))] * NUM_PARTICIPANTS)

Model Comparison

Does time pressure lead to simpler learning strategies?

Increased Time Pressure → Increased Random Decisions

Which model explains the human data best?

Rescorla-Wagner > Random Model >> Kalman Filter
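
One way to read off such a ranking, assuming the BIC arrays fitted above, is to sum BIC over participants and the slow/fast conditions (lower total BIC is better):

totals = {
    'Rescorla-Wagner': np.sum(bics_rw_slow) + np.sum(bics_rw_fast),
    'Random':          np.sum(bics_rm_slow) + np.sum(bics_rm_fast),
    'Kalman Filter':   np.sum(bics_kf_slow) + np.sum(bics_kf_fast),
}
print(sorted(totals, key=totals.get))  # best-fitting model first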

Participant 18

Kalman Filter atypically performs best!
  • Participant 18 was in fact a bot that used a Kalman Filter to learn.
BIC score is not as low as expected for an ideal learner?!
  • Some artificial randomness was added for realism; also, $\epsilon \neq 0$.
Parameters   $\sigma_y$   $\sigma_w$
Actual          0.1          1.0
Recovered       0.1          0.9

Additional Analysis

Does increasing the number of features lead to slower learning in regular (slow) tasks?

smf.mixedlm(formula='learning_rate ~ num_features', groups='participant_id', ...).fit()
Linear Mixed Effects Model Analysis.
Statistically insignificant. The number of features has no significant effect on the learning rate ($\beta$ = -0.072, Marginal $R^2$ = 0.057, $p$ = 0.073), though the marginal trend hints that more features might lead to slower learning.
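
A sketch of the full mixed-model call, assuming a hypothetical long-format DataFrame lr_long with one row per participant and feature condition (columns participant_id, num_features, learning_rate):

mixed = smf.mixedlm('learning_rate ~ num_features',
                    data=lr_long,
                    groups=lr_long['participant_id']).fit()
print(mixed.summary())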

Does increasing the number of features lead to slower learning under time pressure (in fast tasks)?

smf.mixedlm(formula='learning_rate ~ num_features', groups='participant_id', ...).fit()
Linear Mixed Effects Model Analysis.
Statistically insignificant. The number of features doesn't significantly affect learning ($p$ = 0.632).

Does increasing the number of features lead to noisy decisions in regular (slow) tasks?

fig, ax = plt.subplots(1,2, figsize=(16, 4), sharey=True)

fig.suptitle(r'Optimal Error Weight in $\epsilon$-Greedy Error Model Across Participants (Slow Tasks)')
ax[0].bar(PARTICIPANTS - 0.23, optimal_error_rw_slow_2, 0.23, label='2', alpha=0.8)
ax[0].bar(PARTICIPANTS, optimal_error_rw_slow_3, 0.23, label='3', alpha=0.8)
ax[0].bar(PARTICIPANTS + 0.23, optimal_error_rw_slow_4, 0.23, label='4', alpha=0.8)
ax[0].set_xticks(PARTICIPANTS)
ax[0].legend()
ax[0].set_xlabel('Participant ID')
ax[0].set_ylabel('Optimal Error Weight')

boxplots([optimal_error_rw_slow_2, optimal_error_rw_slow_3, optimal_error_rw_slow_4], ('2', '3', '4'), ax[1])
ax[1].set_xlabel('Number of Features')
plt.tight_layout()
smf.mixedlm(formula='epsilon_error ~ num_features', groups='participant_id', ...).fit()
Linear Mixed Effects Model Analysis.
✔️ Statistically significant. Number of features affects noisiness of decisions ($\beta$ = 0.119, Marginal $R^2$ = 0.146, $p$ = 0.005), with the addition of each feature increasing noisiness on average by 0.119.
Tukey's Honestly Significant Difference Test
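A sketch of how the table below can be produced, assuming hypothetical long-format arrays epsilon_errors (one fitted $\epsilon$ per participant and feature condition) and num_features_labels (the matching condition labels):

from statsmodels.stats.multicomp import pairwise_tukeyhsd

tukey = pairwise_tukeyhsd(endog=epsilon_errors, groups=num_features_labels, alpha=0.05)
print(tukey.summary())
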
Multiple Comparison of Means - Tukey HSD, FWER=0.05
group1  group2  meandiff  p-adj    lower    upper   reject
2       3        0.2889   0.0055    0.075   0.5028  True
2       4        0.2389   0.0252    0.025   0.4528  True
3       4       -0.05     0.8396   -0.2639  0.1639  False
Participants make more random decisions when there are more than two features in regular (slow) tasks.

Does increasing the number of features lead to noisy decisions under time pressure (in fast tasks)?

smf.mixedlm(formula='epsilon_error ~ num_features', groups='participant_id', ...).fit()
Linear Mixed Effects Model Analysis.
Statistically insignificant. The number of features has no significant effect on the noisiness of decisions ($\beta$ = 0.086, Marginal $R^2$ = 0.075, $p$ = 0.057), though the marginal trend hints that more features might lead to noisier decisions.

Conclusions

Performance above chance level ($t$ = 9.75, $p$ < 0.05), with an average accuracy of 63.3%.
Performance improves over trials within task ($\beta$ = 0.014, $R^2$ = 0.007, $p$ < 0.001), on average 1.4% better each trial after the first.
Performance doesn't significantly improve over rounds ($p$ = 0.242).
Higher reaction time generally resulted in better accuracy ($\beta$ = 0.00005, $R$ = 0.011, $p$ < 0.001), with a 0.5% increase in accuracy for every 100ms.

But, an additional Pearson's correlation test suggested no statistically significant relationship ($r$ = -0.43, $p$ = 0.07).

Increased Time Pressure → Increased Random Decisions
Rescorla-Wagner > Random Model >> Kalman Filter

Performance is better in slow tasks than in fast tasks (paired samples, $t(17)$ = -3.5, $p$ < 0.001), with participants performing on average 4.5% better in slow trials.
Time pressure significantly affects noisiness of decisions for two features (paired-samples, $t(17)$ = -2.667, $p$ = 0.016).
Performance decreases with additional features ($\beta$ = -0.043, $R^2$ = 0.005, $p$ < 0.001), with the addition of each feature decreasing accuracy on average by 4.3%.
Increasing features might lead to slower learning in regular (slow) tasks ($\beta$ = -0.072, Marginal $R^2$ = 0.057, $p$ = 0.073).
Increasing features leads to noisier decisions in regular (slow) tasks ($\beta$ = 0.119, Marginal $R^2$ = 0.146, $p$ = 0.005), with the addition of each feature increasing noisiness on average by 0.119.
Increasing features might lead to noisier decisions under time pressure (in fast tasks) ($\beta$ = 0.086, Marginal $R^2$ = 0.075, $p$ = 0.057).

Potential Improvements

  • Modelling Improvements
    • More advanced models could be added (Heuristics, Neural Networks, Exemplar-Based Models, Gaussian Processes).
    • Compare different decision-making rules (Accuracy Maximisation, Probability Matching, Sample-Based Rules, Drift Diffusion Models, Resource-Rational Models).
    • Increase search space for parameter fitting.
  • Other Analyses
    • Does doing slow tasks first lead to better performance in subsequent fast tasks?
    • ...

References

Binz, M., Gershman, S. J., Schulz, E. and Endres, D., 2022. Heuristics from bounded meta-learned inference. Psychological Review.

Schulz, E., Speekenbrink, M. and Krause, A., 2018. A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functions. Journal of Mathematical Psychology, 85, pp.1-16.

Wu, C. M., Schulz, E., Gerbaulet, K., Pleskac, T. J. and Speekenbrink, M., 2019. Under pressure: The influence of time limits on human exploration. Proceedings of the 41st Annual Conference of the Cognitive Science Society.

Questions?

Thank You!