✏️ Exercises
In this exercise we compare the means of two groups using a regression model. We again use the dataset on brain size and the Full Scale Intelligence Quotient (FSIQ) in a group of university students. The data come from a study that used MRI scans to measure brain size.
Here we focus on the relationship between brain size and gender. Brain size tends to be larger in males than in females, although this, of course, does not mean that males have a higher IQ than females.
We first load the data and look at their features.
import pymc as pm
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import arviz as az
# Set the seed for reproducibility
np.random.seed(84735)
%config InlineBackend.figure_format = 'retina'
%load_ext watermark
RANDOM_SEED = 42
rng = np.random.default_rng(RANDOM_SEED)
plt.style.use("https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle")
brain_data = pd.read_csv("../data/brain_data.csv")
brain_data.head()
| | ID | GENDER | FSIQ | VIQ | PIQ | MRI | IQDI |
|---|---|---|---|---|---|---|---|
| 0 | 2 | Male | 140 | 150 | 124 | 1001121 | Higher IQ |
| 1 | 3 | Male | 139 | 123 | 150 | 1038437 | Higher IQ |
| 2 | 4 | Male | 133 | 129 | 128 | 965353 | Higher IQ |
| 3 | 9 | Male | 89 | 93 | 84 | 904858 | Lower IQ |
| 4 | 10 | Male | 133 | 114 | 147 | 955466 | Higher IQ |
This dataset is based on a study by Willerman et al. (1991) on the relationship between brain size, gender, and intelligence.
We focus on the relationship between brain size and gender. Brain size tends to be larger in males than in females, but this does not imply differences in IQ.
Let us visualize the distribution of brain size by gender.
sns.kdeplot(data=brain_data, x="MRI", hue="GENDER");
![../_images/b677654fa4e0868bf3d762979351d4ba078722a6b066d20c84e64a809a799b65.png](../_images/b677654fa4e0868bf3d762979351d4ba078722a6b066d20c84e64a809a799b65.png)
Using PyMC and a regression model, determine (a) whether there is a credible difference in brain size between males and females and (b) whether there is a credible difference in FSIQ between males and females. Then (c) interpret the results.
Standardize the data before running the regression analysis.
Solution
We standardize the MRI data and code gender as a dichotomous variable (1 = male, 0 = female).
brain_data['mri'] = (brain_data['MRI'] - brain_data['MRI'].mean()) / brain_data['MRI'].std()
brain_data['gender'] = (brain_data['GENDER'] == 'Male').astype(int)
data_list = {
'N': len(brain_data['mri']),
'x': brain_data['gender'].values,
'y': brain_data['mri'].values
}
df = pd.DataFrame(data_list)
df.head()
| | N | x | y |
|---|---|---|---|
| 0 | 40 | 1 | 1.277855 |
| 1 | 40 | 1 | 1.794111 |
| 2 | 40 | 1 | 0.783016 |
| 3 | 40 | 1 | -0.053914 |
| 4 | 40 | 1 | 0.646232 |
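The standardization and dummy coding used above can be checked on a small toy example. This is only a sketch: the values in `mri_raw` and `gender_labels` are hypothetical and are not taken from the study. It uses `ddof=1` in NumPy to match the default of the pandas `.std()` method used in the solution.

```python
import numpy as np

# Hypothetical values for illustration only (NOT the study data).
gender_labels = np.array(["Male", "Male", "Female", "Female", "Male", "Female"])
mri_raw = np.array([1_000_000, 960_000, 850_000, 880_000, 940_000, 830_000], dtype=float)

# z-score: subtract the mean and divide by the sample standard deviation.
# ddof=1 matches the default of pandas Series.std().
mri_z = (mri_raw - mri_raw.mean()) / mri_raw.std(ddof=1)

# Dummy coding: 1 for males, 0 for females.
gender = (gender_labels == "Male").astype(int)

print("mean of z-scores:", round(abs(float(mri_z.mean())), 10))
print("sd of z-scores:  ", round(float(mri_z.std(ddof=1)), 10))
print("gender codes:    ", gender.tolist())
```

After standardization the outcome has mean 0 and standard deviation 1, so the regression slope on the dummy variable is expressed in standard-deviation units.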
with pm.Model() as model:
    # Priors on the standardized scale
    alpha = pm.Normal('alpha', 0, 2.5)
    beta = pm.Normal('beta', 0, 2.5)
    sigma = pm.HalfNormal('sigma', 10)
    # Expected value: alpha for females (x = 0), alpha + beta for males (x = 1)
    mu = alpha + beta * df['x']
    y_obs = pm.Normal('y_obs', mu, sigma, observed=df['y'])
with model:
    idata = pm.sample()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha, beta, sigma]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 14 seconds.
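For reference, the model fitted above can be written out as follows. This is a restatement of the PyMC code, where the second argument of each normal distribution is a standard deviation, \(y_i\) is the standardized outcome and \(x_i\) is the gender dummy.

```latex
\begin{aligned}
y_i &\sim \mathcal{N}(\mu_i, \sigma), \\
\mu_i &= \alpha + \beta x_i, \\
\alpha &\sim \mathcal{N}(0, 2.5), \\
\beta &\sim \mathcal{N}(0, 2.5), \\
\sigma &\sim \text{HalfNormal}(10).
\end{aligned}
```

With \(x_i = 0\) for females and \(x_i = 1\) for males, the expected value is \(\alpha\) for females and \(\alpha + \beta\) for males, so \(\beta\) is the difference between the two group means.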
Let us examine the results of the MCMC sampling:
az.summary(idata, round_to=2)
| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| alpha | -0.63 | 0.18 | -0.97 | -0.30 | 0.00 | 0.0 | 1846.19 | 2139.30 | 1.0 |
| beta | 1.27 | 0.25 | 0.81 | 1.77 | 0.01 | 0.0 | 2045.74 | 2493.18 | 1.0 |
| sigma | 0.80 | 0.10 | 0.62 | 0.98 | 0.00 | 0.0 | 2351.73 | 2347.73 | 1.0 |
# Diagnostics
az.plot_trace(idata);
![../_images/dc3a32b0768c0300d38d71f851eec60e43f3ce55be47077ca6336c964e64370b.png](../_images/dc3a32b0768c0300d38d71f851eec60e43f3ce55be47077ca6336c964e64370b.png)
az.hdi(idata, hdi_prob=0.95)
<xarray.Dataset>
Dimensions:  (hdi: 2)
Coordinates:
  * hdi      (hdi) <U6 'lower' 'higher'
Data variables:
    alpha    (hdi) float64 -0.9725 -0.2819
    beta     (hdi) float64 0.766 1.772
    sigma    (hdi) float64 0.6187 0.9926
az.plot_posterior(idata, round_to=2);
![../_images/61bd0bab4ead2c748148ba954b36e197e2641c0cc828a7776580b25bc83ff30d.png](../_images/61bd0bab4ead2c748148ba954b36e197e2641c0cc828a7776580b25bc83ff30d.png)
The parameter of interest is \(\beta\). It tells us that, moving from females to males (from 0 to 1 on the gender variable), brain size increases on average by about 1.27 standard deviations, 95% HDI [0.77, 1.77]. Since the interval lies entirely above 0, the difference in brain size between males and females is credible.
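The interpretation of \(\beta\) as a difference between group means can be illustrated with a short sketch. It uses simulated data (not the study data) and an ordinary least-squares fit in place of the Bayesian model, purely to show the algebra: with a 0/1 predictor, the intercept equals the mean of the group coded 0 and the slope equals the difference between the two group means. With weak priors, one would expect the posterior means of `alpha` and `beta` to be close to these values, but that is not checked here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated standardized outcomes for two hypothetical groups (NOT the study data).
y_female = rng.normal(-0.6, 0.8, size=20)
y_male = rng.normal(0.6, 0.8, size=20)

y = np.concatenate([y_female, y_male])
x = np.concatenate([np.zeros(20), np.ones(20)])  # 0 = female, 1 = male

# Least-squares fit of y = alpha + beta * x.
X = np.column_stack([np.ones_like(x), x])
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, y, rcond=None)

diff_means = y_male.mean() - y_female.mean()
print(f"alpha_hat = {alpha_hat:.3f}, female mean         = {y_female.mean():.3f}")
print(f"beta_hat  = {beta_hat:.3f}, difference of means = {diff_means:.3f}")
```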
FSIQ analysis
We now analyze the differences in FSIQ between males and females using the same model.
data_list2 = {
'N': len(brain_data['gender']),
'x': brain_data['gender'].values,
'y': (brain_data['FSIQ'] - brain_data['FSIQ'].mean()) / brain_data['FSIQ'].std()
}
df2 = pd.DataFrame(data_list2)
df2.head()
| | N | x | y |
|---|---|---|---|
| 0 | 40 | 1 | 1.102480 |
| 1 | 40 | 1 | 1.060955 |
| 2 | 40 | 1 | 0.811807 |
| 3 | 40 | 1 | -1.015278 |
| 4 | 40 | 1 | 0.811807 |
with pm.Model() as model:
    # Same priors as in the brain-size model
    alpha = pm.Normal('alpha', 0, 2.5)
    beta = pm.Normal('beta', 0, 2.5)
    sigma = pm.HalfNormal('sigma', 10)
    # Expected standardized FSIQ: alpha for females, alpha + beta for males
    mu = alpha + beta * df2['x']
    y_obs = pm.Normal('y_obs', mu, sigma, observed=df2['y'])
with model:
    idata2 = pm.sample()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha, beta, sigma]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 14 seconds.
# Plot of the posterior distribution of the parameters
az.plot_posterior(idata2, round_to=2);
![../_images/bdf56995de19a74b65415fa47483c3ea5b348e1ad1748f7732344b53976045c9.png](../_images/bdf56995de19a74b65415fa47483c3ea5b348e1ad1748f7732344b53976045c9.png)
az.summary(idata2, round_to=2)
| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| alpha | -0.06 | 0.23 | -0.48 | 0.38 | 0.01 | 0.00 | 1879.15 | 2137.76 | 1.0 |
| beta | 0.12 | 0.33 | -0.45 | 0.80 | 0.01 | 0.01 | 1897.61 | 2132.46 | 1.0 |
| sigma | 1.04 | 0.13 | 0.82 | 1.27 | 0.00 | 0.00 | 2478.59 | 2383.28 | 1.0 |
The posterior distribution of \(\beta\) shows that the 94% credible interval, [-0.45, 0.80], includes 0. These data therefore provide no evidence of a difference in FSIQ between males and females.
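The check "does the interval include 0?" can be sketched directly on posterior draws. In the example below the helper `hdi` is a hypothetical, simplified implementation (it assumes a unimodal posterior), and `beta_draws` is a simulated stand-in, drawn from a normal distribution with the mean and standard deviation reported in the summary table. In the actual analysis one would presumably use the draws stored in `idata2.posterior["beta"]` and the ArviZ function `az.hdi` instead.

```python
import numpy as np

def hdi(draws, prob=0.94):
    """Narrowest interval containing a fraction `prob` of the draws (unimodal case)."""
    sorted_draws = np.sort(np.asarray(draws, dtype=float))
    n = sorted_draws.size
    n_in = int(np.ceil(prob * n))  # number of draws inside the interval
    # Width of every candidate interval that contains n_in consecutive draws.
    widths = sorted_draws[n_in - 1:] - sorted_draws[: n - n_in + 1]
    start = int(np.argmin(widths))
    return sorted_draws[start], sorted_draws[start + n_in - 1]

rng = np.random.default_rng(1)
# Stand-in for the posterior draws of beta (simulated, NOT the real MCMC output).
beta_draws = rng.normal(0.12, 0.33, size=4000)

lower, upper = hdi(beta_draws, prob=0.94)
p_positive = float((beta_draws > 0).mean())

print(f"94% HDI: [{lower:.2f}, {upper:.2f}]")
print(f"interval includes 0: {lower < 0 < upper}")
print(f"P(beta > 0) = {p_positive:.2f}")
```

Reporting the posterior probability that \(\beta > 0\) alongside the interval gives a more graded summary than the include/exclude decision alone.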
Conclusions
The literature on intelligence has reconciled these seemingly contradictory results (there is a positive association between brain size and IQ; brain size is larger in males than in females; there is no evidence of a difference in IQ by gender) as follows: although the female brain is, on average, smaller than the male brain, neural computation is more efficient in females than in males.