✏️ Exercises
We are studying the relationship between brain size and the intelligence quotient (Full Scale Intelligence Quotient, FSIQ) in a group of university students. The data come from a study that used MRI scans to measure brain size.
The description of the dataset is reported below.
The data are based on a study by Willerman et al. (1991) of the relationships between brain size, gender, and intelligence. The research participants consisted of 40 right-handed introductory psychology students with no history of alcoholism, unconsciousness, brain damage, epilepsy, or heart disease who were selected from a larger pool of introductory psychology students with total Scholastic Aptitude Test Scores higher than 1350 or lower than 940. The students in the study took four subtests (Vocabulary, Similarities, Block Design, and Picture Completion) of the Wechsler (1981) Adult Intelligence Scale-Revised. Among the students with Wechsler full-scale IQs less than 103, 10 males and 10 females were randomly selected. Similarly, among the students with Wechsler full-scale IQs greater than 130, 10 males and 10 females were randomly selected, yielding a randomized blocks design. MRI scans were performed at the same facility for all 40 research participants to measure brain size. The scans consisted of 18 horizontal MRI images. The computer counted all pixels with non-zero gray scale in each of the 18 images, and the total count served as an index for brain size. The dataset and description are adapted from the Data and Story Library (DASL) website.
In this analysis, we focus on the data for males, with the aim of determining whether there is a positive association between brain size (MRI) and FSIQ. Use a regression analysis with FSIQ as the dependent variable and MRI as the predictor.
Find the posterior distribution of the parameter \(\beta\). (a) Find the 95% highest density interval (HDI) of the posterior distribution of \(\beta\). (b) Find the posterior probability that \(\beta\) is positive. (c) Interpret the results.
Before running the regression analysis, standardize the data.
Solution
import pymc as pm
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import arviz as az
# Set NumPy's global seed for reproducibility
np.random.seed(84735)
%config InlineBackend.figure_format = 'retina'
%load_ext watermark
RANDOM_SEED = 42
rng = np.random.default_rng(RANDOM_SEED)
plt.style.use("https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle")
brain_data = pd.read_csv("../data/brain_data.csv")
brain_data.head()
|   | ID | GENDER | FSIQ | VIQ | PIQ | MRI | IQDI |
|---|---|---|---|---|---|---|---|
| 0 | 2 | Male | 140 | 150 | 124 | 1001121 | Higher IQ |
| 1 | 3 | Male | 139 | 123 | 150 | 1038437 | Higher IQ |
| 2 | 4 | Male | 133 | 129 | 128 | 965353 | Higher IQ |
| 3 | 9 | Male | 89 | 93 | 84 | 904858 | Lower IQ |
| 4 | 10 | Male | 133 | 114 | 147 | 955466 | Higher IQ |
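# Keep only the rows for males; .copy() makes `males` an independent DataFrame,
# so adding columns to it does not trigger pandas' SettingWithCopyWarning
males = brain_data[brain_data['GENDER'] == 'Male'].copy()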
males.shape
(20, 7)
# Standardize MRI and FSIQ (z-scores)
males['fsiq'] = (males['FSIQ'] - males['FSIQ'].mean()) / males['FSIQ'].std()
males['mri'] = (males['MRI'] - males['MRI'].mean()) / males['MRI'].std()
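The two lines above compute z-scores. Note that pandas' `Series.std()` uses the sample standard deviation (`ddof=1`) by default, whereas `np.std()` uses `ddof=0`, so the standardized columns have standard deviation exactly 1 only under the pandas convention. A small sketch on hypothetical values (the `standardize` helper is introduced here only for illustration):

```python
import numpy as np
import pandas as pd

def standardize(s: pd.Series) -> pd.Series:
    """Hypothetical helper: z-scores using the sample SD (ddof=1), the pandas default."""
    return (s - s.mean()) / s.std()

# Hypothetical values, used only to illustrate the transformation
raw = pd.Series([100.0, 110.0, 120.0, 130.0, 140.0])
z = standardize(raw)

print(z.round(3).tolist())       # z-scores, symmetric around 0
print(z.std())                   # 1 (up to floating-point rounding) with pandas' ddof=1
print(np.std(z.to_numpy()))      # about 0.894 = sqrt(4/5) with NumPy's default ddof=0
```

With only 20 observations per group, the difference between the two conventions is small but not negligible, so it is worth being explicit about which one is used.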
# Scatter plot of the standardized variables
sns.scatterplot(data=males, x='mri', y='fsiq');
![../_images/827dbdd5a1111abf5e9c0a5ee92111a8da36d340f28c5a0fd6f808d0be3496db.png](../_images/827dbdd5a1111abf5e9c0a5ee92111a8da36d340f28c5a0fd6f808d0be3496db.png)
# Data for the model
data = {
'N': len(males['fsiq']),
'x': males['mri'].values,
'y': males['fsiq'].values
}
df = pd.DataFrame(data)
df.head()
|   | N | x | y |
|---|---|---|---|
| 0 | 20 | 0.827481 | 1.000548 |
| 1 | 20 | 1.494895 | 0.960526 |
| 2 | 20 | 0.187754 | 0.720394 |
| 3 | 20 | -0.894226 | -1.040570 |
| 4 | 20 | 0.010921 | 0.720394 |
# Model definition
with pm.Model() as model:
    # Weakly informative priors on the standardized scale
    alpha = pm.Normal("alpha", 0, 2.5)
    beta = pm.Normal("beta", 0, 2.5)
    sigma = pm.HalfNormal("sigma", 10)
    # Linear predictor and Gaussian likelihood
    mu = alpha + beta * data["x"]
    y_obs = pm.Normal("y_obs", mu, sigma, observed=data["y"])
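Written out, the model defined in the cell above is the following. Here the second argument of \(\mathcal{N}\) and of HalfNormal is a standard deviation, not a variance, matching the positional `sigma` argument of PyMC's `Normal` and `HalfNormal`:

\[
\begin{aligned}
y_i &\sim \mathcal{N}(\mu_i, \sigma), \qquad i = 1, \dots, 20,\\
\mu_i &= \alpha + \beta x_i,\\
\alpha &\sim \mathcal{N}(0, 2.5),\\
\beta &\sim \mathcal{N}(0, 2.5),\\
\sigma &\sim \text{HalfNormal}(10),
\end{aligned}
\]

where \(x_i\) and \(y_i\) are the standardized MRI and FSIQ values of the \(i\)-th male participant.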
with model:
    # Pass the seed explicitly so that the MCMC draws are reproducible
    idata = pm.sample(random_seed=RANDOM_SEED)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha, beta, sigma]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 14 seconds.
# Diagnostics
az.plot_trace(idata);
![../_images/f87ae5fdae22709c6f6eccd2f3bc2f604600e0bbe7dfb39406916aeba1172205.png](../_images/f87ae5fdae22709c6f6eccd2f3bc2f604600e0bbe7dfb39406916aeba1172205.png)
az.summary(idata, round_to=2)
|   | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| alpha | 0.01 | 0.22 | -0.40 | 0.43 | 0.0 | 0.0 | 4401.73 | 2866.29 | 1.0 |
| beta | 0.49 | 0.22 | 0.08 | 0.90 | 0.0 | 0.0 | 4710.35 | 3307.43 | 1.0 |
| sigma | 0.95 | 0.17 | 0.66 | 1.27 | 0.0 | 0.0 | 3871.46 | 2896.86 | 1.0 |
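The `r_hat` column compares within-chain and between-chain variability; values close to 1, as here, indicate that the four chains agree. A minimal NumPy sketch of the classic split-\(\hat R\) on simulated chains follows. It is an illustration only: `split_rhat` is a hypothetical helper, and ArviZ's `az.rhat` defaults to a rank-normalized variant, so its numbers can differ slightly.

```python
import numpy as np

def split_rhat(draws: np.ndarray) -> float:
    """Classic split-R-hat for an array of shape (chains, draws).

    Hypothetical helper; ArviZ's default is the rank-normalized variant.
    """
    _, n = draws.shape
    half = n // 2
    # Split every chain into two halves, doubling the number of chains
    split = np.concatenate([draws[:, :half], draws[:, half:2 * half]], axis=0)
    n_half = split.shape[1]
    # W: mean within-chain variance; B: between-chain variance of the means
    W = split.var(axis=1, ddof=1).mean()
    B = n_half * split.mean(axis=1).var(ddof=1)
    # Pooled estimate of the posterior variance
    var_hat = (n_half - 1) / n_half * W + B / n_half
    return float(np.sqrt(var_hat / W))

rng_demo = np.random.default_rng(0)
good = rng_demo.normal(size=(4, 1000))         # 4 well-mixed chains, same target
bad = good + np.arange(4)[:, None] * 2.0       # chains stuck around different values

print(split_rhat(good))   # close to 1
print(split_rhat(bad))    # clearly larger than 1
```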
az.hdi(idata, hdi_prob=0.95)
<xarray.Dataset>
Dimensions:  (hdi: 2)
Coordinates:
  * hdi      (hdi) <U6 'lower' 'higher'
Data variables:
    alpha    (hdi) float64 -0.412 0.4537
    beta     (hdi) float64 0.06877 0.929
    sigma    (hdi) float64 0.6488 1.277
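The 95% HDI is the narrowest interval containing 95% of the posterior mass. For a unimodal posterior it can be estimated from the MCMC draws by sorting them and taking the shortest window that spans 95% of the sorted values, which is the idea behind `az.hdi`. A minimal NumPy sketch (a simplified illustration, not ArviZ's own code; `hdi_shortest` is a hypothetical helper), applied to hypothetical draws from a Normal(0.49, 0.22) that only mimics the posterior of \(\beta\):

```python
import numpy as np

def hdi_shortest(samples: np.ndarray, prob: float = 0.95) -> tuple[float, float]:
    """Shortest interval containing a fraction `prob` of the samples (unimodal case)."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    k = int(np.floor(prob * n))       # index distance spanned by each candidate interval
    widths = x[k:] - x[: n - k]       # width of every interval [x[i], x[i + k]]
    i = int(np.argmin(widths))        # position of the narrowest one
    return float(x[i]), float(x[i + k])

# Hypothetical draws, standing in for the posterior of beta
draws = np.random.default_rng(1).normal(0.49, 0.22, size=100_000)
print(hdi_shortest(draws, 0.95))
```

On the actual posterior, `hdi_shortest(idata.posterior["beta"].values.ravel())` should return values close to the `az.hdi` output for `beta`.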
idata
Inference data with groups:
  > posterior (alpha, beta, sigma; chain: 4, draw: 1000)
  > sample_stats (lp, step_size, energy, acceptance_rate, n_steps, ...; chain: 4, draw: 1000)
  > observed_data (y_obs; y_obs_dim_0: 20)
Created 2023-09-17 with PyMC 5.6.1 and ArviZ 0.16.0 (sampling time: about 13.8 s, 1000 tuning steps).
# Posterior probability that beta is greater than 0
# (fraction of posterior draws with beta > 0; .item() extracts a plain float)
prob_beta_positive = (idata.posterior["beta"] > 0).mean().item()
print("Probability that beta is greater than 0:", prob_beta_positive)
Probability that beta is greater than 0: 0.985
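The value 0.985 is a Monte Carlo estimate: the fraction of the 4,000 posterior draws of \(\beta\) that are positive. As a rough cross-check, assume the posterior of \(\beta\) is approximately normal with the mean and SD reported by `az.summary` (0.49 and 0.22); then \(P(\beta > 0) = \Phi(0.49 / 0.22)\). The sketch also computes a naive Monte Carlo standard error that treats the draws as independent, a simplifying assumption (the effective sample size would be the more appropriate count).

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard normal CDF computed with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Assumption: beta's posterior is approximately Normal(0.49, 0.22), values from az.summary()
beta_mean, beta_sd = 0.49, 0.22
p_approx = normal_cdf(beta_mean / beta_sd)   # P(beta > 0) = Phi(mean / sd)

# Naive Monte Carlo standard error of the estimated proportion,
# treating the 4000 draws as independent (a simplification)
p_mc, n_draws = 0.985, 4000
mcse = sqrt(p_mc * (1 - p_mc) / n_draws)

print(f"normal approximation: {p_approx:.3f}, MC estimate: {p_mc} ± {mcse:.4f}")
```

The two values agree to within about one Monte Carlo standard error, suggesting that the posterior of \(\beta\) is roughly symmetric around its mean.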
# Plot the posterior distributions of the parameters (95% HDI, as requested in the exercise)
az.plot_posterior(idata, hdi_prob=0.95, round_to=2);
![../_images/cd998da763dffcd2f401679a208db6fd4cce6c283f40be6fcda1432b19cce84c.png](../_images/cd998da763dffcd2f401679a208db6fd4cce6c283f40be6fcda1432b19cce84c.png)
Conclusions
The analysis provides evidence that, among males, brain size as indexed by the MRI scans is positively associated with FSIQ. Because both variables were standardized, \(\beta\) is expressed in standard-deviation units: an increase of one standard deviation in brain size, as measured in this study, corresponds to an expected increase in FSIQ of \(\beta\) standard deviations, with a posterior mean of about 0.49. The 95% HDI for \(\beta\), approximately [0.07, 0.93], quantifies the uncertainty of this estimate: it excludes zero, but it is wide, ranging from a very weak to a strong association. The posterior probability that \(\beta\) is positive is about 0.985.
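Because both variables are standardized, the least-squares slope of a simple regression coincides with the Pearson correlation and the intercept is zero. With the weakly informative priors used here, the posterior mean of \(\beta\) should therefore be close to the sample correlation between MRI and FSIQ among the males. A minimal NumPy check of this identity on hypothetical synthetic data (not the study data):

```python
import numpy as np

# Hypothetical synthetic data, generated only to illustrate the identity
rng_demo = np.random.default_rng(7)
x_raw = rng_demo.normal(950_000, 50_000, size=20)           # "brain size"-like values
y_raw = 0.0004 * x_raw + rng_demo.normal(0, 20, size=20)    # "IQ"-like values

def zscore(v: np.ndarray) -> np.ndarray:
    """Standardize with the sample SD (ddof=1)."""
    return (v - v.mean()) / v.std(ddof=1)

x, y = zscore(x_raw), zscore(y_raw)

# Ordinary least-squares fit y = a + b * x on the standardized variables
b, a = np.polyfit(x, y, deg=1)        # polyfit returns the highest power first
r = np.corrcoef(x_raw, y_raw)[0, 1]   # Pearson correlation of the raw data

print(f"slope = {b:.4f}, intercept = {a:.4f}, Pearson r = {r:.4f}")
```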
These results support the idea of a positive relationship between brain size and intelligence, at least in this specific sample of males. However, they do not demonstrate a causal relationship. Moreover, the sample is small (20 participants) and was drawn from the two extremes of the IQ distribution (full-scale IQ below 103 or above 130), a design that can inflate the observed association relative to the general population. Further research is needed to fully understand the nature of this association.