✏️ Exercises

We are studying the relationship between brain size and the Full Scale Intelligence Quotient (FSIQ) in a group of university students. The data come from a study that used MRI scans to measure brain size.

The description of the dataset is reproduced below.

The data are based on a study by Willerman et al. (1991) of the relationships between brain size, gender, and intelligence. The research participants consisted of 40 right-handed introductory psychology students with no history of alcoholism, unconsciousness, brain damage, epilepsy, or heart disease who were selected from a larger pool of introductory psychology students with total Scholastic Aptitude Test Scores higher than 1350 or lower than 940. The students in the study took four subtests (Vocabulary, Similarities, Block Design, and Picture Completion) of the Wechsler (1981) Adult Intelligence Scale-Revised. Among the students with Wechsler full-scale IQ’s less than 103, 10 males and 10 females were randomly selected. Similarly, among the students with Wechsler full-scale IQ’s greater than 130, 10 males and 10 females were randomly selected, yielding a randomized blocks design. MRI scans were performed at the same facility for all 40 research participants to measure brain size. The scans consisted of 18 horizontal MRI images. The computer counted all pixels with non-zero gray scale in each of the 18 images, and the total count served as an index for brain size. The dataset and description are adapted from the Data and Story Library (DASL) website.

In this analysis we focus on the data for males, to assess whether there is a positive association between brain size (MRI) and FSIQ. Use a regression analysis with FSIQ as the dependent variable and MRI as the predictor.

Find the posterior distribution of the parameter \(\beta\). (a) Find the 95% highest density interval (HDI), i.e. the posterior credible interval, for \(\beta\). (b) Find the posterior probability that \(\beta\) is positive. (c) Interpret the results.

Before running the regression analysis, standardize the data.

Solution

import pymc as pm
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import arviz as az

# Set the seed for reproducibility
np.random.seed(84735)
%config InlineBackend.figure_format = 'retina'
%load_ext watermark

RANDOM_SEED = 42
rng = np.random.default_rng(RANDOM_SEED)

plt.style.use("https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle")
brain_data = pd.read_csv("../data/brain_data.csv")
brain_data.head()
   ID  GENDER  FSIQ  VIQ  PIQ      MRI       IQDI
0   2    Male   140  150  124  1001121  Higher IQ
1   3    Male   139  123  150  1038437  Higher IQ
2   4    Male   133  129  128   965353  Higher IQ
3   9    Male    89   93   84   904858   Lower IQ
4  10    Male   133  114  147   955466  Higher IQ
# Select the data for males (.copy() makes an independent DataFrame, so the new columns below can be added without a SettingWithCopyWarning)
males = brain_data[brain_data['GENDER'] == 'Male'].copy()
males.shape
(20, 7)
# Standardize MRI and FSIQ
males['fsiq'] = (males['FSIQ'] - males['FSIQ'].mean()) / males['FSIQ'].std()
males['mri'] = (males['MRI'] - males['MRI'].mean()) / males['MRI'].std()
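As a side note on what the standardization step buys us: when both variables are z-scores, the least-squares slope equals the Pearson correlation and the intercept is zero, which gives a useful reference point for the Bayesian estimate of \(\beta\). The sketch below illustrates this on synthetic stand-in data (not the brain-size data); the helper `standardize` and the variable names are assumptions introduced here for illustration.

```python
import numpy as np

# Synthetic stand-in data (NOT the brain-size data): 20 paired observations.
rng = np.random.default_rng(0)
x_raw = rng.normal(loc=900_000, scale=50_000, size=20)
y_raw = 100 + 0.0002 * (x_raw - 900_000) + rng.normal(scale=20, size=20)


def standardize(values):
    """Return z-scores using the sample standard deviation (ddof=1), as pandas .std() does."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=1)


x = standardize(x_raw)
y = standardize(y_raw)

# Least-squares fit of y on x; np.polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, deg=1)

# Pearson correlation of the raw variables (unchanged by standardization).
r = np.corrcoef(x_raw, y_raw)[0, 1]

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, r = {r:.4f}")
```

With weakly informative priors such as those used in the model of this exercise, the posterior mean of \(\beta\) would be expected to sit close to the sample correlation.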
# Scatter plot
sns.scatterplot(data=males, x='mri', y='fsiq')
<Axes: xlabel='mri', ylabel='fsiq'>
../_images/827dbdd5a1111abf5e9c0a5ee92111a8da36d340f28c5a0fd6f808d0be3496db.png
# Data for the model
data = {
    'N': len(males['fsiq']),
    'x': males['mri'].values,
    'y': males['fsiq'].values
}

df = pd.DataFrame(data)
df.head()
    N          x          y
0  20   0.827481   1.000548
1  20   1.494895   0.960526
2  20   0.187754   0.720394
3  20  -0.894226  -1.040570
4  20   0.010921   0.720394
# Model definition
with pm.Model() as model:
    alpha = pm.Normal("alpha", 0, 2.5)
    beta = pm.Normal("beta", 0, 2.5)
    sigma = pm.HalfNormal("sigma", 10)
    mu = alpha + beta * data["x"]
    y_obs = pm.Normal("y_obs", mu, sigma, observed=data["y"])
with model:
    idata = pm.sample()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha, beta, sigma]
100.00% [8000/8000 00:01<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 14 seconds.
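For reference, the model specified in the code above can be written out as follows. This reading assumes the PyMC parameterization, in which the second positional argument of `pm.Normal` and the argument of `pm.HalfNormal` are standard deviations; \(x_i\) and \(y_i\) denote the standardized MRI and FSIQ values of participant \(i\).

```latex
\[
\begin{aligned}
y_i &\sim \mathrm{Normal}(\mu_i, \sigma), \\
\mu_i &= \alpha + \beta x_i, \\
\alpha &\sim \mathrm{Normal}(0, 2.5), \\
\beta &\sim \mathrm{Normal}(0, 2.5), \\
\sigma &\sim \mathrm{HalfNormal}(10).
\end{aligned}
\]
```

Since the data are on the z-score scale, priors with a standard deviation of 2.5 on \(\alpha\) and \(\beta\) are only weakly informative.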
# Diagnostics
az.plot_trace(idata);
../_images/f87ae5fdae22709c6f6eccd2f3bc2f604600e0bbe7dfb39406916aeba1172205.png
az.summary(idata, round_to=2)
       mean    sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
alpha  0.01  0.22   -0.40     0.43        0.0      0.0   4401.73   2866.29    1.0
beta   0.49  0.22    0.08     0.90        0.0      0.0   4710.35   3307.43    1.0
sigma  0.95  0.17    0.66     1.27        0.0      0.0   3871.46   2896.86    1.0
az.hdi(idata, hdi_prob=0.95)
<xarray.Dataset>
Dimensions:  (hdi: 2)
Coordinates:
  * hdi      (hdi) <U6 'lower' 'higher'
Data variables:
    alpha    (hdi) float64 -0.412 0.4537
    beta     (hdi) float64 0.06877 0.929
    sigma    (hdi) float64 0.6488 1.277
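To make the idea behind the HDI concrete, here is a minimal sketch of how such an interval can be computed from posterior draws: sort the draws and pick the narrowest window that contains the requested share of them. The function `hdi_from_draws` is a hypothetical helper written for illustration, it assumes a unimodal posterior, and the draws are a synthetic stand-in rather than the actual samples in `idata`.

```python
import numpy as np


def hdi_from_draws(draws, prob=0.95):
    """Narrowest interval spanning a fraction `prob` of the draws (unimodal case)."""
    sorted_draws = np.sort(np.asarray(draws, dtype=float).ravel())
    n = sorted_draws.size
    # Index distance that each candidate interval must span.
    n_inside = int(np.floor(prob * n))
    # Width of every interval running from draw i to draw i + n_inside.
    widths = sorted_draws[n_inside:] - sorted_draws[: n - n_inside]
    start = int(np.argmin(widths))
    return sorted_draws[start], sorted_draws[start + n_inside]


# Synthetic stand-in for 4 chains x 1000 draws of beta (NOT the real posterior).
rng = np.random.default_rng(42)
draws = rng.normal(loc=0.49, scale=0.22, size=(4, 1000))

lower, upper = hdi_from_draws(draws, prob=0.95)
print(f"95% HDI: [{lower:.3f}, {upper:.3f}]")
```

With the real samples, the same helper could be applied to `idata.posterior["beta"].values`; small differences from the `az.hdi` output are possible depending on implementation details.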
idata
arviz.InferenceData
    • <xarray.Dataset>
      Dimensions:  (chain: 4, draw: 1000)
      Coordinates:
        * chain    (chain) int64 0 1 2 3
        * draw     (draw) int64 0 1 2 3 4 5 6 7 8 ... 992 993 994 995 996 997 998 999
      Data variables:
          alpha    (chain, draw) float64 0.07491 0.02426 0.2144 ... 0.04371 -0.02099
          beta     (chain, draw) float64 0.5868 0.6395 0.5766 ... 0.4324 0.5625 0.5659
          sigma    (chain, draw) float64 1.053 1.345 1.33 ... 0.8664 1.093 0.7494
      Attributes:
          created_at:                 2023-09-17T09:21:41.098267
          arviz_version:              0.16.0
          inference_library:          pymc
          inference_library_version:  5.6.1
          sampling_time:              13.798460006713867
          tuning_steps:               1000

    • <xarray.Dataset>
      Dimensions:                (chain: 4, draw: 1000)
      Coordinates:
        * chain                  (chain) int64 0 1 2 3
        * draw                   (draw) int64 0 1 2 3 4 5 ... 994 995 996 997 998 999
      Data variables: (12/17)
          lp                     (chain, draw) float64 -32.15 -34.3 ... -32.33 -31.92
          step_size              (chain, draw) float64 1.044 1.044 ... 0.9529 0.9529
          step_size_bar          (chain, draw) float64 0.8931 0.8931 ... 0.8772 0.8772
          max_energy_error       (chain, draw) float64 -0.2865 0.2507 ... -0.3251
          largest_eigval         (chain, draw) float64 nan nan nan nan ... nan nan nan
          energy                 (chain, draw) float64 33.1 34.58 ... 34.16 34.06
          ...                     ...
          energy_error           (chain, draw) float64 -0.1184 0.2507 ... -0.3251
          index_in_trajectory    (chain, draw) int64 2 -2 -1 2 0 3 -1 ... 2 3 1 1 3 -1
          perf_counter_start     (chain, draw) float64 5e+03 5e+03 ... 5.001e+03
          process_time_diff      (chain, draw) float64 0.000511 0.00026 ... 0.000233
          acceptance_rate        (chain, draw) float64 0.9846 0.8456 ... 0.6763 0.8575
          n_steps                (chain, draw) float64 7.0 3.0 7.0 3.0 ... 3.0 3.0 3.0
      Attributes:
          created_at:                 2023-09-17T09:21:41.106415
          arviz_version:              0.16.0
          inference_library:          pymc
          inference_library_version:  5.6.1
          sampling_time:              13.798460006713867
          tuning_steps:               1000

    • <xarray.Dataset>
      Dimensions:      (y_obs_dim_0: 20)
      Coordinates:
        * y_obs_dim_0  (y_obs_dim_0) int64 0 1 2 3 4 5 6 7 ... 12 13 14 15 16 17 18 19
      Data variables:
          y_obs        (y_obs_dim_0) float64 1.001 0.9605 0.7204 ... -1.361 -1.041
      Attributes:
          created_at:                 2023-09-17T09:21:41.109410
          arviz_version:              0.16.0
          inference_library:          pymc
          inference_library_version:  5.6.1

# Posterior probability that beta is greater than 0
prob_beta_positive = (idata.posterior["beta"] > 0).mean().item()
print("Probability that beta is greater than 0:", prob_beta_positive)
Probability that beta is greater than 0: 0.985
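A quick sanity check on this probability: if the posterior of \(\beta\) is treated as approximately normal (an assumption made here only for the check), the posterior mean and standard deviation from the summary table, 0.49 and 0.22, imply a tail probability very close to the proportion of positive draws. The helper `normal_cdf` is introduced for illustration.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Posterior mean and sd of beta, as reported in the summary table.
post_mean = 0.49
post_sd = 0.22

# P(beta > 0) under a Normal(post_mean, post_sd) approximation.
prob_positive_approx = 1.0 - normal_cdf((0.0 - post_mean) / post_sd)
print(f"Approximate P(beta > 0): {prob_positive_approx:.3f}")
```

The approximation gives about 0.987, in line with the value computed directly from the draws.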
# Plot of the posterior distributions of the parameters
az.plot_posterior(idata, round_to=2)
array([<Axes: title={'center': 'alpha'}>,
       <Axes: title={'center': 'beta'}>,
       <Axes: title={'center': 'sigma'}>], dtype=object)
../_images/cd998da763dffcd2f401679a208db6fd4cce6c283f40be6fcda1432b19cce84c.png

Conclusions

The analysis provides evidence that, among males, brain size, as indexed by the MRI scans, is positively associated with FSIQ. Because both variables were standardized, \(\beta\) is the expected change in FSIQ, in standard deviations, for a one-standard-deviation increase in brain size as measured in this study: the posterior mean is 0.49, with a 95% HDI from about 0.07 to 0.93, and the posterior probability that \(\beta\) is positive is 0.985. The width of the credible interval quantifies the substantial uncertainty that remains about this estimate with only 20 participants.

These results support the idea of a positive relationship between brain size and intelligence, at least in this specific sample of males. It is important to note, however, that they do not demonstrate a causal relationship, and further research may be needed to fully understand the nature of this association.
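If the standardized slope needs to be reported in the original units (FSIQ points per unit of the MRI pixel count), it can be rescaled by the ratio of the two sample standard deviations. The sketch below checks this identity on synthetic stand-in data; `slope_in_raw_units` is a hypothetical helper, and none of the numbers come from the brain-size dataset.

```python
import numpy as np


def slope_in_raw_units(beta_std, sd_x, sd_y):
    """Convert a slope estimated on z-scores into units of y per unit of x."""
    return beta_std * sd_y / sd_x


# Synthetic stand-in data (NOT the brain-size data).
rng = np.random.default_rng(1)
x_raw = rng.normal(50.0, 10.0, size=200)
y_raw = 3.0 + 0.8 * x_raw + rng.normal(0.0, 5.0, size=200)

sd_x = x_raw.std(ddof=1)
sd_y = y_raw.std(ddof=1)
zx = (x_raw - x_raw.mean()) / sd_x
zy = (y_raw - y_raw.mean()) / sd_y

# Slope on the standardized scale and, for comparison, on the raw scale.
beta_std = np.polyfit(zx, zy, deg=1)[0]
beta_raw = np.polyfit(x_raw, y_raw, deg=1)[0]

converted = slope_in_raw_units(beta_std, sd_x, sd_y)
print(f"raw-scale slope: {beta_raw:.4f}, converted from z-scores: {converted:.4f}")
```

For the exercise, the same conversion could be applied to each posterior draw of \(\beta\), using `males['MRI'].std()` and `males['FSIQ'].std()` as the two standard deviations.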