Philani notebook

Created 29 Sep 2016 • Last modified 18 Mar 2017

Notes

Outlet: Invited article in a special issue of AIDS edited by MJ

Use data up to the 5-year follow-up.

Some children were referred for FAS.

Timepoints are:

  • 0: Baseline
  • 1: Post birth (1-2 weeks)
  • 2: 6 months
  • 3: 18 months
  • 4: 3 years
  • 5: 5 years

Describe alcohol use over time, intervention vs. control. Alcohol use can be represented as any drinking or as AUDIT > 2 (problem drinking).

  • Adriane: Looking at the surveys, I believe the alcohol questions were only asked of mothers. There are two variables for whether the mother drank any alcohol:
    • alc_any: drank alcohol in the last month
    • alc_any_a: same as above, except that at time 1 (birth assessment) it is assigned a "1" if she drank in the last month and/or in the month prior to birth
  • AUDIT: audit_score, audit_risk

Child outcomes (as predicted by alcohol × intervention, and by whether the child is with the mother):

  • Fetal alcohol syndrome (FAS)
    • Variables starting with FASD_ were only taken for children referred for FAS treatment; ref_need indicates need for a referral
    • philtrum_rating, lip_rating: 4 or 5 indicates possible need for referral
  • CBCL
    • First available at the year-3 follow-up. At year 5, only aggression items are available.
      • Aggressive_Behavior
  • Strengths & Difficulties
    • First available at the year-3 follow-up. At year 5, only prosocial items are available.
      • Prosocial
  • Growth charts (weight for age, height for age)
    • waz, haz, bmiz (Adriane: "we should probably look at these more closely to see if there are outliers we want to remove")
  • Clancy-Blair (executive function)
    • TBD
  • Kaufman IQ
    • Kaufman_Standard_Score_MPI - only measured at timepoint 5

Variable names ending with _1, _2, or _3 denote measurements of the first, second, and third child of a multiple birth.

Mary O'Connor says that Dawson, Grant, and Stinson (2005) is the right citation for our version of the AUDIT-C. In particular, that's where the threshold of 3 comes from. O'Connor et al. (2011) has the four items we used, which are the first two items from Dawson et al. (2005), plus two heavy-drinking items like the last item of Dawson et al., but with 4 and 3 drinks rather than 5. O'Connor et al. (2011) also says how to score each of the three items. The fact that Dawson et al.'s threshold of 3 might not be appropriate when using four rather than three items isn't addressed.

Mary O'Connor: "women are inclined to be more truthful about their drinking during that time [before recognizing their pregnancy] and… those data correlate most highly with cognitive deficits and physical dysmorphology"

Jackie Stewart: "The props [prop drink containers] we gave were a beer bottle, a wine glass (250mls) and a tot glass. Most of the participants were drinking cheap wine or beer." She confirmed that the goal of the props was to define "drink" for subjects in the sense of the American standard drink, which is 14 g of ethanol.

Mary O'Connor: "we ask about prior to pregnancy recognition drinking because mothers are supposedly more candid about that time period and because that measure predicts neurocognitive outcome better than during pregnancy results.… Prior to pregnancy recognition measures are important because most of the women booked very late in their pregnancies not finding out until much later than women from other countries and backgrounds so their prior to pregnancy recognition measures would include a large portion of their total gestational period."

Alcohol items, not about pregnancy

The number of the roughly equivalent AUDIT item is shown first.

(AUDIT 1) alc_freq (Absent at time 0) In the last month, about how often did you drink ANY alcoholic beverage?

  • Never [1]
  • Less than once a month [2]
  • Once a month [3]
  • 2 to 3 times a month [4]
  • Once a week [5]
  • 2 times a week [6]
  • 3 to 4 times a week [7]
  • Nearly every day [8]
  • Every day [9]
  • Decline to answer [NA]

(AUDIT 2) alc_num (Absent at time 0) In the previous month before today, counting all types of alcohol combined, how many drinks did you USUALLY have on days when you drank alcohol?

  • 1 or 2 [1]
  • 3 or 4 [2]
  • 5 or 6 [3]
  • 7,8 or 9 [4]
  • 10 or more [5]
  • Decline to answer [NA]

(AUDIT 3) alc_4_freq (Absent at time 0) In the previous month before today, about how often did you drink FOUR or MORE drinks in a single day?

  • Never [1]
  • Less than once a month [2]
  • Once a month [3]
  • 2 to 3 times a month [4]
  • Once a week [5]
  • 2 times a week [6]
  • 3 to 4 times a week [7]
  • Nearly every day [8]
  • Every day [9]
  • Decline to answer [NA]

(AUDIT 6) alc_morning (Always present) Do you sometimes take a drink in the morning when you first get up?

  • Yes [1]
  • No [0]
  • Decline to answer [NA]

(Very roughly AUDIT 7) alc_cut_down (Always present) Do you sometimes feel the need to cut down on your drinking?

  • Yes [1]
  • No [0]
  • Decline to answer [NA]

(AUDIT 8) alc_memory (Always present) Has a friend or family member ever told you about things you said or did while you were drinking that you could not remember?

  • Yes [1]
  • No [0]
  • Decline to answer [NA]

(AUDIT 10) alc_friend (Always present) Have close friends or relatives worried or complained about your drinking?

  • Yes [1]
  • No [0]
  • Decline to answer [NA]

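What we'll use: (wc dd (valcounts $time (& (>= $alc_4_freq 3) (>= (+ $alc_morning $alc_memory $alc_friend) 1)))). That is, a subject counts as at risk if she reports four or more drinks in a day at least once a month and answers yes to at least one of the morning, memory, and friend items. It's now in the combined dataset under the name alc_risk.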
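As a cross-check on the rule above, here is a minimal Python sketch of the same logic for a single observation. The function name follows the dataset variable, but the handling of declined answers (returning None) is my assumption; the combined dataset may treat missing items differently.

```python
def alc_risk(alc_4_freq, alc_morning, alc_memory, alc_friend):
    """Return True/False for the risk flag, or None if any item is missing."""
    items = (alc_4_freq, alc_morning, alc_memory, alc_friend)
    if any(v is None for v in items):
        return None  # assumption: a declined item leaves the flag undefined
    # alc_4_freq >= 3 means "Once a month" or more often (codes 3 to 9).
    heavy = alc_4_freq >= 3
    # At least one "Yes" among the three yes/no items.
    consequence = (alc_morning + alc_memory + alc_friend) >= 1
    return heavy and consequence

examples = [
    (5, 0, 1, 0),     # weekly 4+ drinks, memory lapse reported
    (5, 0, 0, 0),     # weekly 4+ drinks, no yes/no items endorsed
    (2, 1, 1, 1),     # items endorsed, but 4+ drinks less than monthly
    (None, 1, 0, 0),  # frequency item declined
]
for row in examples:
    print(row, alc_risk(*row))
```

Since alc_4_freq is absent at time 0, a rule like this cannot flag anyone at baseline.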

Alcohol items, about pregnancy

alc_freq_pre_b (Present only at time 0) How often did you use alcohol in the month before you found out you were pregnant?

  • Never [1]
  • Less than once a month [2]
  • Once a month [3]
  • 2 to 3 times a month [4]
  • Once a week [5]
  • 2 times a week [6]
  • 3 to 4 times a week [7]
  • Nearly every day [8]
  • Every day [9]
  • Decline to answer [NA]

fas_alc_before (Present only at time 5) How often did you use alcohol in the month before you found out you were pregnant with [child]?

  • Never [1]
  • Less than once a month [2]
  • Once a month [3]
  • 2 to 3 times a month [4]
  • Once a week [5]
  • 2 times a week [6]
  • 3 to 4 times a week [7]
  • Nearly every day [8]
  • Every day [9]
  • Decline to answer [NA]

alc_num_pre_b (Present only at time 0) During the month before you found out you were pregnant, counting all types of alcohol combined, how many drinks did you USUALLY have on days when you drank alcohol?

  • 1 or 2 [1]
  • 3 or 4 [2]
  • 5 or 6 [3]
  • 7,8 or 9 [4]
  • 10 or more [5]
  • Decline to answer [NA]

fas_alc_per_day_before (Present only at time 5) During the month before you found out you were pregnant with [child] counting all types of alcohol combined, how many drinks did you USUALLY have on days when you drank alcohol?

  • 1 or 2 [1]
  • 3 or 4 [2]
  • 5 or 6 [3]
  • 7,8 or 9 [4]
  • 10 or more [5]
  • Decline to answer [NA]

Previous studies on retest of drinking during pregnancy

Columns are contemporaneous and rows are retrospective.

Ernhart, Morrow-Tlucak, Sokol, and Martier (1988):

(setv ernhart-retest (pd.DataFrame
  :columns [4 3 2 1 0]
  :index [4 3 2 1 0]
  [
    [2  3  14  14  3]
    [0  0  7  8  2]
    [0  1  9  20  7]
    [0  0  5  38  20]
    [0  0  0  35  50]]))
(rd 2 [
  ["Proportion of T1 sober who were consistent"
    (/ (getl ernhart-retest 0 0)
      (.sum (getl ernhart-retest : 0)))]
  ["Proportion of T1 drinking who were consistent"
    (/ (.sum (.sum (getl ernhart-retest [4 3 2 1] [4 3 2 1])))
      (.sum (.sum (getl ernhart-retest : [4 3 2 1]))))]])
Proportion of T1 sober who were consistent 0.61
Proportion of T1 drinking who were consistent 0.78
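For readers who don't use the Hy helpers, the same two proportions can be sketched in plain Python. The function name `consistency` is mine, not the notebook's. Note that "consistent" for T1 drinkers means still reporting any drinking in retrospect, not reporting the same level.

```python
levels = [4, 3, 2, 1, 0]
# Rows: retrospective report; columns: contemporaneous (T1) report.
counts = [
    [2, 3, 14, 14, 3],
    [0, 0, 7, 8, 2],
    [0, 1, 9, 20, 7],
    [0, 0, 5, 38, 20],
    [0, 0, 0, 35, 50],
]
table = {(r, c): counts[i][j]
         for i, r in enumerate(levels)
         for j, c in enumerate(levels)}

def consistency(table, levels, sober=0):
    drinking = [lv for lv in levels if lv != sober]
    # T1 abstainers who also reported abstaining in retrospect.
    sober_t1 = sum(table[r, sober] for r in levels)
    sober_consistent = table[sober, sober] / sober_t1
    # T1 drinkers who reported any drinking in retrospect.
    drink_t1 = sum(table[r, c] for r in levels for c in drinking)
    drink_both = sum(table[r, c] for r in drinking for c in drinking)
    return sober_consistent, drink_both / drink_t1

sober_p, drink_p = consistency(table, levels)
print(round(sober_p, 2), round(drink_p, 2))  # 0.61 0.78
```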

Jacobson, Chiodo, Sokol, and Jacobson (2002):

(setv jacobson-drink-cats (qw Very_Low Low Moderate Heavy Very_Heavy))
(setv jacobson-retest (pd.DataFrame
  :columns (+ ["Abstain"] jacobson-drink-cats)
  :index (+ ["Abstain"] jacobson-drink-cats)
  [
    [46  49  10  1  0  0]
    [7  45  17  2  0  0]
    [5  20  30  3  0  0]
    [0  9  25  2  3  0]
    [0  7  33  7  5  1]
    [1  2  9  8  6  1]]))
(rd 2 [
  ["Proportion of T1 sober who were consistent"
    (/ (getl jacobson-retest "Abstain" "Abstain")
      (.sum (getl jacobson-retest : "Abstain")))]
  ["Proportion of T1 drinking who were consistent"
    (/ (.sum (.sum (getl jacobson-retest jacobson-drink-cats jacobson-drink-cats)))
      (.sum (.sum (getl jacobson-retest : jacobson-drink-cats))))]])
Proportion of T1 sober who were consistent 0.78
Proportion of T1 drinking who were consistent 0.80

Hannigan et al. (2010):

(setv hannigan-retest (pd.DataFrame
  :columns (qw Abstain Lo)
  :index (qw Abstain Lo Mid Hi)
  [
    [222 3]
    [25 3]
    [14 0]
    [17 4]]))
(rd 2 [
  ["Proportion of T1 sober who were consistent"
    (/ (getl hannigan-retest "Abstain" "Abstain")
      (.sum (getl hannigan-retest : "Abstain")))]
  ["Proportion of T1 drinking who were consistent"
    (/ (.sum (.sum (getl hannigan-retest (qw Lo Mid Hi) "Lo")))
      (.sum (.sum (getl hannigan-retest : "Lo"))))]])
Proportion of T1 sober who were consistent 0.8
Proportion of T1 drinking who were consistent 0.7

Data

Drinking

Simple

(setv dd (cbind
  ($ (.set-index (ss obs (= $timepoint "After_birth")) "s") alcohol_any)
  ($ (.set-index (ss obs (= $timepoint "5_years")) "s") alcohol_any_while_preg)))
(wc dd (valcounts $alcohol_any $alcohol_any_while_preg))
alcohol_any Drinking Sober ~N/A
Drinking 40 23 14
Sober 137 646 205
~N/A 12 61 3

This compares subjects' year-5 responses as to whether they had drunk during pregnancy (columns) with their after-birth responses as to whether they had drunk in the past month (rows).

(setv d (kwc .rename :columns (λ (.replace (string it) ".0" "")) (wc obs (pd.crosstab
  (.fillna (.astype $alcohol_any object) "~N/A")
  (.fillna (.astype $alcohol_audit object) "~N/A")))))
(setv d.index.name "")
d
I 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ~N/A
Drinking 0 11 169 39 36 49 203 128 128 60 63 30 13 24 3 1 5 0
Sober 5272 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
~N/A 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 168

Subjects have an AUDIT score over 0 iff they have any drinking. Makes sense.

(setv d (.reset-index (.stack (.apply
  (.groupby (ss obs (pd.notnull $alcohol_any)) (qw condition timepoint))
  (λ (cbind
    :alcohol_any (.mean (= ($ it alcohol_any) "Drinking"))
    :alcohol_problem (.mean (= ($ it alcohol_audit_problem) "Problem_Drinking"))))))))
(setv d.columns (qw condition timepoint _ variable proportion))
(kwc sns.factorplot :data d
  :x "timepoint" :y "proportion" :hue "condition" :row "variable"
  :aspect 2 :!legend)
;(.set-ylim (plt.gca) -.01 .3)
(kwc plt.legend :loc "upper center")

drinking.png

The two plots look similar since only a few observations have AUDIT scores of 1 or 2.

(wc (ss obs (pd.notnull $alcohol_audit)) (pd.crosstab
  [$timepoint $condition]
  (.astype $alcohol_audit int)))
timepoint condition 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Baseline Control 371 1 27 9 4 11 19 19 12 10 9 4 3 1 0 0 0
Baseline Mentor_Mother 489 5 37 7 9 12 27 15 16 4 9 6 1 6 0 1 0
After_birth Control 443 2 4 2 1 1 8 13 6 0 6 1 0 1 0 0 0
After_birth Mentor_Mother 566 0 3 4 3 3 8 11 2 3 1 1 0 0 0 0 0
6_months Control 477 2 9 3 4 2 17 3 5 3 2 0 0 0 0 0 1
6_months Mentor_Mother 518 1 7 7 2 4 12 9 8 3 1 1 0 0 0 0 0
18_months Control 427 0 15 1 3 2 20 6 8 6 2 1 0 0 0 0 0
18_months Mentor_Mother 473 0 13 0 1 3 16 12 8 8 1 1 0 0 0 0 0
3_years Control 369 0 16 2 3 2 40 6 9 4 1 1 0 1 0 0 0
3_years Mentor_Mother 413 0 23 2 4 2 30 7 3 3 4 2 2 1 0 0 0
5_years Control 333 0 11 2 0 6 3 13 27 6 21 4 6 6 2 0 2
5_years Mentor_Mother 393 0 4 0 2 1 3 14 24 10 6 8 1 8 1 0 2
(setv d (.copy (ss obs (> $alcohol_audit 0))))
(setv ($ d alcohol_audit) (.astype ($ d alcohol_audit) int))
(kwc sns.factorplot :data d :kind "count"
  :x "alcohol_audit" :row "condition"
  :color "black")

audit_cond_only.png

Note that I've excluded AUDIT 0 from the above since it overwhelms the other categories. Interestingly, the mode is 6 rather than 1.

(for [[_ d] (.groupby obs "s")]
  (kwc plt.plot
    (. ($ d timepoint) cat codes)
    (+ ($ d alcohol_audit) (np.random.uniform -.2 .2 (len d)))
    :c "black" :alpha (/ 1 5)))
(.set-ylim (plt.gca) -.2 20)

audit_spaghetti.png

GLMM

library(RJSONIO)
library(lme4)
library(optimx)
library(reshape2)
library(ggplot2)
ilogit = function(v) 1 / (1 + exp(-v))

j = fromJSON("../data.json")
obs = data.frame(lapply(1 : length(j$table[[1]]), function(ci)
    sapply(j$table[-1], function(row) if (is.null(row[[ci]])) NA else row[[ci]])))
colnames(obs) = j$table[[1]]
for (vname in names(j$categories))
   {obs[[vname]] = factor(obs[[vname]],
        levels = j[["categories"]][[vname]][["categories"]])}
#        ordered = j[["categories"]][[vname]][["ordered"]])}
m = glmer(
   alcohol_any ~ condition * timepoint + (1|nhood/s),
   data = transform(subset(obs, !is.na(alcohol_any)),
       alcohol_any = alcohol_any == "Drinking"),
   family = binomial,
   optimizer = "optimx",
   control = glmerControl(optimizer = "optimx",
       optCtrl = list(method = c("bobyqa", "L-BFGS-B"))))

This model estimates the SD of the nhood random intercepts as 0, so we might as well remove them.

m = glmer(
   alcohol_any ~ condition * timepoint + (1|s),
   data = transform(subset(obs, !is.na(alcohol_any)),
       alcohol_any = alcohol_any == "Drinking"),
   family = binomial,
   optimizer = "optimx",
   control = glmerControl(optimizer = "optimx",
       optCtrl = list(method = c("bobyqa", "L-BFGS-B"))))

round(d = 2, summary(m)$coefficients[,"Estimate", drop = F])
  Estimate
(Intercept) -2.59
conditionMentor_Mother -0.14
timepointAfter_birth -2.35
timepoint6_months -1.85
timepoint18_months -1.40
timepoint3_years -0.64
timepoint5_years 0.00
conditionMentor_Mother:timepointAfter_birth -0.29
conditionMentor_Mother:timepoint6_months -0.09
conditionMentor_Mother:timepoint18_months -0.16
conditionMentor_Mother:timepoint3_years -0.17
conditionMentor_Mother:timepoint5_years -0.64
v = summary(m)$coefficients[,"Estimate"]
tp = paste0("timepoint", levels(obs$timepoint)[-1])
d = data.frame(Control = c(0, v[tp]))
d$MM = d$Control + v["conditionMentor_Mother"] + c(0,
    v[paste0("conditionMentor_Mother:", tp)])
rownames(d) = levels(obs$timepoint)
transform(d,
    Control = round(d = 1, Control),
    MM = round(d = 1, MM))
  Control MM
Baseline 0.0 -0.1
After_birth -2.4 -2.8
6_months -1.9 -2.1
18_months -1.4 -1.7
3_years -0.6 -1.0
5_years 0.0 -0.8
d2 = transform(
    melt(cbind(timepoint = row.names(d), d),
        id.vars = "timepoint", variable.name = "condition",
        value.name = "effect"),
    timepoint = factor(timepoint, levels = unique(timepoint)))
ggplot(d2, aes(timepoint, effect, group = condition, color = condition)) +
    geom_line() +
    geom_point() +
    coord_cartesian(ylim = c(-3, 0))

time-cond-drinking-glmm-coefs.png

This looks a lot like the plot of raw proportions above (drinking.png), which is nice.
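To read the Control/MM log-odds table above on the probability scale, the offsets can be added to the intercept and passed through the inverse logit. This is a rough sketch using the rounded values printed above. The results are conditional probabilities for a subject whose random intercept is 0, so they need not match the raw sample proportions.

```python
import math

def ilogit(v):
    return 1 / (1 + math.exp(-v))

intercept = -2.59  # fixed-effect intercept reported above (Baseline, Control)
# Log-odds offsets relative to Baseline/Control, copied from the rounded table.
effects = {
    "Baseline":    {"Control": 0.0,  "MM": -0.1},
    "After_birth": {"Control": -2.4, "MM": -2.8},
    "6_months":    {"Control": -1.9, "MM": -2.1},
    "18_months":   {"Control": -1.4, "MM": -1.7},
    "3_years":     {"Control": -0.6, "MM": -1.0},
    "5_years":     {"Control": 0.0,  "MM": -0.8},
}
probs = {tp: {cond: ilogit(intercept + e) for cond, e in row.items()}
         for tp, row in effects.items()}
for tp, row in probs.items():
    print(f"{tp:12s} Control {row['Control']:.3f}  MM {row['MM']:.3f}")
```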

Retest of drinking during pregnancy

Any or none

(setv pregdrink-retest (.pivot
  (ss obs (.isin $timepoint (qw Baseline 5_years)))
  "s" "timepoint" "alcohol_any_preg_before_noticed"))
(setv pregdrink-retest (.astype pregdrink-retest {"Baseline" object "5_years" object}))
  ; This works around a pandas bug that would otherwise cause the
  ; next statement to fail.
(setv ($ pregdrink-retest condition)
  (getl (.set-index obs-bl "s") pregdrink-retest.index "condition"))
[
  ["subjects responding at 1 or both timepoints"
    (.sum (wc pregdrink-retest (| (pd.notnull $Baseline) (pd.notnull $5_years))))]
  ["subjects responding at both timepoints"
    (.sum (wc pregdrink-retest (& (pd.notnull $Baseline) (pd.notnull $5_years))))]
  ["subjects responding at baseline"
    (.sum (pd.notnull ($ pregdrink-retest Baseline)))]
  ["subjects reporting drinking at baseline"
    (.sum (= ($ pregdrink-retest Baseline) "Drinking"))]
  ["proportion" (rd 2 (/
    (.sum (= ($ pregdrink-retest Baseline) "Drinking"))
    (.sum (pd.notnull ($ pregdrink-retest Baseline)))))]
  ["subjects responding at 5 years"
    (.sum (pd.notnull ($ pregdrink-retest 5_years)))]
  ["subjects reporting drinking at 5 years"
    (.sum (= ($ pregdrink-retest 5_years) "Drinking"))]
  ["proportion" (rd 2 (/
    (.sum (= ($ pregdrink-retest 5_years) "Drinking"))
    (.sum (pd.notnull ($ pregdrink-retest 5_years)))))]
  ["proportion of subjects with concordant responses" (rd 2
    (.mean (wc (.dropna pregdrink-retest) (= $Baseline $5_years))))]
]
subjects responding at 1 or both timepoints 1220.00
subjects responding at both timepoints 843.00
subjects responding at baseline 1144.00
subjects reporting drinking at baseline 284.00
proportion 0.25
subjects responding at 5 years 919.00
subjects reporting drinking at 5 years 174.00
proportion 0.19
proportion of subjects with concordant responses 0.85
(rmap [[timepoint condition] (product (qw Baseline 5_years) (qw Control Mentor_Mother))]
  [(.format "proportion drinking: {}, {}" timepoint condition)
    (rd 2 (.mean (= "Drinking" (.dropna
      (getl (ss pregdrink-retest (= $condition condition)) : timepoint)))))])
proportion drinking: Baseline, Control 0.26
proportion drinking: Baseline, Mentor_Mother 0.24
proportion drinking: 5_years, Control 0.21
proportion drinking: 5_years, Mentor_Mother 0.17
condition Baseline Drinking Sober prop
Control Drinking 56 33 0.63
Control Sober 23 254 0.08
Mentor_Mother Drinking 60 51 0.54
Mentor_Mother Sober 22 344 0.06

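Here's a within-subjects point of view. The columns "Drinking" and "Sober" are 5-year responses, and "prop" is the proportion of each row responding "Drinking" at 5 years. In both conditions, subjects who said sober at baseline were very likely to say the same later (92% in Control, 94% in Mentor_Mother), whereas subjects who said drinking at baseline had a fair likelihood of switching to sober (37% in Control, 46% in Mentor_Mother).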
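The "prop" column and the overall concordance figure can be recomputed directly from the four rows of counts above. This is a small Python sketch; the variable names are mine.

```python
# (condition, baseline response) -> (5-year "Drinking" count, 5-year "Sober" count)
counts = {
    ("Control", "Drinking"): (56, 33),
    ("Control", "Sober"): (23, 254),
    ("Mentor_Mother", "Drinking"): (60, 51),
    ("Mentor_Mother", "Sober"): (22, 344),
}

# Proportion of each row saying "Drinking" at 5 years (the "prop" column).
props = {key: d / (d + s) for key, (d, s) in counts.items()}

# Concordant cells: same answer at baseline and at 5 years.
concordant = sum(d if baseline == "Drinking" else s
                 for (_, baseline), (d, s) in counts.items())
total = sum(d + s for d, s in counts.values())
concordance = concordant / total

for key, p in props.items():
    print(key, round(p, 2))
print(total, round(concordance, 2))  # 843 0.85
```

The total of 843 matches the number of subjects responding at both timepoints.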

Amounts

(setv pregdrink-retest-amnt (.pivot
  (ss obs (.isin $timepoint (qw Baseline 5_years)))
  "s" "timepoint" "alcohol_aa/day_preg_before_noticed"))
(setv pregdrink-retest-amnt (.astype pregdrink-retest-amnt {}))
  ; This works around a pandas bug that would otherwise cause the
  ; next statement to fail.
(setv ($ pregdrink-retest-amnt condition)
  (getl (.set-index obs-bl "s") pregdrink-retest-amnt.index "condition"))
(setv d pregdrink-retest-amnt  both (.dropna pregdrink-retest-amnt))

[
  ["subjects responding at 1 or both timepoints"
    (.sum (wc d (| (pd.notnull $Baseline) (pd.notnull $5_years))))]
  ["subjects responding at both timepoints"
    (.sum (wc d (& (pd.notnull $Baseline) (pd.notnull $5_years))))]
  ["proportion increasing" (rd 2
    (.mean (wc both (> $5_years $Baseline))))]
  ["proportion decreasing" (rd 2
    (.mean (wc both (< $5_years $Baseline))))]
  ["proportion with no change" (rd 2
    (.mean (wc both (= $5_years $Baseline))))]
  ["mean change" (rd 2
    (.mean (wc both (- $5_years $Baseline))))]]
subjects responding at 1 or both timepoints 1220.00
subjects responding at both timepoints 842.00
proportion increasing 0.16
proportion decreasing 0.13
proportion with no change 0.72
mean change 0.07
(plt.clf)
(plt.axis "equal")
(plt.plot [0 3] [0 3] :c "black" :linewidth .5 :zorder -3)
(defn jitter [v]
  (+ v (* (- (np.random.random (len v)) .5) .05)))
(wc (.dropna pregdrink-retest-amnt)
  (plt.scatter (jitter $Baseline) (jitter $5_years) :s 1))

amnt-scatter.png

RMSE comparison (no CV)

(import [statsmodels.api :as sm])

(defn pred-dichot [df xv yv]
  (setv means (.mean (.groupby (getl df : [xv yv]) xv)))
  (setv y-pred (getl means (getl df : xv) yv))
  (rmse (. (getl df : yv) values) y-pred.values))

(defn pred-amnt [df xv yv]
  (setv m (sm.OLS
    (. (getl df : yv) values)
    (sm.add-constant (. (.astype (getl df : xv) float) values))))
  (setv y-pred (.predict (.fit m)))
  (rmse (. (getl df : yv) values) y-pred))

(setv [d-dichot d-amnt] (rmap [d [pregdrink-retest pregdrink-retest-amnt]]
  (.dropna (.merge (.reset-index d)
    (ss obs (= $timepoint "5_years") ["s" "child_iq"])))))
(setv v (rd (pds-from-pairs :name "RMSE" [
  ["Baseline (SD)" (wc d-dichot (.std $child_iq :ddof 0))]
  ["T1, dichotomous" (pred-dichot d-dichot "Baseline" "child_iq")]
  ["T1, continuous" (pred-amnt d-amnt "Baseline" "child_iq")]
  ["T2, dichotomous" (pred-dichot d-dichot "5_years" "child_iq")]
  ["T2, continuous" (pred-amnt d-amnt "5_years" "child_iq")]])))
(setv v.index.name "Model")
v
Table 1. Accuracy of predicting year-5 child IQ using measures of drinking during pregnancy taken at baseline (T1) or 5 years after the child's birth (T2), and using dichotomous (drinking or not drinking) or continuous (fluid ounces of absolute alcohol per day) coding. RMSE = root mean square error
Model RMSE
Baseline (SD) 11.301
T1, dichotomous 11.286
T1, continuous 11.284
T2, dichotomous 11.297
T2, continuous 11.290
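A caveat on Table 1: without cross-validation, a fitted model cannot do worse than the SD baseline, so small improvements like these are expected even from a useless predictor. The sketch below illustrates this for the dichotomous case with made-up numbers (not study data). It mirrors the idea of pred-dichot (predict each case by its group mean) in plain Python rather than reproducing the notebook's pandas code.

```python
import math
from statistics import fmean, pstdev

def rmse(y, y_pred):
    return math.sqrt(fmean([(a - b) ** 2 for a, b in zip(y, y_pred)]))

def pred_dichot(x, y):
    """In-sample RMSE when each case is predicted by its group's mean."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    means = {g: fmean(v) for g, v in groups.items()}
    return rmse(y, [means[xi] for xi in x])

# Made-up illustration data, not study data.
x = ["Sober", "Sober", "Sober", "Drinking", "Drinking", "Sober"]
y = [85.0, 92.0, 78.0, 80.0, 74.0, 88.0]

baseline = pstdev(y)       # SD with ddof = 0, i.e. RMSE of the grand mean
model = pred_dichot(x, y)  # never larger than the baseline in-sample
print(round(baseline, 3), round(model, 3))
```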

CV

(defn rmse [v1 v2]
  (np.sqrt (np.mean (** (- v1 v2) 2))))

(defn pred-ols [x y cv-obj]
  (np.sqrt (- (np.mean (sklearn.model-selection.cross-val-score
    (sklearn.linear-model.LinearRegression :fit-intercept T)
    x y
    :scoring "neg_mean_squared_error" :cv cv-obj)))))
(defn pred-nothing [df yv cv-obj]
  (setv y (. (getl df : yv) values))
  (setv y-pred (np.array (rmap [[i-train i-test] cv-obj]
    (np.mean (geta y i-train)))))
  (rmse y y-pred))
(defn pred-dichot [df xv yv cv-obj]
  (pred-ols
    (.reshape (. (= (getl df : xv) "Drinking") values) -1 1)
    (. (getl df : yv) values)
    cv-obj))
(defn pred-amnt [df xv yv cv-obj]
  (pred-ols
    (.reshape (. (.astype (getl df : xv) float) values) -1 1)
    (. (getl df : yv) values)
    cv-obj))

(setv d (rd (pd.concat :axis 1 (rmap [[colname yv y-timepoint] [
    ["Head circumference" "child_head_circumference_for_age_z" "After_birth"]
    ["Intelligence" "child_iq" "5_years"]]]
  (setv [d-dichot d-amnt] (rmap [d [pregdrink-retest pregdrink-retest-amnt]]
    (setv d (.dropna (.merge (.reset-index d) (ss obs
      (& (= $condition "Control") (= $timepoint y-timepoint))
      ["s" yv]))))
    (.set-index (getl d : ["s" "Baseline" "5_years" yv]) "s")))
  (assert (.issubset (set d-amnt.index) (set d-dichot.index)))
  (setv d-dichot (getl d-dichot d-amnt.index))
  (setv cv-obj (list (.split (sklearn.model-selection.LeaveOneOut) d-dichot)))
  (pds-from-pairs :name colname [
    ["(Sample size)" (str (len d-dichot))]
    ["Baseline (constant only)" (pred-nothing d-dichot yv cv-obj)]
    ["T1, dichotomous" (pred-dichot d-dichot "Baseline" yv cv-obj)]
    ["T1, continuous" (pred-amnt d-amnt "Baseline" yv cv-obj)]
    ["T2, dichotomous" (pred-dichot d-dichot "5_years" yv cv-obj)]
    ["T2, continuous" (pred-amnt d-amnt "5_years" yv cv-obj)]])))))
(setv d.index.name "Model")
d
Table 2. Accuracy (as root mean square error) of predicting head circumference after birth (WHO z-score for age and sex) and year-5 child intelligence (Kaufman mental processing index; MPI) using measures of drinking during pregnancy taken at baseline (T1) or 5 years after the child's birth (T2), and using dichotomous (drinking or not drinking) or continuous (fluid ounces of absolute alcohol per day) coding.
Model Head circumference Intelligence
(Sample size) 364 284
Baseline (constant only) 1.847 11.057
T1, dichotomous 1.847 11.070
T1, continuous 1.847 11.120
T2, dichotomous 1.848 11.061
T2, continuous 1.839 11.086

Growth

I've clipped z-scores to the range [-10, 10].
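For reference, clipping amounts to the following (a sketch in plain Python; the function name is mine, and passing missing values through unchanged is an assumption about how the dataset was prepared).

```python
import math

def clip_z(z, lo=-10.0, hi=10.0):
    """Clip one z-score to [lo, hi]; missing values (None or NaN) pass through."""
    if z is None or math.isnan(z):
        return z
    return max(lo, min(hi, z))

raw = [-0.4, 2.1, 37.5, -12.0, None]
clipped = [clip_z(z) for z in raw]
print(clipped)  # [-0.4, 2.1, 10.0, -10.0, None]
```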

Simple

(for [drink-v (qw Sober Drinking) cond-v (qw Control Mentor_Mother)]
  (kwc sns.kdeplot :label (+ drink-v " " cond-v) ($
    (ss obs (& (= $alcohol_any drink-v) (= $condition cond-v)))
    child_weight_for_age_z)))
(plt.xlim [-10 11])

condition_weight.png

(rd (wc obs (kwc pd.crosstab
  $condition $alcohol_any
  :values $child_weight_for_age_z :aggfunc np.mean)))
condition Drinking Sober
Control -0.102 0.193
Mentor_Mother -0.119 0.198

Drinking does seem to be related to slightly lower weights: mean weight-for-age z-scores are about 0.3 lower among drinkers in both conditions.

(for [drink-v (qw Sober Drinking) cond-v (qw Control Mentor_Mother)]
  (kwc sns.kdeplot :label (+ drink-v " " cond-v) ($
    (ss obs (& (= $alcohol_any drink-v) (= $condition cond-v)))
    child_height_for_age_z)))
(plt.xlim [-11 10])

condition_height.png

(rd (wc obs (kwc pd.crosstab
  $condition $alcohol_any
  :values $child_height_for_age_z :aggfunc np.mean)))
condition Drinking Sober
Control -0.778 -0.648
Mentor_Mother -0.761 -0.550

Non-drinking MM subjects seem to have slightly taller children.

Prediction

Let's try focusing on the last timepoint (the 5-year follow-up) as our DV. The predictors for each case are: condition, drinking at each timepoint, and possibly neighborhood. Ignore the DV at the previous timepoints.

(setv po (pred-outcome "child_weight_for_age_z" "5_years" obs))
(rd 5 (rmse-table po))
I RMSE
trivial 1.17529
ridge 1.09020
ridge-interact 1.08902
(setv v (get po "ridge" "on_all_data"))
(rd v)
I value
Intercept 0.031
mentor_mother -0.002
alcohol_any_Baseline -0.019
alcohol_any_After_birth -0.012
alcohol_any_6_months -0.017
alcohol_any_18_months -0.010
alcohol_any_3_years -0.015
alcohol_any_5_years -0.019
alcohol_any_while_preg -0.020
(rd (.sum (getl v (filt (.startswith it "alcohol_any_") v.index))))
-0.113
(setv po (pred-outcome "child_height_for_age_z" "5_years" obs))
(rd 5 (rmse-table po))
I RMSE
trivial 1.04357
ridge 1.01218
ridge-interact 1.01238
(setv v (get po "ridge" "on_all_data"))
(rd 4 v)
I value
Intercept -0.5565
mentor_mother 0.0004
alcohol_any_Baseline -0.0019
alcohol_any_After_birth -0.0009
alcohol_any_6_months -0.0009
alcohol_any_18_months -0.0005
alcohol_any_3_years 0.0003
alcohol_any_5_years -0.0009
alcohol_any_while_preg -0.0023

All the coefficients are tiny, in keeping with the tiny improvement over the trivial model.

Kaufman IQ

Simple

(kwc sns.kdeplot (.dropna ($ obs child_iq)))
;(rectplot ($ obs child_iq))

child_iq_overall.png

Troublingly, the mode is around 80, more than an SD below the general average IQ of 100.

(for [drink-v (qw Sober Drinking) cond-v (qw Control Mentor_Mother)]
  (kwc sns.kdeplot :label (+ drink-v " " cond-v) ($
    (ss obs (& (= $alcohol_any drink-v) (= $condition cond-v)))
    child_iq)))

child_iq_by_condalc.png

Prediction

(setv po (pred-outcome "child_iq" "5_years" obs))
(rd 5 (rmse-table po))
I RMSE
trivial 11.67886
ridge 11.26431
ridge-interact 11.25672
(setv v (get po "ridge" "on_all_data"))
(rd 4 v)
I value
Intercept 83.1735
mentor_mother -0.0002
alcohol_any_Baseline -0.0014
alcohol_any_After_birth -0.0005
alcohol_any_6_months 0.0006
alcohol_any_18_months -0.0000
alcohol_any_3_years 0.0017
alcohol_any_5_years -0.0000
alcohol_any_while_preg -0.0008

Mostly 0 effects.

References

Dawson, D. A., Grant, B. F., & Stinson, F. S. (2005). The AUDIT-C: Screening for alcohol use disorders and risk drinking in the presence of other psychiatric disorders. Comprehensive Psychiatry, 46(6), 405–416. doi:10.1016/j.comppsych.2005.01.006

Ernhart, C. B., Morrow-Tlucak, M., Sokol, R. J., & Martier, S. (1988). Underreporting of alcohol use in pregnancy. Alcoholism: Clinical and Experimental Research, 12(4), 506–511. doi:10.1111/j.1530-0277.1988.tb00233.x

Hannigan, J. H., Chiodo, L. M., Sokol, R. J., Janisse, J., Ager, J. W., Greenwald, M. K., & Delaney-Black, V. (2010). A 14-year retrospective maternal report of alcohol consumption in pregnancy predicts pregnancy and teen outcomes. Alcohol, 44(7–8), 583–594. doi:10.1016/j.alcohol.2009.03.003. Retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2889143/

Jacobson, S. W., Chiodo, L. M., Sokol, R. J., & Jacobson, J. L. (2002). Validity of maternal report of prenatal alcohol, cocaine, and smoking in relation to neurobehavioral outcome. Pediatrics, 109(5), 815. doi:10.1542/peds.109.5.815

O'Connor, M. J., Tomlinson, M., LeRoux, I. M., Stewart, J., Greco, E., & Rotheram-Borus, M. J. (2011). Predictors of alcohol use prior to pregnancy recognition among township women in Cape Town, South Africa. Social Science and Medicine, 72(1), 83–90. doi:10.1016/j.socscimed.2010.09.049