Philani notebook

Created 29 Sep 2016 • Last modified 14 Apr 2017

Notes

Tom and Adriane's manuscripts can be found in \\CCHFS.semel.ucla.edu\Root\Projects & Programs\South Africa\PHILANI STUDY\FIVE YEAR ASSESSMENT\MANUSCRIPTS.

Outlet: Invited article in a special issue of AIDS edited by MJ

Use data up to the 5-year follow-up.

Some children were referred for FAS.

The timepoints are:

  • 0: Baseline
  • 1: Post birth (1-2 weeks)
  • 2: 6 months
  • 3: 18 months
  • 4: 3 years
  • 5: 5 years

Describe alcohol use over time, intervention vs. control. Alcohol use can be represented as any drinking or as our dichotomous problem-drinking measure.

  • Adriane: Looking at the surveys, I believe the alcohol questions were only asked of mothers. There are two variables for whether the mother drank any alcohol:
    • alc_any: drank alcohol in the last month
    • alc_any_a: same as above, except that at time 1 (birth assessment) it is assigned a "1" if she drank in the last month and/or in the month prior to birth
  • alc_risk: see below

Child outcomes (as predicted by alcohol × intervention, and by whether the child is with the mother):

  • Fetal alcohol syndrome (FAS)
    • Variables starting with FASD_ were only taken for children referred for FAS treatment; ref_need indicates need for a referral
    • philtrum_rating, lip_rating: 4 or 5 indicates possible need for referral
  • CBCL
    • First available at the year-3 follow-up. At year 5, only aggression items are available.
      • Aggressive_Behavior
  • Strengths & Difficulties
    • First available at the year-3 follow-up. At year 5, only prosocial items are available.
      • Prosocial
  • Growth charts (weight for age, height for age)
    • waz, haz, bmiz (Adriane: "we should probably look at these more closely to see if there are outliers we want to remove")
  • Clancy-Blair (executive function)
    • TBD
  • Kaufman IQ
    • Kaufman_Standard_Score_MPI - only measured at timepoint 5

Variable names ending in _1, _2, or _3 denote measurements of the first, second, and third child of a multiple birth.

Mary O'Connor says that Dawson, Grant, and Stinson (2005) is the right citation for our version of the AUDIT-C. In particular, that's where the threshold of 3 comes from. O'Connor et al. (2011) has the four items we used, which are the first two items from Dawson et al. (2005), plus two heavy-drinking items like the last item of Dawson et al., but with 4 and 3 drinks rather than 5. O'Connor et al. (2011) also says how to score each of the three items. The fact that Dawson et al.'s threshold of 3 might not be appropriate when using four rather than three items isn't addressed.

Mary O'Connor: "women are inclined to be more truthful about their drinking during that time [before recognizing their pregnancy] and… those data correlate most highly with cognitive deficits and physical dysmorphology"

Jackie Stewart: "The props [i.e., prop drink containers] we gave were a beer bottle, a wine glass (250mls) and a tot glass. Most of the participants were drinking cheap wine or beer." She confirmed that the goal of the props was to define "drink" for subjects in the sense of the American standard drink, which is 14 g of ethanol.

Mary O'Connor: "we ask about prior to pregnancy recognition drinking because mothers are supposedly more candid about that time period and because that measure predicts neurocognitive outcome better than during pregnancy results.… Prior to pregnancy recognition measures are important because most of the women booked very late in their pregnancies not finding out until much later than women from other countries and backgrounds so their prior to pregnancy recognition measures would include a large portion of their total gestational period."

Alcohol items, not about pregnancy

The number of the roughly equivalent AUDIT item is shown first.

(AUDIT 1) alc_freq (Absent at time 0) In the last month, about how often did you drink ANY alcoholic beverage?

  • Never [1]
  • Less than once a month [2]
  • Once a month [3]
  • 2 to 3 times a month [4]
  • Once a week [5]
  • 2 times a week [6]
  • 3 to 4 times a week [7]
  • Nearly every day [8]
  • Every day [9]
  • Decline to answer [NA]

(AUDIT 2) alc_num (Absent at time 0) In the previous month before today, counting all types of alcohol combined, how many drinks did you USUALLY have on days when you drank alcohol?

  • 1 or 2 [1]
  • 3 or 4 [2]
  • 5 or 6 [3]
  • 7,8 or 9 [4]
  • 10 or more [5]
  • Decline to answer [NA]

(AUDIT 3) alc_4_freq (Absent at time 0) In the previous month before today, about how often did you drink FOUR or MORE drinks in a single day?

  • Never [1]
  • Less than once a month [2]
  • Once a month [3]
  • 2 to 3 times a month [4]
  • Once a week [5]
  • 2 times a week [6]
  • 3 to 4 times a week [7]
  • Nearly every day [8]
  • Every day [9]
  • Decline to answer [NA]

(AUDIT 6) alc_morning (Always present) Do you sometimes take a drink in the morning when you first get up?

  • Yes [1]
  • No [0]
  • Decline to answer [NA]

(Very roughly AUDIT 7) alc_cut_down (Always present) Do you sometimes feel the need to cut down on your drinking?

  • Yes [1]
  • No [0]
  • Decline to answer [NA]

(AUDIT 8) alc_memory (Always present) Has a friend or family member ever told you about things you said or did while you were drinking that you could not remember?

  • Yes [1]
  • No [0]
  • Decline to answer [NA]

(AUDIT 10) alc_friend (Always present) Have close friends or relatives worried or complained about your drinking?

  • Yes [1]
  • No [0]
  • Decline to answer [NA]

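What we'll use: (wc dd (valcounts $time (& (>= $alc_4_freq 3) (>= (+ $alc_morning $alc_memory $alc_friend) 1)))). That is, the criterion is having four or more drinks in a single day at least once a month (alc_4_freq of 3 or higher) and answering yes to at least one of alc_morning, alc_memory, and alc_friend.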
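The rule above can be restated outside the notebook's macros. This is a minimal Python sketch, assuming each response is a dict keyed by the item names listed above, with declined answers stored as None. The function name is_problem_drinker is hypothetical, and treating a missing item as failing its clause is an assumption that mirrors how NaN behaves in pandas comparisons rather than something the notebook states.

```python
def is_problem_drinker(resp):
    """Apply the two-part rule to one response (a dict of item codes)."""
    heavy = resp.get("alc_4_freq")
    flags = [resp.get(k) for k in ("alc_morning", "alc_memory", "alc_friend")]
    # Code 3 on alc_4_freq is "Once a month", so >= 3 means four or more
    # drinks in a single day at least monthly.
    heavy_ok = heavy is not None and heavy >= 3
    # Assumption: any missing yes/no item makes this clause false, as a NaN
    # would after the pandas sum and comparison.
    consequence_ok = None not in flags and sum(flags) >= 1
    return heavy_ok and consequence_ok


example = {"alc_4_freq": 5, "alc_morning": 0, "alc_memory": 1, "alc_friend": 0}
print(is_problem_drinker(example))
```

Both clauses must hold, so a mother who drinks heavily every week but endorses none of the three consequence items is not flagged.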

Alcohol items, about pregnancy

alc_freq_pre_b (Present only at time 0) How often did you use alcohol in the month before you found out you were pregnant?

  • Never [1]
  • Less than once a month [2]
  • Once a month [3]
  • 2 to 3 times a month [4]
  • Once a week [5]
  • 2 times a week [6]
  • 3 to 4 times a week [7]
  • Nearly every day [8]
  • Every day [9]
  • Decline to answer [NA]

fas_alc_before (Present only at time 5) How often did you use alcohol in the month before you found out you were pregnant with [child]?

  • Never [1]
  • Less than once a month [2]
  • Once a month [3]
  • 2 to 3 times a month [4]
  • Once a week [5]
  • 2 times a week [6]
  • 3 to 4 times a week [7]
  • Nearly every day [8]
  • Every day [9]
  • Decline to answer [NA]

alc_num_pre_b (Present only at time 0) During the month before you found out you were pregnant, counting all types of alcohol combined, how many drinks did you USUALLY have on days when you drank alcohol?

  • 1 or 2 [1]
  • 3 or 4 [2]
  • 5 or 6 [3]
  • 7,8 or 9 [4]
  • 10 or more [5]
  • Decline to answer [NA]

fas_alc_per_day_before (Present only at time 5) During the month before you found out you were pregnant with [child] counting all types of alcohol combined, how many drinks did you USUALLY have on days when you drank alcohol?

  • 1 or 2 [1]
  • 3 or 4 [2]
  • 5 or 6 [3]
  • 7,8 or 9 [4]
  • 10 or more [5]
  • Decline to answer [NA]

Previous studies on retest of drinking during pregnancy

Reading notes

one

  • Ernhart, Morrow-Tlucak, Sokol, and Martier (1988) recruited 238 mothers from a Cleveland hospital and compared baseline (asking about drinking during the last 2 weeks at various points during pregnancy) to 5 years. They found that people more often went up than down (41% vs. 18%) but the proportion reporting no drinking was about the same at the two times (34% baseline vs. 36% later). Among sober at T1, 39% switched. Among drinkers at T1, 22% switched.
  • Jacobson, Chiodo, Sokol, and Jacobson (2002) recruited black mothers at a Detroit hospital, most of whom were recruited on the basis of drinking at conception, comparing during pregnancy to 13 months after birth. 59 mothers stated they abstained at baseline vs. 106 at follow-up. Mean absolute alcohol use per day was .23 at baseline vs. .88 at follow-up. "Among those who were inconsistent, 139 (61.8%) reported higher levels retrospectively, whereas only 86 (38.2%) report higher levels antenatally."
  • Hannigan et al. (2010) recruited black women from a Detroit hospital. "Among the 288 women, 43 (14.9%) reported more average drinking retrospectively (AAD) than they had previously reported antenatally" (and only 1 mother made the reverse change). Mean ounces of alcohol per day was .03 at baseline vs. .47 retrospectively. 284 mothers reported abstaining or light drinking at baseline vs. 245 at the 14-year follow-up.
  • Alvik, Haldorsen, Groholt, and Lindemann (2006) recruited women from Oslo, Norway. "Questionnaires were answered at 17 (T1) and 30 weeks of pregnancy (T2) and 6 months after term (T3)." "Significantly more alcohol consumption after pregnancy recognition was reported retrospectively at both T2 and T3 [T2 0.15 and T3 0.18 standard units per week (SU/wk)] than concurrently at T1 or T2 (T1 0.10 and T2 0.14 SU/wk)." Reporting about T1, 24% of mothers reported any use at T1, 36% at T2, and 37% at T3. Specific within-subjects differences don't seem to be reported.

two

  • Ernhart et al. (1988): O'Connor and Paley (2006) says they "found that retrospective reports of drinking collected 5 years after pregnancy were more highly related to scores on a measure of alcohol-related problems and to craniofacial anomalies in children than were reports collected during pregnancy. Moreover, when questioned 5 years after the birth of their children, a large proportion of the women provided retrospective reports that were appreciably higher than those reports given while they were pregnant"
  • Jacobson et al. (2002): O'Connor and Paley (2006) says they "found reports of alcohol consumption taken during the antenatal period to be more highly associated with multiple measures of infant behavior at 13 months than retrospective reports of pregnancy drinking levels. Like Ernhart and associates (1988), they also found that retrospective recall yielded reports of higher levels of drinking than antenatal reports and that there was a statistically significant correlation between postpartum or current drinking and retrospective reports of consumption during pregnancy"
  • Hannigan et al. (2010): "Retrospective maternal self-reported drinking assessed 14 years postpartum was significantly higher than antenatal reports of consumption.… Retrospective report predicted more teen behavior problems (e.g., attention problems and externalizing behaviors) than the antenatal report."
  • Alvik et al. (2006): Oslo, Norway. "Questionnaires were answered at 17 (T1) and 30 weeks of pregnancy (T2) and 6 months after term (T3)." "Significantly more alcohol consumption after pregnancy recognition was reported retrospectively at both T2 and T3 [T2 0.15 and T3 0.18 standard units per week (SU/wk)] than concurrently at T1 or T2 (T1 0.10 and T2 0.14 SU/wk). When comparing the 2 retrospective reports at T2 and T3, there was a significant increase over time."

Analyses

Columns are contemporaneous and rows are retrospective.

Ernhart et al. (1988):

(setv ernhart-retest (pd.DataFrame
  :columns [4 3 2 1 0]
  :index [4 3 2 1 0]
  [
    [2  3  14  14  3]
    [0  0  7  8  2]
    [0  1  9  20  7]
    [0  0  5  38  20]
    [0  0  0  35  50]]))
(rd 2 [
  ["Proportion of T1 sober who were consistent"
    (/ (getl ernhart-retest 0 0)
      (.sum (getl ernhart-retest : 0)))]
  ["Proportion of T1 drinking who were consistent"
    (/ (.sum (.sum (getl ernhart-retest [4 3 2 1] [4 3 2 1])))
      (.sum (.sum (getl ernhart-retest : [4 3 2 1]))))]])
Proportion of T1 sober who were consistent 0.61
Proportion of T1 drinking who were consistent 0.78
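As a cross-check on the Ernhart table, here is a minimal standard-library Python sketch that recomputes the two consistency proportions and also tallies how many mothers moved up or down between reports. The cell counts are copied from the table above; the variable names are illustrative only.

```python
# Ernhart et al. (1988) retest table, as entered above.
# Rows: retrospective level; columns: contemporaneous level (both 4..0).
levels = [4, 3, 2, 1, 0]
counts = [
    [2, 3, 14, 14, 3],
    [0, 0, 7, 8, 2],
    [0, 1, 9, 20, 7],
    [0, 0, 5, 38, 20],
    [0, 0, 0, 35, 50],
]
cell = {(r, c): counts[i][j]
        for i, r in enumerate(levels)
        for j, c in enumerate(levels)}
total = sum(cell.values())

# Contemporaneous abstainers (column 0) who also abstained retrospectively.
sober_consistent = cell[(0, 0)] / sum(cell[(r, 0)] for r in levels)

# Contemporaneous drinkers (columns 1-4) who still reported drinking later.
drinkers = sum(n for (r, c), n in cell.items() if c > 0)
drink_consistent = sum(
    n for (r, c), n in cell.items() if c > 0 and r > 0) / drinkers

# Direction of change: retrospective level above or below contemporaneous.
went_up = sum(n for (r, c), n in cell.items() if r > c) / total
went_down = sum(n for (r, c), n in cell.items() if r < c) / total

print(total, round(sober_consistent, 2), round(drink_consistent, 2))
print(round(went_up, 2), round(went_down, 2))
```

The cells sum to the 238 mothers in the reading notes. With these counts the up and down shares come to about 41% and 17%, close to the 41% vs. 18% quoted there.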

Jacobson et al. (2002):

(setv jacobson-drink-cats (qw Very_Low Low Moderate Heavy Very_Heavy))
(setv jacobson-retest (pd.DataFrame
  :columns (+ ["Abstain"] jacobson-drink-cats)
  :index (+ ["Abstain"] jacobson-drink-cats)
  [
    [46  49  10  1  0  0]
    [7  45  17  2  0  0]
    [5  20  30  3  0  0]
    [0  9  25  2  3  0]
    [0  7  33  7  5  1]
    [1  2  9  8  6  1]]))
(rd 2 [
  ["Proportion of T1 sober who were consistent"
    (/ (getl jacobson-retest "Abstain" "Abstain")
      (.sum (getl jacobson-retest : "Abstain")))]
  ["Proportion of T1 drinking who were consistent"
    (/ (.sum (.sum (getl jacobson-retest jacobson-drink-cats jacobson-drink-cats)))
      (.sum (.sum (getl jacobson-retest : jacobson-drink-cats))))]])
Proportion of T1 sober who were consistent 0.78
Proportion of T1 drinking who were consistent 0.80

Hannigan et al. (2010):

(setv hannigan-retest (pd.DataFrame
  :columns (qw Abstain Lo)
  :index (qw Abstain Lo Mid Hi)
  [
    [222 3]
    [25 3]
    [14 0]
    [17 4]]))
(rd 2 [
  ["Proportion of T1 sober who were consistent"
    (/ (getl hannigan-retest "Abstain" "Abstain")
      (.sum (getl hannigan-retest : "Abstain")))]
  ["Proportion of T1 drinking who were consistent"
    (/ (.sum (.sum (getl hannigan-retest (qw Lo Mid Hi) "Lo")))
      (.sum (.sum (getl hannigan-retest : "Lo"))))]])
Proportion of T1 sober who were consistent 0.8
Proportion of T1 drinking who were consistent 0.7

Data

Drinking

Figures

(setv d (pd.melt
  (getl obs : (qw s condition timepoint alcohol_any alcohol_problem))
  (qw s condition timepoint)
  (qw alcohol_any alcohol_problem)))
(setv d (.dropna d :subset ["value"]))
(setv ($ d value) (.isin ($ d value) (qw Drinking Problem)))
(sns.factorplot "timepoint" "value" :hue "condition" :col "variable"
  :data d :ci None :legend F :size 6)
(plt.legend :loc "best")

drinking.png

This graph shows the proportion of mothers who drank at all (left) or had a drinking problem (right) at each timepoint.

(setv d (.dropna (getl obs :
  (qw s condition timepoint alcohol_any alcohol_problem))))
(setv ($ d alcohol_state) (.astype (wc d
  (np.where (= $alcohol_any "Sober")    0
  (np.where (= $alcohol_problem "Okay") 1
                                        2))) float))
(setv d (.dropna (.pivot d "s" "timepoint" "alcohol_state")))
(setv d (get (ordf d (.sum d :axis 1) $Baseline $After_birth $6_months $18_months $3_years $5_years)
  (.any d :axis 1)))
(sns.heatmap d :cbar F)

drinking-lasagna.png

This lasagna plot isn't very useful, but it shows each mother's state (light for non-drinking, pink for non-problem drinking, and dark for problem drinking) at each timepoint. Each mother is a row. Mothers who never drank are excluded from this plot.

Mixed-effects ordinal regression

I fit a mixed-effects ordinal probit-regression model, with per-subject random intercepts. The DV is a 3-valued ordinal variable where the lowest level is not drinking (Sober), the intermediate level is non-problem drinking (Okay), and the highest level is problem drinking (Problem). The IVs are timepoint and condition. I exclude the After_birth timepoint since almost nobody drank then.

library(RJSONIO)
library(ordinal)

j = fromJSON("../data.json")
obs = data.frame(lapply(1 : length(j$table[[1]]), function(column.ix)
    sapply(j$table[-1], function(row)
        if (is.null(row[[column.ix]])) NA else row[[column.ix]])))
colnames(obs) = j$table[[1]]
for (vname in names(j$categories))
   {obs[[vname]] = factor(obs[[vname]],
        levels = j[["categories"]][[vname]][["categories"]])}
d = subset(obs,
    !is.na(alcohol_any) & !is.na(alcohol_problem) &
    timepoint != "After_birth")
d$drinking = droplevels(with(d, ordered(
   ifelse(alcohol_any == "Sober", "Sober", as.character(alcohol_problem)),
   levels = c("Sober", "Okay", "Problem"))))
set.seed(5)
m = clmm2(drinking ~ condition * timepoint, random = s,
    data = d, Hess = T, link = "probit",
    nAGQ = 7, gradTol = 1e-5)
v = coef(summary(m))[,"Estimate"]
tp = paste0("timepoint", levels(obs$timepoint)[c(-1, -2)])
d = data.frame(Control = c(0, v[tp]))
d$MM = d$Control + v["conditionMentor_Mother"] + c(0,
    v[paste0("conditionMentor_Mother:", tp)])
rownames(d) = levels(obs$timepoint)[-2]
transform(d,
    Control = round(Control, digits = 2),
    MM = round(MM, digits = 2))
           Control     MM
Baseline      0.00  -0.04
6_months      0.04  -0.04
18_months     0.23   0.17
3_years       0.61   0.47
5_years       0.98   0.56

Here are the effects at each combination of timepoint and condition, compared to control subjects at baseline. The overall picture is similar to that of the between-subjects graphs. Drinking increases over time, but less steeply for mentored subjects.

v = round(coef(summary(m))[1:2,"Estimate"], 2)
names(v) = sub("\\|", " vs. ", names(v))
v
  value
Sober vs. Okay 2.15
Okay vs. Problem 3.04

Here are the threshold scores the model used to distinguish the three categories. (The underlying error is effectively standardized to SD 1.)
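To make the latent-scale numbers more concrete, the following Python sketch converts an effect and the two thresholds into category probabilities. It assumes the usual cumulative-probit form, P(Y ≤ j) = Φ(θ_j − η), and a subject whose random intercept is 0, so the results are conditional on that subject rather than population averages. The function name category_probs is hypothetical; the thresholds and effects are copied from the output above.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF, i.e. the inverse probit link

# Thresholds copied from the model output above.
THETA_SOBER_OKAY = 2.15
THETA_OKAY_PROBLEM = 3.04


def category_probs(eta):
    """Return (P(Sober), P(Okay), P(Problem)) for latent mean `eta`.

    Assumes a random intercept of 0 for the subject.
    """
    p_sober = PHI(THETA_SOBER_OKAY - eta)
    p_up_to_okay = PHI(THETA_OKAY_PROBLEM - eta)
    return p_sober, p_up_to_okay - p_sober, 1 - p_up_to_okay


# Effects relative to control subjects at baseline, from the table above.
for label, eta in [("Control, Baseline", 0.00),
                   ("Control, 5_years", 0.98),
                   ("MM, 5_years", 0.56)]:
    print(label, [round(p, 3) for p in category_probs(eta)])
```

Because the random intercepts vary across subjects, the observed proportions in the figures need not match these conditional probabilities.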

Retest of drinking during pregnancy

Any or none

(setv pregdrink-retest (.pivot
  (ss obs (.isin $timepoint (qw Baseline 5_years)))
  "s" "timepoint" "alcohol_any_preg_before_noticed"))
(setv pregdrink-retest (.astype pregdrink-retest {"Baseline" object "5_years" object}))
  ; This works around a pandas bug that would otherwise cause the
  ; next statement to fail.
(setv ($ pregdrink-retest condition)
  (getl (.set-index obs-bl "s") pregdrink-retest.index "condition"))
[
  ["subjects responding at 1 or both timepoints"
    (.sum (wc pregdrink-retest (| (pd.notnull $Baseline) (pd.notnull $5_years))))]
  ["subjects responding at both timepoints"
    (.sum (wc pregdrink-retest (& (pd.notnull $Baseline) (pd.notnull $5_years))))]
  ["subjects responding at baseline"
    (.sum (pd.notnull ($ pregdrink-retest Baseline)))]
  ["subjects reporting drinking at baseline"
    (.sum (= ($ pregdrink-retest Baseline) "Drinking"))]
  ["proportion" (rd 2 (/
    (.sum (= ($ pregdrink-retest Baseline) "Drinking"))
    (.sum (pd.notnull ($ pregdrink-retest Baseline)))))]
  ["subjects responding at 5 years"
    (.sum (pd.notnull ($ pregdrink-retest 5_years)))]
  ["subjects reporting drinking at 5 years"
    (.sum (= ($ pregdrink-retest 5_years) "Drinking"))]
  ["proportion" (rd 2 (/
    (.sum (= ($ pregdrink-retest 5_years) "Drinking"))
    (.sum (pd.notnull ($ pregdrink-retest 5_years)))))]
  ["proportion of subjects with concordant responses" (rd 2
    (.mean (wc (.dropna pregdrink-retest) (= $Baseline $5_years))))]
]
subjects responding at 1 or both timepoints 1220.00
subjects responding at both timepoints 843.00
subjects responding at baseline 1144.00
subjects reporting drinking at baseline 284.00
proportion 0.25
subjects responding at 5 years 919.00
subjects reporting drinking at 5 years 174.00
proportion 0.19
proportion of subjects with concordant responses 0.85
(rmap [[timepoint condition] (product (qw Baseline 5_years) (qw Control Mentor_Mother))]
  [(.format "proportion drinking: {}, {}" timepoint condition)
    (rd 2 (.mean (= "Drinking" (.dropna
      (getl (ss pregdrink-retest (= $condition condition)) : timepoint)))))])
proportion drinking: Baseline, Control 0.26
proportion drinking: Baseline, Mentor_Mother 0.24
proportion drinking: 5_years, Control 0.21
proportion drinking: 5_years, Mentor_Mother 0.17

condition      Baseline  Drinking  Sober  prop
Control        Drinking        56     33  0.63
Control        Sober           23    254  0.08
Mentor_Mother  Drinking        60     51  0.54
Mentor_Mother  Sober           22    344  0.06

Here's a within-subjects point of view. The columns "Drinking" and "Sober" are counts of 5-year responses, and "prop" is the proportion of each row reporting drinking at 5 years. In both conditions, subjects who reported no drinking at baseline were very likely to say the same later, whereas subjects who reported drinking at baseline had a fair likelihood of switching to sober.
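The counts in the table above are enough to recover both the prop column and the overall concordance reported earlier. This is a small standard-library Python sketch, with the counts copied from the table and the variable names chosen only for illustration.

```python
# (condition, baseline report) -> (n Drinking at 5 years, n Sober at 5 years),
# copied from the table above.
counts = {
    ("Control", "Drinking"): (56, 33),
    ("Control", "Sober"): (23, 254),
    ("Mentor_Mother", "Drinking"): (60, 51),
    ("Mentor_Mother", "Sober"): (22, 344),
}

# Share of each row reporting drinking at 5 years (the "prop" column).
prop = {key: drink / (drink + sober)
        for key, (drink, sober) in counts.items()}

# Concordant responses: Drinking at both times or Sober at both times.
n_total = sum(drink + sober for drink, sober in counts.values())
n_concordant = sum(
    drink if baseline == "Drinking" else sober
    for (_, baseline), (drink, sober) in counts.items())

for key, value in prop.items():
    print(key, round(value, 2))
print(n_total, round(n_concordant / n_total, 2))
```

The total matches the 843 subjects who responded at both timepoints, and the concordant share rounds to the 0.85 shown in the earlier output.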

Amounts

(setv pregdrink-retest-amnt (.pivot
  (ss obs (.isin $timepoint (qw Baseline 5_years)))
  "s" "timepoint" "alcohol_aa/day_preg_before_noticed"))
(setv pregdrink-retest-amnt (.astype pregdrink-retest-amnt {}))
  ; This works around a pandas bug that would otherwise cause the
  ; next statement to fail.
(setv ($ pregdrink-retest-amnt condition)
  (getl (.set-index obs-bl "s") pregdrink-retest-amnt.index "condition"))
(setv d pregdrink-retest-amnt  both (.dropna pregdrink-retest-amnt))

[
  ["subjects responding at 1 or both timepoints"
    (.sum (wc d (| (pd.notnull $Baseline) (pd.notnull $5_years))))]
  ["subjects responding at both timepoints"
    (.sum (wc d (& (pd.notnull $Baseline) (pd.notnull $5_years))))]
  ["proportion increasing" (rd 2
    (.mean (wc both (> $5_years $Baseline))))]
  ["proportion decreasing" (rd 2
    (.mean (wc both (< $5_years $Baseline))))]
  ["proportion with no change" (rd 2
    (.mean (wc both (= $5_years $Baseline))))]
  ["mean change" (rd 2
    (.mean (wc both (- $5_years $Baseline))))]]
subjects responding at 1 or both timepoints 1220.00
subjects responding at both timepoints 842.00
proportion increasing 0.16
proportion decreasing 0.13
proportion with no change 0.72
mean change 0.07
(plt.clf)
(plt.axis "equal")
(plt.plot [0 3] [0 3] :c "black" :linewidth .5 :zorder -3)
(defn jitter [v]
  (+ v (* (- (np.random.random (len v)) .5) .05)))
(wc (.dropna pregdrink-retest-amnt)
  (plt.scatter (jitter $Baseline) (jitter $5_years) :s 1))

amnt-scatter.png

RMSE comparison (no CV)

(import [statsmodels.api :as sm])

(defn rmse [v1 v2]
  (np.sqrt (np.mean (** (- v1 v2) 2))))

(defn pred-dichot [df xv yv]
  (setv means (.mean (.groupby (getl df : [xv yv]) xv)))
  (setv y-pred (getl means (getl df : xv) yv))
  (rmse (. (getl df : yv) values) y-pred.values))

(defn pred-amnt [df xv yv]
  (setv m (sm.OLS
    (. (getl df : yv) values)
    (sm.add-constant (. (.astype (getl df : xv) float) values))))
  (setv y-pred (.predict (.fit m)))
  (rmse (. (getl df : yv) values) y-pred))

(defn pred-amnt-hinged [df xv yv]
  (setv x (. (.astype (getl df : xv) float) values))
  (setv m (sm.OLS
    (. (getl df : yv) values)
    (np.column-stack [
      (sm.add-constant x)
      (!= x 0)])))
  (setv y-pred (.predict (.fit m)))
  (rmse (. (getl df : yv) values) y-pred))

(setv [d-dichot d-amnt] (rmap [d [pregdrink-retest pregdrink-retest-amnt]]
  (.dropna (.merge (.reset-index d)
    (ss obs (= $timepoint "5_years") ["s" "child_iq"])))))
(setv v (rd (pds-from-pairs :name "RMSE" [
  ["Baseline (SD)" (wc d-dichot (.std $child_iq :ddof 0))]
  ["T1, dichotomous" (pred-dichot d-dichot "Baseline" "child_iq")]
  ["T1, continuous" (pred-amnt d-amnt "Baseline" "child_iq")]
  ["T1, hinged" (pred-amnt-hinged d-amnt "Baseline" "child_iq")]
  ["T2, dichotomous" (pred-dichot d-dichot "5_years" "child_iq")]
  ["T2, continuous" (pred-amnt d-amnt "5_years" "child_iq")]
  ["T2, hinged" (pred-amnt-hinged d-amnt "5_years" "child_iq")]])))
(setv v.index.name "Model")
v
Table 1. Accuracy of "predicting" year-5 child IQ using measures of drinking during pregnancy taken at baseline (T1) or 5 years after the child's birth (T2), and using dichotomous (drinking or not drinking) or continuous (fluid ounces of absolute alcohol per day) coding. RMSE = root mean square error
Model            RMSE
Baseline (SD)    11.301
T1, dichotomous  11.286
T1, continuous   11.285
T1, hinged       11.273
T2, dichotomous  11.297
T2, continuous   11.290
T2, hinged       11.287

CV

(defn rmse [v1 v2]
  (np.sqrt (np.mean (** (- v1 v2) 2))))

(defn pred-ols [x y cv-obj]
  (np.sqrt (- (np.mean (sklearn.model-selection.cross-val-score
    (sklearn.linear-model.LinearRegression :fit-intercept T)
    x y
    :scoring "neg_mean_squared_error" :cv cv-obj)))))
(defn pred-nothing [df yv cv-obj]
  (setv y (. (getl df : yv) values))
  (setv y-pred (np.array (rmap [[i-train i-test] cv-obj]
    (np.mean (geta y i-train)))))
  (rmse y y-pred))
(defn pred-dichot [df xv yv cv-obj]
  (pred-ols
    (.reshape (. (= (getl df : xv) "Drinking") values) -1 1)
    (. (getl df : yv) values)
    cv-obj))
(defn pred-amnt [df xv yv cv-obj]
  (pred-ols
    (.reshape (. (.astype (getl df : xv) float) values) -1 1)
    (. (getl df : yv) values)
    cv-obj))

(setv d (rd (pd.concat :axis 1 (rmap [[colname yv y-timepoint] [
    ["Head circumference" "child_head_circumference_for_age_z" "After_birth"]
    ["Intelligence" "child_iq" "5_years"]]]
  (setv [d-dichot d-amnt] (rmap [d [pregdrink-retest pregdrink-retest-amnt]]
    (setv d (.dropna (.merge (.reset-index d) (ss obs
      (& (= $condition "Control") (= $timepoint y-timepoint))
      ["s" yv]))))
    (.set-index (getl d : ["s" "Baseline" "5_years" yv]) "s")))
  (assert (.issubset (set d-amnt.index) (set d-dichot.index)))
  (setv d-dichot (getl d-dichot d-amnt.index))
  (setv cv-obj (list (.split (sklearn.model-selection.LeaveOneOut) d-dichot)))
  (pds-from-pairs :name colname [
    ["(Sample size)" (str (len d-dichot))]
    ["Baseline (constant only)" (pred-nothing d-dichot yv cv-obj)]
    ["T1, dichotomous" (pred-dichot d-dichot "Baseline" yv cv-obj)]
    ["T1, continuous" (pred-amnt d-amnt "Baseline" yv cv-obj)]
    ["T2, dichotomous" (pred-dichot d-dichot "5_years" yv cv-obj)]
    ["T2, continuous" (pred-amnt d-amnt "5_years" yv cv-obj)]])))))
(setv d.index.name "Model")
d
Table 2. Accuracy (as root mean square error) of predicting head circumference after birth (WHO z-score for age and sex) and year-5 child intelligence (Kaufman mental processing index; MPI) using measures of drinking during pregnancy taken at baseline (T1) or 5 years after the child's birth (T2), and using dichotomous (drinking or not drinking) or continuous (fluid ounces of absolute alcohol per day) coding.
Model                     Head circumference  Intelligence
(Sample size)             364                 284
Baseline (constant only)  1.847               11.057
T1, dichotomous           1.847               11.070
T1, continuous            1.847               11.120
T2, dichotomous           1.848               11.061
T2, continuous            1.839               11.086
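Some models in Table 2 do worse than the constant-only baseline, which cannot happen with the in-sample RMSEs of Table 1. The following Python sketch illustrates the difference on synthetic data (not from the study) in which the predictor is unrelated to the outcome. It uses only the standard library in place of the scikit-learn calls above, and the helper names fit_line and loo_rmse are hypothetical.

```python
import random


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(actual)
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n) ** 0.5


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = 0.0 if sxx == 0 else sxy / sxx
    return my - slope * mx, slope


def loo_rmse(x, y, use_x):
    """Leave-one-out RMSE; with use_x=False the model is a constant."""
    preds = []
    for i in range(len(y)):
        x_train, y_train = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        if use_x:
            b0, b1 = fit_line(x_train, y_train)
            preds.append(b0 + b1 * x[i])
        else:
            preds.append(sum(y_train) / len(y_train))
    return rmse(y, preds)


# Synthetic data: a binary predictor that carries no information about y.
rng = random.Random(0)
x = [float(rng.random() < 0.25) for _ in range(200)]
y = [rng.gauss(85, 11) for _ in range(200)]

b0, b1 = fit_line(x, y)
sd = rmse(y, [sum(y) / len(y)] * len(y))
in_sample = rmse(y, [b0 + b1 * xi for xi in x])
print("in-sample:", round(sd, 3), round(in_sample, 3))
print("LOO:", round(loo_rmse(x, y, False), 3), round(loo_rmse(x, y, True), 3))
```

In-sample, the regression can never do worse than the SD because the constant model is nested within it. Under leave-one-out, each prediction comes from a model that has not seen the held-out case, so an uninformative predictor can add noise and push the RMSE above the constant-only value.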

Growth

I've clipped z-scores to the range [-10, 10].

Simple

(for [drink-v (qw Sober Drinking) cond-v (qw Control Mentor_Mother)]
  (kwc sns.kdeplot :label (+ drink-v " " cond-v) ($
    (ss obs (& (= $alcohol_any drink-v) (= $condition cond-v)))
    child_weight_for_age_z)))
(plt.xlim [-10 11])

condition_weight.png

(rd (wc obs (kwc pd.crosstab
  $condition $alcohol_any
  :values $child_weight_for_age_z :aggfunc np.mean)))
condition Drinking Sober
Control -0.102 0.193
Mentor_Mother -0.119 0.198

Drinking does seem to be related to slightly lower weights.

(for [drink-v (qw Sober Drinking) cond-v (qw Control Mentor_Mother)]
  (kwc sns.kdeplot :label (+ drink-v " " cond-v) ($
    (ss obs (& (= $alcohol_any drink-v) (= $condition cond-v)))
    child_height_for_age_z)))
(plt.xlim [-11 10])

condition_height.png

(rd (wc obs (kwc pd.crosstab
  $condition $alcohol_any
  :values $child_height_for_age_z :aggfunc np.mean)))
condition Drinking Sober
Control -0.778 -0.648
Mentor_Mother -0.761 -0.550

Non-drinking MM subjects seem to have slightly taller children.

Prediction

Let's try focusing on the last timepoint (the 5-year follow-up) as our DV. The predictors for each case are condition, drinking at each timepoint, and possibly neighborhood. Ignore the DV at the previous timepoints.

(setv po (pred-outcome "child_weight_for_age_z" "5_years" obs))
(rd 5 (rmse-table po))
I RMSE
trivial 1.17529
ridge 1.09020
ridge-interact 1.08902
(setv v (get po "ridge" "on_all_data"))
(rd v)
I value
Intercept 0.031
mentor_mother -0.002
alcohol_any_Baseline -0.019
alcohol_any_After_birth -0.012
alcohol_any_6_months -0.017
alcohol_any_18_months -0.010
alcohol_any_3_years -0.015
alcohol_any_5_years -0.019
alcohol_any_while_preg -0.020
(rd (.sum (getl v (filt (.startswith it "alcohol_any_") v.index))))
-0.113
(setv po (pred-outcome "child_height_for_age_z" "5_years" obs))
(rd 5 (rmse-table po))
I RMSE
trivial 1.04357
ridge 1.01218
ridge-interact 1.01238
(setv v (get po "ridge" "on_all_data"))
(rd 4 v)
I value
Intercept -0.5565
mentor_mother 0.0004
alcohol_any_Baseline -0.0019
alcohol_any_After_birth -0.0009
alcohol_any_6_months -0.0009
alcohol_any_18_months -0.0005
alcohol_any_3_years 0.0003
alcohol_any_5_years -0.0009
alcohol_any_while_preg -0.0023

All the coefficients are tiny, in keeping with the tiny improvement over the trivial model.

Kaufman IQ

Simple

(kwc sns.kdeplot (.dropna ($ obs child_iq)))
;(rectplot ($ obs child_iq))

child_iq_overall.png

Troublingly, the mode is around 80, more than an SD below the general average IQ of 100.

(for [drink-v (qw Sober Drinking) cond-v (qw Control Mentor_Mother)]
  (kwc sns.kdeplot :label (+ drink-v " " cond-v) ($
    (ss obs (& (= $alcohol_any drink-v) (= $condition cond-v)))
    child_iq)))

child_iq_by_condalc.png

Prediction

(setv po (pred-outcome "child_iq" "5_years" obs))
(rd 5 (rmse-table po))
I RMSE
trivial 11.67886
ridge 11.26431
ridge-interact 11.25672
(setv v (get po "ridge" "on_all_data"))
(rd 4 v)
I value
Intercept 83.1735
mentor_mother -0.0002
alcohol_any_Baseline -0.0014
alcohol_any_After_birth -0.0005
alcohol_any_6_months 0.0006
alcohol_any_18_months -0.0000
alcohol_any_3_years 0.0017
alcohol_any_5_years -0.0000
alcohol_any_while_preg -0.0008

The coefficients are all essentially 0.

References

Alvik, A., Haldorsen, T., Groholt, B., & Lindemann, R. (2006). Alcohol consumption before and during pregnancy comparing concurrent and retrospective reports. Alcoholism: Clinical and Experimental Research, 30(3), 510–515. doi:10.1111/j.1530-0277.2006.00055.x

Dawson, D. A., Grant, B. F., & Stinson, F. S. (2005). The AUDIT-C: Screening for alcohol use disorders and risk drinking in the presence of other psychiatric disorders. Comprehensive Psychiatry, 46(6), 405–416. doi:10.1016/j.comppsych.2005.01.006

Ernhart, C. B., Morrow-Tlucak, M., Sokol, R. J., & Martier, S. (1988). Underreporting of alcohol use in pregnancy. Alcoholism: Clinical and Experimental Research, 12(4), 506–511. doi:10.1111/j.1530-0277.1988.tb00233.x

Hannigan, J. H., Chiodo, L. M., Sokol, R. J., Janisse, J., Ager, J. W., Greenwald, M. K., & Delaney-Black, V. (2010). A 14-year retrospective maternal report of alcohol consumption in pregnancy predicts pregnancy and teen outcomes. Alcohol, 44(7–8), 583–594. doi:10.1016/j.alcohol.2009.03.003. Retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2889143/

Jacobson, S. W., Chiodo, L. M., Sokol, R. J., & Jacobson, J. L. (2002). Validity of maternal report of prenatal alcohol, cocaine, and smoking in relation to neurobehavioral outcome. Pediatrics, 109(5), 815. doi:10.1542/peds.109.5.815

O'Connor, M. J., & Paley, B. (2006). The relationship of prenatal alcohol exposure and the postnatal environment to child depressive symptoms. Journal of Pediatric Psychology, 31(1), 50–64.

O'Connor, M. J., Tomlinson, M., LeRoux, I. M., Stewart, J., Greco, E., & Rotheram-Borus, M. J. (2011). Predictors of alcohol use prior to pregnancy recognition among township women in Cape Town, South Africa. Social Science and Medicine, 72(1), 83–90. doi:10.1016/j.socscimed.2010.09.049