Soccer notebook

Created 27 Jun 2016 • Last modified 4 Jun 2018

Protocol notes

Jason Bantjes, 14 Feb 2017: "The initial baselines (Baseline) were conducted on 900 men in June 2015. Following this we had a false start with the intervention and had to delay the intervention by 12 months. Consequently we did new baselines (BaselineT2) in May 2016 on the same 900 men who were interviewed in 2015 - except some of them could not be traced and so we did not get all 900 back. We then had to recruit new men to join the study (this recruitment is currently ongoing) and the data for this in Baseline V2. We have not yet recruited the full cohort of 1200 men. So the data we have so far is not the full data set."

African notes

  • alcohol
    • (alcoho* OR ethanol OR drinking) AND (self-rep* OR validity) AND afri* NOT african-amer* → lots of results
    • (alcoho* OR ethanol OR drinking) AND (self-rep* OR validity) AND "cape town"
    • Cherpitel et al. (2007): 464 patients who arrived at a Cape Town ER within 6 hours of an injury. BAC was obtained with breath analysis. Subjects were asked "In the 6 hours before your injury, did you have any alcoholic beverages to drink—even one drink?"
      • 236 with positive BAC
      • 96% positive report given positive BAC
      • 238 positive report
      • 95% positive BAC given positive report
      • It also found (using data from 28 sites around the world) that when controlling for a variety of other variables (age, cause of injury, etc.), gender was no more than very weakly predictive of self-report validity (log odds ratio .045)
  • cannabis and alcohol
    • Peden, van der Spuy, Smith, and Bautz (2000): 254 patients at a Cape Town trauma center, 78% male
      • 13% tested positive for cannabis with urinalysis. "When compared with urinary cannabinoid excretion, the validity of self-reported cannabis smoking in the 48 hours before injury was found to be very poor (sensitivity = 40%, specificity = 99.1%)."
      • 60% tested positive for alcohol; BrAC found "that 40.8% of the patients had zero levels and that one-third had levels at or above 0.08 g / 100 ml, i.e. the legal limit for drivers". "Compared with breath alcohol, self-reported alcohol consumption yielded a sensitivity of 86.7% and a specificity of 96.7%"
  • cannabis
    • (cannabis OR marijuana OR dagga) AND self-rep* AND afri* NOT african-amer*
    • Plüddemann and Parry (2003): Among 1,002 adults arrested in Cape Town, Jo'burg, or Durban (80% male, mean age 28), self-report was compared to urinalysis:

                      self-report
      urine           No     Yes
      Negative        565     44
      Positive        182    211
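To put this table on the same footing as the sensitivity and specificity figures quoted for the other studies, the four cells can be converted directly. A minimal Python sketch, assuming urinalysis is treated as the reference standard; the function name is my own, not anything from the paper:

```python
def sensitivity_specificity(tn, fp, fn, tp):
    """Return (sensitivity, specificity) of self-report against a
    reference test, given the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)  # P(self-report yes | reference positive)
    specificity = tn / (tn + fp)  # P(self-report no  | reference negative)
    return sensitivity, specificity

# Cells from Plüddemann and Parry (2003):
#   urine negative: 565 reported no use, 44 reported use
#   urine positive: 182 reported no use, 211 reported use
sens, spec = sensitivity_specificity(tn=565, fp=44, fn=182, tp=211)
print(round(sens, 3), round(spec, 3))  # 0.537 0.928
```

So roughly 54% of urine-positive arrestees reported use, while about 93% of urine-negative arrestees reported no use.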

  • methamphetamine
    • (methamphet* OR amphet* OR tik OR stimulant) AND self-rep* AND afri* NOT african-amer* → Nothing. I guess I need to resort to non-South African data for this one.

  • all three drugs

    • Kader, Seedat, Koch, and Parry (2012): Among 43 (36 female) HIV-positive patients in a Cape Town clinic, self-report questionnaires of problematic chronic use and dependence (Alcohol Use Disorders Identification Test, Drug Use Disorders Identification Test) were compared to hair analysis:


      +
      15  0
      13  2

      +
      11  1
       7  5

Of the 43 participants who completed the DUDIT, 12 (28%) scored above the respective cut-off points on the DUDIT (≥2 for females; ≥ 6 for males) for over-the-counter drugs (OTC) and 1 tested positive for cannabis while none of the participants scored above the cut-off for other drugs. In contrast, using hair analysis, all 30 tested negative for cannabis, amphetamines, opiates, cocaine, PCP and methaqualone. Urinalysis revealed one positive test for cannabis which is consistent with the results of the DUDIT, with the remaining participants testing negative for amphetamines, opiates, cocaine, PCP and methaqualone.

Non-African notes

  • alcohol

    • [Wetterling et al. (2014): Confusing numbers and bad reporting] German, 297 subjects, self-report vs. breath vs. urine (EtG)
    • [Whitford, Widner, Mellick, and Elkins (2009): Insufficient reporting]
    • [Barry, Chaney, Stellefson, and Dodd (2013): Again lacking the proportion of breath-identified drunks who reported it] two kinds of self-report vs. breath, American, >1,000, "hazardous drinking"
    • Helander et al. (1999): urine vs. breath vs. self-report, "outpatients attending a methadone maintenance treatment clinic"
      • "Seventeen out of the 47 patients who reported intake of any alcohol showed a urinary 5HTOL/5HIAA ratio"
      • "An additional nine patients who claimed to be abstinent showed abnormal 5HTOL/5HIAA ratios"
    • Bonevski, Campbell, and Sanson-Fisher (2010)
      • Recruited from doctors' office waiting rooms in New South Wales
      • 575 (93%) consented to completing the Time 1 surveys
      • Compared to urine assay, the computer alcohol use self-report survey demonstrated 92% sensitivity
      • The equivalent paper survey demonstrated 75% sensitivity

              n   154  143  138  140
                    %    %    %    %
           male    30   37   44   41
     ages 18-29    18   21   17   20
     ages 30-49    39   40   35   41

((30*154) + (37*143) + (44*138) + (41*140)) / (154+143+138+140) = 38

((18*154) + (21*143) + (17*138) + (20*140)) / (154+143+138+140) = 19

((39*154) + (40*143) + (35*138) + (41*140)) / (154+143+138+140) = 39
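The three calculations above are sample-size-weighted means of the per-group percentages. A short Python sketch that reproduces them in one place; the helper name `pooled_percent` is made up for illustration:

```python
def pooled_percent(percents, ns):
    """Sample-size-weighted mean of per-group percentages."""
    return sum(p * n for p, n in zip(percents, ns)) / sum(ns)

ns = [154, 143, 138, 140]  # group sizes from the table
rows = {
    "male":       [30, 37, 44, 41],
    "ages 18-29": [18, 21, 17, 20],
    "ages 30-49": [39, 40, 35, 41],
}
pooled = {label: pooled_percent(percents, ns) for label, percents in rows.items()}
for label, value in pooled.items():
    print(label, round(value))  # male 38, ages 18-29 19, ages 30-49 39
```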

Newer and nicer literature notes

Study summaries

Study summaries summary

Alcohol has been studied with breath, urine, and hair tests in Sweden, Australia, and Cape Town in medical and treatment settings. Accuracy was good except for a study of opioid addicts in Stockholm.

Cannabis has been studied in the US and South Africa (including Cape Town) in medical and treatment settings and among arrestees with urine tests. Accuracy was good or middling. Odd one out: a study of Belgian drivers with saliva tests that got quite low accuracy.

The two studies on meth that got at least some positive cases were in the US, used urine samples, and got middling accuracy.


I  drug   rdt    sr     n
0  booze  False  False  523
1  booze  False  True   100
2  booze  True   False  108
3  booze  True   True   173
0  weed   False  False  445
1  weed   False  True    12
2  weed   True   False  149
3  weed   True   True   298
0  meth   False  False  702
1  meth   False  True     3
2  meth   True   False  114
3  meth   True   True    85

Credible intervals

bernbeta = function(falses, trues, prior.shape1 = 1, prior.shape2 = 1) c(
# Given some Bernoulli-distributed data and a prior beta
# distribution on the parameter of the Bernoulli distribution,
# gives the parameters of the posterior beta distribution.
    shape1 = trues + prior.shape1,
    shape2 = falses + prior.shape2)

f = function(falses, trues)
   {x = bernbeta(falses, trues)
    TeachingDemos::hpd(conf = .95,
        function(p) qbeta(p, x["shape1"], x["shape2"]))}

round(digits = 2, rbind(
  f(108, 173),
  f(149, 298),
  f(114, 85)))
  V1 V2
1 0.56 0.67
2 0.62 0.71
3 0.36 0.50
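As a cross-check on the intervals above, here is a rough Python sketch of the same idea using only the standard library. It uses a uniform Beta(1, 1) prior as in `bernbeta`, but it approximates the highest-density interval on a grid rather than reproducing whatever algorithm `TeachingDemos::hpd` uses, so small differences in the last digit are possible. The function names other than `bernbeta` are my own.

```python
import math

def bernbeta(falses, trues, prior_shape1=1.0, prior_shape2=1.0):
    """Posterior beta parameters for Bernoulli data with a beta prior."""
    return trues + prior_shape1, falses + prior_shape2

def beta_logpdf(p, a, b):
    return ((a - 1) * math.log(p) + (b - 1) * math.log(1 - p)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

def hpd_grid(falses, trues, conf=0.95, n_grid=20000):
    """Approximate highest-posterior-density interval on a grid.

    Grid cells are added in order of decreasing density until they
    hold `conf` of the posterior mass; for a unimodal posterior the
    selected cells form one contiguous interval."""
    a, b = bernbeta(falses, trues)
    step = 1.0 / n_grid
    ps = [(i + 0.5) * step for i in range(n_grid)]  # cell midpoints
    mass = [math.exp(beta_logpdf(p, a, b)) * step for p in ps]
    order = sorted(range(n_grid), key=lambda i: mass[i], reverse=True)
    total, kept = 0.0, []
    for i in order:
        kept.append(i)
        total += mass[i]
        if total >= conf:
            break
    return ps[min(kept)], ps[max(kept)]

for falses, trues in [(108, 173), (149, 298), (114, 85)]:
    lo, hi = hpd_grid(falses, trues)
    print(round(lo, 2), round(hi, 2))
```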


self-report ~ subject random intercept + rdt + subject random slope * rdt + drug dummy 1 + drug dummy 2 + rdt * drug dummy 1 + rdt * drug dummy 2

We want to compare honesty rates between the drugs, where by "honesty rate", I mean the probability of positive self-report given positive RDT.

Suppose we were taking a predictive approach and the drug type was between subjects. Then the question is just predicting the honesty rate for each drug and comparing those predictions.

But we have within-subjects data, which raises the question of whether you care about differential honesty between subjects or within subjects. We can ask how likely a poly user is to report alcohol use vs. cannabis use, or we can ask how likely an alcohol-only user is to report alcohol use vs. how likely a cannabis-only user is to report cannabis use, and the answers might be different.

And a look at the data shows that poly use is quite common. Number of subjects with each number of drugs used (according to RDT):

(valcounts (kwc .sum :axis 1 (getl sb :
  (qw drug_alcohol_rdt drug_cannabis_rdt drug_methamphetamine_rdt))))
I value
0 284
1 355
2 223
3 42

So I guess we want to marginalize over such differences; in other words, we want to compare the proportion of positive self-report among all cannabis users (poly or not) to the proportion of positive self-report among all drinkers (poly or not).
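Marginalizing this way reduces the comparison to one proportion per drug: positive self-reports divided by all RDT-positive subjects. A small Python sketch using the RDT-positive rows of the drug/rdt/sr counts table earlier in these notes; the dictionary layout is just my own convenience:

```python
# Counts among RDT-positive subjects only, taken from the counts table:
# sr_no = denied use, sr_yes = reported use.
rdt_positive = {
    "booze": {"sr_no": 108, "sr_yes": 173},
    "weed":  {"sr_no": 149, "sr_yes": 298},
    "meth":  {"sr_no": 114, "sr_yes": 85},
}

def honesty_rate(cells):
    """P(positive self-report | positive RDT), pooled over poly and single users."""
    return cells["sr_yes"] / (cells["sr_no"] + cells["sr_yes"])

rates = {drug: honesty_rate(cells) for drug, cells in rdt_positive.items()}
for drug, rate in rates.items():
    print(drug, round(rate, 3))  # booze 0.616, weed 0.667, meth 0.427
```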

Confidence intervals

(rmap [drug (qw booze weed meth)]
  (setv y (get dr
    (getl dr : (+ drug "_rdt"))
    (+ drug "_sr")))
  (+ [drug] (list (rd 2 (np.array (proportion-confint (.sum y) (len y)
    :alpha .05
    :method "jeffrey"))))))
booze 0.56 0.66
weed 0.67 0.74
meth 0.42 0.54

Bootstrapping of differences

Confidence intervals of log odds ratios of honesty:

(setv n-bootrep 50000)

(rmap [[drug1 drug2] [["weed" "booze"] ["booze" "meth"] ["weed" "meth"]]]
  (defn lor [df]
    (log (apply odds-ratio (rmap [drug [drug1 drug2]] (.mean (get df
      (getl df : (+ drug "_rdt"))
      (+ drug "_sr")))))))
  (setv resampled-log-ors (replicate n-bootrep
    (lor (geti dr
      (np.random.random-integers 0 (- (len dr) 1) (len dr))))))
  (setv [lo hi] (rd 2 (np.percentile resampled-log-ors [2.5 97.5])))
  [(+ drug1 " vs. " drug2) lo (rd 2 (lor dr)) hi])
weed vs. booze 0.15 0.42 0.69
booze vs. meth 0.21 0.53 0.86
weed vs. meth 0.67 0.95 1.23
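For readers who don't use Hy, the procedure above can be sketched in plain Python: resample subjects (whole rows) with replacement, recompute the log odds ratio of the two honesty rates each time, and take the 2.5th and 97.5th percentiles. The data below are simulated placeholders, not the study data, and the percentile lookup is a simple index into the sorted replicates rather than NumPy's interpolating `percentile`.

```python
import math
import random

def odds_ratio(p1, p2):
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

def honesty(rows, drug):
    """Proportion of positive self-report among RDT-positive rows."""
    sr = [r[drug + "_sr"] for r in rows if r[drug + "_rdt"]]
    return sum(sr) / len(sr)

def log_or(rows, drug1, drug2):
    return math.log(odds_ratio(honesty(rows, drug1), honesty(rows, drug2)))

def bootstrap_ci(rows, drug1, drug2, n_bootrep=1000, seed=0):
    """Percentile bootstrap over subjects; returns (lo, estimate, hi)."""
    rng = random.Random(seed)
    n = len(rows)
    reps = sorted(
        log_or([rows[rng.randrange(n)] for _ in range(n)], drug1, drug2)
        for _ in range(n_bootrep))
    lo = reps[int(0.025 * n_bootrep)]
    hi = reps[int(0.975 * n_bootrep) - 1]
    return lo, log_or(rows, drug1, drug2), hi

# Hypothetical subjects: RDT status and self-report for two drugs.
sim = random.Random(1)
rows = []
for _ in range(400):
    weed_rdt = sim.random() < 0.5
    booze_rdt = sim.random() < 0.3
    rows.append({
        "weed_rdt": weed_rdt,
        "booze_rdt": booze_rdt,
        "weed_sr": weed_rdt and sim.random() < 0.67,
        "booze_sr": booze_rdt and sim.random() < 0.62,
    })

lo, est, hi = bootstrap_ci(rows, "weed", "booze")
print(round(lo, 2), round(est, 2), round(hi, 2))
```

Resampling whole rows keeps each subject's alcohol and cannabis answers together, which is what lets the interval respect the within-subjects structure.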


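(defn go []
  ; Simulation check: how often does a bootstrap percentile interval
  ; for the booze-vs.-weed log odds ratio of honesty cover the true value?
  ; Indices 0 to 3 of the probability vectors are RDT pair codes:
  ; 0 = neither, 1 = weed only, 2 = booze only, 3 = both.
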
  (setv rdt-probs                [.1  .2  .3  .4])
  (setv booze-sr-probs (np.array [.01 .01 .8  .6]))
  (setv weed-sr-probs  (np.array [.02 .4  .02 .3]))

  (setv booze-rate-true (+
    (* (/ (get rdt-probs 2) (+ (get rdt-probs 2) (get rdt-probs 3)))
      (get booze-sr-probs 2))
    (* (/ (get rdt-probs 3) (+ (get rdt-probs 2) (get rdt-probs 3)))
      (get booze-sr-probs 3))))
  (setv weed-rate-true (+
    (* (/ (get rdt-probs 1) (+ (get rdt-probs 1) (get rdt-probs 3)))
      (get weed-sr-probs 1))
    (* (/ (get rdt-probs 3) (+ (get rdt-probs 1) (get rdt-probs 3)))
      (get weed-sr-probs 3))))
  (setv true-log-or (log (odds-ratio booze-rate-true weed-rate-true)))
  (print true-log-or)

  (setv n (len dr))

  (setv n-sim 20)
  (setv n-bootrep 1000)

  (np.mean (rmap [rep (range n-sim)]

    (print rep)

    (setv rdtpair (kwc np.random.choice (range 4) n :+replace :p rdt-probs))
    (setv d-sim (cbind
      :booze_rdt (> rdtpair 1)
      :weed_rdt (= (% rdtpair 2) 1)
      :booze_sr (np.random.binomial 1 (geta booze-sr-probs rdtpair) n)
      :weed_sr (np.random.binomial 1 (geta weed-sr-probs rdtpair) n)))

    (setv [lo hi] (np.percentile (replicate n-bootrep
      (setv d-ss (geti d-sim (kwc
        np.random.choice (range n) n :+replace)))
      (log (odds-ratio
        (.mean ($ (ss d-ss $booze_rdt) booze_sr))
        (.mean ($ (ss d-ss $weed_rdt) weed_sr))))) [12.5 87.5]))

    (<= lo true-log-or hi))))



library(plyr)  # provides ddply

# Load the three survey waves.
fmt = "/home/hippo/Scratch/Research/Soccer/R/%s5_15_18.rda"
load(sprintf(fmt, "Baseline"))
load(sprintf(fmt, "SixMo"))
load(sprintf(fmt, "TwelveMo"))
# Stack participant IDs and interview dates from all waves.
cols = c("Participant ID ICF", "Date of Interview")
d = rbind(
    cbind(time = "Baseline", Baseline[, cols]),
    cbind(time = "SixMo", SixMo[, cols]),
    cbind(time = "TwelveMo", TwelveMo[, cols]))
colnames(d) = c("timepoint", "id", "date")
d$date = as.Date(d$date, "%d-%m-%Y")
# Follow-up per participant is last interview date minus first.
man.days = as.numeric(sum(ddply(d, "id", function(v) max(v$date) - min(v$date))$V1))
man.years = man.days / 365.25
c(
    "Man-days observed" = man.days,
    "Man-years observed" = round(man.years))
Man-days observed 289725
Man-years observed 793
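The calculation above counts, for each participant, the days between his first and last interview, sums those spans, and divides by 365.25. A Python sketch of the same logic on made-up interview dates (the IDs and dates are placeholders, not study data):

```python
from datetime import date

def man_days(interviews):
    """interviews: iterable of (participant_id, interview_date) pairs.
    Returns the total days between each participant's first and last interview."""
    first, last = {}, {}
    for pid, d in interviews:
        first[pid] = min(d, first.get(pid, d))
        last[pid] = max(d, last.get(pid, d))
    return sum((last[pid] - first[pid]).days for pid in first)

# Hypothetical data
interviews = [
    ("A", date(2016, 5, 1)), ("A", date(2016, 11, 1)), ("A", date(2017, 5, 1)),
    ("B", date(2016, 6, 15)),  # a single interview contributes 0 days
    ("C", date(2016, 5, 10)), ("C", date(2016, 11, 10)),
]
total = man_days(interviews)
print(total, round(total / 365.25, 2))  # 549 1.5
```

Participants with only one interview add nothing to the total, so this measures observed follow-up time rather than enrollment time.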


Barry, A. E., Chaney, B. H., Stellefson, M. L., & Dodd, V. (2013). Validating the ability of a single-item assessing drunkenness to detect hazardous drinking. American Journal of Drug and Alcohol Abuse, 39(5), 320–325. doi:10.3109/00952990.2013.810745

Bonevski, B., Campbell, E., & Sanson-Fisher, R. W. (2010). The validity and reliability of an interactive computer tobacco and alcohol use survey in general practice. Addictive Behaviors, 35(5), 492–498. doi:10.1016/j.addbeh.2009.12.030

Buchan, B. J., Dennis, M. L., Tims, F. M., & Diamond, G. S. (2002). Cannabis use: Consistency and validity of self-report, on-site urine testing and laboratory testing. Addiction, 97(Suppl1), 98–108. doi:10.1046/j.1360-0443.97.s01.1.x

Cherpitel, C. J., Ye, Y., Bond, J., Borges, G., Macdonald, S., Stockwell, T., … Giesbrecht, N. (2007). Validity of self-reported drinking before injury compared with a physiological measure: Cross-national analysis of emergency-department data from 16 countries. Journal of Studies on Alcohol and Drugs, 68(2), 296–302. doi:10.15288/jsad.2007.68.296

Dahl, H., Hammarberg, A., Franck, J., & Helander, A. (2011). Urinary ethyl glucuronide and ethyl sulfate testing for recent drinking in alcohol-dependent outpatients treated with acamprosate or placebo. Alcohol and Alcoholism, 46(5), 553–557. doi:10.1093/alcalc/agr055

Dembo, R., Briones-Robinson, R., Barrett, K., Winters, K. C., Ungaro, R., Karas, L., … Wareham, J. (2015). The validity of truant youths' marijuana use and its impact on alcohol use and sexual risk taking. Journal of Child and Adolescent Substance Abuse, 24(6), 355–365. doi:10.1080/1067828X.2013.844089

Helander, A., von Wachenfeldt, J., Hiltunen, A., Beck, O., Liljeberg, P., & Borg, S. (1999). Comparison of urinary 5-hydroxytryptophol, breath ethanol, and self-report for detection of recent alcohol use during outpatient treatment: A study on methadone patients. Drug and Alcohol Dependence, 56(1), 33–38. doi:10.1016/S0376-8716(99)00007-1

Kader, R., Seedat, S., Koch, J. R., & Parry, C. D. (2012). A preliminary investigation of the AUDIT and DUDIT in comparison to biomarkers for alcohol and drug use among HIV-infected clinic attendees in Cape Town, South Africa. African Journal of Psychiatry, 15(5), 346–351. doi:10.4314/ajpsy.v15i5.43

Lee, M. O., Vivier, P. M., & Diercks, D. B. (2009). Is the self-report of recent cocaine or methamphetamine use reliable in illicit stimulant drug users who present to the Emergency Department with chest pain? Journal of Emergency Medicine, 37(2), 237–241. doi:10.1016/j.jemermed.2008.05.024

Lu, N. T., Taylor, B. G., & Riley, K. J. (2001). The validity of adult arrestee self-reports of crack cocaine use. American Journal of Drug and Alcohol Abuse, 27(3), 399–419. doi:10.1081/ADA-100104509

Martin, G. W., Wilkinson, D. A., & Kapur, B. M. (1988). Validation of self-reported cannabis use by urine analysis. Addictive Behaviors, 13(2), 147–150. doi:10.1016/0306-4603(88)90004-4

Melnikov, A., Hedden, S. L., & Latimer, W. W. (2009). Validity of marijuana and opiate use self‐report among adult drug users in Novosibirsk, Russia. Journal of Substance Use, 14(3–4), 221–229. doi:10.1080/14659890902872075

Nichols, S. L., Lowe, A., Zhang, X., Garvie, P. A., Thornton, S., Goldberger, B. A., … Sleasman, J. W. (2014). Concordance between self-reported substance use and toxicology among HIV-infected and uninfected at risk youth. Drug and Alcohol Dependence, 134, 376–382. doi:10.1016/j.drugalcdep.2013.11.010

Peden, M., van der Spuy, J., Smith, P., & Bautz, P. (2000). Substance abuse and trauma in Cape Town. South African Medical Journal, 90(3), 251–255. Retrieved from

Plüddemann, A., & Parry, C. D. H. (2003). A short report: Self-reported drug use vs. urinalysis in a sample of arrestees in South Africa. Drugs: Education, Prevention and Policy, 10(4), 379–383. doi:10.1080/0968763031000102626

Van der Linden, T., Silverans, P., & Verstraete, A. G. (2014). Comparison between self-report of cannabis use and toxicological detection of THC/THCCOOH in blood and THC in oral fluid in drivers in a roadside survey. Drug Testing and Analysis, 6(1–2), 137–142. doi:10.1002/dta.1517

Wetterling, T., Dibbelt, L., Wetterling, G., Göder, R., Wurst, F., Margraf, M., & Junghanns, K. (2014). Ethyl glucuronide (EtG): Better than breathalyser or self-reports to detect covert short-term relapses into drinking. Alcohol and Alcoholism, 49(1), 51–54. doi:10.1093/alcalc/agt155

Whitford, J. L., Widner, S. C., Mellick, D., & Elkins, R. L. (2009). Self-report of drinking compared to objective markers of alcohol consumption. American Journal of Drug and Alcohol Abuse, 35(2), 55–58. doi:10.1080/00952990802295212