Donkey notebook

Created 25 Sep 2018 • Last modified 18 Jan 2019

Task planning

I want a bare-bones trivial decision-making task where people have a repeated choice between two meaningless and maximally equivalent options. By studying how decisions in this task can be predicted, I hope to learn more about how to predict variation in choices that arises from something other than content. Then perhaps I can, for example, break the ceiling of 85% accuracy I observed in Arfer and Luhmann (2015).

It seems like there are a lot of analytic approaches for predicting binary sequences. Let's get at least a little data to start with; for convenience, I'll run the task on Mechanical Turk.

I was thinking the task could use the mouse, and I could record mouse movements and use the movements leading up to previous choices as a rich predictor. The trouble is that you can't move the cursor in JavaScript, so I can't force the cursor to a neutral position after each choice. Even if I could, the physical mouse would still be a bit to the side, of course. Asking subjects to perfectly recenter the mouse, and not starting the next trial until they did, would probably drive them a little crazy.

Instead, let's use the keyboard, and ask subjects to position their fingers over the two allowed keys. "F" and "J" seem like good choices.

Suppose we get 200 choices per subject. The first 100 choices will be purely training data, and I'll evaluate predictions on the second 100. I may use online-learning methods that effectively treat every trial the same, but I'll be most interested in performance after the method has already had 100 trials to train with. During the task, I'll show the subject how many choices are left to make, but I won't give feedback on which choices they made.
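As a sketch of this protocol in plain Python (the analyses below are in Hy; `fit_and_predict` here is a hypothetical stand-in for whichever model is being evaluated):

N = 200
n_train = N // 2
n_test = N - n_train

def evaluate(choices, fit_and_predict):
    # `choices` is one subject's length-N list of booleans. The model
    # sees only the first half and must predict the whole second half,
    # so a sequential model has to feed on its own predictions.
    preds = fit_and_predict(choices[:n_train], n_test)
    return sum(p == a for p, a in zip(preds, choices[n_train:])) / n_test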

Some first subjects

Let's try some models with the first few real subjects. Their first 100 trials look like this (O means pressing "f", x means pressing "j"):

(setv N 200)
(setv n-train (// N 2))
(setv n-test (- N n-train))
(setv d (.copy trials))
(setv ($ d train) (< ($ d trial) n-train))
(lfor [_ x] (.groupby d "sn")
  [(pretty-choice-seq (ss x $train "chose-key1"))])
OOxOOxOxOOOOOOOxxxxOOOxxxOOOxOxxxxxxOOOOOOOOOOxxxxOxOxOxOOOOOOxxxOOxxOOOOOxxxOxOxOxOxOxOxOxOOxxxxxxx
OOOxOxOOOOOOOOxOxOOxOxOOxOOOxOOOOOxxOxOOOOOxxOOxxxxxxxxxxxxxxxxxxxxxxxxOxOxOOOOxOxOOOxxOOOxOOOOOOxxO
xOOOxxxxOOxxxxOOxOOOOOxOOOxxxOOOxxxxxxxOOOOOxxxxxxOOOOOOOOOOOOOOOOOOOxxxOOxxxOOOOOxxxxOOOOOxxxxOOOOO
(wcby d [$sn $train] (.mean $chose-key1))
sn  train  value
 0  False   0.50
 0   True   0.54
 1  False   0.53
 1   True   0.54
 2  False   0.51
 2   True   0.59

Since the base rates are near 50%, a trivial model will do poorly.
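(Here, by "trivial model", I mean always guessing the subject's more common choice, whose expected accuracy is max(p, 1 - p) for a base rate p. A quick sketch in Python:)

def majority_baseline_accuracy(choices):
    # Expected accuracy of always guessing the majority option.
    p = sum(choices) / len(choices)
    return max(p, 1 - p)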

Let's try a model that assumes that a given choice is a probabilistic function of the previous n trials. The trouble, of course, is that when predicting more than n trials into the test set, the model has to use its own past predictions as predictors. And when the n-tuple in question never appeared in training, we'll predict the mode of the training choices.

(defn supermarkov-train [lookbehind train]
  ; For each `lookbehind`-tuple of consecutive choices seen in
  ; training, count how often it was followed by False or True.
  (setv mode (>= (.mean train) .5))
  (setv train (tuple train))
  (setv memory {})
  (for [i (range lookbehind (len train))]
    (setv present (get train i))
    (setv past (cut train (- i lookbehind) i))
    (when (not-in past memory)
      (setv (get memory past) [0 0]))
    (+= (get memory past present) 1))
  {"lookbehind" lookbehind
    "mode" mode  "memory" memory  "train" train})

(defn supermarkov-predict [model test-n]
  ; Roll the model forward `test-n` trials, feeding it its own
  ; predictions; fall back to the mode for unseen or tied tuples.
  (setv preds (,))
  (for [_ (range test-n)]
    (setv past (cut
      (+ (get model "train") preds)
      (- (get model "lookbehind"))))
    (+= preds (, (if (in past (get model "memory"))
      (do
        (setv [f t] (get model "memory" past))
        (cond
          [(> t f) True]
          [(> f t) False]
          [True (get model "mode")]))
      (get model "mode")))))
  preds)
(wcby (ss d $train) $sn
  (pd.concat (lfor n [2 3 4 5]
    (cbind :n n
      :pred_sum (sum (supermarkov-predict
        (supermarkov-train n $chose-key1)
        n-test))))))
sn  i1  n  pred_sum
 0   0  2         0
 0   0  3         0
 0   0  4         0
 0   0  5         0
 1   0  2       100
 1   0  3       100
 1   0  4       100
 1   0  5        98
 2   0  2       100
 2   0  3       100
 2   0  4       100
 2   0  5       100

Hilariously, for each n tried, this model predicts False or True every single time, or nearly so. So it behaves much like the trivial model.
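Here's a rough Python transcription of the above pair of functions (a sketch for clarity, not the code I ran). It also hints at why the rollout collapses: once the model has emitted `lookbehind` identical choices, it keeps seeing the same tuple, and if that tuple's majority continuation is the same choice again, the rollout is stuck there.

def supermarkov(train, lookbehind, test_n):
    # Count how often each `lookbehind`-tuple of consecutive choices
    # was followed by False or True in training.
    mode = sum(train) >= len(train) / 2
    counts = {}
    for i in range(lookbehind, len(train)):
        past = tuple(train[i - lookbehind : i])
        counts.setdefault(past, [0, 0])[train[i]] += 1
    # Roll forward, feeding the model its own predictions; fall back
    # to the mode for unseen or tied tuples.
    seq = list(train)
    preds = []
    for _ in range(test_n):
        f, t = counts.get(tuple(seq[-lookbehind:]), (0, 0))
        preds.append(t > f or (t == f and mode))
        seq.append(preds[-1])
    return preds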

Let's try regression models using the trial number and transformations thereof as predictors.

(setv ooo (wcby (ss d $train) $sn
  (setv i (np.array (list (range N))))
  (setv X (sklearn.preprocessing.scale (np.column-stack [
    i (** i 2) (** i 3)
    (% i 2) (% i 3) (% i 5)
    (np.sin i) (np.cos i)])))
  (setv model (sklearn.linear-model.LogisticRegressionCV
    :Cs 100 :cv 10 :penalty "l2" :scoring "accuracy"))
  (.fit model (geta X (: n-train)) $chose-key1)
  (pd.DataFrame (dict
    :trial (geta i (: n-train N))
    :pred (.predict model (geta X (: n-train N)))))))
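For reference, here's roughly equivalent Python for this feature construction and model (a sketch; `y` below is a random stand-in for one subject's real training choices):

import numpy as np
import sklearn.preprocessing, sklearn.linear_model

N, n_train = 200, 100
i = np.arange(N)
X = sklearn.preprocessing.scale(np.column_stack([
    i, i**2, i**3,           # polynomial terms
    i % 2, i % 3, i % 5,     # modular terms
    np.sin(i), np.cos(i)]))  # trigonometric terms
y = np.random.rand(n_train) < .5  # stand-in for real choices
model = sklearn.linear_model.LogisticRegressionCV(
    Cs=100, cv=10, penalty="l2", scoring="accuracy")
model.fit(X[:n_train], y)
preds = model.predict(X[n_train:])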

At least sometimes, this model produces some nontrivial predictions, thanks to its modular and trigonometric terms:

(lfor [_ x] (.groupby ooo "sn")
  [(pretty-choice-seq ($ x pred))])
xOOOxOxOOOxOxOOOxOxOOOxOxOOOxOxxxOxOxxxOOOxxxOOOxxxOxOxxxOxOxxxxxOxxxxxOxxxxxOxxxxxOxxxxxxxxxxxxxxxx
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

How accurate is it for that one subject (subject 0)?

(.mean (=
  (getl ooo 0 "pred")
  (.reset-index :drop T (ss d (& (= $sn 0) (~ $train)) "chose-key1"))))
0.42

Worse than trivial; oops.

Regression on run length

A Stack Exchange answer suggests a model in which the probability of choosing an option depends only on how many times in a row that option has just been chosen.

(defn rlm-encode [values]
  ; One row per prefix of `values`: the current run lengths of both
  ; options, their squares, and whether option 0's run is live.
  (setv accums [0 0])
  (setv hist (np.array (+ [[0 0]] (lfor
    x values
    :do (setv (get accums (not x)) 0)
    :do (+= (get accums x) 1)
    (.copy accums)))))
  (np.column-stack [
    hist
    (** hist 2)
    (> (geta hist : 0) 0)]))
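To make the encoding concrete, here's a Python transcription with a worked example. Each row describes the state after one more choice: the current run lengths of the two options, their squares, and a flag for whether option 0's run is live.

import numpy as np

def rlm_encode(values):
    accums = [0, 0]            # current run lengths of options 0 and 1
    hist = [[0, 0]]            # one row per prefix of `values`
    for x in values:
        accums[not x] = 0      # the other option's run is broken
        accums[x] += 1
        hist.append(list(accums))
    hist = np.array(hist)
    return np.column_stack([hist, hist ** 2, hist[:, 0] > 0])

print(rlm_encode([True, True, False]))
# [[0 0 0 0 0]
#  [0 1 0 1 0]
#  [0 2 0 4 0]
#  [1 0 1 0 1]]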

(setv ooo (wcby (ss d $train) $sn
  (setv scaler (sklearn.preprocessing.StandardScaler :with-mean F))
  (setv model (sklearn.linear-model.LogisticRegressionCV
    :Cs 100 :cv 10 :penalty "l2" :scoring "accuracy"))
  (setv X (.fit-transform scaler (rlm-encode $chose-key1)))
  (.fit model (geta X (: 0 n-train)) $chose-key1)
  (setv preds (.tolist $chose-key1))
  (for [_ (range n-test)]
    ; Extend the sequence one trial at a time, re-encoding the run
    ; lengths so that the model feeds on its own predictions.
    (.append preds (first (.predict model
      (geta (.transform scaler (rlm-encode preds)) None -1 :)))))
  (pd.DataFrame (dict
    :trial (range n-train N)
    :pred (cut preds n-train)))))

(lfor [_ x] (.groupby ooo "sn")
  [(pretty-choice-seq ($ x pred))])
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
OOOOOOxOOOOOOOxOOOOOOOxOOOOOOOxOOOOOOOxOOOOOOOxOOOOOOOxOOOOOOOxOOOOOOOxOOOOOOOxOOOOOOOxOOOOOOOxOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
How accurate is it for subject 1, the one with roughly periodic predictions?

(.mean (=
  (getl ooo 1 "pred")
  (.reset-index :drop T (ss d (& (= $sn 1) (~ $train)) "chose-key1"))))
0.48

References

Arfer, K. B., & Luhmann, C. C. (2015). The predictive accuracy of intertemporal-choice models. British Journal of Mathematical and Statistical Psychology, 68(2), 326–341. doi:10.1111/bmsp.12049. Retrieved from http://arfer.net/projects/builder/paper