Appendix: Cross-cultural predictors of rape

Created 30 Sep 2013 • Last modified 5 Feb 2014

It's difficult to tell how much the analyses here use the same data as those of Sanday (1981b). However, Sanday's analyses are pretty simplistic (just Pearson correlations), so these are probably better.

All the below code is in R.

Preliminaries

In this appendix, I will examine cross-cultural predictors of rape using the Standard Cross-Cultural Sample.

I got sscsfac.Rdata from http://dl.dropbox.com/u/9256203/sccsfac.Rda, which I found a link to in the SCCS codebook.

qw = function(...) {m = match.call(expand.dots = FALSE); as.character(m[["..."]])}
load("sccsfac.Rdata")
virtue = qw(v330, v331, v332, v333, v827, v828, v829, v830)

What should be use for our dependent variable? The following look appropriate:

v173: Rape
v174: Frequency of Rape
v667: Rape: Incidents Reports, Or Thought of as Means of Punishment Women, or Part of Ceremony

v667 is described by the coder (Sanday, 1981a, p. 253) thus: "Incidents of rape are reported (suggesting that rape is not uncommon) or rape is thought of as a means for punishing women, or rape is part of a ceremony." It is a dichotomous variable, but going by Palmer (1989), a value of Absent doesn't indicate the complete absence of rape. Rather, I guess, we should regard Present as indicating more common or more thoroughly institutionalized rape than Absent.

Sexual-inhibition predictors

Here are the candidate predictors:

v159: Talk about Sex
v161: Sex Believed Dangerous
v166: Frequency of Premarital Sex—Male
v167: Frequency of Premarital Sex—Female
v602: an Explicit View That Sexual Activity Is Dangerous Or Contaminating
v330 to v333: Inculcation of the value of sexual restraint in children
v827 to v830: Value of sexual expression and non-restraint in adolescents

(v330 to v333 and v827 to v830 are collectively virtue.)

Let's see how many cases there are for each combination of IV and DV. I'm not going to attempt any sort of missing-value interpolation here, so we need complete cases for whatever variables we want to use.

t(sapply(c("v159", "v161", "v166", "v167", "v602", virtue), function(predictor)
    sapply(qw(v173, v174, v667), function(crit)
        sum(!is.na(sccsfac[[predictor]]) & !is.na(sccsfac[[crit]])))))

	v173	v174	v667
v159	16	13	37
v161	11	5	19
v166	30	24	64
v167	29	23	67
v602	15	13	32
v330	34	26	84
v331	33	25	82
v332	35	26	90
v333	36	26	89
v827	35	30	84
v828	36	30	83
v829	33	29	82
v830	36	30	83

v159, v161, and v602 have too few cases. All the DVs do, too, except for v667. There we have about 65 cases for the two premarital-sex variables (v166 and v167) and about 80 cases for the eight virtue variables. The virtue variables are probably worth inspecting independently of the premarital-sex variables so we don't have to lose those twenty-odd cases. Let's consider the premarital-sex variables first.

Premarital-sex frequency

d = na.omit(sccsfac[qw(v166, v167, v667)])
d = data.frame(
    # Recode v166 and v167 so higher values mean more premarital
    # sex.
    prem.male = 4 - as.integer(d$v166),
    prem.female = 4 - as.integer(d$v167),
    rape = d$v667 == "Present")
c("Complete cases" = nrow(d))

	value
Complete cases	63

It turns out that prem.male and prem.female are generally equal, and neither varies much:

with(d, table(prem.male, prem.female))

	0	1	2	3
0	7	0	0	0
1	0	6	0	0
2	0	2	12	0
3	0	1	2	33

Suppose we then just use prem.female.

with(d, table(prem.female, rape))

	FALSE	TRUE
0	3	4
1	5	4
2	4	10
3	16	17

It looks like rape is about 50% true across with board with an oddball at prem.female == 2. Let's try logistic regression just for laughs.

model = glm(rape ~ prem.female, data = d,
    family = binomial(link = "logit"))

Not surprisingly, this model is completely useless. The probability it predicts of rape being true is greater than 1/2 for every case:

round(d = 3, min(predict(model, type = "response")))

	value
	0.549

Conclusion: across societies, premarital-sex frequency is not predictive of rape.

Virtue variables

d = na.omit(sccsfac[c(virtue, "v667")])
c("Complete cases" = nrow(d))

	value
Complete cases	67

This number is similar to that of the previous analysis, but unfortunately, the intersection of cases is substantially smaller:

nrow(na.omit(sccsfac[c(virtue, "v166", "v167", "v667")]))

	value
	49

So we'll do an analysis with just the virtue variables and the DV (v667).

The virtue variables have many high correlations:

corm = cor(d[virtue], method = "spearman")
corm[!upper.tri(corm)] = NA
round(d = 2, corm)

	v331	v332	v333	v827	v828	v829	v830
v330	0.94	0.77	0.75	-0.47	-0.50	-0.46	-0.60
v331		0.79	0.81	-0.55	-0.56	-0.50	-0.63
v332			0.90	-0.51	-0.52	-0.51	-0.62
v333				-0.60	-0.60	-0.58	-0.69
v827					0.92	0.69	0.74
v828						0.65	0.80
v829							0.89
v830

Let's try summarizing them with a principal component or two. We'll do PCA based on the Spearman correlation matrix, since these variables are all on rating scales.

dvr = d[virtue]
for (i in seq_along(dvr))
    dvr[,i] = rank(dvr[,i])
round(d = 2, eigen(cor(dvr))$values / ncol(dvr))

	value
	0.71 0.15 0.06 0.04 0.02 0.01 0.01 0.01

The first PC accounts for the bulk of the variance (71%).

virt.pca = prcomp(dvr, center = T, scale = T)
round(d = 2, virt.pca$rotation[,1:3])

	PC1	PC2	PC3
v330	0.34	0.42	-0.07
v331	0.36	0.37	-0.13
v332	0.35	0.35	0.08
v333	0.37	0.24	0.05
v827	-0.34	0.40	0.46
v828	-0.35	0.38	0.47
v829	-0.33	0.36	-0.64
v830	-0.37	0.27	-0.35

Here are loadings from the first three PCs. On the first, the loadings are such that positive values of the PC indicate more sexual restraint. The second is more opaque (its pattern is consistent with the male–female distinction, but not very strongly) and the third doesn't account for much variance (it distinguishes between the "sexual expression" and "sexual nonrestraint" variables, which the authors say should be added together anyway). So let's just use the first as a general measure of sexual restraint.

dm = data.frame(
    restraint = virt.pca$x[,"PC1"],
    rape = d$v667 == "Present")
model = glm(rape ~ restraint, data = dm,
    family = binomial(link = "logit"))

This time, at least, we have predicted probabilities on both sides of .5:

table(predict(model, type = "response") >= .5)

	count
FALSE	19
TRUE	48

So how well can this model fit its own training data?

round(d = 2,
    mean((predict(model, type = "response") >= .5) == model$y))

	value
	0.48

Amazingly, the model has achieved a training-set accuracy of less than 50%.

Okay, that was a bust; how about adding PC2 for an additional 15% of the variance in the original IVs?

dm = data.frame(
    restraint.pc1 = virt.pca$x[,"PC1"],
    restraint.pc2 = virt.pca$x[,"PC2"],
    rape = d$v667 == "Present")
model = glm(rape ~ restraint.pc1 * restraint.pc2, data = dm,
    family = binomial(link = "logit"))

round(d = 2,
    mean((predict(model, type = "response") > .5) == model$y))

	value
	0.6

That's not much bigger than the base rate in the training set:

round(d = 2,
    max(mean(model$y), mean(!model$y)))

	value
	0.52

I'm not going to bother checking test error.

Conclusion: across societies, attitudes about sexual expression in children are not predictive of rape.

Sexism predictors

Here are the candidate predictors:

v582: Intermediate Or Local Political Leaders
v583: Leadership Posts in Kinship Or Extended Family Units
v596: Double Standard in Regard to Premarital Sex
v597: Double Standard in Regard to Extramarital Sex
v604: Voice of the Potential Bride And Groom in Marriage Decisions
v610: Relative Ease of Initiating Divorce
v615: Wife to Husband Institutionalized Deference (guttman Scale)
v621: an Explicit View That Men Should And Do Dominate Their Wives
v620: Physical Punishment of the Spouse Condoned
v626: Belief That Women Are Generally Inferior to Men
v632: Domestic Authority Scale (for women)
v663: Female Power Guttman Scale constructed from 657-662
v664: Ideology of Male Toughness
v665: Male Segregation
v1767: Ideology of Male Superiority

(v670, "Composite of Male Dominance", overlaps with v667.)

Again, let's see how many cases there are for each combination of IV and DV.

t(sapply(qw(v582, v583, v596, v597, v604, v610, v615, v621, v620, v626, v632, v663, v664, v665, v1767), function(predictor)
    sapply(qw(v173, v174, v667), function(crit)
        sum(!is.na(sccsfac[[predictor]]) & !is.na(sccsfac[[crit]])))))

	v173	v174	v667
v582	17	11	32
v583	10	7	26
v596	17	13	36
v597	19	15	39
v604	18	14	39
v610	21	17	43
v615	19	15	39
v621	17	12	29
v620	18	14	31
v626	21	17	43
v632	21	17	43
v663	29	23	83
v664	20	17	73
v665	24	19	71
v1767	20	16	47

As in the previous section, v173 and v174 have few cases. v667 has more, but still less than 50, with the notable exceptions of v663, v664, and v665.

Here's the sorted number of cases for v667:

sort(dec = T, sapply(qw(v582, v583, v596, v597, v604, v610, v615, v621, v620, v626, v632, v663, v664, v665, v1767), function(predictor)
    sum(!is.na(sccsfac[[predictor]]) & !is.na(sccsfac$v667))))

	value
v663	83
v664	73
v665	71
v1767	47
v610	43
v626	43
v632	43
v597	39
v604	39
v615	39
v596	36
v582	32
v620	31
v621	29
v583	26

It turns out, if you explore the dataset, that the situation with respect to how much missingness there is on which combinations of variables is complicated and confusing. Ultimately, I picked the three predictors that I considered most interesting: v626, v663, and v1767. When considering v1767, we might as well consider v663 at the same time, since there's just a handful of cases that are missing on v663 but not on v1767 or the DV. So we have two lines of analysis.

Belief in female inferiority

Look at the cross-classification table:

table(na.omit(sccsfac[qw(v626, v667)]))

	Absent	Present
Yes	1	10
No such belief	18	14

This table suggests that in almost all cultures with less rape, there is no "Belief That Women Are Generally Inferior to Men". In cultures with more rape, on the other hand, such a belief is reasonably likely, although perhaps still less likely than a 50% chance.

Here is the table with probabilities of rape as a function of belief:

round(d = 2, prop.table(margin = 1,
    table(na.omit(sccsfac[qw(v626, v667)]))))

	Absent	Present
Yes	0.09	0.91
No such belief	0.56	0.44

Without the belief, higher rape is 44% likely, but with it, it's 91% likely. Interesting.

c(accuracy = round(d = 2, with(na.omit(sccsfac[qw(v626, v667)]),
    mean((v626 == "Yes") == (v667 == "Present")))))

	value
accuracy	0.65

The accuracy rate is still not very impressive, since in the no-such-belief case, the rape rate is around 50%.

I won't do a separate (i.e., cross-validatory) estimate of predictive accuracy because I don't think of this model as having any free parameters. (If this IV is to be nontrivially dichotomously used to predict this DV, No such belief must map to Absent and Yes to Present. Vice versa wouldn't make sense.)

Conclusion: among cultures with a belief that women are generally inferior to men, rape is likely to be reasonably common or institutionalized. Among cultures without such a belief, rape rates vary more widely.

Female power and male superiority

Here's the contingency table for female power and rape:

table(na.omit(sccsfac[qw(v663, v667)]))

	Absent	Present
All items absent	3	4
Flexible marriage mores only	3	3
Plus Female nondomestic production	1	2
Plus demand for female produce	1	6
Plus female economic control	4	8
Plus female political participation	24	9
Plus female solidarity groups	7	8

An inconsistent picture. However, the modal level of v663 also seems to be one in which the rape rate is distant from 50%. So let's see how well we can do with logistic regression.

dm = with(na.omit(sccsfac[qw(v663, v667)]), data.frame(
    female.power = as.integer(v663),
    rape = v667 == "Present"))
model = glm(rape ~ female.power, data = dm,
    family = binomial(link = "logit"))

round(d = 2,
    mean((predict(model, type = "response") > .5) == model$y))

	value
	0.6

Not great.

Suppose we add ideology of male superiority. It has only three levels, so let's treat it as nominal.

dm = with(na.omit(sccsfac[qw(v663, v1767, v667)]), data.frame(
    female.power = as.integer(v663),
    male.sup = v1767,
    rape = v667 == "Present"))
model = glm(rape ~ female.power * male.sup, data = dm,
    family = binomial(link = "logit"))

round(d = 2,
    mean((predict(model, type = "response") > .5) == model$y))

	value
	0.67

Better. But all the predictive ability is coming from v1767, actually:

model = glm(rape ~ male.sup, data = dm,
    family = binomial(link = "logit"))
round(d = 2,
    mean((predict(model, type = "response") > .5) == model$y))

	value
	0.65

So let's use just that.

d = with(na.omit(sccsfac[qw(v1767, v667)]), data.frame(
    male.sup = ordered(levels = qw(None, Weak, Strong),
        ifelse(v1767 == "strongly articulated ideology of male superiority", "Strong",
        ifelse(v1767 == "weakly articulated ideology of male superiority", "Weak",
        "None"))),
    rape = v667 == "Present"))
table(d)

	FALSE	TRUE
None	16	11
Weak	2	5
Strong	4	9

Here it is with probabilities:

round(d = 2, prop.table(table(d), 1))

	FALSE	TRUE
None	0.59	0.41
Weak	0.29	0.71
Strong	0.31	0.69

You can see that ideologies of male superiority, weak or strong, predict rape at 70% probability, whereas societies without such an ideology are only 41% likely to have lots of rape.

round(d = 2, mean(with(d,
    (male.sup %in% qw(Weak, Strong)) == rape)))

	value
	0.64

Let's estimate predictive accuracy. The only free parameter I would consider the model to have is whether Weak implies TRUE or FALSE. None will be fixed at FALSE and Strong at TRUE.

cv.folds = 10
set.seed(20)
d$fold = sample(rep(factor(1 : cv.folds), len = nrow(d)))
results = c(recursive = T, lapply(1 : cv.folds, function(fold)
   {test = d[d$fold == fold,]
    train = d[d$fold != fold,]
    weak.rate = mean(subset(train, male.sup == "Weak")$rape)
    weak.pred =
        if (weak.rate < .5) F
        else if (weak.rate > .5) T
        else sample(c(T, F), 1)
    with(test, rape ==
        ifelse(male.sup == "None", F,
        ifelse(male.sup == "Strong", T,
        weak.pred)))}))
round(d = 2, mean(results))

	value
	0.64

The cross-validatory folds always assigned Weak to TRUE, so this is the same number as the training-set accuracy.

Conclusion: across cultures, an ideology of male superiority predicts more rape.

References

Palmer, C. T. (1989). Is rape a cultural universal? A re-examination of the ethnographic data. Ethnology, 28(1), 1–16. Retrieved from http://www.jstor.org/stable/3773639

Sanday, P. R. (1981a). Female power and male dominance: On the origins of sexual inequality. Cambridge: Cambridge University Press. ISBN 978-0-521-23618-8.

Sanday, P. R. (1981b). The socio-cultural context of rape: A cross-cultural study. Journal of Social Issues, 37(4), 5–27. doi:10.1111/j.1540-4560.1981.tb01068.x