Appendix: Cross-cultural predictors of rape
Created 30 Sep 2013 • Last modified 5 Feb 2014
It's difficult to tell how much the analyses here use the same data as those of Sanday (1981b). However, Sanday's analyses are pretty simplistic (just Pearson correlations), so these are probably better.
All the below code is in R.
Preliminaries
In this appendix, I will examine cross-cultural predictors of rape using the Standard Cross-Cultural Sample.
I got sscsfac.Rdata
from http://dl.dropbox.com/u/9256203/sccsfac.Rda, which I found a link to in the SCCS codebook.
qw = function(...) {m = match.call(expand.dots = FALSE); as.character(m[["..."]])} load("sccsfac.Rdata") virtue = qw(v330, v331, v332, v333, v827, v828, v829, v830)
What should be use for our dependent variable? The following look appropriate:
- v173
- Rape
- v174
- Frequency of Rape
- v667
- Rape: Incidents Reports, Or Thought of as Means of Punishment Women, or Part of Ceremony
v667 is described by the coder (Sanday, 1981a, p. 253) thus: "Incidents of rape are reported (suggesting that rape is not uncommon) or rape is thought of as a means for punishing women, or rape is part of a ceremony." It is a dichotomous variable, but going by Palmer (1989), a value of Absent
doesn't indicate the complete absence of rape. Rather, I guess, we should regard Present
as indicating more common or more thoroughly institutionalized rape than Absent
.
Sexual-inhibition predictors
Here are the candidate predictors:
- v159
- Talk about Sex
- v161
- Sex Believed Dangerous
- v166
- Frequency of Premarital Sex—Male
- v167
- Frequency of Premarital Sex—Female
- v602
- an Explicit View That Sexual Activity Is Dangerous Or Contaminating
- v330 to v333
- Inculcation of the value of sexual restraint in children
- v827 to v830
- Value of sexual expression and non-restraint in adolescents
(v330 to v333 and v827 to v830 are collectively virtue
.)
Let's see how many cases there are for each combination of IV and DV. I'm not going to attempt any sort of missing-value interpolation here, so we need complete cases for whatever variables we want to use.
t(sapply(c("v159", "v161", "v166", "v167", "v602", virtue), function(predictor) sapply(qw(v173, v174, v667), function(crit) sum(!is.na(sccsfac[[predictor]]) & !is.na(sccsfac[[crit]])))))
v173 | v174 | v667 | |
---|---|---|---|
v159 | 16 | 13 | 37 |
v161 | 11 | 5 | 19 |
v166 | 30 | 24 | 64 |
v167 | 29 | 23 | 67 |
v602 | 15 | 13 | 32 |
v330 | 34 | 26 | 84 |
v331 | 33 | 25 | 82 |
v332 | 35 | 26 | 90 |
v333 | 36 | 26 | 89 |
v827 | 35 | 30 | 84 |
v828 | 36 | 30 | 83 |
v829 | 33 | 29 | 82 |
v830 | 36 | 30 | 83 |
v159, v161, and v602 have too few cases. All the DVs do, too, except for v667. There we have about 65 cases for the two premarital-sex variables (v166 and v167) and about 80 cases for the eight virtue variables. The virtue variables are probably worth inspecting independently of the premarital-sex variables so we don't have to lose those twenty-odd cases. Let's consider the premarital-sex variables first.
Premarital-sex frequency
d = na.omit(sccsfac[qw(v166, v167, v667)]) d = data.frame( # Recode v166 and v167 so higher values mean more premarital # sex. prem.male = 4 - as.integer(d$v166), prem.female = 4 - as.integer(d$v167), rape = d$v667 == "Present") c("Complete cases" = nrow(d))
value | |
---|---|
Complete cases | 63 |
It turns out that prem.male
and prem.female
are generally equal, and neither varies much:
with(d, table(prem.male, prem.female))
0 | 1 | 2 | 3 | |
---|---|---|---|---|
0 | 7 | 0 | 0 | 0 |
1 | 0 | 6 | 0 | 0 |
2 | 0 | 2 | 12 | 0 |
3 | 0 | 1 | 2 | 33 |
Suppose we then just use prem.female
.
with(d, table(prem.female, rape))
FALSE | TRUE | |
---|---|---|
0 | 3 | 4 |
1 | 5 | 4 |
2 | 4 | 10 |
3 | 16 | 17 |
It looks like rape
is about 50% true across with board with an oddball at prem.female == 2
. Let's try logistic regression just for laughs.
model = glm(rape ~ prem.female, data = d,
family = binomial(link = "logit"))
Not surprisingly, this model is completely useless. The probability it predicts of rape
being true is greater than 1/2 for every case:
round(d = 3, min(predict(model, type = "response")))
value | |
---|---|
0.549 |
Conclusion: across societies, premarital-sex frequency is not predictive of rape.
Virtue variables
d = na.omit(sccsfac[c(virtue, "v667")]) c("Complete cases" = nrow(d))
value | |
---|---|
Complete cases | 67 |
This number is similar to that of the previous analysis, but unfortunately, the intersection of cases is substantially smaller:
nrow(na.omit(sccsfac[c(virtue, "v166", "v167", "v667")]))
value | |
---|---|
49 |
So we'll do an analysis with just the virtue variables and the DV (v667).
The virtue variables have many high correlations:
corm = cor(d[virtue], method = "spearman") corm[!upper.tri(corm)] = NA round(d = 2, corm)
v330 | v331 | v332 | v333 | v827 | v828 | v829 | v830 | |
---|---|---|---|---|---|---|---|---|
v330 | 0.94 | 0.77 | 0.75 | -0.47 | -0.50 | -0.46 | -0.60 | |
v331 | 0.79 | 0.81 | -0.55 | -0.56 | -0.50 | -0.63 | ||
v332 | 0.90 | -0.51 | -0.52 | -0.51 | -0.62 | |||
v333 | -0.60 | -0.60 | -0.58 | -0.69 | ||||
v827 | 0.92 | 0.69 | 0.74 | |||||
v828 | 0.65 | 0.80 | ||||||
v829 | 0.89 | |||||||
v830 |
Let's try summarizing them with a principal component or two. We'll do PCA based on the Spearman correlation matrix, since these variables are all on rating scales.
dvr = d[virtue] for (i in seq_along(dvr)) dvr[,i] = rank(dvr[,i]) round(d = 2, eigen(cor(dvr))$values / ncol(dvr))
value | |
---|---|
0.71 0.15 0.06 0.04 0.02 0.01 0.01 0.01 |
The first PC accounts for the bulk of the variance (71%).
virt.pca = prcomp(dvr, center = T, scale = T) round(d = 2, virt.pca$rotation[,1:3])
PC1 | PC2 | PC3 | |
---|---|---|---|
v330 | 0.34 | 0.42 | -0.07 |
v331 | 0.36 | 0.37 | -0.13 |
v332 | 0.35 | 0.35 | 0.08 |
v333 | 0.37 | 0.24 | 0.05 |
v827 | -0.34 | 0.40 | 0.46 |
v828 | -0.35 | 0.38 | 0.47 |
v829 | -0.33 | 0.36 | -0.64 |
v830 | -0.37 | 0.27 | -0.35 |
Here are loadings from the first three PCs. On the first, the loadings are such that positive values of the PC indicate more sexual restraint. The second is more opaque (its pattern is consistent with the male–female distinction, but not very strongly) and the third doesn't account for much variance (it distinguishes between the "sexual expression" and "sexual nonrestraint" variables, which the authors say should be added together anyway). So let's just use the first as a general measure of sexual restraint.
dm = data.frame( restraint = virt.pca$x[,"PC1"], rape = d$v667 == "Present") model = glm(rape ~ restraint, data = dm, family = binomial(link = "logit"))
This time, at least, we have predicted probabilities on both sides of .5:
table(predict(model, type = "response") >= .5)
count | |
---|---|
FALSE | 19 |
TRUE | 48 |
So how well can this model fit its own training data?
round(d = 2,
mean((predict(model, type = "response") >= .5) == model$y))
value | |
---|---|
0.48 |
Amazingly, the model has achieved a training-set accuracy of less than 50%.
Okay, that was a bust; how about adding PC2 for an additional 15% of the variance in the original IVs?
dm = data.frame( restraint.pc1 = virt.pca$x[,"PC1"], restraint.pc2 = virt.pca$x[,"PC2"], rape = d$v667 == "Present") model = glm(rape ~ restraint.pc1 * restraint.pc2, data = dm, family = binomial(link = "logit"))
round(d = 2,
mean((predict(model, type = "response") > .5) == model$y))
value | |
---|---|
0.6 |
That's not much bigger than the base rate in the training set:
round(d = 2, max(mean(model$y), mean(!model$y)))
value | |
---|---|
0.52 |
I'm not going to bother checking test error.
Conclusion: across societies, attitudes about sexual expression in children are not predictive of rape.
Sexism predictors
Here are the candidate predictors:
- v582
- Intermediate Or Local Political Leaders
- v583
- Leadership Posts in Kinship Or Extended Family Units
- v596
- Double Standard in Regard to Premarital Sex
- v597
- Double Standard in Regard to Extramarital Sex
- v604
- Voice of the Potential Bride And Groom in Marriage Decisions
- v610
- Relative Ease of Initiating Divorce
- v615
- Wife to Husband Institutionalized Deference (guttman Scale)
- v621
- an Explicit View That Men Should And Do Dominate Their Wives
- v620
- Physical Punishment of the Spouse Condoned
- v626
- Belief That Women Are Generally Inferior to Men
- v632
- Domestic Authority Scale (for women)
- v663
- Female Power Guttman Scale constructed from 657-662
- v664
- Ideology of Male Toughness
- v665
- Male Segregation
- v1767
- Ideology of Male Superiority
(v670, "Composite of Male Dominance", overlaps with v667.)
Again, let's see how many cases there are for each combination of IV and DV.
t(sapply(qw(v582, v583, v596, v597, v604, v610, v615, v621, v620, v626, v632, v663, v664, v665, v1767), function(predictor) sapply(qw(v173, v174, v667), function(crit) sum(!is.na(sccsfac[[predictor]]) & !is.na(sccsfac[[crit]])))))
v173 | v174 | v667 | |
---|---|---|---|
v582 | 17 | 11 | 32 |
v583 | 10 | 7 | 26 |
v596 | 17 | 13 | 36 |
v597 | 19 | 15 | 39 |
v604 | 18 | 14 | 39 |
v610 | 21 | 17 | 43 |
v615 | 19 | 15 | 39 |
v621 | 17 | 12 | 29 |
v620 | 18 | 14 | 31 |
v626 | 21 | 17 | 43 |
v632 | 21 | 17 | 43 |
v663 | 29 | 23 | 83 |
v664 | 20 | 17 | 73 |
v665 | 24 | 19 | 71 |
v1767 | 20 | 16 | 47 |
As in the previous section, v173 and v174 have few cases. v667 has more, but still less than 50, with the notable exceptions of v663, v664, and v665.
Here's the sorted number of cases for v667:
sort(dec = T, sapply(qw(v582, v583, v596, v597, v604, v610, v615, v621, v620, v626, v632, v663, v664, v665, v1767), function(predictor) sum(!is.na(sccsfac[[predictor]]) & !is.na(sccsfac$v667))))
value | |
---|---|
v663 | 83 |
v664 | 73 |
v665 | 71 |
v1767 | 47 |
v610 | 43 |
v626 | 43 |
v632 | 43 |
v597 | 39 |
v604 | 39 |
v615 | 39 |
v596 | 36 |
v582 | 32 |
v620 | 31 |
v621 | 29 |
v583 | 26 |
It turns out, if you explore the dataset, that the situation with respect to how much missingness there is on which combinations of variables is complicated and confusing. Ultimately, I picked the three predictors that I considered most interesting: v626, v663, and v1767. When considering v1767, we might as well consider v663 at the same time, since there's just a handful of cases that are missing on v663 but not on v1767 or the DV. So we have two lines of analysis.
Belief in female inferiority
Look at the cross-classification table:
table(na.omit(sccsfac[qw(v626, v667)]))
Absent | Present | |
---|---|---|
Yes | 1 | 10 |
No such belief | 18 | 14 |
This table suggests that in almost all cultures with less rape, there is no "Belief That Women Are Generally Inferior to Men". In cultures with more rape, on the other hand, such a belief is reasonably likely, although perhaps still less likely than a 50% chance.
Here is the table with probabilities of rape
as a function of belief:
round(d = 2, prop.table(margin = 1, table(na.omit(sccsfac[qw(v626, v667)]))))
Absent | Present | |
---|---|---|
Yes | 0.09 | 0.91 |
No such belief | 0.56 | 0.44 |
Without the belief, higher rape is 44% likely, but with it, it's 91% likely. Interesting.
c(accuracy = round(d = 2, with(na.omit(sccsfac[qw(v626, v667)]), mean((v626 == "Yes") == (v667 == "Present")))))
value | |
---|---|
accuracy | 0.65 |
The accuracy rate is still not very impressive, since in the no-such-belief case, the rape
rate is around 50%.
I won't do a separate (i.e., cross-validatory) estimate of predictive accuracy because I don't think of this model as having any free parameters. (If this IV is to be nontrivially dichotomously used to predict this DV, No such belief
must map to Absent
and Yes
to Present
. Vice versa wouldn't make sense.)
Conclusion: among cultures with a belief that women are generally inferior to men, rape is likely to be reasonably common or institutionalized. Among cultures without such a belief, rape rates vary more widely.
Female power and male superiority
Here's the contingency table for female power and rape:
table(na.omit(sccsfac[qw(v663, v667)]))
Absent | Present | |
---|---|---|
All items absent | 3 | 4 |
Flexible marriage mores only | 3 | 3 |
Plus Female nondomestic production | 1 | 2 |
Plus demand for female produce | 1 | 6 |
Plus female economic control | 4 | 8 |
Plus female political participation | 24 | 9 |
Plus female solidarity groups | 7 | 8 |
An inconsistent picture. However, the modal level of v663 also seems to be one in which the rape rate is distant from 50%. So let's see how well we can do with logistic regression.
dm = with(na.omit(sccsfac[qw(v663, v667)]), data.frame( female.power = as.integer(v663), rape = v667 == "Present")) model = glm(rape ~ female.power, data = dm, family = binomial(link = "logit"))
round(d = 2,
mean((predict(model, type = "response") > .5) == model$y))
value | |
---|---|
0.6 |
Not great.
Suppose we add ideology of male superiority. It has only three levels, so let's treat it as nominal.
dm = with(na.omit(sccsfac[qw(v663, v1767, v667)]), data.frame( female.power = as.integer(v663), male.sup = v1767, rape = v667 == "Present")) model = glm(rape ~ female.power * male.sup, data = dm, family = binomial(link = "logit"))
round(d = 2,
mean((predict(model, type = "response") > .5) == model$y))
value | |
---|---|
0.67 |
Better. But all the predictive ability is coming from v1767, actually:
model = glm(rape ~ male.sup, data = dm, family = binomial(link = "logit")) round(d = 2, mean((predict(model, type = "response") > .5) == model$y))
value | |
---|---|
0.65 |
So let's use just that.
d = with(na.omit(sccsfac[qw(v1767, v667)]), data.frame( male.sup = ordered(levels = qw(None, Weak, Strong), ifelse(v1767 == "strongly articulated ideology of male superiority", "Strong", ifelse(v1767 == "weakly articulated ideology of male superiority", "Weak", "None"))), rape = v667 == "Present")) table(d)
FALSE | TRUE | |
---|---|---|
None | 16 | 11 |
Weak | 2 | 5 |
Strong | 4 | 9 |
Here it is with probabilities:
round(d = 2, prop.table(table(d), 1))
FALSE | TRUE | |
---|---|---|
None | 0.59 | 0.41 |
Weak | 0.29 | 0.71 |
Strong | 0.31 | 0.69 |
You can see that ideologies of male superiority, weak or strong, predict rape at 70% probability, whereas societies without such an ideology are only 41% likely to have lots of rape.
round(d = 2, mean(with(d,
(male.sup %in% qw(Weak, Strong)) == rape)))
value | |
---|---|
0.64 |
Let's estimate predictive accuracy. The only free parameter I would consider the model to have is whether Weak
implies TRUE
or FALSE
. None
will be fixed at FALSE
and Strong
at TRUE
.
cv.folds = 10 set.seed(20) d$fold = sample(rep(factor(1 : cv.folds), len = nrow(d))) results = c(recursive = T, lapply(1 : cv.folds, function(fold) {test = d[d$fold == fold,] train = d[d$fold != fold,] weak.rate = mean(subset(train, male.sup == "Weak")$rape) weak.pred = if (weak.rate < .5) F else if (weak.rate > .5) T else sample(c(T, F), 1) with(test, rape == ifelse(male.sup == "None", F, ifelse(male.sup == "Strong", T, weak.pred)))})) round(d = 2, mean(results))
value | |
---|---|
0.64 |
The cross-validatory folds always assigned Weak
to TRUE
, so this is the same number as the training-set accuracy.
Conclusion: across cultures, an ideology of male superiority predicts more rape.
References
Palmer, C. T. (1989). Is rape a cultural universal? A re-examination of the ethnographic data. Ethnology, 28(1), 1–16. Retrieved from http://www.jstor.org/stable/3773639
Sanday, P. R. (1981a). Female power and male dominance: On the origins of sexual inequality. Cambridge: Cambridge University Press. ISBN 978-0-521-23618-8.
Sanday, P. R. (1981b). The socio-cultural context of rape: A cross-cultural study. Journal of Social Issues, 37(4), 5–27. doi:10.1111/j.1540-4560.1981.tb01068.x