Guidelines for asking data-analysis questions

Created 30 Jun 2016 • Last modified 20 Jul 2016

Please follow these few guidelines when asking a question about data analysis, whether on Cross Validated or elsewhere. These details may seem unnecessary, but providing them is likely to get you better, more complete, and more useful advice, partly by avoiding what's called the XY problem: asking about your attempted solution instead of the problem you actually need to solve.

  1. Clarify which of these things you're doing:
    1. Working on a homework problem
    2. Learning about data analysis, statistics, or machine learning on your own
    3. Analyzing data as part of a scientific research project
    4. Analyzing data to solve a real-world problem
  2. Describe your situation completely.
    1. Clarify whether you already have data or you're in some kind of planning phase (or both).
    2. If you're doing a scientific research project, describe the research goal (e.g., estimating the maximum running speed of an average cheetah, or developing a better treatment for people with a fear of elevators).
      1. Describe the design of your study. How many times was each subject (or other unit of analysis) measured on each variable, and when did each measurement happen? Are subjects organized into any groups, whether naturally (like countries or schools) or artificially (like treatment conditions or therapy groups)?
    3. If you're trying to solve a real-world problem, describe what you're trying to accomplish (e.g., predict how American Congressmen will vote on a certain bill, or design a banner ad that maximizes clicks per impression).
    4. Mention your (expected) sample size, and name and describe every variable (or, if you have a very large number of variables, count them and describe their overall organization).
      1. If your question is about one or a few particular variables, provide plots.
      2. Providing your raw data, or even just a few rows of your raw data, is always helpful.
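
As one way to gather the details from item 2.4 (sample size, a description of each variable, and a few raw rows), here is a minimal Python sketch that uses only the standard library. The data, the column names, and the helper names `describe_dataset` and `_is_number` are all hypothetical and exist only for illustration. The numeric-versus-text check is a rough guess, not a substitute for describing what each variable actually means.

```python
import csv
import io

# Hypothetical example data, made up for illustration; substitute your own CSV text.
EXAMPLE_CSV = """subject,group,age,score
1,control,34,12.5
2,treatment,29,15.0
3,control,41,
4,treatment,37,14.2
"""


def _is_number(text):
    """Return True if the string can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def describe_dataset(csv_text, n_preview=3):
    """Build a plain-text summary: sample size, per-variable notes, first rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fieldnames = reader.fieldnames or []

    lines = [f"Sample size: {len(rows)} rows", f"Variables: {len(fieldnames)}"]
    for name in fieldnames:
        values = [row[name] for row in rows]
        # Treat empty cells (and cells absent from short rows) as missing.
        present = [v for v in values if v not in ("", None)]
        is_numeric = bool(present) and all(_is_number(v) for v in present)
        kind = "numeric" if is_numeric else "text"
        n_missing = len(values) - len(present)
        lines.append(
            f"  {name}: {kind}, {n_missing} missing, {len(set(present))} distinct"
        )

    lines.append(f"First {min(n_preview, len(rows))} rows:")
    for row in rows[:n_preview]:
        lines.append("  " + ", ".join(f"{k}={v!r}" for k, v in row.items()))
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe_dataset(EXAMPLE_CSV))
```

You could paste the printed summary into your question, then add what the code cannot know: what each variable measures, its units, and how the data were collected.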