The Future of Data Analysis, Tukey 1962

“Individual parts of mathematical statistics must look for their justification toward either data analysis or pure mathematics.”

Large parts of data analysis are (but as a whole is larger and more varied than):

  • inferential in the sample-to-population sense
  • incisive, revealing indications imperceptible by simple examination of raw data
  • allocation, guiding us in observation, experimentation, or analysis

How can new data analysis be initiated?

  1. seek out wholly new questions to be answered
  2. tackle old problems in more realistic frameworks
  3. seek out unfamiliar summaries of observational material, and establish their useful properties
  4. still more novelty can come from finding, and evading, still deeper lying constraints

data analysis is a science because it has 1) intellectual content, 2) organization into an understandable form, and 3) reliable upon the test of experience as the ultimate standard of validity

(Mathematics is not a science: standard of validity is an agreed-upon logical consistency and provability)

Data analysis, and the parts of statistics which adhere to it, must…take on the characteristics of science rather than those of mathematics:

  1. must seek for scope and usefulness rather than security
  2. must be willing to err moderately often in order that inadequate evidence shall more often suggest the right answer
  3. must use mathematical argument and mathematical results as bases for judgment rather than as bases for proof or stamps of validity

data analysis is intrinsically an empirical science

data analysis must look to a very heavy emphasis on judgement:

  1. judgement based upon the experience of the particular field of subject matter from which data come
  2. judgement based upon a broad experience with how particular techniques of data analysis have worked out in a variety of fields of application
  3. judgement based upon abstract results about the properties of particular techniques, whether obtained by mathematical proofs or empirical sampling

a scientist’s actions are guided, not determined, by what has been derived from theory of established by experiment

scientists know that they will sometimes be wrong; they try not to err too often, but they accept some insecurity as the price of wider scope; data analysts must do the same

“Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.” Data analysis must progress by approximate answers, at best, since its knowledge of what the problem really is will at best be approximate.

Leave a Reply

Your email address will not be published. Required fields are marked *