ISSN 1393-614X Minerva - An Internet Journal of Philosophy Vol. 8 2004.
ADVANCE IN MONTE CARLO SIMULATIONS AND ROBUSTNESS STUDY AND THEIR IMPLICATIONS FOR THE DISPUTE IN PHILOSOPHY OF MATHEMATICS
Chong Ho Yu
Both Carnap and Quine made significant contributions to the philosophy of mathematics despite their divergent views. Carnap endorsed the dichotomy between analytic and synthetic knowledge and classified certain mathematical questions as internal questions appealing to logic and convention. In contrast, Quine was opposed to the analytic-synthetic distinction and promoted a holistic view of scientific inquiry. The purpose of this paper is to argue that in light of the recent advancement of experimental mathematics such as Monte Carlo simulations, limiting mathematical inquiry to the domain of logic is unjustified. Robustness studies implemented in Monte Carlo studies demonstrate that mathematics is on a par with other experimental-based sciences.
Carnap and Quine made tremendous contributions to numerous areas of modern philosophy, including the philosophy of mathematics. Carnap endorsed the dichotomy between analytic and synthetic knowledge and classified certain mathematical questions as internal questions appealing to logic and convention. In addition, he regarded the ontological question about the reality of mathematical objects as a pseudo-question. On the contrary, Quine made an ontological commitment to mathematical entities by asserting that mathematical objects are on par with physical objects. This assertion is tied to his belief that there is no first philosophy prior to natural science. In addition, Quine was opposed to the analytic-synthetic distinction and promoted a holistic view of scientific inquiry. On one hand, Quine recognized that there are differences between logic/mathematics and physical sciences. On the other hand, Quine maintained that it is a mistake to hold a dualistic view. For Quine logic and mathematics are essentially empirically-based and they are subject to revision according to new evidence. The purpose of this paper is to argue that in light of the recent advancement of experimental mathematics such as Monte Carlo simulations, limiting mathematical inquiry to the domain of logic is unjustified. Robustness studies implemented in Monte Carlo Studies demonstrate that mathematics is on a par with other experimental-based sciences.
Quine (1966/1976) wrote, “Carnap more than anyone else was the embodiment of logical positivism, logical empiricism, the Vienna circle” (p. 40). To discuss Carnap’s philosophy of mathematics, it is essential to illustrate the ideas of the Vienna circle, as well as how members of the Vienna circle adopted and rejected other ideas. In the following, the theories of Frege, Russell, Whitehead and Gödel will be briefly introduced. These are by no means the only ones who are related to the formulation of Carnap’s and the Vienna Circle’s notions. Nonetheless, since this article concentrates on the argument against the logical view of mathematics endorsed by Carnap, discussion of Frege, Russell, Whitehead and Gödel is germane to the topic.
DIFFERENT VIEWS ON THE PHILOSOPHY OF MATHEMATICS
The Vienna Circle
Logical positivism, which originated with the Vienna Circle, embraced verificationism as the criterion for obtaining meaningful knowledge. The verification criterion is not just a demand for evidence. Verification does not mean that, with other things being equal, a proposition that can be verified is of vastly greater significance than one that cannot. Rather, the verification thesis is much more restrictive than that. According to logical positivism, a statement is meaningless if verification is not possible or the criteria for verification are not clear (Ayer, 1936; Schlick, 1959). To be specific, the verification principle is not an account of the relative importance of propositions, but a definition of meaning. Meaning and verifiability are almost interchangeable (Werkmeister, 1937). The principle of verification was used by the Vienna Circle as a tool to counteract metaphysics by enforcing adherence to empiricism. However, one may then ask how we can substantiate mathematical knowledge when mathematics is considered by many to be a form of knowledge that cannot be verified by sensory input. Following the strict criterion of verificationism, the analytic philosopher Ayer (1946) has said that mathematics is nonsense. In his view, mathematics says nothing about the world. What it can accomplish is to show us how to manipulate symbols.
Russell and Whitehead
In order to make sense out of mathematics, logical positivists adopted a view of mathematics in the Frege-Russell-Whitehead tradition. This tradition took care of logic and mathematics, and thus left a separate epistemological problem of non-logical and non-mathematical discourse (Isaacson, 2000). According to Frege (1884/1960), logical and mathematical truths are true by virtue of the nature of thought. This notion is further expanded by Russell, and also by collaboration between Russell and Whitehead.
In Russell's view (1919), in order to uncover the underlying structures of mathematical objects, mathematics should be reduced to a more basic element, namely, logic. Thus, his approach is termed logical atomism. Russell's philosophy of mathematics is mainly concerned with geometry. At the time of Russell, the existence of geometric objects and the epistemology of geometry could not be answered by empiricists. In geometry a line can be broken down infinitely to a smaller line. We can neither see nor feel a mathematical line or a mathematical point. Thus, it seems that geometric objects are not objects of empirical perception (sense experience). If this is true, how could conceptions of such objects and their properties be derived from experience as an empiricist would require? Russell's answer is that although geometric objects are theoretical objects, we can still understand geometric structures by applying logic to the study of relationships among those objects: "What matters in mathematics, and to a very great extent in physical science, is not the intrinsic nature of our terms, but the logical nature of their inter-relations" (1919, p.59).
Whitehead and Russell’s work on “Principia Mathematica” (1910/1950) is a bold attempt to develop a fully self-sufficient mathematical system through logical relationships. For Russell and Whitehead, mathematics is a purely formal science. The existence of mathematical objects is conditional upon structures. If a certain structure or system exists, then there also exist some other structures or systems whose existence follows logically from the existence of the former. In their view, mathematics could be reduced to logical relationships within the logical system without external references. The Frege-Russell-Whitehead tradition is considered the logical approach to mathematics. This approach is said to be a solution to infinite regress or circular proof.
However, the proposal by Whitehead and Russell is seriously challenged by Gödel. Gödel (1944, 1961) proposed that a complete and consistent mathematical system is inherently impossible, and within any consistent mathematical system there are propositions that cannot be proved or disproved on the basis of the axioms within that system. Thus, the consequences drawn from mathematical axioms have meaning only in a hypothetical sense. In addition, mathematical propositions cannot be proved by using combinations of symbols without introducing more abstract elements. In Gödel’s sense, logicism in mathematics does not solve the problem of infinite regress or circular proof.
In rejecting the logical approach, Gödel took an "intuitionistic" position on mathematics. Unlike Russell, who asserted mathematical structures exist in terms of relationships, Gödel maintained that it is not a question of whether there are some real objects "out there". Rather, our sequences of acts construct our perceptions of so-called "reality" (Tieszen, 1992). According to Gödel, "despite their remoteness from sense experience, we do have something like a perception also of the objects of set theory… I don't see any reason why we should have less confidence in this kind of perception, i.e. in mathematical intuition, than in sense perception" (cited in Lindstrom, 2000, p.123). Indeed, Gödel had followers even in the late 20th century. Jaffe and Quinn (1993) observed that there is “a trend toward basing mathematics on intuitive reasoning without proof” (p.1).
Carnap disliked ontology and metaphysics. For Carnap intuition is a kind of mysterious and unreliable access to matters of independent fact. Creath (1990a, 1990b) argued that anti-intuition is one of the primary motives of Carnap’s philosophy. Carnap was firmly opposed to the Platonic tradition of accepting "truths" based upon "supposed direct metaphysical insight or grasp of objects or features of things independent of ourselves but inaccessible to ordinary sensory observation." (p. 4) Creath (1990b) pointed out,
Carnap's proposal, then, is to treat the basic axioms of mathematics, of logic, and of the theory of knowledge itself, as well as the sundry other special sentences, as an implicit definition of the terms they contain. The upshot of this is that simultaneously the basic terms are understood with enough meaning for the purpose of mathematics, logic and so on, and the basic claims thereof need no further justification, for we have so chosen our language as to make these particular claims true… On Carnap’s proposal the basic claims are in some sense truths of their own making. It is not that we make objects and features thereof, rather we construct our language in such a way that those claims are true. (p. 6)
Following Poincaré and Hilbert’s assertion that the axioms of mathematics can be constructed as implicit definitions of the terms they contain, Carnap viewed numbers as logical objects and rejected the intuitionist approach to mathematics. Although Gödel’s theorem brought arguably insurmountable difficulties to the Russell-Whitehead project, Carnap still adopted Russell’s logico-analytic method of philosophy, including philosophy of mathematics. By working on logical syntax, Carnap attempted to make philosophy into a normal science in a logical, but not empirical, sense (Wang, 1986). Carnap accepted Russell and Whitehead’s thesis that mathematics can be reduced to logic. Further, Carnap asserted that logic is based on convention and thus it is true by convention. In his essay entitled “Foundations of logic and mathematics” (1971, originally published in 1939), Carnap clearly explained his position on logic and convention:
It is important to be aware of the conventional components in the construction of a language system. This view leads to an unprejudiced investigation of the various forms of new logical systems which differ more or less from the customary form (e.g. the intuitionist logic constructed by Brouwer and Heyting, the systems of logic of modalities as constructed by Lewis and others, the systems of plurivalued logic as constructed by Lukasiewicz and Tarski, etc.), and it encourages the construction of further new forms. The task is not to decide which of the different systems is the right logic, but to examine their formal properties and the possibilities for their interpretation and application in science. (pp. 170-171)
The preceding approach is called linguistic conventionalism, in which things can make sense with reference to particular linguistic frameworks. Once we learn the rules of a certain logical and mathematical framework, we have everything we need for knowledge of the required mathematical propositions. In this sense, like the Russell-Whitehead approach, a linguistic framework is a self-contained system.
As mentioned earlier, the verification criterion of logical positivism might face certain difficulties in the context of mathematical proof. Carnap supported a distinction between synthetic and analytical knowledge as a way to delimit the range of application of the verification principle (Isaacson, 2000). To be specific, Carnap (1956) distinguished analytic knowledge from synthetic knowledge, and also internal questions from external questions. An external question is concerned with the existence or reality of the system of entities as a whole. A typical example is, “Is there a white piece of paper on my desk?” This question can be answered by empirical investigation. A question like “Do normal distributions exist?” is also an external question, but for Carnap, it is a pseudo-question that cannot be meaningfully answered at all.
On the other hand, an internal question is about the existence of certain entities within a given framework. Mathematical truths, such as 1+1=2, or set-theoretic truths, are tautologies in the sense that they are verified by meanings within a given frame of reference; any revision may lead to a change of meanings. In Carnap’s view it is meaningful to ask a question like “Is there a theoretical sampling t-distribution in the Fisherian significance testing?” In other words, to be real in logic and mathematics is to be an element of the system. Logic and mathematics do not rely on empirical substantiation, because they are empty of empirical content.
Unlike Carnap, Quine did not reject the ontological question of the realness of mathematical entities. Instead, for Quine the existence of mathematical entities should be justified in the way that one justifies the postulation of theoretical entities in physics (Quine, 1957). However, this notion is misunderstood by some mathematicians such as Hersh (1997), and thus needs clarification. Hersh argued that physics depends on machines that accept only finite decimals. No computer can use real numbers that are written in infinite decimals; the microprocessor would be trapped in an infinite process. For example, pi (3.14159…) exists conceptually, but not physically and computationally. While electrons and protons are measurable and accessible, mathematical objects are not. Thus, Hersh was opposed to Quine’s ontological position. Hersh was confused here because he was equating measurability and representation to existence. In the realist sense, the existence of an object does not require that it be known and measured by humans in an exact and precise manner. While the numeric representation of pi does not exist, one could not assert that π also does not exist. Actually, the ontological commitment made by Quine, in which mathematical objects are considered on par with physical objects, is strongly related to his holistic view of epistemology. While Quine asserted that logic/mathematics and physical sciences are different in many aspects, drawing a sharp distinction between them, such as placing logic/mathematics in the analytic camp and putting physical science on the synthetic side, is erroneous. In his well-known paper “Two dogmas of empiricism,” Quine (1951) bluntly rejected not only this dualism, but also reductionism, which will be discussed next.
Quine (1966/1976, originally published in 1936) challenged Carnap’s notion that mathematics is reduced to logic and that logic is true by convention. Quine asserted that logic cannot be reduced to convention, because to derive anything from conventions one still needs logic. Carnap viewed logical and mathematical syntax as a formalization of meaning, but for Quine a formal system must be a formalization of some already meaningful discourse. Moreover, in rejecting the analytic-synthetic dichotomy, Quine rejected the notion that mathematical and logical truths are true by definition and that we can construct a logical language through the selection of meaning. A definition is only a form of notation to express one term in terms of others. Nothing of significance can follow from a definition itself. For example, in the regression equation y=a+bx+e, where y is the outcome variable, x is the regressor/predictor, a is the intercept, b is the slope (the regression weight), and e is the error term, these symbols cannot help us to find truths; they are nothing more than a shorthand to express a wordy and complicated relationship. For Quine, meaning is a phenomenon of human agency. There is no meaning apart from what we can learn from interaction with the human community. In this sense, logical truths are not purely analytical; rather, constructing logic can be viewed as a type of empirical inquiry (Isaacson, 2000).
Quine (1951) asserted that there are no purely internal questions. Our commitment to a certain framework is never absolute, and no issue is entirely isolated from pragmatic concerns about the possible effects of the revisions of the framework. In Putnam’s (1995) interpretation, Quine’s doctrine implies that even so-called logical truths are subject to revision. This doctrine of revisability is strongly tied to the holistic theme in Quine’s philosophy. To be specific, the issue of what logic to accept is a matter of what logic, as a part of our actual science, fits the truth that we are establishing in the science that we are engaged in (Isaacson, 2000). Logics are open to revision in light of new experience, background knowledge, and a web of theories. According to Quine’s holism, mathematics, like logic, has to be viewed not by itself, but as a part of an all-embracing conceptual scheme. In this sense, even so-called mathematical truths are subject to revision, too.
It is essential to further discuss two Quinean notions, revisability of terms and holism, because viewing these Quinean notions as opposed to Carnapian views is a mistake. According to Friedman (2002), criticism of Carnap by Quine is based on Quine’s “misleading” assumption that analytic statements are said to be unrevisable. However, Carnap did not equate analyticity with unrevisability. It is true that in Carnap’s linguistic conventionalism logical and mathematical principles play a constitutive role. Nevertheless, even if we stay within the same framework, terms can be revised, although their meanings would then change. Further, we could move from one framework to another, which contains a different set of principles. Consequently, terms are revised in the process of framework migration.
According to Creath (1991), the holist view that Quine embraced in his earlier career might be called radical holism. In Quine’s view it is the totality of our beliefs that meets experience, or nothing at all. The French scientist Duhem was cited in defense of this holism, but Duhem’s argument was not that extreme. In the Duhemian thesis, scientists do not test a single theory; instead, the test involves a web of hypotheses such as auxiliary assumptions associated with the main hypothesis. Radical holism, by contrast, states that theory testing is concerned with whether the totality of our beliefs meets experience. Creath (1991) objected that, if that is the case, then all our beliefs are equally well confirmed by experience and each is as liable to be given up as any other.
In Quine’s later career (1990/1992), he modified his holist position to a moderate one, in which we test theories against a critical mass rather than a totality. A critical mass is a big enough subset of science to imply what to expect from some observation or experiment. The size of this critical mass will vary from case to case. According to Friedman (2002), Carnap explicitly embraced certain portions of holism such as the Duhemian thesis. For Carnap, a linguistic framework is wholly predicated on the idea that logical principles, just like empirical ones, can be revised in light of a web of empirical science. In this sense, the philosophies of Quine and Carnap share a common ground based on the Duhemian thesis.
According to Pyle (1999), Quine viewed moderate holism as an answer to certain questions in philosophy of mathematics, which are central to Carnapian philosophy. Carnap asserted that mathematics is analytic and thus mathematics can be meaningful without empirical context. Moderate holism's answer is that mathematics absorbs the shared empirical content of the critical masses to which it contributes. In addition, Carnap’s analytic position on mathematics makes mathematical truth necessary rather than contingent. Moderate holism's answer is that when a critical mass of sentences jointly implies a false prediction, we could choose which component sentence to revoke. At the same time, we employ a maxim of “minimum mutilation” (conservatism) to guide our revision, and this accounts for mathematical necessity. Nevertheless, Carnap might not have objected to this, because as mentioned before, Carnap accepted revision of beliefs in light of empirical science. Indeed, moderate holism, as the guiding principle of mathematical and other scientific inquiries, is more reasonable and practical than radical holism.
Carnap’s views on logic and mathematics, such as distinguishing between analytic-synthetic knowledge, reducing mathematics to logic and basing logic on convention, are problematic. Indeed, Quine has deeper insight than Carnap because he asserted that logic and mathematics are based on empirical input in the human community; and thus they are subject to revision.
Statistical Theories and Empirical Evidence
There are many examples of mathematical theories that have been substantively revised in light of new evidence. How the newer Item Response Theory amends Classical True Score Theory is a good example. In the article “New rules of measurement,” prominent statisticians Embretson and Reise (2000) explained why the conventional rules of measurement are inadequate and proposed another set of new rules, which are theoretically better and empirically substantiated. For example, the conventional theory states that the standard error of measurement applies to all scores in a particular population, but Embretson found that the standard error of measurement differs across scores but generalizes across populations.
In addition, R. A. Fisher criticized Neyman’s statistical theory because Fisher asserted that mathematical abstraction to the neglect of scientific applications was useless. He mocked Neyman as being misled by algebraic symbolism (Howie, 2002). Interestingly enough, on some occasions Fisher was also confined by mathematical abstraction and algebraic symbolism. In the theory of maximum likelihood estimation, Fisher suggested that as sample size increases, the estimated parameter gets closer and closer to the true parameter (Eliason, 1993). But in the actual world, the data quality may decrease as the sample size increases. To be specific, when measurement instruments are exposed to the public, the pass rate would rise regardless of the examinee’s ability. In this case the estimate might be farther away from the true parameter! Statisticians cannot blindly trust the mathematical properties postulated in the Fisherian theorems.
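The contrast between the theorem and the actual world can be made concrete with a small simulation. The Python sketch below is illustrative only and is not taken from the sources cited. It treats the maximum likelihood estimate of a pass rate as the sample proportion, and the `leak_per_examinee` parameter is a hypothetical way of modelling item exposure, in which each additional examinee makes it slightly more likely that later examinees pass regardless of ability. The true pass rate of 0.6 and the seed are likewise assumptions.

```python
import random

def mle_pass_rate(n, true_p=0.6, leak_per_examinee=0.0, seed=0):
    """Sample proportion of passes among n simulated examinees.

    For a binary pass/fail outcome the sample proportion is the maximum
    likelihood estimate of the pass probability. `leak_per_examinee` is a
    hypothetical exposure rate, not a quantity from the cited literature.
    """
    rng = random.Random(seed)
    passes = 0
    for i in range(n):
        # Share of the test content assumed to be public by the time
        # examinee i sits the test (capped at 100%).
        exposure = min(1.0, leak_per_examinee * i)
        # Exposed content is answered correctly regardless of ability.
        p = true_p + (1 - true_p) * exposure
        passes += rng.random() < p
    return passes / n

if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        clean = mle_pass_rate(n)
        leaky = mle_pass_rate(n, leak_per_examinee=0.0001)
        print(n, round(clean, 3), round(leaky, 3))
```

Under these assumptions the estimate from clean data tends to settle near 0.6 as n grows, whereas with exposure the largest sample tends to lie farthest from it. The asymptotic property holds for the model, not necessarily for the data-generating process.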
Someone may argue that the preceding examples have too much “application,” that they are concerned with the relation between a measurement theory and observations, not a “pure” relation among mathematical entities. Nevertheless, on some occasions, even the functional relationship among mathematical entities is not totally immune from empirical influence. For example, the logit function, by definition, is the natural log of the odds, that is, of the ratio of the success rate to the failure rate. However, in a context where the rate of failure is the focal interest of the model, the odds can be reversed.
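The reversal can be illustrated numerically. The following minimal Python sketch defines the logit as the natural log of p / (1 - p); the value 0.8 is an arbitrary example and not a figure from any study.

```python
import math

def logit(p):
    """Natural log of the odds p / (1 - p), defined for 0 < p < 1."""
    return math.log(p / (1 - p))

if __name__ == "__main__":
    p_success = 0.8                  # arbitrary illustrative success rate
    print(logit(p_success))          # log odds when success is the focal outcome
    print(logit(1 - p_success))      # log odds when failure is the focal outcome
```

Because the odds of failure are the reciprocal of the odds of success, the two log odds have the same magnitude and opposite signs. Which one a model reports is settled by the research context rather than by the definition alone.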
Putting statistical findings in the arena of “applied mathematics” seems to be an acceptable approach to dismissing the argument that mathematics is subject to revisions. Actually, the distinction between pure and applied mathematics is another form of dualism that attempts to place certain mathematics in the logical domain. In the following I argue that there is no sharp demarcation point between them, and mathematics, like the physical sciences, is subject to empirical verification. Empirically verifying mathematical theories does not mean using a mapping approach to draw correspondence between mathematical and physical objects. Counting two apples on the right hand side and two on the left is not a proof that 2+2=4. Instead, empirical verification in mathematics is implemented in computer-based Monte Carlo simulations, in which “behaviors” of numbers and equations are investigated.
Distinction Between Pure and Applied Mathematics
Conventionally speaking, mathematics is divided into pure mathematics and applied mathematics. There is a widespread belief that some branches of mathematics, such as statistics, are oriented toward application and thus are considered applied mathematics. Interestingly enough, in discussion of the philosophy of mathematics, philosophers tend to cite examples from “theoretical mathematics” such as geometry and algebra, but not “applied” mathematics such as statistics. Although I hesitate to totally tear down the demarcation between pure and applied mathematics, I doubt whether being so-called “pure” or “applied” is the “property” or “essence” of the discipline. As a matter of fact, geometry could be applied to architecture and civil engineering, while statistics can be studied without any reference to empirical measurement. To be specific, a t-test question can be asked in an applied manner, such as “Is the mean IQ of Chinese people in Phoenix significantly higher than that of Japanese people in Phoenix?” However, a t-test-related question can be reframed as “Is the mean of set A higher than that of set B given that the alpha level is 0.05, the power level is 0.75, both sets have equal variances and numbers in each set are normally distributed?” A research question could also be directed to the t-test itself: “Would the actual Type I error rate equal the assumed Type I error rate when Welch’s t-test is applied to a non-normal sample of 30?”
A mathematician can study the last two preceding questions without assigning numbers to any measurement scale or formulating a hypothesis related to mental constructs, social events, or physical objects. He/she could generate numbers on a computer to conduct a mathematical experiment. There is another widespread belief that computer-based experimental mathematics is applied mathematics while traditional mathematics is pure. A century ago our ancestors who had no computers relied on paper and pencil to construct theorems, equations, and procedures. Afterwards, they plugged in some numbers for verification. Today these tasks are performed in a more precise and efficient fashion with the aid of computers. However, it is strange to say that mathematics using pencil and paper is pure mathematics while that employing computers is applied.
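As a minimal sketch of such a mathematical experiment (not a study reported in this paper), the Python code below puts the question about Welch’s t-test to generated numbers. Two samples of 30 are drawn repeatedly from the same skewed population, so the null hypothesis is true by construction, and the share of rejections estimates the actual Type I error rate. The exponential population, the 2,000 replications, the seed, and the series approximation to the t critical value (used only to stay within the standard library) are all assumptions made for illustration.

```python
import math
import random
import statistics

def t_critical(df, alpha=0.05):
    """Approximate two-sided critical value of Student's t.

    Uses a series expansion of the t quantile around the normal quantile
    in powers of 1/df; adequate for moderate df, not an exact computation.
    """
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    return z + g1 / df + g2 / df**2

def welch_rejects(a, b, alpha=0.05):
    """True if Welch's unequal-variance t-test rejects equality of means."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return abs(t) > t_critical(df, alpha)

def type1_error_rate(n=30, reps=2000, alpha=0.05, seed=1):
    """Share of replications in which a true null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Both samples come from the same non-normal (exponential) population.
        a = [rng.expovariate(1.0) for _ in range(n)]
        b = [rng.expovariate(1.0) for _ in range(n)]
        rejections += welch_rejects(a, b, alpha)
    return rejections / reps

if __name__ == "__main__":
    print(type1_error_rate())
```

The printed proportion is then compared with the nominal 0.05. No measurement scale, construct, or physical object enters the experiment at any point.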
In brief, I argue that the line between pure and applied mathematics is blurred. Conventional criteria for this demarcation are highly questionable; the subject matter (geometry or statistics) and the tool (pencil or computer) cannot establish the nature of mathematics (pure or applied). In the following I will discuss how mathematicians use Monte Carlo simulations, in support of my argument that mathematics is not purely logical but rather has empirical elements. Next, I will use an example of a robustness study to demonstrate how traditional claims on certain statistical theories are revised by findings in simulations.
With the advancement of high-powered computers, computer simulation is often employed by mathematicians and statisticians as a research methodology. This school is termed "experimental mathematics" and a journal entitled “Journal of Experimental Mathematics” is specifically devoted to this inquiry (Bailey & Borwein, 2001). Chaitin (1998), a supporter of experimental mathematics, asserted that it is a mistake to regard mathematical axioms as self-evident truths; rather the behaviors of numbers should be verified by computer-based experiments. It is important to differentiate the goal of controlled experiments in psychology, sociology, and engineering from that of experimental mathematics. In the former, the objective is to draw conclusions about mental constructs and physical objects, such as the treatment effectiveness of a counseling program or the efficiency of a microprocessor. In these inquiries, mathematical theories are the frame of reference for making inferences. But in the latter, the research question is directed to the mathematical theories themselves.
Both of them are considered “experiments” because conventional experimental criteria, such as random sampling, random assignment of group membership, manipulation of experimental variables, and control of non-experimental variables, are applied (Cook & Campbell, 1979). Interestingly enough, in terms of the degree of fulfillment of these experimental criteria, experimental mathematics has even more experimental elements than controlled experiments in the social sciences. Consider random sampling first. In the social sciences, it is difficult, if not impossible, to collect true random samples. Usually the sample obtained by social scientists is just a convenience sample. For example, a researcher at Arizona State University may recruit participants in the Greater Phoenix area, but he/she rarely obtains subjects from Los Angeles, New York, Dallas, etc., let alone Hong Kong, Beijing, or Seoul. In terms of controlling extraneous variables or conditions that might have an impact on dependent variables, again the social sciences face inherent limitations. Human subjects carry multiple dimensions such as personality, family background, religious beliefs, cultural context, etc. It is impossible for the experimenter to isolate or control all other sources of influence outside the experimental setting. On the other hand, computer-based experiments achieve random sampling by using a random number generator. It is argued that some random number generators are not truly random, but the technology has become more and more sophisticated. Actually, even a slightly flawed random number generator could yield a more random sample than one collected in the human community. Also, computer-based experimental mathematics does not suffer the problem of lacking experimental control, because numbers and equations do not have psychological, social, political, religious or cultural dimensions.
In brief, the preceding argument is to establish the notion that experimental mathematics is experimental in every traditional sense.
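How a random number generator supplies both random sampling and random assignment can be sketched in a few lines of Python. This is an illustration under assumed settings, not a prescribed procedure: the population of 10,000 unit-free numbers, the group size of 30, and the seed are arbitrary choices, and `draw_groups` is a hypothetical helper name.

```python
import random
import statistics

def draw_groups(seed, population_size=10_000, n_per_group=30):
    """Random sampling and random assignment driven by one seeded generator."""
    rng = random.Random(seed)  # fixed seed: the experiment can be rerun exactly
    # A population of plain numbers with no measurement unit attached.
    population = [rng.gauss(100, 15) for _ in range(population_size)]
    # Random sampling: every member has the same chance of selection.
    sample = rng.sample(population, 2 * n_per_group)
    # Random assignment: shuffle once more, then split into two groups.
    rng.shuffle(sample)
    return sample[:n_per_group], sample[n_per_group:]

if __name__ == "__main__":
    group_a, group_b = draw_groups(seed=2004)
    print(statistics.fmean(group_a), statistics.fmean(group_b))
```

Because the generator is seeded, another investigator can reproduce the identical sample and assignment. That degree of control is rarely available when participants are human beings.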
Monte Carlo Simulations and Robustness Study
Traditional parametric tests, such as the t-test and ANOVA, require certain parametric assumptions. Typical parametric assumptions are homogeneity of variances, which means the spread of the distribution in each group does not differ significantly from that in the others, and normality, which means the shape of the sample distribution is like a bell curve. Traditional statistical theories state that the t-test is robust against mild violations of these assumptions; the Satterthwaite t-test is even more resistant against assumption violations; and the F-test in ANOVA is also robust if the sample size is large (please note that in these theories the sample can be composed of observations from humans or a set of numbers without any measurement unit). The test of homogeneity of variance is one of the preliminary tests for examining whether assumption violations occur. Since conventional theories state that the preceding tests are robust, Box (1953) mocked the idea of testing the variances prior to applying an F-test: "To make a preliminary test on variances is rather like putting to sea in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to leave port!" (p.333).
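A robustness claim of this kind can be examined with a small Monte Carlo experiment. The Python sketch below is a hedged illustration rather than a reproduction of any published study: it applies the ordinary pooled-variance t-test to two normal samples with equal means, first under equal variances and then under a deliberately severe violation in which the smaller group has the larger spread. The sample sizes, standard deviations, number of replications, seed, and the series approximation to the t critical value are all assumptions.

```python
import math
import random
import statistics

def t_critical(df, alpha=0.05):
    """Approximate two-sided t critical value (series expansion in 1/df)."""
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    return z + g1 / df + g2 / df**2

def pooled_t_rejects(a, b, alpha=0.05):
    """True if the equal-variance (pooled) t-test rejects equality of means."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.fmean(a) - statistics.fmean(b)) / se
    return abs(t) > t_critical(na + nb - 2, alpha)

def rejection_rate(n_a, sd_a, n_b, sd_b, reps=2000, seed=1):
    """Empirical Type I error rate: both groups share the same mean (zero)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        a = [rng.gauss(0, sd_a) for _ in range(n_a)]
        b = [rng.gauss(0, sd_b) for _ in range(n_b)]
        hits += pooled_t_rejects(a, b)
    return hits / reps

if __name__ == "__main__":
    print("equal variances:  ", rejection_rate(25, 1, 25, 1))
    print("unequal variances:", rejection_rate(10, 4, 40, 1))
```

With equal variances one would expect a rejection rate close to the nominal 0.05. In the second condition the pooled test should reject a true null hypothesis far more often. Whether the test is "robust" is therefore something the experiment decides condition by condition.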
However, in recent years statisticians have been skeptical of the conventional theories. Different statisticians have proposed their own theories to counteract the problem of assumption violations (Yu, 2002). For instance,
(1) Some researchers construct non-parametric procedures to evade the problem of parametric test assumptions. As the name implies, non-parametric tests do not require parametric assumptions because interval data are converted to rank-ordered data. Examples of non-parametric tests are the Wilcoxon signed-rank test and the Mann-Whitney-Wilcoxon test. Some versions of the non-parametric approach are known as order statistics because of their focus on rank-ordered data. A typical example is Cliff’s statistics (Cliff, 1996).
(2) To address the violation problem, some statisticians introduce robust calculations such as trimmed means and Winsorized means. The trimmed mean approach excludes outliers in the two tails of the distribution, while the Winsorized mean method “pulls” extreme cases toward the center of the distribution. The Winsorized method is based upon Winsor's principle: All observed distributions are Gaussian in the middle. Other robust procedures, such as robust regression, assign differential weights to different observations. In the trimmed mean approach outliers are given a weight of zero, while robust regression may assign a “lighter” weight, say 0.5, to outliers. Cliff (1996), who endorsed order statistics, was skeptical of the differential weighting of robust procedures. He argued that data analysis should follow the principle of “one observation, one vote.” Mallows and Tukey (1982) also argued against Winsor's principle. In their view, since this approach pays too much attention to the very center of the distribution, it is highly misleading. Instead, Tukey (1986) strongly recommended using data re-expression procedures, which will be discussed next.
(3) In data re-expression, linear or non-linear equations are applied to the data. When the normality assumption is violated, the distribution could be normalized through re-expression. If the variances of two groups are unequal, certain transformation techniques can be used to stabilize the variances. In the case of non-linearity, this technique can be applied to linearize the data. However, Cliff (1996) argued that data transformation confines the conclusion to the arbitrary version of the variables.
(4) Resampling techniques such as the randomization exact test, the jackknife, and the bootstrap are proposed by some other statisticians as a countermeasure against parametric assumption violations (Diaconis & Efron, 1983; Edgington, 1995; Efron & Tibshirani, 1993; Ludbrook & Dudley, 1998). Robust procedures recognize the threat of parametric assumption violations and make adjustments to work around the problem. Data re-expression converts data in order to conform to the parametric assumptions. Resampling is very different from the above remedies, for it does not operate within the framework of theoretical distributions imposed by classical parametric procedures. For example, in bootstrapping, the sample is duplicated many times and treated as a virtual population. Then samples are drawn from this virtual population to construct an empirical sampling distribution. In short, the resampling school replaces theoretical distributions with empirical distributions. In reaction against resampling, Stephen E. Fienberg objected that "you're trying to get something for nothing. You use the same numbers over and over again until you get an answer that you can't get any other way. In order to do that, you have to assume something, and you may live to regret that hidden assumption later on" (cited in Peterson, 1991, p. 57).
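The robust estimators in (2) and the bootstrap in (4) can be illustrated with a short Python sketch. Everything specific in it is an assumption made for illustration: the function names, the 20% trimming proportion, the 2,000 replications, the simple percentile interval, and the made-up sample with one extreme score. Drawing with replacement from the sample is used here as the usual shortcut for drawing from the “virtual population” of duplicated observations.

```python
import random
import statistics

def trimmed_mean(data, prop=0.2):
    """Mean after dropping the lowest and highest `prop` of the scores."""
    xs = sorted(data)
    k = int(len(xs) * prop)
    return statistics.fmean(xs[k:len(xs) - k])

def winsorized_mean(data, prop=0.2):
    """Mean after pulling the extreme scores in to the nearest kept score."""
    xs = sorted(data)
    k = int(len(xs) * prop)
    if k:
        xs[:k] = [xs[k]] * k            # raise the lowest k scores
        xs[-k:] = [xs[-k - 1]] * k      # lower the highest k scores
    return statistics.fmean(xs)

def bootstrap(data, statistic, reps=2000, seed=0):
    """Empirical sampling distribution of `statistic` by resampling."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(reps)]

def percentile_interval(values, level=0.95):
    """Rough percentile interval read off the sorted bootstrap values."""
    xs = sorted(values)
    lo = xs[int((1 - level) / 2 * len(xs))]
    hi = xs[int((1 + level) / 2 * len(xs)) - 1]
    return lo, hi

sample = [6, 8, 11, 13, 15, 90]         # hypothetical scores, one outlier
print(statistics.fmean(sample), trimmed_mean(sample), winsorized_mean(sample))
print(percentile_interval(bootstrap(sample, trimmed_mean)))
```

The ordinary mean is dominated by the single extreme score, whereas the trimmed and Winsorized means stay near the bulk of the data. The bootstrap interval is built from the sample alone, with no appeal to a theoretical distribution, which is exactly the feature that Fienberg's objection targets.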
It is obvious that statisticians such as Winsor, Tukey, Cliff, and Fienberg do not agree with each other on the issues of assumption violation and robustness. If different mathematical systems were self-contained, as Russell and Whitehead suggested, and if mathematics were reducible to logic based on different conventions, as Carnap maintained, these disputes would never come to a conclusive closure. Within the system of Winsor’s school, the Gaussian distribution is the ideal and all other associated theorems tend to support Winsor’s principle. Within Tukey’s convention, the logic of re-expression fits well with the notions of distribution normalization, variance stabilization, and trend linearization.
It is important to note that these disputes are not about how well those statistical theories could be applied to particular subject matters such as psychology and physics. Rather, these statistical questions could be asked without reference to measurement, and this is the core argument of the school of data re-expression. For example, researchers who argue against data re-expression complain that it would be absurd to obtain a measurement of people’s IQ and then transform the data like [new variable = 1/(square root of IQ)]. They argue that we could conclude that the average IQ of the Chinese people in Phoenix is significantly higher than that of the Japanese, but it makes no sense to say anything about the difference in terms of 1/(square root of IQ). However, researchers supporting data re-expression argue that the so-called IQ is just a way of obtaining certain numbers, just like using meters or feet to express height. Numbers can be manipulated in their own right without being mapped onto physical measurement units. In a sense non-parametric statistics and order statistics are forms of data re-expression. For example, when we obtain a vector of scores such as [15, 13, 11, 8, 6], we can replace the scores with their ranks, [1, 2, 3, 4, 5]. This “transformation” no doubt alters the measurement and, indeed, loses the precision of the original measurement. Nevertheless, these examples demonstrate that statistical questions can be studied regardless of the measurement units, or even without any measurements. Monte Carlo simulation is a typical example of studying statistics without measurement.
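The rank example above can be reproduced in a few lines of Python. The helper `descending_ranks` is a hypothetical name introduced for illustration, and it assumes there are no tied scores.

```python
def descending_ranks(scores):
    """Rank scores from highest (rank 1) to lowest; assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

scores = [15, 13, 11, 8, 6]
print(descending_ranks(scores))                      # ranks of the raw scores
print(descending_ranks([2.54 * s for s in scores]))  # same scores, other unit
```

Both calls yield the same ranks, because a change of unit alters the numbers but not their order. This is the sense in which rank-based procedures operate on numbers independently of the measurement scale, at the cost of discarding the distances between scores.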
As noted in the discussion of bootstrapping, statisticians do not even need fresh empirical data obtained from observations to conduct a test; they could “duplicate” data by manipulating existing observations. In bootstrapping, number generation is still based on empirical observations, whereas in Monte Carlo simulations all numbers can be generated entirely by computer. In recent years, robustness studies using Monte Carlo simulations have been widely employed to evaluate the soundness of mathematical procedures in terms of their departure from idealization and their robustness against assumption violations. In Monte Carlo simulations, mathematicians make up strange data (e.g. extremely unequal variances, non-normality) to observe how robust those procedures are against the violations. Box is right that we cannot row a boat to test the conditions for an ocean liner. But using computers to simulate millions of cases under hundreds of scenarios is really the other way around: now we are testing the weather conditions with an ocean liner to tell us whether rowing a boat is safe. Through computer simulations we learn that traditional claims concerning the robustness of certain procedures are either invalid or require additional constraints.
There are numerous Monte Carlo studies in the field of statistics. A recent thorough Monte Carlo study (Thompson, Green, Stockford, Chen, & Lo, 2002; Stockford, Thompson, Lo, Chen, Green, & Yu, 2001) demonstrates how experimental mathematics could refute, or at least challenge, the conventional claims in statistical theories. This study investigated the Type I error rate and statistical power of various statistical procedures. The Type I error rate is the probability of rejecting a null hypothesis that is true, whereas statistical power is the probability of rejecting a null hypothesis that is false. The statistical procedures under investigation included the conventional independent-samples t-test, the Satterthwaite independent-samples t-test, the Mann-Whitney-Wilcoxon test (non-parametric test), the test for the difference in trimmed means (robust procedure), and the bootstrap test of the difference in trimmed means (resampling and robust methods). Four factors were manipulated to create 180 conditions: form of the population distribution, variance heterogeneity, sample size, and mean differences. Manipulation of these factors was entirely under the control of the experimenters. No other non-experimental factors could sneak into the computer and affect the conditions. The researchers concluded that the conventional t-test, the Satterthwaite t-test, and the Mann-Whitney-Wilcoxon test produce either poor Type I error rates or loss of power when the assumptions underlying them are violated. The test of trimmed means and the bootstrap test appear to have fewer difficulties across the range of conditions evaluated. This experimental study indicates that the robustness claims made for two versions of the t-test and one of the non-parametric procedures are invalid. On the other hand, one of the robust methods and one of the resampling methods are shown to be robust.
Although the scope of this study is limited to one procedure from each statistical school, the same approach can be applied to various versions of parametric tests, non-parametric tests, robust procedures, data re-expression methods, and resampling.
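The logic of such a robustness study can be conveyed by a much smaller simulation. The Python sketch below is not the design of the study cited above; it is a toy experiment under stated assumptions: normal populations with equal means, group sizes of 5 and 25, standard deviations of 4 and 1 in the heterogeneous condition, 4,000 replications, and a fixed critical value of 2.048 (the two-sided .05 critical value of t with 28 degrees of freedom, which must be changed if the group sizes change). All function names are hypothetical.

```python
import math
import random
import statistics

def pooled_t(x, y):
    """Conventional independent-samples t statistic (pooled variance)."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x)
           + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    diff = statistics.fmean(x) - statistics.fmean(y)
    return diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))

def type1_rate(sd1, sd2, n1=5, n2=25, reps=4000, crit=2.048, seed=1):
    """Share of replications in which a true null hypothesis is rejected.

    crit=2.048 assumes n1 + n2 - 2 = 28 degrees of freedom.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Both populations have mean 0, so every rejection is a Type I error.
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if abs(pooled_t(x, y)) > crit:
            rejections += 1
    return rejections / reps

print("equal variances:  ", type1_rate(1.0, 1.0))
print("unequal variances:", type1_rate(4.0, 1.0))
```

With equal variances the empirical rejection rate should stay close to the nominal .05 level; when the smaller group has the larger variance, it should rise well above that level. The exact figures depend on the seed and the number of replications, which is why none are quoted here.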
The above findings were not achieved by the methods suggested by Russell and Carnap, such as the study of logical relationships, truth by definition, or truth by convention. Rather, the claims result from experimental study. When Quine introduced his philosophical theory of logic and mathematics, computer technology and the Monte Carlo method were not yet widely available. Nonetheless, his insight is highly compatible with recent developments in experimental mathematics. I strongly believe that if researchers put aside the analytic-synthetic distinction by adopting Quine’s moderate holistic view of scientific inquiry, many disputes could come to a conclusive closure. Indeed, a holistic approach has been beneficial to mathematical inquiry. Although R. A. Fisher was a statistician, he was also versed in biology and agricultural science, and indeed most of his theorems were derived from such empirical fields. Winsor’s principle is based on the Gaussian distribution, but Gauss discovered the Gaussian distribution through astronomical observations. Survival analysis, or the hazard model, is the fruit of medical and sociological research. As discussed before, Embretson and Reise, as psychologists, used the psychometric approach to revise traditional measurement theories. The example of the robustness study demonstrates how social scientists have employed Monte Carlo studies to challenge traditional claims in mathematics. As Quine’s holism proposed, logic, mathematics, observation, and a web of scientific theories are strongly linked to each other.
References
Ayer, A. J. (1936). ‘The Principle of Verifiability’. Mind (New Series), 45, 199-203.
Ayer, A. J. (1946). Language, Truth, and Logic (2nd ed). London: V. Gollancz.
Bailey, D. H., & Borwein, J. M. (2001). ‘Experimental Mathematics: Recent Developments And Future Outlook’. In B. Engquist, & W. Schmid, (Eds.). Mathematics Unlimited: 2001 And Beyond (pp. 51-65). New York: Springer.
Box, G. E. P. (1953). Non-Normality And Tests On Variances. Biometrika, 40, 318-335.
Carnap, R. (1956). Meaning And Necessity: A Study In Semantics And Modal Logic. Chicago, IL: University of Chicago Press.
Carnap, R. (1971). ‘Foundations of Logic And Mathematics’. In O. Neurath, R. Carnap, & C. Morris, (Eds.). Foundations of The Unity of Science, Toward An International Encyclopedia of Unified Science (pp. 139-212). Chicago: University of Chicago Press.
Chaitin, G. J. (1998). The Limits Of Mathematics: A Course On Information Theory And The Limits Of Formal Reasoning. Singapore: Springer-Verlag.
Cliff, N. (1996). Ordinal Methods For Behavioral Data Analysis. Mahwah, NJ: Erlbaum.
Cook, T. D., & Campbell, D. T. (1979). Quasi-Experimentation: Design And Analysis Issues For Field Settings. Boston, MA: Houghton Mifflin Company.
Creath, R. (1990a). Carnap, Quine, and The Rejection of Intuition. In Robert B. Barrett & Roger F. Gibson (Eds.), Perspectives on Quine (pp. 55-66). Cambridge, MA: Basil Blackwell.
Creath, R. (Ed.). (1990b). Dear Carnap, Dear Van: The Quine-Carnap Correspondence and Related Work. Berkeley, CA: University of California Press.
Creath, R. (1991). Every Dogma Has Its Day. Erkenntnis, 35, 347-389.
Diaconis, P., & Efron, B. (1983). Computer-Intensive Methods in Statistics. Scientific American, May, 116-130.
Edgington, E. S. (1995). Randomization Tests. New York: M. Dekker.
Efron, B., & Tibshirani, R. J. (1993). An Introduction to The Bootstrap. New York: Chapman & Hall.
Eliason, S. R. (1993). Maximum Likelihood Estimation: Logic and Practice. Newbury Park: Sage.
Embretson, S. E., & Reise, S. (2000). Item Response Theory For Psychologists. Mahwah, NJ: LEA.
Frege, G. (1884/1960). The Foundations of Arithmetic (2nd ed.). New York: Harper.
Friedman, M. (2002). Kant, Kuhn, and The Rationality Of Science. Philosophy of Science, 69, 171-190.
Gödel, K. (1944). ‘Russell’s Mathematical Logic’. In Paul A. Schilpp, (Ed.). The Philosophy of Bertrand Russell (pp.125-153). Chicago: Northwestern University.
Gödel, K. (1961). Collected Works, Volume III. Oxford: Oxford University Press.
Hersh, R. (1997). What is Mathematics, Really? Oxford: Oxford University Press.
Howie, D. (2002). Interpreting Probability: Controversies and Developments In The Early Twentieth Century. Cambridge, UK: Cambridge University Press.
Isaacson, D. (2000). ‘Carnap, Quine, and Logical Truth’. In D. Follesdal (Ed.). Philosophy Of Quine: General, Reviews, and Analytic/Synthetic (pp. 360-391). New York: Garland Publishing.
Jaffe, A., & Quinn, F. (1993). “Theoretical Mathematics”: Toward A Cultural Synthesis of Mathematics And Theoretical Physics. Bulletin of the American Mathematical Society, 29, 1-13.
Lindstrom, P. (2000). ‘Quasi-Realism in Mathematics’. Monist, 83, 122-149.
Ludbrook, J., & Dudley, H. (1998). ‘Why Permutation Tests Are Superior To T And F Tests In Biomedical Research’. American Statistician, 52, 127-132.
Mallows, C. L., & Tukey, J. W. (1982). ‘An Overview of Techniques of Data Analysis, Emphasizing Its Exploratory Aspects’. In J. T. de Oliveira & B. Epstein (Eds.), Some Recent Advances in Statistics (pp. 111-172). London: Academic Press.
Peterson, I. (July 27, 1991). ‘Pick a Sample’. Science News, 140, 56-58.
Putnam, H. (1995). ‘Mathematical Necessity Reconsidered’. In P. Leonardi & M. Santambrogio, (Eds.). On Quine: New Essays (pp. 267-282). Cambridge: Cambridge University Press.
Pyle, A. (ed.). (1999). Key Philosophers in Conversation: The Cogito Interviews. New York: Routledge.
Quine, W. V. (1951). Two Dogmas of Empiricism. Philosophical Review, 60, 20-43.
Quine, W. V. (1957). ‘The Scope and Language of Science’. British Journal for the Philosophy of Science, 8, 1-17.
Quine, W. V. (1966/1976). The Ways of Paradox, and Other Essays. Cambridge, MA: Harvard University Press.
Quine, W. V. (1990/1992) Pursuit of Truth (2nd ed.). Cambridge, MA: Harvard University Press.
Russell, B. (1919). Introduction to Mathematical Philosophy. London: Allen & Unwin.
Schlick, M. (1959). ‘Positivism and Realism’. In A. J. Ayer (Ed.), Logical Positivism (pp. 82-107). New York: Free Press.
Stockford, S., Thompson, M., Lo, W. J., Chen, Y. H., Green, S., & Yu, C. H. (2001 October). ‘Confronting The Statistical Assumptions: New Alternatives For Comparing Groups’. Paper presented at the Annual Meeting of the Arizona Educational Research Organization, Tempe, AZ.
Thompson, M. S., Green, S. B., Stockford, S. M., Chen, Y., & Lo, W. (2002 April). ‘The .05 level: The probability that the independent-samples t test should be applied?’ Paper presented at the Annual Meeting of the American Educational Research Association, New Orleans, LA.
Tieszen, R. (1992). ‘Kurt Gödel and Phenomenology’. Philosophy of Science, 59, 176-194.
Tukey, J. W. (1986). The Collected Works of John W. Tukey (Volume IV): Philosophy and Principles Of Data Analysis 1965-1986. Monterey, CA: Wadsworth & Brooks/Cole.
Wang, H. (1986). Beyond Analytic Philosophy: Doing Justice To What We Know. Cambridge, MA: MIT Press.
Werkmeister, W. H. (1937). ‘Seven Theses Of Logical Positivism Critically Examined Part I’. The Philosophical Review, 46, 276-297.
Whitehead, A. N., & Russell, B. (1910/1950). Principia Mathematica (2nd ed.). Cambridge, UK: Cambridge University Press.
Yu, C. H. (2002). ‘An Overview Of Remedial Tools For Violations Of Parametric Test Assumptions in The SAS System.’ Proceedings of 2002 Western Users of SAS Software Conference, 172-178.
Copyright © 2004 Minerva
All rights are reserved, but fair and good faith use with full attribution may be made of this work for educational or scholarly purposes.
Chong Ho Yu has a Ph.D. in Measurement, Statistics, and Methodological Studies from Arizona State University, USA. Currently he is pursuing a second doctorate in Philosophy at the same institution. He is also a Psychometrician at Cisco Systems/Aries Technology, USA. His research interests include the philosophical foundations of research methodology and the relationship between science and religion.