Nonparametric Statistical Inference Fourth Edition, Revised and Expanded Jean Dickinson Gibbons Subhabrata Chakraborti
Nonparametric Statistical Inference Fourth Edition, Revised and Expanded
Jean Dickinson Gibbons
Subhabrata Chakraborti
The University of Alabama, Tuscaloosa, Alabama, U.S.A.
MARCEL DEKKER, INC.
NEW YORK • BASEL
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.
ISBN: 0-8247-4052-1

This book is printed on acid-free paper.

Headquarters
Marcel Dekker, Inc.
270 Madison Avenue, New York, NY 10016
tel: 212-696-9000; fax: 212-685-4540

Eastern Hemisphere Distribution
Marcel Dekker AG
Hutgasse 4, Postfach 812, CH-4001 Basel, Switzerland
tel: 41-61-260-6300; fax: 41-61-260-6333

World Wide Web
http://www.dekker.com

The publisher offers discounts on this book when ordered in bulk quantities. For more information, write to Special Sales/Professional Marketing at the headquarters address above.

Copyright © 2003 by Marcel Dekker, Inc. All Rights Reserved.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without permission in writing from the publisher.

Current printing (last digit): 10 9 8 7 6 5 4 3 2 1

PRINTED IN THE UNITED STATES OF AMERICA
STATISTICS: Textbooks and Monographs
D. B. Owen, Founding Editor, 1972-1991

Associate Editors
Statistical Computing/Nonparametric Statistics: Professor William R. Schucany, Southern Methodist University
Multivariate Analysis: Professor Anant M. Kshirsagar, University of Michigan
Probability: Professor Marcel F. Neuts, University of Arizona
Quality Control/Reliability: Professor Edward G. Schilling, Rochester Institute of Technology

Editorial Board
Applied Probability: Dr. Paul R. Garvey, The MITRE Corporation
Statistical Distributions: Professor N. Balakrishnan, McMaster University
Economic Statistics: Professor David E. A. Giles, University of Victoria
Statistical Process Improvement: Professor G. Geoffrey Vining, Virginia Polytechnic Institute
Experimental Designs: Mr. Thomas B. Barker, Rochester Institute of Technology
Stochastic Processes: Professor V. Lakshmikantham, Florida Institute of Technology
Multivariate Analysis: Professor Subir Ghosh, University of California-Riverside
Survey Sampling: Professor Lynne Stokes, Southern Methodist University
Time Series: Sastry G. Pantula, North Carolina State University
1. The Generalized Jackknife Statistic, H. L. Gray and W. R. Schucany
2. Multivariate Analysis, Anant M. Kshirsagar
3. Statistics and Society, Walter T. Federer
4. Multivariate Analysis: A Selected and Abstracted Bibliography, 1957-1972, Kocherlakota Subrahmaniam and Kathleen Subrahmaniam
5. Design of Experiments: A Realistic Approach, Virgil L. Anderson and Robert A. McLean
6. Statistical and Mathematical Aspects of Pollution Problems, John W. Pratt
7. Introduction to Probability and Statistics (in two parts), Part I: Probability; Part II: Statistics, Narayan C. Giri
8. Statistical Theory of the Analysis of Experimental Designs, J. Ogawa
9. Statistical Techniques in Simulation (in two parts), Jack P. C. Kleijnen
10. Data Quality Control and Editing, Joseph I. Naus
11. Cost of Living Index Numbers: Practice, Precision, and Theory, Kali S. Banerjee
12. Weighing Designs: For Chemistry, Medicine, Economics, Operations Research, Statistics, Kali S. Banerjee
13. The Search for Oil: Some Statistical Methods and Techniques, edited by D. B. Owen
14. Sample Size Choice: Charts for Experiments with Linear Models, Robert E. Odeh and Martin Fox
15. Statistical Methods for Engineers and Scientists, Robert M. Bethea, Benjamin S. Duran, and Thomas L. Boullion
16. Statistical Quality Control Methods, Irving W. Burr
17. On the History of Statistics and Probability, edited by D. B. Owen
18. Econometrics, Peter Schmidt
19. Sufficient Statistics: Selected Contributions, Vasant S. Huzurbazar (edited by Anant M. Kshirsagar)
20. Handbook of Statistical Distributions, Jagdish K. Patel, C. H. Kapadia, and D. B. Owen
21. Case Studies in Sample Design, A. C. Rosander
22. Pocket Book of Statistical Tables, compiled by R. E. Odeh, D. B. Owen, Z. W. Birnbaum, and L. Fisher
23. The Information in Contingency Tables, D. V. Gokhale and Solomon Kullback
24. Statistical Analysis of Reliability and Life-Testing Models: Theory and Methods, Lee J. Bain
25. Elementary Statistical Quality Control, Irving W. Burr
26. An Introduction to Probability and Statistics Using BASIC, Richard A. Groeneveld
27. Basic Applied Statistics, B. L. Raktoe and J. J. Hubert
28. A Primer in Probability, Kathleen Subrahmaniam
29. Random Processes: A First Look, R. Syski
30. Regression Methods: A Tool for Data Analysis, Rudolf J. Freund and Paul D. Minton
31. Randomization Tests, Eugene S. Edgington
32. Tables for Normal Tolerance Limits, Sampling Plans and Screening, Robert E. Odeh and D. B. Owen
33. Statistical Computing, William J. Kennedy, Jr., and James E. Gentle
34. Regression Analysis and Its Application: A Data-Oriented Approach, Richard F. Gunst and Robert L. Mason
35. Scientific Strategies to Save Your Life, I. D. J. Bross
36. Statistics in the Pharmaceutical Industry, edited by C. Ralph Buncher and Jia-Yeong Tsay
37. Sampling from a Finite Population, J. Hajek
38. Statistical Modeling Techniques, S. S. Shapiro and A. J. Gross
39. Statistical Theory and Inference in Research, T. A. Bancroft and C. P. Han
40. Handbook of the Normal Distribution, Jagdish K. Patel and Campbell B. Read
41. Recent Advances in Regression Methods, Hrishikesh D. Vinod and Aman Ullah
42. Acceptance Sampling in Quality Control, Edward G. Schilling
43. The Randomized Clinical Trial and Therapeutic Decisions, edited by Niels Tygstrup, John M. Lachin, and Erik Juhl
44. Regression Analysis of Survival Data in Cancer Chemotherapy, Walter H. Carter, Jr., Galen L. Wampler, and Donald M. Stablein
45. A Course in Linear Models, Anant M. Kshirsagar
46. Clinical Trials: Issues and Approaches, edited by Stanley H. Shapiro and Thomas H. Louis
47. Statistical Analysis of DNA Sequence Data, edited by B. S. Weir
48. Nonlinear Regression Modeling: A Unified Practical Approach, David A. Ratkowsky
49. Attribute Sampling Plans, Tables of Tests and Confidence Limits for Proportions, Robert E. Odeh and D. B. Owen
50. Experimental Design, Statistical Models, and Genetic Statistics, edited by Klaus Hinkelmann
51. Statistical Methods for Cancer Studies, edited by Richard G. Cornell
52. Practical Statistical Sampling for Auditors, Arthur J. Wilburn
53. Statistical Methods for Cancer Studies, edited by Edward J. Wegman and James G. Smith
54. Self-Organizing Methods in Modeling: GMDH Type Algorithms, edited by Stanley J. Farlow
55. Applied Factorial and Fractional Designs, Robert A. McLean and Virgil L. Anderson
56. Design of Experiments: Ranking and Selection, edited by Thomas J. Santner and Ajit C. Tamhane
57. Statistical Methods for Engineers and Scientists: Second Edition, Revised and Expanded, Robert M. Bethea, Benjamin S. Duran, and Thomas L. Boullion
58. Ensemble Modeling: Inference from Small-Scale Properties to Large-Scale Systems, Alan E. Gelfand and Crayton C. Walker
59. Computer Modeling for Business and Industry, Bruce L. Bowerman and Richard T. O'Connell
60. Bayesian Analysis of Linear Models, Lyle D. Broemeling
61. Methodological Issues for Health Care Surveys, Brenda Cox and Steven Cohen
62. Applied Regression Analysis and Experimental Design, Richard J. Brook and Gregory C. Arnold
63. Statpal: A Statistical Package for Microcomputers—PC-DOS Version for the IBM PC and Compatibles, Bruce J. Chalmer and David G. Whitmore
64. Statpal: A Statistical Package for Microcomputers—Apple Version for the II, II+, and IIe, David G. Whitmore and Bruce J. Chalmer
65. Nonparametric Statistical Inference: Second Edition, Revised and Expanded, Jean Dickinson Gibbons
66. Design and Analysis of Experiments, Roger G. Petersen
67. Statistical Methods for Pharmaceutical Research Planning, Sten W. Bergman and John C. Gittins
68. Goodness-of-Fit Techniques, edited by Ralph B. D'Agostino and Michael A. Stephens
69. Statistical Methods in Discrimination Litigation, edited by D. H. Kaye and Mikel Aickin
70. Truncated and Censored Samples from Normal Populations, Helmut Schneider
71. Robust Inference, M. L. Tiku, W. Y. Tan, and N. Balakrishnan
72. Statistical Image Processing and Graphics, edited by Edward J. Wegman and Douglas J. DePriest
73. Assignment Methods in Combinatorial Data Analysis, Lawrence J. Hubert
74. Econometrics and Structural Change, Lyle D. Broemeling and Hiroki Tsurumi
75. Multivariate Interpretation of Clinical Laboratory Data, Adelin Albert and Eugene K. Harris
76. Statistical Tools for Simulation Practitioners, Jack P. C. Kleijnen
77. Randomization Tests: Second Edition, Eugene S. Edgington
78. A Folio of Distributions: A Collection of Theoretical Quantile-Quantile Plots, Edward B. Fowlkes
79. Applied Categorical Data Analysis, Daniel H. Freeman, Jr.
80. Seemingly Unrelated Regression Equations Models: Estimation and Inference, Virendra K. Srivastava and David E. A. Giles
81. Response Surfaces: Designs and Analyses, Andre I. Khuri and John A. Cornell
82. Nonlinear Parameter Estimation: An Integrated System in BASIC, John C. Nash and Mary Walker-Smith
83. Cancer Modeling, edited by James R. Thompson and Barry W. Brown
84. Mixture Models: Inference and Applications to Clustering, Geoffrey J. McLachlan and Kaye E. Basford
85. Randomized Response: Theory and Techniques, Arijit Chaudhuri and Rahul Mukerjee
86. Biopharmaceutical Statistics for Drug Development, edited by Karl E. Peace
87. Parts per Million Values for Estimating Quality Levels, Robert E. Odeh and D. B. Owen
88. Lognormal Distributions: Theory and Applications, edited by Edwin L. Crow and Kunio Shimizu
89. Properties of Estimators for the Gamma Distribution, K. O. Bowman and L. R. Shenton
90. Spline Smoothing and Nonparametric Regression, Randall L. Eubank
91. Linear Least Squares Computations, R. W. Farebrother
92. Exploring Statistics, Damaraju Raghavarao
93. Applied Time Series Analysis for Business and Economic Forecasting, Sufi M. Nazem
94. Bayesian Analysis of Time Series and Dynamic Models, edited by James C. Spall
95. The Inverse Gaussian Distribution: Theory, Methodology, and Applications, Raj S. Chhikara and J. Leroy Folks
96. Parameter Estimation in Reliability and Life Span Models, A. Clifford Cohen and Betty Jones Whitten
97. Pooled Cross-Sectional and Time Series Data Analysis, Terry E. Dielman
98. Random Processes: A First Look, Second Edition, Revised and Expanded, R. Syski
99. Generalized Poisson Distributions: Properties and Applications, P. C. Consul
100. Nonlinear Lp-Norm Estimation, Rene Gonin and Arthur H. Money
101. Model Discrimination for Nonlinear Regression Models, Dale S. Borowiak
102. Applied Regression Analysis in Econometrics, Howard E. Doran
103. Continued Fractions in Statistical Applications, K. O. Bowman and L. R. Shenton
104. Statistical Methodology in the Pharmaceutical Sciences, Donald A. Berry
105. Experimental Design in Biotechnology, Perry D. Haaland
106. Statistical Issues in Drug Research and Development, edited by Karl E. Peace
107. Handbook of Nonlinear Regression Models, David A. Ratkowsky
108. Robust Regression: Analysis and Applications, edited by Kenneth D. Lawrence and Jeffrey L. Arthur
109. Statistical Design and Analysis of Industrial Experiments, edited by Subir Ghosh
110. U-Statistics: Theory and Practice, A. J. Lee
111. A Primer in Probability: Second Edition, Revised and Expanded, Kathleen Subrahmaniam
112. Data Quality Control: Theory and Pragmatics, edited by Gunar E. Liepins and V. R. R. Uppuluri
113. Engineering Quality by Design: Interpreting the Taguchi Approach, Thomas B. Barker
114. Survivorship Analysis for Clinical Studies, Eugene K. Harris and Adelin Albert
115. Statistical Analysis of Reliability and Life-Testing Models: Second Edition, Lee J. Bain and Max Engelhardt
116. Stochastic Models of Carcinogenesis, Wai-Yuan Tan
117. Statistics and Society: Data Collection and Interpretation, Second Edition, Revised and Expanded, Walter T. Federer
118. Handbook of Sequential Analysis, B. K. Ghosh and P. K. Sen
119. Truncated and Censored Samples: Theory and Applications, A. Clifford Cohen
120. Survey Sampling Principles, E. K. Foreman
121. Applied Engineering Statistics, Robert M. Bethea and R. Russell Rhinehart
122. Sample Size Choice: Charts for Experiments with Linear Models: Second Edition, Robert E. Odeh and Martin Fox
123. Handbook of the Logistic Distribution, edited by N. Balakrishnan
124. Fundamentals of Biostatistical Inference, Chap T. Le
125. Correspondence Analysis Handbook, J.-P. Benzecri
126. Quadratic Forms in Random Variables: Theory and Applications, A. M. Mathai and Serge B. Provost
127. Confidence Intervals on Variance Components, Richard K. Burdick and Franklin A. Graybill
128. Biopharmaceutical Sequential Statistical Applications, edited by Karl E. Peace
129. Item Response Theory: Parameter Estimation Techniques, Frank B. Baker
130. Survey Sampling: Theory and Methods, Arijit Chaudhuri and Horst Stenger
131. Nonparametric Statistical Inference: Third Edition, Revised and Expanded, Jean Dickinson Gibbons and Subhabrata Chakraborti
132. Bivariate Discrete Distribution, Subrahmaniam Kocherlakota and Kathleen Kocherlakota
133. Design and Analysis of Bioavailability and Bioequivalence Studies, Shein-Chung Chow and Jen-pei Liu
134. Multiple Comparisons, Selection, and Applications in Biometry, edited by Fred M. Hoppe
135. Cross-Over Experiments: Design, Analysis, and Application, David A. Ratkowsky, Marc A. Evans, and J. Richard Alldredge
136. Introduction to Probability and Statistics: Second Edition, Revised and Expanded, Narayan C. Giri
137. Applied Analysis of Variance in Behavioral Science, edited by Lynne K. Edwards
138. Drug Safety Assessment in Clinical Trials, edited by Gene S. Gilbert
139. Design of Experiments: A No-Name Approach, Thomas J. Lorenzen and Virgil L. Anderson
140. Statistics in the Pharmaceutical Industry: Second Edition, Revised and Expanded, edited by C. Ralph Buncher and Jia-Yeong Tsay
141. Advanced Linear Models: Theory and Applications, Song-Gui Wang and Shein-Chung Chow
142. Multistage Selection and Ranking Procedures: Second-Order Asymptotics, Nitis Mukhopadhyay and Tumulesh K. S. Solanky
143. Statistical Design and Analysis in Pharmaceutical Science: Validation, Process Controls, and Stability, Shein-Chung Chow and Jen-pei Liu
144. Statistical Methods for Engineers and Scientists: Third Edition, Revised and Expanded, Robert M. Bethea, Benjamin S. Duran, and Thomas L. Boullion
145. Growth Curves, Anant M. Kshirsagar and William Boyce Smith
146. Statistical Bases of Reference Values in Laboratory Medicine, Eugene K. Harris and James C. Boyd
147. Randomization Tests: Third Edition, Revised and Expanded, Eugene S. Edgington
148. Practical Sampling Techniques: Second Edition, Revised and Expanded, Ranjan K. Som
149. Multivariate Statistical Analysis, Narayan C. Giri
150. Handbook of the Normal Distribution: Second Edition, Revised and Expanded, Jagdish K. Patel and Campbell B. Read
151. Bayesian Biostatistics, edited by Donald A. Berry and Dalene K. Stangl
152. Response Surfaces: Designs and Analyses, Second Edition, Revised and Expanded, Andre I. Khuri and John A. Cornell
153. Statistics of Quality, edited by Subir Ghosh, William R. Schucany, and William B. Smith
154. Linear and Nonlinear Models for the Analysis of Repeated Measurements, Edward F. Vonesh and Vernon M. Chinchilli
155. Handbook of Applied Economic Statistics, Aman Ullah and David E. A. Giles
156. Improving Efficiency by Shrinkage: The James-Stein and Ridge Regression Estimators, Marvin H. J. Gruber
157. Nonparametric Regression and Spline Smoothing: Second Edition, Randall L. Eubank
158. Asymptotics, Nonparametrics, and Time Series, edited by Subir Ghosh
159. Multivariate Analysis, Design of Experiments, and Survey Sampling, edited by Subir Ghosh
160. Statistical Process Monitoring and Control, edited by Sung H. Park and G. Geoffrey Vining
161. Statistics for the 21st Century: Methodologies for Applications of the Future, edited by C. R. Rao and Gabor J. Szekely
162. Probability and Statistical Inference, Nitis Mukhopadhyay
163. Handbook of Stochastic Analysis and Applications, edited by D. Kannan and V. Lakshmikantham
164. Testing for Normality, Henry C. Thode, Jr.
165. Handbook of Applied Econometrics and Statistical Inference, edited by Aman Ullah, Alan T. K. Wan, and Anoop Chaturvedi
166. Visualizing Statistical Models and Concepts, R. W. Farebrother
167. Financial and Actuarial Statistics: An Introduction, Dale S. Borowiak
168. Nonparametric Statistical Inference: Fourth Edition, Revised and Expanded, Jean Dickinson Gibbons and Subhabrata Chakraborti
169. Computer-Aided Econometrics, edited by David E. A. Giles
Additional Volumes in Preparation
The EM Algorithm and Related Statistical Models, edited by Michiko Watanabe and Kazunori Yamaguchi
Multivariate Statistical Analysis, Narayan C. Giri
To the memory of my parents, John and Alice,
And to my husband, John S. Fielden
J.D.G.

To my parents, Himangshu and Pratima,
And to my wife Anuradha, and son, Siddhartha Neil
S.C.
Preface to the Fourth Edition
This book was first published in 1971 and last revised in 1992. During the span of over 30 years, it seems fair to say that the book has made a meaningful contribution to the teaching and learning of nonparametric statistics. We have been gratified by the interest and the comments from our readers, reviewers, and users. These comments and our own experiences have resulted in many corrections, improvements, and additions.

We have two main goals in this revision: We want to bring the material covered in this book into the 21st century, and we want to make the material more user friendly.

With respect to the first goal, we have added new material on quantiles, the calculation of exact power and simulated power, sample size determination, other goodness-of-fit tests, and multiple comparisons. These additions will be discussed in more detail later. We have added and modified examples and included exact solutions done by hand and modern computer solutions using MINITAB,* SAS, STATXACT, and SPSS. We have removed most of the computer solutions to previous examples using BMDP, SPSS-X, Execustat, or IMSL, because they seem redundant and take up too much valuable space. We have added a number of new references but have made no attempt to make the references comprehensive on some current minor refinements of the procedures covered. Given the sheer volume of the literature, preparing a comprehensive list of references on the subject of nonparametric statistics would truly be a challenging task. We apologize to the authors whose contributions could not be included in this edition.

* MINITAB is a trademark of Minitab Inc. in the United States and other countries and is used herein with permission of the owner (on the Web at www.minitab.com).

With respect to our second goal, we have completely revised a number of sections and reorganized some of the materials, more fully integrated the applications with the theory, given tabular guides for applications of tests and confidence intervals, both exact and approximate, placed more emphasis on reporting results using P values, added some new problems, added many new figures and titled all figures and tables, supplied answers to almost all the problems, increased the number of numerical examples with solutions, and written concise but detailed summaries for each chapter. We think the problem answers should be a major plus, something many readers have requested over the years. We have also tried to correct errors and inaccuracies from previous editions.

In Chapter 1, we have added Chebyshev's inequality, the Central Limit Theorem, and computer simulations, and expanded the listing of probability functions, including the multinomial distribution and the relation between the beta and gamma functions. Chapter 2 has been completely reorganized, starting with the quantile function and the empirical distribution function (edf), in an attempt to motivate the reader to see the importance of order statistics. The relation between rank and the edf is explained. The tests and confidence intervals for quantiles have been moved to Chapter 5 so that they are discussed along with other one-sample and paired-sample procedures, namely, the sign test and signed rank test for the median. New discussions of exact power, simulated power, and sample size determination, and the discussion of rank tests in Chapter 5 of the previous edition are also included here. Chapter 4, on goodness-of-fit tests, has been expanded to include Lilliefors's test for the exponential distribution, computation of normal probability plots, and visual analysis of goodness of fit using P-P and Q-Q plots.

The new Chapter 6, on the general two-sample problem, defines "stochastically larger" and gives numerical examples with exact and computer solutions for all tests. We include sample size determination for the Mann-Whitney-Wilcoxon test. Chapters 7 and 8 are the previous-edition Chapters 8 and 9 on linear rank tests for the location and scale problems, respectively, with numerical examples for all procedures. The method of positive variables to obtain a confidence interval estimate of the ratio of scale parameters when nothing is known about location has been added to Chapter 8, along with a much-needed summary. Chapters 10 and 12, on tests for k samples, now include multiple comparisons procedures. The materials on nonparametric correlation in Chapter 11 have been expanded to include the interpretation of Kendall's tau as a coefficient of disarray, the Student's t approximation to the distribution of Spearman's rank correlation coefficient, and the definitions of Kendall's tau a, tau b and the Goodman-Kruskal coefficient. Chapter 14, a new chapter, discusses nonparametric methods for analyzing count data. We cover analysis of contingency tables, tests for equality of proportions, Fisher's exact test, McNemar's test, and an adaptation of Wilcoxon's rank-sum test for tables with ordered categories.

Bergmann, Ludbrook, and Spooren (2000) warn of possible meaningful differences in the outcomes of P values from different statistical packages. These differences can be due to the use of exact versus asymptotic distributions, use or nonuse of a continuity correction, or use or nonuse of a correction for ties. The output seldom gives such details of calculations, and even the "Help" facility and the manuals do not always give a clear description or documentation of the methods used to carry out the computations.
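To make the sources of such discrepancies concrete, here is a minimal sketch in Python (standard library only). It is an illustration under stated assumptions, not the computation used by any package named in this preface: the data are hypothetical, the samples are assumed to contain no ties, and the function names `u_statistic`, `exact_two_sided_p`, and `normal_two_sided_p` are invented for this example. It computes a two-sided P value for the Mann-Whitney U statistic three ways: from the exact null distribution, from the normal approximation without a continuity correction, and from the normal approximation with one.

```python
from itertools import combinations
from math import erfc, sqrt


def u_statistic(x, y):
    """Count the (x_i, y_j) pairs with x_i > y_j; assumes no tied values."""
    return sum(1 for a in x for b in y if a > b)


def exact_two_sided_p(x, y):
    """Exact two-sided P value by enumerating all rank assignments to x."""
    m, n = len(x), len(y)
    mean_u = m * n / 2
    observed = abs(u_statistic(x, y) - mean_u)
    total = extreme = 0
    # Under the null hypothesis every choice of m ranks for x is equally likely.
    for ranks in combinations(range(1, m + n + 1), m):
        u = sum(ranks) - m * (m + 1) // 2  # U recovered from the rank sum of x
        total += 1
        if abs(u - mean_u) >= observed:
            extreme += 1
    return extreme / total


def normal_two_sided_p(x, y, continuity=True):
    """Asymptotic two-sided P value, optionally with a continuity correction."""
    m, n = len(x), len(y)
    mean_u = m * n / 2
    sd_u = sqrt(m * n * (m + n + 1) / 12)  # null standard deviation, no ties
    deviation = abs(u_statistic(x, y) - mean_u)
    if continuity:
        deviation = max(deviation - 0.5, 0.0)
    z = deviation / sd_u
    return erfc(z / sqrt(2))  # two-sided tail area of the standard normal


# Hypothetical data, chosen only for illustration.
x = [12, 15, 17, 20]
y = [18, 22, 25, 27, 30]
print("exact:                 ", exact_two_sided_p(x, y))
print("normal, no correction: ", normal_two_sided_p(x, y, continuity=False))
print("normal, with correction:", normal_two_sided_p(x, y, continuity=True))
```

For these hypothetical samples the three printed values differ from one another, which is exactly the kind of discrepancy a reader may see when the same data are run through packages that make different choices. A correction for ties, the third source mentioned above, is deliberately left out of this sketch.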
Because this warning is quite valid, we tried to explain to the best of our ability any differences between our hand calculations and the package results for each of our examples.

As we said at the beginning, it has been most gratifying to receive very positive remarks, comments, and helpful suggestions on earlier editions of this book and we sincerely thank many readers and colleagues who have taken the time. We would like to thank Minitab, Cytel, and Statsoft for providing complimentary copies of their software. The popularity of nonparametric statistics must depend, to some extent, on the availability of inexpensive and user-friendly software. Portions of MINITAB Statistical Software input and output in this book are reprinted with permission of Minitab Inc.
Many people have helped, directly and indirectly, to bring a project of this magnitude to a successful conclusion. We are thankful to the University of Alabama and to the Department of Information Systems, Statistics and Management Science for providing an environment conducive to creative work and for making some resources available. In particular, Heather Davis has provided valuable assistance with typing. We are indebted to Clifton D. Sutton of George Mason University for pointing out errors in the first printing of the third edition. These have all been corrected. We are grateful to Joseph Stubenrauch, Production Editor at Marcel Dekker, for giving us excellent editorial assistance. We also thank the reviewers of the third edition for their helpful comments and suggestions. These include Jones (1993), Prvan (1993), and Ziegel (1993). Ziegel's review in Technometrics stated, "This is the book for all statisticians and students in statistics who need to learn nonparametric statistics ... I am grateful that the author decided that one more edition could already improve a fine package." We sincerely hope that Mr. Ziegel and others will agree that this fine package has been improved in scope, readability, and usability.

Jean Dickinson Gibbons
Subhabrata Chakraborti
Preface to the Third Edition
The third edition of this book includes a large number of additions and changes. The additions provide a broader coverage of the nonparametric theory and methods, along with the tables required to apply them in practice. The primary change in presentation is an integration of the discussion of theory, applications, and numerical examples of applications. Thus the book has been returned to its original fourteen chapters with illustrations of practical applications following the theory at the appropriate place within each chapter. In addition, many of the hand-calculated solutions to these examples are verified and illustrated further by showing the solutions found by using one or more of the frequently used computer packages. When the package solutions are not equivalent, which happens frequently because most of the packages use approximate sampling distributions, the reasons are discussed briefly. Two new packages have recently been developed exclusively for nonparametric methods: NONPAR: Nonparametric Statistics Package and STATXACT: A Statistical Package for Exact Nonparametric Inference. The latter package claims to compute exact P values. We have not used them but still regard them as a welcome addition.

Additional new material is found in the problem sets at the end of each chapter. Some of the new theoretical problems request verification of results published in journals about inference procedures not covered specifically in the text. Other new problems refer to the new material included in this edition. Further, many new applied problems have been added.

The new topics that are covered extensively are as follows. In Chapter 2 we give more convenient expressions for the moments of order statistics in terms of the quantile function, introduce the empirical distribution function, and discuss both one-sample and two-sample coverages so that problems can be given relating to exceedance and precedence statistics. The rank von Neumann test for randomness is included in Chapter 3 along with applications of runs tests in analyses of time series data. In Chapter 4 on goodness-of-fit tests, Lilliefors's test for a normal distribution with unspecified mean and variance has been added. Chapter 7 now includes discussion of the control median test as another procedure appropriate for the general two-sample problem. The extension of the control median test to k mutually independent samples is given in Chapter 11. Other new materials in Chapter 11 are nonparametric tests for ordered alternatives appropriate for data based on k ≥ 3 mutually independent random samples. The tests proposed by Jonckheere and Terpstra are covered in detail. The problems relating to comparisons of treatments with a control or an unknown standard are also included here.
Chapter 13, on measures of association in multiple classifications, has an additional section on the Page test for ordered alternatives in k-related samples, illustration of the calculation of Kendall's tau for count data in ordered contingency tables, and calculation of Kendall's coefficient of partial correlation. Chapter 14 now includes calculations of asymptotic relative efficiency of more tests and also against more parent distributions. For most tests covered, the corrections for ties are derived and discussions of relative performance are expanded. New tables included in the Appendix are the distributions of the Lilliefors's test for normality, Kendall's partial tau, Page's test for ordered alternatives in the two-way layout, the Jonckheere-Terpstra test for ordered alternatives in the one-way layout, and the rank von Neumann test for randomness.
This edition also includes a large number of additional references. However, the list of references is not by any means purported to be complete because the literature on nonparametric inference procedures is vast. Therefore, we apologize to those authors whose contributions were not included in our list of references. As always in a new edition, we have attempted to correct previous errors and inaccuracies and restate more clearly the text and problems retained from previous editions. We have also tried to take into account the valuable suggestions for improvement made by users of previous editions and reviewers of the second edition, namely, Moore (1986), Randles (1986), Sukhatme (1987), and Ziegel (1988).

As with any project of this magnitude, we are indebted to many persons for help. In particular, we would like to thank Pat Coons and Connie Harrison for typing and Nancy Kao for help in the bibliography search and computer solutions to examples. Finally, we are indebted to the University of Alabama, particularly the College of Commerce and Business Administration, for partial support during the writing of this version.

Jean Dickinson Gibbons
Subhabrata Chakraborti
Preface to the Second Edition
A large number of books on nonparametric statistics have appeared since this book was published in 1971. The majority of them are oriented toward applications of nonparametric methods and do not attempt to explain the theory behind the techniques; they are essentially user's manuals, called cookbooks by some. Such books serve a useful purpose in the literature because nonparametric methods have such a broad scope of application and have achieved widespread recognition as a valuable technique for analyzing data, particularly data which consist of ranks or relative preferences and/or are small samples from unknown distributions. These books are generally used by nonstatisticians, that is, persons in subject-matter fields. The more recent books that are oriented toward theory are Lehmann (1975), Randles and Wolfe (1979), and Pratt and Gibbons (1981).

A statistician needs to know about both the theory and methods of nonparametric statistical inference. However, most graduate programs in statistics can afford to offer either a theory course or a methods course, but not both. The first edition of this book was frequently used for the theory course; consequently, the students were forced to learn applications on their own time. This second edition not only presents the theory with corrections from the first edition, it also offers substantial practice in problem solving. Chapter 15 of this edition includes examples of applications of those techniques for which the theory has been presented in Chapters 1 to 14. Many applied problems are given in this new chapter; these problems involve real research situations from all areas of social, behavioral, and life sciences, business, engineering, and so on. The Appendix of Tables at the end of this new edition gives those tables of exact sampling distributions that are necessary for the reader to understand the examples given and to be able to work out the applied problems.

To make it easy for the instructor to cover applications as soon as the relevant theory has been presented, the sections of Chapter 15 follow the order of presentation of theory. For example, after Chapter 3 on tests based on runs is completed, the next assignment can be Section 15.3 on applications of tests based on runs and the accompanying problems at the end of that section. At the end of Chapter 15 there are a large number of review problems arranged in random order as to type of applications so that the reader can obtain practice in selecting the appropriate nonparametric technique to use in a given situation.

While the first edition of this book received considerable acclaim, several reviewers felt that applied numerical examples and expanded problem sets would greatly enhance its usefulness as a textbook. This second edition incorporates these and other recommendations.
The author wishes to acknowledge her indebtedness to the following reviewers for helping to make this revised and expanded edition more accurate and useful for students and researchers: Dudewicz and Geller (1972), Johnson (1973), Klotz (1972), and Noether (1972). In addition to these persons, many users of the first edition have written or told me over the years about their likes and/or dislikes regarding the book and these have all been gratefully received and considered for incorporation in this edition. I would also like to express my gratitude to Donald B. Owen for suggesting and encouraging this kind of revision, and to the Board of Visitors of the University of Alabama for partial support of this project.

Jean Dickinson Gibbons
Preface to the First Edition
During the last few years many institutions offering graduate programs in statistics have experienced a demand for a course devoted exclusively to a survey of nonparametric techniques and their justifications. This demand has arisen both from their own majors and from majors in social science or other quantitatively oriented fields such as psychology, sociology, or economics. Although the basic statistics courses often include a brief description of some of the better-known and simpler nonparametric methods, usually the treatment is necessarily perfunctory and perhaps even misleading. Discussion of only a few techniques in a highly condensed fashion may leave the impression that nonparametric statistics consists of a "bundle of tricks" which are simply applied by following a list of instructions dreamed up by some genie as a panacea for all sorts of vague and ill-defined problems. One of the deterrents to meeting this demand has been the lack of a suitable textbook in nonparametric techniques. Our experience at
the University of Pennsylvania has indicated that an appropriate text would provide a theoretical but readable survey. Only a moderate amount of pure mathematical sophistication should be required so that the course would be comprehensible to a wide variety of graduate students and perhaps even some advanced undergraduates. The course should be available to anyone who has completed at least the rather traditional one-year sequence in probability and statistical inference at the level of Parzen, Mood and Graybill, Hogg and Craig, etc. The time allotment should be a full semester, or perhaps two semesters if outside reading in journal publications is desirable. The texts presently available which are devoted exclusively to nonparametric statistics are few in number and seem to be predominantly either of the handbook style, with few or no justifications, or of the highly rigorous mathematical style. The present book is an attempt to bridge the gap between these extremes. It assumes the reader is well acquainted with statistical inference for the traditional parametric estimation and hypothesis-testing procedures, basic probability theory, and random-sampling distributions. The survey is not intended to be exhaustive, as the field is so extensive. The purpose of the book is to provide a compendium of some of the better-known nonparametric techniques for each problem situation. Those derivations, proofs, and mathematical details which are relatively easily grasped or which illustrate typical procedures in general nonparametric statistics are included. More advanced results are simply stated with references. For example, some of the asymptotic distribution theory for order statistics is derived since the methods are equally applicable to other statistical problems. However, the Glivenko-Cantelli theorem is given without proof since the mathematics may be too advanced.
Generally those proofs given are not mathematically rigorous, ignoring details such as existence of derivatives or regularity conditions. At the end of each chapter, some problems are included which are generally of a theoretical nature but on the same level as the related text material they supplement. The organization of the material is primarily according to the type of statistical information collected and the type of questions to be answered by the inference procedures or according to the general type of mathematical derivation. For each statistic, the null distribution theory is derived, or when this would be too tedious, the procedure one could follow is outlined, or when this would be overly theoretical, the results are stated without proof. Generally the other relevant mathematical details necessary for nonparametric inference are also included. The purpose is to acquaint the reader with the mathematical
logic on which a test is based, those test properties which are essential for understanding the procedures, and the basic tools necessary for comprehending the extensive literature published in the statistics journals. The book is not intended to be a user's manual for the application of nonparametric techniques. As a result, almost no numerical examples or problems are provided to illustrate applications or elicit applied motivation. With this approach, reproduction of an extensive set of tables is not required. The reader may already be acquainted with many of the nonparametric methods. If not, the foundations obtained from this book should enable anyone to turn to a user's handbook and quickly grasp the application. Once armed with the theoretical background, the user of nonparametric methods is much less likely to apply tests indiscriminately or view the field as a collection of simple prescriptions. The only insurance against misapplication is a thorough understanding. Although some of the strengths and weaknesses of the tests covered are alluded to, no definitive judgments are attempted regarding the relative merits of comparable tests. For each topic covered, some references are given which provide further information about the tests or are specifically related to the approach used in this book. These references are necessarily incomplete, as the literature is vast. The interested reader may consult Savage's "Bibliography" (1962). I wish to acknowledge the helpful comments of the reviewers and the assistance provided unknowingly by the authors of other textbooks in the area of nonparametric statistics, particularly Gottfried E. Noether and James V. Bradley, for the approach to presentation of several topics, and Maurice G. Kendall, for much of the material on measures of association. The products of their endeavors greatly facilitated this project. It is a pleasure also to acknowledge my indebtedness to Herbert A. David, both as friend and mentor.
His training and encouragement helped make this book a reality. Particular gratitude is also due to the Lecture Note Fund of the Wharton School, for typing assistance, and the Department of Statistics and Operations Research at the University of Pennsylvania for providing the opportunity and time to finish this manuscript. Finally, I thank my husband for his enduring patience during the entire writing stage.

Jean Dickinson Gibbons
Contents

Preface to the Fourth Edition
Preface to the Third Edition
Preface to the Second Edition
Preface to the First Edition

1 Introduction and Fundamentals
  1.1 Introduction
  1.2 Fundamental Statistical Concepts

2 Order Statistics, Quantiles, and Coverages
  2.1 Introduction
  2.2 The Quantile Function
  2.3 The Empirical Distribution Function
  2.4 Statistical Properties of Order Statistics
  2.5 Probability-Integral Transformation (PIT)
  2.6 Joint Distribution of Order Statistics
  2.7 Distributions of the Median and Range
  2.8 Exact Moments of Order Statistics
  2.9 Large-Sample Approximations to the Moments of Order Statistics
  2.10 Asymptotic Distribution of Order Statistics
  2.11 Tolerance Limits for Distributions and Coverages
  2.12 Summary
  Problems

3 Tests of Randomness
  3.1 Introduction
  3.2 Tests Based on the Total Number of Runs
  3.3 Tests Based on the Length of the Longest Run
  3.4 Runs Up and Down
  3.5 A Test Based on Ranks
  3.6 Summary
  Problems

4 Tests of Goodness of Fit
  4.1 Introduction
  4.2 The Chi-Square Goodness-of-Fit Test
  4.3 The Kolmogorov-Smirnov One-Sample Statistic
  4.4 Applications of the Kolmogorov-Smirnov One-Sample Statistics
  4.5 Lilliefors's Test for Normality
  4.6 Lilliefors's Test for the Exponential Distribution
  4.7 Visual Analysis of Goodness of Fit
  4.8 Summary
  Problems

5 One-Sample and Paired-Sample Procedures
  5.1 Introduction
  5.2 Confidence Interval for a Population Quantile
  5.3 Hypothesis Testing for a Population Quantile
  5.4 The Sign Test and Confidence Interval for the Median
  5.5 Rank-Order Statistics
  5.6 Treatment of Ties in Rank Tests
  5.7 The Wilcoxon Signed-Rank Test and Confidence Interval
  5.8 Summary
  Problems

6 The General Two-Sample Problem
  6.1 Introduction
  6.2 The Wald-Wolfowitz Runs Test
  6.3 The Kolmogorov-Smirnov Two-Sample Test
  6.4 The Median Test
  6.5 The Control Median Test
  6.6 The Mann-Whitney U Test
  6.7 Summary
  Problems

7 Linear Rank Statistics and the General Two-Sample Problem
  7.1 Introduction
  7.2 Definition of Linear Rank Statistics
  7.3 Distribution Properties of Linear Rank Statistics
  7.4 Usefulness in Inference
  Problems

8 Linear Rank Tests for the Location Problem
  8.1 Introduction
  8.2 The Wilcoxon Rank-Sum Test
  8.3 Other Location Tests
  8.4 Summary
  Problems

9 Linear Rank Tests for the Scale Problem
  9.1 Introduction
  9.2 The Mood Test
  9.3 The Freund-Ansari-Bradley-David-Barton Tests
  9.4 The Siegel-Tukey Test
  9.5 The Klotz Normal-Scores Test
  9.6 The Percentile Modified Rank Tests for Scale
  9.7 The Sukhatme Test
  9.8 Confidence-Interval Procedures
  9.9 Other Tests for the Scale Problem
  9.10 Applications
  9.11 Summary
  Problems

10 Tests of the Equality of k Independent Samples
  10.1 Introduction
  10.2 Extension of the Median Test
  10.3 Extension of the Control Median Test
  10.4 The Kruskal-Wallis One-Way ANOVA Test and Multiple Comparisons
  10.5 Other Rank-Test Statistics
  10.6 Tests Against Ordered Alternatives
  10.7 Comparisons with a Control
  10.8 The Chi-Square Test for k Proportions
  10.9 Summary
  Problems

11 Measures of Association for Bivariate Samples
  11.1 Introduction: Definition of Measures of Association in a Bivariate Population
  11.2 Kendall's Tau Coefficient
  11.3 Spearman's Coefficient of Rank Correlation
  11.4 The Relations Between R and T; E(R), τ, and ρ
  11.5 Another Measure of Association
  11.6 Applications
  11.7 Summary
  Problems

12 Measures of Association in Multiple Classifications
  12.1 Introduction
  12.2 Friedman's Two-Way Analysis of Variance by Ranks in a k × n Table and Multiple Comparisons
  12.3 Page's Test for Ordered Alternatives
  12.4 The Coefficient of Concordance for k Sets of Rankings of n Objects
  12.5 The Coefficient of Concordance for k Sets of Incomplete Rankings
  12.6 Kendall's Tau Coefficient for Partial Correlation
  12.7 Summary
  Problems

13 Asymptotic Relative Efficiency
  13.1 Introduction
  13.2 Theoretical Bases for Calculating the ARE
  13.3 Examples of the Calculation of Efficacy and ARE
  13.4 Summary
  Problems

14 Analysis of Count Data
  14.1 Introduction
  14.2 Contingency Tables
  14.3 Some Special Results for k × 2 Contingency Tables
  14.4 Fisher's Exact Test
  14.5 McNemar's Test
  14.6 Analysis of Multinomial Data
  Problems

Appendix of Tables
  Table A  Normal Distribution
  Table B  Chi-Square Distribution
  Table C  Cumulative Binomial Distribution
  Table D  Total Number of Runs Distribution
  Table E  Runs Up and Down Distribution
  Table F  Kolmogorov-Smirnov One-Sample Statistic
  Table G  Binomial Distribution for θ = 0.5
  Table H  Probabilities for the Wilcoxon Signed-Rank Statistic
  Table I  Kolmogorov-Smirnov Two-Sample Statistic
  Table J  Probabilities for the Wilcoxon Rank-Sum Statistic
  Table K  Kruskal-Wallis Test Statistic
  Table L  Kendall's Tau Statistic
  Table M  Spearman's Coefficient of Rank Correlation
  Table N  Friedman's Analysis-of-Variance Statistic and Kendall's Coefficient of Concordance
  Table O  Lilliefors's Test for Normal Distribution Critical Values
  Table P  Significance Points of T_XY.Z (for Kendall's Partial Rank-Correlation Coefficient)
  Table Q  Page's L Statistic
  Table R  Critical Values and Associated Probabilities for the Jonckheere-Terpstra Test
  Table S  Rank von Neumann Statistic
  Table T  Lilliefors's Test for Exponential Distribution Critical Values

Answers to Selected Problems
References
Index
1 Introduction and Fundamentals

1.1 INTRODUCTION
of the importance of speed, simplicity and cost factors, and the nonexistence of a fixed and universally acceptable criterion of good performance. Box and Anderson (1955) state that "to fulfill the needs of the experimenter, statistical criteria should (1) be sensitive to change in the specific factors tested, (2) be insensitive to changes, of a magnitude likely to occur in practice, in extraneous factors." These properties, usually called power and robustness, respectively, are generally agreed upon as the primary requirements of good performance in hypothesis testing. Parametric tests are often derived in such a way that the first requirement is satisfied for an assumed specific probability distribution, e.g., using the likelihood-ratio technique of test construction. However, since such tests are, strictly speaking, not even valid unless the assumptions are met, robustness is of great concern in parametric statistics. On the other hand, nonparametric tests are inherently robust because their construction requires only very general assumptions. One would expect some sacrifice in power to result. It is therefore natural to look at robustness as a performance criterion for parametric tests and power for nonparametric tests. How then do we compare analogous tests of the two types? Power calculations for any test require knowledge of the probability distribution of the test statistic under the alternative, but the alternatives in nonparametric problems are often extremely general. When the requisite assumptions are met, many of the classical parametric tests are known to be most powerful. In those cases where comparison studies have been made, however, nonparametric tests are frequently almost as powerful, especially for small samples, and therefore may be considered more desirable whenever there is any doubt about assumptions. No generalizations can be made for moderate-sized samples.
The criterion of asymptotic relative efﬁciency is theoretically relevant only for very large samples. When the classical tests are known to be robust, comparisons may also be desirable for distributions which deviate somewhat from the exact parametric assumptions. However, with inexact assumptions, calculation of power of classical tests is often difﬁcult except by Monte Carlo techniques, and studies of power here have been less extensive. Either type of test may be more reliable, depending on the particular tests compared and type or degree of deviations assumed. The difﬁculty with all these comparisons is that they can be made only for speciﬁc nonnull distribution assumptions, which are closely related to the conditions under which the parametric test is exactly valid and optimal. Perhaps the chief advantage of nonparametric tests lies in their very generality, and an assessment of their performance under
conditions unrestricted by, and different from, the intrinsic postulates in classical tests seems more expedient. A comparison under more nonparametric conditions would seem especially desirable for two or more nonparametric tests which are designed for the same general hypothesis-testing situation. Unlike the body of classical techniques, nonparametric techniques frequently offer a selection from interchangeable methods. With such a choice, some judgments of relative merit would be particularly useful. Power comparisons have been made, predominantly among the many tests designed to detect location differences, but again we must add that even with comparisons of nonparametric tests, power can be determined only with fairly specific distribution assumptions. The relative merits of the different tests depend on the conditions imposed. Comprehensive conclusions are thus still impossible for blanket comparisons of very general tests. In conclusion, the extreme generality of nonparametric techniques and their wide scope of usefulness, while definite advantages in application, are factors which discourage objective criteria, particularly power, as assessments of performance, relative either to each other or to parametric techniques. The comparison studies so frequently published in the literature are certainly interesting, informative, and valuable, but they do not provide the sought-for comprehensive answers under more nonparametric conditions. Perhaps we can even say that specific comparisons are really contrary to the spirit of nonparametric methods. No definitive rules of choice will be provided in this book. The interested reader will find many pertinent articles in all the statistics journals. This book is a compendium of many of the large number of nonparametric techniques which have been proposed for various inference situations.
Before embarking on a systematic treatment of new concepts, some basic notation and deﬁnitions must be agreed upon and the groundwork prepared for development. Therefore, the remainder of this chapter will be devoted to an explanation of the notation adopted here and an abbreviated review of some of those deﬁnitions and terms from classical inference which are also relevant to the special world of nonparametric inference. A few new concepts and terms will also be introduced which are uniquely useful in nonparametric theory. The general theory of order statistics will be the subject of Chapter 2, since they play a fundamental role in many nonparametric techniques. Quantiles, coverages, and tolerance limits are also introduced here. Starting with Chapter 3, the important nonparametric techniques will be discussed in turn, organized according to the type of inference
problem (hypothesis to be tested) in the case of hypotheses not involving statements about parameters, or the type of sampling situation (one sample, two independent samples, etc.) in the case of distribution-free techniques, or whichever seems more pertinent. Chapters 3 and 4 will treat tests of randomness and goodness-of-fit tests, respectively, both nonparametric hypotheses which have no counterpart in classical statistics. Chapter 5 covers distribution-free tests of hypotheses and confidence interval estimates of the value of a population quantile in the case of one sample or paired samples. These procedures are based on order statistics, signs, and signed ranks. When the relevant quantile is the median, these procedures relate to the value of a location parameter and are analogous to the one-sample (paired-sample) tests for the population mean (mean difference) in classical statistics. Rank-order statistics are also introduced here, and we investigate the relationship between ranks and variate values. Chapter 6 introduces the two-sample problem and covers some distribution-free tests for the hypothesis of identical distributions against general alternatives. Chapter 7 is an introduction to a particular form of nonparametric test statistic, called a linear rank statistic, which is especially useful for testing a hypothesis that two independent samples are drawn from identical populations. Those linear rank statistics which are particularly sensitive to differences only in location and only in scale are the subjects of Chapters 8 and 9, respectively. Chapter 10 extends this situation to the hypothesis that k independent samples are drawn from identical populations. Chapters 11 and 12 are concerned with measures of association and tests of independence in bivariate and multivariate sample situations, respectively.
For almost all tests the discussion will center on logical justiﬁcation, null distribution and moments of the test statistic, asymptotic distribution, and other relevant distribution properties. Whenever possible, related methods of interval estimation of parameters are also included. During the course of discussion, only the briefest attention will be paid to relative merits of comparable tests. Chapter 13 presents some theorems relating to calculation of asymptotic relative efﬁciency, a possible criterion for evaluating large sample performance of nonparametric tests relative to each other or to parametric tests when certain assumptions are met. These techniques are then used to evaluate the efﬁciency of some of the tests covered earlier. Chapter 14 covers some special tests based on count data. Numerical examples of applications of the most commonly used nonparametric test and estimation procedures are included after the explanation of the theory. These illustrations of the techniques will
1.2 FUNDAMENTAL STATISTICAL CONCEPTS
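\[ E\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i E(X_i) \]

\[ \mathrm{var}\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2 \,\mathrm{var}(X_i) + 2 \mathop{\sum\sum}_{1 \le i < j \le n} a_i a_j \,\mathrm{cov}(X_i, X_j) \]

\[ \mathrm{cov}\left(\sum_{i=1}^{n} a_i X_i, \sum_{j=1}^{m} b_j Y_j\right) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j \,\mathrm{cov}(X_i, Y_j) \]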
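The variance identity for a linear combination can be checked numerically. The following is a minimal sketch, not part of the original text: the data, the coefficients, and the helper names `cov` and `var_linear_combination` are all hypothetical, and empirical moments with divisor equal to the number of observations are used so that the identity holds up to rounding error. Note that the code uses three variables and `m` observations of each, whereas in the formulas above n counts the variables.

```python
import random

def cov(xs, ys):
    """Empirical covariance with divisor len(xs); cov(xs, xs) is the variance."""
    count = len(xs)
    mean_x, mean_y = sum(xs) / count, sum(ys) / count
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / count

def var_linear_combination(a, columns):
    """Right-hand side of the identity:
    sum_i a_i^2 var(X_i) + 2 * sum_{i<j} a_i a_j cov(X_i, X_j)."""
    k = len(a)
    total = sum(a[i] ** 2 * cov(columns[i], columns[i]) for i in range(k))
    total += 2 * sum(
        a[i] * a[j] * cov(columns[i], columns[j])
        for i in range(k)
        for j in range(i + 1, k)
    )
    return total

# Hypothetical data: three variables observed m times, two of them correlated.
random.seed(0)
m = 500
x1 = [random.gauss(0, 1) for _ in range(m)]
x2 = [u + random.gauss(0, 0.5) for u in x1]
x3 = [random.random() for _ in range(m)]
a = [2.0, -1.0, 3.0]
columns = [x1, x2, x3]

# Left-hand side: variance of the combination computed observation by observation.
combo = [sum(ai * col[r] for ai, col in zip(a, columns)) for r in range(m)]
lhs = cov(combo, combo)
rhs = var_linear_combination(a, columns)
print(lhs, rhs)
```

The two printed values should agree up to floating-point rounding, since the identity is algebraic and holds for empirical moments as well as population moments.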
The gamma and beta distributions shown in Table 2.1 each contain a special constant, denoted by $\Gamma(\alpha)$ and $B(\alpha, \beta)$, respectively. The gamma function, denoted by $\Gamma(\alpha)$, is defined as

\[ \Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x} \, dx \quad \text{for } \alpha > 0 \]
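Table 2.1 Some special probability functions

| Name | Probability function $f_X(x)$ | mgf | $E(X)$ | $\mathrm{var}(X)$ |
|---|---|---|---|---|
| Discrete distributions | | | | |
| Bernoulli | $p^x (1-p)^{1-x}$; $x = 0, 1$; $0 \le p \le 1$ | $pe^t + 1 - p$ | $p$ | $p(1-p)$ |
| Binomial | $\binom{n}{x} p^x (1-p)^{n-x}$; $x = 0, 1, \ldots, n$; $0 \le p \le 1$ | $(pe^t + 1 - p)^n$ | $np$ | $np(1-p)$ |
| Multinomial | $\dfrac{N!}{x_1! x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$; $x_i = 0, 1, \ldots, N$; $\sum x_i = N$; $0 \le p_i \le 1$; $\sum p_i = 1$ | $(p_1 e^{t_1} + \cdots + p_{k-1} e^{t_{k-1}} + p_k)^N$ | $E(X_i) = Np_i$ | $\mathrm{var}(X_i) = Np_i(1 - p_i)$; $\mathrm{cov}(X_i, X_j) = -Np_i p_j$ |
| Hypergeometric | $\dfrac{\binom{Np}{x}\binom{N - Np}{n - x}}{\binom{N}{n}}$; $x = 0, 1, \ldots, n$; $0 \le p \le 1$ | * | $np$ | $np(1-p)\dfrac{N - n}{N - 1}$ |
| Geometric | $(1-p)^{x-1} p$; $x = 1, 2, \ldots$ | $\dfrac{pe^t}{1 - (1-p)e^t}$ | $\dfrac{1}{p}$ | $\dfrac{1-p}{p^2}$ |
| Uniform on $1, 2, \ldots, N$ | $\dfrac{1}{N}$; $x = 1, 2, \ldots, N$ | $\dfrac{1}{N}\sum_{x=1}^{N} e^{tx}$ | $\dfrac{N+1}{2}$ | $\dfrac{N^2 - 1}{12}$ |
| Continuous distributions | | | | |
| Uniform on $(a, b)$ | $\dfrac{1}{b - a}$; $a < x < b$ | $\dfrac{e^{tb} - e^{ta}}{t(b - a)}$ | $\dfrac{a + b}{2}$ | $\dfrac{(b - a)^2}{12}$ |
| Normal | $\dfrac{e^{-(x - \mu)^2 / (2\sigma^2)}}{\sqrt{2\pi\sigma^2}}$; $-\infty < x, \mu < \infty$; $\sigma > 0$ | $e^{t\mu + t^2\sigma^2/2}$ | $\mu$ | $\sigma^2$ |
| Gamma | $\dfrac{x^{\alpha - 1} e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}$; $0 < x, \alpha, \beta < \infty$ | $(1 - \beta t)^{-\alpha}$; $\beta t < 1$ | $\alpha\beta$ | $\alpha\beta^2$ |
| Exponential | gamma with $\alpha = 1$ | $(1 - \beta t)^{-1}$; $\beta t < 1$ | $\beta$ | $\beta^2$ |
| Chi-square($\nu$) | gamma with $\alpha = \nu/2$, $\beta = 2$ | $(1 - 2t)^{-\nu/2}$; $2t < 1$ | $\nu$ | $2\nu$ |
| Weibull | $abx^{b-1} e^{-ax^b}$; $x, a, b > 0$ | $E(X^t) = a^{-t/b}\,\Gamma(1 + t/b)$ | $a^{-1/b}\,\Gamma(1 + 1/b)$ | $a^{-2/b}\left[\Gamma(1 + 2/b) - \Gamma^2(1 + 1/b)\right]$ |
| Beta | $\dfrac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)}$; $0 < x < 1$; $\alpha, \beta > 0$ | * | $\dfrac{\alpha}{\alpha + \beta}$ | $\dfrac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$ |
| Laplace (double exponential) | $\dfrac{1}{2\phi}\, e^{-\lvert x - \theta\rvert/\phi}$; $-\infty < x, \theta < \infty$; $\phi > 0$ | $\dfrac{e^{\theta t}}{1 - \phi^2 t^2}$; $\lvert t\rvert < 1/\phi$ | $\theta$ | $2\phi^2$ |
| Logistic | $\dfrac{e^{-(x - \theta)/\phi}}{\phi\left[1 + e^{-(x - \theta)/\phi}\right]^2}$; $-\infty < x, \theta < \infty$; $\phi > 0$ | $e^{\theta t}\,\Gamma(1 - \phi t)\,\Gamma(1 + \phi t)$; $\lvert t\rvert < 1/\phi$ | $\theta$ | $\dfrac{\phi^2\pi^2}{3}$ |

*The mgf is omitted here because the expression is too complicated.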
Chebyshev's inequality: for a random variable $X$ with mean $\mu$ and finite variance $\sigma^2$, and for any $k > 0$,

\[ P(\lvert X - \mu \rvert \ge k) \le \frac{\sigma^2}{k^2} \]
Note that the finite variance assumption guarantees the existence of the mean $\mu$. The following result, called the Central Limit Theorem (CLT), is one of the most famous in statistics. We state it for the simplest i.i.d. situation.

CENTRAL LIMIT THEOREM

Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2 > 0$, and let $\bar{X}_n$ be the sample mean. Then as $n \to \infty$, the random variable $\sqrt{n}(\bar{X}_n - \mu)/\sigma$ has a limiting distribution that is normal with mean 0 and variance 1. For a proof of this result, typically done via the moment generating function, the reader is referred to any standard graduate-level book on mathematical statistics. In some non-i.i.d. situations other types of CLTs are available. For example, if the X's are independent but not identically distributed, there is a CLT generally attributed to Liapounov. We will not pursue these any further.
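The limiting behavior stated in the theorem can be illustrated by simulation. The sketch below is an illustration only and makes several assumptions that are not in the text: the population is exponential with mean 1 and variance 1, the sample size is 50, and 4000 replications are used. The function names are hypothetical.

```python
import math
import random

def standardized_mean(sample, mu, sigma):
    """Return sqrt(n) * (sample mean - mu) / sigma for one sample."""
    n = len(sample)
    return math.sqrt(n) * (sum(sample) / n - mu) / sigma

def simulate_clt(n, reps, seed=1):
    """Draw `reps` samples of size n from an exponential population
    (mean 1, variance 1) and return the standardized sample means."""
    rng = random.Random(seed)
    return [
        standardized_mean([rng.expovariate(1.0) for _ in range(n)], 1.0, 1.0)
        for _ in range(reps)
    ]

z = simulate_clt(n=50, reps=4000)

# Under the normal limit, about 97.5 percent of the values should fall below 1.96.
frac_below = sum(v <= 1.96 for v in z) / len(z)
print(round(frac_below, 3))
```

Because the exponential population is skewed, the agreement with the normal value 0.975 is only approximate at n = 50; increasing n in the call should bring the fraction closer.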
POINT AND INTERVAL ESTIMATION
A point estimate of a parameter is any single function of random variables whose observed value is used to estimate the true value. Let $\hat{\theta}_n = u(X_1, X_2, \ldots, X_n)$ be a point estimate of a parameter $\theta$. Some desirable properties of $\hat{\theta}_n$ are defined as follows for all $\theta$.

1. Unbiasedness: $E(\hat{\theta}_n) = \theta$ for all $\theta$.
2. Sufficiency: $f_{X_1, X_2, \ldots, X_n \mid \hat{\theta}_n}(x_1, x_2, \ldots, x_n \mid \hat{\theta}_n)$ does not depend on $\theta$, or, equivalently, $f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n; \theta) = g(\hat{\theta}_n; \theta)\, H(x_1, x_2, \ldots, x_n)$, where $H(x_1, x_2, \ldots, x_n)$ does not depend on $\theta$.
3. Consistency (also called stochastic convergence and convergence in probability): $\lim_{n \to \infty} P(\lvert \hat{\theta}_n - \theta \rvert > \epsilon) = 0$ for every $\epsilon > 0$.
   a. If $\hat{\theta}_n$ is an unbiased estimate of $\theta$ and $\lim_{n \to \infty} \mathrm{var}(\hat{\theta}_n) = 0$, then $\hat{\theta}_n$ is a consistent estimate of $\theta$, by Chebyshev's inequality.
   b. $\hat{\theta}_n$ is a consistent estimate of $\theta$ if the limiting distribution of $\hat{\theta}_n$ is a degenerate distribution with probability 1 at $\theta$.
4. Minimum mean squared error: $E[(\hat{\theta}_n - \theta)^2] \le E[(\hat{\theta}_n^{*} - \theta)^2]$ for any other estimate $\hat{\theta}_n^{*}$.
5. Minimum variance unbiased: $\mathrm{var}(\hat{\theta}_n) \le \mathrm{var}(\hat{\theta}_n^{*})$ for any other estimate $\hat{\theta}_n^{*}$, where both $\hat{\theta}_n$ and $\hat{\theta}_n^{*}$ are unbiased.
An interval estimate of a parameter $\theta$ with confidence coefficient $1 - \alpha$, or a $100(1 - \alpha)$ percent confidence interval for $\theta$, is a random interval whose end points $U$ and $V$ are functions of observable random variables (usually sample data) such that the probability statement $P(U < \theta < V) = 1 - \alpha$ is satisfied. The probability $P(U < \theta < V)$ should be interpreted as $P[(U < \theta) \cap (V > \theta)]$, since the confidence limits $U$ and $V$ are random variables (depending on the random sample) and $\theta$ is a fixed quantity. In many cases this probability can be expressed in terms of a pivotal statistic and the limits can be obtained via tabulated percentiles of standard probability distributions such as the standard normal or the chi-square. A pivotal statistic is a function of a statistic and the parameter of interest such that the distribution of the pivotal statistic is free from the parameter (and is often known or at least derivable). For example, $t = \sqrt{n}(\bar{X} - \mu)/S$ is a pivotal statistic for setting up a confidence interval for the mean $\mu$ of a normal population with an unknown standard deviation. The random variable $t$ follows a Student's $t$ distribution with $n - 1$ degrees of freedom and is thus free from any unknown parameter. All standard books on mathematical statistics cover the topic of confidence interval estimation.

A useful technique for finding point estimates for parameters which appear as unspecified constants (or as functions of such constants) in a family of probability functions, say $f_X(\cdot\,; \theta)$, is the method of maximum likelihood. The likelihood function of a random sample of size $n$ from the population $f_X(\cdot\,; \theta)$ is the joint probability function of the sample variables regarded as a function of $\theta$, or

\[ L(x_1, x_2, \ldots, x_n; \theta) = \prod_{i=1}^{n} f_X(x_i; \theta) \]

A maximum-likelihood estimate (MLE) of $\theta$ is a value $\hat{\theta}$ such that for all $\theta$,

\[ L(x_1, x_2, \ldots, x_n; \hat{\theta}) \ge L(x_1, x_2, \ldots, x_n; \theta) \]
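As a small illustration of the definition, the likelihood can be maximized numerically and compared with the known closed form. The sketch below assumes a Bernoulli population, for which the MLE of the success probability is the sample proportion; the data, the grid resolution, and the function names `log_likelihood` and `grid_mle` are hypothetical choices made for this example.

```python
import math

def log_likelihood(theta, xs):
    """Log of the Bernoulli likelihood prod theta^x (1 - theta)^(1 - x)."""
    return sum(math.log(theta) if x == 1 else math.log(1.0 - theta) for x in xs)

def grid_mle(xs, steps=1000):
    """Return the grid point in (0, 1) with the largest log-likelihood."""
    grid = [k / steps for k in range(1, steps)]
    return max(grid, key=lambda theta: log_likelihood(theta, xs))

# Hypothetical sample: 7 successes in 10 trials.
xs = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
theta_hat = grid_mle(xs)
sample_proportion = sum(xs) / len(xs)
print(theta_hat, sample_proportion)
```

Maximizing the log-likelihood rather than the likelihood itself gives the same maximizer, because the logarithm is increasing, and avoids underflow for larger samples.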
nonparametric hypothesis testing, some confusion might arise if these distinctions were adhered to here. So the symbol $\alpha$ will be used to denote either the size of the test or the significance level or the probability of a type I error, prefaced by the adjective "exact" whenever $\sup_{\theta \in \omega} \alpha(\theta) = \alpha$. The power of a test is the probability that the test statistic will lead to a rejection of $H_0$, denoted by $\mathrm{Pw}(\theta) = P(T \in R)$. Power is of interest mainly as the probability of a correct decision, and so the power is typically calculated when $H_0$ is false, or $H_1$ is true, and then $\mathrm{Pw}(\theta) = P(T \in R \mid \theta \in \Omega - \omega) = 1 - \beta(\theta)$. The power depends on the following four variables:

1. The degree of falseness of $H_0$, that is, the amount of discrepancy between the assertions stated in $H_0$ and $H_1$
2. The size of the test $\alpha$
3. The number of observable random variables involved in the test statistic, generally the sample size
4. The critical region or rejection region $R$
The power function of a test is the power when all but one of these variables are held constant, usually item 1. For example, we can study the power of a particular test as a function of the parameter $\theta$, for a given sample size and $\alpha$. Typically, the power function is displayed as a plot or a graph of the values of the parameter $\theta$ on the X axis against the corresponding power values of the test on the Y axis. To calculate the power of a test, we need the distribution of the test statistic under the alternative hypothesis. Sometimes such a result is either unavailable or is much too complicated to be derived analytically; then computer simulations can be used to estimate the power of a test. To illustrate, suppose we would like to estimate the power of a test for the mean $\mu$ of a population with $H_0\colon \mu = 10$. We can generate on the computer a random sample from the normal distribution with mean 10 (and say variance equal to 1) and apply the test at a specified level $\alpha$. If the null hypothesis is rejected, we call it a success. Now we repeat this process of generating a same-size sample from the normal distribution with mean 10 and variance 1, say 1000 times. At the end of these 1000 simulations we find the proportion of successes, i.e., the proportion of times when the test rejects the null hypothesis. This proportion is an empirical estimate of the nominal size of a test which was set a priori. To estimate power over the alternative, for example, we can repeat the same process but with samples from a normal distribution with, say, mean 10.5 and variance 1. The proportion of successes from these simulations is an empirical estimate of the power of the test at that alternative.
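The simulation just described can be sketched in a few lines. The text does not specify the test, the alternative, or the sample size, so the following assumes a one-sided z test of H0: mu = 10 against mu > 10 with known standard deviation 1, sample size 20, and alpha = 0.05; these choices and the function names are illustrative assumptions only.

```python
import math
import random
from statistics import NormalDist

def rejects(sample, mu0, sigma, alpha):
    """One-sided z test (assumed): reject H0 when the standardized mean
    exceeds the upper alpha quantile of the standard normal."""
    n = len(sample)
    z = math.sqrt(n) * (sum(sample) / n - mu0) / sigma
    return z >= NormalDist().inv_cdf(1.0 - alpha)

def rejection_rate(true_mean, mu0=10.0, sigma=1.0, n=20, alpha=0.05,
                   reps=1000, seed=0):
    """Proportion of `reps` simulated normal samples for which H0 is rejected."""
    rng = random.Random(seed)
    hits = sum(
        rejects([rng.gauss(true_mean, sigma) for _ in range(n)], mu0, sigma, alpha)
        for _ in range(reps)
    )
    return hits / reps

size_hat = rejection_rate(10.0)            # empirical size under H0
power_hat = rejection_rate(10.5, seed=1)   # empirical power at mu = 10.5

# For this particular test the exact power is available for comparison.
exact_power = 1.0 - NormalDist().cdf(
    NormalDist().inv_cdf(0.95) - math.sqrt(20) * 0.5
)
print(size_hat, power_hat, round(exact_power, 4))
```

With 1000 replications the empirical size and power carry simulation error of roughly one to two percentage points, so they should be read as estimates rather than exact values.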
logarithm is one of the most commonly used $g(\cdot)$ functions. The likelihood-ratio test is always a function of sufficient statistics, and the principle often produces a uniformly most powerful test when such exists. A particularly useful property of $T$ for constructing tests based on large samples is that, subject to certain regularity conditions, the probability distribution of $-2 \ln T$ approaches the chi-square distribution with $k_1 - k_2$ degrees of freedom as $n \to \infty$, where $k_1$ and $k_2$ are, respectively, the dimensions of the spaces $\Omega$ and $\omega$; $k_2 < k_1$. All these concepts should be familiar to the reader, since they are an integral part of any standard introductory probability and inference course. We now turn to a few concepts which are especially important in nonparametric inference.

P VALUE
An alternative approach to hypothesis testing is provided by computing a quantity called the P value, sometimes called a probability value or the associated probability or the significance probability. A P value is defined as the probability, when the null hypothesis $H_0$ is true, of obtaining a sample result as extreme as, or more extreme than (in the direction of the alternative), the observed sample result. This probability can be computed for the observed value of the test statistic or some function of it like the sample estimate of the parameter in the null hypothesis. For example, suppose we are testing $H_0\colon \mu = 50$ versus $H_1\colon \mu > 50$ and we observe the sample result for $\bar{X}$ is 52. The P value is computed as $P(\bar{X} \ge 52 \mid \mu = 50)$. The appropriate direction here is values of $\bar{X}$ that are greater than or equal to 52, since the alternative is $\mu$ greater than 50. It is frequently convenient to simply report the P value and go no further. If a P value is small, this is interpreted as meaning that our sample produced a result that is rather rare under the assumption of the null hypothesis. Since the sample result is a fact, it must be that the null hypothesis statement is inconsistent with the sample outcome. In other words, we should reject the null hypothesis. On the other hand, if a P value is large, the sample result is consistent with the null hypothesis and the null hypothesis is not rejected. If we want to use the P value to reach a decision about whether $H_0$ should be rejected, we have to select a value for $\alpha$. If the P value is less than or equal to $\alpha$, the decision is to reject $H_0$; otherwise, the decision is not to reject $H_0$. The P value is therefore the smallest level of significance for which the null hypothesis would be rejected.
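The P value in the example above can be computed once a sampling distribution for the sample mean is fixed. The text gives only the hypothesized mean 50 and the observed mean 52, so this sketch assumes a normal population with known standard deviation 5 and sample size 25; both values and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def upper_tail_p_value(xbar, mu0, sigma, n):
    """P(sample mean >= xbar | mu = mu0) for a normal population
    with known standard deviation sigma (an assumption of this sketch)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return 1.0 - NormalDist().cdf(z)

# Assumed values: sigma = 5 and n = 25, so the standardized value is z = 2.
p_value = upper_tail_p_value(xbar=52.0, mu0=50.0, sigma=5.0, n=25)

# Decision rule from the text: reject H0 when the P value is at most alpha.
reject_at_05 = p_value <= 0.05
reject_at_01 = p_value <= 0.01
print(round(p_value, 4), reject_at_05, reject_at_01)
```

Under these assumed values the P value lies between 0.01 and 0.05, so the decision depends on which significance level was chosen, which is the point the passage makes about reporting the P value itself.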
CHAPTER 1
number of jump points in the cdf of the test statistic. These exact probabilities will be called exact $\alpha$ values, or natural significance levels. The region can then be chosen such that either (1) the exact $\alpha$ is the largest number which does not exceed the nominal $\alpha$ or (2) the exact $\alpha$ is the number closest to the nominal $\alpha$. Although most statisticians seem to prefer the first approach, as it is more consistent with classical test procedures for a composite $H_0$, this has not been universally agreed upon. As a result, two sets of tables of critical values of a test statistic may not be identical for the same nominal $\alpha$; this can lead to confusion in reading tables. The entries in each table in the Appendix of this book are constructed using the first approach for all critical values.

Disregarding that problem now, suppose we wish to compare the performance, as measured by power, of two different discrete test statistics. Their natural significance levels are unlikely to be the same, so identical nominal $\alpha$ values do not ensure identical exact probabilities of a type I error. Power is certainly affected by exact $\alpha$, and power comparisons of tests may be quite misleading without identical exact $\alpha$ values. A method of equalizing exact $\alpha$ values is provided by randomized test procedures. A randomized decision rule is one which prescribes rejection of $H_0$ always for a certain range of values of the test statistic, rejection sometimes for another nonoverlapping range, and acceptance otherwise. A typical rejection region of exact size $\alpha$ might be written $T \in R$ with probability 1 if $T \ge t_2$, and with probability $p$ if $t_1 \le T < t_2$, where $t_1 < t_2$ and $0 < p < 1$ are chosen such that
$$P(T \ge t_2 \mid H_0) + p\,P(t_1 \le T < t_2 \mid H_0) = \alpha$$
Some random device could be used to make the decision in practice, like drawing one card at random from 100, of which $100p$ are labeled reject.
Such decision rules may seem an artificial device and are probably seldom employed by experimenters, but the technique is useful in discussions of theoretical properties of tests. The power of such a randomized test against an alternative $H_1$ is
$$Pw(\theta) = P(T \ge t_2 \mid H_1) + p\,P(t_1 \le T < t_2 \mid H_1)$$
A simple example will suffice to explain the procedure. A random sample of size 5 is drawn from the Bernoulli population. We wish to test $H_0: \theta = 0.5$ versus $H_1: \theta > 0.5$ at significance level 0.05. The test statistic is $X$, the number of successes in the sample, which has the
INTRODUCTION AND FUNDAMENTALS
binomial distribution with parameter $\theta$ and $n = 5$. A reasonable rejection region would be large values of $X$, and thus the six exact significance levels obtainable without using a randomized test from Table C of the Appendix are:

c                           5      4      3       2       1       0
P(X >= c | theta = 0.5)     1/32   6/32   16/32   26/32   31/32   1

A nonrandomized test procedure of nominal size 0.05 but exact size $\alpha = 1/32 = 0.03125$ has rejection region $X \in R$ for $X = 5$.
The randomized test with exact $\alpha = 0.05$ is found with $t_1 = 4$ and $t_2 = 5$ as follows:
$$P(X \ge 5 \mid \theta = 0.5) + p\,P(4 \le X < 5 \mid \theta = 0.5) = 1/32 + p\,P(X = 4 \mid \theta = 0.5) = 0.05$$
so, $1/32 + 5p/32 = 0.05$ and $p = 0.12$. Thus the rejection region is $X \in R$ with probability 1 if $X = 5$ and with probability 0.12 if $X = 4$. Using Table C, the power of this randomized test when $H_1: \theta = 0.6$ is
$$Pw(0.6) = P(X = 5 \mid \theta = 0.6) + 0.12\,P(X = 4 \mid \theta = 0.6) = 0.0778 + 0.12(0.2592) = 0.1089$$

CONTINUITY CORRECTION
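The randomization probability and the power in this example can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names are illustrative choices, not part of the text, and the rule coded is the one described above (reject always above $t_1$, reject with probability $p$ at $t_1$).

```python
from math import comb


def binom_pmf(k, n, theta):
    """P(X = k) for X ~ binomial(n, theta)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def randomization_probability(n, theta0, t1, alpha):
    """Solve P(X > t1 | theta0) + p * P(X = t1 | theta0) = alpha for p."""
    tail = sum(binom_pmf(k, n, theta0) for k in range(t1 + 1, n + 1))
    return (alpha - tail) / binom_pmf(t1, n, theta0)


def randomized_power(n, theta, t1, p):
    """Power of the rule: reject if X > t1, reject with probability p if X = t1."""
    tail = sum(binom_pmf(k, n, theta) for k in range(t1 + 1, n + 1))
    return tail + p * binom_pmf(t1, n, theta)


# Bernoulli example: n = 5, H0: theta = 0.5, alpha = 0.05, t1 = 4, t2 = 5
p = randomization_probability(5, 0.5, 4, 0.05)
power = randomized_power(5, 0.6, 4, p)
print(round(p, 2), round(power, 4))  # 0.12 0.1089
```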
The exact null distribution of most test statistics used in nonparametric inference is discrete. Tables of rejection regions or cumulative distributions are often available for small sample sizes only. However, in many cases some simple approximation to these null distributions is accurate enough for practical applications with moderate-sized samples. When these asymptotic distributions are continuous (like the normal or chi square), the approximation may be improved by introducing a correction for continuity. This is accomplished by regarding the value of the discrete test statistic as the midpoint of an interval. For example, if the domain of a test statistic $T$ is only integer values, the observed value $t$ is considered to be $t \pm 0.5$. If the decision rule is to reject for $T \ge t_{\alpha/2}$ or $T \le t'_{\alpha/2}$ and the large-sample approximation to the distribution of
$$\frac{T - E(T \mid H_0)}{\sigma(T \mid H_0)}$$
is the standard normal under $H_0$, the rejection region with continuity correction incorporated is determined by solving the equations
$$\frac{t_{\alpha/2} - 0.5 - E(T \mid H_0)}{\sigma(T \mid H_0)} = z_{\alpha/2} \qquad \text{and} \qquad \frac{t'_{\alpha/2} + 0.5 - E(T \mid H_0)}{\sigma(T \mid H_0)} = -z_{\alpha/2}$$
where $z_{\alpha/2}$ satisfies $\Phi(z_{\alpha/2}) = 1 - \alpha/2$. Thus the continuity-corrected, two-sided, approximately size $\alpha$ rejection region is
$$T \ge E(T \mid H_0) + 0.5 + z_{\alpha/2}\,\sigma(T \mid H_0) \qquad \text{or} \qquad T \le E(T \mid H_0) - 0.5 - z_{\alpha/2}\,\sigma(T \mid H_0)$$
One-sided rejection regions or critical ratios employing continuity corrections are found similarly. For example, in a one-sided test with rejection region $T \ge t_\alpha$, for a nominal size $\alpha$, the approximation to the rejection region with a continuity correction is determined by solving for $t_\alpha$ in
$$\frac{t_\alpha - 0.5 - E(T \mid H_0)}{\sigma(T \mid H_0)} = z_\alpha$$
and thus the continuity-corrected, one-sided upper-tailed, approximately size $\alpha$ rejection region is
$$T \ge E(T \mid H_0) + 0.5 + z_\alpha\,\sigma(T \mid H_0)$$
Similarly, the continuity-corrected, one-sided lower-tailed, approximately size $\alpha$ rejection region is
$$T \le E(T \mid H_0) - 0.5 - z_\alpha\,\sigma(T \mid H_0)$$
The P value for a one-sided test based on a statistic whose null distribution is discrete is often approximated by a continuous distribution, typically the normal, for large sample sizes. Like the rejection regions above, this approximation to the P value can usually be improved by incorporating a correction for continuity. For example, if the alternative is in the upper tail, and the observed value of an integer-valued test statistic $T$ is $t_0$, the exact P value $P(T \ge t_0 \mid H_0)$ is
approximated by $P(T \ge t_0 - 0.5 \mid H_0)$. In the Bernoulli case with $n = 20$, $H_0: \theta = 0.5$ versus $H_1: \theta > 0.5$, suppose we observe $X = 13$ successes. The normal approximation to the P value with a continuity correction is
$$P(X \ge 13 \mid H_0) = P(X > 12.5 \mid H_0) = P\left(\frac{X - 10}{\sqrt{5}} > \frac{12.5 - 10}{\sqrt{5}}\right) \approx P(Z > 1.12) = 1 - \Phi(1.12) = 0.1314$$
This approximation is very close to the exact P value of 0.1316 from Table C. The approximate P value without the continuity correction is 0.0901, and thus the continuity correction greatly improves the P value approximation. In general, let $t_0$ be the observed value of the test statistic $T$ whose null distribution can be approximated by the normal distribution. When the alternative is in the upper tail, the approximate P value with a continuity correction is given by
$$1 - \Phi\left(\frac{t_0 - E(T \mid H_0) - 0.5}{\sigma(T \mid H_0)}\right)$$
In the lower tail, the continuity-corrected approximate P value is given by
$$\Phi\left(\frac{t_0 - E(T \mid H_0) + 0.5}{\sigma(T \mid H_0)}\right)$$
When the alternative is two-sided, the continuity-corrected approximate P value can be obtained using these two expressions and applying the recommendations given earlier under P value.
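The Bernoulli illustration can be reproduced with a short script. This is a sketch under stated assumptions: the helper names are hypothetical, $\Phi$ is evaluated through `math.erf`, and $z$ is not rounded to two decimals, so the approximate values differ slightly in the fourth decimal from the table-based figures quoted above.

```python
from math import comb, erf, sqrt


def phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def upper_tail_p_normal(t0, mean, sd, continuity=True):
    """Normal approximation to P(T >= t0 | H0), optionally continuity corrected."""
    shift = 0.5 if continuity else 0.0
    return 1.0 - phi((t0 - mean - shift) / sd)


n, theta0, x_obs = 20, 0.5, 13
mean = n * theta0                       # E(X | H0) = 10
sd = sqrt(n * theta0 * (1 - theta0))    # sigma(X | H0) = sqrt(5)

# Exact binomial upper-tail probability under theta0 = 0.5
exact = sum(comb(n, k) for k in range(x_obs, n + 1)) / 2**n
corrected = upper_tail_p_normal(x_obs, mean, sd, continuity=True)
uncorrected = upper_tail_p_normal(x_obs, mean, sd, continuity=False)

print(f"exact={exact:.4f} corrected={corrected:.4f} uncorrected={uncorrected:.4f}")
```

The corrected value lies much closer to the exact tail probability than the uncorrected one, which is the point of the example.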
2 Order Statistics, Quantiles, and Coverages
2.1 INTRODUCTION
Let $X_1, X_2, \ldots, X_n$ denote a random sample from a population with continuous cdf $F_X$. Because $F_X is$ continuous, the probability is zero that any two or more of these random variables have equal magnitudes. In this situation there exists a unique ordered arrangement within the sample. Suppose that $X_{(1)}$ denotes the smallest of the set $X_1, X_2, \ldots, X_n$; $X_{(2)}$ denotes the second smallest; ... and $X_{(n)}$ denotes the largest. Then $X_{(1)} < X_{(2)} < \cdots < X_{(n)}$ denotes the original random sample after arrangement in increasing order of magnitude, and these are collectively termed the order statistics of the random sample $X_1, X_2, \ldots, X_n$. The $r$th smallest, $1 \le r \le n$, of the ordered $X$'s, $X_{(r)}$, is called the $r$th-order statistic. Some familiar applications of order statistics, which are obvious on reflection, are as follows:

1. $X_{(n)}$, the maximum (largest) value in the sample, is of interest in the study of floods and other extreme meteorological phenomena.
2. $X_{(1)}$, the minimum (smallest) value, is useful for phenomena where, for example, the strength of a chain depends on the weakest link.
3. The sample median, defined as $X_{[(n+1)/2]}$ for $n$ odd and any number between $X_{(n/2)}$ and $X_{(n/2+1)}$ for $n$ even, is a measure of location and an estimate of the population central tendency.
4. The sample midrange, defined as $(X_{(1)} + X_{(n)})/2$, is also a measure of central tendency.
5. The sample range $X_{(n)} - X_{(1)}$ is a measure of dispersion.
6. In some experiments, the sampling process ceases after collecting $r$ of the observations. For example, in life-testing electric light bulbs, one may start with a group of $n$ bulbs but stop taking observations after the $r$th bulb burns out. Then information is available only on the first $r$ ordered "lifetimes" $X_{(1)} < X_{(2)} < \cdots < X_{(r)}$, where $r \le n$. This type of data is often referred to as censored data.
7. Order statistics are used to study outliers or extreme observations, e.g., when so-called dirty data are suspected.
The study of order statistics in this chapter will be limited to their mathematical and statistical properties, including joint and marginal probability distributions, exact moments, asymptotic moments, and asymptotic marginal distributions. Two general uses of order statistics in distribution-free inference will be discussed later in Chapter 5, namely, interval estimation and hypothesis testing of population percentiles. The topic of tolerance limits for distributions, including both one-sample and two-sample coverages, is discussed later in this chapter. But first, we must define another property of probability functions called the quantile function.

2.2 THE QUANTILE FUNCTION
We have already talked about using the mean, the variance, and other moments to describe a probability distribution. In some situations we may be more interested in the percentiles of a distribution, like the ﬁftieth percentile (the median). For example, if X represents the breaking strength of an item, we might be interested in knowing
the median strength, or the strength that is survived by 60 percent of the items, i.e., the fortieth percentile point. Or we may want to know what percentage of the items will survive a pressure of say 3 lb. For questions like these, we need information about the quantiles of a distribution.

A quantile of a continuous cdf $F_X$ of a random variable $X$ is a real number that divides the area under the pdf into two parts of specific amounts. Only the area to the left of the number need be specified since the entire area is equal to one. The $p$th quantile (or the $100p$th percentile) of $F_X$ is that value of $X$, say $X_p$, such that $100p$ percent of the values of $X$ in the population are less than or equal to $X_p$, for any positive fraction $p$ $(0 < p < 1)$. In other words, $X_p$ is a parameter of the population that satisfies $P(X \le X_p) = p$, or, in terms of the cdf, $F_X(X_p) = p$. If the cdf of $X$ is strictly increasing, the $p$th quantile is the unique solution to the equation $X_p = F_X^{-1}(p) = Q_X(p)$, say. We call $Q_X(p)$, $0 < p < 1$, the inverse of the cdf, the quantile function (qf) of the random variable $X$.

Consider, for example, a random variable from the exponential distribution with $\beta = 2$. Then Table 2.1 in Chapter 1 indicates that the cdf is
$$F_X(x) = \begin{cases} 0 & x < 0 \\ 1 - e^{-x/2} & x \ge 0 \end{cases}$$
Since $1 - e^{-X_p/2} = p$ for $x > 0$, the inverse is $X_p = -2\ln(1 - p)$ for $0 < p < 1$, and hence the quantile function is $Q_X(p) = -2\ln(1 - p)$. The cdf and the quantile function for this exponential distribution are shown in Figures 2.1 and 2.2, respectively.

Suppose the distribution of the breaking strength random variable $X$ is this exponential with $\beta = 2$. The reader can verify that the fiftieth percentile $Q_X(0.5)$ is 1.3863, and the fortieth percentile $Q_X(0.4)$ is 1.0217. The proportion that exceeds a breaking strength of 3 pounds is 0.2231.

In general, we define the $p$th quantile $Q_X(p)$ as the smallest $X$ value at which the cdf is at least equal to $p$, or
$$Q_X(p) = F_X^{-1}(p) = \inf[x : F_X(x) \ge p]$$
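The three numerical values quoted for the exponential breaking-strength example follow directly from the cdf and its inverse. A minimal sketch, with illustrative function names and the scale parameter $\beta = 2$ passed as a default argument:

```python
from math import exp, log


def exp_cdf(x, beta=2.0):
    """cdf of the exponential distribution with scale beta."""
    return 1.0 - exp(-x / beta) if x >= 0 else 0.0


def exp_quantile(p, beta=2.0):
    """Quantile function Q(p) = -beta * ln(1 - p), defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -beta * log(1.0 - p)


median = exp_quantile(0.5)       # fiftieth percentile, about 1.3863
q40 = exp_quantile(0.4)          # fortieth percentile, about 1.0217
survive_3 = 1.0 - exp_cdf(3.0)   # proportion exceeding 3 lb, about 0.2231
print(round(median, 4), round(q40, 4), round(survive_3, 4))
```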
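$$S_n(x) = \begin{cases} 0 & \text{if } x < X_{(1)} \\ i/n & \text{if } X_{(i)} \le x < X_{(i+1)},\ i = 1, 2, \ldots, n-1 \\ 1 & \text{if } x \ge X_{(n)} \end{cases} \qquad (3.1)$$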
Suppose that a random sample of size $n = 5$ is given by 9.4, 11.2, 11.4, 12, and 12.6. The edf of this sample is shown in Figure 3.1. Clearly, $S_n(x)$ is a step (or a jump) function, with jumps occurring at the (distinct) ordered sample values, where the height of each jump is equal to the reciprocal of the sample size, namely 1/5 or 0.2. When more than one observation has the same value, we say these observations are tied. In this case the edf is still a step function but it jumps only at the distinct ordered sample values $X_{(j)}$ and the height of the jump is equal to $k/n$, where $k$ is the number of data values tied at $X_{(j)}$. We now discuss some of the statistical properties of the edf $S_n(x)$. Let $T_n(x) = nS_n(x)$, so that $T_n(x)$ represents the total number of sample values that are less than or equal to the specified value $x$.
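The step-function behavior of the edf, including the jump of height $k/n$ at tied values, can be seen in a small sketch. The function name `edf` is an illustrative choice; `bisect_right` counts the sample values less than or equal to $x$, which is $T_n(x)$.

```python
from bisect import bisect_right


def edf(sample):
    """Return S_n as a function: the proportion of sample values <= x."""
    xs = sorted(sample)
    n = len(xs)

    def s_n(x):
        # bisect_right gives T_n(x), the number of observations <= x
        return bisect_right(xs, x) / n

    return s_n


s5 = edf([9.4, 11.2, 11.4, 12, 12.6])
print(s5(9.0), s5(9.4), s5(11.3), s5(12.6))  # 0.0 0.2 0.4 1.0

# With ties, the jump at the tied value is k/n (here k = 2, n = 4)
s_tied = edf([1, 2, 2, 3])
print(s_tied(1.5), s_tied(2))  # 0.25 0.75
```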
Fig. 3.1 An empirical distribution function for $n = 5$.

Theorem 3.1 For any fixed real value $x$, the random variable $T_n(x)$ has a binomial distribution with parameters $n$ and $F_X(x)$.

Proof For any fixed real constant $x$ and $i = 1, 2, \ldots, n$, define the indicator random variable
$$\delta_i(x) = I[X_i \le x] = \begin{cases} 1 & \text{if } X_i \le x \\ 0 & \text{if } X_i > x \end{cases}$$
The random variables $\delta_1(x), \delta_2(x), \ldots, \delta_n(x)$ are independent and identically distributed, each with the Bernoulli distribution with parameter $\theta$, where $\theta = P[\delta_i(x) = 1] = P(X_i \le x) = F_X(x)$. Now, since $T_n(x) = \sum_{i=1}^{n} \delta_i(x)$ is the sum of $n$ independent and identically distributed Bernoulli random variables, it can be easily shown that $T_n(x)$ has a binomial distribution with parameters $n$ and $\theta = F_X(x)$.

From Theorem 3.1, and using properties of the binomial distribution, we get the following results. The proofs are left for the reader.
Corollary 3.1.1 The mean and the variance of $S_n(x)$ are
(a) $E[S_n(x)] = F_X(x)$
(b) $\mathrm{Var}[S_n(x)] = F_X(x)[1 - F_X(x)]/n$

Part (a) of the corollary shows that $S_n(x)$, the proportion of sample values less than or equal to the specified value $x$, is an unbiased estimator of $F_X(x)$. Part (b) shows that the variance of $S_n(x)$ tends to zero as $n$ tends to infinity. Thus, using Chebyshev's inequality, we can show that $S_n(x)$ is a consistent estimator of $F_X(x)$.

Corollary 3.1.2 For any fixed real value $x$, $S_n(x)$ is a consistent estimator of $F_X(x)$, or, in other words, $S_n(x)$ converges to $F_X(x)$ in probability.

Corollary 3.1.3 $E[T_n(x)T_n(y)] = n(n-1)F_X(x)F_X(y) + nF_X(x)$, for $x < y$.
The convergence in Corollary 3.1.2 is for each value of $x$ individually, whereas sometimes we are interested in all values of $x$ collectively. A probability statement can be made simultaneously for all $x$ as a result of the following classical theorem [see Fisz (1963), for example, for a proof].

Theorem 3.2 (Glivenko-Cantelli Theorem) $S_n(x)$ converges uniformly to $F_X(x)$ with probability 1, that is,
$$P\left[\lim_{n \to \infty} \sup_{-\infty < x < \infty} |S_n(x) - F_X(x)| = 0\right] = 1$$

The empirical quantile function $Q_n(u)$, $0 < u \le 1$, is
$$Q_n(u) = \begin{cases} X_{(1)} & \text{if } 0 < u \le \dfrac{1}{n} \\ X_{(2)} & \text{if } \dfrac{1}{n} < u \le \dfrac{2}{n} \\ X_{(3)} & \text{if } \dfrac{2}{n} < u \le \dfrac{3}{n} \\ \;\vdots & \\ X_{(n)} & \text{if } \dfrac{n-1}{n} < u \le 1 \end{cases}$$
Thus $Q_n(u) = \inf[x : S_n(x) \ge u]$. Accordingly, the empirical (or the sample) quantiles are just the ordered values in a sample. For example, if $n = 10$, the estimate of the 0.30th quantile or the 30th percentile is simply $Q_{10}(0.3) = X_{(3)}$, since $2/10 < 0.3 \le 3/10$. This is consistent with the usual definition of a quantile or a percentile since 30 percent of the data values are less than or equal to the third order statistic in a sample of size 10. However, note that according to the definition, the 0.25th quantile or the 25th percentile (or the 1st quartile) is also equal to $X_{(3)}$ since $2/10 < 0.25 \le 3/10$. Thus the sample order statistics are point estimates of the corresponding population quantiles. For this reason, a study of the properties of order statistics is as important in nonparametric analysis as the study of the properties of the sample mean in the context of a parametric analysis.

2.4 STATISTICAL PROPERTIES OF ORDER STATISTICS
As we have outlined, the order statistics have many useful applications. In this section we derive some of their statistical properties.

CUMULATIVE DISTRIBUTION FUNCTION (CDF) OF $X_{(r)}$

Theorem 4.1 For any fixed real $t$
$$P(X_{(r)} \le t) = \sum_{i=r}^{n} P[nS_n(t) = i] = \sum_{i=r}^{n} \binom{n}{i} [F_X(t)]^i [1 - F_X(t)]^{n-i}$$
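The binomial sum in Theorem 4.1 is easy to evaluate once the value $F_X(t)$ is known. The sketch below takes that value as an input (the name `F_t` is a hypothetical argument, not notation from the text) and checks the two familiar special cases, the maximum and the minimum.

```python
from math import comb


def order_stat_cdf(r, n, F_t):
    """P(X_(r) <= t) given F_t = F_X(t), as a binomial upper-tail sum."""
    return sum(comb(n, i) * F_t**i * (1 - F_t) ** (n - i) for i in range(r, n + 1))


F_t = 0.3
# r = n: the maximum is <= t only if all n observations are <= t
print(order_stat_cdf(5, 5, F_t), F_t**5)
# r = 1: the minimum is <= t unless all n observations exceed t
print(order_stat_cdf(1, 5, F_t), 1 - (1 - F_t) ** 5)
# Sample median of n = 5 evaluated at the population median
print(order_stat_cdf(3, 5, 0.5))  # 0.5
```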
[...] $(2u - 1)^{m-k-1}(1 - u)^{k+m}$ if $1/2 < u < 1$

Verification of these results is left for the reader.

DISTRIBUTION OF THE RANGE
A similar procedure can be used to obtain the distribution of the range, defined as
$$R = X_{(n)} - X_{(1)}$$
The joint pdf of $X_{(1)}$ and $X_{(n)}$ is
$$f_{X_{(1)}, X_{(n)}}(x, y) = n(n-1)[F_X(y) - F_X(x)]^{n-2} f_X(x) f_X(y) \qquad x < y$$
ð7:3Þ
For the uniform distribution, the integrand in (7.3) is nonzero for the intersection of the regions 0 0
The right-hand side is the cdf of a chi-square distribution with 2 degrees of freedom. As a numerical example of how this corollary enables us to approximate $D^+_{n,\alpha}$, let $\alpha = 0.05$. Table B of the Appendix shows that 5.99 is the 0.05 critical point of chi square with 2 degrees of freedom. The procedure is to set $4n(D^+_{n,0.05})^2 = 5.99$ and solve to obtain
$$D^+_{n,0.05} = \sqrt{1.4975/n} = 1.22/\sqrt{n}$$
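The same calculation can be written as a one-line function of $n$. This is a sketch only: the function name is illustrative, and the default critical value 5.99 is the tabled chi-square point quoted above.

```python
from math import sqrt


def dplus_critical_approx(n, chi2_crit=5.99):
    """Large-sample approximation: solve 4 * n * D**2 = chi2_crit for D."""
    return sqrt(chi2_crit / (4.0 * n))


for n in (25, 100, 400):
    print(n, round(dplus_critical_approx(n), 4))
```

For $n = 100$ this gives about 0.122, in agreement with $1.22/\sqrt{n}$.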
4.4 APPLICATIONS OF THE KOLMOGOROV-SMIRNOV ONE-SAMPLE STATISTICS

The statistical use of the Kolmogorov-Smirnov statistic in a goodness-of-fit type of problem is obvious. Assume we have the random sample $X_1, X_2, \ldots, X_n$ and the hypothesis-testing situation $H_0: F_X(x) = F_0(x)$ for all $x$, where $F_0(x)$ is a completely specified continuous distribution function. Since $S_n(x)$ is the statistical image of the population distribution $F_X(x)$, the differences between $S_n(x)$ and $F_0(x)$ should be small for all $x$ except for sampling variation, if the null hypothesis is true. For the usual two-sided goodness-of-fit alternative
$$H_1: F_X(x) \ne F_0(x) \quad \text{for some } x$$
large absolute values of these deviations tend to discredit the hypothesis. Therefore, the Kolmogorov-Smirnov goodness-of-fit test with significance level $\alpha$ is to reject $H_0$ when $D_n > D_{n,\alpha}$. From the Glivenko-Cantelli theorem of Chapter 2, we know that $S_n(x)$ converges to $F_X(x)$ with probability 1, which implies consistency.
Fig. 4.1 SAS Interactive Data Analysis for Example 4.1.
CHAPTER 4
Fig. 4.2 SAS Interactive Data Analysis output for confidence band for Example 4.1.
TESTS OF GOODNESS OF FIT
Fig. 4.3 SAS Interactive Data Analysis Confidence Bands output for Example 5.1.
9,800   10,200   9,300    8,700    15,200   6,900
8,600   9,600    12,200   15,500   11,600   7,200
Solution Since the mean and variance are not specified, the most appropriate test is the Lilliefors's test. The first step is to calculate $\bar{x}$ and $s$. From the data we get $\sum x = 124{,}800$ and $\sum (x - \bar{x})^2 = 84{,}600{,}000$ so that $\bar{x} = 10{,}400$ and $s = \sqrt{84{,}600{,}000/11} = 2{,}773.25$. The corresponding standard normal variable is then $z = (x - 10{,}400)/2{,}773$. The calculations needed for $D_n$ are shown in Table 5.1 (p. 132). We find $D_n = 0.1946$ and $P > 0.10$ for $n = 12$ from Table O. Thus, the null hypothesis that incomes are normally distributed is not rejected. The following computer printouts illustrate the solution to Example 5.1 using the SAS and MINITAB packages.
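The hand calculation of $\bar{x}$, $s$, and $D_n$ can be checked with a short script. This is a sketch under stated assumptions: the function names are illustrative, $\Phi$ is evaluated through `math.erf`, and $z$ is not rounded to two decimals, so $D_n$ comes out near 0.195 rather than exactly the tabled 0.1946. The P value still has to be read from a Lilliefors table such as Table O.

```python
from math import erf, sqrt

incomes = [9800, 10200, 9300, 8700, 15200, 6900,
           8600, 9600, 12200, 15500, 11600, 7200]


def phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def lilliefors_statistic(data):
    """Return (mean, s, D_n), with the normal parameters estimated from data."""
    xs = sorted(data)
    n = len(xs)
    mean = sum(xs) / n
    s = sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = phi((x - mean) / s)
        # compare F0 with the edf at x and just before the jump at x
        d = max(d, abs(i / n - f0), abs((i - 1) / n - f0))
    return mean, s, d


mean, s, d_n = lilliefors_statistic(incomes)
print(mean, round(s, 2), round(d_n, 3))
```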
Note that both the MINITAB and SAS outputs refer to this as the KS test and not the Lilliefors's test. Both calculate a modified KS statistic using formulas given in D'Agostino and Stephens (1986); the results agree with ours to two decimal places. SAS also provides the results for two other tests, called the Anderson-Darling and the Cramér-von Mises tests (see Problem 4.14). In this particular example, each of the tests fails to reject the null hypothesis of normality. In MINITAB one can go from Stat to Basic Statistics to Normality Test and the output is a graph, called Normal Probability Plot. The grid on
Fig. 5.1 SAS/Interactive Data Analysis goodness of fit for Example 5.1.
Fig. 5.2 SAS/Interactive Data Analysis goodness of fit for Example 5.1, continued.
Fig. 5.3 SAS/Interactive Data Analysis of goodness of fit for Example 5.1, continued.
Fig. 5.4 SAS/Interactive Data Analysis of goodness of fit for Example 5.1, continued.
Fig. 5.5 SAS/Interactive Data Analysis of goodness of fit for Example 5.1, continued.
Fig. 5.6 SAS/Analyst QQ plot for Example 5.1.
printout) is shown to be greater than 0.15. Thus, as with the hand calculations, we reach the same conclusion that there is not sufficient evidence to reject the null hypothesis of an exponential distribution. SAS uses internal tables that are similar to those given by D'Agostino and Stephens (1986) to calculate the P value. Linear interpolation is used in this table if necessary. SAS provides the values of two other test statistics, called the Cramér-von Mises and Anderson-Darling tests; each fails to reject the null hypothesis and the P values are about the same.

We now redo Example 6.1 for the null hypothesis of the exponential distribution with mean specified as $\mu = 5.0$. This is a simple null hypothesis for which the original KS test of Section 4.5 is applicable. The calculations are shown in Table 6.2 (p. 142). The KS test statistic is $D_n = 0.2687$ with $n = 10$, and we do not reject the null hypothesis since Table F gives $P > 0.200$. The SAS solution in this case is shown below. Each of the tests fails to reject the null hypothesis and the P values are about the same.
Table 6.2 Calculations for the KS test with $\mu = 5.0$ for the data in Example 6.1

x      z = x/mu   S_n(x)   F_0(z)   |S_n(x) - F_0(z)|   |S_n(x - e) - F_0(z)|
1.5    0.30       0.1      0.2592   0.1592              0.2592
2.3    0.46       0.2      0.3687   0.1687              0.2687
2.5    0.50       0.3      0.3935   0.0935              0.1935
4.2    0.84       0.4      0.5683   0.1683              0.2683
4.6    0.92       0.5      0.6015   0.1015              0.2015
6.5    1.30       0.6      0.7275   0.1275              0.2275
7.1    1.42       0.7      0.7583   0.0583              0.1583
8.4    1.68       0.8      0.8136   0.0136              0.1136
9.3    1.86       0.9      0.8443   0.0557              0.0443
10.4   2.08       1.0      0.8751   0.1249              0.0249
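The last two columns of Table 6.2 compare the hypothesized cdf with the edf at each ordered value and just before it; $D_n$ is the largest entry in either column. A minimal sketch of that calculation follows, with an illustrative function name and the hypothesized cdf passed in as a callable.

```python
from math import exp

data = [1.5, 2.3, 2.5, 4.2, 4.6, 6.5, 7.1, 8.4, 9.3, 10.4]


def ks_statistic(sample, cdf):
    """Two-sided one-sample KS statistic for a completely specified cdf."""
    xs = sorted(sample)
    n = len(xs)
    return max(
        max(abs(i / n - cdf(x)), abs((i - 1) / n - cdf(x)))
        for i, x in enumerate(xs, start=1)
    )


# Exponential null distribution with mean 5.0
d_n = ks_statistic(data, lambda x: 1.0 - exp(-x / 5.0))
print(round(d_n, 4))  # 0.2687
```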
Finally suppose that we want to test the hypothesis that the population is exponential with mean $\mu = 3.0$. Again, this is a simple null hypothesis for which SAS provides three tests mentioned earlier and all of them reject the null hypothesis. However, note the difference in the magnitudes of the P values between the KS test and the other two tests.

Currently MINITAB does not provide a direct goodness-of-fit test for the exponential distribution but it does provide some options under a general visual approach. This is called probability plotting and is discussed in the next section.

4.7 VISUAL ANALYSIS OF GOODNESS OF FIT

With the advent of easily available computer technology, visual approaches to statistical data analysis have become popular. The subject is sometimes referred to as exploratory data analysis (EDA), championed by statisticians like John Tukey. In the context of goodness-of-fit tests, the EDA tools employed include dot plots, histograms, probability plots, and quantile plots. The idea is to use some graphics to gain a quick insight into the underlying distribution and then, if desired, carry out a follow-up analysis with a formal confirmatory test such as any of the tests covered earlier in this chapter. Dot plots and histograms are valuable exploratory tools and are discussed in almost all statistics books but the subject of probability and quantile plots is seldom covered, even though one of the key papers on the subject was published in the 1960s [Wilk and Gnanadesikan (1968)]. In this section we will present a brief discussion of these two topics. Fisher (1983) provided a good review of many graphical methods used in nonparametric statistics along with extensive references. Note that there are two-sample versions of each of these plots but we do not cover that topic here.
In what follows we distinguish between the theoretical and the empirical versions of a plot. The theoretical version is presented to understand the idea but the empirical version is the one that is implemented in practice. When there is no chance of confusion, the empirical plot is referred to as simply the plot.

Two types of plots are popular in practice. The first is the so-called probability plot, which is actually a probability versus probability plot, or a P-P plot. This plot is also called a percent-percent plot, for obvious reasons. In general terms, the theoretical P-P plot is the graph of a cdf $F(x)$ versus a cdf $G(x)$ for all values of $x$. Since the cdf's are probabilities, the P-P plot is conveniently confined to the unit square. If the two cdf's are identical, the theoretical P-P plot will be the main diagonal, the 45-degree line through the origin.

The second type of plot is the so-called quantile plot, which is actually a quantile versus quantile plot, or a Q-Q plot. The theoretical Q-Q plot is the graph of the quantiles of a cdf $F$ versus the corresponding quantiles of a cdf $G$, that is, the graph $[F^{-1}(p), G^{-1}(p)]$ for $0 < p < 1$. If the two cdf's are identical, the theoretical Q-Q plot will be the main diagonal, the 45-degree line through the origin. If $F(x) = G\left(\frac{x - \mu}{\sigma}\right)$, it is easily seen that $F^{-1}(p) = \mu + \sigma G^{-1}(p)$, so that the $p$th quantiles of $F$ and $G$ have a linear relationship. Thus, if two distributions differ only in location and/or scale, the theoretical Q-Q plot will be a straight line with slope $\sigma$ and intercept $\mu$.

In a goodness-of-fit problem, there is usually a specified target cdf, say $F_0$. Then the theoretical Q-Q plot is the plot $[F_0^{-1}(p), F_X^{-1}(p)]$, $0 < p < 1$. Since $F_X$ is unknown, we can estimate it with the empirical cdf based on a random sample of size $n$, say $S_n$. Noting that the function $S_n$ jumps only at the ordered values $X_{(i)}$, the empirical Q-Q plot is simply the plot of $F_0^{-1}(i/n)$ on the horizontal axis versus $S_n^{-1}(i/n) = X_{(i)}$ on the vertical axis, for $i = 1, 2, \ldots, n$. As noted before, $F_0$ is usually taken to be the standardized form of the hypothesized cdf, so that to establish the Q-Q plot, the underlying (location and/or scale) parameters do not need to be specified. This is one advantage of the Q-Q plot. The quantities $a_i = i/n$ are called plotting positions. At $i = n$, there is a problem since $F_0^{-1}(a_n) = F_0^{-1}(1) = \infty$; modified plotting positions have been considered, with various objectives. One simple choice is $a_i = (i - 0.5)/n$; other choices include $a_i = i/(n + 1)$ and $a_i = (i - 0.375)/(n + 0.25)$, the latter being highly recommended by Blom (1958). We found that many statistical software packages graph $\left[F_0^{-1}\left((i - 0.375)/(n + 0.25)\right), X_{(i)}\right]$ as the empirical Q-Q plot. For a given standardized cdf $F_0$, the goodness-of-fit null hypothesis $F_X = F_0$ is not rejected if this plot is approximately a straight line through
the origin. Departures from this line suggest the types of differences that could exist between $F_X$ and $F_0$. For example, if the plot resembles a straight line but with a nonzero intercept or with a slope other than 1 (that is, not at 45 degrees), a location-scale model is indicated. This means $F_X$ belongs to the specified family of distributions but the location and the scale parameters of $F_X$, namely $\mu$ and $\sigma$, are different from the standard values. When the empirical Q-Q plot is reasonably linear, the slope and the intercept of the plot can be used to estimate the scale and location parameter, respectively. When $F_0$ is taken to be the standard normal distribution, the Q-Q plot is called a normal probability plot. When $F_0$ is taken to be the standard exponential distribution (mean = 1), the Q-Q plot is called an exponential probability plot.

In summary, either the empirical P-P or Q-Q plot can be used as an informal tool for the goodness-of-fit problem but the Q-Q plot is more popular. If the plots appear to be close to the 45-degree straight line through the origin, the null hypothesis $F_X = F_0$ is tentatively accepted. If the Q-Q plot is close to some other straight line, then $F_X$ is likely to be in the hypothesized location-scale family (as $F_0$) and the unknown parameters can be estimated from the plot. For example, if a straight line is fitted to the empirical Q-Q plot, the slope and the intercept of the line would estimate the unknown scale and the location parameter, respectively; then the estimated distribution is $\hat{F}_X(x) = F_0\left(\frac{x - \text{intercept}}{\text{slope}}\right)$. An advantage of the Q-Q plot is that the underlying parameters do not need to be specified since $F_0$ is usually taken to be the standard distribution in a family of distributions. By contrast, the construction of a P-P plot requires specification of the underlying parameters, so that the theoretical cdf can be evaluated at the ordered data values.

The P-P plot is more sensitive to the differences in the middle part of the two distributions (the data distribution and the hypothesized distribution), whereas the Q-Q plot is more sensitive to the differences in the tails of the two distributions. One potential issue with using plots in goodness-of-fit problems is that the interpretation of a plot, with respect to linearity or near linearity, is bound to be somewhat subjective. Usually a lot of experience is necessary to make the judgment with a reasonable degree of confidence. To make such an assessment more objective, several proposals have been made. One is based on the "correlation coefficient" between the x and y coordinates; see Ryan and Joiner (1976) for a test in the context of a normal probability plot. For more details, see D'Agostino and Stephens (1986, Chap. 5).
146
CHAPTER 4
Table 7.1 Calculations for normal and exponential Q-Q plots for the data in Example 6.1

 i   Ordered    Plotting position           Standard normal        Standard exponential
     data y     a_i = (i - 0.375)/10.25     quantile Phi^{-1}(a_i) quantile -ln(1 - a_i)
 1     1.5      0.060976                    -1.54664               0.062914
 2     2.3      0.158537                    -1.00049               0.172613
 3     2.5      0.256098                    -0.65542               0.295845
 4     4.2      0.353659                    -0.37546               0.436427
 5     4.6      0.451220                    -0.12258               0.600057
 6     4.5      0.548780                     0.12258               0.795801
 7     7.1      0.646341                     0.37546               1.039423
 8     8.4      0.743902                     0.65542               1.362197
 9     9.3      0.841463                     1.00049               1.841770
10    10.4      0.939024                     1.54664               2.797281
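The columns of Table 7.1 can be reproduced with a few lines of code. The following is a minimal sketch using only the Python standard library; the function names `plotting_positions` and `least_squares` are illustrative choices, not part of the text, and the data are copied in the order in which the table lists them.

```python
import math
from statistics import NormalDist

# Ordered data exactly as listed in Table 7.1.
y = [1.5, 2.3, 2.5, 4.2, 4.6, 4.5, 7.1, 8.4, 9.3, 10.4]
n = len(y)

def plotting_positions(n):
    """Plotting positions a_i = (i - 0.375) / (n + 0.25), i = 1, ..., n."""
    return [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]

def least_squares(x, y):
    """Slope and intercept of the least-squares line of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

a = plotting_positions(n)
normal_q = [NormalDist().inv_cdf(ai) for ai in a]  # standard normal quantiles
expon_q = [-math.log(1.0 - ai) for ai in a]        # standard exponential quantiles

for i, (yi, ai, zq, eq) in enumerate(zip(y, a, normal_q, expon_q), start=1):
    print(f"{i:2d} {yi:5.1f} {ai:.6f} {zq:9.5f} {eq:.6f}")

# Fitted lines: the slope estimates scale and the intercept estimates location.
slope_n, intercept_n = least_squares(normal_q, y)
slope_e, intercept_e = least_squares(expon_q, y)
print("normal Q-Q line:      slope", round(slope_n, 3), "intercept", round(intercept_n, 3))
print("exponential Q-Q line: slope", round(slope_e, 3), "intercept", round(intercept_e, 3))
```

The horizontal coordinate here is the theoretical quantile and the vertical coordinate is the data value, so that the fitted slope and intercept play the roles of scale and location estimates.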
Example 7.1 For the sample data given in Example 6.1, using $a_i = (i - 0.375)/(n + 0.25)$, the calculations for the normal and exponential Q-Q plots are shown in Table 7.1. The two Q-Q plots were drawn in Excel and are shown in Figures 7.1 and 7.2. In each case a least-squares line is fitted to the plot. The slope and the intercept of
Fig. 7.1 Normal Q-Q plot for Example 7.1.
4.9. Prove that
\[ D_n^- = \max_{1 \le i \le n} \max\left[ F_X(X_{(i)}) - \frac{i-1}{n},\; 0 \right] \]
4.10. Prove that the probability distribution of $D_n^-$ is identical to the distribution of $D_n^+$:
(a) Using a derivation analogous to Theorem 3.4
(b) Using a symmetry argument
4.11. Using Theorem 3.3, verify that
\[ \lim_{n \to \infty} P\left( D_n > 1.07/\sqrt{n} \right) = 0.20 \]
4.12. Find the minimum sample size n required such that $P(D_n < 0.05) \ge 0.99$.
4.13. Use Theorem 3.4 to verify directly that $P(D_5^+ > 0.447) = 0.10$. Calculate this same probability using the expression given in (3.5).
4.14. Related goodness-of-fit test. The Cramér-von Mises type of statistic is defined for continuous $F_X(x)$ by
\[ \omega_n^2 = \int_{-\infty}^{\infty} \left[ S_n(x) - F_X(x) \right]^2 f_X(x)\, dx \]
(a) Prove that $\omega_n^2$ is distribution free.
(b) Explain how $\omega_n^2$ might be used for a goodness-of-fit test.
(c) Show that
\[ n\omega_n^2 = \frac{1}{12n} + \sum_{i=1}^{n} \left[ F_X(X_{(i)}) - \frac{2i-1}{2n} \right]^2 \]
This statistic is discussed in Cramér (1928), von Mises (1931), Smirnov (1936), and Darling (1957).
4.15. Suppose we want to estimate the cumulative distribution function of a continuous population using the empirical distribution function such that the probability is 0.90 that the error of the estimate does not exceed 0.25 anywhere. How large a sample size is needed?
4.16. If we wish to estimate a cumulative distribution within 0.20 units with probability 0.95, how large should n be?
4.17. A random sample of size 13 is drawn from an unknown continuous population $F_X(x)$, with the following results after array:
3.5, 4.1, 4.8, 5.0, 6.3, 7.1, 7.2, 7.8, 8.1, 8.4, 8.6, 9.0
A 90% confidence band is desired for $F_X(x)$. Plot a graph of the empirical distribution function $S_n(x)$ and resulting confidence bands.
4.18. In a vibration study, a random sample of 15 airplane components were subjected to severe vibrations until they showed structural failures. The data given are failure times in minutes. Test the null hypothesis that these observations can be regarded as a sample from the exponential population with density function $f(x) = e^{-x/10}/10$ for $x \ge 0$.
Be brief but specific about which statistical procedure to use and why it is preferred and outline the steps in the procedure.
4.29. Compare and contrast the chi-square and Kolmogorov-Smirnov goodness-of-fit procedures.
4.30. For the data x: 1.0, 2.3, 4.2, 7.1, 10.4, use the most appropriate procedure to test the null hypothesis that the distribution is
(a) Exponential $F_0(x) = 1 - e^{-\lambda x}$ (estimate $\lambda$ by $1/\bar{x}$)
(b) Normal
In each part, carry the parameter estimates to the nearest hundredth and the distribution estimates to the nearest ten thousandth.
4.31. A statistics professor claims that the distribution of final grades from A to F in a particular course invariably is in the ratio 1:3:4:1:1. The final grades this year are 26 A's, 50 B's, 80 C's, 35 D's, and 10 F's. Do these results refute the professor's claim?
4.32. The design department has proposed three different package designs for the company's product; the marketing manager claims that the first design will be twice as popular as the second design and that the second design will be three times as popular as the third design. In a market test with 213 persons, 111 preferred the first design, 62 preferred the second design, and the remainder preferred the third design. Are these results consistent with the marketing manager's claim?
4.33. A quality control engineer has taken 50 samples, each of size 13, from a production process. The numbers of defectives are recorded below.

Number of defects    Sample frequency
0                     9
1                    26
2                     9
3                     4
4                     1
5                     1
6 or more             0

(a) Test the null hypothesis that the number of defectives follows a Poisson distribution.
(b) Test the null hypothesis that the number of defectives follows a binomial distribution.
(c) Comment on your answers in (a) and (b).
4.34. Ten students take a test and their scores (out of 100) are as follows:
95, 80, 40, 52, 60, 80, 82, 58, 65, 50
Test the null hypothesis that the cumulative distribution function of the proportion of right answers a student gets on the test is ( 0 x1
5 One-Sample and Paired-Sample Procedures
5.1 INTRODUCTION
In the general one-sample problem, the available data consist of a single set of observations, usually a random sample, from a cdf $F_X$ on which inferences can be based regarding some aspect of $F_X$. The tests for randomness in Chapter 3 relate to inferences about a property of the joint probability distribution of a set of sample observations which are identically distributed but possibly dependent, i.e., the probability distribution of the data. The hypothesis in a goodness-of-fit study in Chapter 4 is concerned with the univariate population distribution from which a set of independent variables is drawn. These hypotheses are so general that no analogous counterparts exist within the realm of parametric statistics. Thus these problems are more suitably viewed under nonparametric procedures.
In a classical one-sample inference problem, the single-sample data are used to obtain information about some particular aspect of the population distribution, usually one or more of its parameters. Nonparametric techniques are useful here too, particularly when a location parameter is of interest. In this chapter we shall be concerned with the nonparametric analog of the normal-theory test (variance known) or Student's t test (variance unknown) for the hypotheses $H_0: \mu = \mu_0$ and $H_0: \mu_X - \mu_Y = \mu_D = \mu_0$ for the one-sample and paired-sample problems, respectively. The classical tests are derived under the assumption that the single population or the population of differences of pairs is normal. For the nonparametric tests, however, only certain continuity assumptions about the populations need to be postulated to determine sampling distributions of the test statistics. The hypotheses here are concerned with the median or some other quantile rather than the mean as the location parameter, but both the mean and the median are good indexes of central tendency and they do coincide for symmetric populations. In any population, the median always exists (which is not true for the mean) and it is more robust as an estimate of location.
The procedures covered here include confidence intervals and tests of hypotheses about any specified quantile. The case of the median is treated separately and the popular sign test and the Wilcoxon signed-rank test, including both hypothesis testing and confidence interval techniques, are presented. The complete discussion in each case will be given only for the single-sample case, since with paired-sample data once the differences of observations are formed, we have essentially only a single sample drawn from the population of differences and thus the methods of analysis are identical. We also introduce rank-order statistics and present a measure of the relationship between ranks and variate values.
5.2 CONFIDENCE INTERVAL FOR A POPULATION QUANTILE
Recall from Chapter 2 that a quantile of a continuous random variable X is a real number that divides the area under the probability density function into two parts of specified amounts. Only the area to the left of the number need be specified since the entire area is equal to 1. Let $F_X$ be the underlying cdf and let $\kappa_p$, for all $0 < p < 1$, denote the pth quantile, or the 100pth percentile, or the quantile of order p of $F_X$. Thus, $\kappa_p$ is defined to be any real number which is a solution to the equation $F_X(\kappa_p) = p$, and in terms of the quantile function, $\kappa_p = Q_X(p) = F_X^{-1}(p)$. We shall assume here that a unique solution (inverse) exists, as would be the case for a strictly increasing function $F_X$. Note that $\kappa_p$ is a parameter of the population $F_X$, and to emphasize this point we use the Greek letter $\kappa_p$ instead of the Latin letter $Q_X(p)$ used before in Chapter 2. For example, $\kappa_{0.50}$ is the median of the distribution, a measure of central tendency.
First we consider the problem where a confidence interval estimate of the parameter $\kappa_p$ is desired for some specified value of p, given a random sample $X_1, X_2, \ldots, X_n$ from the cdf $F_X$. As discussed in Chapter 2, a natural point estimate of $\kappa_p$ would be the pth sample quantile, which is the (np)th order statistic, provided of course that np is an integer. For example, since 100p percent of the population values are less than or equal to the pth population quantile, the estimate of $\kappa_p$ is that value from a random sample such that 100p percent of the sample values are less than or equal to it. We define $X_{(r)}$ to be the pth sample quantile, where r is defined by
\[ r = \begin{cases} np & \text{if } np \text{ is an integer} \\ [np] + 1 & \text{if } np \text{ is not an integer} \end{cases} \]
and [x] denotes the largest integer not exceeding x. This is just a convention adopted so that we can handle situations where np is not an integer. Other conventions are sometimes adopted. In our case, the pth sample quantile $Q_X(p)$ is equal to $X_{(np)}$ if np is an integer, and $X_{([np]+1)}$ if np is not an integer.
A point estimate is not sufficient for inference purposes. We know from Theorem 10.1 of Chapter 2 that the rth order statistic is a consistent estimator of the pth quantile of a distribution when $n \to \infty$ and $r/n \to p$. However, consistency is only a large-sample property. We would like a procedure for interval estimation of $\kappa_p$ which will enable us to attach a confidence coefficient to our estimate for the given (finite) sample size. A logical choice for the confidence interval endpoints are two order statistics, say $X_{(r)}$ and $X_{(s)}$, $r < s$, from the random sample drawn from the population $F_X$. To find the $100(1 - \alpha)\%$ confidence interval, we must then find the two integers r and s, $1 \le r < s \le n$, such that
\[ P(X_{(r)} < \kappa_p < X_{(s)}) = 1 - \alpha \]
for some given number $0 < \alpha < 1$.
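The convention for the pth sample quantile described above can be sketched in code as follows. This is an illustrative sketch only; the function names are hypothetical, and the tolerance used to decide whether np is an integer is an assumption made to guard against floating-point rounding.

```python
import math

def sample_quantile_index(n, p):
    """Index r of the order statistic X_(r) used as the pth sample quantile:
    r = np if np is an integer, and [np] + 1 otherwise."""
    np_ = n * p
    nearest = round(np_)
    # Treat np as an integer when it is within floating-point noise of one.
    if math.isclose(np_, nearest, abs_tol=1e-9):
        return int(nearest)
    return math.floor(np_) + 1

def sample_quantile(data, p):
    """pth sample quantile X_(r) of a sequence of observations."""
    x = sorted(data)
    r = sample_quantile_index(len(x), p)
    return x[r - 1]  # order statistics are numbered from 1

print(sample_quantile_index(10, 0.5))      # np = 5 is an integer
print(sample_quantile_index(10, 0.35))     # np = 3.5 is not an integer
print(sample_quantile([3, 1, 2, 5, 4], 0.5))
```

Other software packages use different interpolation conventions, so their sample quantiles need not agree with this one.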
The quantity $1 - \alpha$, which we frequently denote by $\gamma$, is called the confidence level or the confidence coefficient. Now the event $X_{(r)} < \kappa_p$ occurs if and only if either $X_{(r)} < \kappa_p < X_{(s)}$ or $\kappa_p > X_{(s)}$, and these latter two events are clearly mutually exclusive. Therefore, for all $r < s$,
\[ P(X_{(r)} < \kappa_p) = P(X_{(r)} < \kappa_p < X_{(s)}) + P(\kappa_p > X_{(s)}) \]
or, equivalently,
\[ P(X_{(r)} < \kappa_p < X_{(s)}) = P(X_{(r)} < \kappa_p) - P(X_{(s)} < \kappa_p) \tag{2.1} \]
Since we assumed that $F_X$ is a strictly increasing function,
\[ X_{(r)} < \kappa_p \quad \text{if and only if} \quad F_X(X_{(r)}) < F_X(\kappa_p) = p \]
But when $F_X$ is continuous, the probability integral transformation (PIT) implies that the probability distribution of the random variable $F_X(X_{(r)})$ is the same as that of $U_{(r)}$, the rth order statistic from the uniform distribution over the interval (0,1). Further, since $F_X(\kappa_p) = p$ by the definition of $\kappa_p$, we have
\[ P(X_{(r)} < \kappa_p) = P[F_X(X_{(r)}) < p] = \int_0^p \frac{n!}{(r-1)!\,(n-r)!}\, x^{r-1} (1-x)^{n-r}\, dx \tag{2.2} \]
Thus, while the distribution of the rth order statistic depends on the population distribution $F_X$, the probability in (2.2) does not. A confidence-interval procedure based on (2.1) is therefore distribution free. In order to find the interval estimate of $\kappa_p$, substitution of (2.2) back into (2.1) indicates that r and s should be chosen such that
\[ \int_0^p n \binom{n-1}{r-1} x^{r-1} (1-x)^{n-r}\, dx - \int_0^p n \binom{n-1}{s-1} x^{s-1} (1-x)^{n-s}\, dx = 1 - \alpha \tag{2.3} \]
Clearly, this one equation will not give a unique solution for the two unknowns, r and s, and additional conditions are needed. For example, if we want the narrowest possible interval for a fixed confidence coefficient, r and s should be chosen such that (2.3) is satisfied and $X_{(s)} - X_{(r)}$, or $E[X_{(s)} - X_{(r)}]$, is as small as possible. Alternatively, we could minimize $s - r$. The integrals in (2.2) or (2.3) can be evaluated by integration by parts or by using tables of the incomplete beta function. However, (2.2) can be expressed in another form after integration by parts as follows:
\begin{align*}
P(X_{(r)} < \kappa_p) &= \int_0^p n \binom{n-1}{r-1} x^{r-1} (1-x)^{n-r}\, dx \\
&= n \binom{n-1}{r-1} \left[ \frac{x^r}{r} (1-x)^{n-r} \Big|_0^p + \frac{n-r}{r} \int_0^p x^r (1-x)^{n-r-1}\, dx \right] \\
&= \binom{n}{r} p^r (1-p)^{n-r} + n \binom{n-1}{r} \left[ \frac{x^{r+1}}{r+1} (1-x)^{n-r-1} \Big|_0^p + \frac{n-r-1}{r+1} \int_0^p x^{r+1} (1-x)^{n-r-2}\, dx \right] \\
&= \binom{n}{r} p^r (1-p)^{n-r} + \binom{n}{r+1} p^{r+1} (1-p)^{n-r-1} + n \binom{n-1}{r+1} \int_0^p x^{r+1} (1-x)^{n-r-2}\, dx
\end{align*}
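After repeating this integration by parts $n - r$ times, the result will be
\begin{align*}
&\binom{n}{r} p^r (1-p)^{n-r} + \binom{n}{r+1} p^{r+1} (1-p)^{n-r-1} + \cdots + \binom{n}{n-1} p^{n-1} (1-p) + n \binom{n-1}{n-1} \int_0^p x^{n-1} (1-x)^0\, dx \\
&\qquad = \sum_{j=0}^{n-r} \binom{n}{r+j} p^{r+j} (1-p)^{n-r-j}
\end{align*}
or, after substituting $r + j = i$,
\[ P(X_{(r)} < \kappa_p) = \sum_{i=r}^{n} \binom{n}{i} p^i (1-p)^{n-i} \tag{2.4} \]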
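The equality of the integral in (2.2) and the binomial sum in (2.4) can be checked numerically. The sketch below is not part of the derivation; it evaluates the integral by composite Simpson's rule (an assumed choice of quadrature, with a hypothetical function name `beta_integral`) and compares it with the binomial tail for illustrative values of n, p, and r.

```python
import math

def binomial_upper_tail(n, p, r):
    """Right-hand side of (2.4): sum over i = r, ..., n of C(n, i) p^i (1-p)^(n-i)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(r, n + 1))

def beta_integral(n, p, r, steps=2000):
    """Right-hand side of (2.2), evaluated by composite Simpson's rule on [0, p].
    `steps` must be even."""
    c = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))

    def f(x):
        return c * x ** (r - 1) * (1 - x) ** (n - r)

    h = p / steps
    total = f(0.0) + f(p)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3

n, p = 10, 0.35  # illustrative values
for r in range(1, n + 1):
    print(r, round(beta_integral(n, p, r), 6), round(binomial_upper_tail(n, p, r), 6))
```

The two columns printed for each r should agree to the displayed precision, since the integrand is a polynomial and Simpson's rule is very accurate on it.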
In this final form, the integral in (2.2) is expressed as the sum of the last $n - r + 1$ terms of the binomial distribution with parameters n and p. Thus, the probability in (2.1) can be expressed as
\begin{align*}
P(X_{(r)} < \kappa_p < X_{(s)}) &= \sum_{i=r}^{n} \binom{n}{i} p^i (1-p)^{n-i} - \sum_{i=s}^{n} \binom{n}{i} p^i (1-p)^{n-i} \\
&= \sum_{i=r}^{s-1} \binom{n}{i} p^i (1-p)^{n-i} \\
&= P(r \le K \le s - 1) \tag{2.5}
\end{align*}
where K has a binomial distribution with parameters n and p. This form is probably the easiest to use in choosing r and s such that $s - r$ is a minimum for fixed $\alpha$. Note that from (2.5) it is clear that this probability does not depend on the underlying cdf as long as it is continuous. The resulting confidence interval is therefore distribution free.
In order to find the confidence interval for $\kappa_p$ based on two order statistics, the right-hand side of (2.5) is set equal to $1 - \alpha$ and the search for r and s is begun. Because of the discreteness of the binomial distribution, the exact nominal confidence level frequently cannot be achieved. In such cases, the confidence level requirement can be changed from "equal to" to "at least equal to" $1 - \alpha$. We usually let $\gamma \ge 1 - \alpha$ denote the exact confidence level.
The result obtained in (2.4) found by integration of (2.2) can also be obtained by arguing as follows. This argument, based on simple counting, is used frequently in the context of various nonparametric procedures where order statistics are involved. Note that for any p, the event $X_{(r)} < \kappa_p$ occurs if and only if at least r of the n sample values, $X_1, X_2, \ldots, X_n$, are less than $\kappa_p$. Thus
\begin{align*}
P(X_{(r)} < \kappa_p) &= P(\text{exactly } r \text{ of the } n \text{ observations are} < \kappa_p) \\
&\quad + P(\text{exactly } (r+1) \text{ of the } n \text{ observations are} < \kappa_p) \\
&\quad + \cdots + P(\text{exactly } n \text{ of the } n \text{ observations are} < \kappa_p)
\end{align*}
In other words,
\[ P(X_{(r)} < \kappa_p) = \sum_{i=r}^{n} P(\text{exactly } i \text{ of the } n \text{ observations are} < \kappa_p) \]
This is a key observation. Now, the probability that exactly i of the n observations are less than $\kappa_p$ can be found as the probability of i successes in n independent Bernoulli trials, since the sample observations are all independent and each observation can be classified either as a success or a failure, where a success is defined as any observation being less than $\kappa_p$. The probability of a success is $P(X_i < \kappa_p) = p$. Thus, the required probability is given by the binomial probability with parameters n and p. In other words,
\[ P(\text{exactly } i \text{ of the } n \text{ sample values are} < \kappa_p) = \binom{n}{i} p^i (1-p)^{n-i} \]
and therefore
\[ P(X_{(r)} < \kappa_p) = \sum_{i=r}^{n} \binom{n}{i} p^i (1-p)^{n-i} \]
This completes the proof.
In summary, the $(1 - \alpha)100\%$ confidence interval for the pth quantile is given by $(X_{(r)}, X_{(s)})$, where r and s are integers such that $1 \le r < s \le n$ and
\[ P(X_{(r)} < \kappa_p < X_{(s)}) = \sum_{i=r}^{s-1} \binom{n}{i} p^i (1-p)^{n-i} \ge 1 - \alpha \tag{2.6} \]
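The exact confidence level in (2.6) is a difference of two cumulative binomial probabilities, so it is easy to compute directly rather than read from a table. The following is a minimal sketch; `binom_cdf` and `coverage` are illustrative names, and the numerical inputs in the usage lines are chosen only for demonstration.

```python
import math

def binom_cdf(k, n, p):
    """P(K <= k) for K ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(0, min(k, n) + 1))

def coverage(n, p, r, s):
    """Exact confidence level of (X_(r), X_(s)) for the pth quantile,
    P(r <= K <= s - 1), as in (2.5) and (2.6)."""
    return binom_cdf(s - 1, n, p) - binom_cdf(r - 1, n, p)

# Interval (X_(2), X_(9)) for the median with n = 10.
print(round(coverage(10, 0.5, 2, 9), 4))
# Interval (X_(1), X_(8)) for the 0.35th quantile with n = 10.
print(round(coverage(10, 0.35, 1, 8), 4))
```

Because the result depends only on n, p, r, and s, the same function applies to any continuous population.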
As indicated earlier, without a second condition, the confidence interval endpoints will not be unique. One common approach in this case is to assign the probability $\alpha/2$ in each (right and left) tail. This yields the so-called "equal-tails" interval, where r and s are the largest and smallest integers $(1 \le r < s \le n)$, respectively, such that
\[ \sum_{i=0}^{r-1} \binom{n}{i} p^i (1-p)^{n-i} \le \frac{\alpha}{2} \quad \text{and} \quad \sum_{i=0}^{s-1} \binom{n}{i} p^i (1-p)^{n-i} \ge 1 - \frac{\alpha}{2} \tag{2.7} \]
These equations are easy to use in conjunction with Table C of the Appendix, where cumulative binomial probabilities are given. The exact confidence level is found from Table C as
\[ \sum_{i=r}^{s-1} \binom{n}{i} p^i (1-p)^{n-i} = \sum_{i=0}^{s-1} \binom{n}{i} p^i (1-p)^{n-i} - \sum_{i=0}^{r-1} \binom{n}{i} p^i (1-p)^{n-i} = \gamma \tag{2.8} \]
If the sample size is larger than 20 and therefore beyond the range of Table C, we can use the normal approximation to the binomial distribution with a continuity correction. The solutions are
\[ r = np + 0.5 - z_{\alpha/2} \sqrt{np(1-p)} \quad \text{and} \quad s = np + 0.5 + z_{\alpha/2} \sqrt{np(1-p)} \tag{2.9} \]
where $z_{\alpha/2}$ satisfies $\Phi(z_{\alpha/2}) = 1 - \alpha/2$, as defined in Chapter 3. We round the result in (2.9) down to the nearest integer for r and round up for s in order to be conservative (or to make the confidence level at least $1 - \alpha$).
Example 2.1 Suppose $n = 10$, $p = 0.35$, and $1 - \alpha = 0.95$. Using (2.7) with Table C shows that $r - 1 = 0$ and $s - 1 = 7$, making $r = 1$ and
$s = 8$. The confidence interval for the 0.35th quantile is $(X_{(1)}, X_{(8)})$ with exact confidence level from (2.8) equal to $0.9952 - 0.0135 = 0.9817$. The normal approximation gives $r = 1$ and $s = 7$ with approximate confidence level 0.95.
Now suppose that $n = 10$, $p = 0.10$, and $1 - \alpha = 0.95$. Table C shows that $s - 1 = 3$, and no value of $r - 1 \ge 0$ satisfies the left-hand condition of (2.7), since $P(K \le 0) = 0.3487$ exceeds 0.025. We therefore take $r = 0$, with the convention $X_{(0)} = -\infty$, so that the lower tail contributes nothing to (2.8). The confidence interval for the 0.10th quantile is then $(X_{(0)}, X_{(4)}) = (-\infty, X_{(4)})$ with exact confidence level $0.9872 - 0 = 0.9872$.
Another possibility is to find those values of r and s such that $s - r$ is a minimum. This requires a trial-and-error solution in making (2.8) at least $1 - \alpha$. In the second situation described in Example 2.1, this approach yields the same values of r and s as the equal-tails approach; in the first, $r = 1$ and $s = 7$ already give exact confidence coefficient $0.9740 - 0.0135 = 0.9605$, so the interval is one order statistic narrower than the equal-tails interval. If $n = 11$, $p = 0.25$ and $1 - \alpha = 0.95$, (2.7) gives $r = 0$ and $s = 7$ with exact confidence coefficient 0.9924 from (2.8). The values of r and s that make $s - r$ as small as possible and make (2.8) at least 0.95 are $r = 0$ and $s = 6$, with exact confidence coefficient 0.9657. The reader can verify these results.
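The two ways of choosing r and s discussed in this section, the equal-tails rule (2.7) and the minimization of $s - r$, can both be carried out by direct search over the binomial cdf. The sketch below is one possible implementation, not the procedure of the text: the function names are hypothetical, $r = 0$ is allowed and stands for "no lower order statistic" (lower endpoint $-\infty$), $s = n + 1$ likewise stands for an upper endpoint of $+\infty$, and ties in $s - r$ are broken in favor of the larger confidence level, which is an assumption.

```python
import math

def binom_cdf(k, n, p):
    """P(K <= k) for K ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(0, min(k, n) + 1))

def equal_tails(n, p, alpha):
    """Largest r with P(K <= r-1) <= alpha/2 and smallest s with
    P(K <= s-1) >= 1 - alpha/2, as in (2.7), plus the exact level (2.8)."""
    r = max(r for r in range(0, n + 1) if binom_cdf(r - 1, n, p) <= alpha / 2)
    s = min(s for s in range(1, n + 2) if binom_cdf(s - 1, n, p) >= 1 - alpha / 2)
    return r, s, binom_cdf(s - 1, n, p) - binom_cdf(r - 1, n, p)

def narrowest(n, p, alpha):
    """Values of r and s minimizing s - r subject to level >= 1 - alpha."""
    best = None
    for r in range(0, n + 1):
        for s in range(r + 1, n + 2):
            level = binom_cdf(s - 1, n, p) - binom_cdf(r - 1, n, p)
            if level >= 1 - alpha:
                key = (s - r, -level)  # narrower first, then higher level
                if best is None or key < best[0]:
                    best = (key, (r, s, level))
                break  # a larger s only widens the interval for this r
    return best[1]

for n, p in [(10, 0.35), (10, 0.10), (11, 0.25)]:
    print(n, p, "equal tails:", equal_tails(n, p, 0.05),
          "narrowest:", narrowest(n, p, 0.05))
```

Running the search for a few combinations of n and p is a convenient way to see how often the two criteria disagree.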
5.3 HYPOTHESIS TESTING FOR A POPULATION QUANTILE
In a hypothesis testing type of inference concerned with quantiles, a distribution-free procedure is also possible. Given the order statistics $X_{(1)} < X_{(2)} < \cdots < X_{(n)}$ from any unspecified but continuous distribution $F_X$, a null hypothesis concerning the value of the pth quantile is written
\[ H_0: \kappa_p = \kappa_p^0 \]
where $\kappa_p^0$ and p are both specified numbers. Under $H_0$, since $\kappa_p^0$ is the pth quantile of $F_X$, we have, by definition, $P(X \le \kappa_p^0) = p$ and therefore we expect about np of the sample observations to be smaller than $\kappa_p^0$ if $H_0$ is true. If the actual number of sample observations smaller than $\kappa_p^0$ is considerably smaller than np, the data suggest that the true pth quantile is larger than $\kappa_p^0$, or there is evidence against $H_0$ in favor of the one-sided upper-tailed alternative
\[ H_1: \kappa_p > \kappa_p^0 \]
This implies it is reasonable to reject $H_0$ in favor of $H_1$ if at most $r - 1$ sample observations are smaller than $\kappa_p^0$, for some r. Now if at most $r - 1$ sample observations are smaller than $\kappa_p^0$, then it must be true that the rth order statistic $X_{(r)}$ in the sample satisfies $X_{(r)} > \kappa_p^0$. Therefore an appropriate rejection region R is
\[ X_{(r)} \in R \quad \text{for } X_{(r)} > \kappa_p^0 \tag{3.1} \]
For a specified significance level $\alpha$, the integer r should be chosen such that
\[ P(X_{(r)} > \kappa_p^0 \mid H_0) = 1 - P(X_{(r)} \le \kappa_p^0 \mid H_0) \le \alpha \]
or, using (2.4), r is the largest integer such that
\[ 1 - \sum_{i=r}^{n} \binom{n}{i} p^i (1-p)^{n-i} = \sum_{i=0}^{r-1} \binom{n}{i} p^i (1-p)^{n-i} \le \alpha \tag{3.2} \]
We now express the rejection region in another form in order to be consistent with our later presentation in Section 5.4 for the sign test. Note that $X_{(r)} > \kappa_p^0$ if and only if at most $r - 1$ of the observations are less than $\kappa_p^0$, so that at least $n - (r - 1) = n - r + 1$ of the observations are greater than $\kappa_p^0$. Define the random variable K as the total number of plus signs among the n differences $X_i - \kappa_p^0$ (the number of positive differences). Then the rejection region in (3.1) can be equivalently stated as
\[ K \in R \quad \text{for } K \ge n - r + 1 \]
The differences $X_i - \kappa_p^0$, $i = 1, 2, \ldots, n$, are independent random variables, each having either a plus or a minus sign, and the probability of a plus sign under $H_0$ is
\[ P(X_i - \kappa_p^0 > 0) = P(X_i > \kappa_p^0) = 1 - p \]
Hence, since K is the number of plus signs, we can write $K = \sum_{i=1}^{n} I(X_i > \kappa_p^0)$, where $I(A) = 1$ when the event A occurs and is 0 otherwise. From the preceding discussion, the indicator variables $I(X_i > \kappa_p^0)$, $i = 1, 2, \ldots, n$, are independent Bernoulli random variables with probability of success $1 - p$ under $H_0$. Thus under $H_0$, the distribution of K is binomial with parameters n and $1 - p$, and so r must be chosen to satisfy
\[ P(K \ge n - r + 1 \mid H_0) = \sum_{i=n-r+1}^{n} \binom{n}{i} (1-p)^i p^{n-i} \le \alpha \tag{3.3} \]
which can be shown to agree with the statement in (3.2) by a change of summation index from i to $n - i$. The advantage of using (3.2) is that cumulative binomial probabilities are directly involved and these are given in Table C.
On the other hand, if many more than np observations are smaller than $\kappa_p^0$, there is support against $H_0$ in favor of the one-sided lower-tailed alternative $H_1: \kappa_p < \kappa_p^0$. Then we should reject $H_0$ if the number of sample observations smaller than $\kappa_p^0$ is at least, say, s. This leads to the rejection region
\[ X_{(s)} \in R \quad \text{for } X_{(s)} < \kappa_p^0 \]
but this is equivalent to saying that the number of observations larger than $\kappa_p^0$ must be at most $n - s$. Thus, based on the statistic K, defined before as the number of positive differences, the appropriate rejection region for the one-sided lower-tailed alternative $H_1: \kappa_p < \kappa_p^0$ is
\[ K \in R \quad \text{for } K \le n - s \]
where s is the smallest integer (so that $n - s$ is as large as possible) such that
\[ P(K \le n - s \mid H_0) = \sum_{i=0}^{n-s} \binom{n}{i} (1-p)^i p^{n-i} \le \alpha \tag{3.4} \]
For the two-sided alternative $H_1: \kappa_p \ne \kappa_p^0$, the rejection region consists of the union of the two pieces specified above,
\[ K \in R \quad \text{for } K \le n - s \text{ or } K \ge n - r + 1 \tag{3.5} \]
where r and s are integers such that each of (3.2) and (3.4) is less than or equal to $\alpha/2$. Note that Table C can be used to find the exact critical values for $n \le 20$, where $\theta = p$ in (3.2) and $\theta = 1 - p$ in (3.4). For sample sizes larger than 20 the normal approximation to the binomial distribution with a continuity correction can be used. The rejection region for $H_1: \kappa_p > \kappa_p^0$ is
\[ K \ge 0.5 + n(1-p) + z_\alpha \sqrt{np(1-p)} \]
For $H_1: \kappa_p < \kappa_p^0$, the rejection region is
\[ K \le -0.5 + n(1-p) - z_\alpha \sqrt{np(1-p)} \]
The rejection region for $H_1: \kappa_p \ne \kappa_p^0$ is the combination of these two with $z_\alpha$ replaced by $z_{\alpha/2}$. Note that in all these formulas the standard normal deviate, say $z_\beta$, is such that the area to the right is $\beta$; in other words, $z_\beta$ is the $100(1 - \beta)$th percentile [or the $(1 - \beta)$th quantile] of the standard normal distribution.
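The quantile test is straightforward to carry out in code, since under $H_0$ the statistic K is binomial with parameters n and $1 - p$. The sketch below is illustrative only: the function names are hypothetical, the exact critical values are found by direct search rather than from Table C, and the normal-approximation bounds simply evaluate the two displayed rejection regions.

```python
import math
from statistics import NormalDist

def binom_pmf(i, n, theta):
    """P(K = i) for K ~ Binomial(n, theta)."""
    return math.comb(n, i) * theta**i * (1 - theta) ** (n - i)

def quantile_test(data, kappa0, p):
    """Return K (number of observations above kappa0) and the exact
    one-tailed probabilities P(K >= k) and P(K <= k) under H0."""
    n = len(data)
    k = sum(1 for x in data if x > kappa0)  # number of plus signs
    theta = 1 - p                            # P(plus sign) under H0
    p_upper = sum(binom_pmf(i, n, theta) for i in range(k, n + 1))
    p_lower = sum(binom_pmf(i, n, theta) for i in range(0, k + 1))
    return k, p_upper, p_lower

def exact_critical_values(n, p, alpha):
    """Reject for K >= upper (H1: kappa_p > kappa_p^0) or K <= lower
    (H1: kappa_p < kappa_p^0), each tail having probability at most alpha.
    upper = n + 1 or lower = -1 means that tail's rejection region is empty."""
    theta = 1 - p
    upper = min(k for k in range(0, n + 2)
                if sum(binom_pmf(i, n, theta) for i in range(k, n + 1)) <= alpha)
    lower = max(k for k in range(-1, n + 1)
                if sum(binom_pmf(i, n, theta) for i in range(0, k + 1)) <= alpha)
    return upper, lower

def normal_critical_values(n, p, alpha):
    """Continuity-corrected normal approximation to the two bounds."""
    z = NormalDist().inv_cdf(1 - alpha)
    sd = math.sqrt(n * p * (1 - p))
    return 0.5 + n * (1 - p) + z * sd, -0.5 + n * (1 - p) - z * sd

# Two-sided test at level 0.05 with n = 15 and p = 0.75: use alpha/2 in each tail.
print(exact_critical_values(15, 0.75, 0.025))
print(normal_critical_values(15, 0.75, 0.025))
```

For a two-sided test, each function is called with $\alpha/2$ in place of $\alpha$, as in the usage lines.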
Table 3.1 summarizes the appropriate rejection regions for the quantile test and the corresponding P values, both exact and approximate, where $K_O$ is the observed value of the statistic K, the number of positive differences.
Example 3.1 The Educational Testing Service reports that the 75th percentile for scores on the quantitative portion of the Graduate Record Examination (GRE) is 693 in a certain year. A random sample of 15 first-year graduate students majoring in statistics report their GRE quantitative scores as 690, 750, 680, 700, 660, 710, 720, 730, 650, 670, 740, 730, 660, 750, and 690. Are the scores of students majoring in statistics consistent with the 75th percentile value for this year?
Solution The question in this example can be answered either by a hypothesis testing or a confidence interval approach. We illustrate both approaches at the 0.05 level. Here we are interested in the 0.75th quantile (the third quartile) so that $p = 0.75$, and the hypothesized value of the 0.75th quantile, $\kappa_{0.75}^0$, is 693. Thus, the null hypothesis $H_0: \kappa_{0.75} = 693$ is to be tested against a two-sided alternative $H_1: \kappa_{0.75} \ne 693$. The value of the test statistic is $K = 8$, since there are eight positive differences among $X_i - 693$, and the two-sided rejection region is $K \in R$ for $K \le n - s$ or $K \ge n - r + 1$, where r is the largest integer that satisfies (3.2) and s is the smallest integer that satisfies (3.4), each with $\alpha/2 = 0.025$. For $n = 15$, $p = 0.75$, Table C shows that 0.0173 is the largest left-tail probability that does not exceed 0.025, so $r - 1 = 7$ and hence $r = 8$; similarly, 0.0134 is the largest left-tail probability that does not exceed 0.025 for $n = 15$ and $1 - p = 0.25$ (note the change in the success probability) so that $n - s = 0$ and $s = 15$. The two-sided critical region then is $K \le 0$ or $K \ge 8$, and the exact significance level for this distribution-free test is $0.0134 + 0.0173 = 0.0307$. Since the observed $K = 8$ falls in this rejection region, there is evidence that for this year, the scores for the graduate majors in statistics are not consistent with the reported 75th percentile for all students in this year.
In order to find the P value, note that the alternative is two-sided and so we need to find the two one-tailed probabilities first. Using Table C with $n = 15$ and $\theta = 0.25$ we find $P(K \le 8 \mid H_0) = 0.9958$ and $P(K \ge 8 \mid H_0) = 1 - 0.9827 = 0.0173$. Taking the smaller of these two values and multiplying by 2, the required P value is 0.0346, which also suggests rejecting the null hypothesis.
In order to find a 95% confidence interval for $\kappa_{0.75}$, we use (2.7). For the lower index r, the inequality on the left applies. From Table C with $n = 15$ and $\theta = 0.75$, the largest value of x such that
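Table 3.1 Hypothesis testing guide for quantiles

Alternative $\kappa_p > \kappa_p^0$
  Rejection region (exact): $X_{(r)} > \kappa_p^0$ or $K \ge n - r + 1$, r from (3.2)
  P value (exact): $P_U = \sum_{k=K_O}^{n} \binom{n}{k} (1-p)^k p^{n-k}$
  Rejection region (approximate): $K \ge 0.5 + n(1-p) + z_\alpha \sqrt{n(1-p)p}$
  P value (approximate): $P_U = 1 - \Phi(\ldots)$

Alternative $\kappa_p < \kappa_p^0$
  Rejection region (exact): $X_{(s)} < \kappa_p^0$ or ...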
$M_0) = P(X < M_0) = 0.50$
Recalling the arguments used in developing a distribution-free test for an arbitrary quantile, we note that if the sample data are consistent with the hypothesized median value, on the average half of the sample observations will lie above $M_0$ and half below. Thus the number of observations larger than $M_0$, denoted by K, can be used to test the validity of the null hypothesis. Also, when the sample observations are dichotomized in this way, they constitute a set of N independent random variables from the Bernoulli population with parameter $\theta = P(X > M_0)$, regardless of the population $F_X$. The sampling distribution of the random variable K then is the binomial probability distribution with parameters N and $\theta$, and $\theta$ equals 0.5 if the null hypothesis is true. Since K is actually the number of plus signs among the N differences $X_i - M_0$, $i = 1, 2, \ldots, N$, the nonparametric test based on K is called the sign test.
The rejection region for the upper-tailed alternative
\[ H_1: M > M_0 \quad \text{or} \quad \theta = P(X > M_0) > P(X < M_0) \]
is
\[ K \in R \quad \text{for } K \ge k_\alpha \]
where $k_\alpha$ is chosen to be the smallest integer which satisfies
\[ P(K \ge k_\alpha \mid H_0) = \sum_{i=k_\alpha}^{N} \binom{N}{i} (0.5)^N \le \alpha \tag{4.1} \]
Any table of the binomial distribution, like Table C of the Appendix, can be used with $\theta = 0.5$ to find the particular value of $k_\alpha$ for the given N and $\alpha$, but Table G of the Appendix is easier to use because it gives probabilities in both tails. Similarly, for a one-sided test with the lower-tailed alternative
\[ H_1: M < M_0 \quad \text{or} \quad \theta = P(X > M_0) < P(X < M_0) \]
the rejection region for an $\alpha$-level test is
\[ K \in R \quad \text{for } K \le k'_\alpha \]
where $k'_\alpha$ is the largest integer satisfying
\[ \sum_{i=0}^{k'_\alpha} \binom{N}{i} (0.5)^N \le \alpha \tag{4.2} \]
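If the alternative is two-sided,
\[ H_1: M \ne M_0 \quad \text{or} \quad \theta = P(X > M_0) \ne P(X < M_0) \]
the rejection region is $K \ge k_{\alpha/2}$ or $K \le k'_{\alpha/2}$, where $k_{\alpha/2}$ and $k'_{\alpha/2}$ are, respectively, the smallest and the largest integers such that
\[ \sum_{i=k_{\alpha/2}}^{N} \binom{N}{i} (0.5)^N \le \frac{\alpha}{2} \quad \text{and} \quad \sum_{i=0}^{k'_{\alpha/2}} \binom{N}{i} (0.5)^N \le \frac{\alpha}{2} \tag{4.3} \]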
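The critical values in (4.1) to (4.3) can be found by a direct search instead of a table lookup. The following sketch assumes nothing beyond the binomial distribution with $\theta = 0.5$; the function names are illustrative, and $N = 16$ is used only as a convenient sample size for the demonstration.

```python
import math

def upper_tail(k, N):
    """P(K >= k) for K ~ Binomial(N, 0.5)."""
    return sum(math.comb(N, i) for i in range(max(k, 0), N + 1)) / 2**N

def lower_tail(k, N):
    """P(K <= k) for K ~ Binomial(N, 0.5)."""
    return sum(math.comb(N, i) for i in range(0, min(k, N) + 1)) / 2**N

def sign_test_critical_values(N, alpha):
    """Smallest k with P(K >= k) <= alpha and largest k' with P(K <= k') <= alpha.
    k = N + 1 or k' = -1 means the corresponding rejection region is empty."""
    k_upper = min(k for k in range(0, N + 2) if upper_tail(k, N) <= alpha)
    k_lower = max(k for k in range(-1, N + 1) if lower_tail(k, N) <= alpha)
    return k_upper, k_lower

# One-sided tests at level 0.05 and a two-sided test (alpha/2 in each tail).
print(sign_test_critical_values(16, 0.05), round(upper_tail(12, 16), 4))
print(sign_test_critical_values(16, 0.025))
```

The second element returned is always N minus the first, which reflects the symmetry of the binomial distribution when $\theta = 0.5$.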
Obviously, we have the relation $k_{\alpha/2} = N - k'_{\alpha/2}$. The sign test statistics with these rejection regions are consistent against the respective one- and two-sided alternatives. This is easy to show by applying the criterion of consistency given in Chapter 1. Since $E(K/N) = \theta$ and $\mathrm{var}(K/N) = \theta(1 - \theta)/N \to 0$ as $N \to \infty$, K provides a consistent test statistic.
P VALUE
The P value expressions for the sign test can be obtained as in the case of a general quantile test with $p = 0.5$. The reader is referred to Table 3.1, with n replaced by N throughout. For example, if the alternative is upper-tailed, $H_1: M > M_0$, and $K_O$ is the observed value of the sign statistic, the P value for the sign test is given by the binomial probability in the upper tail
\[ \sum_{i=K_O}^{N} \binom{N}{i} (0.5)^N \]
This value is easily read as a right-tail probability from Table G for the given N.
NORMAL APPROXIMATIONS
We could easily generate tables to apply the exact sign test for any sample size N. However, we know that the normal approximation to the binomial is especially good when $\theta = 0.50$. Therefore, for moderate values of N (say at least 12), the normal approximation to the binomial can be used to determine the rejection regions. Since this is a continuous approximation to a discrete distribution, a continuity correction of 0.5 may be incorporated in the calculations. For example, for the alternative $H_1: M > M_0$, $H_0$ is rejected for $K \ge k_\alpha$, where $k_\alpha$ satisfies
\[ k_\alpha = 0.5N + 0.5 + 0.5\sqrt{N}\, z_\alpha \tag{4.4} \]
Similarly, the approximate P value is
\[ 1 - \Phi\!\left( \frac{K_O - 0.5 - 0.5N}{\sqrt{0.25N}} \right) \tag{4.5} \]
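The quality of the approximation in (4.4) and (4.5) can be judged by computing it alongside the exact binomial values. The sketch below uses hypothetical function names and illustrative inputs ($N = 16$, observed value 12, $\alpha = 0.05$); it is a check on the formulas rather than a recommended procedure.

```python
import math
from statistics import NormalDist

def sign_test_normal(N, k_obs, alpha):
    """Approximate critical value (4.4) and upper-tailed P value (4.5)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # area alpha to the right
    k_alpha = 0.5 * N + 0.5 + 0.5 * math.sqrt(N) * z_alpha
    p_value = 1 - NormalDist().cdf((k_obs - 0.5 - 0.5 * N) / math.sqrt(0.25 * N))
    return k_alpha, p_value

def sign_test_exact_p(N, k_obs):
    """Exact upper-tailed P value: P(K >= k_obs) with theta = 0.5."""
    return sum(math.comb(N, i) for i in range(k_obs, N + 1)) / 2**N

k_alpha, p_approx = sign_test_normal(16, 12, 0.05)
print("approximate critical value:", round(k_alpha, 2),
      "-> reject for K >=", math.ceil(k_alpha))
print("approximate P value:", round(p_approx, 4))
print("exact P value:      ", round(sign_test_exact_p(16, 12), 4))
```

Rounding the value from (4.4) up to the next integer gives the approximate rejection region in terms of the integer-valued statistic K.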
ZERO DIFFERENCES
A zero difference arises whenever $X_i = M_0$ for at least one i. Theoretically, zero differences do not cause a problem because the population was assumed to be continuous in the vicinity of the median. In reality, of course, zero differences can and do occur, either because the assumption of continuity is in error or because of imprecise measurements. Many zeros can be avoided by taking measurements to a larger number of significant figures. The most common treatment of zeros is simply to ignore them and reduce N accordingly. The inferences are then conditional on the observed number of nonzero differences. An alternative approach is to treat half of the zeros as plus and half as minus. Another possibility is to assign to all that sign which is least conducive to rejection of $H_0$; this is a strictly conservative approach. Finally, we could let chance determine the signs of the zeros by, say, flipping a balanced coin. These procedures are compared in Putter (1955) and Emerson and Simon (1979). A complete discussion, including more details on P values, is given in Pratt and Gibbons (1981). Randles (2001) proposed a more conservative method of handling zeros.
POWER FUNCTION
In order to calculate the power of any test, the distribution of the test statistic under the alternative hypothesis should be available in a reasonably tractable form. In contrast to most nonparametric tests, the power function of the quantile tests is simple to determine since, in general, the random variable K follows the binomial probability distribution with parameters N and $\theta$, where, for the pth quantile, $\theta = P(X_i > \kappa_p)$. For the sign test the quantile of interest is the median and $\theta = P(X_i > M_0)$. For illustration, we will only consider the power of the sign test against the one-sided upper-tailed alternative $H_1: M > M_0$.
The power of the test is a function of the unknown parameter $\theta$, and the power curve or the power function is a graph of power versus various values of $\theta$, under the alternative. By definition, the power of the sign test against the alternative $H_1$ is the probability
\[ Pw(\theta) = P(K \ge k_\alpha \mid H_1) \]
Under $H_1$, the distribution of K is binomial with parameters N and $\theta = P(X_i > M_0 \mid H_1)$, so the expression for power can be written as
\[ Pw(\theta) = \sum_{i=k_\alpha}^{N} \binom{N}{i} \theta^i (1 - \theta)^{N-i} \]
where $k_\alpha$ is the smallest integer such that
\[ \sum_{i=k_\alpha}^{N} \binom{N}{i} (0.5)^N \le \alpha \]
Thus, in order to evaluate the power function for the sign test, we first need to find the critical value $k_\alpha$ for a given level $\alpha$, say 0.05. Then we need to calculate the probability $\theta = P(X_i > M_0 \mid H_1)$. If the power function is desired for a more parametric type of situation where the population distribution is fully specified, then $\theta$ can be calculated. Such a power function would be desirable for comparisons between the sign test and some parametric test for location.

As an example, we calculate the power of the sign test of $H_0\colon M = 28$ versus $H_1\colon M > 28$ for $N = 16$ at significance level 0.05, under the assumption that the population is normally distributed with standard deviation 1 and the median is $M = 29.04$. Table G shows that the rejection region at $\alpha = 0.05$ is $K \ge 12$, so that $k_\alpha = 12$ and the exact size of this sign test is 0.0384. Now, under the assumptions given, we can evaluate the underlying probability of a success $\theta$ as

$$\theta = P(X > 28 \mid H_1) = P\left(\frac{X - 29.04}{1} > \frac{28 - 29.04}{1}\right) = P(Z > -1.04) = 1 - \Phi(-1.04) = 0.8508 \approx 0.85$$

Note that the value of $\theta$ is larger than 0.5, which is in the legitimate region of the alternative $H_1$. Thus,
$$\mathrm{Pw}(0.85) = \sum_{i=12}^{16} \binom{16}{i} (0.85)^{i} (0.15)^{16-i} = 1 - \sum_{i=0}^{11} \binom{16}{i} (0.85)^{i} (0.15)^{16-i} = 0.9209$$
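The exact calculation above can be sketched in a few lines of Python. This is only an illustration under the assumptions of the example ($N = 16$, nominal $\alpha = 0.05$, $\theta = 0.85$); the function names `upper_tail`, `critical_value`, and `sign_test_power` are hypothetical and do not come from the text.

```python
from math import comb


def upper_tail(n: int, k: int, theta: float) -> float:
    """P(K >= k) when K ~ Binomial(n, theta)."""
    return sum(comb(n, i) * theta**i * (1 - theta) ** (n - i)
               for i in range(k, n + 1))


def critical_value(n: int, alpha: float) -> int:
    """Smallest integer k such that P(K >= k | theta = 0.5) <= alpha."""
    for k in range(n + 2):          # k = n + 1 gives an empty sum (probability 0)
        if upper_tail(n, k, 0.5) <= alpha:
            return k
    raise ValueError("alpha must be nonnegative")


def sign_test_power(n: int, alpha: float, theta: float) -> float:
    """Exact power of the upper-tailed sign test at the alternative theta."""
    return upper_tail(n, critical_value(n, alpha), theta)


k_alpha = critical_value(16, 0.05)
print("critical value:", k_alpha)
print("exact size    :", round(upper_tail(16, k_alpha, 0.5), 4))
print("power at 0.85 :", round(sign_test_power(16, 0.05, 0.85), 4))
```

Because the binomial sums are evaluated directly, no normal approximation is involved; the price is that the attained size is generally below the nominal level.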
This would be directly comparable with the normal theory test of $H_0\colon \mu = 28$ versus $H_1\colon \mu = 29.04$, say with $\sigma = 1$, since the mean and median coincide for normal distributions. The rejection region for this parametric test with $\alpha = 0.05$ is $\bar{X} > 28 + z_{0.05}/\sqrt{16} = 28.41$, and the power is

$$\mathrm{Pw}(29.04) = P[\bar{X} > 28.41 \mid X \sim \text{normal}(29.04, 1)] = P\left(\frac{\bar{X} - 29.04}{1/\sqrt{16}} > \frac{28.41 - 29.04}{0.25}\right) = P(Z > -2.52) = 0.9941$$

Thus, the power of the normal theory test is larger than the power of the sign test, which is of course expected, since the normal theory test is known to be the best test when the population is normal. The problem with a direct comparison of the exact sign test with the normal theory test is that the powers of any two tests are comparable only when their sizes or significance levels are the same or nearly the same. In our case, the sign test has an exact size of 0.0384 whereas the normal theory test has exact size 0.05. This increase in the size of the test inherently biases the power comparison in favor of the normal theory test. In order to ensure a fairer comparison, we might make the exact size of the sign test equal to 0.05 by using a randomized version of the sign test (as explained in Chapter 1). Alternatively, we might find the normal theory test of size $\alpha = 0.0384$ and compare the power of that test with the sign-test power of 0.9209. In this case, the rejection region is $\bar{X} > 28 + z_{0.0384}/\sqrt{16} = 28.44$ and the power is $\mathrm{Pw}(29.04) = 0.9918$. This is still larger than the power of the sign test at $\alpha = 0.0384$, but two comments are in order. First and foremost, we have to assume that the underlying distribution is normal to justify using the normal theory test. No such assumption is necessary for the sign test. If the sample size $N$ is larger, the calculated power is an approximation to the power of the normal theory test, by the central limit theorem. However, for the sign test, the size and the power calculations can be
made exactly for all sample sizes and no distribution assumptions are needed other than continuity. Further, the normal theory test is affected by the assumption about the population standard deviation $\sigma$, whereas the sign test calculations do not demand such knowledge.

In order to obtain the power function, we can calculate the power at several values of $M$ in the alternative region ($M > 28$) and then plot the power versus the values of the median. This is easier under the normal approximation and is shown below. Since under the alternative hypothesis $H_1$ the sign test statistic $K$ has a binomial distribution with parameters $N$ and $\theta = P(X > M_0 \mid H_1)$, and the binomial distribution can be well approximated by the normal distribution, we can derive expressions to approximate the power of the sign test based on the normal approximation. These formulas are useful in practice for larger sample sizes and/or $\theta$ values for which exact tables are unavailable, although this appears to be much less of a problem with currently available software. We consider the one-sided upper-tailed case $H_1\colon M_1 > M_0$ for illustration; approximate power expressions for the other cases are left as exercises for the reader. The power for this alternative can be evaluated using the normal approximation with a continuity correction as

$$\mathrm{Pw}(M_1) = P(K \ge k_\alpha \mid H_1\colon M_1 > M_0) = P\left(Z > \frac{k_\alpha - N\theta - 0.5}{\sqrt{N\theta(1-\theta)}}\right) = 1 - \Phi\left(\frac{k_\alpha - N\theta - 0.5}{\sqrt{N\theta(1-\theta)}}\right) \tag{4.6}$$

where $\theta = P(X > M_0 \mid M_1 > M_0)$ and $k_\alpha$ is such that

$$\alpha = P(K \ge k_\alpha \mid H_0) = P\left(Z > \frac{k_\alpha - N/2 - 0.5}{\sqrt{N/4}}\right) = 1 - \Phi\left(\frac{2k_\alpha - N - 1}{\sqrt{N}}\right) \tag{4.7}$$

The equality in (4.7) implies that $k_\alpha = [N + 1 + \sqrt{N}\,\Phi^{-1}(1-\alpha)]/2$. Substituting this back into (4.6) and simplifying gives
$$\mathrm{Pw}(M_1) = P\left\{Z > \frac{0.5\,[N + 1 + \sqrt{N}\,\Phi^{-1}(1-\alpha)] - N\theta - 0.5}{\sqrt{N\theta(1-\theta)}}\right\} = 1 - \Phi\left[\frac{N(0.5 - \theta) + 0.5\sqrt{N}\,z_\alpha}{\sqrt{N\theta(1-\theta)}}\right] \tag{4.8}$$

where $z_\alpha = \Phi^{-1}(1-\alpha)$ is the $(1-\alpha)$th quantile of the standard normal distribution. For example, $z_{0.05} = 1.645$ and $z_{0.85} = -1.04$. Note that $z_\alpha = -z_{1-\alpha}$. The approximate power values are calculated and shown in Table 4.1 for $N = 13$ and $\alpha = 0.0461$. A graph of the power function is shown in Figure 4.1.

Table 4.1 Normal approximation to power of the sign test for the median when N = 13

θ       0.50    0.55    0.60    0.65    0.70    0.75    0.80    0.85    0.90
Power   0.0461  0.0918  0.1629  0.2639  0.3960  0.5546  0.7255  0.8802  0.9773
Fig. 4.1 Normal approximation to the power function of the sign test for the median.
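Equation (4.8) is straightforward to evaluate numerically. The following Python sketch does so over a grid of $\theta$ values; the function name `approx_sign_test_power` is hypothetical, and the inputs $N = 13$ and $\alpha = 0.0461$ are illustrative choices (the sample size and exact size used in the simulation example of this section).

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1


def approx_sign_test_power(theta: float, n: int, alpha: float) -> float:
    """Normal approximation (4.8) to the power of the upper-tailed sign test."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)            # (1 - alpha)th quantile
    numerator = n * (0.5 - theta) + 0.5 * sqrt(n) * z_alpha
    return 1 - STD_NORMAL.cdf(numerator / sqrt(n * theta * (1 - theta)))


N, ALPHA = 13, 0.0461   # assumed inputs for illustration
for theta in (0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90):
    print(f"theta = {theta:.2f}   approximate power = "
          f"{approx_sign_test_power(theta, N, ALPHA):.4f}")
```

At $\theta = 0.5$ the numerator reduces to $0.5\sqrt{N}\,z_\alpha$ and the denominator to $0.5\sqrt{N}$, so the function returns $\alpha$ itself, as the formula requires.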
It should be noted that the power of the sign test depends on the alternative hypothesis through the probability $\theta = P(X > M_0 \mid H_1\colon M_1 > M_0)$. Under $H_0$, we have $\theta = 0.5$, whereas $\theta > 0.5$ under $H_1$, since if $M_1 > M_0$,

$$P(X > M_0 \mid H_1\colon M = M_1 > M_0) > P(X > M_1 \mid H_1\colon M = M_1 > M_0)$$

and therefore $\theta = P(X > M_0 \mid H_1) > P(X > M_1 \mid H_1) = 0.5$. Thus, the power of the sign test depends on the "distance" between the values of $\theta$ under the null hypothesis (0.5) and under the alternative, and specification of a value of $\theta > 0.5$ is necessary for the power calculation. Noether (1987) suggested choosing a value of $\theta$ based on past information or a pilot study, or based on an "odds-ratio." In the normal theory test (such as the t test), however, the power depends directly on the "distance" $M_1 - M_0$, the difference between the values of the median under the null hypothesis and under the alternative. Note also that the approximate power is exactly equal to the nominal size of the test when $\theta = 0.5$ (i.e., the null hypothesis is true). Expressions for approximate power against other alternatives are left as exercises for the reader.

SIMULATED POWER
The power function for the sign test is easily found, particularly when the normal approximation is used for calculations. For many other nonparametric tests, however, the power function can be quite difficult to calculate. In such cases, computer simulations can be used to estimate the power. Here we use a MINITAB Macro program to simulate the power of the sign test when the underlying distribution is normal with mean = median = $M$ and variance $\sigma^2$. The null hypothesis is $H_0\colon M = M_0$ and the alternative is $H_1\colon M = M_1 > M_0$. First we need to find the relationship between $M_0$, $M_1$, and $\theta$. Recall that $\theta = P(X_i > M_0 \mid H_1)$, so assuming $X$ is normally distributed with variance $\sigma^2$, we get

$$\theta = P\left(\frac{X - M_1}{\sigma} > \frac{M_0 - M_1}{\sigma}\right) = \Phi\left(\frac{M_1 - M_0}{\sigma}\right)$$

This gives $(M_1 - M_0)/\sigma = \Phi^{-1}(\theta)$. Now let us assume arbitrarily that $M_0 = 0.5$ and $\sigma^2 = 1$. Then if $\theta = 0.55$, say, $\Phi^{-1}(0.55) = 0.1256$ and
$M_1 = 0.6256$. Next we need to specify a sample size and probability of a type I error for the test. We arbitrarily choose $N = 13$ and $\alpha = 0.05$. From Table G, 0.0461 is closest to 0.05 and this gives a test with rejection region $K \ge 10$ for exact size 0.0461. First we generate 1000 random samples, each of size 13, from a normal distribution with $M = 0.6256$ and compute the value of the sign test statistic for each sample generated, i.e., the number of $X_i$ in that sample for which $X_i - M_0 = X_i - 0.5 > 0$. Then we note whether or not this count value is in the rejection region $K \ge 10$. Then we count the number of times we found the count value in the rejection region among the 1000 random samples generated. This count divided by 1000 is the simulated power at the point $\theta = 0.55$ (which corresponds to $M_1 = 0.6256$) in the case $N = 13$, $M_0 = 0.50$, $\sigma = 1$, and $\alpha = 0.0461$. Using a MINITAB Macro program, this value was found as 0.10. Note that from Table 4.1, the normal approximation to the power in this case is 0.0918. The program code is shown below for this situation:
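The same steps can also be sketched in Python. This is not the MINITAB macro; it is a minimal stand-in that follows the procedure described above, and the function name, the default arguments, and the seed are assumptions made for illustration. Because the result is a Monte Carlo estimate, it varies with the seed and with the number of samples.

```python
import random


def simulate_sign_test_power(m1, m0=0.5, sigma=1.0, n=13, k_crit=10,
                             n_samples=1000, seed=12345):
    """Estimate the power of the upper-tailed sign test by simulation.

    Each simulated sample has n normal observations with median m1 and
    standard deviation sigma; H0: M = m0 is rejected when the number of
    positive differences X_i - m0 is at least k_crit.
    """
    rng = random.Random(seed)       # seeded generator for reproducibility
    rejections = 0
    for _sample in range(n_samples):
        k = sum(1 for _obs in range(n) if rng.gauss(m1, sigma) - m0 > 0)
        if k >= k_crit:
            rejections += 1
    return rejections / n_samples


# theta = 0.55 corresponds to M1 = 0.6256 when M0 = 0.5 and sigma = 1
print("simulated power:", simulate_sign_test_power(0.6256))
# Under H0 (M1 = M0) the estimate should be near the exact size of the test
print("simulated size :", simulate_sign_test_power(0.5))
```

Increasing `n_samples` reduces the Monte Carlo error, which for 1000 samples and a power near 0.1 is roughly 0.01.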
To run such a program, type the statements into a plain text file, using a text editor (not a word processor), and save it with a .mac extension to a floppy disk, say, in drive a. Suppose the name of the file is sign.mac. Then in MINITAB, go to edit, then to command line editor, and then type %a:\sign.mac and click on submit. The program will print the simulated power values as well as a power curve. Output from such a simulation is shown later in Section 5.7 as Figure 7.1.

SAMPLE SIZE DETERMINATION
In order to make an inference regarding the population median using the sign test, we need to have a random sample of observations. If we are allowed to choose the sample size, we might want to determine the value of $N$ such that the test has size $\alpha$ and power $1-\beta$, given the null and the alternative hypotheses and other necessary assumptions. For example, for the sign test against the one-sided upper-tailed alternative $H_1\colon M > M_0$, we need to find $N$ such that
$$\sum_{i=k_\alpha}^{N} \binom{N}{i} (0.5)^{N} \le \alpha \qquad \text{and} \qquad \sum_{i=k_\alpha}^{N} \binom{N}{i}\, \theta^{i} (1-\theta)^{N-i} \ge 1 - \beta$$
where $\alpha$, $1-\beta$, and $\theta = P(X > M_0 \mid H_1)$ are all specified. Note also that the size and the power requirements have been modified to state "at most" $\alpha$ and "at least" $1-\beta$, in order to reflect the discreteness of the binomial distribution. Tables are available to aid in solving these equations; see, for example, Cohen (1972). We illustrate the process using the normal approximation to the power because the necessary equations are much easier to solve. Under the normal approximation, the power of a size $\alpha$ sign test with $H_1\colon M > M_0$ is given in (4.8). Thus we require that

$$1 - \Phi\left[\frac{N(0.5 - \theta) + 0.5\sqrt{N}\,z_\alpha}{\sqrt{N\theta(1-\theta)}}\right] = 1 - \beta$$

or, solving for $N$, we get

$$N = \left[\frac{\sqrt{\theta(1-\theta)}\,\Phi^{-1}(\beta) - 0.5\,z_\alpha}{0.5 - \theta}\right]^2 = \left[\frac{\sqrt{\theta(1-\theta)}\,z_\beta + 0.5\,z_\alpha}{0.5 - \theta}\right]^2 \tag{4.9}$$
which should be rounded up to the next integer. The approximate sample size formula for the one-sided lower-tailed alternative $H_1\colon M < M_0$ is the same except that here $\theta = P(X > M_0 \mid H_1) < 0.5$. A sample size formula for the two-sided alternative is the same as (4.9) with $\alpha$ replaced by $\alpha/2$. The derivation is left as an exercise for the reader. For example, suppose $\theta = 0.2$. If we set $\alpha = 0.05$ and $1 - \beta = 0.90$, then $z_\alpha = 1.645$ and $z_\beta = 1.282$. Then (4.9) yields $\sqrt{N} = 4.45$ and $N = 19.8$. Thus we need at least 20 observations to meet the specifications.

CONFIDENCE INTERVAL FOR THE MEDIAN
A two-sided confidence-interval estimate for an unknown population median can be obtained from the acceptance region of the sign test against the two-sided alternative. The acceptance region for a two-sided test of $H_0\colon M = M_0$, using (4.3), is

$$k'_{\alpha/2} + 1 \le K \le k_{\alpha/2} - 1 \tag{4.10}$$

where $K$ is the number of positive differences among $X_i - M_0$, $i = 1, 2, \ldots, N$, and $k'_{\alpha/2}$ and $k_{\alpha/2}$ are integers such that

$$P(k'_{\alpha/2} + 1 \le K \le k_{\alpha/2} - 1) \ge 1 - \alpha$$

As we found for the quantile test, the equal-tailed confidence interval endpoints for the unknown population median are the order statistics
$X_{(r)}$ and $X_{(s)}$, where $r$ and $s$ are the largest and smallest integers, respectively, such that

$$\sum_{i=0}^{r-1} \binom{N}{i} (0.5)^{N} \le \frac{\alpha}{2} \qquad \text{and} \qquad \sum_{i=s}^{N} \binom{N}{i} (0.5)^{N} \le \frac{\alpha}{2} \tag{4.11}$$

We note that $r - 1$ and $s$ are easily found from Table G in the columns labeled Left tail and Right tail, respectively. For larger sample sizes,

$$r = k'_{\alpha/2} + 1 = 0.5 + 0.5N - 0.5\sqrt{N}\,z_{\alpha/2} \tag{4.12}$$

and

$$s = k_{\alpha/2} = 0.5 + 0.5N + 0.5\sqrt{N}\,z_{\alpha/2} \tag{4.13}$$
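Conditions (4.11) and the approximations (4.12) and (4.13) can be sketched in Python as follows. The function names are hypothetical, the approximate ranks are rounded conservatively (down for $r$, up for $s$), and the data are the nine laboratory values used in Example 4.3.

```python
from math import ceil, comb, floor, sqrt
from statistics import NormalDist


def left_tail(n: int, k: int) -> float:
    """P(K <= k) for K ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k + 1)) / 2**n


def exact_ranks(n: int, conf: float):
    """Largest r with P(K <= r - 1) <= alpha/2, s = n - r + 1 by symmetry,
    and the exact confidence coefficient 1 - 2 P(K <= r - 1)."""
    half_alpha = (1 - conf) / 2
    r = 0
    while left_tail(n, r) <= half_alpha:   # tests whether rank r + 1 is allowed
        r += 1
    if r == 0:
        raise ValueError("sample too small for this confidence level")
    return r, n - r + 1, 1 - 2 * left_tail(n, r - 1)


def approx_ranks(n: int, conf: float):
    """Normal-approximation ranks from (4.12) and (4.13)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    r = floor(0.5 + 0.5 * n - 0.5 * sqrt(n) * z)
    s = ceil(0.5 + 0.5 * n + 0.5 * sqrt(n) * z)
    return r, s


def median_confidence_interval(data, conf: float = 0.95):
    """Distribution-free interval (X_(r), X_(s)) with its exact coefficient."""
    xs = sorted(data)
    r, s, gamma = exact_ranks(len(xs), conf)
    return xs[r - 1], xs[s - 1], gamma


doses = [0.41, 0.52, 0.91, 0.45, 1.06, 0.82, 0.78, 0.68, 0.75]
print(exact_ranks(15, 0.95), approx_ranks(15, 0.95))
print(median_confidence_interval(doses, 0.95))
```

Zeros need no special handling here, since the interval is built from the order statistics of the full sample.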
We round down for $r$ and round up for $s$ for a conservative solution. In order to contrast the exact and approximate confidence interval endpoints, suppose $N = 15$ and $1 - \alpha = \gamma = 0.95$. Then, using Table G with $\theta = 0.5$, $r = 4$ for significance level 0.0176, so that the exact endpoints of the 95% confidence interval are $X_{(4)}$ and $X_{(12)}$ with exact confidence level $\gamma = 0.9648$. For the approximate confidence interval, $r = 0.5 + 7.5 - 0.5\sqrt{15}\,(1.96) = 4.21$, which we round down. So the confidence interval based on the normal approximation is also given by $(X_{(4)}, X_{(12)})$ with exact confidence level $\gamma = 0.9648$.

PROBLEM OF ZEROS
Zeros do not present a problem in finding a confidence interval estimate of the median using this procedure. As a result, the sample size $N$ is not reduced for zeros, and zeros are counted as many times as they occur in determining confidence-interval endpoints. If the real interest is in hypothesis testing and there are many zeros, the power of the test will be greater if the test is carried out using a confidence-interval approach.

PAIRED-SAMPLE PROCEDURES
The one-sample sign-test procedures for hypothesis testing and confidence interval estimation of $M$ are equally applicable to paired-sample data. For a random sample of $N$ pairs $(X_1, Y_1), \ldots, (X_N, Y_N)$, the $N$ differences $D_i = X_i - Y_i$ are formed. If the population of differences is assumed continuous at its median $M_D$ so that $P(D = M_D) = 0$, and $\theta$ is defined as $\theta = P(D > M_D)$, the same procedures are clearly valid here with $X_i$ replaced everywhere by $D_i$.
It should be emphasized that this is a test for the median difference $M_D$, which is not necessarily the same as the difference of the two medians $M_X$ and $M_Y$. The following simple example will serve to illustrate this often misunderstood fact. Let $X$ and $Y$ have the joint distribution

$$f_{X,Y}(x, y) = \begin{cases} 1/2 & \text{for } y - 1 \le x \le y,\ -1 \le y \le 1 \ \text{ or } \ y + 1 \le x \le 1,\ -1 \le y \le 0 \\ 0 & \text{otherwise} \end{cases}$$

Then $X$ and $Y$ are uniformly distributed over the shaded region in Figure 4.2. It can be seen that the marginal distributions of $X$ and $Y$ are identical, both being uniform on the interval $(-1, 1)$, so that $M_X = M_Y = 0$. It is clear that where $X$ and $Y$ have opposite signs, in quadrants II and IV, $P(X < Y) = P(X > Y)$, while in quadrants I and III, $X < Y$ always. For all pairs, then, we have $P(X < Y) = 3/4$, which implies that the median of the population of differences is smaller than zero. It will be left as an exercise for the reader to show that the cdf of the difference random variable $D = X - Y$ is

$$F_D(d) = \begin{cases} 0 & \text{for } d \le -1 \\ (d + 1)(d + 3)/4 & \text{for } -1 < d \le 0 \\ 3/4 & \text{for } 0 < d \le 1 \\ d(4 - d)/4 & \text{for } 1 < d \le 2 \\ 1 & \text{for } d > 2 \end{cases} \tag{4.14}$$
Fig. 4.2 Region of integration is the shaded area.
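The claims about this example can be checked numerically. The sketch below (all function names are hypothetical) solves $F_D(d) = 1/2$ from (4.14) by bisection and uses rejection sampling from the square $(-1,1) \times (-1,1)$ to estimate $P(X < Y)$ and the two marginal medians; the seed and the number of draws are arbitrary choices.

```python
import random
from statistics import median


def cdf_d(d: float) -> float:
    """The cdf of D = X - Y as given in (4.14)."""
    if d <= -1:
        return 0.0
    if d <= 0:
        return (d + 1) * (d + 3) / 4
    if d <= 1:
        return 0.75
    if d <= 2:
        return d * (4 - d) / 4
    return 1.0


def median_of_d(lo: float = -1.0, hi: float = 0.0, tol: float = 1e-12) -> float:
    """Solve F_D(d) = 1/2 by bisection on the branch -1 < d <= 0."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf_d(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sample_pair(rng: random.Random):
    """Draw (x, y) uniformly from the shaded region by rejection sampling."""
    while True:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if (y - 1 <= x <= y) or (y <= 0 and x >= y + 1):
            return x, y


rng = random.Random(2003)
pairs = [sample_pair(rng) for _ in range(20000)]
prop_x_less_y = sum(x < y for x, y in pairs) / len(pairs)
median_x = median(x for x, _ in pairs)
median_y = median(y for _, y in pairs)
print("median of D from (4.14):", median_of_d())
print("estimated P(X < Y)     :", prop_x_less_y)
print("sample medians of X, Y :", median_x, median_y)
```

The bisection result is negative even though both marginal medians are zero, which is the point of the example.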
The median difference is that value $M_D$ of the distribution of $D$ such that $F_D(M_D) = 1/2$. The reader can verify that this yields $M_D = -2 + \sqrt{3}$. In general, then, it is not true that $M_D = M_X - M_Y$. On the other hand, it is true that a mean of differences equals the difference of means. Since the mean and median coincide for symmetric distributions, if the $X$ and $Y$ populations are both symmetric and $M_X = M_Y$, and if the difference population is also symmetric,¹ then $M_D = M_X - M_Y$ and $M_X = M_Y$ is a necessary and sufficient condition for $M_D = 0$. Note that for the case where $X$ and $Y$ are each normally distributed, the difference of their medians (or means) is equal to the median (or mean) of their difference $X - Y$, since $X - Y$ is also normally distributed with median (or mean) equal to the difference of the respective medians (or means). Earlier discussions of power and sample size also apply to paired-sample data problems.

APPLICATIONS
We note that the sign test is a special case of the quantile test with $p = 0.5$, since the quantile specified is the population median. This test is easier to apply than the general quantile test because the binomial distribution for $\theta = 0.5$ is symmetric for any $N$. We write the null hypothesis here as $H_0\colon M = M_0$. The appropriate rejection regions in terms of $K$, the number of plus signs among $X_1 - M_0, X_2 - M_0, \ldots, X_N - M_0$, and corresponding exact P values, are summarized as follows:

Alternative      Rejection region                                    Exact P value
$M > M_0$        $K \ge k_\alpha$                                    $\sum_{i=K_O}^{N} \binom{N}{i} (0.5)^N$
$M < M_0$        $K \le k'_\alpha$                                   $\sum_{i=0}^{K_O} \binom{N}{i} (0.5)^N$
$M \ne M_0$      $K \le k'_{\alpha/2}$ or $K \ge k_{\alpha/2}$       2(smaller of the one-tailed P values)

Table C with $\theta = 0.5$ and $n$ (representing $N$) can be used to determine the critical values. Table G is simpler to use because it gives both left-tail and right-tail binomial probabilities for $N \le 20$ when $\theta = 0.5$.
¹ The difference population is symmetric if $X$ and $Y$ are symmetric and independent or if $f_{X,Y}(x, y) = f_{X,Y}(-x, -y)$.
For large sample sizes, the appropriate rejection regions and the P values, based on the normal approximation to the binomial distribution with a continuity correction, are as follows:

Alternative      Rejection region                                    Approximate P value
$M > M_0$        $K \ge 0.5N + 0.5 + 0.5 z_\alpha \sqrt{N}$          $1 - \Phi\left(\dfrac{K_O - 0.5N - 0.5}{0.5\sqrt{N}}\right)$
$M < M_0$        $K \le 0.5N - 0.5 - 0.5 z_\alpha \sqrt{N}$          $\Phi\left(\dfrac{K_O - 0.5N + 0.5}{0.5\sqrt{N}}\right)$
$M \ne M_0$      Both above with $z_{\alpha/2}$                      2(smaller of the one-tailed P values)
If any zeros are present, we will ignore them and reduce $N$ accordingly. As we have seen, a prespecified significance level $\alpha$ often cannot be achieved with nonparametric statistical inference because most of the applicable sampling distributions are discrete. This problem is avoided if we determine the P value of a test result and use that to make our decision. For a two-sided alternative, the common procedure is to define the P value as twice the smaller of the two one-sided P values, as described in the case for general quantiles. The "doubling" is particularly meaningful when the null distribution of the test statistic is symmetric, as is the case here. For example, suppose that we observe four plus signs among $N = 12$ nonzero sample differences. Table G shows that the left-tail P value is 0.1938; since there is no entry in the right-tail column, we know that the right-tail P value exceeds 0.5. Thus the two-sided P value is 2 times 0.1938, or 0.3876. Another way of looking at this is as follows. Under the null hypothesis the binomial distribution is symmetric about the expected value of $K$, which here is $N(0.5) = 6$. Thus, for any value of $K$ less than 6, the upper-tail probability will be greater than 0.5 and the lower-tail probability less than 0.5. Conversely, for any value of $K$ greater than 6, the upper-tail probability is less than 0.5 and the lower-tail probability is greater than 0.5. Also, by symmetry, the probability of, say, 4 or less is the same as the probability of 8 or more. Thus, to calculate the P value for the two-sided alternative, the convention is to take the smaller of the two one-tailed P values and double it. If instead we used the larger of the P values and doubled that, the final P value could possibly be more than 1.0, which is not acceptable. Note also that when the observed value of $K$ is exactly equal to 6, the two-sided P value will be taken to be equal to 1.0.
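The exact P values discussed here can be computed directly from the binomial distribution with $\theta = 0.5$. The following Python sketch (the function name `sign_test_p_values` is hypothetical) returns the left-tail, right-tail, and two-sided values, with the two-sided value capped at 1.0 as in the convention just described.

```python
from math import comb


def sign_test_p_values(k_obs: int, n: int):
    """Exact left-tail, right-tail, and two-sided P values for the sign test.

    k_obs is the observed number of plus signs among n nonzero differences.
    """
    left = sum(comb(n, i) for i in range(k_obs + 1)) / 2**n       # P(K <= k_obs)
    right = sum(comb(n, i) for i in range(k_obs, n + 1)) / 2**n   # P(K >= k_obs)
    two_sided = min(1.0, 2 * min(left, right))                    # doubling rule
    return left, right, two_sided


left, right, two_sided = sign_test_p_values(4, 12)
print(f"left = {left:.4f}, right = {right:.4f}, two-sided = {two_sided:.4f}")
print("two-sided P value when K = 6:", sign_test_p_values(6, 12)[2])
```

Doubling the unrounded left-tail probability gives a two-sided value that can differ in the fourth decimal place from doubling the rounded table entry.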
In our example, the observed value 4 for $N = 12$ is less than 6, so the smaller one-tailed P value is in the lower tail and is equal to 0.1938, and this leads to a two-sided P value of 0.3876 as found earlier. If we have a prespecified $\alpha$ and wish to reach a decision, we should reject $H_0$ whenever the P value is less than or equal to $\alpha$ and accept $H_0$ otherwise.

The exact distribution-free confidence interval for the median can be found from Table C but is particularly easy to find using Table G. The choice of exact confidence levels is limited to $1 - 2P$, where $P$ is a tail probability in Table G for the appropriate value of $N$. From (4.10), the lower confidence limit is the $(k'_{\alpha/2} + 1)$th = $r$th-order statistic in the sample, where $k'_{\alpha/2}$ is the left-tail critical value of the sign test statistic $K$ from Table G, for the given $\alpha$ and $N$ such that the $P$ figure is less than or equal to $\alpha/2$. But since the critical values are all of the nonnegative integers, $k'_{\alpha/2} + 1$ is simply the rank of $k'_{\alpha/2}$ among the entries in Table G for that $N$. The calculation of this rank will become clearer after we do Example 4.1. For consistency with the results given later for confidence interval endpoints based on other nonparametric test procedures, we note that $r$ is the rank of the left-tail entry in Table G for this $N$, and we denote this rank by $u$. Further, by symmetry, we have $X_{(s)} = X_{(N-r+1)}$. The confidence interval endpoints are the $u$th from the smallest and the $u$th from the largest order statistics, where $u$ is the rank of the left-tail critical value of $K$ from Table G that corresponds to $P \le \alpha/2$. The corresponding exact confidence coefficient is then $\gamma = 1 - 2P$. For sample sizes outside the range of Table G we have

$$u = 0.5 + 0.5N - 0.5\sqrt{N}\,z_{\alpha/2} \tag{4.15}$$

from (4.4), and we always round the result of (4.15) downward. For example, for a confidence level of 0.95 with $N = 15$, $\alpha/2 = 0.025$, the $P$ figure from Table G closest to 0.025 but not exceeding it is 0.0176. The corresponding left-tail critical value is 3, which has a rank of 4 among the left-tail critical values for this $N$. Thus $u = 4$ and the 95% confidence interval for the median is given by the interval $(X_{(4)}, X_{(12)})$. The exact confidence level for this distribution-free interval is $1 - 2P = 1 - 2(0.0176) = 0.9648$. Note that unlike in the case of testing hypotheses, if zeros occur in the data, they are counted as many times as they appear for determination of the confidence interval endpoints.

Example 4.1 Suppose that each of 13 randomly chosen female registered voters was asked to indicate if she was going to vote for
candidate A or candidate B in an upcoming election. The results show that 9 of the subjects preferred A. Is this sufficient evidence to conclude that candidate A is preferred to B by female voters?

Solution With this kind of data, the sign test is one of the few statistical tests that is valid and can be applied. Let $\theta$ be the true probability that candidate A is preferred over candidate B. The null hypothesis is that the two candidates are equally preferred, that is, $H_0\colon \theta = 0.5$, and the one-sided upper-tailed alternative is that A is preferred over B, that is, $H_1\colon \theta > 0.5$. The sign test can be applied here and the value of the test statistic is $K = 9$. Using Table G with $N = 13$, the exact P value in the right tail is found to be 0.1338; therefore this is not sufficient evidence to conclude that the female voters prefer A over B at a commonly used significance level such as 0.05.

Example 4.2 Some researchers claim that susceptibility to hypnosis can be acquired or improved through training. To investigate this claim, six subjects were rated on a scale of 1 to 20 according to their initial susceptibility to hypnosis and then given 4 weeks of training. Each subject was rated again after the training period. In the ratings below, higher numbers represent greater susceptibility to hypnosis. Do these data support the claim?

Subject    1    2    3    4    5    6
Before    10   16    7    4    7    2
After     18   19   11    3    5    3
Solution The null hypothesis is $H_0\colon M_D = 0$ and the appropriate alternative is $H_1\colon M_D > 0$, where $M_D$ is the median of the differences, after training minus before training. The number of positive differences is $K_O = 4$ and the right-tail P value for $N = 6$, $K_O = 4$ from Table G is 0.3438. Hence the data do not support the claim at any level smaller than 0.3438, which implies that 4 is not an extreme value of $K$ under $H_0$; rejection of the null hypothesis is not warranted. Also, from Table G, at $\alpha = 0.05$, the rejection region is $K \ge 6$, with exact size 0.0156. Since the observed value of $K$ equals 4, we again fail to reject $H_0$. The following computer printouts illustrate the solution to Example 4.2 based on the STATXACT, MINITAB, and SAS packages. The
STATXACT solution agrees with ours for the exact one-sided P value. Their asymptotic P value (0.2071) is based on the normal approximation without the continuity correction. The MINITAB solution agrees exactly with ours. The SAS solution gives only the two-tailed P values. The exact sign test result is 2 times ours; they also give P values based on Student's t test and the signed-rank test discussed later in this chapter.
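The figures quoted for Example 4.2 can be reproduced with a short Python calculation. This sketch is independent of the packages named above; the variable names are arbitrary, and the asymptotic value is computed without a continuity correction to match the description of the STATXACT output.

```python
from math import comb, sqrt
from statistics import NormalDist

before = [10, 16, 7, 4, 7, 2]
after = [18, 19, 11, 3, 5, 3]

# Differences are taken as after minus before; zeros (none here) are dropped
diffs = [a - b for a, b in zip(after, before)]
nonzero = [d for d in diffs if d != 0]
n = len(nonzero)
k_obs = sum(d > 0 for d in nonzero)

# Exact right-tail P value: P(K >= k_obs) under Binomial(n, 0.5)
exact_right = sum(comb(n, i) for i in range(k_obs, n + 1)) / 2**n
exact_two_sided = min(1.0, 2 * exact_right)

# Normal approximation without continuity correction
z = (k_obs - 0.5 * n) / (0.5 * sqrt(n))
asymptotic_right = 1 - NormalDist().cdf(z)

print(f"K = {k_obs}, N = {n}")
print(f"exact one-sided P      = {exact_right:.4f}")
print(f"exact two-sided P      = {exact_two_sided:.4f}")
print(f"asymptotic one-sided P = {asymptotic_right:.4f}")
```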
Now suppose we wanted to know, before the investigation started, how many subjects should be included in the study when we plan to use the sign test for the median difference at a level of significance $\alpha = 0.05$, and we want to detect $P(D > 0) = 0.6$ with a power 0.85. Note that $P(D > 0) = 0.6$ means that the median difference, $M_D$, is greater than 0, the hypothesized value, and thus the test should have an upper-tailed alternative. With $\theta = 0.6$, $z_{0.05} = 1.645$, and $z_{0.15} = 1.0365$, Eq. (4.9) gives $N = 176.96$, which we round up to 177. The MINITAB solution to this example is shown below. It also uses the normal approximation and the result 177 agrees with ours.
The solution also shows $N = 222$ observations will be required for a two-tailed test. The reader can verify this. The solution is labeled "Test for One Proportion" instead of "Sign Test" because it is applicable for a test for a quantile of any order $p$ (as in Section 5.3).
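The sample size formula (4.9) can be evaluated with a few lines of Python. The function name `sign_test_sample_size` and its arguments are hypothetical; the standard normal quantiles come from the standard library rather than from a table, so the unrounded result may differ slightly in the second decimal place from a hand calculation with rounded quantiles.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sign_test_sample_size(theta: float, alpha: float, power: float,
                          two_sided: bool = False):
    """Approximate N from (4.9); returns the raw value and the rounded-up N.

    theta is P(X > M0) under the alternative, power is 1 - beta, and for a
    two-sided test alpha is replaced by alpha / 2.
    """
    if theta == 0.5:
        raise ValueError("theta must differ from 0.5 under the alternative")
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_beta = NormalDist().inv_cdf(power)          # z_beta with beta = 1 - power
    n = ((sqrt(theta * (1 - theta)) * z_beta + 0.5 * z_alpha)
         / (0.5 - theta)) ** 2
    return n, ceil(n)


print(sign_test_sample_size(0.6, 0.05, 0.85))                  # one-sided
print(sign_test_sample_size(0.6, 0.05, 0.85, two_sided=True))  # two-sided
print(sign_test_sample_size(0.2, 0.05, 0.90))                  # theta below 0.5
```

Because the ratio in (4.9) is squared, the same function serves for alternatives with $\theta$ below 0.5.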
Example 4.3 Nine pharmaceutical laboratories cooperated in a study to determine the median effective dose level of a certain drug. Each laboratory carried out experiments and reported its effective dose. For the results 0.41, 0.52, 0.91, 0.45, 1.06, 0.82, 0.78, 0.68, 0.75, estimate the interval of median effective dose with a confidence level 0.95.

Solution We go to Table G with $N = 9$ and find $P = 0.0195$ is the largest entry that does not exceed 0.025, and this entry has rank $u = 2$. Hence the second smallest and second largest (or the $9 - 2 + 1 = 8$th smallest) order statistics of the sample data, namely $X_{(2)}$ and $X_{(8)}$, provide the two endpoints as $0.45 < M < 0.91$ with exact confidence coefficient $1 - 2(0.0195) = 0.961$. The MINITAB solution
shown gives the two confidence intervals with the exact confidence coefficient on each side of 0.95, as well as an exact 95% interval, based on an interpolation scheme between the two sets of endpoints, lower and upper, respectively. This latter interval is indicated by NLI on the output. The interpolation scheme is a nonlinear one due to Hettmansperger and Sheather (1986).

5.5 RANK-ORDER STATISTICS
The other one-sample procedure to be covered in this chapter is the Wilcoxon signed-rank test. This test is based on a special case of what are called rank-order statistics. The rank-order statistics for a random sample are any set of constants which indicate the order of the observations. The actual magnitude of any observation is used only in determining its relative position in the sample array and is thereafter ignored in any analysis based on rank-order statistics. Thus any statistical procedures based on rank-order statistics depend only on the relative magnitudes of the observations. If the $j$th element $X_j$ is the $i$th smallest in the sample, the $j$th rank-order statistic must be the $i$th smallest rank-order statistic. Rank-order statistics might then be defined as the set of numbers which results when each original observation is replaced by the value of some order-preserving function. Suppose we have a random sample of $N$ observations $X_1, X_2, \ldots, X_N$. Let the rank-order statistics be denoted by $r(X_1), r(X_2), \ldots, r(X_N)$, where $r$ is any function such that $r(X_i) \le r(X_j)$ whenever $X_i \le X_j$. As with order statistics, rank-order statistics are invariant under monotone transformations, i.e., if $r(X_i) \le r(X_j)$, then $r[F(X_i)] \le r[F(X_j)]$, in addition to $F[r(X_i)] \le F[r(X_j)]$, where $F$ is any nondecreasing function.

For any set of $N$ different sample observations, the simplest set of numbers to use to indicate relative positions is the first $N$ positive integers. In order to eliminate the possibility of confusion and to simplify and unify the theory of rank-order statistics, we shall assume here that unless explicitly stated otherwise, the rank-order statistics are always a permutation of the first $N$ integers. The $i$th rank-order statistic $r(X_i)$ then is called the rank of the $i$th observation in the original unordered sample. The value it assumes, $r(x_i)$, is the number of observations $x_j$, $j = 1, 2, \ldots, N$, such that $x_j \le x_i$.
For example, the rank of the $i$th-order statistic is equal to $i$, or $r(x_{(i)}) = i$. A functional definition of the rank of any $x_i$ in a set of $N$ different observations is provided by

$$r(x_i) = \sum_{j=1}^{N} S(x_i - x_j) = 1 + \sum_{j \ne i} S(x_i - x_j) \tag{5.1}$$
where

$$S(u) = \begin{cases} 0 & \text{if } u < 0 \\ 1 & \text{if } u \ge 0 \end{cases} \tag{5.2}$$
The random variable $r(X_i)$ is discrete and for a random sample from a continuous population it follows the discrete uniform distribution, or

$$P[r(X_i) = j] = 1/N \qquad \text{for } j = 1, 2, \ldots, N$$
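The functional definition (5.1) with the indicator (5.2) translates directly into code. The sketch below assumes, as the text does, that all observations are different; the function names and the sample values are hypothetical.

```python
def indicator(u: float) -> int:
    """S(u) from (5.2): 0 if u < 0 and 1 if u >= 0."""
    return 1 if u >= 0 else 0


def ranks(xs):
    """Rank of each observation by (5.1): r(x_i) = sum over j of S(x_i - x_j)."""
    return [sum(indicator(xi - xj) for xj in xs) for xi in xs]


sample = [3.2, 1.5, 4.8, 2.0]
print(ranks(sample))
```

The double loop costs $N^2$ comparisons; sorting would be faster for large samples, but the version above mirrors the definition term by term.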
Although admittedly the terminology may seem confusing at the outset, a function of the rank-order statistics will be called a rank statistic. Rank statistics are particularly useful in nonparametric inference since they are usually distribution free. The methods are applicable to a wide variety of hypothesis-testing situations depending on the particular function used. The procedures are generally simple and quick to apply. Since rank statistics are functions only of the ranks of the observations, only this information is needed in the sample data. Actual measurements are often difficult, expensive, or even impossible to obtain. When actual measurements are not available for some reason but relative positions can be determined, rank-order statistics make use of all of the information available. However, when the fundamental data consist of variate values and these actual magnitudes are ignored after obtaining the rank-order statistics, we may be concerned about the loss of efficiency that may ensue. One approach to a judgment concerning the potential loss of efficiency is to determine the correlation between the variate values and their assigned ranks. If the correlation is high, we would feel intuitively more justified in the replacement of actual values by ranks for the purpose of analysis. The hope is that inference procedures based on ranks alone will lead to conclusions which seldom differ from a corresponding inference based on actual variate values.

The ordinary product-moment correlation coefficient between two random variables $X$ and $Y$ is

$$\rho(X, Y) = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y} = \frac{E(XY) - E(X)E(Y)}{\sigma_X \sigma_Y}$$

Assume that for a continuous population denoted by a cdf $F_X$ (pdf $f_X$) we would like to determine the correlation between the random variable $X$ and its rank $r(X)$. Theoretically, a random variable from an infinite population cannot have a rank, since values on a continuous scale cannot be ordered. But an observation $X_i$, of a random sample of size $N$ from this population, does have a rank $r(X_i)$ as defined in (5.1).
The distribution of $X_i$ is the same as the distribution of $X$ and the $r(X_i)$ are identically distributed though not independent. Therefore, it is reasonable to define the population correlation coefficient between ranks and variate values as the correlation between $X_i$ and $Y_i = r(X_i)$, or

$$\rho[X, r(X)] = \frac{E(X_i Y_i) - E(X_i)E(Y_i)}{\sigma_X \sigma_{Y_i}} \tag{5.3}$$

The marginal distribution of $Y_i$ for any $i$ is the discrete uniform, so that

$$f_{Y_i}(j) = \frac{1}{N} \qquad \text{for } j = 1, 2, \ldots, N \tag{5.4}$$

with moments

$$E(Y_i) = \sum_{j=1}^{N} \frac{j}{N} = \frac{N+1}{2} \tag{5.5}$$

$$E(Y_i^2) = \sum_{j=1}^{N} \frac{j^2}{N} = \frac{(N+1)(2N+1)}{6}$$

$$\operatorname{var}(Y_i) = \frac{(N+1)(2N+1)}{6} - \frac{(N+1)^2}{4} = \frac{N^2 - 1}{12} \tag{5.6}$$

The joint pdf of $X_i$ and its rank $Y_i$ is

$$f_{X_i, Y_i}(x, j) = f_{X_i \mid Y_i = j}(x \mid j)\, f_{Y_i}(j) = \frac{f_{X_{(j)}}(x)}{N} \qquad \text{for } j = 1, 2, \ldots, N$$

where $X_{(j)}$ denotes the $j$th-order statistic of a random sample of size $N$ from the cdf $F_X$. From this expression we can write

$$E(X_i Y_i) = \frac{1}{N} \int_{-\infty}^{\infty} \sum_{j=1}^{N} j\,x\, f_{X_{(j)}}(x)\, dx = \sum_{j=1}^{N} \frac{j\,E(X_{(j)})}{N} \tag{5.7}$$

Substituting the results (5.5), (5.6), and (5.7) back into (5.3), we obtain

$$\rho[X, r(X)] = \left(\frac{12}{N^2 - 1}\right)^{1/2} \frac{\sum_{j=1}^{N} j\,E(X_{(j)}) - [N(N+1)/2]\,E(X)}{N \sigma_X} \tag{5.8}$$
Since the result here is independent of $i$, our definition in (5.3) may be considered a true correlation. The same result is obtained if the covariance between $X$ and $r(X)$ is defined as the limit as $M$ approaches infinity of the average of the $M$ correlations that can be calculated between sample values and their ranks when $M$ samples of size $N$ are drawn from this population. This method will be left as an exercise for the reader. The expression given in (5.8) can be written in another useful form. If the variate values $X$ are drawn from a continuous population with distribution $F_X$, the following sum can be evaluated:

$$
\begin{aligned}
\sum_{i=1}^{N} i\, E(X_{(i)})
&= \sum_{i=1}^{N} \frac{i\, N!}{(i-1)!\,(N-i)!} \int_{-\infty}^{\infty} x\, [F_X(x)]^{i-1} [1 - F_X(x)]^{N-i} f_X(x)\, dx \\
&= \sum_{j=0}^{N-1} \frac{(j+1)\, N!}{j!\,(N-j-1)!} \int_{-\infty}^{\infty} x\, [F_X(x)]^{j} [1 - F_X(x)]^{N-j-1} f_X(x)\, dx \\
&= \sum_{j=1}^{N-1} \frac{N!}{(j-1)!\,(N-j-1)!} \int_{-\infty}^{\infty} x\, [F_X(x)]^{j} [1 - F_X(x)]^{N-j-1} f_X(x)\, dx \\
&\qquad + \sum_{j=0}^{N-1} \frac{N!}{j!\,(N-j-1)!} \int_{-\infty}^{\infty} x\, [F_X(x)]^{j} [1 - F_X(x)]^{N-j-1} f_X(x)\, dx \\
&= N(N-1) \int_{-\infty}^{\infty} x\, F_X(x) \sum_{j=1}^{N-1} \binom{N-2}{j-1} [F_X(x)]^{j-1} [1 - F_X(x)]^{N-j-1} f_X(x)\, dx \\
&\qquad + N \int_{-\infty}^{\infty} x \sum_{j=0}^{N-1} \binom{N-1}{j} [F_X(x)]^{j} [1 - F_X(x)]^{N-j-1} f_X(x)\, dx \\
&= N(N-1) \int_{-\infty}^{\infty} x\, F_X(x)\, f_X(x)\, dx + N \int_{-\infty}^{\infty} x\, f_X(x)\, dx \\
&= N(N-1)\, E[X F_X(X)] + N\, E(X)
\end{aligned}
\tag{5.9}
$$
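If this quantity is now substituted in (5.8), the result is

$$
\begin{aligned}
\rho[X, r(X)]
&= \left(\frac{12}{N^2 - 1}\right)^{1/2} \frac{1}{\sigma_X} \left[ (N-1)\, E[X F_X(X)] + E(X) - \frac{N+1}{2}\, E(X) \right] \\
&= \left(\frac{12}{N^2 - 1}\right)^{1/2} \frac{1}{\sigma_X} \left[ (N-1)\, E[X F_X(X)] - \frac{N-1}{2}\, E(X) \right] \\
&= \left[\frac{12(N-1)}{N+1}\right]^{1/2} \frac{1}{\sigma_X} \left[ E[X F_X(X)] - \frac{1}{2} E(X) \right]
\end{aligned}
\tag{5.10}
$$

and

$$\lim_{N \to \infty} \rho[X, r(X)] = \frac{2\sqrt{3}}{\sigma_X} \left[ E[X F_X(X)] - \frac{1}{2} E(X) \right] \tag{5.11}$$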
Some particular evaluations of (5.11) are given in Stuart (1954).
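As an illustration of (5.11), the limiting correlation can be evaluated numerically for a given population. The following is a minimal sketch, not taken from the text: the function names, the midpoint-rule integration, and the truncation of the normal range to $[-10, 10]$ are all assumptions made for the example.

```python
import math


def limiting_rank_correlation(pdf, cdf, lo, hi, steps=100_000):
    """Evaluate (5.11) by midpoint-rule integration of the pdf over [lo, hi].

    pdf and cdf are user-supplied callables (assumed names); [lo, hi] should
    cover essentially all of the probability mass.
    """
    h = (hi - lo) / steps
    m1 = m2 = mxf = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        w = pdf(x) * h
        m1 += x * w             # E(X)
        m2 += x * x * w         # E(X^2)
        mxf += x * cdf(x) * w   # E[X F_X(X)]
    sigma = math.sqrt(m2 - m1 * m1)
    return 2.0 * math.sqrt(3.0) / sigma * (mxf - 0.5 * m1)


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


rho_uniform = limiting_rank_correlation(lambda x: 1.0, lambda x: x, 0.0, 1.0)
rho_normal = limiting_rank_correlation(normal_pdf, normal_cdf, -10.0, 10.0)
```

For the uniform distribution on $(0, 1)$ the rank is asymptotically a linear function of $X$, so the limit is 1. For the standard normal, $E[X F_X(X)] = 1/(2\sqrt{\pi})$, so the limit is $\sqrt{3/\pi} \approx 0.977$; the numerical values should agree with these closed forms to within the integration error.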
5.6 TREATMENT OF TIES IN RANK TESTS
In applying tests based on rank-order statistics, we usually assume that the population from which the sample was drawn is continuous. When this assumption is made, the probability of any two observations having identical magnitudes is equal to zero. The set of ranks as defined in (5.1) then will be $N$ different integers. The exact properties of most rank statistics depend on this assumption. Two or more observations with the same magnitude are said to be tied. We may say only that theoretically no problem is presented by tied observations. However, in practice ties can certainly occur, either because the population is actually discrete or because of practical limitations on the precision of measurement. Some of the conventional approaches to dealing with ties in assigning ranks will be discussed generally in this section, so that the problem can be ignored in presenting the theory of some specific rank tests later.
In a set of $N$ observations which are not all different, arrangement in order of magnitude produces a set of $r$ groups of different numbers, the $i$th different value occurring with frequency $t_i$, where $\sum t_i = N$. Any group of numbers with $t_i \ge 2$ comprises a set of tied observations. The ranks are no longer well defined, and for any set of fixed ranks of $N$ untied observations there are $\prod t_i!$ possible assignments of ranks to the entire sample with ties, each assignment leading to its own value for a rank test statistic, although that value may be the same as for some other assignment. If a rank test is to be performed using a sample containing tied observations, we must have either a unique method of assigning ranks for ties so that the test statistic can be computed in the usual way or a method of combining the many possible values of the rank test statistic to reach one decision. Several acceptable methods will be discussed briefly.
RANDOMIZATION
In the method of randomization, one of the $\prod t_i!$ possible assignments of ranks is selected by some random procedure. For example, in the set of observations

3.0, 4.1, 4.1, 5.2, 6.3, 6.3, 6.3, 9

there are $2!(3!)$ or 12 possible assignments of the integer ranks 1 to 8 which this sample could represent. One of these 12 assignments is selected by a supplementary random experiment and used as the unique assignment of ranks. Using this method, some theoretical properties of the rank statistic are preserved, since each assignment occurs with equal probability. In particular, the null probability distribution of the rank-order statistic, and therefore of the rank statistic, is unchanged, so that the test can be performed in the usual way. However, an additional element of chance is artificially imposed, affecting the probability distribution under alternatives.
MIDRANKS
The midrank method assigns to each member of a group of tied observations the simple average of the ranks they would have if distinguishable. Using this approach, tied observations are given tied ranks. The midrank method is perhaps the most frequently used, as it has much appeal experimentally. However, the null distribution of ranks is affected. Obviously, the mean rank is unchanged, but the variance of the ranks would be reduced. When the midrank method is used, for some tests a correction for ties can be incorporated into the test statistic. We discuss these corrections when we present the respective tests.
AVERAGE STATISTIC
If one does not wish to choose a particular set of ranks as in the previous two methods, one may instead calculate the value of the test statistic for all the $\prod t_i!$ assignments and use their simple average as the single sample value. Again, the test statistic would have the same mean but smaller variance.
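The midrank assignment and the count of possible rank assignments described above can be sketched as follows. This is an illustrative sketch only; the function names `midranks` and `tie_assignments` are hypothetical, and the data are the eight observations from the randomization example.

```python
from collections import Counter
from math import factorial, prod


def midranks(values):
    """Assign each observation the average of the ranks its tie group occupies."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def tie_assignments(values):
    """Number of possible integer-rank assignments: the product of t_i!."""
    return prod(factorial(t) for t in Counter(values).values())


data = [3.0, 4.1, 4.1, 5.2, 6.3, 6.3, 6.3, 9]
print(midranks(data))         # midranks for the eight observations
print(tie_assignments(data))  # 2! * 3! possible assignments
```

The sum of the midranks equals $N(N+1)/2$, as for untied ranks, while their sum of squares is smaller, which is the variance reduction noted in the text.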
AVERAGE PROBABILITY
Instead of averaging the test statistic for each possible assignment of ranks, one could find the probability of each resulting value of the test statistic and use the simple average of these probabilities for the overall probability. This requires availability of tables of the exact null probability distribution of the test statistic rather than simply a table of critical values.
LEAST FAVORABLE STATISTIC
Having found all possible values of the test statistic, one might choose as a single value that one which minimizes the probability of rejection. This procedure leads to the most conservative test, i.e., the lowest probability of committing a type I error.
RANGE OF PROBABILITY
Alternatively, one could compute two values of the test statistic: the one least favorable to rejection and the one most favorable. However, unless both fall inside or both fall outside the rejection region, this method does not lead to a decision.
OMISSION OF TIED OBSERVATIONS
The final and most obvious possibility is to discard all tied observations and reduce the sample size accordingly. This method certainly leads to a loss of information, but if the number of observations to be omitted is small relative to the sample size, the loss may be minimal. This procedure generally introduces bias toward rejection of the null hypothesis. The reader is referred to Savage’s Bibliography (1962) for discussions of treatment of ties in relation to particular nonparametric rank test statistics. Pratt and Gibbons (1981) also give detailed discussions and many references. Randles (2001) gives a different approach to dealing with ties.
5.7 THE WILCOXON SIGNED-RANK TEST AND CONFIDENCE INTERVAL
Since the one-sample sign test in Section 5.4 utilizes only the signs of the differences between each observation and the hypothesized median $M_0$, the magnitudes of these observations relative to $M_0$ are ignored. Assuming that such information is available, a test statistic which takes into account these individual relative magnitudes might
be expected to give better performance. If we are willing to make the assumption that the parent population is symmetric, the Wilcoxon signed-rank test statistic provides an alternative test of location which is affected by both the magnitudes and signs of these differences. The rationale and properties of this test will be discussed in this section.
As with the one-sample situation of Section 5.4, we have a random sample of $N$ observations $X_1, X_2, \ldots, X_N$ from a continuous cdf $F$ with median $M$, but now we assume that $F$ is symmetric about $M$. Under the null hypothesis

$$H_0\colon M = M_0$$

the differences $D_i = X_i - M_0$ are symmetrically distributed about zero, so that positive and negative differences of equal absolute magnitude have the same probability of occurrence; i.e., for any $c > 0$,

$$F_D(-c) = P(D_i \le -c) = P(D_i \ge c) = 1 - P(D_i \le c) = 1 - F_D(c)$$

With the assumption of a continuous population, we need not be concerned theoretically with zero or tied absolute differences $|D_i|$. Suppose we order these absolute differences $|D_1|, |D_2|, \ldots, |D_N|$ from smallest to largest and assign them ranks $1, 2, \ldots, N$, keeping track of the original signs of the differences $D_i$. If $M_0$ is the true median of the symmetrical population, the expected value of the sum of the ranks of the positive differences $T^+$ is equal to the expected value of the sum of the ranks of the negative differences $T^-$. Since the sum of all the ranks is a constant, that is, $T^+ + T^- = \sum_{i=1}^{N} i = N(N+1)/2$, test statistics based on $T^+$ only, $T^-$ only, or $T^+ - T^-$ are linearly related and therefore equivalent criteria. In contrast to the ordinary one-sample sign test, the value of $T^+$, say, is influenced not only by the number of positive differences but also by their relative magnitudes. When the symmetry assumption can be justified, $T^+$ may provide a more efficient test of location for some distributions.
The derived sample data on which these test statistics are based consist of the set of $N$ integer ranks $\{1, 2, \ldots, N\}$ and a corresponding set of $N$ plus and minus signs. The rank $i$ is associated with a plus or minus sign according to the sign of $D_j = X_j - M_0$, where $D_j$ occupies the $i$th position in the ordered array of absolute differences $|D_j|$. If we let $r(\cdot)$ denote the rank of a random variable, the Wilcoxon signed-rank statistic can be written symbolically as

$$T^+ = \sum_{i=1}^{N} Z_i\, r(|D_i|) \qquad T^- = \sum_{i=1}^{N} (1 - Z_i)\, r(|D_i|) \tag{7.1}$$
where

$$Z_i = \begin{cases} 1 & \text{if } D_i > 0 \\ 0 & \text{if } D_i < 0 \end{cases}$$

Therefore,

$$T^+ - T^- = 2 \sum_{i=1}^{N} Z_i\, r(|D_i|) - \frac{N(N+1)}{2}$$
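A minimal sketch of the computation in (7.1) follows. It assumes, as the text does at this point, that there are no zero differences and no tied absolute differences; the function name and the sample values are illustrative only.

```python
def signed_rank_statistics(sample, m0):
    """Return (T+, T-) as in (7.1) for differences D_i = X_i - m0.

    Raises ValueError when a zero or tied absolute difference occurs, since
    those cases need one of the tie-handling methods of Section 5.6.
    """
    diffs = [x - m0 for x in sample]
    if any(d == 0 for d in diffs) or len({abs(d) for d in diffs}) != len(diffs):
        raise ValueError("zero or tied absolute differences present")
    # Rank the absolute differences from smallest (rank 1) to largest (rank N).
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    t_plus = t_minus = 0
    for rank, i in enumerate(order, start=1):
        if diffs[i] > 0:
            t_plus += rank
        else:
            t_minus += rank
    return t_plus, t_minus


sample = [-1, 6, 13, 4, 2, 3, 5, 9]  # illustrative data
t_plus, t_minus = signed_rank_statistics(sample, 1.1)
n = len(sample)
# The two statistics always sum to N(N+1)/2, so either one determines the other.
assert t_plus + t_minus == n * (n + 1) // 2
```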
Under the null hypothesis, the $Z_i$ are independent and identically distributed Bernoulli random variables with $P(Z_i = 1) = P(Z_i = 0) = 1/2$ so that $E(Z_i) = 1/2$ and $\operatorname{var}(Z_i) = 1/4$. Using the fact that $T^+$ in (7.1) is a linear combination of these variables, its exact null mean and variance can be determined. We have

$$E(T^+ \mid H_0) = \sum_{i=1}^{N} \frac{r(|D_i|)}{2} = \frac{N(N+1)}{4}$$

Also, since $Z_i$ is independent of $r(|D_i|)$ under $H_0$ (see Problem 5.25), we can show that

$$\operatorname{var}(T^+ \mid H_0) = \sum_{i=1}^{N} \frac{[r(|D_i|)]^2}{4} = \frac{N(N+1)(2N+1)}{24} \tag{7.2}$$
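A symbolic representation of the test statistic $T^+$ that is more convenient for the purpose of deriving its mean and variance in general is

$$T^+ = \mathop{\sum\sum}_{1 \le i \le j \le N} T_{ij} \tag{7.3}$$

where

$$T_{ij} = \begin{cases} 1 & \text{if } D_i + D_j > 0 \\ 0 & \text{otherwise} \end{cases}$$

The $D_i$'s are identically distributed under $H_0$. Now define for all distinct $i, j, k$ the probabilities

$$
\begin{aligned}
p_1 &= P(D_i > 0) \\
p_2 &= P(D_i + D_j > 0) \\
p_3 &= P(D_i > 0 \text{ and } D_i + D_j > 0) \\
p_4 &= P(D_i + D_j > 0 \text{ and } D_i + D_k > 0)
\end{aligned}
\tag{7.4}
$$

The moments of the indicator variables for all distinct $i, j, k, h$ are then

$$
\begin{aligned}
E(T_{ii}) &= p_1 & E(T_{ij}) &= p_2 \\
\operatorname{var}(T_{ii}) &= p_1 - p_1^2 & \operatorname{var}(T_{ij}) &= p_2 - p_2^2 \\
\operatorname{cov}(T_{ii}, T_{ik}) &= p_3 - p_1 p_2 & \operatorname{cov}(T_{ij}, T_{ik}) &= p_4 - p_2^2 \\
\operatorname{cov}(T_{ij}, T_{hk}) &= 0
\end{aligned}
$$

The mean and variance of the linear combination in (7.3) in terms of these moments are

$$E(T^+) = N E(T_{ii}) + \frac{N(N-1)}{2} E(T_{ij}) = N p_1 + \frac{N(N-1) p_2}{2} \tag{7.5}$$

$$
\begin{aligned}
\operatorname{var}(T^+)
&= N \operatorname{var}(T_{ii}) + \binom{N}{2} \operatorname{var}(T_{ij}) + 2N(N-1) \operatorname{cov}(T_{ii}, T_{ik}) \\
&\qquad + 2N \binom{N-1}{2} \operatorname{cov}(T_{ij}, T_{ik}) + 4 \binom{N}{4} \operatorname{cov}(T_{ij}, T_{hk}) \\
&= N p_1 (1 - p_1) + \frac{N(N-1) p_2 (1 - p_2)}{2} + 2N(N-1)(p_3 - p_1 p_2) + N(N-1)(N-2)(p_4 - p_2^2) \\
&= N p_1 (1 - p_1) + N(N-1)(N-2)(p_4 - p_2^2) + \frac{N(N-1)}{2} \left[ p_2 (1 - p_2) + 4 (p_3 - p_1 p_2) \right]
\end{aligned}
\tag{7.6}
$$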
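The general moment formulas (7.5) and (7.6) can be checked against the null moments in (7.2). The sketch below is only an illustration with a hypothetical function name; it uses exact rational arithmetic and the null values $p_1 = p_2 = 1/2$, $p_3 = 3/8$, $p_4 = 1/3$ that the text derives next.

```python
from fractions import Fraction


def t_plus_moments(n, p1, p2, p3, p4):
    """Mean and variance of T+ from (7.5) and the last line of (7.6)."""
    half_pairs = Fraction(n * (n - 1), 2)
    mean = n * p1 + half_pairs * p2
    variance = (
        n * p1 * (1 - p1)
        + n * (n - 1) * (n - 2) * (p4 - p2 ** 2)
        + half_pairs * (p2 * (1 - p2) + 4 * (p3 - p1 * p2))
    )
    return mean, variance


# Null-hypothesis probabilities for a symmetric continuous population.
null_p = (Fraction(1, 2), Fraction(1, 2), Fraction(3, 8), Fraction(1, 3))

for n in range(1, 30):
    mean, variance = t_plus_moments(n, *null_p)
    assert mean == Fraction(n * (n + 1), 4)                   # null mean in (7.2)
    assert variance == Fraction(n * (n + 1) * (2 * n + 1), 24)  # null variance in (7.2)
```

Under an alternative, the same function gives the moments needed for a normal approximation to the power once $p_1, \ldots, p_4$ have been evaluated for the assumed distribution.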
The relevant probabilities from (7.4) are now evaluated under the assumption that the population is symmetric and the null hypothesis is true.

$$p_1 = P(D_i > 0) = 1/2$$

$$
\begin{aligned}
p_2 = P(D_i + D_j > 0)
&= \int_{-\infty}^{\infty} \int_{-v}^{\infty} f_D(u) f_D(v)\, du\, dv \\
&= \int_{-\infty}^{\infty} [1 - F_D(-v)]\, f_D(v)\, dv \\
&= \int_{-\infty}^{\infty} F_D(v) f_D(v)\, dv = \int_0^1 x\, dx = 1/2
\end{aligned}
$$

$$
\begin{aligned}
p_3 = P(D_i > 0 \text{ and } D_i + D_j > 0)
&= \int_0^{\infty} \int_{-v}^{\infty} f_D(u) f_D(v)\, du\, dv = \int_0^{\infty} [1 - F_D(-v)]\, f_D(v)\, dv \\
&= \int_0^{\infty} F_D(v) f_D(v)\, dv = \int_{1/2}^{1} x\, dx = 3/8
\end{aligned}
$$

$$
\begin{aligned}
p_4 &= P(D_i + D_j > 0 \text{ and } D_i + D_k > 0) \\
&= P(0 < D_i + D_j < D_i + D_k) + P(0 < D_i + D_k < D_i + D_j) \\
&= 2 P(-D_i < D_j < D_k) \\
&= 2 \int_{-\infty}^{\infty} \int_{-w}^{\infty} \int_{v}^{\infty} f_D(u) f_D(v) f_D(w)\, du\, dv\, dw \\
&= 2 \int_{-\infty}^{\infty} \int_{-w}^{\infty} [1 - F_D(v)]\, f_D(v) f_D(w)\, dv\, dw \\
&= 2 \int_{-\infty}^{\infty} \int_{-w}^{\infty} f_D(v) f_D(w)\, dv\, dw - 2 \int_{-\infty}^{\infty} \int_{-w}^{\infty} F_D(v) f_D(v) f_D(w)\, dv\, dw \\
&= 2 \int_{-\infty}^{\infty} [1 - F_D(-w)]\, f_D(w)\, dw - \int_{-\infty}^{\infty} \{1 - [F_D(-w)]^2\}\, f_D(w)\, dw \\
&= 2 \int_{-\infty}^{\infty} F_D(w)\, dF_D(w) - 1 + \int_{-\infty}^{\infty} [1 - F_D(w)]^2\, dF_D(w) \\
&= 2(1/2) - 1 + (1/3) = 1/3
\end{aligned}
$$

The reader may verify that substitution of these results back in (7.5) and (7.6) gives the mean and variance already found in (7.2).
We use the method described in Chapter 1 to investigate the consistency of $T^+$. We can write

$$E\left[\frac{2T^+}{N(N+1)}\right] = \frac{2 p_1}{N+1} + \frac{(N-1) p_2}{N+1}$$

which equals $1/2$ under $H_0$, and $\operatorname{var}[2T^+ / N(N+1)]$ clearly tends to zero as $N \to \infty$. Therefore, the test with rejection region

$$T^+ \in R \qquad \text{for} \qquad \frac{2T^+}{N(N+1)} - \frac{1}{2} \ge k$$

is consistent against alternatives of the form $p_2 = P(D_i + D_j > 0) > 0.5$. This result is reasonable since if the true population median exceeds $M_0$, the sample data would reflect this by having most of the larger ranks correspond to positive differences. A similar two-sided rejection region of $T^+$ centered on $N(N+1)/4$ is consistent against alternatives with $p_2 \ne 0.5$.
To determine the rejection regions precisely for this consistent test, the probability distribution of $T^+$ must be determined under the null hypothesis

$$H_0\colon \theta = P(X > M_0) = 0.5$$
The extreme values of $T^+$ are zero and $N(N+1)/2$, occurring when all differences are of the same sign, negative or positive, respectively. The mean and variance were found in (7.2). Since $T^+$ is completely determined by the indicator variables $Z_i$ in (7.1), the sample space can be considered to be the set of all possible $N$-tuples $\{z_1, z_2, \ldots, z_N\}$ with components either one or zero, of which there are $2^N$. Each of these distinguishable arrangements is equally likely under $H_0$. Therefore, the null probability distribution of $T^+$ is given by

$$P(T^+ = t) = u(t) / 2^N \tag{7.7}$$

where $u(t)$ is the number of ways to assign plus and minus signs to the first $N$ integers such that the sum of the positive integers equals $t$. Every assignment has a conjugate assignment with plus and minus signs interchanged, and $T^+$ for this conjugate is

$$\sum_{i=1}^{N} i (1 - Z_i) = \frac{N(N+1)}{2} - \sum_{i=1}^{N} i Z_i$$
Since every assignment occurs with equal probability, this implies that the null distribution of $T^+$ is symmetric about its mean $N(N+1)/4$. Because of the symmetry property, only one-half of the null distribution need be determined. A systematic method of generating the complete distribution of $T^+$ for $N = 4$ is shown in Table 7.1.

Table 7.1 Enumeration for the distribution of T+

Value of T+   Ranks associated with positive differences   Number of sample points u(t)
10            1,2,3,4                                      1
9             2,3,4                                        1
8             1,3,4                                        1
7             1,2,4; 3,4                                   2
6             1,2,3; 2,4                                   2
5             1,4; 2,3                                     2

$$f_{T^+}(t) = \begin{cases} 1/16 & t = 0, 1, 2, 8, 9, 10 \\ 2/16 & t = 3, 4, 5, 6, 7 \\ 0 & \text{otherwise} \end{cases}$$

Tables can be constructed in this way for all $N$. To use the signed-rank statistics in hypothesis testing, the entire null distribution is not necessary. In fact, one set of critical values is sufficient for even a two-sided test, because of the relationship
$T^+ + T^- = N(N+1)/2$ and the symmetry of $T^+$ about $N(N+1)/4$. Large values of $T^+$ correspond to small values of $T^-$, and $T^+$ and $T^-$ are identically distributed since

$$
\begin{aligned}
P(T^+ \ge c)
&= P\left[ T^+ - \frac{N(N+1)}{4} \ge c - \frac{N(N+1)}{4} \right] \\
&= P\left[ \frac{N(N+1)}{4} - T^+ \ge c - \frac{N(N+1)}{4} \right] \\
&= P\left[ \frac{N(N+1)}{2} - T^+ \ge c \right] \\
&= P(T^- \ge c)
\end{aligned}
$$

Since it is more convenient to work with smaller sums, tables of the left-tailed critical values are generally set up for the random variable $T$, which may denote either $T^+$ or $T^-$. If $t_\alpha$ is the number such that $P(T \le t_\alpha) = \alpha$, the appropriate rejection regions for size $\alpha$ tests of $H_0\colon M = M_0$ are as follows:

$T^- \le t_\alpha$ for $H_1\colon M > M_0$
$T^+ \le t_\alpha$ for $H_1\colon M < M_0$
$T^+ \le t_{\alpha/2}$ or $T^- \le t_{\alpha/2}$ for $H_1\colon M \ne M_0$
Suppose that $N = 8$ and critical values are to be found for one- or two-sided tests at nominal $\alpha = 0.05$. Since $2^8 = 256$ and $256(0.05) = 12.80$, we need at least 13 cases of assignments of signs. We enumerate the small values of $T^+$ in Table 7.2. Since $P(T^+ \le 6) = 14/256 > 0.05$ and $P(T^+ \le 5) = 10/256 = 0.039$, $t_{0.05} = 5$; the exact probability of a type I error is 0.039. Similarly, we find $t_{0.025} = 3$ with exact $P(T^+ \le 3) = 0.0195$.

Table 7.2 Partial distribution of T+ for N = 8

Value of T+   Ranks associated with positive differences   Number of sample points
0             none                                         1
1             1                                            1
2             2                                            1
3             3; 1,2                                       2
4             4; 1,3                                       2
5             5; 1,4; 2,3                                  3
6             6; 1,5; 2,4; 1,2,3                           4
When the distribution is needed for several sample sizes, a simple recursive relation can be used to generate the probabilities. Let $T_N^+$ denote the sum of the ranks associated with positive differences $D_i$ for a sample of $N$ observations. Consider a set of $N - 1$ ordered $|D_i|$, with ranks $1, 2, \ldots, N - 1$ assigned, for which the null distribution of $T_{N-1}^+$ is known. To obtain the distribution of $T_N^+$ from this, an extra observation $D_N$ is added, and we can assume without loss of generality that $|D_N| > |D_i|$ for all $i \le N - 1$. The rank of $|D_N|$ is then $N$. If $D_N > 0$, the value of $T_N^+$ will exceed that of $T_{N-1}^+$ by the amount $N$ for every arrangement of the $N - 1$ observations, but if $D_N < 0$, $T_N^+$ will be equal to $T_{N-1}^+$. Using the notation in (7.7), this can be stated as

$$
\begin{aligned}
P(T_N^+ = k) = \frac{u_N(k)}{2^N}
&= \frac{u_{N-1}(k - N)\, P(D_N > 0) + u_{N-1}(k)\, P(D_N < 0)}{2^{N-1}} \\
&= \frac{u_{N-1}(k - N) + u_{N-1}(k)}{2^N}
\end{aligned}
\tag{7.8}
$$
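The recursion (7.8) translates directly into a short program. The following sketch is illustrative, with hypothetical function names; it builds the counts $u_N(t)$ one observation at a time and reads off a left-tail critical value $t_\alpha$ as the largest $t$ with $P(T^+ \le t) \le \alpha$.

```python
def null_counts(n):
    """Return [u_n(0), u_n(1), ..., u_n(n(n+1)/2)] using recursion (7.8)."""
    counts = [1]  # n = 0: the empty assignment gives T+ = 0
    for m in range(1, n + 1):
        new = [0] * (len(counts) + m)
        for t, c in enumerate(counts):
            new[t] += c      # D_m < 0: T+ is unchanged
            new[t + m] += c  # D_m > 0: T+ increases by the rank m
        counts = new
    return counts


def left_tail_critical_value(n, alpha):
    """Largest t with P(T+ <= t) <= alpha, and that exact probability.

    Returns (None, 0.0) if even P(T+ = 0) exceeds alpha.
    """
    total = 2 ** n
    cumulative = 0
    best = (None, 0.0)
    for t, c in enumerate(null_counts(n)):
        cumulative += c
        if cumulative / total > alpha:
            break
        best = (t, cumulative / total)
    return best


print(null_counts(4))                      # full distribution counts for N = 4
print(left_tail_critical_value(8, 0.05))   # compare with Table 7.2
print(left_tail_critical_value(8, 0.025))
```

For $N = 4$ the counts reproduce the distribution shown in Table 7.1, and for $N = 8$ the two calls give $t_{0.05} = 5$ and $t_{0.025} = 3$ with exact probabilities $10/256$ and $5/256$, in agreement with the enumeration in the text.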
If $N$ is moderate and systematic enumeration is desired, classification according to the number of positive differences $D_i$ is often helpful. Define the random variable $U$ as the number of positive differences; $U$ follows the binomial distribution with parameters $N$ and 0.5, so that

$$
\begin{aligned}
P(T^+ = t)
&= \sum_{i=0}^{N} P(U = i \cap T^+ = t) \\
&= \sum_{i=0}^{N} P(U = i)\, P(T^+ = t \mid U = i) \\
&= \sum_{i=0}^{N} \binom{N}{i} (0.5)^N\, P(T^+ = t \mid U = i)
\end{aligned}
$$

A table of critical values and exact significance levels of the Wilcoxon signed-rank test is given in Dunstan, Nix, and Reynolds (1979) for $N \le 50$, and the entire null distribution is given in Wilcoxon, Katti, and Wilcox (1972) for $N \le 50$. Table H of the Appendix of this book gives left-tail and right-tail probabilities of $T^+$ (or $T^-$) for $N \le 15$.
From a generalization of the central-limit theorem, the asymptotic distribution of $T^+$ is the normal. Therefore, in the null case, using the moments given in (7.2), the distribution of

$$Z = \frac{4T^+ - N(N+1)}{\sqrt{2N(N+1)(2N+1)/3}} \tag{7.9}$$

approaches the standard normal as $N \to \infty$. The test for, say, $H_1\colon M > M_0$ can be performed for large $N$ by computing (7.9) and rejecting $H_0$ for $Z \ge z_\alpha$. The approximation is generally adequate for $N$ at least 15. A continuity correction of 0.5 generally improves the approximation.
THE PROBLEM OF ZERO AND TIED DIFFERENCES
Since we assumed originally that the random sample was drawn from a continuous population, the problem of tied observations and zero differences could be ignored theoretically. In practice, generally any zero differences (observations equal to $M_0$) are ignored and $N$ is reduced accordingly, although the other procedures described for the ordinary sign test in Section 5.4 are equally applicable here. In the case where two or more absolute values of differences are equal, that is, $|d_i| = |d_j|$ for at least one $i \ne j$, the observations are tied. The ties can be dealt with by any of the procedures described in Section 5.6. The midrank method is usually used, and the sign associated with the midrank of $|d_i|$ is determined by the original sign of $d_i$ as before. The probability distribution of $T$ is clearly not the same in the presence of tied ranks, but the effect is generally slight and no correction need be made unless the ties are quite extensive. A thorough comparison of the various methods of treating zeros and ties with this test is given in Pratt and Gibbons (1981).
With large sample sizes when the test is based on the standard normal statistic in (7.9), the variance can be corrected to account for the ties as long as the midrank method is used to resolve the ties. Suppose that $t$ observations are tied for a given rank and that if they were not tied they would be given the ranks $s + 1, s + 2, \ldots, s + t$. The midrank is then $s + (t + 1)/2$ and the sum of squares of these ranks is

$$t \left[ s + \frac{t + 1}{2} \right]^2 = t \left[ s^2 + s(t + 1) + \frac{(t + 1)^2}{4} \right]$$

If these ranks had not been tied, their sum of squares would have been

$$\sum_{i=1}^{t} (s + i)^2 = t s^2 + s t (t + 1) + \frac{t(t + 1)(2t + 1)}{6}$$

The presence of these $t$ ties then decreases the sum of squares by

$$\frac{t(t + 1)(2t + 1)}{6} - \frac{t(t + 1)^2}{4} = \frac{t(t + 1)(t - 1)}{12} \tag{7.10}$$

Therefore the reduced variance from (7.2) is

$$\operatorname{var}(T^+ \mid H_0) = \frac{N(N+1)(2N+1)}{24} - \frac{\sum t(t^2 - 1)}{48} \tag{7.11}$$

where the sum is extended over all sets of $t$ ties.
POWER FUNCTION
The distribution of $T^+$ is approximately normal for large sample sizes regardless of whether the null hypothesis is true. Therefore a large sample approximation to the power can be calculated using the mean and variance given in (7.5) and (7.6). The distribution of $X - M_0$ under the alternative would need to be specified in order to calculate the probabilities in (7.4) to substitute in (7.5) and (7.6). The asymptotic relative efficiency of the Wilcoxon signed-rank test relative to the t test is at least 0.864 for any distribution continuous and symmetric about zero, is 0.955 for the normal distribution, and is 1.5 for the double exponential distribution.
It should be noted that the probability distribution of $T^+$ is not symmetric when the null hypothesis is not true. Further, $T^+$ and $T^-$ are not identically distributed when the null hypothesis is not true. We can still find the probability distribution of $T^-$ from that of $T^+$, however, using the relationship

$$P(T^- = k) = P\left[ \frac{N(N+1)}{2} - T^+ = k \right] \tag{7.12}$$

SIMULATED POWER
Calculating the power of the signed-rank test, even using the normal approximation, requires a considerable amount of work. It is much easier to simulate the power of the test, as we did for the sign test in Section 5.4. Again we use a MINITAB macro program for the calculations and, in order to compare the results with those obtained for the sign test, we use $N = 13$, $\alpha = 0.05$, $M_0 = 0.5$, and $M_1 = 0.6256$. Simulating the power of the signed-rank test consists of the following steps. First, we determine the rejection region of the signed-rank test from Table H of the Appendix as $T^+ \ge 70$ with exact $\alpha = 0.047$. We generate 1000 random samples each of size $N = 13$ from a normal distribution with mean 0.6256 and variance 1 and calculate the signed-rank statistic $T^+$ for each. For each of these statistics we check whether it falls in the rejection region $T^+ \ge 70$. Finally, we count the number of times, out of 1000, that the signed-rank test rejects the null hypothesis and divide this number by 1000. This gives a simulated (estimated) value of the power of the sign test with $N = 13$, $\alpha = 0.0461$, $M_0 = 0.50$, $M_1 = 0.6256$. The program code is shown below. Note that the program also calculates the simulated power of the sign test and plots the two simulated power curves on the same graph. This graph is shown in Figure 7.1.

Fig. 7.1 Simulated power of the sign and the signed-rank test for the normal distribution.
The output from the MACRO is shown below; pow1 and pow2 are, respectively, the computed powers of the sign and the signedrank test, based on 1000 simulations.
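The simulation steps described above can also be sketched in Python. This is not the MINITAB macro referred to in the text; it is a minimal stand-alone sketch in which the function names, the seed, and the use of the standard-library generator are assumptions. The critical value 70 for $N = 13$ is the one quoted above from Table H.

```python
import random


def signed_rank_t_plus(sample, m0):
    """Sum of the ranks of |X_i - m0| that belong to positive differences."""
    diffs = [x - m0 for x in sample]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    return sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)


def simulated_power(mean, m0=0.5, n=13, critical=70, reps=1000, seed=12345):
    """Fraction of simulated normal samples (variance 1) with T+ >= critical."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mean, 1.0) for _ in range(n)]
        if signed_rank_t_plus(sample, m0) >= critical:
            rejections += 1
    return rejections / reps


power_at_m1 = simulated_power(0.6256)         # alternative used in the text
size_check = simulated_power(0.5, reps=4000)  # under H0, should be near 0.047
print(power_at_m1, size_check)
```

Because the data are simulated from a continuous distribution, zero and tied differences occur with probability zero and no tie handling is included. Running the function over a grid of means gives a simulated power curve of the kind plotted in Figure 7.1; the estimates vary with the seed and the number of replications.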
SAMPLE SIZE DETERMINATION
In order to make an inference regarding the population median using the signed-rank test, we need to have a random sample of observations. If we are allowed to choose the sample size, we might want to determine the value of $N$ such that the test has size $\alpha$ and power $1 - \beta$, given the null and the alternative hypotheses and other necessary assumptions. Recall that for the sign test against the one-sided upper-tailed alternative, we solved for $N$ such that

$$\text{Size} = \sum_{i=k_\alpha}^{N} \binom{N}{i} (0.5)^N \le \alpha \qquad \text{and} \qquad \text{power} = \sum_{i=k_\alpha}^{N} \binom{N}{i} \theta^i (1 - \theta)^{N-i} \ge 1 - \beta$$

where $\alpha$, $1 - \beta$, and $\theta = P(X > M_0 \mid H_1)$ are all specified. We noted there that the solution is much easier to obtain using the normal approximation; the same is true for the Wilcoxon signed-rank test, as we now illustrate. Note that the theory is presented here in terms of the signed-rank statistic $T^+$ but the same approach will hold for any test statistic whose distribution can be approximated by a normal distribution under both the null and the alternative hypotheses.
Under the normal approximation, the power of a size $\alpha$ signed-rank test against the alternative $H_1\colon M > M_0$ is $P(T^+ \ge \mu_0 + z_\alpha \sigma_0 \mid H_1)$, where $\mu_0$ and $\sigma_0$ are, respectively, the null mean and the null standard deviation of $T^+$. It can be easily shown (see Noether, 1987) that this power equals a specified $1 - \beta$ if

$$\left( \frac{\mu - \mu_0}{\sigma_0} \right)^2 = (z_\alpha + r z_\beta)^2 \tag{7.13}$$

where $\mu$ and $\sigma$ are, respectively, the mean and the standard deviation of $T^+$ under the alternative hypothesis. We denote the relation between standard deviations by $r = \sigma / \sigma_0$. Since $\sigma$ is unknown and is difficult to evaluate [see (7.6)], $r$ is unknown. One possibility is to take $r$ equal to 1 and this is what is done; such an assumption is reasonable for alternative hypotheses that are not too different from the null hypothesis. If we substitute the expressions for $\mu_0$, $\sigma_0$, and $\mu$ [see (7.5)] into (7.13), we need to solve for $N$ in

$$\frac{[N(p_1 - 0.5) + N(N-1)(p_2 - 0.5)/2]^2}{N(N+1)(2N+1)/24} = (z_\alpha + z_\beta)^2 \tag{7.14}$$
Note that $p_1 = P(X_i > M_0)$ and $p_2 = P(X_i + X_j > 2M_0)$ under the alternative $H_1\colon M > M_0$. The sample size calculations from (7.14) are shown in Table 7.3 for $\alpha = 0.05$, $1 - \beta = 0.95$, assuming the underlying distribution is standard normal. These calculations are the solution for $N$ in (7.14) done in EXCEL using the solver application. Note that the answer for the sample size $N$, shown in the fifth column, needs to be rounded up to the next larger integer. Thus, for example, assuming normality and $M_0 = 0$, $M_1 = 0.5$, $\alpha = 0.05$, we need to have approximately 33 observations in our sample for a power of 0.95 and a one-sided alternative.
A similar derivation can be used to find a sample size formula when the alternative is two-sided. The details are left to the reader as an exercise.

Table 7.3 Calculations for sample size determination in EXCEL

M0   M1    p1         p2         N          p1 - 0.5   0.5N(N-1)(p2 - 0.5)   N(N+1)(2N+1)/24   Error
0    0.1   0.539828   0.57926    575.7201   0.039828   13112.63915           15943497          9.83E-11
0    0.2   0.57926    0.655422   150.7868   0.07926    1755.166447           288547.2          9.6E-09
0    0.3   0.617911   0.725747   72.17494   0.117911   579.8362509           31985.43          2.1E-07
0    0.4   0.655422   0.788145   44.75747   0.155422   282.1618931           7723.9            2.7E-08
0    0.5   0.691462   0.841345   32.17465   0.191462   171.190114            2906.364          1.7E-07
0    0.6   0.725747   0.88493    25.45262   0.225747   119.7870853           1456.134          4.6E-08
0    0.7   0.758036   0.919243   21.51276   0.258036   92.50310853           888.4195          5E-08
0    0.8   0.788145   0.945201   19.06405   0.288145   76.65779392           623.6078          2.1E-08
0    0.9   0.81594    0.96407    17.48421   0.31594    66.87552029           484.3471          6.1E-09
0    1.0   0.841345   0.97725    16.44019   0.341345   60.57245533           404.7574          2.6E-09

It may be noted that the sample size formula in (7.14) is not distribution-free since it depends on the underlying distribution
through the parameters $p_1$ and $p_2$. Noether (1987) proposed approximating the left-hand side of (7.14) as $3N(p_2 - 0.5)^2$ and solving for $N$, which yields

$$N = \frac{(z_\alpha + z_\beta)^2}{3 (p_2 - 0.5)^2} \tag{7.15}$$
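The two sample size calculations, the numerical solution of (7.14) and Noether's approximation (7.15), can be sketched as follows. This is an illustrative sketch rather than the EXCEL solver computation used for Table 7.3: the function names and the bisection search are assumptions, and $p_1$ and $p_2$ are taken as inputs (here the tabulated values for $M_1 = 0.5$).

```python
from statistics import NormalDist


def noether_sample_size(p2, alpha=0.05, power=0.95):
    """Approximate N from (7.15) for a one-sided test of size alpha."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha) + z(power)) ** 2 / (3 * (p2 - 0.5) ** 2)


def solve_7_14(p1, p2, alpha=0.05, power=0.95):
    """Solve (7.14) for N by bisection; assumes p1 > 0.5 and p2 > 0.5."""
    z = NormalDist().inv_cdf
    target = (z(1 - alpha) + z(power)) ** 2

    def lhs(n):
        numerator = (n * (p1 - 0.5) + n * (n - 1) * (p2 - 0.5) / 2) ** 2
        return numerator / (n * (n + 1) * (2 * n + 1) / 24)

    lo, hi = 1.0, 1e7  # bracket assumed wide enough for practical cases
    for _ in range(200):
        mid = (lo + hi) / 2
        if lhs(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


n_exact = solve_7_14(0.691462, 0.841345)  # p1, p2 from the M1 = 0.5 row
n_noether = noether_sample_size(0.841345)
print(n_exact, n_noether)
```

Either value would be rounded up to the next integer in practice. Small differences from hand calculations can arise from rounding the normal quantiles, for example using 1.645 in place of the more precise value returned by `NormalDist().inv_cdf(0.95)`.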
This formula still depends on $p_2$; Noether (1987) suggested a choice for this parameter in terms of an "odds-ratio." The reader is referred to his paper for details. For a two-sided test, we can use (7.15) with $\alpha$ replaced by $\alpha/2$. We illustrate the use of (7.15) for this example where $\alpha = 0.05$, $1 - \beta = 0.95$. If $M_1 = 0.1$ and $p_2 = 0.579$, we find $N = 578.12$ from (7.15); if $M_1 = 1.0$ and $p_2 = 0.977$, we find $N = 15.86$ from (7.15). The corresponding values shown in Table 7.3 are $N = 575.72$ and $N = 16.44$, respectively.
CONFIDENCE-INTERVAL PROCEDURES
As with the ordinary one-sample sign test, the Wilcoxon signed-rank procedure lends itself to confidence-interval estimation of the unknown population median $M$. In fact, two methods of interval estimation are available here. Both will give the confidence limits as those values of $M$ which do not lead to rejection of the null hypothesis, but one amounts to a trial-and-error procedure while the other is systematic and provides a unique interval. For any sample size $N$, we can find that number $t_{\alpha/2}$ such that if the true population median is $M$ and $T$ is calculated for the derived sample values $X_i - M$, then

$$P(T^+ \le t_{\alpha/2}) = \frac{\alpha}{2} \qquad \text{and} \qquad P(T^- \le t_{\alpha/2}) = \frac{\alpha}{2}$$

The null hypothesis will not be rejected for all numbers $M$ which make $T^+ > t_{\alpha/2}$ and $T^- > t_{\alpha/2}$. The confidence interval technique is to find those two numbers, say $M_1$ and $M_2$ where $M_1 < M_2$, such that when $T$ is calculated for the two sets of differences $X_i - M_1$ and $X_i - M_2$, at the significance level $\alpha$, $T^+$ or $T^-$, whichever is smaller, is just short of significance, i.e., slightly larger than $t_{\alpha/2}$. Then the $100(1 - \alpha)$ percent confidence-interval estimate of $M$ is $M_1 < M < M_2$.
In the trial-and-error procedure, we simply choose some suitable values of $M$ and calculate the resulting values of $T^+$ or $T^-$, stopping whenever we get numbers slightly larger than $t_{\alpha/2}$. This generally does not lead to a unique interval, and the manipulations can be tedious even for moderate sample sizes. The technique is best illustrated by an
example. The following eight observations are drawn from a continuous, symmetric population:

$$-1, \; 6, \; 13, \; 4, \; 2, \; 3, \; 5, \; 9 \tag{7.16}$$

For $N = 8$ the two-sided rejection region of nominal size 0.05 was found earlier by Table 7.2 to be $t_{\alpha/2} = 3$ with exact significance level

$$\alpha = P(T^+ \le 3) + P(T^- \le 3) = 10/256 = 0.039$$

In Table 7.4 we try six different values for $M$ and calculate $T^+$ or $T^-$, whichever is smaller, for the differences $X_i - M$. The example illustrates a number of difficulties which arise. In the first trial choice of $M$, the number 4 was subtracted and the resulting differences contained three sets of tied pairs and one zero even though the original sample contained neither ties nor zeros. If the zero difference is ignored, $N$ must be reduced to 7 and then the $t_{\alpha/2} = 3$ is no longer accurate for $\alpha = 0.039$. The midrank method could be used to handle the ties, but this also disturbs the accuracy of $t_{\alpha/2}$. Since there seems to be no real solution to these problems, we try to avoid zeros and ties by judicious choices for our $M$ values for subtraction. Since these data are all integers, a choice for $M$ which is noninteger valued obviously reduces the likelihood of ties and makes zero values impossible. Since $T^-$ for the differences $X_i - 1.5$ yields $T^- = 3.5$ using the midrank method, we will choose $M_1 = 1.5$. The next three columns represent an attempt to find an $M$ which makes $T^+$ around 4. These calculations illustrate the fact that $M_1$ and $M_2$ are far from being unique. Clearly $M_2$ is in the vicinity of 9, but the differences $X_i - 9$ yield a zero. We conclude there is no need to go further.

Table 7.4 Trial-and-error determination of endpoints

X_i         X_i - 4   X_i - 1.1   X_i - 1.5   X_i - 9.1   X_i - 8.9   X_i - 8.95
-1          -5        -2.1        -2.5        -10.1       -9.9        -9.95
6            2         4.9         4.5        -3.1        -2.9        -2.95
13           9        11.9        11.5         3.9         4.1         4.05
4            0         2.9         2.5        -5.1        -4.9        -4.95
2           -2         0.9         0.5        -7.1        -6.9        -6.95
3           -1         1.9         1.5        -6.1        -5.9        -5.95
5            1         3.9         3.5        -4.1        -3.9        -3.95
9            5         7.9         7.5        -0.1         0.1         0.05
T+ or T-               3           3.5         3           5           5

An approximate 96.1 percent confidence
interval on M is given by 1:5 < M < 9. The interpretation is that hypothesized values of M within this range will lead to acceptance of the null hypothesis for an exact signiﬁcance level of 0.039. This procedure is undoubtedly tedious, but the limits obtained are reasonably accurate. The numbers should be tried systematically to narrow down the range of possibilities. Thoughtful study of the intermediate results usually reduces the additional number of trials required. A different method of construction which leads to a unique interval and is much easier to apply is described in Noether [(1967), pp. 57–58]. The procedure is to convert the interval T þ > ta=2 and T > ta=2 to an equivalent statement on M whose end points are functions of the observations Xi. For this purpose we must analyze the comparisons involved in determining the ranks of the differences rðjXi M0 jÞ and the signs of the differences Xi ¼ M0 since T þ and T are functions of these comparisons. Recall from (5.1) that the rank of any random variable in a set fV1 ; V2 ; . . . ; VN g can be written symbolically as rðVi Þ ¼
N X
SðVi Vk Þ ¼
X
SðVi Vk Þ þ 1
k6¼i
k¼1
where SðuÞ ¼
1 0
if u > 0 if u 4 0
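The symbolic rank function can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, the sample is the eight observations listed in the first column of Table 7.4, and the values are assumed distinct so that the indicator is never evaluated at zero.

```python
def indicator(u):
    # S(u) for u != 0. With k != i and distinct values, S(0) is never needed here.
    return 1 if u > 0 else 0


def rank(values, i):
    # r(V_i) = 1 + sum over k != i of S(V_i - V_k); assumes no ties.
    return 1 + sum(indicator(values[i] - values[k])
                   for k in range(len(values)) if k != i)


sample = [-1, 6, 13, 4, 2, 3, 5, 9]  # observations from Table 7.4
ranks = [rank(sample, i) for i in range(len(sample))]
print(ranks)
```

Each call to `rank` makes one comparison of `values[i]` with every other element, which is the counting argument taken up next.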
To compute a single rank, then, we make $N - 1$ comparisons of pairs of different numbers and one comparison of a number with itself. To compute the set of all ranks, we make $\binom{N}{2}$ comparisons of pairs and $N$ identity comparisons, a total of $\binom{N}{2} + N = N(N+1)/2$ comparisons. Substituting the rank function in (7.1), we obtain

$$T^+ = \sum_{i=1}^{N} Z_i\, r(|X_i - M_0|) = \sum_{i=1}^{N} Z_i + \sum_{i=1}^{N} \sum_{k \ne i} Z_i\, S(|X_i - M_0| - |X_k - M_0|) \qquad (7.17)$$

Therefore these comparisons affect $T^+$ as follows:
1. A comparison of $|X_i - M_0|$ with itself adds 1 to $T^+$ if $X_i - M_0 > 0$.
2. A comparison of $|X_i - M_0|$ with $|X_k - M_0|$ for any $i \ne k$ adds 1 to $T^+$ if $|X_i - M_0| > |X_k - M_0|$ and $X_i - M_0 > 0$, that is, $X_i - M_0 > |X_k - M_0|$. If $X_k - M_0 > 0$, this occurs when $X_i > X_k$, and if $X_k - M_0 < 0$, we have $X_i + X_k > 2M_0$ or $(X_i + X_k)/2 > M_0$. But when $X_i - M_0 > 0$ and $X_k - M_0 > 0$, we have $(X_i + X_k)/2 > M_0$ also.

Combining these two results, then, $(X_i + X_k)/2 > M_0$ is a necessary condition for adding 1 to $T^+$ for all $i, k$. Similarly, if $(X_i + X_k)/2 < M_0$, then this comparison adds 1 to $T^-$. The relative magnitudes of the $N(N+1)/2$ averages of pairs $(X_i + X_k)/2$ for all $i \le k$, called the Walsh averages, then determine the range of values for hypothesized numbers $M_0$ which will not lead to rejection of $H_0$. If these $N(N+1)/2$ averages are arranged as order statistics, the two numbers which are in the $(t_{\alpha/2} + 1)$st position from either end are the endpoints of the $100(1 - \alpha)$ percent confidence interval on $M$. Note that this procedure is exactly analogous to the ordinary sign-test confidence interval except that here the order statistics are for the averages of all pairs of observations instead of the original observations.

The data in (7.16) for $N = 8$ arranged in order of magnitude are $-1, 2, 3, 4, 5, 6, 9, 13$, and the 36 Walsh averages are given in Table 7.5. For exact $\alpha = 0.039$, we found before that $t_{\alpha/2} = 3$. Since the fourth numbers from either end are 1.5 and 9.0, the confidence interval is $1.5 < M < 9$ with exact confidence coefficient $\gamma = 1 - 0.039 = 0.961$. This result agrees exactly with that obtained by the previous method, but this will not always be the case since the trial-and-error procedure does not yield unique endpoints.

Table 7.5 Walsh averages for data in (7.16)

  -1.0    2.0    3.0    4.0    5.0    6.0    9.0   13.0
   0.5    2.5    3.5    4.5    5.5    7.5   11.0
   1.0    3.0    4.0    5.0    7.0    9.5
   1.5    3.5    4.5    6.5    9.0
   2.0    4.0    6.0    8.5
   2.5    5.5    8.0
   4.0    7.5
   6.0

The process of determining a confidence interval on $M$ by the above method is much facilitated by using the graphical method of construction, which can be described as follows. Each of the $N$ observations $x_i$ is denoted by a dot on a horizontal scale. The closed interval $[x_{(1)}, x_{(N)}]$ then includes all dots. Form an isosceles triangle $ABC$ by lines joining $x_{(1)}$ at $A$ and $x_{(N)}$ at $B$ each with a point $C$ anywhere on the vertical line passing through the midrange value $(x_{(1)} + x_{(N)})/2$. Through each point $x_i$ on the line segment $AB$ draw lines parallel to $AC$ and $BC$, marking each intersection with a dot. There will be $N(N+1)/2$ intersections, the abscissas of which are all the $(x_i + x_k)/2$ values where $1 \le i \le k \le N$. Vertical lines drawn through the $(t_{\alpha/2} + 1)$st intersection point from the left and right will allow us to read the respective confidence-interval end points on the horizontal scale. Figure 7.2 illustrates this method for the numerical data above.

PAIRED-SAMPLE PROCEDURES
The Wilcoxon signed-rank test was actually proposed for use with paired-sample data in making inferences concerning the value of the median of the population of differences. Given a random sample of $N$ pairs

$$(X_1, Y_1), (X_2, Y_2), \ldots, (X_N, Y_N)$$

their differences are

$$X_1 - Y_1,\; X_2 - Y_2,\; \ldots,\; X_N - Y_N$$
Fig. 7.2 Graphical determination of conﬁdence interval.
We assume these are independent observations from a population of differences which is continuous and symmetric with median $M_D$. In order to test the hypothesis

$$H_0\colon M_D = M_0$$

form the $N$ differences $D_i = X_i - Y_i - M_0$ and rank their absolute magnitudes from smallest to largest using integers $\{1, 2, \ldots, N\}$, keeping track of the original sign of each difference. Then the above procedures for hypothesis testing and confidence intervals are equally applicable here with the same notation, except that the parameter $M_D$ must be interpreted now as the median of the population of differences.

USE OF WILCOXON STATISTICS TO TEST FOR SYMMETRY
The Wilcoxon signed-rank statistics can also be considered tests for symmetry if the only assumption made is that the random sample is drawn from a continuous distribution. If the null hypothesis states that the population is symmetric with median $M_0$, the null distributions of $T^+$ and $T^-$ are exactly the same as before. If the null hypothesis is accepted, we can conclude that the population is symmetric and has median $M_0$. On the other hand, if the null hypothesis is rejected, we cannot tell which portion (or all) of the composite statement is not consistent with the sample outcome. With a two-sided alternative, for example, we must conclude that either the population is symmetric with median not equal to $M_0$, or the population is asymmetric with median equal to $M_0$, or the population is asymmetric with median not equal to $M_0$. Such a broad conclusion is generally not satisfactory, and this is why in most cases the assumptions that justify a test procedure are separated from the statement of the null hypothesis.

APPLICATIONS
The appropriate rejection regions and P values for $T^+$, called the sum of the positive ranks, are given below. Note that $t$ is the observed value of $T^+$.

Alternative     Exact rejection region                                  Exact P value
$M > M_0$       $T^+ \ge t_\alpha$                                      $P(T^+ \ge t \mid H_0)$
$M < M_0$       $T^+ \le t'_\alpha$                                     $P(T^+ \le t \mid H_0)$
$M \ne M_0$     $T^+ \le t'_{\alpha/2}$ or $T^+ \ge t_{\alpha/2}$       2(smaller of the above)
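The exact tail probabilities in the table above can be generated by direct counting. Under the null hypothesis each of the $2^N$ assignments of signs to the ranks $1, 2, \ldots, N$ is equally likely, so $P(T^+ = t)$ is the number of subsets of $\{1, \ldots, N\}$ with sum $t$, divided by $2^N$. The sketch below is one possible way to do this; the function names are hypothetical, and it assumes no zeros or ties.

```python
def signed_rank_null_counts(n):
    # counts[t] = number of subsets of {1, ..., n} whose elements sum to t.
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for r in range(1, n + 1):
        # Go downward so that each rank r is used at most once per subset.
        for t in range(max_sum, r - 1, -1):
            counts[t] += counts[t - r]
    return counts


def left_tail(n, t):
    # P(T+ <= t | H0) for an integer t.
    counts = signed_rank_null_counts(n)
    return sum(counts[: t + 1]) / 2 ** n


def right_tail(n, t):
    # P(T+ >= t | H0) for an integer t.
    counts = signed_rank_null_counts(n)
    return sum(counts[t:]) / 2 ** n


print(left_tail(8, 3))    # 5/256, about 0.0195
print(right_tail(8, 33))  # same value, by symmetry of the null distribution
```

Doubling the smaller of the two tail probabilities gives the two-sided P value in the last row of the table.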
Table H gives the distribution of $T^+$ for $N \le 15$ as left-tail probabilities for $T^+ \le N(N+1)/4$ and right-tail for $T^+ \ge N(N+1)/4$. This table can be used to find exact critical values for a given $\alpha$ or to find exact P values. For $N > 15$, the appropriate rejection regions and the P values based on the normal approximation with a continuity correction are as follows:

Alternative $M > M_0$
  Approximate rejection region: $T^+ \ge \dfrac{N(N+1)}{4} + 0.5 + z_\alpha \sqrt{\dfrac{N(N+1)(2N+1)}{24}}$
  Approximate P value: $1 - \Phi\left[\dfrac{t - 0.5 - N(N+1)/4}{\sqrt{N(N+1)(2N+1)/24}}\right]$

Alternative $M < M_0$
  Approximate rejection region: $T^+ \le \dfrac{N(N+1)}{4} - 0.5 - z_\alpha \sqrt{\dfrac{N(N+1)(2N+1)}{24}}$
  Approximate P value: $\Phi\left[\dfrac{t + 0.5 - N(N+1)/4}{\sqrt{N(N+1)(2N+1)/24}}\right]$

Alternative $M \ne M_0$
  Approximate rejection region: both above with $z_{\alpha/2}$
  Approximate P value: 2(smaller of the above)
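The approximate P values can be evaluated with the standard library alone. The following is a minimal sketch, assuming no ties (so the untied variance applies); the function names and the numbers $N = 20$, $t = 150$ are hypothetical, and capping the two-sided value at 1 is a convenience added here.

```python
import math


def normal_cdf(z):
    # Standard normal cumulative distribution function via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def approx_p_values(t, n):
    # Normal approximation with continuity correction for an observed T+ = t.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    p_upper = 1.0 - normal_cdf((t - 0.5 - mean) / sd)  # alternative M > M0
    p_lower = normal_cdf((t + 0.5 - mean) / sd)        # alternative M < M0
    p_two_sided = min(1.0, 2 * min(p_upper, p_lower))  # alternative M != M0
    return p_upper, p_lower, p_two_sided


p_upper, p_lower, p_two_sided = approx_p_values(150, 20)  # hypothetical values
print(p_upper, p_lower, p_two_sided)
```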
If ties are present, the variance term in these rejection regions should be replaced by (7.11).

The corresponding confidence interval estimate of the median has endpoints which are $(t_{\alpha/2} + 1)$st from the smallest and largest of the Walsh averages, where $t_{\alpha/2}$ is the left-tail critical value in Table H for the given $N$. The choice of exact confidence levels is limited to $1 - 2P$ where $P$ is a tail probability in Table H. Therefore the critical value $t_{\alpha/2}$ is the left-tail table entry corresponding to the chosen $P$. Since the entries are all of the nonnegative integers, $(t_{\alpha/2} + 1)$ is the rank of $t_{\alpha/2}$ among the table entries for that $N$. Thus, in practice, the confidence interval endpoints are the $u$th smallest and $u$th largest of the $N(N+1)/2$ Walsh averages $W_{ik} = (X_i + X_k)/2$ for all $1 \le i \le k \le N$, or

$$W_{(u)} \le M \le W_{[N(N+1)/2 - u + 1]}$$

The appropriate value of $u$ for confidence $1 - 2P$ is the rank of that left-tail $P$ among the entries in Table H for the given $N$. For $N > 15$, we find $u$ from

$$u = \frac{N(N+1)}{4} + 0.5 - z_{\alpha/2} \sqrt{\frac{N(N+1)(2N+1)}{24}}$$

and round down to the next smaller integer if the result is not an integer. If zeros or ties occur in the averages, they should all be counted in determining the endpoints.

These Wilcoxon signed-rank test procedures are applicable to paired samples in exactly the same manner as long as $X$ is replaced by the differences $D = X - Y$ and $M$ is interpreted as the median $M_D$ of the distribution of $X - Y$. As in the case of the sign test, the confidence-interval estimate of the median or median difference can be based on all $N$ observations even if there are zeros and/or ties. Thus a hypothesis test concerning a value for the median or median difference when the data contain zeros and/or ties will be more powerful if the decision is based on the confidence-interval estimate rather than on a hypothesis test procedure.

Example 7.1 A large company was disturbed about the number of person-hours lost per month due to plant accidents and instituted an extensive industrial safety program. The data below show the number of person-hours lost in a month at each of eight different plants before and after the safety program was established. Has the safety program been effective in reducing time lost from accidents? Assume the distribution of differences is symmetric.

Plant    Before    After
  1       51.2      45.8
  2       46.5      41.3
  3       24.1      15.8
  4       10.2      11.1
  5       65.3      58.5
  6       92.1      70.3
  7       30.3      31.6
  8       49.2      35.4
Solution Because of the symmetry assumption, we can use the Wilcoxon signed-rank test instead of the sign test on these data. We take the differences $D$ = Before minus After and test $H_0\colon M_D = 0$ versus $H_1\colon M_D > 0$, since the program is effective if these differences are large positive numbers. Then we rank the absolute values and sum the positive ranks. The table below shows these calculations.

Plant      D      |D|    r(|D|)
  1       5.4     5.4      4
  2       5.2     5.2      3
  3       8.3     8.3      6
  4      -0.9     0.9      1
  5       6.8     6.8      5
  6      21.8    21.8      8
  7      -1.3     1.3      2
  8      13.8    13.8      7
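The ranking and summation in this solution can be checked numerically. The sketch below uses the Before and After values of Example 7.1; the variable names are hypothetical, and the exact one-sided P value is obtained by brute-force enumeration of all $2^8$ sign assignments, which is one simple way to reproduce a Table H entry when there are no zeros or ties.

```python
from itertools import product

before = [51.2, 46.5, 24.1, 10.2, 65.3, 92.1, 30.3, 49.2]
after = [45.8, 41.3, 15.8, 11.1, 58.5, 70.3, 31.6, 35.4]

# Differences D = Before minus After.
diffs = [b - a for b, a in zip(before, after)]

# Rank the absolute differences from smallest (rank 1) to largest (rank N).
order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
ranks = [0] * len(diffs)
for position, i in enumerate(order, start=1):
    ranks[i] = position

# T+ is the sum of the ranks attached to positive differences.
t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

# Exact right-tail probability P(T+ >= t_plus | H0) by enumeration.
n = len(diffs)
count = sum(
    1
    for signs in product([0, 1], repeat=n)
    if sum(r for r, s in zip(range(1, n + 1), signs) if s) >= t_plus
)
p_value = count / 2 ** n

print(ranks, t_plus, p_value)
```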
We have $T^+ = 33$ and Table H for $N = 8$ gives the right-tail probability as 0.020. The program has been effective at the 0.05 level. The following computer printouts illustrate the solution to Example 7.1 using the MINITAB, STATXACT and SAS packages.

The MINITAB solution uses the normal approximation with a continuity correction. The STATXACT solution gives the asymptotic results based on the normal approximation without a continuity correction. Only a portion of the output from SAS PROC UNIVARIATE is shown. This output provides a lot of information, including important descriptive statistics such as the sample mean, variance, interquartile range, etc., which are not shown. Note that the SAS signed-rank statistic is calculated as $T^+ - n(n+1)/4 = 33 - 18 = 15$ (labeled S) and the P value given is two-tailed. The required one-tailed P value can be found as $0.0391/2 = 0.01955$, which agrees with other calculations. It is interesting that for these data both the t test and the signed-rank test clearly lead to a rejection of the null hypothesis at the 0.05 level of significance but the sign test does not.

Example 7.2 Assume the data in Example 7.1 come from a symmetric distribution and find a 90% confidence-interval estimate of the median difference, computed as After minus Before.
Solution Table H for $N = 6$ shows that $P = 0.047$ for confidence $1 - 2(0.047) = 0.906$, and 0.047 has rank three in Table H so that $u = 3$. Thus the 90.6% confidence-interval endpoints for the median difference are the third smallest and third largest Walsh averages. The $6(7)/2 = 21$ Walsh averages of differences $(D_i + D_k)/2$ are shown in the table below.

  -2.0   -1.0    1.0    3.0    4.0    8.0
  -1.5    0.0    2.0    3.5    6.0
  -0.5    1.0    2.5    5.5
   0.5    1.5    4.5
   1.0    3.5
   3.0

So the third smallest and third largest Walsh averages are $-1.0$ and 5.5, respectively, and the 90.6% confidence interval for the median difference is $(-1.0, 5.5)$. Note that by listing the After minus Before data in an array across the top row of this table of Walsh averages, identification of the confidence-interval endpoints is greatly simplified.

The MINITAB and STATXACT solutions to this example are shown below. The MINITAB solution agrees exactly with our hand calculations. The STATXACT solution gives an asymptotic interval that agrees with our exact solution; the interval labeled exact uses the second smallest and the second largest Walsh averages, which provides the 93.8% confidence interval.
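The Walsh-average construction used in Example 7.2 can be reproduced with a short script. This is an illustrative sketch, not part of the original solution: the function names are hypothetical, the six differences $-2, -1, 1, 3, 4, 8$ are taken to be the values that generate the 21 tabulated Walsh averages, and the second call assumes the eight observations of (7.16) as they appear in Table 7.4, with $u = t_{\alpha/2} + 1 = 4$.

```python
def walsh_averages(data):
    # All N(N+1)/2 averages (x_i + x_k)/2 with i <= k, in increasing order.
    n = len(data)
    return sorted((data[i] + data[k]) / 2
                  for i in range(n) for k in range(i, n))


def walsh_interval(data, u):
    # Endpoints are the u-th smallest and u-th largest Walsh averages.
    w = walsh_averages(data)
    return w[u - 1], w[-u]


example_72 = [-2, -1, 1, 3, 4, 8]       # After minus Before differences
print(walsh_interval(example_72, 3))    # (-1.0, 5.5)

data_716 = [-1, 6, 13, 4, 2, 3, 5, 9]   # observations used in Table 7.4
print(walsh_interval(data_716, 4))      # (1.5, 9.0)
```

Repeated values among the averages are kept in the sorted list, so ties are all counted when the endpoints are located.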
5.8 SUMMARY
In this chapter we presented the procedures for hypothesis tests and confidence interval estimates for the pth quantile of any continuous distribution for any specified $p$, $0 < p < 1$, based on data from one sample or paired samples. These procedures are all based on using the pth sample quantile as a point estimate of the pth population quantile and use the binomial distribution; they have no parametric counterparts. The sample quantiles are all order statistics of the sample. Other estimates of the population quantiles have been introduced in the literature; most of these are based on linear functions of order statistics, say $\sum a_i X_{(i)}$. The one proposed by Harrell and Davis (1982) has been shown to be better than ours for a wide variety of distributions. Dielman, Lowry and Pfaffenberger (1994) present a Monte Carlo comparison of the performance of various sample quantile estimators for small sample sizes. The pth quantile when $p = 0.5$ is the median of the distribution and we have inference procedures based on the sign test in Section 5.4 and the Wilcoxon signed-rank test in Section 5.7. Both tests are generally useful in the same experimental situations regarding a single sample or paired samples. The assumptions required are minimal: independence of observations and a population which is continuous at $M$ for the ordinary sign test and continuous everywhere and symmetric for the Wilcoxon signed-rank test. Experimentally, both tests have the problem of zero differences, and the Wilcoxon test has the additional problem of ties. Both tests are applicable when quantitative measurements are impossible or not feasible, as when rating scales or preferences are used. For the Wilcoxon test, information concerning relative magnitudes as well as directions of differences is required. Only the sign test can be used for strictly dichotomous data, like yes-no observations. Both are very flexible and simple to use for hypothesis testing or constructing confidence intervals.
The null distribution of the sign test is easier to work with since binomial tables are readily available. The normal approximation is quite accurate for even moderate $N$ in both cases, and neither is particularly hampered by the presence of a moderate number of zeros or ties. For hypothesis testing, in the paired-sample case the hypothesis need not state an actual median difference but only a relation between medians if both populations are assumed symmetric. For example, we might test the hypothesis that the $X$ population values are on the average $p$ percent larger than $Y$ values. Assuming the medians are a reliable indication of size, we would write

$$H_0\colon M_X = (1 + 0.01p) M_Y$$

and take differences $D_i = X_i - (1 + 0.01p) Y_i$ and perform either test on these derived data as before. Both tests have a corresponding procedure for finding a confidence interval estimate of the median of the population in the one-sample case and the median difference in the paired-sample case. We have given expressions for sample size determination and power calculations. Only the Wilcoxon signed-rank statistics are appropriate for tests of symmetry since the ordinary sign-test statistic is not at all related to the symmetry or asymmetry of the population. We have $P[(X_i - M) > 0] = 0.5$ always, and the sole criterion of determining $K$ in the sign test is the number of positive signs, thus ignoring the magnitudes of the plus and minus differences. There are other extensions and modifications of the sign-test type of criteria [see, for example, Walsh (1949a,b)]. If the population is symmetric, both sign tests can be considered to be tests for location of the population mean and are therefore direct nonparametric counterparts to Student's t test. As a result, comparisons of their performance are of interest. As explained in Chapter 1, one way to compare performance of tests is by computing their asymptotic relative efficiency (ARE) under various distribution assumptions. The asymptotic relative efficiency of the ordinary sign test relative to the t test is $2/\pi = 0.637$, and the ARE of the Wilcoxon signed-rank test relative to the t test is $3/\pi = 0.955$, both calculated under the assumption of normal distributions. How these particular results were obtained will be discussed in Chapter 13. It is not surprising that both ARE values are less than one because the t test is the best test for normal distributions. It can be shown that the ARE of the Wilcoxon signed-rank test is always at least 0.864 for any continuous symmetric distribution, whereas the corresponding lower bound for the ordinary sign test is only 1/3.
The ARE of the sign test relative to the Wilcoxon signed-rank test is 2/3 for the normal distribution and 1/3 for the uniform distribution. However, the result is 4/3 for the double exponential distribution; the fact that this ARE is greater than one means that the sign test performs better than the signed-rank test for this particular symmetric but heavy-tailed distribution. Similarly, the Wilcoxon signed-rank test performs better than the t test for some nonnormal distributions; for example, the ARE is 1.50 for the double exponential distribution and 1.09 for the logistic distribution, which are both heavy-tailed distributions.
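The normal-theory ARE values quoted in this summary are tied together by simple arithmetic: dividing the ARE of the sign test relative to the t test by the ARE of the signed-rank test relative to the t test gives the ARE of the sign test relative to the signed-rank test. A small check, with hypothetical variable names:

```python
import math

are_sign_vs_t = 2 / math.pi        # sign test relative to t test, normal case
are_wilcoxon_vs_t = 3 / math.pi    # signed-rank test relative to t test, normal case

# Ratio of the two values: sign test relative to the signed-rank test.
are_sign_vs_wilcoxon = are_sign_vs_t / are_wilcoxon_vs_t

print(round(are_sign_vs_t, 3), round(are_wilcoxon_vs_t, 3), are_sign_vs_wilcoxon)
```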
PROBLEMS

5.1. Give a functional definition similar to (5.1) for the rank $r(X_i)$ of a random variable in any set of $N$ independent observations where ties are dealt with by the midrank method. Hint: In place of $S(u)$ in (5.2), consider the function

$$c(u) = \begin{cases} 0 & \text{if } u < 0 \\ 1/2 & \text{if } u = 0 \\ 1 & \text{if } u > 0 \end{cases}$$

5.2. Find the correlation coefficient between variate values and ranks in a random sample of size $N$ from
(a) The uniform distribution
(b) The standard normal distribution
(c) The exponential distribution

5.3. Verify the cumulative distribution function of differences given in (4.14) and the result $M = 2 + \sqrt{3}$. Find and graph the corresponding probability function of differences.

5.4. Answer parts (a) through (e) using (i) the sign-test procedure and (ii) the Wilcoxon signed-rank test procedure.
(a) Test at a significance level not exceeding 0.10 the null hypothesis $H_0\colon M = 2$ against the alternative $H_1\colon M > 2$, where $M$ is the median of the continuous symmetric population from which is drawn the random sample:
3, 6, 1, 9, 4, 10, 12
(b) Give the exact probability of a type I error in (a).
(c) On the basis of the following random sample of pairs:

X    126   131   153   125   119   102   116   163
Y    120   126   152   129   102   105   100   175

test at a significance level not exceeding 0.10 the null hypothesis $H_0\colon M = 2$ against the alternative $H_1\colon M \ne 2$, where $M$ is the median of the continuous and symmetric population of differences $D = X - Y$.
(d) Give the exact probability of a type I error in (c).
(e) Give the confidence interval corresponding to the test in (c).
5.5. Generate the sampling distributions of $T^+$ and $T^-$ under the null hypothesis for a random sample of six unequal and nonzero observations.

5.6. Show by calculations from tables that the normal distribution provides reasonably accurate approximations to the critical values of one-sided tests for $\alpha = 0.01$, 0.05, and 0.10 when:
$N = 12$ for the sign test
$N = 15$ for the signed-rank test

5.7. A random sample of 10 observations is drawn from a normal population with mean $\mu$ and variance 1. Instead of a normal-theory test, the ordinary sign test is used for $H_0\colon \mu = 0$, $H_1\colon \mu > 0$, with rejection region $K \in R$ for $K \ge 8$.
(a) Plot the power curve using the exact distribution of $K$.
(b) Plot the power curve using the normal approximation to the distribution of $K$.
(c) Discuss how the power functions might help in the choice of an appropriate sample size for an experiment.

5.8. Prove that the Wilcoxon signed-rank statistic $T^+ - T^-$ based on a set of nonzero observations $X_1, X_2, \ldots, X_N$ can be written symbolically in the form

$$\mathop{\sum\sum}_{1 \le i \le j \le N} \operatorname{sgn}(X_i + X_j)$$

where

$$\operatorname{sgn}(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x < 0 \end{cases}$$
5.9. Let $D_1, D_2, \ldots, D_N$ be a random sample of $N$ nonzero observations from some continuous population which is symmetric with median zero. Define

$$|D_i| = \begin{cases} X_i & \text{if } D_i > 0 \\ Y_i & \text{if } D_i < 0 \end{cases}$$

Assume there are $m$ $X$ values and $n$ $Y$ values, where $m + n = N$ and the $X$ and $Y$ values are independent. Show that the signed-rank test statistic $T^+$ calculated for these $D_i$ is equal to the sum of the ranks of the $X$ observations in the combined ordered sample of $m$ $X$'s and $n$ $Y$'s and also that $T^+ - T^-$ is the sum of the $X$ ranks minus the sum of the $Y$ ranks. This sum of the ranks of the $X$'s is the test criterion for the Wilcoxon statistic in the two-sample problem to be discussed in Chapter 8. Show how $T^+$ might be used to test the hypothesis that the $X$ and $Y$ populations are identical.

5.10. Hoskin et al. (1986) investigated the change in fatal motor-vehicle accidents after the legal minimum drinking age was raised in 10 states. Their data were the ratios of the number of single-vehicle nighttime fatalities to the number of licensed drivers in the affected age group before and after the laws were changed to raise the drinking age, shown in Table 1. The researchers hypothesized that raising the minimum drinking age resulted in a reduced median fatality ratio. Investigate this hypothesis.

Table 1 Data for Problem 5.10

State            Affected ages   Ratio before   Ratio after
Florida          18              0.262          0.202
Georgia          18              0.295          0.227
Illinois         19–20           0.216          0.191
Iowa             18              0.287          0.209
Maine            18–19           0.277          0.299
Michigan         18–20           0.223          0.151
Montana          18              0.512          0.471
Nebraska         19              0.237          0.151
New Hampshire    18–19           0.348          0.336
Tennessee        18              0.342          0.307
5.11. The conclusion in Problem 5.10 was that the median difference (Before − After) was positive for the affected age group, but this does not imply that the reduction was the result of laws that raised the minimum legal drinking age. Other factors, counter measures, or advertising campaigns [like MADD (Mothers Against Drunk Drivers)] may have affected the fatality ratios. In order to investigate further, these researchers compared the Before − After ratios for the affected age group with the corresponding difference ratios for the 25–29 age group, who were not affected by the law change, as shown in Table 2. Carry out an appropriate test and write a report of your conclusions.

Table 2 Data for Problem 5.11

State            Affected age group   25–29 age group
Florida           0.060               0.025
Georgia           0.068               0.023
Illinois          0.025               0.004
Iowa              0.078               0.008
Maine            -0.022               0.061
Michigan          0.072               0.015
Montana           0.041               0.035
Nebraska          0.086               0.016
New Hampshire     0.012               0.061
Tennessee         0.035               0.051
5.12. Howard, Murphy, and Thomas (1986) reported a study designed to investigate whether computer anxiety changes between the beginning and end of a course on introduction to computers. The student subjects were given a test to measure computer anxiety at the beginning of the term and then again at the end of the 5-week summer course. High scores on this test indicate a high level of anxiety. For the data in Table 3 on 14 students, determine whether computer anxiety was reduced over the term.

Table 3 Data for Problem 5.12

Student   Before   After      Student   Before   After
A         20       20         H         34       19
B         21       18         I         28       13
C         23       10         J         20       21
D         26       16         K         29       12
E         32       11         L         22       15
F         27       20         M         30       14
G         38       20         N         25       17
5.13. Twenty-four students took both the midterm and the final exam in a writing course. Numerical grades were not given on the final, but each student was classified as either no change, improvement, or reduced level of performance compared with the midterm. Six showed improvement, 5 showed no change, and 13 had a reduced level of performance. Find the P value for an appropriate one-sided test.

5.14. Reducing high blood pressure by diet requires reduction of sodium intake, which usually requires switching from processed foods to their natural counterparts. Listed below are the average sodium contents of five ordinary foods in processed form and natural form for equivalent quantities. Find a confidence interval estimate of the median difference (processed minus natural) with confidence coefficient at least 0.87 using two different procedures.

Natural food                   Processed food
Corn on the cob       2        Canned corn               251
Chicken              63        Fried chicken            1220
Ground sirloin       60        All-beef frankfurter      461
Beans                 3        Canned beans              300
Fresh tuna           40        Canned tuna               409
5.15. For the data in Problem 4.20, use both the sign test and the signed-rank test to investigate the research hypothesis that median earnings exceed 2.0.

5.16. In an experiment to measure the effect of mild intoxication on coordination, nine subjects were each given ethyl alcohol in an amount equivalent to 15.7 ml/m² of body surface and then asked to write a certain phrase as many times as they could in 1 min. The number of correctly written words was then counted and scaled such that a zero score represents the score a person not under the influence of alcohol would make, a positive score indicates increased writing speed and accuracy, and a negative score indicates decreased writing speed and accuracy. For the data below, find a confidence interval estimate of the median score at level nearest 0.95 using the procedure corresponding to the
(a) Sign test
(b) Wilcoxon signed-rank test where we assume symmetry

Subject   Score      Subject   Score
1         10         6         0
2         8          7         7
3         6          8         5
4         2          9         8
5         15
5.17. For the data in Example 4.3, test $H_0\colon M = 0.50$ against the alternative $H_1\colon M > 0.50$, using the
(a) Sign test
(b) Signed-rank test and assuming symmetry

5.18. For the data in Example 7.1, find a confidence interval estimate of the median difference Before minus After using the level nearest 0.90.

5.19. In a trial of two types of rain gauge, 69 of type A and 12 of type B were distributed at random over a small area. In a certain period 14 storms occurred, and the average amounts of rain recorded for each storm by the two types of gauge are as follows:

Storm   Type A   Type B      Storm   Type A   Type B
1       1.38     1.42        8       2.63     2.69
2       9.69     10.37       9       2.44     2.68
3       0.39     0.39        10      0.56     0.53
4       1.42     1.46        11      0.69     0.72
5       0.54     0.55        12      0.71     0.72
6       5.94     6.15        13      0.95     0.90
7       0.59     0.61        14      0.55     0.52

Another user claims to have found that the type B gauge gives consistently higher average readings than type A. Do these results substantiate such a conclusion? Investigate using two different nonparametric test procedures, by finding the P value from
(a) Tables of the exact distribution
(b) Large sample approximations to the exact distributions
(A total of four tests are to be performed.) Discuss briefly the advisability of using nonparametric versus parametric procedures for such an investigation and the relative merits of the two nonparametric tests used. Discuss assumptions in each case.

5.20. A manufacturer of suntan lotion is testing a new formula to see whether it provides more protection against sunburn than the old formula. The manufacturer chose 10 persons at random from among the company's employees, applied the two types of lotion to their backs, one type on each side, and exposed their backs to a controlled but intense amount of sun. Degree of sunburn was measured for each side of each subject, with the results shown below (higher numbers represent more severe sunburn).
(a) Test the null hypothesis that the difference (old − new) of degree of sunburn has median zero against the one-sided alternative that it is negative, assuming that the differences are symmetric. Does the new formula appear to be effective?
(b) Find a confidence interval for the median difference, assuming symmetry and with confidence coefficient near 0.90.
(c) Do (a) and (b) without assuming symmetry.

Subject   Old formula   New formula
1         41            37
2         42            39
3         48            31
4         38            39
5         38            34
6         45            47
7         21            19
8         28            30
9         29            25
10        14            8
5.21. Last year the elapsed time of long-distance telephone calls for a national retailer was skewed to the right with a median of 3 min 15 sec. The recession has reduced sales, but the company's treasurer claims that the median length of long-distance calls now is even greater than last year. A random sample of 5625 calls is selected from recent records and 2890 of them are found to last more than 3 min 15 sec. Is the treasurer's claim supported? Give the null and alternative hypotheses and the P value.

5.22. In order to test the effectiveness of a sales training program proposed by a firm of training specialists, a home furnishings company selects six sales representatives at random to take the course. The data below are gross sales by these representatives before and after the course.

Representative   Sales before   Sales after
1                90             97
2                83             80
3                105            110
4                97             93
5                110            123
6                78             84

(a) State the null and alternative hypotheses and use the sign test to find a P value relevant to the question of whether the course is effective.
(b) Use the sign-test procedure at level nearest 0.90 to find a two-sided confidence-interval estimate of the median difference in sales (after − before). Give the exact level.
(c) Use the signed-rank test to do (a). What assumptions must you make?
(d) Use the signed-rank test procedure to do (b).

5.23. In a marketing research test, 15 adult males were asked to shave one side of their face with a brand A razor blade and the other side with a brand B razor blade and state their preferred blade. Twelve men preferred brand A. Find the P value for the alternative that the probability of preferring brand A is greater than 0.5.

5.24. Let $X$ be a continuous random variable symmetrically distributed about $\theta$. Show that the random variables $|X - \theta|$ and $Z$ are independent, where

$$Z = \begin{cases} 1 & \text{if } X > \theta \\ 0 & \text{if } X \le \theta \end{cases}$$
5.25. Using the result in Problem 5.24, show that for the Wilcoxon signed-rank test statistic $T^+$ discussed in Section 5.7, the $2N$ random variables $Z_1, r(|D_1|), Z_2, r(|D_2|), \ldots, Z_N, r(|D_N|)$ are mutually independent under $H_0$.

5.26. Again consider the Wilcoxon signed-rank test discussed in Section 5.7. Show that under $H_0$ the distribution of the test statistic $T^+$ is the same as that of $W = \sum_{i=1}^{N} W_i$, where $W_1, W_2, \ldots, W_N$ are independent random variables with $P(W_i = 0) = P(W_i = i) = 0.5$, $i = 1, 2, \ldots, N$.

5.27. A study 5 years ago reported that the median amount of sleep by American adults is 7.5 hours out of 24 with a standard deviation of 1.5 hours and that 5% of the population sleep 6 or less hours while another 5% sleep 9 or more hours. A current sample of eight adults reported their average amounts of sleep per 24 hours as 7.2, 8.3, 5.6, 7.4, 7.8, 5.2, 9.1, and 5.8 hours. Use the most appropriate statistical procedures to determine whether American adults sleep less today than they did five years ago and justify your choice. You should at least test hypotheses concerning the quantiles of order 0.05, 0.50, and 0.95.

5.28. Find a confidence interval estimate of the median amount of sleep per 24 hours for the data in Problem 5.27 using confidence coefficient nearest 0.90.

5.29. Let $X_{(r)}$ denote the rth-order statistic of a random sample of size 5 from any continuous population and $\kappa_p$ denote the pth quantile of this population. Find:
(a) $P(X_{(1)} < \kappa_{0.5} < X_{(5)})$
(b) $P(X_{(1)} < \kappa_{0.25} < X_{(3)})$
(c) $P(X_{(4)} < \kappa_{0.80} < X_{(5)})$

5.30. For order statistics of a random sample of size $n$ from any continuous population $F_X$, show that the interval $(X_{(r)}, X_{(n-r+1)})$, $r < n/2$, is a $100(1 - \alpha)$ percent confidence-interval estimate for the median of $F_X$, where

$$1 - \alpha = 1 - 2n \binom{n-1}{r-1} \int_0^{0.5} x^{n-r} (1 - x)^{r-1}\, dx$$

5.31. If $X_{(1)}$ and $X_{(n)}$ are the smallest and largest values, respectively, in a sample of size $n$ from any continuous population $F_X$ with median $\kappa_{0.50}$, find the smallest value of $n$ such that:
(a) $P(X_{(1)} < \kappa_{0.50} < X_{(n)}) \ge 0.99$
(b) $P[F_X(X_{(n)}) - F_X(X_{(1)}) \ge 0.5] \ge 0.95$

5.32. Derive the sample size formula based on the normal approximation for the sign test against a two-sided alternative with approximate size $\alpha$ and power $1 - \beta$.

5.33. Derive the sample size formula based on the normal approximation for the signed-rank test against a two-sided alternative with approximate size $\alpha$ and power $1 - \beta$.
6 The General Two-Sample Problem
6.1 INTRODUCTION
For the matched-pairs sign and signed-rank tests of Chapter 5 the data consisted of two samples, but each element in one sample was linked with a particular element of the other sample by some unit of association. This sampling situation can be described as a case of two dependent samples or alternatively as a single sample of pairs from a bivariate population. When the inferences to be drawn are related only to the population of differences of the paired observations, the first step in the analysis usually is to take the differences of the paired observations; this leaves only a single set of observations. Therefore, this type of data may be legitimately classified as a one-sample problem. In this chapter we shall be concerned with data consisting of two mutually independent random samples, i.e., random samples drawn independently from each of two populations. Not only are the elements
within each sample independent, but also every element in the first sample is independent of every element in the second sample. The universe consists of two populations, which we call the X and Y populations, with cumulative distribution functions denoted by $F_X$ and $F_Y$, respectively. We have a random sample of size m drawn from the X population and another random sample of size n drawn independently from the Y population,
$$X_1, X_2, \ldots, X_m \quad \text{and} \quad Y_1, Y_2, \ldots, Y_n$$
Usually the hypothesis of interest in the two-sample problem is that the two samples are drawn from identical populations, i.e.,
$$H_0: F_Y(x) = F_X(x) \quad \text{for all } x$$
If we are willing to make parametric model assumptions concerning the forms of the underlying populations and assume that the differences between the two populations occur only with respect to some parameters, such as the means or the variances, it is often possible to derive the so-called best test in a Neyman-Pearson framework. For example, if we assume that the populations are normally distributed, it is well known that the two-sample Student's t test for equality of means and the F test for equality of variances are respectively the best tests. The performances of these two tests are also well known. However, these and other classical tests may be sensitive to violations of the fundamental model assumptions inherent in the derivation and construction of these tests. Any conclusions reached using such tests are only as valid as the underlying assumptions made. If there is reason to suspect a violation of any of these postulates, or if sufficient information to judge their validity is not available, or if a completely general test of equality for unspecified distributions is desired, some nonparametric procedure is in order.
In practice, other assumptions are often made about the form of the underlying populations. One common assumption is called the location model, or the shift model. This model assumes that the X and Y populations are the same in all respects except possibly for a shift by an (unknown) amount, say $\theta$, or that
$$F_Y(x) = P(Y \le x) = P(X \le x - \theta) = F_X(x - \theta) \quad \text{for all } x \text{ and } \theta \ne 0$$
This means that $X + \theta$ and Y have the same distribution or that X is distributed as $Y - \theta$. The Y population is then the same as the X population if $\theta = 0$, is shifted to the right if $\theta > 0$, and is shifted to the left if $\theta < 0$. Under the shift assumption, the populations have the
In practice, the sample pattern of arrangement of X's and Y's provides information about the type of difference which may exist in the populations. For instance, if the observed arrangement is that designated by either 1 or 10 in the above example, the X's and the Y's do not appear to be randomly mixed, suggesting a contradiction to the null hypothesis. Many statistical tests are based on some function of this combined arrangement. The type of function which is most appropriate depends on the type of difference one hopes to detect, which is indicated by the alternative hypothesis. An abundance of reasonable alternatives to $H_0$ may be considered, but the type easiest to analyze using distribution-free techniques states some functional relationship between the distributions. The most general two-sided alternative states simply
$$H_A: F_Y(x) \ne F_X(x) \quad \text{for some } x$$
and a corresponding general one-sided alternative is
$$H_1: F_Y(x) \ge F_X(x) \ \text{for all } x, \qquad F_Y(x) > F_X(x) \ \text{for some } x$$
In this latter case, we generally say that the random variable X is stochastically larger than the random variable Y. We can write this as $Y \overset{ST}{<} X$. Figures 1.1 and 1.2 are descriptive of the alternative that X is stochastically larger than Y, which includes as a subclass the more specific alternative $\mu_X > \mu_Y$. Some authors define $Y \overset{ST}{<} X$ to mean that $P(X > Y) > P(X < Y)$. (For the reverse inequality on $F_X$ and $F_Y$, we say X is stochastically smaller than Y and write $X \overset{ST}{<} Y$.) If the particular alternative of interest is simply a difference in location, we use the location alternative or the location model
$$H_L: F_Y(x) = F_X(x - \theta) \quad \text{for all } x \text{ and some } \theta \ne 0$$
Fig. 1.1 X is stochastically larger than Y.
Fig. 1.2 X is stochastically larger than Y.
Under the location model, Y is distributed as $X + \theta$, so that Y is stochastically larger (smaller) than X if and only if $\theta > 0$ ($\theta < 0$). Similarly, if only a difference in scale is of interest, we use the scale alternative
$$H_S: F_Y(x) = F_X(\theta x) \quad \text{for all } x \text{ and some } \theta \ne 1$$
Under the scale model, Y is distributed as $X/\theta$, so that Y is stochastically larger (smaller) than X if and only if $\theta < 1$ ($\theta > 1$). Although the three special alternatives $H_1$, $H_L$, and $H_S$ are the most frequently encountered of all those included in the general class $H_A$, other types of relations may be considered. For example, the alternative $H_{LE}: F_Y(x) = [F_X(x)]^k$, for some positive integer k and all x, called the Lehmann alternative, states that the Y random variables are distributed as the largest of k X variables. Under this alternative, Y is stochastically larger (smaller) than X if and only if $k > 1$ ($k < 1$).
The available statistical literature on the two-sample problem is quite extensive. A multitude of tests have been proposed for a wide variety of functional alternatives, but only a few of the best-known tests have been selected for inclusion in this book. The Wald-Wolfowitz runs test, the Kolmogorov-Smirnov two-sample test, the median test, the control median test, and the Mann-Whitney U test will be covered in this chapter. Chapters 7 and 8 are concerned with a specific class of tests particularly useful for the location and scale alternatives, respectively.

6.2 THE WALD-WOLFOWITZ RUNS TEST
Let the two sets of independent random variables $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be combined into a single ordered sequence from smallest to largest, keeping track of which observations correspond to the X sample and which to the Y. Assuming that their probability distributions are continuous, a unique ordering is always possible, since theoretically ties do not exist. For example, with m = 4 and n = 5, the arrangement might be
X Y Y X X Y X Y Y
which indicates that in the pooled sample the smallest element was an X, the second smallest a Y, etc., and the largest a Y. Under the null hypothesis of identical distributions
$$H_0: F_Y(x) = F_X(x) \quad \text{for all } x$$
we expect the X and Y random variables to be well mixed in the ordered configuration, since the m + n = N random variables constitute a single random sample of size N from the common population. With a run defined as in Chapter 3 as a sequence of identical letters preceded and followed by a different letter or no letter, the total number of runs in the ordered pooled sample is indicative of the degree of mixing. In our arrangement X Y Y X X Y X Y Y, the total number of runs is equal to 6, which shows a pretty good mixing of X's and Y's. A pattern of arrangement with too few runs would suggest that this group of N is not a single random sample but instead is composed of two samples from two distinguishable populations. For example, if the arrangement is X X X X Y Y Y Y Y so that all the elements in the X sample are smaller than all of the elements in the Y sample, there would be only two runs. This particular configuration might indicate not only that the populations are not identical, but also that the X's are stochastically smaller than the Y's. However, the reverse ordering also contains only two runs, and therefore a test criterion based solely on the total number of runs cannot distinguish these two cases. The runs test is appropriate primarily when the alternative is completely general and two-sided, as in
$$H_A: F_Y(x) \ne F_X(x) \quad \text{for some } x$$
We define the random variable R as the total number of runs in the combined ordered arrangement of m X and n Y random variables. Since too few runs tend to discredit the null hypothesis when the alternative is $H_A$, the Wald-Wolfowitz (1940) runs test for significance level $\alpha$ generally has the rejection region in the lower tail as
$$R \le c_\alpha$$
where $c_\alpha$ is chosen to be the largest integer satisfying
$$P(R \le c_\alpha \mid H_0) \le \alpha$$
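The critical value $c_\alpha$ can be found by direct enumeration of the null distribution of R. The following is a minimal sketch, not a definitive implementation: it assumes the standard closed-form null distribution of the number of runs for two types of objects (the Chapter 3 result, which is not reproduced in this section), and the function names `runs_pmf` and `lower_critical_value` are illustrative.

```python
from fractions import Fraction
from math import comb


def runs_pmf(m, n):
    """Exact null pmf of R, the total number of runs, for m X's and n Y's."""
    total = comb(m + n, m)
    pmf = {}
    for r in range(2, m + n + 1):
        k = r // 2
        if r % 2 == 0:
            # r = 2k: k runs of each type, either type may come first
            ways = 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)
        else:
            # r = 2k + 1: k + 1 runs of one type and k runs of the other
            ways = (comb(m - 1, k) * comb(n - 1, k - 1)
                    + comb(m - 1, k - 1) * comb(n - 1, k))
        if ways:
            pmf[r] = Fraction(ways, total)
    return pmf


def lower_critical_value(m, n, alpha):
    """Largest c with P(R <= c | H0) <= alpha, or None if no such c exists."""
    cumulative, c = Fraction(0), None
    for r, p in sorted(runs_pmf(m, n).items()):
        cumulative += p
        if cumulative > alpha:
            break
        c = r
    return c


if __name__ == "__main__":
    # m = 4 and n = 5 as in the arrangement X Y Y X X Y X Y Y discussed above
    print(lower_critical_value(4, 5, Fraction(1, 20)))
```

Exact fractions are used so that the comparison with $\alpha$ is not affected by rounding.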
The P value for the runs test is then given by
$$P(R \le R_O \mid H_0)$$
where $R_O$ is the observed value of the runs test statistic R. Since the X and Y observations are two types of objects arranged in a completely random sequence if $H_0$ is true, the null probability distribution of R is exactly the same as was found in Chapter 3 for the runs test for randomness. The distribution is given in Theorem 2.2 of Section 3.2 with $n_1$ and $n_2$ replaced by m and n, respectively, assuming the X's are called type 1 objects and the Y's are the type 2 objects. The other properties of R discussed in that section, including the moments and asymptotic null distribution, are also unchanged. The only difference here is that the appropriate critical region for the alternative of different populations is too few runs. The null distribution of R is given in Table D of the Appendix with $n_1 = m$ and $n_2 = n$ for $m \le n$. The normal approximation described in Section 3.2 is used for larger sample sizes. A numerical example of this test is given below.
Example 2.1 It is easy to show that the distribution of a standardized chi-square variable with large degrees of freedom can be approximated by the standard normal distribution. This example provides an investigation of the agreement between these two distributions for moderate degrees of freedom. Two mutually independent random samples, each of size 8, were generated, one from the standard normal distribution and one from the chi-square distribution with $\nu = 18$ degrees of freedom. The resulting data are as follows:

Normal        -1.91   -1.22   -0.96   -0.72    0.14    0.82    1.45    1.86
Chi square     4.90    7.25    8.04   14.10   18.30   21.21   23.10   28.12
Solution Before testing the null hypothesis of equal distributions, the chi-square sample data must be standardized by subtracting the mean $\nu = 18$ and dividing by the standard deviation $\sqrt{2\nu} = \sqrt{36} = 6$. The transformed chi-square data are, respectively,

-2.18   -1.79   -1.66   -0.65    0.05    0.54    0.85    1.69

We pool the normal data and these transformed data into a single array, ordering them from smallest to largest, underlining the transformed chi-square data, as
$$\underline{-2.18},\ -1.91,\ \underline{-1.79},\ \underline{-1.66},\ -1.22,\ -0.96,\ -0.72,\ \underline{-0.65},\ \underline{0.05},\ 0.14,\ \underline{0.54},\ 0.82,\ \underline{0.85},\ 1.45,\ \underline{1.69},\ 1.86$$
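The pooling step of Example 2.1 can be checked with a short sketch. It assumes the signs of the normal observations implied by the ordered pooled array (the first four normal values negative), and the labels "N" and "C" are an illustrative choice for the normal and transformed chi-square samples.

```python
# Example 2.1 data; signs of the normal sample are taken from the pooled ordering
normal = [-1.91, -1.22, -0.96, -0.72, 0.14, 0.82, 1.45, 1.86]
chi_square = [4.90, 7.25, 8.04, 14.10, 18.30, 21.21, 23.10, 28.12]

nu = 18  # degrees of freedom: mean nu, standard deviation sqrt(2 * nu) = 6
standardized = [(x - nu) / (2 * nu) ** 0.5 for x in chi_square]

# Pool both samples, remembering the sample each observation came from
pooled = sorted([(v, "N") for v in normal] + [(v, "C") for v in standardized])
labels = "".join(label for _, label in pooled)

# A new run starts wherever two adjacent labels differ
runs = 1 + sum(a != b for a, b in zip(labels, labels[1:]))

if __name__ == "__main__":
    print(labels)
    print(runs)
```

The observed number of runs can then be referred to the lower tail of the null distribution of R with m = n = 8.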
Because of the Glivenko-Cantelli theorem (Theorem 3.2 of Section 2.3), the test is consistent for this alternative. The P value is
$$P(D_{m,n} \ge D_O \mid H_0)$$
where $D_O$ is the observed value of the two-sample K-S test statistic. As with the one-sample Kolmogorov-Smirnov statistic, $D_{m,n}$ is completely distribution free for any continuous common population distribution since order is preserved under a monotone transformation. That is, if we let $z = F(x)$ for the common continuous cdf F, we have $S_m(z) = S_m(x)$ and $S_n(z) = S_n(x)$, where the random variable Z, corresponding to z, has the uniform distribution on the unit interval.
In order to implement the test, the exact cumulative null distribution of $mnD_{m,n}$ is given in Table I in the Appendix for $2 \le m \le n \le 12$ or $m + n \le 16$, whichever occurs first. Selected quantiles of $mnD_{m,n}$ are also given for m = n between 9 and 20, along with the large sample approximation.
The derivation of the exact null probability distribution of $D_{m,n}$ is usually attributed to the Russian School, particularly Gnedenko (1954) and Korolyuk (1961), but the papers by Massey (1951b, 1952) are also important. Several methods of calculation are possible, generally involving recursive formulas. Drion (1952) derived a closed expression for exact probabilities in the case m = n by applying random-walk techniques. Several approaches are summarized in Hodges (1958). One of these methods, which is particularly useful for small sample sizes, will be presented here as an aid to understanding.
To compute $P(D_{m,n} \ge d \mid H_0)$, where d is the observed value of $\max_x |S_m(x) - S_n(x)|$, we first arrange the combined sample of m + n observations in increasing order of magnitude. The arrangement can be depicted graphically on a Cartesian coordinate system by a path which starts at the origin and moves one step to the right for an x observation and one step up for a y observation, ending at (m, n). For example, the sample arrangement xyyxxyy is represented in Figure 3.1.
The observed values of $mS_m(x)$ and $nS_n(x)$ are, respectively, the coordinates of all points (u, v) on the path where u and v are integers. The number d is the largest of the differences $|u/m - v/n| = |nu - mv|/mn$. If a line is drawn connecting the points (0, 0) and (m, n) on this graph, the equation of the line is $nx - my = 0$ and the vertical distance from any point (u, v) on the path to this line is $|v - nu/m|$. Therefore, nd for the observed sample is the largest vertical distance from the diagonal line to a point on the path. In Figure 3.1 the farthest point is labeled Q, and the value of d is 2/4.
Fig. 3.1 Path of xyyxxyy.
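The path representation translates directly into a computation of d. The sketch below is illustrative only: the function name `ks_path_statistic` and the encoding of the arrangement as a string of the letters x and y are assumptions, not notation from the text.

```python
from fractions import Fraction


def ks_path_statistic(arrangement):
    """Return (d, (u, v)): the largest |u/m - v/n| along the path and where it occurs."""
    m = arrangement.count("x")
    n = arrangement.count("y")
    u = v = 0
    d, farthest = Fraction(0), (0, 0)
    for letter in arrangement:
        if letter == "x":
            u += 1   # one step to the right for an x observation
        else:
            v += 1   # one step up for a y observation
        distance = abs(Fraction(u, m) - Fraction(v, n))
        if distance > d:
            d, farthest = distance, (u, v)
    return d, farthest


if __name__ == "__main__":
    # The arrangement of Figure 3.1: m = 3 x's and n = 4 y's
    print(ks_path_statistic("xyyxxyy"))
```

For the arrangement xyyxxyy this walks the path through (1,0), (1,1), (1,2), (2,2), (3,2), (3,3), (3,4) and returns the point farthest from the diagonal together with d.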
The total number of arrangements of m X and n Y random variables is $\binom{m+n}{m}$, and under $H_0$ each of the corresponding paths is equally likely. The probability of an observed value of $D_{m,n}$ not less than d then is the number of paths which have points at a distance from the diagonal not less than nd, divided by $\binom{m+n}{m}$. In order to count this number, we draw another figure of the same dimension as before and mark off two lines at vertical distance nd from the diagonal, as in Figure 3.2. Denote by A(m, n) the number of paths from (0, 0) to (m, n) which lie entirely within (not on) these boundary lines. Then the desired probability is
$$P(D_{m,n} \ge d \mid H_0) = 1 - P(D_{m,n} < d \mid H_0) = 1 - \frac{A(m,n)}{\binom{m+n}{m}}$$
A(m, n) can easily be counted in the manner indicated in Figure 3.2. The number A(u, v) at any intersection (u, v) clearly satisfies the recursion relation
$$A(u,v) = A(u-1,v) + A(u,v-1)$$
with boundary conditions
$$A(0,v) = A(u,0) = 1$$
Thus A(u, v) is the sum of the numbers at the intersections where the previous point on the path could have been while still within the boundary lines.
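The counting argument can be sketched in a few lines. This is a hedged illustration rather than the book's procedure verbatim: the function name `ks_upper_tail` is hypothetical, and the boundary conditions A(0, v) = A(u, 0) = 1 are applied only to lattice points strictly inside the band, with A set to 0 at every point on or outside the boundary lines.

```python
from fractions import Fraction
from math import comb


def ks_upper_tail(m, n, d):
    """P(D_{m,n} >= d | H0) by counting paths that stay strictly inside the band."""
    d = Fraction(d)

    def inside(u, v):
        # (u, v) lies strictly within the two boundary lines
        return abs(Fraction(u, m) - Fraction(v, n)) < d

    # A[u][v] = number of admissible paths from (0, 0) to (u, v)
    A = [[0] * (n + 1) for _ in range(m + 1)]
    A[0][0] = 1
    for u in range(m + 1):
        for v in range(n + 1):
            if (u, v) == (0, 0) or not inside(u, v):
                continue
            from_left = A[u - 1][v] if u > 0 else 0
            from_below = A[u][v - 1] if v > 0 else 0
            A[u][v] = from_left + from_below
    return 1 - Fraction(A[m][n], comb(m + n, m))


if __name__ == "__main__":
    # m = 3, n = 4 and d = 2/4, the values observed for the path xyyxxyy
    print(ks_upper_tail(3, 4, Fraction(1, 2)))
```

Working with exact fractions avoids misclassifying points that lie exactly on a boundary line.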
... Mean, whereas if $S \le$ Mean, the one-sided P value is calculated as $P_1 = P(\text{Test Statistic} \le S \mid H_0)$. The mean of the median test statistic under $H_0$ is mt/N, which equals 4(4)/9 = 1.78 for our example. Thus, SAS calculates the exact P value in the upper tail as $P(S \ge 2 \mid H_0) = 81/126 = 0.6429$. This equals $1 - P(U \le 1 \mid H_0)$ and thus does not agree with our hand calculations. However, on the basis of S we reach the same conclusion of not rejecting $H_0$, made earlier on the basis of U. For the normal approximation to the P value, PROC NPAR1WAY calculates the Z statistic by the formula
$$Z = (S - mt/N) \Big/ \sqrt{mnt(N-t)/[N^2(N-1)]}$$
and incorporates a continuity correction unless one specifies otherwise. As with the exact P value, the SAS P value under the normal approximation also does not agree with our hand calculation based on U.
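The exact tail probability and the Z statistic quoted above can be reproduced with a short sketch. It assumes that S has the same hypergeometric null distribution as U (consistent with the stated identity $P(S \ge 2) = 1 - P(U \le 1)$), that the upper tail is used when S exceeds its mean, and that the observed value is S = 2; no continuity correction is applied, and the helper name `hypergeom_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb, sqrt


def hypergeom_pmf(u, m, n, t):
    """P(U = u): u of the m X's fall among the t selected positions out of N = m + n."""
    return Fraction(comb(m, u) * comb(n, t - u), comb(m + n, t))


m, n, t, s_obs = 4, 5, 4, 2   # values of the example; s_obs = 2 is assumed from the text
N = m + n
mean = Fraction(m * t, N)     # mt/N = 16/9

if s_obs > mean:
    # upper tail: P(S >= s_obs | H0)
    p_one_sided = sum(hypergeom_pmf(u, m, n, t) for u in range(s_obs, min(m, t) + 1))
else:
    # lower tail: P(S <= s_obs | H0)
    p_one_sided = sum(hypergeom_pmf(u, m, n, t) for u in range(max(0, t - n), s_obs + 1))

# Normal-approximation statistic, without continuity correction
z = (s_obs - m * t / N) / sqrt(m * n * t * (N - t) / (N ** 2 * (N - 1)))

if __name__ == "__main__":
    print(p_one_sided, float(p_one_sided))
    print(z)
```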
where t = N/2 or (N - 1)/2 according as N is even or odd. Since $H_0$ is accepted for all U values in the interval $c + 1 \le U \le c' - 1$, and this acceptance region has probability $1 - \alpha$ under $H_0$, a $100(1-\alpha)\%$ confidence-interval estimate for $\theta$ consists of all values of $\theta$ for which the derived sample observations yield values of U which lie in the acceptance region. This process of obtaining a confidence interval for a parameter from the acceptance region (of a test of hypothesis) is called inverting the acceptance region (of the test), and the confidence interval thus obtained is referred to as a test-based confidence interval. To explicitly find the confidence interval, that is, the range of $\theta$ corresponding to the acceptance region $c + 1 \le U \le c' - 1$, we first order the two derived samples separately from smallest to largest as
$$X_{(1)}, X_{(2)}, \ldots, X_{(m)} \quad \text{and} \quad Y_{(1)} - \theta, Y_{(2)} - \theta, \ldots, Y_{(n)} - \theta$$
The t smallest observations of the N = m + n total number are made up of exactly i X and t - i Y variables if each observation of the set
$$X_{(1)}, \ldots, X_{(i)}, Y_{(1)} - \theta, \ldots, Y_{(t-i)} - \theta$$
is less than each observation of the set
$$X_{(i+1)}, \ldots, X_{(m)}, Y_{(t-i+1)} - \theta, \ldots, Y_{(n)} - \theta$$
The value of i is at least c + 1 if and only if for i = c + 1, the largest X in the first set is less than the smallest Y in the second set, that is, $X_{(c+1)} < Y_{(t-c)} - \theta$. Arguing similarly, $X_{(c')} > Y_{(t-c'+1)} - \theta$ can be seen to be a necessary and sufficient condition for having at most $c' - 1$ X observations among the t smallest of the total N (in this case the largest Y in the first set must be smaller than the smallest X in the second set). Therefore, the acceptance region for the median test corresponding to the null hypothesis of no difference between the two distributions (with respect to location) at significance level $\alpha$ can be equivalently written as
$$X_{(c+1)} < Y_{(t-c)} - \theta \quad \text{and} \quad X_{(c')} > Y_{(t-c'+1)} - \theta$$
or as
$$Y_{(t-c)} - X_{(c+1)} > \theta \quad \text{and} \quad Y_{(t-c'+1)} - X_{(c')} < \theta$$
The desired confidence interval $(Y_{(t-c'+1)} - X_{(c')},\ Y_{(t-c)} - X_{(c+1)})$ follows from the last two inequalities. Now, using (4.8),
$$1 - \alpha = P(c + 1 \le U \le c' - 1 \mid H_0) = P(c + 1 \le U \le c' - 1 \mid \theta = 0) = P(Y_{(t-c'+1)} - X_{(c')} < \theta < Y_{(t-c)} - X_{(c+1)} \mid \theta = 0)$$
Since the last equality is also true for all values of $\theta$, we can make the statement
$$P(Y_{(t-c'+1)} - X_{(c')} < \theta < Y_{(t-c)} - X_{(c+1)}) = 1 - \alpha$$
where c and c' are found from (4.8). Thus the endpoints of the confidence interval estimate for $\theta$ corresponding to Mood's median test are found simply from some order statistics of the respective random samples.
We calculate the 95% confidence interval for the median difference for the data in Example 4.1. In order to find the constants c and c', we need to calculate the null distribution of U, using (4.3) for m = 4, n = 5, t = 4. The results are shown in Table 4.1. If we take c = 0 and c' = 4, then (4.8) equals 0.04762 so that the confidence interval for $\theta = M_Y - M_X$ is $(Y_{(1)} - X_{(4)},\ Y_{(4)} - X_{(1)})$ with exact level 0.95238. Numerically, the interval is (-9, 4). Also, the 95.238% confidence interval for $\theta = M_X - M_Y$ is (-4, 9). Note that the MINITAB output given before states "A 95.0% CI for median(1) - median(2): (-4.26, 8.26)." This is based on the median test but c and c' are calculated using the normal approximation. The results are quite close.
Example 4.2
It may be noted that the median test is a member of a more general class of nonparametric two-sample tests, called precedence tests. Chakraborti and van der Laan (1996) provided an overview of the literature on these tests. A precedence test is based on a statistic $W_r$ which denotes the number of Y observations that precede the rth-order statistic from the X sample $X_{(r)}$ (alternatively, one can use the number of X's that precede, say, $Y_{(s)}$). It can be seen, for example, that $W_r < w$ if and only if $X_{(r)} < Y_{(w)}$ so that a precedence test based on $W_r$ can be interpreted in terms of two order statistics, one from each sample. The test is implemented by first choosing r, and then determining w such that the size of the test is $\alpha$. It can be shown that

Table 4.1 Null distribution of U for m = 4, n = 5, t = 4

u            0          1          2          3          4
P(U = u)     0.039683   0.31746    0.47619    0.15873    0.007937
             5/126      40/126     60/126     20/126     1/126
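Table 4.1 and the exact confidence level can be reproduced from the hypergeometric form of the null distribution. The sketch below makes two labeled assumptions: the tail sum $P(U \le c) + P(U \ge c')$ is used for the quantity the text calls (4.8), and the two samples passed to `median_test_interval` are hypothetical, since the Example 4.1 data do not appear in this section. Both function names are illustrative.

```python
from fractions import Fraction
from math import comb


def median_test_null(m, n, t):
    """Null pmf of U, the number of X's among the t smallest of N = m + n."""
    total = comb(m + n, t)
    return {u: Fraction(comb(m, u) * comb(n, t - u), total)
            for u in range(max(0, t - n), min(m, t) + 1)}


def median_test_interval(x, y, c, c_prime):
    """Interval (Y_(t-c'+1) - X_(c'), Y_(t-c) - X_(c+1)) for theta = M_Y - M_X."""
    xs, ys = sorted(x), sorted(y)
    t = (len(x) + len(y)) // 2                  # N/2 or (N - 1)/2
    lower = ys[t - c_prime] - xs[c_prime - 1]   # Y_(t-c'+1) - X_(c'), 0-based indexing
    upper = ys[t - c - 1] - xs[c]               # Y_(t-c) - X_(c+1), 0-based indexing
    return lower, upper


if __name__ == "__main__":
    pmf = median_test_null(4, 5, 4)
    for u, p in pmf.items():
        print(u, p, round(float(p), 6))
    c, c_prime = 0, 4
    alpha = sum(p for u, p in pmf.items() if u <= c or u >= c_prime)
    print(float(alpha), float(1 - alpha))
    # Hypothetical samples, for illustration only
    print(median_test_interval([10, 12, 15, 18], [11, 13, 14, 16, 20], c, c_prime))
```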
$$[(Y_{(t-c'+1)} - X_{(c')}),\ (Y_{(t-c)} - X_{(c+1)})]$$
Thus, the power function of the median test in the location case is the probability that this interval does not cover zero when $\theta \ne 0$, that is,
$$Pw(\theta) = P(Y_{(t-c'+1)} - X_{(c')} > 0 \ \text{or} \ Y_{(t-c)} - X_{(c+1)} < 0 \ \text{when } \theta \ne 0)$$
These two events, call them A and B, are mutually exclusive as we now show. For any $c' > c$, it is always true that $X_{(c')} \ge X_{(c+1)}$ and $Y_{(t-c'+1)} = Y_{(t-[c'-1])} \le Y_{(t-c)}$. Thus if A occurs, that is, $Y_{(t-c'+1)} > X_{(c')}$, we must also have $Y_{(t-c)} \ge Y_{(t-c'+1)} > X_{(c')} \ge X_{(c+1)}$, which makes $Y_{(t-c)} > X_{(c+1)}$, a contradiction of B. As a result, the power function can be expressed as the sum of two probabilities involving order statistics:
$$Pw(\theta) = P(Y_{(t-c'+1)} > X_{(c')}) + P(Y_{(t-c)} < X_{(c+1)})$$
Since the random variables X and Y are independent, the joint distribution of, say, $X_{(r)}$ and $Y_{(s)}$ is the product of their marginal distributions, which can be easily found using the methods of Chapter 2 for completely specified populations $F_X$ and $F_Y$ or, equivalently, $F_X$ and $\theta$ since $F_Y(x) = F_X(x - \theta)$. In order to calculate the power function then, we need only evaluate two double integrals of the following type:
$$P(Y_{(s)} < X_{(r)}) = \int_{-\infty}^{\infty} \int_{-\infty}^{x} f_{Y_{(s)}}(y)\, f_{X_{(r)}}(x)\, dy\, dx$$
The power function for a one-sided test is simply one integral of this type. For large sample sizes, since the marginal distribution of any order statistic approaches the normal distribution and the order statistics $X_{(r)}$ and $Y_{(s)}$ are independent here, the distribution of their difference $Y_{(s)} - X_{(r)}$ approaches the normal distribution with mean and variance
$$E(Y_{(s)}) - E(X_{(r)}) \quad \text{and} \quad \mathrm{var}(Y_{(s)}) + \mathrm{var}(X_{(r)})$$
Given the specified distribution function and the results in Chapter 2, we can approximate these quantities by
$$E(X_{(r)}) = F_X^{-1}\!\left(\frac{r}{m+1}\right) \qquad E(Y_{(s)}) = F_Y^{-1}\!\left(\frac{s}{n+1}\right)$$
$$\mathrm{var}(X_{(r)}) = \frac{r(m-r+1)}{(m+1)^2(m+2)} \left[ f_X\!\left( F_X^{-1}\!\left( \frac{r}{m+1} \right) \right) \right]^{-2}$$
$$\mathrm{var}(Y_{(s)}) = \frac{s(n-s+1)}{(n+1)^2(n+2)} \left[ f_Y\!\left( F_Y^{-1}\!\left( \frac{s}{n+1} \right) \right) \right]^{-2}$$
and an approximation to the power function can be found using normal probability tables. It is clear that computing the exact or even the asymptotic power of the median test is computationally involved. An easier alternative approach might be to use computer simulations, as was outlined for the sign and the signed-rank test in Chapter 5. We leave the details to the reader. The asymptotic efficiency of the median test relative to Student's t test for normal populations is $2/\pi = 0.637$ (see Chapter 13). As a test for location, this is relatively poor performance. The Mann-Whitney test, discussed in Section 6.6, has greater efficiency for normal populations.

6.5 THE CONTROL MEDIAN TEST
The median test, based on the number of X observations that precede the median of the combined samples, is a special case of a precedence test. A simple yet interesting alternative test is a second precedence test, based on the number of X (or Y) observations that precede the median of the Y (or X) sample. This is known as the control median test and is generally attributed to Mathisen (1943). The properties and various refinements of the test have been studied by Gart (1963), Gastwirth (1968), and Hettmansperger (1973), among others.
Without any loss of generality, suppose the Y sample is the control sample. The control median test is based on V, the number of X observations that precede the median of the Y observations. For simplicity let n = 2r + 1, so that the (r + 1)th-order statistic $Y_{(r+1)}$ is the median of the Y sample. Now $Y_{(r+1)}$ defines two nonoverlapping blocks $(-\infty, Y_{(r+1)}]$ and $(Y_{(r+1)}, \infty)$ in the sample, and the control median test is based on V, the number of X observations in the first block. It may be noted that V is equal to $mS_m(Y_{(r+1)}) = P_{(r+1)}$, called the placement of $Y_{(r+1)}$, the median of the Y sample, among the X observations.
As with the median test, the control median test can be used to test the null hypothesis $H_0: F_Y(x) = F_X(x)$ for all x against the general one-sided alternative that, for example, the Y's are stochastically larger than the X's. In this case, the number of X's preceding $Y_{(r+1)}$ should be large and thus large values of V provide evidence against the null hypothesis in favor of the alternative. If we assume the shift model $F_Y(x) = F_X(x - \theta)$ then the problem reduces to testing the null hypothesis $H_0: \theta = 0$ against the one-sided alternative hypothesis $H_1: \theta > 0$ and the appropriate rejection region consists of large values of V.
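Computing V is a one-line count once the control median is located. The sketch below is illustrative: the function name `control_median_statistic` and the two samples are hypothetical, and an odd control sample size n = 2r + 1 is assumed, as in the text.

```python
def control_median_statistic(x, y):
    """V = number of X observations that precede the median of the Y (control) sample."""
    if len(y) % 2 == 0:
        raise ValueError("this sketch assumes an odd control sample size n = 2r + 1")
    r = len(y) // 2
    y_median = sorted(y)[r]            # Y_(r+1), the (r + 1)th-order statistic
    return sum(xi < y_median for xi in x)


if __name__ == "__main__":
    # Hypothetical samples, for illustration only
    x = [3, 7, 8, 12]
    y = [5, 9, 10, 11, 14]
    print(control_median_statistic(x, y))
```

Dividing V by m gives the empirical cdf of the X sample evaluated at the control median, which is the placement interpretation noted above.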
Since U is defined in (6.2) as a linear combination of these mn random variables, the mean and variance of U are
$$E(U) = \sum_{i=1}^{m} \sum_{j=1}^{n} E(D_{ij}) = mnp \tag{6.9}$$
$$\mathrm{var}(U) = \sum_{i=1}^{m} \sum_{j=1}^{n} \mathrm{var}(D_{ij}) + \sum_{i=1}^{m} \mathop{\sum\sum}_{1 \le j \ne k \le n} \mathrm{cov}(D_{ij}, D_{ik}) + \sum_{j=1}^{n} \mathop{\sum\sum}_{1 \le i \ne h \le m} \mathrm{cov}(D_{ij}, D_{hj}) + \mathop{\sum\sum}_{1 \le i \ne h \le m} \mathop{\sum\sum}_{1 \le j \ne k \le n} \mathrm{cov}(D_{ij}, D_{hk}) \tag{6.10}$$
Now substituting (6.5) and (6.6) in (6.10), this variance is
$$\mathrm{var}(U) = mnp(1-p) + mn(n-1)(p_1 - p^2) + nm(m-1)(p_2 - p^2) = mn[p - p^2(N-1) + (n-1)p_1 + (m-1)p_2] \tag{6.11}$$
Since $E(U/mn) = p$ and $\mathrm{var}(U/mn) \to 0$ as $m, n \to \infty$, U/mn is a consistent estimator of p. Based on the method described in Chapter 1, the Mann-Whitney test can be shown to be consistent in the following cases:

Alternative                                   Rejection region
$p < 0.5$     $F_Y(x) \le F_X(x)$             $U - mn/2 < k_1$
$p > 0.5$     $F_Y(x) \ge F_X(x)$             $U - mn/2 > k_2$
$p \ne 0.5$   $F_Y(x) \ne F_X(x)$             $|U - mn/2| > k_3$          (6.12)
In order to determine the size $\alpha$ critical regions of the Mann-Whitney test, we must now find the null probability distribution of U. Under $H_0$, each of the $\binom{m+n}{m}$ arrangements of the random variables into a combined sequence occurs with equal probability, so that
$$f_U(u) = P(U = u) = \frac{r_{m,n}(u)}{\binom{m+n}{m}} \tag{6.13}$$
where $r_{m,n}(u)$ is the number of distinguishable arrangements of the m X and n Y random variables such that in each sequence the number
This recursive relation holds for all u = 0, 1, 2, ..., mn and all integer-valued m and n if the following initial and boundary conditions are defined for all i = 1, 2, ..., m and j = 1, 2, ..., n.
$$r_{i,j}(u) = 0 \quad \text{for all } u < 0$$
$$r_{i,0}(0) = 1 \qquad r_{0,i}(0) = 1$$
$$r_{i,0}(u) = 0 \quad \text{for all } u \ne 0$$
$$r_{0,i}(u) = 0 \quad \text{for all } u \ne 0$$
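These boundary conditions are enough to tabulate the null distribution of U by recursion. The statement of the recursive relation itself is not legible in this copy, so the sketch below assumes the standard form $r_{m,n}(u) = r_{m,n-1}(u) + r_{m-1,n}(u - n)$, obtained by conditioning on whether the largest pooled observation is a Y or an X, with U taken as the number of times a Y precedes an X. The function names are illustrative.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def r(m, n, u):
    """Number of arrangements of m X's and n Y's with U = u (assumed recursion)."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0     # boundary conditions from the text
    # largest observation is a Y (U unchanged) or an X (preceded by all n Y's)
    return r(m, n - 1, u) + r(m - 1, n, u - n)


def mann_whitney_cdf(m, n, u):
    """P(U <= u | H0) from (6.13)."""
    return Fraction(sum(r(m, n, k) for k in range(u + 1)), comb(m + n, m))


if __name__ == "__main__":
    m, n = 3, 5
    print([r(m, n, u) for u in range(m * n + 1)])
    print(mann_whitney_cdf(m, n, 1))
```

Memoization keeps the recursion cheap for the small sample sizes where exact tables are used.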
If the sample with fewer observations is always labeled the X sample, tables are needed only for $m \le n$ and left-tail critical points. Such tables are widely available, for example in Auble (1953) or Mann and Whitney (1947). When m and n are too large for the existing tables, the asymptotic probability distribution can be used. Since U is the sum of identically distributed (though dependent) random variables, a generalization of the central-limit theorem allows us to conclude that the null distribution of the standardized U approaches the standard normal as $m, n \to \infty$ in such a way that m/n remains constant (Mann and Whitney, 1947). To make use of this approximation, the mean and variance of U under the null hypothesis must be determined. When $F_Y(x) = F_X(x)$, the integrals in (6.7) and (6.8) are evaluated as $p_1 = p_2 = 1/3$. Substituting these results in (6.9) and (6.11) along with the value p = 1/2 from (6.4) gives
$$E(U \mid H_0) = \frac{mn}{2} \qquad \mathrm{var}(U \mid H_0) = \frac{mn(N+1)}{12} \tag{6.15}$$
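The reduction of (6.11) to (6.15) is easy to confirm with exact arithmetic. The following sketch evaluates both forms of (6.11) at the null values p = 1/2 and $p_1 = p_2 = 1/3$ for a few arbitrarily chosen sample sizes; the function names are illustrative.

```python
from fractions import Fraction


def var_u_expanded(m, n, p, p1, p2):
    """First form of (6.11)."""
    return m * n * p * (1 - p) + m * n * (n - 1) * (p1 - p * p) + n * m * (m - 1) * (p2 - p * p)


def var_u_collected(m, n, p, p1, p2):
    """Second form of (6.11), with N = m + n."""
    N = m + n
    return m * n * (p - p * p * (N - 1) + (n - 1) * p1 + (m - 1) * p2)


if __name__ == "__main__":
    half, third = Fraction(1, 2), Fraction(1, 3)
    for m, n in [(3, 5), (6, 6), (10, 4)]:   # arbitrary sample sizes
        null_variance = Fraction(m * n * (m + n + 1), 12)
        print(m, n,
              var_u_expanded(m, n, half, third, third),
              var_u_collected(m, n, half, third, third),
              null_variance)
```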
The large-sample test statistic then is
$$Z = \frac{U - mn/2}{\sqrt{mn(N+1)/12}}$$
whose distribution is approximately standard normal. This approximation has been found reasonably accurate for equal sample sizes as small as 6. Since U can assume only integer values, a continuity correction of 0.5 may be used.

THE PROBLEM OF TIES
The definition of U in (6.2) was adopted for presentation here because most tables of critical values are designed for use in the way described above. Since $D_{ij}$ is not defined for $X_i = Y_j$, this expression does not allow for the possibility of ties across samples. If ties occur within one
Table 6.2 Confidence-interval calculations

y_j - 1     y_j - 6     y_j - 7
    1          -4          -5
    3          -2          -3
    8           3           2
    9           4           3
   11           6           5
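The entries of Table 6.2 are the mn differences $y_j - x_i$ for the samples X: 1, 6, 7 and Y: 2, 4, 9, 10, 12 of this example, and the interval endpoints are order statistics of those differences. A minimal sketch, assuming k = 1 (so the endpoints are $D_{(2)}$ and $D_{(14)}$) and using illustrative variable names:

```python
x = [1, 6, 7]
y = [2, 4, 9, 10, 12]

# All mn = 15 differences y_j - x_i, ordered from smallest to largest
differences = sorted(yj - xi for xi in x for yj in y)

k = 1                             # rejection region U <= k
lower = differences[k]            # D_(k+1), the (k + 1)th smallest difference
upper = differences[-(k + 1)]     # D_(mn-k), the (k + 1)th largest difference

if __name__ == "__main__":
    print(differences)
    print(lower, upper)
```

Sorting all mn differences is adequate for small samples; the graphical method described in the text reaches the same endpoints without listing every difference.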
Suppose that the sample data are X: 1, 6, 7; Y: 2, 4, 9, 10, 12. In order to find $D_{(2)}$ and $D_{(14)}$ systematically, we first order the x and y data separately, then subtract from each y, starting with the smallest y, the successive values of x as shown in Table 6.2, and order the differences. The interval here is $-4 < \theta < 9$ with an exact confidence coefficient of 0.928.
A straightforward graphical approach could be used to simplify the procedure of constructing intervals here. Each of the m + n sample observations is plotted on a graph, the X observations on the abscissa and Y on the ordinate. Then the mn pairings of observations can be easily indicated by dots at all possible intersections. The line $y - x = \theta$ with slope 1 for any number $\theta$ divides the pairings into two groups: those on the left and above have $y - x > \theta$, and those on the right and below have $y - x < \theta$. Thus if the rejection region for a size $\alpha$ test is $U \le k$, two lines with slope 1 such that k dots lie on each side of the included band will determine the appropriate values of $\theta$. If the two lines are drawn through the (k + 1)st dots from the upper left and lower right, respectively, the values at which these lines cross the vertical axis (that is, at x = 0) determine the confidence-interval endpoints. In practice, it is often convenient to add or subtract an arbitrary constant from each observation before the pairs are plotted, the number chosen so that all observations are positive and the smallest is close to zero. This does not change the resulting interval for $\theta$, since the parameter $\theta$ is invariant under a change in location in both the X and Y populations. This method is illustrated in Fig. 6.1 for the above example where k = 1 for $\alpha = 0.072$.

SAMPLE SIZE DETERMINATION
Fig. 6.1 Graphical determination of endpoints.

If we are in the process of designing an experiment and specify the size of the test as $\alpha$ and the power of the test as $1 - \beta$, we can determine the sample size required to detect a difference between the populations measured by $p = P(Y > X)$. Noether (1987) showed that an approximate sample size for a one-sided test based on the Mann-Whitney statistic is
$$N = \frac{(z_\alpha + z_\beta)^2}{12c(1-c)(p - 1/2)^2} \tag{6.18}$$
where $c = m/N$ and $z_\alpha$, $z_\beta$ are the upper $\alpha$ and $\beta$ quantiles, respectively, of the standard normal distribution. The corresponding formula for a two-sided test is found by replacing $\alpha$ by $\alpha/2$ in (6.18). Verification of this is left to the reader. These formulas are based on a normal approximation to the power of the Mann-Whitney test "near" the null hypothesis and can be calculated easily. Note that if we take c = 0.5, that is when m = n, the
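Formula (6.18) is simple to evaluate numerically. The sketch below is an illustration under stated assumptions: the function name `noether_sample_size` is hypothetical, the design values p = 0.7, $\alpha$ = 0.05 and $\beta$ = 0.10 are made up for the example, and the result is rounded up to the next integer as a practical convention that the text does not prescribe.

```python
from math import ceil
from statistics import NormalDist


def noether_sample_size(p, alpha, beta, c=0.5, two_sided=False):
    """Approximate total sample size N from (6.18); c = m/N is the X-sample fraction."""
    if two_sided:
        alpha = alpha / 2                         # replace alpha by alpha/2
    z_alpha = NormalDist().inv_cdf(1 - alpha)     # upper alpha quantile
    z_beta = NormalDist().inv_cdf(1 - beta)       # upper beta quantile
    return (z_alpha + z_beta) ** 2 / (12 * c * (1 - c) * (p - 0.5) ** 2)


if __name__ == "__main__":
    # Hypothetical design values, for illustration only
    n_total = noether_sample_size(p=0.7, alpha=0.05, beta=0.10)
    print(n_total, ceil(n_total))
```

Because c(1 - c) is largest at c = 0.5, an unbalanced design with the same p, $\alpha$, and $\beta$ requires a larger total N under this approximation.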
interval procedure for estimating the difference in the medians. But it is not very powerful compared to other nonparametric tests for location. Conover, Wehmanen, and Ramsey (1978) examined the power of eight nonparametric tests, including the median test, compared to the locally most powerful (LMP) linear rank test when the distribution is exponential for small sample sizes. Even though the median test is asymptotically equivalent to the LMP test, it performed rather poorly. Freidlin and Gastwirth (2000) suggest that the median test "be retired from routine use" because their simulated power comparisons showed that other tests for location are more powerful for most distributions. Even the Kolmogorov-Smirnov two-sample test was more powerful for most of the cases they studied. Gibbons (1964) showed the poor performance of the median test with exact power calculations for small sample sizes. Further, the hand calculations for an exact median test based on the hypergeometric distribution are quite tedious even for small sample sizes. The median test continues to be of theoretical interest, however, because it is valid under such weak conditions and has interesting theoretical properties.

The Mann-Whitney test is far preferable as a test of location for general use, as are the other rank tests for location to be covered later in Chapter 8. The Mann-Whitney test also has a corresponding procedure for confidence interval estimation of the difference in population medians. And we can estimate the sample size needed to carry out a test at level $\alpha$ to detect a stated difference in locations with power $1 - \beta$.

PROBLEMS

6.1. Use the graphical method of Hodges to find $P(D^+_{m,n} \ge d)$, where d is the observed value of $D^+_{m,n} = \max_x [S_m(x) - S_n(x)]$ in the arrangement xyyxyx.

6.2. For the median-test statistic derive the complete null distribution of U for $m = 6$, $n = 7$, and set up one- and two-sided critical regions when $\alpha = 0.01$, 0.05, and 0.10.

6.3. Find the large-sample approximation to the power function of a two-sided median test for $m = 6$, $n = 7$, $\alpha = 0.10$, when $F_X$ is the standard normal distribution.

6.4. Use the recursion relation for the Mann-Whitney test statistic given in (6.14) to generate the complete null probability distribution of U for all $m + n \le 4$.

6.5. Verify the expressions given in (6.15) for the moments of U under $H_0$.

6.6. Answer parts (a) to (c) using (i) the median-test procedure and (ii) the Mann-Whitney test procedure (use tables) for the following two independent random samples drawn from continuous populations which have the same form but possibly a difference of $\theta$ in their locations:
N=2 .
Theorem 3.6
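Proof The appropriate conjugate $Z'$ has components $Z'_i = Z_{i+N/2}$ for $i \le N/2$ and $Z'_i = Z_{i-N/2}$ for $i > N/2$. Then

$$
\begin{aligned}
T_N(Z) + T_N(Z') &= \sum_{i=1}^{N/2} iZ_i + \sum_{i=N/2+1}^{N} (N - i + 1)Z_i + \sum_{i=1}^{N/2} iZ_{N/2+i} + \sum_{i=N/2+1}^{N} (N - i + 1)Z_{i-N/2} \\
&= \sum_{i=1}^{N/2} iZ_i + \sum_{i=N/2+1}^{N} (N - i + 1)Z_i + \sum_{j=N/2+1}^{N} \left(j - \frac{N}{2}\right) Z_j + \sum_{j=1}^{N/2} \left(\frac{N}{2} - j + 1\right) Z_j \\
&= \sum_{i=1}^{N/2} \left(\frac{N}{2} + 1\right) Z_i + \sum_{i=N/2+1}^{N} \left(\frac{N}{2} + 1\right) Z_i \\
&= m\left(\frac{N}{2} + 1\right) = 2\mu
\end{aligned}
$$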
In determining the frequency $t(k)$ for any value k which is assumed by a linear-rank-test statistic, the number of calculations required may be reduced considerably by the following properties of $T_N(Z)$, which are easily verified.

Theorem 3.7

Property 1: Let

$$T = \sum_{i=1}^{N} a_i Z_i \quad \text{and} \quad T' = \sum_{i=1}^{N} a_i Z_{N-i+1}$$

Then

$$T = T' \quad \text{if } a_i = a_{N-i+1} \text{ for } i = 1, 2, \ldots, N$$

Property 2: Let

$$T = \sum_{i=1}^{N} a_i Z_i \quad \text{and} \quad T' = \sum_{i=1}^{N} a_i (1 - Z_i)$$

Then

$$T + T' = \sum_{i=1}^{N} a_i$$

Property 3: Let

$$T = \sum_{i=1}^{N} a_i Z_i \quad \text{and} \quad T' = \sum_{i=1}^{N} a_i (1 - Z_{N-i+1})$$

Then

$$T + T' = \sum_{i=1}^{N} a_i \quad \text{if } a_i = a_{N-i+1} \text{ for } i = 1, 2, \ldots, N$$
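The three properties of Theorem 3.7 can be checked by brute force for small N. The sketch below enumerates every indicator vector Z with exactly m ones; the function names and the example weights are illustrative assumptions, and integer weights are used so that the comparisons are exact.

```python
from itertools import combinations


def linear_rank_stat(a, z):
    """T = sum of a_i * Z_i for weights a and indicator vector z."""
    return sum(ai * zi for ai, zi in zip(a, z))


def check_properties(a, m):
    """Check Properties 1 to 3 over all indicator vectors with m ones.

    Property 2 is checked for any weights; Properties 1 and 3 are checked
    only when the weights satisfy a_i = a_{N-i+1}.
    """
    N = len(a)
    symmetric = all(a[i] == a[N - 1 - i] for i in range(N))
    total = sum(a)
    for ones in combinations(range(N), m):
        z = [1 if i in ones else 0 for i in range(N)]
        z_rev = z[::-1]                       # component i holds Z_{N-i+1}
        z_comp = [1 - zi for zi in z]         # component i holds 1 - Z_i
        z_rev_comp = [1 - zi for zi in z_rev]
        t = linear_rank_stat(a, z)
        if t + linear_rank_stat(a, z_comp) != total:          # Property 2
            return False
        if symmetric:
            if t != linear_rank_stat(a, z_rev):               # Property 1
                return False
            if t + linear_rank_stat(a, z_rev_comp) != total:  # Property 3
                return False
    return True


if __name__ == "__main__":
    print(check_properties([1, 2, 3, 3, 2, 1], m=2))  # symmetric weights
    print(check_properties([1, 2, 3, 4, 5], m=2))     # only Property 2 applies
```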
For large samples, that is, $m \to \infty$ and $n \to \infty$ in such a way that $m/n$ remains constant, an approximation exists which is applicable to the distribution of almost all linear rank statistics. Since $T_N$ is a linear combination of the $Z_i$, which are identically distributed (though dependent) random variables, a generalization of the central-limit theorem allows us to conclude that the probability distribution of a standardized linear rank statistic $[T_N - E(T_N)]/\sigma(T_N)$ approaches the standard normal probability distribution subject to certain regularity conditions.

The foregoing properties of a linear rank statistic hold only in the hypothesized case of identical populations. Chernoff and Savage (1958) have proved that the asymptotic normality property is valid also in the nonnull case, subject to certain regularity conditions relating mainly to the smoothness and size of the weights. The expressions for the mean and variance will be given here, since they are also useful in investigating consistency and efficiency properties of most two-sample linear rank statistics.

A key feature in the Chernoff-Savage theory is that a linear rank statistic can be represented in the form of a Stieltjes integral. Thus, if the weights for a linear rank statistic are functions of the ranks, an equivalent representation of $T_N = \sum_{i=1}^{N} a_i Z_i$ is

$$T_N = m \int_{-\infty}^{\infty} J_N[H_N(x)] \, dS_m(x)$$

where the notation is defined as follows:

1. $S_m(x)$ and $S_n(x)$ are the empirical distribution functions of the X and Y samples, respectively.
2. $m/N = \lambda_N$, $0 < \lambda_N < 1$.
3. $H_N(x) = \lambda_N S_m(x) + (1 - \lambda_N) S_n(x)$, so that $H_N(x)$ is the proportion of observations from either sample which do not exceed the value x, or the empirical distribution function of the combined sample.
4. $J_N(i/N) = a_i$.

This Stieltjes integral form is given here because it appears frequently in the journal literature and is useful for proving theoretical properties. Since the following theorems are given here without proof anyway, the student not familiar with Stieltjes integrals can consider the following equivalent representation:

$$T'_N = m \sum_{\text{over all } x \text{ such that } p(x) > 0} J_N[H_N(x)] \, p(x)$$

where

$$p(x) = \begin{cases} 1/m & \text{if } x \text{ is the observed value of an X random variable} \\ 0 & \text{otherwise} \end{cases}$$

For example, in the simplest case where $a_i = i/N$, $J_N[H_N(x)] = H_N(x)$ and

$$
\begin{aligned}
T_N &= m \int_{-\infty}^{\infty} H_N(x) \, dS_m(x) = \frac{m}{N} \int_{-\infty}^{\infty} [mS_m(x) + nS_n(x)] \, dS_m(x) \\
&= \frac{m}{N} \int_{-\infty}^{\infty} (\text{number of observations in the combined sample} \le x) \, (1/m \text{ if } x \text{ is the value of an X random variable and 0 otherwise}) \\
&= \frac{1}{N} \sum_{i=1}^{N} iZ_i
\end{aligned}
$$
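The equivalence of the two representations in this simplest case can be confirmed numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, no ties are allowed, and the data reuse the small X and Y samples from the confidence-interval example of Chapter 6. Exact fractions avoid rounding issues.

```python
from fractions import Fraction


def stieltjes_form(x_sample, y_sample, J=lambda u: u):
    """T_N = m * sum over observed X values of J_N[H_N(x)] * p(x), p(x) = 1/m."""
    m, n = len(x_sample), len(y_sample)
    N = m + n
    combined = list(x_sample) + list(y_sample)

    def H_N(v):
        # empirical distribution function of the combined sample
        return Fraction(sum(1 for w in combined if w <= v), N)

    return m * sum(J(H_N(x)) * Fraction(1, m) for x in x_sample)


def rank_sum_form(x_sample, y_sample):
    """T_N = (1/N) * sum of i * Z_i, with Z_i = 1 when rank i belongs to X."""
    N = len(x_sample) + len(y_sample)
    tagged = sorted([(v, 1) for v in x_sample] + [(v, 0) for v in y_sample])
    return Fraction(sum(i * z for i, (_, z) in enumerate(tagged, start=1)), N)


if __name__ == "__main__":
    x = [1, 6, 7]
    y = [2, 4, 9, 10, 12]
    print(stieltjes_form(x, y), rank_sum_form(x, y))
```

Here the X observations hold ranks 1, 4, and 5 in the combined sample, so both forms give $(1 + 4 + 5)/8 = 5/4$.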
Now when the X and Y samples are drawn from the continuous populations $F_X$ and $F_Y$, respectively, we define the combined population cdf as

$$H(x) = \lambda_N F_X(x) + (1 - \lambda_N) F_Y(x)$$

The Chernoff and Savage theorem stated below is subject to certain regularity conditions not explicitly stated here, but given in Chernoff and Savage (1958).

Theorem 3.8 Subject to certain regularity conditions, the most important of which are that $J(H) = \lim_{N \to \infty} J_N(H)$,

$$|J^{(r)}(H)| = |d^r J(H)/dH^r| \le K |H(1 - H)|^{-r-1/2+\delta}$$

for $r = 0, 1, 2$ and some $\delta > 0$ and K any constant which does not depend on $m$, $n$, $N$, $F_X$, or $F_Y$; then for $\lambda_N$ fixed,

$$\lim_{N \to \infty} P\left(\frac{T_N/m - \mu_N}{\sigma_N} \le t\right) = \Phi(t)$$

where

$$\mu_N = \int_{-\infty}^{\infty} J[H(x)] f_X(x) \, dx$$

$$N\sigma_N^2 = 2\,\frac{1 - \lambda_N}{\lambda_N} \Bigl\{ \lambda_N \iint_{-\infty < x < y < \infty} F_Y(x)[1 - F_Y(y)] J'[H(x)] J'[H(y)] \cdots$$
TESTS OF THE EQUALITY OF k INDEPENDENT SAMPLES
of three groups. The ﬁrst group of speakers were not allowed to use any audiovisual aids, the second group of speakers were allowed to use a regular overhead projector and a microphone, and the third group of speakers could use a 35mm color slide projector together with a microphone and a tape recorder (which played prerecorded audio messages). After a certain period of time, each of the speakers made a presentation in an auditorium, on a certain issue, in front of a live audience and a selected panel of judges. The contents of their presentations were virtually the same, so that any differences in effectiveness could be attributed only to the audiovisual aids used by the speakers. The judges scored each presentation on a scale of 30 to 100, depending on their own judgment and the reaction of the audience, with larger scores denoting greater effectiveness; the scores are given below. It seems reasonable to expect that the use of audiovisual aids would have some beneﬁcial effect and hence the median score for group 1 will be the lowest, that for group 3 the highest, and the median score for group 2 somewhere in between.
Group 1: 74, 58, 68, 60, 69
Group 2: 70, 72, 75, 80, 71
Group 3: 73, 78, 88, 85, 76

Solution The hypotheses to be tested are $H_0: \theta_1 = \theta_2 = \theta_3$, where $\theta_i$ is the median of the ith group, against $H_1: \theta_1 \le \theta_2 \le \theta_3$, where at least one of the inequalities is strict. Here $k = 3$ and in order to apply the JT test the three two-sample Mann-Whitney statistics $U_{12}$, $U_{13}$, and $U_{23}$ are needed. We find $U_{12} = 22$, $U_{13} = 24$, and $U_{23} = 21$ and hence $B = 67$. The exact P value for the JT test from Table R of the Appendix is $P(B \ge 67 \mid H_0) < 0.0044$. Thus $H_0$ is rejected in favor of $H_1$ at any commonly used value of $\alpha$ and we conclude that audiovisual aids do help in making a presentation more effective, and in fact, when all other factors are equal, there is evidence that the more audiovisual aids are used, the more effective is the presentation. Also, we have $E_0(B) = 37.5$ and $\operatorname{var}_0(B) = 89.5833$, so that using the normal approximation, $z = 3.1168$ (without a continuity correction) and the approximate P value from Table A of the Appendix is $1 - \Phi(3.12) = 0.0009$; the approximate JT test leads to the same conclusion. The SAS and STATXACT computer solutions shown below agree exactly with ours.
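The hand calculation in this example can be reproduced with a short script. This is a sketch under stated assumptions rather than the SAS or STATXACT procedure: the function names are illustrative, and the null mean and variance use the standard no-ties expressions $E_0(B) = (N^2 - \sum n_i^2)/4$ and $\operatorname{var}_0(B) = [N^2(2N + 3) - \sum n_i^2(2n_i + 3)]/72$, which are assumed here because the formulas themselves do not appear in this passage.

```python
import math
from itertools import combinations
from statistics import NormalDist


def mann_whitney_count(x, y):
    """Number of (x_i, y_j) pairs with x_i < y_j (no ties assumed)."""
    return sum(1 for xi in x for yj in y if xi < yj)


def jonckheere_terpstra(groups):
    """Return (B, null mean, null variance, z, one-sided normal P value)."""
    k = len(groups)
    B = sum(mann_whitney_count(groups[i], groups[j])
            for i, j in combinations(range(k), 2))
    sizes = [len(g) for g in groups]
    N = sum(sizes)
    mean = (N ** 2 - sum(n ** 2 for n in sizes)) / 4
    var = (N ** 2 * (2 * N + 3) - sum(n ** 2 * (2 * n + 3) for n in sizes)) / 72
    z = (B - mean) / math.sqrt(var)          # no continuity correction
    p_value = 1 - NormalDist().cdf(z)        # upper tail: large B supports H1
    return B, mean, var, z, p_value


if __name__ == "__main__":
    group1 = [74, 58, 68, 60, 69]
    group2 = [70, 72, 75, 80, 71]
    group3 = [73, 78, 88, 85, 76]
    print(jonckheere_terpstra([group1, group2, group3]))
```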
test and reject $H_0$ in favor of $H_1$ if W is small. This test has been proposed and studied by Chakraborti and Desu (1988b) and will be referred to as the CD test. The exact (unconditional) distribution of the sum statistic W is obtained by noting that the exact distribution of W is simply the expectation of the joint distribution of the $W_i$'s, $i = 2, 3, \ldots, k$, with respect to T and that conditional on T, the $W_i$'s are independent binomial random variables with parameters $n_i$ and $F_i(T)$. This yields, for $w = 0, 1, \ldots, (N - n_1)$,

$$P[W = w] = \sum \int_{-\infty}^{\infty} \prod_{i=2}^{k} \binom{n_i}{a_i} [F_i(t)]^{a_i} [1 - F_i(t)]^{n_i - a_i} \, dF_T(t) \tag{7.3}$$

where the sum is over all $a_i = 0, 1, \ldots, n_i$; $i = 2, 3, \ldots, k$, such that $a_2 + a_3 + \cdots + a_k = w$. Under the null hypothesis the integral in (7.3) reduces to a complete beta integral and the exact null distribution of W can be enumerated. However, a more convenient closed-form expression for the null distribution of W may be obtained directly by arguing as follows. The statistic W is the total number of observations in treatment groups 2 through k that precede T. Hence the null distribution of W is the same as that of the two-sample precedence statistic with sample sizes $n_1$ and $N - n_1$ and this can be obtained directly from the results in Problems 2.28c and 6.10a. Thus we have, when T is the ith order statistic in the control sample,

$$P[W = w \mid H_0] = \frac{\binom{i + w - 1}{w} \binom{N - i - w}{N - n_1 - w}}{\binom{N}{N - n_1}} \qquad w = 0, 1, \ldots, N - n_1; \; i = 1, 2, \ldots, n_1$$

or equivalently,

$$P[W = w \mid H_0] = \frac{n_1}{N} \, \frac{\binom{n_1 - 1}{i - 1} \binom{N - n_1}{w}}{\binom{N - 1}{w + i - 1}} \tag{7.4}$$

Also, using the result in Problem 2.28d we have

$$E_0(W) = (N - n_1) \frac{i}{n_1 + 1} \tag{7.5}$$
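As a sanity check on (7.4) and (7.5), the two forms of the null distribution can be evaluated exactly for a small design. The sketch below is illustrative only; the function names and the choice $n_1 = 3$, $N = 8$, $i = 2$ are assumptions made for the example.

```python
from fractions import Fraction
from math import comb


def cd_null_pmf(w, i, n1, N):
    """First form of (7.4): P[W = w | H0] when T is the ith control order statistic."""
    return Fraction(comb(i + w - 1, w) * comb(N - i - w, N - n1 - w),
                    comb(N, N - n1))


def cd_null_pmf_alt(w, i, n1, N):
    """Second (equivalent) form of (7.4)."""
    return Fraction(n1, N) * Fraction(comb(n1 - 1, i - 1) * comb(N - n1, w),
                                      comb(N - 1, w + i - 1))


def cd_null_mean(i, n1, N):
    """E0(W) computed directly from the probability function."""
    return sum(w * cd_null_pmf(w, i, n1, N) for w in range(N - n1 + 1))


if __name__ == "__main__":
    n1, N, i = 3, 8, 2
    pmf = [cd_null_pmf(w, i, n1, N) for w in range(N - n1 + 1)]
    print(pmf, sum(pmf))
    print(cd_null_mean(i, n1, N), Fraction((N - n1) * i, n1 + 1))  # compare with (7.5)
```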
Table 3

Region          Insults   Persuades   True picture
Northwest       3.69      4.48        3.69
Midwest         4.22      3.75        3.25
Northeast       3.63      4.54        4.09
Southwest       4.16      4.35        3.61
South Central   3.96      4.73        3.41
Southeast       3.78      4.49        3.64
10.16. Prior to the Alabama-Auburn football game, 80 Alabama alumni, 75 Auburn alumni, and 45 residents of Tuscaloosa who are not alumni of either are asked who they think will win the game. The responses are as follows:

              Alabama   Auburn   Tuscaloosa
Alabama win   55        15       30
Auburn win    25        60       15

Do the three groups have the same probability of thinking Alabama will win?

10.17. Random samples of 100 insurance company executives, 100 transportation company executives, and 100 media company executives were classified according to highest level of formal education using the code 10 = some college, 20 = bachelor's degree, 30 = master's degree, 40 = more than master's. The results are shown below. Determine whether median education level is the same for the three groups at $\alpha = 0.05$.

Education   Insurance   Transportation   Media
10          19          31               33
20          20          37               34
30          36          20               21
40          25          12               12
Total       100         100              100
10.18. Four different experimental methods of treating schizophrenia—(1) weekly shock treatments, (2) weekly treatments of carbon dioxide inhalations, (3) biweekly shock treatment alternated with biweekly carbon dioxide inhalations, and (4) tranquilizer drug treatment—are compared by assigning a group of schizophrenic patients randomly into four treatment groups. The data below are the number of patients who did
11 Measures of Association for Bivariate Samples
11.1 INTRODUCTION: DEFINITION OF MEASURES OF ASSOCIATION IN A BIVARIATE POPULATION
In Chapter 5 we saw that the ordinary sign test and the Wilcoxon signed-rank test procedures, although discussed in terms of inferences in a single-sample problem, could be applied to paired-sample data by basing the statistical analysis on the differences between the pairs of observations. The inferences then must be concerned with the population of differences as opposed to some general relationship between the two dependent random variables. One parameter of this population of differences, the variance, does contain information concerning their relationship, since

$$\operatorname{var}(X - Y) = \operatorname{var}(X) + \operatorname{var}(Y) - 2\operatorname{cov}(X, Y)$$

It is this covariance factor and a similar measure with which we shall be concerned in this chapter.
In general, if X and Y are two random variables with a bivariate probability distribution, their covariance, in a certain sense, reflects the direction and amount of association or correspondence between the variables. The covariance is large and positive if there is a high probability that large (small) values of X are associated with large (small) values of Y. On the other hand, if the correspondence is inverse so that large (small) values of X generally occur in conjunction with small (large) values of Y, their covariance is large and negative. This comparative type of association is referred to as concordance or agreement. The covariance parameter as a measure of association is difficult to interpret because its value depends on the orders of magnitude and units of the random variables concerned. A nonabsolute or relative measure of association circumvents this difficulty. The Pearson product-moment correlation coefficient, defined as

$$\rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{[\operatorname{var}(X) \operatorname{var}(Y)]^{1/2}}$$

is a measure of the linear relationship between X and Y. This coefficient is invariant under changes of scale and location in X and Y, and in classical statistics this parameter is usually used as the relative measure of association in a bivariate distribution. The absolute value of the correlation coefficient does not exceed 1, and its sign is determined by the sign of the covariance. If X and Y are independent random variables, their correlation is zero, and therefore the magnitude of $\rho$ in some sense measures the degree of association. Although it is not true in general that a zero correlation implies independence, the bivariate normal distribution is a significant exception, and therefore in the normal-theory model $\rho$ is a good measure of association. For random variables from other bivariate populations, $\rho$ may not be such a good description of relationship since dependence may be reflected in a wide variety of types of relationships. One can only say in general that $\rho$ is a more descriptive measure of dependence than covariance because $\rho$ does not depend on the scales of X and Y.

If the main justification for the use of $\rho$ as a measure of association is that the bivariate normal is such an important distribution in classical statistics and zero correlation is equivalent to independence for that particular population, this reasoning has little significance in nonparametric statistics. Other population measures of association should be equally acceptable, but the approach to measuring relationships might be analogous, so that interpretations are simplified. Because $\rho$ is so widely known and accepted, any other measure would preferably emulate its properties. Suppose we define a "good" relative measure of association as one which satisfies the following criteria:

1. For any two independent pairs $(X_i, Y_i)$ and $(X_j, Y_j)$ of random variables which follow this bivariate distribution, the measure will equal $+1$ if the relationship is direct and perfect in the sense that
   $X_i < X_j$ whenever $Y_i < Y_j$ or $X_i > X_j$ whenever $Y_i > Y_j$
   This relation will be referred to as perfect concordance (agreement).
2. For any two independent pairs, the measure will equal $-1$ if the relationship is indirect and perfect in the sense that
   $X_i < X_j$ whenever $Y_i > Y_j$ or $X_i > X_j$ whenever $Y_i < Y_j$
   This relation will be referred to as perfect discordance (disagreement).
3. If neither criterion 1 nor criterion 2 is true for all pairs, the measure will lie between the two extremes $-1$ and $+1$. It is also desirable that, in some sense, increasing degrees of concordance are reflected by increasing positive values, and increasing degrees of discordance are reflected by increasing negative values.
4. The measure will equal zero if X and Y are independent.
5. The measure for X and Y will be the same as for Y and X, or $-X$ and $-Y$, or $-Y$ and $-X$.
6. The measure for $-X$ and Y or X and $-Y$ will be the negative of the measure for X and Y.
7. The measure will be invariant under all transformations of X and Y for which order of magnitude is preserved.
The parameter $\rho$ is well known to satisfy the first six of these criteria. It is a type of measure of concordance in the same sense that covariance measures the degree to which the two variables are associated in magnitude. However, although $\rho$ is invariant under positive linear transformations of the random variables, it is not invariant under all order-preserving transformations. This last criterion seems especially desirable in nonparametric statistics, as we have seen that in order to be distribution-free, inferences must usually be determined by relative magnitudes as opposed to absolute magnitudes of the variables under study. Since probabilities of events involving only inequality relations between random variables are invariant under all order-preserving transformations, a measure of association which is a function of the probabilities of concordance and discordance will satisfy the seventh criterion. Perfect direct and indirect association between X and Y are reflected by perfect concordance and perfect discordance, respectively, in the same spirit as $\rho$ measures a perfect direct and indirect linear relationship between the variables. Thus an appropriate combination of these probabilities will provide a measure of association which will satisfy all seven of these desirable criteria.

For any two independent pairs of random variables $(X_i, Y_i)$ and $(X_j, Y_j)$, we denote by $p_c$ and $p_d$ the probabilities of concordance and discordance, respectively.

$$
\begin{aligned}
p_c &= P\{[(X_i < X_j) \cap (Y_i < Y_j)] \cup [(X_i > X_j) \cap (Y_i > Y_j)]\} \\
&= P[(X_j - X_i)(Y_j - Y_i) > 0] \\
&= P[(X_i < X_j) \cap (Y_i < Y_j)] + P[(X_i > X_j) \cap (Y_i > Y_j)] \\
p_d &= P[(X_j - X_i)(Y_j - Y_i) < 0] \\
&= P[(X_i < X_j) \cap (Y_i > Y_j)] + P[(X_i > X_j) \cap (Y_i < Y_j)]
\end{aligned}
$$

Perfect association between X and Y is reflected by either perfect concordance or perfect discordance, and thus some combination of these probabilities should provide a measure of association. The Kendall coefficient $\tau$ is defined as the difference

$$\tau = p_c - p_d$$

and this measure of association satisfies our desirable criteria 1 to 7. If the marginal probability distributions of X and Y are continuous, so that the probability of ties $X_i = X_j$ or $Y_i = Y_j$ within groups is eliminated, we have

$$
\begin{aligned}
p_c &= \{P(Y_i < Y_j) - P[(X_i > X_j) \cap (Y_i < Y_j)]\} + \{P(Y_i > Y_j) - P[(X_i < X_j) \cap (Y_i > Y_j)]\} \\
&= P(Y_i < Y_j) + P(Y_i > Y_j) - p_d = 1 - p_d
\end{aligned}
$$

Thus in this case $\tau$ can also be expressed as

$$\tau = 2p_c - 1 = 1 - 2p_d$$

How does $\tau$ measure independence? If X and Y are independent and continuous random variables, $P(X_i < X_j) = P(X_i > X_j)$ and further the joint probabilities in $p_c$ or $p_d$ are the product of the individual probabilities. Using these relations, we can write

$$
\begin{aligned}
p_c &= P(X_i < X_j)P(Y_i < Y_j) + P(X_i > X_j)P(Y_i > Y_j) \\
&= P(X_i > X_j)P(Y_i < Y_j) + P(X_i < X_j)P(Y_i > Y_j) = p_d
\end{aligned}
$$

and thus $\tau = 0$ for independent continuous random variables. In general, the converse is not true, but this disadvantage is shared by $\rho$. For the bivariate normal population, however, $\tau = 0$ if and only if $\rho = 0$, that is, if and only if X and Y are independent. This fact follows from the relation

$$\tau = \frac{2}{\pi} \arcsin \rho$$

which can be derived as follows. Suppose that X and Y are bivariate normal with variances $\sigma_X^2$ and $\sigma_Y^2$ and correlation coefficient $\rho$. Then for any two independent pairs $(X_i, Y_i)$ and $(X_j, Y_j)$ from this population, the differences

$$U = \frac{X_i - X_j}{\sqrt{2}\,\sigma_X} \quad \text{and} \quad V = \frac{Y_i - Y_j}{\sqrt{2}\,\sigma_Y}$$

also have a bivariate normal distribution, with zero means, unit variances, and covariance equal to $\rho$. Thus $\rho(U, V) = \rho(X, Y)$. Since $p_c = P(UV > 0)$ we have

$$
\begin{aligned}
p_c &= \int_{-\infty}^{0} \int_{-\infty}^{0} \varphi(x, y) \, dx \, dy + \int_{0}^{\infty} \int_{0}^{\infty} \varphi(x, y) \, dx \, dy \\
&= 2 \int_{-\infty}^{0} \int_{-\infty}^{0} \varphi(x, y) \, dx \, dy = 2\Phi(0, 0)
\end{aligned}
$$

where $\varphi(x, y)$ and $\Phi(x, y)$ denote the density and cumulative distributions, respectively, of a standardized bivariate normal probability distribution. Since it can be shown that

$$\Phi(0, 0) = \frac{1}{4} + \frac{1}{2\pi} \arcsin \rho$$

we see that for the bivariate normal

$$p_c = \frac{1}{2} + \frac{1}{\pi} \arcsin \rho$$

and

$$\tau = \frac{2}{\pi} \arcsin \rho$$

In this chapter, the problem of point estimation of these two population measures of association, $\rho$ and $\tau$, will be considered. We shall find estimates which are distribution-free and discuss their individual properties and procedures for hypothesis testing, and the relationship between the two estimates will be determined. Another measure of association will be discussed briefly.

11.2 KENDALL'S TAU COEFFICIENT
In Section 11.1, Kendall's tau, a measure of association between random variables from any bivariate population, was defined as

$$\tau = p_c - p_d \tag{2.1}$$

where, for any two independent pairs of observations $(X_i, Y_i)$, $(X_j, Y_j)$ from the population,

$$p_c = P[(X_j - X_i)(Y_j - Y_i) > 0] \quad \text{and} \quad p_d = P[(X_j - X_i)(Y_j - Y_i) < 0] \tag{2.2}$$

In order to estimate the parameter $\tau$ from a random sample of n pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ drawn from this bivariate population, we must find point estimates of the probabilities $p_c$ and $p_d$. For each set of pairs $(X_i, Y_i)$, $(X_j, Y_j)$ of sample observations, define the indicator variables

$$A_{ij} = \operatorname{sgn}(X_j - X_i) \operatorname{sgn}(Y_j - Y_i) \tag{2.3}$$

where

$$\operatorname{sgn}(u) = \begin{cases} -1 & \text{if } u < 0 \\ 0 & \text{if } u = 0 \\ 1 & \text{if } u > 0 \end{cases}$$

Then the values assumed by $A_{ij}$ are

$$a_{ij} = \begin{cases} 1 & \text{if these pairs are concordant} \\ -1 & \text{if these pairs are discordant} \\ 0 & \text{if these pairs are neither concordant nor discordant because of a tie in either component} \end{cases}$$
The marginal probability distribution of these indicator variables is

$$f_{A_{ij}}(a_{ij}) = \begin{cases} p_c & \text{if } a_{ij} = 1 \\ p_d & \text{if } a_{ij} = -1 \\ 1 - p_c - p_d & \text{if } a_{ij} = 0 \end{cases} \tag{2.4}$$

and the expected value is

$$E(A_{ij}) = 1p_c + (-1)p_d = p_c - p_d = \tau \tag{2.5}$$
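Since obviously we have $a_{ij} = a_{ji}$ and $a_{ii} = 0$, there are only $\binom{n}{2}$ sets of pairs which need be considered. An unbiased estimator of $\tau$ is therefore provided by

$$T = \mathop{\sum\sum}_{1 \le i < j \le n} \frac{A_{ij}}{\binom{n}{2}} = 2 \mathop{\sum\sum}_{1 \le i < j \le n} \frac{A_{ij}}{n(n - 1)} \tag{2.6}$$

The variance of T depends on the variance of each $A_{ij}$ and on the covariances between terms. Two terms with no subscript in common are independent, so only pairs of terms sharing one subscript contribute covariances, and

$$n(n - 1) \operatorname{var}(T) = 2 \operatorname{var}(A_{ij}) + 4(n - 2) \operatorname{cov}(A_{ij}, A_{ik}) \tag{2.9}$$

$$\operatorname{var}(A_{ij}) = E(A_{ij}^2) - \tau^2 = p_c + p_d - (p_c - p_d)^2 \tag{2.10}$$

The covariance requires the joint distribution of $A_{ij}$ and $A_{ik}$, which can be written as

$$f_{A_{ij}, A_{ik}}(a_{ij}, a_{ik}) = \begin{cases} p_{cc} & \text{if } a_{ij} = a_{ik} = 1 \\ p_{dd} & \text{if } a_{ij} = a_{ik} = -1 \\ p_{cd} & \text{if } a_{ij} = 1, a_{ik} = -1 \text{ or } a_{ij} = -1, a_{ik} = 1 \\ 1 - p_{cc} - p_{dd} - 2p_{cd} & \text{if } a_{ij} = 0, a_{ik} = -1, 0, 1 \text{ or } a_{ij} = -1, 0, 1, a_{ik} = 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.11}$$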
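for all $i < j$, $i < k$, $j \ne k$, $i = 1, 2, \ldots, n$, and some $0 \le p_{cc}, p_{dd}, p_{cd} \le 1$. Thus we can evaluate

$$
\begin{aligned}
E(A_{ij} A_{ik}) &= 1^2 p_{cc} + (-1)^2 p_{dd} + 2(-1) p_{cd} \\
\operatorname{cov}(A_{ij}, A_{ik}) &= p_{cc} + p_{dd} - 2p_{cd} - (p_c - p_d)^2
\end{aligned} \tag{2.12}
$$

Substitution of (2.10) and (2.12) in (2.9) gives

$$n(n - 1) \operatorname{var}(T) = 2(p_c + p_d) + 4(n - 2)(p_{cc} + p_{dd} - 2p_{cd}) - 2(2n - 3)(p_c - p_d)^2 \tag{2.13}$$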
so that the variance of T is of order $1/n$ and therefore approaches zero as n approaches infinity.

The results obtained so far are completely general, applying to all random variables. If the marginal distributions of X and Y are continuous, $P(A_{ij} = 0) = 0$ and the resulting identities

$$p_c + p_d = 1 \quad \text{and} \quad p_{cc} + p_{dd} + 2p_{cd} = 1$$

allow us to simplify (2.13) to a function of, say, $p_c$ and $p_{cd}$ only:

$$
\begin{aligned}
n(n - 1) \operatorname{var}(T) &= 2 - 2(2n - 3)(2p_c - 1)^2 + 4(n - 2)(1 - 4p_{cd}) \\
&= 8(2n - 3) p_c (1 - p_c) - 16(n - 2) p_{cd}
\end{aligned} \tag{2.14}
$$

Since for X and Y continuous we also have

$$
\begin{aligned}
p_{cd} &= P(A_{ij} = 1 \cap A_{ik} = -1) \\
&= P(A_{ij} = 1) - P(A_{ij} = 1 \cap A_{ik} = 1) = p_c - p_{cc}
\end{aligned}
$$

another expression equivalent to (2.14) is

$$
\begin{aligned}
n(n - 1) \operatorname{var}(T) &= 8(2n - 3) p_c (1 - p_c) - 16(n - 2)(p_c - p_{cc}) \\
&= 8 p_c (1 - p_c) + 16(n - 2)(p_{cc} - p_c^2)
\end{aligned} \tag{2.15}
$$
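We have already interpreted $p_c$ as the probability that the pair $(X_i, Y_i)$ is concordant with $(X_j, Y_j)$. Since the parameter $p_{cc}$ is

$$
\begin{aligned}
p_{cc} &= P(A_{ij} = 1 \cap A_{ik} = 1) \\
&= P[(X_j - X_i)(Y_j - Y_i) > 0 \cap (X_k - X_i)(Y_k - Y_i) > 0]
\end{aligned} \tag{2.16}
$$

for all $i < j$, $i < k$, $j \ne k$, $i = 1, 2, \ldots, n$, we interpret $p_{cc}$ as the probability that the pair $(X_i, Y_i)$ is concordant with both $(X_j, Y_j)$ and $(X_k, Y_k)$.

Integral expressions can be obtained as follows for the probabilities $p_c$ and $p_{cc}$ for random variables X and Y from any continuous bivariate population $F_{X,Y}(x, y)$.

$$
\begin{aligned}
p_c &= P[(X_i < X_j) \cap (Y_i < Y_j)] + P[(X_i > X_j) \cap (Y_i > Y_j)] \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} P[(X_i < x_j) \cap (Y_i < y_j)] f_{X_j, Y_j}(x_j, y_j) \, dx_j \, dy_j \\
&\quad + \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} P[(X_j < x_i) \cap (Y_j < y_i)] f_{X_i, Y_i}(x_i, y_i) \, dx_i \, dy_i \\
&= 2 \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} F_{X,Y}(x, y) f_{X,Y}(x, y) \, dx \, dy
\end{aligned} \tag{2.17}
$$

$$
\begin{aligned}
p_{cc} &= P(\{[(X_i < X_j) \cap (Y_i < Y_j)] \cup [(X_i > X_j) \cap (Y_i > Y_j)]\} \\
&\qquad \cap \{[(X_i < X_k) \cap (Y_i < Y_k)] \cup [(X_i > X_k) \cap (Y_i > Y_k)]\}) \\
&= P[(A \cup B) \cap (C \cup D)] \\
&= P[(A \cap C) \cup (B \cap D) \cup (A \cap D) \cup (B \cap C)] \\
&= P(A \cap C) + P(B \cap D) + 2P(A \cap D) \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \{P[(X_j > x_i) \cap (Y_j > y_i) \cap (X_k > x_i) \cap (Y_k > y_i)] \\
&\qquad + P[(X_j < x_i) \cap (Y_j < y_i) \cap (X_k < x_i) \cap (Y_k < y_i)] \\
&\qquad + 2P[(X_j > x_i) \cap (Y_j > y_i) \cap (X_k < x_i) \cap (Y_k < y_i)]\} f_{X_i, Y_i}(x_i, y_i) \, dx_i \, dy_i \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (\{P[(X > x) \cap (Y > y)]\}^2 + \{P[(X < x) \cap (Y < y)]\}^2 \\
&\qquad + 2P[(X > x) \cap (Y > y)] P[(X < x) \cap (Y < y)]) f_{X,Y}(x, y) \, dx \, dy \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \{P[(X > x) \cap (Y > y)] + P[(X < x) \cap (Y < y)]\}^2 f_{X,Y}(x, y) \, dx \, dy \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} [1 - F_X(x) - F_Y(y) + 2F_{X,Y}(x, y)]^2 f_{X,Y}(x, y) \, dx \, dy
\end{aligned} \tag{2.18}
$$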
Although T as given in (2.6) is perhaps the simplest form for deriving theoretical properties, the coefficient can be written in a number of other ways. In terms of all $n^2$ pairs for which $A_{ij}$ is defined, (2.6) can be written as

$$T = \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{A_{ij}}{n(n - 1)} \tag{2.19}$$

Now we introduce the notation

$$U_{ij} = \operatorname{sgn}(X_j - X_i) \quad \text{and} \quad V_{ij} = \operatorname{sgn}(Y_j - Y_i)$$

so that $A_{ij} = U_{ij} V_{ij}$ for all i, j. Assuming that $X_i \ne X_j$ and $Y_i \ne Y_j$ for all $i \ne j$, we have

$$\sum_{i=1}^{n} \sum_{j=1}^{n} U_{ij}^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} V_{ij}^2 = n(n - 1)$$

and (2.19) can be written in a form resembling an ordinary sample correlation coefficient as

$$T = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} U_{ij} V_{ij}}{\left[\sum_{i=1}^{n} \sum_{j=1}^{n} U_{ij}^2 \sum_{i=1}^{n} \sum_{j=1}^{n} V_{ij}^2\right]^{1/2}} \tag{2.20}$$
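The forms (2.6) and (2.20) can be compared directly on a small data set. The following is a minimal sketch, with illustrative function names and a made-up sample of five pairs without ties; it is not a replacement for a library routine.

```python
import math
from itertools import combinations


def sgn(u):
    """Sign function used in (2.3): -1, 0, or 1."""
    return (u > 0) - (u < 0)


def kendall_T_pairs(x, y):
    """T from (2.6): sum of A_ij over i < j, divided by n choose 2."""
    n = len(x)
    total = sum(sgn(x[j] - x[i]) * sgn(y[j] - y[i])
                for i, j in combinations(range(n), 2))
    return total / math.comb(n, 2)


def kendall_T_correlation_form(x, y):
    """T from (2.20): the correlation-like ratio over all n^2 ordered pairs."""
    idx = range(len(x))
    num = sum(sgn(x[j] - x[i]) * sgn(y[j] - y[i]) for i in idx for j in idx)
    uu = sum(sgn(x[j] - x[i]) ** 2 for i in idx for j in idx)
    vv = sum(sgn(y[j] - y[i]) ** 2 for i in idx for j in idx)
    return num / math.sqrt(uu * vv)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]   # hypothetical data, already ordered by X
    y = [2, 1, 4, 3, 5]
    print(kendall_T_pairs(x, y), kendall_T_correlation_form(x, y))
```

For these data 8 of the 10 pairs are concordant and 2 are discordant, so both forms return 0.6.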
Kendall and Gibbons (1990) often use T in still another form, which arises by simply classifying sets of differences according to the resulting sign of $A_{ij}$. If C and Q denote the number of positive and negative $A_{ij}$ for $1 \le i < j \le n$, respectively, and the total is $S = C - Q$, we have

$$T = (C - Q) \Big/ \binom{n}{2} = S \Big/ \binom{n}{2} \tag{2.21}$$

If there are no ties within either the X or Y groups, that is, $A_{ij} \ne 0$ for $i \ne j$, $C + Q = \binom{n}{2}$ and (2.21) can be written as

$$T = \frac{2C}{\binom{n}{2}} - 1 = 1 - \frac{2Q}{\binom{n}{2}} \tag{2.22}$$

These two forms in (2.22) are analogous to the expression in Section 11.1 for the parameter

$$\tau = 2p_c - 1 = 1 - 2p_d$$

and $C / \binom{n}{2}$ and $Q / \binom{n}{2}$ are obviously unbiased estimators for $p_c$ and $p_d$, respectively. The quantity C is perhaps the simplest to calculate for a given sample of n pairs. Assuming that the pairs are written from smallest to largest according to the value of the X component, C is simply the number of pairs $(i, j)$ with $1 \le i < j \le n$ for which $Y_j - Y_i > 0$, since only then shall we have $a_{ij} = 1$.

Another interpretation of T is as a coefficient of disarray, since it can be shown (see Kendall and Gibbons, 1990, pp. 30-31) that the total number of interchanges between two consecutive Y observations required to transform the Y arrangement into the natural ordering from smallest to largest, i.e., to transform the Y arrangement into the X arrangement, is equal to Q, or $\binom{n}{2}(1 - T)/2$. This will be illustrated later in Section 11.6.

NULL DISTRIBUTION OF T
Suppose we wish to test the null hypothesis that the X and Y random variables are independent. Since $\tau = 0$ for independent variables, the null distribution of T is symmetric about the origin. For a general alternative of nonindependence, the rejection region of size $\alpha$ then should be

$$T \in R \quad \text{for } |T| \ge t_{\alpha/2}$$

where $t_{\alpha/2}$ is chosen so that

$$P(|T| \ge t_{\alpha/2} \mid H_0) = \alpha$$

For an alternative of positive dependence, a similar one-sided critical region is appropriate.

We must now determine the random sampling distribution of T under the assumption of independence. For this purpose, it will be more convenient, but not necessary, to assume that the X and Y sample observations have both been ordered from smallest to largest and assigned positive integer ranks. The data then consist of n sets of pairs of ranks. The justification for this assumption is that, like $\tau$, T is invariant under all order-preserving transformations. Its numerical value then depends only on the relative magnitudes of the observations and is the same whether calculated for variate values or ranks. For samples with no ties, the $n!$ distinguishable pairings of ranks are all equally likely under the null hypothesis. The value of T is completely determined by the value of C or S because of the expressions in (2.21) and (2.22), and it is more convenient to work with C. Denote by $u(n, c)$ the number of pairings of n ranks which result in exactly c positive $a_{ij}$, $1 \le i < j \le n$. Then

$$P(C = c) = \frac{u(n, c)}{n!} \tag{2.23}$$

and

$$f_T(t) = P(T = t) = P\left[C = \binom{n}{2} \frac{t + 1}{2}\right] \tag{2.24}$$
We shall now find a recursive relation to generate the values of $u(n+1, c)$ from knowledge of the values of $u(n, c)$ for some $n$ and all $c$. Assuming that the observations are written in order of magnitude of the $X$ component, the value of $C$ depends only on the resulting permutation of the $Y$ ranks. If $s_i$ denotes the rank of the $Y$ observation which is paired with the rank $i$ in the $X$ sample, for $i = 1, 2, \ldots, n$, then $c$ equals the number of integers greater than $s_1$, plus the number of integers greater than $s_2$ excluding $s_1$, plus the number exceeding $s_3$ excluding $s_1$ and $s_2$, etc. For any given permutation of $n$ integers which has this sum $c$, we need only consider what insertion of the number $n+1$ in any of the $n+1$ possible positions of the permutation $(s_1, s_2, \ldots, s_n)$ does to the value of $c$. If $n+1$ is in the first position, $c$ is clearly unchanged. If $n+1$ is in the second position, there is one additional integer greater than $s_1$, so that $c$ is increased by 1. If in the third position, there is one additional integer greater than both $s_1$ and $s_2$, so that $c$ is increased by 2. In general, if $n+1$ is in the $k$th position, $c$ is increased by $k-1$ for all $k = 1, 2, \ldots, n+1$. Therefore the desired recursive relation is

$$u(n+1, c) = u(n, c) + u(n, c-1) + u(n, c-2) + \cdots + u(n, c-n) \tag{2.25}$$

In terms of $s$, since for a set of $n$ pairs

$$s = 2c - \frac{n(n-1)}{2} \tag{2.26}$$
and insertion of $n+1$ in the $k$th position increases $c$ by $k-1$, the new value $s'$ of $s$ for $n+1$ pairs will be

$$s' = 2c' - \frac{n(n+1)}{2} = 2(c+k-1) - \frac{n(n+1)}{2} = 2c - \frac{n(n-1)}{2} + 2(k-1) - n = s + 2(k-1) - n$$

In other words, $s$ is increased by $2(k-1) - n$ for $k = 1, 2, \ldots, n+1$, and corresponding to (2.25) we have

$$u(n+1, s) = u(n, s+n) + u(n, s+n-2) + u(n, s+n-4) + \cdots + u(n, s-n+2) + u(n, s-n) \tag{2.27}$$
The distribution of $S$ is symmetrical about zero, and from (2.26) it is clear that $S$ for $n$ pairs is an even or odd integer according as $n(n-1)/2$ is even or odd. Because of this symmetry, tables are most easily constructed for $S$ (or $T$) rather than $C$ or $Q$. The null distribution of $S$ is given in Table L of the Appendix. More extensive tables of the null distribution of $S$ or $T$ are given in Kaarsemaker and Van Wijngaarden (1952, 1953), Best (1973, 1974), Best and Gipps (1974), Nijsse (1988), and Kendall and Gibbons (1990).

A simple example will suffice to illustrate the use of (2.25) or (2.27) to set up tables of these probability distributions. When $n = 3$, the $3!$ permutations of the $Y$ ranks and the corresponding values of $C$ and $S$ are:

Permutation    123    132    213    231    312    321
c                3      2      2      1      1      0
s                3      1      1     -1     -1     -3

The frequencies then are:

c                      0     1     2     3
s                     -3    -1     1     3
u(3, c) or u(3, s)     1     2     2     1
For $C$, using (2.25), $u(4, c) = \sum_{i=0}^{3} u(3, c-i)$, or

$$\begin{aligned}
u(4,0) &= u(3,0) = 1 \\
u(4,1) &= u(3,1) + u(3,0) = 3 \\
u(4,2) &= u(3,2) + u(3,1) + u(3,0) = 5 \\
u(4,3) &= u(3,3) + u(3,2) + u(3,1) + u(3,0) = 6 \\
u(4,4) &= u(3,3) + u(3,2) + u(3,1) = 5 \\
u(4,5) &= u(3,3) + u(3,2) = 3 \\
u(4,6) &= u(3,3) = 1
\end{aligned}$$

Alternatively, we could use (2.27), or $u(4, s) = \sum_{i=0}^{3} u(3, s+3-2i)$. Therefore the probability distributions for $n = 4$ are:

c                   0       1       2       3       4       5       6
s                  -6      -4      -2       0       2       4       6
t                  -1    -2/3    -1/3       0     1/3     2/3       1
f(c, s, or t)    1/24    3/24    5/24    6/24    5/24    3/24    1/24
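The recursion (2.25) lends itself to a few lines of code. The following is only a sketch under stated assumptions: the function names `concordance_counts` and `null_distribution_of_T` are illustrative and do not come from the text, and exact fractions are used so that the output can be compared directly with the tables for $n = 3$ and $n = 4$ above.

```python
from fractions import Fraction
from math import factorial


def concordance_counts(n):
    """Return [u(n, 0), ..., u(n, n(n-1)/2)] built up with recursion (2.25)."""
    u = [1]  # n = 1: a single rank, no pairs, so only c = 0 is possible
    for m in range(1, n):  # step from m ranks to m + 1 ranks
        max_c = m * (m + 1) // 2
        # u(m+1, c) = u(m, c) + u(m, c-1) + ... + u(m, c-m), skipping
        # terms whose second argument falls outside 0..m(m-1)/2
        u = [sum(u[c - k] for k in range(m + 1) if 0 <= c - k < len(u))
             for c in range(max_c + 1)]
    return u


def null_distribution_of_T(n):
    """Map each attainable value t of T to P(T = t) under independence (n >= 2)."""
    u = concordance_counts(n)
    pairs = n * (n - 1) // 2
    # t = 2c / C(n,2) - 1, the inverse of the relation used in (2.24)
    return {Fraction(2 * c - pairs, pairs): Fraction(count, factorial(n))
            for c, count in enumerate(u)}


print(concordance_counts(3))  # frequencies for n = 3
print(concordance_counts(4))  # frequencies for n = 4
```

Because each new row is a sum over at most $n + 1$ terms of the previous row, the work grows only polynomially in $n$, which is why exact tables are feasible for moderate sample sizes.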
The way in which the $u(n, s)$ or $u(n, c)$ are built up by cumulative sums indicates that simple schemes for their generation may be easily worked out (see, for example, Kendall and Gibbons, 1990, pp. 91–92). The exact null distribution is thus easily found for moderate $n$.

Since $T$ is a sum of random variables, it can be shown using general limit theorems for independent variables that the distribution of a standardized $T$ approaches the standard normal distribution as $n$ approaches infinity. To use this fact to facilitate inferences concerning independence in large samples, we need to determine the null mean and variance of $T$. Since $T$ was defined to be an unbiased estimator of $\tau$ for any bivariate population and we showed in Section 1 that $\tau = 0$ for independent, continuous random variables, the mean is $E(T \mid H_0) = 0$. In order to find $\mathrm{var}(T \mid H_0)$ for $X$ and $Y$ continuous, (2.15) is used with the appropriate $p_c$ and $p_{cc}$ under $H_0$. Under the assumption that $X$ and $Y$ have continuous marginal distributions and are independent, they can be assumed to be identically distributed according to the uniform distribution over the interval (0,1), because of the probability-integral transformation. Then, in (2.17) and (2.18), we have

$$p_c = 2\int_0^1\!\!\int_0^1 xy \, dx \, dy = 1/2 \qquad\qquad p_{cc} = \int_0^1\!\!\int_0^1 (1 - x - y + 2xy)^2 \, dx \, dy = 5/18 \tag{2.28}$$
Substituting these results in (2.15), we obtain

$$n(n-1)\,\mathrm{var}(T) = 2 + \frac{16(n-2)}{36}$$

$$\mathrm{var}(T) = \frac{2(2n+5)}{9n(n-1)} \tag{2.29}$$

For large $n$, the random variable

$$Z = \frac{3\sqrt{n(n-1)}\;T}{\sqrt{2(2n+5)}} \tag{2.30}$$
can be treated as a standard normal variable with density $\varphi(z)$. If the null hypothesis of independence of $X$ and $Y$ is accepted, we can of course infer that the population parameter $\tau$ is zero. However, if the hypothesis is rejected, this implies dependence between the random variables but not necessarily that $\tau \ne 0$.

THE LARGE-SAMPLE NONNULL DISTRIBUTION OF KENDALL'S STATISTIC
The probability distribution of $T$ is asymptotically normal for sample pairs from any bivariate population. Therefore, if any general mean and variance of $T$ could be determined, $T$ would be useful in large samples for other inferences relating to population characteristics besides independence. Since $E(T) = \tau$ for any distribution, $T$ is particularly relevant in inferences concerning the value of $\tau$. The expressions previously found for $\mathrm{var}(T)$ in (2.13) for any distribution and (2.15) for continuous distributions depend on unknown probabilities. Unless the hypothesis under consideration somehow determines $p_c$, $p_d$, $p_{cc}$, $p_{dd}$, and $p_{cd}$ (or simply $p_c$ and $p_{cc}$ for the continuous case), the exact variance cannot be found without some information about $f_{X,Y}(x, y)$. However, unbiased and consistent estimates of these probabilities can be found from the sample data to provide a consistent estimate $\hat{\sigma}^2(T)$ of the variance of $T$. The asymptotic distribution of $(T - \tau)/\hat{\sigma}(T)$ then remains standard normal.

Such estimates will be found here for paired samples containing no tied observations. We observed before that $C/\binom{n}{2}$ is an unbiased and consistent estimator of $p_c$. However, for the purpose of finding estimates for all the probabilities involved, it will be more convenient now to introduce a different notation. As before, we can assume without loss of generality that the $n$ pairs are arranged in natural order according to the $x$ component and that $s_i$ is the rank of that $y$ which is
paired with the $i$th smallest $x$ for $i = 1, 2, \ldots, n$, so that the data are $(s_1, s_2, \ldots, s_n)$. Define

$a_i$ = number of integers to the left of $s_i$ and less than $s_i$
$b_i$ = number of integers to the right of $s_i$ and greater than $s_i$

Then $c_i = a_i + b_i$ = number of values of $j = 1, 2, \ldots, n$ such that $(x_i, y_i)$ is concordant with $(x_j, y_j)$. There are $n(n-1)$ distinguishable sets of pairs, of which $\sum_{i=1}^{n} c_i$ are concordant. An unbiased estimate of $p_c$ then is

$$\hat{p}_c = \sum_{i=1}^{n} \frac{c_i}{n(n-1)} \tag{2.31}$$
Similarly, we define

$a'_i$ = number of integers to the left of $s_i$ and greater than $s_i$
$b'_i$ = number of integers to the right of $s_i$ and less than $s_i$

and $d_i = a'_i + b'_i$ = number of values of $j = 1, 2, \ldots, n$ such that $(x_i, y_i)$ is discordant with $(x_j, y_j)$. Then

$$\hat{p}_d = \sum_{i=1}^{n} \frac{d_i}{n(n-1)} \tag{2.32}$$

gives an unbiased estimate of $p_d$.

An unbiased and consistent estimate of $p_{cc}$ is the number of sets of three pairs $(x_i, y_i)$, $(x_j, y_j)$, $(x_k, y_k)$ for all $i \ne j \ne k$, for which the products $(x_i - x_j)(y_i - y_j)$ and $(x_i - x_k)(y_i - y_k)$ are both positive, divided by the number of distinguishable sets $n(n-1)(n-2)$. Denote by $c_{ii}$ the number of values of $j$ and $k$, $i \ne j \ne k$, $1 \le j, k \le n$, such that $(x_i, y_i)$ is concordant with both $(x_j, y_j)$ and $(x_k, y_k)$, so that

$$\hat{p}_{cc} = \sum_{i=1}^{n} \frac{c_{ii}}{n(n-1)(n-2)}$$

The pair $(x_i, y_i)$ is concordant with both $(x_j, y_j)$ and $(x_k, y_k)$ if:

Group 1:   $s_j < s_i < s_k$  for  $j < i < k$
           $s_k < s_i < s_j$  for  $k < i < j$

Group 2:   $s_i < s_j < s_k$  for  $i < j < k$
           $s_i < s_k < s_j$  for  $i < k < j$

Group 3:   $s_j < s_k < s_i$  for  $j < k < i$
           $s_k < s_j < s_i$  for  $k < j < i$
Therefore $c_{ii}$ is twice the sum of the following three corresponding numbers:

1. The number of unordered pairs of integers, one to the left and one to the right of $s_i$, such that the one to the left is less than $s_i$ and the one to the right is greater than $s_i$.
2. The number of unordered pairs of integers, both to the right of $s_i$, such that both are greater than $s_i$.
3. The number of unordered pairs of integers, both to the left of $s_i$, such that both are less than $s_i$.

Then, employing the same notation as before, we have

$$c_{ii} = 2\left[\binom{a_i}{1}\binom{b_i}{1} + \binom{b_i}{2} + \binom{a_i}{2}\right] = (a_i + b_i)^2 - (a_i + b_i) = c_i^2 - c_i = c_i(c_i - 1)$$

and

$$\hat{p}_{cc} = \sum_{i=1}^{n} \frac{c_i(c_i - 1)}{n(n-1)(n-2)} \tag{2.33}$$
Similarly, we can obtain

$$\hat{p}_{dd} = \sum_{i=1}^{n} \frac{d_i(d_i - 1)}{n(n-1)(n-2)} \tag{2.34}$$

$$\hat{p}_{cd} = \sum_{i=1}^{n} \frac{a_i b'_i + a_i a'_i + b_i a'_i + b_i b'_i}{n(n-1)(n-2)} = \sum_{i=1}^{n} \frac{c_i d_i}{n(n-1)(n-2)} \tag{2.35}$$
Substituting the results (2.31) and (2.33) in (2.15), the estimated variance of $T$ in samples for continuous variables is

$$n(n-1)\,\hat{\sigma}^2(T) = 8\hat{p}_c - 8\hat{p}_c^2(2n-3) + 16(n-2)\hat{p}_{cc}$$

$$n^2(n-1)^2\,\hat{\sigma}^2(T) = 8\left[2\sum_{i=1}^{n} c_i^2 - \sum_{i=1}^{n} c_i - \frac{2n-3}{n(n-1)}\left(\sum_{i=1}^{n} c_i\right)^2\right] \tag{2.36}$$
In order to obviate any confusion regarding the calculation of the $c_i$ and $c_{ii}$ to estimate the variance from (2.36) in the case of no tied observations, a simple example is provided below for achievement tests in Mathematics and English administered to a group of six randomly chosen students.

Student           A     B     C     D     E     F
Math score       91    52    69    99    72    78
English score    89    72    69    96    66    67

The two sets of scores ranked and rearranged in order of increasing Mathematics scores are:

Student           B     C     E     F     A     D
Math rank         1     2     3     4     5     6
English rank      4     3     1     2     5     6
The numbers $c_i = a_i + b_i$ are

$c_1 = 0 + 2$    $c_2 = 0 + 2$    $c_3 = 0 + 3$    $c_4 = 1 + 2$    $c_5 = 4 + 1$    $c_6 = 5 + 0$

$$n = 6 \qquad \sum c_i = 20 \qquad \sum c_i^2 = 76$$

$$\hat{p}_c = \frac{20}{6(5)} = \frac{2}{3} \qquad \hat{p}_{cc} = \frac{76 - 20}{6(5)(4)} = \frac{7}{15} \qquad t = 2(2/3) - 1 = 1/3$$

$$30^2\,\hat{\sigma}^2(T) = 8\left[2(76) - 20 - \frac{9}{6(5)}\,20^2\right] = 96$$

$$\hat{\sigma}^2(T) = 0.1067 \qquad \hat{\sigma}(T) = 0.33$$

If we wish to count the $c_{ii}$ directly, we have for $c_{ii}$ = 2(group 1 + group 2 + group 3), the pairs relevant to $c_{44}$, say, are

Group 1:   (1,5) (1,6)
Group 2:   (5,6)
Group 3:   None

so that $c_{44} = 2(3) = 6 = c_4(c_4 - 1)$.
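The hand calculation above can be checked with a short script. This is a minimal sketch, assuming no ties and assuming the $Y$ ranks are already listed in increasing order of $X$; the function names are illustrative and not part of the text.

```python
def concordance_scores(s):
    """c_i = a_i + b_i for Y ranks s listed in increasing order of X (no ties)."""
    n = len(s)
    scores = []
    for i in range(n):
        a = sum(1 for j in range(i) if s[j] < s[i])          # left of s_i and smaller
        b = sum(1 for j in range(i + 1, n) if s[j] > s[i])   # right of s_i and larger
        scores.append(a + b)
    return scores


def tau_and_variance_estimate(s):
    """Return (t, p_c_hat, p_cc_hat, variance estimate) from (2.31), (2.33), (2.36)."""
    n = len(s)
    c = concordance_scores(s)
    sum_c = sum(c)
    sum_c2 = sum(ci * ci for ci in c)
    p_c = sum_c / (n * (n - 1))
    p_cc = (sum_c2 - sum_c) / (n * (n - 1) * (n - 2))
    t = 2 * p_c - 1
    bracket = 2 * sum_c2 - sum_c - (2 * n - 3) * sum_c ** 2 / (n * (n - 1))
    var_hat = 8 * bracket / (n * (n - 1)) ** 2
    return t, p_c, p_cc, var_hat


english_ranks = [4, 3, 1, 2, 5, 6]   # students B, C, E, F, A, D
print(concordance_scores(english_ranks))
print(tau_and_variance_estimate(english_ranks))
```

Note that the variance estimate returned here is not guaranteed to be positive in small samples, so a caller should check its sign before taking a square root.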
On the other hand, suppose the English scores corresponding to increasing Math scores were ranked as

y     3     1     4     2     6     5

Then we can calculate

$c_1 = c_4 = 3$    $c_2 = c_3 = c_5 = c_6 = 4$

$$\hat{p}_c = 11/15 \qquad \hat{p}_{cc} = 1/2 \qquad t = 7/15 \qquad \hat{\sigma}^2(T) = -32/1125$$
and the estimated variance is negative! A negative variance from (2.15) of course cannot occur, but when the parameters $p$ are replaced by estimates $\hat{p}$ and combined, the result can be negative. Since the probability estimates are consistent, the estimated variance of $T$ will be positive for $n$ sufficiently large.

Two applications of this asymptotic approximation to the nonnull distribution of $T$ in nonparametric inference for large samples are:

1. An approximate $(1-\alpha)100$ percent confidence-interval estimate of the population Kendall tau coefficient is

$$t - z_{\alpha/2}\,\hat{\sigma}(T) < \tau < t + z_{\alpha/2}\,\hat{\sigma}(T)$$

2. An approximate test of $H_0\colon \tau = \tau_0$ versus $H_1\colon \tau \ne \tau_0$ with significance level $\alpha$ is to reject $H_0$ when

$$\frac{|t - \tau_0|}{\hat{\sigma}(T)} \ge z_{\alpha/2}$$

A one-sided alternative can also be tested.
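The two large-sample procedures just listed can be sketched as follows. The function names are hypothetical, the standard normal quantile comes from Python's standard-library `statistics.NormalDist`, and the code simply refuses to proceed when the variance estimate is not positive, which is one possible convention rather than anything prescribed in the text.

```python
from math import sqrt
from statistics import NormalDist


def tau_confidence_interval(t, var_hat, confidence=0.95):
    """Approximate interval t -/+ z_{alpha/2} * sigma_hat(T) for tau."""
    if var_hat <= 0:
        raise ValueError("variance estimate is not positive; interval undefined")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(var_hat)
    return t - half_width, t + half_width


def tau_two_sided_p_value(t, tau0, var_hat):
    """Approximate two-sided P value for H0: tau = tau0."""
    if var_hat <= 0:
        raise ValueError("variance estimate is not positive; test undefined")
    z = abs(t - tau0) / sqrt(var_hat)
    return 2 * (1 - NormalDist().cdf(z))


# Values from the six-student example, used only to exercise the code:
# n = 6 is far too small for the asymptotic approximation to be trusted.
print(tau_confidence_interval(1 / 3, 0.1067))
print(tau_two_sided_p_value(1 / 3, 0.0, 0.1067))
```

The interval is not truncated to $[-1, 1]$ here; with a small sample and a large variance estimate the endpoints can fall outside the range of $\tau$.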
TIED OBSERVATIONS
Whether or not the marginal distributions of $X$ and $Y$ are assumed continuous, tied observations can occur within either or both samples. Ties across samples do not present any problem of course. Since the definition of $A_{ij}$ in (2.3) assigned a value of zero to $a_{ij}$ if a tie occurs in the $(i, j)$ set of pairs for either the $x$ or $y$ sample values, $T$ as defined before allows for, and essentially ignores, all zero differences. With $\tau$ defined as the difference $p_c - p_d$, $T$ as calculated from (2.6), (2.19), or (2.21) is an unbiased estimator of $\tau$ with variance as given in (2.13) even in the presence of ties. If the occurrence of ties in the sample is attributed to a lack of precision in measurement as opposed to discrete marginal distributions, the simplified expression for $\mathrm{var}(T)$ in (2.15) may still be used. If there are sample ties, however, the expressions (2.20) and (2.22) are no longer equivalent to (2.6), (2.19), or (2.21).

For small samples with a small number of tied observations, the exact null distribution of $T$ (or $S$) conditional on the observed ties can be determined by enumeration. There will be $mw$ pairings of the two samples, each occurring with equal probability $1/mw$, if there are $m$ and $w$ distinguishable permutations of the $x$ and $y$ sample observations, respectively. For larger samples, the normal approximation to the distribution of $T$ can still be used but with corrected moments. Conditional upon the observed ties, the parameters $p_c$, $p_d$, $p_{cc}$, $p_{dd}$, and $p_{cd}$ must have a slightly different interpretation. For example, $p_c + p_d$ here would be the probability that we select two pairs $(x_i, y_i)$ and $(x_j, y_j)$ which do not have a tie in either coordinate, and under the assumption of independence this is

$$\left[1 - \frac{\sum u(u-1)}{n(n-1)}\right]\left[1 - \frac{\sum v(v-1)}{n(n-1)}\right]$$

where $u$ denotes the multiplicity of a tie in the $x$ set and the sum is extended over all ties, and $v$ has the same interpretation for the $y$ set. These parameters in the conditional distribution can be determined and substituted in (2.13) to find the conditional variance (see, for example, Noether, 1967, pp. 76–77). The conditional mean of $T$, however, is unchanged, since even for the new parameters we have $p_c = p_d$ for independent samples.
Conditional on the observed ties, however, there are no longer $\binom{n}{2}$ distinguishable sets of pairs to check for concordance, and thus if $T$ is calculated in the ordinary way, it cannot equal one even for perfect agreement. Therefore an alternative definition of $T$ in the presence of ties is to replace the $n(n-1)$ in the denominator of (2.6), (2.19), or (2.21) by a smaller quantity. To obtain a result still analogous to a correlation coefficient, we might take (2.20) as the definition of $T$ in general. Since $\sum_{i=1}^{n}\sum_{j=1}^{n} U_{ij}^2$ is the number of nonzero differences $X_j - X_i$ for all $(i, j)$, the sum is the total number of distinguishable differences less the number involving tied observations, or $n(n-1) - \sum u(u-1)$. Similarly for the $Y$ observations. Therefore our modified $T$ from (2.20) is

$$T = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} U_{ij}V_{ij}}{\left\{\left[n(n-1) - \sum u(u-1)\right]\left[n(n-1) - \sum v(v-1)\right]\right\}^{1/2}} \tag{2.37}$$
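which reduces to all previously given forms if there are no ties. The modified $T$ from (2.21) is

$$T = \frac{C - Q}{\left\{\left[\binom{n}{2} - \sum\binom{u}{2}\right]\left[\binom{n}{2} - \sum\binom{v}{2}\right]\right\}^{1/2}} \tag{2.38}$$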
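As an illustration of (2.38), the following sketch computes $T$ both with and without the tie correction by direct enumeration of pairs. The function name `kendall_tau_a_b` and the small data set are hypothetical and chosen only so that each sample contains one tie.

```python
from collections import Counter
from math import sqrt


def sign(v):
    return (v > 0) - (v < 0)


def kendall_tau_a_b(x, y):
    """Return (T without tie correction, T from (2.38)) by enumerating all pairs."""
    n = len(x)
    s = 0  # C - Q; a pair tied in x or in y contributes zero
    for i in range(n):
        for j in range(i + 1, n):
            s += sign(x[j] - x[i]) * sign(y[j] - y[i])
    n_pairs = n * (n - 1) // 2
    tied_x = sum(u * (u - 1) // 2 for u in Counter(x).values())  # sum of C(u, 2)
    tied_y = sum(v * (v - 1) // 2 for v in Counter(y).values())  # sum of C(v, 2)
    uncorrected = s / n_pairs
    # Undefined (division by zero) if all x values or all y values are tied
    corrected = s / sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    return uncorrected, corrected


# Hypothetical data: y never decreases as x increases, but each sample has one tie
x = [1, 2, 2, 3, 4]
y = [1, 2, 3, 3, 5]
print(kendall_tau_a_b(x, y))
```

For these data $C - Q = 8$ out of 10 pairs, so the uncorrected value is 0.8 and the corrected value is $8/9$; neither reaches one even though the ordering is never reversed.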
Note that the denominator in (2.38) is a function of the geometric mean of the number of untied $X$ observations and the number of untied $Y$ observations. The modified $T$ in (2.37) or (2.38) is frequently called tau-b in order to distinguish it from (2.20) or (2.21), which is called tau-a and has no correction for ties.

The absolute value of the coefficient $T$ calculated from (2.37) or (2.38) is always greater than the absolute value of a coefficient calculated from (2.20) or (2.21) when ties are present, but it still may not be equal to one for perfect agreement or disagreement. The only way to define a tau coefficient that does always equal one for perfect agreement or disagreement is to define

$$\gamma = \frac{C - Q}{C + Q} \tag{2.39}$$

This ratio, the number of concordant pairs with no ties minus the number of discordant pairs with no ties, divided by the total number of untied pairs, is called the Goodman-Kruskal gamma coefficient.

A RELATED MEASURE OF ASSOCIATION FOR DISCRETE POPULATIONS
In Section 11.1 we stated the criterion that a good measure of association between two random variables would equal $+1$ for a perfect direct relationship and $-1$ for a perfect indirect relationship. In terms of the probability parameters, perfect concordance requires $p_c = 1$, and perfect discordance requires $p_d = 1$. With Kendall's coefficient defined as $\tau = p_c - p_d$, the criterion is satisfied if and only if $p_c + p_d = 1$. But if the marginal distributions of $X$ and $Y$ are not continuous,

$$\begin{aligned}
p_c + p_d &= P[(X_j - X_i)(Y_j - Y_i) > 0] + P[(X_j - X_i)(Y_j - Y_i) < 0] \\
&= 1 - P[(X_j - X_i)(Y_j - Y_i) = 0] \\
&= 1 - P[(X_i = X_j) \cup (Y_i = Y_j)] = 1 - p_t
\end{aligned}$$

where $p_t$ denotes the probability that a pair is neither concordant nor discordant. Thus $\tau$ cannot be considered a "good" measure of association if $p_t \ne 0$.
However, a modified parameter which does satisfy the criteria for all distributions can easily be defined as

$$\tau^* = \frac{\tau}{1 - p_t} = p_c^* - p_d^*$$

where $p_c^*$ and $p_d^*$ are, respectively, the conditional probabilities of concordance and discordance given that there are no ties,

$$p_c^* = \frac{p_c}{1 - p_t} = \frac{P[(X_j - X_i)(Y_j - Y_i) > 0]}{P[(X_j - X_i)(Y_j - Y_i) \ne 0]}$$

Since $\tau^*$ is a linear function of $\tau$, an estimate is provided by

$$T^* = \frac{T}{1 - \hat{p}_t} = \frac{\hat{p}_c - \hat{p}_d}{\hat{p}_c + \hat{p}_d}$$

with $\hat{p}_c$ and $\hat{p}_d$ defined as before in (2.31) and (2.32). Since $\hat{p}_c$ and $\hat{p}_d$ are consistent estimators, the asymptotic distribution of $T/(\hat{p}_c + \hat{p}_d)$ is equivalent to the asymptotic distribution of $T/(p_c + p_d)$, which we know to be the normal distribution. Therefore for large samples, inferences concerning $\tau^*$ can be made (see, for example, Goodman and Kruskal, 1954, 1959, 1963).

USE OF KENDALL'S STATISTIC TO TEST AGAINST TREND
In Chapter 3 regarding tests for randomness, we observed that the arrangement of relative magnitudes in a single sequence of time-ordered observations can indicate some sort of trend. When the theory of runs up and down was used to test a hypothesis of randomness, the magnitude of each observation relative to its immediately preceding value was considered, and a long run of plus (minus) signs or a sequence with a large predominance of plus (minus) signs was considered indicative of an upward (downward) trend. If time is treated as an $X$ variable, say, and a set of time-ordered observations as the $Y$ variable, an association between $X$ and $Y$ might be considered indicative of a trend. Thus the degree of concordance between such $X$ and $Y$ observations would be a measure of trend, and Kendall's tau statistic becomes a measure of trend. Unlike the case of runs up and down, however, the tau coefficient considers the magnitude of each observation relative to every preceding observation.

A hypothesis of randomness in a single set of $n$ time-ordered observations is the same as a hypothesis of independence between these observations when paired with the numbers $1, 2, \ldots, n$. Therefore,
assuming that $x_i = i$ for $i = 1, 2, \ldots, n$, the indicator variables $A_{ij}$ defined in (2.3) become

$$A_{ij} = \mathrm{sgn}(j - i)\,\mathrm{sgn}(Y_j - Y_i)$$

and (2.6) can be written as

$$T = \sum\sum_{1 \le i < j \le n} \frac{\mathrm{sgn}(Y_j - Y_i)}{\binom{n}{2}}$$

... $X_j$ whenever $Y_i > Y_k$ or, equivalently, if

$$(X_j - X_i)(Y_k - Y_i) = U_{ij}V_{ik} > 0$$

The probability of a second-order concordance is

$$p_{c2} = P[(X_j - X_i)(Y_k - Y_i) > 0]$$

and the corresponding sample estimate $\hat{p}_{c2}$ is the number of sets of three pairs with the product $U_{ij}V_{ik} > 0$ for $i < j$, $k \ne j$, divided by $\binom{n}{2}(n-2)$, the number of distinguishable sets of three pairs. The triple sum in (4.5) is the totality of all these products, whether positive or negative, and therefore equals

$$\binom{n}{2}(n-2)\left[\hat{p}_{c2} - (1 - \hat{p}_{c2})\right] = \frac{n(n-1)(n-2)(2\hat{p}_{c2} - 1)}{2}$$

In terms of sample concordances, then, (4.5) can be written as

$$R = \frac{3}{n+1}(2\hat{p}_c - 1) + \frac{3(n-2)}{n+1}(2\hat{p}_{c2} - 1) \tag{4.6}$$

and the population parameter for which $R$ is an unbiased estimator is

$$E(R) = \frac{3\left[\tau + (n-2)(2p_{c2} - 1)\right]}{n+1} \tag{4.7}$$
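Relation (4.6) can be checked numerically for data without ties. The sketch below rests on an assumption that should be read carefully: $\hat{p}_{c2}$ is taken as the proportion of ordered triples $(i, j, k)$ of distinct indices with $(x_j - x_i)(y_k - y_i) > 0$, which may index the triples differently from (4.5). All function names are illustrative, and the two lists of ranks are arbitrary untied data chosen only to exercise the identity.

```python
from itertools import permutations


def ranks(values):
    """Ranks 1..n for values without ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_r(x, y):
    """R = 1 - 6 * sum(D_i^2) / (n(n^2 - 1)), valid when there are no ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def concordance_proportions(x, y):
    """Return (p_c_hat, p_c2_hat) as proportions over ordered index tuples."""
    n = len(x)
    first = sum((x[j] - x[i]) * (y[j] - y[i]) > 0
                for i, j in permutations(range(n), 2))
    second = sum((x[j] - x[i]) * (y[k] - y[i]) > 0
                 for i, j, k in permutations(range(n), 3))
    return first / (n * (n - 1)), second / (n * (n - 1) * (n - 2))


def r_from_concordances(x, y):
    """Right-hand side of (4.6)."""
    n = len(x)
    p_c, p_c2 = concordance_proportions(x, y)
    return (3 * (2 * p_c - 1) + 3 * (n - 2) * (2 * p_c2 - 1)) / (n + 1)


x = [1, 5, 9, 7, 4, 6, 8, 2, 3]
y = [4, 3, 6, 8, 2, 7, 9, 1, 5]
print(spearman_r(x, y), r_from_concordances(x, y))
```

The triple enumeration costs on the order of $n^3$ operations, so this form is useful as a check on the algebra rather than as a way to compute $R$.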
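We shall now express $p_{c2}$ for any continuous bivariate population $F_{X,Y}(x, y)$ in a form analogous to (2.17) for $p_c$:

$$\begin{aligned}
p_{c2} &= P[(X_i < X_j) \cap (Y_i < Y_k)] + P[(X_i > X_j) \cap (Y_i > Y_k)] \\
&= \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \left\{P[(X_i < x_j) \cap (Y_i < y_k)] + P[(X_i > x_j) \cap (Y_i > y_k)]\right\} f_{X_j,Y_k}(x_j, y_k)\,dx_j\,dy_k \\
&= \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \left[F_{X,Y}(x, y) + 1 - F_X(x) - F_Y(y) + F_{X,Y}(x, y)\right] dF_X(x)\,dF_Y(y) \\
&= 1 + 2\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} F_{X,Y}(x, y)\,dF_X(x)\,dF_Y(y) - 2\int_{-\infty}^{\infty} F_X(x)\,dF_X(x) \\
&= 2\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} F_{X,Y}(x, y)\,dF_X(x)\,dF_Y(y)
\end{aligned} \tag{4.8}$$

A similar development yields another equivalent form

$$p_{c2} = 2\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} F_X(x)F_Y(y)\,dF_{X,Y}(x, y) \tag{4.9}$$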
If $X$ and $Y$ are independent, of course, a comparison of these expressions with (2.17) shows that $p_{c2} = p_c = 1/2$. Unlike $p_c$, however, which ranges between 0 and 1, $p_{c2}$ ranges only between 1/3 and 2/3, with the extreme values obtained for perfect indirect and direct linear relationships, respectively. This result can be shown easily. For the upper limit, since for all $x$, $y$,

$$2F_X(x)F_Y(y) \le F_X^2(x) + F_Y^2(y)$$

we have from (4.9)

$$p_{c2} \le 2\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} F_X^2(x)\,dF_{X,Y}(x, y) = 2/3$$

Similarly, for all $x$, $y$,

$$2F_X(x)F_Y(y) = [F_X(x) + F_Y(y)]^2 - F_X^2(x) - F_Y^2(y)$$

so that from (4.9)

$$\begin{aligned}
p_{c2} &= \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} [F_X(x) + F_Y(y)]^2\,dF_{X,Y}(x, y) - 2/3 \\
&\ge \left\{\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} [F_X(x) + F_Y(y)]\,dF_{X,Y}(x, y)\right\}^2 - 2/3 = 1/3
\end{aligned}$$
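Now if $X$ and $Y$ have a perfect direct linear relationship, we can assume without loss of generality that $X = Y$, so that

$$F_{X,Y}(x, y) = \begin{cases} F_X(x) & \text{if } x \le y \\ F_X(y) & \text{if } x > y \end{cases}$$

Then from (4.8)

$$p_{c2} = 2(2)\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{y} F_X(x) f_X(x) f_X(y)\,dx\,dy = 2/3$$

For a perfect indirect relationship, we assume $X = -Y$, so that

$$F_{X,Y}(x, y) = \begin{cases} F_X(x) - F_X(-y) & \text{if } x \ge -y \\ 0 & \text{if } x < -y \end{cases}$$

and

$$\begin{aligned}
p_{c2} &= 2\int_{-\infty}^{\infty}\!\!\int_{-y}^{\infty} [F_X(x) - F_X(-y)]\,f_X(x) f_X(-y)\,dx\,dy \\
&= \int_{-\infty}^{\infty} \left\{1 - F_X^2(-y) - 2[1 - F_X(-y)]F_X(-y)\right\} f_X(-y)\,dy \\
&= \int_{-\infty}^{\infty} [1 - F_X(-y)]^2 f_X(-y)\,dy = 1/3
\end{aligned}$$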
Substitution of these extreme values in (4.7) shows that for any continuous population $\rho$, $\tau$, and $E(R)$ all have the same value for the following cases:

X, Y relation                   ρ = τ = E(R)
Indirect linear dependence          -1
Independence                         0
Direct linear dependence             1
Although strictly speaking we cannot talk about a parameter for a bivariate distribution which is a coefficient of rank correlation, it seems natural to define the pseudo rank-correlation parameter, say $\rho_2$, as that constant for which $R$ is an unbiased estimator in large samples. Then from (4.7), we have the definition

$$\rho_2 = \lim_{n \to \infty} E(R) = 3(2p_{c2} - 1) \tag{4.10}$$

and for a sample of size $n$, the relation between $E(R)$, $\rho_2$, and $\tau$ is

$$E(R) = \frac{3\tau + (n-2)\rho_2}{n+1} \tag{4.11}$$
The relation between $\rho_2$ (for ranks) and $\rho$ (for variate values) depends on the relation between $p_{c2}$ and covariance. From (4.9), we see that

$$\begin{aligned}
p_{c2} &= 2E[F_X(X)F_Y(Y)] = 2\,\mathrm{cov}[F_X(X), F_Y(Y)] + 2E[F_X(X)]E[F_Y(Y)] \\
&= 2\,\mathrm{cov}[F_X(X), F_Y(Y)] + 1/2
\end{aligned}$$

Since

$$\mathrm{var}[F_X(X)] = \mathrm{var}[F_Y(Y)] = 1/12$$

we have

$$6p_{c2} = \rho[F_X(X), F_Y(Y)] + 3$$

and we see from (4.10) that

$$\rho_2 = \rho[F_X(X), F_Y(Y)]$$

Therefore $\rho_2$ is sometimes called the grade correlation coefficient, since the grade of a number $x$ is usually defined as the cumulative probability $F_X(x)$.
11.5 ANOTHER MEASURE OF ASSOCIATION
Another nonparametric type of measure of association for paired samples which is related to the Pearson product-moment correlation coefficient has been investigated by Fieller, Hartley, Pearson, and others. This is the ordinary Pearson sample correlation coefficient of (3.1) calculated using expected normal scores in place of ranks or variate values. That is, if $\xi_i = E(U_{(i)})$, where $U_{(i)}$ is the $i$th order statistic in a sample of $n$ from the standard normal population and $S_i$ denotes the rank of the $Y$ observation which is paired with the $i$th smallest $X$ observation, the random sample of pairs of ranks

$$(1, s_1), (2, s_2), \ldots, (n, s_n)$$

is replaced by the derived sample of pairs

$$(\xi_1, \xi_{s_1}), (\xi_2, \xi_{s_2}), \ldots, (\xi_n, \xi_{s_n})$$

and the correlation coefficient for these pairs is

$$R_F = \frac{\sum_{i=1}^{n} \xi_i \xi_{s_i}}{\sum_{i=1}^{n} \xi_i^2}$$

This coefficient is discussed in Fieller, Hartley, and Pearson (1957) and Fieller and Pearson (1961). The authors show that the transformed random variable

$$Z_F = \tanh^{-1} R_F$$

is approximately normally distributed with moments

$$E(Z_F) = \tanh^{-1}\left[\rho\left(1 - \frac{0.6}{n+8}\right)\right] \qquad \mathrm{var}(Z_F) = \frac{1}{n-3}$$

where $\rho$ is the correlation coefficient in the bivariate population from which the sample is drawn. The authors also show that analogous transformations on $R$ and $T$,

$$Z_R = \tanh^{-1} R \qquad Z_T = \tanh^{-1} T$$

produce approximately normally distributed random variables, but in the nonnull case the approximation for $Z_F$ is best.
11.6 APPLICATIONS
Kendall's sample tau coefficient (Section 11.2) is one descriptive measure of association in a bivariate sample. The statistic is calculated as

$$T = \frac{2S}{n(n-1)} = \frac{2(C - Q)}{n(n-1)}$$

where $C$ is the number of concordant pairs and $Q$ is the number of discordant pairs among $(X_i, Y_i)$ and $(X_j, Y_j)$, for all $i < j$ in a sample of $n$ observations. $T$ ranges between $-1$ and 1, with $-1$ describing perfect disagreement, 1 describing perfect agreement, and 0 describing no agreement. The easiest way to calculate $C$ and $Q$ is to first arrange one set of observations in an array, while keeping the pairs intact. A pair in which there is a tie in either the $X$ observations or the $Y$ observations is not counted as part of either $C$ or $Q$, and therefore with ties it may be necessary to list all possible pairs to find the correct values for $C$ and $Q$. The modified $T$ is then calculated from (2.37) and called tau-b.

The null hypothesis of independence between $X$ and $Y$ can be tested using $T$. The appropriate rejection regions and P values for an observed value $t$ are as follows:
Alternative            Rejection region                                  P value
Positive dependence    $T \ge t_\alpha$                                  $P(T \ge t)$
Negative dependence    $T \le -t_\alpha$                                 $P(T \le t)$
Nonindependence        $T \ge t_{\alpha/2}$ or $T \le -t_{\alpha/2}$     2 (smaller of above)

The exact cumulative null distribution of $T$ is given in Table L of the Appendix as right-tail probabilities for $n \le 10$. Quantiles of $T$ are also given for $11 \le n \le 30$. For $n > 30$, the normal approximation to the null distribution of $T$ indicates the following rejection regions and P values:

Alternative            Rejection region                                        P value
Positive dependence    $T \ge z_\alpha\sqrt{2(2n+5)}/[3\sqrt{n(n-1)}]$         $P(Z \ge 3t\sqrt{n(n-1)}/\sqrt{2(2n+5)})$
Negative dependence    $T \le -z_\alpha\sqrt{2(2n+5)}/[3\sqrt{n(n-1)}]$        $P(Z \le 3t\sqrt{n(n-1)}/\sqrt{2(2n+5)})$
Nonindependence        Both above with $z_{\alpha/2}$                          2 (smaller of above)
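The large-sample rows of the table above translate directly into code. This is a minimal sketch with a hypothetical function name and hypothetical inputs ($t = 0.25$, $n = 40$); the standard normal distribution is taken from Python's standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def kendall_normal_test(t, n, alternative="positive"):
    """Standardize an observed t with (2.30) and return (z, approximate P value).

    alternative is "positive", "negative", or anything else for nonindependence.
    """
    z = 3 * t * sqrt(n * (n - 1)) / sqrt(2 * (2 * n + 5))
    lower = NormalDist().cdf(z)      # P(Z <= z)
    upper = 1 - lower                # P(Z >= z)
    if alternative == "positive":
        p = upper
    elif alternative == "negative":
        p = lower
    else:
        p = min(1.0, 2 * min(upper, lower))
    return z, p


z, p = kendall_normal_test(0.25, 40, "positive")
print(round(z, 3), round(p, 4))
```

The same function serves for the trend test described next, since only the interpretation of the $X$ variable changes.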
This test of the null hypothesis of independence can also be used for the alternative of a trend in a time-ordered sequence of observations $Y$ if time is regarded as $X$. The alternative of an upward trend corresponds to the alternative of positive dependence. This use of Kendall's tau is frequently called the Mann test for trend.

The Spearman coefficient of rank correlation (Section 11.3) is an alternative descriptive measure of association in a bivariate sample. Each set of observations is independently ranked from 1 to $n$, but the pairs are kept intact. The coefficient is given in (3.7) as

$$R = 1 - \frac{6\sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}$$

where $D_i$ is the difference of the ranks of $X_i$ and $Y_i$. If ties are present we use (3.19). Interpretation of the value of $R$ is exactly the same as for $T$ and the appropriate rejection regions are also in the same direction. For small samples the null distribution of $R$ is given in Table M in a form similar to Table L. For large samples the rejection regions are simply $R \ge z_\alpha/\sqrt{n-1}$ for positive dependence and $R \le -z_\alpha/\sqrt{n-1}$ for negative dependence. When $R$ is used as a test for trend, it is frequently called the Daniels' test for trend. Applications of both of these statistics are illustrated in Example 6.1.
Alternative            Rejection region                    P value
Positive dependence    $R \ge z_\alpha/\sqrt{n-1}$         $P(Z \ge r\sqrt{n-1})$
Negative dependence    $R \le -z_\alpha/\sqrt{n-1}$        $P(Z \le r\sqrt{n-1})$
Nonindependence        Both above with $z_{\alpha/2}$      2 (smaller of above)
Example 6.1 Two judges ranked nine teas on the basis of taste and full-bodied properties, with 1 indicating the highest ranking. Calculate the Kendall and Spearman measures of association, test the null hypothesis of independence, and find the appropriate one-tailed P value in each case, for the data shown below.

Tea        A    B    C    D    E    F    G    H    I
Judge 1    1    5    9    7    4    6    8    2    3
Judge 2    4    3    6    8    2    7    9    1    5
Solution The first step in calculating Kendall's tau is to rearrange the data for Judge 1 in an array, keeping track of the corresponding rank of Judge 2 as shown below. Then the number of concordant pairs is counted as the number of $Y$ ranks that are below and larger than each $Y$ rank and then summed over all $Y$'s; the number of discordant pairs is counted in the same manner but for ranks below and smaller.

Judge 1    Judge 2     C     Q      D    D^2
   1          4        5     3     -3     9
   2          1        7     0      1     1
   3          5        4     2     -2     4
   4          2        5     0      2     4
   5          3        4     0      2     4
   6          7        2     1     -1     1
   7          8        1     1     -1     1
   8          9        0     1     -1     1
   9          6                     3     9
Total                 28     8      0    34

We then calculate $T = 2(20)/9(8) = 0.556$. For the null hypothesis of independence the right-tailed P value from Table L is 0.022. The last two columns above show that $\sum D_i^2 = 34$ and we compute $R = 1 - 6(34)/9(80) = 0.717$, which is larger than $T$ as expected.
The right-tailed P value from Table M is $P = 0.018$ for the alternative of positive dependence.

At the time of this writing, MINITAB has no command for either Kendall's tau or Spearman's rho. However, we can use MINITAB to calculate Spearman's rho by using the rank command on the data (for Judges 1 and 2, respectively) and then calculating the Pearson product-moment correlation coefficient on these ranks. The result $R = 0.717$ agrees with ours. The MINITAB P value is for a Pearson correlation and does not apply for Spearman's rho.

The STATXACT solution gives the coefficients and the exact P values for a test of independence using both tau and rho, and all of these agree with ours. Note that the printout shows calculation of both $\tau_a$ and $\tau_b$. These are equal because there are no ties in this example. The solution also shows $\tau_c$ and Somers' $d$, which apply for data in a contingency table and are not covered in this book. For Kendall's tau, STATXACT shows the asymptotic P value based on the normal approximation $P(Z \ge 2.09)$ calculated from (2.30). For Spearman's rho, it shows the asymptotic P value based on the approximation given in (3.15) using Student's $t$ distribution, $P(t \ge 2.72)$ with 7 degrees of freedom. The expressions they use for calculating the asymptotic standard errors and confidence interval estimates are not clear. The reader may verify, however, that they did not use our (2.36) because this gives an estimate of the variance of $T$ which is negative in this example. As explained earlier, the estimate can be negative for $n$ small, even though the exact value of the variance must be positive.
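The figures quoted in this solution can be reproduced with a few lines of code. The following is a sketch for untied ranks only; the Student's $t$ statistic uses the common transformation $R\sqrt{(n-2)/(1-R^2)}$, which is assumed here to be what (3.15) denotes, and all variable names are illustrative.

```python
from math import sqrt

judge1 = [1, 5, 9, 7, 4, 6, 8, 2, 3]
judge2 = [4, 3, 6, 8, 2, 7, 9, 1, 5]
n = len(judge1)

# Judge 2 ranks rearranged in increasing order of the Judge 1 ranks
y = [r2 for _, r2 in sorted(zip(judge1, judge2))]

# Concordant (discordant) pairs: later entries larger (smaller) than an earlier one
C = sum(y[j] > y[i] for i in range(n) for j in range(i + 1, n))
Q = sum(y[j] < y[i] for i in range(n) for j in range(i + 1, n))
T = 2 * (C - Q) / (n * (n - 1))

sum_d2 = sum((a - b) ** 2 for a, b in zip(judge1, judge2))
R = 1 - 6 * sum_d2 / (n * (n * n - 1))

z_T = 3 * T * sqrt(n * (n - 1)) / sqrt(2 * (2 * n + 5))   # from (2.30)
t_R = R * sqrt((n - 2) / (1 - R * R))                     # assumed form of (3.15)

print(y, C, Q, sum_d2)
print(round(T, 3), round(R, 3), round(z_T, 2), round(t_R, 2))
```

With $n = 9$ these asymptotic statistics are shown only to match the quoted printout values; the exact tables remain the appropriate source for P values at this sample size.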
We use the data in Example 6.1 to illustrate how $T$ can be interpreted as a coefficient of disarray, where $Q$, the number of discordant pairs, is the minimum number of interchanges in the $Y$ ranks, one pair at a time, needed to convert them to the natural order. The $X$ and $Y$ ranks in this example are as follows.

X    1    2    3    4    5    6    7    8    9
Y    4    1    5    2    3    7    8    9    6
In the $Y$ ranks, we first interchange the 4 and 1 to put 1 in the correct position. Then we interchange 2 and 5 to make 2 closer to its correct position. Then we interchange 2 and 4. We keep proceeding in this way, working to get 3 in the correct position, and then 4, etc. The complete set of changes is as follows:

Y    1    4    5    2    3    7    8    9    6
     1    4    2    5    3    7    8    9    6
     1    2    4    5    3    7    8    9    6
     1    2    4    3    5    7    8    9    6
     1    2    3    4    5    7    8    9    6
     1    2    3    4    5    7    8    6    9
     1    2    3    4    5    7    6    8    9
     1    2    3    4    5    6    7    8    9
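The interchange procedure tabulated above can be sketched in code. The function name is illustrative; the routine moves 1 into place, then 2, and so on, swapping only neighboring entries, and records the arrangement after each interchange so that the number of interchanges can be compared with the number of discordant pairs.

```python
def interchanges_to_natural_order(y):
    """Return the list of arrangements produced by successive adjacent interchanges."""
    seq = list(y)
    states = []
    for target in range(1, len(seq) + 1):
        pos = seq.index(target)
        while pos > target - 1:  # move target left one position at a time
            seq[pos - 1], seq[pos] = seq[pos], seq[pos - 1]
            pos -= 1
            states.append(tuple(seq))
    return states


y = [4, 1, 5, 2, 3, 7, 8, 9, 6]
steps = interchanges_to_natural_order(y)
discordant = sum(y[j] < y[i] for i in range(len(y)) for j in range(i + 1, len(y)))
for state in steps:
    print(*state)
print(len(steps), discordant)
```

Each adjacent interchange in this routine removes exactly one discordant pair, which is why the two counts printed on the last line agree.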
The total number of interchanges required to transform the $Y$ ranks into the natural order by this systematic procedure is 8, and this is the value of $Q$, the total number of discordant pairs. We could make the transformation using more interchanges, of course, but more are not needed. It can be shown that $Q = 8$ is the minimum number of interchanges.

11.7 SUMMARY
In this chapter we have studied in detail the nonparametric coefficients that were proposed by Kendall and Spearman to measure association. Both coefficients can be computed for a sample from a bivariate distribution, a sample of pairs, when the data are numerical measurements or ranks indicating relative magnitudes. The absolute values of both coefficients range between zero and one, with increasing values indicating increasing degrees of association. The sign of the coefficient indicates the direction of the association, direct or inverse. The values of the coefficients are not directly comparable, however. We know that $|R| \ge |T|$ for any set of data, and in fact $|R|$ can be as much as 50 percent greater than $|T|$.

Both coefficients can be used to test the null hypothesis of independence between the variables. Even though the magnitudes of $R$ and $T$ are not directly comparable, the magnitudes of the P values based on them should be about the same, allowing for the fact that they are measuring association in different ways. The interpretation of $T$ is easier than for $R$. $T$ is the proportion of concordant pairs in the sample minus the proportion of discordant pairs. $T$ can also be interpreted as a coefficient of disarray. The easiest interpretation of $R$ is as the sample value of the Pearson product-moment correlation coefficient calculated using the ranks of the sample data.
An exact test of the null hypothesis of independence can be carried out using either T or R for small sample sizes. Generation of tables for exact P values was difﬁcult initially, but now computers have the capacity for doing this for even moderate n. For intermediate and large sample sizes, the tests can be performed using large sample approximations. The distribution of T approaches the normal distribution much more rapidly than the distribution of R and hence approximate P values based on R are less reliable than those based on T. Both T and R can be used when ties are present in either or both samples, and both have a correction for ties that improves the normal approximation. The correction with T always increases the value of T while the R correction always decreases the value of R, making the coefﬁcients closer in magnitude. If we reject the null hypothesis of independence by either T or R, we can conclude that there is some kind of dependence or ‘‘association’’ between the variables. But the kind of relationship or association that exists deﬁes any verbal description in general. The existence of a relationship or signiﬁcant association does not mean that the relationship is causal. The relationship may be due to several other factors, or to no factor at all. Care should always be taken in stating the results of an experiment that no causality is implied, either directly or indirectly. Kendall’s T is an unbiased estimator of a parameter t in the bivariate population; t represents the probability of concordance minus the probability of discordance. Concordance is not the same as correlation, although both represent a kind of association. Spearman’s R is not an unbiased estimator of the population correlation r. It is an unbiased estimator of a parameter which is a function of t and the grade correlation. 
The tests of independence based on T and R can be considered nonparametric counterparts of the test that the Pearson product-moment correlation coefficient $\rho$ is equal to zero in the bivariate normal distribution or that the regression coefficient $\beta$ equals zero. The asymptotic relative efficiency of these tests relative to the parametric test based on the sample Pearson product-moment correlation coefficient is $9/\pi^2 = 0.912$ for normal distributions and one for the continuous uniform distribution. Both T and R can be used to test for the existence of trend in a set of time-ordered observations. The test based on T is called the Mann test, and the test based on R is called the Daniels’ test. Both of these tests are alternatives to the tests for randomness presented in Chapter 3.
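As an illustration of the trend test based on T, the following sketch computes Kendall’s T between the time order and the observations and then a one-sided normal-approximation P value for an upward trend. It assumes no ties and uses the standard null variance of T, $2(2n+5)/[9n(n-1)]$; the function name and the series are hypothetical.

```python
import math
from itertools import combinations

def mann_trend_test(series):
    """Kendall's T between time order and the observations, with a
    one-sided normal-approximation P value for an upward trend."""
    n = len(series)
    s = 0
    for i, j in combinations(range(n), 2):  # observation i precedes j in time
        if series[j] > series[i]:
            s += 1
        elif series[j] < series[i]:
            s -= 1
    t = s / (n * (n - 1) / 2)
    var_t = 2 * (2 * n + 5) / (9 * n * (n - 1))  # null variance of T, no ties
    z = t / math.sqrt(var_t)
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for standard normal Z
    return t, z, p_upper

# Hypothetical time-ordered observations, for illustration only
series = [2.1, 2.4, 2.3, 2.9, 3.1, 3.0, 3.6, 3.8]
t, z, p = mann_trend_test(series)
print(round(t, 3), round(z, 3), round(p, 4))
```

The analogous test based on R would replace T by the rank correlation between the time order and the observations; for a series this short, an exact P value would normally be preferred to the approximation.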
PROBLEMS
11.1. A beauty contest has eight contestants. The two judges are each asked to rank the contestants in a preferential order of pulchritude. The results are shown in the table. Answer parts (a) and (b) using (i) the Kendall tau-coefficient procedures and (ii) the Spearman rank-correlation-coefficient procedures:

                  Contestant
Judge    A    B    C    D    E    F    G    H
  1      2    1    3    5    4    8    7    6
  2      1    2    4    5    7    6    8    3

(a) Calculate the measure of association.
(b) Test the null hypothesis that the judges ranked the contestants independently (use tables).
(c) Find a 95 percent confidence-interval estimate of $\tau$.
11.2. Verify the result given in (4.9).
11.3. Two independent random samples of sizes m and n contain no ties. A set of m + n paired observations can be derived from these data by arranging the combined samples in ascending order of magnitude and (a) assigning ranks, (b) assigning sample indicators. Show that Kendall’s tau, calculated for these pairs without a correction for ties, is linearly related to the Mann-Whitney U statistic for these data, and find the relation if the sample indicators are (i) sample numbers 1 and 2, (ii) 1 for the first sample and 0 for the second sample as in the Z vector of Chapter 7.
11.4. Show that for the standardized bivariate normal distribution
$$\Phi(0, 0) = \frac{1}{4} + \frac{1}{2\pi} \arcsin \rho$$
11.5. The Census Bureau reported that Hispanics are expected to overtake blacks as the largest minority in the United States by the year 2030. Use two different tests to see whether there is a direct relationship between number of Hispanics and percent of state population for the nine states below.

State          Hispanics (millions)    Percent of state population
California             6.6                        23
Texas                  4.1                        24
New York               2.1                        12
Florida                1.5                        12
Illinois               0.8                         7
Arizona                0.6                        18
New Jersey             0.6                         8
New Mexico             0.5                        35
Colorado               0.4                        11
11.6. Company-financed expenditures in manufacturing on research and development (R&D) are currently about 2.7 percent of sales in Japan and 2.8 percent of sales in the United States. However, when these figures are looked at separately according to industry, the following data from Mansfield (1989) show some large differences.

Industry                       Japan    United States
Food                            0.8         0.4
Textiles                        1.2         0.5
Paper                           0.7         1.3
Chemicals                       3.8         4.7
Petroleum                       0.4         0.7
Rubber                          2.9         2.2
Ferrous metals                  1.9         0.5
Nonferrous metals               1.9         1.4
Metal products                  1.6         1.3
Machinery                       2.7         5.8
Electrical equipment            5.1         4.8
Motor vehicles                  3.0         3.2
Other transport equipment       2.6         1.2
Instruments                     4.5         9.0
(a) Use the signed-rank test to determine whether Japan spends a larger percentage than the United States on R&D.
(b) Determine whether there is a significant positive relationship between percentages spent by Japan and the United States (two different methods).
11.7. The World Almanac and Book of Facts published the following divorce rates per 1000 population in the United States. Determine whether these data show a positive trend using four different methods.

Year    Divorce rate
1945        3.5
1950        2.6
1955        2.3
1960        2.2
1965        2.5
1970        3.5
1975        4.8
1980        5.2
1985        5.0
11.8. For the time series data in Example 4.1 of Chapter 3, use the Daniels’ test based on Spearman’s rank correlation coefficient to see if the data show a positive trend.
11.9. Do Problem 11.8 using the Mann test based on Kendall’s tau.
11.10. The rainfall measured by each of 12 gauges was recorded for 20 successive days. The average results for each day are as follows:

Day         Rainfall      Day         Rainfall
April 1       0.00        April 11      2.10
April 2       0.03        April 12      2.25
April 3       0.05        April 13      2.50
April 4       1.11        April 14      2.50
April 5       0.00        April 15      2.51
April 6       0.00        April 16      2.60
April 7       0.02        April 17      2.50
April 8       0.06        April 18      2.45
April 9       1.15        April 19      0.02
April 10      2.00        April 20      0.00

Use an appropriate test to determine whether these data exhibit some sort of pattern. Find the P value:
(a) Using tests based on runs with both the exact distribution and the normal approximation.
(b) Using other tests that you may think are appropriate.
(c) Compare and interpret the results of (a) and (b).
11.11. A company has administered a screening aptitude test to 20 new employees over a two-year period. The record of scores and date on which the person was hired are shown below.

Date      Score    Date       Score    Date       Score    Date       Score
1/4/01     75      9/21/01     72      12/9/01     81      5/10/02     91
3/9/01     74      10/4/01     77      1/22/02     93      7/17/02     95
6/3/01     71      10/9/01     76      1/26/02     82      9/12/02     90
6/15/01    76      11/1/01     78      3/21/02     84      10/4/02     92
8/4/01     98      12/5/01     80      4/6/02      89      12/6/02     93

Assuming that these test scores are the primary criterion for hiring, do you think that over this time period the screening procedure has changed, or the personnel agent has changed, or supply has changed, or what? Base your answer on an appropriate nonparametric procedure (there are several appropriate methods).
11.12. Ten randomly chosen male college students are used in an experiment to investigate the claim that physical strength is decreased by fatigue. Describe the relationship for the data below, using several methods of analysis.
Minutes between rest periods    Pounds lifted per minute
            5.5                          350
            9.6                          230
            2.4                          540
            4.4                          390
            0.5                          910
            7.9                          220
            2.0                          680
            3.3                          590
           13.1                           90
            4.2                          520

11.13. Given a single series of time-ordered ordinal observations over several years, name some nonparametric procedures that could be used to detect a long-term positive trend, and explain how each would be used. Name as many as you can think of.
11.14. Six randomly selected mice are studied over time and scored on an ordinal basis for intelligence and social dominance. The data are as follows:

Mouse    Intelligence    Social dominance
  1           45                63
  2           26                 0
  3           20                16
  4           40                91
  5           36                25
  6           23                 2

(a) Find the coefficient of rank correlation.
(b) Find the appropriate one-tailed P value for your result in (a).
(c) Find the Kendall tau coefficient.
(d) Find the appropriate one-tailed P value for your result in (c).
11.15. A board of marketing executives ranked 10 similar products, and an “independent” group of male consumers also ranked the products. Use two different nonparametric procedures to describe the correlation between rankings and find a one-tailed P value in each case. State the hypothesis and alternative and all assumptions. Compare and contrast the procedures.

Product                    A    B    C    D    E    F    G    H    I    J
Executive ranks            9    4    3    7    2    1    5    8   10    6
Independent male ranks     7    6    5    9    2    3    8    5   10    1

11.16. Derive the null distribution of both Kendall’s tau statistic and Spearman’s rho for n = 3 assuming no ties.
11.17. A scout for a professional baseball team ranks nine players separately in terms of speed and power hitting, as shown below.

Player    Speed ranking    Power-hitting ranking
  A            3                    1
  B            1                    3
  C            5                    4
  D            6                    2
  E            2                    6
  F            7                    8
  G            8                    9
  H            4                    5
  I            9                    7

(a) Find the rank correlation coefficient and the appropriate one-tailed P value.
(b) Find the Kendall tau coefficient and the appropriate one-tailed P value.
11.18. Twenty-three subjects are asked to give their attitude toward elementary school integration and their number of years of schooling completed. The data are shown below.

Attitude toward elementary school integration
Number of years of school completed at the time: 0–6; 7–9; 10–12 or G.E.D.; Some college; College degree (4 yr); Some Graduate; Graduate degree

Strongly disagree    Moderately disagree    Moderately agree    Strongly agree
        5                     9                    12                 16
        4                    10                    13                 18
       10                     7                     9                 12
       12                    12                    12                 19
        3                    12                    16                 14
                             10                    15
                                                   14

As a measure of the association between attitude and number of years of schooling completed:
(a) Compute Kendall’s tau with correction for ties.
(b) Compute Spearman’s R with correction for ties.
12 Measures of Association in Multiple Classiﬁcations
12.1 INTRODUCTION
Suppose we have a set of data presented in the form of a complete two-way layout of I rows and J columns, with one entry in each of the IJ cells. In the sampling situation of Chapter 10, if the independent samples drawn from each of I univariate populations were all of the same size J, we would have a complete layout of IJ cells. However, this would be called a one-way layout since only one factor is involved, the populations. Under the null hypothesis of identical populations, the data can be considered a single random sample of size IJ from the common population. The parallel to this problem in classical statistics is the one-way analysis of variance. In this chapter we shall study some nonparametric analogs of the two-way analysis-of-variance problem, all parallel in the sense that the data are presented in the form of a two-way layout which cannot be considered a single random sample because of certain relationships among elements.
Let us first review the techniques of the analysis-of-variance approach to testing the null hypothesis that the column effects are all the same. The model is usually written
$$X_{ij} = \mu + \beta_i + \theta_j + E_{ij} \qquad \text{for } i = 1, 2, \ldots, I \text{ and } j = 1, 2, \ldots, J$$
The $\beta_i$ and $\theta_j$ are known as the row and column effects, respectively. In the normal-theory model, the errors $E_{ij}$ are independent, normally distributed random variables with mean zero and variance $\sigma_E^2$. The test statistic for the null hypothesis of equal column effects or, equivalently,
$$H_0\colon \theta_1 = \theta_2 = \cdots = \theta_J$$
is the ratio
$$\frac{(I-1)\, I \sum_{j=1}^{J} (\bar{x}_{\cdot j} - \bar{x})^2}{\sum_{i=1}^{I} \sum_{j=1}^{J} (x_{ij} - \bar{x}_{i \cdot} - \bar{x}_{\cdot j} + \bar{x})^2}$$
where
$$\bar{x}_{i \cdot} = \sum_{j=1}^{J} \frac{x_{ij}}{J} \qquad \bar{x}_{\cdot j} = \sum_{i=1}^{I} \frac{x_{ij}}{I} \qquad \bar{x} = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{x_{ij}}{IJ}$$
If all the assumptions of the model are met, this test statistic has the F distribution with $J - 1$ and $(I-1)(J-1)$ degrees of freedom.
The first two parallels of this design which we shall consider are the k-related or k-matched sample problems. The matching can arise in two different ways, but both are somewhat analogous to the randomized-block design of a two-way layout. In this design, IJ experimental units are grouped into I blocks, each containing J units. A set of J treatments is assigned at random to the units within each block in such a way that all J! assignments are equally likely, and the assignments in different blocks are independent. The scheme of grouping into blocks is important, since the purpose of such a design is to minimize the differences between units in the same block. If the design is successful, an estimate of experimental error can be obtained which is not inflated by differences between blocks. This model is often appropriate in agricultural field experimentation since the effects of a possible fertility gradient can be reduced. Dividing the field into I blocks, the plots within each block can be kept in close proximity. Any differences between plots within the same block can be attributed to differences between treatments and the block effect can be eliminated from the estimate of experimental error.
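The F ratio reviewed above can be computed directly from its definition. The following is a small sketch for a layout with one observation per cell; the function name and the 3 × 3 data set are hypothetical and serve only to illustrate the arithmetic.

```python
def two_way_f(x):
    """F ratio for equal column effects in an I x J layout with one
    observation per cell; x is a list of I rows, each of length J."""
    I, J = len(x), len(x[0])
    row_mean = [sum(row) / J for row in x]
    col_mean = [sum(x[i][j] for i in range(I)) / I for j in range(J)]
    grand = sum(map(sum, x)) / (I * J)
    # Numerator sum: I times the squared deviations of column means
    ss_col = I * sum((cm - grand) ** 2 for cm in col_mean)
    # Denominator sum: squared residuals after removing row and column means
    ss_err = sum((x[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
                 for i in range(I) for j in range(J))
    f = (I - 1) * ss_col / ss_err
    return f, (J - 1, (I - 1) * (J - 1))

# Hypothetical layout: 3 blocks (rows) by 3 treatments (columns)
x = [[10, 12, 15],
     [8, 11, 12],
     [11, 14, 16]]
f, df = two_way_f(x)
print(f, df)
```

The returned pair of degrees of freedom is the one needed to refer the ratio to the F distribution when the normal-theory assumptions hold.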
The first related-sample problem arises where IJ subjects are grouped into I blocks each containing J matched subjects, and within each block J treatments are assigned randomly to the matched subjects. The effects of the treatments are observed, and we let $X_{ij}$ denote the observation in block i of treatment number j; $i = 1, 2, \ldots, I$; $j = 1, 2, \ldots, J$. Since the observations in different blocks are independent, the entries in column number j are independent. In order to determine whether the treatment (column) effects are all the same, the analysis-of-variance test is appropriate if the requisite assumptions are justified. If the observations in each row $X_{i1}, X_{i2}, \ldots, X_{iJ}$ are replaced by their ranking within that row, a nonparametric test involving the column sums of this $I \times J$ table, called Friedman’s two-way analysis of variance by ranks, can be used to test the same hypothesis. This is a k-related sample problem when J = k. This design is sometimes called a balanced complete block design and also a repeated measures design. The null hypothesis is that the treatment effects are all equal or
$$H_0\colon \theta_1 = \theta_2 = \cdots = \theta_J$$
and the alternative for the Friedman test is
$$H_1\colon \theta_i \neq \theta_j \qquad \text{for at least one } i \neq j$$
A related nonparametric test for the k-related sample problem is called Page’s test for ordered alternatives. The null hypothesis is the same as above but the alternative specifies the treatment effects as occurring in a specific order, as for example,
$$H_1\colon \theta_1 < \theta_2 < \cdots < \theta_J$$
For each of these problems the location model is that the cdf of $X_{ij}$ is $F(x - \beta_i - \theta_j)$.
Another related-sample problem arises by considering a single group of J subjects, each of which is observed under I different conditions. The matching here is by condition rather than subject, and the observation $X_{ij}$ denotes the effect of condition i on subject number j; $i = 1, 2, \ldots, I$; $j = 1, 2, \ldots, J$. We have here a random sample of size J from an I-variate population. Under the null hypothesis that the I variates are independent, the expected sum of the I observations on subject number j is the same for all $j = 1, 2, \ldots, J$. In order to determine whether the column effects are all the same, the analysis-of-variance test may be appropriate. Testing the independence of the I variates involves a comparison of J column totals,
so that in a sense the roles of treatments and blocks have been reversed in terms of which factor is of interest. This is a k-related sample problem when I = k. If the observations in each row are ranked as before, Friedman’s two-way analysis of variance will provide a nonparametric test of independence of the k variates. Thus, in order to effect consistency of results as opposed to consistency of sampling situations, the presentation here in both cases will be for a table containing k rows and n columns, where each row is a set of positive integer ranks.
In this second related-sample problem, particularly if the null hypothesis of the independence of the k variates is rejected, a measure of the association between the k variates would be desirable. In fact, this sampling situation is the direct extension of the paired-sample problem of Chapter 11 to the k-related sample case. A measure of the overall agreement between the k sets of rankings, called Kendall’s coefficient of concordance, can be determined. This statistic can also be used to test the null hypothesis of independence, but the test is equivalent to Friedman’s test for n treatments and k blocks. An analogous measure of concordance will be found for k sets of incomplete rankings, which relate to the balanced incomplete-blocks design. Another topic to be treated briefly is a nonparametric approach to finding a measure of partial correlation, that is, the correlation between two variables when a third is held constant, when there are three complete sets of rankings of n objects.
12.2 FRIEDMAN’S TWO-WAY ANALYSIS OF VARIANCE BY RANKS IN A k × n TABLE AND MULTIPLE COMPARISONS
As suggested in Section 12.1, in the first related-sample problem we have data presented in the form of a two-way layout of k rows and n columns. The rows indicate block, subject, or sample numbers, and the columns are treatment numbers. The observations in different rows are independent, but the columns are not because of some unit of association. In order to avoid making the assumptions requisite for the usual analysis of variance test that the n treatments are the same, Friedman (1937, 1940) suggested replacing each treatment observation within the ith block by a number from the set $\{1, 2, \ldots, n\}$ which represents that treatment’s magnitude relative to the other observations in the same block. We denote the ranked observations by $R_{ij}$; $i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, n$, so that $R_{ij}$ is the rank of treatment number j when observed in block number i. Then $R_{i1}, R_{i2}, \ldots, R_{in}$ is a permutation of the first n integers, and $R_{1j}, R_{2j}, \ldots, R_{kj}$ is the set of
ranks given to treatment number j in all blocks. We represent the data in tabular form as follows:

                          Treatments
                   1       2      ...      n       Row totals
            1     R_11    R_12    ...     R_1n     n(n+1)/2
            2     R_21    R_22    ...     R_2n     n(n+1)/2
Blocks      .      .       .               .          .
            .      .       .               .          .
            k     R_k1    R_k2    ...     R_kn     n(n+1)/2
Column totals     R_1     R_2     ...     R_n      kn(n+1)/2          (2.1)
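The layout in (2.1), and the statistic S of (2.2) that is built from its column totals, can be sketched in a few lines. The code below ranks the observations within each block, forms S as the sum of squared deviations of the column totals from k(n + 1)/2, and enumerates the $(n!)^k$ equally likely rank tables to obtain the exact null distribution. It assumes no ties within a block; the function names and the 3 × 3 data set are hypothetical, and the enumeration is feasible only for very small k and n.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product

def rank_within_blocks(data):
    """Replace each row (block) by the ranks 1..n; assumes no ties in a row."""
    ranked = []
    for row in data:
        order = sorted(range(len(row)), key=lambda j: row[j])
        r = [0] * len(row)
        for rank, j in enumerate(order, start=1):
            r[j] = rank
        ranked.append(r)
    return ranked

def friedman_s(ranked):
    """Sum of squared deviations of the column totals from k(n+1)/2."""
    k, n = len(ranked), len(ranked[0])
    totals = [sum(row[j] for row in ranked) for j in range(n)]
    mean_total = Fraction(k * (n + 1), 2)
    return sum((t - mean_total) ** 2 for t in totals)

def exact_null_distribution(k, n):
    """Enumerate all (n!)^k rank tables, each equally likely under H0."""
    rows = list(permutations(range(1, n + 1)))
    counts = Counter(friedman_s(table) for table in product(rows, repeat=k))
    total = len(rows) ** k
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}

# Hypothetical observations: k = 3 blocks (rows), n = 3 treatments (columns)
data = [[7.1, 8.4, 9.0],
        [6.5, 7.0, 6.8],
        [5.9, 6.3, 7.7]]
ranked = rank_within_blocks(data)
s_obs = friedman_s(ranked)
dist = exact_null_distribution(k=3, n=3)
p_value = sum(p for s, p in dist.items() if s >= s_obs)
print(ranked, s_obs, p_value)
```

Exact fractions are used so that the enumerated probabilities can be compared with tabulated values without rounding error.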
The row totals are of course constant, but the column totals are affected by differences between treatments. If the treatment effects are all the same, each expected column total is the same and equals the average column total $k(n+1)/2$. The sum of deviations of observed column totals around this mean is zero, but the sum of squares of these deviations will be indicative of the differences in treatment effects. Therefore we shall consider the sampling distribution of the random variable
$$S = \sum_{j=1}^{n} \left[ R_j - \frac{k(n+1)}{2} \right]^2 = \sum_{j=1}^{n} \left[ \sum_{i=1}^{k} \left( R_{ij} - \frac{n+1}{2} \right) \right]^2 \qquad (2.2)$$
under the null hypothesis of no difference between the n treatment effects, that is,
$$H_0\colon \theta_1 = \theta_2 = \cdots = \theta_n$$
For this null case, in the ith block the ranks are assigned completely at random, and each row in the two-way layout constitutes a random permutation of the first n integers if there are no ties. There are then a total of $(n!)^k$ distinguishable sets of entries in the $k \times n$ table, and each is equally likely. These possibilities can be enumerated and the value of S calculated for each. The probability distribution of S then is
$$f_S(s) = \frac{u_s}{(n!)^k}$$
where $u_s$ is the number of those assignments which yield s as the sum of squares of column total deviations. A systematic method of
generating the values of $u_s$ for n, k from the values of $u_s$ for n, k − 1 can be employed (see Kendall and Gibbons, 1990, pp. 150–151). A table of the distribution of S is given here in Table N of the Appendix for n = 3, k ≤ 8 and n = 4, k ≤ 4. More extensive tables for the distribution of Q, a linear function of S to be defined later in (2.8), are given in Owen (1962) for n = 3, k ≤ 15 and n = 4, k ≤ 8. Other tables are given in Michaelis (1971), Quade (1972), and Odeh (1977) that cover the cases up to k = 6, n = 6. However, the calculations are considerable even using the systematic approach. Therefore, outside the range of existing tables, an approximation to the null distribution is generally used for tests of significance. Using the symbol $\mu$ to denote $(n+1)/2$, (2.2) can be written as
S¼
n X k X
ðRij mÞ2 þ 2
ðRij mÞðRpj mÞ
j¼1 1 4 i