Descriptive Statistics and Measures of Central Tendency and Dispersion
© 1998 by Dr. Thomas W. MacFarland -- All Rights Reserved
************ cent_tnd.doc ************

Background:

Quite often when examining data and relationships between data, it is useful to offer a general view of the data. Imagine an array of data representing final examination test scores in a computer science education class:

   -- How many students sat for the test?

   -- What is the average test score, and are there multiple definitions of the term "average"?

   -- Did most test scores come close to the average, or was there a wide degree of variance in test scores?

   -- What was the range of test scores, from the lowest test score to the highest test score?

The following listing identifies a series of statistical measures of central tendency (closeness to the "average" score) and dispersion (spread or variance in the range of scores away from the average score) typically used in the social sciences:

   1. Measures of central tendency, or closeness to the average score:

      A. Mode ...... most frequent score
      B. Median .... mid-point of the ordered array of scores
      C. Mean ...... arithmetic average (Sum/N)

      In the "perfect" bell-shaped (normal) curve, all three measures of central tendency would be equal.

   2. Measures of dispersion, spread, or variance in the range of scores away from the average score:

      A. Variance ... the sum of squared deviations from the mean, divided by N - 1 (the sample variance reported by SPSS and Minitab)
      B. SD ......... the standard deviation, or the square root of the variance
      C. Range ...... the spread from the lowest score to the highest score

It is common to present these descriptive statistics in a summary listing, to give the reader a general view of the data.
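The definitions above can be written out directly. The sketch below is a minimal illustration, not part of the original tutorial: basic_measures is a hypothetical helper written for this example, the five scores are made up, the variance uses the N - 1 denominator that the SPSS and Minitab output in this tutorial reports, and when several scores tie for most frequent it returns the smallest one.

```python
import math
from collections import Counter


def basic_measures(scores):
    """Apply the textbook definitions of central tendency and dispersion."""
    n = len(scores)
    ordered = sorted(scores)

    # Mode: most frequent score (the smallest one if several tie).
    counts = Counter(scores)
    top = max(counts.values())
    mode = min(value for value, count in counts.items() if count == top)

    # Median: mid-point of the ordered array (mean of the two middle
    # scores when N is even).
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

    # Mean: arithmetic average, Sum / N.
    mean = sum(scores) / n

    # Variance: sum of squared deviations from the mean, divided by N - 1.
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)

    return {
        "N": n,
        "Mode": mode,
        "Median": median,
        "Mean": mean,
        "Variance": variance,
        "SD": math.sqrt(variance),          # square root of the variance
        "Range": ordered[-1] - ordered[0],  # highest score minus lowest score
    }


# Five made-up scores, chosen only to illustrate the definitions.
print(basic_measures([70, 80, 80, 90, 100]))
```

For these five scores the mode and median are both 80 while the mean is 84, a reminder that the three "averages" need not coincide when scores are not symmetrically distributed.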
In our current example, you would typically identify:

   -- N, or the number of valid final examination test scores
   -- Average final examination test score, expressed three ways:
      -- Mode
      -- Median
      -- Mean
   -- Variance in final examination test scores
   -- SD (Std dev), or standard deviation
   -- Range of scores, from the lowest test score to the highest test score

This information gives a far more complete description of test results than merely stating that "the average test score was 80 out of 100."

Scenario:

A computing technology teacher administered a final examination at the end of a nine-week term. In an attempt to better understand the progress of her students, she prepared a data file and then used leading software products (SPSS and Minitab) to examine final examination outcomes. Scores (potentially ranging from 000 to 100) for her 23 students are presented in Table 1:

Table 1
Scores for a Computing Technology Final Examination
===================================================
        Student Number              Score
---------------------------------------------------
              01                     089
              02                     092
              03                     073
              04                     083
              05                     056
              06                     082
              07                     077
              08                     092
              09                     100
              10                     067
              11                     071
              12                     076
              13                     083
              14                     086
              15                     077
              16                     049
              17                     071
              18                     084
              19                     091
              20                     088
              21                     082
              22                     077
              23                     097
___________________________________________________

Files:

   1. cent_tnd.doc
   2. cent_tnd.dat
   3. cent_tnd.r01
   4. cent_tnd.o01
   5. cent_tnd.con
   6. cent_tnd.lis

Command:

At the UNIX prompt (%), key:

   %spss -m < cent_tnd.r01 > cent_tnd.o01

Contact your system administrator if you need to use another command to run SPSS-X in batch mode. Of course, slight modifications may be necessary if you use SPSS on a PC.
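If SPSS-X or Minitab is not at hand, the statistics requested by FREQUENCIES ... / STATISTICS = All in cent_tnd.r01, and those printed by Minitab's describe command in cent_tnd.lis, can be cross-checked with a short script. This is a sketch under stated assumptions: the scores are typed in from Table 1; skewness and kurtosis use the bias-adjusted sample formulas (often written G1 and G2) with their usual standard errors, which are assumed to match SPSS and do reproduce the values in cent_tnd.o01 for these data; and the 5% trimmed mean and (n + 1)p quartile positions are assumptions about how Minitab computes TRMEAN, Q1, and Q3. full_summary is a hypothetical helper name.

```python
import math
import statistics

# Final examination scores for the 23 students in Table 1 (student 01 first).
SCORES = [89, 92, 73, 83, 56, 82, 77, 92, 100, 67, 71, 76,
          83, 86, 77, 49, 71, 84, 91, 88, 82, 77, 97]


def full_summary(x):
    """Recompute the SPSS FREQUENCIES and Minitab describe statistics."""
    n = len(x)
    ordered = sorted(x)
    mean = statistics.mean(x)
    sd = statistics.stdev(x)  # sample SD: N - 1 in the denominator

    # Standardized scores feed the bias-adjusted skewness and kurtosis.
    z = [(v - mean) / sd for v in x]
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = math.sqrt(4 * (n ** 2 - 1) * se_skew ** 2 / ((n - 3) * (n + 5)))

    # Assumed Minitab conventions: trim 5% of the scores from each end
    # (rounded to a whole number of scores) and place quartiles at (n + 1)p.
    k = round(0.05 * n)
    trmean = statistics.mean(ordered[k:n - k])
    q1, _, q3 = statistics.quantiles(x, n=4, method="exclusive")

    return {
        "N": n,
        "Sum": sum(x),
        "Mean": mean,
        "Std err": sd / math.sqrt(n),
        "Median": statistics.median(x),
        "Mode": statistics.mode(x),
        "Std dev": sd,
        "Variance": statistics.variance(x),
        "Skewness": skew,
        "S E Skew": se_skew,
        "Kurtosis": kurt,
        "S E Kurt": se_kurt,
        "Range": ordered[-1] - ordered[0],
        "Minimum": ordered[0],
        "Maximum": ordered[-1],
        "TRMEAN": trmean,
        "Q1": q1,
        "Q3": q3,
    }


for label, value in full_summary(SCORES).items():
    print(f"{label:<10}{value:10.3f}")
```

Run as a script, it prints each statistic to three decimal places; for these 23 scores the values agree with cent_tnd.o01 and cent_tnd.lis to the precision those files report. statistics.quantiles requires Python 3.8 or later.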
************ cent_tnd.dat ************

                   01                 089
                   02                 092
                   03                 073
                   04                 083
                   05                 056
                   06                 082
                   07                 077
                   08                 092
                   09                 100
                   10                 067
                   11                 071
                   12                 076
                   13                 083
                   14                 086
                   15                 077
                   16                 049
                   17                 071
                   18                 084
                   19                 091
                   20                 088
                   21                 082
                   22                 077
                   23                 097

************ cent_tnd.r01 ************

SET WIDTH = 80
SET LENGTH = NONE
SET CASE = UPLOW
SET HEADER = NO
TITLE = Descriptive Statistics and Central Tendency
COMMENT = This file examines scores on a computing
          technology final examination
DATA LIST FILE = 'cent_tnd.dat' FIXED
          / Stu_Code 20-21
            Score 39-41
Variable Labels
          Stu_Code "Student Code"
        / Score "Exam Score "


FREQUENCIES VARIABLES = Score
        / STATISTICS = All

************ cent_tnd.o01 ************

    1  SET WIDTH = 80
    2  SET LENGTH = NONE
    3  SET CASE = UPLOW
    4  SET HEADER = NO
    5  TITLE = Descriptive Statistics and Central Tendency
    6  COMMENT = This file examines scores on a computing
    7            technology final examination
    8  DATA LIST FILE = 'cent_tnd.dat' FIXED
    9            / Stu_Code 20-21
   10              Score 39-41
   11

This command will read 1 records from cent_tnd.dat

 Variable   Rec   Start     End   Format

 STU_CODE     1      20      21   F2.0
 SCORE        1      39      41   F3.0

   12  Variable Labels
   13            Stu_Code "Student Code"
   14          / Score "Exam Score "
   15
   16
   17  FREQUENCIES VARIABLES = Score
   18          / STATISTICS = All

SCORE     Exam Score

                                           Valid      Cum
Value Label   Value  Frequency  Percent  Percent  Percent

                 49          1      4.3      4.3      4.3
                 56          1      4.3      4.3      8.7
                 67          1      4.3      4.3     13.0
                 71          2      8.7      8.7     21.7
                 73          1      4.3      4.3     26.1
                 76          1      4.3      4.3     30.4
                 77          3     13.0     13.0     43.5
                 82          2      8.7      8.7     52.2
                 83          2      8.7      8.7     60.9
                 84          1      4.3      4.3     65.2
                 86          1      4.3      4.3     69.6
                 88          1      4.3      4.3     73.9
                 89          1      4.3      4.3     78.3
                 91          1      4.3      4.3     82.6
                 92          2      8.7      8.7     91.3
                 97          1      4.3      4.3     95.7
                100          1      4.3      4.3    100.0
                       -------  -------  -------
              Total         23    100.0    100.0

Mean        80.130    Std err      2.546    Median      82.000
Mode        77.000    Std dev     12.211    Variance   149.119
Kurtosis      .914    S E Kurt      .935    Skewness     -.813
S E Skew      .481    Range       51.000    Minimum     49.000
Maximum    100.000    Sum       1843.000

Valid cases       23      Missing cases       0

************ cent_tnd.con ************

Conclusion:

Descriptive statistics and measures of central tendency for final examination test scores
follow:

    N    Mode    Median    Mean      SD    Range
==================================================
   23      77        82    80.1    12.2    51: 49 to 100

Far greater detail (perhaps too much detail) on descriptive statistics for final examination test scores can be found at the end of the output file (cent_tnd.o01). As you examine this section of the output file, be sure to notice that:

   -- N = 23, which is to say that 23 students had scores for this examination.

   -- Three separate values were provided for the "average" score:

      -- Mode (most frequent score) was 77
      -- Median (mid-point of the ordered array of all final examination scores) was 82
      -- Mean (arithmetic average, or Sum of all final examination scores / Number of final examination scores) was 80.1

   -- Dispersion is expressed by two leading statistics:

      -- Standard Deviation (Std dev or SD, representing dispersion of final examination scores away from the mean) was 12.2
      -- Range in final examination scores was 51, from a minimum score of 49 to a maximum score of 100

Each statistic helps place these outcomes in context. Although it is very common to see only N, Mean, and SD presented in the literature, the other statistics presented above give a more complete picture of outcomes.

************ cent_tnd.lis ************

% minitab
MTB > outfile 'cent_tnd.lis'
Collecting Minitab session in file: cent_tnd.lis
MTB > # MINITAB addendum to cent_tnd.dat
MTB > read 'cent_tnd.dat' c1 c2
Entering data from file: cent_tnd.dat
     23 rows read.
MTB > print c1

C1
    1    2    3    4    5    6    7    8    9   10   11   12   13
   14   15   16   17   18   19   20   21   22   23

MTB > print c2

C2
   89   92   73   83   56   82   77   92  100   67   71   76   83
   86   77   49   71   84   91   88   82   77   97

MTB > histogram c2

Histogram of C2   N = 23

Midpoint   Count
      50       1  *
      55       1  *
      60       0
      65       1  *
      70       2  **
      75       5  *****
      80       2  **
      85       4  ****
      90       5  *****
      95       1  *
     100       1  *

MTB > stem-and-leaf c2

Stem-and-leaf of C2         N  = 23
Leaf Unit = 1.0

    1    4 9
    1    5
    2    5 6
    2    6
    3    6 7
    6    7 113
   10    7 6777
  (5)    8 22334
    8    8 689
    5    9 122
    2    9 7
    1   10 0

MTB > dotplot c2

(Dotplot of C2, horizontal axis C2 from 50 to 100)

MTB > tally c2

     C2  COUNT
     49      1
     56      1
     67      1
     71      2
     73      1
     76      1
     77      3
     82      2
     83      2
     84      1
     86      1
     88      1
     89      1
     91      1
     92      2
     97      1
    100      1
     N=     23

MTB > describe c2

                N     MEAN   MEDIAN   TRMEAN    STDEV   SEMEAN
C2             23    80.13    82.00    80.67    12.21     2.55

              MIN      MAX       Q1       Q3
C2          49.00   100.00    73.00    89.00

MTB > stop

--------------------------

Disclaimer:

All care was used to prepare the information in this tutorial. Even so, the author does not and cannot guarantee the accuracy of this information. The author disclaims any and all injury that may come about from the use of this tutorial. As always, students and all others should check with their advisor(s) and/or other appropriate professionals for any and all assistance on research design, analysis, selected levels of significance, and interpretation of output file(s).

The author is entitled to exclusive distribution of this tutorial. Readers have permission to print this tutorial for individual use, provided that the copyright statement appears and that there is no redistribution of this tutorial without permission.

Prepared 980316
Revised 980914

end-of-file 'cent_tnd.ssi'