Exploratory data analysis (EDA) provides a perspective and a set of tools for searching the data for clues and patterns. EDA augments rather than supplants traditional statistics. In addition to numerical summaries of location, spread, and shape, EDA uses visual displays to give a complete and accurate impression of distributions and of relationships between variables. Frequency tables arrange data from lowest to highest values, with counts and percentages for each value. They are most useful for inspecting the range of responses and how often each response recurs. Bar charts and pie charts are appropriate for relative comparisons of nominal data. Histograms are best suited to continuous variables, where responses are grouped into intervals. The Pareto diagram is a bar chart whose percentages sum to 100 percent; the causes of the problem under investigation are sorted in decreasing order of importance, so bar height descends from left to right. Stem-and-leaf displays and boxplots are EDA techniques that provide visual representations of distributions. The former present actual data values in a histogram-like arrangement that allows inspection of spread and shape. Boxplots use the five-number summary (minimum, lower quartile, median, upper quartile, maximum) to convey a detailed picture of a distribution's main body, tails, and outliers. Both stem-and-leaf displays and boxplots rely on resistant statistics to overcome the limitations of descriptive measures that are sensitive to extreme scores.
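The tabular and resistant summaries described above can be computed with a few lines of code. The following is a minimal sketch in standard-library Python, not a method prescribed by the text. All function names and data are hypothetical. Quartiles come from `statistics.quantiles(..., method="inclusive")`, which is only one of several quartile conventions and may differ slightly from Tukey's hinges. The 1.5 × IQR outlier fence is the common boxplot convention, assumed here. The stem-and-leaf helper assumes non-negative integers with tens-digit stems.

```python
from collections import Counter
from statistics import quantiles

def frequency_table(values):
    """(value, count, percent, cumulative percent) rows, lowest value first."""
    counts = Counter(values)
    n, running, rows = len(values), 0, []
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value], 100 * counts[value] / n, 100 * running / n))
    return rows

def pareto(cause_counts):
    """Causes in decreasing order of frequency; cumulative percent ends at 100."""
    total, running, rows = sum(cause_counts.values()), 0, []
    for cause, count in sorted(cause_counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((cause, count, 100 * count / total, 100 * running / total))
    return rows

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum (quartiles by linear interpolation)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

def outliers(values, k=1.5):
    """Values lying more than k * IQR beyond the quartiles (assumed fence rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def stem_and_leaf(values):
    """Map each tens-digit stem to its sorted units-digit leaves (non-negative ints).
    Empty stems are kept so that gaps in the distribution stay visible."""
    ordered = sorted(values)
    display = {stem: [] for stem in range(ordered[0] // 10, ordered[-1] // 10 + 1)}
    for v in ordered:
        display[v // 10].append(v % 10)
    return display

# Hypothetical scores, including one extreme value (95).
scores = [12, 15, 21, 22, 22, 25, 28, 31, 33, 35, 38, 41, 95]
print(five_number_summary(scores))
print(outliers(scores))
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {''.join(map(str, leaves))}")
```

The quartiles illustrate resistance. Replacing the extreme score 95 with 45 leaves Q1, the median, and Q3 unchanged, whereas the mean would shift noticeably.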
Relationships involving categorical variables are examined with cross-tabulation. The tables used for this purpose consist of cells and marginals (row and column totals). The cells may contain combinations of counts and row, column, and total percentages. The tabular structure is the framework for later statistical testing. Computer software for cross-classification analysis makes table-based analysis with one or more control variables an efficient tool for data visualization and later decision making. An advanced variation on n-way tables is automatic interaction detection (AID).
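To make the cell and marginal structure concrete, here is a minimal standard-library Python sketch. It is not the cross-classification software the text refers to, and it does not implement AID. The function names `crosstab` and `crosstab_by_control` and the survey data are hypothetical. A control variable is handled by assumption in the simplest way: the data are split into one two-way table per level of the control variable.

```python
from collections import Counter

def crosstab(pairs):
    """Cross-tabulate (row_category, column_category) pairs.

    Returns per-cell count plus row, column, and total percentages,
    together with the row and column marginals and the grand total."""
    cells = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    row_totals = {r: sum(cells[(r, c)] for c in cols) for r in rows}
    col_totals = {c: sum(cells[(r, c)] for r in rows) for c in cols}
    n = len(pairs)
    table = {
        (r, c): {
            "count": cells[(r, c)],
            "row_pct": 100 * cells[(r, c)] / row_totals[r],
            "col_pct": 100 * cells[(r, c)] / col_totals[c],
            "total_pct": 100 * cells[(r, c)] / n,
        }
        for r in rows
        for c in cols
    }
    return table, row_totals, col_totals, n

def crosstab_by_control(triples):
    """One two-way table per level of a control variable: (control, row, column)."""
    groups = {}
    for control, r, c in triples:
        groups.setdefault(control, []).append((r, c))
    return {level: crosstab(pairs) for level, pairs in sorted(groups.items())}

# Hypothetical survey: (gender, purchased?) for 100 respondents.
survey = ([("F", "yes")] * 30 + [("F", "no")] * 10
          + [("M", "yes")] * 20 + [("M", "no")] * 40)
table, row_totals, col_totals, n = crosstab(survey)
for (r, c), cell in table.items():
    print(r, c, cell["count"], f'{cell["row_pct"]:.1f}%',
          f'{cell["col_pct"]:.1f}%', f'{cell["total_pct"]:.1f}%')
print("row marginals:", row_totals, "column marginals:", col_totals, "n =", n)
```

In this data, 75 percent of women purchased (a row percentage), while women account for 60 percent of all purchasers (a column percentage). The same count of 30 answers two different questions depending on which marginal serves as the base.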