Scientific Data Presentation: a Picture Is Worth a Thousand Words
Learn different ways to present data and what to consider when choosing among them.
Updated on September 9, 2020
Why are graphs and tables important in scientific publications?
Graphs and tables are powerful storytelling tools and are critical components of scientific publications. Readers often skip the main text of a manuscript and look only at the display items. Large, complex datasets that would be cumbersome to explain in words can be communicated quickly through tables and figures. It is therefore important that your display items clearly communicate your most important findings and can stand alone from the text. Tables are an easy way to summarize large amounts of data, and a well-composed figure can make a convincing argument simply by visualizing the data. Below we present some guidelines to consider when presenting data in a manuscript.
Different ways to present data and things to consider
1. Tables
Tables are a concise and effective way to present large amounts of data. If you simply want to summarize specific information or if your message requires precise values, you should use a table. Tables are also a convenient display tool when you have many different units of measure, which can be difficult to present in an easy-to-read manner in a graph.
A well-designed table should have:
- clearly defined categories divided into rows and columns
- sufficient spacing between columns and rows
- clearly defined units
- easy-to-read font type and size
- a clear and concise legend or caption
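The checklist above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `format_table`, the three-space column gap, and the sample measurements are assumptions invented for this example, not part of any journal's requirements.

```python
def format_table(headers, rows, caption):
    """Render rows as an aligned plain-text table with the caption on top."""
    # Convert every cell to text so that column widths can be measured.
    text_rows = [[str(cell) for cell in row] for row in rows]
    widths = [
        max([len(header)] + [len(row[i]) for row in text_rows])
        for i, header in enumerate(headers)
    ]
    gap = "   "  # spacing between columns (an arbitrary choice)

    def render(cells):
        padded = (cell.ljust(width) for cell, width in zip(cells, widths))
        return gap.join(padded).rstrip()

    lines = [caption, render(headers), render(["-" * w for w in widths])]
    lines.extend(render(row) for row in text_rows)
    return "\n".join(lines)


# Hypothetical measurements, invented for illustration only.
# Units live in the column headers so that every value is unambiguous.
headers = ["Sample", "Mass (g)", "Temperature (°C)"]
rows = [("A", 12.5, 21.0), ("B", 9.8, 22.4)]
table = format_table(headers, rows, "Table 1. Hypothetical sample measurements.")
print(table)
```

Putting the units in the headers rather than in each cell keeps the values themselves short and easy to compare down a column.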
Tables are a great way to present large amounts of data; however, they can take a long time to interpret and do not easily communicate trends, because the viewer must connect the dots between individual values to see them. When you want to show relationships in the data, illustrate trends, or make comparisons, data plots are a better choice.
2. Data plots
Data plots can quickly convey information from large quantities of data and are often used to show a functional or statistical relationship between two or more items.
Well-designed data plots should have:
- clearly labeled axes
- clearly specified units for the quantities displayed
- clearly defined plot elements in the legend (e.g., error bars, significance notations)
- easy-to-read font type and size
Data plots are used to display quantitative data, that is, objective measurements or counts, which can be either discrete or continuous. Examples of quantitative data include weight, height, temperature, and counts.
2.1 Discrete data
Discrete quantitative data are counts and cannot be meaningfully divided into smaller increments. For example, a single household can have 1 or 2 pets, but it cannot have 1.5. Only certain distinct values (typically whole numbers) can be recorded for a single observation.
Common display formats for discrete data include:
- Bar graphs: Bar graphs are a standard way to display discrete variables, where each bar represents the count or proportion of observations within a category.
Figure 1. Bar graph summarizing questionnaire responses.
- Line graphs: Line graphs are a useful way to graph changes in discrete variables (usually over time).
Figure 2. Line graph summarizing the number of recovered patients over time.
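The bar heights in a graph of questionnaire responses come from tallying how often each category occurs. A minimal sketch in Python, assuming a hypothetical list of responses (the data and the helper name `category_summary` are invented for illustration):

```python
from collections import Counter


def category_summary(responses):
    """Return {category: (count, proportion)} for a list of categorical responses."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {
        category: (count, count / total)
        for category, count in counts.items()
    }


# Hypothetical questionnaire responses, invented for illustration only.
responses = ["agree", "agree", "neutral", "disagree", "agree", "neutral"]
summary = category_summary(responses)

for category, (count, proportion) in summary.items():
    # One '#' per response gives a rough text rendering of each bar.
    print(f"{category:<10}{'#' * count} {count} ({proportion:.0%})")
```

Whether the bars show counts or proportions, the axis label should say which, since the two can lead readers to different impressions when group sizes differ.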
2.2 Continuous data
Continuous data can take on any numeric value within a range and can be divided into ever smaller increments, including fractional and decimal values. There are an infinite number of possible values between any two values. Measurements such as height, weight, and temperature are all examples of continuous data.
Common display formats for continuous data include:
- Histograms: Histograms show the distribution of continuous data and illustrate whether the distribution is symmetric (e.g., normal) or skewed.
Figure 3. Normal vs. skewed distributions displayed as histograms.
- Dot plots: Dot plots also show the distribution of data but are especially useful for highlighting clusters of data points and outliers.
Figure 4. Example of clustered data points and data point outlier displayed as dot plots.
- Box plots: Box plots are useful for displaying continuous variables divided into groups because they show the central tendency, spread, and outliers of each group.
Figure 5. Average monthly temperatures.
- Scatterplots: Scatterplots show the relationship between two continuous variables. Correlation coefficients often accompany scatterplots to quantify the strength and direction of the relationship.
Figure 6. Scatterplots demonstrating a positive and a negative relationship, with the corresponding Pearson correlation coefficients.
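The numbers behind these display formats can be computed with Python's standard library. The sketch below is illustrative only: the function names, the bin width, the choice of the "inclusive" quartile method, and all data are assumptions made for this example, and plotting software may use different conventions for bins and quartiles.

```python
import math
import statistics


def histogram_counts(values, bin_width):
    """Count values per bin; each bin covers [start, start + bin_width)."""
    counts = {}
    for value in values:
        start = math.floor(value / bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum (box plot basics)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)


# Hypothetical heights in cm, invented for illustration only.
heights = [150, 152, 158, 161, 163, 167, 171]
bins = histogram_counts(heights, bin_width=10)
box = five_number_summary(heights)

# Hypothetical paired measurements with perfectly linear relationships.
xs = [1, 2, 3, 4, 5]
r_rising = pearson_r(xs, [2, 4, 6, 8, 10])
r_falling = pearson_r(xs, [10, 8, 6, 4, 2])

print(bins)
print(box)
print(round(r_rising, 3), round(r_falling, 3))
```

Note that the Pearson coefficient measures only linear association, so it is best reported alongside the scatterplot rather than in place of it.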
Avoid using bar or line graphs to plot continuous data. Bar and line graphs obscure the data distribution and do not give the reader a complete picture, as many different distributions can produce similar bar and line graphs. The figure below demonstrates how different datasets can produce the same bar graph.
Each of the scatterplots displayed on the right could produce the bar graph shown on the left, which shows a difference between groups. The data in panel B are symmetrically distributed, with a high degree of overlap between groups. In panel C, the difference between groups is largely driven by an outlier. The data in panel D are bimodally distributed in each group, suggesting potential subgroups that may warrant further investigation. In panel E, there are twice as many data points in Group A as in Group B. The narrower distribution of Group B may simply reflect the smaller number of data points, suggesting that more data are needed to verify the apparent between-group difference.
Summary statistics (e.g., the bar graph above) may suggest conclusions that differ from those supported by the full dataset. When displaying continuous data, be sure to use a graph format that clearly indicates the distribution of the data so that readers can interpret the data appropriately.
Figure 7. Example of different data distributions producing the same bar graph. The data shown in panels B-E could all produce the bar graph shown in panel A.
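The same point can be checked numerically. In the sketch below, three hypothetical datasets (invented for illustration, not taken from Figure 7) are constructed to share a mean of 10, so a bar graph of the means would draw three identical bars even though the underlying distributions differ.

```python
import statistics

# Hypothetical datasets, constructed so that every group has a mean of 10.
groups = {
    "symmetric": [8, 9, 10, 11, 12],
    "outlier-driven": [7, 7, 7, 7, 22],
    "bimodal": [5, 5, 6, 14, 15, 15],
}


def describe(values):
    """Summaries that a bar of the mean alone would hide."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


summaries = {name: describe(values) for name, values in groups.items()}

for name, s in summaries.items():
    print(
        f"{name:<15} mean={s['mean']:.1f} median={s['median']:.1f} "
        f"range={s['min']}-{s['max']}"
    )
```

The means agree, but the medians and ranges do not, which is the kind of difference that a dot plot, box plot, or histogram makes visible and a bar graph conceals.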
The reader's understanding of a dataset is limited to what the authors present in their manuscripts. Figures and tables are effective tools for communicating large amounts of data that would be complicated to explain in text. When composing a figure, be sure to choose a graph format that fully describes the data and provides readers with a complete picture. To make the most of your figures, consider the question that you aim to answer, the type of data that you are presenting, and what your readers can learn from it.
Published on 09/10/2020