Figure and File Type Basics
Updated on May 21, 2012
One of the issues that can make generating figures for research papers challenging is the array of file types that must be dealt with. Image files come in a variety of formats, scientific and publishing software expands the spectrum even further, and converting between formats is not always seamless. Here we break down the different types of files a scientist is likely to encounter and explain the strengths and weaknesses of each.
Pixel-based file formats are the file types that many of us are most familiar with. In the simplest terms, they store an image as a grid of colored dots, known as pixels. The most important thing for a scientist to understand about these files is how the different formats store that grid of pixels. In general terms, pixel-based file formats can be divided into two categories: lossy and lossless.
Lossy file formats are so named because they discard information every time the file is saved, and Jpg is the lossy format most commonly encountered in research. The compression used in a Jpg greatly reduces file size, but at the cost of “blurring” the image. The loss in quality is often subtle enough that the format is perfectly acceptable for pictures of birthday parties and other day-to-day events, particularly since such images are unlikely to be repeatedly modified and saved. For many research applications, however, this file type is a real problem, because the compression is reapplied, and the image subtly changes, every time it is opened and resaved. Because the discarded information is gone, the compression cannot be reversed: if the original image is overwritten, the exact pixel values it contained cannot be retrieved. Different programs also use slightly different compression protocols, and many let the user set the level of compression, so the amount and kind of information lost can vary from one save to the next. So if your primary data consist of images in Jpg files, be aware that you or your colleagues are changing your data every time those files are saved.
Jpg and other lossy formats can cause researchers problems in another arena as well. As mentioned above, the compression used in creating Jpgs blurs the image. This has little aesthetic impact on a picture of a birthday party. However, if you save a graph, flowchart, or other drawing as a Jpg, you will find that the edges in the figure become “fuzzy” regardless of the actual resolution (pixel density) of the image (see Figure 1). Lower quality settings and repeated saving of the same file degrade the image further. Degrading a figure in this way is less serious than altering primary data, but it often leaves the image too poor in quality for publication.
As a general rule, we discourage any researcher from using Jpgs or other lossy file formats to store their data.
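To make the two lossy-format problems described above concrete, here is a toy model of lossy compression in Python. It is a simplified sketch, not a real Jpg encoder: it assumes a single 8-pixel grayscale row instead of Jpg's 8×8 blocks, applies a discrete cosine transform (the transform Jpg compression is built on), and quantizes the coefficients with an illustrative step size that grows with frequency. The names `dct`, `idct`, and `save_lossy` and the `step` parameter are hypothetical. Two effects show up: a sharp black-to-white edge acquires intermediate gray values, and two different originals can produce exactly the same saved data, so the saved file cannot tell you which original it came from.

```python
import math

N = 8  # Jpg works on 8x8 blocks; one 8-pixel row is used here for brevity


def _scale(k):
    # normalization factors that make the transform orthonormal
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(block):
    """Orthonormal DCT-II of an 8-sample block."""
    return [_scale(k) * sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                            for n, x in enumerate(block))
            for k in range(N)]


def idct(coeffs):
    """Inverse of dct() (an orthonormal DCT-III)."""
    return [sum(_scale(k) * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k, c in enumerate(coeffs))
            for n in range(N)]


def save_lossy(pixels, step):
    """Toy lossy 'save': quantize DCT coefficients, more coarsely at higher frequencies."""
    coeffs = dct(pixels)
    quantized = [round(c / (step * (1 + k))) * step * (1 + k)
                 for k, c in enumerate(coeffs)]
    # convert back to whole 8-bit pixel values
    return [min(255, max(0, round(v))) for v in idct(quantized)]


edge = [0, 0, 0, 0, 255, 255, 255, 255]         # black-to-white edge, like a line in a graph
almost_edge = [1, 0, 0, 0, 255, 255, 255, 255]  # differs from edge in one pixel

saved = save_lossy(edge, step=20)
print(saved)                                      # gray values now appear beside the edge
print(save_lossy(almost_edge, step=20) == saved)  # True: two originals, one saved result
```

Larger `step` values stand in for lower quality settings; in this sketch they round more of the high-frequency coefficients, which carry the detail of sharp edges, down to zero.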
AJE offers figure formatting and custom illustration for researchers in all fields. For more information and examples, see our Illustration and Figures service page.
In contrast to lossy file formats, lossless formats do not degrade the image when it is saved. This fidelity generally comes at the cost of larger file sizes. There are a number of lossless formats, but the one most often encountered in research is Tiff. In its uncompressed form, a Tiff file simply records the color of each pixel in the grid, in order. Lossless compression schemes such as LZW can also be applied to Tiff files; these encode repeated patterns in the color information in a shorthand form, reducing file size without compromising the fidelity of the image. (Tiff files can also use lossy compression, but this is rarely encountered.) If you save a graph, diagram, or flowchart as a Tiff, the edges of its elements will remain crisp and clean (within the resolution limits of the image), without the “fuzziness” associated with Jpg files (see Figure 1). For these reasons, Tiff is the preferred format both for storing primary data and for submitting pixel-based figures to journals.
Figure 1. Because of the lossy compression used in Jpg files, repeated saving changes individual pixel values and produces “fuzzy” edges on text and diagrams, compared with a lossless format (Tiff) at the same resolution.
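The idea behind lossless compression can be illustrated with a minimal LZW encoder and decoder in Python. This is a sketch of the general LZW algorithm under simplifying assumptions, not the exact Tiff variant (which also uses special clear and end-of-information codes and packs codes of 9 to 12 bits into a byte stream); the function names `lzw_encode` and `lzw_decode` are hypothetical. The key property is the round trip: decoding returns exactly the original pixel bytes, while long runs of one color collapse into far fewer codes.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Compress bytes into a list of LZW codes (generic LZW, not Tiff's bit packing)."""
    table = {bytes([i]): i for i in range(256)}  # start with every single-byte value
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                      # keep extending the current match
        else:
            codes.append(table[w])      # emit the longest known sequence
            table[wc] = len(table)      # and remember the new, longer one
            w = bytes([byte])
    if w:
        codes.append(table[w])
    return codes


def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the exact original bytes from LZW codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = w + w[:1]           # code defined by this very step
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)


# one row of a simple graph: long runs of white, black, and gray pixels
row = bytes([255] * 100 + [0] * 50 + [128] * 100)
codes = lzw_encode(row)
print(len(row), "pixels ->", len(codes), "codes")
print(lzw_decode(codes) == row)
```

Rows with large areas of a single color, typical of graphs and diagrams, compress well this way, while noisy photographic data compress much less, which is one reason lossless files tend to be larger than lossy ones.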
A word of caution is in order regarding lossless files. “Lossless” simply means that opening a file and re-saving it will not degrade the image; it does not mean that the original pixel data are preserved if you alter those pixels and then save over the original file. In practical terms, if you open a Tiff, adjust the brightness or contrast, and click “Save”, the original pixel information is lost. For that reason, you should always archive the original images so you can go back to them if you need to.
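A small Python sketch shows why an edit followed by “Save” cannot always be undone, even in a lossless format. It assumes 8-bit grayscale values and a hypothetical `adjust_brightness` function that adds a constant and clips the result to the range 0–255; real image editors are more sophisticated, but values pushed past the 8-bit limits are clipped in the same way.

```python
def adjust_brightness(pixels, delta):
    """Add delta to each 8-bit pixel value, clipping to the valid range 0-255."""
    return [min(255, max(0, p + delta)) for p in pixels]


original = [10, 100, 200, 230, 250]
brighter = adjust_brightness(original, 40)  # what gets written when you click "Save"
undone = adjust_brightness(brighter, -40)   # trying to reverse the edit later

print(brighter)  # [50, 140, 240, 255, 255]
print(undone)    # [10, 100, 200, 215, 215]
```

The two brightest pixels, originally 230 and 250, are both stored as 255 after the edit, so reversing the adjustment cannot tell them apart. Only an archived copy of the original file still holds their true values.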
In contrast to pixel-based formats, vector-based files store a figure as a collection of text and discrete shapes. The major practical implication of this storage method is that individual elements of the figure can be edited. In a vector file you can correct the spelling of a word or change the thickness of a line in a graph, whereas in a pixel-based file those elements have been reduced to a picture of a word or a picture of a graph and can no longer be edited individually. Another advantage of vector-based formats is what is termed “infinite resolution”: no matter how much a vector-based file is enlarged, you will never see the grid of pixels that appears when you enlarge a pixel-based file (see Figure 2). Researchers commonly encounter two major classes of software that use vector-based formats: research- and graphing-specific software, and graphic design software.
Figure 2. Unlike pixel-based files, vector-based graphs and drawings can be enlarged without any loss of resolution, leading to so-called “infinite resolution”.
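The difference between enlarging a vector shape and enlarging a pixel grid can be sketched in a few lines of Python. Here a circle is stored the vector way, as a center and a radius, and `render_circle` draws it onto a grid of any size; `upscale` enlarges an already-drawn grid the only way a pixel-based file allows, by repeating each pixel. The function names and the circle's dimensions are illustrative assumptions.

```python
def render_circle(size, cx=0.5, cy=0.5, r=0.4):
    """Draw a circle, described by center and radius in unit coordinates, on a size x size grid."""
    grid = []
    for y in range(size):
        row = []
        for x in range(size):
            # test the center of each pixel against the exact circle equation
            dx = (x + 0.5) / size - cx
            dy = (y + 0.5) / size - cy
            row.append(1 if dx * dx + dy * dy <= r * r else 0)
        grid.append(row)
    return grid


def upscale(grid, factor):
    """Enlarge a pixel grid by repeating every pixel factor x factor times."""
    return [[v for v in row for _ in range(factor)]
            for row in grid for _ in range(factor)]


def show(grid):
    """Render a 0/1 grid as text: '#' for filled pixels, '.' for empty ones."""
    return "\n".join("".join("#" if v else "." for v in row) for row in grid)


small = render_circle(8)      # the figure saved at low resolution
blocky = upscale(small, 4)    # enlarging the saved pixels
redrawn = render_circle(32)   # redrawing the vector description at the new size
print(show(blocky))
print()
print(show(redrawn))
```

Printed side by side, the upscaled grid has a staircase edge made of 4 × 4 blocks, while the grid redrawn from the vector description traces the circle at the full 32 × 32 resolution.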
Research and graphing software
When many people think of vector-based files, they think of the publication-oriented software described below. However, by the definition above, many of the programs we use in the course of research actually produce vector-based drawings. Almost all graphing programs produce vector-based graphs, which is why you can change the symbols for a data series or the font used on an axis. We have also seen software specifically for producing flowcharts or drawing circuits that works the same way, and all the presentation software we have worked with is vector-based. However, the native files of these programs are not true vector-format files, because they contain not just the image but also all the other information the program needs to generate it. Fortunately, most of these programs can export a graph or drawing in a more standardized vector format, such as Eps, Emf, Wmf, or sometimes Pdf. (Most will also save the graph or drawing as a Jpg or Tiff, but the choice of resolution and other file settings is often limited, and once you convert the graph or diagram to a pixel-based format you lose all of the editing capability of the vector-based format.) Of these vector formats, Eps is usually the best option: it is the most universally supported of the common choices and is often the format journals prefer.
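To show what an exported vector file actually contains, the Python sketch below writes a minimal Eps file for a line graph by hand. The function name `eps_line_graph`, the page size, margin, and font are assumptions for illustration; real graphing programs write far more elaborate Eps files. The point is that the axes, the data series, and the axis label are stored as drawing instructions and text rather than as pixels.

```python
def eps_line_graph(points, x_label, width=200, height=150, margin=20):
    """Build a minimal Eps file: two axes, one data series, one axis label.

    points are (x, y) pairs in PostScript points, measured from the plot origin.
    """
    def ps_string(text):
        # PostScript strings are delimited by parentheses, so escape them
        return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    lines = [
        "%!PS-Adobe-3.0 EPSF-3.0",
        f"%%BoundingBox: 0 0 {width} {height}",
        "%%EndComments",
        "1 setlinewidth",
        # x axis and y axis as stroked paths
        f"newpath {margin} {margin} moveto {width - margin} {margin} lineto stroke",
        f"newpath {margin} {margin} moveto {margin} {height - margin} lineto stroke",
    ]
    # the data series: one path through all points
    (x0, y0), rest = points[0], points[1:]
    segments = [f"newpath {x0 + margin} {y0 + margin} moveto"]
    segments += [f"{x + margin} {y + margin} lineto" for x, y in rest]
    lines.append(" ".join(segments + ["stroke"]))
    # the axis label is stored as text, not as a picture of text
    lines += [
        "/Helvetica findfont 10 scalefont setfont",
        f"{width // 2} 5 moveto ({ps_string(x_label)}) show",
        "showpage",
        "%%EOF",
    ]
    return "\n".join(lines) + "\n"


eps = eps_line_graph([(0, 0), (50, 40), (100, 60), (150, 100)], "Time (s)")
print(eps)
```

Writing the returned string to a file with an `.eps` extension gives a figure that vector-capable programs can open. Because the label is stored as text, correcting it only requires changing the string in parentheses before `show`, which is impossible once the same graph has been saved as a Jpg or Tiff.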
Graphic design software
Publication-oriented programs such as Adobe Illustrator or CorelDRAW are traditionally associated with vector-based files, and they can be used to edit most of the vector files mentioned in the previous section. Each program has its own native file format, but all of them can also save figures as Eps files, the format journals prefer. The largest drawback of vector-based formats is, in general, the specialized software required to work with them and the investment of time and energy needed to use that software effectively. However, these programs are very powerful and are often the most efficient way to generate publication-ready figures.
Where things can get confusing
One point of confusion that often arises in discussions of pixel- and vector-based file types is that the two are not mutually exclusive. You can insert a pixel-based image into a vector-based file; we do it all the time when we place a picture in a PowerPoint presentation. When you enlarge such a file, at a certain point you will see the individual pixels in the picture, while the surrounding text, lines, and other vector-based elements remain crisp and smooth-edged. In general, however, the reverse is not true: you cannot have vector elements in a pixel-based file. If you have a graph saved as a Jpg or Tiff, you will not be able to open that file and edit any text. In addition, some programs use the “pixel-based image in a vector file” approach when they save or export an image as a Pdf, which means that you cannot always edit the individual elements of a Pdf file in Adobe Illustrator or an equivalent program.
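A file that mixes both kinds of content can be sketched in Python. The hypothetical helper `eps_with_pixels` below writes a minimal Eps file containing a small grid of gray pixels, drawn with the PostScript `image` operator, plus a text label; the sizes and positions are illustrative assumptions. When enlarged, the 2 × 2 image should appear in most viewers as four large square blocks while the label stays sharp, and only the label can still be edited as text.

```python
def eps_with_pixels(gray_rows, label, size=40):
    """Build a minimal Eps file that mixes a pixel-based image with vector text.

    gray_rows: rows of 8-bit gray values (0 = black, 255 = white), top row first.
    """
    h, w = len(gray_rows), len(gray_rows[0])
    hex_data = "".join(f"{v:02x}" for row in gray_rows for v in row)
    safe = label.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "\n".join([
        "%!PS-Adobe-3.0 EPSF-3.0",
        f"%%BoundingBox: 0 0 {size + 40} {size + 40}",
        "%%EndComments",
        "gsave",
        # map the image onto a size x size square whose corner is at (20, 20)
        f"20 20 translate {size} {size} scale",
        # pixel-based element: w x h samples, 8 bits each, top row first
        f"{w} {h} 8 [{w} 0 0 -{h} 0 {h}] {{<{hex_data}>}} image",
        "grestore",
        # vector-based element: the label stays editable text
        "/Helvetica findfont 10 scalefont setfont",
        f"20 {size + 25} moveto ({safe}) show",
        "showpage",
        "%%EOF",
    ]) + "\n"


checkerboard = [[0, 255], [255, 0]]
print(eps_with_pixels(checkerboard, "2 x 2 pixel image"))
```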
One of the most common challenges our clients face when we help them prepare their figures is data stored in a suboptimal file format. Once you have written a manuscript and are ready to publish, it can be very frustrating to discover problems with the figures that could interfere with submitting the paper. We hope that the information presented here will help you make informed decisions about how best to store your visual data digitally and avoid complications in the final preparation of your figures.