Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2024 Nov 27;20(11):e1012604.
doi: 10.1371/journal.pcbi.1012604. eCollection 2024 Nov.

Eleven quick tips for properly handling tabular data

Affiliations

Eleven quick tips for properly handling tabular data

Marla I Hertz et al. PLoS Comput Biol. .
No abstract available

PubMed Disclaimer

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. An illustration of the 11 tips for tabular data showcases good practices in green and poor practices in orange for tips 1 through 10.
Tip 11 is not pictured. The sample data table includes synthetic data produced using ChatGPT 3.5 on 2024-07-31 using the following prompt: “generate a synthetic dataset about tardigrades that shows good data management, no merged cells, no color used as format, descriptive headers, computer friendly.” The ChatGPT output (location, habitat, species name, and length columns) was exported to Excel and modified slightly for machine readability. Sample identification (ID) numbers and collection dates were randomly generated in Excel.
Fig 2
Fig 2. Example illustrating machine-readable spreadsheet header practices.
Each spreadsheet tab is named according to the date the data was taken in YYYY-MM-DD format and header row labels are formatted without spaces, special characters, or unnecessary capital letters for computer readability. For use with other data analysis programs outside of Excel, these tabs should be saved as separate files with the date in the file name.
Fig 3
Fig 3. Quality control measures for a clinical dataset.
(A) These parameters limit what can be entered in a cell to a range of acceptable values with standard formatting. RBC stands for red blood cell. (B) To set up data validation for each variable, go to the Data tab > Data Validation. Then, fill in the popup window with the data type allowed. The example shown here is for a dropdown list of location sites with the acceptable categories separated by commas.
Fig 4
Fig 4. Example of a well-organized folder for an experiment which maintains separate files for each step in the research process.
Metadata files include a data dictionary (see Fig 1A for an example) and a ReadMe, which is a plaintext file that compiles important information about the experiment. Note that the file icons displayed are not the only eligible format for that file type. For instance, metadata is commonly saved as a .txt file or use structure file formats such as XML or JSON.

References

    1. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al.. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3:160018. doi: 10.1038/sdata.2016.18 - DOI - PMC - PubMed
    1. Ziemann M, Eren Y, El-Osta A. Gene name errors are widespread in the scientific literature. Genome Biol. 2016;17:177. doi: 10.1186/s13059-016-1044-7 - DOI - PMC - PubMed
    1. Abeysooriya M, Soria M, Kasu MS, Ziemann M. Gene name errors: Lessons not learned. PLoS Comput Biol. 2021;17:e1008984. doi: 10.1371/journal.pcbi.1008984 - DOI - PMC - PubMed
    1. Kelion L. Excel: Why using Microsoft’s tool caused Covid-19 results to be lost. 5 Oct 2020. Available from: https://www.bbc.com/news/technology-54423988. Accessed 2024 Aug 13.
    1. Dobell E, Herold S, Buckley J. Spreadsheet Error Types and Their Prevalence in a Healthcare Context. JOEUC. 2018;30:20–42. doi: 10.4018/JOEUC.2018040102 - DOI

LinkOut - more resources