by Julianna R. Calabrese, The Ohio State University
Quantitative methods are critical for psychological research and are fundamental to psychology curricula. Structural equation modeling, item response theory, and multilevel modeling have recently been brought to the forefront in graduate curricula (Aiken et al., 2008). In parallel, recent journal research has employed increasingly advanced statistical techniques like hierarchical linear modeling and Bayesian analyses (Blanca et al., 2018). The rise in sophisticated data analysis has been driven by easy access to powerful yet user-friendly software programs, such as SPSS, Mplus, and R (Aiken et al., 2008; Blanca et al., 2018). However, graduate courses rarely cover good programming practices. This article discusses and summarizes good programming practices for clinical psychology doctoral students.
In my statistics classes so far, our assignments have used sample data, which has already been collected, cleaned, and organized with neat SPSS variable labels. However, this does not resemble the real world. As datasets become more multimodal and sample sizes larger, good data management practices become more important. Good data management practices include managing personal identifiers, crosswalking datasets, automated data processing, and software version control to maintain reproducibility, confidentiality, and credibility (Levenstein & Lyle, 2018; Peikert et al., 2021). Additionally, as open science practices become more prevalent (Winerman, 2017), so does the value of having clean, readable, reproducible code, including documentation of the steps taken to restructure and analyze data, sensible variable-naming conventions, efficient optimization to minimize runtime, inclusion of relevant comments, and code sharing on GitHub or OSF.
Since strong quantitative methods are necessary to answer many psychological research questions, good practices in data management and statistical coding should be incorporated into the standard graduate curriculum. This could be included as a module in the introductory statistics course or a required workshop before completing one’s Master’s. Students should be introduced to good practices as early as possible to avoid forming bad habits. In today’s landscape, good programming practices are necessary to produce and disseminate high-quality research. Below I summarize a few foundational good practices adapted from previous guides (Blischak et al., 2016; Cannell & Livingston, 2021; DeStasio, 2019; Mahoney, 2019).
1. Automate your data cleaning.
Data cleaning is an inevitable obstacle in research. For example, in Qualtrics surveys, variables are rarely given distinct names (what item was “Q27_1_TEXT”, again?), responses expire before being fully completed, and participants can complete the same survey multiple times. Do not give in to the temptation to clean your raw data manually in Excel, whether it be renaming variables or reshaping your data from wide to long format. Since manual data cleaning is seldom reproducible, automating your data processing in a syntax file will ensure that your colleagues can follow your steps later (Mahoney, 2019).
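As a minimal sketch of what an automated cleaning script might look like, the Python example below renames cryptic variables, drops unfinished responses, and keeps only each participant's first response. The column names ("Q1", "Finished"), the descriptive names they map to, and the data values are all made up for illustration and do not come from a real export; the same steps could just as well be written in an R or SPSS syntax file.

```python
import csv
import io

# Hypothetical mapping from raw export names to descriptive names.
RENAME_MAP = {
    "Q1": "participant_id",
    "Q27_1_TEXT": "sleep_hours_text",
    "Finished": "finished",
}


def clean_rows(rows):
    """Rename columns, drop unfinished responses, keep each participant's first response."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        # Step 1: replace cryptic variable names with descriptive ones.
        renamed = {RENAME_MAP.get(name, name): value for name, value in row.items()}
        # Step 2: skip responses that were never completed (assumed coded "1" = finished).
        if renamed["finished"] != "1":
            continue
        # Step 3: skip repeat submissions from the same participant.
        participant = renamed["participant_id"]
        if participant in seen_ids:
            continue
        seen_ids.add(participant)
        cleaned.append(renamed)
    return cleaned


# Small in-memory stand-in for a raw survey export (made-up values).
RAW_CSV = """Q1,Q27_1_TEXT,Finished
P001,7,1
P002,,0
P001,8,1
P003,6,1
"""

raw_rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
clean = clean_rows(raw_rows)
for row in clean:
    print(row)
```

Because every step lives in the script, rerunning it on a fresh export repeats exactly the same decisions, and the raw file itself is never edited.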
2. Comment, comment, comment.
Good research should be easily understandable, and so should good code. What does this section of code accomplish? What is the purpose of this loaded package? Commenting will allow others to understand the why behind your actions and choices. Without comments, a colleague is more likely to misinterpret your code, leading to misuse and potentially harmful edits. Please note that good comments will never make up for bad code, but good code is always accompanied by thorough comments (DeStasio, 2019).
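To illustrate comments that explain the why rather than restate the code, here is a short Python sketch that scores a questionnaire. The scale, its item names, and the choice of reverse-coded items are hypothetical and serve only as an example.

```python
# Hypothetical 5-point questionnaire; items 2 and 4 are assumed to be negatively worded.
REVERSE_CODED_ITEMS = {"item_2", "item_4"}
SCALE_MAX = 5


def score_scale(responses):
    """Return the total score for one participant (dict of item name -> response 1..5)."""
    total = 0
    for item, value in responses.items():
        if item in REVERSE_CODED_ITEMS:
            # Why: negatively worded items are flipped so that a higher total
            # always means more of the construct being measured.
            value = (SCALE_MAX + 1) - value
        total += value
    return total


example = {"item_1": 4, "item_2": 2, "item_3": 5, "item_4": 1}
print(score_scale(example))
```

A comment such as "subtract value from 6" would only repeat what the line already says; the comment above records the reasoning a colleague would otherwise have to guess.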
3. Be consistent.
Consistency is key. This applies to 1) variable-naming practices, 2) the selection of programs and packages, and 3) data organization (Blischak et al., 2016; Cannell & Livingston, 2021). Unlike “Q27_1_TEXT”, variables should be given concise yet descriptive names. If your project contains a very large number of variables, a data dictionary is likely necessary. When writing code for cleaning or analysis, stick to the minimum number of programs and packages needed to accomplish your goals. Avoid using SPSS for one quick step in an otherwise R-dominated workflow. If you rely on a specific R package for data wrangling, it’s recommended to stick to that package’s functions for the sake of style and compatibility (e.g., tidyverse). Finally, your data and your code should be organized in a coherent folder structure with relevant subfolders for your datasets, data dictionaries, analyses, figures, and manuscript drafts.
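One way to keep folder structure consistent across projects is to create it with a script instead of by hand. The Python sketch below builds the subfolders mentioned above inside a temporary directory; the folder names and the project name "my_study" are illustrative assumptions, not a required layout.

```python
from pathlib import Path
import tempfile

# Hypothetical subfolder layout; adapt the names to your lab's conventions.
SUBFOLDERS = [
    "data/raw",
    "data/clean",
    "data_dictionaries",
    "analyses",
    "figures",
    "manuscript",
]


def create_project_skeleton(root):
    """Create the standard subfolders under root and return their paths."""
    root = Path(root)
    created = []
    for sub in SUBFOLDERS:
        folder = root / sub
        # exist_ok=True makes the script safe to rerun on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Demonstrate in a temporary directory so nothing is left behind on disk.
with tempfile.TemporaryDirectory() as tmp:
    project_root = Path(tmp) / "my_study"
    folders = create_project_skeleton(project_root)
    names = sorted(f.relative_to(project_root).as_posix() for f in folders)
    print(names)
```

Keeping raw and cleaned data in separate subfolders also supports the first practice: the cleaning script reads from one and writes to the other, so the raw files stay untouched.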
______________________________________________________________________________
References
Aiken, L. S., West, S. G., & Millsap, R. E. (2008). Doctoral training in statistics, measurement, and methodology in psychology: Replication and extension of Aiken, West, Sechrest, and Reno’s (1990) survey of PhD programs in North America. American Psychologist, 63(1), 32–50. https://doi.org/10.1037/0003-066X.63.1.32
Blanca, M. J., Alarcón, R., & Bono, R. (2018). Current Practices in Data Analysis Procedures in Psychology: What Has Changed? Frontiers in Psychology, 9, 2558. https://doi.org/10.3389/fpsyg.2018.02558
Blischak, J., Chen, D., Dashnow, H., & Haine, D. (Eds.). (2016). Software Carpentry: Programming with R (Version 2016.06). https://github.com/swcarpentry/r-novice-inflammation. https://doi.org/10.5281/zenodo.57541
Cannell, B., & Livingston, M. (2021). R for Epidemiology. https://www.r4epi.com/
DeStasio, K. L. (2019, April 24). R Best Practices. https://kdestasio.github.io/post/r_best_practices/
Levenstein, M. C., & Lyle, J. A. (2018). Data: Sharing Is Caring. Advances in Methods and Practices in Psychological Science, 95–103. https://doi.org/10.1177/2515245918758319
Mahoney, M. (2019). Introduction to Data Exploration and Analysis with R. https://bookdown.org/mikemahoney218/IDEAR/
Peikert, A., van Lissa, C. J., & Brandmaier, A. M. (2021). Reproducible research in R: A tutorial on how to do the same thing more than once. Psych, 3(4), 836–867. https://doi.org/10.3390/psych3040053
Winerman, L. (2017, November). Trend report: Psychologists embrace open science. Monitor on Psychology, 48(10). https://www.apa.org/monitor/2017/11/trends-open-science
Disclaimer: The views and opinions expressed in this newsletter are those of the authors alone and do not necessarily reflect the official policy or position of the Psychological Clinical Science Accreditation System (PCSAS).