Content
- Discovering R, and the RStudio environment
- Importance of tidy data in general and how it translates into dataframes in R
- Data manipulation and analysis using standard R commands and the tidyverse packages
- Data visualisation with ggplot2
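To make the idea of tidy data concrete, here is a minimal sketch in base R. The gene names, sample names and expression values are made up for illustration and are not taken from the course material. It shows the same small table in a "wide" layout and in a tidy layout, where each row is one observation and each column is one variable.

```r
# Made-up "wide" table: one row per gene, one column per sample.
wide <- data.frame(
  gene = c("geneA", "geneB"),
  sample1 = c(10, 20),
  sample2 = c(15, 25)
)

# Tidy ("long") version: one row per observation,
# one column per variable (gene, sample, expression).
tidy <- data.frame(
  gene = rep(wide$gene, times = 2),
  sample = rep(c("sample1", "sample2"), each = nrow(wide)),
  expression = c(wide$sample1, wide$sample2)
)

# With tidy data, a per-gene summary is a single call.
gene_means <- aggregate(expression ~ gene, data = tidy, FUN = mean)
print(gene_means)
```

The tidyverse packages covered in the course provide more convenient tools for the same reshaping and summarising steps.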
Do you want to get started with reproducible data analysis with R, one of the most widely used software environments for the analysis of high-throughput biology data?
R is free and open-source software. It is among the most widely used tools in biomedical research, likely due to the availability of numerous R/Bioconductor packages specifically dedicated to high-throughput data.
The goal of this training is to introduce wet-lab scientists to reproducible data analysis with R and its RStudio integrated environment, focusing on data manipulation, data visualisation and basic data analysis.
This training doesn’t require any previous knowledge of R. There are no programming or technical prerequisites for this course other than basic computer usage, such as general knowledge of files (binary and text files) and folders, and how to download files. Familiarity with a spreadsheet editor is helpful for the first chapter.
Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed:
Download R from the CRAN page: https://cloud.r-project.org/. At the top of that page, choose the Download R link corresponding to your operating system. If you use Windows, follow the "install R for the first time" link, then click the link to download R. The installation procedure is like that of any other software, and you can safely use all default options. If you use a Mac (OS X), download the pkg installer that matches your OS version and install it like any other software. Linux users are advised to use their package manager.
Download and install the RStudio Desktop open-source edition: https://rstudio.com/products/rstudio/download/#download. Choose the installer for your operating system and version. Install it like any other software.
For technical assistance, see https://moodle.uclouvain.be/course/view.php?id=4862
| | Day 1 | Day 2 |
|---|---|---|
| 9h-10h30 | Data organisation with Spreadsheets; R and RStudio | Data visualization |
| 10h45-12h45 | Introduction to R; Starting with data | Data visualization (cont) and joining tables |
| 13h45-15h45 | Starting with data; Manipulating and analyzing data with dplyr | Summary exercise |
| 16h-17h | Manipulating and analyzing data with dplyr | Further topics |
References are provided throughout the course. Several stand out, however, as they cover large parts of the material or provide complementary resources.
The material for the first chapters, covering the Introduction to data science with R, is based on the Data Carpentry Ecology curriculum (von Hardenberg et al. 2019).
von Hardenberg, Achaz, Adam Obeng, Aleksandra Pawlik, Alex Pletzer, Alexey Shiklomanov, Anne Fouilloux, April Wright, et al. 2019. “Data Carpentry: R for Data Analysis and Visualization of Ecological Data.” Edited by Francois Michonneau and Auriel Fournier. https://doi.org/10.5281/zenodo.569338.
General references for this course are R for Data Science (Grolemund and Wickham 2017) and Bioinformatics Data Skills (Buffalo 2015).
Grolemund, Garrett, and Hadley Wickham. 2017. R for Data Science. O’Reilly Media. https://r4ds.had.co.nz/.
Buffalo, Vince. 2015. Bioinformatics Data Skills. O’Reilly Media, Inc.
The RStudio Cheat Sheets are also a handy resource and readers will be pointed to specific sheets in the respective chapters.
This training is organised by the SMCS in partnership with Laurent Gatto, from the CBIO Lab at the de Duve Institute, and is being taught by Axelle Loriot and Laurent Gatto at UCLouvain, Belgium.
This material is written in R markdown (Allaire et al. 2025) and compiled as a book using knitr (Xie 2025b) and bookdown (Xie 2025a). The rendered material can be read online at https://uclouvain-cbio.github.io/bioinfo-training-01-intro-r
Allaire, JJ, Yihui Xie, Christophe Dervieux, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, et al. 2025. Rmarkdown: Dynamic Documents for R. https://github.com/rstudio/rmarkdown.
Xie, Yihui. 2025a. Bookdown: Authoring Books and Technical Documents with R Markdown. https://github.com/rstudio/bookdown.
Xie, Yihui. 2025b. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
This material is licensed under the Creative Commons Attribution-ShareAlike 4.0 License.
For chapter 1 about Data organisation with Spreadsheets, a spreadsheet programme is necessary.
We will be using the R environment for statistical computing as our main data science language. We will also use the RStudio interface to interact with R and write scripts and reports. Both R and RStudio are easy to install and work on all major operating systems.
Once R and RStudio are installed, a set of packages will need to be installed. See section 9.1 for details.
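As a rough sketch of what this package setup might look like, the snippet below checks which packages from a list are not yet installed. The package list (`tidyverse`) is an assumption based on the course contents, and `missing_packages` is a hypothetical helper written for this example; section 9.1 remains the reference for the actual packages to install.

```r
# Hypothetical helper: return the packages from `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Assumed package list, for illustration only; see section 9.1 for the real one.
course_pkgs <- c("tidyverse")

to_install <- missing_packages(course_pkgs)
if (length(to_install) > 0) {
  message("Packages still to install: ", paste(to_install, collapse = ", "))
  # Uncomment to install them from CRAN:
  # install.packages(to_install)
} else {
  message("All listed packages are already installed.")
}

# Report the version of R that is currently running.
message(R.version.string)
```

Running this in the RStudio console after installation is also a quick way to confirm that R itself starts and reports its version.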
Page built: 2025-11-06 using R version 4.5.0 (2025-04-11)