Introduction
R is an open-source programming language and environment for statistical computing, data science, visualization, and reproducible research. It is especially common in statistics, bioinformatics, econometrics, social science, machine learning, and academic data science workflows.
R is built around vectorized data structures, interactive exploration, package-based extension, and strong graphics support. Its ecosystem is useful for the full research cycle: importing data, cleaning it, fitting statistical models, producing plots, running experiments, and publishing reports.
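As a small illustration of what "vectorized" means in practice, the sketch below applies arithmetic, logical indexing, and a summary function to whole vectors without writing a loop. The variable names and values are hypothetical example data, not taken from any real dataset.

```r
# Vectorized arithmetic: operations apply element-wise without an explicit loop.
heights_cm <- c(150, 165, 180)   # hypothetical example data
weights_kg <- c(50, 65, 80)      # hypothetical example data

# Body mass index computed for every element at once
bmi <- weights_kg / (heights_cm / 100)^2

# A logical vector can be used to select elements of another vector
tall <- heights_cm[heights_cm > 160]

# Summary functions reduce a vector to a single value
mean_height <- mean(heights_cm)
```

The same element-wise style carries over to data frame columns, which is why much R code avoids explicit loops.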
Typical Workflow
- Install R from CRAN or the system package manager.
- Create a project directory for code, data, figures, and reports.
- Use scripts, notebooks, or Quarto/R Markdown documents for reproducible work.
- Install packages with install.packages() and load them with library().
- Keep project dependencies reproducible with tools such as renv.
- Record environment details with sessionInfo() when sharing results.
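The project-directory step above can be sketched with base R alone. The helper name scaffold_project and the subdirectory names are assumptions chosen to mirror the list (code, data, figures, reports); they are not a required layout. The example builds the layout in a temporary directory so it can be run safely.

```r
# Create a set of subdirectories under a project root.
# `scaffold_project` and the default names are illustrative, not a standard.
scaffold_project <- function(root,
                             subdirs = c("code", "data", "figures", "reports")) {
  paths <- file.path(root, subdirs)
  for (p in paths) {
    # recursive = TRUE also creates the root if it does not exist yet
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Example: build the layout inside a temporary directory
project_root <- file.path(tempdir(), "my-analysis")
created <- scaffold_project(project_root)
```

In a real project the root would be the project directory itself rather than a temporary location.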
R IDEs and Editors
RStudio is the most common IDE for R. It provides an editor, console, environment browser, plotting pane, package tools, project support, and integration with notebooks and reports.
Posit is the company behind RStudio and related data-science tooling.
- RStudio is becoming Posit: https://posit.co/blog/rstudio-is-becoming-posit/
Packages
R packages extend the language with functions, datasets, modeling tools, plotting systems, and interfaces to external software. CRAN is the central package repository, while Bioconductor is widely used for bioinformatics and computational biology.
Common package families:
- tidyverse: data import, cleaning, transformation, and visualization.
- ggplot2: grammar-of-graphics plotting.
- dplyr and tidyr: data manipulation and reshaping.
- data.table: high-performance tabular data processing.
- knitr, rmarkdown, and quarto: reproducible reports.
- shiny: interactive web applications.
- tidymodels: modeling and machine learning workflows.
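One possible way to avoid reinstalling packages on every run is to check which ones are missing first. The helper missing_packages below is a hypothetical name, and the package vector uses base packages only so the sketch runs offline; in practice it would hold names such as "tidyverse" or "data.table".

```r
# Return the subset of `pkgs` that cannot be found in the current library paths.
# `missing_packages` is an illustrative helper, not part of base R.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

wanted <- c("stats", "utils")   # placeholder list; replace with real dependencies
to_install <- missing_packages(wanted)

# Install only what is missing (nothing happens when all packages are available)
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

This is a convenience for interactive work; for fully reproducible projects a lockfile-based tool such as renv is the more robust option.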
Basic Commands
# Install and load packages
install.packages("tidyverse")
library(tidyverse)
# Inspect the current environment
getwd()
sessionInfo()
# Read and write CSV files
df <- read.csv("data/input.csv")
write.csv(df, "data/output.csv", row.names = FALSE)
Reproducibility Notes
Prefer project-relative paths, keep raw data separate from derived data, and place reusable code in scripts or package-style functions. For research projects, save both the analysis code and the package environment so results can be rerun later.
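A minimal sketch of these ideas, assuming a data/raw and data/derived split (the directory names, file names, and the clean_measurements function are all hypothetical). The raw file is only ever read; the cleaned result is written to a separate location. A temporary directory stands in for the project root so the example is self-contained.

```r
# Throwaway project root; in a real project these paths would be
# relative to the project directory, e.g. file.path("data", "raw").
project_root <- file.path(tempdir(), "paths-demo")
raw_dir <- file.path(project_root, "data", "raw")
derived_dir <- file.path(project_root, "data", "derived")
dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(derived_dir, recursive = TRUE, showWarnings = FALSE)

# Stand-in for raw data that would normally come from outside the project
raw_file <- file.path(raw_dir, "measurements.csv")
write.csv(data.frame(id = 1:3, value = c(10, NA, 30)),
          raw_file, row.names = FALSE)

# Reusable cleaning step: reads the raw file and writes a derived copy,
# leaving the raw file untouched.
clean_measurements <- function(raw_path, derived_path) {
  df <- read.csv(raw_path)
  cleaned <- df[!is.na(df$value), , drop = FALSE]  # drop rows with missing values
  write.csv(cleaned, derived_path, row.names = FALSE)
  invisible(cleaned)
}

derived_file <- file.path(derived_dir, "measurements_clean.csv")
cleaned <- clean_measurements(raw_file, derived_file)
```

Because the derived file is produced by a function, it can be regenerated at any time from the raw data and the code.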
Useful habits:
- Use one project directory per analysis.
- Avoid manual edits to intermediate data files.
- Keep figures, tables, and reports generated from source code.
- Use version control for scripts and notebooks.
- Capture package versions with renv or sessionInfo().
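For the last habit, one lightweight approach is to save the printed output of sessionInfo() alongside the results. The helper name record_session and the output file name are assumptions made for this sketch. With renv, the analogous step is the lockfile written by renv::snapshot(), which is not shown here because it requires the renv package to be installed.

```r
# Write the printed output of sessionInfo() to a text file.
# `record_session` is an illustrative helper, not part of base R.
record_session <- function(path) {
  writeLines(capture.output(sessionInfo()), path)
  invisible(path)
}

# Example: store the record in a temporary directory
info_file <- record_session(file.path(tempdir(), "session-info.txt"))
```

Committing this file with the analysis gives readers a record of the R version and attached packages that produced the results.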