1 Welcome to Intro to R

We are very excited you could make it to this Intro to R workshop! We hope to leave this workshop with some hands-on practice under our belts working with data in R, along with the ability to find and understand resources for applying these steps to our own datasets.

1.1 Why R?

R is a programming language designed specifically for statistics. Because it is free and open source, many people have contributed to the R ecosystem. In addition to being powerful and expressive for statistical analysis, R has become very popular for exploratory data visualization and reporting.

1.2 When to use R?

  • R is limited in producing highly customized and interactive reporting visualizations
    While R is able to produce interactive graphics, maps, and dashboards, many of the more expressive and customized interactive data presentations require knowledge of JavaScript. The amount you can do directly from within R is increasing rapidly, however, and you can read more about that here.

  • R is not as expressive for general tasks such as scraping and crawling
    Newer packages such as tidyr and dplyr are making R more expressive; as a result, R is becoming a strong competitor in more general tasks like data mining, data wrangling, and scraping and crawling for data.

  • R requires data to be loaded into memory
    R works on data held in memory (RAM). For projects where exceptionally large data needs to be analyzed or processed very quickly, we will want to consider tools that help us work around this memory limitation.
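To make the in-memory point concrete, here is a small sketch in R. The data frame is made up purely for illustration: it has one million rows, and base R's `object.size()` estimates how much RAM the object occupies. Exact sizes will vary with R version and platform, so treat the printed figure as indicative only.

```r
# Illustrative only: a made-up data frame with one million rows
n <- 1e6
df <- data.frame(
  x = runif(n),                           # numeric column: 8 bytes per value
  y = sample(letters, n, replace = TRUE)  # character column of random letters
)

# The whole object sits in RAM; object.size() estimates how much
print(object.size(df), units = "auto")
```

Increasing `n` shows memory use growing along with the data, which is why very large datasets call for tools that work around this limit.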

Overall, R is a top choice for data exploration, modelling, analysis, and static graphics.

1.3 What is the scope of this workshop?

This workshop will cover steps and code that can be readily adapted for small- to moderate-sized datasets.

Specifically, we will be:

  1. Loading in data
  2. Exploring it
  3. Cleaning it
  4. Calculating descriptive statistics
  5. Visualizing it
  6. Modeling it
  7. Making a report

We will also cover how to get help and how to learn more about using R. Additionally, we’ve collected resources on topics we aren’t covering in our appendix.

While we will not be able to cover how to handle a wide variety of datasets and visualizations, our mentors have a wide range of experience using R for energy data, bioinformatics, mapping, natural language processing, and much more! We hope to get to know you throughout the workshop and share domain-specific experiences with you.