What is R?
R is both a programming language and a software environment that can be used in a wide range of areas: statistics, reporting, data management, visualization, web scraping, machine learning and so on.
It is a so-called “open source” software environment, meaning that the source code is made freely available, enabling anyone to contribute to the development of the software. This is why R can be used for so many different things and why it is developing so rapidly.
It can be downloaded from the R Project site.
Why use R?
When talking about ‘Big Data’, digital methods, data science, ‘computational social science’ or any other buzzword within data analysis, familiarity with programming is essential if you want to make real use of the developments in these areas (web scraping, automated data collection, machine learning and so on). R is of course not the only option, and many would advocate for Python as a better alternative. But where other programming languages are meant for a wide variety of applications (such as software and app development, bots and building websites), R is tailored more specifically towards data analysis while still being able to handle general programming tasks.
R can handle pretty much any kind of analysis you can think of in one way or another (with varying difficulty, of course). Once you really get into using R (or another programming language and software environment), you rarely need to use anything else.
Because R is free and open source, you can dive into it immediately, and there are plenty of free resources to get you started. If you are used to software with a GUI (“graphical user interface”), where you perform actions by pointing and clicking, then switching to R can be a bit of a shock, and it does have a steep learning curve. If you have worked with SPSS syntax, STATA do-files or SAS editor-files, then you have already dabbled a bit in the world of programming. Working with R is similar to working only with do-files and the like, but that is also what makes the software so incredibly flexible and versatile. R can work with all sorts of file formats and can process text, numbers, maps, web content and so on. R is also used across various scientific disciplines, which means that many of the extensions (called ‘packages’) are thoroughly tested and reviewed.
We have gathered some good places to start getting to know R. CALDISS will also regularly hold workshops on using R, so keep an eye on our event calendar!
Where should I start?
As pointed out by others, there is no single resource that will allow you to master R. Given the wide applicability of R and the many scientific fields in which it is used, learning material is always developed with certain fields in mind. Expecting to learn everything that R has to offer is unrealistic and unnecessary, so the best resource depends on what you are looking to do with R. Browse the different resources here and see which of them pique your interest.
This short guide by Data Carpentry introduces the essentials of making the shift from statistical software like SAS, SPSS or STATA to R. It covers installing R and RStudio (which makes it easier to work with R), how to navigate the software, its fundamentals and structure, as well as basic data management techniques and some visualization basics.
This guide is a good place to start to get a feel for how working with statistical data in R differs from other software solutions.
This guide introduces the basics and building blocks of R in some detail, along with the basics of various programming functions. It was written by Peter Haschke for the Star Lab at the University of Rochester Department of Political Science with social scientists in mind, so it makes few to no assumptions about prior knowledge of programming.
It is a good place to get an all-round view of how R works and how to get started.
“R for Data Science” is an introductory book about R written by Hadley Wickham and Garrett Grolemund, both of whom have contributed immensely to the development of R utilities and learning materials.
The book covers a wide variety of topics from working with the different data structures of R to visualization, data manipulation, recoding, model building and so on.
The book does state that it is helpful to have some programming experience but it does introduce coding basics. If you have worked with SAS editor-files, STATA do-files or the like, then starting with this book should not be too difficult.
It is worth noting that the book jumps straight into data visualization, which involves rather complex coding examples. If you want to get an understanding of the fundamentals first, you can start with chapter 2 on coding basics.
The scope of this guide by Computerworld is similar to that of the Data Carpentry guide. It is a good place to start if you want to learn how to do statistics in R. Compared to the Data Carpentry guide, it contains more examples and gives several arguments throughout as to why R is a great tool for data analysis.
Navigating the guide on the webpage can be a bit confusing but it can be downloaded as a PDF after registering at the site.
The R Tutorial by Tutorialspoint is a massive guide covering everything from setup, R basics and data structures to fitting specific statistical models, making charts and reading different data files.
There is very little hand-holding in this guide and it is aimed at a more tech-literate audience, but it works great as a sort of “R encyclopedia” for statistics with plenty of examples.
Winston Chang’s “Cookbook for R” is a site containing various descriptions and examples of how to work with different formats and data structures in R. It is better used as a reference work than as a guide, as many of the sections are tailored towards doing rather specific things in R (though still things that have broad applicability).
This seven-part YouTube tutorial series by David Langer (totaling over 8 hours) gives a thorough introduction to using R for data science, from installing and using the program to more advanced statistical modelling.
David Langer goes through the material at a reasonable pace, making it easy to follow, but it can still be a bit daunting to jump into with no prior knowledge of programming in general.
This Danish guide by Erik Gahner Larsen provides a good introduction to getting started with R (and why you should get started with R). It is aimed at statisticians looking to transition to R, but it also covers installation of R and RStudio as well as some of the basics. The book is not yet complete and new sections are continuously added. It is also possible to download the book in its current state in PDF or EPUB format.
R is a very popular programming language and software environment and several companies are competing to be the best platform to learn data science with R (as well as other programming languages like Python).
Websites like DataCamp, Udemy and SAGE Campus have vast selections of great e-courses that can get you started. Prices differ a lot, with some sites being subscription-based while others charge course by course (SAGE Campus courses are often more expensive but offer direct support and contact with the instructors). Often the first sections of a course are free, allowing you to get a feel for its content and difficulty before making a purchase.
The web is full of answers
Because R is so widely used, chances are that others have already had the same problems and issues that you will run into. Simply typing the problem into a search engine, prefaced with “R”, will lead you to various forums and sites where others describe having had similar issues. Stack Overflow is an especially valuable site for finding answers to all sorts of questions. As you get more familiar with R and its specific terminology, it also becomes easier to find forum posts dealing with the specific issues that you are trying to solve.
Happy exploring, and keep an eye out for upcoming R workshops at CALDISS!