5 Setting Up R
Important Instructions
- Installation of R is required before installing RStudio
- “R” is a programming language, and
- “RStudio” is an Integrated Development Environment (IDE) that provides a platform for writing and running R code.
How to download and install R & RStudio?
Downloading and installing R.
- For Windows Users.
  - Click on the link provided, or copy and paste it into your favourite browser, to go to the website.
  - Click on the link at the top left that says “Download R 4.0.3 for Windows” (or the latest version at the time of your installation).
  - Open the downloaded file and follow the installer's instructions.
- For Mac Users.
  - Click on the link provided, or copy and paste it into your favourite browser, to go to the website.
  - Under “Latest release”, click on “R-4.0.3.pkg” (or the latest version at the time of your installation).
  - Open the downloaded file and follow the installer's instructions.
Downloading and installing RStudio.
- For Windows Users.
  - Click on the link provided, or copy and paste it into your favourite browser.
  - Scroll down nearly to the end of the web page until you find the section named “All Installers”.
  - Click on the download link next to “Windows 10/8/7” to download the Windows version of RStudio.
  - Install RStudio by opening the downloaded file and following the instructions.
- For Mac Users.
  - Click on the link provided, or copy and paste it into your favourite browser.
  - Scroll down nearly to the end of the web page until you find the section named “All Installers”.
  - Click on the link next to “macOS 10.13+” to download the Mac version of RStudio.
  - Install RStudio by opening the downloaded file and following the instructions.
5.1 Create New Project
After installing RStudio successfully, the first step is to create a project in RStudio.
- Step 1: Go to File -> New Project
- Step 2: Select New Directory
- Step 3: Select New Project
- Step 4: Give the directory your preferred name, such as “Data101_Assignments”.
- Step 5: Click Create Project. RStudio then opens the new project, with the project folder as the working directory.
5.2 How to upload a data set?
To load a dataset stored in CSV format, the read.csv() and read.csv2() functions are frequently used. The two functions expect different separator symbols: read.csv() expects fields separated by a comma, whereas read.csv2() expects a semicolon (together with a comma as the decimal mark, the convention in countries that write decimals with a comma).
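The separator difference can be seen with a small round trip. The sketch below assumes a throwaway two-row data frame (the values and variable names are placeholders, not course data): write.csv2() stores semicolon-separated fields with a comma as the decimal mark, and read.csv2() reads that file back into the same data frame that read.csv() recovers from the comma-separated version.

```r
# Placeholder data: two rows, one numeric and one text column
df <- data.frame(SCORE = c(25.69, 58.70), GRADE = c("D", "C"))

comma_file <- tempfile(fileext = ".csv")
semi_file  <- tempfile(fileext = ".csv")

# write.csv(): fields separated by "," with "." as the decimal mark
write.csv(df, comma_file, row.names = FALSE)
# write.csv2(): fields separated by ";" with "," as the decimal mark
write.csv2(df, semi_file, row.names = FALSE)

readLines(semi_file)   # the semicolon-separated text as stored on disk

# Each reader must match the format of its file
from_comma <- read.csv(comma_file)
from_semi  <- read.csv2(semi_file)

all.equal(from_comma, from_semi)   # TRUE: both give the same data frame
```

If a semicolon-separated file is opened with read.csv() instead, the columns are not split correctly, which is usually the first sign that read.csv2() is needed.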
There are two options for accessing a dataset stored on your local machine:
- To avoid typing long directory paths, store the dataset in the current working directory. Use the command getwd() to find the current working directory.
  - A dataset stored in the working directory can be read by its file name alone: read.csv("MOODY_DATA.csv").
- Alternatively, store the dataset in a different location and access it by its full path. Suppose the dataset is stored inside the folder Data101_Tutorials on the desktop:
  - For Windows Users.
    - Example: read.csv("C:/Users/<your-username>/Desktop/Data101_Tutorials/MOODY_DATA.csv")
  - For Mac Users.
    - Example: read.csv("/Users/<your-username>/Desktop/Data101_Tutorials/MOODY_DATA.csv")
  - On Windows, write the path with forward slashes (/) as shown; if you use backslashes instead, each one must be doubled (\\) inside an R string.
Note:
The directory path given here is the current working directory hosted on Github where the dataset has been stored.
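Both options can be tried without touching your Desktop. The sketch below simulates a Data101_Tutorials folder inside R's temporary directory (tempdir()) and writes a placeholder MOODY_DATA.csv there; on your own machine the folder and file would be the real ones, and the variable names are only illustrative. It reads the file once by its full path and once by file name alone, after temporarily switching the working directory with setwd().

```r
# Where are relative file names (like "MOODY_DATA.csv") looked up right now?
getwd()

# Simulated folder; on your machine this would be your Desktop/Data101_Tutorials
tutorials_dir <- file.path(tempdir(), "Data101_Tutorials")
dir.create(tutorials_dir, showWarnings = FALSE)

# file.path() joins path pieces with "/", which R accepts on Windows and macOS
csv_path <- file.path(tutorials_dir, "MOODY_DATA.csv")

# Placeholder file so the example runs on its own (not the real dataset)
write.csv(data.frame(SCORE = c(25.69, 58.70)), csv_path, row.names = FALSE)

# Read the file by its full path
moody_full <- read.csv(csv_path)

# Read by file name alone, after making the folder the working directory
old_wd <- setwd(tutorials_dir)   # setwd() returns the previous working directory
moody_rel <- read.csv("MOODY_DATA.csv")
setwd(old_wd)                    # restore the original working directory
```

Inside an RStudio project, the working directory already points at the project folder, so keeping the datasets there lets you use the short file-name form.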
5.3 Saving your work
- To save your work, go to File -> Save. RStudio will ask you to name your .R file; then click Save.
- After you modify a saved file, you need to save it again. If the file name in the tab at the top is shown in red, the file has unsaved changes.
- Go to File -> Save to save your .R file again. After saving, the file name (e.g., HW1.R) turns black again.
Note: You can create multiple files inside the same project, for example one for each homework assignment.
5.4 General R References
https://www.w3schools.com/r/
https://cran.r-project.org/doc/contrib/Short-refcard.pdf
https://www.amazon.com/Statistics-Engineers-Scientists-William-Navidi/dp/0073376337/ref=pd_lpo_3?pd_rd_i=0073376337&psc=1
https://data101.cs.rutgers.edu/laboratory/
5.5 Textbook Concepts
Hypothesis testing: 8
Difference of means hypothesis testing: 8
Null Hypothesis: 8
Alternative Hypothesis: 8
z-value: 8
critical value: 8
significance level: 8
p-value: 8
Bonferroni correction: 10
Chi-square test: 9
Independence: 9
Multiple Hypothesis testing: 10
False Discovery Proportion: 10
Contingency Matrix: 9
Bayesian Reasoning: 13
Prior odds: 13
Posterior odds: 13
Likelihood ratio: 13
False positive: 13
True positive: 13
Cross-validation: 16.4
Decision trees: 16
Linear regression: 17
Recursive partitioning: 17
MSE: 17
Prediction accuracy: 17
Training: 17
Testing: 17
5.6 R functions used in this class
Elementary instructions: c() 6.1, mean() 7.1.1, nrow() 7.2.1, rep(), sd() 7.1.5, cut() 7.4.2
Plots: plot() 6.4, barplot() 6.5, boxplot() 6.6, mosaicplot() 6.7
Data Transformations: subset() 7.2, tapply() 7.3, table() 6.3, aggregate()
Library functions: chisq.test() 9, pnorm() 8.2, Permutation() 8.2, rpart() 16, predict() 16.5, lm() 17.1, crossvalidation() 16.4
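As a quick, hedged illustration, several base R functions from the list above can be tried on a small made-up vector and a made-up 2 x 2 table of counts (the numbers are arbitrary). Permutation() and crossvalidation() are not part of base R, presumably helpers supplied with the course material, so they are not shown here.

```r
x <- c(4, 8, 15, 16, 23, 42)          # c(): combine values into a vector
mean(x)                                # 18
sd(x)                                  # sample standard deviation
rep("a", 3)                            # "a" "a" "a"
cut(x, breaks = c(0, 10, 20, 50))      # bin each value into (0,10], (10,20] or (20,50]
nrow(data.frame(x = x))                # 6 rows

# pnorm(): standard normal CDF, P(Z <= z)
pnorm(1.96)                            # about 0.975

# chisq.test(): test of independence on a 2 x 2 table of made-up counts
counts <- matrix(c(20, 30, 30, 20), nrow = 2)
chisq.test(counts)$p.value
```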
5.7 Data sets
5.7.1 Moody
Download: moody2022_new.csv
Row | SCORE | GRADE | DOZES_OFF | TEXTING_IN_CLASS | PARTICIPATION
---|---|---|---|---|---|
44 | 25.69 | D | always | always | 0.04 |
338 | 58.70 | C | never | rarely | 0.30 |
380 | 52.48 | C | never | rarely | 0.75 |
529 | 26.02 | F | never | never | 0.04 |
108 | 67.28 | C | sometimes | rarely | 0.39 |
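A minimal sketch of working with this dataset, assuming only the five preview rows shown above (typed in by hand so the code runs without the download; the variable names are illustrative). With the downloaded file, the commented read.csv() line would replace the hand-built data frame. It uses subset() to keep the students who never doze off and tapply() to compute the mean SCORE for each GRADE.

```r
# The five preview rows above, typed in by hand
moody <- data.frame(
  SCORE = c(25.69, 58.70, 52.48, 26.02, 67.28),
  GRADE = c("D", "C", "C", "F", "C"),
  DOZES_OFF = c("always", "never", "never", "never", "sometimes"),
  TEXTING_IN_CLASS = c("always", "rarely", "rarely", "never", "rarely"),
  PARTICIPATION = c(0.04, 0.30, 0.75, 0.04, 0.39)
)
# With the downloaded file: moody <- read.csv("moody2022_new.csv")

# subset(): keep only the students who never doze off
never_doze <- subset(moody, DOZES_OFF == "never")
nrow(never_doze)                          # 3

# tapply(): mean SCORE within each GRADE
tapply(moody$SCORE, moody$GRADE, mean)
```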
5.7.2 Movies
Download: Movies2022F-4.csv
Row | country | content | imdb_score | Gross | Budget | genre
---|---|---|---|---|---|---|
6370 | USA | R | 7.45 | Low | Medium | Comedy |
7534 | USA | PG-13 | 3.11 | Medium | Medium | Sci-Fi |
12066 | USA | R | 7.28 | Medium | Low | Drama |
9925 | USA | G | 8.91 | High | Low | History |
9069 | Hong Kong | R | 7.60 | Low | Medium | Comedy |
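For categorical columns such as content, Gross and Budget, table() gives counts. The sketch below assumes only the five preview rows above, typed in by hand; with the full file, the commented read.csv() line would be used instead.

```r
movies <- data.frame(
  country = c("USA", "USA", "USA", "USA", "Hong Kong"),
  content = c("R", "PG-13", "R", "G", "R"),
  imdb_score = c(7.45, 3.11, 7.28, 8.91, 7.60),
  Gross = c("Low", "Medium", "Medium", "High", "Low"),
  Budget = c("Medium", "Medium", "Low", "Low", "Medium"),
  genre = c("Comedy", "Sci-Fi", "Drama", "History", "Comedy")
)
# With the downloaded file: movies <- read.csv("Movies2022F-4.csv")

# table(): how many movies have each content rating
content_counts <- table(movies$content)
content_counts                                 # G: 1, PG-13: 1, R: 3

# Two-way table: Gross category (rows) against Budget category (columns)
gross_by_budget <- table(movies$Gross, movies$Budget)
gross_by_budget
```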
5.7.3 Traffic
Download: Traffic2022.csv
Row | TUNNEL | DAY | VOLUME_PER_MINUTE
---|---|---|---|
2228 | Lincoln | weekday | 70.0 |
2346 | Lincoln | weekday | 59.0 |
374 | Holland | weekday | 73.5 |
1972 | Lincoln | weekday | 58.0 |
1994 | Lincoln | weekday | 93.0 |
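aggregate() summarises a numeric column by group and returns a data frame. The sketch below assumes only the five preview rows above, typed in by hand (rep() fills the DAY column), and computes the mean VOLUME_PER_MINUTE for each tunnel.

```r
traffic <- data.frame(
  TUNNEL = c("Lincoln", "Lincoln", "Holland", "Lincoln", "Lincoln"),
  DAY = rep("weekday", 5),               # rep(): the same value five times
  VOLUME_PER_MINUTE = c(70.0, 59.0, 73.5, 58.0, 93.0)
)
# With the downloaded file: traffic <- read.csv("Traffic2022.csv")

# aggregate(): mean volume per tunnel, as a data frame
agg <- aggregate(VOLUME_PER_MINUTE ~ TUNNEL, data = traffic, FUN = mean)
agg                                      # Holland 73.5, Lincoln 70
```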
5.7.4 Hindex
Download: Hindex.csv
Row | IDN | COUNTRY | HAPPINESS
---|---|---|---|
2215 | 25434 | Kuwait | 3.52 |
705 | 57598 | Lithuania | 6.44 |
2590 | 75761 | Oman | 7.74 |
2739 | 79071 | Czech Republic | 7.36 |
5651 | 14556 | Benin | 2.70 |
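cut() turns a numeric column into labelled bands. The sketch below assumes only the five preview rows above, typed in by hand. The break points (0, 4, 7, 10), the labels, and the new LEVEL column are arbitrary choices for illustration, not categories defined by the course.

```r
hindex <- data.frame(
  IDN = c(25434, 57598, 75761, 79071, 14556),
  COUNTRY = c("Kuwait", "Lithuania", "Oman", "Czech Republic", "Benin"),
  HAPPINESS = c(3.52, 6.44, 7.74, 7.36, 2.70)
)
# With the downloaded file: hindex <- read.csv("Hindex.csv")

# cut(): (0,4] -> "low", (4,7] -> "medium", (7,10] -> "high" (hypothetical bands)
hindex$LEVEL <- cut(hindex$HAPPINESS, breaks = c(0, 4, 7, 10),
                    labels = c("low", "medium", "high"))
hindex[, c("COUNTRY", "HAPPINESS", "LEVEL")]
```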
5.7.5 Prediction 1 Dataset
Download: M2022train.csv
Row | Major | Score | Seniority | Grade
---|---|---|---|---|
275 | Economics | 85 | Sophomore | A |
356 | Statistics | 52 | Senior | D |
511 | CS | 92 | Senior | A |
518 | CS | 37 | Sophomore | F |
309 | CS | 98 | Junior | A |
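The Training and Testing concepts listed earlier rely on keeping some rows out of model fitting. One common way to do that in base R is a random split with sample(), sketched below on the five preview rows above (typed in by hand). The 3/2 split, the seed, and the names fit_part and hold_part are arbitrary assumptions; this is not necessarily how the course splits its data.

```r
train_data <- data.frame(
  Major = c("Economics", "Statistics", "CS", "CS", "CS"),
  Score = c(85, 52, 92, 37, 98),
  Seniority = c("Sophomore", "Senior", "Senior", "Sophomore", "Junior"),
  Grade = c("A", "D", "A", "F", "A")
)
# With the downloaded file: train_data <- read.csv("M2022train.csv")

set.seed(101)                                  # make the random split reproducible
idx <- sample(nrow(train_data), size = 3)      # row numbers used for fitting
fit_part  <- train_data[idx, ]
hold_part <- train_data[-idx, ]                # remaining rows, held out for testing
```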
5.7.6 Midterm, Project and Final Exam distribution in Prof. Moody's class
Download: MoodyNUM.csv
Assumption: the Midterm, Project and Final Exam scores are each out of 100.
Row | Midterm | Project | FinalExam | ClassScore
---|---|---|---|---|
226 | 28 | 49 | 90 | 57.10000 |
785 | 75 | 44 | 8 | 39.40000 |
407 | 67 | 86 | 13 | 60.30000 |
199 | 79 | 55 | 68 | 63.70000 |
590 | 65 | 75 | 57 | 70.47627 |
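The stated assumption can be checked directly in R, and lm() can relate ClassScore to the three components. The sketch below assumes only the five preview rows above, typed in by hand; with five rows and four coefficients the fitted model is purely illustrative of the call, not a meaningful estimate of how ClassScore is computed.

```r
grades <- data.frame(
  Midterm = c(28, 75, 67, 79, 65),
  Project = c(49, 44, 86, 55, 75),
  FinalExam = c(90, 8, 13, 68, 57),
  ClassScore = c(57.10000, 39.40000, 60.30000, 63.70000, 70.47627)
)
# With the downloaded file: grades <- read.csv("MoodyNUM.csv")

# Check the assumption that every component lies between 0 and 100
components <- grades[, c("Midterm", "Project", "FinalExam")]
all(components >= 0 & components <= 100)   # TRUE for these rows

sapply(components, mean)                   # Midterm 62.8, Project 61.8, FinalExam 47.2

# lm(): linear regression of ClassScore on the three components
fit <- lm(ClassScore ~ Midterm + Project + FinalExam, data = grades)
coef(fit)
```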
5.7.7 Minimarket
Download: HomeworkMarket2022.csv
Row | Beer | Day | Location | SoftDrinks | Sweets | Wine | Snacks
---|---|---|---|---|---|---|---|
3554 | Lager | Weekend | New Brunswick | None | Snickers | None | Popcorn |
6412 | None | Weekday | Metuchen | Cola | Snickers | None | Crackers |
16356 | Ale | Weekday | Princeton | None | Twix | White | Crackers |
3708 | None | Weekend | Princeton | None | Milky Way | White | None |
5255 | Ale | Weekend | Metuchen | Sprite | Snickers | None | None |
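A contingency table of two categorical columns is the usual input to chisq.test(). The sketch below assumes only the five preview rows above, typed in by hand; note that "None" is an ordinary category string in this data. Five rows are far too few for a meaningful chi-square test, so the test is only indicated in a comment.

```r
market <- data.frame(
  Beer = c("Lager", "None", "Ale", "None", "Ale"),
  Day = c("Weekend", "Weekday", "Weekday", "Weekend", "Weekend"),
  Location = c("New Brunswick", "Metuchen", "Princeton", "Princeton", "Metuchen"),
  SoftDrinks = c("None", "Cola", "None", "None", "Sprite"),
  Sweets = c("Snickers", "Snickers", "Twix", "Milky Way", "Snickers"),
  Wine = c("None", "None", "White", "White", "None"),
  Snacks = c("Popcorn", "Crackers", "Crackers", "None", "None")
)
# With the downloaded file: market <- read.csv("HomeworkMarket2022.csv")

# Contingency table: Day (rows) against Beer (columns)
day_by_beer <- table(market$Day, market$Beer)
day_by_beer

# With the full dataset, chisq.test(day_by_beer) would test whether
# Day and Beer are independent.
```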