Section: 5 Setting Up R

  • Important Instructions

    • R must be installed before RStudio.
      • “R” is a programming language, and
      • “RStudio” is an Integrated Development Environment (IDE) that provides a platform for writing R code.
  • How to download and install R & RStudio?

    • Downloading and installing R.

      • For Windows Users.
        • Click on the link provided below, or copy and paste it into your favourite browser, to go to the website.
        • Click on the link at the top left that says “Download R 4.0.3 for Windows” (or the latest version available at the time of your installation).
        • Open the downloaded file and follow the installation instructions as given.
      • For Mac Users.
        • Click on the link provided below, or copy and paste it into your favourite browser, to go to the website.
        • Under “Latest release”, click on “R-4.0.3.pkg” (or the latest version available at the time of your installation).
        • Open the downloaded file and follow the installation instructions as given.
    • Downloading and installing RStudio.

      • For Windows Users.
        • Click on the link below, or copy and paste it into your favourite browser.
        • Scroll down almost to the end of the web page until you find a section named “All Installers”.
        • Click on the download link beside “Windows 10/8/7” to download the Windows version of RStudio.
        • Install RStudio by opening the downloaded file and following the instructions as given.
      • For Mac Users.
        • Click on the link below, or copy and paste it into your favourite browser.
        • Scroll down almost to the end of the web page until you find a section named “All Installers”.
        • Click on the link beside “macOS 10.13+” to download the Mac version of RStudio.
        • Install RStudio by opening the downloaded file and following the instructions as given.
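
Once both installers have finished, a quick check from the R console (or the Console pane in RStudio) can confirm that R is working. The sketch below is only an illustration and uses objects that ship with base R; the version it prints depends on the release you installed, and the variable names are arbitrary.

```r
# Version string of the R installation that is currently running
print(R.version.string)

# Major and minor version numbers joined into one string
r_version <- paste(R.version$major, R.version$minor, sep = ".")
cat("Running R version", r_version, "\n")

# A trivial computation to confirm that the interpreter evaluates code
check <- 1 + 1
print(check)
```

If these lines run without an error, R itself is installed correctly and RStudio has found it.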

5.1 Create New Project

After installing RStudio successfully, the first step is to create a project in RStudio.

  • Step 1: Go to File -> New Project

New Project

  • Step 2: Select New Directory

New Directory

  • Step 3: Select New Project

New Project

  • Step 4: Give your preferred directory name, for example “Data101_Assignments”

Directory Name

  • Step 5: Click on Create Project. RStudio should then look like this:

RStudio


5.2 How to upload a data set?

  • To load a dataset stored in CSV format, the read.csv() and read.csv2() functions are frequently used. They differ in the separator they expect: read.csv() splits fields on a comma, whereas read.csv2() splits fields on a semicolon.

  • There are two options while accessing the dataset from your local machine:

    1. To avoid typing long directory paths, use the command getwd() to find the current working directory and store the dataset in that directory.

Getwd

  • To access a dataset stored in the working directory, one can use the following: read.csv("MOODY_DATA.csv").

Store the moody dataset in the same directory

    2. Alternatively, store the dataset in a different location and access it by its full path. Suppose the dataset is stored inside the folder Data101_Tutorials on the desktop:
      • For Windows Users.
        • Example: read.csv("C:/Users/Desktop/Data101_Tutorials/MOODY_DATA.csv")
      • For Mac Users.
        • Example: read.csv("/Users/Desktop/Data101_Tutorials/MOODY_DATA.csv")
  
Note: The path used in the example below is a URL to the GitHub repository where the dataset is stored, rather than a folder on your machine.

    # Read in the data
    df <- read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/moody2020b.csv")

    # Print out `df`
    head(df)
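
The separator difference between read.csv() and read.csv2(), and the role of the working directory, can be sketched offline as follows. This is a minimal illustration under stated assumptions: the data frame `scores`, its values, and the file names are made up for the example and are not part of the course datasets.

```r
# Small illustrative data frame (hypothetical values, not the Moody data)
scores <- data.frame(STUDENT = c("a", "b", "c"),
                     SCORE = c(71.5, 48.25, 90))

# Write it twice into a temporary folder: once comma-separated, and once
# in the semicolon-separated layout that read.csv2() expects
data_dir <- tempdir()
comma_path <- file.path(data_dir, "scores_comma.csv")
semi_path <- file.path(data_dir, "scores_semicolon.csv")
write.csv(scores, comma_path, row.names = FALSE)
write.csv2(scores, semi_path, row.names = FALSE)

# read.csv() splits fields on commas, read.csv2() on semicolons
from_comma <- read.csv(comma_path)
from_semi <- read.csv2(semi_path)

# Both calls recover the same table
print(all.equal(from_comma, from_semi))

# A bare file name such as "MOODY_DATA.csv" is looked up
# relative to the folder printed here
print(getwd())
```

Reading the semicolon file with read.csv() instead would typically produce a single wide column, which is a quick sign that the wrong reader was used.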

5.3 Saving your work

  • To save your work, go to File -> Save. RStudio will ask you to name your .R file; enter a name and click on Save.

Save

  • After making modifications to your saved file, you will need to save it again. If the file name at the top is shown in red, the file has unsaved changes.

Unsaved File

  • Go to File -> Save to save your .R file again. After saving, the color of the file name (e.g. HW1.R) changes back to black.

Saved File

Note: You can create multiple files inside the same project, for example one for each homework assignment.

5.5 Textbook Concepts

  • Hypothesis testing: 8

  • Difference of means hypothesis testing: 8

  • Null Hypothesis: 8

  • Alternative Hypothesis: 8

  • z-value: 8

  • critical value: 8

  • significance level: 8

  • p-value: 8

  • Bonferroni correction: 10

  • Chi square test: 9

  • Independence: 9

  • Multiple Hypothesis testing: 10

  • False Discovery Proportion: 10

  • Contingency Matrix: 9

  • Bayesian Reasoning: 13

  • Prior odds: 13

  • Posterior odds: 13

  • Likelihood ratio: 13

  • False positive: 13

  • True positive: 13

  • Cross-validation: 16.4

  • Decision trees: 16

  • Linear regression: 17

  • Recursive partitioning: 17

  • MSE: 17

  • Prediction accuracy: 17

  • Training: 17

  • Testing: 17

5.6 R functions used in this class

  • Elementary instructions: c() 6.1, mean() 7.1.1, nrow() 7.2.1, rep(), sd() 7.1.5, cut() 7.4.2

  • Plots: plot() 6.4, barplot() 6.5, boxplot() 6.6, mosaicplot() 6.7

  • Data Transformations: subset() 7.2, tapply() 7.3, table() 6.3, aggregate()

  • Library functions: chisq.test() 9, pnorm() 8.2, Permutation() 8.2, rpart() 16, predict() 16.5, lm() 17.1, crossvalidation() 16.4

  • Parameters of rpart: minsplit 16.3, minbucket 16.3, cp 16.3
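
As a rough orientation, several of the base R functions listed above can be tried on a toy data frame. The sketch below is illustrative only: the data frame `toy`, its column names, and its values are assumptions made for this example and do not come from the course datasets.

```r
# Toy data frame (hypothetical values chosen for illustration only)
toy <- data.frame(
  SCORE = c(40, 55, 70, 85, 100),
  GROUP = rep(c("x", "y"), times = c(2, 3))
)

print(nrow(toy))        # number of rows
print(mean(toy$SCORE))  # arithmetic mean of SCORE
print(sd(toy$SCORE))    # sample standard deviation of SCORE

# subset(): keep only the rows that satisfy a condition
high <- subset(toy, SCORE >= 70)

# table(): count how many rows fall in each GROUP
group_counts <- table(toy$GROUP)

# tapply(): apply mean() to SCORE separately within each GROUP
group_means <- tapply(toy$SCORE, toy$GROUP, mean)

# cut(): convert a numeric column into labelled intervals
toy$BAND <- cut(toy$SCORE, breaks = c(0, 50, 75, 100),
                labels = c("low", "mid", "high"))

# aggregate(): the same per-group mean, returned as a data frame
agg <- aggregate(SCORE ~ GROUP, data = toy, FUN = mean)

print(high)
print(group_counts)
print(group_means)
print(toy)
print(agg)
```

tapply() returns a named vector while aggregate() returns a data frame, so the choice between them usually depends on what the next step of the analysis expects.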

5.7 Data sets

5.7.1 Moody

Download: moody2022_new.csv

Table 5.1: Snippet of Moody Dataset
      SCORE  GRADE  DOZES_OFF  TEXTING_IN_CLASS  PARTICIPATION
44    25.69  D      always     always            0.04
338   58.70  C      never      rarely            0.30
380   52.48  C      never      rarely            0.75
529   26.02  F      never      never             0.04
108   67.28  C      sometimes  rarely            0.39

    moody <- read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/moody2022_new.csv")

    summary(moody)

5.7.2 Movies

Download: Movies2022F-4.csv

Table 5.2: Snippet of Movies Dataset
       country    content  imdb_score  Gross   Budget  genre
6370   USA        R        7.45        Low     Medium  Comedy
7534   USA        PG-13    3.11        Medium  Medium  Sci-Fi
12066  USA        R        7.28        Medium  Low     Drama
9925   USA        G        8.91        High    Low     History
9069   Hong Kong  R        7.60        Low     Medium  Comedy

    movies <- read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Movies2022F-4.csv")

    summary(movies)

5.7.3 Traffic

Download: Traffic2022.csv

Table 5.3: Snippet of Traffic Dataset
      TUNNEL   DAY      VOLUME_PER_MINUTE
2228  Lincoln  weekday  70.0
2346  Lincoln  weekday  59.0
374   Holland  weekday  73.5
1972  Lincoln  weekday  58.0
1994  Lincoln  weekday  93.0

    traffic <- read.csv('https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Traffic2022.csv')

    summary(traffic)

5.7.4 Hindex

Download: Hindex.csv

Table 5.4: Snippet of Hindex Dataset
      IDN    COUNTRY         HAPPINESS
2215  25434  Kuwait          3.52
705   57598  Lithuania       6.44
2590  75761  Oman            7.74
2739  79071  Czech Republic  7.36
5651  14556  Benin           2.70

    hindex <- read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Hindex.csv")

    summary(hindex)

5.7.5 Prediction 1 Dataset

Download: M2022train.csv

Table 5.5: Snippet of Moody Prediction 1 dataset
     Major       Score  Seniority  Grade
275  Economics   85     Sophomore  A
356  Statistics  52     Senior     D
511  CS          92     Senior     A
518  CS          37     Sophomore  F
309  CS          98     Junior     A

    moody <- read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/M2022train.csv")

    summary(moody)

5.7.6 Midterm, Project and Final Exam distribution in Prof. Moody's class

Download: MoodyNUM.csv

Assumptions: Midterm, Project and Final Exam are all out of 100

Table 5.6: Midterm, Project and Final Exam distribution in Prof. Moody's class
     Midterm  Project  FinalExam  ClassScore
226  28       49       90         57.10000
785  75       44       8          39.40000
407  67       86       13         60.30000
199  79       55       68         63.70000
590  65       75       57         70.47627

    moody <- read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/MoodyNUM.csv")

    summary(moody)
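
The stated assumption that Midterm, Project and Final Exam are all out of 100 can be checked directly on the data. The sketch below retypes the five rows of Table 5.6 by hand so that it runs without an internet connection; the names `snippet` and `in_range` are arbitrary, and the same check could be applied to the full data frame after loading it.

```r
# The five rows shown in Table 5.6, entered by hand
snippet <- data.frame(
  Midterm    = c(28, 75, 67, 79, 65),
  Project    = c(49, 44, 86, 55, 75),
  FinalExam  = c(90, 8, 13, 68, 57),
  ClassScore = c(57.1, 39.4, 60.3, 63.7, 70.47627)
)

# For each of the three score columns, TRUE when every value
# lies between 0 and 100 inclusive
in_range <- sapply(snippet[c("Midterm", "Project", "FinalExam")],
                   function(col) all(col >= 0 & col <= 100))
print(in_range)
```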

5.7.7 Minimarket

Download: HomeworkMarket2022.csv

Table 5.7: Minimarket dataset
       Beer   Day      Location       SoftDrinks  Sweets     Wine   Snacks
3554   Lager  Weekend  New Brunswick  None        Snickers   None   Popcorn
6412   None   Weekday  Metuchen       Cola        Snickers   None   Crackers
16356  Ale    Weekday  Princeton      None        Twix       White  Crackers
3708   None   Weekend  Princeton      None        Milky Way  White  None
5255   Ale    Weekend  Metuchen       Sprite      Snickers   None   None

    minimarket <- read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/HomeworkMarket2022.csv")

    summary(minimarket)
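
The seven dataset files in this section are all read from the same GitHub folder, so their URLs can be built from the file names. The sketch below is one possible convenience, not part of the course material: `dataset_url()` is a hypothetical helper introduced here for illustration, and the code only builds the URL strings without downloading anything.

```r
# Folder shared by the dataset links in this section
base_url <- "https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/"

# dataset_url() is a hypothetical helper (an assumption of this sketch):
# it appends a file name to the shared folder
dataset_url <- function(file_name) paste0(base_url, file_name)

# File names as listed under "Download:" in each subsection
files <- c("moody2022_new.csv", "Movies2022F-4.csv", "Traffic2022.csv",
           "Hindex.csv", "M2022train.csv", "MoodyNUM.csv",
           "HomeworkMarket2022.csv")

# Named character vector: one full URL per file name
urls <- vapply(files, dataset_url, character(1))
print(urls)

# Reading one of them requires an internet connection, for example:
# minimarket <- read.csv(dataset_url("HomeworkMarket2022.csv"))
```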