DATA 101 Book
Section: 18
Machine Learning-Prediction Loop
Lecture slides: Prediction Loop