DATA 101 Book
Section: 24
Best Works of 2022
24.1
DataBlog
Ella Walmsley
24.2
Boundless Analytics
Anastasiya Chuchkova
Shreya Tiwari
George Basta
Paul Kotys
Selin Altimparmak