Section: 14 🔖 Data puzzles

14.1 Introduction

Data puzzles are synthetically generated datasets with embedded patterns. The patterns take various forms, from relationships between attributes to rules of the form “if condition then value” over specific attribute-value pairs. The patterns are stochastic and are embedded in the datasets by DataMaker, our data puzzle generation tool.

We use data puzzles extensively in class assignments, ranging from data exploration and plotting through hypothesis testing to prediction and machine learning. After an assignment is completed, we reveal the data secrets: the patterns that DataMaker embedded. Students do not have to find exactly the embedded patterns; they often find related patterns, which makes the “game” even more fun.

Below we provide the list of data puzzles along with the underlying datasets. Using DataMaker, we change the patterns, and even the datasets, from one academic year to the next. We can also provide data puzzles at different levels of difficulty, from one star (easy) to five stars (most difficult).

We now proceed to nine datasets. The first seven are synthetic data puzzles created by DataMaker, with embedded secret patterns; the last two, airbnb and titanic, are real datasets. Each dataset is the subject of a separate subsection, 14.2 to 14.10, and these subsections share a similar structure. Each starts with about eight to ten practice snippets, followed by a Code Roulette link that randomly selects a small coding task for students using that subsection's dataset. We provide the output that the requested code returns when run on the dataset, but not the code itself: the student has to write it.

Each subsection 14.i starts with the snippet “Get to know your data”: about fifteen R instructions that, when executed, return the column names of the dataset, the number of rows, a summary of the dataset, the unique values of each attribute, and various basic distributions.

For example, the first of these subsections, 14.2, starts with the following “Get to know your data” snippet; similar snippets open all nine datasets. We explain each line below.

moody<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/moody2022_new.csv")
# loads the Moody dataset into the data frame moody

colnames(moody)
# returns all columns of moody

summary(moody)
# provides a summary of moody, including the distributions of its columns

nrow(moody)
# returns the number of rows of moody

unique(moody$GRADE)
# provides the unique values of the GRADE attribute (grades)

unique(moody$DOZES_OFF)
# provides the unique values of the DOZES_OFF attribute (‘always’, ‘sometimes’, ‘never’)

unique(moody$TEXTING_IN_CLASS)
# provides the unique values of the TEXTING_IN_CLASS attribute (for example ‘always’, ‘rarely’, ‘never’)

table(moody$GRADE)
# frequency distribution of GRADE

table(moody$DOZES_OFF)
# frequency distribution of DOZES_OFF

table(moody$TEXTING_IN_CLASS)
# frequency distribution of TEXTING_IN_CLASS

tapply(moody$SCORE, moody$GRADE, mean)
# mean score for each grade. We expect it to decrease from A to F

tapply(moody$PARTICIPATION, moody$GRADE, mean)
# mean participation for each grade. We do not know what to expect; perhaps it also decreases from A to F

tapply(moody$SCORE, moody$DOZES_OFF, mean)
# mean score for each value of DOZES_OFF. We do not know what to expect; perhaps it decreases the more often a student dozes off?

tapply(moody$PARTICIPATION, moody$DOZES_OFF, mean)
# mean participation for each value of DOZES_OFF. We do not know what to expect; perhaps it decreases the more often a student dozes off?
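The same “get to know your data” steps can be wrapped into one reusable function. This is only an illustrative sketch: `categorical_cols()` and `explore_df()` are hypothetical helper names, not part of the course code, and the five toy rows are copied from Table 14.1 rather than loaded from the full dataset.

```r
# Hypothetical helper: names of the character or factor columns,
# i.e. the attributes worth inspecting with unique() and table().
categorical_cols <- function(df) {
  names(df)[vapply(df, function(x) is.character(x) || is.factor(x), logical(1))]
}

# Hypothetical helper: print the basic "get to know your data" overview.
explore_df <- function(df, num_col = NULL, by_col = NULL) {
  print(colnames(df))
  print(nrow(df))
  print(summary(df))
  for (col in categorical_cols(df)) {
    cat("\n--", col, "--\n")
    print(unique(df[[col]]))
    print(table(df[[col]]))
  }
  # Optional group means, like tapply(moody$SCORE, moody$GRADE, mean)
  if (!is.null(num_col) && !is.null(by_col)) {
    print(tapply(df[[num_col]], df[[by_col]], mean))
  }
  invisible(df)
}

# Five rows copied from Table 14.1 (a toy sample, not the full moody data)
toy <- data.frame(
  SCORE = c(21.33, 71.57, 90.11, 31.52, 95.94),
  GRADE = c("F", "C", "A", "D", "A"),
  DOZES_OFF = c("never", "always", "always", "sometimes", "always"),
  TEXTING_IN_CLASS = c("never", "rarely", "never", "rarely", "rarely"),
  PARTICIPATION = c(0.29, 0.11, 0.26, 0.03, 0.21)
)
explore_df(toy, num_col = "SCORE", by_col = "GRADE")
```

With the real data, `explore_df(moody, "SCORE", "GRADE")` would print the same kinds of output as the listing above. Checking for both character and factor columns keeps the helper working whether or not strings were read as factors.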

After getting to know the data, we move on to a number of simple coding tasks, including:

  • Exploratory queries: computing the mean, max, or min over a subset of the dataset
  • Hypothesis testing for a difference of means of a numerical attribute
  • Calculation of posterior odds using the odds formulation of Bayes’ theorem
  • Hypothesis testing for independence
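For the posterior-odds task, the odds form of Bayes’ theorem states that posterior odds = likelihood ratio × prior odds, where the likelihood ratio is P(evidence | belief) / P(evidence | not belief). The sketch below computes all of these quantities from two logical vectors over the rows of a data frame. `posterior_odds()` is a hypothetical helper name, the ten rows are made up, and unlike the snippets later in this chapter it does not round intermediate values.

```r
# Odds form of Bayes' theorem (hypothetical helper):
#   O(H | E) = LR * O(H),  with  LR = P(E | H) / P(E | not H)
# `belief` (H) and `evidence` (E) are logical vectors over the same rows.
posterior_odds <- function(belief, evidence) {
  prior      <- mean(belief)          # P(H)
  prior_odds <- prior / (1 - prior)   # O(H)
  tp <- mean(evidence[belief])        # P(E | H), the "true positive" rate
  fp <- mean(evidence[!belief])       # P(E | not H), the "false positive" rate
  lr <- tp / fp                       # likelihood ratio
  post_odds <- lr * prior_odds
  c(prior_odds = prior_odds, likelihood_ratio = lr,
    posterior_odds = post_odds, posterior = post_odds / (1 + post_odds))
}

# Made-up example: 4 of 10 rows satisfy the belief, 3 of those 4 show the
# evidence, and 1 of the remaining 6 does.
belief   <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
evidence <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
round(posterior_odds(belief, evidence), 2)
# With the Moody data one might call, e.g.:
# posterior_odds(moody$DOZES_OFF == "never", moody$GRADE == "A")
```

Without rounding, the posterior odds reduce to a simple count ratio, n(E and H) / n(E and not H), here 3 / 1 = 3, which is a handy sanity check for the snippet results.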

14.2 Strange grading methods of Professor Moody Data Puzzle

Download: moody2022_new.csv

How to get a good grade in Professor Moody’s class?

Professor Moody does not base final grades on your total score alone. Our data shows that two students with the same total score may get widely different final grades. Can you believe that you can even fail his class with a score as high as 82%? This is outrageous, isn’t it?

DataMaker has generated thousands of tuples that, in addition to the total score and final grade, store bizarre information about student behavior in class. Do students often doze off? Does a student text a lot? Does s/he ask a lot of questions? Does it help to ask a lot of questions? Does it hurt to doze off a lot?

Table 14.1: Snippet of Moody Dataset
SCORE GRADE DOZES_OFF TEXTING_IN_CLASS PARTICIPATION
21.33 F never never 0.29
71.57 C always rarely 0.11
90.11 A always never 0.26
31.52 D sometimes rarely 0.03
95.94 A always rarely 0.21


Figure 1: Boxplot for student score and grade distribution (Snippet 14.2.2.9)

14.2.1 🧙 Secret patterns embedded by the Data Maker (hint)

It is not surprising that Professor Moody does not like students texting in his class all the time. He also does not like the sleepers, who doze off and do not participate at all. But if they nevertheless do well and have convincing scores, he has no choice but to give them the grades their scores imply. However, if a score is borderline (it could be an A or a B, or a B or a C), Moody apparently weighs in the sleeping and texting data!

Always texting or always dozing off in class makes a difference, but only for borderline scores. That is, for scores between 80 and 90, the grade will be a B when TEXTING_IN_CLASS = ‘always’ or DOZES_OFF = ‘always’. The situation is similar for borderline scores between C and B, and between F and D.

Check it out!
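One way to check the borderline pattern is to keep only scores in the 80 to 90 band and cross-tabulate grades against an “always texts or always dozes off” flag. The sketch uses a hypothetical helper, `borderline_check()`; treating the band as 80 <= SCORE < 90 is an assumption, and the toy rows are made up.

```r
# Hypothetical helper: among borderline scores (lo <= SCORE < hi), tabulate
# GRADE separately for students who always text or always doze off.
borderline_check <- function(df, lo = 80, hi = 90) {
  border <- df[df$SCORE >= lo & df$SCORE < hi, ]
  always <- border$TEXTING_IN_CLASS == "always" | border$DOZES_OFF == "always"
  table(always = always, grade = border$GRADE)
}

# Made-up rows, only to show the shape of the output
toy <- data.frame(
  SCORE            = c(85, 82, 88, 86, 95, 70),
  GRADE            = c("B", "B", "A", "A", "A", "C"),
  TEXTING_IN_CLASS = c("always", "never", "never", "rarely", "always", "never"),
  DOZES_OFF        = c("never", "always", "never", "never", "always", "never")
)
borderline_check(toy)
# With the real data: borderline_check(moody)
```

If the pattern holds, the `always = TRUE` row of the real table should be dominated by B's, while the `FALSE` row should contain more A's.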

14.2.2 Practice Snippets

14.2.2.1 Snippet 1: Get familiar with the data set

moody<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/moody2022_new.csv")

colnames(moody)
summary(moody)
unique(moody$GRADE)
nrow(moody)
unique(moody$DOZES_OFF)
unique(moody$TEXTING_IN_CLASS)
table(moody$GRADE)
table(moody$DOZES_OFF)
table(moody$TEXTING_IN_CLASS)
tapply(moody$SCORE, moody$GRADE, mean)
tapply(moody$PARTICIPATION, moody$GRADE, mean)
tapply(moody$SCORE, moody$DOZES_OFF, mean)
tapply(moody$PARTICIPATION, moody$DOZES_OFF, mean)
tapply(moody$SCORE, moody$TEXTING_IN_CLASS, mean)
tapply(moody$PARTICIPATION, moody$TEXTING_IN_CLASS, mean)

14.2.2.2 Snippet 2

Q: Did you know that students who never doze off in class have more than twice as many A’s as students who sometimes doze off?

moody<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/moody2022_new.csv")

table(moody$DOZES_OFF, moody$GRADE)
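The contingency table above invites a test of independence, one of the tasks listed in 14.1. A chi-squared test with base R’s `chisq.test()` is one option; this is a swapped-in technique rather than the course’s own code. The sketch wraps it in a hypothetical helper, `independence_test()`, and runs it on made-up counts.

```r
# Hypothetical helper: chi-squared test of independence between two
# categorical columns of a data frame (base R chisq.test on their table).
independence_test <- function(df, col1, col2) {
  chisq.test(table(df[[col1]], df[[col2]]))
}

# Made-up data: 40 "never" and 40 "always" rows with opposite grade patterns
toy <- data.frame(
  DOZES_OFF = rep(c("never", "always"), each = 40),
  GRADE     = c(rep("A", 30), rep("F", 10), rep("A", 10), rep("F", 30))
)
res <- independence_test(toy, "DOZES_OFF", "GRADE")
res$statistic
res$p.value
# With the real data: independence_test(moody, "DOZES_OFF", "GRADE")
```

A small p-value suggests the two attributes are not independent. For 2×2 tables like this toy one, `chisq.test()` applies Yates’ continuity correction by default; with the full Moody table it may warn if some expected counts are small.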

14.2.2.3 Snippet 3

Q: Did you know that almost all students who scored over 85 and still received a B always dozed off during class?

moody<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/moody2022_new.csv")

moody[(moody$SCORE>85) & (moody$GRADE=='B'), ]$DOZES_OFF

14.2.2.4 Snippet 4

Q: What gives a higher chance of failing, texting all the time or always dozing off during class?
A: Always texting in class! Almost 40% chance of failing!

moody<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/moody2022_new.csv")

# Probability of failing the class when dozing off all the time
nrow(moody[moody$DOZES_OFF=='always' & moody$GRADE=='F',])/nrow(moody[moody$DOZES_OFF=='always',])
# Probability of failing the class when always texting in class
nrow(moody[moody$TEXTING_IN_CLASS=='always' & moody$GRADE=='F',])/nrow(moody[moody$TEXTING_IN_CLASS=='always',])
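Both probabilities, and the failing rate for every other texting or dozing level, can be computed in a single call, because the mean of a logical vector is the share of TRUE values. `fail_rate_by()` is a hypothetical helper, and the toy rows are made up.

```r
# Hypothetical helper: P(GRADE == "F" | level) for each level of column `col`.
# mean() of a logical vector gives the proportion of TRUE values.
fail_rate_by <- function(df, col) {
  tapply(df$GRADE == "F", df[[col]], mean)
}

toy <- data.frame(
  TEXTING_IN_CLASS = c("always", "always", "never", "never", "always"),
  GRADE            = c("F", "A", "B", "F", "F")
)
fail_rate_by(toy, "TEXTING_IN_CLASS")
# With the real data: fail_rate_by(moody, "TEXTING_IN_CLASS")
#                     fail_rate_by(moody, "DOZES_OFF")
```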

14.2.2.5 Snippet 5

Q: What grade did the student who scored 39.57 and always dozed off receive?
A: D

moody<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/moody2022_new.csv")

moody[moody$SCORE=='39.57'&moody$DOZES_OFF=='always',]$GRADE

14.2.2.6 Snippet 6

Q: What are posterior odds that an A student never dozes off?
A: Posterior Odds = 2.66
Likelihood Ratio = 4
Prior Odds =0.56

moody<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/moody2022_new.csv")

Prior<-nrow(moody[moody$DOZES_OFF =='never',])/nrow(moody)
Prior
PriorOdds<-round(Prior/(1-Prior),2)
PriorOdds
# P(GRADE == 'A' | DOZES_OFF == 'never')
TruePositive<-round(nrow(moody[moody$GRADE=='A'& moody$DOZES_OFF=='never',])/nrow(moody[moody$DOZES_OFF=='never',]),2)
TruePositive
# P(GRADE == 'A' | DOZES_OFF != 'never')
FalsePositive<-round(nrow(moody[moody$GRADE=='A'& moody$DOZES_OFF!='never',])/nrow(moody[moody$DOZES_OFF!='never',]),2)
FalsePositive
LikelihoodRatio<-round(TruePositive/FalsePositive,2)
LikelihoodRatio
PosteriorOdds <-LikelihoodRatio * PriorOdds
PosteriorOdds
Posterior <-PosteriorOdds/(1+PosteriorOdds)
Posterior

14.2.2.7 Snippet 7

Q: Verify the hypothesis that C students have a higher mean participation than F students. What is the p-value?
A: Negative. We fail to reject the null hypothesis that the mean participation of C and F students is the same (p = 0.11).

# One-time setup (uncomment to install the PermutationTestSecond package from GitHub):
#install.packages("devtools")
#devtools::install_github("devanshagr/PermutationTestSecond")

# General form: PermutationTestSecond::Permutation(d, "Cat", "Val", 10000, "GroupA", "GroupB")

moody<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/moody2022_new.csv")

PermutationTestSecond::Permutation(moody, "GRADE", "PARTICIPATION",10000, "C", "F")
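For comparison, a compact permutation test for a difference of means fits in a few lines of base R. This sketches the general technique and is not the PermutationTestSecond implementation. `perm_test()` is a hypothetical name; only rows of the two compared groups are shuffled, the value column is assumed to have no missing values, and it returns a two-sided p-value. For the one-sided question “is C higher than F?”, one would count only permuted differences at least as large in that direction.

```r
# Hypothetical two-sided permutation test for a difference of group means.
perm_test <- function(df, group_col, value_col, g1, g2, n = 10000) {
  d <- df[df[[group_col]] %in% c(g1, g2), ]   # keep the two groups only
  x <- d[[value_col]]
  g <- d[[group_col]]
  observed <- mean(x[g == g2]) - mean(x[g == g1])
  null <- replicate(n, {
    s <- sample(g)                             # shuffle the group labels
    mean(x[s == g2]) - mean(x[s == g1])
  })
  mean(abs(null) >= abs(observed))             # share at least as extreme
}

set.seed(42)
toy <- data.frame(
  GRADE = rep(c("C", "F", "A"), each = 10),
  PARTICIPATION = c(1:10, 101:110, 500:509)    # made-up values
)
perm_test(toy, "GRADE", "PARTICIPATION", "C", "F", n = 2000)
# With the real data: perm_test(moody, "GRADE", "PARTICIPATION", "C", "F")
```

Restricting the data to the two groups first matters. Otherwise rows from other grades (the “A” rows in the toy data) would leak into the shuffled groups and distort the null distribution.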

14.2.2.8 Snippet 8

Q: What is the mean score of students who always doze off in class and what is the most frequent grade that they received?
A: The mean score is 50.26 and the most frequent grade is D.

moody<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/moody2022_new.csv")

mean(moody[moody$DOZES_OFF=='always',]$SCORE)
table(moody[moody$DOZES_OFF=='always',]$GRADE)


Great job! You have made it this far. You are now familiar with the Moody dataset, and it is time to test your understanding. Please click the link below to take the quiz, then come back here to cross-check your answer with the snippet in 14.2.4.

14.2.2.9 Snippet 9

Visualizing the grade with respect to the score for the Moody dataset.

moody<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/moody2022_new.csv")
colors<- c('red','blue','cyan','yellow','green')

boxplot(moody$SCORE~moody$GRADE, xlab="Grade",ylab="Score",col=colors,
        main="Boxplot for student score and grade distribution",border="black")

14.2.3 Moody Data Quiz

Quiz Time

14.2.4 Check yourself

moody<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/moody2022_new.csv")

summary(moody)

14.3 How to predict a good party? Data puzzle

Download: Partyb.csv

DataMaker has generated data about thousands of parties: some fun, others just OK or simply boring. Your goal is to discover the secrets of a fun party. Is it the music? Dancing? Does the host matter? Or who was present at the party? Maybe who was NOT present? All this data is stored in this data puzzle.

Table 14.2: Snippet of Party Dataset
Party Music Host WasThere WasNotThere CaloriesDanc
477 Fun None Janek Janusz Joe 339
1616 Boring Classical Xi Billy Manny 104
3195 Fun HipHop Alex Janusz Manny 220
1358 Fun None Janek Janusz Vladimir 432
3452 Boring None Xi Billy Joe 261


Figure 2: Distribution of party evaluation when HipHop music was played (Snippet 14.3.2.6)

14.3.1 🧙 Secret patterns embedded by the Data Maker (hint)

When Vladimir is not at a party, it is much more likely to be fun. On the other hand, without Angela… parties are almost always boring. She seems to be the life of the party! Music plays a role as well: when HipHop is played and is catchy (people are dancing), the party rocks. Without any music, the party tends to be just OK. Alex is a good host, provided he plays classical music. This must be an older crowd, though?
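These hints can be checked by comparing the share of Fun parties with and without a given condition. `fun_rate()` is a hypothetical helper, and the toy rows are invented. With the real data, one would pass conditions such as `party$WasNotThere == "Vladimir"`.

```r
# Hypothetical helper: share of Fun parties where `cond` holds vs. where it does not.
fun_rate <- function(df, cond) {
  c(when_true  = mean(df$Party[cond] == "Fun"),
    when_false = mean(df$Party[!cond] == "Fun"))
}

toy <- data.frame(
  Party       = c("Fun", "Fun", "Boring", "OK", "Fun", "Boring"),
  WasNotThere = c("Vladimir", "Vladimir", "Joe", "Manny", "Joe", "Angela")
)
fun_rate(toy, toy$WasNotThere == "Vladimir")
# With the real data, e.g.:
# fun_rate(party, party$WasNotThere == "Vladimir")
# fun_rate(party, party$WasNotThere == "Angela")
```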

14.3.2 Practice Snippets

14.3.2.1 Snippet 1: Get to know your data

party<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Partyb.csv")

colnames(party)
nrow(party)
summary(party)
unique(party$Party)
unique(party$Music)
unique(party$Host)
unique(party$WasThere)
unique(party$WasNotThere)
table(party$Party)
table(party$Music)
table(party$Host)
table(party$WasThere)
table(party$WasNotThere)
tapply(party$CaloriesDanc, party$Party, mean)
tapply(party$CaloriesDanc, party$Music, mean)
tapply(party$CaloriesDanc, party$Host, mean)
tapply(party$CaloriesDanc, party$WasThere, mean)
tapply(party$CaloriesDanc, party$WasNotThere, mean)

14.3.2.2 Snippet 2

Q: Did you know that a party is often boring when Angela is not there?

party<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Partyb.csv")

table(party[party$WasNotThere=='Angela',]$Party)

14.3.2.3 Snippet 3

Q: What are the odds of the party being fun when Vladimir is not there?
A: PosteriorOdds = 2.91
LikelihoodRatio = 3.83
Prior Odds = 0.76

party<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Partyb.csv")

Prior<-nrow(party[party$Party =='Fun',])/nrow(party)
Prior
PriorOdds<-round(Prior/(1-Prior),2)
PriorOdds
TruePositive<-round(nrow(party[party$Party=='Fun'& party$WasNotThere=='Vladimir',])/nrow(party[party$Party=='Fun',]),2)
TruePositive
FalsePositive<-round(nrow(party[party$Party!='Fun'& party$WasNotThere=='Vladimir',])/nrow(party[party$Party!='Fun',]),2)
FalsePositive
LikelihoodRatio<-round(TruePositive/FalsePositive,2)
LikelihoodRatio
PosteriorOdds <-LikelihoodRatio * PriorOdds
PosteriorOdds
Posterior <-PosteriorOdds/(1+PosteriorOdds)
Posterior

14.3.2.4 Snippet 4

Q: Verify the hypothesis that there is more dancing at Fun parties than at Boring parties.
A: Positive. We reject the null hypothesis that there is the same amount of dancing at Fun and Boring parties, with p < 0.00001.

# Setup code pre-loaded by the interactive snippet: defines Permutation(df1, c1, c2, n, w1, w2),
# a permutation test comparing the mean of column c2 between the groups c1 == w1 and c1 == w2.
# It plots the null distribution and returns the share of the n permuted differences
# (mean of w2 minus mean of w1) that exceed the observed absolute difference.
Permutation <- function(df1,c1,c2,n,w1,w2){
  df <- as.data.frame(df1)
  # keep only the rows of the two compared groups; otherwise rows of any other
  # category (e.g. "OK" parties) would be relabelled as w1 when shuffling
  df <- df[df[, c1] %in% c(w1, w2), ]
  D_null<-c()
  V1<-df[,c1]
  V2<-df[,c2]
  sub.value1 <- df[df[, c1] == w1, c2]
  sub.value2 <- df[df[, c1] == w2, c2]
  D <-  abs(mean(sub.value2, na.rm=TRUE) - mean(sub.value1, na.rm=TRUE))
  m=length(V1)
  l=length(V1[V1==w2])
  for(jj in 1:n){
    null <- rep(w1,length(V1))
    null[sample(m,l)] <- w2
    nf <- data.frame(Key=null, Value=V2)
    names(nf) <- c("Key","Value")
    w1_null <- nf[nf$Key == w1,2]
    w2_null <- nf[nf$Key == w2,2]
    D_null <- c(D_null,mean(w2_null, na.rm=TRUE) - mean(w1_null, na.rm=TRUE))
  }
  myhist<-hist(D_null, prob=TRUE)
  multiplier <- myhist$counts / myhist$density
  mydensity <- density(D_null, adjust=2)
  mydensity$y <- mydensity$y * multiplier[1]
  plot(myhist)
  lines(mydensity, col='blue')
  abline(v=D, col='red')
  M<-mean(D_null>D)
  return(M)
}

party<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Partyb.csv")

mean(party[party$Party=='Fun',]$CaloriesDanc)
mean(party[party$Party=='Boring',]$CaloriesDanc)

Permutation(party, "Party", "CaloriesDanc",10000, "Fun", "Boring")

14.3.2.5 Snippet 5

Q: What music is the most popular at Fun parties?
A: HipHop

The following snippet lets us find the most popular music, although the code only returns the frequency distribution of music genres at Fun parties.

party<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Partyb.csv")

table(party[party$Party=='Fun',]$Music)
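To get the most popular genre directly instead of reading it off the frequency table, `which.max()` can be applied to the table. `most_frequent()` is a hypothetical helper. On ties, `which.max()` returns the first maximum, which is the alphabetically first value because `table()` sorts its values.

```r
# Hypothetical helper: the most frequent value of a vector.
# table() sorts values alphabetically; which.max() returns the first maximum.
most_frequent <- function(x) names(which.max(table(x)))

most_frequent(c("HipHop", "None", "HipHop", "Classical"))   # made-up values
# With the real data: most_frequent(party[party$Party == "Fun", ]$Music)
```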

14.3.2.6 Snippet 6

Bar graph of the party evaluation distribution when HipHop music was played.

party<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Partyb.csv")
colors<- c('red','blue','cyan','yellow','green') # Assigning different colors to bars

# count the party evaluations (Fun, OK, Boring) among parties where HipHop was played
counts<-table(party[party$Music=='HipHop',]$Party)

barplot(counts, xlab="Party evaluation", ylab="Number of Parties", col=colors,
        main="Distribution of party evaluation when HipHop music was played", border="black")


You are now familiar with the Party dataset, and it is time to test your understanding. Please click the link below to take the quiz, then come back here to cross-check your answer with the snippet in 14.3.4.

14.3.3 Party Data Quiz

Quiz Time

14.3.4 Check yourself

party<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Partyb.csv")

summary(party)

14.4 When election is truly local - data puzzle

Download: Voting1.csv

In local elections in some small towns, candidates from three local parties, Royalists, KnowNothings and Anarchists, are running for the office of mayor. DataMaker has generated a survey of thousands of town residents and their political sympathies. The data could not be more local, leaving global concerns such as inflation or global warming to candidates for national or state office.

Here, the electorate cares about issues such as: should we allow leaf blowers (all, electric only, none)? What about CBD stores in town (none, just one, no restrictions)? How about liquor (should the town be dry, or allow hard liquor only)? Speed limits (none, 10 mph, etc.)? Or, even more extreme, should the whole town be car-free, with streets open only to bicycles and pedestrians?

Can we develop the profiles of voters for each of the parties? What does the anarchist electorate care about? Which party is leading among young people who do not want any speed limits in town?

Table 14.3: Snippet of Voting Dataset
LeafBlowers CBD GasMowers Party LiquerStores SpeedLimit Age
41 NoRestrictions OneStore ElectircOnly Royalists None NoLimits 94
794 NoRestrictions NoStores None KnowNothings HardLiquerOnly 25mph 65
2923 NoRestrictions NoStores None KnowNothings HardLiquerOnly 25mph 32
3900 NoRestrictions NoStores ElectircOnly Royalists None NoCars 99
3592 None NoStores NoRestrictions Anarchists None NoCars 44


Figure 3: Barplot for party vote distribution (Snippet 14.4.2.12)

14.4.1 🧙 Secret patterns embedded by the Data Maker (hint)

In this town’s local elections, the issues are very… local. Nobody here cares about wars, oil, or tariffs with China. The issues that matter are leaf blowers (electric only? not at all?), mowers, speed limits (none? 5 mph? no cars at all?), and CBD and liquor stores in town, ranging from a completely dry town to no restrictions whatsoever.

The radicals who want no speed limits, no restrictions on leaf blowers, and CBD stores wherever you want predictably vote mostly for the Anarchists. The KnowNothings (an old party that has resurrected itself in this small town) have a strongly devoted local electorate. There are voters who want liquor stores to serve hard liquor only (real alcohol, no wimpy beers or wines) and voters who believe in very tight speed limits (below 5 mph)! Seniors (Age > 65) tend to vote Royalist. What does it even mean to be a Royalist in a small town? Good question. We guess the office of mayor is inherited and belongs to the local blue bloods?
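The Royalist pattern can be explored by splitting voters at the age cutoff mentioned above and computing each party’s share within each age group. The toy rows below are invented; with the real data one would use `vote$Age` and `vote$Party` instead.

```r
# Made-up voters (not the Voting1.csv data)
toy_vote <- data.frame(
  Age   = c(70, 80, 90, 30, 40, 66, 25),
  Party = c("Royalists", "Royalists", "Anarchists", "KnowNothings",
            "Anarchists", "Royalists", "Anarchists")
)
# Age > 65 cutoff taken from the hint above
age_group <- ifelse(toy_vote$Age > 65, "senior", "younger")
# margin = 1: proportions within each row (age group), so each row sums to 1
shares <- prop.table(table(age_group, toy_vote$Party), margin = 1)
round(shares, 2)
```

Row-wise proportions answer “how do seniors vote?”. Using `margin = 2` would instead answer “how old are each party’s voters?”.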

14.4.2 Practice Snippets

14.4.2.1 Snippet 1: Get to know your data

vote<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Voting1.csv")

colnames(vote)
nrow(vote)
summary(vote)
unique(vote$LeafBlowers)
unique(vote$CBD)
unique(vote$GasMowers)
unique(vote$Party)
unique(vote$LiquerStores)
unique(vote$SpeedLimit)

table(vote$Party)
table(vote$SpeedLimit)
table(vote$LeafBlowers)
table(vote$GasMowers)

14.4.2.2 Snippet 2

Q: Which party has the oldest constituents?
A: Royalists

vote<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Voting1.csv")

tapply(vote$Age, vote$Party, mean)

14.4.2.3 Snippet 3

Q: How do voters who are against Gas Mowers vote?
A: Royalists

vote<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Voting1.csv")

table(vote[vote$LeafBlowers=='None',]$Party)

14.4.2.4 Snippet 4

Q: How do KnowNothings voters vote on speed limits?

vote<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Voting1.csv")

table(vote[vote$Party=='KnowNothings',]$SpeedLimit)

14.4.2.5 Snippet 5

Q: What is the age of the oldest voter for KnowNothings?
A: 100

vote<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Voting1.csv")

max(vote[vote$Party=='KnowNothings',]$Age)

14.4.2.6 Snippet 6

Q: Which party wins the most votes from supporters of no speed limits, no restrictions on CBD stores, and a ban on leaf blowers?
A: Anarchists

vote<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Voting1.csv")

table(vote[vote$SpeedLimit =='NoLimits'&vote$CBD=='NoRestrictions'& vote$LeafBlowers=='None', ]$Party)

14.4.2.7 Snippet 7

Q: Which party wins the most votes from supporters of HardLiquerOnly liquor stores?

vote<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Voting1.csv")

table(vote[vote$LiquerStores=='HardLiquerOnly', ]$Party)

14.4.2.8 Snippet 8

Q: What are the odds that a voter older than 65 will vote for Royalists?
A: Posterior Odds= 6.44
Likelihood ratio = 6.08
Prior Odds= 1.06

vote<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Voting1.csv")

Prior<-nrow(vote[vote$Party =='Royalists',])/nrow(vote)
Prior
PriorOdds<-round(Prior/(1-Prior),2)
PriorOdds
TruePositive<-round(nrow(vote[vote$Party=='Royalists'& vote$Age>65,])/nrow(vote[vote$Party=='Royalists',]),2)
TruePositive
FalsePositive<-round(nrow(vote[vote$Party!='Royalists'& vote$Age>65,])/nrow(vote[vote$Party!='Royalists',]),2)
FalsePositive
LikelihoodRatio<-round(TruePositive/FalsePositive,2)
LikelihoodRatio
PosteriorOdds <-LikelihoodRatio * PriorOdds
PosteriorOdds
Posterior <-PosteriorOdds/(1+PosteriorOdds)
Posterior

14.4.2.9 Snippet 9

Q: Verify the hypothesis that the average age of Anarchist voters is higher than that of KnowNothings voters.
A: Positive. We reject the null hypothesis that the average ages of Anarchist and KnowNothings voters are the same.

# Setup code pre-loaded by the interactive snippet: defines Permutation(df1, c1, c2, n, w1, w2),
# a permutation test comparing the mean of column c2 between the groups c1 == w1 and c1 == w2.
# It plots the null distribution and returns the share of the n permuted differences
# (mean of w2 minus mean of w1) that exceed the observed absolute difference.
Permutation <- function(df1,c1,c2,n,w1,w2){
  df <- as.data.frame(df1)
  # keep only the rows of the two compared groups; otherwise rows of any other
  # category (e.g. Royalists) would be relabelled as w1 when shuffling
  df <- df[df[, c1] %in% c(w1, w2), ]
  D_null<-c()
  V1<-df[,c1]
  V2<-df[,c2]
  sub.value1 <- df[df[, c1] == w1, c2]
  sub.value2 <- df[df[, c1] == w2, c2]
  D <-  abs(mean(sub.value2, na.rm=TRUE) - mean(sub.value1, na.rm=TRUE))
  m=length(V1)
  l=length(V1[V1==w2])
  for(jj in 1:n){
    null <- rep(w1,length(V1))
    null[sample(m,l)] <- w2
    nf <- data.frame(Key=null, Value=V2)
    names(nf) <- c("Key","Value")
    w1_null <- nf[nf$Key == w1,2]
    w2_null <- nf[nf$Key == w2,2]
    D_null <- c(D_null,mean(w2_null, na.rm=TRUE) - mean(w1_null, na.rm=TRUE))
  }
  myhist<-hist(D_null, prob=TRUE)
  multiplier <- myhist$counts / myhist$density
  mydensity <- density(D_null, adjust=2)
  mydensity$y <- mydensity$y * multiplier[1]
  plot(myhist)
  lines(mydensity, col='blue')
  abline(v=D, col='red')
  M<-mean(D_null>D)
  return(M)
}

vote<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Voting1.csv")

mean(vote[vote$Party=='Anarchists',]$Age)
mean(vote[vote$Party=='KnowNothings',]$Age)

Permutation(vote, "Party", "Age",10000, "Anarchists", "KnowNothings")

14.4.2.10 Snippet 10

Q: What is the most frequent position of Anarchists on the Speed Limit issue?
A: “No limits” is the most frequent position of Anarchists on Speed Limit issue

vote<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Voting1.csv")

table(vote[vote$Party=='Anarchists',]$SpeedLimit)

14.4.2.11 Snippet 11

Q: Which party wins the most votes from supporters of Electric Leaf Blowers?
A: Royalists

vote<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Voting1.csv")

table(vote[vote$LeafBlowers=='ElectricOnly',]$Party)

14.4.2.12 Snippet 12

Bar graph of the vote distribution among parties.

vote<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Voting1.csv")
colors<- c('red','blue','cyan','yellow','green') # Assigning different colors to bars

votes<-table(vote$Party)

barplot(votes,xlab="Party",ylab="Number of votes",col=colors,
        main="Barplot for party vote distribution",border="black")


You are now familiar with the Election dataset, and it is time to test your understanding. Please click the link below to take the quiz, then come back here to cross-check your answer with the snippet in 14.4.4.

14.4.3 Election Data Quiz

Quiz Time

14.4.4 Check yourself

vote<-read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Voting1.csv")

summary(vote)

14.5 Secrets of good sleep Data Puzzle

Download: SleepPrediction2.csv

Who wouldn’t want to know the secrets of good sleep? DataMaker has created a dataset that may help to find them. It stores the number of exercise calories burnt during the day, the amount of wimpy tea a person has drunk (in cups), the hours spent on the computer, the room temperature, the phase of the moon, and the quality of the preceding night’s sleep.

Table 14.4: Snippet of Sleep Dataset
Sleep ExerciseCal OnComputer WimpyTea RoomTemp Moon LastSleep
1497 Deep 928 6 3Cups 70 Dark Shallow
186 Deep 272 8 3Cups 70 Dark Deep
1482 Shallow 40 5 1Cup 73 Dark Deep
562 Deep 478 7 1Cup 63 Half Shallow
704 Little 987 1 3Cups 60 Full Deep


Figure 4: Boxplot for sleep and calories distribution (Snippet 14.5.2.6)

14.5.1 🧙 Secret patterns embedded by the Data Maker (hint)

Insomnia (no sleep) occurs when the moon is full and there is a significant amount of exercise (above 300 Cal). Even with little exercise, the full moon has a very negative influence on sleep: with a full moon and less than 300 calories spent exercising, the quality of sleep is “Little”.

The secrets of deep sleep are scattered. A cold room and many cups of wimpy tea do the trick. Deep sleep also comes when last night’s sleep was shallow and many hours were spent on the computer!
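A quick way to check an if-then hint such as the full-moon rule is to compare the distribution of Sleep inside the slice the rule describes with its distribution over all rows. The sketch below is a minimal illustration, not DataMaker’s own method: it copies three columns of the five rows in Table 14.4, and `compare_slice()` is a hypothetical helper. On the full data, read with `read.csv()` as in Snippet 1, the slice should lean towards poor sleep if the hint holds; five rows are far too few to show that (the single matching row here reads “Little”, a reminder that the patterns are stochastic).

```r
# Three columns of the five rows in Table 14.4 (a tiny sample of SleepPrediction2.csv)
sleep <- data.frame(
  Sleep       = c("Deep", "Deep", "Shallow", "Deep", "Little"),
  ExerciseCal = c(928, 272, 40, 478, 987),
  Moon        = c("Dark", "Dark", "Dark", "Half", "Full"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: counts of `target` inside the slice selected by the
# logical vector `condition`, next to its counts over all rows
compare_slice <- function(df, condition, target) {
  list(
    slice   = table(df[[target]][condition]),
    overall = table(df[[target]])
  )
}

# Slice from the hint: full moon and more than 300 exercise calories
res <- compare_slice(sleep, sleep$Moon == "Full" & sleep$ExerciseCal > 300, "Sleep")
res$slice    # only one sample row falls into this slice
res$overall
```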

14.5.2 Practice Snippets

14.5.2.1 Snippet 1: Get to know your data

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJzbGVlcDwtcmVhZC5jc3YoXCJodHRwczovL3Jhdy5naXRodWJ1c2VyY29udGVudC5jb20vZGV2Nzc5Ni9kYXRhMTAxX3R1dG9yaWFsL21haW4vZmlsZXMvZGF0YXNldC9TbGVlcFByZWRpY3Rpb24yLmNzdlwiKVxuXG5jb2xuYW1lcyhzbGVlcClcbm5yb3coc2xlZXApXG5zdW1tYXJ5KHNsZWVwKVxudW5pcXVlKHNsZWVwJFNsZWVwKVxudW5pcXVlKHNsZWVwJFdpbXB5VGVhKVxudW5pcXVlKHNsZWVwJE1vb24pXG51bmlxdWUoc2xlZXAkTGFzdFNsZWVwKVxudGFibGUoc2xlZXAkU2xlZXApXG50YWJsZShzbGVlcCRXaW1weVRlYSlcbnRhYmxlKHNsZWVwJE1vb24pXG50YWJsZShzbGVlcCRMYXN0U2xlZXApIn0=

14.5.2.2 Snippet 2

Q: Is exercising more good for your sleep?
A: So-so. With more exercise you tend to get either Little or Deep sleep; Shallow sleep is much less likely.

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJzbGVlcDwtcmVhZC5jc3YoXCJodHRwczovL3Jhdy5naXRodWJ1c2VyY29udGVudC5jb20vZGV2Nzc5Ni9kYXRhMTAxX3R1dG9yaWFsL21haW4vZmlsZXMvZGF0YXNldC9TbGVlcFByZWRpY3Rpb24yLmNzdlwiKVxuXG50YXBwbHkoc2xlZXAkRXhlcmNpc2VDYWwsIHNsZWVwJFNsZWVwLCBtZWFuKSJ9

14.5.2.3 Snippet 3

Q: What are the odds of Deep sleep when last day’s sleep was Shallow?
A: Posterior Odds ≈ 15.28 (probability ≈ 0.94!)
Likelihood Ratio = 5.25
Prior Odds = 2.91
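The three numbers are linked by Bayes’ rule in odds form, posterior odds = likelihood ratio × prior odds, and odds convert back to a probability as odds / (1 + odds). A quick check that uses only the values reported in the answer:

```r
# Values reported in the answer above
prior_odds       <- 2.91  # odds of Deep sleep before looking at LastSleep
likelihood_ratio <- 5.25  # P(LastSleep = Shallow | Deep) / P(LastSleep = Shallow | not Deep)

# Bayes' rule in odds form
posterior_odds <- likelihood_ratio * prior_odds

# Convert odds back to a probability
posterior_prob <- posterior_odds / (1 + posterior_odds)

posterior_odds            # 15.2775
round(posterior_prob, 3)  # 0.939
```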

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJzbGVlcDwtcmVhZC5jc3YoXCJodHRwczovL3Jhdy5naXRodWJ1c2VyY29udGVudC5jb20vZGV2Nzc5Ni9kYXRhMTAxX3R1dG9yaWFsL21haW4vZmlsZXMvZGF0YXNldC9TbGVlcFByZWRpY3Rpb24yLmNzdlwiKVxuXG5QcmlvcjwtbnJvdyhzbGVlcFtzbGVlcCRTbGVlcCA9PSdEZWVwJyxdKS9ucm93KHNsZWVwKVxuUHJpb3JcblByaW9yT2Rkczwtcm91bmQoUHJpb3IvKDEtUHJpb3IpLDIpXG5Qcmlvck9kZHNcblRydWVQb3NpdGl2ZTwtcm91bmQobnJvdyhzbGVlcFtzbGVlcCRTbGVlcD09J0RlZXAnJiBzbGVlcCRMYXN0U2xlZXA9PSdTaGFsbG93JyxdKS9ucm93KHNsZWVwW3NsZWVwJFNsZWVwPT0nRGVlcCcsXSksMilcblRydWVQb3NpdGl2ZVxuRmFsc2VQb3NpdGl2ZTwtcm91bmQobnJvdyhzbGVlcFtzbGVlcCRTbGVlcCE9J0RlZXAnJiBzbGVlcCRMYXN0U2xlZXA9PSdTaGFsbG93JyxdKS9ucm93KHNsZWVwW3NsZWVwJFNsZWVwIT0nRGVlcCcsXSksMilcbkZhbHNlUG9zaXRpdmVcbkxpa2VsaWhvb2RSYXRpbzwtcm91bmQoVHJ1ZVBvc2l0aXZlL0ZhbHNlUG9zaXRpdmUsMilcbkxpa2VsaWhvb2RSYXRpb1xuUG9zdGVyaW9yT2RkcyA8LUxpa2VsaWhvb2RSYXRpbyAqIFByaW9yT2Rkc1xuUG9zdGVyaW9yT2Rkc1xuUG9zdGVyaW9yIDwtUG9zdGVyaW9yT2Rkcy8oMStQb3N0ZXJpb3JPZGRzKVxuUG9zdGVyaW9yIn0=

14.5.2.4 Snippet 4

Q: Verify the hypothesis that Deep sleepers spend more time on the computer, on average, than Shallow sleepers.
A: Negative. We fail to reject the null hypothesis that the means are equal.
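The snippet relies on a preloaded `Permutation()` helper. To show the idea behind such a test, here is a compact, self-contained permutation test; `perm_test()` is a hypothetical name, not part of the course code. This version shuffles group labels only among the rows of the two groups being compared and returns a two-sided p-value, so its numbers may differ somewhat from the helper’s.

```r
# Two-sided permutation test for a difference in group means
perm_test <- function(values, groups, g1, g2, n_perm = 10000) {
  keep <- groups %in% c(g1, g2)            # use only rows from the two groups
  x <- values[keep]
  g <- groups[keep]
  observed <- mean(x[g == g2], na.rm = TRUE) - mean(x[g == g1], na.rm = TRUE)
  null_diffs <- replicate(n_perm, {
    shuffled <- sample(g)                  # shuffling labels breaks any group effect
    mean(x[shuffled == g2], na.rm = TRUE) - mean(x[shuffled == g1], na.rm = TRUE)
  })
  # p-value: share of shuffles at least as extreme as the observed difference
  mean(abs(null_diffs) >= abs(observed))
}

# Usage on the sleep data loaded in Snippet 1:
# perm_test(sleep$OnComputer, sleep$Sleep, "Shallow", "Deep")
```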

eyJsYW5ndWFnZSI6InIiLCJwcmVfZXhlcmNpc2VfY29kZSI6IlBlcm11dGF0aW9uIDwtIGZ1bmN0aW9uKGRmMSxjMSxjMixuLHcxLHcyKXtcbiAgZGYgPC0gYXMuZGF0YS5mcmFtZShkZjEpXG4gIERfbnVsbDwtYygpXG4gIFYxPC1kZlssYzFdXG4gIFYyPC1kZlssYzJdXG4gIHN1Yi52YWx1ZTEgPC0gZGZbZGZbLCBjMV0gPT0gdzEsIGMyXVxuICBzdWIudmFsdWUyIDwtIGRmW2RmWywgYzFdID09IHcyLCBjMl1cbiAgRCA8LSAgYWJzKG1lYW4oc3ViLnZhbHVlMiwgbmEucm09VFJVRSkgLSBtZWFuKHN1Yi52YWx1ZTEsIG5hLnJtPVRSVUUpKVxuICBtPWxlbmd0aChWMSlcbiAgbD1sZW5ndGgoVjFbVjE9PXcyXSlcbiAgZm9yKGpqIGluIDE6bil7XG4gICAgbnVsbCA8LSByZXAodzEsbGVuZ3RoKFYxKSlcbiAgICBudWxsW3NhbXBsZShtLGwpXSA8LSB3MlxuICAgIG5mIDwtIGRhdGEuZnJhbWUoS2V5PW51bGwsIFZhbHVlPVYyKVxuICAgIG5hbWVzKG5mKSA8LSBjKFwiS2V5XCIsXCJWYWx1ZVwiKVxuICAgIHcxX251bGwgPC0gbmZbbmYkS2V5ID09IHcxLDJdXG4gICAgdzJfbnVsbCA8LSBuZltuZiRLZXkgPT0gdzIsMl1cbiAgICBEX251bGwgPC0gYyhEX251bGwsbWVhbih3Ml9udWxsLCBuYS5ybT1UUlVFKSAtIG1lYW4odzFfbnVsbCwgbmEucm09VFJVRSkpXG4gIH1cbiAgbXloaXN0PC1oaXN0KERfbnVsbCwgcHJvYj1UUlVFKVxuICBtdWx0aXBsaWVyIDwtIG15aGlzdCRjb3VudHMgLyBteWhpc3QkZGVuc2l0eVxuICBteWRlbnNpdHkgPC0gZGVuc2l0eShEX251bGwsIGFkanVzdD0yKVxuICBteWRlbnNpdHkkeSA8LSBteWRlbnNpdHkkeSAqIG11bHRpcGxpZXJbMV1cbiAgcGxvdChteWhpc3QpXG4gIGxpbmVzKG15ZGVuc2l0eSwgY29sPSdibHVlJylcbiAgYWJsaW5lKHY9RCwgY29sPSdyZWQnKVxuICBNPC1tZWFuKERfbnVsbD5EKVxuICByZXR1cm4oTSlcbn0iLCJzYW1wbGUiOiJzbGVlcDwtcmVhZC5jc3YoXCJodHRwczovL3Jhdy5naXRodWJ1c2VyY29udGVudC5jb20vZGV2Nzc5Ni9kYXRhMTAxX3R1dG9yaWFsL21haW4vZmlsZXMvZGF0YXNldC9TbGVlcFByZWRpY3Rpb24yLmNzdlwiKVxuXG5tZWFuKHNsZWVwW3NsZWVwJFNsZWVwPT0nRGVlcCcsXSRPbkNvbXB1dGVyKVxubWVhbihzbGVlcFtzbGVlcCRTbGVlcD09J1NoYWxsb3cnLF0kT25Db21wdXRlcilcblxuUGVybXV0YXRpb24oc2xlZXAsIFwiU2xlZXBcIiwgXCJPbkNvbXB1dGVyXCIsMTAwMDAsIFwiRGVlcFwiLCBcIlNoYWxsb3dcIikifQ==

14.5.2.5 Snippet 5

Q: What is the highest Room temperature experienced by a Deep sleeper?

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJzbGVlcDwtcmVhZC5jc3YoXCJodHRwczovL3Jhdy5naXRodWJ1c2VyY29udGVudC5jb20vZGV2Nzc5Ni9kYXRhMTAxX3R1dG9yaWFsL21haW4vZmlsZXMvZGF0YXNldC9TbGVlcFByZWRpY3Rpb24yLmNzdlwiKVxuXG5tYXgoc2xlZXBbc2xlZXAkU2xlZXA9PSdEZWVwJyxdJFJvb21UZW1wKSJ9

14.5.2.6 Snippet 6

Boxplot of exercise calories for each type of sleep.

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJzbGVlcDwtcmVhZC5jc3YoXCJodHRwczovL3Jhdy5naXRodWJ1c2VyY29udGVudC5jb20vZGV2Nzc5Ni9kYXRhMTAxX3R1dG9yaWFsL21haW4vZmlsZXMvZGF0YXNldC9TbGVlcFByZWRpY3Rpb24yLmNzdlwiKVxuY29sb3JzPC0gYygncmVkJywnYmx1ZScsJ2N5YW4nLCd5ZWxsb3cnLCdncmVlbicpICMgQXNzaWduaW5nIGRpZmZlcmVudCBjb2xvcnMgdG8gYmFyc1xuXG5ib3hwbG90KCBzbGVlcCRFeGVyY2lzZUNhbH5zbGVlcCRTbGVlcCwgeGxhYj1cIiBUeXBlIG9mIFNsZWVwXCIseWxhYj1cIkV4ZXJjaXNlIGNhbG9yaWVzXCIsY29sPWNvbG9ycywgXG4gICAgICAgIG1haW49XCJCb3hwbG90IGZvciBzbGVlcCBhbmQgY2Fsb3JpZXMgZGlzdHJpYnV0aW9uXCIsYm9yZGVyPVwiYmxhY2tcIikifQ==


You are now familiar with the Sleep dataset, so it’s time to test your understanding. Click the link below to take the quiz, then come back here and cross-check your answers with the snippet in Section 14.5.4.

14.5.3 Sleep Data Quiz

Quiz Time

14.5.4 Check yourself

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJzbGVlcDwtcmVhZC5jc3YoXCJodHRwczovL3Jhdy5naXRodWJ1c2VyY29udGVudC5jb20vZGV2Nzc5Ni9kYXRhMTAxX3R1dG9yaWFsL21haW4vZmlsZXMvZGF0YXNldC9TbGVlcFByZWRpY3Rpb24yLmNzdlwiKVxuXG5zdW1tYXJ5KHNsZWVwKSJ9

14.6 Let’s go to the movies: Data Puzzle

Download: Movies2022F-4.csv

Using DataMaker, we started with the imdb data set from Kaggle and embedded some patterns in it. The original data set contains data about 12,800+ movies, and we have expanded it with DataMaker’s opinions. Yes, only DataMaker can have an opinion on each of 12,800 movies! Can you predict which movies DataMaker loves and which movies bore her so much that she quits? Which movies does DataMaker passionately hate (hmm, is DataMaker even passionate about anything at all?)

When does DataMaker agree with the imdb score?

Can one predict an imdb score from a combination of DataMaker’s opinion (a sort of super critic) and other attributes?

Table 14.5: Snippet of Movies Dataset
Row | country | content | imdb_score | Gross | Budget | genre
5345 | UK | R | 7.84 | Medium | Medium | Drama
7028 | USA | PG | 5.89 | Medium | Medium | Comedy
5222 | USA | PG-13 | 6.33 | High | Medium | Comedy
8283 | UK | R | 6.97 | Medium | Medium | Action
535 | USA | PG | 6.23 | High | High | Family


Figure 5: Boxplot for Genre and IMDB score distribution (Snippet 14.6.2.9)

14.6.1 🧙 Secret patterns embedded by the Data Maker (hint)

Documentaries (History) tend to have higher imdb scores. High-gross Comedies are not appreciated by imdb. There are also patterns linking high-gross movies to genre and content rating. They are easy to discover. Try!
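One way to probe a hint such as “high-gross Comedies are not appreciated” is a two-way table of mean scores, built with `tapply()` over two grouping columns. The sketch below runs only on three columns of the five rows in Table 14.5, so most cells are empty (`NA`). On the full data, read as in Snippet 1, compare the Comedy row across the Gross columns.

```r
# Three columns of the five rows in Table 14.5
movies <- data.frame(
  imdb_score = c(7.84, 5.89, 6.33, 6.97, 6.23),
  Gross      = c("Medium", "Medium", "High", "Medium", "High"),
  genre      = c("Drama", "Comedy", "Comedy", "Action", "Family")
)

# Mean imdb score for every genre x Gross combination (NA where there are no movies)
by_genre_gross <- tapply(movies$imdb_score, list(movies$genre, movies$Gross), mean)
round(by_genre_gross, 2)
```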

14.6.2 Practice Snippets

14.6.2.1 Snippet 1: Get to know your data 🔊

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJtb3ZpZXM8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvTW92aWVzMjAyMkYtNC5jc3ZcIilcblxuY29sbmFtZXMobW92aWVzKVxubnJvdyhtb3ZpZXMpXG5zdW1tYXJ5KG1vdmllcylcbnVuaXF1ZShtb3ZpZXMkY29udGVudClcbnVuaXF1ZShtb3ZpZXMkZ2VucmUpXG51bmlxdWUobW92aWVzJEdyb3NzKVxudW5pcXVlKG1vdmllcyRCdWRnZXQpXG5cblxudGFibGUobW92aWVzJGNvbnRlbnQpXG50YWJsZShtb3ZpZXMkZ2VucmUpXG50YWJsZShtb3ZpZXMkR3Jvc3MpXG50YWJsZShtb3ZpZXMkQnVkZ2V0KVxuXG50YXBwbHkobW92aWVzJGltZGJfc2NvcmUsIG1vdmllcyRjb250ZW50LCBtZWFuKVxudGFwcGx5KG1vdmllcyRpbWRiX3Njb3JlLCBtb3ZpZXMkY291bnRyeSwgbWVhbilcbnRhcHBseShtb3ZpZXMkaW1kYl9zY29yZSwgbW92aWVzJEdyb3NzLCBtZWFuKVxudGFwcGx5KG1vdmllcyRpbWRiX3Njb3JlLCBtb3ZpZXMkQnVkZ2V0LCBtZWFuKVxudGFwcGx5KG1vdmllcyRpbWRiX3Njb3JlLCBtb3ZpZXMkZ2VucmUsIG1lYW4pIn0=

14.6.2.2 Snippet 2

Q: What is the mean imdb score of low-budget comedies?

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJtb3ZpZXM8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvTW92aWVzMjAyMkYtNC5jc3ZcIilcblxubWVhbihtb3ZpZXNbbW92aWVzJEJ1ZGdldD09J0xvdycgJiBtb3ZpZXMkZ2VucmU9PSdDb21lZHknLCBdJGltZGJfc2NvcmUpIn0=

14.6.2.3 Snippet 3

Q: What is the lowest imdb score among high budget movies?

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJtb3ZpZXM8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvTW92aWVzMjAyMkYtNC5jc3ZcIilcblxubWluKG1vdmllc1ttb3ZpZXMkQnVkZ2V0PT0nSGlnaCcsXSRpbWRiKSJ9

14.6.2.4 Snippet 4

Q: How many low budget movies generated high gross income?

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJtb3ZpZXM8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvTW92aWVzMjAyMkYtNC5jc3ZcIilcblxubnJvdyhtb3ZpZXNbbW92aWVzJEJ1ZGdldD09J0xvdycgJiBtb3ZpZXMkR3Jvc3MgPT0nSGlnaCcsXSkifQ==

14.6.2.5 Snippet 5

Q: What is the least frequent genre among UK movies?

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJtb3ZpZXM8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvTW92aWVzMjAyMkYtNC5jc3ZcIilcblxudGFibGUobW92aWVzW21vdmllcyRjb3VudHJ5PT0nVUsnLF0kZ2VucmUsIG1vdmllc1ttb3ZpZXMkY291bnRyeT09J1VLJyxdJGNvdW50cnkpIn0=
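Snippet 5 prints the genre counts for UK movies but leaves picking the smallest count to the reader. `which.min()` can do that directly. However, it returns only the first of several tied minima, so listing every tied genre is safer. The sketch uses the genres of the five rows in Table 14.5:

```r
# Genres of the five rows in Table 14.5
genre  <- c("Drama", "Comedy", "Comedy", "Action", "Family")
counts <- table(genre)                               # names sorted alphabetically

least_first <- names(which.min(counts))              # first minimum only
least_all   <- names(counts)[counts == min(counts)]  # every tied minimum

least_first  # "Action"
least_all    # "Action" "Drama" "Family"

# On the full data (Snippet 1):
# names(which.min(table(movies$genre[movies$country == "UK"])))
```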

14.6.2.6 Snippet 6

Q: Which content rating has the lowest average imdb score?

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJtb3ZpZXM8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvTW92aWVzMjAyMkYtNC5jc3ZcIilcblxudGFwcGx5KG1vdmllcyRpbWRiLCBtb3ZpZXMkY29udGVudCwgbWVhbikifQ==

14.6.2.7 Snippet 7

Q: Movies from which country have the largest average imdb score?

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJtb3ZpZXM8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvTW92aWVzMjAyMkYtNC5jc3ZcIilcblxuTUE8LWFnZ3JlZ2F0ZShtb3ZpZXMkaW1kYl9zY29yZSwgbGlzdChtb3ZpZXMkY291bnRyeSksIG1lYW4pXG5jb2xuYW1lcyhNQSk8LWMoXCJDb3VudHJ5XCIsIFwiTWltZGJcIilcbk1BPC1NQVtvcmRlcigtTUEkTWltZGIpLCBdXG5NQVsxLF0gIn0=

14.6.2.8 Snippet 8

Q: What are the odds that a High Budget Movie will have High Gross Income?
A: Posterior Odds = 5.59
Likelihood Ratio = 5.08
Prior Odds = 1.11

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJtb3ZpZXM8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvTW92aWVzMjAyMkYtNC5jc3ZcIilcblxuUHJpb3I8LW5yb3cobW92aWVzW21vdmllcyRHcm9zcyA9PSdIaWdoJyxdKS9ucm93KG1vdmllcylcblByaW9yXG5Qcmlvck9kZHM8LXJvdW5kKFByaW9yLygxLVByaW9yKSwyKVxuUHJpb3JPZGRzXG5UcnVlUG9zaXRpdmU8LXJvdW5kKG5yb3cobW92aWVzW21vdmllcyRHcm9zcz09J0hpZ2gnJiBtb3ZpZXMkQnVkZ2V0PT0nSGlnaCcsXSkvbnJvdyhtb3ZpZXNbbW92aWVzJEdyb3NzPT0nSGlnaCcsXSksMilcblRydWVQb3NpdGl2ZVxuRmFsc2VQb3NpdGl2ZTwtcm91bmQobnJvdyhtb3ZpZXNbbW92aWVzJEdyb3NzIT0nSGlnaCcmIG1vdmllcyRCdWRnZXQ9PSdIaWdoJyxdKS9ucm93KG1vdmllc1ttb3ZpZXMkR3Jvc3MhPSdIaWdoJyxdKSwyKVxuRmFsc2VQb3NpdGl2ZVxuTGlrZWxpaG9vZFJhdGlvPC1yb3VuZChUcnVlUG9zaXRpdmUvRmFsc2VQb3NpdGl2ZSwyKVxuTGlrZWxpaG9vZFJhdGlvXG5Qb3N0ZXJpb3JPZGRzIDwtTGlrZWxpaG9vZFJhdGlvICogUHJpb3JPZGRzXG5Qb3N0ZXJpb3JPZGRzIn0=

14.6.2.9 Snippet 9

Boxplot of imdb scores by movie genre.

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJtb3ZpZXM8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvTW92aWVzMjAyMkYtNC5jc3ZcIilcbmNvbG9yczwtIGMoJ3JlZCcsJ2JsdWUnLCdjeWFuJywneWVsbG93JywnZ3JlZW4nLCdicm93bicpICMgQXNzaWduaW5nIGRpZmZlcmVudCBjb2xvcnMgdG8gYmFyc1xuXG5ib3hwbG90KG1vdmllcyRpbWRiX3Njb3Jlfm1vdmllcyRnZW5yZSwgeGxhYj1cIiBHZW5yZVwiLHlsYWI9XCJJTURCIHNjb3JlXCIsY29sPWNvbG9ycywgXG4gICAgICAgIG1haW49XCJCb3hwbG90IGZvciBHZW5yZSBhbmQgSU1EQiBzY29yZSBkaXN0cmlidXRpb25cIixib3JkZXI9XCJibGFja1wiKSJ9


You are now familiar with the Movies dataset, so it’s time to test your understanding. Click the link below to take the quiz, then come back here and cross-check your answers with the snippet in Section 14.6.4.

14.6.3 Movies Data Quiz

Quiz Time

14.6.4 Check yourself

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJtb3ZpZXM8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvTW92aWVzMjAyMkYtNC5jc3ZcIilcblxuc3VtbWFyeShtb3ZpZXMpIn0=

14.7 When canvas goes wild data puzzle

Download: Canvas1.csv

You are all familiar with Canvas, right? This is where you check your grades and scores for each assignment and exam. However, it seems that Canvas went a bit wild and unfair in this data set. One can still fail the class with a score of 82 (sounds familiar? Yes, Professor Moody would do it, but Canvas?).

How can one get a lower grade with a higher score?

Yes, Canvas was instructed by someone, and your goal is to discover the grading method. How do you get an A? How do you pass? We know who that someone is… it is DataMaker, of course.

Table 14.6: Snippet of Canvas Dataset
Row | Homeworks | Exams | Score | Grade
197 | 30 | 7 | 27.7 | F
1865 | 57 | 86 | 59.9 | C
623 | 83 | 74 | 82.1 | A
868 | 68 | 63 | 67.5 | B
1835 | 30 | 2 | 27.2 | F


Figure 6: Grade distribution for students who passed the homeworks but failed the exam (Snippet 14.7.2.8)

14.7.1 🧙 Secret patterns embedded by the Data Maker (hint)

Canvas allows professors to build linear formulas that weight homeworks, projects and exams to calculate a final score, which is then mapped to grades in the usual way, say A for a score over 90, B for a score over 75, etc. This Canvas method, however, also embeds a decision tree that addresses rare but troubling cases in which a student’s homework score is very high but the final exam score is low. Even if the global weight of the final exam is small, say only 15%, the professor is concerned when the final exam score is very low, say less than 20% of the maximum final exam score. Should such a student get an A even if the overall score is in the A range?

The method here is very cruel and would surely be objected to by many students :-). No matter how high your overall score is, if your final exam score is low, you may actually fail the class!
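The grading method described above can be sketched as a weighted score mapped to letter grades, followed by a one-split decision tree on the exam score. Every number below (the grade cut-offs and the 20-point exam floor) is a hypothetical placeholder borrowed from the examples in the text, not the rule DataMaker actually embedded. Recovering those values is the puzzle.

```r
# Hypothetical Canvas-style grading rule
assign_grade <- function(score, exam, exam_floor = 20,
                         cut_a = 90, cut_b = 75, cut_c = 60) {
  # Step 1: map the weighted overall score to a letter grade
  grade <- ifelse(score >= cut_a, "A",
           ifelse(score >= cut_b, "B",
           ifelse(score >= cut_c, "C", "F")))
  # Step 2: decision-tree override, a very low exam score fails the student
  ifelse(exam < exam_floor, "F", grade)
}

assign_grade(score = c(95, 95, 70), exam = c(80, 10, 50))
# "A" "F" "C"
```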

What is the weight of the final exam? And is there a threshold such that, if your final exam score falls below it, you fail no matter what?
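One way to approach the first question is to regress Score on its two components with `lm()`. If the overall score is a linear weighted sum, the fitted coefficients estimate the weights. This sketch uses only the five rows of Table 14.6. On the full data, read as in Snippet 1, the call would be `lm(Score ~ Homeworks + Exams, data = grades)`. A threshold rule like the one in the second question is not linear, so `lm()` alone will not reveal it.

```r
# The five rows of Table 14.6
canvas <- data.frame(
  Homeworks = c(30, 57, 83, 68, 30),
  Exams     = c(7, 86, 74, 63, 2),
  Score     = c(27.7, 59.9, 82.1, 67.5, 27.2)
)

# If Score = w_hw * Homeworks + w_exam * Exams, the coefficients estimate the weights
fit <- lm(Score ~ Homeworks + Exams, data = canvas)
round(coef(fit), 3)
```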

14.7.2 Practice Snippets

14.7.2.1 Snippet 1: Get to know your data

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJncmFkZXM8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvQ2FudmFzMS5jc3ZcIilcblxuY29sbmFtZXMoZ3JhZGVzKVxubnJvdyhncmFkZXMpXG5zdW1tYXJ5KGdyYWRlcylcbnVuaXF1ZShncmFkZXMkR3JhZGUpXG50YWJsZShncmFkZXMkR3JhZGUpXG50YXBwbHkoZ3JhZGVzJEhvbWV3b3JrcywgZ3JhZGVzJEdyYWRlLCBtZWFuKVxudGFwcGx5KGdyYWRlcyRFeGFtcywgZ3JhZGVzJEdyYWRlLCBtZWFuKVxudGFwcGx5KGdyYWRlcyRTY29yZSwgZ3JhZGVzJEdyYWRlLCBtZWFuKSJ9

14.7.2.2 Snippet 2

Q: What is the distribution of possible grades when a student’s total score is over 80?

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJncmFkZXM8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvQ2FudmFzMS5jc3ZcIilcblxudGFibGUoZ3JhZGVzW2dyYWRlcyRTY29yZSA+IDgwLF0kR3JhZGUpIn0=

14.7.2.3 Snippet 3

Q: The previous snippet showed that with a score over 80 you can only get an A or an F. How can you get an F? This snippet helps answer that question by also requiring an exam score above 40.

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJncmFkZXM8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvQ2FudmFzMS5jc3ZcIilcblxudGFibGUoZ3JhZGVzW2dyYWRlcyRTY29yZSA+IDgwICYgZ3JhZGVzJEV4YW1zID40MCxdJEdyYWRlKSJ9

14.7.2.4 Snippet 4

Q: What is the worst exam score of a student with final grade A?
A: 25

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJncmFkZXM8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvQ2FudmFzMS5jc3ZcIilcblxubWluKGdyYWRlc1tncmFkZXMkR3JhZGUgPT0nQScsXSRFeGFtcykifQ==

14.7.2.5 Snippet 5

Q: What are the odds of getting an A with an Exams score above 60?
A: Posterior Odds = 0.45
Likelihood Ratio = 4.75
Prior Odds = 0.17
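The same calculation can be wrapped in a small reusable function. `odds_update()` below is a hypothetical helper, not part of the course code. It takes two logical vectors over the same rows and applies the identical evidence condition (for example `Exams > 60`) when computing both P(evidence | target) and P(evidence | not target). It also skips the intermediate rounding used in the snippets, so its results can differ slightly from values computed with rounded steps.

```r
# Prior odds, likelihood ratio and posterior odds of `target` given `evidence`
odds_update <- function(target, evidence) {
  prior      <- mean(target)
  prior_odds <- prior / (1 - prior)
  tpr <- mean(evidence[target])    # P(evidence | target)
  fpr <- mean(evidence[!target])   # P(evidence | not target), same condition
  lr  <- tpr / fpr
  c(prior_odds = prior_odds, likelihood_ratio = lr, posterior_odds = lr * prior_odds)
}

# Toy example: 8 rows, 3 of them in the target group
target   <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
evidence <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)
round(odds_update(target, evidence), 3)

# On the Canvas data (Snippet 1):
# odds_update(grades$Grade == "A", grades$Exams > 60)
```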

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJncmFkZXM8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvQ2FudmFzMS5jc3ZcIilcblxuUHJpb3I8LW5yb3coZ3JhZGVzW2dyYWRlcyRHcmFkZSA9PSdBJyxdKS9ucm93KGdyYWRlcylcblByaW9yXG5Qcmlvck9kZHM8LXJvdW5kKFByaW9yLygxLVByaW9yKSwyKVxuUHJpb3JPZGRzXG5UcnVlUG9zaXRpdmU8LXJvdW5kKG5yb3coZ3JhZGVzW2dyYWRlcyRHcmFkZT09J0EnJiBncmFkZXMkRXhhbXM+NjAsXSkvbnJvdyhncmFkZXNbZ3JhZGVzJEdyYWRlPT0nQScsXSksMilcblRydWVQb3NpdGl2ZVxuRmFsc2VQb3N0aXZlPC1yb3VuZChucm93KGdyYWRlc1tncmFkZXMkR3JhZGUhPSdBJyYgZ3JhZGVzJEV4YW1zPDYwLF0pL25yb3coZ3JhZGVzW2dyYWRlcyRHcmFkZXMhPSdBJyxdKSwyKVxuRmFsc2VQb3NpdGl2ZVxuTGlrZWxpaG9vZFJhdGlvPC1yb3VuZChUcnVlUG9zaXRpdmUvRmFsc2VQb3NpdGl2ZSwyKVxuTGlrZWxpaG9vZFJhdGlvXG5Qb3N0ZXJpb3JPZGRzIDwtTGlrZWxpaG9vZFJhdGlvICogUHJpb3JPZGRzXG5Qb3N0ZXJpb3JPZGRzXG5Qb3N0ZXJpb3IgPC1Qb3N0ZXJpb3JPZGRzLygxK1Bvc3Rlcmlvck9kZHMpXG5Qb3N0ZXJpb3IifQ==

14.7.2.6 Snippet 6

Q: Verify the hypothesis that the mean exam score for B students is higher than the mean exam score for C students. What is the p-value?
A: Negative. We fail to reject the null hypothesis with p = 0.23.

eyJsYW5ndWFnZSI6InIiLCJwcmVfZXhlcmNpc2VfY29kZSI6IlBlcm11dGF0aW9uIDwtIGZ1bmN0aW9uKGRmMSxjMSxjMixuLHcxLHcyKXtcbiAgZGYgPC0gYXMuZGF0YS5mcmFtZShkZjEpXG4gIERfbnVsbDwtYygpXG4gIFYxPC1kZlssYzFdXG4gIFYyPC1kZlssYzJdXG4gIHN1Yi52YWx1ZTEgPC0gZGZbZGZbLCBjMV0gPT0gdzEsIGMyXVxuICBzdWIudmFsdWUyIDwtIGRmW2RmWywgYzFdID09IHcyLCBjMl1cbiAgRCA8LSAgYWJzKG1lYW4oc3ViLnZhbHVlMiwgbmEucm09VFJVRSkgLSBtZWFuKHN1Yi52YWx1ZTEsIG5hLnJtPVRSVUUpKVxuICBtPWxlbmd0aChWMSlcbiAgbD1sZW5ndGgoVjFbVjE9PXcyXSlcbiAgZm9yKGpqIGluIDE6bil7XG4gICAgbnVsbCA8LSByZXAodzEsbGVuZ3RoKFYxKSlcbiAgICBudWxsW3NhbXBsZShtLGwpXSA8LSB3MlxuICAgIG5mIDwtIGRhdGEuZnJhbWUoS2V5PW51bGwsIFZhbHVlPVYyKVxuICAgIG5hbWVzKG5mKSA8LSBjKFwiS2V5XCIsXCJWYWx1ZVwiKVxuICAgIHcxX251bGwgPC0gbmZbbmYkS2V5ID09IHcxLDJdXG4gICAgdzJfbnVsbCA8LSBuZltuZiRLZXkgPT0gdzIsMl1cbiAgICBEX251bGwgPC0gYyhEX251bGwsbWVhbih3Ml9udWxsLCBuYS5ybT1UUlVFKSAtIG1lYW4odzFfbnVsbCwgbmEucm09VFJVRSkpXG4gIH1cbiAgbXloaXN0PC1oaXN0KERfbnVsbCwgcHJvYj1UUlVFKVxuICBtdWx0aXBsaWVyIDwtIG15aGlzdCRjb3VudHMgLyBteWhpc3QkZGVuc2l0eVxuICBteWRlbnNpdHkgPC0gZGVuc2l0eShEX251bGwsIGFkanVzdD0yKVxuICBteWRlbnNpdHkkeSA8LSBteWRlbnNpdHkkeSAqIG11bHRpcGxpZXJbMV1cbiAgcGxvdChteWhpc3QpXG4gIGxpbmVzKG15ZGVuc2l0eSwgY29sPSdibHVlJylcbiAgYWJsaW5lKHY9RCwgY29sPSdyZWQnKVxuICBNPC1tZWFuKERfbnVsbD5EKVxuICByZXR1cm4oTSlcbn0iLCJzYW1wbGUiOiJncmFkZXM8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvQ2FudmFzMS5jc3ZcIilcblxubWVhbihncmFkZXNbZ3JhZGVzJEdyYWRlPT0nQicsXSRFeGFtcylcbm1lYW4oZ3JhZGVzW2dyYWRlcyRHcmFkZT09J0MnLF0kRXhhbXMpXG5cblBlcm11dGF0aW9uKGdyYWRlcywgXCJHcmFkZVwiLCBcIkV4YW1zXCIsMTAwMDAsIFwiQ1wiLCBcIkJcIikgIn0=

14.7.2.7 Snippet 7

Q: What is the chance of getting an A when you score less than 50 on exams?
A: 0.093

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJncmFkZXM8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvQ2FudmFzMS5jc3ZcIilcblxubnJvdyhncmFkZXNbZ3JhZGVzJEdyYWRlID09J0EnICYgZ3JhZGVzJEV4YW1zIDwgNTAsXSkvbnJvdyhncmFkZXNbZ3JhZGVzJEV4YW1zIDwgNTAsXSkifQ==

14.7.2.8 Snippet 8

Bar plot of the grade distribution for students whose overall score is above 50 but whose exam score is below 50.

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJncmFkZXM8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvQ2FudmFzMS5jc3ZcIilcbmNvbG9yczwtIGMoJ3JlZCcsJ2JsdWUnLCdjeWFuJywneWVsbG93JywnZ3JlZW4nKSAjIEFzc2lnbmluZyBkaWZmZXJlbnQgY29sb3JzIHRvIGJhcnNcblxudCA8LSB0YWJsZShncmFkZXNbZ3JhZGVzJEV4YW1zIDw1MCAmIGdyYWRlcyRTY29yZSA+NTAsXSRHcmFkZSlcblxuYmFycGxvdCh0LCB4bGFiPVwiR3JhZGVzXCIseWxhYj1cIiBOdW1iZXIgb2Ygc3R1ZGVudHNcIixjb2w9Y29sb3JzLCBcbiAgICAgICAgbWFpbj1cIkdyYWRlIGRpc3RyaWJ1dGlvbiBmb3Igc3R1ZGVudHMgd2hvIHBhc3NlZCB0aGUgaG9tZW93b3JrcyBidXQgZmFpbGVkIGV4YW1cIixib3JkZXI9XCJibGFja1wiKSJ9


You are now familiar with the Canvas dataset, so it’s time to test your understanding. Click the link below to take the quiz, then come back here and cross-check your answers with the snippet in Section 14.7.4.

14.7.3 Canvas Data Quiz

Quiz Time

14.7.4 Check yourself

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJncmFkZXM8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvQ2FudmFzMS5jc3ZcIilcblxuc3VtbWFyeShncmFkZXMpIn0=

14.8 Very local minimarket data puzzle

Download: HomeworkMarket2022.csv

What items sell together?

A small local minimarket chain (think Wawa in its early days) has a few locations in New Jersey and sells beer, wine, soft drinks, snacks and sweets. DataMaker provided a data set of several thousand minimarket transactions, storing which items were purchased, when they were purchased (weekday or weekend) and at which location.

Table 14.7: Snippet of Minimarket Dataset
Row | Beer | Day | Location | SoftDrinks | Sweets | Wine | Snacks
10690 | Lager | Weekday | Edison | Cola | Milky Way | Red | None
9450 | Lager | Weekday | Princeton | Cola | Milky Way | Red | Pretzels
14090 | None | Weekend | Metuchen | Cola | Snickers | Red | Pretzels
3302 | None | Weekend | Metuchen | Cola | Milky Way | None | Crackers
9255 | Lager | Weekend | New Brunswick | Sprite | Snickers | White | None


Figure 7: Distribution of snack sales among weekend buyers of Lager in New Brunswick (Snippet 14.8.2.5)

14.8.1 🧙 Secret patterns embedded by DataMaker

Boundless analytics (see the last section) has discovered the patterns embedded in the minimarket data. The golden pattern was a subset of Lager buyers on weekends in New Brunswick. This small subset of transactions has a distinctly different distribution of snacks: snack choices in these transactions differ remarkably from the distribution of snack sales over all transactions. There are many other associations similar to this one, for example Cola and Popcorn buyers in Princeton. We call these subsets slices. The data contains many slices with distinct distributions of snacks, sweets, soft drinks, wines, beers, locations and even the weekday/weekend timing of the sale.
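One simple way to hunt for such slices is to split the transactions by a combination of attributes and measure how far each slice’s snack distribution lies from the overall one. This sketch is a hypothetical illustration, not the method Boundless analytics uses. It scores each slice by the total variation distance (half the sum of absolute differences in proportions). It runs on four columns of the five rows in Table 14.7. On the full data, small slices can score high by chance, so check their row counts or run a chi-square test as in Snippet 4 before trusting a slice.

```r
# Four columns of the five rows in Table 14.7
market <- data.frame(
  Beer     = c("Lager", "Lager", "None", "None", "Lager"),
  Day      = c("Weekday", "Weekday", "Weekend", "Weekend", "Weekend"),
  Location = c("Edison", "Princeton", "Metuchen", "Metuchen", "New Brunswick"),
  Snacks   = c("None", "Pretzels", "Pretzels", "Crackers", "None")
)

# Total variation distance between each slice's distribution of `target`
# and the distribution over all rows (0 = identical, 1 = no overlap)
slice_distance <- function(df, target, by) {
  overall <- prop.table(table(df[[target]]))
  keys    <- interaction(df[by], drop = TRUE, sep = " / ")  # one level per slice
  sapply(split(df[[target]], keys), function(v) {
    p <- prop.table(table(factor(v, levels = names(overall))))
    0.5 * sum(abs(p - overall))
  })
}

d <- slice_distance(market, "Snacks", c("Beer", "Day", "Location"))
sort(round(d, 2), decreasing = TRUE)
```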

14.8.2 Practice Snippets

14.8.2.1 Snippet 1: Get to know your data

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJtYXJrZXQ8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvSG9tZXdvcmtNYXJrZXQyMDIyLmNzdlwiKVxuXG5jb2xuYW1lcyhtYXJrZXQpXG5ucm93KG1hcmtldClcbnN1bW1hcnkobWFya2V0KVxudW5pcXVlKG1hcmtldCREYXkpXG51bmlxdWUobWFya2V0JExvY2F0aW9uKVxudW5pcXVlKG1hcmtldCRCZWVyKVxudW5pcXVlKG1hcmtldCRTb2Z0RHJpbmtzKVxudW5pcXVlKG1hcmtldCRTd2VldHMpXG51bmlxdWUobWFya2V0JFdpbmUpXG51bmlxdWUobWFya2V0JFNuYWNrcylcbnRhYmxlKG1hcmtldCREYXkpXG50YWJsZShtYXJrZXQkTG9jYXRpb24pXG50YWJsZShtYXJrZXQkQmVlcilcbnRhYmxlKG1hcmtldCRTb2Z0RHJpbmtzKVxudGFibGUobWFya2V0JFN3ZWV0cylcbnRhYmxlKG1hcmtldCRXaW5lKVxudGFibGUobWFya2V0JFNuYWNrcykifQ==

14.8.2.2 Snippet 2

Q: What are the odds that a weekend customer in New Brunswick buys Lager?
A: Posterior Odds = 0.53
Likelihood Ratio = 1.08
Prior Odds = 0.49

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJtYXJrZXQ8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvSG9tZXdvcmtNYXJrZXQyMDIyLmNzdlwiKVxuXG5QcmlvcjwtbnJvdyhtYXJrZXRbbWFya2V0JEJlZXIgPT0nTGFnZXInLF0pL25yb3cobWFya2V0KVxuUHJpb3JcblByaW9yT2Rkczwtcm91bmQoUHJpb3IvKDEtUHJpb3IpLDIpXG5Qcmlvck9kZHNcblRydWVQb3NpdGl2ZTwtcm91bmQobnJvdyhtYXJrZXRbbWFya2V0JEJlZXI9PSdMYWdlcicmIG1hcmtldCRMb2NhdGlvbj09J05ldyBCcnVuc3dpY2snICYgbWFya2V0JERheSA9PSdXZWVrZW5kJyxdKS9ucm93KG1hcmtldFttYXJrZXQkQmVlcj09J0xhZ2VyJyxdKSwyKVxuVHJ1ZVBvc2l0aXZlXG5GYWxzZVBvc2l0aXZlPC1yb3VuZChucm93KG1hcmtldFttYXJrZXQkQmVlciE9J0xhZ2VyJyYgbWFya2V0JExvY2F0aW9uPT0nTmV3IEJydW5zd2ljaycgJiBtYXJrZXQkRGF5ID09J1dlZWtlbmQnLF0pL25yb3cobWFya2V0W21hcmtldCRCZWVyIT0nTGFnZXInLF0pLDIpXG5GYWxzZVBvc2l0aXZlXG5MaWtlbGlob29kUmF0aW88LXJvdW5kKFRydWVQb3NpdGl2ZS9GYWxzZVBvc2l0aXZlLDIpXG5MaWtlbGlob29kUmF0aW9cblBvc3Rlcmlvck9kZHMgPC1MaWtlbGlob29kUmF0aW8gKiBQcmlvck9kZHNcblBvc3Rlcmlvck9kZHNcblBvc3RlcmlvciA8LVBvc3Rlcmlvck9kZHMvKDErUG9zdGVyaW9yT2RkcylcblBvc3RlcmlvciJ9

14.8.2.3 Snippet 3

Q: What is the most frequent location of Lager purchases?
A: Princeton is the most frequent location where Lager is sold

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJtYXJrZXQ8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvSG9tZXdvcmtNYXJrZXQyMDIyLmNzdlwiKVxuXG50YWJsZShtYXJrZXRbbWFya2V0JEJlZXIgPT0nTGFnZXInLF0kTG9jYXRpb24pIn0=

14.8.2.4 Snippet 4

Q: Is the distribution of snack purchases among weekend buyers of Lager in New Brunswick different from the overall distribution of snacks?
A: Yes, very different.

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJtYXJrZXQ8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvSG9tZXdvcmtNYXJrZXQyMDIyLmNzdlwiKVxuXG5tYXJrZXQkSU48LSdPdXRfU2xpY2UnXG5tYXJrZXRbbWFya2V0JEJlZXI9PSdMYWdlcicgJiBtYXJrZXQkRGF5PT0nV2Vla2VuZCcgJiAgbWFya2V0JExvY2F0aW9uID09J05ldyBCcnVuc3dpY2snLCBdJElOPC0nSW5fU2xpY2UnXG5kPC10YWJsZShtYXJrZXQkU25hY2tzLG1hcmtldCRJTilcbmNoaXNxLnRlc3QoZCkifQ==

14.8.2.5 Snippet 5

Bar plot of snack sales among weekend buyers of Lager in New Brunswick.

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJtYXJrZXQ8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvSG9tZXdvcmtNYXJrZXQyMDIyLmNzdlwiKVxuY29sb3JzPC0gYygncmVkJywnYmx1ZScsJ2N5YW4nLCd5ZWxsb3cnLCdncmVlbicpICMgQXNzaWduaW5nIGRpZmZlcmVudCBjb2xvcnMgdG8gYmFyc1xuXG50PC10YWJsZShtYXJrZXRbbWFya2V0JEJlZXI9PSdMYWdlcicgJm1hcmtldCREYXk9PSdXZWVrZW5kJyAmbWFya2V0JExvY2F0aW9uPT0nTmV3IEJydW5zd2ljaycsXSRTbmFja3MpXG5cbmJhcnBsb3QodCwgeGxhYj1cIlR5cGUgb2YgU25hY2tcIix5bGFiPVwiU2FsZXNcIixjb2w9Y29sb3JzLCBcbiAgICAgICAgbWFpbj1cIkRpc3RyaWJ1dGlvbiBvZiBzbmFja3Mgc2FsZXMgYW1vbmcgV2Vla2VuZCBidXllcnMgb2YgTGFnZXIgaW4gTmV3IEJydW5zd2lja1wiLGJvcmRlcj1cImJsYWNrXCIpIn0=


You are now familiar with the MiniMarket dataset, so it’s time to test your understanding. Click the link below to take the quiz, then come back here and cross-check your answers with the snippet in Section 14.8.4.

14.8.3 MiniMarket Data Quiz

Quiz Time

14.8.4 Check yourself

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJtYXJrZXQ8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvSG9tZXdvcmtNYXJrZXQyMDIyLmNzdlwiKVxuXG5zdW1tYXJ5KG1hcmtldCkifQ==


Now it’s time for two real data sets, both from Kaggle: the airbnb data set and the Titanic sinking data set. Both have already been cleaned, so we do not have to spend our time on data wrangling!

14.9 Airbnb data puzzle

Download: airbnb.csv

The airbnb data set (Kaggle) stores more than 30,000 data points about airbnb prices in NYC. We have modified the original set a little (we can’t stop!) by adding the floor on which the apartment is located to the existing attributes, such as room type, neighbourhood_group (borough), specific neighbourhood and price.

Table 14.8: Snippet of Airbnb Dataset
Row | id | name | host_name | neighbourhood_group | neighbourhood | room_type | floor | price
23849 | 34943918 | Enjoy Brooklyn… visit Manhattan | Salomé | Brooklyn | Bedford-Stuyvesant | Entire home/apt | 1 | 196.55355
33635 | 59709 | Artistic, Cozy, and Spacious w/ Patio! Sleeps 5 | Ricardo & Ashlie | Manhattan | Chinatown | Entire home/apt | 1 | 285.73546
19865 | 766814 | Adorable Midtown West Studio! | Caitlin | Manhattan | Hell’s Kitchen | Entire home/apt | 1 | 284.20414
12700 | 9564986 | Super Cozy Room in Floor Through Apartment | Sandy | Brooklyn | Williamsburg | Private room | 1 | 97.79482
18024 | 30749411 | Manhattan huge bedroom, with PRIVATE BATHROOM! | Francesco | Manhattan | Two Bridges | Private room | 16 | 262.55557


Figure 8: Box plot for Price and neighbourhood distribution (Snippet 14.9.1.6)

14.9.1 Practice Snippets

14.9.1.1 Snippet 1: Get to know your data

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJhaXJibmI8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvYWlyYm5iLmNzdlwiKVxuXG5ucm93KGFpcmJuYilcbnN1bW1hcnkoYWlyYm5iKVxudW5pcXVlKGFpcmJuYiRuZWlnaGJvdXJob29kX2dyb3VwKVxudW5pcXVlKGFpcmJuYiRuZWlnaGJvdXJob29kKVxudW5pcXVlKGFpcmJuYiRyb29tX3R5cGUpXG51bmlxdWUoYWlyYm5iJGZsb29yKVxudGFibGUoYWlyYm5iJG5laWdoYm91cmhvb2RfZ3JvdXApXG50YWJsZShhaXJibmIkbmVpZ2hib3VyaG9vZClcbnRhYmxlKGFpcmJuYiRyb29tX3R5cGUpXG50YWJsZShhaXJibmIkZmxvb3IpXG50YXBwbHkoYWlyYm5iJHByaWNlLCBhaXJibmIkZmxvb3IsIG1lYW4pXG50YXBwbHkoYWlyYm5iJHByaWNlLCBhaXJibmIkbmVpZ2hib3VyaG9vZCwgbWVhbilcbnRhcHBseShhaXJibmIkcHJpY2UsIGFpcmJuYiRyb29tX3R5cGUsIG1lYW4pIn0=

14.9.1.2 Snippet 2

Q: What is the price of the cheapest entire home/apt in Tribeca?
A: $284

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJhaXJibmI8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvYWlyYm5iLmNzdlwiKVxuXG5taW4oYWlyYm5iW2FpcmJuYiRyb29tX3R5cGU9PSdFbnRpcmUgaG9tZS9hcHQnICZhaXJibmIkbmVpZ2hib3VyaG9vZD09J1RyaWJlY2EnLF0kcHJpY2UpIn0=

14.9.1.3 Snippet 3

Q: What is the lowest price of accommodation above the 10th floor in Manhattan?
A: $184

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJhaXJibmI8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvYWlyYm5iLmNzdlwiKVxuXG5taW4oYWlyYm5iW2FpcmJuYiRmbG9vciA+MTAgJmFpcmJuYiRuZWlnaGJvdXJob29kX2dyb3VwPT0nTWFuaGF0dGFuJyxdJHByaWNlKSJ9

14.9.1.4 Snippet 4

Q: What are the odds of finding a place for less than $200 in Tribeca?
A: Posterior Odds = 0.09
Prior Odds = 0.86
Likelihood Ratio = 0.11

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJhaXJibmI8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvYWlyYm5iLmNzdlwiKVxuXG5QcmlvcjwtbnJvdyhhaXJibmJbYWlyYm5iJHByaWNlPDIwMCxdKS9ucm93KGFpcmJuYilcblByaW9yXG5Qcmlvck9kZHM8LXJvdW5kKFByaW9yLygxLVByaW9yKSwyKVxuUHJpb3JPZGRzXG5ucm93KGFpcmJuYlthaXJibmIkcHJpY2U8MjAwJiBhaXJibmIkbmVpZ2hib3VyaG9vZD09J1RyaWJlY2EnLF0pXG5ucm93KGFpcmJuYlthaXJibmIkbmVpZ2hib3VyaG9vZD09J1RyaWJlY2EnLF0pXG5UcnVlUG9zaXRpdmU8LXJvdW5kKG5yb3coYWlyYm5iW2FpcmJuYiRwcmljZTwyMDAmIGFpcmJuYiRuZWlnaGJvdXJob29kPT0nVHJpYmVjYScsXSkvbnJvdyhhaXJibmJbYWlyYm5iJHByaWNlPDIwMCxdKSw1KVxuVHJ1ZVBvc2l0aXZlXG5GYWxzZVBvc2l0aXZlPC1yb3VuZChucm93KGFpcmJuYlthaXJibmIkcHJpY2U+MjAwJiBhaXJibmIkbmVpZ2hib3VyaG9vZD09J1RyaWJlY2EnLF0pL25yb3coYWlyYm5iW2FpcmJuYiRwcmljZT4yMDAsXSksNSlcbkZhbHNlUG9zaXRpdmVcbkxpa2VsaWhvb2RSYXRpbzwtcm91bmQoVHJ1ZVBvc2l0aXZlL0ZhbHNlUG9zaXRpdmUsNClcbkxpa2VsaWhvb2RSYXRpb1xuUG9zdGVyaW9yT2RkcyA8LUxpa2VsaWhvb2RSYXRpbyAqIFByaW9yT2Rkc1xuUG9zdGVyaW9yT2Rkc1xuUG9zdGVyaW9yIDwtUG9zdGVyaW9yT2Rkcy8oMStQb3N0ZXJpb3JPZGRzKVxuUG9zdGVyaW9yIn0=

14.9.1.5 Snippet 5

Q: Verify the hypothesis that West Village is more expensive than the Upper East Side.
A: Positive. The null hypothesis is rejected with p < 0.0001.

eyJsYW5ndWFnZSI6InIiLCJwcmVfZXhlcmNpc2VfY29kZSI6IlBlcm11dGF0aW9uIDwtIGZ1bmN0aW9uKGRmMSxjMSxjMixuLHcxLHcyKXtcbiAgZGYgPC0gYXMuZGF0YS5mcmFtZShkZjEpXG4gIERfbnVsbDwtYygpXG4gIFYxPC1kZlssYzFdXG4gIFYyPC1kZlssYzJdXG4gIHN1Yi52YWx1ZTEgPC0gZGZbZGZbLCBjMV0gPT0gdzEsIGMyXVxuICBzdWIudmFsdWUyIDwtIGRmW2RmWywgYzFdID09IHcyLCBjMl1cbiAgRCA8LSAgYWJzKG1lYW4oc3ViLnZhbHVlMiwgbmEucm09VFJVRSkgLSBtZWFuKHN1Yi52YWx1ZTEsIG5hLnJtPVRSVUUpKVxuICBtPWxlbmd0aChWMSlcbiAgbD1sZW5ndGgoVjFbVjE9PXcyXSlcbiAgZm9yKGpqIGluIDE6bil7XG4gICAgbnVsbCA8LSByZXAodzEsbGVuZ3RoKFYxKSlcbiAgICBudWxsW3NhbXBsZShtLGwpXSA8LSB3MlxuICAgIG5mIDwtIGRhdGEuZnJhbWUoS2V5PW51bGwsIFZhbHVlPVYyKVxuICAgIG5hbWVzKG5mKSA8LSBjKFwiS2V5XCIsXCJWYWx1ZVwiKVxuICAgIHcxX251bGwgPC0gbmZbbmYkS2V5ID09IHcxLDJdXG4gICAgdzJfbnVsbCA8LSBuZltuZiRLZXkgPT0gdzIsMl1cbiAgICBEX251bGwgPC0gYyhEX251bGwsbWVhbih3Ml9udWxsLCBuYS5ybT1UUlVFKSAtIG1lYW4odzFfbnVsbCwgbmEucm09VFJVRSkpXG4gIH1cbiAgbXloaXN0PC1oaXN0KERfbnVsbCwgcHJvYj1UUlVFKVxuICBtdWx0aXBsaWVyIDwtIG15aGlzdCRjb3VudHMgLyBteWhpc3QkZGVuc2l0eVxuICBteWRlbnNpdHkgPC0gZGVuc2l0eShEX251bGwsIGFkanVzdD0yKVxuICBteWRlbnNpdHkkeSA8LSBteWRlbnNpdHkkeSAqIG11bHRpcGxpZXJbMV1cbiAgcGxvdChteWhpc3QpXG4gIGxpbmVzKG15ZGVuc2l0eSwgY29sPSdibHVlJylcbiAgYWJsaW5lKHY9RCwgY29sPSdyZWQnKVxuICBNPC1tZWFuKERfbnVsbD5EKVxuICByZXR1cm4oTSlcbn0iLCJzYW1wbGUiOiJhaXJibmI8LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvYWlyYm5iLmNzdlwiKVxuXG5QZXJtdXRhdGlvbihhaXJibmIsIFwibmVpZ2hib3VyaG9vZFwiLCBcInByaWNlXCIsMTAwMDAsIFwiV2VzdCBWaWxsYWdlXCIsIFwiVXBwZXIgRWFzdCBTaWRlXCIpIn0=

14.9.1.6 Snippet 6

Box plot of price by neighbourhood group.

    airbnb <- read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/airbnb.csv")
    colors <- c('red', 'blue', 'cyan', 'yellow', 'green') # one color per neighbourhood group (box)

    boxplot(airbnb$price ~ airbnb$neighbourhood_group, xlab="Location", ylab="Price", col=colors,
            main="Box plot of price by neighbourhood group", border="black")
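A box plot shows the median and quartiles of each group graphically. You can print the same numbers with `aggregate()`. The minimal sketch below uses a made-up data frame that has the same column names as the Airbnb file (`price`, `neighbourhood_group`). On the real file, the call would be `aggregate(price ~ neighbourhood_group, data = airbnb, FUN = median)`.

```r
# Toy data standing in for the Airbnb file (values are made up)
listings <- data.frame(
  neighbourhood_group = c("Manhattan", "Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Queens"),
  price = c(250, 150, 300, 100, 120, 80)
)
# Median price per neighbourhood group: the thick line inside each box
med <- aggregate(price ~ neighbourhood_group, data = listings, FUN = median)
med
```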


You are now familiar with the Airbnb dataset, so it is time to test your understanding. Click the link below to take the quiz, then come back here and check your answers with the snippet in 14.9.3.

14.9.2 Airbnb Data Quiz

Quiz Time

14.9.3 Check yourself

    airbnb <- read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/airbnb.csv")

    summary(airbnb)

14.10 Titanic data puzzle

Download: Titanic-train.csv

The Titanic data set (Kaggle) stores records of Titanic passengers with attributes such as Survived, SibSp (number of siblings or spouses aboard), Fare, Pclass (passenger class: 1, 2 or 3), Age, etc. Here is a sample of the data:

Table 14.9: Snippet of Titanic Dataset
| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 715 | 0 | 2 | Greenberg, Mr. Samuel | male | 52 | 0 | 0 | 250647 | 13.0000 | | S |
| 93 | 0 | 1 | Chaffee, Mr. Herbert Fuller | male | 46 | 1 | 0 | W.E.P. 5734 | 61.1750 | E31 | S |
| 624 | 0 | 3 | Hansen, Mr. Henry Damsgaard | male | 21 | 0 | 0 | 350029 | 7.8542 | | S |
| 311 | 1 | 1 | Hays, Miss. Margaret Bechstein | female | 24 | 0 | 0 | 11767 | 83.1583 | C54 | C |
| 586 | 1 | 1 | Taussig, Miss. Ruth | female | 18 | 0 | 2 | 110413 | 79.6500 | E68 | S |
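SibSp counts only siblings and spouses aboard, and Parch counts parents and children. Analysts often derive a family size as SibSp + Parch + 1, which includes the passenger. This is not a column in the file, and the name `FamilySize` is hypothetical. The minimal sketch below computes it for the five sample rows of Table 14.9.

```r
# The five sample rows from Table 14.9 (only the columns needed here)
sample_rows <- data.frame(
  PassengerId = c(715, 93, 624, 311, 586),
  SibSp = c(0, 1, 0, 0, 0),
  Parch = c(0, 0, 0, 0, 2)
)
# Hypothetical derived column: party size including the passenger
sample_rows$FamilySize <- sample_rows$SibSp + sample_rows$Parch + 1
sample_rows
```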


Figure 9: Age distribution among survivors of the Titanic disaster (Snippet 14.10.1.7)

14.10.1 Practice Snippets

14.10.1.1 Snippet 1: Get to know your data

    titanic <- read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Titanic-train.csv")

    colnames(titanic)
    nrow(titanic)
    summary(titanic)
    unique(titanic$Pclass)
    unique(titanic$SibSp)
    unique(titanic$Sex)
    unique(titanic$Embarked)
    unique(titanic$Survived)
    table(titanic$Pclass)
    table(titanic$SibSp)
    table(titanic$Sex)
    table(titanic$Embarked)
    table(titanic$Survived)
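`summary()` reports missing values (NA's) only for numeric columns. A per-column count is a useful addition. Note that `read.csv()` reads an empty text field, such as a missing Cabin, as an empty string rather than NA, so such blanks need a separate count. The minimal sketch below uses a small made-up frame. On the real file, the calls would be `colSums(is.na(titanic))` and `colSums(titanic == "", na.rm = TRUE)`.

```r
# Toy frame mimicking three Titanic columns (values are made up)
passengers <- data.frame(
  Age   = c(22, NA, 35, NA),
  Cabin = c("", "C85", "", "E46"),
  Fare  = c(7.25, 71.28, 8.05, 51.86)
)
na_counts    <- colSums(is.na(passengers))                # NA values per column
blank_counts <- colSums(passengers == "", na.rm = TRUE)   # empty strings per column
na_counts
blank_counts
```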

14.10.1.2 Snippet 2

Q: What are the odds of survival for single males (no siblings or spouse aboard, SibSp = 0) on the Titanic?
A: Posterior Odds = 0.196
Prior Odds = 0.62
Likelihood Ratio = 0.31

    titanic <- read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Titanic-train.csv")

    # Prior: overall share of survivors, and the corresponding odds
    Prior <- nrow(titanic[titanic$Survived == 1, ]) / nrow(titanic)
    Prior
    PriorOdds <- round(Prior / (1 - Prior), 2)
    PriorOdds
    # P(single male | survived), where "single" means SibSp == 0
    TruePositive <- round(nrow(titanic[titanic$Survived == 1 & titanic$Sex == 'male' & titanic$SibSp == 0, ]) / nrow(titanic[titanic$Survived == 1, ]), 2)
    TruePositive
    # P(single male | did not survive)
    FalsePositive <- round(nrow(titanic[titanic$Survived == 0 & titanic$Sex == 'male' & titanic$SibSp == 0, ]) / nrow(titanic[titanic$Survived == 0, ]), 2)
    FalsePositive
    LikelihoodRatio <- round(TruePositive / FalsePositive, 4)
    LikelihoodRatio
    PosteriorOdds <- LikelihoodRatio * PriorOdds
    PosteriorOdds
    Posterior <- PosteriorOdds / (1 + PosteriorOdds)
    Posterior
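The snippet follows Bayes' rule in odds form. Let S be the event "survived" and E the evidence "single male" (Sex is male and SibSp = 0). TruePositive and FalsePositive in the code estimate the two conditional probabilities in the likelihood ratio:

\[
\underbrace{\frac{P(S \mid E)}{P(\bar S \mid E)}}_{\text{PosteriorOdds}}
= \underbrace{\frac{P(E \mid S)}{P(E \mid \bar S)}}_{\text{LikelihoodRatio}}
\times \underbrace{\frac{P(S)}{P(\bar S)}}_{\text{PriorOdds}},
\qquad
P(S \mid E) = \frac{\text{PosteriorOdds}}{1 + \text{PosteriorOdds}}.
\]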

14.10.1.3 Snippet 3

Q: Verify the hypothesis that survivors paid more for their tickets, on average, than those who did not survive.
A: Positive. The null hypothesis is rejected with p < 0.0001.

    # Setup code (runs before the exercise): permutation test helper.
    # df1: data frame; c1: grouping column; c2: numeric column;
    # n: number of permutations; w1, w2: the two group labels to compare.
    Permutation <- function(df1, c1, c2, n, w1, w2){
      df <- as.data.frame(df1)
      # Keep only the rows of the two groups, so labels are permuted within
      # the pooled w1/w2 sample rather than across the whole data set
      df <- df[df[, c1] %in% c(w1, w2), ]
      D_null <- c()
      V1 <- df[, c1]
      V2 <- df[, c2]
      sub.value1 <- df[df[, c1] == w1, c2]
      sub.value2 <- df[df[, c1] == w2, c2]
      # Observed (absolute) difference in group means
      D <- abs(mean(sub.value2, na.rm=TRUE) - mean(sub.value1, na.rm=TRUE))
      m <- length(V1)
      l <- length(V1[V1 == w2])
      for(jj in 1:n){
        # Randomly reassign the w2 label to l of the m rows
        null <- rep(w1, length(V1))
        null[sample(m, l)] <- w2
        nf <- data.frame(Key=null, Value=V2)
        names(nf) <- c("Key", "Value")
        w1_null <- nf[nf$Key == w1, 2]
        w2_null <- nf[nf$Key == w2, 2]
        D_null <- c(D_null, mean(w2_null, na.rm=TRUE) - mean(w1_null, na.rm=TRUE))
      }
      # Histogram of the null distribution with a scaled density curve
      myhist <- hist(D_null, prob=TRUE)
      multiplier <- myhist$counts / myhist$density
      mydensity <- density(D_null, adjust=2)
      mydensity$y <- mydensity$y * multiplier[1]
      plot(myhist)
      lines(mydensity, col='blue')
      abline(v=D, col='red')
      # p-value: share of permuted differences larger than the observed one
      M <- mean(D_null > D)
      return(M)
    }

    titanic <- read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Titanic-train.csv")

    Permutation(titanic, "Survived", "Fare", 10000, "1", "0")
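Welch's two-sample t-test, `t.test()`, offers a parametric cross-check of the same question. This is a swapped-in technique, not the course's method. The minimal sketch below uses made-up fares. On the real file, the call would be `t.test(Fare ~ Survived, data = titanic, alternative = "less")`. Titanic fares are strongly right-skewed, which is one reason the permutation test, which makes no normality assumption, is a reasonable primary choice.

```r
# Made-up fares for two groups (0 = did not survive, 1 = survived)
fares <- data.frame(
  Survived = rep(c(0, 1), each = 8),
  Fare = c(7.25, 8.05, 7.90, 13.00, 8.46, 26.00, 7.75, 9.50,
           71.28, 53.10, 30.07, 80.00, 26.55, 77.96, 52.00, 66.60)
)
# Welch t-test (var.equal = FALSE is the default). alternative = "less" tests
# whether the mean fare of group 0 is smaller than that of group 1.
res <- t.test(Fare ~ Survived, data = fares, alternative = "less")
res$p.value
```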

14.10.1.4 Snippet 4

Q: What is the probability of survival for passengers who paid more than 100 pounds for a ticket? How about those who paid less than 10 pounds?
A: 0.73 for passengers who paid more than 100 pounds
0.20 for passengers who paid less than 10 pounds
0.38 for all passengers

    titanic <- read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Titanic-train.csv")

    # P(survived | fare > 100)
    nrow(titanic[titanic$Fare > 100 & titanic$Survived == 1, ]) / nrow(titanic[titanic$Fare > 100, ])
    # P(survived | fare < 10)
    nrow(titanic[titanic$Fare < 10 & titanic$Survived == 1, ]) / nrow(titanic[titanic$Fare < 10, ])
    # P(survived) for all passengers
    nrow(titanic[titanic$Survived == 1, ]) / nrow(titanic)
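Each line above is a conditional probability: the number of survivors in a subset divided by the size of that subset. Because Survived is coded 0/1, the same quantity is simply the mean of Survived within the subset. The minimal sketch below illustrates this on made-up vectors.

```r
# Made-up fares and outcomes (1 = survived, 0 = did not)
fare     <- c(150, 8, 7, 120, 30, 9, 211, 5, 60, 512)
survived <- c(1,   0, 1, 1,   0,  0, 0,   0, 1,  1)
p_high <- mean(survived[fare > 100])   # P(survived | fare > 100)
p_low  <- mean(survived[fare < 10])    # P(survived | fare < 10)
p_all  <- mean(survived)               # P(survived)
c(high = p_high, low = p_low, all = p_all)
```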

14.10.1.5 Snippet 5

Q: What was the chance of survival for passengers traveling with more than three siblings or spouses (SibSp > 3)?
A: Just 10%!

    titanic <- read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Titanic-train.csv")

    # Counts of non-survivors (0) and survivors (1) among passengers with SibSp > 3
    table(titanic[titanic$SibSp > 3, ]$Survived)
    # The same counts as proportions
    prop.table(table(titanic[titanic$SibSp > 3, ]$Survived))
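Instead of checking a single threshold, you can compute the survival rate for every SibSp value at once with `tapply()`. The minimal sketch below uses made-up data. On the real file, the call would be `tapply(titanic$Survived, titanic$SibSp, mean)`.

```r
# Made-up records: siblings/spouses aboard and survival (1 = survived)
sibsp    <- c(0, 0, 0, 1, 1, 2, 4, 4, 5, 8)
survived <- c(1, 0, 0, 1, 1, 0, 1, 0, 0, 0)
rate_by_sibsp <- tapply(survived, sibsp, mean)  # survival rate for each SibSp value
rate_by_sibsp
pooled_rate <- mean(survived[sibsp > 3])        # pooled rate for SibSp > 3
pooled_rate
```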

14.10.1.6 Snippet 6

Q: Did survival depend on the class of the cabin?
A: Positive. The null hypothesis of independence is rejected with \(p < 2.2 \times 10^{-16}\).

    titanic <- read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Titanic-train.csv")

    # Pearson chi-squared test of independence between survival and passenger class
    chisq.test(titanic$Survived, titanic$Pclass)
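`chisq.test()` compares the observed counts with the counts expected if survival and class were independent (row total × column total / grand total). The minimal sketch below reproduces Pearson's statistic by hand on a made-up 2 × 3 table. R applies a continuity correction only to 2 × 2 tables, so here the hand-computed value and `chisq.test()` agree.

```r
# Made-up counts: rows = Survived (0, 1), columns = Pclass (1, 2, 3)
observed <- matrix(c(20, 30, 25, 25, 80, 20), nrow = 2,
                   dimnames = list(Survived = c("0", "1"), Pclass = c("1", "2", "3")))
# Expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
stat_by_hand <- sum((observed - expected)^2 / expected)
test <- chisq.test(observed)
c(by_hand = stat_by_hand, chisq_test = unname(test$statistic))
```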

14.10.1.7 Snippet 7

Age distribution among survivors.

    titanic <- read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Titanic-train.csv")
    colors <- c('red', 'blue', 'cyan', 'yellow', 'green', 'brown', 'grey') # colors are recycled across the bars

    # Age histogram (about 100 bins) for survivors (Survived == 1); hist() drops missing ages
    hist(titanic[titanic$Survived == 1, ]$Age, breaks=100, xlab="Age", ylab="Number of passengers", col=colors,
         main="Age distribution among survivors of Titanic disaster", border="black")
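A histogram of survivors alone does not show whether age mattered. The minimal sketch below tabulates survivors and non-survivors together by age band with `cut()` and `table()`. The data are made up, and the 20-year bands are an arbitrary choice. On the real file, `table()` would silently drop passengers whose Age is missing.

```r
# Made-up ages and outcomes (1 = survived, 0 = did not)
age      <- c(4, 15, 22, 28, 35, 41, 50, 63, 2, 30, 45, 70)
survived <- c(1, 1,  0,  0,  1,  0,  0,  0,  1, 1,  0,  0)
# 20-year bands: (0,20], (20,40], (40,60], (60,80]
band <- cut(age, breaks = seq(0, 80, by = 20))
counts <- table(band, survived)
counts
# Share of survivors within each age band
shares <- prop.table(counts, margin = 1)
shares
```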


You are now familiar with the Titanic dataset, so it is time to test your understanding. Click the link below to take the quiz, then come back here and check your answers with the snippet in 14.10.3.

14.10.2 Titanic Data Quiz

Quiz Time

14.10.3 Check yourself

    titanic <- read.csv("https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/Titanic-train.csv")

    summary(titanic)

14.11 Additional References