Introduction
The following five snippets let you practice building prediction models. Each snippet uses one of the data puzzles from section 14: party (14.3), sleep (14.5), voting (14.4), and canvas (14.7), and finally a real data set, titanic (14.10).
We begin with an example built for the moody data set (snippet 21.2), where the task is to predict the GRADE attribute. We first load the training data set and the testing data set, which is missing the GRADE attribute. Students use the training data set to build the prediction model; in 21.2, this is a simple rpart() model. The model is then applied to the testing data set to produce a vector of predicted GRADE values, which constitutes the submission vector. Finally, the VERIFY() function evaluates the submission vector by comparing the predicted values with the real grades from the full testing data set. This simulates what Kaggle does when students submit their submission file. The full testing data set is not available to students and is embedded inside the VERIFY() function; students have access only to the training data set and to the testing data set without the GRADE attribute, just as in our prediction challenges on Kaggle.
Snippets 21.2-21.6 provide the training and testing data for the party, sleep, voting, canvas and titanic data sets. Students can plug their submission vectors into these snippets, just as they would submit them to Kaggle.
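To make the scoring step concrete, here is a minimal sketch of how a VERIFY-style function could keep the real labels hidden while still scoring a submission. The names make_verify, hidden_labels and verify_toy are hypothetical and exist only for this illustration, and the toy labels are made up; the actual VERIFY() used in the snippets may be implemented differently.

```r
# Hypothetical helper: builds a scoring function that "remembers" the
# hidden labels, so the student only ever sees the scoring function.
make_verify <- function(hidden_labels) {
  function(submission) {
    # The submission must contain one prediction per test row
    stopifnot(length(submission) == length(hidden_labels))
    # Accuracy: fraction of predictions that match the hidden labels
    mean(as.character(submission) == as.character(hidden_labels))
  }
}

# Toy hidden grades (made up for this sketch)
verify_toy <- make_verify(c("A", "B", "C", "A"))

# A toy submission vector in which three of the four predictions match
toy_score <- verify_toy(c("A", "B", "B", "A"))
print(toy_score)  # 0.75
```

For a numeric target, the same idea would apply with the mean squared error in place of accuracy.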
Example:
Note: Student code in red.
# First, let's load the rpart library
library(rpart)
# Import the data sets (training and testing)
moody <- read.csv('https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/moody2022_new.csv')
# Testing data set without the target variable
moodyTest <- read.csv('https://raw.githubusercontent.com/dev7796/data101_tutorial/main/files/dataset/moody2022_test_students.csv')
# Build a classification tree on the training data
tree <- rpart(Grade ~ Major+Score+Seniority, data = moody, method = "class", control = rpart.control(minbucket = 200))
# Predict the grade for every row of the testing data set
submission <- predict(tree, moodyTest, type = "class")
VERIFY(submission, moodyTest)
# VERIFY returns the score of the student's submission (accuracy or MSE) against the full testing data set, which is embedded in VERIFY and hidden from the student.
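The example above depends on remote files and on a VERIFY() function that is not shown. As a rough offline sketch of the same workflow, the code below uses R's built-in iris data set as a stand-in for a data puzzle. The split (every fifth row held out), the variable names, and the inline accuracy computation are assumptions made for illustration only and are not part of the original snippets.

```r
library(rpart)

# Hold out every 5th row as the "hidden" testing data (assumed split)
test_idx  <- seq(5, nrow(iris), by = 5)
train     <- iris[-test_idx, ]
test_full <- iris[test_idx, ]   # includes the real Species labels

# What a student would receive: the testing data without the target
test_students <- test_full[, names(test_full) != "Species"]

# Build the model on the training data only
tree <- rpart(Species ~ ., data = train, method = "class")

# The submission vector: one predicted class per testing row
submission <- predict(tree, test_students, type = "class")

# What a VERIFY-style check would compute against the hidden labels
accuracy <- mean(as.character(submission) == as.character(test_full$Species))
print(accuracy)
```

The student-side code touches only train and test_students; the comparison against test_full stands in for the hidden scoring step.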
Predict whether the party will be fun, boring, or just OK
Description of the dataset: 14.3
eyJsYW5ndWFnZSI6InIiLCJwcmVfZXhlcmNpc2VfY29kZSI6InZlcmlmeSA8LSBmdW5jdGlvbihkYXRhMSkge1xuICBkYXRhMiA8LSByZWFkLmNzdignaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvcGFydHlfdGVzdC5jc3YnKVxuICBkYXRhZnJhbWUxPC0gZGF0YS5mcmFtZShQYXJ0eSA9IGRhdGExKVxuICBkYXRhZnJhbWUyIDwtIGRhdGEyWydQYXJ0eSddXG4gIGFjY3VyYWN5PC1tZWFuKGRhdGFmcmFtZTEkUGFydHkgPT0gZGF0YWZyYW1lMiRQYXJ0eSlcbiAgcmV0dXJuKGFjY3VyYWN5KVxufSIsInNhbXBsZSI6IiNzIGltcG9ydCB0aGUgcnBhcnQgbGlicmFyeVxubGlicmFyeShycGFydClcblxuIyBJbXBvcnQgZGF0YXNldHMgKHRyYWluaW5nIGFuZCB0ZXN0aW5nKVxudHJhaW48LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvcGFydHlfdHJhaW4uY3N2XCIpXG50ZXN0PC1yZWFkLmNzdignaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvcGFydHlfdHJhaW5fdGVzdC5jc3YnKVxuXG50cmVlIDwtIHJwYXJ0KFBhcnR5IH4gLiwgZGF0YSA9IHRyYWluLCBtZXRob2QgPSBcImNsYXNzXCIpXG50cmVlXG5zdWJtaXNzaW9uPC0gcHJlZGljdCh0cmVlLCB0ZXN0LCB0eXBlPVwiY2xhc3NcIilcblxudmVyaWZ5KHN1Ym1pc3Npb24pXG4jIHdpbGwgcmV0dXJuIHRoZSBlcnJvciBvZiBzdHVkZW50IFN1Ym1pc3Npb24gKGFjY3VyYWN5IG9yIE1TRSkgdnMgdGhlIGhpZGRlbiB0ZXN0aW5nIGRhdGEgc2V0IHdoaWNoIHdpbGwgYmUgcGFydCBvZiBWRVJJRlkgYnV0IGhpZGRlbiBmcm9tIHRoZSBzdHVkZW50LiAifQ==
Predict whether sleep will be deep, shallow, little, or none at all
Description of the dataset: 14.5
eyJsYW5ndWFnZSI6InIiLCJwcmVfZXhlcmNpc2VfY29kZSI6InZlcmlmeSA8LSBmdW5jdGlvbihkYXRhMSkge1xuICBkYXRhMiA8LSByZWFkLmNzdignaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvc2xlZXBfdGVzdC5jc3YnKVxuICBkYXRhZnJhbWUxPC0gZGF0YS5mcmFtZShTbGVlcCA9IGRhdGExKVxuICBkYXRhZnJhbWUyIDwtIGRhdGEyWydTbGVlcCddXG4gIGFjY3VyYWN5PC1tZWFuKGRhdGFmcmFtZTEkU2xlZXAgPT0gZGF0YWZyYW1lMiRTbGVlcClcbiAgcmV0dXJuKGFjY3VyYWN5KVxufSIsInNhbXBsZSI6IiNzIGltcG9ydCB0aGUgcnBhcnQgbGlicmFyeVxubGlicmFyeShycGFydClcblxuIyBJbXBvcnQgZGF0YXNldHMgKHRyYWluaW5nIGFuZCB0ZXN0aW5nKVxudHJhaW48LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvc2xlZXBfdHJhaW4uY3N2XCIpXG50ZXN0PC1yZWFkLmNzdignaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvc2xlZXBfdHJhaW5fdGVzdC5jc3YnKVxuXG50cmVlIDwtIHJwYXJ0KFNsZWVwIH4gLiwgZGF0YSA9IHRyYWluLCBtZXRob2QgPSBcImNsYXNzXCIpXG50cmVlXG5zdWJtaXNzaW9uPC0gcHJlZGljdCh0cmVlLCB0ZXN0LCB0eXBlPVwiY2xhc3NcIilcblxudmVyaWZ5KHN1Ym1pc3Npb24pXG4jIHdpbGwgcmV0dXJuIHRoZSBlcnJvciBvZiBzdHVkZW50IFN1Ym1pc3Npb24gKGFjY3VyYWN5IG9yIE1TRSkgdnMgdGhlIGhpZGRlbiB0ZXN0aW5nIGRhdGEgc2V0IHdoaWNoIHdpbGwgYmUgcGFydCBvZiBWRVJJRlkgYnV0IGhpZGRlbiBmcm9tIHRoZSBzdHVkZW50LiAifQ==
Predict whether a local voter will vote for Anarchists, KnowNothings, or Royalists
Description of the dataset: 14.4
eyJsYW5ndWFnZSI6InIiLCJwcmVfZXhlcmNpc2VfY29kZSI6InZlcmlmeSA8LSBmdW5jdGlvbihkYXRhMSkge1xuICBkYXRhMiA8LSByZWFkLmNzdignaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvcGFydHlfdGVzdC5jc3YnKVxuICBkYXRhZnJhbWUxPC0gZGF0YS5mcmFtZShQYXJ0eSA9IGRhdGExKVxuICBkYXRhZnJhbWUyIDwtIGRhdGEyWydQYXJ0eSddXG4gIGFjY3VyYWN5PC1tZWFuKGRhdGFmcmFtZTEkUGFydHkgPT0gZGF0YWZyYW1lMiRQYXJ0eSlcbiAgcmV0dXJuKGFjY3VyYWN5KVxufSIsInNhbXBsZSI6IiNzIGltcG9ydCB0aGUgcnBhcnQgbGlicmFyeVxubGlicmFyeShycGFydClcblxuIyBJbXBvcnQgZGF0YXNldHMgKHRyYWluaW5nIGFuZCB0ZXN0aW5nKVxudHJhaW48LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvcGFydHlfdHJhaW4uY3N2XCIpXG50ZXN0PC1yZWFkLmNzdignaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvcGFydHlfdHJhaW5fdGVzdC5jc3YnKVxuXG50cmVlIDwtIHJwYXJ0KFBhcnR5IH4gLiwgZGF0YSA9IHRyYWluLCBtZXRob2QgPSBcImNsYXNzXCIpXG50cmVlXG5zdWJtaXNzaW9uPC0gcHJlZGljdCh0cmVlLCB0ZXN0LCB0eXBlPVwiY2xhc3NcIilcblxudmVyaWZ5KHN1Ym1pc3Npb24pXG4jIHdpbGwgcmV0dXJuIHRoZSBlcnJvciBvZiBzdHVkZW50IFN1Ym1pc3Npb24gKGFjY3VyYWN5IG9yIE1TRSkgdnMgdGhlIGhpZGRlbiB0ZXN0aW5nIGRhdGEgc2V0IHdoaWNoIHdpbGwgYmUgcGFydCBvZiBWRVJJRlkgYnV0IGhpZGRlbiBmcm9tIHRoZSBzdHVkZW50LiAifQ==
Predict student's grade
Description of the dataset: 14.7
eyJsYW5ndWFnZSI6InIiLCJwcmVfZXhlcmNpc2VfY29kZSI6InZlcmlmeSA8LSBmdW5jdGlvbihkYXRhMSkge1xuICBkYXRhMiA8LSByZWFkLmNzdignaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvR3JhZGVfdGVzdC5jc3YnKVxuICBkYXRhZnJhbWUxPC0gZGF0YS5mcmFtZShHcmFkZSA9IGRhdGExKVxuICBkYXRhZnJhbWUyIDwtIGRhdGEyWydHcmFkZSddXG4gIGFjY3VyYWN5PC1tZWFuKGRhdGFmcmFtZTEkR3JhZGUgPT0gZGF0YWZyYW1lMiRHcmFkZSlcbiAgcmV0dXJuKGFjY3VyYWN5KVxufSIsInNhbXBsZSI6IiNzIGltcG9ydCB0aGUgcnBhcnQgbGlicmFyeVxubGlicmFyeShycGFydClcblxuIyBJbXBvcnQgZGF0YXNldHMgKHRyYWluaW5nIGFuZCB0ZXN0aW5nKVxudHJhaW48LXJlYWQuY3N2KFwiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvR3JhZGVfdHJhaW4uY3N2XCIpXG50ZXN0PC1yZWFkLmNzdignaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvR3JhZGVfdHJhaW5fdGVzdC5jc3YnKVxuXG50cmVlIDwtIHJwYXJ0KEdyYWRlIH4gLiwgZGF0YSA9IHRyYWluLCBtZXRob2QgPSBcImNsYXNzXCIpXG50cmVlXG5zdWJtaXNzaW9uPC0gcHJlZGljdCh0cmVlLCB0ZXN0LCB0eXBlPVwiY2xhc3NcIilcblxudmVyaWZ5KHN1Ym1pc3Npb24pXG4jIHdpbGwgcmV0dXJuIHRoZSBlcnJvciBvZiBzdHVkZW50IFN1Ym1pc3Npb24gKGFjY3VyYWN5IG9yIE1TRSkgdnMgdGhlIGhpZGRlbiB0ZXN0aW5nIGRhdGEgc2V0IHdoaWNoIHdpbGwgYmUgcGFydCBvZiBWRVJJRlkgYnV0IGhpZGRlbiBmcm9tIHRoZSBzdHVkZW50LiAifQ==
Predict Titanic passenger's survival
Description of the dataset: 14.10
eyJsYW5ndWFnZSI6InIiLCJwcmVfZXhlcmNpc2VfY29kZSI6InZlcmlmeSA8LSBmdW5jdGlvbihkYXRhMSkge1xuICBkYXRhMiA8LSByZWFkLmNzdignaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2Rldjc3OTYvZGF0YTEwMV90dXRvcmlhbC9tYWluL2ZpbGVzL2RhdGFzZXQvdGl0YW5pY190ZXN0LmNzdicpXG4gIGRhdGFmcmFtZTE8LSBkYXRhLmZyYW1lKFN1cnZpdmVkID0gZGF0YTEpXG4gIGRhdGFmcmFtZTIgPC0gZGF0YTJbJ1N1cnZpdmVkJ11cbiAgYWNjdXJhY3k8LW1lYW4oZGF0YWZyYW1lMSRTdXJ2aXZlZCA9PSBkYXRhZnJhbWUyJFN1cnZpdmVkKVxuICByZXR1cm4oYWNjdXJhY3kpXG59Iiwic2FtcGxlIjoiI3MgaW1wb3J0IHRoZSBycGFydCBsaWJyYXJ5XG5saWJyYXJ5KHJwYXJ0KVxuXG4jIEltcG9ydCBkYXRhc2V0cyAodHJhaW5pbmcgYW5kIHRlc3RpbmcpXG50cmFpbjwtcmVhZC5jc3YoXCJodHRwczovL3Jhdy5naXRodWJ1c2VyY29udGVudC5jb20vZGV2Nzc5Ni9kYXRhMTAxX3R1dG9yaWFsL21haW4vZmlsZXMvZGF0YXNldC90aXRhbmljX3RyYWluLmNzdlwiKVxudGVzdDwtcmVhZC5jc3YoJ2h0dHBzOi8vcmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbS9kZXY3Nzk2L2RhdGExMDFfdHV0b3JpYWwvbWFpbi9maWxlcy9kYXRhc2V0L3RpdGFuaWNfdHJhaW5fdGVzdC5jc3YnKVxuXG50cmVlIDwtIHJwYXJ0KFN1cnZpdmVkIH4gLiwgZGF0YSA9IHRyYWluLCBtZXRob2QgPSBcImNsYXNzXCIpXG50cmVlXG5zdWJtaXNzaW9uPC0gcHJlZGljdCh0cmVlLCB0ZXN0LCB0eXBlPVwiY2xhc3NcIilcblxudmVyaWZ5KHN1Ym1pc3Npb24pXG4jIHdpbGwgcmV0dXJuIHRoZSBlcnJvciBvZiBzdHVkZW50IFN1Ym1pc3Npb24gKGFjY3VyYWN5IG9yIE1TRSkgdnMgdGhlIGhpZGRlbiB0ZXN0aW5nIGRhdGEgc2V0IHdoaWNoIHdpbGwgYmUgcGFydCBvZiBWRVJJRlkgYnV0IGhpZGRlbiBmcm9tIHRoZSBzdHVkZW50LiAifQ==