Consider the dataset “D1.2 Credit card defaults.csv” (described in C1.2). This dataset contains
information about credit card consumers, in particular, their default behavior. Correspondingly,
the key variable in the dataset is “defaultpaymentnextmonth” (call this variable “y”), a
dichotomous variable that indicates whether a customer defaulted on his/her debt. There are 23
other variables that can be used to predict this outcome. For simplicity, we will refer to the set
containing all these variables as “X”.
Using this data, perform the following tasks:
1. [3 points] Generate a random training/validation index that implements a 70/30 split.
• Use a random seed of your choice.
2. [7 points] Estimate two logistic specifications that allow you to generate out-of-sample
predictions of y. Take the following points into account:
• You choose the variables X that enter each model specification. These variables X
can be continuous or categorical. Make sure continuous and categorical variables
are entered appropriately into the models.
• Specify model 1 as the simplest of the two. This model must include at least 5
• Specify model 2 as the richer/more flexible of the two. Control flexibility through
the set of X variables used. Include at least one variable interaction. [An interaction
of two variables, x1 and x2, would be x3 = x1*x2.]
3. [5 points] Do any of your models exhibit signs of overfitting? Explain.
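The three tasks above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not a model answer. The data frame is simulated as a stand-in for the real file. Every column name except defaultpaymentnextmonth is an assumption, as are the seed (123), the choice of predictors, and the use of log loss as the fit measure.

```r
# Hypothetical stand-in data. With the real file, replace this block with
#   d <- read.csv("D1.2 Credit card defaults.csv")
# All column names except defaultpaymentnextmonth are assumptions.
set.seed(123)  # arbitrary seed; any value works
n <- 2000
d <- data.frame(
  LIMIT_BAL = runif(n, 10000, 500000),
  AGE       = sample(21:70, n, replace = TRUE),
  SEX       = sample(1:2, n, replace = TRUE),
  EDUCATION = sample(1:4, n, replace = TRUE),
  PAY_0     = sample(-1:3, n, replace = TRUE),
  BILL_AMT1 = runif(n, 0, 200000)
)
d$defaultpaymentnextmonth <- rbinom(n, 1, plogis(-1.5 + 0.6 * d$PAY_0))

# Categorical variables enter as factors so glm() builds dummy variables;
# the remaining predictors are treated as continuous in this sketch.
d$SEX       <- factor(d$SEX)
d$EDUCATION <- factor(d$EDUCATION)

# Task 1: random 70/30 training/validation index.
train_idx <- sample(seq_len(nrow(d)), size = floor(0.7 * nrow(d)))
train <- d[train_idx, ]
valid <- d[-train_idx, ]

# Task 2: two logistic specifications.
# Model 1 is the simpler one (five predictors in this sketch).
m1 <- glm(defaultpaymentnextmonth ~ LIMIT_BAL + AGE + SEX + EDUCATION + PAY_0,
          data = train, family = binomial)
# Model 2 adds one more predictor and one interaction (PAY_0 * LIMIT_BAL).
m2 <- glm(defaultpaymentnextmonth ~ LIMIT_BAL + AGE + SEX + EDUCATION + PAY_0 +
            BILL_AMT1 + PAY_0:LIMIT_BAL,
          data = train, family = binomial)

# Task 3: compare in-sample and out-of-sample fit.
# Log loss is one possible measure; lower is better.
log_loss <- function(model, data) {
  p <- predict(model, newdata = data, type = "response")
  p <- pmin(pmax(p, 1e-15), 1 - 1e-15)  # guard against log(0)
  y <- data$defaultpaymentnextmonth
  -mean(y * log(p) + (1 - y) * log(1 - p))
}
results <- data.frame(
  model = c("m1", "m2"),
  train = c(log_loss(m1, train), log_loss(m2, train)),
  valid = c(log_loss(m1, valid), log_loss(m2, valid))
)
print(results)
```

A validation loss that is clearly worse than the training loss, especially for the richer model, is one common sign of overfitting. Because model 1 is nested in model 2, model 2 will fit the training data at least as well, so the informative comparison is on the validation set.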
Submit two files (one submission per individual):
1. Slide Deck (MS PowerPoint or PDF)
▪ In the slide deck, I expect you to present results in an executive way – you need to
• what is the goal (question/problem at hand)
• what you did to achieve the goal (analysis procedures)
• why you did it (rationales behind key steps)
• what you obtained (results)
▪ Use as many slides as you need.
▪ The title page must include your name.
▪ If you have worked or discussed with someone else, please also include their name(s) on
a separate line.
2. R script file containing the code that you used for your analysis.
▪ Include comments in the script to help the TA follow your procedures.
▪ The script file should be understood as a companion: you are encouraged to include
screenshots of the command lines (with line numbers) in your slide deck to
demonstrate your key steps. This way the TAs can easily go back and double-check that
the answers in your slide deck are well supported.