Gov 50: Data // Final Project
For the Harvard course Gov 50: Data, my team and I analyzed the social condition of Fall 2020 first-year students at Harvard with special attention to the impact of COVID-19. Beyond cleaning 10,200+ data points, conducting ~400 surveys, and building visualizations, I coded a series of models using regression and probability distributions in R to estimate conditional disparities in social satisfaction. I’ve included my direct contributions below, but you can check out our full project site and repository.
TL;DR: off-campus, remote students had a worse social experience than their on-campus peers. Yard first-years tended to have better social experiences and more social relationships than Quad first-years. Having more relationships aligned with better social satisfaction.
All models are wrong, but hopefully mine are useful!
Context
Our flagship predictive model regressed Satisfaction on Group Size, On Campus, and the interaction between the two. “Group Size” indicates the number of virtual relationships a first-year translated into in-person connections, and “On Campus” indicates whether or not the first-year student lived on campus during Fall 2020. Satisfaction represents the self-reported level of social satisfaction, with levels "Very Dissatisfied," "Dissatisfied," "Neutral," "Satisfied," and "Very Satisfied" coded in our data as "-2," "-1," "0," "1," and "2," respectively.
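Written out, the model described above takes roughly the following form. This is a sketch rather than our exact specification: the symbols (beta_0 through beta_3, the student index i, and the error term) are notation introduced here for illustration, and the normal error is an assumption.

```latex
\[
\text{satisfaction}_i
  = \beta_0
  + \beta_1 \,\text{group\_size}_i
  + \beta_2 \,\text{on\_campus}_i
  + \beta_3 \,(\text{group\_size}_i \times \text{on\_campus}_i)
  + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
\]
```

Here on_campus is 1 for students living on campus and 0 otherwise, so the interaction term drops out entirely for off-campus students.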
Regression Table
Our output table from our flagship model is included above. The beta value of our Intercept is the predicted satisfaction for an off-campus student whose "Group Size" is zero. Because the model includes an interaction, each of the other two main coefficients is conditional on the other predictor being zero. The beta value for group_size is the predicted change in satisfaction for each one-unit increase in "Group Size" among off-campus students. The beta value for on_campus is the predicted difference in satisfaction between an on-campus and an off-campus student when "Group Size" is zero.
Our interaction term is group_size*on_campus, which we include because we suspect the effect of one predictor on satisfaction depends on the value of the other (for instance, we assume Group Size values are dramatically smaller for students who live off campus, since they lack the opportunity to turn a virtual relationship into an in-person one). The interaction term must be interpreted in the context of the other predictors. Adding the beta values for the Intercept, group_size, on_campus, and the interaction term gives about 0.11, the predicted satisfaction for an on-campus student with a Group Size of 1. The Intercept alone, about -0.66, is the prediction for an off-campus student with a Group Size of 0. In other words, if we increase a student's group_size from 0 to 1 and change campus status from off to on, predicted satisfaction moves from -0.66 to 0.11. Such a change from a largely negative to a positive social experience during the hybrid COVID semester may warrant renewed attention from the administration.
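The arithmetic behind that comparison can be sketched in a few lines. The snippet is in Python purely for illustration (the project itself was written in R), and the function name and dictionary keys are made up here. Only the intercept (-0.66) and the four-term total (about 0.11) come from the text; the three remaining coefficients are placeholders chosen so the total matches, not our fitted estimates.

```python
def predicted_satisfaction(coefs, group_size, on_campus):
    """Linear predictor for a model with a group_size x on_campus interaction."""
    return (
        coefs["intercept"]
        + coefs["group_size"] * group_size
        + coefs["on_campus"] * on_campus
        + coefs["interaction"] * group_size * on_campus
    )


# Intercept is from the text; the other three values are PLACEHOLDERS
# picked only so that all four terms sum to roughly 0.11.
coefs = {
    "intercept": -0.66,
    "group_size": 0.05,    # placeholder
    "on_campus": 0.70,     # placeholder
    "interaction": 0.02,   # placeholder
}

# Off-campus student with Group Size 0: only the intercept contributes.
baseline = predicted_satisfaction(coefs, group_size=0, on_campus=0)

# On-campus student with Group Size 1: all four terms contribute.
shifted = predicted_satisfaction(coefs, group_size=1, on_campus=1)

print(round(baseline, 2), round(shifted, 2))
```

With an on-campus student, each extra relationship adds both the group_size coefficient and the interaction coefficient, which is why the interaction cannot be read in isolation.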
Visualization
The first two figures show posterior distributions of estimated average satisfaction, that is, of conditional expectations. In Figure 1, we examine the estimated average satisfaction for on-campus students with Group Sizes of 0 through 10, 15, and 20. As Group Size increases, estimated average social satisfaction tends to increase as well.
In Figure 2, we examine the predicted average difference in social satisfaction between on-campus students with a Group Size of 5 and those with a Group Size of 0. The median predicted average difference is roughly 0.27 in favor of those with the larger group size.
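Under a linear model with an interaction, that difference has a simple form. The derivation below is a sketch that assumes the flagship model's structure; the symbols beta_0 (Intercept), beta_1 (group_size), beta_2 (on_campus), and beta_3 (interaction) are notation introduced here.

```latex
\begin{align*}
\mathbb{E}[\text{satisfaction} \mid \text{group\_size}=5,\ \text{on\_campus}=1]
  &= \beta_0 + 5\beta_1 + \beta_2 + 5\beta_3 \\
\mathbb{E}[\text{satisfaction} \mid \text{group\_size}=0,\ \text{on\_campus}=1]
  &= \beta_0 + \beta_2 \\
\text{difference} &= 5\,(\beta_1 + \beta_3)
\end{align*}
```

If the model is linear in this way, a median difference of roughly 0.27 corresponds to about 0.27 / 5 ≈ 0.054 points of satisfaction per additional relationship for on-campus students.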
Figures 3 to 5 take a different approach: rather than showing posterior distributions of conditional averages, they show posterior probability distributions of the predicted outcomes themselves. Figure 3, using our flagship model, displays the predicted distributions of social satisfaction for students living on campus and for students living off campus. There is a clear difference between the two, with off-campus students expected to have a decidedly more negative social experience than their on-campus counterparts.
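One way to read the distinction between the two kinds of figures: a posterior for an average reflects only uncertainty about that average, while a posterior for a predicted outcome also includes student-to-student variation, so it is much wider. The Python sketch below is illustrative only (the project itself was written in R); the mean, its uncertainty, and the residual spread are made-up numbers, not estimates from our model.

```python
import random
import statistics


def simulate(n_draws=4000, seed=50):
    """Return draws of an expected value and of an individual predicted outcome."""
    rng = random.Random(seed)
    expected, predicted = [], []
    for _ in range(n_draws):
        # HYPOTHETICAL posterior draw of the group's average satisfaction.
        mu = rng.gauss(0.05, 0.08)
        # HYPOTHETICAL residual standard deviation (student-to-student noise).
        sigma = 1.0
        expected.append(mu)                    # uncertainty about the average only
        predicted.append(rng.gauss(mu, sigma)) # average plus individual noise
    return expected, predicted


expected, predicted = simulate()

# The predicted-outcome draws are far more spread out than the expected-value draws.
print(statistics.stdev(expected), statistics.stdev(predicted))
```

This is one reason distributions of predicted outcomes for two groups can overlap heavily even when the groups' estimated averages are clearly separated.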
For Figure 4, we fit a new model that distinguishes between two on-campus residential locations rather than between on- and off-campus students, regressing satisfaction on the location variable from our dataset. We wanted to analyze differences between those living in the Yard and those living in the Quad (characteristically upperclassmen housing situated quite far from central campus), given that a significant portion of first-year students lived in the Quad due to de-densified housing. Figure 4 reveals a higher predicted distribution of satisfaction for those living in the Yard than for those in the Quad, although there is immense overlap and the difference is not nearly as dramatic as the on- vs. off-campus difference in Figure 3.
Figure 5 uses another new model, similar to the one behind Figure 4 except that it regresses Group Size on on-campus residential location. The predicted distributions of Group Size for Quad and Yard first-years show a disparity parallel to the one for satisfaction. Yard students tend to have a higher predicted Group Size, although there is immense overlap. This makes sense, given that Yard students live in centrally located housing among a greater concentration of fellow first-years.