With GAISE documents pushing (quite rightly) for more multivariate thinking, I’m wondering how I could make a CODAP file that lets students play around with multiple regression, even though there’s no formal functionality for it in CODAP. Hopefully nobody minds if I brainstorm “out loud” here and see if anyone chimes in.

Here’s my current idea: have a slider for each coefficient, and a column of predicted values constructed using a formula like

InterceptSlider + coeff1slider * xvar1 + coeff2slider * xvar2 + …

(not sure if I would have that pre-constructed, or ask students to construct it themselves), then construct a column of residuals (observed minus predicted). Have (or have them construct) a graph with x = prediction and y = residual. Have them use the Ruler option to show the mean & SD[footnote 1] of the residuals, and suggest that they play with the sliders to get the mean to be basically 0 and the SD as small as they can get it. Maybe have the slider values pre-set to something reasonable but not optimal.
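To make the slider idea concrete outside of CODAP, here is a rough Python sketch of what the predicted and residual columns would compute. The data values and the function name residual_summary are made up for illustration, and the three arguments stand in for the three sliders; this is not CODAP formula syntax.

```python
from statistics import mean, stdev

# Made-up toy data, purely for illustration (not from a real dataset).
xvar1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
xvar2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [4.7, 4.9, 9.6, 9.8, 14.4, 15.1]

def residual_summary(intercept, coeff1, coeff2):
    """Return (mean, SD) of the residuals for one setting of the three sliders."""
    predicted = [intercept + coeff1 * a + coeff2 * b for a, b in zip(xvar1, xvar2)]
    residuals = [obs - pred for obs, pred in zip(y, predicted)]
    # statistics.stdev divides by n - 1, the streamlined version from footnote 1.
    return mean(residuals), stdev(residuals)

# A "reasonable but not optimal" starting position for the sliders.
start_mean, start_sd = residual_summary(0.0, 1.0, 1.0)

# A position a student might reach after some dragging.
better_mean, better_sd = residual_summary(1.0, 1.5, 1.0)

print(f"start:  mean={start_mean:.3f}, SD={start_sd:.3f}")
print(f"better: mean={better_mean:.3f}, SD={better_sd:.3f}")
```

Each call to residual_summary plays the role of one slider position, so students would effectively be running this over and over by dragging.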

Then, have them open a previously minimized text box that gives the optimal coefficient values (perhaps quoting output from lm() in R). They copy those values into the sliders and see that the mean of the residuals ends up at 0 and the SD of the residuals ends up smaller than what they got by playing around.
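For anyone who wants to check the "optimal beats hand-tuned" claim without R, here is a minimal sketch that gets the least-squares coefficients by solving the normal equations in plain Python. The data are made up, the helper names (solve, least_squares, residuals) are my own, and the "guess" coefficients are just a hypothetical student attempt. In practice the numbers in the text box would come from lm().

```python
from statistics import mean, stdev

# Made-up toy data, purely for illustration (not from a real dataset).
xvar1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
xvar2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [4.7, 4.9, 9.6, 9.8, 14.4, 15.1]

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def least_squares(columns, response):
    """Least-squares coefficients (intercept first) from the normal equations X'X b = X'y."""
    design = [[1.0] + list(row) for row in zip(*columns)]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * obs for r, obs in zip(design, response)) for i in range(p)]
    return solve(xtx, xty)

def residuals(coeffs, columns, response):
    """Observed minus predicted, with coeffs given as [intercept, slope1, slope2, ...]."""
    return [obs - (coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row)))
            for row, obs in zip(zip(*columns), response)]

best = least_squares([xvar1, xvar2], y)
best_res = residuals(best, [xvar1, xvar2], y)
guess_res = residuals([0.5, 1.4, 1.1], [xvar1, xvar2], y)  # hypothetical student attempt

print("least-squares coefficients:", [round(c, 4) for c in best])
print("mean of residuals:", round(mean(best_res), 10))
print("SD (least squares) vs SD (guess):", round(stdev(best_res), 4), round(stdev(guess_res), 4))
```

With an intercept in the model, the least-squares residuals average to 0, and no slider setting can give a smaller residual SD, which is exactly what the students should see when they paste in the values.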

After that, they can spend time interpreting the coefficients, and perhaps construct predictions for new data points that weren’t in the dataset. Maybe also compute R^2. Computing the SE or CI for each coefficient would be too hard, but if the output from lm() in R is included in a text box, they could at least discuss or think about the SE or CI.
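As a rough sketch of those two follow-up computations, here is R^2 calculated as 1 - SSE/SST, plus a prediction for a point not in the data. Everything specific here is an assumption for illustration: the toy data, the coefficient values (standing in for whatever the sliders end up at), and the new point (3.5, 2.5).

```python
from statistics import mean

# Made-up toy data, purely for illustration (not from a real dataset).
xvar1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
xvar2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [4.7, 4.9, 9.6, 9.8, 14.4, 15.1]

# Hypothetical slider settings (not claimed to be the exact least-squares values).
intercept, coeff1, coeff2 = 1.0, 1.5, 1.0

def predict(a, b):
    """Predicted response for one pair of explanatory values."""
    return intercept + coeff1 * a + coeff2 * b

predicted = [predict(a, b) for a, b in zip(xvar1, xvar2)]

# R^2 = 1 - SSE/SST: the share of variation in y not left over in the residuals.
sse = sum((obs - pred) ** 2 for obs, pred in zip(y, predicted))
sst = sum((obs - mean(y)) ** 2 for obs in y)
r_squared = 1 - sse / sst

# Prediction for a new point that is not in the dataset.
new_prediction = predict(3.5, 2.5)

print(f"R^2 = {r_squared:.4f}")
print(f"prediction at (3.5, 2.5) = {new_prediction}")
```

One caveat: 1 - SSE/SST only matches the usual R^2 from lm() when the coefficients are the least-squares ones. For other slider settings it is still computable, but it can even go negative.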

Has anyone done this? Does anyone see issues that I’m not seeing, or have ideas for improving it?

Something similar could be done with a logistic-type regression or multiple regression, too!

[footnote 1: I know that technically we should divide by n minus the number of parameters rather than by n-1, so SD isn’t quite the right thing, but I’m willing to go for the streamlined approach in intro stat, leaving the better version for a later stats class.]
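To show how much the divisor in footnote 1 matters, here is a small sketch comparing the n-1 version (the ordinary SD) with the n-#params version. The residuals are made up and sum to zero, and n_params = 3 assumes an intercept plus two slopes.

```python
from math import sqrt
from statistics import stdev

# Hypothetical residuals, made up for illustration; they sum to zero.
residuals = [0.2, -0.1, 0.1, -0.2, -0.1, 0.1]
n_params = 3  # assumed model: intercept + 2 slopes

sse = sum(r ** 2 for r in residuals)

sd_n_minus_1 = stdev(residuals)                         # divides by n - 1
residual_se = sqrt(sse / (len(residuals) - n_params))   # divides by n - #params

print(f"SD with n-1 divisor:       {sd_n_minus_1:.4f}")
print(f"SD with n-#params divisor: {residual_se:.4f}")
```

With only six points the gap is noticeable. With a realistically sized class dataset the two versions would be much closer, which is part of why the streamlined approach seems defensible.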