Absolute value of missing data is zero

This topic has 5 replies, 3 voices, and was last updated 3 years, 11 months ago by Jacob Sagrans.

February 13, 2020 at 2:50 pm #1318

Kathy Shafer
Participant

Hi Everyone,

A few weeks ago, I was teaching MAD (mean absolute deviation) with the Mammal data set. After we worked through the steps to find the MAD in the case table and plots, a student noticed an error: the absolute deviations for the three mammals with missing data were calculated as zero hours. The actual MAD is 4.73 hours, but we got 4.2 hours.
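To see why zeros in the absolute deviation column pull the MAD down, here is a minimal Python sketch on made-up numbers (not the Mammal data). It assumes an empty cell can be modeled as None, and the function names are hypothetical, not CODAP's. When |missing - mean| comes back as 0 instead of missing, those zeros are still averaged in, so the result equals the correct MAD scaled by (number of present values) / (number of cases).

```python
from statistics import fmean

# Assumption for this sketch: None stands in for an empty (missing) cell.
MISSING = None

def mad(values):
    """Mean absolute deviation that skips missing values (the intended behavior)."""
    present = [v for v in values if v is not MISSING]
    m = fmean(present)
    return fmean(abs(v - m) for v in present)

def mad_with_zero_filled_deviations(values):
    """Mimics the reported bug: the absolute deviation of a missing value
    comes back as 0, and those zeros are averaged in with the real deviations."""
    present = [v for v in values if v is not MISSING]
    m = fmean(present)
    deviations = [abs(v - m) if v is not MISSING else 0.0 for v in values]
    return fmean(deviations)

# Hypothetical data: five known values and one missing value.
data = [2, 4, 6, 8, 10, MISSING]
print(mad(data))                              # 2.4
print(mad_with_zero_filled_deviations(data))  # 2.0
```

Here the buggy value is 2.4 × 5/6 = 2.0, the same kind of shrinkage the class saw.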

This was a great teaching moment, as I talked with the pre-service teachers about examining data sets (here it made sense to delete the three cases).

When I plotted the MAD on a new dot plot, the statistic was accurate. Adding a count to the plot made the problem a little more obvious.

My question is: why did the absolute deviation column report zero when the deviation column was “empty” for those mammals? This would be tricky to notice in a large data set.

Here is a link to the activity.

Cheers,
Kathy

February 13, 2020 at 4:00 pm #1319

Bill Finzer
Keymaster

Hi Kathy,

Thanks for this! You and your students found a bug in the abs function. The difference of the missing value and the mean is correctly evaluated as a missing value, but abs incorrectly reports zero instead of missing. Please convey our thanks to your class!
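The distinction described here, a subtraction that correctly propagates the missing value versus an abs that collapses it to zero, can be sketched as follows. This is an illustration only: it assumes a missing value is represented as None, and sub, abs_buggy, and abs_fixed are hypothetical names, not CODAP's actual formula code.

```python
from typing import Optional

def sub(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Difference that propagates missing values, as subtraction already did."""
    if a is None or b is None:
        return None
    return a - b

def abs_buggy(x: Optional[float]) -> float:
    # Illustrates the reported bug: a missing input is coerced to 0.
    return abs(x) if x is not None else 0

def abs_fixed(x: Optional[float]) -> Optional[float]:
    # Intended behavior: a missing input stays missing.
    return None if x is None else abs(x)

deviation = sub(None, 5.0)   # missing value minus the mean stays missing
print(abs_buggy(deviation))  # 0
print(abs_fixed(deviation))  # None
```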

The bug fix will probably appear in less than two weeks. I’m sorry your teaching moment will be taken away. 😉

Bill

December 2, 2020 at 6:22 pm #6067

Jacob Sagrans
Participant

I just noticed something odd related to this that makes me wonder how CODAP treats missing values. I found a situation where a missing value seems to be treated as zero in a particular calculation, while at other times it is ignored. Look at the following three images and note especially the change in the intercept of the least-squares line on the graph. At first, when a new case is added, the % population and % cases for white people seem to be treated as zero, because the intercept changes from -23 to -16. Then I added some data (but not the % population and % cases for white people), and the intercept went back to -23. So in the second image the missing data seems to be treated as zero, whereas in the third image it seems to be ignored.

December 2, 2020 at 6:36 pm #6074

Jacob Sagrans
Participant

Interestingly, if I add a case to the end of a table and leave all of its attributes blank, the calculation changes as if a zero value had been added. But if I then add another case and fill in at least some of its attributes, the missing data in the previous case is suddenly ignored rather than treated as zero.
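An intercept shift like this is what happens when a blank case enters a least-squares fit as the point (0, 0) instead of being dropped. Below is a small sketch with made-up points; representing missing values as None and the function names are assumptions for illustration, not CODAP's implementation.

```python
from typing import List, Optional, Tuple

def least_squares(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept for paired numeric lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_ignoring_missing(xs: List[Optional[float]], ys: List[Optional[float]]):
    """Intended behavior: drop any case where either coordinate is missing."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    return least_squares([p[0] for p in pairs], [p[1] for p in pairs])

def fit_treating_missing_as_zero(xs: List[Optional[float]], ys: List[Optional[float]]):
    """The behavior seen right after adding a blank case: missing becomes 0."""
    return least_squares([0.0 if x is None else x for x in xs],
                         [0.0 if y is None else y for y in ys])

# Hypothetical data: three complete cases plus one blank case.
xs = [1.0, 2.0, 3.0, None]
ys = [3.0, 5.0, 7.0, None]
print(fit_ignoring_missing(xs, ys))          # (2.0, 1.0)
print(fit_treating_missing_as_zero(xs, ys))  # slope about 2.3, intercept about 0.3
```

With the blank case dropped, the fit recovers slope 2 and intercept 1; treating it as (0, 0) pulls the intercept down to about 0.3.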

Just to be safe, is there something I can type into a cell, instead of zero or another number, that tells CODAP “do not use this case when making calculations involving this attribute”?

December 2, 2020 at 6:51 pm #6075

Bill Finzer
Keymaster

Hi Jacob,

Amazing catch! It certainly appears to be a bug. I’ll try to replicate it and log it.

As to how missing values are (supposed to be) treated: they are never treated as zero, and they don’t appear in plots except in legend coloration, where they are shown in gray. Calculations in plots or attributes ignore missing values. One exception I can think of is the formula count(!test). Suppose test is an attribute with one missing value. Then the result of this formula will be 1, because the exclamation point signifies negation, and negation returns true for a missing value.
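The count(!test) exception can be modeled in a few lines. This sketch assumes, as the example implies, that count tallies the cases where the expression is true and that the attribute's other values are true; negate and count_true are hypothetical helpers, not CODAP functions, and None stands in for a missing value.

```python
from typing import List, Optional

def negate(value: Optional[bool]) -> bool:
    # As described above: "!" returns true for a missing value.
    if value is None:
        return True
    return not value

def count_true(values: List[bool]) -> int:
    # Assumption: count() tallies the cases where the expression is true.
    return sum(1 for v in values if v is True)

test = [True, True, None, True]  # hypothetical attribute with one missing value
print(count_true([negate(v) for v in test]))  # 1
```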

December 3, 2020 at 1:57 pm #6083

Jacob Sagrans
Participant

Thank you, Bill! Okay, I am less worried now. I don’t think I would have noticed this, except that this is the first time I’ve worked in CODAP with a data set that has a fair amount of missing data.