Good morning! I am a math educator from California State University, Fullerton who teaches elementary, middle school, and high school future/current teachers of math and stats. My students and I are using your tool in all my classes. Many of them are new to CODAP and are enjoying how easy it is to start playing around with data. Some current teachers are already thinking about using CODAP with their own students. Thank you so much for creating this!
We had a question regarding the Plugin to the Microportal that accesses the decennial census and American Community Survey data, which is very cool. How does CODAP do the random sampling?
Any information on how CODAP pulls from these data sets (and the size of the actual data sets) would be most appreciated! Thank you!
Hi,
Thanks for this question! It has an interesting, if somewhat complicated, answer.
When the Census Bureau does a survey, whether it’s the decennial census or the more frequent American Community Survey (ACS), a “weight” gets assigned to each person. This number is the number of people in the actual population that this person represents. For a decennial census, this weight should be 1 since each person is just representing themselves, but it turns out that even for the decennial census adjustments need to be made.
For the ACS, with its much smaller sample sizes and a frequent goal of being able to answer specific questions about subpopulations, weights vary widely within the sample. One person might stand in for 1,000 people while another, as a member of a subpopulation for which detailed information is sought, might stand in for only 100.
For students, actually for most people, weights add a layer of confusion. If you’re using more statistically sophisticated software than CODAP, you avoid the confusion by specifying the variable that represents weight, and the software does the rest. We didn’t want this black box capability for students working with census microdata obtained through our portal, so we worked out another solution which we validated with a statistician at IPUMS in Minneapolis.
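To make concrete what "the software does the rest" means, here is a minimal Python sketch of a weighted mean. The incomes and weights are made-up illustrative numbers, not census data, and `weighted_mean` is a hypothetical helper rather than any particular package's function.

```python
def weighted_mean(values, weights):
    """Mean of values where each value counts as many times as its weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Made-up example: three sampled people, their incomes, and their weights.
incomes = [30_000, 50_000, 250_000]
weights = [1_000, 1_000, 100]

unweighted = sum(incomes) / len(incomes)   # treats every record equally
weighted = weighted_mean(incomes, weights)  # each record counts "weight" times

print(unweighted, weighted)
```

Ignoring the weights lets the high-income record, which stands in for relatively few people, pull the average up far more than it should.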
Basically, we downloaded our samples from IPUMS, with each person having a weight, and then modified the portal’s sampling process so that it is as if each person appears the number of times specified by their weight. So if a person’s weight is 1,000, it is as if 1,000 copies of that person are present. With this charade in place we can simulate simple random sampling, in which each person in the population has an equal probability of appearing in the sample.
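The idea can be sketched in Python as follows. This is not the portal's actual code; it assumes draws are made with replacement, and the function names, records, and weights are all hypothetical. It shows that drawing records in proportion to their weights behaves like drawing uniformly from a list in which each record is literally repeated "weight" times.

```python
import random
from collections import Counter


def sample_by_weight(records, weights, n, seed=None):
    """Draw n records with replacement, with probability proportional to weight."""
    rng = random.Random(seed)
    return rng.choices(records, weights=weights, k=n)


def sample_by_expansion(records, weights, n, seed=None):
    """Same idea done literally: repeat each record 'weight' times, then draw uniformly."""
    rng = random.Random(seed)
    expanded = [r for r, w in zip(records, weights) for _ in range(w)]
    return [rng.choice(expanded) for _ in range(n)]


# Made-up example: record "A" stands in for 1,000 people, "B" and "C" for 100 each.
records = ["A", "B", "C"]
weights = [1000, 100, 100]

counts_weighted = Counter(sample_by_weight(records, weights, 12_000, seed=1))
counts_expanded = Counter(sample_by_expansion(records, weights, 12_000, seed=1))

# Both approaches should give "A" a share close to 1000 / 1200.
print(counts_weighted["A"] / 12_000, counts_expanded["A"] / 12_000)
```

The weighted version avoids building the expanded list, which matters when the weights add up to millions of people.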
We think that this is a good solution for our target audience of grades 5–14 because they already have plenty on their cognitive plates.
There is a downside, of course, which is that a given person may appear more than once in a sample. This is noticeable if the person is an outlier in some respect, e.g., has a very high income. It is more likely when sampling from states with lower populations.
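A rough way to see the effect of population size, assuming draws are made with replacement in proportion to weight (an assumption about the mechanics, not a description of the portal's code): the chance that one particular record shows up at least twice follows from the binomial distribution. The weight, sample size, and population totals below are made-up illustrative numbers, and `prob_repeat` is a hypothetical helper.

```python
def prob_repeat(weight, total_weight, n):
    """Probability that one record is drawn at least twice in n weighted draws."""
    p = weight / total_weight                 # chance of this record on a single draw
    p_zero = (1 - p) ** n                     # never drawn
    p_one = n * p * (1 - p) ** (n - 1)        # drawn exactly once
    return 1 - p_zero - p_one


# Same record weight and sample size, two hypothetical population totals.
small_state = prob_repeat(weight=100, total_weight=600_000, n=1_000)
large_state = prob_repeat(weight=100, total_weight=39_000_000, n=1_000)

print(small_state, large_state)
```

With the same weight and sample size, the record is a much larger share of the smaller population, so repeats are far more likely there.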
Does this help? What has come of your conversations with your students about census and ACS sampling? Have you had to introduce the concept of stratified sampling that leads to an assigned weight for each person?
Bill

