Scrambler Help

[Screenshot: scrambler]

2022-07-28

Here we describe the current version of the scrambler plugin.

You can try all this yourself in this sample document.

(Want a task? That document is set up to compare 13-year-olds. Make it compare 10-year-olds!)

Background and an Example

The point of scrambling is to create a sampling distribution of some measure. For example, suppose that in your dataset it appears that 13-year-old boys are taller than 13-year-old girls. You want to assess whether it's plausible that the difference in means that you see could happen by chance.

To do that, you will make the "null hypothesis" real: you will break any association between Gender and Height by scrambling the values for one of those attributes. Then you look to see how different the boys and girls seem to be when the difference is just chance.

But one trial is not enough. Furthermore, you have to decide what, specifically, to look at to say that the boys are taller. In this situation, that means coming up with a number that represents how much taller the boys are.

This is very important, and bears highlighting:

You must create a measure of the effect you're seeing. It's not enough to say that boys are taller than girls; you have to say how much taller.

[Screenshot: scrambler]

In our example, we used the difference of means, called it dMeanHeights, and dragged it leftwards in the table. The CODAP formula looks like this:

mean(Height, Gender="Male") - mean(Height, Gender="Female")
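If it helps to see the same measure outside CODAP, here is a minimal Python sketch of it. This is only an illustration, not how the plugin computes anything: the function name d_mean_heights and the four heights are invented for the example.

```python
from statistics import mean

def d_mean_heights(heights, genders):
    """Mean height of the "Male" cases minus mean height of the "Female" cases."""
    male = [h for h, g in zip(heights, genders) if g == "Male"]
    female = [h for h, g in zip(heights, genders) if g == "Female"]
    return mean(male) - mean(female)

# Invented data (cm), just to show the call; not the sample document's values.
heights = [160.0, 158.0, 150.0, 152.0]
genders = ["Male", "Male", "Female", "Female"]

print(d_mean_heights(heights, genders))
```

A positive result means the "Male" group is taller on average; a negative result means the "Female" group is.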

We see how much taller boys are in the actual data (in our case, 5.87 cm in the mean). Then we will see how much taller they are when the data have all been scrambled. Because the scrambled values are assigned at random, sometimes the difference will be positive and sometimes negative (the "girls" will be taller).

But is it plausible that 5.87 could appear by chance?

Repeat this process a few hundred times and see. In this case, no: a scrambled difference that extreme is possible (after all, the real data could come up when you scramble), but it doesn't happen very often.
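The scramble-and-repeat idea can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the plugin's own code: here we shuffle the Gender labels (the plugin may scramble a different attribute), and the names diff_of_means and collect_measures, the seed, and the eight heights are all invented for the example.

```python
import random
from statistics import mean

def diff_of_means(heights, genders):
    """Mean height of the "Male" cases minus mean height of the "Female" cases."""
    male = [h for h, g in zip(heights, genders) if g == "Male"]
    female = [h for h, g in zip(heights, genders) if g == "Female"]
    return mean(male) - mean(female)

def collect_measures(heights, genders, n_scrambles=200, seed=None):
    """Scramble the Gender labels n_scrambles times; record the measure each time."""
    rng = random.Random(seed)
    measures = []
    for _ in range(n_scrambles):
        scrambled = list(genders)   # copy, so the real data stay untouched
        rng.shuffle(scrambled)      # break any Gender-Height association
        measures.append(diff_of_means(heights, scrambled))
    return measures

# Invented data (cm), not the sample document's values.
heights = [162.0, 158.5, 165.0, 160.0, 155.0, 157.5, 151.0, 159.0]
genders = ["Male"] * 4 + ["Female"] * 4

observed = diff_of_means(heights, genders)
measures = collect_measures(heights, genders, n_scrambles=200, seed=1)

print("observed difference:", observed)
print("largest scrambled difference:", max(measures))
```

Shuffling keeps the number of "Male" and "Female" labels the same; only which heights they are attached to changes. That is what makes the null hypothesis "real" in each scramble.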

Seeing the scrambled data

If you want to see what one of these scrambled datasets looks like, click the show scrambled button. You can make a graph and see how the "male" and "female" height distributions are more similar (but not identical). The graph will update as you scramble.

Analyzing your results

Make a graph of the measure from the "measures" table. You'll see the sampling distribution. The picture shows the results from 200 scrambles.

[Screenshot: scrambler measures]

You want to know what proportion of those measures are more extreme than your "test statistic" (which in our case is 5.87, the difference in mean heights).

Here's the trick:

Now you can see what percentage are on each side of the line.

Set the line to 5.87 (you might need to rescale) to see how unusual it is! (Chances are, very few of your measures, positive or negative, are that large.)
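The "percentage on each side of the line" is simple arithmetic, and a small Python sketch may make it concrete. The function names and the eight measures below are invented for illustration; only the 5.87 comes from our example. In CODAP you would read these percentages off the graph rather than compute them this way.

```python
def share_on_each_side(measures, line):
    """Return (fraction below the line, fraction at or above the line)."""
    n = len(measures)
    below = sum(1 for m in measures if m < line) / n
    at_or_above = sum(1 for m in measures if m >= line) / n
    return below, at_or_above

def share_at_least_as_large(measures, value):
    """Fraction of measures whose size, positive or negative, is at least |value|."""
    return sum(1 for m in measures if abs(m) >= abs(value)) / len(measures)

# Invented measures, standing in for a real "measures" table.
measures = [-6.5, -3.0, -1.0, -0.5, 0.5, 1.5, 2.0, 6.0]

below, at_or_above = share_on_each_side(measures, 5.87)
print("below the line:", below)
print("at or above the line:", at_or_above)
print("that large in either direction:", share_at_least_as_large(measures, 5.87))
```

The first function matches what you see with the line at 5.87. The second counts large negative differences as well, which is the "positive or negative" question.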

Why do things disappear?

If you change the structure of your dataset (for example, by making a new measure), it might change what gets scrambled or collected. In that case, the scrambler deletes your scrambled and measures datasets. This seems alarming, but it does not take long to re-collect any measures you might have taken. Besides, now you know they refer to the current version of the data.

The same thing happens if you change languages, because the names of the "top" attributes in the measures dataset change.


Designed and written by Tim Erickson, Senior Scientist, Epistemological Engineering. Thanks to Bill Finzer and the whole CODAP team at Concord. Additional thanks to Susanne Podworny and other colleagues at the University of Paderborn in Germany. Visit codap.xyz to see what else might be coming this way.