create random subset of data

  • #10425 Score: 0
    fshore@towson.edu
    Participant
    3 pts

    When you try to import a data file with more than 5000 rows, CODAP automatically shows a dialog box saying that a subset of the data would help avoid sluggishness, and offers options so that CODAP gives you a randomly selected subset with the number of cases you specify. (This is awesome!) I have a dataset of ~2500 cases and I want to easily create random subsets of size n=500 from it, each a unique dataset to be given to a different student group. How do I bring up that dialog box to create a subset (one at a time, saving the file each time), or is there another way? Thanks!

    #10518
    Bill Finzer
    Keymaster

    Hello, I can think of two ways:

    First Way

    1. Create an attribute, let’s call it r.
    2. Give it the formula random().
    3. Clicking on the attribute name, choose Delete Formula (Keeping Values).
    4. Again clicking on the attribute name, choose Sort Ascending. (Don’t worry that the values near the top appear to be zero. They are just rounding to zero with two digits of precision.)
    5. Click on the first row in the case table to select the first case.
    6. Scroll down to the 500th row and shift-click that row. Now you have the 500 cases you want.
    7. In the case table’s inspector panel, click on the trash can and select Delete Unselected Cases. This leaves behind your desired random sample.

    Note that in step 7 instead of deleting, you could set aside the unselected cases, leaving them there so you could get a new random sample by restoring the set aside cases, re-randomizing, re-sorting, and re-setting aside.
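
For readers who want to see the logic behind the First Way outside CODAP, here is a minimal Python sketch of the same idea: tag every case with a random value, sort by that value, and keep the first n. The function name, the `seed` parameter, and the stand-in list of 2500 integers are assumptions made for illustration; none of this is CODAP's own code.

```python
import random


def random_subset_by_sort_key(cases, n, seed=None):
    """Pick n cases by attaching a random key to each case and sorting."""
    rng = random.Random(seed)  # hypothetical seed, only for repeatable runs
    # Steps 1-3: give every case a fixed random value r.
    keyed = [(rng.random(), case) for case in cases]
    # Step 4: sort ascending by r.
    keyed.sort(key=lambda pair: pair[0])
    # Steps 5-7: keep the first n cases and drop the rest.
    return [case for _, case in keyed[:n]]


cases = list(range(2500))  # stand-in for a dataset of ~2500 rows
subset = random_subset_by_sort_key(cases, 500, seed=1)
print(len(subset))
```

Because each case appears exactly once in the sorted list, the subset cannot contain duplicates, which is the same property the steps above give you in the case table.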

    Second Way

    1. From the Plugins menu in the tool shelf, choose Sampler.
    2. From the buttons at the bottom, choose Collector. This fills the mixer with balls, one for each case in your dataset. They’re so densely packed you can’t see them as balls except for the top row.
    3. Edit the items and samples numbers to be 500 and 1 instead of 5 and 3.
    4. Choose the Options tab at the top.
    5. Click the without replacement option so you don’t get any duplicates.
    6. Go back to the Model tab and move the speed slider all the way to the right.
    7. Click the Start button to produce your sample.
    8. Delete all the attributes you don’t need: experiment, description, sample size, sample and output. The result is your desired sample of 500.
    9. Close the Sampler.
    10. From the Tables icon menu in the tool shelf, click the trash can next to your original dataset.

    Both ways have a lot of steps. Let us know if you have questions or problems.

    Bill

    #10517
    Dan Damelin
    Keymaster

    Because you want to give a different sample to each group, I would recommend the Second Way that Bill described above, because you can generate as many samples as you want by clicking the Start button multiple times.
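
As a rough illustration outside CODAP, repeated without-replacement draws can be sketched in Python as below. The function name, the `seed` parameter, and the choice of four groups are hypothetical; this is only a sketch of the sampling idea, not how the Sampler plugin is implemented.

```python
import random


def samples_for_groups(cases, n, groups, seed=None):
    """Draw one without-replacement sample of size n for each group."""
    rng = random.Random(seed)  # hypothetical seed, only for repeatable runs
    # Each draw starts again from the full list of cases, so a single
    # sample has no duplicates, but two groups may share some cases.
    return [rng.sample(cases, n) for _ in range(groups)]


cases = list(range(2500))  # stand-in for a dataset of ~2500 rows
group_samples = samples_for_groups(cases, 500, groups=4, seed=7)
print([len(s) for s in group_samples])
```

In this sketch the samples are drawn independently, so they differ from group to group but are not guaranteed to be disjoint.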
