Reply To: create random subset of data

#10518
Bill Finzer
Keymaster

Hello,

I can think of two ways:

First Way

  1. Create an attribute, let’s call it r.
  2. Give it the formula random().
  3. Clicking on the attribute name, choose Delete Formula (Keeping Values).
  4. Again clicking on the attribute name, choose Sort Ascending. (Don’t worry that the values near the top appear to be zero. They are just rounding to zero with two digits of precision.)
  5. Click on the first row in the case table to select the first case.
  6. Scroll down to the 500th row and shift-click that row. Now you have the 500 cases you want.
  7. In the case table’s inspector panel, click on the trash can and select Delete Unselected Cases. This leaves behind your desired random sample.
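
If it helps to see why this works, here is a minimal Python sketch of the same idea outside of CODAP: give every case a random key, sort by that key, and keep the first 500. The `cases` list and the `sample_by_random_key` helper are made up for illustration and are not part of CODAP.

```python
import random

def sample_by_random_key(cases, n, seed=None):
    """Return n cases chosen at random, without duplicates.

    Mirrors the manual steps: attach a random key to each case,
    sort ascending by that key, and keep the first n.
    """
    rng = random.Random(seed)
    # The index i breaks ties so the cases themselves are never compared.
    keyed = [(rng.random(), i, case) for i, case in enumerate(cases)]
    keyed.sort()
    return [case for _, _, case in keyed[:n]]

# Hypothetical dataset of 2000 cases.
cases = [{"id": i} for i in range(2000)]
sample = sample_by_random_key(cases, 500, seed=1)
```

Because each case gets exactly one key, no case can appear twice in the result, which is why this way needs no separate "without replacement" setting.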

Note that in step 7, instead of deleting, you could set aside the unselected cases. That leaves them in the document, so you could get a new random sample by restoring the set-aside cases, re-randomizing, re-sorting, and setting aside again.

Second Way

  1. From the Plugins menu in the tool shelf, choose Sampler.
  2. From the buttons at the bottom, choose Collector. This fills the mixer with balls, one for each case in your dataset. They’re so densely packed you can’t see them as balls except for the top row.
  3. Edit the items and samples numbers to be 500 and 1 instead of 5 and 3.
  4. Choose the Options tab at the top.
  5. Click the without replacement option so you don’t get any duplicates.
  6. Go back to the Model tab and move the speed slider all the way to the right.
  7. Click the Start button to produce your sample.
  8. Delete all the attributes you don’t need: experiment, description, sample size, sample and output. The result is your desired sample of 500.
  9. Close the Sampler.
  10. From the Tables icon menu in the tool shelf, click the trash can next to your original dataset.
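
To illustrate what the without replacement option in step 5 changes, here is a small Python sketch using the standard `random` module. It is only an analogy for what the Sampler does, not its actual implementation, and `case_ids` is a made-up stand-in for your dataset.

```python
import random

rng = random.Random(42)

# Hypothetical dataset: 2000 cases identified by number.
case_ids = list(range(2000))

# Without replacement: each case can be drawn at most once.
without_replacement = rng.sample(case_ids, k=500)

# With replacement: a drawn case goes back in the mixer,
# so the same case can be drawn again.
with_replacement = rng.choices(case_ids, k=500)

print(len(set(without_replacement)))  # always 500 distinct cases
print(len(set(with_replacement)))     # typically fewer than 500
```

With 500 draws from a few thousand cases, sampling with replacement will almost always repeat some cases, so the option matters if you want 500 different ones.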

Both ways have a lot of steps. Let us know if you have questions or problems.

Bill