36383 - Randomly assign the observations in a data set to two or more groups

SAS® 9.4 TS1M1 or later

Beginning with SAS/STAT® 13.1 in SAS 9.4 TS1M1, the GROUPS= option in the PROC SURVEYSELECT statement randomly assigns observations to groups. If you specify a number of groups, the observations are divided among the groups equally, or as equally as possible. You can also specify a different size for each group in the GROUPS= option.

For example, suppose you want to divide the ten observations in the following data set into three groups.

      data one;
        do x=1 to 10;
          output;
        end;
        run;

Specifying the GROUPS=3 option in PROC SURVEYSELECT divides the ten observations into three groups as evenly as possible. The results of this example can be reproduced by specifying the same value in the SEED= option.

      proc surveyselect data=one groups=3 seed=49201 out=RandomGroups noprint;
        run;
      proc freq data=RandomGroups;
        tables GroupID;
        run;

Group ID Number
GroupID	Frequency	Percent	Cumulative Frequency	Cumulative Percent
1	3	30.00	3	30.00
2	3	30.00	6	60.00
3	4	40.00	10	100.00
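
The unequal group sizes mentioned above can be sketched as follows. This is a minimal illustration, not part of the original note: it assumes that GROUPS= accepts a parenthesized list of sizes and that the sizes must sum to the number of observations in the input data set (ten here). The output data set name UnequalGroups and the sizes 2, 3, and 5 are chosen only for this example.

      /* Assumption: group sizes are given as a list that sums to 10 */
      proc surveyselect data=one groups=(2 3 5) seed=49201 out=UnequalGroups noprint;
        run;
      /* Check the resulting group sizes */
      proc freq data=UnequalGroups;
        tables GroupID;
        run;

If the option behaves as assumed, PROC FREQ should report groups of two, three, and five observations.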

Releases before SAS® 9.4 TS1M1

Prior to SAS/STAT 13.1, you can use PROC SURVEYSELECT to randomly divide a data set into two groups as described in this note. For more than two groups, you can use PROC PLAN to randomly assign each observation to a group such that the groups are of equal size, or as equal as possible when the data set is not evenly divisible by the number of groups.
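
One common way to get a two-group split from PROC SURVEYSELECT is sketched below. It is not necessarily the method that the referenced note uses. It draws a simple random sample of half the observations and uses the OUTALL option to keep every observation, so the Selected variable (1 or 0) serves as the group indicator. The output data set name TwoGroups is hypothetical.

      /* Select half of the observations at random and keep all of them */
      /* in the output: Selected=1 marks one group, Selected=0 the other */
      proc surveyselect data=one out=TwoGroups method=srs samprate=0.5
                        seed=49201 outall noprint;
        run;
      proc freq data=TwoGroups;
        tables Selected;
        run;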

For example, suppose you want to divide the ten observations in data set ONE (above) into three groups. The following statements create data set A, which consists of four sets of three observations. Each set contains a random arrangement of the values 1, 2, and 3. Since three groups are desired, specify GROUP=3. Three sets would cover only nine observations, so you need four sets (4 × 3 = 12) to accommodate ten observations. Therefore, specify SET=4. The results of this example can be reproduced by specifying the same value in the SEED= option.

      proc plan seed=4233;
        factors set=4 group=3 / noprint;
        output out=a;
        run;
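
The value SET=4 above is the number of observations divided by the number of groups, rounded up. When the data set size is not known in advance, that value could be computed rather than typed in. The following sketch is one possible way to do so and is not part of the original note. The macro variable names NGROUPS and NSETS are hypothetical. It produces the same kind of data set A as the step above.

      %let ngroups=3;
      /* Read the observation count of ONE without reading any data, */
      /* then store ceil(n / number of groups) in macro variable NSETS */
      data _null_;
        if 0 then set one nobs=n;
        call symputx('nsets', ceil(n/&ngroups));
        stop;
        run;
      proc plan seed=4233;
        factors set=&nsets group=&ngroups / noprint;
        output out=a;
        run;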

In the following DATA step, the RANUNI function adds a random number between 0 and 1 to each observation. As before, using the same seed reproduces the results of this example. The IF statement stops the step after ten observations, which removes the two extra observations created by PROC PLAN.

      data a; 
        set a; 
        random=ranuni(2342);
        if _n_>10 then stop;
        run;

Sorting by the random variable randomizes the group numbers across the entire data set.

      proc sort data=a; 
        by random;
        run;

The final data set consisting of the ten observations with assigned group numbers is created by merging the randomized data set of group numbers with the original data set.

      data RandomGroups;
        merge one a;
        run;
      proc print;
        id x;
        var group;
        run;

x	group
1	1
2	2
3	2
4	3
5	2
6	3
7	1
8	3
9	2
10	1

PROC FREQ can be used to verify the sizes of the groups.

      proc freq data=RandomGroups;
        tables group;
        run;

group	Frequency	Percent	Cumulative Frequency	Cumulative Percent
1	3	30.00	3	30.00
2	4	40.00	7	70.00
3	3	30.00	10	100.00

Note that groups 1 and 3 each have three observations and group 2 was randomly given a fourth observation. The group assignment for each observation is completely random.

Unknown Number of Observations

Suppose you want each consecutive set of G observations, where G is the number of groups, to contribute exactly one randomly chosen observation to each group. This is often desired when the total number of observations is not initially known. In this example, if you did not know how many observations you would end up with, you might want to randomly assign the first three observations one to each of the groups, and then do the same for each later set of three observations as they become available. To do this, specify a sufficiently large value for SET in the PLAN step above and omit the DATA and SORT steps that follow it.

Suppose you have twelve subjects and want to assign them to three groups. You expect an unknown number of additional subjects to become available, and they will also need to be randomly assigned. When all observations are collected, you want the group sizes to be as equal as possible. The following statements produce random assignments, in sets of three, for up to 10 × 3 = 30 observations. As subjects become available after the 12th, they can be assigned to groups according to the plan. Within each additional set of three observations, one observation is randomly assigned to each group.

      data one;
        do id=1 to 12;
          output;
        end;
        run;
      proc plan seed=58349;
        factors set=10 group=3 / noprint;
        output out=a;
        run;
      data RandomGroups;
        merge one a;
        run;
      proc print;
        id id;
        var group;
        run;
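
As an illustration of assigning later arrivals according to the plan, the following sketch pairs three new subjects with rows 13 through 15 of data set A. It is not part of the original note. The data set names NewSubjects and NewAssignments are hypothetical, and the sketch assumes that the new subjects are numbered consecutively after the first twelve.

      /* Hypothetical new arrivals: subjects 13 to 15 */
      data NewSubjects;
        do id=13 to 15;
          output;
        end;
        run;
      /* One-to-one merge with the matching rows of the plan */
      data NewAssignments;
        merge NewSubjects a(firstobs=13 obs=15 keep=group);
        run;
      proc print data=NewAssignments;
        id id;
        var group;
        run;

Because rows 13 through 15 form one complete set in the plan, each of the three new subjects lands in a different group.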

If more than 30 observations become available, run PROC PLAN again with a different seed to generate more randomized sets of three.

      proc plan seed=39352;
        factors set=10 group=3 / noprint;
        output out=a;
        run;
      proc print noobs;
        var group;
        run;

Operating System and Release Information

Product Family	Product	System	SAS Release
Reported	Fixed*
SAS System	SAS/STAT	z/OS
OpenVMS VAX
Microsoft® Windows® for 64-Bit Itanium-based Systems
Microsoft Windows Server 2003 Datacenter 64-bit Edition
Microsoft Windows Server 2003 Enterprise 64-bit Edition
Microsoft Windows XP 64-bit Edition
Microsoft® Windows® for x64
OS/2
Microsoft Windows 95/98
Microsoft Windows 2000 Advanced Server
Microsoft Windows 2000 Datacenter Server
Microsoft Windows 2000 Server
Microsoft Windows 2000 Professional
Microsoft Windows NT Workstation
Microsoft Windows Server 2003 Datacenter Edition
Microsoft Windows Server 2003 Enterprise Edition
Microsoft Windows Server 2003 Standard Edition
Microsoft Windows Server 2008
Microsoft Windows XP Professional
Windows Millennium Edition (Me)
Windows Vista
64-bit Enabled AIX
64-bit Enabled HP-UX
64-bit Enabled Solaris
ABI+ for Intel Architecture
AIX
HP-UX
HP-UX IPF
IRIX
Linux
Linux for x64
Linux on Itanium
OpenVMS Alpha
OpenVMS on HP Integrity
Solaris
Solaris for x64
Tru64 UNIX

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.
