3

Using formatted CLASS variables

 1 year ago
source link: https://blogs.sas.com/content/sgf/2022/05/31/using-formatted-class-variables/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Using formatted CLASS variables

2

If you use formatted variables in a CLASS statement in procedures such as MEANS or UNIVARIATE, SAS will use the formatted values of those variables and not the internal values. For example, consider you have a data set (Health) with variables Subj, Age, Weight, and Height. You want to see the mean Weight and Height for three age groups: 0 to 40, 41-70, and greater than 70. A program to create the Health data set and compute the means is shown below.

Data Health;
   input Subj $ Age Weight Height;
datalines;
001 23 120 72
002 35 200 80
003 77 98 63
004 51 230 75
005 15 98 54
006 44 160 68
007 83 111 65
008 10 . 45
009 60 200 .
;
proc format;
   value AgeGrp 0-40 = '0 to 40'
                41-70 = '41 to 70'
                71-high = '>70';
run;
 
proc means data=Health n nmiss mean;
   class Age;
   format Age AgeGrp.;
   var Weight Height;
run;

Here is the output:

Class_1-300x153.png

Notice that there was no need to run another DATA step and create a variable representing age ranges. Not only is this an easier way to get these results—it is also much more efficient.

The other day, I received an email from a person and he posed the following question. How can I see the mean of the variable Sales (in the SASHELP data set Shoes) for three ranges of Sales: 0 to 100,000; 100,000 to 200,000; 200,000 and above?

My first thought was to create a variable (called Range) and then use that variable in a CLASS statement with PROC MEANS. However, after some thought, I wondered if I could use the Sales variable in the CLASS statement (and supply a format) and use the same variable in the VAR statement. As is often the case, the best way to answer a SAS programming question is to write a short program and see what happens. Here is what I suggested:

proc format;
  value SalesRange 0 - <100000 = 'Lower'
              100000 - <200000 = 'Middle'
              200000 - high = 'Upper';
run;
 
proc means data=sashelp.Shoes n nmiss mean std;
   class Sales;  
   format Sales SalesRange.;
   var Sales;
run;

Here is the output:

Class_2-300x122.png

As you can see from the output, it worked just fine.

Before we leave this topic, let's use the first example, referencing the Health data set, to demonstrate a Multi-Label Format. Suppose you want to see means of Weight and Height for two different ranges of Age. One range is the original 0–40, 41–70, and 70+; the other is 0–50 and 51+. Here is a way to accomplish this using a single format:

proc format;
   value AgeGrp (multilabel) 0-40 = '0 to 40'
                             41-70 = '41 to 70'
                             71-high = '>70'
 
                             0-50 = '0 to 50'
                             51-high = '>50';
run;
 
proc means data=Health n nmiss mean;
   class Age / MLF;
   format Age AgeGrp.;
   var Weight Height;
run;

You need to add the option "multilabel" following the format name on the VALUE statement.  Without this option, you will get an error message if you have overlapping ranges. Next, use the option MLF (multi-label format) on the CLASS statement. Here is the output:

Class_3-300x232.png

You now have the mean Weight and Height for two separate ranges of Age.

Consider using a formatted variables in CLASS statements for those procedures that allow this statement. It will save you time and reduce CPU time as well.

If you liked this blog, then you might like my latest book: Cody’s Collection of SAS Blogs. It gathers all of my best tips and tricks from my blogs in one place for easy reference.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK