3

Use functions in a WHERE statement to filter observations

 1 year ago
source link: https://blogs.sas.com/content/iml/2022/07/06/use-functions-where-statement.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Use functions in a WHERE statement to filter observations

0

Many people know that you can use "WHERE processing" in SAS to filter observations. A typical use is to process only observations that match some criterion. For example, the following WHERE statement processes only observations for male patients who have high blood pressure:

WHERE Sex='Male' & Systolic > 140;

In this statement, a character variable (SEX) is tested for equality with a specified value, and a numerical variable (SYSTOLIC) is tested for inequality. Most people know that you can use operators (such as '=' and '>') to filter observations, but did you know that SAS also supports using functions as part of a WHERE statement? This article gives a few quick examples and discusses a neat trick that you can use when you want to filter observations, but you cannot use a WHERE statement directly.

Where can you use WHERE?

In SAS, there are four ways to perform WHERE processing:

  • The WHERE= data set option: This option is places after the name of the data set when you use the SET statement the DATA step or the DATA= option in a procedure. The WHERE= option reads only the observations that satisfy the criteria.
  • The WHERE statement: This global statement filters observations that have been read into the DATA step or procedure. The WHERE statement filters those observations. Only the observations that satisfy the criteria are processed by the procedure.
  • In a WHERE clause in the SQL procedure or in PROC IML. A WHERE clause might not support the full range of operators and functions that are supported in the WHERE statement.
  • Use a DATA step view to create an indicator variable. In a subsequent call to a procedure, use a WHERE statement to process observations for which the indicator variable has a specific value. Technically, this is not truly WHERE-processing, but it is a very useful technique, so I want to include it in this list.

The WHERE statement and the WHERE= data set option support similar options. The examples in this article use the WHERE statement, but the examples work equally well if you use the WHERE= data set option. For brevity, I will primarily refer to "the WHERE statement" and will show only one example that uses the WHERE= data set option.

The SAS documentation includes a chapter about WHERE-expression processing, which I will refer to as "WHERE processing." The doc spends many pages describing many special-purpose operators that you can use in WHERE statements. However, it only briefly mentions that you can use many function calls as well.

Using operators and functions in a WHERE statement

Before discussing functions, I want to point out that you can use arithmetic operators: plus, minus, multiplication, and so forth) in a WHERE statement. For example, the following statements use arithmetic operators to include only certain observations:

data Have2;
set sashelp.heart;
where Systolic - Diastolic     > 60    /* wide pulse pressure */
    & weight / height**2 * 703 > 25;   /* BMI > 25 */
run;

The resulting data set contains only observations (patients in a heart study) for which the difference between the systolic and diastolic blood pressure is large, and for which the patient's body-mass index is also large.

The previous statements do not call any SAS functions in the WHERE statement. The most popular functions are probably string-manipulation functions such as the SUBSTR and SCAN functions. These functions enable you to filter observations according to the value of one or more character variables. The following example uses the UPCASE and SUBSTR functions to test whether the first two characters of the SEX variable indicate that the patient is male. The function calls will match strings like 'male', 'Male', and 'MALE'. For fun, I also show that you can use the SQRT function, which performs a square-root operation.

data Have3;
set sashelp.heart;
where upcase(substr(Sex, 1, 2))='MA'    /* match first two characters of 'male' and 'Male' */
    & sqrt(weight*703) / height > 5;    /* alternative way to specify BMI */
run;

Another popular character function is the SCAN function. The following example uses the WHERE= data set option to read only the observations for which the cause of death includes the word 'Disease' as the third word. Matching values include "Cerebral Vascular Disease" and "Coronary Heart Disease":

/* an example that uses the WHERE= data set option */
data Have4;
set sashelp.heart(
    where=(scan(DeathCause, 3) = 'Disease')
    );
run;

As mentioned earlier, the WHERE= data set option can be more efficient that the WHERE statement. The syntax requires additional parentheses but is otherwise similar to the WHERE statement.

WHERE clauses in SAS IML

A customer recently noticed that the WHERE clause in SAS IML supports only a small subset of the operators that the WHERE statement supports. Furthermore, it does not support calling functions like the SUBSTR or SQRT functions. If you look at the documentation for the WHERE clause, you will notice that the syntax is of the form
   Variable operator Value
which means that you can only use variable names on the left side of the WHERE clause. You cannot use arithmetic operations on variables or functions of variables. The following PROC IML statements show a valid syntax for a WHERE clause on the USE statement in PROC IML. An invalid syntax is shown in the comment:

proc iml;
/* this WHERE clause works */
use sashelp.heart where( Sex='Male' & Systolic > 140 );
   read all var {Systolic Diastolic};
close;
 
/* The following statement is NOT supported in the WHERE clause:
   USE sashelp.heart WHERE(Systolic - Diastolic > 60
                           & weight / height**2 * 703 > 25);
*/

Although the WHERE clause on the USE statement does not support expressions that use arithmetic operations or function calls, there is a simple trick that enables you to use the power of the WHERE statement in the DATA step to filter data as you read observations into IML. The trick is to use the SUBMIT/ENDSUBMIT block to create a DATA step that filters the data. For example, the following statements create a DATA step view. The view is read by the USE and READ statements in the IML language. It filters the data at runtime while the observations are read:

submit;
   data _Filter / view=_Filter;
   set sashelp.heart;
   where upcase(substr(Sex, 1, 2))='MA'
         & sqrt(weight*703) / height > 5;
   run;
endsubmit;
 
use _Filter; 
   read all var {Systolic Diastolic};
close;

There might be times in which you want to subdivide data into two or more subsets but only process one of the subsets. In that case, you can use a DATA step view to create an indicator variable. You can then use the indicator variable in a WHERE clause in IML to read the data. For example, the following statements create a view that contains a binary indicator variable named _INCLUDE. The WHERE clause on the USE statement reads the observations for which _INCLUDE=1:

submit;
data _Filter / view=_Filter;
   set sashelp.heart;
   _Include = (upcase(substr(Sex, 1, 2))='MA'
               & sqrt(weight*703) / height > 5);
run;
endsubmit;
 
use _Filter WHERE(_Include=1); 
   read all var {Systolic Diastolic};
close;

Summary

This article serves as a reminder that the WHERE statement in SAS supports arithmetic operators and function calls, which means that you can use the WHERE statement to create sophisticated filters for your data. The WHERE= data set option is similar. However, some special-purpose languages in SAS support WHERE clauses that are more limited in their syntax. However, you can always use a DATA step view to filter data or to create an indicator variable that can be processed anywhere in SAS.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK