22). In ARRAY processing, what does the DIM function do?
DIM: It returns the number of elements in an array. When you use DIM as the stop value of an iterative DO statement, you do not have to re-specify that stop value if you change the dimension of the array.
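A minimal sketch, assuming a hypothetical data set SCORES with variables score1-score5. The loop's stop value is DIM(s) rather than a hard-coded 5, so the loop keeps working if the array grows or shrinks.

```sas
/* Hypothetical input: one observation with five scores, one of them missing */
data scores;
  input id score1-score5;
  datalines;
1 70 85 . 90 60
;
run;

data scores_clean;
  set scores;
  array s{*} score1-score5;  /* {*} lets SAS count the elements */
  n_elements = dim(s);       /* 5 for this array */
  do i = 1 to dim(s);        /* stop value follows the array size */
    if missing(s{i}) then s{i} = 0;
  end;
  drop i;
run;
```

Changing the ARRAY statement to score1-score8 would need no edit to the DO statement.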
23). How would you determine the number of missing or non missing values in computations?
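To determine the number of missing values that are excluded in a computation, use the NMISS function; to count the non-missing values, use the N function.
data _null_;
  m = . ; y = 4 ; z = 0 ;
  N = N(m, y, z);         /* number of non-missing arguments */
  NMISS = NMISS(m, y, z); /* number of missing arguments */
  put N= NMISS=;
run;
The above program results in N = 2 (number of non-missing values) and NMISS = 1 (number of missing values).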
24). Do you need to know if there are any missing values?
Just use: missing_values=(NMISS(field1,field2,field3)>0); This returns 1 if any of the fields are missing and 0 if none are. Note that the MISSING function takes a single argument, for example MISSING(field1), so it checks one field at a time.
If you need to know how many missing values you have, use num_missing=NMISS(field1,field2,field3); You can also find the number of non-missing values with non_missing=N(field1,field2,field3);
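A short sketch of these checks on one hypothetical observation. NMISS and N expect numeric arguments; for a mix of numeric and character fields, CMISS (available since SAS 9.2) counts missing values of either type. The variable names and values are made up for illustration.

```sas
/* Hypothetical observation: field2 is missing and the character field is blank */
data check;
  field1 = 10; field2 = .; field3 = 3;
  name = ' ';
  any_missing  = (nmiss(field1, field2, field3) > 0); /* 1: at least one is missing */
  num_missing  = nmiss(field1, field2, field3);       /* 1 */
  non_missing  = n(field1, field2, field3);           /* 2 */
  name_missing = missing(name);                       /* MISSING takes one argument: 1 */
  mixed_count  = cmiss(field1, field2, name);         /* numeric and character: 2 */
run;
```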
25). What is the difference between: x=a+b+c+d; and x=SUM(OF a b c d);?
Is anyone wondering why you wouldn't just use total=field1+field2+field3;? First, how do you want missing values handled? The SUM function returns the sum of the non-missing values. If you use addition, the result is missing if any of the fields are missing. Which one is appropriate depends on your needs.
However, there is an advantage to using the SUM function even if you want the result to be missing. If you have more than a couple of fields, you can often use shortcuts in writing the field names. For sequentially numbered fields, a single dash works: total=SUM(of field1-field3); If your fields are not numbered sequentially but are stored together in the program data vector, you can use: total=SUM(of fielda--zfield); Just make sure you remember the "of" and the double dashes, or your code will run but you won't get your intended results. MEAN is another function that calculates differently from the written-out formula when you have missing values.
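A minimal sketch of the differences described above, using hypothetical values where c is missing. The variables are assigned in the order a, b, c, d, so a--d covers all four in program data vector order. Without OF, SAS reads a--d as an arithmetic expression (a minus negative d), which is why the code still runs but gives a different answer.

```sas
/* Hypothetical values: c is missing */
data totals;
  a = 1; b = 2; c = .; d = 4;
  plus_total = a + b + c + d;       /* missing, because c is missing */
  sum_total  = sum(of a b c d);     /* 7: missing values are ignored */
  range_sum  = sum(of a--d);        /* 7: a through d in PDV order */
  no_of      = sum(a--d);           /* 5: without OF this is a - (-d) */
  mean_func  = mean(of a b c d);    /* 7/3: mean of the 3 non-missing values */
  mean_hand  = (a + b + c + d) / 4; /* missing */
run;
```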
26). There is a field containing a date. It needs to be displayed in the format "ddmonyy" if it's before 1975, "dd mon ccyy" if it's after 1985, and as 'Disco Years' if it's between 1975 and 1985. How would you accomplish this in DATA step code, using only PROC FORMAT?
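proc format ;
  picture dat (default=11) low-'31dec1974'd = '%0d%b%y' (datatype=date)
    '01jan1975'd-'31dec1985'd = 'Disco Years' (noedit)
    '01jan1986'd-high = '%0d %b %Y' (datatype=date) ;
run ;
data new ;
  input date :ddmmyy10. @@ ; /* sample dates for illustration, one per range */
  format date dat. ;
datalines ;
15/06/1970 20/03/1980 05/11/1990
;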
27). In the following DATA step, what is needed for ‘fraction’ to print to the log?
if x=.3333 then put 'fraction';
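One plausible reading of this question: the comparison is exact, so x must hold exactly the value .3333 (for example, assigned with x=.3333;). A computed value such as 1/3 does not match because it has more decimal places. A hedged sketch of the usual workaround, rounding before comparing:

```sas
data _null_;
  x = 1/3;                                   /* 0.333333..., not exactly .3333 */
  if x = .3333 then put 'exact: fraction';   /* not written to the log */
  if round(x, .0001) = .3333 then put 'rounded: fraction'; /* written to the log */
run;
```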
28). What is the difference between calculating the ‘mean’ using the mean function and PROC MEANS?
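The MEAN function works across variables within a single observation (row-wise) and computes only the mean, whereas PROC MEANS works down the observations for each variable (column-wise) and by default calculates summary statistics such as N, Mean, Std Dev, Minimum and Maximum.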
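A small sketch of the row-wise versus column-wise distinction, using a hypothetical data set VALS with two observations. ROW_MEAN comes from the MEAN function; M1-M3 (names chosen for illustration) come from PROC MEANS.

```sas
/* Hypothetical data: two observations, three measurements each */
data vals;
  input id x1 x2 x3;
  row_mean = mean(of x1-x3);   /* across x1-x3 within this observation: 2, then 5 */
  datalines;
1 1 2 3
2 4 5 6
;
run;

/* Down the observations for each variable: x1=2.5, x2=3.5, x3=4.5 */
proc means data=vals mean;
  var x1-x3;
  output out=col_means mean=m1-m3;
run;
```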
What are some differences between PROC SUMMARY and PROC MEANS?
PROC MEANS writes its results to the output window by default; you can suppress this with the NOPRINT option and save the statistics to a separate data set with the OUTPUT OUT= statement. PROC SUMMARY produces no printed output by default: either add an OUTPUT statement and print the resulting data set, or add the PRINT option to the PROC SUMMARY statement to see the results.
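A minimal sketch with a hypothetical data set SALES. Both procedures compute the same statistics; only the default printing differs. One further documented difference: if the VAR statement is omitted, PROC SUMMARY produces only a count of observations, whereas PROC MEANS analyzes all numeric variables.

```sas
/* Hypothetical data set */
data sales;
  input region $ amount;
  datalines;
East 10
East 20
West 30
;
run;

/* PROC MEANS prints by default; NOPRINT suppresses the printed report */
proc means data=sales noprint;
  class region;
  var amount;
  output out=means_out sum=total;
run;

/* PROC SUMMARY prints nothing here; the OUTPUT data set holds the results */
proc summary data=sales nway;
  class region;
  var amount;
  output out=summary_out sum=total;
run;

proc print data=summary_out;
run;
```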
29). What is a problem with merging two data sets that have variables with the same name but different data?
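Understanding the basic algorithm of MERGE will help you understand how the step processes. When the data sets share a variable that is not a BY variable, the value from the data set listed later on the MERGE statement overwrites the value from the earlier one. There are still a few common scenarios whose results sometimes catch users off guard. Here are a few of the most frequent 'gotchas':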
1- BY variables have different lengths. A MERGE can run when the BY variables have different lengths, but if the data set with the shorter version is listed first on the MERGE statement, the shorter length is used for the BY variable during the merge.
Because of this shorter length, values can be truncated and unintended combinations can result.
In Version 8, a warning is issued to point out this data integrity risk. The warning will be issued regardless of which data set is listed first:
WARNING: Multiple lengths were specified for the BY variable name by input data sets. This may cause unexpected results.
Truncation can be avoided by naming the data set with the longest length for the BY variable first on the MERGE statement, but the warning message is still issued. To prevent the warning, check the BY variable lengths with PROC CONTENTS and make them identical before combining the data sets in the MERGE step. You can change the variable length either with a LENGTH statement placed before the MERGE statement in the DATA step, or by recreating the data sets so the BY variables have identical lengths.
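A minimal sketch of the LENGTH-before-MERGE fix, assuming two hypothetical data sets where the BY variable NAME is $4 in SHORT and $10 in LONG. Because the LENGTH statement is compiled before MERGE, NAME gets length $10 in the output regardless of which data set is listed first.

```sas
/* Hypothetical data sets, both already sorted by NAME */
data short;
  length name $4;
  input name $ score;
  datalines;
Anne 1
Bob 2
;
run;

data long;
  length name $10;
  input name $ city $;
  datalines;
Anne Rome
Bob Oslo
;
run;

data combined;
  length name $10;   /* must come before MERGE to set the BY variable length */
  merge short long;
  by name;
run;
```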
Note: When doing a MERGE, avoid putting MERGE and an IF-THEN statement in the same DATA step if the IF-THEN statement involves two variables that come from different merging data sets. If it is not completely clear when MERGE and IF-THEN can safely share a DATA step, it is best to always separate them into different DATA steps. Following this recommendation helps avoid unexpected merge results.
30). Which data set is the controlling data set in the MERGE statement?
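No single data set controls a MERGE. The DATA step keeps reading until every data set on the MERGE statement is exhausted, so a one-to-one merge produces as many observations as the largest data set. The smallest data set controls the result only when you read data sets one-to-one with multiple SET statements, where the step stops as soon as the smallest data set runs out. To control which observations a merge outputs, use the IN= variables.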
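A hedged sketch contrasting a one-to-one MERGE with one-to-one reading through two SET statements. The data sets THREE and TWO are hypothetical, with 3 and 2 observations respectively.

```sas
/* Hypothetical data sets with 3 and 2 observations */
data three;
  input a;
  datalines;
1
2
3
;
run;

data two;
  input b;
  datalines;
10
20
;
run;

/* One-to-one MERGE: reads until all data sets are exhausted, so 3 observations */
data merged;
  merge three two;
run;

/* Two SET statements: stops when the smallest data set runs out, so 2 observations */
data set_twice;
  set three;
  set two;
run;
```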
31). How do the IN= variables improve the capability of a MERGE?
What if you want the output data set of a merge to keep only the matches (only those observations to which both input data sets contribute)? SAS sets up special temporary variables, called the "IN=" variables, so that you can do this and more. Here's what you have to do: on the MERGE statement, tell SAS that you need the IN= variables for the input data set(s), then use the IN= variables appropriately in the DATA step. So, to keep only the matches in a match-merge, ask for the IN= variables and use them:
data matches;
  merge one(in=x) two(in=y); /* x and y are your choice of names for the IN= variables */
  by id;                     /* of data sets one and two, respectively */
  if x=1 and y=1;            /* keep only observations both data sets contribute to */
run;
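The "and more" can be sketched as routing each observation by where it came from. This assumes hypothetical data sets ONE and TWO keyed by ID; the IN= names INONE and INTWO and the output data set names are illustrative choices.

```sas
/* Hypothetical data sets, both sorted by ID */
data one;
  input id x1;
  datalines;
1 10
2 20
;
run;

data two;
  input id y1;
  datalines;
2 200
3 300
;
run;

data both only_one only_two;
  merge one(in=inone) two(in=intwo);
  by id;
  if inone and intwo then output both;  /* ID 2: in both data sets */
  else if inone then output only_one;   /* ID 1: only in ONE */
  else output only_two;                 /* ID 3: only in TWO */
run;
```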
32). What techniques and/or PROCs do you use for tables?
PROC FREQ, PROC UNIVARIATE, PROC TABULATE and PROC REPORT.
33). Do you prefer PROC REPORT or PROC TABULATE? Why?
I prefer PROC REPORT unless I have to create cross-tabulation tables, because it gives me many options to modify the look of my table (for example, the WIDTH= option changes the width of each column), whereas PROC TABULATE cannot produce some of the things I need in my tables. For example, PROC TABULATE doesn't produce n (%) in the desired format.
34). How experienced are you with customized reporting and use of DATA _NULL_ features?
I have very good experience in creating customized reports as well as with the DATA _NULL_ step. It is a DATA step that generates a report without creating a data set, which can save development time. Another advantage of DATA _NULL_ is that when you submit it, any compilation errors in its statements are detected and written to the log, so you can find errors by checking the log after submitting it. It is also used to create macro variables from the values in a data set (for example, with CALL SYMPUTX).
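A minimal sketch of both uses, assuming a hypothetical data set SALES: the DATA _NULL_ step writes a small report to the output window with FILE PRINT and stores the grand total in a macro variable (GRAND_TOTAL is an illustrative name) without creating a data set.

```sas
/* Hypothetical data set */
data sales;
  input region $ amount;
  datalines;
East 10
West 30
;
run;

data _null_;
  set sales end=last;
  file print;                           /* write to the output window, not the log */
  if _n_ = 1 then put 'Region   Amount';
  put region $8. +1 amount 6.;
  total + amount;                       /* sum statement: retained running total */
  if last then do;
    put 'Total' +4 total 6.;
    call symputx('grand_total', total); /* macro variable for later steps */
  end;
run;

%put Grand total is &grand_total;
```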