STATISTICAL COMPUTING

In the typical statistics course before the ready availability of personal computers, all calculations were done by hand. If the study of statistics frustrated many students, it was because they spent so much time manipulating numbers according to poorly understood formulas that there was no time left to focus on the purpose of the analysis. Even though personal computers are everywhere and cost less than the first four-function calculators, many courses still have students performing calculations by hand, with the same frustrations as before. This may be due to inertia, lack of resources, or instructors with little or no experience analyzing data.

Today, computers can do most of the calculations for us and we are free to concentrate on what we are trying to accomplish through the use of statistical techniques. We still have to learn how to use statistical program packages, but this is a small price to pay.

In Nutrition 209, we will be using the statistical program package SAS. The software is available on personal computers on campus. Students with their own personal computers will be given site license copies of the software to install on their PCs.

[Important! SAS lies! They claim that various versions of SAS will not run with various versions of the Windows Operating System. However, what's really true is that SAS won't support the program under those versions. There are two versions of SAS available at the moment. You should have version 9.2 installed if your computer uses XP/Professional, the Enterprise, Business, or Ultimate versions of Vista, or Windows 7 Home Premium. Otherwise, you must use version 9.1.3, which will run under XP Home despite SAS's claims to the contrary.]

Why SAS? Because it is the most widely used program in the workplace. My personal favorite is SYSTAT because of the ease with which it creates statistical graphics. However, SYSTAT has a small market share and few employers are looking for people with SYSTAT experience. If one knows nothing else but how to use SAS effectively, one will never want for work. Experience with SAS gives applicants an advantage in the job market, so SAS it is!

Ideally, I'd like to expose you to more than one program. With one program--even if it's SAS--it's easy to fall into the trap of feeling you won't be able to do anything without it. Once you see two, you'll quickly realize that it's easy to master any number of them because you've got a sense of how they work. However, there are only so many hours in the day and so many hours for nutrition science students to spend learning statistical program packages.

There is a point-and-click graphical user interface buried in SAS. Data can be selected and analyses can be performed by making selections from menus and panels presented by the program. It isn't necessary to memorize specific command syntax. All you need know is under which broad category of menus a task is likely to be found. However, we will NEVER use SAS this way.

SAS will be run by creating text files of written commands. A command file can be short, containing a single request, or it can be thousands of lines long and carry out extensive data manipulation as well as multiple analyses. Once a command file is written, the statistical program package is instructed to execute the commands. This is how we will use SAS. It sounds a lot harder than pointing-and-clicking, but it isn't. It's all a matter of being trained properly. You'll quickly realize not only that command files are a necessity, but also that they make your work easier!

Command language is essential for maintaining an audit trail--a record of what was done--so that anyone with a copy of the data and the audit trail can reproduce an analysis and verify its accuracy. If I give you a command file and a data set, you can run the command file and obtain my results. At the moment, there is no widely adopted way to give you a set of mouse clicks. I can tell you what I did, or even write it down, but that doesn't guarantee it was truly what I did. Perhaps I made an error writing it down. Perhaps I did not do what I thought I did. Or, perhaps you entered the mouse clicks incorrectly. Or, even worse, perhaps we both made the same mistakes and you replicated my faulty analysis! A command file eliminates these possibilities. It both performs the analysis and records what was done.

[Full disclosure demands I point out that all serious programs with GUIs work by translating mouse clicks into commands and placing them in temporary storage. Because these files can be used to reanalyze the data at a later date, they are sometimes used as an excuse to avoid learning command language. This "excuse" fails for at least four reasons:

How do command files make our work easier? Only in a textbook or a classroom do we find clean, complete, immutable data! Real data are messy. Despite our best efforts, initial data sets are often incomplete and contain errors. It is common to be given additional data after performing initial analyses. Initial analyses are often planned that way! The additional data might be additional observations, additional variables, or merely a few values that fill in some holes in the original worksheet. If the data are analyzed by using a point-and-click interface, everything has to be repeated...c.l.i.c.k.-.b.y.-.c.l.i.c.k! With command files, all you do is tell the program to run them again with the new data files!
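
To make the point concrete, here is a minimal sketch of the idea. It is written in Python rather than SAS, purely as an illustration. The variable name "weight", the helper names summarize and run_analysis, and the tiny data sets are all hypothetical. The same written commands are run twice: once on an initial worksheet with a hole in it, and once on the updated worksheet.

```python
import csv
import io
import statistics


def summarize(rows, column):
    """Return n, mean, and standard deviation for one numeric column.

    Blank cells are treated as missing and skipped.
    """
    values = [float(r[column]) for r in rows if r[column].strip() != ""]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


def run_analysis(source):
    """The whole 'command file': first access the data, then analyze them."""
    rows = list(csv.DictReader(source))
    return summarize(rows, "weight")


# Hypothetical initial data: subject 2 has a missing weight.
initial = "id,weight\n1,60\n2,\n3,70\n"

# The same worksheet later: the hole is filled and a new subject is added.
updated = "id,weight\n1,60\n2,65\n3,70\n4,75\n"

# Rerunning the analysis is one call with the new data; nothing is re-clicked.
first = run_analysis(io.StringIO(initial))
second = run_analysis(io.StringIO(updated))
print(first)
print(second)
```

In real use the data would come from a file on disk (for example, an opened CSV file passed to run_analysis) rather than from text typed into the script. The script itself doubles as the record of what was done.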

Some Facts of Life

Be prepared to learn things at least twice, maybe three times. The first time you learn something, it will be new and may seem complicated. If you don't use a new skill for any length of time, you may forget much of what you've learned and have to relearn it, although programs with simple graphical user interfaces seem to minimize this. However, relearning something often takes only a fraction of the time it took to learn it the first time and, once relearned, it is retained much better. (I'm almost tempted to say that if you relearn something twice, you'll never forget it.) This is true for everyone. Don't despair if almost everything seems strange the first time around. This is a rite of passage. In a supportive environment, it's really impossible to fail.

It's Not Necessarily You

Everyone is certain to come across programs that seem counterintuitive and make no sense whatsoever. They will be hard to learn, difficult to remember, and impossible to master. The problem is not necessarily you. A critical part of program design is figuring out HOW to do things. This applies not only to the internal workings but also to a program's user interface. What might seem straightforward to a programmer or design team might seem terribly convoluted to a user. No matter how dedicated the programming team, there comes a point where the search for the best way to do something has to stop and something--sometimes anything--has to be put in place. And some of those decisions are terrible.

Also, there are few standards. The way you do something on one computer or in one package is often different from the way you do it in another--and once you've learned one way, the mind and hand may not easily adapt to another.

Some Happy Notes

Computers are useful tools and learning how to use them can be a pleasant process. When you learn a new skill, you get the instant, positive feedback of seeing your requests carried out before your eyes.

If you learn one program, the second program comes for a fraction of the original price. All statistical packages work the same way: first you access the data, then you analyze them.

Personality Types

The personalities of computer users span a wide range of extremes. Some people are comfortable learning and using a wide variety of similar programs. Others can deal with only one program at a time. They resist learning any new programs if they can get by with what they already know, and when they do learn a new program, they immediately abandon the old one. There are some who can't resist learning more and more about programs they work with. Others learn only enough to carry out the task at hand and sometimes avoid learning advanced features if a sequence of basic features will get the job done.

Recognize that these personality types exist and don't be concerned when you find that you're one type and your colleagues are the other type. You are who you are. I tend to work with a small number of programs and learn their every nuance. Others move from machine to machine, application to application--and enjoy it. I don't know how they keep all the details straight! (I console myself by telling myself that all that breadth comes at the expense of depth!)

WARNINGS

(1) I've been using computers for more than 40 years. I got my first personal computer in 1986. (I wanted to add "back before hard drives when PCs came with a B: floppy drive", but back then the B: drive was an extra-cost add-in!) I've learned how to use the computer to perform many tasks. Over 40 years, it's almost impossible not to. Knowing how to do them, I may not have learned more efficient methods developed later. Also, having convinced myself, correctly at the time, that some task was not easily accomplished, I may have missed it when the feature was added in a later version. Don't be surprised if you discover a more effective way to carry out an operation than the one I show you. If you do, please show it to me! This applies even to (and especially to!) SAS.

(2) THE 15 MINUTE RULE: Whenever you find yourself stuck in a rut and you've been working on the problem for 15 minutes with no progress, I insist that you seek help from the TA, me, or anyone...but seek help! It is very easy to dribble away hours trying to get something to work properly. Reading manuals and help files and trying different options are an important part of your education. You are expected to try to solve problems on your own, but don't lose your sense of proportion. There is no dishonor in seeking help once you realize you're in over your head after making a sincere effort.


Copyright © 1999, 2006 Gerard E. Dallal