Introduction to Astronomical Computing

At the College of Charleston

(last updated 21 August 2012 by J.E. Neff)

Exercise # 2 -- Getting Started with IDL

This is a short exercise to get you started using IDL. This is the only programming language you will ever need to know, unless you plan to work with instrumentation or do heavy-duty modelling. It is the most common language used by astronomers. It was written by scientists for scientists, so the syntax is fairly easy to master, and it provides a platform to apply your scientific creativity to make the computer do what you want. If you have any other programming experience, you will quickly become a convert to IDL. If you don't, some of the early learning curve might seem obscure, but it will be time well spent. You've picked up a little bit of unix (operating system) and iraf (data analysis package) experience. In those, you mostly "operate" the computer, performing actions that were anticipated by the developers. In a programming language, you are in charge. To be fair, there is a lot more to learn about unix and iraf, including the ability to "script" with them. Don't stop learning those, but learn to enjoy IDL.

I. The Nature of IDL; Starting up IDL; Getting Help

IDL is several things at once. First and foremost, it is a high-level programming language. It is also an INTERACTIVE operating environment (kind of like iraf and unix). Finally, it comes with an almost unlimited library of useful tasks and programs, mostly written by other scientists (also kind of like iraf, except that these programs are readable and changeable). You can issue commands interactively, compile and run programs, and call library routines all in the same session.

It is always best to run IDL from an xterm. Fire up an xterm window and type IDL. If everything is set up right, you should get no errors and a COFC_IDL> prompt. If you have errors, let me know now so I can fix them. There are other ways to run IDL, but you can easily learn those later. The first IDL command you should type is HELP. This command tells you what is in IDL's "active" memory. There is a memory level (it should say $MAIN$), variables (you won't have any yet), compiled procedures (probably ASTROLIB and $MAIN$), and compiled functions (you shouldn't have any of these yet either). Type HELP regularly as we go along. You'll quickly develop a feel for what's going on. One reason IDL is so flexible is that it operates primarily in the RAM. When you start it up, it allocates a fixed amount of space. $MAIN$ is the top-level in the memory tree, kind of like / is the top level in the unix file system. Most of the time, you will be interacting on the $MAIN$ level, but sometimes you get stuck in a subroutine. If that happens, IDL will stop and allow you to fix things and move on. In that case, when you type HELP, it might appear that IDL has lost everything you thought it knew. Have no fear, it's still there. HELP just shows you the active part of the memory. This might sound like a bug right now, but you'll soon see it to be a valuable feature.

If you type a ?, you will get a separate window that provides access to all of the on-line documentation. There's more documentation on the shelves and on the web. You can leave this help window up all the time if you wish.

II. Variable Types; Special Symbols

We already talked about this in class a bit, so you should have no trouble understanding the need for different variable types. Integer variables store numbers from -32768 to +32767 using 2 bytes (i.e. 16 bits). Long-word Integers use 4 bytes, so they range up to about 2 billion (+2147483647). Floating point numbers are stored with 4 bytes each, reserving some space for the sign, the exponent (which therefore has an upper limit), and the sign of the exponent. Double precision numbers are 64-bit floating point. You usually don't have to worry about double precision and longword, but you have to be aware of them. Byte variables are stored in a single 8-bit byte, so they are actually numbers ranging from 0 to 255 (IDL nearly always starts at 0). The entire ASCII character set can be represented this way; nearly all image display values are also converted to BYTE form before they are displayed (you can think of it as a gray scale with 0 = black and 255 = white, or vice versa). Remember that IDL works in the RAM, so the more space your variables take up, the less free memory is available to crunch numbers. This is not going to be an issue until you get involved with something more heavy duty than we are doing here. IDL also has the power to treat a STRING as a variable. These are generally text. For example, you could type silly='Hello there' and then print,silly. Go ahead and try it. Now type a=2 then type b=2L then type c=2.0 then type d=2D. Type HELP and see if it all makes sense.
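Here is a small sketch of what those size limits mean in practice. It assumes you are at the interactive prompt with IDL's default settings, where a whole number typed without a suffix is a 2-byte integer. The variable names big and top are made up for this example.

```idl
; Whole-number constants are 2-byte integers unless you add a suffix
big = 32767        ; largest value a 2-byte integer can hold
print, big + 1     ; wraps around to -32768 without any warning
print, big + 1L    ; 32768, because the L makes the sum a long-word integer
top = 255B         ; largest value a byte can hold
print, top + 1B    ; wraps around to 0
help, big, top     ; shows INT and BYTE
```

IDL does not warn you when an integer wraps around, so if your numbers might get large, add the L or use floating point.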

One powerful aspect of IDL compared to other languages is that you do NOT have to explicitly tell it what kind of variable you want to store your value in. It will choose the easiest one by default. Unlike most other languages, it can also operate on variables of different types. For example, let's operate on your numbers: e=a*b f=a*c g=a*d

(hit the return key between each command) then print,a,b,c,d,e,f,g and type HELP . Can you follow what's going on? What will happen if you type print,a*silly ? How about print,1/a,1/c ? Try them out and see if it makes sense. IDL will automatically convert the variable types before operating on them in order to maintain the highest precision. Not always, though. If you give it all integers, it will not convert them to floating point before operating on them. But if one of your numbers is floating and the rest are integer, it will convert all of them to floating before the operation. Note that it doesn't change your variables in its memory permanently, just long enough to get the job done (i.e. type HELP and you'll see that a is still an integer).
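The conversion rule is easiest to see with division. This is only a sketch with made-up variable names (n and x), typed at the interactive prompt.

```idl
n = 7          ; integer
x = 2.0        ; floating point
print, n/2     ; all integers, so integer arithmetic: gives 3
print, n/x     ; one floating operand, so floating arithmetic: gives 3.5
print, n/2.    ; a decimal point on the constant is enough: gives 3.5
help, n        ; n itself is still an INT
```

Integer division quietly throws away the remainder, so put a decimal point on at least one number whenever you want a floating-point answer.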

IDL also provides commands to convert a variable of one data type to another. FIX turns a floating point or string (under certain conditions) into an integer [ type print,fix('2'),fix(1.49), fix(1.51) ]. Note that FIX truncates rather than rounds. FLOAT turns an integer or string into a floating point. LONG and DOUBLE do the same for long-word integers and double precision, respectively.
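A few conversions to try, as a sketch. ROUND is not mentioned above; it is the standard IDL function that goes to the nearest whole number, shown here only for contrast with FIX.

```idl
print, fix(1.51)          ; 1: FIX drops the fractional part
print, round(1.51)        ; 2: ROUND goes to the nearest whole number
print, fix('2') + 1       ; 3: the string '2' became an integer
print, float('3.5')*2     ; 7.0: the string became a floating point number
help, long(2), double(2)  ; the integer 2 as a LONG and as a DOUBLE
```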

There are very few rules regarding variable names, but there are special symbols that should never be used. Any variable that begins with a ! is a "system variable". These are variables whose value is known to IDL no matter what level or subroutine you are in. Otherwise, when control passes to a subroutine, all the "main" level variables are temporarily forgotten until the subroutine is finished. You have a few system variables set up already. print,!path

!path tells IDL where to find a procedure (the IDL name for a program). If you type a word that doesn't correspond to one of its built-in commands, it will look through this path in order until it comes to a file with that name and a .pro extension. Then it will compile and immediately execute that program. Part of your search path is procedures that came with IDL, and part of it is stuff that I loaded that is relevant to astronomical data analysis. When you get a chance, start snooping around in your search path to find, read, and try to interpret some real IDL programs.

If you want to issue a unix command from inside IDL, you can start it with a $ (e.g. $pwd). At the end of a line in a program, $ has a different meaning (it tells IDL to keep reading on the next line rather than executing the command when it gets to the carriage return). Don't put $ in a variable name. Also, NEVER EVER use spaces in a variable name or a procedure name.

Comment lines in IDL programs start with a semicolon, so never use one of those in your variable and procedure names. Don't use a . in the name either. IDL has another (actually several more) variable type called a structure. These are incredibly powerful, but they have .'s in them. For example, print,!d.name and it should tell you what kind of device IDL is communicating with (probably "X" for now). Some special commands begin with a "dot" (e.g. .run), so just don't use it in variable names (your procedure names will of course be called something.pro, but to IDL you will just call them something; for example, we'll be using a command juldate, which is actually a program in your path called juldate.pro; go ahead and locate it and print it out). Dashes (-) are not a good idea, because they might get interpreted as a minus sign! If you need some kind of delimiter, use an underscore (_). When you read the introductory IDL manuals, you will come across several more special symbols, but these are the main culprits.
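To see why the dot is reserved, here is a sketch of a tiny structure. The variable star, its tag names, and the values in it are made up for illustration.

```idl
; A made-up structure with two tags, name and vmag
star = {name:'Vega', vmag:0.03}
print, star.name         ; the dot picks out one tag: Vega
print, star.vmag + 1     ; a tag behaves like an ordinary variable
help, star, /structure   ; lists each tag and its data type
print, !d.name           ; the system structure mentioned above
```

If dots were allowed in variable names, IDL could not tell a variable called star.name from the name tag of a structure called star.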

IDL is not case sensitive, but filenames and unix commands are. print,a is the same as PRINT,A or print,A

III. Vectors and Arrays

Quite possibly the single-most powerful aspect of IDL, from the perspective of a scientific programmer, is the way it handles arrays of numbers. Simply put, it treats an array as a single variable. If you want to divide one image by another, you just type result=image1/image2 and IDL figures out the rest. This may not sound so impressive to you if you haven't struggled through writing loops in other languages to operate on each number in the array one at a time and then put the resulting numbers in the right spot in the resulting array (after declaring all the variable names and data types). All of the data types that apply to single variables also apply to arrays. A 1-D array (i.e. a column of numbers) is called a "vector". Mostly in astronomy we deal with 1- and 2-dimensional arrays. A spectrum, for example, is a vector, and an image is a 2-d array (which we'll just call an "array" from here on; everything you'll learn also applies to arrays with more dimensions, but we seldom use them).
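Here is a sketch of the difference, using two made-up 3 x 3 arrays standing in for real images. The names slow, i, and j are just for this example.

```idl
image1 = findgen(3,3) + 1.   ; made-up "image" holding 1 through 9
image2 = fltarr(3,3) + 2.    ; made-up "image" filled with 2s

; The IDL way: one command for the whole array
result = image1/image2

; The same thing one pixel at a time, the way other languages make you do it
slow = fltarr(3,3)
for j=0,2 do for i=0,2 do slow(i,j) = image1(i,j)/image2(i,j)

print, max(abs(result - slow))   ; zero, so the two methods agree
```

On real images the one-line version is not just less typing; it also runs much faster than the loops.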

You can also string multiple operations together in a single command line; you don't have to define intermediate variables or read and write disk files. Of course, you can if you want to; IDL has the power. But if you don't need to, it uses less memory and runs faster to operate on your variables inside a command line. You'll see some examples below.

If a vector called charlie is 25 elements long, the first element is charlie(0) and the last is charlie(24). In 2-dimensional arrays, you can think of the index values as columns and rows, in that order. For example, in a 1024 x 1024 image that you called fred, fred(512,736) is the value of the pixel in column 512 and row 736 (both counted from 0). Each row and column of the 2-dimensional array can be treated as a vector by giving a wildcard to the other dimension (e.g. fred(512,*) is all of column 512, and fred(*,736) is all of row 736).
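A small array makes the indexing easy to check by eye. This sketch reuses the name fred for a made-up 4 x 3 array; the layout shown in the comments is approximate, since IDL's own spacing will differ.

```idl
fred = indgen(4,3)   ; 12 integers: 4 across each printed line, 3 lines
print, fred
;   0   1   2   3
;   4   5   6   7
;   8   9  10  11
print, fred(1,2)     ; 9: first index counts across, second counts down
print, fred(*,0)     ; 0 1 2 3: everything with second index 0
print, fred(1,*)     ; 1, 5 and 9: everything with first index 1
print, n_elements(fred(1,*))   ; 3
```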

IV. Running IDL Interactively

You seldom jump into IDL and start writing programs. Generally, you sit down at the "interpreter" level and issue a command and look at the result. When it starts doing what you want it to do, you copy these commands into a text editor somewhere and start piecing together a longer program (so that you don't have to do so much typing the next time). The ability to interpret commands one at a time is a rare feature among programming languages, so you should learn to love it. I'm going to give you a bunch of random but educational examples. Type them in, see if you can figure out what's going on, ask questions, follow your interest into more complex examples. When you get tired of this, we'll learn a little programming.

First, type HELP then type EXIT to get out of IDL. This is the only way to completely clear all the junk out of its memory. Then start IDL back up and type HELP.

intvec=intarr(100)

fltvec=fltarr(100)

help

print,intvec

intvec=indgen(100)/100

print,intvec

fltvec=findgen(100)/100

plot,intvec

plot,sin(fltvec*!pi)

silly=sin(fltvec*!pi)

plot,silly

plot,20*sin((fltvec-.2)*15)

oplot,[0,100],[-.2,-.2]

intvec=[23,97,111,45,-19]

print,intvec

intvec=intarr(10)

read,intvec

... type in 10 random integers

print,intvec

help

for i=0,9 do begin & read,x & intvec(i)=x & end

... type in 10 random numbers, one per line

for i=0,9 do print,i,'my random number is ',intvec(i)

a=findgen(100)/100.

b=fltarr(100,100)

for i=0,99 do b(i,*)=sin(a*2*!pi)

surface,b

contour,b

tv,b

tvscl,b

tvscl,rebin(b,500,500)

print,max(b),min(b),mean(b),median(b)

c=bytscl(b)

print,max(c),min(c),mean(c),median(c)

plot,b(50,*)

plot,b(*,60)

plot,b(*,60),yrange=[-1,0],psym=2

Now try to make up some commands on your own, based on what you've learned so far. Some will work, and some will bomb. Think of something simple you'd like to calculate or plot, and make it work. Show me what you did.

V. Simple IDL Programming

There are multiple ways to "drive" IDL, and that applies to writing programs, too. You can write a one-time-only program interactively. Try this example....

.run

moron=fltarr(50)

for i=0,49 do moron(i)=3*i^2-14*i+14.5

plot,moron

end

Even the end command is necessary. When IDL encounters it, the program will be compiled and then run line-by-line. If there's an error, it will stop at that line. You can either type RETURN to go back to the level that ran the program and try again, or you can fix your error and type .CON to continue execution at this point. Because you can get many levels deep in subroutines, you sometimes have to type RETURN multiple times to get back to the $MAIN$ level. You can save yourself work by typing RETALL, which sends you back in one big jump. If you have a procedure already written (a text file that ends with "end" and has a .pro extension and is in your !path), you can either type .run filename (without the .pro) or just type the filename. They do slightly different things. .run always recompiles the program. If you've made changes to it since it was last compiled (e.g. if you fixed an error), this will compile and execute the new version. If you want to compile it the first time, all you have to do is type the name without the .run.

It is a good habit to always start any procedure with a line that reads pro procedurename (where procedurename is what you call the program and procedurename.pro is the name of the file). If variables are passed to this program (i.e. it is a subroutine) you must have this line. If it is totally self-contained, or if it just uses the variables already known to IDL, you have to comment out this line by starting it with a semi-colon (but leave it in there commented out as a title for your program). After the pro line, it is customary to include a comment section starting with a line that has just a ;+ on it and ending with a line that has only ;- on it. All the lines between start with a ;. They are just comments, but if you type doc_library,'procedurename' (with the quotes), those comments will print out on the screen. Many procedures are written flexibly, so that if you call them wrong or with the wrong number of parameters, they print out instructions on what to do and die gracefully. As you'll soon see, there are many ways to send parameters and variables between programs. Some parameters are required, some are optional. Some are global (the system variables) and some are only known to the subroutine. Some are changed in the subroutine then passed back to the calling routine. I know this sounds obscure now, but we'll look at a few examples and you'll soon get the hang of it.
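To make that layout concrete, here is a sketch of a complete procedure file. The name parabola, its two parameters, and the header wording are all made up for this example; it would be saved as parabola.pro somewhere in your !path.

```idl
pro parabola, x, y
;+
; NAME:
;   PARABOLA  (hypothetical example)
; PURPOSE:
;   Evaluate 3*x^2 - 14*x + 14.5 at each value of x.
; CALLING SEQUENCE:
;   parabola, x, y
; INPUT:
;   x - a number or an array of numbers
; OUTPUT:
;   y - the result, same size as x
;-

; Called with too few parameters: print instructions and quit gracefully
if n_params() lt 2 then begin
  print, 'Syntax: parabola, x, y'
  return
endif

; The whole array is computed in one line; y is passed back to the caller
y = 3*x^2 - 14*x + 14.5

end
```

With the file in place, typing parabola,findgen(50),curve and then plot,curve should draw the curve, and typing parabola with no parameters should just print the syntax line. Here x is only read, while y is changed in the subroutine and passed back to the calling level.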

If you haven't already, print out the JULDATE program. What is the Julian date right now? See if you can figure out how the program works, and calculate a number. What was the Julian date when you were born? How many days have you lived? Do all this in IDL.