Data Containers

Data Containers#

Overview#

To make effective use of GBTIDL, you must have an understanding of data containers and how they are used by GBTIDL procedures. The following sections give an introduction to data containers. Advanced users may wish to refer to the section Tips on Using Data Containers for Experts. A data container (DC) is an IDL data structure used by GBTIDL to store a spectrum and its associated header. A data container uses standard data types (e.g. integers, floats and strings) for header values and an IDL pointer for the data itself. There are 16 global data containers, or buffers, numbered 0 through 15 that are used like memory entries in a calculator. Buffer 0 contains the primary data container (PDC). The PDC will be discussed in more detail in a later section. GBTIDL includes a number of calibration, averaging, and analysis procedures that work with the PDC by default. You may see some of these routines refered to as part of the “GUIDE layer” elsewhere in the documentation.

In addition to the global buffers, it is possible to define IDL variables as data containers. GBTIDL includes commands and procedures that operate on these as well, and you may see some of these routines refered to as part of the GBTIDL “toolbox.” The IDL global structure !g is central to the GBTIDL implementation. It contains not only the global data containers, but also a set of values which assist in data reduction. For example, the field !g.nfit stores the order of the polynomial for the baseline to be fit, so that value does not have to be specified each time the baseline procedure is run. The gstatus procedure summarizes the current contents of the !g structure. Many users will never need to interact directly with the !g structure.

Todo

Add reference to the !g table section

A second global structure called !gc is used to store various constants . This structure is read-only. It’s contents are listed in the following table.

Table 27 Content of the global structure !gc#
Name	Type	Value	Unit	Description
!gc.light_speed	double	2.99792458D8	\(m\, s^{-1}\)	The speed of light
!gc.light_c	double	2.99792458D5	\(km\,s^{-1}\)	The speed of light
!gc.plank_h	double	6.6260755D-27	\(g\,cm^2\,s^{−1}\)	Planck constant
!gc.newt_g	double	6.67259D-8	\(dyne\,cm^2\,g^{−2}\)	Newton gravitational constant
!gc.boltz_k	double	1.380658D-16	\(erg\,K^{-1}\)	Boltzman constant
!gc.eV2erg	double	1.60217733D-12		eV to ergs
!gc.AU	double	1.4960D+13	\(cm\)	astronomical unit
!gc.m_H	double	1.673534D-24	\(g\)	Hydrogen mass
!gc.m_e	double	9.1093897D-28	\(g\)	Electron mass
!gc.pc	double	3.0857D+18	\(cm\)	parsec
!gc.rad_sig	double	5.67051D-5	\(erg\,cm^{-2}\,s^{-1}\,K^{-4}\)	Radiation constants

For example, if you wish to use the Planck equation, \(\Delta E = h\nu\), for a frequency of 1665 MHz, the following would calculate the energy:

GBTIDL -> print, !gc.plank_h * 1665d6

and print the result to screen

1.1032416e-17 ; The result

Data Containers and Pointers#

A data container is an IDL data structure that uses standard data types for header information, but uses a pointer for the data array. The reason for using a pointer is that it allows the data container to hold a spectrum with an arbitrary number of channels. The continuum data container uses additional pointers to hold auxilliary information stored in arrays that must match the dimensionality of the data array (e.g. the time at the center of each integration and the pointing direction). Pointers make the data container very flexible. Unfortunately, they also make it somewhat more difficult to use the data directly. GBTIDL procedures hide all of the nuances of working with pointers, so nearly all users will not need to be concerned with how to work with pointers in IDL. It is possible to copy the data in a data container to a standard IDL array, and back again into the data container using the GBTIDL function getdata and the command setdata. An example follows later in this section. Users who wish to learn more about advanced manipulation of data containers should see the Tips on Using Data Containers for Experts section.

About the Primary Data Container#

The PDC is the primary data container that is used by default by many GBTIDL procedures to improve ease of use. So a display operation such as show will display the PDC unless told to do otherwise, and likewise for smoothing operations, statistics, and so on. Data access procedures copy the data from disk into the PDC, unless told otherwise. Like all data containers, the PDC includes both the data and the associated header information. The PDC is stored in the IDL global variable !g.s[0] if it is a spectrum, and !g.c[0] if it is continuum data. In addition to the PDC, there are 15 global data containers that can be used for storage of results during data reduction sessions. These are called !g.s[1], !g.s[2], … and !g.c[1], !g.c[2], …

Examining and Changing Data Containers#

You may on occassion need to change the contents of a data container. For example, you may need to set the rest frequency by hand. The change can be made using the following command:

!g.s[0].line_rest_frequency=1667.359d6 ; Change to 1667.359 MHz

Another example involves setting the y-axis label on the plotter. For more information about changing axis labels, see Appendix H.

Todo

Replace this with the correct reference.

!g.s[0].units=’F(Jy)’ ; Set the label to ’F(Jy)’

To access the array containing the actual data values in a data container, use the commands getdata and setdata. For example:

GBTIDL -> getrec, 0 ; get some data
GBTIDL -> mydata = getdata() ; copy the data array into an IDL variable
GBTIDL -> help,mydata
  MYDATA FLOAT = Array[8192]
GBTIDL -> mydata[0:500] = 0 ; make some changes to the IDL array
GBTIDL -> setdata,mydata ; insert the new array into the PDC

Data Container Operations#

GBTIDL can be used as a calculator, operating on spectra contained in the 16 global data containers. Procedures are available to perform arithmetic operations on the global data containers, including add, subtract, multiply, and divide. These procedures take two required parameters: the indices of the buffers being operated on. They also take an optional third parameter, which identifies the buffer into which the result will be stored. If a storage buffer is not specified, the result is placed in the PDC (buffer 0), overwriting any existing spectrum there. A copy command copies the contents of one buffer directly into another. For example, to add two data containers, you could use the following command sequence:

getrec,1        ; Get some data
copy,0,10       ; Copy the PDC to DC 10
getrec,0        ; Get some other data
copy,0,11       ; Copy the PDC to DC 11
add,10,11,12    ; Put the sum of the two spectra in DC 12
show,12         ; Show the sum

These operations can be useful for handling baseline subtraction. For example, you can store a baseline fit in a data container and subtract that fit from any other spectrum, as in the following example:

getrec, 0                   ; get spectrum A and place it into the PDC (buffer 0)
nfit, 5                     ; set the order for the polynomial in a baseline fit
bshape, modelbuffer = 10    ; fit a baseline and store it in buffer 10
getrec, 1                   ; get spectrum B
copy, 0, 5                  ; copy spectrum B into buffer 10
subtract, 5, 10, 11         ; subtract the spectrum A baseline from spectrum B

Tips on Using Data Containers for Experts#

There are 16 global data containers, or buffers, numbered 0 through 15. Data container 0 is also called the primary data container, or PDC for short. If you find you need more than 16 buffers, one option is to use the nsave facility, which allows you to store an arbitrary number of data containers in a disk file. Alternatively, you can store data containers as IDL variables. If you choose to store data containers in IDL variables, there are a few procedures you should be aware of:

data_new: Create a new data container.
data_copy: Copy a data container.
data_free: Free the pointers in a data container or array of data containers.
set_data_container: Copy a data container stored as a variable into one of the 16 global buffers.

Check the reference pages or look at code examples for help on using these procedures. Make sure that when you create a new data container (either by data new or data copy) you free the pointers using data free when you are done, otherwise memory will be leaked. Be sure to avoid this mistake when using data containers:

GBTIDL -> mydc = !g.s[0]
; ... you do stuff to mydc here
; ... you think you are done, so you free it
GBTIDL -> data_free, mydc

The mistake here is that the initial assignment copies the value of the pointer, not the array that the pointer refers to. So, any changes to the data array through mydc will also change the data array in !g.s[0] because they use the same pointer. More importantly, the data_free at the end will also free the pointer in !g.s[0], likely bringing GBTIDL to its knees. Instead, use data_copy:

GBTIDL -> data_copy, !g.s[0], mydc
GBTIDL -> set_data_container, mydc              ; Resets index 0 with the contents of mydc
GBTIDL -> data_copy, !g.s[1], mydc
GBTIDL -> set_data_container, mydc, index=1     ; Resets index 1 with the contents of mydc
GBTIDL -> data_free, mydc

This example illustrates the use of set data container to copy a user-named data container into the global data container. It is not necessary to use data free before calling data copy because data copy takes care of all pointer maintenance in the output data container without leaking memory.

Also, be aware that when global values are used as parameters to functions or procedures, IDL passes those values by value and not by reference. So if you send a DC from !g to a procedure or function, all changes you make to that DC will remain local to that function, and will not be retained in the global variable.

If you need to work with an array of data containers here is one way you might do that: Suppose you want to run getfs on scans 50 through 100 and defer saving the data to the output file until the end. The step where the data are written to disk will be much faster if it can all be done at once, but it does mean that all 51 spectra will be in memory by the end of this operation so you should consider whether they will all fit in memory at the same time.

dcarr = replicate({spectrum_struct},51)     ; Create un-initialzed array of 51 data containers
freeze                                      ; Turn off the plotter’s auto-update feature
for i=50, 100 do begin                      ; Loop over the scan numbers
    getfs, i
    tmp = dcarr[i-50]                       ; Copy that to dcarr (this is the tricky step)
    data_copy, !g.s[0], tmp
    dcarr[i-50] = tmp
endfor
putchunk, dcarr                             ; Save it
data_free, dcarr                            ; Free up the memory

The tmp variable is used in the for loop because of the aforementioned issue of IDL passing elements to functions and procedures by value and not by reference. So we assign to tmp the specific element of dcarr that we want to modify. That gets a copy of everything, including the pointer. Inside data_copy, tmp is modified and since tmp in this case is passed by reference (because it is not a global value and it isn’t an array element), changes to tmp will be seen outside of data_copy. Once data_copy returns, the values in tmp (including the now valid pointer containing the copy of the data array) will be copied to dcarr. It is not necessary or desirable to use data_free on tmp because that would also free the copy of that pointer in dcarr. That pointer is freed at the end. Be sure and free up all of the pointers that you create this way so that memory is not leaked.