CGonzalez@swri.org

- Overview
- IDFS data products
- SCF Variables
- Algorithm Definition
- User-Defined Functions
- Data Product Generation
- Example of an SCF

Many times in the space physics world there arises the need to derive different quantities based upon data parameters that are returned by one or more spacecraft. In some cases, these derived products may be dependent upon values returned from a single instrument; in other cases, the derived products are dependent upon values taken from many instruments. In either case, there is a need to specify data parameters and the algorithms necessary to produce derived data products.

To fill this need, a domain-specific system has been developed that allows for the definition and derivation of new data quantities from a homogeneous, spacecraft- and instrument-independent primary data format. This system, referred to as the SCF (Science Computation Formulation) software, utilizes a GUI-based definition session in which the user defines input variables, temporary variables and output variables, in addition to the actual set of mathematical operations to be performed on the data to produce the derived data products. Because the user dynamically defines the algorithm that derives the new data products, the need for specialized programming is drastically reduced. In addition, any errors in the algorithm can be quickly corrected without the need for data reprocessing.

The SCF system provides for the creation of new data products from an existing primary data set, provided that the data has been stored in the Instrument Description File System (IDFS) format. The IDFS is a data storage format that is designed to be general enough to handle the majority of scientific data sets. These data sets include raw telemetry, processed data, simulation data and theoretical data. The inclusion of raw telemetry was an important focal point in the development of the IDFS format. The IDFS itself has a built-in algorithmic capability through which individual measurements may be manipulated. This manipulation is primarily used to take raw measurements to physical units and to allow for transformation between various units. There is no possibility, within the IDFS, for the interaction between different measurements, which is the realm of the SCF.

The IDFS converts telemetry data into physical units as the data is accessed, which allows for the refinement of calibration factors and processing algorithms without having to reprocess the original data set. A set of software routines services the IDFS format. These routines cover access to the individual data files through a distributed database, positioning within the data file based on time, access to the data and the real-time conversion of the data to physical units.

There are several data sources that may be obtained from an IDFS data set. In addition to the primary data source, there may be several secondary data sources attached to it which may be accessed in parallel. Secondary data sources include: variables describing the instrument state; any calibration variables which may be associated with the primary data; data quality flags; if the satellite is rotating, the spin rate and the azimuthal angles for each primary data value as measured from a predefined zero degree position; and finally, the data accumulation and latency associated with each data value. Both the primary data and the secondary data are accessible within an SCF.

The SCF input variables are symbolic names given to the data parameters returned by the IDFS. As a result, all input variables are either scalar quantities or vector (1-D) quantities, based upon the derivation of the IDFS data source. A scalar quantity is a single data value that is dependent only upon time and position. A vector quantity is a one-dimensional data item that has a functional dependence on a single variable, which in IDFS terminology is called the scanning variable. The length of the vector is dictated by the IDFS data source. An example of an IDFS data source that returns a vector quantity would be a particle spectrometer which measures a particle flux (data) as a function of particle energy (scanning variable). The specification of any input value in the SCF must be accompanied by the desired physical units to be associated with the input data. These units are acquired in the real-time conversion of the raw IDFS data.

Temporary variables are symbolic names given to hold constant values (such as pi), format conversion strings, or intermediate results during the evaluation of the algorithm (such as the angle between the spacecraft velocity vector and the vector to the sun). Temporary variables are not returned by the SCF data access routines; only the defined output variables are returned. The dimensionality and the lengths of each dimension must be specified for each temporary variable. The SCF software supports a storage class up to ten dimensions (10-D). Data products of dimensions 2-D and higher can be created within the SCF algorithm section using data from defined input variables (scalar or vector) and/or other temporary variables.
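The storage implied by a rank and a list of dimension lengths can be sketched in C. This is only an illustration: it assumes a row-major layout (as C arrays use), and the names `scf_var_size`, `scf_flat_index` and `SCF_MAX_RANK` are hypothetical and not part of the SCF software.

```c
#include <assert.h>
#include <stddef.h>

#define SCF_MAX_RANK 10   /* assumption: mirrors the 10-D limit stated above */

/* Total number of elements for a variable of the given rank.
   A rank of 0 denotes a scalar, which occupies one element. */
size_t scf_var_size(int rank, const size_t *dims)
{
    size_t n = 1;
    int d;

    assert(rank >= 0 && rank <= SCF_MAX_RANK);
    for (d = 0; d < rank; d++)
        n *= dims[d];
    return n;
}

/* Row-major offset of the element idx[0], idx[1], ... (all zero-based). */
size_t scf_flat_index(int rank, const size_t *dims, const size_t *idx)
{
    size_t off = 0;
    int d;

    assert(rank >= 0 && rank <= SCF_MAX_RANK);
    for (d = 0; d < rank; d++) {
        assert(idx[d] < dims[d]);      /* indices run from 0 to length - 1 */
        off = off * dims[d] + idx[d];
    }
    return off;
}
```

Under these assumptions, a 2-D temporary variable with dimension lengths 4 and 3 occupies 12 elements, and its last element sits at offset 11.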

The purpose of the SCF is to produce new derived data products. Output variables are symbolic names given to the derived data products that are generated by the computations. There must be at least one defined output variable; otherwise, the point of the computation is meaningless. Output variables are returned by the SCF data access routines. As in the case of temporary variables, the dimensionality and the lengths of each dimension must be specified for each output variable. The SCF software supports a storage class up to ten dimensions (10-D). Data products of dimensions 2-D and higher can be created within the SCF algorithm section using data from defined input variables (scalar or vector) and/or defined temporary variables.

Unlike input variables, for which a physical unit can be selected, each output variable returns data in only one physical unit: the unit determined by the algorithm used to create it, in conjunction with the units of the input variables used in the computation. SCF-generated data is consistent with the format of the primary data, thus ensuring its compatibility with all existing applications of primary data.

The SCF software defines some variables that are internal to the software and are set once the execution of the algorithm commences. These internal variables are accessible within the algorithm definition section of the SCF. These variables are considered reserved SCF keywords. Of course, SCF keywords cannot be used when defining input, temporary and output variable names. The list of SCF keywords consists of internal variables, SCF-defined function names, user-defined function names and keywords that support the FOR loop construct and the IF-ELSE-ENDIF construct.

The SCF is a user-defined algorithm which defines a set of mathematical operations to be performed on the data from one or more instruments to produce a set of modified data values. There is only one defined operation per algorithm step. The first term in any algorithm step is the resultant variable name and the second is the equal sign (=). From this point, there are five formats which an algorithm step can take. In each of these formats, a non-scalar variable can be indexed to retrieve a scalar quantity, e.g. FMT[0] references the first element in the 1-D vector variable FMT. The index value can be a number or a variable, as in FMT[T1]. If the index value is a variable, the index variable must be a scalar quantity and cannot itself be indexed; that is, indexing of the form FMT[T1[0]] is not allowed. If a variable name is used as an index, the index value is determined at execution time. Since the SCF software is written in ANSI C, indexing of multi-dimensional variables starts at zero, not one, and the index values run from zero to the dimension size minus one.

The simplest of these formats is the initialization of the resultant variable and is of the form

    RESULT = VALUE

where VALUE can be a variable name, a number specified in decimal, floating point or exponential format, or a format conversion string (e.g. "%10.2e"). RESULT must be one of the defined input, temporary or output variables. If the initialization value is a format conversion string, the quotation marks must be included. The SCF software provides a built-in print function which can serve as a mini-debugger so that the results of the algorithm can be displayed on the screen as the steps are being executed. The print function can also serve as a means of dumping the results of the computations. The print function takes two arguments: the format specification string and the variable to be printed. The format conversion strings must be held in temporary variables that are of a 1-D storage class and are at least 4 elements in length.

When the standard mathematical operations of addition, subtraction, multiplication and division are utilized, the algorithm step takes the form

    RESULT = VAR_1 oper VAR_2

where **oper** should be replaced by one of the four mathematical symbols (+, -, *, /), RESULT must be one of the defined input, temporary or output variables, and VAR_1 and VAR_2 must each be either a user-defined variable or an internal SCF variable. As an example, the algorithm step to multiply the variable BX by the variable BY and place the result in the variable TP is written

    TP = BX * BY

If a constant is to be utilized in the mathematical operation, it must be assigned to a temporary variable and referenced through the temporary variable. Even though the SCF software supports a storage class up to ten dimensions (10-D), the mathematical operations that are currently defined work on scalar-scalar, vector-scalar and vector-vector quantities. In the SCF context, a scalar quantity is a single value and a vector quantity is a multi-value entity (1-D array of values). Future evolutions of the SCF software will provide the mathematical operations that are needed to process variables of higher dimensions. For the vector-vector operations, the lengths of the vectors must be the same. The definition for each case is shown below. If variables A, B and R are vectors, then vector-vector operations are defined as

    R[i] = A[i] oper B[i]

for all i where i runs from 0 to the vector length minus one. The resultant variable R is a vector of equal length to A and B. If variables A, B and R are scalars, then scalar-scalar operations are defined as

    R = A oper B

Lastly, if the variables A and R are vectors and the variable B is a scalar, then vector-scalar operations are defined as

    R[i] = A[i] oper B

for all i where i runs from 0 to the vector length minus one. The resultant variable R is a vector of equal length to A.
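The three cases described above can be sketched as a single C routine. This is a minimal illustration, not the SCF implementation: the names `apply_oper` and `scf_binary_op` are hypothetical, a scalar is assumed to be represented as a length of one, and no special handling is shown for division by zero or for values equal to OUTSIDE_MIN.

```c
#include <assert.h>
#include <stddef.h>

/* Apply one of the four operators (+, -, *, /) to two scalar values. */
static double apply_oper(char oper, double a, double b)
{
    switch (oper) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    default:  return a / b;   /* '/' */
    }
}

/* r = a oper b, where b is either a scalar (b_len == 1) or a vector of
   the same length as a.  The result r always has a_len elements, so the
   scalar-scalar case is simply a_len == 1 and b_len == 1. */
void scf_binary_op(char oper, const double *a, size_t a_len,
                   const double *b, size_t b_len, double *r)
{
    size_t i;

    assert(oper == '+' || oper == '-' || oper == '*' || oper == '/');
    assert(a_len >= 1 && (b_len == 1 || b_len == a_len));

    for (i = 0; i < a_len; i++)
        r[i] = apply_oper(oper, a[i], b_len == 1 ? b[0] : b[i]);
}
```

The length check in the second assertion corresponds to the requirement that vector-vector operations use vectors of equal length.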

The third format is used when equating a resultant variable to a function. A function can have multi-variable input. The SCF software supports standard mathematical functions such as square root, sine, cosine and polynomial expansions, to name a few. A function can return either a scalar or a vector quantity. This type of algorithmic statement is written as

    RESULT = FUNCTION (VAR_1, VAR_2, ... VAR_N)

where RESULT must be one of the defined input, temporary or output variables and VAR_1, VAR_2, ... VAR_N must each be either a user-defined variable or an internal SCF variable. If a constant is to be utilized in a function call, it must be assigned to a temporary variable and referenced through the temporary variable.

The fourth format is used to iterate over a series of algorithm steps, which is accomplished with the FOR loop construct. The FOR loop construct is of the form:

    FOR LOOP_VAR = START TO END
    BEGIN
       algorithm step(s) to iterate
    END

There must be at least one algorithm step contained between the BEGIN-END block. The tokens FOR, TO, BEGIN and END are SCF reserved keywords. LOOP_VAR is referred to as the looping variable and must be one of the defined input, temporary or output variables. The tokens START and END can either be a number or a variable name. If the token is a number, only positive values can be specified. If the token is a variable name, the referenced value must be a scalar quantity. At execution time, the SCF software assumes that the START value is less than or equal to the END value. The following example loops over a 2-D variable called VALUES and sets each element of the variable to the product of the looping variables:

    START = 0
    STOP = 3
    FOR T1 = START TO STOP
    BEGIN
       FOR T2 = 0 TO 2
       BEGIN
          VALUES[T1][T2] = T1 * T2
       END
    END

In this example, VALUES is a 4 x 3 matrix. Notice that the start values for both FOR loops are set to zero. This is necessary since the SCF software is written in ANSI C, where indexing of multi-dimensional variables starts at zero, not one.
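For readers more familiar with C, the nested FOR loops above correspond roughly to the following sketch. The function name `fill_values` and the macros `ROWS` and `COLS` are hypothetical. Because VALUES is a 4 x 3 matrix filled by loops running to 3 and to 2, the end value of each loop is taken to be inclusive.

```c
#include <assert.h>

#define ROWS 4   /* T1 runs from 0 to 3 */
#define COLS 3   /* T2 runs from 0 to 2 */

/* C equivalent of the nested SCF FOR loops; both end values are
   inclusive, hence the <= comparisons. */
void fill_values(double values[ROWS][COLS])
{
    int start = 0, stop = 3;
    int t1, t2;

    assert(stop - start + 1 == ROWS);
    for (t1 = start; t1 <= stop; t1++)
        for (t2 = 0; t2 <= 2; t2++)
            values[t1][t2] = (double) (t1 * t2);
}
```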

The fifth and final format is used to execute a series of algorithm steps based upon the result of a numerical comparison. This is accomplished with the IF-ELSE-ENDIF construct, which can take one of the two forms shown below:

    IF (VAR_1 oper VAR_2)
       algorithm step(s)
    ELSE
       algorithm step(s)
    ENDIF

    IF (VAR_1 oper VAR_2)
       algorithm step(s)
    ENDIF

There must be at least one algorithm step contained between the IF-ELSE, ELSE-ENDIF and IF-ENDIF blocks. The tokens IF, ELSE and ENDIF are SCF reserved keywords. The left and right parentheses must be specified in the IF statement. VAR_1 and VAR_2 are referred to as comparison values. A comparison value can either be a number or a variable name. If the comparison value is a variable name, the referenced value must be a scalar quantity. The token **oper** should be replaced by one of the following symbols:

    <    less than
    <=   less than or equal to
    >    greater than
    >=   greater than or equal to
    ==   equal to
    !=   not equal to

In the case of the IF-ELSE-ENDIF construct, if the comparison holds true, the algorithm steps between the IF-ELSE block are executed. If the comparison proves false, the algorithm steps between the ELSE-ENDIF block are executed. In the case of the IF-ENDIF construct, if the comparison holds true, the algorithm steps between the IF-ENDIF block are executed. If the comparison proves false, no algorithm steps are executed. The following example sets the value of the variable VAL1 based upon the results of the comparison against the variable T1:

    IF (T1 <= 5)
       VAL1 = T1
    ELSE
       VAL1 = T1 * T1
    ENDIF

When the storage space for all defined variables (input, temporary, and output) is created, all variables are initialized to a predefined value that can be referenced through the mnemonic OUTSIDE_MIN. Values for the temporary variables are not cleared after each iteration of the SCF algorithm allowing temporary variables to be treated as "static" variables. The use of static variables may be helpful in utilizing the IF-ELSE-ENDIF construct. The situation may arise where some calculations will need to be performed only for the first iteration of the algorithm. This may be accomplished by defining a temporary variable and using the IF-ELSE-ENDIF construct to test whether the variable is set to the predefined value. Obviously, one algorithm step within the IF block would be to set the value of the temporary variable to some other value so that at the next iteration of the algorithm, the test would prove false and the steps within the IF block would not be executed.
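The first-iteration pattern described above can be sketched in C as follows. The structure `scf_state`, the functions `scf_state_init` and `scf_iterate`, and the constant `OUTSIDE_MIN_SKETCH` are all hypothetical. The real value behind the OUTSIDE_MIN mnemonic is defined by the SCF software and is not stated here, so a stand-in value is used.

```c
#include <assert.h>

/* Assumption: stand-in for the SCF fill value; the actual value of
   OUTSIDE_MIN is not given in this document. */
#define OUTSIDE_MIN_SKETCH (-1.0e30)

/* Hypothetical state that persists between iterations, in the way
   temporary variables keep their values. */
struct scf_state {
    double first_flag;   /* temporary variable used as the "static" flag */
    int    setup_count;  /* number of times the one-time steps have run  */
    int    iterations;   /* number of iterations evaluated so far        */
};

void scf_state_init(struct scf_state *s)
{
    assert(s != 0);
    s->first_flag  = OUTSIDE_MIN_SKETCH;  /* variables start at the fill value */
    s->setup_count = 0;
    s->iterations  = 0;
}

/* One iteration: IF (FLAG == OUTSIDE_MIN) one-time steps ENDIF */
void scf_iterate(struct scf_state *s)
{
    assert(s != 0);
    if (s->first_flag == OUTSIDE_MIN_SKETCH) {
        s->setup_count++;      /* one-time calculations would go here     */
        s->first_flag = 0.0;   /* change the flag so the test fails later */
    }
    s->iterations++;
}
```

After three calls to `scf_iterate`, the one-time branch has run exactly once.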

Non-standard operations such as FFTs, data filtering, etc. can easily be added to perform operations that are not supported by the SCF software. These non-standard operations are referred to as user-defined functions. By definition, user-defined functions are stand-alone ANSI C modules that must be compiled and added to the SCF software. User-defined functions are "locally defined"; that is, they are not ported to other sites as part of the SCF software unless the user-defined function becomes archived as part of the SCF domain.

Once the module has been created, the user-defined function must be "registered". A GUI-based registration system has been developed for the SCF software. The registrar must provide the name of the function; that is, how it is to be referenced in the algorithm section of the SCF. In addition, the registrar must provide the full pathname of the file (including the .c extension), the name of the ANSI C function, and the number of arguments pertinent to the function call. A user-defined function may also be "unregistered". This may be necessary when a user-defined function has been registered, but the code within the ANSI C module has been changed. In order to replace the old code with the new code, the registrar must "unregister" the user-defined function and then "register" the user-defined function again.

Since functions can have multi-variable input, the SCF software utilizes a generic wrapper function to standardize the calling sequence for all functions, SCF-specific or user-defined. This calling sequence is of the form

    void FUNC_NAME (double *mem, unsigned long *args, long *arg_len,
                    unsigned long res, long res_len, double *ival)

All of the parameters listed are SCF-provided. The first parameter in the calling sequence is a pointer to the data matrix that holds the values for all defined variables for the SCF. The second parameter is an array that holds the index value(s) for the data matrix to access each of the inputs to the function call, in the order in which the arguments are listed in the function call. The third parameter is an array that holds the size of each of the inputs to the function call, in the order in which the arguments are listed in the function call. The fourth parameter is an index value for the data matrix to access the resultant variable. The fifth parameter holds the size of the resultant variable. The final parameter holds an initialization value. This initialization value is set by the SCF software and is only used when the SCF initialization statement is encountered; therefore, it is not applicable to user-defined functions.
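As an illustration of how the size parameters might be used, the following sketch shows a hypothetical user-defined function, referenced as `RESULT = VSCALE (VEC, FACTOR)`, that multiplies a vector by a scalar. It is not part of the SCF software. It assumes that `arg_len` and `res_len` hold element counts and that the values of a variable are stored contiguously in the data matrix; neither detail is confirmed by the description above.

```c
#include <assert.h>

/* Hypothetical user-defined function, RESULT = VSCALE (VEC, FACTOR):
   multiplies every element of the vector VEC by the scalar FACTOR. */
void scf_vscale(double *mem, unsigned long *args, long *arg_len,
                unsigned long res, long res_len, double *ival)
{
    double *vec    = mem + args[0];   /* first argument: input vector  */
    double *factor = mem + args[1];   /* second argument: scalar       */
    double *result = mem + res;       /* resultant variable            */
    long i;

    (void) ival;                      /* not applicable to user-defined functions */
    assert(arg_len[1] == 1);          /* assumption: a scalar has size 1          */
    assert(arg_len[0] == res_len);    /* RESULT must match VEC in length          */

    for (i = 0; i < res_len; i++)
        result[i] = vec[i] * *factor;
}
```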

An example of the SCF registration process for a user-defined function is given below. The following information is defined by the registrar

    Name of SCF function:  BTOTAL
    Name of C file:        btotal.c
    Name of C function:    scf_btotal
    Number of parameters:  6

The information specifies that the user-defined function will be referenced in the algorithm section of the SCF as **BTOTAL** and will require six parameters. The SCF algorithm step is of the form

    RESULT = BTOTAL (BX, BY, BZ, PHI, THETA, BTOTAL)

This user-defined function expects three input variables (BX, BY, BZ) and returns three output variables (PHI, THETA, BTOTAL). The ANSI C code for the user-defined function is found in the file **btotal.c** and is as follows:

    #include <math.h>   /* atan2, sqrt */

    /* TORAD (radians per degree) is assumed to be defined elsewhere. */

    void scf_btotal (double *mem, unsigned long *args, long *arg_len,
                     unsigned long res, long res_len, double *ival)
    {
       register double *phi, *theta, *btotal, *bx, *by, *bz, *result;
       double t1;

       /*************************************************************************/
       /* Functions are of the form RESULT = FUNCTION (VAR1, ..., VAR_N)        */
       /* Retrieve the memory location for the resultant variable (RESULT) and */
       /* for the arguments to this function.  The function takes 6 arguments, */
       /* 3 input variables (bx, by, bz) and three output variables (phi,      */
       /* theta, btotal).                                                       */
       /*************************************************************************/

       result = mem + res;
       bx = mem + *(args + 0);
       by = mem + *(args + 1);
       bz = mem + *(args + 2);
       phi = mem + *(args + 3);
       theta = mem + *(args + 4);
       btotal = mem + *(args + 5);

       /*********************************************************************/
       /* Compute total magnetic field and the azimuthal and polar angles. */
       /*********************************************************************/

       *phi = atan2 (*by,*bx);
       *phi = *phi / TORAD;
       *bx = *bx * *bx;
       *by = *by * *by;
       t1 = *by + *bx;
       *bx = sqrt (t1);
       *theta = atan2 (*bx,*bz);
       *theta = *theta / TORAD;
       *bz = *bz * *bz;
       t1 = t1 + *bz;
       *btotal = sqrt (t1);

       /********************************************************************/
       /* Copy the value for one of the output variables to the resultant */
       /* variable.                                                        */
       /********************************************************************/

       *result = *btotal;
    }

The value for one of the output variables is copied into the resultant variable **result**. This function could have been written to expect five arguments, three input and two output, and have the third output value returned through the resultant variable. This function assumes that all variables are scalar quantities; therefore, the arguments **arg_len** and **res_len** are not needed.

The SCF definition of the derived data products is generated once and then stored under a unique mnemonic for future use. The same algorithmic sequence can then be used to generate the new data products many times. Use of the SCF to produce the new data is achieved with a set of routines that provide access to the primary data values. The SCF itself has no dependence on time; that is, the algorithm can be applied to data taken at any time. The IDFS data files that are accessed by the SCF, however, are time dependent. Therefore, the routines that service the SCF require that a start time / stop time range be specified. Unless all of the input data sources are available for processing, the new data cannot be generated.

The input variable data is filtered in two ways upon execution of the SCF algorithm. First, the input data is filtered based upon the data quality value for the IDFS data source selected. Depending upon how the input variable is defined, the data for the input variable will either always be included or will be excluded if the data quality value falls within the specified exclusion range. Secondly, the data is filtered based upon a cutoff range that is defined for each input variable such that only data that falls within the defined range will be included in the computation. The cutoff range that is specified should be expressed in terms of the physical unit being returned for the input variable. Upon execution of each step of the SCF algorithm, if all of the data for the acquisition period is excluded, the value for that input variable will be set to a predefined value that can be referenced through the mnemonic OUTSIDE_MIN. The result of any algorithm step that utilizes the input variable will also be set to this predefined value.
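The two filtering stages can be sketched in C as shown below. The structure `scf_filter`, the function `scf_filtered_mean` and the constant `OUTSIDE_MIN_SKETCH` are hypothetical. The sketch assumes that both ranges are inclusive at their end points and uses a simple mean in place of the actual accumulation performed by the SCF software.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stand-in for the SCF fill value behind OUTSIDE_MIN. */
#define OUTSIDE_MIN_SKETCH (-1.0e30)

/* Hypothetical description of the filters for one input variable. */
struct scf_filter {
    int    use_quality;       /* 0: always include, 1: apply exclusion range */
    int    qual_lo, qual_hi;  /* data quality exclusion range (inclusive)    */
    double cut_lo, cut_hi;    /* cutoff range in the input's physical unit   */
};

/* Average the samples that pass both filters.  If every sample is
   excluded, the fill value is returned instead. */
double scf_filtered_mean(const struct scf_filter *f, const double *data,
                         const int *d_qual, size_t n)
{
    double sum = 0.0;
    size_t kept = 0, i;

    assert(f->cut_lo <= f->cut_hi);
    for (i = 0; i < n; i++) {
        if (f->use_quality && d_qual[i] >= f->qual_lo && d_qual[i] <= f->qual_hi)
            continue;   /* excluded by the data quality value */
        if (data[i] < f->cut_lo || data[i] > f->cut_hi)
            continue;   /* outside the cutoff range */
        sum += data[i];
        kept++;
    }
    return kept ? sum / (double) kept : OUTSIDE_MIN_SKETCH;
}
```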

The amount of time that is processed for each iteration of the SCF algorithm is dependent upon the algorithm defined. If the algorithm is constructed using only scalar input values, the derived data is returned at the rate of the fastest varying scalar input variable. On the other hand, if the input variables are a mixture of vector and scalar quantities, the derived data is returned at the rate of the fastest varying vector quantity. If the sample rate for one of the input variables changes while the algorithm is being executed, the software will continue to acquire data for the current accumulation period and the accumulation period will be re-set for the next iteration of the algorithm if the sample rate change affects those data having the smallest time scale.

The data generated from the SCF algorithm are time-rectified; that is, data from the same time period are utilized for all input variables. For each iteration of the algorithm, data will either need to be retrieved or the previous sample will hold for the current time interval. When data is retrieved, the data for the input variable is converted into the unit specified in the SCF. If the request for data fails, the data is set to a predefined value to indicate that no data was found. These values will not be included in the accumulation for the time interval being processed, since they will always be outside the data cutoff boundaries.

Once the sample of data has been acquired, the percentage of the sample to be included in the current time interval is determined. Based upon where the start time of the sample lies with respect to the start time defined for the time interval being processed, the fraction is either calculated with respect to the top edge (start time) or bottom edge (end time) of the time interval. The fraction should normally be between 0.0 (0%) and 1.0 (100%); however, if a gap was encountered from data of the fastest sample, the fractional values for the other IDFS sets of data could be negative, indicating a time-stamp somewhere before the gap or in the gap region. A fractional value could also be negative if a gap was encountered and if the start time of the data is beyond the end time of the current time interval. If the fraction is negative, the data from that IDFS set are not included in the current iteration of the algorithm. For positive fractional values, the data are multiplied by that fraction and the resulting quantities are accumulated for the current time interval. The fractional value is also saved separately and used to re-normalize the final quantity before it is used in the evaluation of the algorithm.
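The fractional weighting described above can be sketched in C as follows. The structure `scf_accum` and its functions are hypothetical. The sketch assumes that re-normalization means dividing the accumulated sum by the sum of the saved fractions, and that a caller-supplied fill value is returned when nothing was accumulated. Neither detail is spelled out in the description.

```c
#include <assert.h>

/* Hypothetical accumulator for one input variable over one time interval. */
struct scf_accum {
    double weighted_sum;  /* sum of fraction * value                      */
    double frac_sum;      /* sum of fractions, saved for re-normalization */
};

void scf_accum_reset(struct scf_accum *a)
{
    a->weighted_sum = 0.0;
    a->frac_sum     = 0.0;
}

/* Add one sample.  Samples with a negative fraction lie before or
   within a gap and are not included in the current iteration. */
void scf_accum_add(struct scf_accum *a, double value, double fraction)
{
    assert(fraction <= 1.0);
    if (fraction < 0.0)
        return;
    a->weighted_sum += fraction * value;
    a->frac_sum     += fraction;
}

/* Re-normalized value for the interval, or fill if nothing was accumulated. */
double scf_accum_value(const struct scf_accum *a, double fill)
{
    return a->frac_sum > 0.0 ? a->weighted_sum / a->frac_sum : fill;
}
```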

After the data for all input variables has been acquired, the algorithm is evaluated. The data values for all the defined output variables and the time period over which the execution of the algorithm transpired are computed. The data for each output variable is returned in the unit that is defined in the SCF. The output variables returned may be of different dimensionalities; that is, the data being returned may be a combination of scalars, vectors and/or matrices, since the SCF software supports up to a 10-D quantity. The data can be plotted or be fed as input into another higher order computation. SCF-generated data is consistent with the format of the primary data, thus ensuring its compatibility with all existing applications for the primary data. In this manner, sets of descriptions can be strung together to generate physical quantities important to science analysis.

Shown below is the result of an example SCF definition session. This SCF is an algorithm in which a Cartesian representation of the measured magnetic field is transformed into a spherical representation; B, theta, phi. The results are printed each time the algorithm is executed.

    /******************** Example SCF ****************************************
    Compute Magnitude of B, Phi and Theta   /* title
    /******************** Contact section ****************************************
    4                                       /* number of contact lines
    Chris Gurgiolo                          /* contact
    Southwest Research Inst
    PO Drawer 28510
    San Antonio, TX 78228-0510
    /******************** Comment section ****************************************
    1                                       /* number of comment lines
    Routine returns variables in the order B, Phi, Theta
    /******************** Input Variables ****************************************
    3                                       /* number of input variables
    BX                                      /* input variable name
    TSS TSS-1 TEMAG TEMAG TMMO              /* IDFS source
    SENSOR 0                                /* data type
    1                                       /* number of tables for unit
    0                                       /* table numbers to apply
    =                                       /* table operators
    VALID_MIN VALID_MAX                     /* lower and upper cutoff values
    256 256                                 /* d_qual exclusion range
    BY                                      /* input variable name
    TSS TSS-1 TEMAG TEMAG TMMO              /* IDFS source
    SENSOR 1                                /* data type
    1                                       /* number of tables for unit
    0                                       /* table numbers to apply
    =                                       /* table operators
    VALID_MIN VALID_MAX                     /* lower and upper cutoff values
    256 256                                 /* d_qual exclusion range
    BZ                                      /* input variable name
    TSS TSS-1 TEMAG TEMAG TMMO              /* IDFS source
    SENSOR 2                                /* data type
    1                                       /* number of tables for unit
    0                                       /* table numbers to apply
    =                                       /* table operators
    VALID_MIN VALID_MAX                     /* lower and upper cutoff values
    256 256                                 /* d_qual exclusion range
    /******************** Temp Variables ****************************************
    12                                      /* number of temporary variables
    T1                                      /* temporary variable name
    0                                       /* rank and length of dimension
    FMT1                                    /* temporary variable name
    1 4                                     /* rank and length of dimension
    FMT2                                    /* temporary variable name
    1 4                                     /* rank and length of dimension
    FMT3                                    /* temporary variable name
    1 4                                     /* rank and length of dimension
    TFMT1                                   /* temporary variable name
    1 4                                     /* rank and length of dimension
    TFMT2                                   /* temporary variable name
    1 4                                     /* rank and length of dimension
    TFMT3                                   /* temporary variable name
    1 4                                     /* rank and length of dimension
    TFMT4                                   /* temporary variable name
    1 4                                     /* rank and length of dimension
    TFMT5                                   /* temporary variable name
    1 4                                     /* rank and length of dimension
    TFMT6                                   /* temporary variable name
    1 4                                     /* rank and length of dimension
    TFMT7                                   /* temporary variable name
    1 4                                     /* rank and length of dimension
    TFMT8                                   /* temporary variable name
    1 4                                     /* rank and length of dimension
    /******************** Output Variables ****************************************
    3                                       /* number of output variables
    B                                       /* output variable name
    0                                       /* rank and length of dimension
    PHI                                     /* output variable name
    0                                       /* rank and length of dimension
    THETA                                   /* output variable name
    0                                       /* rank and length of dimension
    /******************** Equation Definition *************************************
    TFMT1 = "\nSTART YEAR = %.0f"
    TFMT2 = " START DAY = %.0f"
    TFMT3 = "\nSTART TIME_MS = %.0f"
    TFMT4 = " START TIME_NS = %.0f"
    TFMT5 = "\nEND YEAR = %.0f"
    TFMT6 = " END DAY = %.0f"
    TFMT7 = "\nEND TIME_MS = %.0f"
    TFMT8 = " END TIME_NS = %.0f"
    FMT1 = "\nOutput Variable 0 = %.6e"     /* output format strings
    FMT2 = "\nOutput Variable 1 = %.6e"     /* output format strings
    FMT3 = "\nOutput Variable 2 = %.6e"     /* output format strings
    PHI = ATAN2 (BY,BX)                     /* compute phi
    PHI = RAD_TO_DEG (PHI)                  /* convert phi to degrees
    BX = BX * BX
    BY = BY * BY
    T1 = BY + BX                            /* BX**2 + BY**2
    BX = SQRT (T1)
    THETA = ATAN2 (BX,BZ)                   /* compute theta
    THETA = RAD_TO_DEG (THETA)              /* convert theta to degrees
    BZ = BZ * BZ
    T1 = T1 + BZ                            /* BX**2 + BY**2 + BZ**2
    B = SQRT (T1)                           /* compute B
    PRINT (TFMT1, SYEAR)
    PRINT (TFMT2, SDAY)
    PRINT (TFMT3, SMILLI)
    PRINT (TFMT4, SNANO)
    PRINT (TFMT5, EYEAR)
    PRINT (TFMT6, EDAY)
    PRINT (TFMT7, EMILLI)
    PRINT (TFMT8, ENANO)
    PRINT (FMT1, B)
    PRINT (FMT2, PHI)
    PRINT (FMT3, THETA)

The output generated by this example SCF, using the PRINT function, is given below. This output reflects what is printed for the first iteration of the algorithm. In order to generate the data products defined in any SCF, the routines that service the SCF must be accessed and these routines require that a start time / stop time range be specified.

    START YEAR = 1992 START DAY = 217
    START TIME_MS = 32339986 START TIME_NS = 0
    END YEAR = 1992 END DAY = 217
    END TIME_MS = 32340050 END TIME_NS = 0
    Output Variable 0 = 3.170429e+04
    Output Variable 1 = 4.335091e+01
    Output Variable 2 = 1.175501e+02
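For reference, the transformation carried out by the algorithm steps of the example SCF can be restated as follows, using the argument order of the ATAN2 calls in the listing. The symbols B_x, B_y and B_z denote the input variables BX, BY and BZ, and both angles are subsequently converted from radians to degrees. This is a restatement of the listed steps, not additional SCF functionality.

```latex
\begin{aligned}
\phi   &= \operatorname{atan2}\left(B_y,\ B_x\right) \\
\theta &= \operatorname{atan2}\left(\sqrt{B_x^2 + B_y^2},\ B_z\right) \\
B      &= \sqrt{B_x^2 + B_y^2 + B_z^2}
\end{aligned}
```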