Question: What is IDFS?
Answer: IDFS, shorthand for Instrument Data File Set, is a data storage format designed to be general enough to handle the majority of scientific data sets. These data sets include raw telemetry, processed data, simulation data, and theoretical data. All data stored within the IDFS files are considered raw data within the IDFS architecture, even though they may already have been highly processed (for example, the output of a computer simulation). For a more in-depth explanation, refer to the following URL:
Question: What is the layout of the data structure that defines IDFS data sets?
Answer: The layout of the data within IDFS revolves around the concept of a lineage, which allows one to trace an individual IDFS data stream back to its roots. The hierarchy has five levels:

Project - An aggregation of experiments whose joint set of measurements is aimed at studying a common set of problems.

Mission - A specific satellite within a project (also referred to as a satellite).

Experiment - A set of measurements linked together by use of a common DPU and under a single PI.

Instrument - A set of measurements within an experiment that are performed by the same or similar hardware.

Virtual Instrument - A grouping of measurements, all of which are obtained from a single data generator (instrument), having identical bit lengths, return rates, and, in general, a common set of units. This last point of commonality in units is not rigorous, as is often the case in housekeeping or monitor virtual instruments. A single instrument can spawn multiple virtual instruments; conversely stated, a virtual instrument does not necessarily contain all of the measurements generated by the instrument.

The purpose in defining this lineage was so that data file generation would be to a logical measurement set level, not just to an instrument TM level.
Question: What is meant by a sensor?
Answer: The sensor is the basic measurement identifier within IDFS. An IDFS sensor is defined as a primary data source returned by the virtual instrument in question. It is a single measurement type within a virtual instrument.
Question: How is a data stream defined within the IDFS paradigm?
Answer: A data stream within the IDFS paradigm is classified as coming from either a scalar virtual instrument or a vector virtual instrument. A scalar instrument is one whose sensors, or data products, represent a set of singular data values dependent only upon time and position (e.g., a housekeeping temperature monitor). A vector instrument is one whose sensors, or data products, represent multivalued (1-D) data sets that have known functional dependencies other than time or position (e.g., a particle spectrometer that returns counts as a function of energy). The length of the 1-D array is dictated by the IDFS data source. IDFS can accommodate one of these functional dependencies through a variable called the scan variable.
Question: Can the sensor have a functional dependency on multiple variables?
Answer: Yes, the IDFS paradigm can handle data that has a dependency on charge, mass, phi angle, theta angle, and/or a scan variable (such as frequency or energy).
Question: What is an IDFS data set comprised of?
Answer: An IDFS data set is composed of three files: a data file, a header file, and a Virtual Instrument Description File (VIDF). The data file contains the most rapidly varying data, returned in raw, unprocessed binary form. The header file contains data which, for the most part, is slowly varying in time and need not be repeated every data record. Information such as the year and day-of-year time components, the sensors being returned, and various timing elements are examples of information contained in the header file. The VIDF file is a complete description of the virtual instrument. Its purpose is to provide a description of the measurements being stored in the IDFS data and header files and to interface with the IDFS data access software in order to extract the data from the IDFS files and to convert the data into physical units.
Question: What is the distinction between putting calibration data within the data record or placing the corrections within the VIDF file through table corrections?
Answer: The VIDF is meant to be easily updated and to contain all of the data which may be periodically updated, due either to refinements in the instrument calibration or to degradation within the instrument. Since data is typically stored as raw data, the conversion to physical units is done as the data is accessed; thus any changes to calibration factors can be made and instantaneously applied.
If calibration data varies at the rate of the sensor data, then this calibration data should be written in the data files and not in the VIDF. If too many VIDF files are written, the user will find that it will become a logistics nightmare trying to keep track of all the VIDF files. Thus, frequently varying calibration data should be included in the data file.
Question: How are angular measurements defined with the IDFS paradigm?
Answer: Within the IDFS formalism, any virtual instrument may be assumed to be embedded within an inertial spherical coordinate system. This is a standard spherical coordinate system with the azimuthal angle measured in the XY plane, 0 degrees being along the positive X axis and increasing with rotation into the positive Y axis. The polar angle is measured from the Z-axis, with 0 degrees being along the positive Z-axis. The azimuthal angle varies between 0 and 360 degrees and the polar angle varies between 0 and 180 degrees.
If the virtual instrument is in a non-rotating environment, then the axes of the inertial coordinate system can be defined to lie along any preferred direction. If, however, the instrument is within a rotating environment, then the inertial Z axis lies along the angular momentum vector (positive Z pointing in the positive vector direction) and the positive X axis points along a direction which can be identified within the telemetry stream. This direction is located by a device given the generic name of the 0 degree indicator. This could be anything from a sun sensor or star tracker on a rotating satellite to a variable potentiometer on a turntable that rotates a set of instruments.
In this coordinate system, all of the angular variation occurs in the azimuthal angles while polar angles are constant in time. The IDFS contains information that allows the azimuthal angular separation of the 0 degree indicator from the inertial X-axis to be determined for any time T within a data record. The details of this are laid out in the description of the sun_sen field within the write-up for the IDFS data record. By knowing the azimuthal offsets of the virtual instrument sensors from the 0 degree indicator and the polar angles of the sensors in the inertial coordinate system, the absolute sensor orientation can be obtained.
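This geometry can be sketched directly from the spherical-coordinate convention described above. The code below is illustrative only (the function names are not part of the IDFS library): the absolute azimuth of a sensor is its offset added to the indicator azimuth, and the look direction follows from the standard azimuth/polar conversion.

```python
import math

def absolute_azimuth(indicator_deg, sensor_offset_deg):
    """Absolute azimuth of a sensor: the 0 degree indicator azimuth
    plus the sensor's known azimuthal offset, wrapped to 0..360."""
    return (indicator_deg + sensor_offset_deg) % 360.0

def look_direction(azimuth_deg, polar_deg):
    """Unit look-direction vector in the inertial frame.

    azimuth_deg: measured in the XY plane from +X toward +Y (0..360).
    polar_deg:   measured from the +Z axis (0..180).
    """
    az = math.radians(azimuth_deg)
    pol = math.radians(polar_deg)
    return (math.sin(pol) * math.cos(az),
            math.sin(pol) * math.sin(az),
            math.cos(pol))
```

For example, a sensor at polar angle 90 degrees and azimuth 90 degrees looks along the +Y axis.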
Question: What is meant by a "sweep" of data?
Answer: A "sweep" is the collection of data elements returned by a vector instrument for a single data acquisition; it can be thought of as a collection of data values that are grouped together for some common reason. The words "sweep" and "scan" are often used interchangeably in IDFS documentation and mean the same thing.
Question: Does an IDFS data set need to be created for each possible physical unit that the raw telemetry can be converted to?
Answer: No, a single IDFS data set can be defined which holds the raw telemetry values from which all physical units can be derived. IDFS has a built-in algorithmic capability that performs a real-time conversion of telemetry data into physical units as the data is accessed. This approach allows for the refinement of calibration factors and processing algorithms without having to reprocess the original data set, and it avoids the storage of the same data in many different unit values. IDFS stores the procedures for unit conversion rather than the actual converted values.
Question: How are data items converted from raw telemetry into physical units?
Answer: Data is converted from raw telemetry into physical units by the application of tables using specified table operators in a specified order. The IDFS keeps the decompression and scaling algorithms for each data set within the VIDF as a set of ASCII values in ASCII tables. Thus, one can easily update the scaling parameters within the VIDF by changing the VIDF as necessary. This process ensures that any subsequent access of the data will be correctly converted to the set of physical units requested without the need to reprocess the entire data set. For ease of use, the procedure for applying tables to achieve different units is stored in the PIDF file.
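The idea of applying tables with specified operators in a specified order can be sketched as below. The table contents, operator names, and values here are purely illustrative assumptions, not the actual VIDF table set or its operator syntax.

```python
def apply_tables(raw, operations):
    """Convert one raw telemetry value to physical units by applying
    tables, with their operators, in the specified order."""
    value = raw
    for op, table in operations:
        if op == "lookup":        # table indexed by the integer value
            value = table[int(value)]
        elif op == "multiply":    # scale factor (a scalar here)
            value = value * table
        elif op == "add":         # additive offset
            value = value + table
        else:
            raise ValueError(f"unknown operator {op!r}")
    return value

# hypothetical example: decompress an 8-bit compressed count with a
# lookup table, then scale the result into physical units
decompress = [n * n for n in range(256)]
physical = apply_tables(7, [("lookup", decompress), ("multiply", 0.5)])
```

Because only the raw value and the tables are stored, editing a table in the VIDF changes the physical units produced on every subsequent access, without reprocessing the data files.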
Question: Is there an order to the definition of tables within the VIDF file?
Answer: For tables that are dependent upon sensor, calibration, scan or data quality data, there is no predefined order that needs to be followed for definition within the VIDF file. However, all ASCII and mode-dependent tables must be defined after all other tables have been defined.
Question: Is there a limit to the number of IDFS data sets that can be accessed at a single time?
Answer: The maximum number of files that can be opened at one time is a system dependent value. For example, with SunOS, the maximum number of open file descriptors is set at 256. This value can be modified; however, system performance may be degraded as this value is increased. Remember that for each IDFS source requested, three files are opened (data, header, VIDF).
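A back-of-the-envelope sketch of this bookkeeping follows; the reserved-descriptor count is an assumption (a process always holds a few descriptors for stdin, stdout, and so on), not an IDFS-specified number.

```python
def max_idfs_sources(fd_limit, reserved=8):
    """Rough upper bound on simultaneously open IDFS sources.

    Each IDFS source opens three files (data, header, VIDF);
    'reserved' approximates descriptors the process uses elsewhere.
    """
    return (fd_limit - reserved) // 3

# e.g. with the SunOS default of 256 open file descriptors:
limit = max_idfs_sources(256)
```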
Question: What type of data is stored within an IDFS data set?
Answer: There are several data sources that may be obtained from an IDFS data set. In addition to the primary data (sensor data), there may be several secondary data sources attached to it which may be accessed in parallel which include:
A different category of data within the IDFS paradigm refers to variables that describe the state of the instrument. This data is referred to as mode data, and the information about these states is generally needed in the processing or use of the primary data. Unlike the primary and calibration data that reside in the IDFS data record, mode data is found in the IDFS header record. Mode data is not sensor-specific; that is, mode data is associated with all of the sensors within a virtual instrument.
Question: What is meant by the resolution of the instrument?
Answer: The "resolution of the instrument" is meant to refer to the maximum temporal resolution allowed by the selected data set.
Question: What is the difference between the time element "data_accum", "data_lat", and "swp_reset" as found within an IDFS header record?
Answer: Data accumulation (data_accum) is defined as the time interval over which the acquisition of a single datum occurs. Data latency (data_lat) is defined as the dead time between successive data acquisitions. Taken together, the summation of the two values defines the total time between successive accumulations. Sweep reset (swp_reset) is pertinent only to vector instruments and is defined as the dead time between successive columns of data, which is equivalent to any data latency which exists in going from the last step in one sweep back to the initial step in the next sweep.
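One plausible reading of these definitions can be sketched as follows; the function names are illustrative, not part of the IDFS library.

```python
def acquisition_period(data_accum, data_lat):
    """Total time between the starts of successive accumulations:
    the accumulation interval plus the dead time that follows it."""
    return data_accum + data_lat

def sweep_period(data_accum, data_lat, swp_len, swp_reset):
    """Time from the start of one sweep to the start of the next for a
    vector instrument: swp_len accumulations, a data latency between
    steps within the sweep, and the sweep-reset dead time (rather than
    an ordinary latency) after the final step."""
    return swp_len * data_accum + (swp_len - 1) * data_lat + swp_reset

# e.g. 4 sweep steps of 2 s each, 1 s latency between steps,
# 5 s to reset from the last step back to the first:
period = sweep_period(2.0, 1.0, 4, 5.0)
```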
Question: What is meant by real-time vs. playback production?
Answer: Real-time production refers to the situation where the data stream is being converted into IDFS data sets as the data is being captured (telemetered down to the ground). Playback production refers to the situation where the data stream is converted into IDFS data sets after the data has been captured and assembled into some pre-defined format. Another way to say this is that playback production is the production of IDFS files from an input data file which already exists, whereas real-time production is the production of IDFS files from an input data file which is simultaneously being created.
Question: What is meant by a sensor set?
Answer: A sensor set is defined as a group of 2-D matrix pairs, where the first matrix contains the primary sensor data and the second matrix contains the calibration data, if applicable. Although the sensor set is thought of as a 2-D matrix, the data is actually laid down as a sequential collection of linear 1-D arrays, column by column, with the primary data being laid down first, followed by any calibration data. In essence, a sensor set defines all of the data returned from a group of sensors over a specified time period.
The primary data is a 2-D matrix in which each column represents the data from a single sensor (vector or scalar) and the rows represent the number of measurements being returned by each sensor. For vector sensors, the rows of the matrix represent the individual elements within the sweep of data. For scalar sensors, the data can be laid down such that only the one value is written per record or the data record can be "packed" such that consecutive measurements are written together, with each acquisition being represented by a row of data within the 2-D matrix.
The second 2-D matrix within a sensor set contains the calibration data defined for the instrument. Similar to the primary data matrix, where each column represents the data from a single sensor, each column of the secondary matrix contains the calibration data sets for a single sensor. The number of elements in the column is determined by the number of calibration sets defined and the number of elements per calibration set.
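The column-by-column layout can be sketched as follows. This is illustrative code only: real IDFS files hold raw binary data, not Python lists, and the function name is invented for the sketch.

```python
def flatten_sensor_set(primary, calibration):
    """Lay a sensor set down as a sequential collection of 1-D arrays,
    column by column: primary data first, then any calibration data.

    primary / calibration: lists of columns, one column per sensor.
    """
    stream = []
    for column in primary:       # one column of sweep data per sensor
        stream.extend(column)
    for column in calibration:   # one column of calibration sets per sensor
        stream.extend(column)
    return stream

# two vector sensors with 3 sweep steps each, plus one
# 2-element calibration set per sensor
stream = flatten_sensor_set([[1, 2, 3], [4, 5, 6]],
                            [[10, 11], [12, 13]])
```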
Question: Does a new header record have to be written for every data record generated?
Answer: No, each sensor set within a data record contains an offset that identifies the header record that is to be utilized to access the data contained within the data matrix. Sensor sets that return the same sensors and return data for periods of time in which the virtual instrument was in the same state can have identical offsets into the header file. That is, a given header record can be used by multiple sensor sets as its description of both the instrument state and to give the layout of its data matrices. Typically, the creation of a new header record is triggered by one of the following events:
Question: What is the significance of having multiple VIDF files defined for an IDFS data set?
Answer: If data within the VIDF file changes with time, for example calibration coefficients are modified, additional VIDF files can be defined. Each VIDF file designates the time period over which the file is applicable to the IDFS data set. By having multiple VIDF files, each designating a unique time period over which the file is valid, a history or log of data definitions can be maintained. Caution: some parameters defined in the VIDF file are not changeable, since changing these definitions is tantamount to defining a new virtual data set.
Question: What cases drive the necessity to split your data set into separate virtual instruments as opposed to making multiple VIDFS for the same virtual instrument?
Answer: If the data file, header file and VIDF file all terminate at the same time boundaries, there is no need to split your data set into separate virtual instruments. All changes can be handled through the creation of multiple VIDF files to define the state of the instrument at each of the designated time segments.
If the crossover to a new VIDF file will happen in the middle of a data file, then you must examine the information that is changing to determine if multiple VIDFs can be used to describe the new state of the instrument. There are some defined parameters within the VIDF that cannot change from VIDF to VIDF, even though the data set may require that change. Information that can change between multiple VIDF files includes:
|VIDF Parameters that cannot be modified if multiple VIDF files pertain to a single data file|
Question: If my instrument is command-able such that a subset of data values is returned, what should I set the SWP_LEN value to in the VIDF file?
Answer: For a scalar instrument, SWP_LEN should always be set to one. For a vector instrument, the SWP_LEN value represents the maximum number of elements that can be defined for any sensor. For example, if an instrument always returns 32 elements whose indices range from 0 through 63, the SWP_LEN value should be set to 64, not 32, since the largest possible index value is 63, which represents the largest of 64 possible elements.
Question: What is the connection between the SEN field in the VIDF file and the N_SEN field in the header record?
Answer: The SEN field in the VIDF file is defined as the maximum number of sensors which will be defined in the IDFS data set. However, not all of the sensors that are defined may be returned. Perhaps a subset of the sensors is currently being returned, or there is a rotation amongst the sensors that are being returned. To reflect this situation, the N_SEN field in the header record is meant to serve as the indication as to the actual number of sensors being returned and the SENSOR_INDEX field in the header record specifies the sensor numbers for those being returned.
Question: What coordinate system should the magnetometer data be in with respect to particle data when pitch angle computations are to be derived?
Answer: In order to use the orthogonal measurements of a magnetometer along with particle data, they must be in the same coordinate system. Therefore, for pitch angle computations, the magnetic field is assumed to be given in the same coordinate system as the unit normal, and the pitch angle computations are made accordingly.
Question: What is the definition used for determining Pitch Angle in IDFS?
Answer: The pitch angle is defined in terms of the sensor look direction and the magnetometer measurements. The magnetometer determines the magnetic field components, and thus the magnetic field direction. The vector dot product is formed from the magnetic field direction and the negative of the sensor look direction (the sensor look direction being the outward-pointing normal to the sensor). This gives the angle between the particle velocity vector and the direction of the magnetic field, which is the usual definition of the pitch angle.
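This definition reduces to a dot product between the field direction and the negated look direction. The following is an illustrative sketch, not the IDFS implementation.

```python
import math

def pitch_angle(b, look):
    """Pitch angle in degrees: the angle between the magnetic field
    direction and the particle velocity direction, taken as the
    negative of the sensor look direction (the look direction being
    the outward-pointing normal to the sensor)."""
    v = [-c for c in look]                       # particle velocity direction
    dot = sum(bi * vi for bi, vi in zip(b, v))
    bmag = math.sqrt(sum(c * c for c in b))
    vmag = math.sqrt(sum(c * c for c in v))
    return math.degrees(math.acos(dot / (bmag * vmag)))
```

For example, a particle arriving anti-parallel to the field (sensor looking along the field) has a pitch angle of 180 degrees; one arriving along the field has a pitch angle of 0.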
Question: What time granularity can IDFS support?
Answer: The IDFS paradigm can easily accommodate any data set with a time granularity greater than or equal to one millisecond. There is no problem supporting granularity of years. With some thought and planning in the way data is packed within a data record, data granularity can be expanded to handle nanosecond samples, but no finer.
IDFS Data Access Questions
Question: What is meant by a "data key"?
Answer: In order to uniquely identify data sets that exist in IDFS format, each branch of the IDFS lineage is assigned a unique number. The combination of these values for each of the 5 branches in the lineage results in a unique value called a data key, by which the data set can be identified. The lineage is Project, Mission, Experiment, Instrument, Virtual Instrument (for example, NOAA, NOAA-12, SEM, MEPED, MPSE).
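The actual encoding of the data key is not specified here; the sketch below merely illustrates the idea of packing the five lineage identifiers into one unique value, with an assumed 16-bit field per level (both the field width and the function name are inventions for this example).

```python
FIELD_BITS = 16  # assumed width per lineage level, for illustration only

def make_data_key(project, mission, experiment, instrument, virtual):
    """Pack the five lineage numbers into a single unique value by
    concatenating fixed-width bit fields, most significant first."""
    key = 0
    for part in (project, mission, experiment, instrument, virtual):
        if not 0 <= part < (1 << FIELD_BITS):
            raise ValueError("lineage id out of range")
        key = (key << FIELD_BITS) | part
    return key
```

Because each level occupies its own field, two data sets differing in any single lineage number always produce different keys.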
Question: How much data is returned by a single call to the IDFS data access routine read_drec()?
Answer: The IDFS data access routine read_drec() returns not only the data for the requested data parameter, but also most of the pertinent ancillary data concerning the state of the instrument including time, instrument status (mode) values, applicable correction and calibration data, scan values, azimuthal and pitch angle values where applicable.
Question: What is the purpose of the "version" parameter that is passed to most of the IDFS data access routines?
Answer: The purpose of the version parameter is to allow multiple file openings of the same IDFS data set within a single program or session. Once an IDFS data set has been opened, a set of file descriptors is assigned to the data set in question. The normal mode of operation for IDFS data processing is to process the data one record at a time, in time sequential order. If there is a need to search the same IDFS data set in multiple accesses, the use of the version parameter must come into play. For each unique combination of data key, extension and version parameters, a set of file descriptors is assigned and all file manipulations performed by the IDFS data access software will be independent of other file descriptors that may be accessing the same data set. As a general rule, the retrieval of multiple data parameters from a single IDFS data source does not constitute the need for multiple version numbers; a single version number will suffice. In addition, the simultaneous retrieval of data parameters from many different IDFS data sets does not warrant the need for different version numbers.
Question: How can I retrieve multiple data parameters from the same sensor set within a single IDFS data record?
Answer: Data from an IDFS data set is acquired through a call to the IDFS data access routine called read_drec(). One of the parameters to this routine, referred to as the "fwd" flag, controls when the data pointer is advanced to the next set of data values (sensor set). By keeping the pointer at the same set of data values, repeated calls for different sensors can be made, ensuring that all of the data returned are contained within the same sensor set. As a general rule, the fwd flag is set to advance when the last sensor from the IDFS data set is being requested. For example, if sensors 0 through 5 are to be retrieved, the fwd flag is set to advance when data for the last sensor, sensor 5, is being retrieved.
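The class below is a toy stand-in for read_drec() (whose real C signature is not shown in this FAQ), illustrating only how the fwd flag controls advancement of the data pointer to the next sensor set.

```python
class SensorSetReader:
    """Illustrative model of read_drec()'s fwd behaviour: the data
    pointer advances to the next sensor set only when fwd is set."""

    def __init__(self, sensor_sets):
        self._sets = sensor_sets   # list of dicts: sensor number -> data
        self._pos = 0

    def read_drec(self, sensor, fwd):
        data = self._sets[self._pos][sensor]
        if fwd:                    # advance only on request
            self._pos += 1
        return data

reader = SensorSetReader([{0: "a0", 1: "a1"},
                          {0: "b0", 1: "b1"}])
# retrieve sensors 0 and 1 from the same sensor set by setting fwd
# only on the last sensor requested
first = [reader.read_drec(s, fwd=(s == 1)) for s in (0, 1)]
second = [reader.read_drec(s, fwd=(s == 1)) for s in (0, 1)]
```

Had fwd been set on every call, the two sensors would have come from different sensor sets.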
Question: What is the difference between the status codes LOS_STATUS and NEXT_FILE_STATUS as returned by the read_drec() module?
Answer: The distinction for these two codes comes into play when dealing with real-time acquisition of the data to be placed into IDFS format. The code LOS_STATUS is interpreted to mean that the LOS (loss of signal) code has been received and thus, no more data is being telemetered down until the AOS (acquisition of signal) code is again received. The NEXT_FILE_STATUS code is used to represent the situation where the IDFS data/header files had to be closed prior to the end of the current acquisition period. A new set of IDFS data/header files was created to continue with the transformation of the data stream into an IDFS data set.
Question: What type of values are returned for the azimuthal angle information?
Answer: There are two azimuthal angle values returned in conjunction with the requested data item. These two values are referred to as the start azimuthal angle and the stop azimuthal angle. The start azimuthal angle is given for the selected sensor and refers to the azimuth location at the beginning of the accumulation period for the sensor. The stop azimuthal angle is given for the selected sensor and refers to the azimuth location at the end of the accumulation period for the sensor.
The start azimuthal angle values are always returned as values between 0 and 360 degrees. However, the stop azimuthal angle values could be negative (if the instrument is spinning in a negative direction) or could be greater than 360 degrees. The stop azimuthal angle values are computed by adding the degrees covered by the accumulation time of each sample to the start azimuthal angle values.
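That computation can be sketched as below; the function name and the spin-rate parameterization are illustrative assumptions, not IDFS fields.

```python
def stop_azimuth(start_deg, spin_rate_deg_per_s, accum_s):
    """Stop azimuth: the start azimuth plus the degrees covered during
    the accumulation time. Unlike the start azimuth (always 0..360),
    this value may be negative (negative spin direction) or greater
    than 360 degrees."""
    return start_deg + spin_rate_deg_per_s * accum_s

wrapped = stop_azimuth(350.0, 20.0, 1.0)    # exceeds 360 degrees
negative = stop_azimuth(10.0, -20.0, 1.0)   # negative spin direction
```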
Question: What is the range of possible values that can be accommodated within the IDFS paradigm?
Answer: As new data sets have been converted into the IDFS format, the numeric constants that represent the range of data values that can be accommodated have changed. As such, it was decided to refer to mnemonics instead of actual values when talking about valid ranges of data values. These two mnemonics, VALID_MIN and VALID_MAX, represent the smallest and largest data value that is recognized by the IDFS data access routines, respectively. As of February 1, 2000, VALID_MIN is set to -3.0e38 and VALID_MAX is set to 3.0e38.
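A minimal sketch of a range check against these mnemonics (the function name is illustrative):

```python
# values as of February 1, 2000
VALID_MIN = -3.0e38
VALID_MAX = 3.0e38

def is_valid(value):
    """True if the value lies within the range recognized by the IDFS
    data access routines."""
    return VALID_MIN <= value <= VALID_MAX
```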
Question: What is an SCF?
Answer: The SCF (Science Computation Formulation) is a mechanism that provides for the creation of new data products from an existing primary data set (IDFS) through algorithmic manipulation. In some cases, these derived products may be dependent upon values returned from a single instrument; in other cases, the derived products are dependent upon values taken from many instruments. For a more in-depth explanation of the SCF system, the user is referred to the paper entitled "The Science Computation Formulation System".
Question: Can an SCF be utilized during real-time and playback scenarios?
Answer: The SCF software does not support real-time processing of data. In the real-time scenario, the header and data files are incomplete and it is possible to attempt to read from either file prior to the data being received. Therefore, the values for the input variables may not be attainable when the algorithm is being executed and thus, the algorithm cannot be executed correctly. In the playback scenario, the data is always available provided data was collected at the time period being processed.
Question: Is there a maximum length imposed on the name of the SCF file?
Answer: The name of the SCF file, which must include the full path, must be less than 512 characters in length.
Question: Why does the user need to specify a time period when accessing data from an SCF file?
Answer: The SCF definition (algorithm) is time-independent; that is, the algorithm can be applied to data taken at any time. However, the input sources that drive the computations are IDFS data products, which are dependent on the time range specified. Therefore, in order to access the IDFS data set, a time period must be provided and the computations are performed using the specified input sources during the requested time interval.
Question: What type of data products are returned by an SCF?
Answer: The data quantities that are returned by an SCF are defined by the creator of the SCF. The data that is returned does not have to be homogeneous; that is, the data can be a mixture of scalar and multi-dimensional output quantities such as matrices and tensors.
Question: Can I create an SCF?
Answer: Yes, anyone can create an SCF. The mechanics of creating an SCF are straightforward; however, creating one requires intimate knowledge of the instrument. It is the instrument operational issues that make deciding how to generate the SCF, so that it produces what the creator desires, a difficult undertaking. For these reasons, the creation of an SCF is not a task to be undertaken lightly. It is easy to generate data using the SCF; it is hard to make that data represent what you want it to represent. When the user is ready to create an SCF, the SCF Editor application can be utilized.
Question: I have an SCF that generates a data product. I want to use this data product in another SCF. How do I connect the two SCFs?
Answer: Currently, it is not possible to "link" separate SCFs together. The SCF paradigm requires that an SCF be self-contained; that is, all relevant information must be included within the SCF defined. Therefore, in order to access another SCF, you must merge the two SCFs into one single SCF.
Question: I have an SCF that I have been using to adjust data to correct for instrumental effects beyond what the IDFS can achieve. I would like to share this SCF with everyone else who can access my data. How do I do this?
Answer: In order to make SCF-generated products visible as data products that can be returned by the data set in question, the PIDF file for the virtual instrument must be amended to define these additional data products. If a user is simply experimenting with SCF generation, a "private" copy of the PIDF file can be placed into the directory ~/SCF. Here, the user is free to manipulate and change the PIDF file without interfering with other users that are accessing the same IDFS data set. Once the user is ready to make the newly defined SCF data products publicly available, the user must work with the database administrator for the IDFS data set to make sure that the modified PIDF file is copied to the area from which the data is promoted and to ensure that the modified PIDF file is re-archived. Currently, the means by which SCF files are shared is by sharing "private" copies. That is, the author of the SCF must provide a copy of the SCF file to another user (e.g., e-mail, ftp) and the user must place the SCF file in their ~/SCF directory. Sometime in the future, a mechanism will be in place for the promotion of SCF files that can be made publicly available.
Question: I have a magnetometer whose axes are not quite orthogonal. I have defined a calibration matrix to correct for off axis terms, but it requires knowledge of the other sensors to correct the sensor that I am interested in. How do I configure the VIDF to automatically perform this operation?
Answer: You cannot use the VIDF for this operation. The VIDF maintains that each sensor is a separate entity; that is, you cannot use data from other sensors to correct the sensor data you want. However, you can do this with the SCF. You can structure the SCF such that it reads all of the sensor data at once, computes new "corrected" values, and then outputs these newly generated values.
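The correction the question describes amounts to a matrix-vector multiply across all three sensors at once, which is exactly why it needs the SCF rather than the per-sensor VIDF tables. The sketch below is illustrative Python, not SCF syntax; the matrix values are invented.

```python
def correct_axes(raw, correction):
    """Apply a 3x3 off-axis correction matrix to one magnetometer
    measurement. Each corrected component mixes all three raw sensor
    values, so no single-sensor (VIDF) operation can produce it."""
    return [sum(correction[i][j] * raw[j] for j in range(3))
            for i in range(3)]

# hypothetical matrix with a small cross-axis term between X and Y
correction = [[1.0, 0.1, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]]
corrected = correct_axes([1.0, 10.0, 0.0], correction)
```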
Question: Must all the input variables to an SCF come from the same IDFS data set?
Answer: Within the SCF, the creator defines the algorithm used to generate new data products and the inputs of data to the SCF are not restricted to data parameters from within the same virtual instrument. The creator of the SCF is free to utilize data parameters from any number of different virtual instruments. The only restriction placed upon the creator of the SCF is that all input variables must come from an IDFS data source. Any other values, such as constants, must be defined as temporary variables within the SCF.
Question: How do I configure the definitions of a VIDF in order to combine different sensors that are within a given virtual instrument? For example, I have two sensors, one sensor is the high byte of a value and the other sensor is the lower byte of a value. I would like to combine these two sensors together for my interpretation.
Answer: If the IDFS data set has already been produced, you cannot do this within a VIDF; it requires the use of an SCF. Sensors defined within the VIDF are independent quantities and are manipulated separately. You can use the SCF to create a new data product, where the algorithm reads the two sensors, rearranges the bit and byte structures as necessary, and outputs the combined value. With this scheme, the creator of the SCF has to know enough about the data to combine the sensors correctly and produce the desired quantity.
However, if the creator knows ahead of time that this combination of values will be needed, they can construct the IDFS data set such that all three values are possible. For example, if the two sensors are 8-bit quantities, the IDFS can be defined such that all of its sensors are defined to be 16-bit values. Thus, the two 8-bit sensors are merged together and represented as one 16-bit value. The creator can make use of the IDFS shift and mask operators in order to retrieve the two individual 8-bit sensor values, while at the same time, return the 16-bit composite value.
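The shift-and-mask arithmetic behind both directions of this scheme can be sketched as follows (illustrative Python, not the IDFS operators themselves):

```python
def combine_bytes(high, low):
    """Merge two 8-bit sensor values into one 16-bit composite value."""
    return ((high & 0xFF) << 8) | (low & 0xFF)

def split_word(word):
    """Recover the two 8-bit sensor values from the 16-bit composite
    using shift and mask, as the IDFS shift and mask operators would."""
    return (word >> 8) & 0xFF, word & 0xFF
```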
Question: I have a program that has been developed over many years and is quite complicated. While looking at the available parameters from VIDFs, I can see that what I require as input values for my program are stored in IDFS. How do I access this data with my program? (The SCF is way too simple to describe my code)
Answer: There are three ways to access IDFS data. The first two methods involve extracting data from the IDFS world. These two possibilities are: (1) use the exportIDFS program to generate files of the data you require in the prescribed format and (2) use the SCF PRINT function to create ASCII output in the format you desire. Under both of these methods, data has been removed from inside the IDFS paradigm; therefore, you will need to write an access routine to read the data from the newly created files. These two methods will give you more control over the input data. If you wish to translate the output of your program back into the IDFS world, you will need to create a new IDFS data set for that output, with all that is required.
The third method takes your program into the IDFS world. The bridge here is the SCF. You can use the SCF to collect the data your program needs and pass it through a function (which you create) to your program. This user-defined SCF function acts as a translator between your code and the SCF. A user-defined SCF function is not very easy to create since it has to be unique for each program, but once generated, it can be used in multiple SCFs and used multiple times within an SCF. The advantage here is that output from your program can be passed back to the SCF and into the IDFS world. Once in the IDFS world, you can compare the output data to any IDFS data set directly. The user is referred to "The Science Computation Formulation System" for the description of creating a user-defined SCF function.