Mapping YAML File

The YAML mapping tells the BUFR component which fields to read from the BUFR file and how to encode those fields into an IODA ObsGroup object. To do that it defines two sections: bufr and ioda. The content of these sections (described below) can be thought of as descriptions of what will be read and how it will be encoded.

Note

Please see the test/testinput directory for examples.

BUFR Description

This section describes the BUFR parameters that will be read from the BUFR file, and allows for some basic operations to read the data in a form that is useful. For example, it can group variables in order to “unroll” sections of the BUFR file data. You can split BUFR data into subgroups (categories) based on the value of a BUFR field (for example, you can categorize data based on satellite IDs). You can also filter data based on the value of a BUFR field.

Here is an example:

bufr:
  exports:
    group_by_variable: longitude  # Optional
    subsets:
      - NC004001
      - NC004002
      - NC004003
    variables:
      timestamp:
        datetime:
          year: "*/YEAR"
          month: "*/MNTH"
          day: "*/DAYS"
          hour: "*/HOUR"
          minute: "*/MINU"
          second: "*/SECO"  # default assumed zero if skipped or found as missing
          hoursFromUtc: 0  # Optional

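      # Alternatively, some BUFR data use an offset time relative to the model
      # analysis/cycle. In that case define timestamp like this instead
      # (a YAML mapping cannot contain the timestamp key twice):
      # timestamp:
      #   timeoffset:
      #     timeOffset: "*/PRSLEVEL/DRFTINFO/HRDR"
      #     transforms:
      #       - scale: 3600
      #     referenceTime: "2020-11-01T12:00:00Z"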

      satellite_id:
        query: "*/SAID"
        type: int64
      longitude:
        query: "*/CLON"
        transforms:
          - offset: -180
      latitude:
        query: "*/CLAT"
      channels:
        query: "[*/BRITCSTC/CHNM, */BRIT/CHNM]"
      radiance:
        query: "[*/BRITCSTC/TMBR, */BRIT/TMBR]"

    splits:
      satId:
        category:
          variable: satellite_id
          map:
            _3: sat_1  # can't use integers as keys
            _5: sat_2
            _8: sat_3

    filters:
      - bounding:
          variable: longitude
          upperBound: -68  # optional
          lowerBound: -86.3  # optional
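
The time handling and numeric transforms used in the example above can be sketched in Python. This is only an illustration, not the converter's implementation: the helper names (apply_transforms, datetime_to_epoch, timeoffset_to_epoch) are hypothetical, and it assumes that offset is additive, that scale is multiplicative, and that hoursFromUtc is 0.

```python
from datetime import datetime, timezone

# Reference epoch used for the stored time values.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def apply_transforms(value, transforms):
    """Apply offset/scale entries in list order (assumed semantics)."""
    for transform in transforms:
        if "offset" in transform:
            value = value + transform["offset"]  # assumption: additive
        if "scale" in transform:
            value = value * transform["scale"]   # assumption: multiplicative
    return value


def datetime_to_epoch(year, month, day, hour, minute, second=0):
    """Seconds since the epoch for a UTC date; second defaults to zero."""
    stamp = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return int((stamp - EPOCH).total_seconds())


def timeoffset_to_epoch(offset_seconds, reference_time):
    """Seconds since the epoch for reference_time plus an offset in seconds."""
    reference = datetime.strptime(reference_time, "%Y-%m-%dT%H:%M:%SZ")
    reference = reference.replace(tzinfo=timezone.utc)
    return int((reference - EPOCH).total_seconds() + offset_seconds)


# Longitude of 200 degrees shifted by the offset from the example.
lon = apply_transforms(200.0, [{"offset": -180}])

# An offset of 1.5 hours converted to seconds, then to an absolute time.
offset_s = apply_transforms(1.5, [{"scale": 3600}])
stamp = timeoffset_to_epoch(offset_s, "2020-11-01T12:00:00Z")
```

Both time forms end up as a single integer count of seconds, which is why the corresponding output variable carries units of seconds since 1970-01-01T00:00:00Z.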

The bufr section contains a section called exports, which defines the data to read from the BUFR file. It has the following sub-sections:

  • group_by_variable: (optional) String value that defines the name of the variable to group observations by. If this field is missing then observations will not be re-grouped.

  • subsets: (optional) List of subsets that you want to process. If the field is not present then all subsets will be processed in accordance with the query definitions.

  • variables: List of variables to read as key value pairs.

    • keys are arbitrary strings (anything you want). They can be referenced in the ioda section.

    • values (One of these types):

      • query: Query string used to get the data from the BUFR file. Optionally, a list of transforms can be applied to numeric (not string) data; the possible transforms are offset and scale. You can also manually override the type by specifying type as int, int64, float, or double.

      • datetime: Associates the key with a timestamp built from the mnemonics for year, month, day, hour, minute, (optional) second, and (optional) hoursFromUtc (must be an integer). If second is skipped or found missing, it is assumed to be zero. Internally, the value stored is the number of seconds elapsed since a reference epoch, currently set to 1970-01-01T00:00:00Z.

      • timeoffset: Associates the key with a timestamp computed from the timeOffset mnemonic, which should yield seconds relative to referenceTime, an ISO 8601 date and time string (e.g., 2020-11-01T11:42:56Z). If the timeOffset mnemonic holds a floating-point value in hours, use transforms with a scale of 3600 to convert it to seconds. Internally, the value stored is the number of seconds elapsed since a reference epoch, currently set to 1970-01-01T00:00:00Z.

  • (optional) splits List of key-value pairs (splits) that define how to split the data into subsets. Any number of splits can be applied. The categories of each split are combined into sets that describe all unique combinations of those categories. For example, splits with categories (“a”, “b”) and (“x”, “y”) are combined into four split categories: (“a”, “x”), (“a”, “y”), (“b”, “x”), (“b”, “y”).

    • keys are arbitrary strings (anything you want). They can be referenced in the ioda section.

    • values Type of split to apply (currently supports category)

      • category Splits data based on values associated with a BUFR mnemonic. Consists of:

        • variable The variable from the variables section to split on.

        • (optional) map Associates integer values in the BUFR mnemonic data with strings. Please note that integer keys must be prefixed with an _ (ex: _2). Rows where the mnemonic value is not defined in the map will be rejected (they won’t appear in the output).

  • (optional) filters List of filters to apply to the data before exporting. Filters exclude data which does not meet their requirements. The following filters are supported:

    • bounding

      • variable The variable from the variables section to filter on.

      • (optional) upperBound The highest possible value to accept

      • (optional) lowerBound The lowest possible value to accept

Note

Either upperBound, lowerBound, or both must be present.
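
The split and filter rules described above can be sketched in Python. This is a minimal illustration, not the converter's code: the function names are hypothetical, the bounds are assumed to be inclusive, and applying the filter before the split is an assumption made only for the example.

```python
from itertools import product


def split_combinations(*category_sets):
    """All unique combinations of categories across several splits."""
    return list(product(*category_sets))


def within_bounds(value, lower_bound=None, upper_bound=None):
    """Bounding filter; at least one bound is required (inclusive, assumed)."""
    if lower_bound is None and upper_bound is None:
        raise ValueError("either upperBound, lowerBound, or both must be present")
    if lower_bound is not None and value < lower_bound:
        return False
    if upper_bound is not None and value > upper_bound:
        return False
    return True


def split_by_category(rows, variable, mapping):
    """Group rows by mapped category; rows with unmapped values are rejected."""
    groups = {}
    for row in rows:
        # Integer values are looked up with the leading underscore used in YAML.
        name = mapping.get(f"_{row[variable]}")
        if name is None:
            continue
        groups.setdefault(name, []).append(row)
    return groups


rows = [
    {"satellite_id": 3, "longitude": -70.0},
    {"satellite_id": 4, "longitude": -80.0},   # id 4 is not in the map
    {"satellite_id": 5, "longitude": -60.0},   # above upperBound
    {"satellite_id": 8, "longitude": -86.3},
]
sat_map = {"_3": "sat_1", "_5": "sat_2", "_8": "sat_3"}

kept = [r for r in rows if within_bounds(r["longitude"], -86.3, -68)]
groups = split_by_category(kept, "satellite_id", sat_map)
combos = split_combinations(("a", "b"), ("x", "y"))
```

With these inputs the row for satellite 5 is removed by the filter, the row for satellite 4 is dropped because its value has no entry in the map, and the remaining rows land in the sat_1 and sat_3 groups.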

IODA Description

The ioda section defines the ObsGroup objects that will be created. Here is an example:

ioda:
  dimensions:
    - name: nchans
      paths:
        - "*/BRIT"
        - "*/BRITCSTC"
      source: variables/channels

  variables:
    - name: "MetaData/dateTime"
      source: "variables/timestamp"
      longName: "dateTime"
      units: "seconds since 1970-01-01T00:00:00Z"

    - name: "MetaData/latitude"
      source: "variables/latitude"
      longName: "Latitude"
      units: "degrees_north"
      range: [-90, 90]

    - name: "MetaData/longitude"
      source: "variables/longitude"
      longName: "Longitude"
      units: "degrees_east"
      range: [-180, 180]

    - name: "ObsValue/radiance"
      coordinates: "longitude latitude nchans"
      source: "variables/radiance"
      longName: "Radiance"
      units: "K"
      range: [120, 500]
      chunks: [1000, 15]
      compressionLevel: 4

  • dimensions used to define dimension information in variables

    • name arbitrary name for the dimension

    • paths List of subqueries for that dimension (use more than one path only when different BUFR subsets need different paths), or path Single subquery for that dimension (ex: */BRITCSTC)

    • source (optional) The exported data that acts as the source field for this dimension. The data dimension values (labels) will reflect this field. The source is validated to make sure it makes sense for the dimension and that it is made up of repeated values for each occurrence of the sequence. The source field must be inside the dimension and be 1:1 with it.

  • variables List of output variable objects to create.

    • name standardized pathname group/var_name.

      • group group name to which this variable belongs (example: MetaData or ObsValue).

      • var_name name for the variable

    • source reference to exported BUFR data defined in bufr section ex: variables/radiance

    • coordinates (optional)

    • longName any arbitrary string.

    • units string representing units (arbitrary but following udunits).

    • (optional) range Possible range of values (list of 2 ints).

    • (optional) chunks Size of chunked data elements ex: [1000, 1000].

    • (optional) compressionLevel GZip compression level (0-9).

Warning

  • MetaData/dateTime units must be “seconds since 1970-01-01T00:00:00Z”
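
The rules for entries in the ioda variables list can be sketched as a small checker. This helper (check_ioda_variable) is hypothetical and not part of the converter; it only encodes the constraints stated above, and it assumes that each entry is available as a Python dict and that the names exported by the bufr section are known.

```python
DATETIME_UNITS = "seconds since 1970-01-01T00:00:00Z"


def check_ioda_variable(entry, exported_names):
    """Return a list of problems found in one ioda variable entry."""
    problems = []

    # name must have the form group/var_name.
    name = entry.get("name", "")
    group, sep, var_name = name.partition("/")
    if not (sep and group and var_name):
        problems.append(f"name {name!r} is not of the form group/var_name")

    # source must reference a variable exported in the bufr section.
    prefix, _, key = entry.get("source", "").partition("/")
    if prefix != "variables" or key not in exported_names:
        problems.append(f"source {entry.get('source')!r} is not an exported variable")

    # The dateTime variable has fixed units.
    if name == "MetaData/dateTime" and entry.get("units") != DATETIME_UNITS:
        problems.append(f"MetaData/dateTime units must be {DATETIME_UNITS!r}")

    # Optional fields: range is a pair of values, compressionLevel is 0-9.
    value_range = entry.get("range")
    if value_range is not None and len(value_range) != 2:
        problems.append("range must be a list of 2 values")
    level = entry.get("compressionLevel")
    if level is not None and not 0 <= level <= 9:
        problems.append("compressionLevel must be between 0 and 9")

    return problems


exported = {"timestamp", "latitude", "longitude", "radiance"}
good = {
    "name": "ObsValue/radiance",
    "source": "variables/radiance",
    "units": "K",
    "range": [120, 500],
    "compressionLevel": 4,
}
bad = {"name": "MetaData/dateTime", "source": "variables/time", "units": "seconds"}

good_problems = check_ioda_variable(good, exported)
bad_problems = check_ioda_variable(bad, exported)
```

Here the first entry passes every check, while the second is reported twice: its source does not name an exported variable and its units differ from the required string.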