Mapping YAML File
The mapping YAML describes which BUFR fields to extract and how they are
written to an output file. It is divided into two top-level sections:
bufr and encoder. The bufr section controls which data are read
from the file, while the encoder section describes how these data are encoded.
Complete examples are available in test/testinput.
BUFR section
The bufr section controls how data are retrieved from the BUFR file. It
supports grouping repeated sequences, splitting the dataset into categories and
filtering rows before encoding. A trimmed example looks like:
bufr:
  group_by_variable: longitude  # Optional
  subsets:
    - NC004001
    - NC004002
    - NC004003
  variables:
    timestamp:
      datetime:
        year: "*/YEAR"
        month: "*/MNTH"
        day: "*/DAYS"
        hour: "*/HOUR"
        minute: "*/MINU"
        second: "*/SECO"  # Optional; assumed zero if omitted or found missing
        hoursFromUtc: 0  # Optional
    # Or, sometimes BUFR data use an offset time related to model analysis/cycle.
    # In that case define timestamp this way instead (a key may appear only once):
    # timestamp:
    #   timeoffset:
    #     timeOffset: "*/PRSLEVEL/DRFTINFO/HRDR"
    #     transforms:
    #       - scale: 3600
    #     referenceTime: "2020-11-01T12:00:00Z"
    satellite_id:
      query: "*/SAID"
      type: int64
    longitude:
      query: "*/CLON"
      transforms:
        - wrap: [ -180.0, 180.0 ]
    latitude:
      query: "*/CLAT"
    channels:
      query: "[*/BRITCSTC/CHNM, */BRIT/CHNM]"
    radiance:
      query: "[*/BRITCSTC/TMBR, */BRIT/TMBR]"
  splits:
    satId:
      category:
        variable: satellite_id
        map:
          _3: sat_1  # can't use integers as keys
          _5: sat_2
          _8: sat_3
  filters:
    - bounding:
        variable: longitude
        upperBound: -68  # optional
        lowerBound: -86.3  # optional
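To make the splits and filters entries in the example above more concrete, the following Python sketch mimics their effect on a handful of made-up rows. It is not the library's implementation: the function names, the row layout, the inclusive bounds, the handling of unmapped values and the order in which the two steps run are all assumptions made for illustration.

```python
# Hypothetical rows standing in for decoded BUFR observations.
rows = [
    {"satellite_id": 3, "longitude": -70.0},
    {"satellite_id": 5, "longitude": -90.0},  # outside the bounds used below
    {"satellite_id": 8, "longitude": -86.3},
    {"satellite_id": 3, "longitude": -68.0},
]


def bounding_filter(rows, variable, lower=None, upper=None):
    """Keep rows whose value lies in [lower, upper]; either bound may be omitted."""
    if lower is None and upper is None:
        raise ValueError("at least one of lower or upper must be supplied")
    kept = []
    for row in rows:
        value = row[variable]
        if lower is not None and value < lower:
            continue
        if upper is not None and value > upper:
            continue
        kept.append(row)
    return kept


def category_split(rows, variable, name_map):
    """Group rows by the mapped name of an integer-valued variable.

    Keys in name_map carry a leading underscore ("_3"), as in the YAML example.
    """
    lookup = {int(key.lstrip("_")): name for key, name in name_map.items()}
    groups = {}
    for row in rows:
        name = lookup.get(row[variable])
        if name is None:
            continue  # assumption: unmapped values are skipped in this sketch
        groups.setdefault(name, []).append(row)
    return groups


filtered = bounding_filter(rows, "longitude", lower=-86.3, upper=-68)
groups = category_split(
    filtered, "satellite_id", {"_3": "sat_1", "_5": "sat_2", "_8": "sat_3"}
)
```

With these sample rows the filter drops the row at longitude -90.0, so only the sat_1 and sat_3 categories receive data.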
bufr keys
group_by_variable (optional)
    Name of a variable used to group observations when expanding repeated sequences.
subsets (optional)
    List of subset names to read. When omitted, all subsets matching the queries are processed.
variables
    Mapping of arbitrary names to variable descriptions. These names are later referenced by the encoder section. A variable description can be one of the following:
    - query – direct query into the BUFR tree. Numeric results may apply offset, scale or wrap transforms and the type may be forced to int, int64, float or double.
    - datetime – combine mnemonics for year, month, day, hour and minute (and optionally second and hoursFromUtc) into an epoch time stored as seconds since 1970-01-01T00:00:00Z.
    - timeoffset – like datetime but the value is relative to a referenceTime. Transforms may be used to convert units.
    - specialised forms such as sensorScanAngle or remappedBrightnessTemperature used in some satellite mappings.
splits (optional)
    Splits divide the dataset into categories. The category split type partitions the data by the value of a variable and can map integer values to string names.
filters (optional)
    Filters remove rows prior to encoding. The bounding filter keeps rows whose values fall between lowerBound and upperBound (at least one of these bounds must be supplied).
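The arithmetic behind the datetime, timeoffset and wrap descriptions can be sketched in a few lines of Python. This is an illustration only, not the library's code: the function names are hypothetical, the offset is assumed to be added to referenceTime after scaling, wrap is assumed to be a modular shift into the half-open interval [lower, upper), and hoursFromUtc is not handled.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_to_epoch(year, month, day, hour, minute, second=0):
    """Combine date and time components into seconds since 1970-01-01T00:00:00Z."""
    moment = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return int((moment - EPOCH).total_seconds())


def timeoffset_to_epoch(offset, reference_time, scale=1.0):
    """Scale an offset into seconds and add it to an ISO 8601 reference time."""
    reference = datetime.fromisoformat(reference_time.replace("Z", "+00:00"))
    return int((reference - EPOCH).total_seconds() + offset * scale)


def wrap(value, lower, upper):
    """Shift value by whole multiples of (upper - lower) into [lower, upper)."""
    span = upper - lower
    return (value - lower) % span + lower


# An offset of 1.5 hours with scale 3600 relative to the example referenceTime.
obs_time = timeoffset_to_epoch(1.5, "2020-11-01T12:00:00Z", scale=3600)

# A longitude of 190 degrees east expressed in the [-180, 180) convention.
wrapped = wrap(190.0, -180.0, 180.0)
```

Under these assumptions a value exactly equal to the upper limit maps to the lower limit, so 180.0 becomes -180.0; the real transform may treat that edge differently.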
Encoder section
The encoder section describes how the exported data should be written. A
shortened example is shown below:
encoder:
  dimensions:
    - name: nchans
      paths:
        - "*/BRIT"
        - "*/BRITCSTC"
      source: variables/channels  # optional
      labels: "1-5, 8, 10-20"  # optional
  globals:
    - name: "platformCommonName"
      type: string
      value: "ATMS"
  variables:
    - name: "MetaData/dateTime"
      source: "variables/timestamp"
      longName: "dateTime"
      units: "seconds since 1970-01-01T00:00:00Z"
    - name: "MetaData/latitude"
      source: "variables/latitude"
      longName: "Latitude"
      units: "degrees_north"
      range: [-90, 90]
    - name: "MetaData/longitude"
      source: "variables/longitude"
      longName: "Longitude"
      units: "degrees_east"
      range: [-180, 180]
    - name: "ObsValue/radiance"
      coordinates: "longitude latitude nchans"
      source: "variables/radiance"
      longName: "Radiance"
      units: "K"
      range: [120, 500]
      chunks: [1000, 15]
      compressionLevel: 4
Encoder keys
dimensions (optional)
    List of named dimensions. Each entry contains:
    - name – dimension name.
    - paths or path – queries used to determine the dimension.
    - source (optional) – exported data used to label the dimension.
    - labels (optional) – manual list of labels. Use either source or labels.
variables
    List of variables to create in the output file. Each item includes:
    - name – path group/variable.
    - source – reference to a variable defined in the bufr section.
    - coordinates (optional) – names of coordinate variables.
    - longName (optional) – descriptive name.
    - units (optional) – units string.
    - range (optional) – valid range [min, max].
    - chunks (optional) – chunk sizes for chunked outputs.
    - compressionLevel (optional) – gzip level 0-9.
globals (optional)
    Global attributes to attach to the output file. Each definition provides name, type (string, int, float, intVector or floatVector) and value.
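The labels string in the example ("1-5, 8, 10-20") reads like a compact list of numbers and ranges. The sketch below shows one plausible way such a string could be expanded; the function name expand_labels and the inclusive-range interpretation are assumptions for illustration, not a description of how the encoder actually parses the value.

```python
def expand_labels(spec):
    """Expand a string such as "1-5, 8, 10-20" into a list of integers.

    Assumes comma-separated items, where "a-b" denotes every integer
    from a to b inclusive and a bare number denotes itself.
    """
    labels = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(piece) for piece in part.split("-", 1))
            labels.extend(range(start, end + 1))
        else:
            labels.append(int(part))
    return labels


channel_labels = expand_labels("1-5, 8, 10-20")
```

Read this way, the example string yields 17 labels, which would have to match the length of the nchans dimension.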
Warning
MetaData/dateTime must use the units "seconds since 1970-01-01T00:00:00Z".
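Two of the rules stated on this page lend themselves to a quick consistency check before running a conversion: every encoder source must refer to a variable defined in the bufr section, and MetaData/dateTime must carry the epoch units. The sketch below performs both checks on a mapping that has already been loaded into plain Python dictionaries. The function check_mapping and the sample data are hypothetical and not part of the tool.

```python
EPOCH_UNITS = "seconds since 1970-01-01T00:00:00Z"


def check_mapping(mapping):
    """Return a list of human-readable problems found in a loaded mapping."""
    problems = []
    defined = set(mapping["bufr"]["variables"])
    for var in mapping["encoder"]["variables"]:
        # Sources are written as "variables/<name>" in the examples above.
        prefix, _, key = var["source"].partition("/")
        if prefix != "variables" or key not in defined:
            problems.append(f"{var['name']}: unknown source {var['source']!r}")
        if var["name"] == "MetaData/dateTime" and var.get("units") != EPOCH_UNITS:
            problems.append(f"{var['name']}: units must be {EPOCH_UNITS!r}")
    return problems


# Hand-written stand-in for a parsed mapping file (illustrative data only).
mapping = {
    "bufr": {"variables": {"timestamp": {}, "latitude": {}}},
    "encoder": {
        "variables": [
            {
                "name": "MetaData/dateTime",
                "source": "variables/timestamp",
                "units": "seconds since 1970-01-01T00:00:00Z",
            },
            {"name": "MetaData/latitude", "source": "variables/latitude"},
        ]
    },
}

problems = check_mapping(mapping)
```

An empty list means both rules hold for the given mapping; anything else names the offending encoder variable.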