5. GDS Data Product File Structure and Attributes
5.1. Overview of the GDS-2.2r0 netCDF File Format
GDS-2.2r0 data files preferentially use the netCDF-4 Classic format. A major advantage of netCDF-4 format products is internal compression: reading a given variable or some attributes does not require explicitly decompressing the entire file.
These GDS-2.2r0 formatted data sets must comply with the Climate and Forecast (CF) Conventions, v1.7 or later [1], because these conventions provide a practical standard for storing oceanographic data in a robust, interoperable manner that is easy to preserve over the long term. The CF-compliant netCDF data format is flexible, self-describing, and has been adopted as a de facto standard for many operational and scientific oceanography systems. Both netCDF and CF are actively maintained, with significant discussion and input from the oceanographic community. The CF convention generalizes and extends the Cooperative Ocean/Atmosphere Research Data Service (COARDS) Convention [2] but relaxes the COARDS constraints on dimension order and specifies methods for reducing the size of datasets. The purpose of the CF Conventions is to require conforming datasets to contain sufficient metadata so that they are self-describing, in the sense that each variable in the file has an associated description of what it represents, physical units if appropriate, and that each value can be located in space (relative to earth-based coordinates) and time. In addition to the CF Conventions, GDS-2.2r0 formatted files follow some of the recommendations of the Unidata Attribute Convention for Dataset Discovery (ACDD) [3].
In the context of netCDF, a variable refers to data stored in the file as a
vector or as a multidimensional array. Each variable in a GHRSST netCDF file
consists of a 2-dimensional [i x j], 3-dimensional [i x j x k], or
4-dimensional [i x j x k x l] array of data. The dimensions of each
variable must be explicitly declared in the dimension section.
Within the netCDF file, global attributes are used to hold information that applies to the whole file, such as the data set title. Each individual variable must also have its own attributes, referred to as variable attributes. These variable attributes define, for example, an offset, scale factor, units, a descriptive version of the variable name, and a fill value, which is used to indicate array elements that do not contain valid data. Where applicable, SI units should be used and described by a character string, which is compatible with the Unidata UDUNITS-2 package [4].
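The offset and scale factor mentioned above follow the CF packing convention, in which a stored (packed) integer is converted to a geophysical value as packed × scale_factor + add_offset. The sketch below illustrates the round trip in plain Python. The numeric values (a scale factor of 0.01, an offset of 273.15, and a fill value of -32768) are illustrative assumptions, not values mandated by this section.

```python
# Illustrative attribute values (assumptions, not taken from the specification).
SCALE_FACTOR = 0.01
ADD_OFFSET = 273.15
FILL_VALUE = -32768  # marks array elements that do not contain valid data


def pack(value, scale_factor=SCALE_FACTOR, add_offset=ADD_OFFSET,
         fill_value=FILL_VALUE):
    """Convert a geophysical value to its stored integer; None maps to the fill value."""
    if value is None:
        return fill_value
    return int(round((value - add_offset) / scale_factor))


def unpack(packed, scale_factor=SCALE_FACTOR, add_offset=ADD_OFFSET,
           fill_value=FILL_VALUE):
    """Convert a stored integer back to a geophysical value; fill values map to None."""
    if packed == fill_value:
        return None
    return packed * scale_factor + add_offset


stored = pack(293.15)       # a temperature in kelvin
restored = unpack(stored)   # close to 293.15, within the packing precision
```

With these assumed values the packing precision is 0.01 units, so any value is recovered to within half of that after a round trip.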
All GHRSST GDS-2.2r0 files conform to this structure and share a common set of netCDF global attributes. These global attributes include those required by the CF Convention plus additional ones required by the GDS-2.2r0. The required set of global attributes is described in Section 5.2 and entities within the GHRSST R/GTS framework are free to add their own, as long as they do not contradict the GDS-2.2r0 and CF requirements.
Following the CF convention, each variable also has a set of variable attributes. The required variable attributes are described in Section 5.3. In a few cases, some of these variable attributes may not be relevant for certain variables or additional variable attributes may be required. In those cases, the variable descriptions in each of the L2P, L3, L4 product specifications will identify the differences and specify requirements for each product. As with the global attributes, entities within the GHRSST R/GTS framework are free to add their own variable attributes, as long as they do not contradict the GDS-2.2r0 and CF requirements.
While the exact volumes can vary, an average L2P file will use about 33 bytes per pixel, an L3 file about 28 bytes per pixel, and an L4 file about 8 bytes per pixel. The data type encodings for each variable are fixed except for the experimental fields, which are flexible and can be chosen by the GHRSST producer.
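As a rough illustration of these per-pixel figures, the sketch below estimates the data volume of a single file from its grid size. The grid dimensions are a hypothetical example, and the estimate ignores metadata overhead and any effect of internal compression.

```python
# Approximate average bytes per pixel quoted in the text above.
BYTES_PER_PIXEL = {"L2P": 33, "L3": 28, "L4": 8}


def estimate_size_bytes(level, n_rows, n_cols):
    """Rough data volume of one file, ignoring metadata and compression."""
    return BYTES_PER_PIXEL[level] * n_rows * n_cols


# Hypothetical global 0.05-degree L4 grid: 3600 rows x 7200 columns.
size = estimate_size_bytes("L4", 3600, 7200)
print(f"{size / 1e6:.1f} MB")
```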
5.2. Global Attributes
Table 5.1 below summarizes the global attributes that are mandatory for every GDS-2.2r0 netCDF data file. More details on the CF-mandated attributes (as indicated in the Source column) are available at: http://cfconventions.org/cf-conventions/cf-conventions.html#attribute-appendix and information on the ACDD recommendations is available at https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3.
Global Attribute Name |
Format |
Description |
Example |
Source |
|---|---|---|---|---|
|
string |
A comma-separated list of the conventions that are followed by the dataset. For files that follow this version of ACDD, include the string ‘ACDD-1.3’. (This attribute is defined in NUG 1.7.) |
|
ACDD 1.3 |
|
string |
A short phrase or sentence describing the dataset. In many discovery systems, the title will be displayed in the results list from a search, and therefore should be human readable and reasonable to display in a list of such names. This attribute is recommended by the NetCDF Users Guide (NUG) and the CF conventions. |
|
ACDD 1.1 |
|
string |
A paragraph describing the dataset, analogous to an abstract for a paper. |
|
ACDD 1.1 |
|
string |
Published or web-based references that describe the data or methods used to produce it. Recommend URIs (such as a URL or DOI) for papers or other references. This attribute is defined in the CF conventions. |
|
ACDD 1.3 |
|
string |
The name of the institution principally responsible for originating this data. This attribute is recommended by the CF convention. |
|
ACDD 1.1, ACDD 1.3 |
|
string |
Provides an audit trail for modifications to the original data. This attribute is also in the NetCDF Users Guide: ‘This is a character array with a line for each invocation of a program that has modified the dataset. Well-behaved generic netCDF applications should append a line containing: date, time of day, user name, program name and command arguments.’ To include a more complete description you can append a reference to an ISO Lineage entity; see NOAA EDM ISO Lineage guidance. |
|
ACDD 1.1 |
|
string |
Miscellaneous information about the data, not captured elsewhere. This attribute is defined in the CF Conventions. |
|
ACDD 1.1 |
|
string |
Provide the URL to a standard or specific license, enter “Freely Distributed” or “None”, or describe any restrictions to data access and distribution in free text. GHRSST data sets should be freely and openly available to comply with the R/GTS framework, with no restrictions. However, if a user should submit a simple registration via a web form, for example, the URL could be given here. |
|
ACDD 1.1 |
|
string |
An identifier for the data set, provided by and unique within its naming authority. The combination of the |
|
ACDD 1.1 |
|
string |
The organization that provides the initial id (see above) for the dataset. The naming authority should be uniquely specified by this attribute. We recommend using reverse-DNS naming for the naming authority; URIs are also acceptable. Fixed as “org.ghrsst” following ACDD convention. |
|
ACDD 1.1 |
|
string |
Version identifier of the data file or product as assigned by the data creator. For example, a new algorithm or methodology could result in a new product_version. It may be different than the file version used in the file naming convention (Section 4). |
|
ACDD 1.3 |
|
string |
A uuid (Universal Unique Identifier) is a 128-bit number used to uniquely identify some object or entity on the Internet. Depending on the specific mechanisms used, a uuid is either guaranteed to be different or is, at least, extremely likely to be different from any other uuid generated until 3400 A.D. See http://en.wikipedia.org/wiki/Universally_Unique_Identifier for more information and tools. |
|
GDS |
|
string |
GDS version used to create this data file. For example, “2.2”. |
GDS |
|
|
string |
Version of netCDF libraries used to create this file. For example, “4.1.1” |
GDS |
|
|
string |
The date on which this version of the data was created. (Modification of values implies a new version, hence this would be assigned the date of the most recent values modification.) Metadata changes are not considered when assigning the date_created. The ISO 8601:2004 extended date format is recommended. |
|
ACDD 1.1 |
|
string |
The date on which the data was last modified. Note that this applies just to the data, not the metadata. The ISO 8601:2004 extended date format is recommended. |
|
ACDD 1.1 |
|
string |
The date on which this data (including all modifications) was formally issued (i.e., made available to a wider audience). Note that these apply just to the data, not the metadata. The ISO 8601:2004 extended date format is recommended. |
|
ACDD 1.1 |
|
string |
The date on which the metadata was last modified. The ISO 8601:2004 extended date format is recommended. |
|
ACDD 1.3 |
|
integer |
A code value describing the quality of the data: |
GDS |
|
|
string |
A string describing the approximate resolution of the product. For example, “1.1km at nadir” |
GDS |
|
|
string |
Describes the time of the first data point in the data set. Use the ISO 8601:2004 date format, “yyyy-mm-ddThh:mm:ssZ”. The exact meaning of this attribute depends on the type of granule: |
|
ACDD 1.1 |
|
string |
Describes the time of the last data point in the data set. Use the ISO 8601:2004 date format, “yyyy-mm-ddThh:mm:ssZ”. The exact meaning of this attribute depends on the type of granule: |
|
ACDD 1.1 |
|
string |
Name of the contributing instrument(s) or sensor(s) used to create this data set or product. Indicate the controlled vocabulary used in |
|
ACDD 1.3 |
|
string |
Controlled vocabulary for the names used in the |
|
ACDD 1.3 |
|
string |
A URL that gives the location of more complete metadata. A persistent URL is recommended for this attribute. It is recommended to link to the product description in the GHRSST central catalogue. |
|
ACDD 1.3 |
|
string |
A comma-separated list of key words and/or phrases. Keywords may be common words or phrases, terms from a controlled vocabulary (GCMD is often used), or URIs for terms from a controlled vocabulary (see also |
|
ACDD 1.1 |
|
string |
If you are using a controlled vocabulary for the words/phrases in your |
|
ACDD 1.1 |
|
string |
The name and version of the controlled vocabulary from which variable standard names are taken. (Values for any |
CF Standard Name Table v27’. |
ACDD 1.1 |
|
float |
Describes a simple lower latitude limit; may be part of a 2- or 3-dimensional bounding region. |
|
ACDD 1.1 |
|
float |
Describes a simple upper latitude limit; may be part of a 2- or 3-dimensional bounding region. |
|
ACDD 1.1 |
|
string |
Units for the latitude axis described in |
|
ACDD 1.1 |
|
float |
Information about the targeted spacing of points in latitude. Recommend describing resolution as a number value followed by the units. Examples: 100 meters, 0.1 degree. For level 1 and 2 swath data this is an approximation of the pixel resolution. |
|
ACDD 1.1 |
|
float |
Describes a simple longitude limit; may be part of a 2- or 3-dimensional bounding region. |
|
ACDD 1.1 |
|
float |
Describes a simple longitude limit; may be part of a 2- or 3-dimensional bounding region. |
|
ACDD 1.1 |
|
string |
Units for the longitude axis described in |
|
ACDD 1.1 |
|
float |
Information about the targeted spacing of points in longitude. Recommend describing resolution as a number value followed by units. Examples: 100 meters, 0.1 degree. For level 1 and 2 swath data this is an approximation of the pixel resolution. |
|
ACDD 1.1 |
|
float |
Describes the numerically smaller vertical limit; may be part of a 2- or 3-dimensional bounding region. See |
|
ACDD 1.1 |
|
float |
Describes the numerically larger vertical limit; may be part of a 2- or 3-dimensional bounding region. See |
|
ACDD 1.1 |
|
float |
Information about the targeted vertical spacing of points. Example: 25 meters |
|
ACDD 1.1 |
|
string |
Units for the vertical axis described in |
|
ACDD 1.1 |
|
string |
One of ‘up’ or ‘down’. If up, vertical values are interpreted as ‘altitude’, with negative values corresponding to below the reference datum (e.g., under water). If down, vertical values are interpreted as ‘depth’, positive values correspond to below the reference datum. Note that if geospatial_vertical_positive is down (‘depth’ orientation), the |
|
ACDD 1.1 |
|
string |
Describes the data’s 2D or 3D geospatial extent in OGC’s Well-Known Text (WKT) Geometry format (reference the OGC Simple Feature Access (SFA) specification). The meaning and order of values for each point’s coordinates depends on the coordinate reference system (CRS). The ACDD default is 2D geometry in the EPSG:4326 coordinate reference system. The default may be overridden with geospatial_bounds_crs and geospatial_bounds_vertical_crs (see those attributes). EPSG:4326 coordinate values are latitude (decimal degrees_north) and longitude (decimal degrees_east), in that order. Longitude values in the default case are limited to the (-180, 180) range. Example: “POLYGON ((40.26 -111.29, 41.26 -111.29, 41.26 -110.29, 40.26 -110.29, 40.26 -111.29))”. |
|
ACDD 1.1 ACDD 1.3 |
|
string |
The coordinate reference system (CRS) of the point coordinates in the geospatial_bounds attribute. This CRS may be 2-dimensional or 3-dimensional, but together with geospatial_bounds_vertical_crs, if that attribute is supplied, must match the dimensionality, order, and meaning of point coordinate values in the geospatial_bounds attribute. If geospatial_bounds_vertical_crs is also present then this attribute must only specify a 2D CRS. EPSG CRSs are strongly recommended. If this attribute is not specified, the CRS is assumed to be EPSG:4326. Examples: “EPSG:4979” (the 3D WGS84 CRS), “EPSG:4047”. |
|
ACDD 1.3 |
|
string |
The vertical coordinate reference system (CRS) for the Z axis of the point coordinates in the geospatial_bounds attribute. This attribute cannot be used if the CRS in geospatial_bounds_crs is 3-dimensional; to use this attribute, geospatial_bounds_crs must exist and specify a 2D CRS. EPSG CRSs are strongly recommended. There is no default for this attribute when not specified. Examples: “EPSG:5829” (instantaneous height above sea level), “EPSG:5831” (instantaneous depth below sea level), or “EPSG:5703” (NAVD88 height). |
|
ACDD 1.3 |
|
string |
A place to acknowledge various types of support for the project that produced this data. |
|
ACDD 1.1 |
|
string |
The name of the person (or other creator type, such as a RDAC, specified by the |
|
ACDD 1.1 |
|
string |
The URL of the person (or other creator type specified by the |
|
ACDD 1.1 |
|
string |
The email address of the person (or other creator type specified by the |
|
ACDD 1.1 |
|
string |
Specifies the type of creator with one of the following: person, group, institution, or position. If this attribute is not specified, the creator is assumed to be a person. For an RDAC, use institution. |
|
ACDD 1.3 |
|
string |
The institution of the creator; should uniquely identify the creator’s institution. This attribute’s value should be specified even if it matches the value of |
|
ACDD 1.3 |
|
string |
The name of the project(s) principally responsible for originating this data. Multiple projects can be separated by commas, as described under Attribute Content Guidelines. Examples: ‘PATMOS-X’, ‘Extended Continental Shelf Project’. |
|
ACDD 1.1 |
|
string |
The overarching program(s) of which the dataset is a part. A program consists of a set (or portfolio) of related and possibly interdependent projects that meet an overarching objective. Examples: |
|
ACDD 1.3 |
|
string |
The name of any individuals, projects, or institutions that contributed to the creation of this data. May be presented as free text, or in a structured format compatible with conversion to ncML (e.g., insensitive to changes in whitespace, including end-of-line characters). |
|
ACDD 1.3 |
|
string |
The role of any individuals, projects, or institutions that contributed to the creation of this data. May be presented as free text, or in a structured format compatible with conversion to ncML (e.g., insensitive to changes in whitespace, including end-of-line characters). Multiple roles should be presented in the same order and number as the names in contributor_names. |
|
ACDD 1.3 |
|
string |
The name of the person (or other entity specified by the |
|
ACDD 1.1 |
|
string |
The URL of the person (or other entity specified by the |
|
ACDD 1.1 |
|
string |
The email address of the person (or other entity specified by the |
|
ACDD 1.1 |
|
string |
Specifies type of publisher with one of the following: person, group, institution, or position. If this attribute is not specified, the publisher is assumed to be a person. |
|
ACDD 1.3 |
|
string |
The institution that presented the data file or equivalent product to users; should uniquely identify the institution. If |
|
ACDD 1.3 |
|
string |
A textual description of the processing (or quality control) level of the data. The valid options are the GHRSST processing levels: L2P, L3U, L3C, L3S, L4. |
|
ACDD 1.1 |
|
string |
The data type, as derived from Unidata’s Common Data Model Scientific Data types and understood by THREDDS. (This is a THREDDS “dataType”, and is different from the CF NetCDF attribute ‘featureType’, which indicates a Discrete Sampling Geometry file in CF.) Valid values: swath or grid. |
|
ACDD 1.1 |
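Several of the attributes in the table above have a fixed textual form: timestamps written as yyyy-mm-ddThh:mm:ssZ, a uuid, and a geospatial_bounds polygon in WKT with latitude before longitude (the EPSG:4326 default). The sketch below shows one way to produce such strings with the Python standard library. The helper names are hypothetical, the timestamp is an arbitrary example, and the key "uuid" is assumed to be the attribute name for the identifier described in the table.

```python
import uuid
from datetime import datetime, timezone


def iso8601_utc(dt):
    """Format a timezone-aware datetime as yyyy-mm-ddThh:mm:ssZ."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def bounds_wkt(lat_min, lat_max, lon_min, lon_max):
    """Closed WKT polygon using EPSG:4326 ordering (latitude, then longitude)."""
    corners = [
        (lat_min, lon_min),
        (lat_max, lon_min),
        (lat_max, lon_max),
        (lat_min, lon_max),
        (lat_min, lon_min),  # repeat the first point to close the ring
    ]
    points = ", ".join(f"{lat} {lon}" for lat, lon in corners)
    return f"POLYGON (({points}))"


attrs = {
    # Arbitrary example time, for illustration only.
    "date_created": iso8601_utc(datetime(2024, 1, 15, 12, 30, 0, tzinfo=timezone.utc)),
    # Reproduces the polygon given as an example in the table.
    "geospatial_bounds": bounds_wkt(40.26, 41.26, -111.29, -110.29),
    # Attribute name assumed; uuid4 generates a random 128-bit identifier.
    "uuid": str(uuid.uuid4()),
}
```

A dictionary like this could then be written to the file as global attributes by whichever netCDF library the producer uses.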
5.3. Variable Attributes
Variable Attribute Name |
Format |
Description |
Source |
|
|---|---|---|---|---|
|
Must be the same as the packed variable type |
Assigned value in the data file designating a null or missing observation. |
|
CF 1.7+ |
|
String |
The units of the variable’s data values. This attribute value should be a valid udunits[6] string. The |
|
ACDD 1.1 |
|
Must be expressed in the unpacked data type |
Slope of scaling relationship applied to transform measurement data to appropriate geophysical quantity representations. Should not be used if the |
|
CF 1.7+ |
|
Must be expressed in the unpacked data type |
Intercept of scaling relationship applied to transform measurement data to appropriate geophysical quantity representations. Should not be used if the |
|
CF 1.7+ |
|
String |
A long descriptive name for the variable (not necessarily from a controlled vocabulary). This attribute is recommended by the NetCDF Users Guide, the COARDS convention, and the CF convention. |
|
ACDD 1.1 |
|
Must be the same as the packed variable type |
Comma-separated minimum and maximum values of the physical quantity defining the valid measurement range. The fill value should be outside this valid range. Note that some netCDF readers are unable to cope with signed bytes and may, in these cases, report valid min as 129, or in some cases treat the values as unsigned bytes in the range 0 to 255. Values outside of the |
|
CF 1.7+ |
|
String |
A long descriptive name for the variable taken from a controlled vocabulary of variable names. We require using the CF convention and the variable names from the CF standard name table[7]. This attribute is recommended by the CF convention. Do not include this attribute if no standard_name exists. |
|
ACDD 1.1 |
|
String |
Optional attribute field allowing provision of further free-form information about the variable or the methods used to produce it. |
|
CF 1.7+ |
|
string |
For L2P and L3 files: For a data variable with a single source, use the GHRSST unique string if the source is a GHRSST SST product. For other sources, follow the best practice described in Section 4.9 to create the character string. |
|
CF 1.7+ |
|
string |
Published or web-based references that describe the data or methods used to produce it. Note that while at least one reference is required in the global attributes (See Section 5.2), references to this specific data variable may also be given. |
CF 1.7+ |
|
|
String |
Corresponding variable axis for plotting (e.g., X, Y, Z). |
|
CF 1.7+ |
|
String |
For use with vertical coordinate variables only. May have the value “up” or “down”. For example, if an oceanographic netCDF file encodes the depth of the surface as 0 and the depth of 1000 meters as 1000, then the axis would set positive to “down”. If a depth of 1000 meters was encoded as -1000, then positive would be set to “up”. See the section on vertical coordinates in the CF Convention[1]. |
|
CF 1.7+ |
|
String |
This attribute contains a space separated list of all the coordinates corresponding to the variable. The list should contain all the auxiliary coordinate variables and optionally the coordinate variables. |
|
CF 1.7+ |
|
String |
Describes the horizontal coordinate system used by the data. The grid_mapping attribute should point to a variable which would contain the parameters corresponding to the coordinate system. That named variable is called a grid mapping variable and is of arbitrary type since it contains no data. Its purpose is to act as a container for the attributes that define the mapping. There are typically several parameters associated with each coordinate system. CF defines a separate attribute for each of the parameters. Some examples are |
|
CF 1.7+ |
|
String |
Define the physical meaning of each |
|
CF 1.7+ |
|
Must be the same as the variable type |
Its values identify the flagged conditions by performing a bitwise AND of the variable value and each flag mask value. For example, suppose the variable value is of type unsigned byte and equal to 5, and the flag_masks are 1b, 2b, 4b, 8b, 16b, 32b. The binary encoding of 5 is 00000101 and the binary encodings of the masks are 00000001, 00000010, 00000100, 00001000, 00010000, 00100000. The bitwise AND of the value with each mask returns 00000001, 00000000, 00000100, 00000000, 00000000, 00000000 respectively, or 1b, 0, 4b, 0, 0, 0 in decimal. So the conditions corresponding to masks 1b and 4b are “true” and the rest are “false”. Used primarily for quality_level and “source_of_xxx” variables. |
|
CF 1.7+ |
|
Must be the same as the variable type |
A number of independent Boolean conditions using bit field notation by setting unique bits in each |
|
CF 1.7+ |
|
String |
Use this to indicate the depth for which the SST data are valid. |
|
GDS |
|
String |
Use this to indicate the height for which the wind data are specified. |
|
GDS |
|
float |
Difference in hours between an ancillary field such as wind_speed and the SST observation time |
|
GDS |
|
String |
An ISO 19115-3 code to indicate the source of the data |
|
ACDD 1.1 |
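The bitwise AND test described for flag_masks in the table above can be sketched in a few lines of Python. The mask values match the worked example (a variable value of 5 against masks 1, 2, 4, 8, 16, 32). The meaning strings and the function name are hypothetical placeholders, not names defined by the GDS.

```python
def decode_flags(value, flag_masks, flag_meanings):
    """Return the meanings whose mask bit is set in value (bitwise AND test)."""
    return [
        meaning
        for mask, meaning in zip(flag_masks, flag_meanings)
        if value & mask
    ]


flag_masks = [1, 2, 4, 8, 16, 32]
# Hypothetical meanings, for illustration only.
flag_meanings = ["cond_a", "cond_b", "cond_c", "cond_d", "cond_e", "cond_f"]

# Raw result of ANDing the value 5 (binary 00000101) with each mask.
and_results = [5 & mask for mask in flag_masks]

# Conditions that are "true" for the value 5.
active = decode_flags(5, flag_masks, flag_meanings)
```

Here `and_results` is `[1, 0, 4, 0, 0, 0]`, so only the first and third conditions are set, matching the worked example in the table.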