Data Structures
- xarray has 2 fundamental data structures:
DataArray
, which holds single multi-dimensional variables and its coordinatesDataset
, which holds multiple variables that potentially share the same coordinates
DataArray
The DataArray is xarray’s implementation of a labeled, multi-dimensional array. It has several key properties:
Attribute | Description |
---|---|
data |
numpy.ndarray or dask.array holding the array’s values. |
dims |
dimension names for each axis. For example:(x , y , z ) (lat , lon , time ). |
coords |
a dict-like container of arrays (coordinates) that label each point (e.g., 1-dimensional arrays of numbers, datetime objects or strings) |
attrs |
an OrderedDict to hold arbitrary attributes/metadata (such as units) |
name |
an arbitrary name of the array |
To create a basic DataArray
, you can pass this numpy array of random data to xr.DataArray
NOTE:
Xarray automatically generates some basic dimension names for us.
You can also pass in your own dimension names and coordinate values:
This is already improved upon from the original numpy array, because we have names for each of the dimensions (or axis in NumPy parlance).
We can also add attributes to an existing DataArray
:
Dataset
- Xarray’s
Dataset
is a dict-like container of labeled arrays (DataArrays
) with aligned dimensions. - It is designed as an in-memory representation of a netCDF dataset. - In addition to the dict-like interface of the dataset itself, which can be used to access any
DataArray
in aDataset
. Datasets have the following key properties:
Attribute | Description |
---|---|
data_vars |
OrderedDict of DataArray objects corresponding to data variables. |
dims |
dictionary mapping from dimension names to the fixed length of each dimension (e.g., {lat : 6, lon : 6, time : 8}). |
coords |
a dict-like container of arrays (coordinates) that label each point (e.g., 1-dimensional arrays of numbers, datetime objects or strings) |
attrs |
OrderedDict to hold arbitrary metadata pertaining to the dataset. |
name |
an arbitrary name of the dataset |
- DataArray objects inside a Dataset may have any number of dimensions but are presumed to share a common coordinate system.
- Coordinates can also have any number of dimensions but denote constant/independent quantities, unlike the varying/dependent quantities that belong in data.
To create a Dataset
from scratch, we need to supply dictionaries for any variables (data_vars
), coordinates (coords
) and attributes (attrs
):
Let’s add some toy temperature
data array to this existing dataset:
Going Further:
Xarray Documentation on Data Structures: http://xarray.pydata.org/en/latest/data-structures.html