Reading and Writing Files
Xarray supports direct serialization and I/O to several file formats, including pickle, netCDF, OPeNDAP (read-only), GRIB1/2 (read-only), Zarr, and HDF, by integrating with third-party libraries. Additional serialization formats for 1-dimensional data are available through pandas.
File types
- Pickle
- NetCDF 3/4
- RasterIO
- Zarr
- PyNio
Interoperability
- Pandas
- Iris
- CDMS
- dask DataFrame
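As noted above, 1-dimensional data can also be round-tripped through pandas. The short sketch below (the file name and example variable are made up for illustration) converts a DataArray to a pandas Series, writes it to CSV, and reconstructs it.
import numpy as np
import pandas as pd
import xarray as xr
# build a small 1-dimensional DataArray (illustrative data only)
da = xr.DataArray(np.arange(5), dims='x', name='example')
# serialize via pandas: DataArray -> Series -> CSV
da.to_series().to_csv('example_1d.csv')
# read back with pandas and rebuild the DataArray
series = pd.read_csv('example_1d.csv', index_col=0)['example']
da_roundtrip = xr.DataArray.from_series(series)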
Opening xarray datasets
Xarray’s open_dataset and open_mfdataset are the primary functions for opening local or remote datasets such as netCDF, GRIB, OPeNDAP, and HDF. These operations are all supported by third-party libraries (engines) for which xarray provides a common interface.
!ncdump -h ../data/rasm.nc
import xarray as xr
from glob import glob
ds = xr.open_dataset('../data/rasm.nc')
ds
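If you want to pick the backend explicitly, open_dataset accepts an engine argument; the line below assumes the netCDF4 library is installed.
# explicitly select the netCDF4 backend (requires the netCDF4 library)
ds_nc4 = xr.open_dataset('../data/rasm.nc', engine='netcdf4')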
Saving xarray datasets as netCDF files
Xarray provides a high-level method for writing netCDF files directly from Xarray Datasets/DataArrays.
ds.to_netcdf('../data/rasm_test.nc')
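to_netcdf also accepts per-variable encoding options, for example zlib compression; the sketch below assumes rasm.nc contains a variable named Tair.
# write a compressed copy; the 'Tair' variable name is an assumption about this dataset
ds.to_netcdf('../data/rasm_compressed.nc',
             encoding={'Tair': {'zlib': True, 'complevel': 4}})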
Multifile datasets
Xarray can read/write multifile datasets using the open_mfdataset and save_mfdataset functions.
paths = glob('../data/19*.nc')
paths
ds2 = xr.open_mfdataset(paths)
ds2
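save_mfdataset goes the other way: it writes a list of datasets to a list of paths. A minimal sketch (output file names are illustrative) that splits the combined dataset by year and writes one file per group:
# split the combined dataset by year and write one netCDF file per year
years, datasets = zip(*ds2.groupby('time.year'))
out_paths = [f'../data/rasm_{year}.nc' for year in years]
xr.save_mfdataset(datasets, out_paths)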
Zarr
Zarr is a Python package providing an implementation of chunked, compressed, N-dimensional arrays. Zarr has the ability to store arrays in a range of ways, including in memory, in files, and in cloud-based object storage such as Amazon S3 and Google Cloud Storage. Xarray’s Zarr backend allows xarray to leverage these capabilities.
# save to a Zarr dataset
ds.to_zarr('../data/rasm.zarr', mode='w')
!ls ../data/rasm.zarr
!du -h ../data/rasm.zarr
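The Zarr store can be re-opened lazily with open_zarr:
# read the Zarr store back into an xarray Dataset
ds_zarr = xr.open_zarr('../data/rasm.zarr')
ds_zarr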
Going Further:
- Xarray I/O Documentation: http://xarray.pydata.org/en/latest/io.html
- Zarr Documentation: https://zarr.readthedocs.io/en/stable/
%load_ext watermark
%watermark --iversion -g -m -v -u -d