Reading and Writing Files

Xarray supports direct serialization and I/O for several file formats, including pickle, netCDF, OPeNDAP (read-only), GRIB1/2 (read-only), Zarr, and HDF, by integrating with third-party libraries. Additional serialization formats for 1-dimensional data are available through pandas.

File types

  • Pickle
  • NetCDF 3/4
  • RasterIO
  • Zarr
  • PyNio

Interoperability

  • Pandas
  • Iris
  • CDMS
  • dask DataFrame
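As a minimal sketch of the pandas route for 1-dimensional data, the example below converts a DataArray to a pandas Series, serializes it to CSV text, and converts the Series back. The data and the names `da`, `series`, `csv_text`, and `restored` are made up for illustration and are not part of the tutorial's dataset.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical 1-D data: one temperature value per day for a week.
times = pd.date_range("2000-01-01", periods=7, name="time")
da = xr.DataArray(
    np.linspace(10.0, 16.0, 7),
    coords={"time": times},
    dims="time",
    name="temperature",
)

# DataArray -> pandas Series, indexed by the "time" coordinate.
series = da.to_series()

# With no path argument, to_csv returns the CSV content as a string.
csv_text = series.to_csv()

# pandas Series -> DataArray again.
restored = series.to_xarray()
print(restored)
```

Once the data is a Series, any pandas writer can be used in place of `to_csv`.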

Opening xarray datasets

Xarray’s open_dataset and open_mfdataset are the primary functions for opening local or remote datasets in formats such as netCDF, GRIB, OPeNDAP, and HDF. Reading each format is handled by a third-party library (an engine), and xarray provides a common interface to all of them.

!ncdump -h ../data/rasm.nc
netcdf rasm {
dimensions:
	time = 36 ;
	y = 205 ;
	x = 275 ;
variables:
	double Tair(time, y, x) ;
		Tair:_FillValue = 9.96920996838687e+36 ;
		Tair:units = "C" ;
		Tair:long_name = "Surface air temperature" ;
		Tair:type_preferred = "double" ;
		Tair:time_rep = "instantaneous" ;
		Tair:coordinates = "yc xc" ;
	double time(time) ;
		time:_FillValue = NaN ;
		time:long_name = "time" ;
		time:type_preferred = "int" ;
		time:units = "days since 0001-01-01" ;
		time:calendar = "noleap" ;
	double xc(y, x) ;
		xc:_FillValue = NaN ;
		xc:long_name = "longitude of grid cell center" ;
		xc:units = "degrees_east" ;
		xc:bounds = "xv" ;
	double yc(y, x) ;
		yc:_FillValue = NaN ;
		yc:long_name = "latitude of grid cell center" ;
		yc:units = "degrees_north" ;
		yc:bounds = "yv" ;

// global attributes:
		:title = "/workspace/jhamman/processed/R1002RBRxaaa01a/lnd/temp/R1002RBRxaaa01a.vic.ha.1979-09-01.nc" ;
		:institution = "U.W." ;
		:source = "RACM R1002RBRxaaa01a" ;
		:output_frequency = "daily" ;
		:output_mode = "averaged" ;
		:convention = "CF-1.4" ;
		:references = "Based on the initial model of Liang et al., 1994, JGR, 99, 14,415- 14,429." ;
		:comment = "Output from the Variable Infiltration Capacity (VIC) model." ;
		:nco_openmp_thread_number = 1 ;
		:NCO = "\"4.6.0\"" ;
		:history = "Tue Dec 27 14:15:22 2016: ncatted -a dimensions,,d,, rasm.nc rasm.nc\nTue Dec 27 13:38:40 2016: ncks -3 rasm.nc rasm.nc\nhistory deleted for brevity" ;
}
import xarray as xr
from glob import glob
ds = xr.open_dataset('../data/rasm.nc')
ds
<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (y, x) float64 ...
    yc       (y, x) float64 ...
Dimensions without coordinates: x, y
Data variables:
    Tair     (time, y, x) float64 ...
Attributes:
    title:                     /workspace/jhamman/processed/R1002RBRxaaa01a/l...
    institution:               U.W.
    source:                    RACM R1002RBRxaaa01a
    output_frequency:          daily
    output_mode:               averaged
    convention:                CF-1.4
    references:                Based on the initial model of Liang et al., 19...
    comment:                   Output from the Variable Infiltration Capacity...
    nco_openmp_thread_number:  1
    NCO:                       "4.6.0"
    history:                   Tue Dec 27 14:15:22 2016: ncatted -a dimension...

Saving xarray datasets as netcdf files

Xarray provides a high-level method, to_netcdf, for writing netCDF files directly from Dataset and DataArray objects.

ds.to_netcdf('../data/rasm_test.nc')
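To check that a write behaves as expected, a round trip can be sketched as follows. This is an illustration under assumptions: the small synthetic dataset `small` stands in for `ds`, the file goes to a temporary directory instead of `../data`, and at least one netCDF engine (for example netCDF4 or scipy) is assumed to be installed.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Hypothetical stand-in for `ds`: a tiny (time, y, x) dataset.
small = xr.Dataset(
    {"Tair": (("time", "y", "x"), np.arange(60.0).reshape(3, 4, 5))},
    coords={"time": [0, 1, 2]},
    attrs={"title": "round-trip demo"},
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "small.nc")

    # Write the dataset to disk as netCDF.
    small.to_netcdf(path)

    # Reopen it. open_dataset is lazy, so load() pulls the values into
    # memory before the file is closed and the directory is removed.
    with xr.open_dataset(path) as reopened:
        roundtrip = reopened.load()

# Compare values, dimensions, and coordinates with the original.
print(roundtrip.equals(small))
```

Opening the file in a `with` block closes it as soon as the data has been loaded, which matters when the file is about to be deleted or overwritten.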

Multifile datasets

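Xarray can read and write multifile datasets using the open_mfdataset and save_mfdataset functions. open_mfdataset combines several files into a single Dataset whose variables are backed by dask arrays.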

paths = glob('../data/19*.nc')
paths
['../data/1980.nc', '../data/1981.nc', '../data/1982.nc', '../data/1983.nc']
ds2 = xr.open_mfdataset(paths)
ds2
<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
    xc       (y, x) float64 189.2 189.4 189.6 189.7 ... 17.65 17.4 17.15 16.91
    yc       (y, x) float64 16.53 16.78 17.02 17.27 ... 28.26 28.01 27.76 27.51
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
Dimensions without coordinates: x, y
Data variables:
    Tair     (time, y, x) float64 dask.array<shape=(36, 205, 275), chunksize=(4, 205, 275)>
Attributes:
    title:                     /workspace/jhamman/processed/R1002RBRxaaa01a/l...
    institution:               U.W.
    source:                    RACM R1002RBRxaaa01a
    output_frequency:          daily
    output_mode:               averaged
    convention:                CF-1.4
    references:                Based on the initial model of Liang et al., 19...
    comment:                   Output from the Variable Infiltration Capacity...
    nco_openmp_thread_number:  1
    NCO:                       "4.6.0"
    history:                   Tue Dec 27 14:15:22 2016: ncatted -a dimension...
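The writing side can be sketched with save_mfdataset. The example below is an illustration only: it builds a hypothetical two-year monthly dataset (`full`), splits it by year, writes one file per year to a temporary directory, and reopens the files with open_mfdataset. It assumes dask and a netCDF engine are installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset: 24 monthly time steps at two points.
times = pd.date_range("2000-01-01", periods=24, freq="MS")
full = xr.Dataset(
    {"Tair": (("time", "x"), np.arange(48.0).reshape(24, 2))},
    coords={"time": times},
)

with tempfile.TemporaryDirectory() as tmpdir:
    # Split into one dataset per calendar year.
    years, yearly = zip(*full.groupby("time.year"))
    paths = [os.path.join(tmpdir, "%d.nc" % year) for year in years]

    # Write all yearly datasets in one call, one file per dataset.
    xr.save_mfdataset(list(yearly), paths)

    # Reopen the files as a single dataset and load it into memory
    # before the temporary directory is removed.
    with xr.open_mfdataset(paths) as combined:
        combined_in_memory = combined.load()

print(combined_in_memory)
```

Grouping by `time.year` is one convenient way to split the data. Any list of datasets with a matching list of paths works the same way.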

Zarr

Zarr is a Python package providing an implementation of chunked, compressed, N-dimensional arrays. Zarr has the ability to store arrays in a range of ways, including in memory, in files, and in cloud-based object storage such as Amazon S3 and Google Cloud Storage. Xarray’s Zarr backend allows xarray to leverage these capabilities.

# save to a Zarr dataset
ds.to_zarr('../data/rasm.zarr', mode='w')
<xarray.backends.zarr.ZarrStore at 0x11d5eeef0>
!ls ../data/rasm.zarr
Tair time xc   yc
!du -h ../data/rasm.zarr
348K	../data/rasm.zarr/yc
 12K	../data/rasm.zarr/time
7.6M	../data/rasm.zarr/Tair
332K	../data/rasm.zarr/xc
8.3M	../data/rasm.zarr

Going Further:

  • Xarray I/O Documentation: http://xarray.pydata.org/en/latest/io.html

  • Zarr Documentation: https://zarr.readthedocs.io/en/stable/


%load_ext watermark
%watermark --iversion -g -m -v -u -d
xarray 0.12.1
last updated: 2019-05-17 

CPython 3.6.7
IPython 7.5.0

compiler   : GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)
system     : Darwin
release    : 18.2.0
machine    : x86_64
processor  : i386
CPU cores  : 8
interpreter: 64bit
Git hash   : b967193c452fe8cb9384ca3dd81a3ef0c6fb2abf