Label-based Indexing
Scientific data is inherently labeled. For example, time series data includes timestamps that label individual periods or points in time, spatial data has coordinates (e.g. longitude, latitude, elevation), and model or laboratory experiments are often identified by unique identifiers.
import xarray as xr
ds = xr.open_dataset('../data/air_temperature.nc')
ds
NumPy Positional Indexing
With NumPy, indexing is purely positional: you select elements with integers, slices, or ranges along each axis.
t = ds['air'].data # numpy array
t
t.shape
# extract a time-series for one spatial location
t[:, 20, 40]
But wait: which labels go with indices 20 and 40? Was the axis order lat/lon or lon/lat? And where are the timestamps that go along with this time series?
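To make the bookkeeping problem concrete, here is a minimal sketch using only NumPy. The coordinate arrays and random values below are assumptions standing in for the contents of the data file; the point is that the labels live in separate arrays that you have to keep in sync with the data by hand.

```python
import numpy as np

# Hypothetical coordinates, stored separately from the data (not read from the file)
time = np.arange("2013-01-01", "2013-01-05", dtype="datetime64[D]")  # 4 days
lat = np.linspace(75.0, 15.0, 25)    # north to south, 2.5 degree steps
lon = np.linspace(200.0, 330.0, 53)  # 2.5 degree steps

# Synthetic values with axis order (time, lat, lon)
t = np.random.default_rng(0).normal(size=(time.size, lat.size, lon.size))

# Positional indexing returns bare numbers; the labels are lost
series = t[:, 20, 40]

# Recovering the labels means indexing the coordinate arrays yourself,
# and remembering which axis is which
print(series.shape, lat[20], lon[40])

# Going the other way (label to position) is also manual
i = int(np.argmin(np.abs(lat - 50.0)))
j = int(np.argmin(np.abs(lon - 200.0)))
print(i, j)
```

Nothing in `t` itself prevents you from swapping the last two indices, which is exactly the kind of mistake that labeled indexing helps avoid.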
Indexing with xarray
xarray offers extremely flexible indexing routines that combine the best features of NumPy and pandas for data selection.
da = ds['air'] # Extract data array
da
- NumPy style indexing still works (but preserves the labels/metadata)
da[:, 20, 40]
- Positional indexing using dimension names
da.isel(lat=20, lon=40)
- Label-based indexing
da.sel(lat=50., lon=200.)
- Nearest Neighbor Lookups
da.sel(lat=52.25, lon=251.8998, method='nearest')
- All of these indexing methods work on the dataset too:
ds.sel(lat=52.25, lon=251.8998, method='nearest')
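The four styles above can be compared side by side without the data file. The sketch below builds a small synthetic DataArray (the grid and values are assumptions, chosen to resemble a 2.5 degree lat/lon grid) and checks that each style picks out the same time series.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the air temperature array (grid and values are assumptions)
time = np.arange("2013-01-01", "2013-01-05", dtype="datetime64[D]")
lat = np.linspace(75.0, 15.0, 25)
lon = np.linspace(200.0, 330.0, 53)
demo = xr.DataArray(
    np.random.default_rng(0).normal(size=(time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="air",
)

by_position = demo[:, 10, 0]                  # NumPy style: must know the axis order
by_name = demo.isel(lat=10, lon=0)            # positional, but axis order no longer matters
by_label = demo.sel(lat=50.0, lon=200.0)      # exact coordinate values
by_nearest = demo.sel(lat=50.9, lon=200.7, method="nearest")  # closest grid point

# Each result keeps its time coordinate plus scalar lat/lon coordinates
print(by_label.dims, by_label.lat.item(), by_label.lon.item())
print(by_position.equals(by_name), by_name.equals(by_label), by_label.equals(by_nearest))
```

Without `method='nearest'`, `sel` requires the requested labels to exist in the index and raises a `KeyError` otherwise.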
Vectorized Indexing
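Like NumPy and pandas, xarray supports indexing many array elements at once in a vectorized manner. When the indexers are DataArrays that share a dimension, the selection is made point by point along that dimension:
# generate coordinates for a transect of points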
lat_points = xr.DataArray([52, 52.5, 53], dims='points')
lon_points = xr.DataArray([250, 250, 250], dims='points')
lat_points
lon_points
# nearest neighbor selection along the transect
da.sel(lat=lat_points, lon=lon_points, method='nearest')
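It can help to contrast this with passing plain lists. The following sketch uses a small synthetic 2D array (an assumption, not the tutorial dataset): list indexers select the full outer product of the requested labels, while DataArray indexers that share a dimension are paired up pointwise.

```python
import numpy as np
import xarray as xr

# Synthetic 2D field on a 2.5 degree grid (values are assumptions)
lat = np.linspace(75.0, 15.0, 25)
lon = np.linspace(200.0, 330.0, 53)
demo = xr.DataArray(
    np.arange(25 * 53, dtype=float).reshape(25, 53),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
)

# Plain lists: every lat is combined with every lon, giving a 3 x 3 block
block = demo.sel(lat=[52.5, 50.0, 47.5], lon=[250.0, 252.5, 255.0])

# DataArrays sharing the 'points' dimension: lat[k] is paired with lon[k]
lat_pts = xr.DataArray([52.5, 50.0, 47.5], dims="points")
lon_pts = xr.DataArray([250.0, 252.5, 255.0], dims="points")
transect = demo.sel(lat=lat_pts, lon=lon_pts)

print(block.dims, block.shape)
print(transect.dims, transect.shape)

# The transect corresponds to the diagonal of the block
print(np.array_equal(transect.values, np.diag(block.values)))
```

The result of the pointwise selection is indexed by the new `points` dimension and carries `lat` and `lon` along as coordinates on it.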
%load_ext watermark
%watermark --iversions -g -m -v -u -d