BIDS Data Science Lecture Series | October 16, 2015 | 1:00-2:30 p.m. | 190 Doe Library, UC Berkeley
Sponsors: Berkeley Institute for Data Science and the Data, Society and Inference Seminar
The N-dimensional array has been the mainstay data structure of scientific computing. But can we do better? In this talk, I’ll describe two Python projects, xray and dask.array, that extend multi-dimensional arrays with features that enable scalable and reproducible science. Xray adds labels and metadata to arrays, letting developers refer to dimensions and coordinates by meaningful names instead of easily confused axis numbers or integer positions. Dask is a framework for easy parallel computing with arrays that may be too big to fit into memory. I will highlight examples from climate science and meteorology, fields where large and complex datasets are common.