Datasets can be local or remote (e.g. in cloud storage), and a wide variety of file formats are supported. The large, multidimensional datasets required for modern scientific analysis are supported natively. Adding a completely new data format requires only a few lines of code.
Data on your machine can be loaded directly into memory or lazy-loaded from disk (for very large files).
Remote datasets can either be streamed on demand or downloaded and loaded into memory.
🧬 A large (~60 GB) multi-resolution image can be interactively panned and zoomed while remaining entirely on disk.
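To illustrate what lazy-loading from disk means in general, here is a minimal sketch that uses only Python's standard-library mmap module. It is not this project's implementation: the raw float32 file layout and the helper names write_demo_file and read_window are assumptions made up for the example. The point is that only the requested window of bytes is read, while the rest of the file stays on disk.

```python
import mmap
import os
import struct
import tempfile


def write_demo_file(path, n_values):
    """Write n_values little-endian float32 numbers (0.0, 1.0, 2.0, ...) to path."""
    with open(path, "wb") as f:
        for i in range(n_values):
            f.write(struct.pack("<f", float(i)))


def read_window(path, start, count):
    """Return `count` float32 values starting at index `start`.

    The file is memory-mapped, so only the bytes in the requested
    slice are copied into Python objects.
    """
    item_size = struct.calcsize("<f")  # 4 bytes per float32
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            raw = mm[start * item_size:(start + count) * item_size]
    return list(struct.unpack(f"<{count}f", raw))


# Hypothetical demo file standing in for a very large dataset.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.f32")
write_demo_file(demo_path, 1000)

# Read four values from the middle of the file without loading all of it.
window = read_window(demo_path, 500, 4)
print(window)
```

The same idea scales to files far larger than available memory, because the operating system pages in only the regions that are actually accessed.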
Many common scientific file formats are supported, along with domain-specific datasets, by building on open-source libraries that read a wide variety of underlying formats.
If a Python library exists to read your file format, defining a custom data loader often takes just a few lines of code.
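As a rough picture of why a custom loader can stay this small, the sketch below wraps an existing reader (the standard-library json module) in a loader chosen by file extension. The registry LOADERS, the decorator register_loader, and the function load are hypothetical names invented for this example; they are not this project's API, which may define loaders quite differently.

```python
import json
import os
import tempfile

# Hypothetical registry mapping a file extension to a loader function.
LOADERS = {}


def register_loader(extension):
    """Decorator that registers a function as the loader for `extension`."""
    def decorator(func):
        LOADERS[extension.lower()] = func
        return func
    return decorator


@register_loader(".json")
def load_json(path):
    # Delegate the parsing to an existing library (here the stdlib json module).
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def load(path):
    """Pick a loader from the file extension and return the parsed data."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in LOADERS:
        raise ValueError(f"No loader registered for {extension!r}")
    return LOADERS[extension](path)


# Write a small sample file, then load it through the registry.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(sample_path, "w", encoding="utf-8") as f:
    json.dump({"values": [1, 2, 3]}, f)

data = load(sample_path)
print(data)
```

The loader itself is only the few lines under the decorator; everything format-specific is handled by the library it calls.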