HDF5eis: A storage and input/output solution for big multidimensional time serie...

source link: https://pubs.geoscienceworld.org/geophysics/article-abstract/88/3/F29/622154/HDF5eis-A-storage-and-input-output-solution-for?redirectedFrom=fulltext
Research Article | April 12, 2023

ABSTRACT

Modern high-performance computing (HPC) tasks overwhelm conventional geophysical data formats. We describe a new data schema called HDF5eis (read H-D-F-size) for handling big multidimensional time series data from environmental sensors in HPC applications and implement a freely available Python application programming interface (API) for building and processing HDF5eis files. HDF5eis augments the popular Hierarchical Data Format 5 with a minimal set of additional conventions that facilitate fast and flexible data input and output protocols for regularly sampled (in time) data with any number of dimensions. HDF5eis supports arbitrary ancillary data (e.g., metadata) storage in columnar format or as UTF-8 encoded byte streams alongside time series data. Our HDF5eis API enables simple and efficient access to big data sets distributed across a potentially large number of small heterogeneous files through a single point of access. HDF5eis outperforms conventional seismic data formats by up to two orders of magnitude in terms of random read access times. We contribute HDF5eis as an operational tool and an experimental draft proposal that will help establish the next generation of data standards in the earth sciences.
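
The abstract's emphasis on regularly sampled data can be illustrated with a small sketch. When samples are evenly spaced in time, a start time and a sampling rate are enough to turn a requested time window into an index range, so a reader can jump straight to the relevant samples instead of scanning per-sample timestamps. The code below is not the HDF5eis API: the function name `time_to_index_range`, its parameters, and the example values are hypothetical and chosen only for illustration.

```python
import math
from datetime import datetime, timedelta, timezone


def time_to_index_range(start_time, sampling_rate_hz, n_samples, t0, t1):
    """Map a time window [t0, t1) onto sample indices [i0, i1).

    Hypothetical helper (not part of HDF5eis). Assumes sample k was
    recorded at start_time + k / sampling_rate_hz.
    """
    # Offsets of the requested window from the first sample, in seconds.
    off0 = (t0 - start_time).total_seconds()
    off1 = (t1 - start_time).total_seconds()
    # Index of the first sample at or after each bound; rounding first
    # guards against tiny floating-point excesses before taking the ceiling.
    i0 = math.ceil(round(off0 * sampling_rate_hz, 6))
    i1 = math.ceil(round(off1 * sampling_rate_hz, 6))
    # Clamp to the samples that actually exist.
    i0 = min(max(i0, 0), n_samples)
    i1 = min(max(i1, 0), n_samples)
    return i0, i1


# Assumed example: one hour of data sampled at 100 Hz.
start = datetime(2023, 4, 12, tzinfo=timezone.utc)
rate_hz = 100.0
n = 360_000

# Request the two seconds of data starting ten seconds into the record.
i0, i1 = time_to_index_range(
    start, rate_hz, n,
    start + timedelta(seconds=10),
    start + timedelta(seconds=12),
)
print(i0, i1)
```

With an index range in hand, a storage layer that supports partial reads (HDF5 does, through dataset slicing) only needs to fetch the samples in `[i0, i1)` along the time axis, which is one plausible reason random reads on regularly sampled data can be fast.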
