Abstract

Genomic data management focuses on achieving high performance over big datasets using batch, cloud-based architectures. This enables the execution of massive pipelines, but hampers the ability to explore the solution space when it is not well defined, for example by choosing different experimental samples or query extraction parameters. We present PyGMQL, a Python-based interoperability software layer that enables testing of experimental pipelines. PyGMQL resolves the impedance mismatch between a batch execution environment and the agile programming style of Python, and provides transparent access when exploration requires integrating local and remote resources. Wrapping PyGMQL and Python primitives within Jupyter notebooks guarantees reproducibility of the pipeline when it is used in different contexts or by different scientists. The software is freely available at https://github.com/DEIB-GECO/PyGMQL.
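One common way to reconcile an interactive Python session with a batch execution engine is deferred (lazy) execution: the user composes a pipeline step by step, and the engine runs the whole plan only when a result is requested. The sketch below is a minimal, hypothetical illustration of that general idea in plain Python, not PyGMQL's implementation or API. `LazyDataset`, `select`, `execute`, and `load_sample` are invented names, and the in-memory region list stands in for a real local or remote dataset.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical representation of one genomic region (not a PyGMQL type).
Region = Dict[str, object]
Step = Callable[[List[Region]], List[Region]]


@dataclass
class LazyDataset:
    """Hypothetical dataset that records operations and runs them only on demand."""
    source: Callable[[], List[Region]]
    plan: List[Step] = field(default_factory=list)

    def select(self, predicate: Callable[[Region], bool]) -> "LazyDataset":
        # Return a new dataset with one more deferred filtering step; nothing runs yet.
        step: Step = lambda regions: [r for r in regions if predicate(r)]
        return LazyDataset(self.source, self.plan + [step])

    def execute(self) -> List[Region]:
        # Read the source once and apply every recorded step in order,
        # as a batch engine would when a result is finally needed.
        regions = self.source()
        for step in self.plan:
            regions = step(regions)
        return regions


source_calls = 0


def load_sample() -> List[Region]:
    # Toy data standing in for a local or remote dataset; counts how often it is read.
    global source_calls
    source_calls += 1
    return [
        {"chr": "chr1", "start": 100, "stop": 200, "score": 5},
        {"chr": "chr1", "start": 300, "stop": 450, "score": 12},
        {"chr": "chr2", "start": 50, "stop": 80, "score": 9},
    ]


ds = LazyDataset(load_sample)
query = ds.select(lambda r: r["chr"] == "chr1").select(lambda r: r["score"] > 10)
calls_before_execute = source_calls  # still 0: building the query touched no data
result = query.execute()
print(result)
```

Because each `select` returns a new object, an exploratory session can branch from the same base dataset to try different selection parameters, and no data is read until `execute` is called on a particular branch.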


Original document

The different versions of the original document can be found at:

http://dx.doi.org/10.1145/3214708.3214710 under the license http://www.acm.org/publications/policies/copyright_policy#Background
https://re.public.polimi.it/handle/11311/1095264
https://academic.microsoft.com/#/detail/2943444417

Document information

Published on 01/01/2018

Volume 2018, 2018
DOI: 10.1145/3214708.3214710
Licence: Other
