Abstract

The data-driven parallelization framework Hadoop MapReduce allows large data sets to be analysed in a scalable way. Because developing MapReduce programs can be time-intensive and challenging, the use of Hadoop in biomedical research is still limited. Here we present Cloudflow, a high-level framework that hides the implementation details of Hadoop and provides a set of building blocks for creating biomedical pipelines in a more intuitive way. We demonstrate the benefit of Cloudflow on three different genetic use cases. We also show how the framework can be combined with the Hadoop workflow system Cloudgene and the cloud orchestration platform CloudMan to provide Hadoop pipelines as a service to everyone. The framework is open source and freely available at https://github.com/genepi/cloudflow.

Document type: Conference object

Original document

The different versions of the original document can be found in:

http://mipro-proceedings.com
http://fulir.irb.hr/1988/1/Cloudflow%20-%20A%20Framework%20for%20MapReduce%20Pipile%20Development%20in%20Biomedical%20Research.pdf
http://dx.doi.org/10.1109/mipro.2015.7160259
http://ieeexplore.ieee.org/document/7160259
https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/2080825/CloudflowSUBMITTED.pdf
https://doi.org/10.1109/MIPRO.2015.7160259
http://fulir.irb.hr/1988
https://academic.microsoft.com/#/detail/1605032987



DOIS: 10.6084/m9.figshare.1424739.v1 10.1109/mipro.2015.7160259 10.6084/m9.figshare.1424739

Document information

Published on 01/01/2015

Volume 2015, 2015
DOI: 10.6084/m9.figshare.1424739.v1
Licence: Other
