Enacting Data Science Pipelines for Exploring Graphs: From Libraries to Studios

Abstract

This paper studies existing environments used to enact data science pipelines applied to graphs. Data science pipelines are a new form of query that combines classic graph operations with artificial intelligence (AI) graph analytics operations. A pipeline defines a data flow consisting of tasks for querying, exploring and analysing graphs. Different environments and systems can be used to enact such pipelines. They range from graph NoSQL stores, through programming languages extended with libraries that provide graph processing and analytics functions, to full machine learning and AI studios. The paper describes these environments and the design principles they promote for enacting data science pipelines that query, process and explore data collections, and graphs in particular.
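To make the notion of a pipeline as a data flow of tasks more concrete, the following is a minimal sketch in plain Python. It is not tied to any of the environments surveyed in the paper. It assumes the graph is an undirected adjacency map. The task names `query_neighbourhood` and `degree_centrality` and the runner `run_pipeline` are hypothetical illustrations: a querying/exploring task extracts a neighbourhood, and its output feeds an analytics task.

```python
from typing import Any, Callable, Dict, List, Set

# Assumption for illustration: an undirected graph stored as an adjacency map.
Graph = Dict[str, Set[str]]


def run_pipeline(data: Any, tasks: List[Callable[[Any], Any]]) -> Any:
    """Hypothetical runner: apply tasks in order, each consuming the previous output."""
    for task in tasks:
        data = task(data)
    return data


def query_neighbourhood(seed: str, hops: int) -> Callable[[Graph], Graph]:
    """Hypothetical querying/exploring task: keep the subgraph induced by
    the nodes reachable from `seed` in at most `hops` steps."""
    def task(graph: Graph) -> Graph:
        frontier, seen = {seed}, {seed}
        for _ in range(hops):
            # Expand one hop, ignoring nodes already visited.
            frontier = {n for v in frontier for n in graph.get(v, set())} - seen
            seen |= frontier
        # Restrict every adjacency set to the selected nodes.
        return {v: graph[v] & seen for v in seen if v in graph}
    return task


def degree_centrality(graph: Graph) -> Dict[str, float]:
    """Hypothetical analytics task: degree of each node divided by (n - 1)."""
    n = len(graph)
    if n <= 1:
        return {v: 0.0 for v in graph}
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


# Usage: explore the 1-hop neighbourhood of "a", then analyse it.
g: Graph = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a"},
    "d": {"b", "e"},
    "e": {"d"},
}
result = run_pipeline(g, [query_neighbourhood("a", 1), degree_centrality])
print(result)  # {'a': 1.0, 'b': 0.5, 'c': 0.5} (key order may vary)
```

Here the pipeline is simply an ordered list of callables. The environments discussed in the paper enact the same idea with their own query languages, library functions or visual studio components.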