PolyPipe: Merging Data Pipelines and Multi-Model Databases

Authors
David Lengweiler, Tobias Weber, Heiko Schuldt, Marco Vogt
Type
In Proceedings
Date
2026/3
Appears in
Proceedings of the 29th International Conference on Extending Database Technology (EDBT 2026)
Location
Tampere, Finland
Abstract

Modern data is characterized by high volume and inherent heterogeneity, and it is primarily managed by systems tailored to three distinct modeling paradigms: the relational model, which enforces a strict schema and high structural integrity; the document model, which offers schema flexibility for semi-structured data; and the graph model, which prioritizes modeling complex relationships between entities. While the database industry is trending toward multi-model systems that incorporate features from all three paradigms, data management practices still lag behind. Data scientists rely on manual, multi-stage, labor-intensive workflows to integrate disparate data sources. This process forces users to switch tools, incurs high data shipping costs, and forfeits database-level optimizations and structural guarantees, leading to complex, brittle, and non-reusable “one-off” solutions.
We argue that embedding data pipelines directly into a multi-model database, using declarative, database-native operators, streamlines and simplifies these workflows and improves their maintainability. This paper presents PolyPipe, an extension to the Polypheny multi-model database system. PolyPipe integrates data pipeline functionality as a first-class citizen, allowing complex pipelines to be constructed from a hybrid of database and classical operators within a single system.