In:
Proceedings of the VLDB Endowment, Association for Computing Machinery (ACM), Vol. 13, No. 12 (2020-08), p. 2993-2996
Abstract:
Data pipelines are the new code. Consequently, data scientists need new tools to support the often time-consuming process of debugging their pipelines. We introduce Dagger, an end-to-end system to debug and mitigate data-centric errors in data pipelines, such as a data transformation gone wrong or a classifier underperforming due to noisy training data. Dagger supports inter-module debugging, where the pipeline blocks are treated as black boxes, as well as intra-module debugging, where users can debug data objects in Python scripts (e.g., DataFrames). In this demo, we will walk the audience through a rich, real-world business intelligence use case from our industrial collaborators at Intel, to highlight how Dagger enables data scientists to productively identify and mitigate data-centric problems at different stages of pipeline development.
Type of Medium:
Online Resource
ISSN:
2150-8097
DOI:
10.14778/3415478.3415527
Language:
English
Publisher:
Association for Computing Machinery (ACM)
Publication Date:
2020
ZDB-ID:
2478691-3