In:
Computing, Springer Science and Business Media LLC, Vol. 103, No. 6 (2021-06), p. 1085-1104
Abstract:
Process discovery algorithms derive process models from event data captured during the execution of business processes. These algorithms tend to use the whole event data. When the event data are large, discovery is no longer feasible on standard hardware within a limited time. A straightforward way to overcome this problem is to downsize the data using a random sampling method. However, little research has been conducted on selecting the right sample given the available time and the characteristics of the event data. This paper systematically evaluates various biased sampling methods and their performance on different datasets using four different discovery techniques. Our experiments show that biased sampling can considerably speed up discovery techniques without reducing the quality of the resulting process model. Furthermore, because sampling implicitly filters the data (removing outliers), the model quality may even improve.
Type of Medium:
Online Resource
ISSN:
0010-485X, 1436-5057
DOI:
10.1007/s00607-021-00910-4
Language:
English
Publisher:
Springer Science and Business Media LLC
Publication Date:
2021
ZDB ID:
1458946-1, 215907-7