In:
ACM SIGMOD Record, Association for Computing Machinery (ACM), Vol. 23, No. 2 (1994-06), pp. 243-252
Abstract:
Evaluating database system performance often requires generating synthetic databases—ones having certain statistical properties but filled with dummy information. When evaluating different database designs, it is often necessary to generate several databases and evaluate each design. As database sizes grow to terabytes, generation often takes longer than evaluation. This paper presents several database generation techniques. In particular it discusses: (1) Parallelism to get generation speedup and scaleup. (2) Congruential generators to get dense unique uniform distributions. (3) Special-case discrete logarithms to generate indices concurrent to the base table generation. (4) Modification of (2) to get exponential, normal, and self-similar distributions. The discussion is in terms of generating billion-record SQL databases using C programs running on a shared-nothing computer system consisting of a hundred processors, with a thousand discs. The ideas apply to smaller databases, but large databases present the more difficult problems.
Type of Medium:
Online Resource
ISSN:
0163-5808
DOI:
10.1145/191843.191886
Language:
English
Publisher:
Association for Computing Machinery (ACM)
Publication Date:
1994