synthetic data

Synthetic data must meet two requirements (Patki et al., 2016). First, it must statistically resemble the original data closely enough to be realistic and to keep problems engaging for data scientists. Second, it must formally and structurally resemble the original data, so that any software written on top of it can be reused.
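The two requirements can be illustrated with a minimal Python sketch. It assumes the data is a list of dict rows. The structural requirement is met by keeping the same column names, order and types. The statistical requirement is met, very roughly, by fitting each column independently: a mean and standard deviation for numeric columns, and observed frequencies for categorical ones. The helper names `fit_marginals` and `sample_rows` are hypothetical. Modelling each column independently is a deliberate simplification, because it ignores correlations between columns. Generators such as the Synthetic Data Vault also model dependencies between columns.

```python
import random
import statistics
from collections import Counter


def fit_marginals(rows):
    """Fit a per-column model (hypothetical helper; ignores cross-column correlations)."""
    model = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        if all(isinstance(v, (int, float)) for v in values):
            # Remember whether the column was integer-valued so the schema is preserved.
            kind = int if all(isinstance(v, int) for v in values) else float
            model[col] = ("numeric", kind, statistics.mean(values), statistics.pstdev(values))
        else:
            counts = Counter(values)
            model[col] = ("categorical", list(counts), list(counts.values()))
    return model


def sample_rows(model, n, seed=None):
    """Draw n synthetic rows with the same column names, order and types."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for col, spec in model.items():
            if spec[0] == "numeric":
                _, kind, mu, sigma = spec
                x = rng.gauss(mu, sigma)  # normal approximation of the column
                row[col] = round(x) if kind is int else x
            else:
                _, categories, weights = spec
                # Sample categories in proportion to how often they were observed.
                row[col] = rng.choices(categories, weights=weights)[0]
        rows.append(row)
    return rows


real = [
    {"age": 34, "income": 52000.0, "city": "Sydney"},
    {"age": 41, "income": 61000.0, "city": "Melbourne"},
    {"age": 29, "income": 48000.0, "city": "Sydney"},
]
synthetic = sample_rows(fit_marginals(real), n=5, seed=0)
```

Because the model is fitted once, raising `n` is all it takes to produce larger synthetic tables with the same schema.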

Use Case

Synthetic data techniques can generate large volumes of training data for data-hungry machine learning algorithms.
Synthetic data generation can also produce the data needed to stress-test a system.
Synthetic data can be used to correct existing biases in data, for example by rebalancing under-represented groups to reduce discrimination.
Synthetic data can be used to impute missing values in existing data.
Synthetic data can enable data sharing while reducing the risk of breaching data-protection legislation. In this way, organizations can share insights, which supports scientific research.
When most people work with the synthetic data rather than the real data, data security increases.
Synthetic data can be generated to follow current trends in the data, thereby supporting forecasting.
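Rebalancing an under-represented group is one way to counteract bias. A common technique is SMOTE-style oversampling. The sketch below is a minimal Python version of that idea and assumes the minority group is given as numeric feature tuples. Each new point lies on the line segment between a random minority point and one of its `k` nearest minority neighbours. The function name `smote` and its parameters are illustrative and are not taken from any particular library.

```python
import random


def smote(minority, n_new, k=5, seed=None):
    """Create n_new synthetic minority points by interpolating toward near neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)
    new_points = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        # k nearest neighbours of `base` by squared Euclidean distance.
        neighbours = sorted(
            others, key=lambda q: sum((p - r) ** 2 for p, r in zip(base, q))
        )[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        new_points.append(tuple(p + gap * (q - p) for p, q in zip(base, partner)))
    return new_points


minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_points = smote(minority, n_new=5, k=2, seed=1)
balanced_minority = minority + new_points  # 8 points, e.g. to match 8 majority points
```

Synthetic points stay inside the region covered by the real minority points, so the group becomes more visible to a model without introducing out-of-range values. Whether this actually reduces discrimination depends on the data and the downstream model.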
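The note does not name an imputation method. As a simple stand-in, the Python sketch below uses random hot-deck imputation, which replaces each missing entry (`None`) with a value drawn from the observed entries of the same column. The function name `hot_deck_impute` is hypothetical. This method preserves the column's observed distribution but ignores relationships with other columns.

```python
import random


def hot_deck_impute(values, seed=None):
    """Fill None entries by sampling uniformly from the observed values."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to sample from")
    # Observed entries are kept unchanged; only gaps receive synthetic values.
    return [rng.choice(observed) if v is None else v for v in values]


filled = hot_deck_impute([3, None, 5, None, 4], seed=2)
```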
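Generating data that follows a current trend can be sketched under a strong assumption: that the trend is linear. The Python example below fits a straight line to a series by ordinary least squares. It then produces future synthetic points on that line, with Gaussian noise scaled to the residuals of the fit. The function name `trend_extrapolate` is hypothetical. Real forecasting data usually needs richer models that capture seasonality or non-linear trends.

```python
import random


def trend_extrapolate(series, horizon, seed=None):
    """Synthetic future points following a least-squares linear trend plus noise."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    ts = range(n)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    # Ordinary least squares slope and intercept for y = a + b * t.
    b = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, series)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    a = y_mean - b * t_mean
    # Noise level estimated from the residuals (n - 2 degrees of freedom).
    residuals = [y - (a + b * t) for t, y in zip(ts, series)]
    sigma = (sum(r * r for r in residuals) / max(n - 2, 1)) ** 0.5
    rng = random.Random(seed)
    return [a + b * t + rng.gauss(0, sigma) for t in range(n, n + horizon)]


history = [10.2, 11.8, 14.1, 15.9, 18.3, 19.7]
future = trend_extrapolate(history, horizon=3, seed=0)
```

Drawing several futures with different seeds gives a set of plausible scenarios rather than a single point forecast.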

Reference List

Patki, N., Wedge, R., & Veeramachaneni, K. (2016, October). The synthetic data vault. In 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) (pp. 399-410). IEEE.
https://www.computer.org/csdl/magazine/so/2023/05/10273815/1R6sOyTc8r6

Boyang Yan