Synthetic Data Generation
S J (~s2)
Description:
In this talk, I will walk you through the basics of how to generate synthetic data from your production relational tables.
This is useful when your customers do not want you to examine their production data but you still want to build models that simulate their usage of your product.
We will look at how you can mask values, capture the variation within columns, and capture the dependence between columns, both within a table and across tables.
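As a rough illustration of these three ideas, here is a minimal Python sketch that uses only the standard library. Everything in it is a hypothetical stand-in and not taken from the talk: the toy rows, the column names (email, plan, seats), the salt, and the helper functions mask, fit and sample. It masks a sensitive column with a salted hash, captures the variation in one column by sampling from its observed frequencies, and captures the dependence between two columns by sampling the second column conditioned on the first.

```python
import hashlib
import random
from collections import Counter, defaultdict

# Hypothetical toy rows standing in for a production table.
rows = [
    {"email": "a@example.com", "plan": "free", "seats": 1},
    {"email": "b@example.com", "plan": "free", "seats": 2},
    {"email": "c@example.com", "plan": "pro", "seats": 10},
    {"email": "d@example.com", "plan": "pro", "seats": 12},
]


def mask(value, salt="demo-salt"):
    """Replace a sensitive value with a stable token derived from a salted hash."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]


def fit(rows):
    """Record how often each plan occurs and which seat counts appear with it."""
    plan_counts = Counter(r["plan"] for r in rows)
    seats_by_plan = defaultdict(list)
    for r in rows:
        seats_by_plan[r["plan"]].append(r["seats"])
    return plan_counts, seats_by_plan


def sample(plan_counts, seats_by_plan, n, seed=0):
    """Draw n synthetic rows: plan from its frequencies, seats conditioned on plan."""
    rng = random.Random(seed)
    plans = list(plan_counts)
    weights = [plan_counts[p] for p in plans]
    out = []
    for i in range(n):
        plan = rng.choices(plans, weights=weights)[0]  # variation within a column
        seats = rng.choice(seats_by_plan[plan])        # dependence between columns
        out.append({"email": f"user-{i}@synthetic.invalid", "plan": plan, "seats": seats})
    return out


# Masking keeps the rows but hides the sensitive column.
masked = [dict(r, email=mask(r["email"])) for r in rows]

# Sampling produces new rows that follow the fitted frequencies.
plan_counts, seats_by_plan = fit(rows)
synthetic = sample(plan_counts, seats_by_plan, n=5)
```

Because seats is sampled per plan, a synthetic "free" row never receives a seat count that was only observed for "pro" rows. Extending this across tables would additionally require generating foreign keys consistently, which this sketch does not attempt.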
Prerequisites:
Prior exposure to statistical concepts such as probability distributions would help but is not required.
Video URL:
https://youtu.be/02iVENP5HzA
Speaker Info:
Software engineer with 18+ years of experience. Currently working at Kognitos.
https://www.linkedin.com/in/sanjoshi/
Speaker Links:
Links to some previous talks:
Talk on Disambiguating users in a social media graph https://hasgeek.com/fifthelephant/2023-08/sub/unraveling-the-identity-puzzle-disambiguating-user-AF2hNVZpvVtEME8qQ2JkTq
Talk on Lamport's TLA+ https://www.youtube.com/watch?v=8L2M-CpOEJc&list=PLMqXoQWiY8w5xQwWs37Tok0UNNpX_Qnqk&index=3
Talk on Data Streaming Algorithms https://www.youtube.com/watch?v=NUsNtosPoeI