The Science

Since 2019, at Carnegie Mellon, co-founders Giulia Fanti and Vyas Sekar have made significant advances
in the theory and practice of generative models, including developing state-of-the-art GAN-based models
for time series data, improving the stability and privacy of generative algorithms,
devising practical approaches for rare sample generation, and adapting deep generative models
to specific domains (e.g., telecommunications, IoT).

The resulting peer-reviewed research papers have appeared at prestigious AI/ML venues
such as NeurIPS, ICML, and AAAI, and at domain-specific venues such as IMC and SIGCOMM.

Read more about some of these research foundations below.

Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

TL;DR: This paper introduces DoppelGANger, a practical Generative Adversarial Network (GAN) based approach to time series generation. Its novel architecture decouples metadata and measurement fields, and the paper shows how to make GANs work for high-dimensional time series data so that they capture metadata properties, measurement time series properties, and metadata-measurement correlations at scale.

RareGAN: Generating Samples for Rare Classes

TL;DR: Many real-world use cases need synthetic data to "amplify" rare scenarios (e.g., anomalies, fraud, attacks). This paper presents a foundational mechanism for restructuring the learning and generation process so that users can generate more "rare" samples from a limited set of rare examples.

On the Privacy Properties of GAN-generated Samples

TL;DR: This paper proves theoretically that GAN-based generation provides intrinsic protection against membership inference attacks and some baseline differential privacy guarantees. It also shows that stronger differential privacy guarantees require rethinking the training approach.

Why Spectral Normalization Stabilizes GANs: Analysis and Improvements

TL;DR: GANs are known to have convergence and stability issues, and the community has proposed several heuristics to mitigate them. This paper proves theoretically why one popular heuristic, Spectral Normalization, actually works, and uses the theoretical insights to improve it.
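As a rough illustration of the heuristic itself (not the paper's analysis or its improved variant), spectral normalization rescales a layer's weight matrix by an estimate of its largest singular value, commonly obtained by power iteration. The NumPy sketch below assumes a plain 2-D weight matrix; the function name `spectral_normalize` and its parameters are hypothetical, chosen only for this example.

```python
import numpy as np


def spectral_normalize(weight, n_iters=50, seed=0):
    """Divide `weight` by a power-iteration estimate of its largest singular value.

    Hypothetical helper for illustration only; deep learning frameworks
    ship their own implementations of this idea.
    """
    rng = np.random.default_rng(seed)
    # Random starting vector in the output space of the layer.
    u = rng.standard_normal(weight.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iters):
        # Alternate between right and left singular vector estimates.
        v = weight.T @ u
        v /= np.linalg.norm(v)
        u = weight @ v
        u /= np.linalg.norm(u)
    # Estimated largest singular value (the spectral norm).
    sigma = u @ weight @ v
    return weight / sigma, sigma


# Toy weight matrix whose singular values are 3 and 1.
W = np.array([[3.0, 0.0], [0.0, 1.0]])
W_sn, sigma = spectral_normalize(W)
print(sigma)                     # estimated spectral norm of W
print(np.linalg.norm(W_sn, 2))   # spectral norm after normalization
```

After normalization the matrix has spectral norm close to 1, which bounds how much the layer can stretch its input. In practice, training code usually runs a single power-iteration step per update and reuses `u` between steps rather than iterating to convergence.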

Practical GAN-based Synthetic IP Header Trace Generation using NetShare

TL;DR: This paper introduces NetShare, the first deep generative model for generating IP header network traces at scale. It builds on the DoppelGANger architecture described above and extends it with domain-specific insights on reformulating the generation problem and encoding the data fields in packet headers, enabling a practical workflow specifically for network traces.

The Science Behind Rockfish

Rockfish builds on and extends 6+ years of foundational academic research on making
Deep Generative Models practical for Enterprise-scale synthetic data generation.
