Generative AI is a subset of artificial intelligence that focuses on generating new data such as images, text, video, and audio from text or image prompts.
Generative AI models learn from large datasets to capture formats, patterns, styles, and structures in the data. They use this learned knowledge to generate new content that resembles the training data.
Popular Generative AI Architectures
- Generative Adversarial Networks (GAN) based
GANs revolutionized generative modelling. GANs consist of a generator and a discriminator that compete against each other. The generator aims to produce realistic samples, while the discriminator tries to distinguish between real and generated samples. GANs have achieved impressive results in generating images, videos, and more.
Popular models: DCGAN, StyleGAN, ESRGAN, Pix2Pix, etc.
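The adversarial objective described above can be sketched numerically. This is a hypothetical, numpy-only illustration of the two losses for one batch, not a full training loop: the discriminator logits are random stand-ins for a real network's outputs, and the generator loss uses the common non-saturating form.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Stand-ins for the discriminator's raw scores (logits) on one batch:
# positive logits mean "probably real", negative mean "probably fake".
real_logits = rng.normal(2.0, 1.0, size=8)   # scores on real samples
fake_logits = rng.normal(-2.0, 1.0, size=8)  # scores on generated samples

# Discriminator loss: binary cross-entropy with label 1 for real, 0 for fake.
d_loss = (-np.mean(np.log(sigmoid(real_logits)))
          - np.mean(np.log(1.0 - sigmoid(fake_logits))))

# Generator loss (non-saturating form): push D(G(z)) toward "real".
g_loss = -np.mean(np.log(sigmoid(fake_logits)))

print(f"d_loss={d_loss:.3f}  g_loss={g_loss:.3f}")
```

When the discriminator separates the two batches well (as here), its own loss is small while the generator's loss is large, which is exactly the pressure that drives the generator to improve.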
- Variational AutoEncoder (VAE) based
VAEs combine ideas from autoencoders and variational inference. VAEs use neural networks to encode data into a lower-dimensional latent space and decode it to generate new samples. They offer a powerful framework for generating diverse and realistic data.
Popular models: CVAE, VQ-VAE, PixelCNN-VAE, etc.
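The encode-to-latent-space step above can be sketched with the reparameterization trick and the KL term of the VAE loss. In a real VAE the encoder and decoder are trained neural networks; here the encoder outputs (`mu`, `log_var`) are toy stand-ins, so only the mechanics are illustrated.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=(4, 16))        # a batch of 4 "data points"

# Encoder stand-in: map each input to a latent mean and log-variance (dim 2).
mu = x[:, :2] * 0.1
log_var = np.full((4, 2), -1.0)

# Reparameterization trick: z = mu + sigma * eps keeps the sampling step
# differentiable with respect to the encoder parameters.
eps = rng.normal(size=mu.shape)
z = mu + np.exp(0.5 * log_var) * eps

# KL divergence of N(mu, sigma^2) from the standard normal prior,
# averaged over the batch -- the regularizer in the VAE objective.
kl = -0.5 * np.mean(np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1))

print("z shape:", z.shape, " KL term:", round(kl, 3))
```

A decoder network would then map `z` back to data space; the full loss adds a reconstruction term to this KL term.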
- Transformer based
The Transformer architecture, introduced in 2017, has significantly impacted generative AI. Transformers utilize self-attention mechanisms to capture dependencies and have been successful in tasks like language translation, text generation, and image generation.
Popular models: ChatGPT, GPT-4, BERT, CLIP, etc.
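The self-attention mechanism mentioned above can be written in a few lines. This is a minimal sketch of scaled dot-product self-attention with random stand-in projection weights; a trained Transformer learns these matrices (and adds multiple heads, masking, and feed-forward layers on top).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Project each token into query, key, and value vectors.
    q, k, v = x @ wq, x @ wk, x @ wv
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)      # pairwise token affinities, scaled
    weights = softmax(scores, axis=-1)   # each row is a distribution over tokens
    return weights @ v, weights          # weighted mix of values per token

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
x = rng.normal(size=(seq_len, d_model))          # 5 token embeddings
wq, wk, wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out, attn = self_attention(x, wq, wk, wv)
print(out.shape)   # one output vector per token
```

Each attention row sums to 1, so every output token is a convex combination of the value vectors, which is how the model captures dependencies between positions.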
- Diffusion based
At a high level, diffusion models work by destroying training data with gradually added noise, then learning to recover the data by reversing this noising process. In other words, diffusion models can generate coherent images starting from pure noise. Recent advancements have also focused on style transfer and controllable generation: models can learn to transfer the style of one piece of content to another, and techniques like conditional generation and latent space manipulation enable fine-grained control over the generated outputs.
Popular models: GLIDE, DALL-E 2, Stable Diffusion, Imagen, Midjourney, etc.
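The forward (noising) half of the process described above has a convenient closed form: with a variance schedule beta_t and alpha_bar_t the cumulative product of (1 - beta_t), a noisy sample at any step t is `sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps`. The sketch below illustrates only this forward process on a toy vector; the reverse (denoising) direction is what a diffusion model is trained to learn, and is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)      # fraction of original signal kept at step t

x0 = rng.normal(size=(8,))               # stand-in for a clean data sample

def q_sample(x0, t, eps):
    """Draw x_t directly from x_0, without looping through every step."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

eps = rng.normal(size=x0.shape)
x_early = q_sample(x0, 10, eps)    # early step: mostly signal
x_late = q_sample(x0, T - 1, eps)  # final step: nearly pure noise

print(round(alpha_bar[10], 4), round(alpha_bar[T - 1], 6))
```

Because alpha_bar decays toward zero, late-step samples are almost indistinguishable from Gaussian noise, which is why generation can start from random noise and run the learned reverse process.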
Evolution of Generative AI
Earlier, the generative AI landscape progressed separately in Natural Language Processing (NLP) and Computer Vision (CV) tasks. The recent trend, however, is toward multimodal models that can understand both vision and language (VL). In the future, we can expect models that understand any sort of input, such as images, text, audio, depth, thermal, and IMU data.
Applications of Generative AI
- Text generation: content creation, chatbots and virtual assistants, creative writing, language translation, summarization, etc.
- Image generation: content creation, photographs, artwork, data augmentation, fashion design, product design, etc.
- Voice generation: songs and compositions, text-to-speech, language learning and pronunciation, vocal assistants, etc.
- Video generation: short films, music videos, video summarization, deepfakes, scene generation, advertisement and marketing, etc.
- Code generation: code synthesis, bug fixing, code refactoring, test case generation, etc.