Generative AI is a subset of artificial intelligence that focuses on generating new data such as images, text, video, and audio from text or image prompts.
Generative AI models learn from large datasets to capture formats, patterns, styles, and structures in the data. They use this learned knowledge to generate new content that resembles the training data.
Popular Generative AI Architectures
- Generative Adversarial Networks (GAN) based
GANs revolutionized generative modelling. GANs consist of a generator and a discriminator that compete against each other. The generator aims to produce realistic samples, while the discriminator tries to distinguish between real and generated samples. GANs have achieved impressive results in generating images, videos, and more.
Popular models: DCGAN, StyleGAN, ESRGAN, Pix2Pix, etc.
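The adversarial objective described above can be sketched numerically. This is a hypothetical, numpy-only illustration of the two losses for one batch, not a full training loop: the discriminator logits are random stand-ins for a real network's outputs, and the generator loss uses the common non-saturating form.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Stand-ins for the discriminator's raw scores (logits) on one batch:
# positive logits mean "probably real", negative mean "probably fake".
real_logits = rng.normal(2.0, 1.0, size=8)   # scores on real samples
fake_logits = rng.normal(-2.0, 1.0, size=8)  # scores on generated samples

# Discriminator loss: binary cross-entropy with label 1 for real, 0 for fake.
d_loss = (-np.mean(np.log(sigmoid(real_logits)))
          - np.mean(np.log(1.0 - sigmoid(fake_logits))))

# Generator loss (non-saturating form): push D(G(z)) toward "real".
g_loss = -np.mean(np.log(sigmoid(fake_logits)))

print(f"d_loss={d_loss:.3f}  g_loss={g_loss:.3f}")
```

When the discriminator separates the two batches well (as here), its own loss is small while the generator's loss is large, which is exactly the pressure that drives the generator to improve.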
- Variational AutoEncoder (VAE) based
VAEs combine ideas from autoencoders and variational inference. VAEs use neural networks to encode data into a lower-dimensional latent space and decode it to generate new samples. They offer a powerful framework for generating diverse and realistic data.
Popular models: CVAE, VQ-VAE, PixelCNN-VAE, etc.
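The encode-to-latent-space step above can be sketched with the reparameterization trick and the KL term of the VAE loss. In a real VAE the encoder and decoder are trained neural networks; here the encoder outputs (`mu`, `log_var`) are toy stand-ins, so only the mechanics are illustrated.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=(4, 16))        # a batch of 4 "data points"

# Encoder stand-in: map each input to a latent mean and log-variance (dim 2).
mu = x[:, :2] * 0.1
log_var = np.full((4, 2), -1.0)

# Reparameterization trick: z = mu + sigma * eps keeps the sampling step
# differentiable with respect to the encoder parameters.
eps = rng.normal(size=mu.shape)
z = mu + np.exp(0.5 * log_var) * eps

# KL divergence of N(mu, sigma^2) from the standard normal prior,
# averaged over the batch -- the regularizer in the VAE objective.
kl = -0.5 * np.mean(np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1))

print("z shape:", z.shape, " KL term:", round(kl, 3))
```

A decoder network would then map `z` back to data space; the full loss adds a reconstruction term to this KL term.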
- Transformer based
The Transformer architecture, introduced in 2017, has significantly impacted generative AI. Transformers utilize self-attention mechanisms to capture dependencies and have been successful in tasks like language translation, text generation, and image generation.
Popular models: ChatGPT, GPT-4, BERT, CLIP, etc.
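The self-attention mechanism mentioned above can be written in a few lines. This is a minimal sketch of scaled dot-product self-attention with random stand-in projection weights; a trained Transformer learns these matrices (and adds multiple heads, masking, and feed-forward layers on top).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Project each token into query, key, and value vectors.
    q, k, v = x @ wq, x @ wk, x @ wv
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)      # pairwise token affinities, scaled
    weights = softmax(scores, axis=-1)   # each row is a distribution over tokens
    return weights @ v, weights          # weighted mix of values per token

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
x = rng.normal(size=(seq_len, d_model))          # 5 token embeddings
wq, wk, wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out, attn = self_attention(x, wq, wk, wv)
print(out.shape)   # one output vector per token
```

Each attention row sums to 1, so every output token is a convex combination of the value vectors, which is how the model captures dependencies between positions.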
- Diffusion based
At a high level, diffusion models work by destroying training data with gradually added noise, then learning to recover the data by reversing this noising process. In other words, diffusion models can generate coherent images starting from pure noise. Recent advancements have also focused on style transfer and controllable generation: models can learn to transfer the style of one piece of content to another, and techniques like conditional generation and latent space manipulation enable fine-grained control over the generated outputs.
Popular models: GLIDE, DALL-E 2, Stable Diffusion, Imagen, Midjourney, etc.
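The forward (noising) half of the process described above has a convenient closed form: with a variance schedule beta_t and alpha_bar_t the cumulative product of (1 - beta_t), a noisy sample at any step t is `sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps`. The sketch below illustrates only this forward process on a toy vector; the reverse (denoising) direction is what a diffusion model is trained to learn, and is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)      # fraction of original signal kept at step t

x0 = rng.normal(size=(8,))               # stand-in for a clean data sample

def q_sample(x0, t, eps):
    """Draw x_t directly from x_0, without looping through every step."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

eps = rng.normal(size=x0.shape)
x_early = q_sample(x0, 10, eps)    # early step: mostly signal
x_late = q_sample(x0, T - 1, eps)  # final step: nearly pure noise

print(round(alpha_bar[10], 4), round(alpha_bar[T - 1], 6))
```

Because alpha_bar decays toward zero, late-step samples are almost indistinguishable from Gaussian noise, which is why generation can start from random noise and run the learned reverse process.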
Evolution of Generative AI
Earlier, the generative AI landscape progressed separately in Natural Language Processing (NLP) and Computer Vision (CV) tasks. The recent trend, however, is toward multimodal models that can understand both vision and language (VL). In the future, we can expect models that understand any sort of input, such as images, text, audio, depth, thermal, and IMU data.
Applications of Generative AI
- Text generation: content creation, chatbots and virtual assistants, creative writing, language translation, summarization, etc.
- Image generation: content creation, photographs, artwork, data augmentation, fashion design, product design, etc.
- Voice generation: songs and compositions, text-to-speech, language learning and pronunciation, vocal assistants, etc.
- Video generation: short films, music videos, video summarization, deepfakes, scene generation, advertisement and marketing, etc.
- Code generation: code synthesis, bug fixing, code refactoring, test case generation, etc.