Introduction to GenAI
TL;DR
Generative AI refers to AI systems that can produce original content — text, images, code, and audio — in response to natural language. At the centre of this is a type of model called a Large Language Model (LLM), trained on massive amounts of text to understand and generate human language. LLMs work by breaking language into tokens, converting them to numbers, and using a neural architecture called a Transformer to understand context and generate responses. They don't copy from the internet — they generate fresh responses based on learned patterns. Today they power tools like ChatGPT, Gemini, and Claude, and are used for writing, coding, translation, summarisation, and much more.
Introduction
We are living through one of the most significant shifts in the history of computing. For decades, interacting with a computer meant learning its language — writing code, clicking menus, or following rigid commands. Today, computers are learning ours.
This shift has a name: Generative AI.
Generative AI refers to artificial intelligence systems that can produce original content — text, images, code, audio, and video — simply in response to a natural language prompt. Unlike traditional software that follows fixed rules and produces predictable outputs, generative AI creates something new every time. Ask it to write an email, summarise a document, generate an image, or explain a concept — and it does.
What makes this possible is not one single breakthrough, but a series of compounding advances in how machines represent, process, and generate language. And at the centre of all of it is a specific type of model called a Large Language Model, or LLM.
Understanding LLMs — what they are, how they work, and why they behave the way they do — is the foundation for understanding almost everything happening in AI today. That is what this article covers.
What is an LLM?
LLM stands for Large Language Model. It is a type of artificial intelligence (AI) model trained on vast amounts of text collected from books, articles, websites, research papers, and other publicly available or licensed sources. Through this training, an LLM learns patterns, relationships, grammar, facts, and reasoning present in human language.
As a result, LLMs can understand, process, and generate natural language in a way that closely resembles human communication. This is why they are commonly associated with Natural Language Processing (NLP), a field of AI focused on enabling computers to understand and work with human language.
The core problem LLMs address is understanding and generating human language at scale. They can process large amounts of text, pick up on context, infer meaning, and produce coherent, context-aware responses within seconds.
Today, LLMs are broadly available in two forms:
Closed-source (Proprietary) LLMs
These models are developed and maintained by companies that keep the model weights private. You use them through the company's own apps or paid APIs.
GPT — OpenAI
Gemini — Google
Claude — Anthropic
Grok — xAI
Open-source (Open-weight) LLMs
These models make their model weights publicly available, allowing developers and researchers to run, fine-tune, or build applications on top of them.
Llama — Meta
Gemma — Google
DeepSeek
Qwen — Alibaba
Common Applications of LLMs
Some of the most common real-world applications of LLMs include:
Summarizing long documents or articles
Writing emails, blogs, and marketing copy
Translating text between languages
Answering questions and providing explanations
Assisting with coding and debugging
Generating ideas and brainstorming
Creating chatbots and AI assistants
Extracting information from documents
Proofreading and improving writing
What Happens When You Send a Message to ChatGPT?
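When you hit send in ChatGPT, your message is securely transmitted to OpenAI’s servers. There, the text is broken into chunks called tokens, and the model's neural network calculates the probability of every possible next token. One token is chosen, added to the text, and the process repeats, token by token, until the response is complete. Tokens are streamed back to your screen as they are generated, which is why the reply appears gradually rather than all at once.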
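The generation process can be sketched as a loop: predict a probability for each possible next token, pick one, append it, and repeat. This is a toy, assuming a hypothetical hard-coded table `NEXT_TOKEN_PROBS` in place of a real neural network. A real LLM computes these probabilities with billions of parameters and looks at the whole conversation, not just the last token. The sketch always picks the most likely token (greedy decoding) and stops at a hypothetical end marker `<end>`.

```python
# Hypothetical toy "model": for each token, the probability of each possible next token.
# A real LLM computes these probabilities with a neural network over the entire context.
NEXT_TOKEN_PROBS = {
    "<start>": {"The": 0.6, "A": 0.4},
    "The": {"cat": 0.5, "dog": 0.3, "bank": 0.2},
    "A": {"cat": 0.7, "dog": 0.3},
    "cat": {"sat": 0.8, "ran": 0.2},
    "dog": {"ran": 0.9, "sat": 0.1},
    "bank": {"closed": 1.0},
    "sat": {".": 1.0},
    "ran": {".": 1.0},
    "closed": {".": 1.0},
    ".": {"<end>": 1.0},
}


def generate(max_tokens: int = 20) -> list[str]:
    """Generate tokens one at a time, always choosing the most probable next token."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS[tokens[-1]]
        next_token = max(probs, key=probs.get)  # greedy choice
        if next_token == "<end>":
            break
        tokens.append(next_token)  # the new token becomes input for the next step
    return tokens[1:]  # drop the <start> marker


print(" ".join(generate()))  # → The cat sat .
```

Each pass through the loop produces exactly one token, which is why responses are streamed piece by piece.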
Why aren't LLM responses copied from the internet?
Large Language Models (LLMs) work in a way that is somewhat similar to how humans learn. When a person studies many books, takes courses, and develops a deep understanding of a particular subject, they don't simply memorize and repeat the exact words they have read. Instead, they build their own understanding of the concepts and explain them in their own way.
As a result, a person's explanation may vary depending on the situation, audience, or context, even though the underlying concepts remain the same.
Similarly, LLMs are trained on large datasets and develop statistical representations of language, concepts, and relationships between ideas. Instead of copying text from the internet, they generate responses based on the patterns and understanding learned during training.
This is why an LLM doesn't need to reproduce another source's explanation word for word. Even if you ask the same question multiple times, the wording of the response may change, because the model generates a fresh response each time, picking each token from a range of likely options rather than looking up a stored answer. However, if the question is the same, the core concepts and overall meaning generally remain consistent.
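The variation comes from how the next token is chosen. Instead of always taking the single most likely token, chat models typically sample from the probability distribution, so less likely tokens are sometimes picked. The sketch below assumes hypothetical probabilities for the word after "The bank is". The `temperature` parameter is an assumption modelled on the setting many LLM APIs expose: lower values make the output more predictable.

```python
import math
import random

# Hypothetical next-token probabilities after the text "The bank is".
PROBS = {"closed": 0.5, "open": 0.3, "busy": 0.15, "flooded": 0.05}


def sample_next_token(probs: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Pick one token at random, weighted by its probability reshaped by temperature."""
    tokens = list(probs)
    # Divide log-probabilities by the temperature: < 1 sharpens, > 1 flattens the distribution.
    logits = [math.log(probs[t]) / temperature for t in tokens]
    top = max(logits)  # subtract the max before exp() to avoid underflow
    weights = [math.exp(logit - top) for logit in logits]
    # random.choices normalises the weights, so they need not sum to 1.
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
# Asking the "same question" five times can give different words.
print([sample_next_token(PROBS, temperature=1.0, rng=rng) for _ in range(5)])
# A very low temperature makes the top token dominate, so the output barely varies.
print([sample_next_token(PROBS, temperature=0.01, rng=rng) for _ in range(5)])
```

Whichever word is sampled, it comes from the same learned distribution, which is why the meaning stays consistent even when the wording changes.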
Why Don't Computers Understand Human Language?
Computers are built from billions of microscopic electronic components called transistors. These transistors work like tiny switches that can be either ON (1) or OFF (0). Together, billions of these switches perform calculations and execute instructions.
However, computers do not naturally understand human languages such as English, Hindi, or Spanish. They only understand machine instructions represented using binary digits (0s and 1s).
This is where software acts as a bridge between humans and computer hardware. Humans interact with software using natural language, text, images, or clicks, while the software translates these inputs into machine-readable instructions that the hardware can execute.
Why do computers prefer numbers instead of text?
Human language is inherently complex. The same word or sentence can have different meanings depending on its context, tone, culture, or intent.
For example:
"Can you open the window?" could be a polite request or a literal question about whether you are able to.
"The bank is closed." could refer to a financial institution or the side of a river, depending on the context.
Because language is ambiguous, computers cannot directly reason about text the way humans do.
Numbers, on the other hand, are precise and unambiguous. The value 42 always represents the same quantity, regardless of where it appears. This consistency makes numerical representations much easier for computers to process reliably.
Why must everything be converted into numbers?
At the hardware level, computer circuits only process electrical signals represented as 1s and 0s. Therefore, every type of information—including text, images, audio, and videos—must eventually be converted into numbers before it can be processed.
For text, computers use character encoding standards that assign a unique number to every character.
For example:
ASCII was one of the earliest standards, assigning numeric values to English letters, digits, and symbols.
Unicode is the modern standard that supports characters from virtually every language in the world, including emojis and special symbols.
Once characters are represented as numbers, those numbers are stored and processed in binary (0s and 1s) by the computer's hardware.
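Python's built-in `ord` and `chr` expose the Unicode number (code point) of each character, and `str.encode` shows the bytes actually stored. This sketch assumes UTF-8, the most widely used Unicode encoding; for plain English letters the code points are the same as ASCII.

```python
text = "Hi!"

for ch in text:
    code_point = ord(ch)  # Unicode code point (same as ASCII for these characters)
    print(ch, code_point, format(code_point, "08b"))  # character, number, 8-bit binary
# H 72 01001000
# i 105 01101001
# ! 33 00100001

# Characters outside ASCII still get a unique number, but may need several bytes in UTF-8.
for ch in "é😀":
    encoded = ch.encode("utf-8")
    print(ch, hex(ord(ch)), len(encoded), "bytes:", " ".join(format(b, "08b") for b in encoded))
# é 0xe9 2 bytes: 11000011 10101001
# 😀 0x1f600 4 bytes: 11110000 10011111 10011000 10000000

print(chr(72))  # the mapping also works in reverse → H
```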
Tokenization
Introduction to tokens
Although computers store text as numbers, modern AI models process language differently from traditional software.
Instead of working with individual characters or entire sentences, Large Language Models (LLMs) first break text into smaller units called tokens.
Breaking text into tokens allows LLMs to process language efficiently while preserving meaning and context.
What is tokenization and why is it needed?
An LLM cannot understand text directly. It can only process numbers. Tokenization is the process of breaking text into smaller pieces called tokens, which are then converted into numbers and processed by the model.
Tokenization is needed because human language contains millions of words, symbols, and variations. Instead of storing every possible word, LLMs use a fixed vocabulary of reusable tokens. This makes them more efficient and lets them handle even unfamiliar words, which can be split into smaller pieces that are already in the vocabulary.
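A minimal sketch of the idea, assuming a tiny hypothetical vocabulary `VOCAB` that maps token strings to IDs. It splits each word by repeatedly taking the longest vocabulary entry that matches, falling back to single letters. Real LLM tokenizers are usually built with algorithms such as Byte-Pair Encoding (BPE) and learn their vocabularies from data; greedy longest-match is a simplification chosen for readability.

```python
# Hypothetical tiny vocabulary: token string -> token ID.
VOCAB = {"the": 0, "cat": 1, "sat": 2, ".": 3, "un": 4, "able": 5, "believable": 6}
# Fallback: every lowercase letter is also a token, so no word is ever "unknown".
for i, letter in enumerate("abcdefghijklmnopqrstuvwxyz"):
    VOCAB.setdefault(letter, 100 + i)


def tokenize_word(word: str) -> list[str]:
    """Split one word into the longest vocabulary pieces, scanning left to right."""
    pieces, start = [], 0
    while start < len(word):
        # Try the longest possible piece first, shrinking until one is in the vocabulary.
        for end in range(len(word), start, -1):
            if word[start:end] in VOCAB:
                pieces.append(word[start:end])
                start = end
                break
        else:
            raise ValueError(f"no token for {word[start]!r}")
    return pieces


def encode(text: str) -> list[int]:
    """Lowercase, separate the full stop, tokenize each word, and map tokens to IDs."""
    tokens = []
    for word in text.lower().replace(".", " .").split():
        tokens.extend(tokenize_word(word))
    return [VOCAB[t] for t in tokens]


print(tokenize_word("unbelievable"))  # → ['un', 'believable']
print(tokenize_word("unsatable"))     # made-up word → ['un', 'sat', 'able']
print(encode("The cat sat."))         # → [0, 1, 2, 3]
```

The made-up word "unsatable" never appears in the vocabulary, yet it is still handled by combining known pieces.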
Words vs tokens
A word is what humans read and write.
A token is the unit an LLM processes. A token can be:
A complete word
Part of a word
A punctuation mark
A number or symbol
For example:
"The cat sat." → ["The", "cat", "sat", "."]
"unbelievable" → ["un", "believable"]
Notice that one word can become multiple tokens.
Rule of thumb: 1 token is roughly 4 characters or ¾ of an English word, though this varies depending on the text and the tokenizer.
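The rule of thumb turns into a quick back-of-the-envelope estimate. The helper below is hypothetical and only approximate; an exact count requires running the specific model's tokenizer.

```python
def estimate_tokens(text: str) -> dict[str, float]:
    """Rough token estimates from the ~4 characters/token and ~0.75 words/token rules of thumb."""
    by_chars = len(text) / 4
    by_words = len(text.split()) / 0.75
    return {"by_chars": by_chars, "by_words": by_words}


sentence = "Large Language Models break text into smaller units called tokens."
print(len(sentence), "characters,", len(sentence.split()), "words")
print(estimate_tokens(sentence))
```

By the word-based rule, a 1,500-word article comes to about 1,500 / 0.75 = 2,000 tokens. The two estimates rarely agree exactly, which is a reminder that the rule is only a guide.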
In short, humans think in words, but LLMs process tokens.
Transformers
A Transformer is a deep learning architecture introduced by Google researchers in 2017. It is the foundational engine powering today’s Large Language Models (LLMs) like ChatGPT, Gemini, and Claude.
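Unlike earlier AI models that read sentences word by word, a Transformer processes all the tokens of its input in parallel, up to a fixed limit called the context window. It converts the tokens into vectors of numbers and computes how strongly each token relates to every other token in that input. Generation still happens one token at a time, but each new token is computed with the full input in view.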
How did it change AI?
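Before 2017, most language AI relied on Recurrent Neural Networks (RNNs) and a popular variant called LSTMs (Long Short-Term Memory networks). These models had a major limitation: they processed text like a conveyor belt, one word at a time, and each step had to wait for the previous one to finish.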
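The sequential bottleneck is visible even in a toy recurrent network. This is a deliberately simplified sketch with a single-number "hidden state" and hypothetical weights `W_HIDDEN` and `W_INPUT` (real RNNs use vectors and learned weight matrices). Each step needs the hidden state from the previous step, so step 3 cannot start until steps 1 and 2 are done.

```python
import math

# Hypothetical weights; a real RNN learns large weight matrices during training.
W_HIDDEN = 0.5
W_INPUT = 1.0


def rnn_read(inputs: list[float]) -> list[float]:
    """Run a toy RNN over a sequence, returning the hidden state after each step."""
    hidden = 0.0
    states = []
    for x in inputs:
        # The new state depends on the previous state, so the steps must run in order.
        hidden = math.tanh(W_HIDDEN * hidden + W_INPUT * x)
        states.append(hidden)
    return states


# Each number stands in for one word of a four-word sentence.
print(rnn_read([1.0, 0.0, 0.0, 0.0]))
```

With these weights, the signal from the first word shrinks at every step (roughly 0.76, 0.36, 0.18, 0.09), a small-scale picture of how older models could lose track of the start of long passages.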
The Transformer removed this bottleneck, changing AI in three major ways:
Massive Parallelization: Because a Transformer processes all the tokens in a sequence at once, training can be spread across thousands of GPUs working simultaneously.
Unprecedented Scale: Parallel processing made it practical to train networks with billions of parameters on a large share of the text on the public internet.
Universal Application: Although it was built for text, the same architecture has been adapted successfully to images (Vision Transformers), audio, and even DNA sequences.
How does it help with understanding language?
Human language is messy, and words can completely change meaning depending on their surroundings. The Transformer loosely mirrors human reading comprehension through three core mechanisms:
Self-Attention: For each word, the model computes how much "attention" to pay to every other part of the text in order to understand it. In the phrase "The bank of the river," the model uses "river" to determine that "bank" means land, not a financial institution.
Long-Range Context: Older AI models tended to "forget" the beginning of a long paragraph by the time they reached the end. Because self-attention connects every token directly to every other token in the context window, a Transformer can draw on words thousands of tokens back, which helps it track themes and characters across long passages.
Positional Encoding: Since it reads everything simultaneously, the model injects a mathematical stamp to keep track of word order. This lets the AI distinguish between "The dog bit the man" and "The man bit the dog".
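The self-attention step can be sketched as scaled dot-product attention, the formulation used in the original 2017 Transformer paper: each token's query vector is compared with every token's key vector, the scores are turned into weights with a softmax, and the output is a weighted average of the value vectors. The tiny 2-dimensional vectors below are hypothetical, hand-picked so that "bank" and "river" point in similar directions. In a real model these vectors are learned, have hundreds or thousands of dimensions, and queries, keys, and values are computed separately. Positional encoding is left out to keep the sketch short.

```python
import math

Vector = list[float]


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def self_attention(queries: list[Vector], keys: list[Vector], values: list[Vector]):
    """Scaled dot-product attention: returns (outputs, attention weights) for every token."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # How relevant is every token (including itself) to this token?
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Blend the value vectors according to those weights.
        out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical 2-D embeddings for "The bank of the river" (hand-picked, not learned).
tokens = ["The", "bank", "of", "the", "river"]
embeddings = [[0.1, 0.0], [1.0, 1.0], [0.1, 0.1], [0.1, 0.0], [0.9, 1.2]]

# For simplicity, the queries, keys, and values are the embeddings themselves.
_, weights = self_attention(embeddings, embeddings, embeddings)
for token, w in zip(tokens, weights[1]):  # attention weights of "bank"
    print(f"bank -> {token}: {w:.2f}")
```

With these made-up numbers, "bank" gives its largest weight to "river" (about 0.37), slightly more than to itself (about 0.35), and only about 0.09 to 0.10 to each of "The", "of", and "the".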
Why does almost every modern LLM use Transformers?
The global AI ecosystem has largely standardized around Transformers because they scale predictably and deliver strong results:
The Scaling Laws: Researchers found that as a Transformer is given more parameters, more data, and more compute, its performance improves smoothly and predictably, although each further gain requires progressively more resources.
High Efficiency: They map well onto modern chips such as GPUs, which are built for massively parallel arithmetic. Training a model of today's size with older sequential architectures would be impractically slow; with Transformers, large training runs take weeks to months.
Generative Strength: The architecture is well suited to predicting the next most likely token of text, making it a natural foundation for conversational chatbots, code assistants, and translation tools.
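Scaling-law studies typically describe this relationship as a power law, where loss falls as a fixed power of compute. The constants `C0` and `ALPHA` below are hypothetical, chosen only to show the shape of such a curve, not taken from any published fit.

```python
# Hypothetical constants, chosen only to illustrate the shape of a power law.
C0 = 1.0e8     # reference compute scale (arbitrary units)
ALPHA = 0.05   # how quickly loss falls as compute grows


def predicted_loss(compute: float) -> float:
    """Power-law scaling: loss falls smoothly and predictably as compute increases."""
    return (C0 / compute) ** ALPHA


previous = None
for compute in [1e10, 1e12, 1e14, 1e16]:
    loss = predicted_loss(compute)
    gain = "" if previous is None else f"  (improvement: {previous - loss:.3f})"
    print(f"compute={compute:.0e}  loss={loss:.3f}{gain}")
    previous = loss
```

Each 100× increase in compute multiplies the loss by the same factor (10^-0.1 ≈ 0.79 with these constants), so improvement continues predictably, but each step down costs a hundred times more compute than the one before.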
Conclusion
Large Language Models are not magic — they are the result of decades of research in linguistics, mathematics, and computer science, combined with an enormous amount of data and computing power.
Once you understand the journey from binary digits to tokens, and from tokens to a Transformer predicting the next word in a sentence, the technology becomes far less mysterious. It also becomes easier to use well — and easier to think critically about its limitations.





