AI Engineer Roadmap

01 /

1

Foundations

Python, math, and data — the bedrock everything else rests on

6–8 weeks

+

🐍

+

Python essentials

Core syntax, data structures, OOP, file I/O, virtual environments

beginner

What to learn

Variables, loops, functions, list comprehensions

Classes & OOP, decorators, generators

pip, venv / conda, requirements.txt

NumPy & Pandas basics — arrays, DataFrames

Jupyter notebooks for experimentation

Git & GitHub basics — commits, branches, PRs

∑

+

Math for ML

Linear algebra, calculus, probability you'll actually use in practice

required

What to learn

Vectors, matrices, dot products, matrix multiply

Derivatives & gradients — intuition, not proofs

Chain rule (this IS backprop)

Probability: distributions, Bayes theorem

Statistics: mean, variance, covariance

3Blue1Brown Essence of LA & Calculus (free)

📊

+

Data & EDA

Wrangling, cleaning, and visualising datasets before any model touches them

practical

What to learn

Pandas for manipulation & cleaning

Matplotlib / Seaborn for visualisation

Handling missing data, outliers, dtypes

Train / val / test splitting correctly

Feature scaling: StandardScaler, MinMaxScaler

Kaggle datasets — start practising immediately

02 /

2

ML & Deep Learning

Neural nets, backprop, and transformers — the building blocks of every LLM

8–10 weeks

+

🌲

+

Classical ML

Scikit-learn fundamentals — the mental models still matter

important

What to learn

Linear / logistic regression from scratch

Decision trees, random forests, XGBoost

Cross-validation, confusion matrix, AUC-ROC

Overfitting & regularisation (L1 / L2)

Scikit-learn pipelines & GridSearchCV

When NOT to use deep learning

🧠

+

Deep Learning

Neural nets, backprop, CNNs, RNNs — foundations of all large models

core

What to learn

Perceptrons, activation functions, layers

Backpropagation & gradient descent by hand

PyTorch: tensors, autograd, nn.Module

Training loops: forward → loss → backward → step

Batch norm, dropout, weight decay

CNNs for vision; RNNs / LSTMs for sequences

⚡

+

Transformers

Attention is all you need — understand every component deeply

critical

What to learn

Self-attention: Q, K, V matrices explained

Multi-head attention — why multiple heads?

Positional encodings (sinusoidal & RoPE)

Encoder-decoder vs decoder-only models

Layer norm, residual connections, FFN layers

Implement a mini-GPT from scratch (Karpathy)

03 /

3

LLMs & GenAI

Internals, fine-tuning, RAG, and prompt engineering — the core specialisation

8–10 weeks

+

🔬

+

LLM internals

How GPT, Claude, Llama actually work under the hood

deep-dive

What to learn

Tokenisation: BPE, SentencePiece, tiktoken

Pre-training: next-token prediction at scale

RLHF / RLAIF — how models get aligned

Context windows, KV cache, attention patterns

Emergent abilities: in-context learning, CoT

Scaling laws — Chinchilla & compute-optimal

🔧

+

Fine-tuning & LoRA

Adapt pretrained models cheaply and effectively with PEFT methods

practical

What to learn

Full fine-tuning vs parameter-efficient methods

LoRA: low-rank matrix decomposition explained

QLoRA — quantised + LoRA for consumer GPUs

Hugging Face PEFT & Transformers libraries

Instruction tuning with custom datasets

Evaluation: perplexity, BLEU, human eval

🗄️

+

RAG systems

Retrieval-augmented generation — give LLMs external memory

in-demand

What to learn

Why RAG? Solving hallucination & knowledge cutoff

Embeddings: what they are, cosine similarity

Vector DBs: Chroma, Pinecone, Qdrant, Weaviate

Chunking: fixed, semantic, hierarchical

Hybrid search: dense + sparse (BM25)

Advanced RAG: re-ranking, HyDE, query rewriting

✍️

+

Prompt engineering

The craft of communicating with LLMs precisely and reliably

vibe-coding

What to learn

Zero-shot, few-shot, chain-of-thought

System prompts, role-playing, personas

Structured outputs: JSON mode, function calling

Tree of Thought, ReAct, self-consistency

Prompt injection & jailbreak awareness

Evaluating prompt quality at scale (LLM-as-judge)

04 /

4

AI Engineering & Vibecoding

APIs, agents, AI-native IDEs, and production deployment

10–12 weeks

+

🔌

+

APIs & SDKs

Building with Anthropic, OpenAI, and open-source model APIs

vibe-coding

What to learn

Anthropic SDK: messages, streaming, tool use

OpenAI-compatible APIs — portable patterns

Managing rate limits, retries, costs

Structured outputs & JSON schema enforcement

Streaming responses for real-time UX

Cost monitoring: tokens, context window budgets

🤖

+

Agents & tools

LLMs that take actions — the frontier of practical AI engineering

hot

What to learn

Tool / function calling — give LLMs abilities

ReAct pattern: reason → act → observe loop

LangChain / LlamaIndex agent frameworks

Memory: short-term (context) vs long-term (DB)

Multi-agent systems — orchestrator + workers

Human-in-the-loop approval patterns

⌨️

+

Vibe-coding mastery

Using AI IDEs and LLMs to build 10× faster than traditional coding

meta-skill

What to learn

Claude Code, Cursor, Windsurf, Copilot in practice

Writing prompts that generate working code

Iterative refinement — the vibe-coding loop

Debugging AI-generated code effectively

Context management in long coding sessions

When to override the AI vs trust it

🚀

+

Stack & infra

The tools real AI engineers actually deploy with in production

deployment

What to learn

FastAPI for serving ML models as REST APIs

Docker & Docker Compose for reproducibility

LangSmith / LangFuse for LLM observability

Hugging Face Hub — model versioning & sharing

Modal / Replicate for GPU inference in the cloud

Gradio & Streamlit for quick demos & prototypes

05 /

5

Mastery & Portfolio

Build real things, ship publicly, and stay current in a fast-moving field

ongoing

+

🏗️

+

Capstone projects

Build real things — this is what actually gets you hired

required

Project ideas

RAG chatbot over your own document corpus

Fine-tuned domain model with LoRA (coding assistant)

Agentic app: autonomous research or code-review agent

Multimodal app: image + text pipeline

Open-source contribution to LangChain / PEFT / Axolotl

Public GitHub + technical blog post per project

📏

+

Evaluation & evals

How to measure if your AI system actually works in production

underrated

What to learn

LLM-as-judge evaluation pipelines

RAGAS for RAG system evaluation

Creating golden test sets for regression

Latency, cost, and quality trade-off analysis

A/B testing prompts in production

Model red-teaming & safety evaluation

📡

+

Stay current

The field moves weekly — staying current is itself a skill to build

ongoing

Resources

Follow: Andrej Karpathy, Sebastian Raschka, @huggingface

Papers: arXiv cs.LG, cs.CL — read abstracts daily

Courses: fast.ai, DeepLearning.AI, HF courses

Communities: r/MachineLearning, HF Discord

Newsletters: The Batch, TLDR AI, Import AI

Reproduce one new paper per month