Large Language Models and Agentic AI: Graduate Course Syllabus

Course Description

This course is an advanced, research-oriented treatment of large language models and the agentic systems built on top of them. Building directly on a prior graduate course in deep learning, it moves quickly from a one-week consolidation of Transformer architecture and text generation into the core of the modern LLM stack: pretraining and scaling laws, reasoning models and test-time compute, inference optimization, prompting and hybrid architectures, fine-tuning and alignment, retrieval-augmented generation, tool-using and multi-agent systems, multimodal and conversational interfaces, rigorous evaluation, and safety and security. The centerpiece of the course is a semester-long research project: each team formulates an original research question, designs and runs experiments, and reports results in three milestone presentations (proposal, interim, final) and a documented GitHub repository. Each student leaves the course with a demonstrable, novel, technically deep, research-oriented project added to their portfolio.

Prerequisites

Students are assumed to have completed an advanced graduate course in deep learning. In particular, the following are treated as known and are not retaught: backpropagation and optimization (SGD, Adam, learning-rate schedules), regularization, CNNs and RNNs, embeddings, sequence-to-sequence models, and basic attention; fluency in Python and PyTorch; and graduate-level probability and linear algebra. Week 1 provides a fast, LLM-focused consolidation of the Transformer and decoding, not an introduction to deep learning. Students who lack these prerequisites should first work through Part I of the course text (Chapters 0 to 5) and Appendix A independently.

Learning Outcomes

Course Format

The course consists of lectures and student presentations. Ten sessions are lectures on the week's topic, with the listed chapters to be read before class. The remaining three sessions (Weeks 5, 8, and 13) are dedicated entirely to student presentations of the research projects: proposal, interim, and final. In all three, teams present and receive in-class feedback from the instructor and peers.

Research Project

The project is the core deliverable of the course and is explicitly research-oriented: the goal is a novel, defensible empirical or methodological contribution. Students work in teams of two. Each team is required to: (i) formulate a novel research problem connected to the course material, positioned against related work and not already answered in the literature; (ii) source suitable datasets or generate synthetic data where no adequate dataset exists, with the data collection or generation methodology documented and justified; and (iii) design and run controlled experiments that answer the problem with evidence. Significant novelty is a hard requirement and is assessed at the proposal stage; projects re-implementing an existing system or reproducing a published result without a new question will not be approved.

Grading

Component	Weight	Due
Proposal presentation	10%	Week 5
Interim presentation	20%	Week 8
Final presentation	20%	Week 13
Project repository: code, text, and documentation (GitHub)	50%	One week after Week 13
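
As an illustration of how the weights above combine, the following is a minimal sketch of the final-grade arithmetic. It assumes each component is scored on a 0 to 100 scale, which the syllabus does not specify; the function name, dictionary keys, and example scores are hypothetical.

```python
# Weights taken from the grading table (percent of the final grade).
WEIGHTS = {
    "proposal_presentation": 10,
    "interim_presentation": 20,
    "final_presentation": 20,
    "project_repository": 50,
}


def final_grade(scores):
    """Return the weighted average of the component scores.

    Assumption: every score is on a 0-100 scale. The syllabus gives
    only the weights, not the scoring scale.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the four graded components")
    total_weight = sum(WEIGHTS.values())  # 100 for the table above
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


# Hypothetical scores, for illustration only.
example_scores = {
    "proposal_presentation": 80,
    "interim_presentation": 90,
    "final_presentation": 85,
    "project_repository": 88,
}
print(final_grade(example_scores))
```

Because the repository carries half of the total weight, a change of one point in the repository score moves the final grade as much as a five-point change in the proposal score.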

Weekly Schedule

Chapter numbers refer to the course text [1]. Weeks 5, 8, and 13 are student presentation weeks.

Week	Topic and readings
1	From Deep Learning to LLMs. Fast consolidation for students with a deep-learning background: attention as the central primitive, the Transformer architecture in detail (pre-norm, positional encodings, KV caching), and decoding strategies for text generation (sampling, beam search, nucleus sampling, speculative decoding). Course overview and project kickoff. Readings [1]: Ch. 2, Sequence Models & the Attention Mechanism; Ch. 3, The Transformer Architecture; Ch. 4, Decoding Strategies & Text Generation
2	Pretraining, Scaling Laws & the Modern LLM Landscape. Pretraining objectives, data curation and contamination, compute-optimal scaling, and the contemporary model landscape: architecture variants, mixture-of-experts, long-context internals, and open-weight versus frontier models. Readings [1]: Ch. 6, Pretraining, Scaling Laws & Data Curation; Ch. 7, Modern LLM Landscape & Model Internals
3	Reasoning, Test-Time Compute & Efficient Inference. Reasoning models and chain-of-thought training, test-time compute scaling, verification and search; inference optimization (quantization, batching, attention kernels, serving systems); a survey of interpretability and mechanistic analysis as a research toolkit. Readings [1]: Ch. 8, Reasoning Models & Test-Time Compute; Ch. 9, Inference Optimization & Efficient Serving; Ch. 10, Interpretability & Mechanistic Understanding
4	Working with LLMs: APIs, Prompting & Hybrid Architectures. LLM APIs and structured outputs, prompt engineering as a disciplined methodology (few-shot, chain-of-thought, self-consistency, prompt optimization), and decision frameworks for hybrid ML+LLM system design. Proposal clinic: experimental-design checklist for the project proposals. Readings [1]: Ch. 11, Working with LLM APIs; Ch. 12, Prompt Engineering & Advanced Techniques; Ch. 13, Hybrid ML+LLM Architectures & Decision Frameworks
5	Student Presentations I: Project Proposals. Each team presents its research question, related work, method, and experimental design, and receives in-class feedback from the instructor and peers.
6	Training & Adaptation: Fine-Tuning, PEFT & Alignment. Synthetic data generation, supervised fine-tuning, parameter-efficient methods (LoRA and variants), distillation and model merging, and alignment via RLHF, DPO, and preference tuning. Readings [1]: Ch. 15, Synthetic Data Generation & LLM Simulation; Ch. 16, Fine-Tuning Fundamentals; Ch. 17, Parameter-Efficient Fine-Tuning, Distillation & Model Merging; Ch. 18, Alignment: RLHF, DPO & Preference Tuning
7	Retrieval-Augmented Generation & Information Extraction. Embeddings, vector databases and semantic search; the RAG pipeline and its failure modes; structured information extraction and NER with LLMs; advanced RAG: query rewriting, reranking, agentic and graph-based retrieval. Readings [1]: Ch. 31, Embeddings, Vector Databases & Semantic Search; Ch. 32, Retrieval-Augmented Generation (RAG); Ch. 34, Structured Information Extraction & NER; Ch. 35, Advanced RAG
8	Student Presentations II: Interim Progress. Each team presents first experimental results, diagnosis of what is and is not working, deviations from the proposal, and the plan for the final stretch, and receives in-class feedback from the instructor and peers.
9	Agentic AI: Tool Use, Protocols & Multi-Agent Systems. Agent foundations (planning, memory, reflection), tool use and function calling, agent protocols (MCP and beyond), multi-agent architectures and coordination, and specialized agents for coding and research. Readings [1]: Ch. 26, AI Agent Foundations; Ch. 27, Tool Use, Function Calling & Protocols; Ch. 28, Multi-Agent Systems; Ch. 29, Specialized Agents
10	Multimodal & Conversational Systems. Vision-language and omni models, document understanding and OCR, architectures for end-to-end conversational AI systems, and voice and realtime multimodal assistants. Readings [1]: Ch. 21, Document Understanding and OCR; Ch. 22, Vision-Language and Omni Models; Ch. 37, Building Conversational AI Systems; Ch. 40, Voice and Realtime Multimodal Assistants
11	Evaluation: Benchmarks, LLM-as-Judge & Observability. Evaluation foundations and quality metrics; specialized evaluation for RAG, agents, multimodal, and long-context systems; LLM-as-judge protocols and their biases; online evaluation and production monitoring. Directly applicable to the final project experiments. Readings [1]: Ch. 42, LLM Evaluation & Quality Metrics; Ch. 43, Specialized Evaluation: RAG, Agents, Multimodal, Long-Context; Ch. 44, Online Evaluation, Observability, and Production Monitoring; Ch. 46, LLM-as-Judge & Automated Evaluation
12	Safety, Security & Research Frontiers. Adversarial security and red teaming, prompt injection, guardrails and runtime safety, agent safety, bias and hallucinations; a closing survey of research frontiers: frontier architectures, theory and cognition, and open questions. Readings [1]: Ch. 47, Adversarial Security and Red Teaming; Ch. 48, Guardrails and Runtime Safety; Ch. 49, Agent Safety & Security; Ch. 52, Bias, Fairness & Hallucinations; Ch. 75, Frontier Architectures & Scaling; Ch. 76, Frontier Theory & Cognition; Ch. 77, AGI Trajectories & Open Questions
13	Student Presentations III: Final Project Presentations. Conference-style final talks with in-class feedback: contribution, method, experiments, results, and limitations. Project repositories (code, text, and documentation) due one week later.

Policies

Use of AI tools

This is a course about LLMs; using LLMs and coding agents in your project work is encouraged and is itself a skill the course develops. Two rules apply. First, significant novelty: whatever tools are used, the submitted work must constitute a significant novel contribution by the team, in the problem formulation, the experimental design, and the findings; work whose substance could be produced by a single prompt to an off-the-shelf model does not meet the bar. Second, accountability: you are fully responsible for the correctness of every claim, number, and citation you submit, regardless of which tool produced it. Hallucinated references or unverified AI-generated results are treated as academic integrity violations.

Collaboration and integrity

Discussion across teams is encouraged; code, experiments, and writing must be the team's own. All experimental results reported in milestones and the repository documentation must be backed by runnable artifacts in the team's repository.
