Large Language Models and Agentic AI

Level: Graduate  ·  Duration: 13 weeks, one 3-hour session per week  ·  Credits: 3

Instructor: Dr. Alexander (Sasha) Apartsin  ·  Course text: llmbook.apartsin.com (open access) · ebook at Amazon

Course Description

This course is an advanced, research-oriented treatment of large language models and the agentic systems built on top of them. Building directly on a prior graduate course in deep learning, it moves quickly from a one-week consolidation of Transformer architecture and text generation into the core of the modern LLM stack: pretraining and scaling laws, reasoning models and test-time compute, inference optimization, prompting and hybrid architectures, fine-tuning and alignment, retrieval-augmented generation, tool-using and multi-agent systems, multimodal and conversational interfaces, rigorous evaluation, and safety and security. The centerpiece of the course is a semester-long research project: each team formulates an original research question, designs and runs experiments, and reports results in three milestone presentations (proposal, interim, final) and a documented GitHub repository. Each student leaves the course with a novel, technically deep research project for their portfolio.

Prerequisites

Students are assumed to have completed an advanced graduate course in deep learning. In particular, the following topics are treated as known and are not retaught: backpropagation and optimization (SGD, Adam, learning-rate schedules), regularization, CNNs and RNNs, embeddings, sequence-to-sequence models, and basic attention. Students are also expected to be fluent in Python and PyTorch and comfortable with graduate-level probability and linear algebra. Week 1 provides a fast, LLM-focused consolidation of the Transformer and decoding, not an introduction to deep learning. Students missing these prerequisites should first work through Part I of the course text (Chapters 0 to 5) and Appendix A independently.

Learning Outcomes

On completing the course, students will be able to:

Course Format

The course consists of lectures and student presentations. Ten sessions are lectures on the week's topic, with the listed chapters to be read before class. The remaining three sessions (Weeks 5, 8, and 13) are dedicated entirely to student presentations of the research projects: proposal, interim, and final. In all three, teams present and receive in-class feedback from the instructor and peers.

Research Project

The project is the core deliverable of the course and is explicitly research-oriented: the goal is a novel, defensible empirical or methodological contribution. Students work in teams of two. Each team is required to: (i) formulate a novel research problem connected to the course material, positioned against related work and not already answered in the literature; (ii) source suitable datasets or generate synthetic data where no adequate dataset exists, with the data collection or generation methodology documented and justified; and (iii) design and run controlled experiments that answer the problem with evidence. Significant novelty is a hard requirement and is assessed at the proposal stage; projects re-implementing an existing system or reproducing a published result without a new question will not be approved.

Milestones, all presented in class: the proposal (Week 5), the interim progress report (Week 8), and the final presentation (Week 13).

Grading

Component  ·  Weight  ·  Due
Proposal presentation  ·  10%  ·  Week 5
Interim presentation  ·  20%  ·  Week 8
Final presentation  ·  20%  ·  Week 13
Project repository: code, text, and documentation (GitHub)  ·  50%  ·  One week after Week 13

Weekly Schedule

Chapter numbers refer to the course text; each entry links to the corresponding chapter. Presentation weeks are highlighted.

Week  ·  Topic and readings
1 From Deep Learning to LLMs
Fast consolidation for students with a deep-learning background: attention as the central primitive, the Transformer architecture in detail (pre-norm, positional encodings, KV caching), and decoding strategies for text generation (sampling, beam search, nucleus sampling, speculative decoding). Course overview and project kickoff.
2 Pretraining, Scaling Laws & the Modern LLM Landscape
Pretraining objectives, data curation and contamination, compute-optimal scaling, and the contemporary model landscape: architecture variants, mixture-of-experts, long-context internals, and open-weight versus frontier models.
3 Reasoning, Test-Time Compute & Efficient Inference
Reasoning models and chain-of-thought training, test-time compute scaling, verification and search; inference optimization (quantization, batching, attention kernels, serving systems); a survey of interpretability and mechanistic analysis as a research toolkit.
4 Working with LLMs: APIs, Prompting & Hybrid Architectures
LLM APIs and structured outputs, prompt engineering as a disciplined methodology (few-shot, chain-of-thought, self-consistency, prompt optimization), and decision frameworks for hybrid ML+LLM system design. Proposal clinic: experimental-design checklist for the project proposals.
5 Student Presentations I: Project Proposals
Each team presents its research question, related work, method, and experimental design, and receives in-class feedback from the instructor and peers.
6 Training & Adaptation: Fine-Tuning, PEFT & Alignment
Synthetic data generation, supervised fine-tuning, parameter-efficient methods (LoRA and variants), distillation and model merging, and alignment via RLHF, DPO, and preference tuning.
7 Retrieval-Augmented Generation & Information Extraction
Embeddings, vector databases and semantic search; the RAG pipeline and its failure modes; structured information extraction and NER with LLMs; advanced RAG: query rewriting, reranking, agentic and graph-based retrieval.
8 Student Presentations II: Interim Progress
Each team presents first experimental results, diagnosis of what is and is not working, deviations from the proposal, and the plan for the final stretch, and receives in-class feedback from the instructor and peers.
9 Agentic AI: Tool Use, Protocols & Multi-Agent Systems
Agent foundations (planning, memory, reflection), tool use and function calling, agent protocols (MCP and beyond), multi-agent architectures and coordination, and specialized agents for coding and research.
10 Multimodal & Conversational Systems
Vision-language and omni models, document understanding and OCR, architectures for end-to-end conversational AI systems, and voice and realtime multimodal assistants.
11 Evaluation: Benchmarks, LLM-as-Judge & Observability
Evaluation foundations and quality metrics; specialized evaluation for RAG, agents, multimodal, and long-context systems; LLM-as-judge protocols and their biases; online evaluation and production monitoring. Directly applicable to the final project experiments.
12 Safety, Security & Research Frontiers
Adversarial security and red teaming, prompt injection, guardrails and runtime safety, agent safety, bias and hallucinations; a closing survey of research frontiers: frontier architectures, theory and cognition, and open questions.
13 Student Presentations III: Final Project Presentations
Conference-style final talks with in-class feedback: contribution, method, experiments, results, and limitations. Project repositories (code, text, and documentation) due one week later.

Policies

Use of AI tools

This is a course about LLMs; using LLMs and coding agents in your project work is encouraged and is itself a skill the course develops. Two rules apply. First, significant novelty: whatever tools are used, the submitted work must constitute a significant novel contribution by the team, in the problem formulation, the experimental design, and the findings; work whose substance could be produced by a single prompt to an off-the-shelf model does not meet the bar. Second, accountability: you are fully responsible for the correctness of every claim, number, and citation you submit, regardless of which tool produced it. Hallucinated references or unverified AI-generated results are treated as academic integrity violations.

Collaboration and integrity

Discussion across teams is encouraged; code, experiments, and writing must be the team's own. All experimental results reported in milestones and the repository documentation must be backed by runnable artifacts in the team's repository.


References

[1] A. Apartsin and Y. Aperstein, Building Language AI: From Tokens to Agents, 16th ed., 2026. Open access online at https://llmbook.apartsin.com; also available as an ebook at Amazon: https://www.amazon.com/dp/B0H1MQH23D.

Supplementary books

[2] S. Raschka, Build a Large Language Model (From Scratch). Manning Publications, 2024. ISBN 978-1633437166. Available at Amazon.

[3] J. Alammar and M. Grootendorst, Hands-On Large Language Models: Language Understanding and Generation. O'Reilly Media, 2024. ISBN 978-1098150969. Available at Amazon.

[4] C. Huyen, AI Engineering: Building Applications with Foundation Models. O'Reilly Media, 2025. ISBN 978-1098166304. Available at Amazon.