Engineering of AI Systems · HIT

AI 320 · HIT · Advanced Course · Year 3 · 13 Weeks

Engineering of AI Systems (Hebrew title: הנדסת מערכות בינה מלאכותית)

DevOps, DataOps, MLOps, LLMOps, AgentOps: one running project, end to end

Prerequisites: Introduction to Machine Learning, Operating Systems, Software Engineering
Required for: AI and software-engineering specializations
Feeds into: the capstone project and industry internships
Format: 2 h lecture + 2 h practice, one integrative team project
Assessment: a running project presented three times, no written exams

HIT course catalogue package

Department submission documents in the HIT form, on the official letterhead (Word, downloadable):

Syllabus (English) · Syllabus (Hebrew) · Rationale · Catalogue summary

About this course

This is the advanced course on building and running AI systems in production, taken after a first machine-learning course. Modern AI systems fail far more often in operations than in modelling: a model that scores well offline still has to be packaged, served, version-controlled, fed with trustworthy data, monitored for drift, secured, costed, and governed once real users depend on it. The course teaches the engineering discipline that surrounds the model: the practices, tooling, and architectures that turn a prototype into a reliable, observable, continuously improving production service. The emphasis is on building and operating, not on watching.

No prior cloud or LLM experience is assumed. Students arrive with basic machine learning and some software engineering; the course builds the cloud foundations (compute, storage, networking, deployment models) in week 2 and the LLM and AI-API foundations (tokens, prompts, structured outputs, the token economy) in week 9, before operating on either.

Rationale. Every AI specialization (vision, language, agents) eventually meets the same operational stack. This course provides that shared base for deploying and running AI in the real world. It is project-based and designed for the way students will actually work, with an AI coding assistant at hand, while keeping the learning genuine through a Build, Operate, and Review model.

Course format

Each week has 4 contact hours: a 2-hour lecture (concepts and architecture) and a 2-hour practice session that carries its own material. In the practice session the instructor demonstrates the tooling live and teaches hands-on topics that belong at the keyboard rather than on slides (cloud consoles, Kafka, gateways, managed AI platforms, MCP). There are no separate weekly labs: each practice session closes with a project-integration brief, the increment every team adds to its end-to-end system that week. The thirteen weeks form six parts: Foundations and the Cloud (weeks 1 to 2), DevOps (3 to 4), DataOps (5 to 6), MLOps (7 to 8), LLMOps and AgentOps (9 to 12), and Security and Governance (13). The running project threads through all of it and is presented to the class three times, in the Student Project Presentations: a specification (week 5), an interim review (week 8), and a final production demo (week 13).

DevOps · The foundation: cloud, CI/CD, containers, orchestration, observability.
DataOps · Trustworthy data: lakes and medallion, pipelines, quality, versioning, streaming.
MLOps · The model lifecycle: tracking, registry and catalogue, serving, monitoring, drift.
LLMOps · Operating LLMs: AI APIs and the token economy, RAG, gateways, evaluation, guardrails.
AgentOps · Operating agents: tools and MCP, tracing, step-level evaluation, bounds, managed agents.

Expected outcomes

By the end of the course, students will be able to:

1. ☁️ Use cloud compute, storage, and networking, and choose a deployment model (IaaS to serverless).
2. 🏗️ Design and operate a CI/CD pipeline with tests, containers, and a versioned REST API.
3. 🧮 Build data lakes and pipelines on the medallion architecture, with versioning, quality gates, and streaming.
4. 🔄 Operate the ML lifecycle: track experiments, version models in a registry/catalogue, serve and roll out safely.
5. 📉 Monitor production models, detect data and concept drift, and trigger retraining.
6. 🪙 Call AI APIs well and reason about the token economy: cost, latency, caching, model tiers.
7. 🧠 Build and operate RAG services behind gateways, with eval suites, tracing, and guardrails.
8. 🤖 Build, trace, bound, and evaluate tool-using agents, self-built and managed, with MCP.
9. 🛡️ Secure the supply chain and apply the OWASP LLM Top 10 and privacy practice across the stack.
10. 🔗 Carry one running service from specification through to a governed production deployment.
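Outcome 6 asks students to reason about the token economy. As a rough illustration only, the sketch below estimates per-request cost for two model tiers. The tier names, the prices, the cache discount, and the helper `request_cost_usd` are all hypothetical placeholders invented for this example, not any provider's actual rates or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    """Per-million-token prices for one model tier (placeholder numbers)."""
    name: str
    usd_per_million_input: float
    usd_per_million_output: float


def request_cost_usd(tier: ModelTier, input_tokens: int, output_tokens: int,
                     cached_input_fraction: float = 0.0,
                     cache_discount: float = 0.5) -> float:
    """Estimate the cost of one request in USD.

    cached_input_fraction: share of input tokens served from a prompt cache.
    cache_discount: hypothetical price reduction applied to cached tokens.
    """
    if not 0.0 <= cached_input_fraction <= 1.0:
        raise ValueError("cached_input_fraction must be in [0, 1]")
    cached = input_tokens * cached_input_fraction
    fresh = input_tokens - cached
    # Cached tokens are billed at a reduced rate; fresh tokens at full price.
    billable_input = fresh + cached * (1.0 - cache_discount)
    input_cost = billable_input * tier.usd_per_million_input / 1e6
    output_cost = output_tokens * tier.usd_per_million_output / 1e6
    return input_cost + output_cost


# Hypothetical tiers; real prices vary by provider and change over time.
FLAGSHIP = ModelTier("flagship", 5.00, 15.00)
MINI = ModelTier("mini", 0.50, 1.50)

if __name__ == "__main__":
    for tier in (FLAGSHIP, MINI):
        per_request = request_cost_usd(tier, input_tokens=2_000, output_tokens=500)
        print(f"{tier.name}: ${per_request:.5f} per request, "
              f"${per_request * 100_000:,.2f} per 100k requests")
```

Multiplying the per-request figure by expected traffic is usually the quickest way to see whether a cheaper tier or a cache is worth the engineering effort.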

Prerequisites: review and self-check

The course assumes a prior machine-learning course and the background below. Each row links a short refresher and a set of self-check questions, so readiness can be confirmed before Week 1.

Subject | Background topics | Material
🔧 Programming & version control | Python, Git and pull requests, unit testing, code review, reading a stack trace | Review · Self-check
⚙️ Operating systems & networking | Processes, the Linux shell, filesystems, HTTP and REST, TCP/IP, an intuition for containers | Review · Self-check
🧮 Machine learning | Training and evaluation, train/test split, loss versus metric, overfitting, a trained model as an artifact | Review · Self-check

Weekly materials

Each week lists two sets of topics: what the lecture covers on the board, and what the practice session teaches at the keyboard. In the weeks marked 🎤, the practice slot is a Student Project Presentation: fully student-run, no instructor material.

Part I · Foundations & the Cloud

Week 1 · Production Engineering & the Ops Landscape
Lecture: Why systems fail in operations; SLOs and error budgets; the SRE mindset; the five layers; the course use cases (IoT, chatbot, document processing).
Practice: Git workflow and repo hygiene; containerise a first service; form project teams.
Tools: Git · GitHub · Docker
Materials: Lesson plan · Practice

Week 2 · Cloud Computing Fundamentals
Lecture: Compute, storage, and networking primitives; IaaS/PaaS/SaaS and serverless; regions and availability zones; shared responsibility; the cost model.
Practice: Budget alerts first; launch a VM and the project bucket; the same app in three deployment models; price your use case.
Tools: cloud free tier · VM · object storage (S3) · serverless · cost calculator
Materials: Lesson plan · Practice

Part II · DevOps

Week 3 · CI/CD, Testing & REST Services
Lecture: CI/CD pipelines; the testing pyramid; trunk-based development; infrastructure as code; REST API design and versioning.
Practice: The project's FastAPI skeleton with tests; a GitHub Actions pipeline (lint, test, build, push); required checks gate the merge.
Tools: FastAPI · pytest · GitHub Actions · container registry · Terraform (intro)
Materials: Lesson plan · Practice

Week 4 · Orchestration, Deployment Patterns & Observability
Lecture: Kubernetes primitives; blue-green and canary rollouts; GitOps; logs, metrics, traces; RED and tail latency.
Practice: Deploy the project to a local cluster; self-healing and scaling; a canary rollout; the project RED dashboard under load.
Tools: Kubernetes (kind) · kubectl · Prometheus · Grafana · Argo CD (demo)
Materials: Lesson plan · Practice

Part III · DataOps

Week 5 · Data Lakes, Pipelines & Versioning
Lecture: Data lake, lakehouse, and the medallion architecture (bronze, silver, gold); DAG orchestration, idempotency, backfills; data versioning and lineage.
Practice: 🎤 Student Project Presentation 1 · Specification
Tools: Airflow / Dagster · DVC · Parquet / Delta · the object-storage lake
Materials: Lesson plan · Presentation brief

Week 6 · Data Quality, Contracts, Streaming & Feature Stores
Lecture: Validation and quality SLAs; data contracts; Kafka and streaming versus batch; feature stores and train/serve skew.
Practice: Profile and gate the project's real data; write and enforce its data contract; stand up Kafka and land a stream in the lake.
Tools: Great Expectations · Kafka · Feast (concept)
Materials: Lesson plan · Practice

Part IV · MLOps

Week 7 · Experiment Tracking, Model Registry & Serving
Lecture: Experiment tracking and reproducibility; the model registry as catalogue; model cards; serving patterns and safe rollout (shadow, canary, A/B).
Practice: Instrument the project's training with MLflow; register, promote, roll back; serve behind the project API; canary model v2 next to v1.
Tools: MLflow (tracking + registry) · BentoML / FastAPI · model cards
Materials: Lesson plan · Practice

Week 8 · Monitoring, Model Drift & Governance
Lecture: Data drift versus concept drift; detectors and proxy monitoring; retraining triggers; audit trails and model governance.
Practice: 🎤 Student Project Presentation 2 · Interim
Tools: Evidently · PSI / KS tests · registry stage gates
Materials: Lesson plan · Presentation brief

Part V · LLMOps & AgentOps

Week 9 · LLM Foundations: AI APIs, Tokens & the Token Economy
Lecture: LLMs from an engineering standpoint; tokens and context windows; prompts and structured outputs; pricing, cost levers, rate limits; managed AI platforms (Bedrock-class).
Practice: First API calls; count tokens, measure latency and cost; structured extraction from invoices; flagship versus mini tiers; a managed-platform console tour.
Tools: OpenAI & Anthropic SDKs · tokenizer · JSON Schema · Bedrock console
Materials: Lesson plan · Practice

Week 10 · RAG & Serving LLMs: Vector Databases & Gateways
Lecture: Embeddings, vector databases, chunking, grounded prompts; prompts as versioned code; hosted versus self-hosted serving (vLLM); the gateway pattern.
Practice: Build the project's RAG core on its own corpus; tune chunking; LiteLLM gateway with fallback and budgets; prompt version swaps.
Tools: FAISS / Qdrant · embedding models · LiteLLM · vLLM (discussed)
Materials: Lesson plan · Practice

Week 11 · LLM Evaluation, Guardrails & Observability
Lecture: Eval sets as regression suites; faithfulness and relevance; LLM-as-judge biases and calibration; LLM tracing (Langfuse); guardrails and prompt injection.
Practice: Build the project's eval harness and wire it into CI; trace every call through the gateway; attack with injection, add the guardrail, measure the cache.
Tools: Ragas · Langfuse · NeMo Guardrails · response cache
Materials: Lesson plan · Practice

Week 12 · Agents & AgentOps: Tools, MCP & Managed Agents
Lecture: The agent loop and function calling; memory and termination; MCP as the tool standard; step-level tracing and evaluation; bounds; managed agent services.
Practice: Build a two-tool agent and trace it; localise a failure in the spans; add caps, budgets, and an error classifier; wrap a project tool as an MCP server.
Tools: function calling · MCP SDK · step tracing · managed-agent consoles
Materials: Lesson plan · Practice

Part VI · Security & Governance

Week 13 · Security, Governance & Synthesis
Lecture: Supply-chain security and SBOMs; the OWASP Top 10 for LLM applications; privacy and audit trails; synthesis of the five layers.
Practice: 🎤 Student Project Presentation 3 · Final (with oral defense)
Tools: secrets manager · pip-audit / SBOM · OWASP LLM checklist
Materials: Lesson plan · Presentation brief
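
Week 8 lists PSI among its drift tools. A minimal sketch of the Population Stability Index follows, assuming equal-width bins taken from the reference sample's range and a small epsilon so that empty bins do not produce a division by zero. The function name `psi`, the bin count, and the sample data are illustrative choices rather than course material, and the 0.1 and 0.25 cut-offs often quoted for PSI are rules of thumb, not guarantees.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference and a live sample.

    Bin edges are equal-width over the reference sample's range; live
    values outside that range are clipped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has zero range")
    width = (hi - lo) / bins

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clip out-of-range values
            counts[idx] += 1
        total = len(values)
        # Floor each proportion at eps so the logarithm stays defined.
        return [max(c / total, eps) for c in counts]

    p = proportions(expected)
    q = proportions(actual)
    # PSI = sum over bins of (q - p) * ln(q / p)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    reference = [i / 100 for i in range(100)]        # training-time feature
    shifted = [0.5 + i / 200 for i in range(100)]    # live feature, drifted
    print("no drift:", psi(reference, reference))
    print("drifted: ", psi(reference, shifted))
```

In a monitoring job, a value above the chosen threshold would typically raise an alert or open a retraining ticket rather than retrain automatically.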

AI usage

Using an AI assistant is highly encouraged in this course; it reflects how production engineering is really done. Two conditions keep the learning genuine: students keep full ownership of, and responsibility for, everything they submit, and must be able to explain and defend any part of it. The Operate, Review, and oral-defense steps verify understanding rather than authorship; where an assistant was used, it should be disclosed.

Every weekly project increment follows a three-part model:

Part A · Build (AI assistant welcome)

Stand up the week's increment (a pipeline stage, a service, a monitor, or an eval); an AI assistant may be used freely.

Part B · Operate & observe (under load)

Deploy it, instrument it, then break it on purpose. Predict what the telemetry will show, run the experiment, and compare.

Part C · Review & defend (in plain language)

Explain the design and its trade-offs, where it would fail, and what you would change; be ready to defend any line at the presentations.

Assessment and grading

Grading is project-based, with weight on the parts an AI assistant cannot do for the student: operating a system under load, interpreting telemetry, and defending design decisions. The running project is the single deliverable, built up in weekly increments and graded at the three Student Project Presentations. There are no written exams and no separate lab submissions.

Component | What it covers | Weight
Project · Specification | Student Project Presentation 1: problem, SLOs, architecture, data and DevOps plan (week 5). | 20%
Project · Interim | Student Project Presentation 2: working pipeline, registry, deployment, live CI/CD (week 8). | 30%
Project · Final | Student Project Presentation 3: end-to-end production demo with a short oral defense (week 13). | 50%

The running project

100% of the grade · teams of three or four · presented three times

Teams carry a single AI-enabled service through the entire operational stack across the semester, integrating most of the covered material into one end-to-end system: cloud footprint, CI/CD, medallion data lake, model registry and serving, gateway-fronted LLM feature, evals and guardrails, an agentic capability, and the audit trail through all of it. Each week's practice session ends with a project-integration brief, the increment due before the next week; the three Student Project Presentations are the graded checkpoints. Teams choose one of the course use cases: IoT telemetry (sensor streams, anomaly alerts, predictive maintenance), a document-QA chatbot (RAG over a real corpus), or document processing (structured extraction at scale). Alternatively, they may propose another domain in the same spirit for approval. Whatever the domain, teams must demonstrate operational maturity, not a working model alone. Each presentation is 12 to 15 minutes plus questions, with a short written report and a tagged release of the repository.

P1 · Student Project Presentation 1 · Specification

20% of the grade · week 5 · spec & design

P2 · Student Project Presentation 2 · Interim

30% of the grade · week 8 · working pipeline & first deployment

P3 · Student Project Presentation 3 · Final

50% of the grade with a short oral defense · week 13 · production demo

Example project ideas

Each idea below exercises every layer of the stack; teams may take one as-is or propose a variant of comparable scope. The three primary use cases (IoT, chatbot, document processing) are the safest choices; the rest are approved variants.

📡 Predictive-maintenance IoT monitor. Stream simulated machine-sensor telemetry through Kafka into a medallion lake; serve an anomaly model behind a REST API; an agent triages alerts into ranked incidents with runbooks.
💬 Support-docs Q&A chatbot. RAG over a real product or university documentation corpus; grounded answers with citations; eval set for faithfulness; a guardrail against injection; an agent that escalates unanswerable questions.
🧾 Invoice / form processor. Structured extraction from PDFs and images at scale; a confidence threshold that routes low-confidence documents to a human; drift monitoring on field-extraction accuracy.
🍷 Menu / receipt nutrition estimator. Multimodal extraction from photos; a classical model plus an LLM fallback; a feature pipeline for per-item history; cost-tiered routing under a token budget.
🔌 Smart-home energy advisor. IoT power readings streamed and aggregated to gold features; a forecasting model served online; an agent that proposes (never executes) cost-saving actions with human-in-the-loop approval.
📝 Code-review assistant. RAG over a repository plus an agent with read-only tools; step-level tracing; an eval set of known issues; strict least-privilege and prompt-injection defenses on untrusted diffs.
🎓 Research-paper summariser. Ingest a stream of new arXiv abstracts; embed and index; a daily digest agent; evaluation against human-written summaries; full cost and latency reporting.
📊 Churn-prediction service. A tabular pipeline on the lakehouse; tracked and registered models with canary rollout; drift detection on customer features; an LLM that drafts retention-email explanations from the prediction.
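
The invoice / form processor idea mentions a confidence threshold that routes low-confidence documents to a human. One possible shape for that routing step is sketched below, assuming the extractor reports a score in [0, 1] for each field. The `Extraction` type, the `route` function, the field names, and the 0.85 threshold are hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Extraction:
    """One extracted document with per-field confidence scores in [0, 1]."""
    document_id: str
    fields: Dict[str, str]
    field_confidence: Dict[str, float]


def route(extraction: Extraction, threshold: float = 0.85) -> str:
    """Return "auto" if every field clears the threshold, else "human_review".

    The weakest field decides: one doubtful total or date is enough to
    send the whole document to a person.
    """
    if not extraction.field_confidence:
        return "human_review"  # no scores at all: do not trust the result
    weakest = min(extraction.field_confidence.values())
    return "auto" if weakest >= threshold else "human_review"


if __name__ == "__main__":
    clean = Extraction(
        "inv-001",
        {"total": "120.00", "date": "2025-01-15"},
        {"total": 0.97, "date": 0.93},
    )
    blurry = Extraction(
        "inv-002",
        {"total": "12O.00", "date": "2025-01-15"},
        {"total": 0.41, "date": 0.95},
    )
    for doc in (clean, blurry):
        print(doc.document_id, "->", route(doc))
```

Tracking the share of documents sent to human review over time also gives a cheap proxy signal for drift in extraction quality.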

References

A curated reading list spanning the five layers. Individual lesson plans link the chapters and resources for that week; the full list is on the references page.

Foundations
Site Reliability Engineering · Beyer, Jones, Petoff & Murphy (Google, O'Reilly 2016)

The canonical text on running production systems: SLOs, error budgets, toil, on-call, and incident response.

Data
Designing Data-Intensive Applications · Martin Kleppmann (O'Reilly 2017)

The foundation for DataOps: storage, replication, batch and stream processing, and reliability of data systems.

ML / LLM
Designing Machine Learning Systems and AI Engineering · Chip Huyen (O'Reilly 2022 and 2025)

End-to-end MLOps and LLMOps: data, training, deployment, monitoring, RAG, evaluation, and inference at scale.

Tools and resources

The toolchain mirrors the weekly schedule; everything runs on laptops and cloud free tiers, no GPU required.