Build production AI systems you can actually put on your resume.
Each project includes an architecture walkthrough, stack choices, implementation guidance, deployment notes, an evaluation rubric, and ready-to-use resume bullet points.
Every project includes:
Architecture overview and design rationale
Recommended tech stack with alternatives
Step-by-step implementation guidance
Evaluation rubric and quality criteria
Deployment guide (Vercel, Railway, Fly.io)
3 ready-to-use resume bullet points
Free projects
Open access
Intermediate
10h
Production RAG Chatbot
Build an end-to-end RAG chatbot that ingests documents, chunks and embeds them, retrieves relevant context with hybrid search, reranks results, and returns grounded, cited answers.
Skills
RAG, Vector Search, Embeddings, Python
Stack
Python, FastAPI, LangChain or custom, OpenAI
Built a production RAG chatbot with hybrid retrieval (BM25 + dense), reranking, and citation grounding, reducing hallucination rate by ~40%
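The project description mentions hybrid retrieval (BM25 + dense) but does not say how the two result lists are combined. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than the project's prescribed method. The function name, the document IDs, and the `k=60` default are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a keyword (BM25) retriever and a dense retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that rank well in both lists rise to the top, and the fused list is what a reranker would then reorder.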
Design and build a tool-calling AI agent that handles customer support tasks: looking up orders, checking policies, escalating to humans, and logging outcomes — with guardrails and audit trails.
Skills
Agents, Tool Calling, Guardrails, Memory
Stack
Python, OpenAI function calling, FastAPI, Redis (session memory)
Built a production agentic customer support system with 12 tools, structured outputs, and automatic escalation to human agents
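As a rough illustration of the guardrails and audit trail described above, the sketch below routes model-requested tool calls through an allowlist and records every call. It makes no LLM call. The tool names, the `ORDERS` data, and the `dispatch` helper are hypothetical stand-ins, not the project's actual code.

```python
import json
from datetime import datetime, timezone

# Hypothetical in-memory data standing in for an order database.
ORDERS = {"A100": {"status": "shipped"}}


def lookup_order(order_id):
    return ORDERS.get(order_id, {"status": "not_found"})


def escalate_to_human(reason):
    return {"escalated": True, "reason": reason}


# Allowlist: the model may only invoke tools registered here.
TOOLS = {"lookup_order": lookup_order, "escalate_to_human": escalate_to_human}
AUDIT_LOG = []


def dispatch(tool_name, arguments_json):
    """Run one model-requested tool call with a guardrail and an audit record."""
    try:
        if tool_name not in TOOLS:
            raise ValueError(f"unknown tool: {tool_name}")
        args = json.loads(arguments_json)  # malformed JSON raises ValueError
        result = TOOLS[tool_name](**args)  # wrong arguments raise TypeError
    except (ValueError, TypeError) as exc:
        # Guardrail: a rejected call escalates instead of failing silently.
        result = escalate_to_human(f"tool call rejected: {exc}")
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "tool": tool_name,
        "arguments": arguments_json,
        "result": result,
    })
    return result


print(dispatch("lookup_order", '{"order_id": "A100"}'))
print(dispatch("delete_account", "{}"))  # not on the allowlist, so it escalates
```

Keeping the allowlist and the log in one dispatcher means every tool call, accepted or rejected, leaves an audit entry.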
Build a multi-document Q&A platform where users can upload PDFs, ask questions, and get answers grounded in specific document sections — with permission-aware retrieval for multi-tenant scenarios.
Skills
RAG, Multi-tenant, Permissions, PDF
Stack
Next.js, Python, LangChain, Pinecone namespaces
Built multi-tenant document Q&A platform with permission-aware retrieval using Pinecone namespaces and row-level auth
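Permission-aware retrieval comes down to filtering before ranking, so text a user may not see never reaches the prompt. The sketch below shows the idea with an in-memory list instead of Pinecone. The `Chunk` fields, role names, and scores are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    tenant_id: str
    allowed_roles: set = field(default_factory=set)
    score: float = 0.0  # similarity score from a hypothetical vector search


def permitted(chunk, tenant_id, roles):
    """Visible only inside its own tenant and to at least one matching role."""
    return chunk.tenant_id == tenant_id and bool(chunk.allowed_roles & roles)


def retrieve(chunks, tenant_id, roles, top_k=3):
    # Filter first, then rank, so unauthorized text is never a candidate.
    visible = [c for c in chunks if permitted(c, tenant_id, roles)]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:top_k]


chunks = [
    Chunk("Q3 revenue summary", "acme", {"finance"}, 0.91),
    Chunk("Onboarding guide", "acme", {"finance", "support"}, 0.80),
    Chunk("Competitor contract", "globex", {"support"}, 0.95),
]

hits = retrieve(chunks, tenant_id="acme", roles={"support"})
print([c.text for c in hits])  # ['Onboarding guide']
```

With a hosted vector store, the tenant check would typically move into the query itself (for example, one namespace per tenant), leaving only the role check to metadata filtering.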
Build a pipeline that generates synthetic question-answer pairs from source documents using LLMs, validates quality with LLM-as-judge, and produces ready-to-use evaluation datasets for RAG systems.
Skills
Evaluation, RAG, Synthetic Data, LLM-as-judge
Stack
Python, OpenAI, RAGAS, Pandas
Built synthetic eval dataset pipeline generating 500+ QA pairs from internal docs, filtered by LLM judge quality threshold
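The quality-threshold step can be sketched as a filter that keeps only pairs a judge scores highly enough. In the sketch below, `stub_judge` is a deliberately simple stand-in for an LLM call, and the `0.7` threshold and the field names are assumptions, not values from the project.

```python
def filter_by_judge(pairs, judge, threshold=0.7):
    """Keep QA pairs whose judge score meets the threshold, recording the score."""
    kept = []
    for pair in pairs:
        score = judge(pair["question"], pair["answer"], pair["source"])
        if score >= threshold:
            kept.append({**pair, "judge_score": score})
    return kept


def stub_judge(question, answer, source):
    """Stand-in for an LLM judge: 1.0 if the answer text appears in the source."""
    return 1.0 if answer.lower() in source.lower() else 0.0


source_text = "Refunds are issued within 14 days of purchase."
candidates = [
    {"question": "How long is the refund window?", "answer": "14 days", "source": source_text},
    {"question": "How long is the refund window?", "answer": "30 days", "source": source_text},
]

dataset = filter_by_judge(candidates, stub_judge)
print(len(dataset))  # 1
```

Passing the judge in as a function keeps the filter testable offline; a real judge would return a graded score from a rubric prompt rather than a substring check.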
Build a tracing and observability dashboard for LLM applications that captures token usage, latency distributions, tool-call outcomes, evaluation scores, and cost-per-request trends over time.
Skills
LLMOps, Observability, Tracing, Cost Monitoring
Stack
Python, OpenTelemetry, ClickHouse or TimescaleDB, Grafana or custom Next.js
Built LLM observability platform tracking token cost, latency p95, and eval scores across 50k+ daily LLM calls
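Two of the metrics named above, latency p95 and cost per request, are simple to compute once traces are collected. The sketch below uses the nearest-rank percentile definition, which is one of several in common use, and per-1k-token prices that are placeholders rather than any provider's real rates.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of
    the data at or below it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def request_cost(prompt_tokens, completion_tokens, price_in_per_1k, price_out_per_1k):
    """Cost of one request given per-1k-token prices (hypothetical values)."""
    return (prompt_tokens / 1000) * price_in_per_1k + (completion_tokens / 1000) * price_out_per_1k


latencies_ms = list(range(1, 101))  # synthetic latencies: 1 ms through 100 ms
print(percentile(latencies_ms, 95))  # 95

# Placeholder prices: 0.5 per 1k prompt tokens, 1.5 per 1k completion tokens.
print(request_cost(1000, 500, 0.5, 1.5))  # 1.25
```

In a dashboard backed by ClickHouse or TimescaleDB, the same aggregates would normally be computed in SQL over the trace table rather than in application code.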
Build an AI-powered interview coach that gives engineers practice with behavioral and system design questions, evaluates their answers using LLM-as-judge, and tracks improvement over time.
Skills
Interview Prep, LLM-as-judge, System Design, Evaluation
Stack
Next.js, OpenAI, Supabase, Firebase Auth
Built AI interview coach using LLM-as-judge evaluation with per-rubric scoring across clarity, depth, and tradeoff awareness
Build a reusable evaluation framework for LLM applications with dataset management, multiple evaluator types (deterministic, LLM-as-judge, human), regression test runner, and result dashboards.
Skills
Evaluation, LLMOps, Testing, CI/CD
Stack
Python, FastAPI, Supabase, OpenAI
Built LLM evaluation harness with deterministic, LLM-as-judge, and human evaluator types, used to gate prompt version releases
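A regression runner that gates releases can be as small as a loop over test cases plus a pass-rate check. The sketch below uses a deterministic exact-match evaluator. The `0.9` gate, the dataset fields, and `fake_model` are illustrative assumptions, and an LLM-as-judge or human evaluator would plug in through the same `evaluator` argument.

```python
def exact_match(expected, actual):
    """Deterministic evaluator: case- and whitespace-insensitive equality."""
    return expected.strip().lower() == actual.strip().lower()


def run_regression(dataset, generate, evaluator=exact_match, min_pass_rate=0.9):
    """Score every case and report whether the pass rate clears the gate."""
    if not dataset:
        raise ValueError("dataset must contain at least one case")
    results = [evaluator(case["expected"], generate(case["input"])) for case in dataset]
    pass_rate = sum(results) / len(results)
    return {"pass_rate": pass_rate, "passed_gate": pass_rate >= min_pass_rate}


# Hypothetical stand-in for the prompt version under test.
def fake_model(text):
    return {"2+2": "4", "capital of France": "paris"}.get(text, "unknown")


cases = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "capital of Peru", "expected": "Lima"},
]

report = run_regression(cases, fake_model)
print(report["passed_gate"])  # False
```

In CI, a failing gate would typically translate into a non-zero exit code so the prompt change cannot merge.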
Build a minimal MCP server registry where developers can publish tool definitions, browse available servers, and connect them to agent applications with auth and versioning.
Skills
MCP, Agents, API Design, Security
Stack
Node.js, Next.js, Supabase, Redis
Built MCP server marketplace with schema validation, versioning, and auth-gated tool discovery for agent integrations
Build a multi-agent system where specialized sub-agents review code for security, performance, test coverage, and documentation quality — coordinated by a planner agent with human approval gates.
Skills
Multi-agent, Code Review, LangGraph, GitHub
Stack
Python, LangGraph, OpenAI, GitHub API
Built multi-agent code review system with 4 specialized sub-agents orchestrated by a planner using LangGraph state machines
Build a production-ready AI SaaS template with Google auth, Stripe subscriptions, content gating, SEO pages, admin dashboard, and an AI feature — ready to clone and monetize.
Skills
AI SaaS, Stripe, Firebase, Next.js
Stack
Next.js, TypeScript, Tailwind CSS, Firebase
Built production AI SaaS template with Google OAuth, Stripe subscriptions, webhook sync, and middleware-based content gating
Get full access to all 7 premium projects with complete implementation guides, evaluation rubrics, and resume bullet templates — plus all premium lessons and quizzes.