8-bit pixel art cover image with 'CS 3100: Program Design & Implementation 2' header. Timeline showing code size vs validation time: Betty Holberton with ENIAC validation checklist (1000 instructions), Grace Hopper verifying compiler (5000 lines), Margaret Hamilton with code stack (400K lines, months of review), GitHub PR review (10M lines, hours), modern Copilot (100M lines, ???). Tagline: Technology Changes. Responsibility Doesn't.

CS 3100: Program Design and Implementation II

Lecture 13: AI Coding Assistants

©2026 Jonathan Bell & Ellen Spertus, CC-BY-SA

This Is Just the Beginning of Our Conversation

This topic is unlike anything we've covered so far:

  • The technology is evolving faster than any textbook can capture
  • There's genuine disagreement among experts about best practices
  • The hype is real—and so are the concerns
  • Your professors are learning alongside you, trying to model the best practices that we are teaching you

What comes next:

  • Lab 6 (Tuesday): Hands-on practice + surveys that will shape future lectures and workshops
  • Rest of semester: AI will be woven throughout our remaining lectures and assignments—this conversation continues

This is an exciting moment to be learning together.

Announcements

Learning Objectives

After this lecture, you will be able to:

  1. Define AI programming agents and enumerate their capabilities and limitations
  2. Compare model provider tools (Copilot, Claude Code) with tool builder IDEs (Cursor, Windsurf)
  3. Apply a 6-step workflow for effective human-AI collaboration
  4. Determine when it is appropriate (and inappropriate) to use an AI programming agent
  5. Use AI coding assistants to accelerate domain modeling and design exploration

Poll: How would you supervise a SWE intern?

Imagine that you have graduated, hold a full-time software engineering position, and are asked to supervise an intern on their first co-op. How would you guide them and evaluate their work?

Poll Everywhere QR Code or Logo

Text espertus to 22333 if the URL isn't working for you.

https://pollev.com/espertus

What Is an LLM? Text Prediction at Scale

Diagram showing LLM basics: Input text 'The capital of France is' gets tokenized, passes through a neural network trained on trillions of words, outputs probability distribution over next tokens (Paris 95%, a 2%, etc.). Bottom shows this repeats token by token. Key insight: Not a database, a pattern predictor.

LLMs predict the next token based on patterns learned from training data. That's it.
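A minimal toy sketch of that loop in Java, with a hard-coded, made-up probability table standing in for the neural network (the entries and probabilities are invented purely for illustration):

import java.util.List;
import java.util.Map;

// Toy illustration of autoregressive generation: repeatedly pick the most
// likely next token and append it. A real LLM computes the distribution with
// a neural network; here it is a hard-coded, made-up lookup table.
public class ToyNextToken {
    static final Map<String, List<Map.Entry<String, Double>>> DIST = Map.of(
        "The capital of France is", List.of(
            Map.entry(" Paris", 0.95), Map.entry(" a", 0.02)),
        "The capital of France is Paris", List.of(
            Map.entry(".", 0.90), Map.entry(",", 0.05))
    );

    public static void main(String[] args) {
        String text = "The capital of France is";
        while (DIST.containsKey(text)) {
            // Greedy decoding: take the highest-probability next token.
            String next = DIST.get(text).stream()
                .max(Map.Entry.comparingByValue())
                .orElseThrow().getKey();
            text += next;
        }
        System.out.println(text);  // The capital of France is Paris.
    }
}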

Model Tiers: Cost, Speed, and Capability

Tier | Examples | Best For | Tradeoff
Fast/Cheap | GPT-5-nano, Claude Haiku, Gemini Flash | Simple completions, boilerplate, routine tasks | Low cost, fast, but limited reasoning
Balanced | GPT-5-mini, Claude Sonnet | Most coding tasks, explanations, refactoring | Good balance of speed and capability
Frontier | Claude Opus, GPT-5.2, Gemini Pro | Complex architecture, difficult bugs, novel problems | Highest capability, but slower and expensive
Tool-specific | Cursor's composer-1, Copilot's internal models | Optimized for that tool's workflow | Tuned for speed in specific contexts

Most tools offer "Auto" mode — the tool picks the right model for each task. This is often the right default.

Pricing is usually per million tokens (input and output priced separately). Fast models: ~$0.10-0.50/M tokens. Frontier: ~$5-75/M tokens. Output tokens cost more than input.
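As a rough back-of-the-envelope illustration (the rates below are placeholders within the ranges above, not quotes from any provider), per-request cost is just token counts times the per-million rates:

// Back-of-the-envelope API cost estimate. Rates are illustrative placeholders,
// not actual prices; note that output tokens are priced higher than input.
public class TokenCost {
    static double cost(long inputTokens, long outputTokens,
                       double inputPerMillion, double outputPerMillion) {
        return inputTokens / 1_000_000.0 * inputPerMillion
             + outputTokens / 1_000_000.0 * outputPerMillion;
    }

    public static void main(String[] args) {
        // e.g., 8K tokens of context in, 1K tokens of generated code out
        System.out.printf("Fast tier:     $%.4f%n", cost(8_000, 1_000, 0.10, 0.40));
        System.out.printf("Frontier tier: $%.4f%n", cost(8_000, 1_000, 5.00, 25.00));
    }
}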

Context Is Everything

Split comparison: Top shows vague prompt producing generic AI lecture. Bottom shows same request with L1-L12 lectures, course style guide, historical pioneers context producing THIS lecture with Human in Every Loop cover, 6-step workflow, connections to Betty Holberton and Grace Hopper. Recursive callout notes this slide itself went through 6 iterations. Key insight: Same model, different context, the 6-step workflow closed the gap.

Poll: AI Coding Experience

Have you used an AI coding assistant before? (GitHub Copilot, Cursor, ChatGPT for code, etc.)

A. Yes, regularly

B. Yes, occasionally

C. Tried it once or twice

D. Never

Poll Everywhere QR Code or Logo

Text espertus to 22333 if the URL isn't working for you.

https://pollev.com/espertus

Public Service Announcement

  • We can tell that some students are using AI-generated code without documenting it. There is no penalty for documenting AI use!
  • We are particularly disappointed that some students are using AI for their reflections.
  • It's up to you whether to:
    • cheat
    • lie
    • waste time and money
    • show integrity
    • learn
  • You're not fooling us, but you may be fooling yourselves.

Remember Who You Are and What You Represent

A note card reading: 'Remember who you are and what you represent.' -- Hettie Belle Ege, Mills College, Est. 1852

The Secret Sauce: Static Analysis Powers Context

VS Code-style IDE showing how AI coding assistants work. Center shows editor with cursor in a method, ghost text completion, and chat panel. Left shows static analysis tracing connections to relevant files (type definitions, call graphs, imports). Right shows curated context bundle sent to LLM, contrasted with wasteful 'dump everything' approach. Key insight: You type, the tool finds what matters, you evaluate.
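As a simplified illustration of that "find what matters" step, the sketch below selects project files to bundle based only on a file's import lines. Real assistants also use type definitions and call graphs; every name and path convention here is our own assumption:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of context selection: instead of dumping the whole repo,
// include only files that the current file imports from the project's own package.
public class NaiveContextBundle {
    static List<Path> relatedFiles(Path currentFile, Path srcRoot, String projectPackage)
            throws IOException {
        List<Path> bundle = new ArrayList<>();
        for (String line : Files.readAllLines(currentFile)) {
            line = line.strip();
            if (line.startsWith("import " + projectPackage)) {
                // "import com.example.scenes.Scene;" -> srcRoot/com/example/scenes/Scene.java
                String fqn = line.substring("import ".length(), line.length() - 1);
                Path candidate = srcRoot.resolve(fqn.replace('.', '/') + ".java");
                if (Files.exists(candidate)) {
                    bundle.add(candidate);
                }
            }
        }
        return bundle;
    }
}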

Two Categories of AI Coding Tools

Split infographic comparing Model Providers (Microsoft/OpenAI with Copilot, Anthropic with Claude Code) who control the models and offer flat subscriptions, versus Tool Builders (Cursor, Windsurf, Cline) who purchase API access and compete on UX features. Arrows show API costs flowing from builders to providers.

What Are They Optimizing For?

Three-path strategic diagram: Copilot optimizes for enterprise sales while giving free access to students (future customers). Claude Code creates a flywheel where Anthropic builds the best models, then builds a tool that efficiently consumes its own API while charging competitors. Cursor optimizes for lightning-fast UX and model flexibility. Arrows show how strategies interact: tool builders pay model providers, funding better models.

Strengths: Pattern Recognition and Cross-Domain Transfer

  • Pattern recognition: Recognizes and reproduces common coding patterns
  • Syntax knowledge: Extensive knowledge of language syntax, libraries, frameworks
  • Cross-domain transfer: Can apply patterns from one language/domain to another
  • Natural language understanding: Translates business context into requirements, and requirements into code
  • Rapid prototyping: Generates boilerplate, tests, and common implementations quickly

Think of it as a very well-read junior developer who has seen millions of codebases.

Limitations: Entirely Non-Deterministic, Limited Context Window

  • Non-deterministic output: The same prompt can produce different code on different runs
  • Context window constraints: Can only see ~100K tokens at once—may miss parts of large codebases (see the rough estimate sketch after this list)
  • No runtime verification: Generates code based on patterns, not execution results
  • Training data cutoff: May not know recent libraries, API changes, or language features
  • Hallucination risk: May generate plausible-looking code that doesn't actually work
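A common rule of thumb is roughly four characters per token for English-like text. It is only an approximation (real tokenizers vary by model), but a quick check like this can warn you when a file is unlikely to fit in the window:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Rough context-window check using the ~4 characters-per-token heuristic.
// Real tokenizers differ by model; treat this as an order-of-magnitude estimate.
public class ContextCheck {
    static final int CHARS_PER_TOKEN_ESTIMATE = 4;

    static long estimateTokens(Path file) throws IOException {
        return Files.size(file) / CHARS_PER_TOKEN_ESTIMATE;
    }

    public static void main(String[] args) throws IOException {
        long budget = 100_000;               // the ~100K-token window noted above
        long estimate = estimateTokens(Path.of(args[0]));
        System.out.printf("~%d tokens (budget %d): %s%n", estimate, budget,
                estimate > budget ? "will likely be truncated" : "should fit");
    }
}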

AI Assists Throughout the Software Development Lifecycle

Software development lifecycle diagram showing five phases (Requirements, Design, Implementation, Validation, Operations) with specific tasks in each phase and exit conditions below

AI can assist at every phase—but human judgment drives every decision.

The 6-Step Human-AI Collaboration Workflow

Circular workflow diagram with six steps: Identify (recognize AI needs), Engage (craft prompts), Evaluate (assess outputs), Calibrate (steer toward goals), Tweak (refine artifacts), Finalize (document). Arrows show flow with iteration back to start.

Based on Google research on developer-AI collaboration

Identify & Engage: It Depends on Three Things

Triangle diagram showing three factors that determine context needs: YOUR UNDERSTANDING (can you identify what matters?), MODEL CAPABILITIES (how much can it infer?), TOOL CAPABILITIES (does it auto-find files?). Center shows: when all are high, just describe intent; when any is low, you compensate manually.

Modern Tools Change the Game

Old Mental Model (2022-2023)

"I need to carefully identify and provide all relevant context"

  • Manually open all relevant files
  • Copy-paste code snippets into prompts
  • Describe file structure explicitly
  • Anticipate what AI needs to know

You do the work of context gathering

New Reality (2026+)

"I describe intent; the tool finds context"

  • Tools search your codebase automatically
  • You can reference specific files when needed
  • Tool indexes and retrieves relevant context
  • AI asks clarifying questions

Tool does context gathering; you validate

But: Tools can't find context that doesn't exist in code—requirements, design rationale, rejected alternatives. That's still YOUR job.

Prompting Myths That Don't Actually Help

Myth | Example | Reality
Persona prompts improve output | "You are a brilliant 10x engineer who..." | Models don't roleplay better code. Context and specificity matter, not flattery.
Politeness affects performance | "Please" and "Thank you" | Doesn't affect the model—but polite, humble communication is a good habit for talking to humans!
Threats or stakes help | "I'll lose my job if you get this wrong" | The model has no concept of your job. Just describe what you actually need.
More instructions = better | 500-word system prompts | Often WORSE. Key details get lost. Be concise and specific.
Magic phrases always work | "Think step by step" everywhere | Useful for reasoning, but tools have built this in better. See: Plan Mode.

What actually helps: Relevant context, specific requirements, concrete examples, clear success criteria.

Plan Mode: Reducing Your Evaluation Surface

Split comparison: WITHOUT PLAN MODE - developer overwhelmed evaluating 500 lines of generated code at once. WITH PLAN MODE - developer reviews short plan first. Two thought bubbles shown: Expert thinks 'should use sessions not JWT'; Learner thinks 'I don't know what JWT is - I should learn before approving!' Callout: If you can't evaluate the plan, you can't evaluate the code. Plans reveal knowledge gaps early.

How Do You Know You're Done?

Split comparison: Left - intern says 'I'm done!' but supervisor frowns asking 'What about OAuth token refresh?' Right - intern and supervisor agree on checklist first, then intern confidently reports 'All four criteria pass' and supervisor approves.

Whether waterfall or Agile, SE techniques share one thing: agree on what "done" means before you start each piece of work.

Evaluate: The Step That Never Changes

What Changed | What Stayed the Same
Tools auto-find context | You must evaluate if output is correct
Models infer more from less | You must spot hallucinations and errors
Less manual prompt crafting | You must know if it fits your requirements
AI asks clarifying questions | You must have domain expertise to answer

No matter how good the tools get, evaluation requires YOUR expertise.

This is why the "task familiarity" principle still applies—if you can't evaluate, you can't use AI effectively.

Steps 4-6: Calibrate, Tweak, Finalize

Step | What You Do | Example
4. Calibrate | Steer AI toward desired outcomes through feedback | "That's close, but use interfaces instead of abstract classes"
5. Tweak | Manually refine AI-generated artifacts | Fix edge cases, adjust naming, add error handling
6. Finalize | Document decisions and rationale | Add comments or notes explaining why you chose this approach

The goal: AI accelerates initial generation, but YOU make the final decisions.

The Fundamental Principle: Task Familiarity

2x2 quadrant diagram with Domain Expertise on X-axis and Task Complexity on Y-axis. Top-left (low expertise, high complexity) is red 'Danger Zone'. Top-right (high expertise, high complexity) is green 'Ideal for AI'. Bottom-left (low expertise, low complexity) is yellow 'Learning Opportunity'. Bottom-right (high expertise, low complexity) is green 'Efficient Use'.

How Hard Is It to Evaluate?

Horizontal spectrum from easy to hard evaluation. Easy (green): Does it run? Does it compile? Medium (yellow): Is code readable? Hard (red): Are quiz questions confusing? (give to 100 students), Is architecture scalable? (wait 6 months). Key insight: AI can generate for any task, but hard evaluations need external validation.

AI can help you generate for any task. But don't rely on AI to evaluate hard-to-evaluate outcomes.

Poll: Evaluating AI Output

If AI generates a sorting algorithm and you don't know any sorting algorithms, how would you verify it's correct?

A. Run it on a bunch of test cases

B. Skim the code

C. I couldn't verify it properly

D. Trust it if it compiles

Poll Everywhere QR Code or Logo

Text espertus to 22333 if the URL isn't working for you.

https://pollev.com/espertus
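Option A is the strongest answer if you have a trusted oracle to compare against. A minimal differential-testing sketch, where mySort is a stand-in for whatever the AI generated:

import java.util.Arrays;
import java.util.Random;

// Differential testing: check an AI-generated sort against java.util.Arrays.sort
// on many random inputs. mySort is a placeholder for the generated implementation.
public class SortCheck {
    static int[] mySort(int[] a) {            // stand-in for the AI's implementation
        int[] copy = a.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        for (int trial = 0; trial < 10_000; trial++) {
            int[] input = rng.ints(rng.nextInt(50), -100, 100).toArray();
            int[] expected = input.clone();
            Arrays.sort(expected);
            if (!Arrays.equals(mySort(input), expected)) {
                throw new AssertionError("Mismatch on " + Arrays.toString(input));
            }
        }
        System.out.println("10,000 random cases passed");
    }
}

Random comparisons against a library oracle catch many bugs quickly, but they still don't prove correctness, which is why option C is also an honest answer.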

The "Vibe Coding" Trap

Three-panel comic showing vibe coding trap: Panel 1 - developer asks AI for login system, runs without reading, it works. Panel 2 - error appears, developer asks AI to fix, cycle repeats. Panel 3 - developer surrounded by errors, tangled code, unable to describe the problem anymore.

When to STOP Using AI and Change Your Approach

Stop signals:

  1. You can't evaluate the output — You're not sure if the code is correct or why it works
  2. You can't calibrate effectively — Repeated attempts don't move toward your goal
  3. You're describing symptoms, not problems — "It's broken" instead of "The recursion doesn't terminate"
  4. You're repeating without progress — 3-4 variations of the same request aren't getting closer to what you need

What to do instead:

  • Are there technical topics you should learn first? Do some manual implementation.
  • Are there domain concepts you need to understand? Talk to stakeholders.
  • Are the requirements unclear? Go back to requirements analysis (L9).

AI Creates "Learning Debt" When Used Too Early

Two parallel paths comparison: 'Learning First' shows slow steady progress building strong foundation. 'AI First' shows rapid initial progress then collapse when hitting complex bugs. A graph shows Path A (learning first) eventually surpassing Path B (AI first) in long-term productivity.

The goal isn't to avoid AI — it's to use it in ways that support learning rather than replace it.

The Documentation Test

Before committing AI-generated code, ask yourself:

  1. Can I justify the design decisions in this code to a colleague?
  2. If I look at this in 6 months, will I know WHY I chose this approach?
  3. Am I prepared to take responsibility if this code is wrong?
  4. Would a new team member understand the design decisions?

If the answer to any of these is "no"—you're creating maintenance debt that future-you will pay.

Demo: Domain Modeling with GitHub Copilot

Let's revisit SceneItAll from L2—an IoT/smarthome control platform:

  • Lights: Can be switched, dimmable, or RGBW tunable
  • Fans: On/off with speeds 1-4
  • Shades: Open/closed by 1-100%
  • Areas: Group devices by physical area, can be nested
  • Scenes: Preset conditions for devices, with cascading AreaScenes

Goal: Use AI to explore domain model alternatives, following the 6-step workflow.
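For orientation before the demo, here is one possible shape for the domain model in Java. It is deliberately minimal, and the names and relationships below are our own illustration rather than the alternatives the AI will propose:

import java.util.List;

// One illustrative SceneItAll domain sketch: a common Device interface,
// capability-specific subinterfaces, nested Areas, and Scenes as preset actions.
// A starting point to compare against AI-generated alternatives, not a spec.
public interface Device {
    String name();
}

interface Switchable extends Device { void setOn(boolean on); }
interface Dimmable extends Switchable { void setLevel(int percent); }       // 0-100
interface Fan extends Switchable { void setSpeed(int speed); }              // 1-4
interface Shade extends Device { void setPosition(int percentOpen); }       // 1-100

// Areas group devices and can nest (rooms within floors, and so on).
record Area(String name, List<Device> devices, List<Area> subAreas) { }

// A Scene is a named bundle of preset actions applied to devices.
record Scene(String name, List<Runnable> presets) {
    void activate() { presets.forEach(Runnable::run); }
}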

Step 1: Identify What Information AI Needs

Before prompting, ask yourself:

  • What domain concepts exist? (We have a basic list)
  • What level of detail is needed? (Domain model, not implementation)
  • What design constraints matter? (Design for change—L7)
  • What artifacts would be useful? (Mermaid diagrams, comparison matrix)

Connection to L4: This is like writing a spec. What does the reader (AI) need to give you the right answer?

Step 2: Engage with Context-Rich Prompt

This isn't a magic formula—it's showing what effective prompts have in common: context, constraints, and clear success criteria.

We are designing a new Java project called "SceneItAll". Our first step is
to enumerate some key requirements and explore domain model alternatives.

SceneItAll is an IoT/smarthome control app with the following domain concepts:
- Lights (can be switched, dimmable, or RGBW tunable)
- Fans (on/off and speeds 1-4)
- Shades (open/closed by 1-100%)
- Areas (group devices by physical area, can be nested)
- Scenes (define preset conditions for devices, with cascading AreaScenes)

Our domain model should emphasize "design for change" so that we can defer
decisions and get an MVP up soon for user feedback.

Generate a MODEL.md file with several design alternatives expressed as
mermaid class diagrams, including pros/cons for each.

Step 3: Evaluate Against Success Criteria

When evaluating AI output, ask:

  • Does it capture all the domain concepts we identified?
  • Do the design alternatives actually differ in meaningful ways?
  • Are the pros/cons accurate? (Use YOUR domain knowledge!)
  • Does it support "design for change"?
  • Is anything WRONG? (Hallucinated patterns, incorrect relationships)

This is where YOUR expertise matters. AI can generate; only YOU can evaluate.
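One way to make evaluation concrete is to turn a success criterion into a small executable check before accepting a model. A hedged sketch, assuming the illustrative Scene type from the earlier sketch (a real project would use JUnit rather than a bare main method):

import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// A tiny executable check for one success criterion from the evaluation list:
// activating a Scene should reach the devices it configures. Uses the
// illustrative Scene type sketched earlier; adapt to whatever model you accept.
public class SceneEvaluationCheck {
    public static void main(String[] args) {
        AtomicBoolean lightTurnedOn = new AtomicBoolean(false);

        Scene movieNight = new Scene("Movie Night",
                List.of(() -> lightTurnedOn.set(true)));
        movieNight.activate();

        if (!lightTurnedOn.get()) {
            throw new AssertionError("Scene activation did not reach its devices");
        }
        System.out.println("Scene activation criterion passed");
    }
}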

Step 4: Calibrate Toward Your Goals

Example calibration prompts:

"Alternative 2 is interesting, but I'm concerned about type safety. Can you show how a client would call methods on a generic Device without knowing its type?"

"The Scene design assumes devices are always online. What happens when a device is offline when a scene is activated?"

"I like the hybrid approach, but we should use interfaces instead of abstract classes for the plugin system—show me what that looks like."

Calibration is a conversation—guide AI toward better solutions.
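Roughly what the third calibration prompt might steer toward: a plugin contract defined as an interface with a default method instead of an abstract base class. The names below are illustrative, not what Copilot will actually emit:

// Illustrative "after" state for the third calibration prompt: the plugin
// contract is an interface (with a default method for shared behavior)
// rather than an abstract class.
public interface DevicePlugin {
    String deviceType();                       // e.g. "light", "fan", "shade"

    void applyState(String deviceId, String state);

    // Shared convenience behavior lives in a default method instead of a base class.
    default void applyToAll(Iterable<String> deviceIds, String state) {
        for (String id : deviceIds) {
            applyState(id, state);
        }
    }
}

The design payoff is that a plugin class stays free to extend some other type, since it only implements a contract instead of inheriting from a framework base class.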

Steps 5-6: Tweak and Finalize

Step 5: Tweak

Manual refinements after AI generation:

  • Naming: Match your conventions
  • Edge cases: Add handling AI missed
  • Comments: Explain WHY, not just WHAT
  • Style: Adjust to team standards

How do you know you're done?

  • It meets your plan/spec criteria
  • It passes your tests
  • You can explain every decision

Step 6: Finalize

Document for future reference:

  • Update DESIGN.md with chosen approach
  • Record rejected alternatives and WHY
  • Note any assumptions made
  • Commit with descriptive message

The goal isn't AI-generated code. It's code YOU understand and can maintain.

Demo Recap: What We Built

Using the 6-step workflow with GitHub Copilot, we:

  1. Identified context needs: domain concepts, design constraints, desired outputs
  2. Engaged with a context-rich prompt specifying what, not how
  3. Evaluated multiple design alternatives using our domain knowledge
  4. Calibrated toward our goals through iterative dialogue
  5. Tweaked the output to match our conventions and add missing details
  6. Finalized by documenting our choice and rationale

AI accelerated exploration. Human judgment made the decisions.

Key Takeaways

  • AI amplifies, doesn't replace: Quality of output depends on quality of YOUR input
  • Use the 6-step workflow: Identify → Engage → Evaluate → Calibrate → Tweak → Finalize
  • Task familiarity determines appropriateness: If you can't evaluate the output, don't use AI for that task
  • Avoid "vibe coding": You must evaluate CODE, not just execution
  • Document decisions: The FINALIZE step prevents "why did we do this?" moments

The spec-writing skills from L4 directly apply to writing effective prompts. Ambiguous prompts → unpredictable outputs.

Next Steps

  • Set up GitHub Copilot if you haven't already (free with GitHub Education)
  • Lab 6: AI Coding Agents, HW3: CYB with AI

Recommended reading: