Connection to Lecture 1:
We started the semester with these same pioneers
Grace Hopper faced skeptics: "Machines can't write programs"
Margaret Hamilton's rigorous verification saved Apollo 11
The same pattern repeats with AI: tools generate, humans verify
The GitHub addition (2008):
Code review normalized having ANOTHER human check your work
Pull requests made review a first-class workflow
This trained an entire generation to expect review before merge
Now we apply those same skills to AI-generated code
The meta-point of this lecture:
This lecture was generated with AI assistance
But I (the professor) did all 6 steps of the workflow
I identified what context to provide
I engaged with prompts
I evaluated every slide, every word
I calibrated when things went wrong
I tweaked the final output
I'm finalizing by teaching it to you
The irony is intentional:
An AI-generated lecture about not blindly trusting AI
Only works BECAUSE human judgment was applied
If I had "vibe-coded" this lecture, it would be terrible
The fact that it's (hopefully) good proves the workflow works
The message:
You stand in a line of pioneers
Each generation faced skepticism about "automatic programming"
Each generation kept human judgment in the loop
You continue that tradition
→ Transition: Let's get into the formal lecture...
CS 3100: Program Design and Implementation II Lecture 13: AI Coding Assistants
©2026 Jonathan Bell & Ellen Spertus, CC-BY-SA
Context from earlier lectures:
L4: Specifications—same principles apply to AI prompts
L7-L8: Design for change—AI helps explore design alternatives
L9: Requirements—stakeholder engagement parallels AI context management
Key theme: AI is a powerful tool, but like any tool, its effectiveness depends on how you use it. Today we'll learn the skills to use it well.
→ Transition: Here's what you'll be able to do after today...
This Is Just the Beginning of Our Conversation
This topic is unlike anything we've covered so far:
The technology is evolving faster than any textbook can capture
There's genuine disagreement among experts about best practices
The hype is real—and so are the concerns
Your professors are learning alongside you, trying to model the best practices that we are teaching you
What comes next:
Lab 6 (Tuesday): Hands-on practice + surveys that will shape future lectures and workshops
Rest of semester: AI will be woven throughout our remaining lectures and assignments—this conversation continues
This is an exciting moment to be learning together.
Why this framing matters:
Be honest with students: this is genuinely hard to teach
The field is moving so fast that last year's best practices may be outdated
There's real disagreement among practitioners and researchers
We're not pretending to have all the answers
What we DO have:
Foundational principles that transcend specific tools
A framework for thinking about human-AI collaboration
The 6-step workflow from research
Your own critical thinking skills
Lab 6 preview:
Required surveys on AI usage and concerns
Survey results will directly influence future content
Hands-on practice with AI coding assistants
This is formative—we're building this curriculum together
Tuesday fireside chat:
UG Advisory Committee organized this
Prof Bell and Associate Dean Christo Wilson
Broader conversation about AI in CS education
Encourage students to attend and bring questions
Rest of semester:
AI isn't a one-lecture topic—it's now part of the fabric of software engineering
Future lectures will reference back to these principles
Assignments will include AI-assisted components where appropriate
We'll revisit and refine based on what we learn together
The meta-point:
We're modeling the kind of thoughtful, iterative approach we're teaching
We don't have a perfect curriculum—we're building it with student input
This is what "learning together" actually looks like
→ Transition: Here's what you'll be able to do after today...
Learning Objectives
After this lecture, you will be able to:
Define AI programming agents and enumerate their capabilities and limitations
Compare model provider tools (Copilot, Claude Code) with tool builder IDEs (Cursor, Windsurf)
Apply a 6-step workflow for effective human-AI collaboration
Determine when it is appropriate (and inappropriate) to use an AI programming agent
Use AI coding assistants to accelerate domain modeling and design exploration
Time allocation:
Foundations: What is an LLM + context windows (~5 min)
Objective 1: AI coding assistants + ecosystem (~12 min)
Objective 2: The 6-step workflow (~5 min)
Objective 3: When to use AI (~8 min)
Objective 4: Live demo with GitHub Copilot (~20 min)
Why this matters: AI coding assistants are transforming software development. Students who learn to use them effectively will be more productive—but only if they maintain the judgment and expertise that makes AI assistance valuable.
→ Transition: Let's start with what AI coding assistants actually are...
Poll: How would you supervise a SWE intern? Imagine you have graduated and have a full-time software engineering position
and are asked to supervise an intern on their first co-op. How would you guide
them and evaluate their work?
Text espertus to 22333 if the URL isn't working for you.
https://pollev.com/espertus
What Is an LLM? Text Prediction at Scale
LLMs predict the next token based on patterns learned from training data. That's it.
The fundamental insight:
LLMs are NOT databases that store and retrieve facts
LLMs are NOT search engines that find information
LLMs are PATTERN PREDICTORS that generate plausible text
How it works:
Take input text, break into tokens (~words/subwords)
Model predicts probability of each possible next token
Sample from distribution (or take highest probability)
Repeat until done (see the sketch below)
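A minimal sketch of that loop in Java, assuming a hypothetical Model interface (the real model is a neural network, and production systems usually sample from the probability distribution rather than always taking the most likely token):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real model: given the tokens so far,
// it returns a probability for every possible next token.
interface Model {
    double[] nextTokenProbabilities(List<Integer> tokens);
}

class Generate {
    static List<Integer> complete(Model model, List<Integer> prompt, int maxNewTokens, int endOfTextToken) {
        List<Integer> tokens = new ArrayList<>(prompt);
        for (int i = 0; i < maxNewTokens; i++) {
            double[] probs = model.nextTokenProbabilities(tokens); // predict
            int next = argmax(probs);          // greedy choice; real systems usually sample instead
            if (next == endOfTextToken) break; // done
            tokens.add(next);                  // repeat with the new token appended
        }
        return tokens;
    }

    private static int argmax(double[] probs) {
        int best = 0;
        for (int i = 1; i < probs.length; i++) {
            if (probs[i] > probs[best]) best = i;
        }
        return best;
    }
}
```

Notice there is no database lookup and no execution step anywhere in this loop, which is why hallucination is possible.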
Why this matters:
Explains hallucination: model predicts plausible-LOOKING text, not necessarily TRUE text
Explains why context matters: predictions depend on what came before
Explains strengths: patterns in code are highly predictable!
The autocomplete metaphor:
Phone keyboard suggestions, but trained on the internet
Really good at predicting what text SHOULD come next
Doesn't "know" anything—just predicts patterns
→ Transition: So if it's all about prediction, what determines the quality of predictions?
Model Tiers: Cost, Speed, and Capability
Fast/Cheap (GPT-5-nano, Claude Haiku, Gemini Flash): Simple completions, boilerplate, routine tasks. Tradeoff: low cost and fast, but limited reasoning.
Balanced (GPT-5-mini, Claude Sonnet): Most coding tasks, explanations, refactoring. Tradeoff: good balance of speed and capability.
Frontier (Claude Opus, GPT-5.2, Gemini Pro): Complex architecture, difficult bugs, novel problems. Tradeoff: highest capability, but slower and expensive.
Tool-specific (Cursor's composer-1, Copilot's internal models): Optimized for that tool's workflow. Tradeoff: tuned for speed in specific contexts.
Most tools offer "Auto" mode — the tool picks the right model for each task. This is often the right default.
Pricing is usually per million tokens (input and output priced separately). Fast models: ~$0.10-0.50/M tokens. Frontier: ~$5-75/M tokens. Output tokens cost more than input.
Pricing basics:
Models charge per million tokens (roughly ~750K words per million tokens)
Input tokens (your prompt + context) and output tokens (model's response) priced separately
Output tokens typically cost 2-5x more than input tokens
Fast models: fractions of a cent per request
Frontier models: can be dollars per long conversation
This is why "auto" mode matters—no need to pay frontier prices for simple completions (worked example below)
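A worked example with illustrative prices (not any vendor's actual rates): suppose a fast model charges $0.25 per million input tokens and $1.00 per million output tokens, and a frontier model charges $15 and $75. A request with 8,000 input tokens and 1,000 output tokens costs roughly 0.008 × $0.25 + 0.001 × $1.00 ≈ $0.003 on the fast model, versus 0.008 × $15 + 0.001 × $75 ≈ $0.20 on the frontier model, about 65x more for the same request.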
Why this matters:
Different tasks need different models
Simple autocomplete? Fast model is fine
Architecture decision? You want frontier reasoning
"Auto" mode in modern tools:
Cursor, Copilot, Claude Code all have auto-selection
Tool decides: simple completion → fast model, complex reasoning → frontier
This optimizes cost/quality automatically
Let the tool handle this unless you have a reason to override
When to override auto:
Force frontier model for architecture decisions
Force fast model when you need quick iteration
Understanding tiers helps you make this call
The landscape changes fast:
Today's frontier is tomorrow's baseline
The tiers matter more than specific model names
Focus on understanding the tradeoffs
→ Transition: But the model is only half the equation—context determines output quality...
Context Is Everything
The key insight of this entire lecture:
The MODEL is the same
The CONTEXT is different
The OUTPUT QUALITY is dramatically different
What is the context window?
Everything the model can "see" when generating
Early models: ~4,000 tokens (a few pages)
Modern models: 100,000-200,000 tokens (a novel)
Frontier: 1,000,000+ tokens (multiple books)
Why context matters for coding:
"Write a function to process orders" → generic
"Write a function to process orders [+ your Order class + your database schema + your error handling conventions]" → useful
This is why IDE integration matters:
Copilot sees your open files = more context
Claude Code can read your whole codebase = even more context
More relevant context = better predictions
The spec-writing connection (L4):
A vague spec yields unpredictable implementations
A vague prompt yields unpredictable outputs
Same principle, same solution: be specific about what matters
Teaser:
Are you curious now about the 6-step workflow?
That's one thing you'll learn today.
→ Transition: Now let's see how these tools integrate this into your workflow...
Poll: AI Coding Experience Have you used an AI coding assistant before? (GitHub Copilot, Cursor, ChatGPT for code, etc.)
A. Yes, regularly
B. Yes, occasionally
C. Tried it once or twice
D. Never
Text espertus to 22333 if the URL isn't working for you.
https://pollev.com/espertus
Public Service Announcement
We can tell that some students are using AI-generated code without documenting it. There's no penalty!
We are particularly disappointed that some students are using AI for
their reflections.
It's up to you whether to:
cheat
lie
waste time and money
show integrity
learn
You're not fooling us, but you may be fooling yourselves.
Remember Who You Are and What You Represent
The Secret Sauce: Static Analysis Powers Context
The through-line from "Context Is Everything":
We just said context determines output quality
But how do coding assistants GET the right context?
Answer: Static analysis—the same technology that powers your IDE's autocomplete, go-to-definition, and refactoring tools
What static analysis provides:
AST parsing: Understands code structure, not just text
Semantic indexing: Knows what symbols mean, where they're defined
Call graphs: Traces which functions call which
Type information: Knows the types flowing through your code
Why this matters:
When your cursor is in a method, the tool KNOWS:
What class you're in
What interfaces it implements
What types the parameters are
What methods are available on those types
It sends EXACTLY this to the LLM—not your whole codebase (see the sketch below)
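A sketch of what that looks like in practice; the SceneItAll-flavored names here are hypothetical, chosen only to illustrate the kind of facts static analysis surfaces:

```java
import java.util.List;

// Minimal stub types so the example is self-contained.
interface DeviceObserver { }
class Scene {
    List<String> deviceIds() { return List.of(); }
}

class SceneController implements DeviceObserver {
    void activate(Scene scene) {
        // If your cursor is here, static analysis already knows:
        //  - the enclosing class is SceneController, which implements DeviceObserver
        //  - the parameter `scene` has type Scene
        //  - Scene exposes deviceIds(), so completions on `scene.` can be ranked by type
        // The tool forwards these facts (plus nearby code), not your entire repository.
        List<String> ids = scene.deviceIds();
    }
}
```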
The key insight:
Better static analysis = better context curation = better suggestions
This is what separates good tools from great ones
You don't need to learn "prompt engineering"—the tool does it for you
→ Transition: Let's see how this changes your interaction model...
Two fundamentally different business models:
Model Providers (Copilot, Claude Code):
Microsoft/OpenAI → GitHub Copilot
Anthropic → Claude Code
They MAKE the models AND build tools
Subscription pricing ($10-20/month)
You're locked to their models
Tool Builders (Cursor, Windsurf, Cline, etc.):
They DON'T make models—they buy API (programmatic) access
Compete on UX, features, workflows
Can offer multiple models (GPT-4, Claude, Gemini, local)
Pricing often includes API pass-through costs
Why this matters:
Different cost structures
Different feature priorities
Different levels of model lock-in
→ Transition: Let's compare these in detail...
What Are They Optimizing For?
Understanding the strategic game:
Copilot (Microsoft/OpenAI):
Sell to enterprise at premium prices
Give away to students (you!) for free
Why? You're the future enterprise customer
Classic "capture the next generation" strategy
Also: deep GitHub integration creates lock-in
Claude Code (Anthropic):
Pour resources into model quality
Build a tool that consumes their own API very efficiently
Two wins: showcase capabilities AND charge competitors
When Cursor uses Claude API, Anthropic gets paid
It's a flywheel: revenue funds better models
Cursor and Tool Builders:
Can't compete on models—so compete on UX
"Composer" is FAST—like, shockingly fast
Model-agnostic means they can switch to whoever's best
Betting that models commoditize, UX becomes the moat
Why this matters to you:
Helps you predict where tools are heading
Understand why features get prioritized
The "free" tier always has a strategy behind it
→ Transition: Let's look at a comparison table...
Strengths: Pattern Recognition and Cross-Domain Transfer
Pattern recognition: Recognizes and reproduces common coding patterns
Syntax knowledge: Extensive knowledge of language syntax, libraries, frameworks
Cross-domain transfer: Can apply patterns from one language/domain to another
Natural language understanding: Translates business context into requirements into code
Rapid prototyping: Generates boilerplate, tests, and common implementations quickly
Think of it as a very well-read junior developer who has seen millions of codebases.
What AI excels at:
It has seen MILLIONS of code examples during training
Recognizes patterns you might not know exist
Can translate between languages, frameworks, paradigms
The junior developer metaphor:
Knows a lot of syntax and patterns
Can work quickly on well-defined tasks
But needs guidance on architecture and design decisions
Doesn't understand YOUR specific project context
Connection to L4:
This is like having someone who knows every API signature
But doesn't know which API is right for YOUR requirements
→ Transition: But there are important limitations...
Limitations: Entirely Non-Deterministic, Limited Context Window
Context window constraints: Can only see ~100K tokens at once—may miss parts of large codebases
No runtime verification: Generates code based on patterns, not execution results
Training data cutoff: May not know recent libraries, API changes, or language features
Hallucination risk: May generate plausible-looking code that doesn't actually work
Critical limitations to understand:
Context window:
Even 100K tokens isn't your whole codebase
AI might not see the module that matters
No execution:
The AI has NEVER RUN code
It predicts what code LOOKS like based on training data
Doesn't verify that code actually works
Hallucination:
AI might invent API methods that don't exist
Looks correct, compiles sometimes, but fails at runtime
This is why YOU must review ALL generated code (illustrative example below)
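A hedged illustration of what a hallucinated API can look like; the broken call is left commented out, and the rest is ordinary Java:

```java
import java.util.Comparator;
import java.util.List;

class HallucinationExample {
    // Suggested code can look idiomatic while calling methods that simply don't exist.
    List<String> newestFirst(List<String> timestamps) {
        // Plausible-looking but hallucinated: java.util.List has no reverseSorted() method.
        // return timestamps.reverseSorted();
        // A correct version uses APIs that actually exist:
        return timestamps.stream()
                .sorted(Comparator.reverseOrder())
                .toList();
    }
}
```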
Connection to L4:
Remember: specs exist because we can't hold entire systems in our heads
AI has the same problem—it can't hold your whole system either
→ Transition: So where does AI fit in our systematic design process?
AI Assists Throughout the Software Development Lifecycle
AI can assist at every phase—but human judgment drives every decision.
AI isn't just for coding:
Requirements: Generate user stories, identify stakeholders
Design: Explore design alternatives, generate UML diagrams
Implementation: The obvious one—code generation
Validation: Generate test cases, identify edge cases
Operations: Deployment scripts, log analysis
The key insight:
AI doesn't REPLACE the process
It ACCELERATES parts of each phase
Human judgment is required at EVERY phase
Connection to L9:
Remember requirements analysis?
AI can help generate questions to ask stakeholders
But only YOU can engage with actual stakeholders
→ Transition: Now let's look at a systematic workflow for using AI effectively...
The 6-Step Human-AI Collaboration Workflow
Based on research on Developer-AI collaboration from Google
The workflow from Google research:
Studied 21 expert developers working with AI
Identified common patterns in effective AI use
This is NOT a rigid process—steps may overlap or repeat
Quick overview:
Identify: What does AI need to help you?
Engage: Craft the prompt with appropriate context
Evaluate: Is the output what you expected?
Calibrate: Guide AI toward better results
Tweak: Manually refine what AI generated
Finalize: Document what you decided and why
The cycle:
This isn't linear—you'll loop back frequently
Evaluation often leads back to Identify or Engage
→ Transition: Let's quickly walk through each step...
Identify & Engage: It Depends on Three Things
The key insight: Identify/Engage isn't static—it depends on:
1. Your understanding of the problem:
If you're an expert, you know EXACTLY what context matters
If you're learning, you might not know what to include
You can only identify relevant context for things you understand!
2. Model capabilities (and they keep improving):
GPT-3.5 needed very explicit instructions
Claude Opus / GPT-4o can infer much more from less
Frontier models are increasingly good at asking clarifying questions
3. Tool capabilities (varies wildly!):
Basic Copilot: sees your open files, that's it
Cursor: indexes entire codebase, @-mentions, finds files automatically
Claude Code: searches codebase, reads files on demand
The tool determines how much YOU need to manually provide
The practical implication:
"Identify what context AI needs" is outdated for modern tools
Better framing: "Recognize when the tool WON'T find what it needs"
Modern tools often just... figure it out
→ Transition: So what does this mean practically?
Old Mental Model (2022-2023)
"I need to carefully identify and provide all relevant context"
Manually open all relevant files
Copy-paste code snippets into prompts
Describe file structure explicitly
Anticipate what AI needs to know
You do the work of context gathering
New Reality (2026+)
"I describe intent; the tool finds context"
Tools search your codebase automatically
You can reference specific files when needed
Tool indexes and retrieves relevant context
AI asks clarifying questions
Tool does context gathering; you validate
But: Tools can't find context that doesn't exist in code—requirements, design rationale, rejected alternatives. That's still YOUR job.
The shift in mental model:
Old way (still true for basic Copilot):
YOU identify what's relevant
YOU craft the perfect prompt
YOU provide all context upfront
Heavy burden on the human
New way (Cursor, Claude Code, advanced tools):
YOU describe what you want
TOOL searches and retrieves context
TOOL presents what it found
YOU validate and correct if needed
What tools CAN auto-find:
Related code files
Type definitions, interfaces
Similar patterns in codebase
Test files, usage examples
What tools CAN'T auto-find:
Why a design decision was made
Requirements that aren't in code
Conventions that aren't documented
Knowledge in your head
This is why DESIGN.md and documentation matter—it's context the tool CAN find!
→ Transition: But before we move on—let's clear up some prompting myths...
Prompting Myths That Don't Actually Help
Myth: Persona prompts improve output. Example: "You are a brilliant 10x engineer who..." Reality: Models don't roleplay better code. Context and specificity matter, not flattery.
Myth: Politeness affects performance. Example: "Please" and "Thank you." Reality: Doesn't affect the model—but polite, humble communication is a good habit for talking to humans!
Myth: Threats or stakes help. Example: "I'll lose my job if you get this wrong." Reality: The model has no concept of your job. Just describe what you actually need.
Myth: More instructions = better. Example: 500-word system prompts. Reality: Often WORSE. Key details get lost. Be concise and specific.
Myth: Magic phrases always work. Example: "Think step by step" everywhere. Reality: Useful for reasoning, but tools have built this in better. See: Plan Mode.
What actually helps: Relevant context, specific requirements, concrete examples, clear success criteria.
Plan Mode: Reducing Your Evaluation Surface
The "think step by step" insight, productized:
Chain-of-thought prompting DOES help models reason better
But manually prompting it is clunky
Plan mode is the tool-level implementation of this insight
What Plan mode does:
AI generates a PLAN first (not code)
You review the plan (small evaluation surface!)
You refine the plan (cheap to change!)
THEN AI generates code based on approved plan (example plan below)
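An illustrative plan for a made-up task ("handle offline devices when a scene is activated"), small enough to approve or reject in seconds:
1. Add a DeviceStatus field so a device can report ONLINE or OFFLINE.
2. When a scene is applied, skip offline devices instead of throwing.
3. Return a result object listing which devices were skipped.
4. Add unit tests for the all-online, some-offline, and all-offline cases.
Reviewing those four lines takes a minute; reviewing the 300 lines of code they turn into takes much longer.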
Why this matters:
Evaluating a 10-line plan is MUCH easier than evaluating 500 lines of code
Catching "wrong approach" at plan stage = minutes to fix
Catching "wrong approach" in code = hours to fix
Catching "wrong approach" in production = days/weeks to fix
Connection to L4 (specs):
This is EXACTLY what specs are for!
Get agreement on WHAT before implementing HOW
The plan IS a lightweight spec
Tools that support this:
Cursor: Plan mode in Composer
Claude Code: Thinks through approach before coding
Codex/ChatGPT: Can be prompted to plan first
When to use Plan mode:
Multi-file changes
Architectural decisions
Anything where "wrong direction" would be costly
NOT needed for: single-line fixes, simple refactors
→ Transition: But there's a deeper SE principle here...
How Do You Know You're Done?
Whether waterfall or Agile, SE techniques share one thing: agree on what "done" means before you start each piece of work.
The intern/supervisor scenario:
You're an intern, you finish a task, you bring it to your supervisor
"I'm done!" → Supervisor looks, frowns: "No... where's the error handling?"
You didn't know that was required. Now you have to redo the work.
This happens ALL THE TIME in professional settings
The fix: agree on what "done" means BEFORE you start
The SE principle:
If you start making changes without knowing where you're going, how do you know when you arrive?
This is why we write specs (L4), define requirements (L9), create acceptance criteria
The same principle applies to AI-assisted work AND to working with humans
Without a definition of done:
"Make it better" → but better how? Better for whom?
Endless tweaking with no exit condition
"I'll know it when I see it" rarely works in practice
You waste time and never feel confident in the result
With a definition of done:
Clear success criteria BEFORE you start prompting (or coding)
Each output can be checked against the criteria
You know exactly when to stop
The plan you approved in Plan Mode IS your definition of done
Connection to CS4530:
This is an essential skill you'll practice more deeply in CS4530
Agile "Definition of Done", acceptance criteria, user stories
Start building this habit now—with AI, with supervisors, with yourself
Practical application:
Before starting ANY work, write down: "I'll know this is done when..." (example below)
Use that as your evaluation checklist
Plan Mode makes this explicit for AI work
Asking your supervisor "what does done look like?" makes it explicit for human work
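An illustrative definition of done for that same made-up offline-devices task: done means (a) activating a scene with an offline device no longer throws, (b) the caller is told which devices were skipped, and (c) the new unit tests for the all-online, some-offline, and all-offline cases pass. Each criterion is checkable, so you know exactly when to stop.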
→ Transition: Now let's talk about evaluation itself...
Evaluate: The Step That Never Changes
What changed vs. what stayed the same:
Tools auto-find context, but you must evaluate whether output is correct
Models infer more from less, but you must spot hallucinations and errors
Less manual prompt crafting, but you must know if it fits your requirements
AI asks clarifying questions, but you must have domain expertise to answer
No matter how good the tools get, evaluation requires YOUR expertise.
This is why the "task familiarity" principle still applies—if you can't evaluate, you can't use AI effectively.
The constant across all tool/model improvements:
EVALUATION requires human expertise
No tool can tell you if the code is RIGHT for your context
No model knows your actual requirements
What evaluation looks like:
Does this match what I actually need?
Are there edge cases being missed?
Does this fit our architecture?
Is this the right abstraction?
The trap with better tools:
Easy to accept output because it "looks right"
Tool found lots of context, must be correct!
But context-gathering ≠ correctness
Connection to L4:
Same as reviewing code against a spec
Spec-writing skills → evaluation skills
If you couldn't write the spec, you can't evaluate the output
→ Transition: Steps 4-6 remain largely the same...
Steps 4-6: Calibrate, Tweak, Finalize
4. Calibrate: Steer AI toward desired outcomes through feedback. Example: "That's close, but use interfaces instead of abstract classes"
5. Tweak: Manually refine AI-generated artifacts. Example: Fix edge cases, adjust naming, add error handling
6. Finalize: Document decisions and rationale. Example: Add comments or notes explaining why you chose this approach
The goal: AI accelerates initial generation, but YOU make the final decisions.
Calibrate is like iterative spec refinement:
Initial prompt was too vague? Add constraints
Output went wrong direction? Redirect explicitly
This is a CONVERSATION, not a single query
Tweak is inevitable:
AI output is rarely perfect
Naming might not match your conventions
Edge cases may be missed
This is expected and normal
Finalize is critical and often skipped:
Document WHY you accepted the AI's suggestion
Future-you needs to understand the rationale
This prevents "I don't remember why we did this" moments
→ Transition: Now the key question—when should you use AI at all?
The Fundamental Principle: Task Familiarity
The quadrant explained:
Bottom-left (Low expertise, Low complexity):
Simple tasks you don't know how to do
This is a LEARNING OPPORTUNITY
Do it manually! Build the knowledge base
Using AI here creates "learning debt"
Bottom-right (High expertise, Low complexity):
Routine tasks you understand well
Perfect for AI—let it handle boilerplate
You can instantly spot if something's wrong
Top-left (Low expertise, High complexity):
DANGER ZONE
Complex tasks you don't understand
You CAN'T evaluate AI's output
This is where "vibe coding" disasters happen
Top-right (High expertise, High complexity):
The sweet spot for AI assistance
Complex tasks where AI accelerates your work
You can evaluate, guide, and refine
Key insight: The ability to EVALUATE output determines appropriateness.
→ Transition: But there's another dimension to consider...
How Hard Is It to Evaluate?
AI can help you generate drafts for any task. But don't rely on AI to evaluate hard-to-evaluate outcomes.
The evaluation difficulty dimension:
Task familiarity tells you IF you can evaluate
Evaluation difficulty tells you HOW HARD it is to evaluate
Both matter for deciding how to use AI
Easy to evaluate (use AI freely):
Does the script run? (Yes/No)
Does it compile? (Yes/No)
Does the test pass? (Yes/No)
You can verify immediately
Hard to evaluate (use AI carefully):
Does the quiz have confusing questions? → Give it to 100 students
Is this architecture scalable? → Wait 6 months in production
Will users find this intuitive? → Ship and measure
The key nuance:
This DOESN'T mean "don't use AI" for hard-to-evaluate tasks
You CAN use AI to generate quiz drafts, explore architectures, prototype UIs
But don't trust AI's judgment on the OUTCOME
"This quiz looks clear to me" ≠ "Students won't find it confusing"
What to do for hard-to-evaluate tasks:
Use AI to generate options quickly
Get external validation: user testing, expert review, stakeholder feedback
Accept that some evaluations take time
→ Transition: Let's test your evaluation intuition...
Poll: Evaluating AI Output If AI generates a sorting algorithm and you don't know any sorting algorithms, how would you verify it's correct?
A. Run it on a bunch of test cases
B. Skim the code
C. I couldn't verify it properly
D. Trust it if it compiles
Text espertus to 22333 if the URL isn't working for you.
https://pollev.com/espertus
The "Vibe Coding" Trap What is "vibe coding"?
Only evaluating EXECUTION, not CODE
Ask AI to implement feature
Run app, see if it "works"
If error, describe error to AI, repeat
Never actually read or understand the code
Why it seems appealing:
You can get a "working" app without understanding anything
Feels fast at first
Why it leads to collapse:
No troubleshooting capability—you don't understand what the code does
Can't provide effective feedback—you can only describe symptoms
Brittle development—one change breaks everything
Dependency spiral—need AI for every tiny change
The key insight:
To effectively use AI, you must be able to EVALUATE the code itself
Not just "does it run"—does it do the RIGHT thing?
→ Transition: How do you know when to STOP using AI?
When to STOP Using AI and Change Your Approach
Stop signals:
You can't evaluate the output — You're not sure if the code is correct or why it works
You can't calibrate effectively — Repeated attempts don't move toward your goal
You're describing symptoms, not problems — "It's broken" instead of "The recursion doesn't terminate"
You're repeating without progress — 3-4 variations of the same request aren't getting closer to what you need
What to do instead:
Are there technical topics you should learn first? Do some manual implementation.
Are there domain concepts you need to understand? Talk to stakeholders.
Are the requirements unclear? Go back to requirements analysis (L9).
Recognizing the stop signals:
Can't evaluate:
The code compiles but you're not sure if it's right
You'd have to run it to find out
This is the DANGER ZONE
Can't calibrate:
You've given feedback 3-4 times
AI keeps going in wrong directions
Problem: YOUR understanding might be incomplete
Describing symptoms:
"It crashes" vs "NullPointerException on line 42"
"It's slow" vs "The nested loops create O(n²) complexity"
If you can only describe symptoms, you can't guide AI
Repeating without progress:
You've tried 3-4 different phrasings of the same request
Each attempt isn't getting closer to what you need
This might mean: there's no right answer to give
Tools behave differently with false negatives vs false positives
Sometimes the task isn't well-suited for AI—reformulate your approach
Connection to L9:
Remember requirements elicitation?
Same skills apply—if you don't understand the domain, go learn it
→ Transition: There's also a learning consideration...
AI Creates "Learning Debt" When Used Too Early
The goal isn't to avoid AI — it's to use it in ways that support learning rather than replace it.
The car analogy:
When you get in a car, you should know you could crash
Does that mean you don't drive? No!
You learn to drive safely, understand the risks, and make informed choices
Same with AI: the goal isn't avoidance, it's informed, skillful use
The learning debt concept:
Like technical debt, but for YOUR knowledge
Functional code masks gaps in understanding
Works until you hit something that requires real expertise
There are ways to use AI that support learning vs. replace it:
SUPPORT: Use AI to explain concepts after you've tried manually
SUPPORT: Use AI to generate variations after you understand the pattern
REPLACE: Use AI to skip understanding entirely
REPLACE: Copy-paste without reading the code
Path A - Learning First:
Slower at first—manual work, mistakes, frustration
But you BUILD UNDERSTANDING
When you eventually use AI, you can evaluate and guide it
Path B - AI First:
Fast initial progress—ship features quickly
But you never learned WHY the code works
When complex bugs hit, you're stuck
You can't even describe the problem to AI effectively
The recommendation:
For this course: Do initial implementations MANUALLY
Then use AI for variations, extensions, boilerplate
Example: Implement JSON serialization for 2-3 classes by hand, then let AI do the rest
→ Transition: There are also long-term considerations beyond learning...
The Documentation Test
Before committing AI-generated code, ask yourself:
Can I justify the design decisions in this code to a colleague?
If I look at this in 6 months, will I know WHY I chose this approach?
Am I prepared to take responsibility if this code is wrong?
Would a new team member understand the design decisions?
If the answer to any of these is "no"—you're creating maintenance debt that future-you will pay.
The four questions, explained:
1. Can you explain it without AI?
If you need the AI to explain YOUR code, you don't understand it
Understanding is required for debugging, extending, and teaching others
Test: Rubber duck it. Can you explain each decision?
2. Will you know WHY in 6 months?
"It worked" is not a reason
"The AI suggested it" is not a reason
Document the actual design rationale
3. Did you document rejected alternatives?
AI often shows multiple approaches
Which did you reject and why?
This saves future-you from re-exploring dead ends
4. Would a new team member understand?
You won't be the only person reading this code
Internship/job: you'll onboard others to your code
They won't have your context (or your AI conversation)
The maintenance debt concept:
Every undocumented decision = interest on a loan
Future changes require re-learning the context
Compounds over time—small debts become big ones
→ Transition: Let's apply all of this in a live demo...
Demo: Domain Modeling with GitHub Copilot
Let's revisit SceneItAll from L2—an IoT/smarthome control platform:
Lights: Can be switched, dimmable, or RGBW tunable
Fans: On/off with speeds 1-4
Shades: Open/closed by 1-100%
Areas: Group devices by physical area, can be nested
Scenes: Preset conditions for devices, with cascading AreaScenes
Goal: Use AI to explore domain model alternatives, following the 6-step workflow. One illustrative alternative is sketched below.
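One illustrative alternative of the kind the demo might surface (not "the" answer, and the names are hypothetical), using small capability interfaces so new device types can be added without touching existing code:

```java
import java.util.List;

// Capability interfaces: devices advertise what they can do, not what they are.
interface Device { String id(); }
interface Switchable extends Device { void setOn(boolean on); }
interface Dimmable extends Switchable { void setLevel(int percent); }      // 0-100
interface SpeedControlled extends Switchable { void setSpeed(int speed); } // fans: 1-4
interface Positionable extends Device { void setPosition(int percent); }   // shades: 0-100

// Areas group devices and can nest; Scenes capture preset settings for devices.
record Area(String name, List<Device> devices, List<Area> children) { }
record DeviceSetting(String deviceId, String property, int value) { }
record Scene(String name, List<DeviceSetting> settings) { }
```

Whether this beats, say, a single Device class with a type enum is exactly the pros/cons discussion the prompt asks the AI to generate.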
Demo setup:
We'll use GitHub Copilot Chat in VS Code
Walk through the 6-step workflow explicitly
Show both good and bad prompting techniques
What we'll demonstrate:
IDENTIFY: What context does AI need?
ENGAGE: Craft an effective initial prompt
EVALUATE: Critically assess the output
CALIBRATE: Guide toward our goals
TWEAK: Manually refine
FINALIZE: Document our decisions
Time allocation: ~25 minutes for demo + discussion
→ Transition: Let's start with Step 1: Identify...
Before prompting, ask yourself:
What domain concepts exist? (We have a basic list)
What level of detail is needed? (Domain model, not implementation)
What design constraints matter? (Design for change—L7)
What artifacts would be useful? (Mermaid diagrams, comparison matrix)
Connection to L4: This is like writing a spec. What does the reader (AI) need to give you the right answer?
The Identify step:
DON'T just start typing a prompt
THINK about what AI needs to help you
For SceneItAll, we need to convey:
The domain concepts (lights, fans, shades, areas, scenes)
The relationships (areas nest, scenes contain device states)
The constraints (design for change, MVP focus)
The desired output (design alternatives, not just one)
Connection to L4:
Same questions as writing a method spec:
What does the reader need to understand?
What should NOT be over-specified?
→ Transition: Now let's craft the prompt...
Step 2: Engage with Context-Rich Prompt
This isn't a magic formula—it's showing what effective prompts have in common: context, constraints, and clear success criteria.
We are designing a new Java project called "SceneItAll". Our first step is to enumerate some key requirements and explore domain model alternatives. SceneItAll is an IoT/smarthome control app with the following domain concepts:
- Lights (can be switched, dimmable, or RGBW tunable)
- Fans (on/off and speeds 1-4)
- Shades (open/closed by 1-100%)
- Areas (group devices by physical area, can be nested)
- Scenes (define preset conditions for devices, with cascading AreaScenes)
Our domain model should emphasize "design for change" so that we can defer decisions and get an MVP up soon for user feedback. Generate a MODEL.md file with several design alternatives expressed as mermaid class diagrams, including pros/cons for each.
This is NOT a magic prompt formula:
The specific format doesn't matter
What matters: context, constraints, clear success criteria
This example shows what success LOOKS LIKE, not a template to copy
Notice the structure:
Context: What project, what phase
Domain: Specific concepts with details
Constraints: "Design for change", "MVP soon"
Desired output: Specific artifact format
What makes this effective:
Tells AI WHAT we want, not HOW to do it (like a good spec!)
Provides enough domain context
States evaluation criteria (how will we judge the output?)
Requests specific, usable artifacts
What to expect:
AI will generate multiple alternatives
Each with pros/cons
We'll need to EVALUATE and CALIBRATE
→ Transition: Let's run this and see what we get... [LIVE DEMO]
Step 3: Evaluate Against Success Criteria
When evaluating AI output, ask:
Does it capture all the domain concepts we identified?
Do the design alternatives actually differ in meaningful ways?
Are the pros/cons accurate? (Use YOUR domain knowledge!)
Does it support "design for change"?
Is anything WRONG? (Hallucinated patterns, incorrect relationships)
This is where YOUR expertise matters. AI can generate; only YOU can evaluate.
Evaluation in practice:
Read through each alternative carefully
Check: Do these relationships make sense?
Check: Are the pros/cons reasonable?
Check: Did AI miss anything important?
Common issues to look for:
Missing domain concepts
Overcomplicated designs (AI sometimes over-engineers)
Patterns that look right but don't fit the domain
Pros that aren't really pros for YOUR context
Connection to L4 (specs):
We're evaluating if the output is CORRECT
Same mental process as code review
If output is good: Move to Calibrate/Tweak
If output is poor: Either Calibrate (redirect) or re-Identify (we missed something)
→ Transition: Let's calibrate toward our goals... [LIVE DEMO continues]
Step 4: Calibrate Toward Your Goals
Example calibration prompts:
"Alternative 2 is interesting, but I'm concerned about type safety. Can you show how a client would call methods on a generic Device without knowing its type?"
"The Scene design assumes devices are always online. What happens when a device is offline when a scene is activated?"
"I like the hybrid approach, but we should use interfaces instead of abstract classes for the plugin system—show me what that looks like."
Calibration is a conversation—guide AI toward better solutions. The sketch below illustrates the type-safety concern behind the first prompt.
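A minimal sketch of that concern, using hypothetical stub types kept local so the example stands alone:

```java
// Hypothetical device types for illustration only.
interface Device { }
interface Dimmable extends Device { void setLevel(int percent); }
interface SpeedControlled extends Device { void setSpeed(int speed); }

class PresetApplier {
    // With only a generic Device in hand, the client has to downcast:
    void applyPreset(Device device) {
        if (device instanceof Dimmable dimmable) {
            dimmable.setLevel(50);
        } else if (device instanceof SpeedControlled fan) {
            fan.setSpeed(2);
        }
        // Every new device kind means another branch here, which is the
        // "design for change" smell worth raising with the AI during calibration.
    }
}
```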
Calibration techniques:
Point to specific concerns
Ask "what if" questions
Request modifications to promising approaches
This is iterative:
You might calibrate 3-4 times
Each round refines the output
This is NORMAL and EXPECTED
When calibration isn't working:
If 3-4 rounds don't improve things
You might need to go back to Identify
Maybe AI needs different context
Connection to L9 (requirements):
This is like stakeholder dialogue!
You're the "stakeholder" with domain knowledge
AI is the "developer" who needs guidance
→ Transition: Let's see calibration in action... [LIVE DEMO continues]
Steps 5-6: Tweak and Finalize
Step 5: Tweak
Manual refinements after AI generation:
Naming: Match your conventions
Edge cases: Add handling AI missed
Comments: Explain WHY, not just WHAT
Style: Adjust to team standards
How do you know you're done?
It meets your plan/spec criteria
It passes your tests
You can explain every decision
Step 6: Finalize
Document for future reference:
Update DESIGN.md with chosen approach
Record rejected alternatives and WHY
Note any assumptions made
Commit with descriptive message
The goal isn't AI-generated code. It's code YOU understand and can maintain.
Tweak is always necessary:
AI output is rarely perfect
This is EXPECTED, not a failure
Your tweaks add the domain knowledge AI lacks
How do you know you're done tweaking?
Connect back to your definition of done (from Plan Mode)
Check against the success criteria in your original prompt
Apply the Documentation Test: can you justify it? will you understand in 6 months?
Tweaking ends when: it meets the spec, passes tests, and you can explain every decision
Finalize is often skipped but critical:
Document what you decided
Document WHY (not just what)
Future-you will thank present-you (illustrative entry below)
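An illustrative DESIGN.md entry (both the decision and the tool's "alternative 1" are made up for the example):
Decision: Scenes store desired device states rather than command sequences.
Why: states are idempotent, so re-applying a scene after a device reconnects is safe.
Rejected: command-list scenes (the assistant's alternative 1), because replaying commands after a partial failure could double-apply changes.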
Connection to L7 (design for change):
Good documentation enables change
If you don't know WHY something was chosen, you can't safely change it
The deliverable isn't AI output:
It's code YOU understand
Code YOU can maintain
Code YOU can explain to others
→ Transition: Let's complete the demo and see the final result... [LIVE DEMO concludes]
Demo Recap: What We Built
Using the 6-step workflow with GitHub Copilot, we:
Identified context needs: domain concepts, design constraints, desired outputs
Engaged with a context-rich prompt specifying what, not how
Evaluated multiple design alternatives using our domain knowledge
Calibrated toward our goals through iterative dialogue
Tweaked the output to match our conventions and add missing details
Finalized by documenting our choice and rationale
AI accelerated exploration. Human judgment made the decisions.
Key Takeaways
AI amplifies, doesn't replace: Quality of output depends on quality of YOUR input
Use the 6-step workflow: Identify → Engage → Evaluate → Calibrate → Tweak → Finalize
Task familiarity determines appropriateness: If you can't evaluate the output, don't use AI for that task
Avoid "vibe coding": You must evaluate CODE, not just execution
Document decisions: The FINALIZE step prevents "why did we do this?" moments
The spec-writing skills from L4 directly apply to writing effective prompts. Ambiguous prompts → unpredictable outputs.
The core message:
AI is a powerful tool
Like any tool, effectiveness depends on skill
Your expertise determines AI's usefulness
Connections to prior lectures:
L4 Specs: Same principles for prompts
L7-L8 Design: AI helps explore alternatives
L9 Requirements: Context management parallels stakeholder engagement
Looking ahead:
Assignment 4 will use AI assistance
Apply these principles deliberately
Document your process
→ Transition: Next steps...
Next Steps
Set up GitHub Copilot if you haven't already (free with GitHub Education)
Lab 6: AI Coding Agents, HW3: CYB with AI
Recommended reading: