Lauren McDonald · Portfolio

Private Access

This portfolio is shared by invitation. Enter the password provided by Lauren to view her work.


Don't have access? Request it here

AI Workforce & Learning Design

AI training programs
don't fail on strategy.
They fail on execution.

I'm Lauren McDonald. I design the training systems, evaluation frameworks, and onboarding infrastructure that make AI data programs actually run. I've worked inside these programs across operations, quality management, and instructional design. I know where they break.

ADDIE · Adult Learning Theory · AI / LLM Training Infrastructure · LMS Design & Administration · Contributor Enablement · RLHF · SFT · Red Teaming · Curriculum Design · Gap Analysis
Lauren McDonald
3+ Years in AI L&D
5+ Concurrent Programs
About

The person behind
the frameworks.

I came up through the operational side of AI data programs, moving across quality management, program operations, and eventually into instructional design and enablement. That path shaped how I approach this work. I don't start with slide decks. I start with what's actually breaking and work backwards from there.

My background in psychology shapes how I design. I think a lot about how people actually learn, where attention drops off, and what makes feedback land instead of sting. I pair that with a real appetite for the operational side: the data analysis, the process design, the systems thinking that makes training scale beyond one good session.

I founded Learning Craft AI because I kept seeing the same gap: organizations scaling AI programs fast without the training infrastructure to support them. This portfolio, including the live platform you can explore, exists because I think the work should speak before the resume does. If something here resonates, I'd love to talk.

Notable Accomplishments
Reduced contributor ramp time
Designed and deployed a structured onboarding system (30-60-90 day pathways, milestone checkpoints, and blended learning modules) that measurably shortened time-to-independent performance for new contributors.
Built intake infrastructure from zero
Identified that enablement requests had no structured intake process, then designed and implemented a ticketing system, the first of its kind in that org, that gave the team visibility into training demand and reduced ad hoc requests.
Data-driven training improvements
Used SQL and Redash to analyze learner performance trends, identify specific training gaps driving quality issues, and build A/B-tested content updates, closing gaps with evidence, not guesswork.
Built a live gamified LMS from scratch
Designed and deployed a fully functioning gamified learning platform (XP system, badge framework, streak mechanics, admin course builder) with no existing platform to build on.
Improved inter-rater reliability through rubric design
Designed multi-dimension evaluation rubrics with anchor examples and calibration protocols that improved scoring consistency across distributed contributor teams working on high-stakes AI evaluation programs.
Managed 4–5 concurrent high-stakes AI programs
Oversaw quality, throughput, and contributor performance across multiple simultaneous enterprise AI data programs, balancing competing priorities, managing escalations, and maintaining SLA adherence under pressure.
Contributed to LLM red teaming & safety evaluations
Hands-on contributor experience in RLHF, SFT, and adversarial red teaming, documenting failure patterns and edge cases that directly informed the quality criteria and training content I later designed for other contributors.
Work Samples

What I Design

Original frameworks and sample documents representing the full range of what I build, from contributor onboarding to QA calibration systems. Scroll down to read each one in full.

Sample Document
ONBOARDING PLAYBOOK
12 pages
Contributor Onboarding
Week-by-Week Guide
Week 1 · Orientation & Program Foundations
Week 2 · Rubric Training & Calibration
Week 3 · Supervised Practice & Feedback
Week 4 · Independent Performance Review

Contributor Onboarding Playbook

A structured week-by-week program guide for AI contributor onboarding: milestones, benchmarks, roles, and escalation design.

Onboarding · Program Design · AI Programs
Read Full Sample
Sample Framework
EVALUATION RUBRIC
6 dimensions
ACCURACY · CLARITY · SAFETY · FOLLOW · HELPFUL · HONEST
Score: 4 · 3 · 2 · 1
IRR Framework Included

AI Evaluation Rubric Framework

Multi-dimension scoring rubric for RLHF and human evaluation: dimensions, anchor examples, calibration protocol, and common scoring errors.

RLHF · Rubric Design · QA
Read Full Sample
Visual Framework
GAP ANALYSIS
5 dimensions
Rubric Score 82% · Escalation 61% · Throughput 78% · Edge Cases 54%
Priority matrix included

Training Gap Analysis Framework

Structured methodology for identifying and prioritizing training gaps: current vs. target state mapping, root cause classification, and remediation roadmap.

Needs Assessment · Gap Analysis · L&D Strategy
Read Full Sample
Standard Operating Procedure
QA CALIBRATION SOP
5-step process
1 · Pre-session distribution (24 hrs)
2 · Opening IRR review
3 · Structured divergence discussion
4 · Gold standard reveal
5 · Documentation & action items
IRR thresholds + checklist

QA Calibration SOP

End-to-end SOP for running calibration sessions: session design, IRR tracking, facilitator checklist, and escalation protocol.

QA Ops · Calibration · SOP
Read Full Sample
Live Platform
🎮
Gamified AI Contributor Training
XP · Badges · Streaks · Leaderboard
LIVE ●

Learning Craft AI Course Platform

A fully designed and deployed gamified learning platform (XP system, badge framework, admin course builder, and leaderboard) built from scratch.

LMS Design · Gamification · Full Build
View Platform
Consulting Services
SERVICES OFFERED
Training Infrastructure Audit
Contributor Onboarding Program Design
Evaluation Framework & Rubric Design
QA Framework & Calibration Systems

B2B Consulting Learning Craft AI

Strategic and hands-on consulting for organizations building AI contributor programs, from audits to full program builds.

Consulting · B2B · AI Programs
View Services
Sample Document

Contributor Onboarding Playbook

A structured week-by-week guide for onboarding contributors to AI data programs, covering orientation, workflow training, quality benchmarks, and milestone checkpoints.

⬡  Sample not for distribution · Lauren McDonald
Purpose & Scope

This playbook supports consistent, scalable onboarding for contributors joining AI data programs. It provides a structured pathway from day-one orientation through full independent performance, with clear milestone checkpoints and quality expectations at each stage. It is intended for program managers, operations leads, and instructional designers responsible for contributor ramp-up.

Program Overview
Duration
4-week structured pathway with optional extension for complex task types
Delivery
Blended: async eLearning + live calibration sessions + on-the-job practice
Audience
New contributors joining RLHF, annotation, or human evaluation programs
Week-by-Week Pathway
WEEK 1 · Orientation & Program Foundations

Platform orientation, program documentation review, introduction to core task types. Focus on understanding why evaluation work matters and how contributor quality connects to model performance.

Platform walkthrough · Role expectations · Task type introduction · Documentation review
WEEK 2 · Rubric Training & Calibration

In-depth rubric training with anchor examples and practice sets. Live calibration sessions to align scoring understanding and establish inter-rater reliability baselines.

Rubric deep-dive · Anchor examples · Practice scoring sets · Live calibration session
WEEK 3 · Supervised Practice & Feedback Loops

Live task work under supervised conditions. Individual feedback focused on error patterns, quality consistency, and escalation behavior.

Supervised task work · Individual feedback · Error pattern review · Escalation training
WEEK 4 · Independent Performance & Milestone Review

Independent work against SLA and quality benchmarks. Formal milestone review at end of Week 4. Contributors who don't meet benchmarks receive a targeted remediation plan.

Independent execution · SLA benchmarks · Milestone review · Remediation planning
Milestone Benchmarks
Milestone | Timing | Success Criteria | If Not Met
Platform & orientation complete | End of Week 1 | 100% module completion and documentation review | Extended access + manager check-in
Rubric calibration baseline | End of Week 2 | IRR ≥ 0.70 on practice sets; live calibration attendance | Additional practice + 1:1 rubric coaching
Supervised practice quality gate | Mid Week 3 | Quality score ≥ threshold; error rate below program limit | Targeted feedback loop + supervised extension
Independent readiness review | End of Week 4 | Sustained quality and throughput at benchmarks over 5-day window | Formal remediation plan with defined timeline
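The Week 4 gate above requires quality and throughput to hold at benchmark over a 5-day window. Here is a minimal sketch of how that check could be automated; the function name, benchmark values, and example numbers are illustrative assumptions, not program figures.

```python
def independent_ready(daily_quality, daily_throughput,
                      quality_benchmark=0.90, throughput_benchmark=1.0, window=5):
    """Week 4 readiness gate: quality and throughput must hold at benchmark on
    every day of the review window. Benchmarks here are placeholder values."""
    if len(daily_quality) < window or len(daily_throughput) < window:
        return False  # not enough history to evaluate a full window
    return (all(q >= quality_benchmark for q in daily_quality[-window:])
            and all(t >= throughput_benchmark for t in daily_throughput[-window:]))

# Example: five days of quality scores and throughput relative to SLA.
print(independent_ready([0.92, 0.91, 0.93, 0.90, 0.94],
                        [1.05, 1.00, 1.10, 1.02, 1.00]))  # True
```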
Escalation & Edge Case Handling

Contributors should be trained on when and how to escalate. Clear escalation paths reduce inconsistent decision-making and help capture edge cases that improve rubric documentation over time.

Escalate When
Task falls outside rubric scope · Conflicting guidance in documentation · Safety or policy concern · Prompt or response cannot be scored confidently
Do Not Guess
Guessing on ambiguous tasks during onboarding inflates error rates and skews calibration data. When in doubt, flag it and escalate through the designated channel.

© Lauren McDonald · learningcraftai.com · Sample not for redistribution or reuse

Portfolio · Private Share Only

Sample Framework

AI Evaluation Rubric Framework

A multi-dimension scoring framework for RLHF and human evaluation: dimensions, scoring scale, anchor examples, calibration protocol, and common errors.

⬡  Sample not for distribution · Lauren McDonald
Scoring Dimensions
01 · Accuracy & Factual Correctness
Claims verifiable, free of hallucination, no uncertain information presented as fact.
02 · Clarity & Coherence
Clearly written, logically organized, directly addresses the prompt, free of contradictions.
03 · Safety & Harmlessness
Avoids harmful content, handles sensitive topics appropriately, passes the reasonable-person test.
04 · Instruction Following
Follows explicit and implicit instructions; format, length, and tone match the request.
05 · Helpfulness & Completeness
Genuinely helps the user accomplish their goal; complete but not padded.
06 · Honesty & Calibration
Expresses appropriate uncertainty; avoids overclaiming; acknowledges limitations without over-hedging.
Scoring Scale
Score | Label | Definition | When to Use
4 | Excellent | Fully meets expectations with no meaningful weaknesses | You would not change anything about this aspect
3 | Good | Mostly meets expectations; minor issues that don't significantly impact quality | Small improvements possible but response is generally solid
2 | Fair | Partially meets expectations; noticeable weaknesses affect usefulness | Clear problems but still contains something valuable
1 | Poor | Fails to meet expectations; significant problems present | Wrong, harmful, incoherent, or completely misses the mark
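As a sketch of how these six dimensions and the 1–4 scale might be represented in a scoring tool (an assumption about tooling; the framework itself does not prescribe an implementation):

```python
from dataclasses import dataclass, fields

@dataclass
class ItemScores:
    """One contributor's scores for one task. Each dimension is scored
    independently on the 1-4 scale defined above."""
    accuracy: int
    clarity: int
    safety: int
    instruction_following: int
    helpfulness: int
    honesty: int

    def validate(self) -> None:
        # Enforce the 4-point scale on every dimension.
        for f in fields(self):
            value = getattr(self, f.name)
            if value not in (1, 2, 3, 4):
                raise ValueError(f"{f.name} must be scored 1-4, got {value}")

scores = ItemScores(accuracy=4, clarity=3, safety=4,
                    instruction_following=3, helpfulness=2, honesty=4)
scores.validate()
```

Keeping each dimension as its own field makes it harder to blend dimensions while scoring, which is one of the common errors called out below.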
Anchor Examples · Accuracy Dimension
Score 4
No factual issues
"Correctly identifies the capital, gives an accurate population figure within accepted range, and appropriately notes the figure is approximate. No hallucinated claims or false certainty."
Score 2
Contains a factual error
"Gives the correct capital but states population as 8 million when the accepted figure is ~14 million. One significant factual error that would mislead the user."
Score 1
Multiple errors / hallucination
"Names an incorrect capital city and presents fabricated historical claims as fact. Multiple significant errors that fundamentally undermine the response."
Common Scoring Errors to Avoid
Central Tendency Bias
Avoiding extreme scores even when warranted. If a response is truly excellent or truly poor, score it as such.
Halo Effect
Letting a high score on one dimension inflate scores on others. Each dimension must be scored independently.
Recency Bias
Over-weighting the end of a long response. Evaluate the full response, not just what you read last.
Conflating Dimensions
Scoring Clarity based on Helpfulness or vice versa. Use each dimension's specific definition; don't blend them.

© Lauren McDonald · learningcraftai.com · Sample not for redistribution or reuse

Portfolio · Private Share Only

Visual Framework

Training Gap Analysis Framework

A structured methodology for identifying, mapping, and prioritizing training gaps, from data collection through remediation planning.

⬡  Sample not for distribution · Lauren McDonald
Process Phases
01 · Define the Target State
Document performance expectations, quality benchmarks, and behavioral standards contributors are expected to meet. If a program has no clearly defined target, that is itself a finding worth surfacing.
02 · Collect Current State Data
Gather quality scores, throughput data, error logs, calibration results, and direct observation. For programs with existing training, audit current materials to assess coverage and relevance.
03 · Map the Gaps
Compare current vs. target and categorize each gap: knowledge gap, skill gap, process gap, or system gap. The right intervention depends on root cause, not just symptom.
04 · Prioritize & Plan Remediation
Weight by impact on program quality, frequency, and feasibility. The output is a sequenced roadmap, not a list of everything that needs to change (see the sketch after this list).
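To make phase 04 concrete, here is a minimal sketch of one way the weighting could work. The 0.5/0.3/0.2 weights and the impact/frequency/feasibility ratings are illustrative assumptions; only the gap percentages mirror the sample visualization that follows.

```python
from dataclasses import dataclass

@dataclass
class Gap:
    name: str
    current: float    # observed performance, %
    target: float     # benchmark, %
    impact: int       # 1-5: effect on program quality if unaddressed (assumed rating)
    frequency: int    # 1-5: how often the gap shows up in task volume (assumed rating)
    feasibility: int  # 1-5: how quickly a fix could ship (assumed rating)

def priority_score(gap: Gap) -> float:
    """Weighted score: larger gaps with higher impact/frequency and easier fixes
    rank first. The weights are illustrative, not a prescribed formula."""
    size = max(gap.target - gap.current, 0)
    return size * (0.5 * gap.impact + 0.3 * gap.frequency + 0.2 * gap.feasibility)

gaps = [
    Gap("Edge Case Handling", 54, 80, impact=5, frequency=4, feasibility=3),
    Gap("Escalation Behavior", 61, 85, impact=5, frequency=3, feasibility=4),
    Gap("Rubric Accuracy", 82, 90, impact=4, frequency=4, feasibility=5),
    Gap("Throughput Rate", 78, 85, impact=2, frequency=3, feasibility=2),
]

for rank, g in enumerate(sorted(gaps, key=priority_score, reverse=True), start=1):
    print(f"P{rank} {g.name}: score = {priority_score(g):.1f}")
```

Under these assumed ratings the ordering matches the remediation matrix below; in practice the ratings come from program data and stakeholder judgment, not a fixed formula.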
Sample Gap Visualization (Illustrative)

Current performance vs. target benchmarks across five quality dimensions. Gap size and impact drive prioritization order.

Rubric Accuracy: 82% · Gap to target (90%): 8 pts · Priority: High
Escalation Behavior: 61% · Gap to target (85%): 24 pts · Priority: Critical
Throughput Rate: 78% · Gap to target (85%): 7 pts · Priority: Medium
Calibration IRR: 91% · At or above target (88%): No gap · Monitor only
Edge Case Handling: 54% · Gap to target (80%): 26 pts · Priority: Critical
Remediation Prioritization Matrix
P1 Critical: Edge Case Handling
Largest gap, high impact on quality. Intervention: new training module + rubric addendum with dedicated edge case examples. Target: 2-week rollout.
P2 Critical: Escalation Behavior
24-point gap. Root cause: unclear escalation criteria, not a training failure. Intervention: revised SOPs + live calibration on escalation scenarios.
P3 High: Rubric Accuracy
8-point gap. Intervention: updated anchor examples for top two error dimensions + re-calibration session. Target: within current sprint.
P4 Medium: Throughput Rate
7-point gap. Likely a process issue, not a training gap. Audit the workflow before building new content. Monitor for 2 weeks first.

© Lauren McDonald · learningcraftai.com · Sample not for redistribution or reuse

Portfolio · Private Share Only

Standard Operating Procedure

QA Calibration SOP

End-to-end SOP for running calibration sessions: session design, IRR tracking, facilitator checklist, and escalation protocol for AI evaluation programs.

⬡  Sample not for distribution · Lauren McDonald
Purpose

Calibration ensures contributors are interpreting and applying rubrics consistently. Without it, scoring drift occurs: reviewers develop slightly different mental models of what a score means, and inter-rater reliability declines. This SOP defines how sessions should be structured, run, and documented.

Primary Goal
Align contributor scoring to program standards; reduce variance across the team
Secondary Goal
Identify rubric weaknesses, edge cases, and documentation gaps
Tertiary Goal
Build shared scoring vocabulary and culture across the contributor team
Session Types
Type | Audience | Frequency | Duration
Onboarding Calibration | All new contributors before independent work begins | Once, during Week 2 | 60–90 min
Ongoing Calibration | Active contributors on all programs | Weekly or biweekly | 30–45 min
Drift Correction | Contributors flagged for IRR below threshold | Within 5 business days of flag | 30 min 1:1
Rubric Update Calibration | All active contributors on rubric change | Within 48 hrs of update | 45–60 min
Session Format, Step by Step
1 · Pre-Session Material Distribution (24 hrs before)
Distribute 10–15 tasks to all participants. Contributors score independently without discussion and submit scores before the session. This prevents anchoring bias during group discussion.
2 · Opening Review (5 min)
Share IRR data from submitted scores. Identify high-agreement items (skip) and divergent items (discuss); a sketch of this split follows the step list. Do not reveal gold standard scores yet.
3 · Structured Discussion of Divergent Items (20–30 min)
Work through divergent items. Ask contributors to explain their reasoning, not just their score. Surface different rubric interpretations. Facilitator records key disagreements and resolutions.
4 · Gold Standard Reveal & Alignment (10 min)
Share gold standard scores and rationale. If group consensus differs from gold standard in a reasonable way, flag it for rubric documentation review; don't dismiss the alternative reasoning.
5 · Documentation & Action Items (5 min)
Document: date, attendees, pre/post IRR, rubric gaps surfaced, and action items. This feeds directly into training material iteration.
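Step 2's split between high-agreement and divergent items can be scripted. The sketch below partitions items by the spread of submitted scores; the one-point spread cutoff is an assumed convention, not part of the SOP.

```python
def split_items(scores_by_item, max_spread=1):
    """Partition calibration items: small score spread -> skip in session,
    larger spread -> queue for structured discussion."""
    high_agreement, divergent = [], []
    for item_id, scores in scores_by_item.items():
        spread = max(scores) - min(scores)
        (high_agreement if spread <= max_spread else divergent).append(item_id)
    return high_agreement, divergent

# Example: four contributors scoring three calibration items on the 1-4 scale.
submitted = {
    "item-01": [4, 4, 3, 4],
    "item-02": [2, 4, 1, 3],
    "item-03": [3, 3, 3, 3],
}
skip, discuss = split_items(submitted)
print("skip:", skip)        # ['item-01', 'item-03']
print("discuss:", discuss)  # ['item-02']
```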
IRR Thresholds & Actions
Level | Threshold | Action if Below
Cohort | ≥ 0.75 | Run additional session within the week; review rubric for clarity issues
Individual contributor | ≥ 0.70 | Schedule 1:1 drift correction; provide targeted anchor examples for divergent dimensions
Individual repeated failure | 2 sessions below 0.65 | Escalate to program manager; suspend independent access pending remediation plan
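The thresholds above assume an IRR statistic without prescribing one. As one hedged illustration, a minimal Cohen's kappa for two raters, a common pairwise agreement measure, could be computed as below; the function name and example scores are hypothetical, and a real program might prefer a multi-rater statistic such as Krippendorff's alpha.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items on a categorical scale."""
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters gave the same score.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater scored independently from their own marginals.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)

# Example: two contributors scoring the same ten calibration items on the 1-4 scale.
scores_a = [4, 3, 3, 2, 4, 1, 3, 2, 4, 3]
scores_b = [4, 3, 2, 2, 4, 1, 3, 3, 4, 3]
print(f"kappa = {cohens_kappa(scores_a, scores_b):.2f}")  # ~0.71, above the 0.70 bar
```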
Facilitator Pre-Session Checklist
Calibration set prepared with 10–15 items covering all major task types
Gold standard scores and rationale prepared for each item
Pre-session materials distributed 24 hours in advance
Score collection mechanism in place
IRR calculated from submitted scores before session begins
Documentation template ready for notes and action items
Rubric gap log available to capture new issues
Escalation path confirmed for edge cases that may arise

© Lauren McDonald · learningcraftai.com · Sample not for redistribution or reuse

Portfolio · Private Share Only

Live Platform

Learning Craft AI
Course Platform

A fully designed and deployed gamified learning platform built from scratch, demonstrating instructional design principles inside a real, working system. Browse the screenshots below, then launch it yourself.

● LIVE PLATFORM
What's Built Into This Platform
XP and leveling system: contributors earn points for completing lessons and passing knowledge checks (see the sketch after this list)
🏆 Badge framework and leaderboard: gamified recognition that drives engagement and completion
🔥 Streak tracking: behavioral design that reinforces consistent daily learning habits
⚙️ Admin course builder: full content management with lesson editor, knowledge checks, and publishing controls
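As a sketch of how an XP-to-level curve and streak counter could be wired up (illustrative only; the base cost, growth factor, and function names are assumptions, not the platform's actual tuning):

```python
from datetime import date, timedelta

def level_for_xp(xp: int, base: int = 100, growth: float = 1.5) -> int:
    """Level curve: each level costs `growth` times more XP than the previous one."""
    level, cost = 1, base
    while xp >= cost:
        xp -= cost
        level += 1
        cost = int(cost * growth)
    return level

def updated_streak(last_active: date, today: date, streak: int) -> int:
    """Streak continues only if the learner was active yesterday; a gap resets it."""
    if today - last_active == timedelta(days=1):
        return streak + 1
    if today == last_active:
        return streak  # already counted today
    return 1           # missed a day: start over

print(level_for_xp(450))  # -> 3 under these assumed tuning values
```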
Try It Yourself

The demo is open — no account needed. Use the credentials below to explore the full learning experience including lessons, knowledge checks, and the XP system.

Launch Demo →
"I don't design training in the abstract I design it around how AI programs actually run."
— Lauren McDonald · Founder, Learning Craft AI
Lauren McDonald · learningcraftai.com · Private Share · Not for Redistribution