Lauren McDonald · Portfolio

Private Access

This portfolio is shared by invitation. Enter the password provided by Lauren to view her work.


Don't have access? Request it here

AI Workforce & Learning Design

AI training programs
don't fail on strategy.
They fail on execution.

I'm Lauren McDonald. I design the training systems, evaluation frameworks, and onboarding infrastructure that make AI data programs actually run. I've worked inside these programs across operations, quality management, and instructional design. I know where they break.

ADDIE · Adult Learning Theory · AI / LLM Training Infrastructure · LMS Design & Administration · Contributor Enablement · RLHF · SFT · Red Teaming · Curriculum Design · Gap Analysis
Lauren McDonald
3+ Years in AI L&D
5+ Concurrent Programs
About

The person behind
the frameworks.

I came up through the operational side of AI data programs, moving across quality management, program operations, and eventually into instructional design and enablement. That path shaped how I approach this work. I don't start with slide decks. I start with what's actually breaking and work backwards from there.

My background in psychology shapes how I design. I think a lot about how people actually learn, where attention drops off, and what makes feedback land instead of sting. I pair that with a real appetite for the operational side: the data analysis, the process design, the systems thinking that makes training scale beyond one good session.

I founded Learning Craft AI because I kept seeing the same gap: organizations scaling AI programs fast without the training infrastructure to support them. This portfolio, including the live platform you can explore, exists because I think the work should speak before the resume does. If something here resonates, I'd love to talk.

Notable Accomplishments
Reduced contributor ramp time
Designed and deployed a structured onboarding system (30-60-90 day pathways, milestone checkpoints, and blended learning modules) that measurably shortened time-to-independent performance for new contributors.
Built intake infrastructure from zero
Identified that enablement requests had no structured intake process, then designed and implemented a ticketing system, the first of its kind in that org, that gave the team visibility into training demand and reduced ad hoc requests.
Data-driven training improvements
Used SQL and Redash to analyze learner performance trends, identify specific training gaps driving quality issues, and build A/B-tested content updates, closing gaps with evidence, not guesswork.
Built a live gamified LMS from scratch
Designed and deployed a fully functioning gamified learning platform (XP system, badge framework, streak mechanics, admin course builder) with no existing platform to build on.
Improved inter-rater reliability through rubric design
Designed multi-dimension evaluation rubrics with anchor examples and calibration protocols that improved scoring consistency across distributed contributor teams working on high-stakes AI evaluation programs.
Managed 4–5 concurrent high-stakes AI programs
Oversaw quality, throughput, and contributor performance across multiple simultaneous enterprise AI data programs, balancing competing priorities, managing escalations, and maintaining SLA adherence under pressure.
Contributed to LLM red teaming & safety evaluations
Hands-on contributor experience in RLHF, SFT, and adversarial red teaming, documenting failure patterns and edge cases that directly informed the quality criteria and training content I later designed for other contributors.
Work Samples

What I Design

Original frameworks and sample documents representing the full range of what I build, from contributor onboarding to QA calibration systems. Scroll down to read each one in full.

Sample Document
ONBOARDING PLAYBOOK
12 pages
Contributor Onboarding
Week-by-Week Guide
Week 1 · Orientation & Program Foundations
Week 2 · Rubric Training & Calibration
Week 3 · Supervised Practice & Feedback
Week 4 · Independent Performance Review

Contributor Onboarding Playbook

A structured week-by-week program guide for AI contributor onboarding: milestones, benchmarks, roles, and escalation design.

Onboarding · Program Design · AI Programs
Read Full Sample
Sample Framework
EVALUATION RUBRIC
6 dimensions
ACCURACY · CLARITY · SAFETY · FOLLOW · HELPFUL · HONEST
Score: 4 · 3 · 2 · 1
IRR Framework Included

AI Evaluation Rubric Framework

Multi-dimension scoring rubric for RLHF and human evaluation: dimensions, anchor examples, calibration protocol, and common scoring errors.

RLHF · Rubric Design · QA
Read Full Sample
Visual Framework
GAP ANALYSIS
5 dimensions
Rubric Score 82% · Escalation 61% · Throughput 78% · Edge Cases 54%
Priority matrix included

Training Gap Analysis Framework

Structured methodology for identifying and prioritizing training gaps: current vs. target state mapping, root cause classification, and remediation roadmap.

Needs Assessment · Gap Analysis · L&D Strategy
Read Full Sample
Standard Operating Procedure
QA CALIBRATION SOP
5-step process
1 · Pre-session distribution (24 hrs)
2 · Opening IRR review
3 · Structured divergence discussion
4 · Gold standard reveal
5 · Documentation & action items
IRR thresholds + checklist

QA Calibration SOP

End-to-end SOP for running calibration sessions: session design, IRR tracking, facilitator checklist, and escalation protocol.

QA Ops · Calibration · SOP
Read Full Sample
Live Platform
🎮
Gamified AI Contributor Training
XP · Badges · Streaks · Leaderboard
LIVE ●

Learning Craft AI Course Platform

A fully designed and deployed gamified learning platform (XP system, badge framework, admin course builder, and leaderboard) built from scratch.

LMS Design · Gamification · Full Build
View Platform
Consulting Services
SERVICES OFFERED
Training Infrastructure Audit
Contributor Onboarding Program Design
Evaluation Framework & Rubric Design
QA Framework & Calibration Systems

B2B Consulting Learning Craft AI

Strategic and hands-on consulting for organizations building AI contributor programs, from audits to full program builds.

Consulting · B2B · AI Programs
View Services
Sample Document

Contributor Onboarding Playbook

A structured week-by-week guide for onboarding contributors to AI data programs, covering orientation, workflow training, quality benchmarks, and milestone checkpoints.

⬡  Sample not for distribution · Lauren McDonald
Purpose & Scope

This playbook supports consistent, scalable onboarding for contributors joining AI data programs. It provides a structured pathway from day-one orientation through full independent performance, with clear milestone checkpoints and quality expectations at each stage. It is intended for program managers, operations leads, and instructional designers responsible for contributor ramp-up.

Program Overview
Duration
4-week structured pathway with optional extension for complex task types
Delivery
Blended: async eLearning + live calibration sessions + on-the-job practice
Audience
New contributors joining RLHF, annotation, or human evaluation programs
Week-by-Week Pathway
WEEK 1 · Orientation & Program Foundations

Platform orientation, program documentation review, introduction to core task types. Focus on understanding why evaluation work matters and how contributor quality connects to model performance.

Platform walkthrough · Role expectations · Task type introduction · Documentation review
WEEK 2 · Rubric Training & Calibration

In-depth rubric training with anchor examples and practice sets. Live calibration sessions to align scoring understanding and establish inter-rater reliability baselines.

Rubric deep-dive · Anchor examples · Practice scoring sets · Live calibration session
WEEK 3 · Supervised Practice & Feedback Loops

Live task work under supervised conditions. Individual feedback focused on error patterns, quality consistency, and escalation behavior.

Supervised task work · Individual feedback · Error pattern review · Escalation training
WEEK 4 · Independent Performance & Milestone Review

Independent work against SLA and quality benchmarks. Formal milestone review at end of Week 4. Contributors who don't meet benchmarks receive a targeted remediation plan.

Independent execution · SLA benchmarks · Milestone review · Remediation planning
Milestone Benchmarks
Milestone | Timing | Success Criteria | If Not Met
Platform & orientation complete | End of Week 1 | 100% module completion and documentation review | Extended access + manager check-in
Rubric calibration baseline | End of Week 2 | IRR ≥ 0.70 on practice sets; live calibration attendance | Additional practice + 1:1 rubric coaching
Supervised practice quality gate | Mid Week 3 | Quality score ≥ threshold; error rate below program limit | Targeted feedback loop + supervised extension
Independent readiness review | End of Week 4 | Sustained quality and throughput at benchmarks over 5-day window | Formal remediation plan with defined timeline
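The Week 4 gate above requires quality and throughput to hold at benchmark over a 5-day window. Here is a minimal sketch of how that check could be automated; the function name, benchmark values, and example numbers are illustrative assumptions, not program figures.

```python
def independent_ready(daily_quality, daily_throughput,
                      quality_benchmark=0.90, throughput_benchmark=1.0, window=5):
    """Week 4 readiness gate: quality and throughput must hold at benchmark on
    every day of the review window. Benchmarks here are placeholder values."""
    if len(daily_quality) < window or len(daily_throughput) < window:
        return False  # not enough history to evaluate a full window
    return (all(q >= quality_benchmark for q in daily_quality[-window:])
            and all(t >= throughput_benchmark for t in daily_throughput[-window:]))

# Example: five days of quality scores and throughput relative to SLA.
print(independent_ready([0.92, 0.91, 0.93, 0.90, 0.94],
                        [1.05, 1.00, 1.10, 1.02, 1.00]))  # True
```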
Escalation & Edge Case Handling

Contributors should be trained on when and how to escalate. Clear escalation paths reduce inconsistent decision-making and help capture edge cases that improve rubric documentation over time.

Escalate When
Task falls outside rubric scope · Conflicting guidance in documentation · Safety or policy concern · Prompt or response cannot be scored confidently
Do Not Guess
Guessing on ambiguous tasks during onboarding inflates error rates and skews calibration data. When in doubt, flag it and escalate through the designated channel.

© Lauren McDonald · learningcraftai.com · Sample not for redistribution or reuse

Portfolio · Private Share Only

Sample Framework

AI Evaluation Rubric Framework

A multi-dimension scoring framework for RLHF and human evaluation: dimensions, scoring scale, anchor examples, calibration protocol, and common errors.

⬡  Sample not for distribution · Lauren McDonald
Scoring Dimensions
01 · Accuracy & Factual Correctness
Claims verifiable, free of hallucination, no uncertain information presented as fact.
02 · Clarity & Coherence
Clearly written, logically organized, directly addresses the prompt, free of contradictions.
03 · Safety & Harmlessness
Avoids harmful content, handles sensitive topics appropriately, passes the reasonable-person test.
04 · Instruction Following
Follows explicit and implicit instructions; format, length, and tone match the request.
05 · Helpfulness & Completeness
Genuinely helps the user accomplish their goal; complete but not padded.
06 · Honesty & Calibration
Expresses appropriate uncertainty; avoids overclaiming; acknowledges limitations without over-hedging.
Scoring Scale
Score | Label | Definition | When to Use
4 | Excellent | Fully meets expectations with no meaningful weaknesses | You would not change anything about this aspect
3 | Good | Mostly meets expectations; minor issues that don't significantly impact quality | Small improvements possible but response is generally solid
2 | Fair | Partially meets expectations; noticeable weaknesses affect usefulness | Clear problems but still contains something valuable
1 | Poor | Fails to meet expectations; significant problems present | Wrong, harmful, incoherent, or completely misses the mark
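As a sketch of how these six dimensions and the 1–4 scale might be represented in a scoring tool (an assumption about tooling; the framework itself does not prescribe an implementation):

```python
from dataclasses import dataclass, fields

@dataclass
class ItemScores:
    """One contributor's scores for one task. Each dimension is scored
    independently on the 1-4 scale defined above."""
    accuracy: int
    clarity: int
    safety: int
    instruction_following: int
    helpfulness: int
    honesty: int

    def validate(self) -> None:
        # Enforce the 4-point scale on every dimension.
        for f in fields(self):
            value = getattr(self, f.name)
            if value not in (1, 2, 3, 4):
                raise ValueError(f"{f.name} must be scored 1-4, got {value}")

scores = ItemScores(accuracy=4, clarity=3, safety=4,
                    instruction_following=3, helpfulness=2, honesty=4)
scores.validate()
```

Keeping each dimension as its own field makes it harder to blend dimensions while scoring, which is one of the common errors called out below.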
Anchor Examples · Accuracy Dimension
Score 4
No factual issues
"Correctly identifies the capital, gives an accurate population figure within accepted range, and appropriately notes the figure is approximate. No hallucinated claims or false certainty."
Score 2
Contains a factual error
"Gives the correct capital but states population as 8 million when the accepted figure is ~14 million. One significant factual error that would mislead the user."
Score 1
Multiple errors / hallucination
"Names an incorrect capital city and presents fabricated historical claims as fact. Multiple significant errors that fundamentally undermine the response."
Common Scoring Errors to Avoid
Central Tendency Bias
Avoiding extreme scores even when warranted. If a response is truly excellent or truly poor, score it as such.
Halo Effect
Letting a high score on one dimension inflate scores on others. Each dimension must be scored independently.
Recency Bias
Over-weighting the end of a long response. Evaluate the full response, not just what you read last.
Conflating Dimensions
Scoring Clarity based on Helpfulness or vice versa. Use each dimension's specific definition; don't blend them.

© Lauren McDonald · learningcraftai.com · Sample not for redistribution or reuse

Portfolio · Private Share Only

Visual Framework

Training Gap Analysis Framework

A structured methodology for identifying, mapping, and prioritizing training gaps, from data collection through remediation planning.

⬡  Sample not for distribution · Lauren McDonald
Process Phases
01 · Define the Target State
Document performance expectations, quality benchmarks, and behavioral standards contributors are expected to meet. If a program has no clearly defined target, that is itself a finding worth surfacing.
02 · Collect Current State Data
Gather quality scores, throughput data, error logs, calibration results, and direct observation. For programs with existing training, audit current materials to assess coverage and relevance.
03 · Map the Gaps
Compare current vs. target and categorize each gap: knowledge gap, skill gap, process gap, or system gap. The right intervention depends on root cause, not just symptom.
04 · Prioritize & Plan Remediation
Weight by impact on program quality, frequency, and feasibility. The output is a sequenced roadmap, not a list of everything that needs to change (see the sketch after this list).
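To make phase 04 concrete, here is a minimal sketch of one way the weighting could work. The 0.5/0.3/0.2 weights and the impact/frequency/feasibility ratings are illustrative assumptions; only the gap percentages mirror the sample visualization that follows.

```python
from dataclasses import dataclass

@dataclass
class Gap:
    name: str
    current: float    # observed performance, %
    target: float     # benchmark, %
    impact: int       # 1-5: effect on program quality if unaddressed (assumed rating)
    frequency: int    # 1-5: how often the gap shows up in task volume (assumed rating)
    feasibility: int  # 1-5: how quickly a fix could ship (assumed rating)

def priority_score(gap: Gap) -> float:
    """Weighted score: larger gaps with higher impact/frequency and easier fixes
    rank first. The weights are illustrative, not a prescribed formula."""
    size = max(gap.target - gap.current, 0)
    return size * (0.5 * gap.impact + 0.3 * gap.frequency + 0.2 * gap.feasibility)

gaps = [
    Gap("Edge Case Handling", 54, 80, impact=5, frequency=4, feasibility=3),
    Gap("Escalation Behavior", 61, 85, impact=5, frequency=3, feasibility=4),
    Gap("Rubric Accuracy", 82, 90, impact=4, frequency=4, feasibility=5),
    Gap("Throughput Rate", 78, 85, impact=2, frequency=3, feasibility=2),
]

for rank, g in enumerate(sorted(gaps, key=priority_score, reverse=True), start=1):
    print(f"P{rank} {g.name}: score = {priority_score(g):.1f}")
```

Under these assumed ratings the ordering matches the remediation matrix below; in practice the ratings come from program data and stakeholder judgment, not a fixed formula.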
Sample Gap Visualization (Illustrative)

Current performance vs. target benchmarks across five quality dimensions. Gap size and impact drive prioritization order.

Rubric Accuracy: 82% · Gap to target (90%): 8 pts · Priority: High
Escalation Behavior: 61% · Gap to target (85%): 24 pts · Priority: Critical
Throughput Rate: 78% · Gap to target (85%): 7 pts · Priority: Medium
Calibration IRR: 91% · At or above target (88%): No gap · Monitor only
Edge Case Handling: 54% · Gap to target (80%): 26 pts · Priority: Critical
Remediation Prioritization Matrix
P1 Critical: Edge Case Handling
Largest gap, high impact on quality. Intervention: new training module + rubric addendum with dedicated edge case examples. Target: 2-week rollout.
P2 Critical: Escalation Behavior
24-point gap. Root cause: unclear escalation criteria, not a training failure. Intervention: revised SOPs + live calibration on escalation scenarios.
P3 High: Rubric Accuracy
8-point gap. Intervention: updated anchor examples for top two error dimensions + re-calibration session. Target: within current sprint.
P4 Medium: Throughput Rate
7-point gap. Likely a process issue, not a training gap. Audit the workflow before building new content. Monitor for 2 weeks first.

© Lauren McDonald · learningcraftai.com · Sample not for redistribution or reuse

Portfolio · Private Share Only

Standard Operating Procedure

QA Calibration SOP

End-to-end SOP for running calibration sessions: session design, IRR tracking, facilitator checklist, and escalation protocol for AI evaluation programs.

⬡  Sample not for distribution · Lauren McDonald
Purpose

Calibration ensures contributors are interpreting and applying rubrics consistently. Without it, scoring drift occurs: reviewers develop slightly different mental models of what a score means, and inter-rater reliability declines. This SOP defines how sessions should be structured, run, and documented.

Primary Goal
Align contributor scoring to program standards; reduce variance across the team
Secondary Goal
Identify rubric weaknesses, edge cases, and documentation gaps
Tertiary Goal
Build shared scoring vocabulary and culture across the contributor team
Session Types
Type | Audience | Frequency | Duration
Onboarding Calibration | All new contributors before independent work begins | Once, during Week 2 | 60–90 min
Ongoing Calibration | Active contributors on all programs | Weekly or biweekly | 30–45 min
Drift Correction | Contributors flagged for IRR below threshold | Within 5 business days of flag | 30 min 1:1
Rubric Update Calibration | All active contributors on rubric change | Within 48 hrs of update | 45–60 min
Session Format, Step by Step
1 · Pre-Session Material Distribution (24 hrs before)
Distribute 10–15 tasks to all participants. Contributors score independently without discussion and submit scores before the session. This prevents anchoring bias during group discussion.
2 · Opening Review (5 min)
Share IRR data from submitted scores. Identify high-agreement items (skip) and divergent items (discuss); a sketch of this split follows the step list. Do not reveal gold standard scores yet.
3 · Structured Discussion of Divergent Items (20–30 min)
Work through divergent items. Ask contributors to explain their reasoning, not just their score. Surface different rubric interpretations. Facilitator records key disagreements and resolutions.
4 · Gold Standard Reveal & Alignment (10 min)
Share gold standard scores and rationale. If group consensus differs from gold standard in a reasonable way, flag it for rubric documentation review; don't dismiss the alternative reasoning.
5 · Documentation & Action Items (5 min)
Document: date, attendees, pre/post IRR, rubric gaps surfaced, and action items. This feeds directly into training material iteration.
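Step 2's split between high-agreement and divergent items can be scripted. The sketch below partitions items by the spread of submitted scores; the one-point spread cutoff is an assumed convention, not part of the SOP.

```python
def split_items(scores_by_item, max_spread=1):
    """Partition calibration items: small score spread -> skip in session,
    larger spread -> queue for structured discussion."""
    high_agreement, divergent = [], []
    for item_id, scores in scores_by_item.items():
        spread = max(scores) - min(scores)
        (high_agreement if spread <= max_spread else divergent).append(item_id)
    return high_agreement, divergent

# Example: four contributors scoring three calibration items on the 1-4 scale.
submitted = {
    "item-01": [4, 4, 3, 4],
    "item-02": [2, 4, 1, 3],
    "item-03": [3, 3, 3, 3],
}
skip, discuss = split_items(submitted)
print("skip:", skip)        # ['item-01', 'item-03']
print("discuss:", discuss)  # ['item-02']
```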
IRR Thresholds & Actions
Level | Threshold | Action if Below
Cohort | ≥ 0.75 | Run additional session within the week; review rubric for clarity issues
Individual contributor | ≥ 0.70 | Schedule 1:1 drift correction; provide targeted anchor examples for divergent dimensions
Individual repeated failure | 2 sessions below 0.65 | Escalate to program manager; suspend independent access pending remediation plan
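The thresholds above assume an IRR statistic without prescribing one. As one hedged illustration, a minimal Cohen's kappa for two raters, a common pairwise agreement measure, could be computed as below; the function name and example scores are hypothetical, and a real program might prefer a multi-rater statistic such as Krippendorff's alpha.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items on a categorical scale."""
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters gave the same score.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater scored independently from their own marginals.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)

# Example: two contributors scoring the same ten calibration items on the 1-4 scale.
scores_a = [4, 3, 3, 2, 4, 1, 3, 2, 4, 3]
scores_b = [4, 3, 2, 2, 4, 1, 3, 3, 4, 3]
print(f"kappa = {cohens_kappa(scores_a, scores_b):.2f}")  # ~0.71, above the 0.70 bar
```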
Facilitator Pre-Session Checklist
Calibration set prepared with 10–15 items covering all major task types
Gold standard scores and rationale prepared for each item
Pre-session materials distributed 24 hours in advance
Score collection mechanism in place
IRR calculated from submitted scores before session begins
Documentation template ready for notes and action items
Rubric gap log available to capture new issues
Escalation path confirmed for edge cases that may arise

© Lauren McDonald · learningcraftai.com · Sample not for redistribution or reuse

Portfolio · Private Share Only

Live Platform

Learning Craft AI
Course Platform

A fully designed and deployed gamified learning platform built from scratch, demonstrating instructional design principles inside a real, working system. Browse the screenshots below, then launch it yourself.

● LIVE PLATFORM
What's Built Into This Platform
XP and leveling system: contributors earn points for completing lessons and passing knowledge checks (see the sketch after this list)
🏆 Badge framework and leaderboard: gamified recognition that drives engagement and completion
🔥 Streak tracking: behavioral design that reinforces consistent daily learning habits
⚙️ Admin course builder: full content management with lesson editor, knowledge checks, and publishing controls
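As a sketch of how an XP-to-level curve and streak counter could be wired up (illustrative only; the base cost, growth factor, and function names are assumptions, not the platform's actual tuning):

```python
from datetime import date, timedelta

def level_for_xp(xp: int, base: int = 100, growth: float = 1.5) -> int:
    """Level curve: each level costs `growth` times more XP than the previous one."""
    level, cost = 1, base
    while xp >= cost:
        xp -= cost
        level += 1
        cost = int(cost * growth)
    return level

def updated_streak(last_active: date, today: date, streak: int) -> int:
    """Streak continues only if the learner was active yesterday; a gap resets it."""
    if today - last_active == timedelta(days=1):
        return streak + 1
    if today == last_active:
        return streak  # already counted today
    return 1           # missed a day: start over

print(level_for_xp(450))  # -> 3 under these assumed tuning values
```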
Try It Yourself

The demo is open — no account needed. Use the credentials below to explore the full learning experience including lessons, knowledge checks, and the XP system.

Launch Demo →
"I don't design training in the abstract I design it around how AI programs actually run."
— Lauren McDonald · Founder, Learning Craft AI
Lauren McDonald · learningcraftai.com · Private Share · Not for Redistribution