6 changes: 6 additions & 0 deletions evals/prompts/CHANGELOG.md
@@ -6,6 +6,12 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

---

## [1.3.0] - 2026-03-18

### Added
- `subject-matter-knowledge/system.txt` — system prompt for the SMK evaluator
- `subject-matter-knowledge/user.txt` — user prompt for the SMK evaluator

## [1.2.0] - 2026-02-19

### Added
69 changes: 69 additions & 0 deletions evals/prompts/subject-matter-knowledge/system.txt
@@ -0,0 +1,69 @@

To perform the task of evaluating text complexity based on Subject Matter Knowledge (SMK), strictly adhere to the following instructions.
Role
You are an expert K-12 Literacy Pedagogue and Text Complexity Evaluator. Your specific focus is analyzing Subject Matter Knowledge (SMK) demands according to the Common Core Qualitative Text Complexity Rubric.
Objective
Analyze a provided text relative to a target grade_level. You must determine the extent of background knowledge required to comprehend the text. You must distinguish between Common/Standard knowledge (generally lower/moderate complexity) and Specialized/Theoretical knowledge (generally higher complexity).
Input Data
text: The passage to analyze.
grade_level: The target student grade (integer).
fk_score: Flesch-Kincaid Grade Level. Note: Use this only as a loose proxy for sentence structure. Do not let a high FK score artificially inflate the Subject Matter Knowledge score if the concepts remain simple.

1. The Rubric: Subject Matter Knowledge (SMK)
1. Slightly Complex
Scope: Everyday practical knowledge and introductory skills.
Concept Type: Concrete, directly observable, and familiar.
Key Indicator: "How-to" texts involving familiar objects (e.g., drawing a cupboard, playing a game, family life). Even if specific terms (like "scale" or "measure") are used, if the application is on a common object, it remains Slightly Complex.
2. Moderately Complex
Scope: Common Discipline-Specific Knowledge or Narrative History.
Definition: Topics widely introduced in K-8 curricula (Basic American History, Geography, Earth Science, Biology).
Key Characteristic: The text bridges concrete descriptions with abstract themes (e.g., using farming to discuss justice), OR narrates historical events via sensory details.
Spatial Reasoning: Texts requiring mental manipulation of maps/routes are generally Moderate, unless the object is a familiar household item (see Slightly Complex).
3. Very Complex
Scope: Specialized Discipline-Specific, Engineering Mechanics, or Political Theory.
Definition: Topics characteristic of High School (9-12) curricula requiring abstract mental models.
Key Characteristic: Requires understanding mechanisms (how physics works/propulsion), chemical composition, or undefined political stakes (specific treaties, alliances, or secularization without context).
4. Exceedingly Complex
Scope: Professional or Academic knowledge.

2. The Expert Mental Model (Decision Logic)
Use these refined rules to categorize cases.
Rule A: The "Layers of Meaning" Check
Concrete -> Abstract (Moderate): The text describes concrete things (farming) to argue an abstract point (justice, rights).
Concrete -> Concrete (Slightly): The text describes concrete things (lines, paper) to achieve a concrete result (drawing a cupboard). Do not over-rank practical instructions.
Rule B: The Science & Engineering Boundary
Observational (Moderate): Habitats, Water Cycle, observable traits, simple definitions.
Mechanistic/Theoretical (Very): Engineering mechanics (how propulsion works via reaction), Instrumentation (using a spectroscope), or Chemical/Atomic theory.
Test: Does the text explain how a machine functions using physical principles? If yes, it is Very Complex.
Rule C: The History/Social Studies Boundary
General/Narrative (Moderate):
Sensory: Battle descriptions focusing on sights/sounds (flashes, smoke).
Standard Topics: Immigration, Slavery, Government, Geography. Lists of nationalities or religions are "Common Knowledge" for Grades 6-8.
Political/Contextual (Very):
Implicit Context: Texts assuming knowledge of specific political factions, treaties, or the causes of events without explanation (e.g., "The Allies," "The Front," "The secularization of the clergy").
Test: If the reader must know why two groups are fighting or the specific political history of a revolution to understand the text, it is Very Complex.
Rule D: The "Technical vs. Practical" Trap
Scenario: A text teaches a technical skill (e.g., Technical Drawing/Technology) but applies it to a familiar object (a cupboard).
Decision: Slightly Complex.
Reasoning: Do not confuse "Technical Vocabulary" (scale, thick lines) with "Theoretical Complexity." If the underlying concept is familiar (furniture), the SMK load is low.

3. Critical Calibration Examples
Text: "Make a rough sketch... How many shelves should the cupboard have?" (Grade 2) -> Slightly Complex.
Reasoning: (Rule D/Rule A) Although it mentions "scale" and "technology," the task is concrete and relies on everyday knowledge.
Text: "Hydraulic propulsion works by sucking water at the bow and forcing it sternward." (Grade 10) -> Very Complex.
Reasoning: (Rule B) Explains a mechanism using physics principles.
Text: "The Allies fight the enemy's cavalry; we remember the hospitality to priests during the Revolution." (Grade 6) -> Very Complex.
Reasoning: (Rule C) Assumes undefined knowledge of WWI alliances and the specific political history of the French Revolution.
Text: "Immigrants from Poland, Italy, and Russia arrived. Most were Catholic or Orthodox." (Grade 7) -> Moderately Complex.
Reasoning: (Rule C) Standard K-8 topic. Lists of nationalities are content vocabulary, not specialized theory.

4. Output Format
Return your analysis in a valid JSON object. Do not include markdown formatting.
Keys:
- identified_topics: List[str] identifying the core subjects.
- curriculum_check: String explaining whether the topics are "Standard/General" (typical for K-8) or "Specialized/High School" (typical for 9-12).
- assumptions_and_scaffolding: String analyzing what the author assumes the reader knows vs what is explained.
- friction_analysis: String discussing the gap between Concrete description and Abstract meaning.
- complexity_score: String (One of: slightly_complex, moderately_complex, very_complex, exceedingly_complex).
- reasoning: String synthesizing the decision.

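The Output Format section of this prompt defines a fixed JSON contract. A minimal TypeScript sketch of checking a model reply against it follows; `SmkRawOutput`, `isSmkRawOutput`, and `parseSmkOutput` are hypothetical names, not exports of `@learning-commons/evaluators`, and the SDK's own parsing may differ.

```typescript
// Hypothetical shape of the raw JSON the SMK prompt asks the model to return.
// Field names come from the prompt's "Output Format" section; the type and
// function names are illustrative only.
type SmkComplexityScore =
  | 'slightly_complex'
  | 'moderately_complex'
  | 'very_complex'
  | 'exceedingly_complex';

interface SmkRawOutput {
  identified_topics: string[];
  curriculum_check: string;
  assumptions_and_scaffolding: string;
  friction_analysis: string;
  complexity_score: SmkComplexityScore;
  reasoning: string;
}

const SMK_SCORES: readonly string[] = [
  'slightly_complex',
  'moderately_complex',
  'very_complex',
  'exceedingly_complex',
];

const STRING_FIELDS = [
  'curriculum_check',
  'assumptions_and_scaffolding',
  'friction_analysis',
  'reasoning',
] as const;

// Returns true only if `value` has every required key with the expected type.
function isSmkRawOutput(value: unknown): value is SmkRawOutput {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  if (
    !Array.isArray(v.identified_topics) ||
    !v.identified_topics.every((t) => typeof t === 'string')
  ) {
    return false;
  }
  if (!STRING_FIELDS.every((k) => typeof v[k] === 'string')) return false;
  return typeof v.complexity_score === 'string' && SMK_SCORES.includes(v.complexity_score);
}

// Parses the model's reply; throws if it is not JSON or breaks the contract.
function parseSmkOutput(raw: string): SmkRawOutput {
  const parsed: unknown = JSON.parse(raw);
  if (!isSmkRawOutput(parsed)) {
    throw new Error('SMK output does not match the expected schema');
  }
  return parsed;
}
```

Validating the label set explicitly matters because the prompt asks for snake_case scores; a reply using display labels such as "Very complex" would be rejected here.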
4 changes: 4 additions & 0 deletions evals/prompts/subject-matter-knowledge/user.txt
@@ -0,0 +1,4 @@
Analyze:
Text: {text}
Grade: {grade}
FK Score: {fk_score}
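The user prompt fills three placeholders, one of which is a Flesch-Kincaid Grade Level. As a sketch, assuming the standard formula 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59 and a rough vowel-group syllable heuristic (the SDK may compute `fk_score` differently), the template could be rendered as below. `countSyllables`, `fleschKincaidGrade`, and `renderUserPrompt` are hypothetical names.

```typescript
// Approximate syllable count: vowel groups, minus a trailing silent "e".
function countSyllables(word: string): number {
  const w = word.toLowerCase().replace(/[^a-z]/g, '');
  if (w.length === 0) return 0;
  const groups = w.match(/[aeiouy]+/g)?.length ?? 0;
  const silentE = w.endsWith('e') && !w.endsWith('le') && groups > 1 ? 1 : 0;
  return Math.max(1, groups - silentE);
}

// Flesch-Kincaid Grade Level:
// 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
function fleschKincaidGrade(text: string): number {
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim().length > 0).length;
  const words = text.split(/\s+/).filter((w) => /[a-z]/i.test(w));
  if (sentences === 0 || words.length === 0) return 0;
  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  return 0.39 * (words.length / sentences) + 11.8 * (syllables / words.length) - 15.59;
}

// Fills the {text}, {grade}, and {fk_score} placeholders used in user.txt.
function renderUserPrompt(template: string, text: string, grade: string): string {
  const values: Record<string, string> = {
    text,
    grade,
    fk_score: fleschKincaidGrade(text).toFixed(1),
  };
  // A replacer function avoids "$" patterns in the passage being interpreted.
  return template.replace(/\{(text|grade|fk_score)\}/g, (_, key: string) => values[key]);
}
```

Substituting in a single `replace` pass means a literal `{grade}` that happens to appear inside the passage is not itself expanded.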
10 changes: 10 additions & 0 deletions sdks/typescript/CHANGELOG.md
@@ -2,6 +2,16 @@

All notable changes to the `@learning-commons/evaluators` TypeScript SDK will be documented in this file.

## [0.2.0] — 2026-03-18

### Added

- **Subject Matter Knowledge (SMK) Evaluator** — evaluates background knowledge demands of educational texts relative to grades 3–12.
- **SMK added to TextComplexityEvaluator** — composite evaluator now runs vocabulary, sentence structure, and SMK in parallel; result includes `subjectMatterKnowledge` key.
- **Prompt versioning** — prompts updated to v1.3.0 (`evals/prompts/subject-matter-knowledge/`).

---

## [0.1.0] — Early Release

Initial early release of the TypeScript SDK for Learning Commons educational evaluators.
108 changes: 94 additions & 14 deletions sdks/typescript/README.md
@@ -117,19 +117,18 @@ await evaluator.evaluate(text: string, grade: string)

---

-### 3. Text Complexity Evaluator
### 3. Subject Matter Knowledge (SMK) Evaluator

-Composite evaluator that analyzes both vocabulary and sentence structure complexity in parallel.
Evaluates the background knowledge demands of educational texts relative to grade level. Determines how much prior subject knowledge a student needs to comprehend the text, based on the Common Core Qualitative Text Complexity Rubric.

**Supported Grades:** 3-12

-**Uses:** Google Gemini 2.5 Pro + OpenAI GPT-4o (composite)
**Uses:** Google Gemini 3 Flash Preview

**Constructor:**
```typescript
-const evaluator = new TextComplexityEvaluator({
const evaluator = new SmkEvaluator({
googleApiKey?: string; // Google API key (required by this evaluator)
-openaiApiKey?: string; // OpenAI API key (required by this evaluator)
maxRetries?: number; // Optional - Max retry attempts (default: 2)
telemetry?: boolean | TelemetryOptions; // Optional (default: true)
logger?: Logger; // Optional - Custom logger
@@ -145,23 +144,103 @@ await evaluator.evaluate(text: string, grade: string)
**Returns:**
```typescript
{
-score: {
-overall: string; // Overall complexity (highest of the two)
-vocabulary: string; // Vocabulary complexity score
-sentenceStructure: string; // Sentence structure complexity score
score: 'Slightly complex' | 'Moderately complex' | 'Very complex' | 'Exceedingly complex';
reasoning: string;
metadata: {
model: string;
processingTimeMs: number;
};
-reasoning: string; // Combined reasoning from both evaluators
-metadata: EvaluationMetadata;
_internal: {
-vocabulary: EvaluationResult | { error: Error };
-sentenceStructure: EvaluationResult | { error: Error };
identified_topics: string[];
curriculum_check: string;
assumptions_and_scaffolding: string;
friction_analysis: string;
complexity_score: 'Slightly complex' | 'Moderately complex' | 'Very complex' | 'Exceedingly complex';
reasoning: string;
};
}
```

**Example:**
```typescript
import { SmkEvaluator } from '@learning-commons/evaluators';

const evaluator = new SmkEvaluator({
googleApiKey: process.env.GOOGLE_API_KEY,
});

const result = await evaluator.evaluate(
"Hydraulic propulsion works by sucking water at the bow and forcing it sternward.",
"10"
);
console.log(result.score); // "Very complex"
console.log(result.reasoning);
console.log(result._internal.identified_topics); // e.g. ["hydraulics", "propulsion", "physics"]
```
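The SMK prompt asks the model for snake_case labels (`slightly_complex` and so on), while the `score` type documented here uses display labels. A minimal sketch of normalizing between the two and ordering scores by the rubric, assuming hypothetical `toDisplayLabel` and `isMoreComplex` helpers; the SDK's actual mapping is not part of this diff.

```typescript
// Raw labels requested by the SMK prompt's output format.
type RawSmkScore = 'slightly_complex' | 'moderately_complex' | 'very_complex' | 'exceedingly_complex';

// Display labels used by SmkEvaluator's `score` field in this README.
type SmkScoreLabel = 'Slightly complex' | 'Moderately complex' | 'Very complex' | 'Exceedingly complex';

// Hypothetical lookup table; the SDK's real normalization may differ.
const SCORE_LABELS: Record<RawSmkScore, SmkScoreLabel> = {
  slightly_complex: 'Slightly complex',
  moderately_complex: 'Moderately complex',
  very_complex: 'Very complex',
  exceedingly_complex: 'Exceedingly complex',
};

// Rubric order, lowest to highest.
const SCORE_ORDER: readonly SmkScoreLabel[] = [
  'Slightly complex',
  'Moderately complex',
  'Very complex',
  'Exceedingly complex',
];

function toDisplayLabel(raw: RawSmkScore): SmkScoreLabel {
  return SCORE_LABELS[raw];
}

// True if `a` is strictly more complex than `b` on the rubric.
function isMoreComplex(a: SmkScoreLabel, b: SmkScoreLabel): boolean {
  return SCORE_ORDER.indexOf(a) > SCORE_ORDER.indexOf(b);
}
```

Typing the table as `Record<RawSmkScore, SmkScoreLabel>` makes the compiler flag any rubric level that is added to one union but not mapped.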

---

### 4. Text Complexity Evaluator

Composite evaluator that analyzes vocabulary, sentence structure, and subject matter knowledge complexity in parallel.

**Supported Grades:** 3-12

**Uses:** Google Gemini 2.5 Pro + Google Gemini 3 Flash Preview + OpenAI GPT-4o (composite)

**Constructor:**
```typescript
const evaluator = new TextComplexityEvaluator({
googleApiKey?: string; // Google API key (required by this evaluator)
openaiApiKey?: string; // OpenAI API key (required by this evaluator)
maxRetries?: number; // Optional - Max retry attempts (default: 2)
telemetry?: boolean | TelemetryOptions; // Optional (default: true)
logger?: Logger; // Optional - Custom logger
logLevel?: LogLevel; // Optional - Logging verbosity (default: WARN)
});
```

**API:**
```typescript
await evaluator.evaluate(text: string, grade: string)
```

**Returns:**
```typescript
{
vocabulary: EvaluationResult<TextComplexityLevel> | { error: Error };
sentenceStructure: EvaluationResult<TextComplexityLevel> | { error: Error };
subjectMatterKnowledge: EvaluationResult<TextComplexityLevel> | { error: Error };
}
```

Each sub-evaluator result is either a full `EvaluationResult` or, if that sub-evaluator failed, `{ error: Error }`. `evaluate()` throws only if all three sub-evaluators fail.
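This partial-failure contract maps naturally onto `Promise.allSettled` (ES2020). A sketch under stated assumptions: the three sub-evaluators are passed in as plain async functions, `SubResult` models only `score`, and `evaluateComposite` is a hypothetical name; it illustrates the documented behavior, not the SDK's implementation.

```typescript
// Minimal stand-in for the SDK's result type (assumption: only `score` is modeled).
interface SubResult {
  score: string;
}

type Settled<T> = T | { error: Error };

interface CompositeResult {
  vocabulary: Settled<SubResult>;
  sentenceStructure: Settled<SubResult>;
  subjectMatterKnowledge: Settled<SubResult>;
}

type SubEvaluator = (text: string, grade: string) => Promise<SubResult>;

// Converts a settled promise into either its value or an `{ error }` object.
function unwrap(result: PromiseSettledResult<SubResult>): Settled<SubResult> {
  if (result.status === 'fulfilled') return result.value;
  const reason: unknown = result.reason;
  return { error: reason instanceof Error ? reason : new Error(String(reason)) };
}

// Runs all three sub-evaluators in parallel; throws only if every one fails.
async function evaluateComposite(
  text: string,
  grade: string,
  vocabulary: SubEvaluator,
  sentenceStructure: SubEvaluator,
  smk: SubEvaluator,
): Promise<CompositeResult> {
  const settled = await Promise.allSettled([
    vocabulary(text, grade),
    sentenceStructure(text, grade),
    smk(text, grade),
  ]);
  if (settled.every((r) => r.status === 'rejected')) {
    throw new Error('All text complexity sub-evaluators failed');
  }
  const [vocab, sentence, knowledge] = settled.map(unwrap);
  return { vocabulary: vocab, sentenceStructure: sentence, subjectMatterKnowledge: knowledge };
}
```

Using `allSettled` rather than `Promise.all` is what lets one failing model call surface as `{ error }` without discarding the other two results.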

**Example:**
```typescript
import { TextComplexityEvaluator } from '@learning-commons/evaluators';

const evaluator = new TextComplexityEvaluator({
googleApiKey: process.env.GOOGLE_API_KEY,
openaiApiKey: process.env.OPENAI_API_KEY,
});

const result = await evaluator.evaluate("Your text here", "6");

if (!('error' in result.vocabulary)) {
console.log('Vocabulary:', result.vocabulary.score);
}
if (!('error' in result.sentenceStructure)) {
console.log('Sentence structure:', result.sentenceStructure.score);
}
if (!('error' in result.subjectMatterKnowledge)) {
console.log('Subject matter knowledge:', result.subjectMatterKnowledge.score);
}
```

---

-### 4. Grade Level Appropriateness Evaluator
### 5. Grade Level Appropriateness Evaluator

Determines appropriate grade level for text.

@@ -308,6 +387,7 @@ interface BaseEvaluatorConfig {
**Note:** Which API keys are required depends on the evaluator. The SDK validates required keys at runtime based on the evaluator's metadata:
- **Vocabulary**: Requires both `googleApiKey` and `openaiApiKey`
- **Sentence Structure**: Requires `openaiApiKey` only
- **Subject Matter Knowledge**: Requires `googleApiKey` only
- **Text Complexity**: Requires both `googleApiKey` and `openaiApiKey`
- **Grade Level Appropriateness**: Requires `googleApiKey` only
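The per-evaluator key requirements above can be expressed as a lookup table checked before any model call. A minimal sketch; the evaluator IDs, `REQUIRED_KEYS`, `missingKeys`, and `validateKeys` are hypothetical, since the SDK derives this list from each evaluator's metadata.

```typescript
type ApiKeyName = 'googleApiKey' | 'openaiApiKey';

// Hypothetical table mirroring the README list; not read from real SDK metadata.
const REQUIRED_KEYS: Record<string, readonly ApiKeyName[]> = {
  vocabulary: ['googleApiKey', 'openaiApiKey'],
  sentenceStructure: ['openaiApiKey'],
  subjectMatterKnowledge: ['googleApiKey'],
  textComplexity: ['googleApiKey', 'openaiApiKey'],
  gradeLevelAppropriateness: ['googleApiKey'],
};

interface KeyConfig {
  googleApiKey?: string;
  openaiApiKey?: string;
}

// Returns the required keys that are missing or empty for the given evaluator.
function missingKeys(evaluator: string, config: KeyConfig): ApiKeyName[] {
  const required = REQUIRED_KEYS[evaluator];
  if (required === undefined) throw new Error(`Unknown evaluator: ${evaluator}`);
  return required.filter((key) => !config[key]);
}

// Throws one error listing every missing key, so callers can fix them at once.
function validateKeys(evaluator: string, config: KeyConfig): void {
  const missing = missingKeys(evaluator, config);
  if (missing.length > 0) {
    throw new Error(`${evaluator} requires: ${missing.join(', ')}`);
  }
}
```

For example, `validateKeys('subjectMatterKnowledge', { googleApiKey: 'g' })` passes, while the same config for `textComplexity` reports the missing `openaiApiKey`.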

5 changes: 5 additions & 0 deletions sdks/typescript/src/evaluators/index.ts
@@ -20,6 +20,11 @@ export {
evaluateGradeLevelAppropriateness,
} from './grade-level-appropriateness.js';

export {
SmkEvaluator,
evaluateSmk,
} from './smk.js';

export {
TextComplexityEvaluator,
evaluateTextComplexity,