45 commits
dc7426a
feat: Add gradient optimizer based on ProTeGi by "Automatic Prompt Op…
DanielDango Dec 23, 2025
afc5741
feat: implement caching for OpenAI chat models to optimize performance
DanielDango Jan 7, 2026
2adc722
feat: return intermediate results during prompt optimization
DanielDango Jan 7, 2026
778d5f0
feat: encapsulate cache replacement strategies into enum
DanielDango Jan 20, 2026
aecfb43
Merge remote-tracking branch 'upstream/feature/add-prompt-optimizatio…
DanielDango Jan 22, 2026
7fbcaac
Merge branch 'feature/add-prompt-optimization-module' into feature/ad…
DanielDango Feb 16, 2026
52fb042
Merge remote-tracking branch 'upstream/feature/add-prompt-optimizatio…
DanielDango Feb 18, 2026
279d7c3
Revert "revert: remove evaluator from config which is to be added wit…
DanielDango Feb 18, 2026
35afddd
Revert "revert: Evaluator module only required for future gradient op…
DanielDango Feb 18, 2026
bedf3f1
refactor: fix package hierarchy
DanielDango Feb 18, 2026
14e5f71
revert: remove CacheReplacementStrategy.java
DanielDango Feb 18, 2026
23c2750
chore: pull recent changes into ProTeGi implementation
DanielDango Feb 18, 2026
891081c
fix: rename logger
DanielDango Feb 18, 2026
bded850
Merge branch 'feature/add-prompt-optimization-module' into feature/ad…
DanielDango Feb 18, 2026
678fae1
Merge remote-tracking branch 'origin/main' into feature/add-protegi-o…
dfuchss Feb 19, 2026
8eabec5
Spotless applied
dfuchss Feb 19, 2026
6523e69
Make chat language model provider lazy
dfuchss Feb 19, 2026
05a5877
chore: update optimize methods to return lists of optimized prompts
DanielDango Mar 4, 2026
6929d4b
fix: add docs and address APO issues
DanielDango Mar 4, 2026
904d9dd
refactor: clarify evaluator module and rename to Selector
DanielDango Mar 4, 2026
fa8174a
chore: standardize configuration parameter names and improve document…
DanielDango Mar 5, 2026
5093db3
refactor: replace maximum_iterations statistic regex replacement with…
DanielDango Mar 5, 2026
271efd2
chore: extend configurable fields for the AutomaticPromptOptimizer
DanielDango Mar 5, 2026
d333c3e
refactor: rename AutomaticPromptOptimizer.java and GradientOptimizerC…
DanielDango Mar 5, 2026
5564254
Apply suggestion from @dfuchss
dfuchss Mar 6, 2026
dc33542
Update src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/P…
dfuchss Mar 6, 2026
7424af6
refactor: rename evaluator to selector in configuration and related f…
DanielDango Mar 6, 2026
e8b7a06
revert: remove e2e test for sample configs as the datasets are not pa…
DanielDango Mar 6, 2026
bc3c82f
refactor: access the config object directly instead of manipulating j…
DanielDango Mar 6, 2026
a282af2
refactor: pull SamplerFactory.java into SampleStrategy.java
DanielDango Mar 6, 2026
4bf68ff
chore: various tweaks
DanielDango Mar 6, 2026
dff2a0c
revert: remove intermediate statistics as the config fields are not r…
DanielDango Mar 6, 2026
9adb38f
Flip correct flag
dfuchss Mar 18, 2026
ee20a09
Merge remote-tracking branch 'origin/feature/add-protegi-optimization…
DanielDango Apr 1, 2026
df3b657
Merge remote-tracking branch 'upstream/main' into feature/add-protegi…
DanielDango Apr 1, 2026
cdd041b
refactor: remove unused import in EvaluationResult.java
DanielDango Apr 1, 2026
5a47ebd
refactor: improve configuration validation in optimization classes, c…
DanielDango Apr 1, 2026
b90be13
Merge remote-tracking branch 'upstream/main' into feature/add-protegi…
DanielDango Apr 16, 2026
786731f
refactor: make Selector in PromptOptimizer nullable and ensure proper…
DanielDango Jan 20, 2026
d7958fb
docs: enhance prompt optimization documentation for clarity and detail
DanielDango Apr 16, 2026
b3a081e
refactor: simplify Selectors by removing the development artifact moc…
DanielDango Apr 16, 2026
a3b68a3
feat: add support for new ChatRequestOptions in chat method
DanielDango Apr 16, 2026
841f693
refactor: remove nullability from createSelector as mock selector was…
DanielDango Apr 16, 2026
c12a75a
docs: remove MockSelector references from documentation
DanielDango Apr 16, 2026
541723d
Do not expose LazyChatModel
dfuchss Apr 24, 2026
71 changes: 71 additions & 0 deletions docs/prompt-optimization.md
@@ -8,6 +8,39 @@ This also enables us to quantify the importance of well-designed prompts in the

## Core Components

### Overview of Prompt Optimization Subcomponents

The table below provides a brief overview of the subcomponents used in the prompt optimization module.

| Component | **SampleStrategy** | **Selector** | **Metric** |
|------------------------|----------------------------------|------------------------------------------------|--------------------------------|
| **Location** | `promptoptimizer.samplestrategy` | `promptoptimizer.promptselector` | `promptoptimizer.promptmetric` |
| **Purpose** | Select items from a collection | Orchestrate prompt evaluation with budget | Calculate performance scores |
| **Answers** | "Which generic items to use?" | "Which prompts to test when?" | "How good is this prompt?" |
| **Method** | `sample(items, sampleSize)` | `selectAndEvaluate(prompts, examples, metric)` | `getMetric(prompts, examples)` |
| **Algorithm Examples** | First/Ordered/Shuffled | Simple/UCB Bandit | Pointwise/FBeta |

### Sample Strategies (`samplestrategy` package)

A [`SampleStrategy`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/samplestrategy/SampleStrategy.java) determines how to select a subset of items from a collection.
These strategies are used throughout the optimization process to sample items when the full set would be too large or expensive to process.
The key method `sample(items, sampleSize)` returns a list of selected items based on the strategy's selection logic.
In practice, items may be classification examples, candidate prompts, or simple identifiers, depending on the context in which the sampler is used.

Custom sample strategies can be added by implementing the [`SampleStrategy`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/samplestrategy/SampleStrategy.java) interface and integrating them via the static factory method `SampleStrategy.createSampler(...)` defined there.

#### Available Sample Strategies

- **[`First Sampler`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/samplestrategy/FirstSampler.java)** (`first`):
Selects the first n items from the collection without any modification.
Maintains the original order of items.
- **[`Ordered First Sampler`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/samplestrategy/OrderedFirstSampler.java)** (`ordered`):
Sorts items before selecting the first n items.
Ensures deterministic sampling based on the natural ordering of items.
- **[`Shuffled First Sampler`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/samplestrategy/ShuffledFirstSampler.java)** (`shuffled`):
Randomly shuffles items before selecting the first n items.
Provides random sampling with reproducibility through seeded random number generation.
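
The shuffled strategy above can be sketched as follows. This is an illustrative sketch, not the actual `ShuffledFirstSampler` code; the class name, constructor, and seeding scheme are assumptions:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Minimal sketch of a shuffled-first sampler: shuffle with a fixed seed, then take the first n items.
// Names are illustrative; the real ShuffledFirstSampler may differ in signature and seeding.
public class ShuffledSamplerSketch {
    private final long seed;

    public ShuffledSamplerSketch(long seed) {
        this.seed = seed;
    }

    public <T> List<T> sample(List<T> items, int sampleSize) {
        List<T> copy = new ArrayList<>(items);
        // Seeded shuffle makes the "random" selection reproducible across runs.
        Collections.shuffle(copy, new Random(seed));
        return copy.subList(0, Math.min(sampleSize, copy.size()));
    }
}
```

Because the seed is fixed per sampler instance, repeated runs over the same item list yield the same sample, which keeps optimization runs reproducible.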

### Prompt Metrics (`promptmetric` package)

A [`Metric`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/promptmetric/Metric.java) is a numeric measure used to evaluate the quality of prompts during the optimization process.
@@ -30,6 +63,29 @@ Custom metrics can be added either through implementation of the [`Global Metric
- Mean
- **[`Mock Metric`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/promptmetric/MockMetric.java)** (`mock`): Returns dummy values for testing purposes
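
For orientation, an FBeta-style metric combines precision and recall via the F-beta score. The following is a generic sketch of that formula, not the module's actual `FBeta` implementation:

```java
// Generic F-beta computation from true/false positives and false negatives.
// Illustrative only; the module's FBeta metric may expose a different interface.
public class FBetaSketch {
    public static double fBeta(int truePositives, int falsePositives, int falseNegatives, double beta) {
        double precision = truePositives / (double) (truePositives + falsePositives);
        double recall = truePositives / (double) (truePositives + falseNegatives);
        double betaSquared = beta * beta;
        // F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta > 1 weights recall higher.
        return (1 + betaSquared) * precision * recall / (betaSquared * precision + recall);
    }
}
```

With beta = 1 this reduces to the familiar F1 score, the harmonic mean of precision and recall.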

### Selectors (`promptselector` package)

A [`Selector`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/promptselector/Selector.java) orchestrates the evaluation of multiple prompts within a given evaluation budget.
It determines which prompts to test and when, managing the trade-off between exploration (testing new prompts) and exploitation (focusing on promising prompts).
Selectors use the `selectAndEvaluate` method to coordinate prompt evaluation, calling the metric to score prompts against classification examples while respecting budget constraints.

The exact evaluation budget parameters are selector-specific, controlling how many total evaluations can be performed.
This budget management is crucial for expensive LLM-based evaluations.

Custom selectors can be added by implementing the [`Selector`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/promptselector/Selector.java) interface.

#### Available Selectors

- **[`Simple Selector`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/promptselector/SimpleSelector.java)** (`simple`):
Evaluates all provided candidate prompts against a subset of examples.
The sample size is determined by dividing the evaluation budget by the number of prompts.
Examples are shuffled randomly before selection to ensure diverse evaluation.

- **[`Upper Confidence Bound Bandit Selector`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/promptselector/UpperConfidenceBoundBanditSelector.java)** (`ucb`):
Implements a multi-armed bandit approach using the UCB (Upper Confidence Bound) algorithm.
Balances exploration and exploitation by selecting prompts based on both their current performance and uncertainty.
More efficient than simple selection when evaluating many prompts, as it focuses on promising candidates.
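
The core UCB selection rule can be sketched in plain Java as follows. This illustrates the bandit idea only; it is not the actual `UpperConfidenceBoundBanditSelector` code, and the exploration weight and bookkeeping are assumptions:

```java
import java.util.Arrays;

// Sketch of UCB arm selection over candidate prompts.
// totalScore[i] is the summed metric score of prompt i, pulls[i] how often it was evaluated.
// Illustrative only; the real selector manages the evaluation budget internally.
public class UcbSelectionSketch {
    public static int selectArm(double[] totalScore, int[] pulls, double explorationWeight) {
        int totalPulls = Arrays.stream(pulls).sum();
        int best = -1;
        double bestUcb = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < pulls.length; i++) {
            if (pulls[i] == 0) {
                return i; // always evaluate untested prompts first
            }
            double mean = totalScore[i] / pulls[i];
            // Exploitation (mean score) plus an exploration bonus that shrinks with more pulls.
            double ucb = mean + explorationWeight * Math.sqrt(Math.log(totalPulls) / pulls[i]);
            if (ucb > bestUcb) {
                bestUcb = ucb;
                best = i;
            }
        }
        return best;
    }
}
```

Each evaluation round picks the arm (prompt) with the highest upper confidence bound, so rarely tested prompts keep getting chances until their uncertainty shrinks.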

### Optimizers (`promptoptimizer` package)

The [`Optimizer`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/PromptOptimizer.java) module handles prompt optimization requests.
@@ -56,6 +112,21 @@ Custom optimizers can be added by implementing the [`Prompt Optimizer`](../src/m
In each iteration, it queries the model with an additional feedback text on the current prompt.
The optimizer carries the optimized prompt to the next iteration naively.
Trace links that were incorrectly classified in previous iterations are highlighted in the feedback text to guide the model towards better performance.
- **[`ProTeGi Optimizer`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/ProTeGiOptimizer.java)** (`protegi`):
An advanced optimizer based on textual gradient descent for large language models, following the approach by Pryzant et al. (2023).
Uses textual gradients derived from error analysis to systematically refine prompts.
In each iteration:
1. **Candidate Expansion**: Generates multiple candidate prompt variations
- Analyzes why the current prompt misclassifies examples (textual gradients)
- Creates transformations based on these error patterns
- Generates synonym variations to explore the prompt space
2. **Candidate Evaluation**: Uses the configured selector and metric to evaluate all candidate prompts
- Selector decides which candidate prompts to test and on how many examples (budget-aware)
- Metric scores each candidate prompt's performance
3. **Best Selection**: Selects the top-performing candidate prompts (beam size) for the next iteration

Example flow: the current prompt reaches 70% accuracy → the optimizer generates 20 candidates → evaluates them within the limited budget → selects the top 4 for the next iteration.

- **[`Mock Optimizer`](../src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/MockOptimizer.java)** (`mock`): Returns dummy optimized prompts for testing purposes
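
The expand-evaluate-select loop described for ProTeGi can be sketched as a single beam-search step. The functional parameters stand in for the LLM-backed gradient/paraphrase generation and the selector/metric evaluation; all names here are illustrative, not the actual `ProTeGiOptimizer` API:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

// Sketch of one ProTeGi-style beam iteration: expand each beam prompt into candidates,
// score all candidates, keep the best beamSize prompts for the next iteration.
// expand stands in for textual-gradient + paraphrase generation; score for selector + metric.
public class ProTeGiIterationSketch {
    public static List<String> iterate(
            List<String> beam,
            Function<String, List<String>> expand,
            ToDoubleFunction<String> score,
            int beamSize) {
        List<String> candidates = new ArrayList<>(beam);
        for (String prompt : beam) {
            candidates.addAll(expand.apply(prompt));
        }
        // Best-scoring candidates first; keep only the beam for the next iteration.
        candidates.sort(Comparator.comparingDouble(score).reversed());
        return candidates.subList(0, Math.min(beamSize, candidates.size()));
    }
}
```

Running this step repeatedly, with the beam from one iteration feeding the next, reproduces the textual-gradient-descent loop in miniature.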

## Configuration
82 changes: 82 additions & 0 deletions example-configs/gradient-optimizer-config.json
@@ -0,0 +1,82 @@

{
"cache_dir": "./cache/WARC",

"gold_standard_configuration": {
"path": "./datasets/req2req/WARC/answer.csv",
"hasHeader": "true"
},

"source_artifact_provider" : {
"name" : "text",
"args" : {
"artifact_type" : "requirement",
"path" : "./datasets/req2req/WARC/high"
}
},
"target_artifact_provider" : {
"name" : "text",
"args" : {
"artifact_type" : "requirement",
"path" : "./datasets/req2req/WARC/low"
}
},
"source_preprocessor" : {
"name" : "artifact",
"args" : {}
},
"target_preprocessor" : {
"name" : "artifact",
"args" : {}
},
"embedding_creator" : {
"name" : "openai",
"args" : {
"model": "text-embedding-3-large"
}
},
"source_store" : {
"name" : "custom",
"args" : {}
},
"target_store" : {
"name" : "cosine_similarity",
"args" : {
"max_results" : "4"
}
},
"metric" : {
"name" : "pointwise",
"args" : {}
},
"selector" : {
"name" : "ucb",
"args" : {
"samples_per_eval" : "16"
}
},
"prompt_optimizer": {
"name" : "gradient_openai",
"args" : {
"prompt": "Question: Here are two parts of software development artifacts.\n\n {source_type}: '''{source_content}'''\n\n {target_type}: '''{target_content}'''\n Are they related?\n\n Answer with 'yes' or 'no'.",
"model": "gpt-4o-mini-2024-07-18",
"maximum_iterations": 3,
"minibatch_size" : "20"
}
Review comment on lines +63 to +65 (Member):

> Numeric values are inconsistently quoted across this config file: maximum_iterations uses a bare integer while minibatch_size uses a quoted string. Both are read via argumentAsInt which coerces strings, so it works, but the inconsistency is confusing. Please standardise — either always use bare JSON numbers or always quote them.

},
"classifier" : {
"name" : "simple_openai",
"args" : {
"model": "gpt-4o-mini-2024-07-18",
"temperature": 0.0
}
},
"result_aggregator" : {
"name" : "any_connection",
"args" : {}
},
"tracelinkid_postprocessor" : {
"name" : "identity",
"args" : {}
}
}
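
The review comment above notes that both bare and quoted numbers work because the config reader coerces strings to integers. A minimal sketch of such lenient parsing (illustrative, not the project's actual `argumentAsInt` implementation):

```java
// Lenient integer parsing that accepts both bare JSON numbers and quoted strings,
// mirroring the coercion behavior the review comment describes.
// Illustrative only; the project's actual argumentAsInt may differ.
public class LenientArgParsing {
    public static int argumentAsInt(Object value) {
        if (value instanceof Number number) {
            return number.intValue(); // bare JSON number, e.g. 3
        }
        return Integer.parseInt(value.toString().trim()); // quoted string, e.g. "20"
    }
}
```

Coercion like this keeps configs working either way, but standardizing on one quoting style, as the reviewer asks, avoids confusion.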
@@ -77,13 +77,16 @@ public void run() {
         }
 
         for (Path optimizationConfig : configsToOptimize) {
-            String optimizedPrompt = runOptimization(optimizationConfig);
-            if (optimizedPrompt.isEmpty()) {
+            List<String> optimizedPrompts = runOptimizations(optimizationConfig);
+            if (optimizedPrompts.isEmpty()) {
                 logger.warn(
-                        "Skipping evaluation for optimization config '{}' as no optimized prompt was generated.",
+                        "Skipping evaluation for optimization config '{}' as no optimized prompt was generated. "
+                                + "This can happen when the optimizer terminates early (e.g., due to configuration such "
+                                + "as zero iterations) or when a mock optimizer is used.",
                         optimizationConfig);
                 continue;
             }
+            String optimizedPrompt = optimizedPrompts.getLast();
             for (Path evaluationConfig : configsToEvaluate) {
                 runEvaluation(evaluationConfig, optimizedPrompt);
             }
@@ -94,21 +97,21 @@
      * Runs the optimization pipeline using the specified configuration file.
      *
      * @param optimizationConfig The path to the optimization configuration file
-     * @return The optimized prompt generated by the optimization pipeline
+     * @return The optimized prompts generated by the optimization pipeline
      */
-    private static String runOptimization(Path optimizationConfig) {
+    private static List<String> runOptimizations(Path optimizationConfig) {
         logger.info("Invoking the optimization pipeline with '{}'", optimizationConfig);
-        String optimizedPrompt = "";
+        List<String> optimizedPrompts = List.of();
         try {
             var optimization = new Optimization(optimizationConfig);
-            optimizedPrompt = optimization.run();
+            optimizedPrompts = optimization.run();
         } catch (IOException e) {
             logger.warn(
                     "Optimization configuration '{}' threw an exception: {} \n Maybe the file does not exist?",
                     optimizationConfig,
                     e.getMessage());
         }
-        return optimizedPrompt;
+        return optimizedPrompts;
     }

private static void runEvaluation(Path evaluationConfig, String optimizedPrompt) {
33 changes: 24 additions & 9 deletions src/main/java/edu/kit/kastel/sdq/lissa/ratlr/Optimization.java
@@ -5,6 +5,7 @@
 
 import java.io.IOException;
 import java.nio.file.Path;
+import java.util.List;
 import java.util.Objects;
 import java.util.Set;
 
@@ -18,6 +19,7 @@
 import edu.kit.kastel.sdq.lissa.ratlr.knowledge.TraceLink;
 import edu.kit.kastel.sdq.lissa.ratlr.promptoptimizer.PromptOptimizer;
 import edu.kit.kastel.sdq.lissa.ratlr.promptoptimizer.promptmetric.Metric;
+import edu.kit.kastel.sdq.lissa.ratlr.promptoptimizer.promptselector.Selector;
 
 /**
  * Represents a single prompt optimization run of the LiSSA framework.
@@ -68,7 +70,7 @@ public Optimization(Path configFile) throws IOException {
      * <ol>
      * <li>Loads the configuration from the specified file</li>
      * <li>Initializes the evaluation pipeline</li>
-     * <li>Creates the Metric, Evaluator and Optimizer</li>
+     * <li>Creates the Metric, Selector and Optimizer</li>
      * </ol>
      *
      * @throws IOException If there are issues reading the configuration
@@ -84,8 +86,13 @@ private void setup() throws IOException {
                 evaluationPipeline.getClassifier(),
                 evaluationPipeline.getAggregator(),
                 evaluationPipeline.getTraceLinkIdPostProcessor());
+        Selector selector = null;
+        if (configuration.selector() != null) {
+            selector = Selector.createSelector(configuration.selector());
+        }
 
-        promptOptimizer = PromptOptimizer.createOptimizer(configuration.promptOptimizer(), goldStandard, metric);
+        promptOptimizer =
+                PromptOptimizer.createOptimizer(configuration.promptOptimizer(), goldStandard, metric, selector);
         configuration.serializeAndDestroyConfiguration();
     }
 
@@ -95,24 +102,32 @@
      * <ol>
      * <li>Sets up the source and target stores</li>
      * <li>Optimizes the prompt using the configured optimizer</li>
-     * <li>Generates and saves optimization statistics</li>
+     * <li>Generates and saves optimization statistics for the final prompt</li>
      * <li>Flushes the cache to persist changes</li>
      * </ol>
      *
-     * @return The optimized prompt as a String
+     * @return A list of prompts representing the optimization state at each iteration,
+     *         where the last element is the final optimized prompt
      */
-    public String run() {
+    public List<String> run() {
         evaluationPipeline.initializeSourceAndTargetStores();
 
         logger.info("Optimizing Prompt");
-        String result =
-                promptOptimizer.optimize(evaluationPipeline.getSourceStore(), evaluationPipeline.getTargetStore());
-        logger.info("Optimized Prompt: {}", result);
-
-        Statistics.generateOptimizationStatistics(configFile.toFile(), configuration, result);
+        List<String> results =
+                promptOptimizer.optimize(evaluationPipeline.getSourceStore(), evaluationPipeline.getTargetStore());
+
+        if (results.isEmpty()) {
+            logger.warn("No optimized prompt was generated. Make sure maximum_iterations is set to greater than zero.");
+            return results;
+        }
+
+        Statistics.generateOptimizationStatistics(configFile.toFile(), configuration, results.getLast());
+
+        logger.info("Optimized prompt after {} steps: \n {}", results.size(), results.getLast());
 
         CacheManager.getDefaultInstance().flush();
 
-        return result;
+        return results;
     }
 }