LiSSA uses JSON configuration files to define the behavior of the traceability link recovery process. This guide provides detailed information about available configuration options.
All pipeline components (artifact providers, preprocessors, embedding creators, classifiers, aggregators, and postprocessors) can access shared context via a ContextStore. This context mechanism is handled automatically by the framework and does not require explicit configuration in most cases.
Configuration options in LiSSA are defined in the code through several mechanisms:
- Component Classes: Each component (e.g., `ArtifactProvider`, `Preprocessor`, `Classifier`) has a corresponding class that defines its configuration options. For example:
  - `TextArtifactProvider` defines options for text-based artifact loading
  - `CodeTreePreprocessor` defines options for code tree processing
  - `OpenAiEmbeddingCreator` defines options for OpenAI embedding generation
  - `OpenWebUiEmbeddingCreator` defines options for Open WebUI embedding generation
- Configuration Classes: The `Configuration` class serves as the central configuration container, defining the structure of the configuration file.
- Example Configurations: The `example-configs` directory contains example configurations that demonstrate different setups for various use cases.
- Configuration Template: The `config-template.json` file provides a template with all available configuration options and their default values.
The general settings configure result caching and the gold standard used for evaluation:

```json
{
  "cache_dir": "./cache/path", // Directory for caching results
  "gold_standard_configuration": {
    "path": "path/to/answer.csv", // Path to ground truth file
    "hasHeader": false // Whether the CSV has a header
  }
}
```

Artifact providers determine how the artifacts to be traced are loaded:

```json
{
  "source_artifact_provider": {
    "name": "text", // or "recursive_text"
    "args": {
      "artifact_type": "requirement", // Type of artifact
      "path": "path/to/artifacts", // Path to artifacts
      "extensions": "java" // For recursive_text provider (see the sketch below)
    }
  }
}
```
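For code artifacts, the `recursive_text` provider loads files recursively from a directory, filtered by extension. A minimal sketch; the `artifact_type` and `path` values are illustrative placeholders, not prescribed values:

```json
{
  "source_artifact_provider": {
    "name": "recursive_text",
    "args": {
      "artifact_type": "source code", // Illustrative; use the artifact type fitting your project
      "path": "path/to/src", // Root directory to scan recursively
      "extensions": "java" // Only files with this extension are loaded
    }
  }
}
```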
"source_preprocessor": {
"name": "artifact", // or "code_tree", "code_chunking", etc.
"args": {
"language": "JAVA", // For code processors
"chunk_size": 60, // For chunking
"compare_classes": false, // For code tree
"includeUsages": true, // For UML processor
"includeOperations": true, // For UML processor
"includeInterfaceRealizations": true // For UML processor
}
}
}This section describes how to configure the embedding creation and classification steps. You must configure either a single classifier or a list of classifiers for multi-stage pipelines.
All classifier and embedding creator instances receive access to the shared ContextStore, enabling advanced scenarios such as sharing intermediate results or configuration between pipeline stages.
Use the `classifier` field to configure a single classifier.
```json
{
  "embedding_creator": {
    "name": "openai", // or "ollama", "openwebui", "onnx", "mock"
    "args": {
      "model": "text-embedding-3-large"
    }
  },
  "classifier": {
    "name": "reasoning_openai", // or "simple_openai", "reasoning_ollama", "simple_ollama", "reasoning_openwebui", "simple_openwebui", "reasoning_blablador", "simple_blablador", "reasoning_deepseek", "simple_deepseek", "mock"
    "args": {
      "model": "gpt-4o-mini-2024-07-18"
      // ... other classifier-specific arguments
    }
  }
}
```

Use the `classifiers` field to define a pipeline of classification stages. This field takes a list of lists of classifier configurations.
Each inner list represents a stage in the pipeline. Classifiers within the same stage are executed in parallel, and their results are aggregated using majority voting. The results of one stage are passed as input to the next stage.
```json
{
  "embedding_creator": {
    "name": "openai", // or "ollama", "openwebui", "onnx", "mock"
    "args": {
      "model": "text-embedding-3-large"
    }
  },
  "classifiers": [
    // Stage 1
    [
      {
        "name": "simple_openai",
        "args": {
          "model": "gpt-4o-mini-2024-07-18"
        }
      },
      {
        "name": "reasoning_openai",
        "args": {
          "model": "gpt-4o-mini-2024-07-18"
        }
      }
    ],
    // Stage 2
    [
      {
        "name": "reasoning_openai",
        "args": {
          "model": "gpt-4o-2024-05-13"
          // Additional arguments for the second stage
        }
      }
    ]
  ]
}
```

LiSSA supports multiple platforms for embedding creation and language model classification. Each platform requires specific environment variables to be configured:
- `openai`: OpenAI's embedding models
  - `OPENAI_ORGANIZATION_ID`: Your OpenAI organization ID
  - `OPENAI_API_KEY`: Your OpenAI API key
- `ollama`: Local Ollama embedding models (see the example after this list)
  - `OLLAMA_EMBEDDING_HOST`: The host URL of the Ollama server (required)
  - `OLLAMA_EMBEDDING_USER`: Username for authentication (optional)
  - `OLLAMA_EMBEDDING_PASSWORD`: Password for authentication (optional)
- `openwebui`: Open WebUI embedding models
  - `OPENWEBUI_URL`: The URL of the Open WebUI server
  - `OPENWEBUI_API_KEY`: Your Open WebUI API key
- `onnx`: Local ONNX models (no environment variables required)
- `mock`: Mock embedding creator for testing (no environment variables required)
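As a sketch, an embedding creator backed by a local Ollama server could be configured as follows. The model name is a placeholder for whatever embedding model your Ollama instance serves, and `OLLAMA_EMBEDDING_HOST` must be set in the environment:

```json
{
  "embedding_creator": {
    "name": "ollama",
    "args": {
      "model": "nomic-embed-text:v1.5" // Placeholder; any embedding model available on your Ollama server
    }
  }
}
```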
Chat language models are configured by appending the platform to the classifier name, e.g. `simple_openai`, `reasoning_ollama`, or `simple_openwebui`.
- OpenAI (`*_openai`): Uses OpenAI's chat models
  - `OPENAI_ORGANIZATION_ID`: Your OpenAI organization ID
  - `OPENAI_API_KEY`: Your OpenAI API key
- Ollama (`*_ollama`): Uses local Ollama chat models
  - `OLLAMA_HOST`: The host URL of the Ollama server (required)
  - `OLLAMA_USER`: Username for authentication (optional)
  - `OLLAMA_PASSWORD`: Password for authentication (optional)
- Open WebUI (`*_openwebui`): Uses Open WebUI chat models
  - `OPENWEBUI_URL`: The URL of the Open WebUI server
  - `OPENWEBUI_API_KEY`: Your Open WebUI API key
- Blablador (`*_blablador`): Uses Blablador's chat models
  - `BLABLADOR_API_KEY`: Your Blablador API key
- DeepSeek (`*_deepseek`): Uses DeepSeek's chat models (see the sketch after this list)
  - `DEEPSEEK_API_KEY`: Your DeepSeek API key
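For instance, a DeepSeek-backed classifier could look like the sketch below. The model identifier is an assumption, and `DEEPSEEK_API_KEY` must be set in the environment:

```json
{
  "classifier": {
    "name": "simple_deepseek",
    "args": {
      "model": "deepseek-chat" // Assumed model identifier; check DeepSeek's model list
    }
  }
}
```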
A complete example using Open WebUI for both embedding creation and classification:

```json
{
  "embedding_creator": {
    "name": "openwebui",
    "args": {
      "model": "nomic-embed-text:v1.5"
    }
  },
  "classifier": {
    "name": "simple_openwebui",
    "args": {
      "model": "llama3:8b",
      "seed": 133742243,
      "temperature": 0.0
    }
  }
}
```

The retrieval of similar elements from the target store is handled by a configurable retrieval strategy. The most common strategy is `cosine_similarity`, which ranks target elements by the cosine similarity of their embeddings to the query embedding. The retrieval strategy and its parameters are configured in the `target_store` section.
```json
{
  "source_store": {
    "name": "custom",
    "args": {}
  },
  "target_store": {
    "name": "cosine_similarity", // Retrieval strategy for finding similar elements
    "args": {
      "max_results": "20" // Maximum number of similar elements to return, or "infinity"
    }
  },
  "result_aggregator": {
    "name": "any_connection",
    "args": {
      "source_granularity": 0,
      "target_granularity": 0
    }
  }
}
```

- The `source_store` does not use a retrieval strategy and simply stores all source elements.
- The `target_store` must specify a retrieval strategy (currently, `cosine_similarity` is supported).
- The `max_results` argument controls how many similar elements are returned for each query. Use `"infinity"` to return all elements.
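Putting the fragments above together, a complete configuration file might look like the following sketch. All paths and model names are placeholders, and only fields documented on this page are shown; depending on your setup, corresponding target-side provider and preprocessor entries (analogous to the source-side ones) will likely be needed as well:

```json
{
  "cache_dir": "./cache/path",
  "gold_standard_configuration": {
    "path": "path/to/answer.csv",
    "hasHeader": false
  },
  "source_artifact_provider": {
    "name": "text",
    "args": {
      "artifact_type": "requirement",
      "path": "path/to/artifacts"
    }
  },
  "source_preprocessor": {
    "name": "artifact",
    "args": {}
  },
  "embedding_creator": {
    "name": "openai",
    "args": { "model": "text-embedding-3-large" }
  },
  "classifier": {
    "name": "simple_openai",
    "args": { "model": "gpt-4o-mini-2024-07-18" }
  },
  "source_store": {
    "name": "custom",
    "args": {}
  },
  "target_store": {
    "name": "cosine_similarity",
    "args": { "max_results": "20" }
  },
  "result_aggregator": {
    "name": "any_connection",
    "args": {
      "source_granularity": 0,
      "target_granularity": 0
    }
  }
}
```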
For more information about using the CLI to run configurations, see the CLI documentation.