**Walkthrough**

This pull request integrates scikit-learn support into Langflow by adding five ML pipeline components (dataset loading, train-test splitting, data scaling, model training, and classification evaluation), along with frontend icon registration and dependency configuration. The changes enable data manipulation and model management workflows within the Langflow framework.
**Estimated code review effort:** 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pull request overview
Adds Scikit-Learn as a first-class integration, including backend components for basic ML workflows and frontend icon/bundle wiring.
Changes:
- Add `scikit-learn` dependency (and update `uv.lock`).
- Introduce new backend components under `langflow/components/scikit-learn/` (dataset loader, scaler, train/test split, model trainer/predictor, classification report).
- Add Scikit-Learn icon assets and register the bundle/icon in the frontend sidebar/icon map.
Reviewed changes
Copilot reviewed 10 out of 12 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| `pyproject.toml` | Adds `scikit-learn>=1.6.1` to project dependencies. |
| `uv.lock` | Updates lock metadata and adds scikit-learn; also alters wheel lists for some packages. |
| `src/backend/base/langflow/components/scikit-learn/dataset.py` | New component to load built-in sklearn datasets as Data/DataFrame. |
| `src/backend/base/langflow/components/scikit-learn/split.py` | New component to split a DataFrame into train/test sets. |
| `src/backend/base/langflow/components/scikit-learn/scaler.py` | New component to scale features and return scaled DataFrame + scaler object. |
| `src/backend/base/langflow/components/scikit-learn/model.py` | New component to train a sklearn model and generate predictions. |
| `src/backend/base/langflow/components/scikit-learn/evaluator.py` | New component to generate a sklearn classification report as Data/DataFrame. |
| `src/backend/base/langflow/components/scikit-learn/__init__.py` | Exports the new sklearn components. |
| `src/frontend/src/utils/styleUtils.ts` | Registers Scikit-Learn in sidebar bundles and icon mapping. |
| `src/frontend/src/icons/ScikitLearn/index.tsx` | Adds the Scikit-Learn icon React wrapper. |
| `src/frontend/src/icons/ScikitLearn/ScikitLearnLogo.jsx` | Adds the Scikit-Learn SVG React component. |
| `src/frontend/src/icons/ScikitLearn/ScikitLearnLogo.svg` | Adds the raw Scikit-Learn SVG asset. |
```python
SliderInput(
    name="test_size",
    display_name="Test Size",
    value=0.25,
    range_spec=RangeSpec(min=0, max=0.9, step=0.01),
),
```
The `test_size` slider allows a value of 0, but `sklearn.model_selection.train_test_split` rejects `test_size=0.0` (it must be > 0). Consider setting the slider minimum to a small positive value (e.g. 0.01) to prevent a runtime `ValueError` from otherwise valid UI input.
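Independent of tightening the slider's `RangeSpec`, the component could defensively clamp the value before calling `train_test_split`. A minimal sketch, assuming a hypothetical `safe_test_size` helper (not part of the PR):

```python
def safe_test_size(value: float, minimum: float = 0.01, maximum: float = 0.9) -> float:
    """Clamp a UI-provided test_size into a range train_test_split accepts."""
    return min(max(value, minimum), maximum)

print(safe_test_size(0.0))   # clamped up to 0.01
print(safe_test_size(0.25))  # unchanged
print(safe_test_size(1.5))   # clamped down to 0.9
```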
```python
    raise ValueError(msg)
self.scaler_instance = scaler_class()
scaled_data = self.scaler_instance.fit_transform(self.dataset_features)
scaled_data = self.scaler_instance.fit_transform(self.dataset_features)
# Add the target column back to the scaled data
```
`fit_transform` is called twice in a row on the same data, which doubles the work and can change results for stateful scalers. Remove the duplicate call and compute `scaled_data` only once.
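Beyond removing the duplicate call, the usual scaler contract is to fit once on training data and reuse the fitted statistics for any later transform. A toy sketch with a hand-rolled standardizer (`MiniStandardScaler` is illustrative, not sklearn's class):

```python
from statistics import mean, pstdev

class MiniStandardScaler:
    """Toy stand-in for StandardScaler: fit learns mean/std, transform reuses them."""

    def fit(self, values):
        self.mean_ = mean(values)
        self.scale_ = pstdev(values) or 1.0  # avoid division by zero
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.scale_ for v in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)

scaler = MiniStandardScaler()
train_scaled = scaler.fit_transform([1.0, 2.0, 3.0])  # fit exactly once, on training data
test_scaled = scaler.transform([2.0])                 # reuse fitted statistics, no refit
```

Fitting again on the same (or test) data silently moves `mean_`/`scale_`, which is exactly the kind of state change a duplicated `fit_transform` risks.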
```python
def predict(self) -> DataFrame:
    self.train_model()
    if not hasattr(self, "TrainedModel"):
        msg = "No trained model provided. Please connect a train-test split component."
```
`predict()` always calls `self.train_model()` first, which means generating predictions retrains the model every time and overwrites any previously trained model. This is surprising (and expensive); consider training only when `self.TrainedModel` is not set, and otherwise reusing the existing model.
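The lazy-training pattern the comment suggests can be sketched in plain Python; `LazyTrainer` is a stand-in for the component, and the counter is only instrumentation to show that fitting happens once:

```python
class LazyTrainer:
    """Sketch of the suggested pattern: train at most once, then reuse the model."""

    def __init__(self):
        self.trained_model = None
        self.fit_calls = 0  # counts how many times training actually runs

    def train_model(self):
        self.fit_calls += 1
        self.trained_model = object()  # stand-in for a fitted estimator
        return self.trained_model

    def predict(self):
        if self.trained_model is None:  # train lazily instead of unconditionally
            self.train_model()
        return self.trained_model

trainer = LazyTrainer()
trainer.predict()
trainer.predict()
print(trainer.fit_calls)  # → 1
```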
```python
# Get the selected model class and create an instance
model_class = self.AVAILABLE_MODELS[self.model_type]

model = model_class()
```
`random_state` is exposed as an input but isn't used when creating the model instance. Either remove the input or pass it through for models that support it, so the UI setting has an effect.
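One way to honor the input only for estimators whose constructors accept it (e.g. `LinearRegression` and `SVR` do not take `random_state`) is to inspect the constructor signature. A sketch with stub classes standing in for sklearn estimators; `make_model` is a hypothetical helper, not part of the PR:

```python
import inspect

class WithSeed:
    """Stand-in for an estimator that accepts random_state (e.g. RandomForestClassifier)."""
    def __init__(self, random_state=None):
        self.random_state = random_state

class WithoutSeed:
    """Stand-in for an estimator without random_state (e.g. LinearRegression)."""
    def __init__(self):
        pass

def make_model(model_class, random_state):
    """Pass random_state only to constructors that declare it."""
    params = inspect.signature(model_class.__init__).parameters
    if "random_state" in params:
        return model_class(random_state=random_state)
    return model_class()

m1 = make_model(WithSeed, 42)      # seed wired through
m2 = make_model(WithoutSeed, 42)   # no TypeError even though it takes no seed
```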
```python
# Generate the classification report
self.report_data = classification_report(
    y_true,
    y_pred,
    output_dict=True
)
```
The output_dict input is ignored: classification_report(..., output_dict=True) is hard-coded. Use the self.output_dict value so the component matches its UI/description, and note that the downstream str handling in get_report_dataframe() is currently unreachable because of the hard-coded True.
```python
from .dataset import SklearnDatasetComponent
from .evaluator import ClassificationReportComponent
from .model import SklearnModelComponent
from .scaler import DataScalerComponent
from .split import TrainTestSplitComponent
```
This package lives under `components/scikit-learn/`, but `scikit-learn` is not a valid identifier for `import ...` statements (you can't write `import langflow.components.scikit-learn`). Even if components are discovered by file scanning, this makes direct imports and tooling awkward; consider renaming the directory to `scikit_learn` (and adjusting any references) to match Python module naming rules.
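A quick way to see why the directory name is a problem: each dotted segment of an import statement must be a valid Python identifier, and a hyphen disqualifies it.

```python
# "import langflow.components.scikit-learn" is a SyntaxError, because
# the segment "scikit-learn" is not a valid identifier.
print("scikit-learn".isidentifier())  # → False
print("scikit_learn".isidentifier())  # → True
```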
```python
def split_dataframe_train(self) -> DataFrame:
    self.split_dataframe_base()
    return DataFrame(self.TrainSplit)

def split_dataframe_test(self) -> DataFrame:
    self.split_dataframe_base()
    return DataFrame(self.TestSplit)
```
Both `split_dataframe_train` and `split_dataframe_test` call `split_dataframe_base()`, so requesting both outputs recomputes the split twice. Caching the result (e.g., splitting only if `TrainSplit`/`TestSplit` are unset or inputs changed) would avoid extra work on large datasets and guarantee both outputs come from the same split operation.
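The caching idea can be sketched in plain Python: a seeded shuffle stands in for `train_test_split`, and a counter shows the split runs once even when both outputs are requested (`CachedSplitter` is illustrative, not the component's API):

```python
import random

class CachedSplitter:
    """Sketch of caching a train/test split so both outputs share one computation."""

    def __init__(self, rows, test_size=0.25, random_state=42):
        self.rows = rows
        self.test_size = test_size
        self.random_state = random_state
        self._splits = None
        self.split_calls = 0  # instrumentation only

    def _split_base(self):
        if self._splits is None:  # compute once, reuse for both outputs
            self.split_calls += 1
            shuffled = self.rows[:]
            random.Random(self.random_state).shuffle(shuffled)
            n_test = max(1, int(len(shuffled) * self.test_size))
            self._splits = (shuffled[n_test:], shuffled[:n_test])
        return self._splits

    def train(self):
        return self._split_base()[0]

    def test(self):
        return self._split_base()[1]

s = CachedSplitter(list(range(20)))
train, test = s.train(), s.test()
print(s.split_calls)                            # → 1
print(sorted(train + test) == list(range(20)))  # → True (same split, no row lost)
```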
```python
if not hasattr(self, "TrainedModel"):
    msg = "No trained model provided. Please connect a train-test split component."
    raise ValueError(msg)
if not isinstance(self.test_data, DataFrame):
```
`if not hasattr(self, "TrainedModel")` will never be true because `TrainedModel` is defined as a class attribute (initialized to `None`). If you want to validate training, check `self.TrainedModel is None` (and ideally avoid `hasattr` for this case).
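The behavior is easy to demonstrate with a minimal class:

```python
class Component:
    TrainedModel = None  # class attribute, visible on every instance

c = Component()
print(hasattr(c, "TrainedModel"))  # → True, so the hasattr guard never fires
print(c.TrainedModel is None)      # → True, the check that actually works
```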
```python
class SklearnModelComponent(Component):
    display_name = "Sklearn Model"
    description = "Train and use scikit-learn models"
    documentation = "https://scikit-learn.org/stable/supervised_learning.html"
    icon = "ScikitLearn"
    TrainedModel = None
    Predictions = None
    ModelPerformance = None
```
New scikit-learn components introduce non-trivial behavior (dataset loading, scaling, splitting, model training/prediction, report generation) but no tests were added. The repo has extensive component unit/integration tests under src/backend/tests/; adding a small unit test suite for these components would help prevent regressions (e.g., target-column handling, deterministic splits, and that predict() doesn't retrain).
```python
# Make predictions if test data is provided
# if self.TrainedModel is None:
#     self.TrainedModel = self.train_model()
self.test_x = self.test_data.drop(self.target_column, axis=1)
self.test_y = self.test_data[self.target_column]
predictions = self.TrainedModel.predict(self.test_x)
```
`predict()` assumes `self.target_column` exists in `test_data` (it calls `drop(self.target_column, ...)`), but it isn't validated the way it is for `train_data`. If the test set is missing the target column, this raises a pandas `KeyError`; consider adding an explicit check and a clearer error message.
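A guarded version of the feature/target split might look like this (assuming pandas; `split_features` is a hypothetical helper, not the component's method):

```python
import pandas as pd

def split_features(test_data: pd.DataFrame, target_column: str):
    """Drop the target only when present, so unlabeled frames still work."""
    if target_column in test_data.columns:
        return test_data.drop(target_column, axis=1), test_data[target_column]
    return test_data, None  # inference-only frame: all columns are features

labeled = pd.DataFrame({"a": [1, 2], "target": [0, 1]})
unlabeled = pd.DataFrame({"a": [1, 2]})

x1, y1 = split_features(labeled, "target")
x2, y2 = split_features(unlabeled, "target")  # no KeyError; y2 is None
```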
Walkthrough
This PR integrates scikit-learn machine learning capabilities into the Langflow framework. It adds scikit-learn as a project dependency and introduces five new components: dataset loading (supporting iris, digits, wine, breast_cancer, diabetes datasets), model training/prediction (supporting 8 classifier and regressor types), data preprocessing (train/test splitting and feature scaling with 4 scaler types), and model evaluation (classification reports). The frontend is updated with Scikit-Learn branding assets and icon integration. All components follow Langflow's architecture patterns with proper input validation, metadata, and output methods for seamless pipeline integration.
Changes
| File(s) | Summary |
|---|---|
| `pyproject.toml`, `uv.lock` | Added scikit-learn (>=1.6.1) as a new project dependency and updated lockfile with package metadata. Removed platform-specific wheels for NVIDIA packages (aarch64) and legacy PyPy27 distributions. |
| `src/backend/base/langflow/components/scikit-learn/__init__.py` | Created initialization file that imports and exports five scikit-learn component classes: SklearnDatasetComponent, ClassificationReportComponent, SklearnModelComponent, DataScalerComponent, and TrainTestSplitComponent. |
| `src/backend/base/langflow/components/scikit-learn/dataset.py` | Implemented SklearnDatasetComponent for loading five built-in datasets (iris, digits, wine, breast_cancer, diabetes) with outputs as DataFrame and Data object with metadata. |
| `src/backend/base/langflow/components/scikit-learn/model.py` | Added SklearnModelComponent supporting 8 model types (LogisticRegression, RandomForest, SVM, DecisionTree classifiers and regressors) with train_model() and predict() methods. |
| `src/backend/base/langflow/components/scikit-learn/scaler.py` | Implemented DataScalerComponent with 4 scaler types (StandardScaler, MinMaxScaler, RobustScaler, MaxAbsScaler) that scales features while preserving the target column. Contains a duplicate fit_transform call on line 74. |
| `src/backend/base/langflow/components/scikit-learn/split.py` | Created TrainTestSplitComponent wrapping train_test_split with configurable test_size, random_state, and target_column parameters, returning split train/test datasets. |
| `src/backend/base/langflow/components/scikit-learn/evaluator.py` | Added ClassificationReportComponent that generates classification reports from true and predicted labels, outputting both dictionary and DataFrame formats. |
| `src/frontend/src/icons/ScikitLearn/ScikitLearnLogo.jsx`, `src/frontend/src/icons/ScikitLearn/ScikitLearnLogo.svg`, `src/frontend/src/icons/ScikitLearn/index.tsx` | Added Scikit-Learn logo assets as React component and SVG file, with a forwardRef wrapper for icon component integration. |
| `src/frontend/src/utils/styleUtils.ts` | Registered ScikitLearnIcon in SIDEBAR_BUNDLES and nodeIconsLucide mapping to enable Scikit-Learn components display in the UI sidebar and node visualizations. |
Sequence Diagram
This diagram shows the interactions between components:
```mermaid
sequenceDiagram
    participant Client
    participant Dataset as SklearnDatasetComponent
    participant Split as TrainTestSplitComponent
    participant Scaler as DataScalerComponent
    participant Model as SklearnModelComponent
    participant Evaluator as ClassificationReportComponent

    Note over Client,Evaluator: ML Pipeline Flow
    Client->>Dataset: load_dataset()
    activate Dataset
    Dataset-->>Client: raw_data
    deactivate Dataset
    Client->>Split: split(raw_data)
    activate Split
    Split-->>Client: X_train, X_test, y_train, y_test
    deactivate Split
    Client->>Scaler: fit(X_train)
    activate Scaler
    Scaler-->>Client: fitted_scaler
    deactivate Scaler
    Client->>Scaler: transform(X_train)
    Scaler-->>Client: X_train_scaled
    Client->>Scaler: transform(X_test)
    Scaler-->>Client: X_test_scaled
    Client->>Model: train(X_train_scaled, y_train)
    activate Model
    Model-->>Client: trained_model
    deactivate Model
    Client->>Model: predict(X_test_scaled)
    Model-->>Client: predictions
    Client->>Evaluator: generate_report(y_test, predictions)
    activate Evaluator
    Evaluator-->>Client: classification_metrics
    deactivate Evaluator
    Note over Client: Pipeline Complete
```
```python
def split_dataframe_base(self):
    if not hasattr(self, "dataset"):
        msg = "No dataset provided. Please connect a dataset component."
        raise ValueError(msg)
    # split the dataset into train and test
    if not isinstance(self.dataset, DataFrame):
        msg = "The dataset is not a DataFrame. Please connect a DataFrame component."
        raise TypeError(msg)

    if self.target_column not in self.dataset.columns:
        msg = f"Error: The target column '{self.target_column}' does not exist in the dataset."
        raise ValueError(msg)

    train_df, test_df = train_test_split(self.dataset, test_size=self.test_size, random_state=self.random_state)
    self.TrainSplit = train_df
    self.TestSplit = test_df

def split_dataframe_train(self) -> DataFrame:
    self.split_dataframe_base()
    return DataFrame(self.TrainSplit)

def split_dataframe_test(self) -> DataFrame:
    self.split_dataframe_base()
    return DataFrame(self.TestSplit)
```
Correctness: The component introduces several logic issues:
- Metadata Loss: It fails to propagate 'text_key' and 'default_value' from the input dataset to the output DataFrames, which breaks downstream components (like embeddings) that rely on knowing which column contains the text.
- Unused Validation: It validates the existence of 'target_column' but never uses it in the split logic, causing unnecessary failures on datasets without that specific column name.
- Redundant Computation: `train_test_split` is executed twice if both outputs are connected, which is inefficient.
🤖 AI Agent Prompt for Cursor/Windsurf
📋 Copy this prompt to your AI coding assistant (Cursor, Windsurf, etc.) to get help fixing this issue
In `src/backend/base/langflow/components/scikit-learn/split.py`, refactor the split methods to: 1. Cache the split result in `self.TrainSplit` to avoid redundant computation when multiple outputs are used. 2. Remove the unused `target_column` validation block (lines 58-60). 3. Preserve `text_key` and `default_value` metadata when creating the output `DataFrame` objects by passing them from `self.dataset` to the `DataFrame` constructor.
…e compatibility (langflow-ai#10008)

* fix: Restore Agent memory functionality by fixing inheritance and type compatibility

  This commit fixes two critical bugs that completely broke Agent memory in the main branch:

  **Bug #1: Inheritance Method Call Error**
  - Fixed incorrect method calls in Agent component inheritance
  - Changed `get_base_inputs()` to `_base_inputs` in:
    - src/lfx/src/lfx/components/agents/agent.py:157
    - src/lfx/src/lfx/base/agents/agent.py:229

  **Bug #2: Message Type Incompatibility**
  - Fixed type checking in the Agent base class to handle both Message types
  - Memory returns `langflow.schema.message.Message` but Agent expected `lfx.schema.message.Message`
  - Updated the type check to use duck typing instead of a strict isinstance check
  - Changed in src/lfx/src/lfx/base/agents/agent.py:148-150

  **Impact**
  - Agents can now remember conversation context across messages
  - Memory functionality restored to the same level as release-1.6.0
  - Fixes the issue where agents would forget user information immediately

  **Test Results**
  - Before: Agent says "I don't have access to your name or occupation"
  - After: Agent says "Your name is VICTORY TEST, and you work as a memory bug hunter"

  🤖 Generated with [Claude Code](https://claude.ai/code)

* [autofix.ci] apply automated fixes
* [autofix.ci] apply automated fixes (attempt 2/3)
* fix: Update data_to_messages function to accept both Data and Message types. Modifies `data_to_messages` to accept a list of both `Data` and `Message` types, enhancing type compatibility; the docstring is updated to reflect the new input and return types.
* fix: improve message validation in Agent. Updates chat history processing in the LCAgentComponent to include only messages with valid 'text' data, checking for the presence of 'text' in the message data before converting it. The base input retrieval method was also changed from `_base_inputs` back to `get_base_inputs()` for consistency and clarity.
* fix: enhance chat history validation to support Data type
* fix: improve input handling to support dynamic message conversion

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Gabriel Luiz Freitas Almeida <gabriel@langflow.org>
Co-authored-by: Edwin Jose <edwin.jose@datastax.com>
Actionable comments posted: 9
🧹 Nitpick comments (1)
`src/backend/base/langflow/components/scikit-learn/model.py` (1)

**93-97**: Reuse the trained estimator instead of fitting again in `predict()`.

Line 94 retrains on every prediction request. That doubles the fit cost when both outputs are resolved and can overwrite previously trained state. Train lazily only when `self.TrainedModel` is still `None`.

Proposed fix:

```diff
 def predict(self) -> DataFrame:
-    self.train_model()
-    if not hasattr(self, "TrainedModel"):
+    if self.TrainedModel is None:
+        self.train_model()
+    if self.TrainedModel is None:
         msg = "No trained model provided. Please connect a train-test split component."
         raise ValueError(msg)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/base/langflow/components/scikit-learn/model.py` around lines 93 - 97, The predict() method currently calls train_model() unconditionally and retrains on every prediction; change predict() to only call self.train_model() when self.TrainedModel is None (i.e., train lazily), and afterwards proceed to use self.TrainedModel for prediction or raise the existing ValueError if it's still not set. Update references in predict() to rely on the trained estimator attribute TrainedModel rather than refitting each call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyproject.toml`:
- Line 118: The langflow-base package is missing the scikit-learn runtime
dependency, causing imports under langflow/components/scikit-learn to fail for
standalone installs; add "scikit-learn>=1.6.1" to the langflow-base
pyproject.toml dependencies (the base package's pyproject.toml) so the
dependency is declared in the package manifest and will be installed with
langflow-base.
In `@src/backend/base/langflow/components/scikit-learn/evaluator.py`:
- Around line 15-16: The code currently caches report_data and only regenerates
it when report_data is None, which causes stale outputs if y_true, y_pred, or
output_dict change; fix it by removing the persistent cache or resetting it
whenever inputs change: either compute and return the classification report
fresh each call instead of using the module-level/instance variable report_data,
or ensure any setter or update path for y_true, y_pred, or output_dict clears
report_data (sets it to None) before the getters run; locate the report_data
variable and the getters that read it as well as code paths that assign y_true,
y_pred, and output_dict and apply one of these fixes to lines referenced around
the report_data usage (including the block at ~70-80).
- Around line 30-35: The UI BoolInput named "output_dict" is never used because
generate_report() hardcodes output_dict=True when calling
sklearn.metrics.classification_report, making the string branch unreachable;
update generate_report() to read the node input value (output_dict) and pass
that boolean into classification_report(...) instead of the hardcoded True, and
ensure downstream logic still handles both dict and string outputs (the existing
string-handling branch in generate_report remains applicable when output_dict is
False).
In `@src/backend/base/langflow/components/scikit-learn/model.py`:
- Around line 64-74: The train_model method currently requires self.test_data
and raises if it's missing which breaks training-only flows; remove or defer the
test_data isinstance check from train_model (keep the train_data existence/type
checks and the ValueError/TypeError for it) and move the test_data validation
into the predict method (or wherever model prediction is performed) so that the
Trained Model output can be produced without a wired test frame; update
predict() to raise a clear TypeError/ValueError if self.test_data is missing or
not a DataFrame before using it.
- Around line 98-105: The code currently assumes self.target_column exists in
self.test_data and always sets self.test_y, which breaks inference on unlabeled
feature frames; update the block in the method that handles self.test_data
(checking isinstance(self.test_data, DataFrame)) so that you set self.test_x by
dropping self.target_column only if self.target_column is present in
self.test_data.columns, otherwise set self.test_x to the entire DataFrame, and
only assign self.test_y if the target column exists; keep the TypeError for
non-DataFrame inputs and remove the unconditional read of self.test_y.
- Around line 45-50: The component exposes IntInput name="random_state" but the
estimator constructors are created without using it; update the estimator
instantiation code (where each estimator is constructed — e.g., the lines that
create LogisticRegression, RandomForestClassifier, SVC, DecisionTreeClassifier,
RandomForestRegressor, DecisionTreeRegressor) to pass
random_state=int(random_state) (or random_state=None when input is None/empty)
into their constructors so the user-specified reproducibility value is wired
into the created estimator; ensure you convert the IntInput value to an int and
only add the argument for estimators that accept random_state.
In `@src/backend/base/langflow/components/scikit-learn/scaler.py`:
- Around line 15-16: The component currently memoizes scaled_data and
scaler_instance on the instance (variables scaled_data and scaler_instance) so
subsequent executions can reuse stale state; modify the component to clear or
reinitialize these cached attributes at the start of each execution (e.g., in
the main execute/run method) or change the getters to recompute when inputs
change (scaler_type, target_column, or input dataset) — ensure you also clear
any other cached attributes referenced around lines 84-93 the same way so each
run recreates scaler_instance and recomputes scaled_data from the current
dataset/parameters.
- Around line 52-82: scale_data currently always fits a new scaler (using
SCALER_MAPPING and scaler_instance) which causes data leakage; change scale_data
to (1) accept an optional pre-fitted scaler_instance or a mode flag (e.g.,
fit=True/False) on the component, (2) if a fitted scaler_instance is provided or
fit=False call scaler_instance.transform(...) instead of fit_transform, and (3)
only call fit_transform when no fitted scaler is supplied and fit=True; ensure
you remove the duplicate fit_transform call and keep handling of target_column,
dataset_features, scaled_data, concat_data and scaled_data assignment intact so
transforms preserve original column order.
In `@src/backend/base/langflow/components/scikit-learn/split.py`:
- Around line 24-29: The SliderInput for the "test_size" parameter currently
allows min=0 which can pass 0 to train_test_split and cause a ValueError; update
the RangeSpec in the SliderInput (name="test_size") to use min=0.01 instead of
min=0 so the slider only permits values in (0.0,1.0), leaving the default value
(value=0.25) unchanged.
---
Nitpick comments:
In `@src/backend/base/langflow/components/scikit-learn/model.py`:
- Around line 93-97: The predict() method currently calls train_model()
unconditionally and retrains on every prediction; change predict() to only call
self.train_model() when self.TrainedModel is None (i.e., train lazily), and
afterwards proceed to use self.TrainedModel for prediction or raise the existing
ValueError if it's still not set. Update references in predict() to rely on the
trained estimator attribute TrainedModel rather than refitting each call.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 178aa6b9-de4c-4adf-8362-ac0e3f5e0806
⛔ Files ignored due to path filters (2)

- `src/frontend/src/icons/ScikitLearn/ScikitLearnLogo.svg` is excluded by `!**/*.svg`
- `uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (10)
- `pyproject.toml`
- `src/backend/base/langflow/components/scikit-learn/__init__.py`
- `src/backend/base/langflow/components/scikit-learn/dataset.py`
- `src/backend/base/langflow/components/scikit-learn/evaluator.py`
- `src/backend/base/langflow/components/scikit-learn/model.py`
- `src/backend/base/langflow/components/scikit-learn/scaler.py`
- `src/backend/base/langflow/components/scikit-learn/split.py`
- `src/frontend/src/icons/ScikitLearn/ScikitLearnLogo.jsx`
- `src/frontend/src/icons/ScikitLearn/index.tsx`
- `src/frontend/src/utils/styleUtils.ts`
`pyproject.toml`

```toml
"apify-client>=1.8.1",
"pylint>=3.3.4",
"ruff>=0.9.7",
"scikit-learn>=1.6.1",
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
set -e
echo "== Workspace/package manifests =="
fd '^pyproject\.toml$'
echo
echo "== sklearn imports under src/backend/base =="
rg -n --glob '*.py' '\bfrom sklearn\b|\bimport sklearn\b' src/backend/base
echo
echo "== scikit-learn declarations across manifests =="
rg -n --glob 'pyproject.toml' 'name\s*=\s*"langflow-base"|name\s*=\s*"langflow"|scikit-learn' .
```

Repository: Qentora/langflow
Length of output: 1548

🏁 Script executed:

```shell
cat src/backend/base/pyproject.toml
```

Repository: Qentora/langflow
Length of output: 7294
Add scikit-learn>=1.6.1 to src/backend/base/pyproject.toml dependencies.
The sklearn imports in src/backend/base/langflow/components/scikit-learn/ are missing from that package's manifest. Since langflow-base is independently published, standalone installs will fail to import sklearn at runtime. The dependency must be declared in src/backend/base/pyproject.toml, not only in the root package.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pyproject.toml` at line 118, The langflow-base package is missing the
scikit-learn runtime dependency, causing imports under
langflow/components/scikit-learn to fail for standalone installs; add
"scikit-learn>=1.6.1" to the langflow-base pyproject.toml dependencies (the base
package's pyproject.toml) so the dependency is declared in the package manifest
and will be installed with langflow-base.
```python
report_data = None
```
**Don't cache `report_data` across executions.**

The getters only regenerate when `report_data` is `None`. If this component instance is reused, changing `y_true`, `y_pred`, or `output_dict` can still return the previous report.
Also applies to: 70-80
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/base/langflow/components/scikit-learn/evaluator.py` around lines
15 - 16, The code currently caches report_data and only regenerates it when
report_data is None, which causes stale outputs if y_true, y_pred, or
output_dict change; fix it by removing the persistent cache or resetting it
whenever inputs change: either compute and return the classification report
fresh each call instead of using the module-level/instance variable report_data,
or ensure any setter or update path for y_true, y_pred, or output_dict clears
report_data (sets it to None) before the getters run; locate the report_data
variable and the getters that read it as well as code paths that assign y_true,
y_pred, and output_dict and apply one of these fixes to lines referenced around
the report_data usage (including the block at ~70-80).
```python
BoolInput(
    name="output_dict",
    display_name="Output as Dictionary",
    info="If True, return output as dict instead of string",
    value=True,
),
```
🧩 Analysis chain
🏁 Script executed:

```shell
cat -n src/backend/base/langflow/components/scikit-learn/evaluator.py
```

Repository: Qentora/langflow
Length of output: 4112
🌐 Web query:
sklearn classification_report output_dict parameter
💡 Result:
In sklearn.metrics.classification_report, the parameter output_dict controls the return type:
output_dict=False(default): returns a formatted text report (string).output_dict=True: returns a Python dict (nested) with per-class and aggregate metrics. Each class label maps to{'precision', 'recall', 'f1-score', 'support'}; aggregate keys include things like'macro avg','weighted avg'(and sometimes'micro avg'/'samples avg'depending on the problem). Also, whenoutput_dict=True,digitsis ignored (values are not rounded). [1]
Example:

```python
from sklearn.metrics import classification_report

report = classification_report(y_true, y_pred, output_dict=True)
f1_class0 = report["0"]["f1-score"]
acc = report["accuracy"]
```

Caveat: the dict keys include fixed names like `'accuracy'`, so if you have a class label named "accuracy", it can collide and make the output "brittle." [2]
Sources: [1] scikit-learn docs for classification_report (sklearn.org), [2] scikit-learn GitHub issue discussing output_dict=True brittleness (github.com)
Citations:
- 1: https://sklearn.org/stable/modules/generated/sklearn.metrics.classification_report.html
- 2: classification_report with output_dict=True leads to brittle output scikit-learn/scikit-learn#29205
**Wire the `output_dict` input into `classification_report()`.**

The UI exposes `output_dict` as a BoolInput, but `generate_report()` always passes `output_dict=True`. This makes the input dead: toggling it has no effect, and the string-handling branch (lines 81–83) is unreachable.
Proposed fix
```diff
 self.report_data = classification_report(
     y_true,
     y_pred,
-    output_dict=True
+    output_dict=self.output_dict,
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/base/langflow/components/scikit-learn/evaluator.py` around lines
30 - 35, The UI BoolInput named "output_dict" is never used because
generate_report() hardcodes output_dict=True when calling
sklearn.metrics.classification_report, making the string branch unreachable;
update generate_report() to read the node input value (output_dict) and pass
that boolean into classification_report(...) instead of the hardcoded True, and
ensure downstream logic still handles both dict and string outputs (the existing
string-handling branch in generate_report remains applicable when output_dict is
False).
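For completeness, here is a stdlib-only sketch of downstream handling that covers both return shapes once the toggle is wired through. The dict layout mirrors the docs quoted above; the function name and summary shape are illustrative, not the component's real API:

```python
def summarize_report(report):
    """Handle a classification report that may be a dict (output_dict=True)
    or a preformatted string (output_dict=False)."""
    if isinstance(report, dict):
        # Aggregate keys are fixed names; everything else is a class label.
        aggregate_keys = {"accuracy", "macro avg", "weighted avg", "micro avg", "samples avg"}
        classes = sorted(k for k in report if k not in aggregate_keys)
        return {"accuracy": report.get("accuracy"), "classes": classes}
    return report  # string form passes through unchanged


dict_report = {
    "0": {"precision": 1.0, "recall": 0.5, "f1-score": 0.67, "support": 2},
    "1": {"precision": 0.5, "recall": 1.0, "f1-score": 0.67, "support": 1},
    "accuracy": 0.67,
    "macro avg": {"precision": 0.75, "recall": 0.75, "f1-score": 0.67, "support": 3},
    "weighted avg": {"precision": 0.83, "recall": 0.67, "f1-score": 0.67, "support": 3},
}

summary = summarize_report(dict_report)
assert summary == {"accuracy": 0.67, "classes": ["0", "1"]}
assert summarize_report("precision  recall ...") == "precision  recall ..."
```

With this in place, the currently unreachable string branch becomes exercisable whenever the user toggles output_dict off.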
IntInput(
    name="random_state",
    display_name="Random State",
    value=42,
    info="Random state for reproducibility",
),
🧩 Analysis chain
🌐 Web query:
In scikit-learn 1.6.1, which of these estimators accept a random_state constructor argument: LogisticRegression, RandomForestClassifier, SVC, DecisionTreeClassifier, LinearRegression, RandomForestRegressor, SVR, DecisionTreeRegressor?
💡 Result:
In scikit-learn 1.6.1, these accept random_state in the constructor:
- Yes: LogisticRegression [1], RandomForestClassifier [2], SVC [3], DecisionTreeClassifier [4], RandomForestRegressor [6], DecisionTreeRegressor [8]
- No: LinearRegression [5], SVR [7]
Sources: [1] https://scikit-learn.org/1.6/modules/generated/sklearn.linear_model.LogisticRegression.html, [2] https://scikit-learn.org/1.6/modules/generated/sklearn.ensemble.RandomForestClassifier.html, [3] https://scikit-learn.org/1.6/modules/generated/sklearn.svm.SVC.html, [4] https://scikit-learn.org/1.6/modules/generated/sklearn.tree.DecisionTreeClassifier.html, [5] https://scikit-learn.org/1.6/modules/generated/sklearn.linear_model.LinearRegression.html, [6] https://scikit-learn.org/1.6/modules/generated/sklearn.ensemble.RandomForestRegressor.html, [7] https://scikit-learn.org/1.6/modules/generated/sklearn.svm.SVR.html, [8] https://scikit-learn.org/1.6/modules/generated/sklearn.tree.DecisionTreeRegressor.html
🏁 Script executed:
find . -type f -name "*.py" -path "*/scikit-learn/*" | head -20

Repository: Qentora/langflow
Length of output: 435
🏁 Script executed:
cat -n src/backend/base/langflow/components/scikit-learn/model.py | sed -n '40,95p'

Repository: Qentora/langflow
Length of output: 2552
🏁 Script executed:
rg "AVAILABLE_MODELS" src/backend/base/langflow/components/scikit-learn/ -A 20

Repository: Qentora/langflow
Length of output: 6042
Wire random_state into estimator construction.
The component exposes a reproducibility input on lines 45-50, but line 85 instantiates the model without it. Six of the eight available estimators (LogisticRegression, RandomForestClassifier, SVC, DecisionTreeClassifier, RandomForestRegressor, DecisionTreeRegressor) accept random_state, so users setting this input still receive non-reproducible models.
Proposed fix
- model = model_class()
+ model_kwargs = {}
+ if "random_state" in model_class().get_params():
+     model_kwargs["random_state"] = self.random_state
+ model = model_class(**model_kwargs)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/base/langflow/components/scikit-learn/model.py` around lines 45 -
50, The component exposes IntInput name="random_state" but the estimator
constructors are created without using it; update the estimator instantiation
code (where each estimator is constructed — e.g., the lines that create
LogisticRegression, RandomForestClassifier, SVC, DecisionTreeClassifier,
RandomForestRegressor, DecisionTreeRegressor) to pass
random_state=int(random_state) (or random_state=None when input is None/empty)
into their constructors so the user-specified reproducibility value is wired
into the created estimator; ensure you convert the IntInput value to an int and
only add the argument for estimators that accept random_state.
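The conditional-kwargs pattern from the proposed fix can be sketched without sklearn installed. DummyForest and DummyLinear below are stand-ins for estimators with and without a random_state parameter, and the helper uses inspect.signature in place of sklearn's get_params():

```python
import inspect


def build_estimator(model_class, random_state=None, **kwargs):
    """Instantiate an estimator, passing random_state only when its
    constructor accepts it (LinearRegression and SVR, for example, do not)."""
    params = inspect.signature(model_class.__init__).parameters
    if random_state is not None and "random_state" in params:
        kwargs["random_state"] = random_state
    return model_class(**kwargs)


class DummyForest:  # stands in for RandomForestClassifier
    def __init__(self, n_estimators=100, random_state=None):
        self.n_estimators = n_estimators
        self.random_state = random_state


class DummyLinear:  # stands in for LinearRegression (no random_state)
    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept


forest = build_estimator(DummyForest, random_state=42)
linear = build_estimator(DummyLinear, random_state=42)  # arg silently skipped
assert forest.random_state == 42
assert not hasattr(linear, "random_state")
```

Either introspection approach works; get_params() is the more idiomatic choice inside a component that already depends on sklearn.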
def train_model(self) -> BaseEstimator:
    if not hasattr(self, "train_data"):
        msg = "No training data provided. Please connect a train-test split component."
        raise ValueError(msg)

    if not isinstance(self.train_data, DataFrame):
        msg = "The training data is not a DataFrame. Please connect a DataFrame component."
        raise TypeError(msg)
    if not isinstance(self.test_data, DataFrame):
        msg = "The test data is not a DataFrame. Please connect a DataFrame component."
        raise TypeError(msg)
Don’t require test_data to build the model output.
Line 72 makes train_model() fail when no test frame is wired, even though the Trained Model output only needs train_data. In the current form, training-only flows will raise before fitting. Move test_data validation into predict().
Proposed fix
def train_model(self) -> BaseEstimator:
    if not hasattr(self, "train_data"):
        msg = "No training data provided. Please connect a train-test split component."
        raise ValueError(msg)
    if not isinstance(self.train_data, DataFrame):
        msg = "The training data is not a DataFrame. Please connect a DataFrame component."
        raise TypeError(msg)
-   if not isinstance(self.test_data, DataFrame):
-       msg = "The test data is not a DataFrame. Please connect a DataFrame component."
-       raise TypeError(msg)
    if self.target_column not in self.train_data.columns:
        msg = f"Error: The target column '{self.target_column}' does not exist in the training data."
        raise ValueError(msg)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/base/langflow/components/scikit-learn/model.py` around lines 64 -
74, The train_model method currently requires self.test_data and raises if it's
missing which breaks training-only flows; remove or defer the test_data
isinstance check from train_model (keep the train_data existence/type checks and
the ValueError/TypeError for it) and move the test_data validation into the
predict method (or wherever model prediction is performed) so that the Trained
Model output can be produced without a wired test frame; update predict() to
raise a clear TypeError/ValueError if self.test_data is missing or not a
DataFrame before using it.
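A minimal sketch of the suggested split, with stand-in names rather than the component's real API: train() validates only training data, and the test-frame check moves to predict(), where the data is actually used:

```python
class ModelComponent:
    """Illustrative stand-in: training-only flows succeed without a test frame."""

    def __init__(self, train_data=None, test_data=None):
        self.train_data = train_data
        self.test_data = test_data
        self.model = None

    def train(self):
        if self.train_data is None:
            raise ValueError("No training data provided.")
        # Placeholder "fit": record how many rows were trained on.
        self.model = {"fitted_on": len(self.train_data)}
        return self.model

    def predict(self):
        if self.model is None:
            self.train()
        # Test data is validated here, not in train().
        if self.test_data is None:
            raise ValueError("No test data provided; connect a test frame before predicting.")
        return [self.model["fitted_on"]] * len(self.test_data)


# Training-only flow now succeeds without a wired test frame:
trained = ModelComponent(train_data=[1, 2, 3]).train()
assert trained["fitted_on"] == 3
```

The same late-validation principle applies to any output that only consumes a subset of the component's inputs.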
    if not isinstance(self.test_data, DataFrame):
        msg = "The test data is not a DataFrame. Please connect a DataFrame component."
        raise TypeError(msg)
    # Make predictions if test data is provided
    # if self.TrainedModel is None:
    #     self.TrainedModel = self.train_model()
    self.test_x = self.test_data.drop(self.target_column, axis=1)
    self.test_y = self.test_data[self.target_column]
Allow prediction on unlabeled feature frames.
Lines 104-105 require target_column to be present in test_data, but this method never uses self.test_y. That breaks the common inference case where the input only contains features. Drop the target column only if it exists, and remove the unused label read.
Proposed fix
    if not isinstance(self.test_data, DataFrame):
        msg = "The test data is not a DataFrame. Please connect a DataFrame component."
        raise TypeError(msg)
    # Make predictions if test data is provided
    # if self.TrainedModel is None:
    #     self.TrainedModel = self.train_model()
-   self.test_x = self.test_data.drop(self.target_column, axis=1)
-   self.test_y = self.test_data[self.target_column]
+   self.test_x = self.test_data.drop(columns=[self.target_column], errors="ignore")
    predictions = self.TrainedModel.predict(self.test_x)

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    if not isinstance(self.test_data, DataFrame):
        msg = "The test data is not a DataFrame. Please connect a DataFrame component."
        raise TypeError(msg)
    # Make predictions if test data is provided
    # if self.TrainedModel is None:
    #     self.TrainedModel = self.train_model()
    self.test_x = self.test_data.drop(columns=[self.target_column], errors="ignore")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/base/langflow/components/scikit-learn/model.py` around lines 98 -
105, The code currently assumes self.target_column exists in self.test_data and
always sets self.test_y, which breaks inference on unlabeled feature frames;
update the block in the method that handles self.test_data (checking
isinstance(self.test_data, DataFrame)) so that you set self.test_x by dropping
self.target_column only if self.target_column is present in
self.test_data.columns, otherwise set self.test_x to the entire DataFrame, and
only assign self.test_y if the target column exists; keep the TypeError for
non-DataFrame inputs and remove the unconditional read of self.test_y.
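The errors="ignore" behavior can be illustrated without pandas by treating rows as plain dicts (the helper name is hypothetical): dropping the target column is a no-op when the column is absent, so labeled and unlabeled frames both work.

```python
def feature_frame(rows, target_column):
    """Drop target_column from each row only when present, mirroring
    pandas' df.drop(columns=[...], errors="ignore")."""
    return [{k: v for k, v in row.items() if k != target_column} for row in rows]


labeled = [{"x": 1.0, "y": 2.0, "label": 0}, {"x": 3.0, "y": 4.0, "label": 1}]
unlabeled = [{"x": 1.0, "y": 2.0}]

assert feature_frame(labeled, "label") == [{"x": 1.0, "y": 2.0}, {"x": 3.0, "y": 4.0}]
assert feature_frame(unlabeled, "label") == unlabeled  # no-op when label absent
```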
    scaled_data = None
    scaler_instance = None
Reset cached scaler state between executions.
scaled_data and scaler_instance are memoized on the component instance, and the getters only recompute once. If this component is reused with a different dataset, scaler_type, or target_column, it can return the previous run's frame/scaler.
Also applies to: 84-93
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/base/langflow/components/scikit-learn/scaler.py` around lines 15
- 16, The component currently memoizes scaled_data and scaler_instance on the
instance (variables scaled_data and scaler_instance) so subsequent executions
can reuse stale state; modify the component to clear or reinitialize these
cached attributes at the start of each execution (e.g., in the main execute/run
method) or change the getters to recompute when inputs change (scaler_type,
target_column, or input dataset) — ensure you also clear any other cached
attributes referenced around lines 84-93 the same way so each run recreates
scaler_instance and recomputes scaled_data from the current dataset/parameters.
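One way to avoid stale results is to key the cache on the inputs, as in this sketch (names are illustrative; the component's real getters would need the same guard): the cached value is reused only while the dataset and parameters are unchanged.

```python
class ScalerComponent:
    """Input-keyed memoization: a re-run with new inputs recomputes
    instead of returning the previous run's result."""

    def __init__(self):
        self._cache_key = None
        self._scaled = None

    def scale(self, dataset, scaler_type, target_column):
        key = (tuple(dataset), scaler_type, target_column)
        if key != self._cache_key:
            self._cache_key = key
            # Placeholder for the real fit/transform work.
            self._scaled = [x * 2 for x in dataset]
        return self._scaled


comp = ScalerComponent()
first = comp.scale([1, 2], "standard", "y")
second = comp.scale([1, 2], "standard", "y")
assert first is second          # same inputs: cached object reused
third = comp.scale([3, 4], "standard", "y")
assert third == [6, 8]          # new dataset: recomputed
```

Clearing the cached attributes at the start of each execution achieves the same correctness with less bookkeeping, at the cost of losing reuse across identical runs.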
def scale_data(self):
    if not hasattr(self, "dataset"):
        msg = "No dataset provided. Please connect a dataset component."
        raise ValueError(msg)

    if not isinstance(self.dataset, DataFrame):
        msg = "The dataset is not a DataFrame. Please connect a DataFrame component."
        raise TypeError(msg)

    # Create the appropriate scaler instance
    scaler_class = self.SCALER_MAPPING.get(self.scaler_type)
    if scaler_class is None:
        msg = f"Invalid scaler type: {self.scaler_type}"
        raise ValueError(msg)

    # Initialize and fit the scaler
    if self.target_column in self.dataset.columns:
        self.dataset_features = self.dataset.drop(self.target_column, axis=1)
    else:
        msg = "Target column not found in dataset"
        raise ValueError(msg)
    self.scaler_instance = scaler_class()
    scaled_data = self.scaler_instance.fit_transform(self.dataset_features)
    scaled_data = self.scaler_instance.fit_transform(self.dataset_features)
    # Add the target column back to the scaled data
    scaled_data = pd.DataFrame(scaled_data, columns=self.dataset_features.columns)
    concat_data = pd.concat([scaled_data, self.dataset[self.target_column]], axis=1)

    # Convert to DataFrame with original column names
    scale_df = pd.DataFrame(concat_data, columns=self.dataset.columns)
    self.scaled_data = DataFrame(scale_df)
This API bakes data leakage into train/test workflows.
scale_data() always calls fit_transform() on the provided dataset, but the component has no way to accept an already-fitted scaler and run transform() on a held-out/test dataset. With TrainTestSplitComponent in the same feature set, the only supported flows are either scaling before the split (leakage) or fitting separate scalers per split (invalid metrics).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/base/langflow/components/scikit-learn/scaler.py` around lines 52
- 82, scale_data currently always fits a new scaler (using SCALER_MAPPING and
scaler_instance) which causes data leakage; change scale_data to (1) accept an
optional pre-fitted scaler_instance or a mode flag (e.g., fit=True/False) on the
component, (2) if a fitted scaler_instance is provided or fit=False call
scaler_instance.transform(...) instead of fit_transform, and (3) only call
fit_transform when no fitted scaler is supplied and fit=True; ensure you remove
the duplicate fit_transform call and keep handling of target_column,
dataset_features, scaled_data, concat_data and scaled_data assignment intact so
transforms preserve original column order.
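The leakage-free pattern the component should support (fit on the training split, then transform both splits with the training statistics) in a tiny stdlib stand-in for sklearn's scalers:

```python
class MinMaxScalerSketch:
    """Minimal min-max scaler with the fit/transform split that avoids
    leakage: statistics come from the training split only."""

    def fit(self, values):
        self.min_ = min(values)
        self.range_ = (max(values) - self.min_) or 1.0  # guard zero range
        return self

    def transform(self, values):
        return [(v - self.min_) / self.range_ for v in values]


train, test = [0.0, 5.0, 10.0], [2.5, 12.0]
scaler = MinMaxScalerSketch().fit(train)      # statistics from train only
assert scaler.transform(train) == [0.0, 0.5, 1.0]
assert scaler.transform(test) == [0.25, 1.2]  # test values may fall outside [0, 1]
```

In component terms, this suggests exposing the fitted scaler as an output (the component already has a scaler_instance) and accepting one as an optional input, so a second scaler node can run transform() on the test split.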
SliderInput(
    name="test_size",
    display_name="Test Size",
    value=0.25,
    range_spec=RangeSpec(min=0, max=0.9, step=0.01),
),
🧩 Analysis chain
🏁 Script executed:
cat -n src/backend/base/langflow/components/scikit-learn/split.py | head -40

Repository: Qentora/langflow
Length of output: 1737
🌐 Web query:
sklearn train_test_split test_size=0 parameter validation constraints
💡 Result:
In sklearn.model_selection.train_test_split, test_size is validated so the split is non-empty and the value is in-range:
- If test_size is a float: it must be in the (0.0, 1.0) proportion range (docs say "between 0.0 and 1.0" and the implementation enforces it cannot be 0). Passing test_size=0.0 triggers a ValueError. [1]
- If test_size is an int: it must be >= 1 and < n_samples (otherwise you'd get an empty train or test set). The common error for test_size=0 is: "test_size=0 should be either positive and smaller than the number of samples … or a float in the (0,1) range". [2]
- If both train_size and test_size are given: they must be compatible (for floats, their sum must be <= 1.0; for ints, <= n_samples), and the resulting train/test sizes must each be at least 1 sample (else you'll get the "resulting train set will be empty" style error). [3]
- Defaults: if both are None, test_size defaults to 0.25. [1]
Sources: scikit-learn train_test_split docs [1], Stack Overflow error example for test_size=0 [2], scikit-learn error behavior example referencing _validate_shuffle_split (“resulting train set will be empty”) [3].
Change min=0 to min=0.01 in the slider's range specification.
train_test_split() requires test_size to be strictly in the range (0.0, 1.0). The current slider allows test_size=0, which will fail at runtime with a ValueError.
Proposed fix
SliderInput(
    name="test_size",
    display_name="Test Size",
    value=0.25,
-   range_spec=RangeSpec(min=0, max=0.9, step=0.01),
+   range_spec=RangeSpec(min=0.01, max=0.9, step=0.01),
),

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/base/langflow/components/scikit-learn/split.py` around lines 24 -
29, The SliderInput for the "test_size" parameter currently allows min=0 which
can pass 0 to train_test_split and cause a ValueError; update the RangeSpec in
the SliderInput (name="test_size") to use min=0.01 instead of min=0 so the
slider only permits values in (0.0,1.0), leaving the default value (value=0.25)
unchanged.
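A stdlib sketch of the constraint (not sklearn's actual validation code) shows why the slider's current minimum fails at runtime:

```python
def validate_test_size(test_size, n_samples):
    """Replicates the test_size constraints described above: floats must
    lie strictly in (0, 1); ints must satisfy 1 <= test_size < n_samples."""
    if isinstance(test_size, float):
        if not 0.0 < test_size < 1.0:
            raise ValueError(f"test_size={test_size} must be in the (0, 1) range")
    elif isinstance(test_size, int):
        if not 1 <= test_size < n_samples:
            raise ValueError(
                f"test_size={test_size} must be positive and smaller than n_samples={n_samples}"
            )
    return test_size


assert validate_test_size(0.25, 100) == 0.25
try:
    validate_test_size(0.0, 100)  # the value the slider currently allows
except ValueError as err:
    caught = str(err)
assert "must be in the (0, 1) range" in caught
```

Raising the slider minimum to 0.01 keeps every selectable value inside this accepted range.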
Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comments.

Thanks for integrating Codecov - We've got you covered ☂️
Summary by CodeRabbit
Release Notes
New Features
Chores
EntelligenceAI PR Summary
This PR adds comprehensive scikit-learn integration to Langflow with five new ML components and frontend branding support.
Confidence Score: 2/5 - Changes Needed
Files requiring special attention
- src/backend/base/langflow/components/scikit-learn/__init__.py
- src/backend/base/langflow/components/scikit-learn/model.py