Description
Add a new MultiLabelBinarizerTransformer class that wraps sklearn's MultiLabelBinarizer to make it fully compatible with sklearn pipelines. This transformer should be added to a new transformers module in the package.
Motivation
Currently, sklearn's MultiLabelBinarizer has limitations when used in modern sklearn pipelines:
- Missing `get_feature_names_out` method: Unlike most sklearn transformers, `MultiLabelBinarizer` doesn't implement `get_feature_names_out`, which was standardized in sklearn 1.0+ (SLEP007). This breaks feature name propagation through pipelines and prevents integration with tools that rely on feature names.
- Input handling inconsistency: The transformer doesn't gracefully handle both list and array-like inputs without preprocessing.
- Type compatibility: Outputs may need conversion to float64 for downstream pipeline components that expect numeric dtypes.
This wrapper class solves these problems by:
- Implementing the complete transformer interface, including `get_feature_names_out`
- Handling both list and non-list inputs automatically
- Converting output to float64 for compatibility with downstream components
- Providing meaningful feature names based on the label classes
Proposed Implementation
New Module Structure
Create a new module: `ds_utils.transformers`
This module will house sklearn-compatible transformer wrappers and extensions.
Class: MultiLabelBinarizerTransformer
Inherits from: `BaseEstimator`, `TransformerMixin`
Suggested Implementation:
```python
import re

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import MultiLabelBinarizer


class MultiLabelBinarizerTransformer(BaseEstimator, TransformerMixin):
    def __init__(self):
        self.mlb = MultiLabelBinarizer()

    def _sanitize_column_name(self, name):
        """Sanitize a column name to remove invalid characters for Delta tables.

        Invalid characters: space, comma, semicolon, braces, parentheses,
        newline, tab, equals.
        """
        name_str = str(name)
        # Replace invalid characters ( ,;{}()\n\t=) with underscores
        sanitized = re.sub(r'[ ,;{}()\n\t=]', '_', name_str)
        # Collapse consecutive underscores into a single underscore
        sanitized = re.sub(r'_+', '_', sanitized)
        # Remove leading/trailing underscores
        return sanitized.strip('_')

    def _handle_none_values(self, X):
        """Convert None/NaN values to empty lists for MultiLabelBinarizer."""
        X_list = X.tolist() if hasattr(X, 'tolist') else list(X)
        processed = []
        for item in X_list:
            if item is None or (isinstance(item, float) and pd.isna(item)):
                # Missing row -> empty label set
                processed.append([])
            elif isinstance(item, np.ndarray):
                # Convert the numpy array to a list and keep only hashable scalars
                item_list = item.tolist()
                if isinstance(item_list, list):
                    cleaned = []
                    for x in item_list:
                        if isinstance(x, np.ndarray):
                            x = x.item() if x.size == 1 else x.tolist()
                        if isinstance(x, (str, int, bool)) or (
                            isinstance(x, float) and not pd.isna(x)
                        ):
                            cleaned.append(x)
                    processed.append(cleaned)
                elif isinstance(item_list, (str, int, float, bool)):
                    # Zero-dimensional array: tolist() returned a single scalar
                    processed.append([item_list])
                else:
                    processed.append([])
            elif isinstance(item, list):
                # Filter out None/NaN values and unwrap any nested numpy arrays
                cleaned = []
                for x in item:
                    if isinstance(x, np.ndarray):
                        x = x.item() if x.size == 1 else x.tolist()
                    if isinstance(x, list):
                        cleaned.extend(
                            y for y in x
                            if isinstance(y, (str, int, float, bool)) and y is not None
                        )
                    elif isinstance(x, (str, int, bool)) or (
                        isinstance(x, float) and not pd.isna(x)
                    ):
                        cleaned.append(x)
                processed.append(cleaned)
            elif isinstance(item, (str, int, bool)) or (
                isinstance(item, float) and not pd.isna(item)
            ):
                # A single hashable value becomes a one-element label set
                processed.append([item])
            else:
                processed.append([])
        return processed

    def fit(self, X, y=None):
        processed_X = self._handle_none_values(X)
        self.mlb.fit(processed_X)
        return self

    def transform(self, X):
        processed_X = self._handle_none_values(X)
        # float64 output for compatibility with downstream numeric components
        return self.mlb.transform(processed_X).astype('float64')

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

    def get_feature_names_out(self, input_features=None):
        # Use the first input feature name as prefix, else default to "label"
        if input_features is not None and len(input_features) > 0:
            prefix = input_features[0]
        else:
            prefix = "label"
        # Sanitize label names to remove invalid characters for Delta tables
        sanitized_labels = [self._sanitize_column_name(label) for label in self.mlb.classes_]
        # sklearn convention: return an array of strings
        return np.asarray([f"{prefix}_{label}" for label in sanitized_labels], dtype=object)
```
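The sanitization rule can be sketched standalone (the same regex logic as `_sanitize_column_name`, extracted as a free function for illustration):

```python
import re

def sanitize(name):
    # Replace Delta-invalid characters ( ,;{}()\n\t=) with underscores,
    # collapse runs of underscores, then strip leading/trailing ones
    s = re.sub(r'[ ,;{}()\n\t=]', '_', str(name))
    s = re.sub(r'_+', '_', s)
    return s.strip('_')

print(sanitize("genre (main), v=2"))  # genre_main_v_2
print(sanitize("sci-fi"))             # sci-fi (hyphens are not in the invalid set)
```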
Key Features:
- Automatic conversion of non-list inputs to lists using `hasattr(X, 'tolist')`
- Returns float64 arrays for better pipeline compatibility
- Proper implementation of `get_feature_names_out`: returns feature names based on `self.mlb.classes_`, following sklearn conventions
- Handles the `input_features` parameter to customize the prefix for feature names
- Feature names follow the pattern `{prefix}_{label}` for each label class
Note on `get_feature_names_out` Implementation:
The implementation uses the `input_features` parameter to determine the prefix:
- If `input_features` is `None`, uses `"label"` as the default prefix
- If `input_features` is provided, uses its first feature name as the prefix
- Returns one name per label in `self.mlb.classes_`, in the format `f"{prefix}_{label}"`
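For concreteness, the naming rule can be exercised against `MultiLabelBinarizer` directly (a sketch of the pattern only, independent of the wrapper class):

```python
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer().fit([["a", "b"], ["b"]])

# Default prefix when input_features is None
print([f"label_{c}" for c in mlb.classes_])    # ['label_a', 'label_b']

# First entry of input_features is used as the prefix
prefix = "genres"
print([f"{prefix}_{c}" for c in mlb.classes_])  # ['genres_a', 'genres_b']
```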
Implementation Checklist
- [ ] Create `ds_utils/transformers.py` module with an appropriate docstring
- [ ] Implement `MultiLabelBinarizerTransformer` class with complete docstrings
- [ ] Implement `get_feature_names_out` following sklearn API conventions
- [ ] Add tests for `get_feature_names_out` and `set_output(transform="pandas")`
- [ ] Update `__init__.py` to expose the new module
Example Usage
```python
from ds_utils.transformers import MultiLabelBinarizerTransformer
from sklearn.pipeline import Pipeline
import pandas as pd

# Basic usage
mlb_transformer = MultiLabelBinarizerTransformer()
X = [['sci-fi', 'action'], ['romance'], ['action', 'comedy']]
X_transformed = mlb_transformer.fit_transform(X)

# Get feature names
feature_names = mlb_transformer.get_feature_names_out()
print(list(feature_names))  # ['label_action', 'label_comedy', 'label_romance', 'label_sci-fi']

# In a pipeline with pandas output
pipeline = Pipeline([
    ('mlb', MultiLabelBinarizerTransformer()),
    # other transformers...
])
pipeline.set_output(transform="pandas")
df_transformed = pipeline.fit_transform(X)
print(df_transformed.columns)  # Will show the feature names

# In a full ML pipeline
from sklearn.ensemble import RandomForestClassifier

y = [0, 1, 0]  # example targets, one per sample
full_pipeline = Pipeline([
    ('mlb', MultiLabelBinarizerTransformer()),
    ('classifier', RandomForestClassifier())
])
full_pipeline.fit(X, y)
```
Benefits
This transformer enables:
- Full pipeline compatibility: Works seamlessly with sklearn's modern pipeline infrastructure
- Feature name tracking: Maintains feature names through complex pipelines
- Pandas integration: Compatible with `set_output(transform="pandas")` for DataFrame outputs
- Multi-label classification: Essential for multi-label ML problems where samples have multiple labels
- Feature engineering: Useful for binarizing categorical list data in preprocessing pipelines
References
Technical Notes
Why `get_feature_names_out` is critical:
Modern sklearn pipelines (v1.0+) rely on this method for feature name propagation. Without it, the transformer:
- Cannot be used with `ColumnTransformer` verbose output
- Breaks `set_output(transform="pandas")` functionality
- Prevents downstream feature importance analysis
- Is incompatible with model inspection tools
The implementation should follow sklearn's conventions: return a numpy array of strings, handle the optional `input_features` parameter, and generate meaningful names based on the binarized classes.