Daphne is a Python library designed to streamline data processing and preparation tasks. It includes tools for loading, cleaning, normalizing, and splitting datasets, as well as evaluating and managing them. Daphne is intended for data scientists and engineers who need a consistent and efficient way to handle datasets in their projects.
- Data Loading: Load data from various formats (CSV, JSON, Parquet, Excel).
- Data Processing: Clean, normalize, impute, and encode data with ease.
- Dataset Preparation: Split and balance datasets for training, validation, and testing.
- Dataset Evaluation: Analyze and visualize data distributions, correlations, and mutual information.
- Parallel Processing: Accelerate data processing tasks using concurrent.futures.
- Resource Management: Monitor memory usage and offload data to disk when needed.
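
To make the Parallel Processing feature above more concrete, here is a minimal sketch of partitioned processing built directly on concurrent.futures and pandas. It is not Daphne's implementation: the helper name `apply_in_partitions` is hypothetical, and a thread pool is assumed only so the snippet runs in any environment (a process pool is the more common choice for CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def apply_in_partitions(df, func, num_partitions=4):
    """Split df row-wise, apply func to each partition concurrently, and concatenate.

    Hypothetical helper for illustration; not part of Daphne's API.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    # Ceiling division so every row lands in exactly one partition.
    size = max(1, -(-len(df) // num_partitions))
    # .copy() gives each worker its own frame, so in-place edits stay isolated.
    parts = [df.iloc[i:i + size].copy() for i in range(0, len(df), size)]
    if not parts:
        # Empty input: nothing to split, so apply func once to a copy.
        return func(df.copy())
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        # Executor.map returns results in the same order as the inputs.
        results = list(pool.map(func, parts))
    return pd.concat(results)


def double_values(part):
    part["feature"] = part["feature"] * 2
    return part


df = pd.DataFrame({"feature": range(10)})
result = apply_in_partitions(df, double_values, num_partitions=4)
```

Because each partition is copied before it is handed to a worker, the original `df` is left unchanged and only `result` carries the doubled values.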
- Python 3.10+
- pandas, numpy, scikit-learn, matplotlib, seaborn, psutil
From the latest release
pip install Daphne-0.1.0-py3-none-any.whl

From source
git clone https://github.com/Arkonova/Daphne.git
cd Daphne
pip install -e .

from Daphne import DataLoader, DataProcessor, DatasetPreparator, DatasetEvaluator
# Load the data
df = DataLoader.load_data('data.csv')
# Clean and process the data
processor = DataProcessor()
df = processor.clean_data(df)
df = processor.impute_missing_data(df)
df = processor.normalize_data(df)
df = processor.encode_categorical_data(df, ['category'])
# Split and evaluate the data
preparator = DatasetPreparator()
X_train, X_val, X_test, y_train, y_val, y_test = preparator.split_data(df, 'target')
evaluator = DatasetEvaluator()
evaluator.plot_distribution(df, 'target')
evaluator.check_correlations(df)

from Daphne import ParallelProcessor
# Example transformation to run in parallel
def double_values(df):
    df['feature'] = df['feature'] * 2
    return df
processor = ParallelProcessor()
result = processor.apply_parallel(df, double_values, num_partitions=4)

from Daphne import ResourceManager
manager = ResourceManager(max_memory_usage=0.75)
if manager.check_memory():
    manager.free_up_memory(df)

Full documentation is available in the docs directory, including installation guides, usage examples, and API reference.
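
What check_memory() and free_up_memory() do internally is not described here. As a rough illustration of the idea behind a threshold such as max_memory_usage=0.75, the sketch below compares system RAM usage (read with psutil) against a fraction and pickles an object to disk when the limit is reached. The function names and the pickle-based offload are assumptions made for this example, not Daphne's API.

```python
import pickle
import tempfile
from pathlib import Path

import psutil  # third-party; listed among Daphne's dependencies


def memory_limit_exceeded(max_memory_usage: float) -> bool:
    """Return True when system RAM usage is at or above the given fraction."""
    used_fraction = psutil.virtual_memory().percent / 100.0
    return used_fraction >= max_memory_usage


def offload_to_disk(obj, directory: str) -> Path:
    """Pickle obj into directory and return the file path (hypothetical helper)."""
    path = Path(directory) / "offloaded.pkl"
    with path.open("wb") as fh:
        pickle.dump(obj, fh)
    return path


def reload_from_disk(path: Path):
    """Load a previously offloaded object back into memory."""
    with path.open("rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    data = {"feature": [1, 2, 3]}
    # Offload only when more than 75% of system memory is in use.
    if memory_limit_exceeded(0.75):
        saved_path = offload_to_disk(data, tmp)
        data = reload_from_disk(saved_path)
```

A plain dictionary stands in for the dataset here; a pandas DataFrame can be pickled the same way.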
Daphne is licensed under the Apache 2.0 License. You are free to use, modify, and distribute this software under the terms of this license.
