Daphne

Overview

Daphne is a Python library for data processing and dataset preparation. It provides tools for loading, cleaning, normalizing, and splitting datasets, and for evaluating them and managing their memory footprint. It is intended for data scientists and engineers who need a consistent, efficient way to handle datasets across projects.

Features

Data Loading: Load data from various formats (CSV, JSON, Parquet, Excel).
Data Processing: Clean, normalize, impute, and encode data.
Dataset Preparation: Split and balance datasets for training, validation, and testing.
Dataset Evaluation: Analyze and visualize data distributions, correlations, and mutual information.
Parallel Processing: Accelerate data processing tasks using concurrent.futures.
Resource Management: Monitor memory usage and offload data to disk when needed.

Requirements

Python 3.10+
pandas, numpy, scikit-learn, matplotlib, seaborn, psutil

Installation

From the latest release

# Run from the directory that contains the downloaded wheel file
pip install Daphne-0.1.0-py3-none-any.whl

From source

git clone https://github.com/Arkonova/Daphne.git
cd Daphne
pip install -e .

Usage

from Daphne import DataLoader, DataProcessor, DatasetPreparator, DatasetEvaluator

# Load the data
df = DataLoader.load_data('data.csv')

# Clean and process the data
processor = DataProcessor()
df = processor.clean_data(df)
df = processor.impute_missing_data(df)
df = processor.normalize_data(df)
df = processor.encode_categorical_data(df, ['category'])

# Split and evaluate the data
preparator = DatasetPreparator()
X_train, X_val, X_test, y_train, y_val, y_test = preparator.split_data(df, 'target')

evaluator = DatasetEvaluator()
evaluator.plot_distribution(df, 'target')
evaluator.check_correlations(df)
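
The six values returned by `split_data` suggest a train/validation/test split. This README does not state the proportions, or whether the split is shuffled or stratified, so the following is only a minimal standard-library sketch of the idea, assuming a shuffled 70/15/15 split. `three_way_split`, its parameters, and its defaults are hypothetical and are not part of Daphne.

```python
import random


def three_way_split(n_rows, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Return shuffled row indices for train, validation, and test subsets.

    Hypothetical helper: the fractions and the seed are illustrative
    assumptions, not Daphne defaults.
    """
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("val_fraction + test_fraction must be in [0, 1)")

    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible

    n_test = round(n_rows * test_fraction)
    n_val = round(n_rows * val_fraction)
    n_train = n_rows - n_val - n_test  # the training set takes the remainder

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = three_way_split(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

With a pandas DataFrame, index lists like these can be passed to `df.iloc[...]` to select each subset, and the target column can then be separated from the features.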

Parallel processing

from Daphne import ParallelProcessor

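def double_values(df):
    # Work on a copy so the partition passed in is not modified in place
    df = df.copy()
    df['feature'] = df['feature'] * 2
    return df

# df is the DataFrame prepared in the Usage example above
parallel_processor = ParallelProcessor()
result = parallel_processor.apply_parallel(df, double_values, num_partitions=4)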
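
The feature list says parallel processing is built on `concurrent.futures`, and the `num_partitions` argument points to a partition, apply, and recombine pattern. The sketch below shows that pattern under stated assumptions: it works on a plain list of row dictionaries rather than a DataFrame, it uses a thread pool, and `make_partitions` and `apply_in_partitions` are hypothetical names. It is not Daphne's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def make_partitions(rows, num_partitions):
    """Split rows into at most num_partitions contiguous, near-equal chunks."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    base, extra = divmod(len(rows), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more row
        if size:
            chunks.append(rows[start:start + size])
        start += size
    return chunks


def apply_in_partitions(rows, func, num_partitions=4):
    """Apply func to each chunk concurrently and concatenate the results."""
    chunks = make_partitions(rows, num_partitions)
    if not chunks:
        return []
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # Executor.map yields results in submission order, so row order is kept.
        results = list(pool.map(func, chunks))
    return [row for chunk in results for row in chunk]


def double_values(chunk):
    # Build new row dicts instead of mutating the input rows
    return [{**row, "feature": row["feature"] * 2} for row in chunk]


rows = [{"feature": i} for i in range(10)]
doubled = apply_in_partitions(rows, double_values, num_partitions=4)
```

Threads give little speed-up for CPU-bound pure-Python functions because of the global interpreter lock. `ProcessPoolExecutor` offers the same `map` interface, but it requires the function and the data to be picklable.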

Memory management

from Daphne import ResourceManager

manager = ResourceManager(max_memory_usage=0.75)
if manager.check_memory():
    manager.free_up_memory(df)
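
The requirements list psutil, and the feature list mentions monitoring memory and offloading data to disk. The sketch below shows one way such a check and offload could work. It assumes `max_memory_usage` is a fraction of total system RAM, which this README does not confirm, and it uses `pickle` for the offload purely as an illustration. `memory_limit_exceeded`, `system_memory_fraction`, and `offload_to_disk` are hypothetical names and are not part of Daphne.

```python
import pickle
import tempfile
from pathlib import Path


def system_memory_fraction():
    """Fraction of system RAM currently in use, read via psutil (third-party)."""
    import psutil  # imported lazily so the rest of the sketch runs without it

    return psutil.virtual_memory().percent / 100.0


def memory_limit_exceeded(max_memory_usage, read_fraction=system_memory_fraction):
    """Return True when the measured usage reaches the configured limit."""
    if not 0 < max_memory_usage <= 1:
        raise ValueError("max_memory_usage must be in (0, 1]")
    return read_fraction() >= max_memory_usage


def offload_to_disk(obj, directory):
    """Pickle obj into a new file under directory and return the file path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    with tempfile.NamedTemporaryFile(
        dir=directory, suffix=".pkl", delete=False
    ) as handle:
        pickle.dump(obj, handle)
    return Path(handle.name)
```

The `read_fraction` parameter exists so the threshold logic can be exercised with a fixed value, for example `memory_limit_exceeded(0.75, lambda: 0.9)`, without depending on the state of the machine.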

Documentation

Full documentation is available in the docs directory, including installation guides, usage examples, and API reference.

License

Daphne is licensed under the Apache 2.0 License. You are free to use, modify, and distribute this software under the terms of this license.
