About Static NG-DBSCAN

Inside "Static NG-DBSCAN Code" folder, all the files to run are present.
While plotting 2 and 3 dimensional datasets, try changing epsilon values to get better cluster quality.
Images folder contains some of the clusters generated.
For synthetic datasets, we used moon, blob and circular datasets.

Instructions to run the Static NG-DBSCAN Code:

Generate Synthetic Datasets

- Use dataset_generator.py. 

- Enter number of points (or number of sentences in case of text dataset) and type of dataset.(use upto 10000 for fast running).

- Run the following command and the dataset will be generated in points.txt file:
	- python3 dataset_generator.py

Run the Static NG-DBSCAN Algorithm

- Run the main algorithm (NG-DBSCAN) using command: 
	- g++ phase2.cpp
	- ./a.out

- It will create clusters.txt which contains all the clusters. 

- Now to plot the clusters (for 2 and 3 dimensional datasets), use the following command:
	- python3 clusters_generator.py

Calculate metrics and compare results of NG-DBSCAN and DBSCAN

- We calculated these NG-DBSCAN metrics for comparison with DBSCAN: compactness, separation, recall. Use the following command for calculating NG-DBSCAN metrics. 
	- g++ ngdbscan_metrics_main.cpp
	- ./a.out

- Run DBSCAN algorithm (for comparison with NG-DBSCAN) over points.txt using dbscan_code.py using the following command:
	- python3 dbscan_code.py

- We calculated the DBSCAN metrics: compactness, separation, recall. Use the following command for calculating DBSCAN metrics.
	- g++ dbscan_metrics_main.cpp
	- ./a.out

- Using this we can compare between DBSCAN and NG-DBSCAN.

Code Files for Static NG-DBSCAN

- Here are the different types of files used in static NG-DBSCAN:

Header Files

- classes.h - contains all the used classes in the algorithm.

C++ Files

- phase1.cpp - contains the phase-1 code which will be used to create epsilon graph.

- phase2.cpp - used to create the propagation tree and list of clusters.

- ngdbscan_metric_main.cpp - main method to calculate metrics for NG-DBSCAN.

- metric_functions.cpp - contains Functions to contain metrics.

- dbscan_metrics_main.cpp - main method to calculate metrics for DBSCAN.

- jaro_winkler_distance.cpp: it calculates Jaro Winkler Distance between two strings.

Python Files

- dbscan_code.py - main method to create the DBSCAN clusters and numbered_dbscan_clusters.txt file.

- dataset_generator.py - contains the python code to generate the random points.

- clusters_generator.py - plots the clusters in 2-dimension in different colours using clusters.

Text Files

- epsilon_graph.txt - represents the epsilon graph used in the algorithm.

- propagation_tree.txt - represents the propagation tree generated by the algorithm.

- points.txt - contains the randomly generated points used as input in phase2.cpp.

- clusters.txt - contains the lists of list of clusters.

- numbered_clusters.txt - contains the lists of list of clusters, where elements of list are node number instead of data values.

- numbered_dbscan_clusters.txt - contains all the points of DBSCAN, where ith point represents in which cluster number it lies.

- health_twitter.txt & sms_spam_collection.txt - text datasets

- Traffic_Dataset/points.txt - real dataset

Clusters Generated With 3-D Points

About Incremental NG-DBSCAN

Inside "Incremental NG-DBSCAN Code" folder, all the files to run are present.
Try to tune the parameters while running on different datasets to get more accuracy.
Inside each dataset of 2 dimension, images of cluster plotting are also present.
For synthetic datasets, we used moon, blob and circular datasets.
For clusters generation wait for 4 seconds for the image to disappear.

Instructions to run the Incremental NG-DBSCAN Code:

To run all datasets in automated version:

Automated Running

- Run the following command in terminal:
	- make

To run each dataset seperately:

Follow the below rules step wise for each dataset to get the results.

Generate and Plot the already existing dataset (which is considered as database in real life)

- First put the points.txt, queries.txt, parameters_incr, parameters_static for the dataset you want the algorithm to run on.

- Then first run the static version without considering the queries.txt using the following command. Here First argument = 1 to copy the value of parameters. Second argument = 0 to indicate that it will not use queries.txt. 
	- g++ static_ngdbscan.cpp 
	- ./a.out 1 0

- It will create old_epsilon_graph.txt, old_clusters.txt, old_points.txt and this will constitute our database.

- To plot the clusters for the points present in database, we will run the following command. Here <image_name> refers to the image name for saving that image.
	- python3 clusters_generator.py 0 <image_name>

Run and Plot the Incremental NG-DBSCAN algorithm over the existing dataset

- Now to run the queries.txt over incremental version, use the following command. Here First argument = 1 to copy the value of parameters.
	- g++ incr_ngdbscan.cpp 
	- ./a.out 1

- After this, new_epsilon_graph.txt, new_clusters.txt, new_points.txt will be generated.

- To plot the clusters for the points after considering queries.txt, we will run the following command. Here <image_name> refers to the image name for saving that image.
	- python3 clusters_generator.py 1 <image_name>

Run and Plot the Static NG-DBSCAN algorithm over the existing dataset (for comparison with incremental version)

- Now to run the queries.txt over incremental version (for comparison with incremental version), use the following command. Here First argument = 1 to copy the value of parameters. Second argument = 1 to indicate that it will use queries.txt. 
	- g++ static_ngdbscan.cpp
	- ./a.out 1 1

- After this, new_epsilon_graph.txt, new_clusters.txt, new_points.txt will be generated.

- To plot the clusters for the points after considering queries.txt, we will run the following command:
	- python3 clusters_generator.py 1

Calculate metrics on Incremental and Static Versions for comparison

- To calculate and store metrics on the static version, we will use the following command:
	- g++ metrics_main.cpp 
	- ./a.out 0

- To calculate and store metrics on the incremental version, we will use the following command:
	- g++ metrics_main.cpp 
	- ./a.out 1

Code Files for Incremental NG-DBSCAN

- Here are the different types of files used in incremental NG-DBSCAN:

Header Files

- incr_classes.h - contains all used classes in the incremental NG-DBSCAN algorithm.

C++ Files

- incr_ngdbscan.cpp - contains the whole incremental NG-DBSCAN algorithm code.

- static_ngdbscan.cpp - contains the main function for running the static NG-DBSCAN code while comparing with static NG-DBSCAN algorithm.

- metrics_main.cpp - main method to calculate metrics for incremental and static NG-DBSCAN.

- metrics_functions.cpp - contains all the main functions to calculate metrics.

- resources_calculation.cpp - contains the main functions to calculate time, memory, CPU usage etc for static and incremental versions.

Python Files

- clusters_generator.py - used to plot the clusters of static and incremental NG-DBSCAN versions.

- plot_metric.py - used to plot the graphs of the used metrics.

Text Files

- points.txt - contains the database points (clusters would be formed using these points in real scenario) used as input in static_ngdbscan.cpp and incr_ngdbscan.cpp.

- queries.txt - contains the points to add or delete from the existing dataset (points.txt).

- old_epsilon_graph.txt - contains the epsilon graph for the old dataset.

- new_epsilon_graph.txt - contains the epsilon graph for the updated dataset (after considering queries.txt).

- old_points.txt - contains the information of each point in the old dataset.

- new_points.txt - contains the information of each point in the updated dataset (after considering queries.txt).

- old_clusters.txt - contains the clusters for the old dataset.

- new_clusters.txt - contains the clusters for the new dataset (after considering queries.txt).

- parameters.txt - contains the parameters for the respective dataset for both static & incremental algorithm in readable format.

- parameters_incr - contains the parameters for the respective dataset for the incremental NG-DBSCAN algorithm.

- parameters_static - contains the parameters for the respective dataset for the static NG-DBSCAN algorithm.

- memory*.txt - Files of this format contains in each line: <no_of_points> <virtual_memory_used> <physical_memory_used>.

- time*.txt - Files of this format contains in each line: <no_of_points> <time_taken>.

- cpu*.txt - Files of this format contains in each line: <no_of_points> <cpu_usage>.

- seperation*.txt - These files contains the number of points on the first line and then mean and standard deviation on the subsequent line for all pair of clusters.

- compactness*.txt - These files contains the number of points on the first line and then mean and standard deviation on the subsequent line for all pair of clusters.

Clusters Generated With 3-D Points

Initial clustering of dataset points:
Final clustering after applying queries through static version:
Final clustering after applying queries through incremental version:

Name		Name	Last commit message	Last commit date
Latest commit History 221 Commits
Course Reports		Course Reports
Incremental NG-DBSCAN Code		Incremental NG-DBSCAN Code
Related Research Papers		Related Research Papers
Static NG-DBSCAN Code		Static NG-DBSCAN Code
README.md		README.md
mail_to_authors.txt		mail_to_authors.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About Static NG-DBSCAN

Instructions to run the Static NG-DBSCAN Code:

Clusters Generated With 3-D Points

About Incremental NG-DBSCAN

Instructions to run the Incremental NG-DBSCAN Code:

To run all datasets in automated version:

To run each dataset seperately:

Clusters Generated With 3-D Points

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

About Static NG-DBSCAN

Instructions to run the Static NG-DBSCAN Code:

Clusters Generated With 3-D Points

About Incremental NG-DBSCAN

Instructions to run the Incremental NG-DBSCAN Code:

To run all datasets in automated version:

To run each dataset seperately:

Clusters Generated With 3-D Points

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages