1 change: 1 addition & 0 deletions .gitignore
@@ -127,3 +127,4 @@ dmypy.json

# Pyre type checker
.pyre/
.jekyll-cache
18 changes: 15 additions & 3 deletions Makefile
@@ -1,6 +1,6 @@
# Modified from Makefile of CoverTree
# https://github.com/manzilzaheer/CoverTree
#
# Copyright (c) 2017 Manzil Zaheer All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -24,7 +24,7 @@ CLEAN_PROGS = $(subst $(CURR_DIR)/src/,clean-,$(SOURCES))

CTYPE = gcc

.PHONY: all dir compile $(SOURCES)

all: dir compile py

@@ -45,7 +45,7 @@ dir:
	@echo Setting up directories
	@mkdir -p $(BUILDDIR)
	@mkdir -p dist


compile: $(SOURCES)

@@ -65,3 +65,15 @@ $(PROGS): % : $(CURR_DIR)/src/%/makefile
$(CLEAN_PROGS): clean-% : $(CURR_DIR)/src/%/makefile
	rm -rf build/$(subst clean-,,$@)
	rm -rf dist/$(subst clean-,,$@)


docs-build:
	cp README.md homepage/_includes/
	cd homepage && bundle exec jekyll build -d ../docs/build/html
	sphinx-build -b html docs/source/ docs/build/html/documentation

docs-serve: docs-build
	cd homepage && bundle exec jekyll serve -d ../docs/build/html --skip-initial-build

docs-deploy: docs-build
	cd docs && make gh-deploy
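The `docs-build` target writes two sites into one tree: Jekyll renders the homepage into `docs/build/html`, and Sphinx renders the API docs one level down in `docs/build/html/documentation`, which is why `index.rst` links back to the homepage with `../index.html`. The sketch below lays out the same three steps as a dry run; `docs_build_plan` is a hypothetical helper for illustration only, not part of the repository.

```python
import os
import shlex


def docs_build_plan(repo_root="."):
    """Return the docs-build steps as (working_dir, argv) pairs without running them.

    Hypothetical helper that mirrors the Makefile target: copy README.md into the
    Jekyll includes, build the homepage into docs/build/html, then build the
    Sphinx docs into docs/build/html/documentation.
    """
    homepage = os.path.join(repo_root, "homepage")
    return [
        (repo_root, ["cp", "README.md", "homepage/_includes/"]),
        # "../docs/build/html" is relative to homepage/, i.e. <repo>/docs/build/html
        (homepage, ["bundle", "exec", "jekyll", "build", "-d", "../docs/build/html"]),
        (repo_root, ["sphinx-build", "-b", "html", "docs/source/",
                     "docs/build/html/documentation"]),
    ]


for cwd, argv in docs_build_plan():
    print(f"({cwd}) {shlex.join(argv)}")
```

To execute the steps rather than print them, each pair could be passed to `subprocess.run(argv, cwd=cwd, check=True)`; `make docs-build` remains the intended entry point.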
12 changes: 9 additions & 3 deletions README.md
@@ -29,12 +29,12 @@ make
pip install build
python -m build --wheel
# which can be used as:
# pip install --force dist/graphgrove-0.0.1-cp37-cp37m-linux_x86_64.whl
```

## Examples

Toy examples of [clustering](examples/clustering.py), [DAG-structured clustering](examples/dag_clustering.py), and [nearest neighbor search](examples/nearest_neighbor_search.py) are available.
Toy examples of [clustering](https://github.com/nmonath/graphgrove/blob/main/examples/clustering.py), [DAG-structured clustering](https://github.com/nmonath/graphgrove/blob/main/examples/dag_clustering.py), and [nearest neighbor search](https://github.com/nmonath/graphgrove/blob/main/examples/nearest_neighbor_search.py) are available.

At a high level, incremental clustering can be done as:

@@ -57,7 +57,7 @@ cores=4
tree = gg.graph_builder.Cosine_SGTree(k=k, cores=cores)
# data_batches - generator of numpy matrices mini-batch-size by dim
for batch in data_batches:
tree.insert(batch) # or tree.insert_and_knn(batch)
```
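The loop above assumes a `data_batches` generator of mini-batch-size by dim numpy matrices. A minimal sketch of one, assuming the data already fits in memory as a 2-D array; `iter_batches` and its `batch_size` parameter are hypothetical names, not part of graphgrove.

```python
import numpy as np


def iter_batches(data, batch_size):
    """Yield consecutive row blocks of a 2-D array, each of shape (<= batch_size, dim).

    Hypothetical helper; the README loop only needs some iterable of 2-D numpy arrays.
    """
    data = np.asarray(data)
    if data.ndim != 2:
        raise ValueError("data must be a 2-D array of shape (num_points, dim)")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, data.shape[0], batch_size):
        yield data[start:start + batch_size]


rng = np.random.default_rng(0)
points = rng.standard_normal((10, 5))  # 10 toy points in 5 dimensions
data_batches = iter_batches(points, batch_size=4)
# With graphgrove installed, the README loop would consume these batches:
#   for batch in data_batches:
#       tree.insert(batch)
```

Feeding fixed-size slices keeps memory bounded when the full dataset is streamed from disk instead; the last batch may be smaller than `batch_size`.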

## Algorithms Implemented
@@ -70,3 +70,9 @@ Clustering:
Nearest Neighbor Search:
* CoverTree: Alina Beygelzimer, Sham Kakade, and John Langford. "Cover trees for nearest neighbor." ICML. 2006.
* SGTree: SG-Tree is a new data structure for exact nearest neighbor search, inspired by Cover Tree and its improvement, which has been used in the TerraPattern project. At a high level, SG-Tree builds a hierarchical tree in which each node performs a "coarse" clustering. The centers of these "clusters" become the node's children, and subsequent insertions are performed recursively on these children. When answering an NN query, we prune candidate solutions based on a subset of the dimensions being queried. This is particularly useful when searching for the nearest neighbor in a highly clustered subset of the data, e.g., when the data comes from a recursive mixture of Gaussians or, more generally, a time-marginalized coalescent process. As a result of these two optimizations, our data structure is extremely simple, highly parallelizable, and comparable in performance to existing NN implementations on many data sets. Manzil Zaheer, Guru Guruganesh, Golan Levin, Alexander Smola. [TerraPattern: A Nearest Neighbor Search Service](http://manzil.ml/res/Papers/2019_sgtree.pdf). 2019.
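The SG-Tree description mentions pruning on a subset of the queried dimensions. The sketch below shows only the generic idea behind that kind of pruning, not the SG-Tree algorithm (it uses no tree at all): the squared Euclidean distance over a subset of dimensions is a lower bound on the full squared distance, so a candidate whose partial distance already exceeds the best full distance can be discarded without reading its remaining dimensions. The function name and the `dims_first` parameter are hypothetical.

```python
import numpy as np


def nn_with_partial_pruning(data, query, dims_first):
    """Exact 1-NN under squared Euclidean distance with partial-distance pruning.

    The squared distance restricted to `dims_first` never exceeds the full
    squared distance, so it is a safe lower bound for discarding candidates.
    Returns (best_index, best_sq_dist, num_pruned).
    """
    data = np.asarray(data, dtype=float)
    query = np.asarray(query, dtype=float)
    dims_first = np.asarray(dims_first, dtype=int)
    rest = np.setdiff1d(np.arange(data.shape[1]), dims_first)
    best_idx, best_d, pruned = -1, np.inf, 0
    for i, x in enumerate(data):
        partial = np.sum((x[dims_first] - query[dims_first]) ** 2)
        if partial >= best_d:
            # Lower bound already no better than the incumbent: skip the other dims.
            pruned += 1
            continue
        full = partial + np.sum((x[rest] - query[rest]) ** 2)
        if full < best_d:
            best_idx, best_d = i, full
    return best_idx, best_d, pruned


rng = np.random.default_rng(1)
X = rng.standard_normal((200, 16))
q = rng.standard_normal(16)
idx, d, pruned = nn_with_partial_pruning(X, q, dims_first=np.arange(4))
```

The result is identical to a brute-force scan; only the amount of work changes, and pruning tends to pay off most when the chosen dimensions separate the data well, as in the clustered settings described above.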

## Credits

Special thanks to the following contributors:

- Andrew Drozdov ([@mrdrozdov](https://github.com/mrdrozdov))
24 changes: 24 additions & 0 deletions docs/Makefile
@@ -0,0 +1,24 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

gh-deploy:
	@make html
	@ghp-import build/html -p -o -n

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
24 changes: 24 additions & 0 deletions docs/README.md
@@ -0,0 +1,24 @@
Documentation is built using Sphinx.

```
pip install sphinx
pip install sphinx-rtd-theme # Theme for "Read the Docs".
pip install ghp-import # For publishing to GitHub Pages.
pip install m2r2 # For importing Markdown files (e.g., README.md).
```

Build documentation:

```
sphinx-build -b html docs/source/ docs/build/html/documentation
```

Deploy to GitHub Pages:

```
cd docs && make gh-deploy
```

Additional notes:

- Write docstrings in Google style, which the `sphinx.ext.napoleon` extension enabled in `conf.py` can parse: https://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html#google-vs-numpy
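For reference, a Google-style docstring with the `Args`, `Returns`, `Raises`, and `Example` sections that Napoleon recognizes might look like the following. `batch_count` is a hypothetical helper used only to illustrate the format; it is not part of graphgrove.

```python
def batch_count(num_points, batch_size):
    """Return how many mini-batches are needed to cover ``num_points`` rows.

    Hypothetical helper, shown only to illustrate a Google-style docstring
    that ``sphinx.ext.napoleon`` can render.

    Args:
        num_points (int): Total number of rows to insert.
        batch_size (int): Maximum number of rows per mini-batch.

    Returns:
        int: Number of batches, i.e. the ceiling of num_points / batch_size.

    Raises:
        ValueError: If ``batch_size`` is not positive or ``num_points`` is negative.

    Example:
        >>> batch_count(10, 4)
        3
    """
    if batch_size <= 0 or num_points < 0:
        raise ValueError("batch_size must be positive and num_points non-negative")
    # Ceiling division using floor division on negated values.
    return -(-num_points // batch_size)
```

With `sphinx.ext.autodoc` and `sphinx.ext.napoleon` both listed in `extensions`, an `autoclass` or `autofunction` entry renders these sections as formatted parameter and return fields.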
35 changes: 35 additions & 0 deletions docs/make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

if "%1" == "" goto help

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.https://www.sphinx-doc.org/
exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
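`make.bat` first resolves `SPHINXBUILD` (defaulting to `sphinx-build`) and then runs it, treating `errorlevel 9009` as "command not found". A cross-platform sketch of the same lookup in Python, assuming it is acceptable to probe `PATH` with `shutil.which` instead of executing the command; `find_sphinx_build` is a hypothetical name.

```python
import os
import shutil
import sys


def find_sphinx_build(env=None):
    """Return the full path of the sphinx-build executable, or None if not found.

    Mirrors the lookup in make.bat: a non-empty SPHINXBUILD variable wins,
    otherwise the plain "sphinx-build" command is used. shutil.which searches
    the current process's PATH (or checks the file directly if the command
    contains a directory); unlike make.bat, nothing is executed.
    """
    env = os.environ if env is None else env
    cmd = env.get("SPHINXBUILD") or "sphinx-build"
    return shutil.which(cmd)


if __name__ == "__main__":
    if find_sphinx_build() is None:
        sys.exit(
            "The 'sphinx-build' command was not found. Install Sphinx "
            "(https://www.sphinx-doc.org/) or set SPHINXBUILD to its full path."
        )
```

A `PATH` probe avoids launching a process just to detect its absence, but it cannot catch an installed `sphinx-build` that fails at startup, which the batch file's approach would surface.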
65 changes: 65 additions & 0 deletions docs/source/conf.py
@@ -0,0 +1,65 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.

import os
import sys

# Adding missing paths.
modules_path = os.path.abspath('../../.')
print(f'ADDING PATH: {modules_path}')
sys.path.insert(0, modules_path)

# Mock tricky modules.
autodoc_mock_imports = ['covertreec', 'llamac', 'sccc', 'sgtreec']


# -- Project information -----------------------------------------------------

project = 'graphgrove'
copyright = '2021, Nicholas Monath'
author = 'Nicholas Monath'

# The full version, including alpha/beta/rc tags
release = '0.0.11'


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['m2r2', 'sphinx.ext.napoleon', 'sphinx.ext.autodoc']

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

root_doc = 'index'

html_extra_path = []
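`autodoc_mock_imports` lets autodoc import `graphgrove` without the listed modules (presumably the compiled C extensions) being built. Sphinx ships its own mock machinery for this in `sphinx.ext.autodoc.mock`; the sketch below uses `unittest.mock` only to illustrate the idea of placeholder modules and is not meant to be added to `conf.py`, where the setting already handles it. `install_stub_modules` is a hypothetical name.

```python
import sys
from unittest import mock


def install_stub_modules(names):
    """Put a MagicMock into sys.modules for each name that is not already imported.

    Afterwards `import <name>` returns the stub, and any attribute access on it
    (e.g. sgtreec.some_function) yields another mock instead of failing.
    """
    for name in names:
        sys.modules.setdefault(name, mock.MagicMock(name=name))


# Same module names as autodoc_mock_imports in conf.py.
install_stub_modules(["covertreec", "llamac", "sccc", "sgtreec"])

import sgtreec  # the stub, unless the real extension was imported earlier

print(type(sgtreec).__name__)
```

Because every attribute of a stub is itself a mock, module-level code such as `from sgtreec import X` succeeds, which is what lets autodoc read the pure-Python classes and their docstrings.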
84 changes: 84 additions & 0 deletions docs/source/index.rst
@@ -0,0 +1,84 @@
.. graphgrove documentation master file, created by
   sphinx-quickstart on Mon Oct 4 14:13:47 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to graphgrove's documentation!
======================================

A framework for building (and incrementally growing) graph-based data structures used in hierarchical or DAG-structured clustering and nearest neighbor search.

Links
=====

`Github Project <https://github.com/nmonath/graphgrove>`_

`Project Home <../index.html>`_

Package Classes
===============

.. autoclass:: graphgrove.covertree.Node
   :members:

.. autoclass:: graphgrove.covertree.NNS_L2
   :members:

.. autoclass:: graphgrove.covertree.MIPS
   :members:

.. autoclass:: graphgrove.covertree.MCSS
   :members:

.. autoclass:: graphgrove.graph_builder.Index
   :members:

.. autoclass:: graphgrove.graph_builder.Cosine_CoverTree
   :members:

.. autoclass:: graphgrove.graph_builder.Cosine_SGTree
   :members:

.. autoclass:: graphgrove.graph_builder.Cosine_SGTreeBeam
   :members:

.. autoclass:: graphgrove.graph_builder.Cosine_FaissFlat
   :members:

.. autoclass:: graphgrove.graph_builder.Cosine_FaissHNSW
   :members:

.. autoclass:: graphgrove.llama.LLAMA
   :members:

.. autoclass:: graphgrove.scc.Node
   :members:

.. autoclass:: graphgrove.scc.Level
   :members:

.. autoclass:: graphgrove.scc.SCC
   :members:

.. autoclass:: graphgrove.sgtree.Node
   :members:

.. autoclass:: graphgrove.sgtree.NNS_L2
   :members:

.. autoclass:: graphgrove.sgtree.MIPS
   :members:

.. autoclass:: graphgrove.sgtree.MCSS
   :members:

.. autoclass:: graphgrove.vec_scc.Cosine_SCC
   :members:


Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`