This is a functional, immutable implementation of the Leapfrog Triejoin algorithm in Clojure.
Most implementations of LFTJ are inherently imperative. The paper is a great read but I needed a version I couldn read and reason about as I worked.
Because the iterators are immutable, there is no need to implement atEnd() or up(), and the data structures fit nicely into the existing ecosystem. (Sequable, IReduceInit, etc.)
Current implementation is tested, documented, aimed to be simple to read and reason about. However, the implementation is also naive, allocation-heavy and the constant factors to the runtime are definitely suboptimal. Optimization for the above was a non-goal, so always choose the right implementation strategy for your use case.
The artifacts are not yet published, but you can use it with deps.edn as a git dependency.
This library defines two protocols: LinearIterator and TrieIterator to define the data the algorithm can work on.
Utility function sorted-iter can be used to produce a LinearIterator from an already sorted collection.
(def iter-1 (sorted-iter (sorted-set 1 2 3 4)))
(def iter-2 (sorted-iter (sorted-set 0 2 4 6 8)))Then, function leapfrog-join produces a new LinearIterator to visit elements present in both collections.
(leapfrog-join [iter-1 iter-2]) ;=> [2 4]Utility function trie-iterator can be used to produce a TrieIterator from a set of tuples.
Since most functions rely on a specific variable ordering to traverse the tries, they accept a configuration map with the following keys:
:variable-orderinga vector specifying the order of variables.:trie-iteratorholding theTrieIteratorintance.
(def trie1 (trie-iterator [[1 2 3] [1 2 4] ...]))
(def t1 {:variable-ordering [:a :b :c] :trie-iterator trie1})- Function
trie-joinconstructs the intersection of such tries using the LFTJ Algorithm. - Function
unionconstructs the iterator of the union of multiple tries as discussed in the LFTJ article. - Function
trie-antijoincomputes the antijoin (set difference) between two tries.
Most functions have a -iterator suffixed version also which directly returns a TrieIterator instead of the map form. There are also some helpers available when working with tries:
- Functions
bfsanddfsreturn a lazy seq of breadth-first and a depth-first walk over the supplied trie. - Function
trie-routesreturns a lazy seq of all paths from the root to the leaves of the trie. - Function
reorderchanges the variable ordering of a trie iterator. - Function
eagerconstructs a view of a trie iterator that skips incomplete branches.
Refer to triejoin_test.clj for common use cases and examples.
Contributions are welcome. Feel free to reach out by opening an Issue here or directly on Clojurians Slack or LinkedIn. When working on this repo, you can:
- Run tests with
clojure -M:test. - Generate API docs in Markdown:
clojure -T:build doc. - Try to improve test coverage.
- Add documentation.
Copyright (c) Janos Erdos. All rights reserved. The use and distribution terms for this software are covered by the Eclipse Public License 2.0 (https://www.eclipse.org/legal/epl-2.0/) which can be found in the file LICENSE.txt at the root of this distribution. By using this software in any fashion, you are agreeing to be bound by the terms of this license. You must not remove this notice, or any other, from this software.