Awesome Lucene
Recently, we had an issue in one of our production environments that made the UI unusable for one of our main use cases: the auto-complete was stalling for a long time and not returning any data. We couldn't reproduce the exact issue in our test environment because the amount of data we store and process in production is 100x larger. To reproduce the bug effectively, I had to simulate the situation with a test. In this multi-part series, we shall dive into the awesome world of Lucene!
The autocomplete code in the API uses a PATRICIA (Practical Algorithm to Retrieve Information Coded in Alphanumeric) trie implementation to store the data and return results. A PATRICIA trie, also known as a radix trie, is a compressed trie in which any node that is an only child is merged with its parent. The first step was to recreate a version of the autocomplete implementation to isolate the problem.
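Before diving into the cache loader, here is a rough, stdlib-only sketch of what a prefix lookup gives us. The `TreeMap` here is just an illustrative stand-in (the production code uses Apache Commons Collections' `PatriciaTrie`): every key sharing a prefix occupies a contiguous range of a sorted map, which is the same property a compressed trie exploits.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class PrefixLookupSketch {
    // Emulates a trie's prefixMap with a TreeMap: every key that starts
    // with `prefix` sorts inside the range [prefix, prefix + MAX_VALUE).
    static NavigableMap<String, String> prefixMap(TreeMap<String, String> map, String prefix) {
        return map.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    public static void main(String[] args) {
        TreeMap<String, String> map = new TreeMap<>();
        for (String ip : new String[]{"10.0.0.1", "10.0.0.2", "192.168.1.1"}) {
            map.put(ip, ip);
        }
        // Keys starting with "10." — the 192.168.* entry is excluded
        System.out.println(prefixMap(map, "10.").keySet()); // prints [10.0.0.1, 10.0.0.2]
    }
}
```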
```java
private class AutoCompleteCacheLoader extends CacheLoader<String, PatriciaTrie<String>> {

    @Override
    public PatriciaTrie<String> load(String field) {
        return loadAll(field);
    }

    PatriciaTrie<String> loadAll(String field) {
        return loadTrie(field);
    }

    PatriciaTrie<String> loadTrie(String field) {
        PatriciaTrie<String> trie = new PatriciaTrie<String>();
        // load 10M items into the trie
        for (int i = 0; i <= 10000000; i++) {
            switch (field.toLowerCase()) {
                case "ipaddress":
                    String ip = randomIpAddress();
                    trie.put(ip, ip);
                    break;
                case "hostname":
                    String hostName = RandomStringUtils.randomAlphanumeric(10);
                    trie.put(hostName, hostName);
                    break;
                case "username":
                    String userName = RandomStringUtils.randomAlphanumeric(10);
                    trie.put(userName, userName);
                    break;
                case "macaddress":
                    String mac = randomMACAddress();
                    trie.put(mac, mac);
                    break;
                default:
                    throw new IllegalArgumentException("Unknown field name: " + field);
            }
        }
        return trie;
    }

    private String randomIpAddress() {
        Random rand = new Random();
        return String.format("%d.%d.%d.%d", rand.nextInt(256), rand.nextInt(256), rand.nextInt(256), rand.nextInt(256));
    }

    private String randomMACAddress() {
        Random rand = new Random();
        byte[] macAddr = new byte[6];
        rand.nextBytes(macAddr);
        macAddr[0] = (byte) (macAddr[0] & (byte) 254); // clear the multicast bit so the address is unicast
        StringBuilder sb = new StringBuilder(18);
        for (byte b : macAddr) {
            if (sb.length() > 0)
                sb.append(":");
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    @Override
    public ListenableFuture<PatriciaTrie<String>> reload(final String key, PatriciaTrie<String> prevGraph) {
        // reload asynchronously so readers keep getting the old trie in the meantime
        ListenableFutureTask<PatriciaTrie<String>> task = ListenableFutureTask.create(new Callable<PatriciaTrie<String>>() {
            @Override
            public PatriciaTrie<String> call() throws Exception {
                return loadTrie(key);
            }
        });
        Executors.newSingleThreadExecutor().execute(task);
        return task;
    }
}
```

Now, I need to create a test that exercises this code and instruments it to measure the retrieval time. For example:
```java
private List<String> getAllCopy(String incomingString, String fieldName, LoadingCache<String, PatriciaTrie<String>> cache)
        throws Exception {
    List<String> list = Lists.newArrayList();
    fieldName = fieldName.toLowerCase();
    PatriciaTrie<String> trie = cache.get(fieldName);
    Map<String, String> map = trie.prefixMap(incomingString.toLowerCase());
    list.addAll(map.values());
    return list;
}

@Test
public void testTrieAllCopy() throws Exception {
    // use the lowercased key, matching what getAllCopy looks up
    PatriciaTrie<String> trie = cache.get("ipaddress");
    assertNotNull(trie);
    assertNotEquals(0, trie.size());
    long start = System.nanoTime();
    Collection<String> all = getAllCopy("1", "ipAddress", cache);
    long stop = System.nanoTime();
    System.out.println("Time taken for fetching " + all.size() + " records: " + (stop - start) / 1000 + " micro secs (No Priming)");
}
```

Running the test reproduced the problem we were experiencing in production.
The code was not priming the trie before measuring the speed, and measuring performance that way is usually considered bad practice. So I added code to measure the average time taken for fetching the data after priming.
```java
@Test
public void testTrieAllCopy() throws Exception {
    // use the lowercased key, matching what getAllCopy looks up
    PatriciaTrie<String> trie = cache.get("ipaddress");
    assertNotNull(trie);
    assertNotEquals(0, trie.size());
    // first fetch primes the trie; timed separately
    long start = System.nanoTime();
    Collection<String> all = getAllCopy("1", "ipAddress", cache);
    long stop = System.nanoTime();
    System.out.println("Time taken for fetching " + all.size() + " records: " + (stop - start) / 1000 + " micro secs (No Priming)");
    start = System.nanoTime();
    for (int i = 0; i < 100; i++) {
        all = getAllCopy("1", "ipAddress", cache);
    }
    stop = System.nanoTime();
    assertNotEquals(0, all.size());
    System.out.println("Average time taken for fetching " + all.size() + " records: " + (stop - start) / (1000 * 100) + " micro secs (Priming)");
}
```

Running the test after priming and averaging the retrieval time yielded nearly identical results, as shown below. So something in the code itself must be contributing to the issue.
| Iteration | Copy (micro secs) | Copy (Priming) (micro secs) |
|---|---|---|
| 1 | 1602699 | 1677258 |
| 2 | 1609670 | 1663935 |
| 3 | 1519785 | 1638693 |
| 4 | 1587678 | 1713244 |
| 5 | 1521048 | 1705601 |
| 6 | 1552100 | 1653090 |
| 7 | 1571336 | 1661922 |
| 8 | 1674482 | 1571964 |
| 9 | 2360497 | 1759636 |
| 10 | 1700609 | 1830021 |
Inspecting the code more closely, I noticed a copy operation that might be responsible for some of the time taken. What if the copy operation were eliminated? What would the performance look like?
```java
private Collection<String> getAllNoCopy(String incomingString, String fieldName, LoadingCache<String, PatriciaTrie<String>> cache)
        throws Exception {
    fieldName = fieldName.toLowerCase();
    PatriciaTrie<String> trie = cache.get(fieldName);
    Map<String, String> map = trie.prefixMap(incomingString.toLowerCase());
    return map.values();
}

@Test
public void testTrieNoCopy() throws Exception {
    // use the lowercased key, matching what getAllNoCopy looks up
    PatriciaTrie<String> trie = cache.get("ipaddress");
    assertNotNull(trie);
    assertNotEquals(0, trie.size());
    // first fetch primes the trie; timed separately
    long start = System.nanoTime();
    Collection<String> all = getAllNoCopy("1", "ipAddress", cache);
    long stop = System.nanoTime();
    System.out.println("Time taken for fetching " + all.size() + " records: " + (stop - start) / 1000 + " micro secs (No Priming)");
    start = System.nanoTime();
    for (int i = 0; i < 100; i++) {
        all = getAllNoCopy("1", "ipAddress", cache);
    }
    stop = System.nanoTime();
    assertNotEquals(0, all.size());
    System.out.println("Average time taken for fetching " + all.size() + " records: " + (stop - start) / (1000 * 100) + " micro secs (Priming)");
}
```

Running the above test produced the following results.
| Iteration | Copy (micro secs) | Copy (Priming) (micro secs) | No Copy (micro secs) | No Copy (Priming) (micro secs) |
|---|---|---|---|---|
| 1 | 1602699 | 1677258 | 4622 | 13 |
| 2 | 1609670 | 1663935 | 3633 | 13 |
| 3 | 1519785 | 1638693 | 2611 | 13 |
| 4 | 1587678 | 1713244 | 5157 | 12 |
| 5 | 1521048 | 1705601 | 3287 | 12 |
| 6 | 1552100 | 1653090 | 1738 | 12 |
| 7 | 1571336 | 1661922 | 1618 | 13 |
| 8 | 1674482 | 1571964 | 3767 | 15 |
| 9 | 2360497 | 1759636 | 10579 | 14 |
| 10 | 1700609 | 1830021 | 16134 | 12 |
| Average | 1669990.4 | 1687536.4 | 5314.6 | 12.9 |
OK, the average time taken for retrieval has dropped from roughly 1.7 seconds to about 5 milliseconds! A good start. But the nature of auto-complete is such that even the first invocation of the method must return within 500 ms.
Furthermore, it turns out that copying the data into a different structure before returning accounts for most of the cost of the first invocation. For every subsequent invocation the difference is dramatic: roughly 1.7 seconds with the copy versus about 13 microseconds without. It appears that `PatriciaTrie` is handing back its internal data rather than rebuilding it on every call.
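This is because `prefixMap` in Apache Commons Collections returns a live view backed by the trie rather than a fresh collection, so almost no work happens until the caller iterates. A stdlib sketch of the view-versus-copy distinction, using `TreeMap.subMap` as an illustrative stand-in for the trie:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class ViewVsCopy {
    // Builds a sorted map of n keys: "key0" .. "key(n-1)"
    static TreeMap<String, String> sampleData(int n) {
        TreeMap<String, String> data = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            data.put("key" + i, "key" + i);
        }
        return data;
    }

    public static void main(String[] args) {
        TreeMap<String, String> data = sampleData(100_000);

        // View: cheap to create, backed by the original map, no copying
        SortedMap<String, String> view = data.subMap("key1", "key2");

        // Copy: walks and duplicates every matching entry up front
        List<String> copy = new ArrayList<>(view.values());

        // A later insertion shows up in the view but not in the copy
        data.put("key1zzz", "key1zzz");
        System.out.println("view: " + view.size() + ", copy: " + copy.size());
    }
}
```

The same trade-off applies to the trie: `getAllNoCopy` returns the cheap view, while `getAllCopy` pays to materialize every matching value on every call.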
The other issue with this particular PATRICIA trie implementation is that there is no way to fetch only the top 10 results. So we had to find another way. Enter Lucene!
Apache Lucene™ is a high-performance, full-featured text search engine library written entirely in Java. Lucene is the storage engine underneath full-text search engines such as Elasticsearch and Solr, and it provides easy-to-use APIs for storing and retrieving data in high-performance search applications.
To build a Lucene index in memory, all we have to do is:
- Create a `Directory`
- Open an `IndexWriter`
- Load the strings into the index

It's that simple!
```java
private long initLuceneIndex() throws Exception {
    Analyzer analyzer = new StandardAnalyzer();
    // store the index in memory
    Directory directory = new RAMDirectory();
    IndexWriterConfig config = new IndexWriterConfig(analyzer);
    IndexWriter iwriter = new IndexWriter(directory, config);
    long numDocs = loadLuceneIndex(iwriter);
    System.out.println("Number of docs inserted: " + numDocs);
    iwriter.close();
    return numDocs;
}
```

To search the Lucene index, all we have to do is:
- Open the directory for reading using `DirectoryReader.open`
- Create an `IndexSearcher`
- Create a query
- Search the index

Can it get any simpler? I don't think so.
```java
private void searchIndex(Directory directory) throws Exception {
    DirectoryReader ireader = DirectoryReader.open(directory);
    System.out.println("Number of docs read: " + ireader.numDocs());
    IndexSearcher isearcher = new IndexSearcher(ireader);
    // create the query for searching the index;
    // here we search for all ipAddress values containing "1"
    RegexpQuery regexpQuery = new RegexpQuery(new Term("ipAddress", ".*" + "1" + ".*"));
    long start = System.nanoTime();
    ScoreDoc[] hits = isearcher.search(regexpQuery, 10).scoreDocs;
    long stop = System.nanoTime();
    for (ScoreDoc hit : hits) {
        Document hitDoc = isearcher.doc(hit.doc);
        System.out.println("Doc: " + hitDoc.toString());
        assertTrue(hitDoc.get("ipAddress").contains("1"));
    }
    System.out.println("Search took " + (stop - start) / 1000 + " micro secs");
    ireader.close();
}
```

(Note that a `RegexpQuery` with a leading wildcard has to examine many terms; for true prefix matches, Lucene's `PrefixQuery` is much cheaper.) Here are the supporting methods for loading the Lucene index.
```java
private long loadLuceneIndex(IndexWriter writer) throws Exception {
    long numDocs = 0;
    for (int i = 0; i <= 10000000; i++) {
        String ip = randomIpAddress();
        String hostName = RandomStringUtils.randomAlphanumeric(10);
        String macAddress = randomMACAddress();
        String userName = RandomStringUtils.randomAlphanumeric(10);
        Document doc = new Document();
        doc.add(new Field("ipAddress", ip, StringField.TYPE_STORED));
        doc.add(new Field("hostName", hostName, StringField.TYPE_STORED));
        doc.add(new Field("macAddress", macAddress, StringField.TYPE_STORED));
        doc.add(new Field("userName", userName, StringField.TYPE_STORED));
        writer.addDocument(doc);
        numDocs++;
    }
    return numDocs;
}
```

`randomIpAddress` and `randomMACAddress` are the same helpers shown earlier in the cache loader.
So far so good, but are the performance numbers what we're after? Let's run a performance test.
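A minimal timing harness for collecting those numbers might look like the sketch below (`averageMicros` is a hypothetical helper, not from the codebase above; for serious measurement a framework like JMH is the better choice, since naive `nanoTime` loops are skewed by JIT warm-up and GC):

```java
import java.util.function.Supplier;

public class MicroTimer {
    // Runs the action `warmup` times untimed, then averages `iters` timed runs.
    static long averageMicros(Supplier<?> action, int warmup, int iters) {
        for (int i = 0; i < warmup; i++) {
            action.get();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            action.get();
        }
        return (System.nanoTime() - start) / (1_000L * iters);
    }

    public static void main(String[] args) {
        // time a cheap stand-in action; in the real test this would be the search call
        long avg = averageMicros(() -> String.format("%d.%d.%d.%d", 10, 0, 0, 1), 1_000, 10_000);
        System.out.println("average: " + avg + " micro secs");
    }
}
```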
TODO add the performance numbers
Can we do better? Yes, of course. In the next part of this blog series, we'll talk about ways to extract even better performance.