Conversation
|
Hi @dpol1 thanks for the PR ! Can you check the pipeline status? Guess it needs auto formatting. Thanks |
|
Oops, my bad 😅 |
956a76f to
721d892
Compare
rzo1
left a comment
There was a problem hiding this comment.
For me, it looks good but I am not deep into SOLR.
|
Thanks for the PR, @dpol1 |
| "--name", | ||
| collectionName, | ||
| "-n", | ||
| "--conf-name", |
There was a problem hiding this comment.
Some options (like --conf-name) are not documented in the Solr Control Script Reference, but are shown when running bin/solr create --help so I guess are ok to use.
external/solr/src/main/java/org/apache/stormcrawler/solr/SolrConnection.java
Outdated
Show resolved
Hide resolved
external/solr/src/main/java/org/apache/stormcrawler/solr/SolrConnection.java
Outdated
Show resolved
Hide resolved
- Drop pseudo-async CompletableFuture.runAsync() around blocking cloudClient.request(). - Use NIO-based LBAsyncSolrClient.requestAsync() for Solr 10. - Route writes via LBSolrClient.Endpoint built from leader baseUrl/coreName. - Access CloudHttp2SolrClient#getLbClient() via explicit cast, with a note on potential SolrJ API changes.
|
Thanks for the changes @dpol1. Unfortunately I'm currently sick, so I'll be able to review earliest tomorrow. |
external/solr/src/main/java/org/apache/stormcrawler/solr/SolrConnection.java
Outdated
Show resolved
Hide resolved
external/solr/src/main/java/org/apache/stormcrawler/solr/SolrConnection.java
Show resolved
Hide resolved
external/solr/src/main/java/org/apache/stormcrawler/solr/SolrConnection.java
Show resolved
Hide resolved
|
Thanks for the patience 😇. @dpol1 I've left some more comments. |
|
I tried to run from this branch using a local Solr cluster in Linux. With Storm 2.8.2, I was getting the following exception: Switched to Storm 2.8.5 and then this was gone but was getting the following instead: This was because of imports Then another Jetty related exception causing runtime issues: I found this was because Storm 2.8.5 brings Jetty 12.1.6 and Solr depends on Jetty 12.0.27. After all these changes I managed to run and didn't notice any unexpected behavior while stopping and restarting one of the two Solr nodes :-). |
Thanks for tracking this down and documenting all the steps. Pinning Jetty to The more robust fix (imho) would be to shade the Solr client with its own relocated Jetty (e.g. under If we keep the pinning approach, it would be worth adding a comment in the POM to make the coupling explicit: otherwise it's easy to miss during a future Storm upgrade. The StringUtils stuff is fixed in #1847 . |
|
Regarding shading: We are doing something similar in TomEE: https://github.com/apache/tomee/blob/main/deps/commons-dbcp2-shade/pom.xml so might be an inspiration. |
2b89d34 to
e73adbf
Compare
I'll proceed with the shading-based approach - I'll still be able to implement it in that PR. However, I'll open a ticket to document this - is that what you had in mind, @rzo1? Or should we refine it later?
Agreed, good practice for both |
- Adjusts Solr command-line arguments in scripts and tests to use the new long-form syntax (e.g., --conf-name, --zk-host). - Introduces Maven shading for Jetty classes to resolve runtime version conflicts with Apache Storm. This prevents NoSuchMethodError by relocating Solr's Jetty dependencies, allowing coexistence with Storm's bundled Jetty version. FIxes apache#1859
|
Yep can be done here. The Issue is great, so it appears in the changelog. |
There was a problem hiding this comment.
The shade plugin configuration has two issues that will prevent it from achieving the goal of embedding Jetty with no transitive Jetty deps:
- createDependencyReducedPom must be true
Currently set to false, which means the published POM still declares org.eclipse.jetty:* and org.eclipse.jetty.http2:* as transitive dependencies. Consumers will pull in both the shaded
Jetty (inside the jar) and the original Jetty jars defeating the purpose.
- solr-solrj-jetty must be included in the artifactSet
solr-solrj-jetty provides HttpJettySolrClient and ConcurrentUpdateJettySolrClient, which reference org.eclipse.jetty classes directly. Since it is not in the artifactSet, its classes are
not pulled into the shaded jar and its bytecode is not rewritten. At runtime it will try to load the original (unshaded) org.eclipse.jetty.* either causing ClassNotFoundException or hitting Storm's conflicting Jetty 12.1.x.
The two wildcard includes already cover all Jetty artifacts in the dependency tree (solr-solrj and solr-solrj-zookeeper have zero Jetty transitive deps), so no additional includes are
needed beyond solr-solrj-jetty.
--- a/external/solr/pom.xml
+++ b/external/solr/pom.xml
@@ -59,7 +59,8 @@
<createSourcesJar>true</createSourcesJar>
<useBaseVersion>true</useBaseVersion>
- <createDependencyReducedPom>false</createDependencyReducedPom>
+ <createDependencyReducedPom>true</createDependencyReducedPom>
<artifactSet>
<includes>
+ <include>org.apache.solr:solr-solrj-jetty</include>
<include>org.eclipse.jetty:*</include>
<include>org.eclipse.jetty.http2:*</include>
There was a problem hiding this comment.
Thanks for the feedback, now should be fine, changed!
- Adjusts Solr command-line arguments in scripts and tests to use the new long-form syntax (e.g., --conf-name, --zk-host). - Introduces Maven shading for Jetty classes to resolve runtime version conflicts with Apache Storm. This prevents NoSuchMethodError by relocating Solr's Jetty dependencies, allowing coexistence with Storm's bundled Jetty version. FIxes apache#1859
9835b91 to
c71b174
Compare
Fixes #1831