Merged
2 changes: 1 addition & 1 deletion README.md
@@ -9,7 +9,7 @@ Apache StormCrawler is an open source collection of resources for building low-l

## Quickstart

-NOTE: These instructions assume that you have [Apache Maven](https://maven.apache.org/install.html) installed. You will need to install [Apache Storm 2.8.4](http://storm.apache.org/) to run the crawler.
+NOTE: These instructions assume that you have [Apache Maven](https://maven.apache.org/install.html) installed. You will need to install [Apache Storm 2.8.5](http://storm.apache.org/) to run the crawler.

StormCrawler requires Java 17 or above. To execute tests, it requires you to have a locally installed and working Docker environment.

14 changes: 7 additions & 7 deletions THIRD-PARTY.txt
@@ -33,8 +33,8 @@ List of third-party dependencies grouped by their license type.
* Apache Commons Logging (commons-logging:commons-logging:1.3.5 - https://commons.apache.org/proper/commons-logging/)
* Apache Commons Math (org.apache.commons:commons-math3:3.6.1 - http://commons.apache.org/proper/commons-math/)
* Apache FontBox (org.apache.pdfbox:fontbox:3.0.5 - http://pdfbox.apache.org/)
-* Apache Hadoop Client API (org.apache.hadoop:hadoop-client-api:3.4.2 - no url defined)
-* Apache Hadoop Client Runtime (org.apache.hadoop:hadoop-client-runtime:3.4.2 - no url defined)
+* Apache Hadoop Client API (org.apache.hadoop:hadoop-client-api:3.4.3 - no url defined)
+* Apache Hadoop Client Runtime (org.apache.hadoop:hadoop-client-runtime:3.4.3 - no url defined)
* Apache HBase - Client (org.apache.hbase:hbase-client:2.6.4-hadoop3 - https://hbase.apache.org/hbase-build-configuration/hbase-client)
* Apache HBase - Common (org.apache.hbase:hbase-common:2.6.4-hadoop3 - https://hbase.apache.org/hbase-build-configuration/hbase-common)
* Apache HBase - Hadoop Compatibility (org.apache.hbase:hbase-hadoop-compat:2.6.4-hadoop3 - https://hbase.apache.org/hbase-build-configuration/hbase-hadoop-compat)
@@ -254,14 +254,14 @@ List of third-party dependencies grouped by their license type.
* rome (com.rometools:rome:2.1.0 - http://rometools.com/rome)
* rome-utils (com.rometools:rome-utils:2.1.0 - http://rometools.com/rome-utils)
* server (org.opensearch:opensearch:2.19.5 - https://github.com/opensearch-project/OpenSearch.git)
-* Shaded Deps for Storm Client (org.apache.storm:storm-shaded-deps:2.8.4 - https://storm.apache.org/storm-shaded-deps)
+* Shaded Deps for Storm Client (org.apache.storm:storm-shaded-deps:2.8.5 - https://storm.apache.org/storm-shaded-deps)
* SnakeYAML (org.yaml:snakeyaml:2.6 - https://bitbucket.org/snakeyaml/snakeyaml)
* snappy-java (org.xerial.snappy:snappy-java:1.1.10.4 - https://github.com/xerial/snappy-java)
* sniffer (org.opensearch.client:opensearch-rest-client-sniffer:2.19.5 - https://github.com/opensearch-project/OpenSearch.git)
* SparseBitSet (com.zaxxer:SparseBitSet:1.3 - https://github.com/brettwooldridge/SparseBitSet)
-* storm-autocreds (org.apache.storm:storm-autocreds:2.8.4 - https://storm.apache.org/external/storm-autocreds)
-* Storm Client (org.apache.storm:storm-client:2.8.4 - https://storm.apache.org/storm-client)
-* storm-hdfs (org.apache.storm:storm-hdfs:2.8.4 - https://storm.apache.org/external/storm-hdfs)
+* storm-autocreds (org.apache.storm:storm-autocreds:2.8.5 - https://storm.apache.org/external/storm-autocreds)
+* Storm Client (org.apache.storm:storm-client:2.8.5 - https://storm.apache.org/storm-client)
+* storm-hdfs (org.apache.storm:storm-hdfs:2.8.5 - https://storm.apache.org/external/storm-hdfs)
* swagger-annotations-jakarta (io.swagger.core.v3:swagger-annotations-jakarta:2.2.22 - https://github.com/swagger-api/swagger-core/modules/swagger-annotations-jakarta)
* T-Digest (com.tdunning:t-digest:3.2 - https://github.com/tdunning/t-digest)
* urlfrontier-API (com.github.crawler-commons:urlfrontier-API:2.5 - https://github.com/crawler-commons/url-frontier/urlfrontier-API)
@@ -341,7 +341,7 @@ List of third-party dependencies grouped by their license type.

* Angus Activation Registries (org.eclipse.angus:angus-activation:2.0.2 - https://github.com/eclipse-ee4j/angus-activation/angus-activation)
* istack common utility code runtime (com.sun.istack:istack-commons-runtime:4.1.2 - https://projects.eclipse.org/projects/ee4j/istack-commons/istack-commons-runtime)
-* Jakarta XML Binding API (jakarta.xml.bind:jakarta.xml.bind-api:4.0.4 - https://github.com/jakartaee/jaxb-api/jakarta.xml.bind-api)
+* Jakarta XML Binding API (jakarta.xml.bind:jakarta.xml.bind-api:4.0.5 - https://github.com/jakartaee/jaxb-api/jakarta.xml.bind-api)
* JavaBeans Activation Framework (com.sun.activation:jakarta.activation:1.2.1 - https://github.com/eclipse-ee4j/jaf/jakarta.activation)
* JavaBeans Activation Framework API jar (jakarta.activation:jakarta.activation-api:1.2.1 - https://github.com/eclipse-ee4j/jaf/jakarta.activation-api)
* JAXB Core (org.glassfish.jaxb:jaxb-core:4.0.5 - https://eclipse-ee4j.github.io/jaxb-ri/)
2 changes: 1 addition & 1 deletion archetype/src/main/resources/archetype-resources/README.md
@@ -5,7 +5,7 @@ Have a look at the code and resources and modify them to your heart's content.

## Native

-You need to install Apache Storm. The instructions on [setting up a Storm cluster](https://storm.apache.org/releases/2.8.4/Setting-up-a-Storm-cluster.html) should help.
+You need to install Apache Storm. The instructions on [setting up a Storm cluster](https://storm.apache.org/releases/2.8.5/Setting-up-a-Storm-cluster.html) should help.
You also need to have an instance of URLFrontier running. See [the URLFrontier README](https://github.com/crawler-commons/url-frontier/tree/master/service); the easiest way is to use Docker, like so:

```
2 changes: 1 addition & 1 deletion archetype/src/main/resources/archetype-resources/pom.xml
@@ -32,7 +32,7 @@ under the License.
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<stormcrawler.version>${project.version}</stormcrawler.version>
-<storm.version>2.8.4</storm.version>
+<storm.version>2.8.5</storm.version>
<urlfrontier.version>2.4</urlfrontier.version>
</properties>

@@ -34,7 +34,7 @@ under the License.
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<stormcrawler.version>${StormCrawlerVersion}</stormcrawler.version>
-<storm.version>2.8.4</storm.version>
+<storm.version>2.8.5</storm.version>
</properties>

<build>
2 changes: 1 addition & 1 deletion external/solr/README.md
@@ -12,7 +12,7 @@ You'll be asked to enter a groupId (e.g. com.mycompany.crawler), an artefactId (

This will not only create a fully formed project containing a POM with the dependency above but also a set of resources, configuration files and sample topology classes. Enter the directory you just created (should be the same as the artefactId you specified earlier) and follow the instructions on the README file.

-You will of course need to have both Apache Storm (2.8.4) and Apache Solr (9.8.0) installed.
+You will of course need to have both Apache Storm (2.8.5) and Apache Solr (9.8.0) installed.

Official references:
* [Apache Storm: Setting Up a Development Environment](https://storm.apache.org/releases/current/Setting-up-development-environment.html)
@@ -1,7 +1,7 @@
This has been generated by the StormCrawler Maven Archetype as a starting point for building your own crawler with [Apache Solr](https://solr.apache.org/) as a backend.
Have a look at the code and resources and modify them to your heart's content.

-You need to have Apache Storm (2.8.4) installed, as well as a running instance of Apache Solr (9.8.0).
+You need to have Apache Storm (2.8.5) installed, as well as a running instance of Apache Solr (9.8.0).

## Generated resources

@@ -34,7 +34,7 @@ under the License.
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<stormcrawler.version>${StormCrawlerVersion}</stormcrawler.version>
-<storm.version>2.8.4</storm.version>
+<storm.version>2.8.5</storm.version>
</properties>

<build>
2 changes: 1 addition & 1 deletion pom.xml
@@ -65,7 +65,7 @@ under the License.
<additionalparam>-Xdoclint:none</additionalparam>
<!-- dependency versions -->
<junit.version>6.0.3</junit.version>
-<storm-client.version>2.8.4</storm-client.version>
+<storm-client.version>2.8.5</storm-client.version>
<!-- Jackson's version should be in-line with the one in Storm -->
<jackson-annotations.version>2.20</jackson-annotations.version>
<jackson.version>2.20.1</jackson.version>