gor-spark

Spark-enabled GOR

GOR scalable through the Spark engine (https://spark.apache.org)

Check out and build SparkGOR

git clone git@github.com:gorpipe/gor-spark.git
cd gor-spark
./gradlew clean installDist
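
Before moving on to the usage examples, it can help to confirm that the build produced the launcher script they rely on. The following is a minimal sketch, not part of the project: `check_gorpipe` is a hypothetical helper name, and the default path is assumed from the usage commands in this README.

```shell
# Hypothetical helper: check that the gorpipe launcher exists and is executable.
# The default path is assumed from the usage examples in this README.
check_gorpipe() {
  launcher="${1:-spark/build/install/gor-scripts/bin/gorpipe}"
  if [ -x "$launcher" ]; then
    echo "found: $launcher"
  else
    # Report on stderr and signal failure to the caller.
    echo "missing: $launcher (try re-running ./gradlew clean installDist)" >&2
    return 1
  fi
}
```

Run `check_gorpipe` from the repository root after the build, or pass a different path as the first argument if your install directory differs.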

Usage

After the build completes, you can run Spark SQL queries from within GOR:

spark/build/install/gor-scripts/bin/gorpipe "select * from genes.gor limit 10"
spark/build/install/gor-scripts/bin/gorpipe "create xxx = select * from <(select * from genes.gor) where Gene_Symbol like 'B%'; gor [xxx] | top 10"

SDK usage

Scala demo: gorspark.scala

spark-shell --packages org.gorpipe:gor-spark:3.10.2 --exclude-packages "org.apache.logging.log4j:log4j-core,org.apache.logging.log4j:log4j-api" -I gorspark.scala

Python demo: gorspark.py

pyspark --packages org.gorpipe:gor-spark:3.10.2 --exclude-packages "org.apache.logging.log4j:log4j-core,org.apache.logging.log4j:log4j-api" -I gorspark.py