CROSSDATA

Crossdata is a distributed framework and a fast, general-purpose computing system powered by Apache Spark. It unifies interaction with different data sources, supporting multiple datastore technologies through a generic architecture and a custom SQL-like language, with SparkSQL at the core of the project. In addition, Crossdata supports both batch and streaming processing, so you can mix data from the two kinds of input. Supporting multiple architectures poses two main challenges: how to normalize access to the datastores, and how to cope with datastore limitations. Crossdata provides connectors that can access multiple datastores natively, speeding up queries by avoiding the overhead and the blocking of Spark cluster resources when possible.

This repository packages some modules of the Crossdata project so that they can be deployed as a Spark package.

For more information, visit:

CROSSDATA AS A SPARK PACKAGE

If you want to use Crossdata as a Spark package in your Spark distribution, follow these steps:

> mvn clean install -DskipITs -DskipUTs
> mvn package -Ppackage -DskipITs -DskipUTs

Once the package phase is done, you can find spark-crossdata-${crossdata.version}.jar in the spark-crossdata/target directory.

Now you can start your spark-shell as follows:

SPARK-HOME> bin/spark-shell --jars $CROSSDATA-HOME/spark-crossdata/target/spark-crossdata-${crossdata.version}.jar
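
Because the jar name embeds the Crossdata version, it can be convenient to locate the built jar instead of typing the version by hand. The following is only a sketch: the function name find_crossdata_jar is hypothetical, and CROSSDATA_HOME and SPARK_HOME are assumed environment variables standing in for the placeholders used above. It only relies on the spark-crossdata/target location described in this README.

```shell
# Hypothetical helper (not part of Crossdata): print the path of the
# spark-crossdata jar under a Crossdata checkout, or fail if none exists.
find_crossdata_jar() {
  local crossdata_home="$1"
  local target_dir="$crossdata_home/spark-crossdata/target"
  local jar
  for jar in "$target_dir"/spark-crossdata-*.jar; do
    case "$jar" in
      # Skip auxiliary jars that a Maven build may also produce.
      *-sources.jar|*-javadoc.jar) continue ;;
    esac
    # An unmatched glob stays literal, so check that the file really exists.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "No spark-crossdata jar in $target_dir; run the mvn package step first." >&2
  return 1
}

# Usage (assumes CROSSDATA_HOME and SPARK_HOME are set in your environment):
#   jar="$(find_crossdata_jar "$CROSSDATA_HOME")" && "$SPARK_HOME/bin/spark-shell" --jars "$jar"
```

If the package step has not been run yet, the function prints a message to standard error and returns a non-zero status, so the spark-shell command in the usage line is not started.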

Inside the Spark shell, you can import the Crossdata datasources:

scala> import com.stratio.crossdata.connectors._

You can create a Crossdata context (XDContext) as follows:

scala> import org.apache.spark.sql.crossdata._
scala> val xdcontext=new XDContext(sc)

Next, you can import data from the Cassandra source:

scala> xdcontext.sql("IMPORT TABLES USING com.stratio.crossdata.connector.cassandra OPTIONS ( cluster \"Test Cluster\", spark_cassandra_connection_host '127.0.0.1')")

And then check whether the Cassandra tables are in the Crossdata catalog:

scala> xdcontext.sql("SHOW TABLES").show(false)

Finally, you can execute your queries:

scala> xdcontext.sql("SELECT....")
