izhar/data-engineering

Project ID: 53357

Description

Data Engineering related packages:

Currently contains:

  • apache-spark2 (including Spark SQL HiveServer2 Thrift server, Spark Master, Spark Worker and History Server systemd services)
  • apache-spark3 (including Spark SQL HiveServer2 Thrift server, Spark Master, Spark Worker and History Server systemd services)

Note: if you want to run both Spark 2 and Spark 3 on the same machine, make sure you change the default listening ports for their Thrift servers and history servers, which otherwise conflict (10000 and 18080 respectively).
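As an example, the history server port can be moved through spark-defaults.conf and the Thrift server port through hive-site.xml; both property names are standard Spark/Hive settings, but the configuration file locations below are assumptions — check where these packages actually install their config:

```
# Assumed path: /etc/apache-spark3/conf/spark-defaults.conf (verify on your system)
# Move the Spark 3 history server off the default 18080
spark.history.ui.port    18081

# The Thrift server port is a Hive setting, e.g. in hive-site.xml:
#   <property>
#     <name>hive.server2.thrift.port</name>
#     <value>10001</value>
#   </property>
```

With both changed, `spark3-beeline -u "jdbc:hive2://localhost:10001"` would reach the Spark 3 Thrift server while Spark 2 keeps the defaults.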

Installation Instructions

Enabling Repository

Enable the Copr repository:

dnf copr enable izhar/data-engineering

Apache Spark 2

Installation

Install Apache Spark 2 along with MariaDB (which backs the Hive metastore):

dnf install apache-spark2 mariadb-server

You will first need to create a database in MariaDB for the Hive metastore:

sudo mysql

In the MySQL shell:

create database hive_metastore;
grant all privileges on hive_metastore.* to hive@'%' identified by 'hive';
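A quick way to confirm the grant works is to log back in as the new user. This is a sketch that assumes mariadb-server is already running; it is guarded so it does nothing on machines without the mysql client:

```shell
# Log in as the newly created 'hive' user and list tables (the list is empty
# until the schema is loaded in the next step). Skips silently without mysql.
if command -v mysql >/dev/null 2>&1; then
  mysql -u hive -phive hive_metastore -e 'SHOW TABLES;'
fi
```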

Then load the Hive metastore table structures:

sudo mysql hive_metastore < /usr/share/apache-spark2/hive-metastore.sql

Then you can start the Spark Thrift server and history server and make use of Spark SQL:

sudo systemctl start spark2-thrift.service spark2-historyserver.service
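To have the services come back after a reboot, `systemctl enable --now` can be used instead of plain `start`. The unit names are the ones shipped by this package; the guard makes the snippet a no-op on machines where the units are not installed:

```shell
# Enable at boot and start in one step, then confirm both units are active.
# Skipped entirely where the spark2-thrift unit does not exist.
if systemctl cat spark2-thrift.service >/dev/null 2>&1; then
  sudo systemctl enable --now spark2-thrift.service spark2-historyserver.service
  systemctl --no-pager status spark2-thrift.service spark2-historyserver.service
fi
```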

Accessing / Launching Spark

Beeline:

spark2-beeline -u "jdbc:hive2://localhost:10000"
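Once connected, Beeline accepts SQL directly; its standard `-e` flag runs a single statement non-interactively (assuming the spark2-beeline wrapper forwards Beeline's usual options — the guard makes this a no-op where Spark 2 is not installed):

```shell
# Run one Spark SQL statement through the Thrift server and exit
if command -v spark2-beeline >/dev/null 2>&1; then
  spark2-beeline -u "jdbc:hive2://localhost:10000" -e "SHOW DATABASES;"
fi
```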

Spark Submit

spark2-submit /path/to/job.py
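A minimal job is enough to try the pipeline end to end. The script below and its path are purely illustrative; the submit step is guarded so the snippet is harmless where Spark 2 is absent:

```shell
# Write a trivial PySpark job, then submit it. Skipped if spark2-submit
# is not on PATH.
cat > /tmp/job.py <<'EOF'
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("smoke-test").getOrCreate()
# Build a one-column DataFrame of 100 rows and count it
print(spark.range(100).count())
spark.stop()
EOF

if command -v spark2-submit >/dev/null 2>&1; then
  spark2-submit /tmp/job.py
fi
```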

PySpark Shell

spark2-pyspark

History Server: http://localhost:18080

Apache Spark 3

Installation

Install Apache Spark 3 along with MariaDB (which backs the Hive metastore):

dnf install apache-spark3 mariadb-server

You will first need to create a database in MariaDB for the Hive metastore:

sudo mysql

In the MySQL shell:

create database spark3_hive_metastore;
grant all privileges on spark3_hive_metastore.* to hive@'%' identified by 'hive';

Then load the Hive metastore table structures:

sudo mysql spark3_hive_metastore < /usr/share/apache-spark3/hive-metastore.sql

Then you can start the Spark Thrift server and history server and make use of Spark SQL:

sudo systemctl start spark3-thrift.service spark3-historyserver.service

Accessing / Launching Spark

Beeline:

spark3-beeline -u "jdbc:hive2://localhost:10000"

Spark Submit

spark3-submit /path/to/job.py

PySpark Shell

spark3-pyspark

History Server: http://localhost:18080

Active Releases

The following unofficial repositories are provided as-is by the owner of this project. Contact the owner directly for bugs or issues (i.e., not Bugzilla).

Release   Architectures    Repo Download
EPEL 8    x86_64 (201)*    EPEL 8 (71 downloads)
EPEL 9    x86_64 (11)*     EPEL 9 (13 downloads)

* Total number of packages downloaded in the last seven days.