izhar/data-engineering
Project ID: 53357
Description
Data Engineering-related packages. Currently contains:
- apache-spark2 (including Spark SQL Hive2 server, Spark Master, Spark Worker, and History Server systemd services)
- apache-spark3 (including Spark SQL Hive2 server, Spark Master, Spark Worker, and History Server systemd services)
Note: if you want to run both Spark 2 and Spark 3 on the same machine, make sure you change the default listening ports of the Thrift server and history server for one of them.
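As an illustration of the port change, the history server port can be overridden in spark-defaults.conf, and Spark's start-thriftserver.sh honours the HIVE_SERVER2_THRIFT_PORT environment variable. The file paths below are assumptions about where this packaging keeps Spark's conf directory, not something it guarantees:

```
# Assumed location: /etc/apache-spark3/conf/spark-defaults.conf
# Move the Spark 3 history server off the default 18080:
spark.history.ui.port    18081

# For the Thrift server, set this environment variable for the service
# (e.g. via a systemd override for spark3-thrift.service) to move it off 10000:
# HIVE_SERVER2_THRIFT_PORT=10001
```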
Installation Instructions
Enabling Repository
Enable the copr repository:
dnf copr enable izhar/data-engineering
Apache Spark 2
Installation
Installing Apache Spark 2:
dnf install apache-spark2 mariadb-server
You will first need to create a database in MariaDB for the Hive metastore:
sudo mysql
In the mysql shell:
create database hive_metastore;
grant all privileges on hive_metastore.* to hive@'%' identified by 'hive';
Then load the Hive metastore table structures:
sudo mysql hive_metastore < /usr/share/apache-spark2/hive-metastore.sql
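To sanity-check the import, you can list the tables from the mysql shell; a standard Hive metastore schema includes tables such as DBS and TBLS:

```sql
-- Run inside: sudo mysql hive_metastore
SHOW TABLES;
```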
Then you can start the Spark Thrift server and history server, and make use of Spark SQL:
sudo systemctl start spark2-thrift.service spark2-historyserver.service
Accessing / Launching Spark
Beeline:
spark2-beeline -u "jdbc:hive2://localhost:10000"
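Once connected, beeline accepts standard Spark SQL statements; for example (the table name here is hypothetical):

```sql
CREATE TABLE demo (id INT, name STRING);
INSERT INTO demo VALUES (1, 'spark');
SELECT * FROM demo;
```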
Spark Submit
spark2-submit /path/to/job.py
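A minimal PySpark job that such a command could submit might look like the following sketch (it assumes the pyspark runtime provided by the package; the app name and data are illustrative):

```python
from pyspark.sql import SparkSession

# Build (or reuse) a session; spark2-submit supplies the Spark runtime.
spark = SparkSession.builder.appName("example-job").getOrCreate()

# A tiny DataFrame built from local data, so the job is self-contained.
df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "letter"])
print("row count:", df.count())

spark.stop()
```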
PySpark Shell
spark2-pyspark
History Server: http://localhost:18080
Apache Spark 3
Installation
Installing Apache Spark 3:
dnf install apache-spark3 mariadb-server
You will first need to create a database in MariaDB for the Hive metastore:
sudo mysql
In the mysql shell:
create database spark3_hive_metastore;
grant all privileges on spark3_hive_metastore.* to hive@'%' identified by 'hive';
Then load the Hive metastore table structures:
sudo mysql spark3_hive_metastore < /usr/share/apache-spark3/hive-metastore.sql
Then you can start the Spark Thrift server and history server, and make use of Spark SQL:
sudo systemctl start spark3-thrift.service spark3-historyserver.service
Accessing / Launching Spark
Beeline:
spark3-beeline -u "jdbc:hive2://localhost:10000"
Spark Submit
spark3-submit /path/to/job.py
PySpark Shell
spark3-pyspark
History Server: http://localhost:18080
Active Releases
The following unofficial repositories are provided as-is by the owner of this project. Contact the owner directly for bugs or issues (i.e., not Bugzilla).
Release | Architectures | Repo Download
---|---|---
EPEL 8 | x86_64 (221)* | EPEL 8 (97 downloads) |
EPEL 9 | x86_64 (29)* | EPEL 9 (46 downloads) |
* Total number of downloaded packages.