This blog shows how you can use Hitachi's PDI (Pentaho Data Integration) to submit Spark jobs for machine learning using Java and Python libraries. Pi is also calculated via a Spark submit; your task is to spot where :)
The snapshots below use Spark's Java and Python machine learning libraries.
OS used: Ubuntu 16.04
Tool: Spoon 7.1, a.k.a. Pentaho Data Integration (design tool)
Install Spark:
tar zxvf spark-2.1.0-bin-hadoop2.7.tgz
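If the archive isn't downloaded yet, it can be fetched from the Apache archive first; exporting SPARK_HOME afterwards keeps the later commands short (the install path is an assumption):

# Fetch the prebuilt package (run before the tar step above)
wget https://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz
# Point SPARK_HOME at the extracted directory
export SPARK_HOME=~/spark-2.1.0-bin-hadoop2.7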
Start Master and Slave:
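A minimal sketch of the standalone startup, assuming a single-machine setup where the master and worker both run on localhost:

# Start the standalone master; its web UI comes up on port 8080
$SPARK_HOME/sbin/start-master.sh
# Start one worker (called a slave in Spark 2.x) and register it with the master
$SPARK_HOME/sbin/start-slave.sh spark://localhost:7077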
Browse to the Spark master web UI on port 8080 (http://localhost:8080) to verify the master and worker are up.
Launch PDI Job (Spark - Java ML Library):
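Behind the PDI job entry is ultimately a spark-submit command. A sketch of what it might look like, assuming the stock examples jar bundled with Spark 2.1.0 and the standalone master started above (whether the blog's job uses these exact classes is an assumption):

# Run from SPARK_HOME so JavaKMeansExample can find data/mllib/sample_kmeans_data.txt
cd $SPARK_HOME
# Java ML example: k-means from the examples jar that ships with Spark 2.1.0
./bin/spark-submit \
  --class org.apache.spark.examples.ml.JavaKMeansExample \
  --master spark://localhost:7077 \
  examples/jars/spark-examples_2.11-2.1.0.jar
# And the pi calculation teased at the top of the post: estimate pi over 100 partitions
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://localhost:7077 \
  examples/jars/spark-examples_2.11-2.1.0.jar 100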
Submitting a Spark Python ML job via PDI:
Install Python libraries:
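A typical set for the KNN run below can be installed with pip; the exact list depends on the script, and scikit-learn plus its dependencies are an assumption based on the KNN entry:

sudo apt-get install python-pip
pip install numpy scipy scikit-learn pandas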
PDI job to submit Spark-Python (showing the job entry); K-Nearest Neighbors is run for various seed values, with analytics on the results:
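Spark MLlib has no built-in K-Nearest Neighbors, so one plausible shape for the submitted script is a scikit-learn KNN fanned out across the cluster with Spark, one run per seed. The file name knn_seeds.py, the Iris dataset, and all parameters below are assumptions for illustration, not the blog's actual script:

# knn_seeds.py -- hypothetical sketch of the Spark-Python job
# (requires numpy/scikit-learn on every worker node)
from pyspark import SparkContext
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def knn_accuracy(seed):
    # Each seed produces a different train/test split, so accuracy varies per run
    data = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target, test_size=0.3, random_state=seed)
    model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    return seed, model.score(X_test, y_test)

if __name__ == "__main__":
    sc = SparkContext(appName="KNNSeedSweep")
    # Run KNN for a range of seed values in parallel and collect the accuracies
    results = sc.parallelize(range(10)).map(knn_accuracy).collect()
    for seed, acc in sorted(results):
        print("seed=%d  accuracy=%.3f" % (seed, acc))
    sc.stop()

The PDI Shell job entry would then invoke something like:

$SPARK_HOME/bin/spark-submit --master spark://localhost:7077 knn_seeds.py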