Saturday, July 06, 2019

Real-time prediction (Prediction as a Service) using Pentaho at Hitachi!


How Pentaho is used for Prediction as a Service (real-time prediction from machine learning models)!

This post shows how to use R-based machine learning models from Spoon's PMI (Plugin Machine Intelligence) to generate ML models, and how to access those models in real time from CTools' CDA (Community Data Access).


Data set: "Give Me Some Credit" (Kaggle)

Task: Predict delinquency in the next 2 years from the given data set (links below).




Server & Tools:
A running Pentaho Server
Clients:
PDI - Pentaho Data Integration, aka Spoon (for jobs and transformations)
CDA - Community Data Access (comes with Pentaho Server) - executes the transformation in real time, shows the output in the browser, and returns JSON.

PUC Overview:





Here you can double-click get_Score (green ball) to open it in the browser. Then change the attributes as desired, and it will predict the first attribute (the class attribute).







Below is a snapshot taken while changing the field values. Click the arrow to execute the "tf_getScore" transformation, which uses a machine learning model (i.e., a decision tree) and returns the prediction in the first column. The data can also be retrieved via a REST API call, which returns a JSON payload that can be used for further processing in the pipeline (shown in the pictures below).
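As a rough illustration of the REST call, here is a minimal Python sketch that builds a CDA doQuery URL and turns the JSON payload into one dictionary per row. The file path "/public/get_Score.cda", the data access id "score", the parameter name "age", and the "/pentaho" web context are assumptions for illustration only, so replace them with the values from your own CDA file and server.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# CDA query endpoint, assuming the default "/pentaho" web application context.
CDA_ENDPOINT = "/pentaho/plugin/cda/api/doQuery"


def build_cda_url(server, cda_path, data_access_id, params):
    """Build the doQuery URL; CDA expects each parameter name prefixed with 'param'."""
    query = {"path": cda_path, "dataAccessId": data_access_id, "outputType": "json"}
    for name, value in params.items():
        query["param" + name] = value
    return server.rstrip("/") + CDA_ENDPOINT + "?" + urlencode(query)


def rows_as_dicts(payload):
    """Pair the column names in 'metadata' with each row in 'resultset'."""
    columns = [col["colName"] for col in payload["metadata"]]
    return [dict(zip(columns, row)) for row in payload["resultset"]]


def fetch_scores(url, user, password):
    """Call the server with HTTP basic authentication and return the parsed rows."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = Request(url, headers={"Authorization": "Basic " + token})
    with urlopen(request) as response:
        return rows_as_dicts(json.load(response))


# Hypothetical usage (file path, data access id and parameter name are assumptions):
# url = build_cda_url("http://localhost:8080", "/public/get_Score.cda", "score", {"age": 45})
# print(fetch_scores(url, "admin", "password"))
```

Since the prediction comes back in the first column, it should be the first key of each returned dictionary.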












Machine Learning Model generation and Scoring:

You can download PMI (Plugin Machine Intelligence) for Spoon (Pentaho Data Integration) to create models. Here is an example of how to create a model. Here you can choose the testing method, such as CV (cross-validation) or Test.
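For readers new to CV, the idea behind k-fold cross-validation can be sketched in a few lines of Python. This is not how PMI implements it. It is a simplified illustration that uses a majority-class guess as a stand-in for a real model, and all function names are made up for this example.

```python
from collections import Counter


def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs so that every row is tested exactly once."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def majority_class(labels):
    """Stand-in 'model': always predict the most common training label."""
    return Counter(labels).most_common(1)[0][0]


def cross_validated_accuracy(labels, k):
    """Train on k-1 folds, test on the held-out fold, and pool the results."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        guess = majority_class([labels[j] for j in train_idx])
        correct += sum(1 for j in test_idx if labels[j] == guess)
    return correct / len(labels)


# Toy class labels: 1 = delinquent, 0 = not delinquent.
labels = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
print(cross_validated_accuracy(labels, k=5))
```

A real model would replace majority_class, but the splitting and pooling logic stays the same.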

Generate ML Model












Scoring from a generated ML model:






Ultimately, from a batch-oriented perspective, you can also store the predicted results in a database and then create a Mondrian model to visualize them in Pentaho Analyzer.
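The batch idea can be sketched in Python with SQLite standing in for the target database. In Pentaho this would normally be a table output step in a transformation. The table name "scored_applicants" and its columns are hypothetical, and the aggregate query only hints at the kind of measure a Mondrian cube would expose.

```python
import sqlite3


def store_predictions(conn, rows):
    """Write (applicant_id, predicted_delinquent) pairs into a hypothetical table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scored_applicants ("
        "applicant_id INTEGER PRIMARY KEY, "
        "predicted_delinquent INTEGER NOT NULL)"
    )
    # INSERT OR REPLACE lets a batch be re-run without duplicating applicants.
    conn.executemany("INSERT OR REPLACE INTO scored_applicants VALUES (?, ?)", rows)
    conn.commit()


def count_by_prediction(conn):
    """Aggregate the stored predictions, similar to a simple count measure."""
    cursor = conn.execute(
        "SELECT predicted_delinquent, COUNT(*) FROM scored_applicants "
        "GROUP BY predicted_delinquent ORDER BY predicted_delinquent"
    )
    return dict(cursor.fetchall())


# Made-up scored rows, purely to exercise the functions.
conn = sqlite3.connect(":memory:")
store_predictions(conn, [(1, 0), (2, 1), (3, 0)])
print(count_by_prediction(conn))
```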



Code: Download and upload to PUC (Pentaho User Console) under "public".

Conclusion: 
Pentaho's PMI helps deploy machine learning models faster by supporting model testing/scoring, feature selection, and parameter tuning. We also saw that you can visualize predictions or extract them as JSON through real-time access to the ML models.