Saturday, September 16, 2023

Pentaho PDI : working with Mongo & Kafka while processing hierarchical JSON!!

Need: When working with JSON objects, you sometimes need to build hierarchical JSON and store it in a Mongo collection. You may then read that collection and stream it to a Kafka topic!

Here, flight data comes from various CSV files. Once the files are collected, a Date dimension is used to add a few more columns to the JSON. That is the source JSON. In preview mode, the data looks like this.







Then process the data through the hierarchical [plugin from PDI]; it needs to be installed before launching Spoon.
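To make the idea of "hierarchical JSON" concrete outside of Spoon, here is a minimal Python sketch of the same kind of nesting. The field names (`flight_date`, `day_name`, `carrier`, and so on) and the sample rows are hypothetical stand-ins, not the actual columns from the flight files, and this is not what the PDI plugin does internally.

```python
import json

# Hypothetical flat rows: flight columns plus a column from the Date dimension.
flat_rows = [
    {"flight_date": "2023-09-01", "day_name": "Friday", "carrier": "AA",
     "flight_no": "100", "origin": "JFK", "dest": "LAX"},
    {"flight_date": "2023-09-01", "day_name": "Friday", "carrier": "DL",
     "flight_no": "220", "origin": "ATL", "dest": "SEA"},
    {"flight_date": "2023-09-02", "day_name": "Saturday", "carrier": "UA",
     "flight_no": "310", "origin": "ORD", "dest": "DEN"},
]


def nest_flights(rows):
    """Group flat rows by date and nest each flight under its date document."""
    by_date = {}
    for row in rows:
        # One parent document per date, created the first time the date is seen.
        doc = by_date.setdefault(
            row["flight_date"],
            {"flight_date": row["flight_date"],
             "day_name": row["day_name"],
             "flights": []},
        )
        # Each flight becomes a child object with its own nested "route".
        doc["flights"].append({
            "carrier": row["carrier"],
            "flight_no": row["flight_no"],
            "route": {"origin": row["origin"], "dest": row["dest"]},
        })
    return list(by_date.values())


documents = nest_flights(flat_rows)
print(json.dumps(documents, indent=2))
```

Each element of `documents` is one nested document of the sort that would be stored in the Mongo collection; with the `pymongo` driver that would typically be a call like `collection.insert_many(documents)`.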





Once the hierarchical collection is stored, read the data from the Mongo collection and produce it to the Kafka stream. Provide the necessary parameters for a secured Kafka broker, as below.
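For readers who cannot see the step settings, the sketch below shows what a typical set of client options for a secured broker can look like. It assumes SASL_SSL with the PLAIN mechanism, which may differ from the setup used in this post; the property names are standard Kafka client settings, while the host, user, and password values are placeholders.

```python
def build_secure_kafka_options(bootstrap_servers, username, password):
    """Return Kafka client options for a SASL_SSL / PLAIN secured broker.

    Assumption: the broker uses SASL_SSL with the PLAIN mechanism.
    Adjust the values if your broker uses SCRAM, Kerberos, or mutual TLS.
    """
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{username}" password="{password}";'
    )
    return {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.jaas.config": jaas,
    }


# Placeholder values for illustration only.
options = build_secure_kafka_options("broker.example.com:9093", "demo-user", "demo-secret")
for key, value in options.items():
    print(f"{key}={value}")
```

In Spoon, key/value pairs like these are usually entered on the options tab of the Kafka producer step; the consumer step needs the same security settings.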















A Kafka listener (consumer) is already listening; the secured Kafka parameters are supplied on the consumer side as well, as given below. Keep this transformation running; if messages are already on the topic in the Kafka broker, the consumer keeps processing them. The consumer transformation refers to a stream transformation, which then writes the data to a text file output or any other output format desired.






Friday, April 28, 2023

OpenAI APIs via Pentaho-PDI (aka Spoon)

OpenAI has made its generative AI models publicly available (ChatGPT, for example, is one use of OpenAI's generative AI). If you don't want to use the bot, OpenAI offers APIs that you can invoke from your own application.


To test the APIs offered by OpenAI, you need an account and then need to generate API keys to use in your application. You can use "Postman" to test the endpoints (as in the example below); this post then shows how to invoke the OpenAI APIs from Pentaho's Spoon using the "REST client" step.




In Spoon, once you supply the JSON body, the step invokes the above API endpoint and returns the result. You need to authenticate with your email and the token (API key) generated under your account > "API Keys" in the OpenAI interface.

JSON :{"model":"code-davinci-edit-001","input":"What day of the wek is it?","instruction":"Fix the spelling mistakes"}


Similarly, you can access other endpoints, for example to parse unstructured data:


A table summarizing the fruits from Goocrux: There are many fruits that were found on the recently discovered planet Goocrux. There are neoskizzles that grow there, which are purple and taste like candy. There are also loheckles, which are a grayish blue fruit and are very tart, a little bit like a lemon. Pounits are a bright green color and are more savory than sweet. There are also plenty of loopnovas which are a neon pink flavor and taste like cotton candy. Finally, there are fruits called glowls, which have a very sour and bitter taste which is acidic and caustic, and a pale orange tinge to them. | Fruit | Color | Flavor |




Endpoint API : https://api.openai.com/v1/completions
JSON body: {"model": "text-davinci-003", "prompt": "A table summarizing the fruits from Goocrux:\n\nThere are many fruits that were found on the recently discovered planet Goocrux. There are neoskizzles that grow there, which are purple and taste like candy. There are also loheckles, which are a grayish blue fruit and are very tart, a little bit like a lemon. Pounits are a bright green color and are more savory than sweet. There are also plenty of loopnovas which are a neon pink flavor and taste like cotton candy. Finally, there are fruits called glowls, which have a very sour and bitter taste which is acidic and caustic, and a pale orange tinge to them.\n\n | Fruit | Color | Flavor |",  "temperature": 0,  "max_tokens": 100,  "top_p": 1.0,  "frequency_penalty": 0.0,  "presence_penalty": 0.0}
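For comparison with the "REST client" step, here is a minimal Python sketch that builds the same kind of request against the completions endpoint quoted above. It assumes the API key is passed as a Bearer token in the `Authorization` header, which is the scheme OpenAI documents; the `OPENAI_API_KEY` environment variable name and the shortened prompt are illustrative choices. The request is only constructed here, not sent.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/completions"  # endpoint quoted in this post


def build_completion_request(prompt, api_key, model="text-davinci-003"):
    """Build (but do not send) a POST request for the completions endpoint."""
    body = {
        "model": model,
        "prompt": prompt,
        "temperature": 0,
        "max_tokens": 100,
        "top_p": 1.0,
        "frequency_penalty": 0.0,
        "presence_penalty": 0.0,
    }
    return urllib.request.Request(
        API_URL,
        # json.dumps handles the escaping of newlines and quotes in the prompt.
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Falls back to a placeholder key so the sketch runs without credentials.
api_key = os.environ.get("OPENAI_API_KEY", "sk-placeholder")
request = build_completion_request(
    "A table summarizing the fruits from Goocrux:\n\n| Fruit | Color | Flavor |",
    api_key,
)
print(request.get_method(), request.full_url)
```

To actually call the API, pass the object to `urllib.request.urlopen(request)` and parse the JSON response; in Spoon, the equivalent pieces are the URL, the body field, and the header fields of the "REST client" step.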

The APIs from OpenAI are very powerful, and you can try many others, such as image generation, embeddings, audio, and moderation. You can try out the OpenAI APIs here: https://platform.openai.com/docs/api-reference

Hope you enjoyed this blog on using the OpenAI APIs with Spoon (PDI)!