Thursday, February 01, 2024

Pentaho (PDI) to translate human language (i.e. NLP task) using Open-AI!

LLMs (Large Language Models), example like [LLaMA, GPT-4, ChatGPT, Megatron-Turning NLG, Mistral 7B] continue to fascinate the way NLP tasks [Question/Answering, Translation of human languages, Sentiment analysis, Text Summarization] being able to solve! Here is a question for Open-AI .

Who was the first person to land on moon.. and the answer from the system is basically given the statistical distribution of words in public corpus, what words are likely to follow the sequence? Here Neil Armstrong!!


How about human language translation, which can increase productivity coming from different cultural background, LLMs can be trained to covert one language to few more languages !? Here is an example.



Wow, how Open-AI could easily translate into few languages for an English sentence!

How about using this in a data pipe using ETL/ELT tool like (Pentaho Data Integration-PDI) to call REST end point of  Open-AI Chat Completions. Below is a transformation (data pipeline) which takes input as above API endpoint, to supply a language, input data, which engine to use.



The input is a step 'Data grid', then prepare a JSON payload (using step 'Add constant'), using string replacement, replace the string for LANG, SENTENCE, GEN_AI_MODEL as per the input coming from Data grid step. Once the JSON body is prepared then pass onto the end point via calling 'REST client' step in Spoon.


















Once REST client invokes Open-AI end point and gets the result (with code 200 as success), then you can see the data coming out from "dummy" step as below.

Translated Output


Conclusion: This simple translator in a data pipe line, shows you how you can use Open-AI REST end-point, while solving a business scenario where you are dealing with customer feedbacks, VoC data sets, increase in efficiency so sales, strengthen partnerships, avoid misunderstandings/disputes etc. 

The challenge you may get into while using this model is performance, how long the Open-AI takes for a paragraph or even for a document to translate, another is cost: for large text size as these systems charge in number of tokens, so an Open source LLM, may be another option. 

Hope you enjoyed!







No comments: