Tuesday, April 29, 2014

How to get S3 files from AWS via Talend ETL?

This post shows how to fetch files from S3 [AWS] with Talend. The requirement was to pick up clickstream data files whose names follow various patterns. Some of the files share a common prefix [e.g. Page, PageSummary, PageError].

For each tS3List:
- Uncheck "List all buckets objects"
- Enter your bucket name under "Bucket name" and set "Key prefix" as needed
In my case the bucket contains several directories, so I used "directory_name/File Prefix".

This is how I distinguished the files sharing a common prefix in the example above:
"directory_name/Page 2014-"
"directory_name/PageSummary"
"directory_name/PageError"
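
To see why the first prefix ends in " 2014-", here is a small Java sketch of prefix matching. It only imitates a key-prefix filter with String.startsWith and does not call S3 or Talend; the class name, the method name and the three file names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KeyPrefixDemo {
    // Keep only the keys that start with the given prefix,
    // which is the behaviour a key-prefix filter is assumed to have.
    static List<String> filterByPrefix(List<String> keys, String prefix) {
        List<String> matched = new ArrayList<>();
        for (String key : keys) {
            if (key.startsWith(prefix)) {
                matched.add(key);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Hypothetical object keys; the real file names will differ.
        List<String> keys = Arrays.asList(
            "directory_name/Page 2014-04-29.csv",
            "directory_name/PageSummary 2014-04-29.csv",
            "directory_name/PageError 2014-04-29.csv");

        // Too short: every key starts with "directory_name/Page".
        System.out.println(filterByPrefix(keys, "directory_name/Page"));

        // Adding " 2014-" leaves only the plain Page file.
        System.out.println(filterByPrefix(keys, "directory_name/Page 2014-"));
    }
}
```

The plain "directory_name/Page" prefix would also pick up the PageSummary and PageError files, so the Page list needs the extra characters that follow the name.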


tS3Get:
Bucket: Provide your bucket name
Key: ((String)globalMap.get("tS3List_1_CURRENT_KEY"))  -- Note: NO surrounding double quotes
File: "/Users/shota/"+((String)globalMap.get("tS3List_1_CURRENT_KEY"))
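
The File expression is plain Java string concatenation, so it can be tried outside Talend. The sketch below stands in for Talend's globalMap with an ordinary HashMap; the class name, the method name and the sample key value are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

class LocalPathDemo {
    // Same expression as the File field: base directory + current S3 key.
    static String localPath(Map<String, Object> globalMap, String baseDir) {
        return baseDir + ((String) globalMap.get("tS3List_1_CURRENT_KEY"));
    }

    public static void main(String[] args) {
        // Stand-in for the map Talend fills in on each tS3List iteration.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tS3List_1_CURRENT_KEY", "directory_name/PageSummary 2014-04-29.csv");

        // The key includes the directory part, so it ends up in the local path too.
        System.out.println(localPath(globalMap, "/Users/shota/"));
    }
}
```

Because the key carries its directory prefix, the downloaded file lands under /Users/shota/directory_name/ rather than directly in /Users/shota/.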

PS: All of these components are connected to one central S3 connection object.

If you have any questions, please leave a comment.