1. Design and propose ETL pipelines that combine Hadoop ecosystem tools, with or without Spark, on an AWS cluster;
2. Implement a multi-purpose ELT pipeline (e.g., Kappa or Lambda architecture) that ingests data from varied sources (not limited to MySQL, Oracle, NoSQL stores, flat files such as PDF, Excel, CSV, and .docx, and live/streaming feeds) into Hadoop/HDFS using a wide range of big-data tools (a minimal ingestion sketch follows this list);
3. Ensure data quality (format, fields, precision, number of rows, etc.) during migration from the data sources to HDFS (see the validation sketch after this list);
4. The pipeline must be able to handle large data volumes, in the gigabyte range;
5. Troubleshoot by placing breakpoints in the ETL pipeline at multiple levels (at each Hadoop ecosystem tool), at both the resource level and the code level, covering memory management, performance tuning, and optimization (a tuning sketch follows below);
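For reference, a minimal PySpark sketch of the ingestion step in item 2, assuming the job runs on the AWS cluster with a MySQL JDBC driver available; the host, database, table, credentials, and HDFS path are all illustrative placeholders, not the actual project values:

```python
from pyspark.sql import SparkSession

# Build a Spark session; in practice this runs on the AWS cluster
# (e.g., EMR) with cluster-specific configs supplied at submit time.
spark = (
    SparkSession.builder
    .appName("mysql-to-hdfs-ingest")  # hypothetical app name
    .getOrCreate()
)

# Pull one table from MySQL over JDBC; connection details below are
# placeholders for illustration only.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://source-host:3306/sales")
    .option("dbtable", "orders")
    .option("user", "etl_user")
    .option("password", "***")
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .load()
)

# Land the data in HDFS as Parquet, partitioned for downstream jobs.
orders.write.mode("append").partitionBy("order_date") \
    .parquet("hdfs:///landing/sales/orders")
```

A real pipeline would repeat this pattern per source type (Oracle, NoSQL, flat files, streams), each with its own connector.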
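For item 3, one hedged way to validate the landed data: compare row counts against the source extract, check that expected fields survived, and spot-check a precision-sensitive column for nulls. The path, counts, and column names are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hdfs-quality-check").getOrCreate()

def validate_landing(path: str, source_count: int, expected_cols: set) -> None:
    """Fail fast if the landed data drifts from the source extract."""
    landed = spark.read.parquet(path)

    # Row-count parity between source extract and HDFS landing zone.
    landed_count = landed.count()
    if landed_count != source_count:
        raise ValueError(f"Row-count mismatch: source={source_count}, "
                         f"landed={landed_count}")

    # Field-level check: every expected column must have survived.
    missing = expected_cols - set(landed.columns)
    if missing:
        raise ValueError(f"Missing columns after migration: {missing}")

    # Spot-check a precision-sensitive field ('amount' is a placeholder)
    # for nulls introduced by bad casts during migration.
    nulls = landed.filter(F.col("amount").isNull()).count()
    if nulls:
        raise ValueError(f"{nulls} null 'amount' values after migration")

# Example call; the count and column names are placeholders.
validate_landing("hdfs:///landing/sales/orders", 1_250_000,
                 {"order_id", "order_date", "amount"})
```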
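And for items 4 and 5, a sketch of resource-level tuning plus a code-level "breakpoint" between stages; every value here is a placeholder to be sized against the actual cluster, not a recommended setting:

```python
from pyspark.sql import SparkSession

# Resource-level knobs for GB-scale jobs; values are illustrative.
spark = (
    SparkSession.builder
    .appName("tuned-etl")
    .config("spark.executor.memory", "8g")           # per-executor heap
    .config("spark.executor.cores", "4")             # parallel tasks each
    .config("spark.sql.shuffle.partitions", "400")   # shuffle parallelism
    .config("spark.serializer",
            "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

# Code-level "breakpoint": cache and count between stages so a failure
# is isolated to one tool/stage instead of the whole pipeline.
stage1 = spark.read.parquet("hdfs:///landing/sales/orders").cache()
print(f"stage 1 rows: {stage1.count()}")  # forces materialization here
```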
I work with similar use cases on a day-to-day basis and have around 3 years of hands-on experience with big data technologies. Alongside the delivery work, I can also help you understand the system. Reach out to me if you are interested.