I need to build an ETL pipeline for data from a collaborating hospital's CSV file.
Goal: Store the cleaned data in a structured format in a database or file format of choice. Write the code in Python or another language of choice. Design a solution that can scale to terabytes of records.
1. Where things are unclear, make assumptions and justify them with comments in the code.
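The scaling goal mostly comes down to never holding the whole file in memory. The sketch below streams the CSV in fixed-size chunks with the standard-library `csv` and `sqlite3` modules and commits each chunk separately, so memory use is bounded by the chunk size rather than the file size. SQLite, the table name `raw` and the columns `patient_id`/`glucose_1` are illustrative assumptions only. At true terabyte scale, a distributed engine such as Spark or the target database's bulk-load path would be the realistic choice, and the same chunk-and-commit idea would carry over.

```python
import csv
import io
import itertools
import sqlite3
from typing import Iterable, Iterator, List


def iter_chunks(rows: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` rows so memory stays bounded."""
    it = iter(rows)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


def load_csv_in_chunks(source, conn: sqlite3.Connection, table: str,
                       columns: List[str], chunk_size: int = 10_000) -> int:
    """Stream a CSV file-like object into `table` chunk by chunk; return rows loaded."""
    reader = csv.DictReader(source)
    # Table and column names come from trusted pipeline config, never from the data.
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    total = 0
    for chunk in iter_chunks(reader, chunk_size):
        # One executemany + commit per chunk keeps each transaction small.
        conn.executemany(sql, [tuple(row.get(c) for c in columns) for row in chunk])
        conn.commit()
        total += len(chunk)
    return total


# Usage with a tiny in-memory CSV (hypothetical columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw (patient_id TEXT, glucose_1 TEXT)")
data = io.StringIO("patient_id,glucose_1\n1,90\n2,130\n3,101\n")
n = load_csv_in_chunks(data, conn, "raw", ["patient_id", "glucose_1"], chunk_size=2)
```

A larger `chunk_size` trades memory for fewer commits. Because each chunk is independent, the input could also be split by file or byte range and loaded by several workers in parallel.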
2. Write unit tests for all your functions.
3. Write data tests to ensure that the data is correct.
4. Remove protected health information (PHI): names, addresses, etc.
5. Clean the data: remove invalid values and normalize them where reasonable.
6. Add a column that calculates the average of all three glucose measurement time points.
7. Add a column based on the average of all three glucose measurement time points that indicates whether it’s normal, prediabetes or diabetes.
8. Store data in a database or file format of choice.
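Steps 4 to 7 can be written as small pure functions, which also makes them easy to unit-test (requirement 2). This is a minimal sketch under several assumptions that the task does not settle. The column names (`name`, `address`, `glucose_1` to `glucose_3`, etc.) are hypothetical. Readings are assumed to be in mg/dL, with values outside 20 to 600 mg/dL treated as invalid. The classification uses the ADA fasting-glucose cut-offs (below 100 normal, 100 to 125 prediabetes, 126 or above diabetes), although the task does not say which glucose test the three time points come from.

```python
from statistics import mean
from typing import Optional

# Hypothetical column names; the real CSV schema is not given in the task.
PHI_COLUMNS = {"name", "address", "phone", "email"}
GLUCOSE_COLUMNS = ("glucose_1", "glucose_2", "glucose_3")

# Assumption: values are mg/dL and anything outside this range is a data-entry error.
MIN_GLUCOSE, MAX_GLUCOSE = 20.0, 600.0


def drop_phi(row: dict) -> dict:
    """Return a copy of the row without PHI fields (requirement 4)."""
    return {k: v for k, v in row.items() if k.strip().lower() not in PHI_COLUMNS}


def parse_glucose(value) -> Optional[float]:
    """Parse one reading; return None for blank, non-numeric, NaN or out-of-range values."""
    try:
        number = float(str(value).strip())
    except ValueError:
        return None
    # NaN fails both comparisons, so it is rejected here as well.
    if not MIN_GLUCOSE <= number <= MAX_GLUCOSE:
        return None
    return number


def average_glucose(row: dict) -> Optional[float]:
    """Mean of the three time points (requirement 6); None unless all three are valid."""
    values = [parse_glucose(row.get(c)) for c in GLUCOSE_COLUMNS]
    if any(v is None for v in values):
        return None  # assumption: a partial average would be misleading
    return round(mean(values), 1)


def classify_glucose(avg: Optional[float]) -> Optional[str]:
    """Label the average (requirement 7) with ADA fasting thresholds in mg/dL (assumed)."""
    if avg is None:
        return None
    if avg < 100:
        return "normal"
    if avg < 126:
        return "prediabetes"
    return "diabetes"


def transform(row: dict) -> dict:
    """Apply PHI removal, cleaning and the two derived columns to one raw CSV row."""
    clean = drop_phi(row)
    for c in GLUCOSE_COLUMNS:
        clean[c] = parse_glucose(clean.get(c))
    clean["glucose_avg"] = average_glucose(row)
    clean["glucose_status"] = classify_glucose(clean["glucose_avg"])
    return clean


# Usage on one hypothetical raw row.
raw = {"patient_id": "17", "name": "Jane Doe", "address": "1 Main St",
       "glucose_1": "98", "glucose_2": "110", "glucose_3": "125"}
result = transform(raw)
```

Here `result` keeps `patient_id`, drops `name` and `address`, and gets `glucose_avg` 111.0 labelled `"prediabetes"`. Data tests (requirement 3) could then assert invariants on the stored table, for example that no PHI column exists and that every non-null `glucose_avg` lies within the accepted range.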
7 freelancers are bidding an average of $120 for this job
I can design and develop the required ETL to a high standard using MS SQL Server, as I am a senior MS SQL/BI developer with more than 10 years of exceptional professional experience.
Hi, I'm interested in data science. I have worked with Spark SQL. I can deliver in 5 days. Working in coordination is my priority. I hope you will contact me. Best regards.