Problem solution for Spark use case
$750-1500 USD
Paid on delivery
1) I have 60 million US and Canadian postal codes.
Created DataFrame customer_df => good and bad => 60M
2) Downloaded a list of good postal codes from the internet to filter the customer_df data.
Created one DataFrame good_df => the good postal codes => 1M
3) Performed a join between customer_df and good_df on zipcode to separate the good values.
filter_df = good zip [url removed, login to view](cus_df,zipcode)
4) Then separated the bad data with the logic below.
bad_df = [url removed, login to view](filter_df)
The bad_df can still be filtered further by city name.
city_df = [url removed, login to view](bad_df,city)
Then did a union between both DataFrames.
total_filter = [url removed, login to view](city_df)
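The four steps above (join on zipcode, subtract, join on city, union) amount to one rule per row: keep the row if its zipcode is in the good list or, failing that, if its city is in a list of good cities. The sketch below shows that rule as a single pass in plain Python, with sets standing in for the lookup data. It is only an illustration of the logic, not the code from the post: the function name, the sample rows, and the idea that a set of good city names is available are all assumptions. In Spark, the comparable idea would be to treat the 1M-row good_df as a small lookup table (for example with a broadcast hint) so that the 60M-row customer_df is scanned once, but whether that is where the time goes here has not been measured.

```python
# Plain-Python stand-in for the join / subtract / join / union steps.
# All names and sample rows are hypothetical, chosen only for illustration.

def split_customers(customers, good_zips, good_cities):
    """Classify every row in one pass.

    A row is kept if its zipcode is in good_zips (the zipcode join),
    or, failing that, if its city is in good_cities (the city match
    applied to the leftover bad rows). Everything else is rejected.
    """
    kept, rejected = [], []
    for row in customers:
        if row["zipcode"] in good_zips or row["city"] in good_cities:
            kept.append(row)
        else:
            rejected.append(row)
    return kept, rejected


customers = [
    {"id": 1, "zipcode": "10001", "city": "New York"},   # good zip
    {"id": 2, "zipcode": "99999", "city": "Toronto"},    # bad zip, good city
    {"id": 3, "zipcode": "00000", "city": "Nowhere"},    # bad zip, bad city
    {"id": 4, "zipcode": "M5V 2T6", "city": "Toronto"},  # good zip
]
good_zips = {"10001", "M5V 2T6"}
good_cities = {"New York", "Toronto"}

kept, rejected = split_customers(customers, good_zips, good_cities)
print([r["id"] for r in kept])      # [1, 2, 4]
print([r["id"] for r in rejected])  # [3]
```

Set membership is a constant-time lookup on average, so the cost grows with the number of customer rows rather than with the size of the good list.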
The whole job is taking 1.30 mins (Spark on an 8-node cluster with 32 GB per node; spark-submit with driver memory 8g, num-executors 8, and executor-memory 8g).
Is there any other technology or tool that can clean up the data within 15 to 20 mins? (Again, the customer data is 60M rows.)
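As a back-of-the-envelope check on the settings quoted above, the arithmetic below compares the memory requested through spark-submit with the raw memory of the cluster. It assumes every node contributes its full 32 GB and ignores per-executor overhead and whatever the operating system and cluster manager reserve, so the real headroom would be smaller than this suggests.

```python
# Figures are taken from the post; overheads and OS reservations are ignored.
nodes = 8
memory_per_node_gb = 32
num_executors = 8
executor_memory_gb = 8
driver_memory_gb = 8

cluster_memory_gb = nodes * memory_per_node_gb                         # 256
requested_gb = num_executors * executor_memory_gb + driver_memory_gb   # 72
share = requested_gb / cluster_memory_gb

print(f"cluster: {cluster_memory_gb} GB, requested: {requested_gb} GB ({share:.0%})")
# cluster: 256 GB, requested: 72 GB (28%)
```

Under these assumptions the job asks for roughly a quarter of the cluster's raw memory, so the resource settings may be worth revisiting alongside the join logic.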
Project ID: #14803384
About the project
16 freelancers have bid an average of $1101 on this job
I am a data scientist and have experience with Big Data Technologies like Spark and Hadoop. I also have experience with NoSQL databases like HBase, Cassandra, etc. Previously I have worked with in Spark related projec...
Hello, I am a Big Data developer with 7+ years of experience. I understand the job and will provide the desired solution. Please spare some time to discuss further. Relevant Skills and Experience: My key skills are: Java J...
I have experience in tuning and debugging Spark jobs for one of the Fortune 6 companies which processes large amount of data.
Hello, with 7 years of experience in Java, 3 years in Hadoop, and 1+ year in Spark, an excellent solution is guaranteed. What are your values for "--master" and "--deploy-mode" in the spark-submit command? Relevant Skills...
I have experience working with Apache Spark and manipulating DataFrames and RDDs in Python.
Hello, I have a lot of experience in the field. Feel free to ask for my work.
django PHP Arduino hadoop metatrader web design python machine learning HTML, HTML5 graphic design wordpress Android unity3d