I need a simple server-side script (Linux server environment) written that will do the following:
1) monitor a directory and its subdirectories on a regular interval (every 30 minutes)
2) When a CSV file ([login to view URL]) is found in one of the directories, it will be loaded for analysis
3) The script will have a set of rules used to analyze and edit the loaded CSV
4) the rules will target the email addresses in the CSV for proper formatting
5) If a malformed or invalid email address is detected, the whole record should be removed
6) The script will also have a preloaded (and easily updateable) negative database of email addresses and domains. This negative database will be provided as a large CSV and should be easy for the administrator to update on a regular basis without extensive technical skills.
7) It will compare all email addresses and domains in the loaded CSV to the negative database; when matches are found, the entire record will be removed from the loaded CSV.
8) Once these steps are completed, the cleaned CSV will be exported back to the subdirectory where it was found with "_clean" added to the end of the filename so it does not overwrite the original ([login to view URL])
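To make the cleaning steps (items 4-8) concrete, here is a minimal Python sketch of the per-row filter; the language is only a suggestion, and the regex validation rule and the in-memory negative sets are assumptions, not final requirements. Loading even a large negative list into a set of lowercased emails/domains gives O(1) lookups per record, which is what keeps the comparison fast.

```python
import re

# Loose email pattern: a hypothetical stand-in for whatever formatting
# rules (item 4) the final script would enforce.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def is_valid_email(addr):
    """Return True if the address looks well-formed (item 4/5)."""
    return bool(EMAIL_RE.match(addr.strip()))

def clean_rows(rows, email_col, negative_emails, negative_domains):
    """Yield only rows whose email is well-formed and not blacklisted.

    rows            -- iterable of CSV rows (lists of strings)
    email_col       -- index of the column holding the email address
    negative_emails -- set of lowercased blacklisted addresses
    negative_domains-- set of lowercased blacklisted domains
    """
    for row in rows:
        email = row[email_col].strip().lower()
        if not is_valid_email(email):
            continue  # item 5: drop malformed records entirely
        domain = email.split("@", 1)[1]
        if email in negative_emails or domain in negative_domains:
            continue  # item 7: drop blacklisted records
        yield row
```

Because `clean_rows` is a generator, the input CSV can be streamed row by row and the "_clean" output written as it goes, so the whole file never has to sit in memory. The 30-minute scan in item 1 would typically be a one-line cron entry rather than a long-running daemon.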
NEW Functionality:
9) The script should interact with the Facebook API and check each email address in the loaded CSV (after it has been compared to the negative database) to see if there is a Facebook account associated with it.
10) A SECOND output file should be created containing ONLY the records from the _clean file that ALSO have Facebook accounts associated with them.
11) That file should be written to the original subdirectory as well, with "_clean_FB" added to the end of the original file name.
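One caution on item 9: to my knowledge the Facebook Graph API no longer allows publicly searching for users by email address, so this step may need special API access or a different data source. The second-pass filter itself is simple; here is a sketch where `has_facebook_account` is a hypothetical stub standing in for whatever lookup ends up being available:

```python
def has_facebook_account(email):
    """Hypothetical stub. A real implementation would call whatever
    Facebook lookup the project's API access permits; email-based user
    search is restricted in the Graph API, so this may require approval
    or an alternative data source."""
    raise NotImplementedError

def split_facebook_matches(clean_rows, email_col, checker=has_facebook_account):
    """Item 10: from the already-cleaned rows, keep only those whose
    email has an associated Facebook account (the *_clean_FB file)."""
    return [row for row in clean_rows
            if checker(row[email_col].strip().lower())]
```

The `checker` parameter makes the filter testable and lets the lookup backend be swapped without touching the filing logic.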
Important notes:
1) The system should be fast, efficient, and able to handle VERY large files (hundreds of MB per CSV). I understand that large files will take time, but your programming approach should take these files into account and target efficiency as much as possible.
2) The negative email database is over 600 MB, so the script must be able to do the comparisons and cleaning against such a big file very efficiently.
3) The script must be flexible on the input CSV format. Some CSVs may only have 2-3 columns of info; others may have many more. The location of the email field will not be fixed, and the header row may or may not be present. This means the script must check the file to determine where the email field is in real time and adjust accordingly.
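Note 3's requirement to locate the email column at runtime can be handled by sampling the first rows of each file and scoring every column; the sketch below assumes that whichever column's sampled values mostly match an email pattern is the email field. The threshold also tolerates an optional header row, since one non-matching header value won't pull the ratio under 50%.

```python
import re

# Deliberately loose pattern for detection: we only need "looks like an
# email", not full validation.
DETECT_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def detect_email_column(sample_rows, threshold=0.5):
    """Guess which column holds email addresses.

    sample_rows -- the first few parsed rows of the CSV (lists of strings)
    Returns the index of the first column where more than `threshold` of
    the non-empty sampled values look like emails, or None if no column
    qualifies.
    """
    if not sample_rows:
        return None
    width = max(len(r) for r in sample_rows)
    for col in range(width):
        values = [r[col] for r in sample_rows
                  if col < len(r) and r[col].strip()]
        if not values:
            continue
        hits = sum(1 for v in values if DETECT_RE.fullmatch(v.strip()))
        if hits / len(values) > threshold:
            return col
    return None
```

Sampling, say, the first 100 rows keeps detection cheap even on the hundreds-of-MB files mentioned in note 1.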
Please post any questions and I will be happy to answer them!
This should be a quick easy project for somebody who knows what they are doing!
I will be looking to expand this project in the future and the right provider will get a significant amount of business from me if they provide good service & pricing!
I have plenty more projects for the right provider as well so please bid accordingly.
thanks!
-Bobby
Hi there!
My name is Jarett Dunn. I recently completed a project that took URLs from an Excel file, ran them against three web checks, and output the results to an Excel file. I can easily take the knowledge I gained from that project and apply it to your needs. I am familiar with Linux systems and can easily create the functionality needed to check the folders on a regular basis. The Facebook API is straightforward to use and shouldn't take long to build into your system. My preferred language would be Java, but I'm open to suggestions for other languages.
Please keep in mind that I am new to GAF and would like a long-lasting relationship with you over multiple projects after completing this one in a timely manner, with a piece of software you can be proud of.
Please feel free to contact me and we can discuss prices and requirements, and hopefully I'll be awarded the job!
Cheers and we'll be in touch!
-Jarett