Load Apache combined format log files into AWS Redshift for analysis


How are you? I hope I find you well.

I am looking for a small project with the following workflow:

We have Apache servers that handle more than 300 million requests each month. After an hour, each Apache server archives its log files into a .gz file. So I need a Python script that does the following:

- Unzips the .gz file

- Reads the full contents of each file

- Parses each log line into: IP, date, query, target, id (example log line: [login to view URL] - - [28/Aug/2017:14:38:47 +0000] "GET /0.4/query?target=%2Be9ChyFL&id=3ac218e5787584c09d96d230ed563ceb267b59f4&nonce=9d33f05fda3a5b0d09d6bf73f4078e9c44673c2d&lang=ru-RU&version=chrome-20170609&auth=c6def3a8b002ecf04e7dd629460161fc516da4f9 HTTP/1.1" 200 1148 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36" 571)

- Inserts all of the fields above into a Redshift database (for example, 'insert into {table} values('{ip}','{date}','{query}','{target}','{id}')')

- Runs automatically: once the script sees new archive files on the server, it starts working on them immediately.

- Lets our dev team query the database and get results fast, not after 2 hours. It is indeed a big data project.
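
The unzip, read, and parse steps above could be sketched roughly as follows. This is a minimal illustration and not a finished solution. The names `LOG_RE`, `parse_line`, `iter_rows`, and `write_csv_gz` are hypothetical. The "query" field is assumed to mean the raw query string of the request URL. Instead of one `insert` statement per row, the sketch writes a gzipped CSV, on the assumption that the file would then be uploaded to S3 and bulk-loaded with Redshift's `COPY` command, which is generally much faster at this volume than row-by-row inserts.

```python
import csv
import gzip
import re
from datetime import datetime
from urllib.parse import parse_qs, urlsplit

# Apache combined log format. Anything after the response size
# (referer, user agent, the trailing "571") is ignored by this pattern.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return (ip, date, query, target, id), or None if the line does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    # Example timestamp: 28/Aug/2017:14:38:47 +0000
    when = datetime.strptime(m.group("date"), "%d/%b/%Y:%H:%M:%S %z")
    parts = urlsplit(m.group("url"))
    params = parse_qs(parts.query)  # percent-decodes values, e.g. %2B -> +
    target = params.get("target", [""])[0]
    req_id = params.get("id", [""])[0]
    # Assumption: "query" is the raw, still-encoded query string.
    return (
        m.group("ip"),
        when.strftime("%Y-%m-%d %H:%M:%S"),
        parts.query,
        target,
        req_id,
    )


def iter_rows(gz_path):
    """Stream parsed rows from one .gz archive without unpacking it to disk."""
    with gzip.open(gz_path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            row = parse_line(line)
            if row is not None:
                yield row


def write_csv_gz(rows, out_path):
    """Write rows to a gzipped CSV file and return the number of rows written."""
    count = 0
    with gzip.open(out_path, "wt", encoding="utf-8", newline="") as out:
        writer = csv.writer(out)
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

For example, `write_csv_gz(iter_rows("access.log.gz"), "access.csv.gz")` would convert one archive; both file names are placeholders. Detecting new archives (for instance with a scheduled job that tracks which files it has already processed) and the upload and load steps are left out of this sketch.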

Thank you. I look forward to your reply.

Skills: Python, Redshift


About the employer:
(0 reviews) Israel

Project ID: #15110882

4 freelancers have bid on this job


Indeed, it's a big data project. I will develop the script in Python, and it will run through cron. Relevant Skills and Experience: Extensive experience with cloud platforms (Amazon AWS/Google Cloud), AWS services including ...

$199 USD in 3 days
(10 reviews)

Hello, I am a solutions architect. I am an expert in Python and AWS. I have been working on a clickstream project recently. I like your project. Please contact me. Thanks. Relevant Skills and Experience: python, aws, big da...

$270 USD in 10 days
(14 reviews)

I'm assuming that your web servers are on EC2, since you're using Redshift. This can be solved with a Python Lambda function using boto3. Relevant Skills and Experience: I have experience automating AWS management tasks and larg...

$155 USD in 3 days
(0 reviews)

I am an AWS Redshift expert, and I can do this job easily. Relevant Skills and Experience: I have more than 8 years of experience in data warehousing, and for the past 2 years I have been working with the Redshift database. Proposed Milest...

$222 USD in 3 days
(0 reviews)