Cancelled

Web scraper for prices on a specific webpage (I have set aside 1 hour for scoping; set aside 5 minutes to read this, then apply for the job)

I need price information from n different pages at the same time, every x milliseconds (around 30 minutes).

It needs to run 24/7 as a Windows service or as a Docker service in the cloud (preferred).

Scraping of all pages listed in the XML must use concurrent (threaded) requests so that the data is captured at the same point in time.
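A minimal sketch of the concurrent requests described above, using a thread pool from the Python standard library. The names `fetch_text` and `scrape_all` are hypothetical, and only the regex step is shown: applying the XPath selector first would need an HTML/XPath library such as lxml, which is an assumption and is left out here.

```python
import re
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_text(url, timeout=10.0):
    """Download one page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scrape_all(pages, fetch=fetch_text, max_workers=10):
    """Fetch every page in its own thread and return {page_id: value}.

    `pages` maps a page id to (url, regex). A page that fails or does
    not match yields 0.0, the default value named in this posting.
    """
    def scrape_one(item):
        page_id, (url, pattern) = item
        try:
            match = re.search(pattern, fetch(url))
            return page_id, float(match.group(0)) if match else 0.0
        except Exception:
            # A real service would also log the failure with the page id.
            return page_id, 0.0

    # All requests are submitted at once so they run side by side.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(scrape_one, pages.items()))
```

Passing `fetch` as a parameter keeps the function testable without network access; `max_workers=10` matches the stated upper limit of 10 pages.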

After all the data has been collected, perform the calculations and then save all values.

A possible diagram is attached.

Save every action to a log file named with the related date, so that any errors can be traced.
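One way the dated log file could be set up is sketched below. It assumes the `log{date}.log` pattern from the configuration sample in this posting and an ISO date format (`YYYY-MM-DD`), which the posting does not specify; `log_filename` and `make_logger` are hypothetical helper names.

```python
import logging
from datetime import date


def log_filename(pattern, today):
    """Replace the {date} placeholder with an ISO date (assumed format)."""
    return pattern.replace("{date}", today.strftime("%Y-%m-%d"))


def make_logger(pattern="log{date}.log", today=None):
    """Return a logger that appends timestamped lines to today's file."""
    filename = log_filename(pattern, today or date.today())
    logger = logging.getLogger("scraper")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate lines on repeated calls
        handler = logging.FileHandler(filename, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

For a service that runs around the clock, the handler would have to be replaced when the date changes; `logging.handlers.TimedRotatingFileHandler` is a possible alternative.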

1- You will read the configuration of the running process from the XML file (time, log, mails, ...).

2- You will get the pages, XPath and regex from the XML file.

3- You will get the calculations from the XML.

4- You will get the warning conditions from the XML.

5- SQLite is enough for the DB, but you may use another database if you prefer it for better results.

6- I would like to run this project on Amazon Web Services, but if that is not possible I will provide a VM for the setup.
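For the SQLite option in point 5, a minimal storage sketch could look like this. The table name `readings` and its columns are assumptions made for illustration; the posting does not define a schema.

```python
import sqlite3
import time


def save_values(conn, values, ts=None):
    """Store one row per scraped or calculated value for a single run."""
    ts = time.time() if ts is None else ts
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "ts REAL NOT NULL, item_id TEXT NOT NULL, value REAL NOT NULL)"
    )
    # The connection as a context manager commits on success
    # and rolls back if an insert fails.
    with conn:
        conn.executemany(
            "INSERT INTO readings (ts, item_id, value) VALUES (?, ?, ?)",
            [(ts, item_id, value) for item_id, value in values.items()],
        )
```

Using one timestamp for every value in a run keeps scraped values and their calculations grouped together for later reporting.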

Alert logic

Alert logic is easy.

<alert id='warn1' if='balances>5' ops='admin;operator' alerttype='sms' text='balance is low'/>

This is an SMS-type alert. Look up the people listed in ops (such as admin) and find each one's GSM number (if you can't find it, skip that person, but log it).

Call the SMS web page, passing all the details under the SMS children in the XML, using this formula:


{URL}?GSMNO={findadminGSMnumber}&text={GETALERTTEXT}&{child}={child/text()}....

Currently it is:

[login to view URL];123123&text=balance%20is%20low&user=tako&pass=mako&title=POAS&attr1=123&attr2=345

You will make GET requests.
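The URL formula above could be assembled roughly as follows. This is a sketch: `build_sms_url` is a hypothetical helper, and the host in the usage line is a placeholder, since the real SMS URL is not shown in this posting.

```python
from urllib.parse import quote, urlencode


def build_sms_url(base_url, gsm, text, extra):
    """Build the GET URL: GSMNO and text first, then every <sms> child."""
    params = {"GSMNO": gsm, "text": text}
    params.update(extra)  # e.g. user, pass, title, attr1, attr2
    # quote_via=quote encodes spaces as %20, as in the example URL above.
    return base_url + "?" + urlencode(params, quote_via=quote)


url = build_sms_url(
    "http://sms.example/send",  # placeholder host, not the real service
    "123123",
    "balance is low",
    {"user": "tako", "pass": "mako", "title": "POAS",
     "attr1": "123", "attr2": "345"},
)
```

The resulting string can then be requested with `urllib.request.urlopen(url)`, with any failure written to the log.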

If it is an email-type alert, look up the mail address and send the mail; the title (subject) will be the same as the text.

You will use the SMTP details from the config.
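An email alert along these lines could be sketched with the standard library. The function names are hypothetical, and the use of STARTTLS is an assumption based on the TLS port mentioned in the configuration sample; an SSL port would use `smtplib.SMTP_SSL` instead.

```python
import smtplib
from email.message import EmailMessage


def build_alert_mail(sender, recipient, text):
    """Create the alert mail; the subject is the same as the alert text."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = text
    msg.set_content(text)
    return msg


def send_alert_mail(msg, host, port, username, password):
    """Send the mail over SMTP with STARTTLS (assumed transport)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(username, password)
        smtp.send_message(msg)
```

Separating message construction from sending makes it possible to log or inspect the mail even when the SMTP server is unreachable.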

I need an XML configuration file such as:

<config>

<settings>

<time>65000</time>

<logfile>log{date}.log</logfile>

<admins>

<admin id='cats' mail='asdasd@[login to view URL]' gsm='213123'/>

<admin id='felicia' mail ='test@[login to view URL]' gsm='123123' />

</admins>

<smtp>

<serveraddress/> <!-- generally planning to send mail from Gmail -->

<username/>

<password/>

<TLSport/> <!-- if needed -->

<SSLport/> <!-- if needed -->

</smtp>

<sms>

<url>[login to view URL]</url>

<user>tako</user>

<pass>mako</pass>

<title>POAS</title>

<attr1>123</attr1>

<attr2>345</attr2>

</sms>

</settings>

<pages>

<page id='scrape1' startUrl='[login to view URL]' selector='//*[@id="root"]/div/main/header/div/div/div[2]/span[1]' regex='\\d{5}' />

<page id='scrape2' startUrl='[login to view URL]' selector='//*[@id="tab10"]/table/tbody/tr[2]/td[3]' regex='\\d{3}' />

<page id='scrape3' startUrl='[login to view URL]' selector='//*[@id="jsParityTable"]/div[3]/div[1]/div[2]' regex='\\d{9}' />

</pages>

<calculates>

<calc id='convert1' formula='(scrape3/scrape2)*scrape1' />

<calc id='balances' formula='(scrape1+scrape3)*(scrape1-scrape2)' />

</calculates>

<alerts>

<alert id='warn1' if='balances>5' ops='admin;operator' alerttype='sms' text='balance is low'/>

<alert id='warn2' if='convert1=5' ops='operator' alerttype='email' text='convert is 5'/>

<alert id='warn3' if='scrape = 0' ops='admin' alerttype='email;sms' text='scrape is null' />

</alerts>

</config>
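Reading a file of this shape could be sketched with `xml.etree.ElementTree`. The embedded `SAMPLE` is a shortened, well-formed stand-in for the configuration above with placeholder URLs and addresses; `load_config` and the keys of the returned dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the configuration; URLs and mails are placeholders.
SAMPLE = r"""
<config>
  <settings>
    <time>65000</time>
    <admins>
      <admin id='cats' mail='cats@example.com' gsm='213123'/>
    </admins>
  </settings>
  <pages>
    <page id='scrape1' startUrl='http://example.com/a' selector='//span[1]' regex='\d{5}'/>
  </pages>
  <calculates>
    <calc id='balances' formula='(scrape1+scrape1)*2'/>
  </calculates>
  <alerts>
    <alert id='warn1' if='balances>5' ops='cats' alerttype='sms' text='balance is low'/>
  </alerts>
</config>
"""


def load_config(xml_text):
    """Parse the config into plain Python structures, keeping XML order."""
    root = ET.fromstring(xml_text)
    return {
        "interval_ms": int(root.findtext("settings/time", default="0")),
        "admins": {a.get("id"): a.attrib
                   for a in root.iterfind("settings/admins/admin")},
        "pages": [p.attrib for p in root.iterfind("pages/page")],
        "calcs": [(c.get("id"), c.get("formula"))
                  for c in root.iterfind("calculates/calc")],
        "alerts": [a.attrib for a in root.iterfind("alerts/alert")],
    }


config = load_config(SAMPLE)
```

Lists are used for pages, calculations and alerts so that the order in the XML file is preserved.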

General acceptance criteria

- There won't be more than 10 pages, probably 4, but it must be able to handle 10 (mentioning this because of threading).

- The minimum x interval will be 5 minutes, so you have 5 minutes to calculate.

- There won't be more than 10 calculations.

- All scraped values will be numeric, mostly money.

- All values default to 0. If you can't calculate something, log why and where the calculation failed (XML id) and set the value to 0.

- Calculations are done in the order of the XML. They are not run in concurrent threads; calculations are made after all concurrent page scraping has finished.

- Calculations can use other calculation IDs in their formulas. For example, calc1's formula can be 10x5 and calc2's formula could be calc1*5, which should give 250. If calc1 has not been calculated yet because of the XML order, calc2's result will be 0, but you need to log that problem to the log file as "calc1 was null - calc2 couldn't be calculated".


- I need a web service feed that serves this information for Power BI or QlikView to check. I think 3 services will be enough; the service credentials will be in the XML file.
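The calculation rules in the list above (XML order, default 0, logging a missing reference) could be sketched as follows. This is only an illustration: it assumes formulas use `+`, `-`, `*`, `/` and parentheses (so the 10x5 example would be written `10*5`), and `run_calcs` is a hypothetical name. Formulas are parsed with `ast` instead of `eval` so that only arithmetic is accepted.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _eval(node, values):
    """Evaluate a parsed arithmetic expression against known values."""
    if isinstance(node, ast.Expression):
        return _eval(node.body, values)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return float(node.value)
    if isinstance(node, ast.Name):
        if node.id not in values:
            raise LookupError(f"{node.id} was null")
        return values[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left, values),
                                   _eval(node.right, values))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, values)
    raise ValueError("unsupported expression")


def run_calcs(calcs, scraped, log=print):
    """Run (id, formula) pairs in the given order; failures are logged as 0."""
    values = dict(scraped)
    for calc_id, formula in calcs:
        try:
            values[calc_id] = _eval(ast.parse(formula, mode="eval"), values)
        except Exception as exc:
            log(f"{exc} - {calc_id} couldn't be calculated")
            values[calc_id] = 0.0
    return values
```

With `[("calc1", "10*5"), ("calc2", "calc1*5")]` the result for calc2 is 250; with the two entries swapped, calc2 is set to 0 and the message "calc1 was null - calc2 couldn't be calculated" is passed to `log`.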

Skills: Amazon Web Services, Python, information systems architecture, web scraping, XML


About the employer:
(13 reviews) İstanbul, Turkey

Project ID: #15893667