I need the following completed right away: scripts that extract data from web pages at daily, weekly or monthly intervals. Once I enter a page's URL into the MySQL database, the process must be fully automated.
There are two types of pages to extract data from:
A web page with no form.
A web page with a form that must be filled in to get the desired data.
The first project covers data extraction from pages with no form. The follow-up project will add form filling. Requirements: Linux, PHP 5 or later, MySQL 5 or later.
The script will run as a cron job and query the database for the URLs matching an interval type (daily, weekly or monthly). After each URL is fetched, cookies and cache/session data must be deleted (cURL?). I also need a separate script that pulls the table headers from a page, to help with entering the field mapping when a URL is added, before the main script runs.
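A minimal sketch of how the per-URL cleanup could work with PHP's cURL extension, assuming the extension is installed: each URL gets its own handle, cookies are kept only in memory (an empty `CURLOPT_COOKIEFILE` enables the cookie engine without a file), and the handle is discarded afterwards, so no cookies or session state reach the next URL. `fetch_page_fresh()` is a hypothetical name, and the no-cache request headers are an assumption about what "cache" means here.

```php
<?php
// Hypothetical helper: fetch one URL with a fresh cURL handle so that no
// cookies, session IDs or connections carry over to the next URL.
function fetch_page_fresh($url, $timeout = 30)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects (e.g. to a session URL)
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_COOKIEFILE     => '',     // cookies in memory only; nothing written to disk
        CURLOPT_FRESH_CONNECT  => true,   // never reuse a cached connection
        CURLOPT_FORBID_REUSE   => true,   // and close this connection when done
        CURLOPT_HTTPHEADER     => array('Cache-Control: no-cache', 'Pragma: no-cache'),
    ));
    $body  = curl_exec($ch);
    $error = curl_error($ch);
    $code  = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 for non-HTTP URLs such as file://
    unset($ch); // freeing the handle discards its in-memory cookies and session state

    if ($body === false) {
        return array(false, $error);
    }
    if ($code >= 400) {
        return array(false, "HTTP status $code");
    }
    return array($body, null);
}
```

It returns `array($body, null)` on success or `array(false, $message)` on failure; the message is what would go into the error log.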
How it runs: on a server with Linux, PHP, MySQL and cURL.
1. Query: get each URL and its field mapping from the database, fetch the page, parse the text in its tables according to the field-name mapping table, and upload the extracted data to the database tables.
2. Update the tables with the last run date of the extraction.
3. Log an entry for every error where data was not extracted.
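The parsing step could look roughly like this, assuming the field-name mapping comes out of the query as an array of `'Header text' => 'db_field'` pairs. `extract_rows()`, `row_cells()` and that mapping layout are hypothetical; the first row of each table is assumed to be its header row, and rows of nested tables are not separated out.

```php
<?php
// Text of the <th>/<td> cells of one row, with whitespace collapsed so that
// "Unit\n  Price" matches a mapping key of "Unit Price".
function row_cells(DOMElement $tr)
{
    $cells = array();
    foreach ($tr->childNodes as $node) {
        if ($node instanceof DOMElement && ($node->nodeName === 'th' || $node->nodeName === 'td')) {
            $cells[] = trim(preg_replace('/\s+/', ' ', $node->textContent));
        }
    }
    return $cells;
}

// Returns the data rows of the first table whose header row contains at least
// one mapped header, each row keyed by db field name, or null if none matched.
function extract_rows($html, array $mapping)
{
    if ($html === '') {
        return null;
    }
    $doc  = new DOMDocument();
    $prev = libxml_use_internal_errors(true);   // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    foreach ($doc->getElementsByTagName('table') as $table) {
        $columns = null;                        // column index => db field name
        $rows    = array();
        foreach ($table->getElementsByTagName('tr') as $tr) {
            $cells = row_cells($tr);
            if ($columns === null) {            // first row: treat as the header row
                $columns = array();
                foreach ($cells as $i => $text) {
                    if (isset($mapping[$text])) {
                        $columns[$i] = $mapping[$text];
                    }
                }
                if (!$columns) {
                    continue 2;                 // no mapped header: try the next table
                }
                continue;
            }
            $record = array();
            foreach ($columns as $i => $field) {
                $record[$field] = isset($cells[$i]) ? $cells[$i] : null;
            }
            $rows[] = $record;
        }
        if ($columns) {
            return $rows;
        }
    }
    return null;                                // caller should write an error log entry
}
```

A `null` return (no table with a mapped header) is the case where step 3 would write a log entry; the returned rows are ready to be inserted with prepared statements.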
Two scripts in total. I need this immediately.
1. A script to pull the table headers from a page and enter them into the database.
2. A script to parse the table data and upload it to the MySQL database.
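For script 1, a sketch that lists the first-row cells of every table on a page (assumed to be the header row), keyed by the table's position on the page. `extract_table_headers()` is a hypothetical name, and writing the result into MySQL is left out.

```php
<?php
// Hypothetical script-1 helper: table index => list of header texts.
function extract_table_headers($html)
{
    if ($html === '') {
        return array();
    }
    $doc  = new DOMDocument();
    $prev = libxml_use_internal_errors(true);  // suppress warnings for sloppy HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $result = array();
    $index  = 0;                               // position of the table on the page
    foreach ($doc->getElementsByTagName('table') as $table) {
        $rows = $table->getElementsByTagName('tr');
        if ($rows->length > 0) {
            $headers = array();
            foreach ($rows->item(0)->childNodes as $cell) {
                if ($cell instanceof DOMElement && in_array($cell->nodeName, array('th', 'td'), true)) {
                    // collapse whitespace so the text matches what is typed into the mapping
                    $headers[] = trim(preg_replace('/\s+/', ' ', $cell->textContent));
                }
            }
            $result[$index] = $headers;
        }
        $index++;
    }
    return $result;
}
```

The table index tells you which table a header belongs to when entering the field mapping for a URL.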
Variables will be used so that the cron jobs can run the same script for different queries.
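One way to let several cron entries share one script is to pass the interval as a command-line argument. This sketch assumes that design; `parse_interval_arg()` and `is_due()` are hypothetical helpers, and the crontab lines and paths in the comments are placeholders. `is_due()` compares the last run date with the current calendar day, ISO week or month (UTC), so a job that fires a few seconds early still runs, and a job that fires twice in one period runs only once.

```php
<?php
// Placeholder crontab entries, one script for every interval:
//   0 2 * * *  php /path/to/extract.php daily
//   0 3 * * 1  php /path/to/extract.php weekly
//   0 4 1 * *  php /path/to/extract.php monthly

// Returns the validated interval from the command line, or null.
function parse_interval_arg(array $argv)
{
    $allowed = array('daily', 'weekly', 'monthly');
    if (!isset($argv[1])) {
        return null;
    }
    $interval = strtolower(trim($argv[1]));
    return in_array($interval, $allowed, true) ? $interval : null;
}

// True if a URL with this interval has not yet run in the current calendar
// day, ISO week or month (UTC). $lastRun is a Unix timestamp or null.
function is_due($interval, $lastRun, $now)
{
    if ($lastRun === null) {
        return true;                        // never run before
    }
    $formats = array('daily' => 'Y-m-d', 'weekly' => 'o-W', 'monthly' => 'Y-m');
    if (!isset($formats[$interval])) {
        return false;                       // unknown interval: do not run
    }
    $f = $formats[$interval];
    return gmdate($f, $lastRun) !== gmdate($f, $now);
}
```

In the main script, the result of `parse_interval_arg($argv)` would be checked for `null` (exit with an error), and each URL returned by the interval query would be processed only if `is_due($interval, $lastRun, time())` is true, after which its last run date is updated.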