We need a scraper for KSL.com/auto/search.
It needs to retrieve the data directly from the HTTP response, without having to render the page.
It needs to be written in Python 3
We also need an investigation into whether a cheaper or free proxy can be used in place of Crawlera (we already have a Crawlera API key and subscription).
The scraper needs to get all properties of each individual listing (and therefore will probably have to visit each individual listing URL).
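As a minimal sketch of the no-rendering requirement: many listing sites embed their search results as a JSON blob inside a `<script>` tag, so the data can be pulled straight out of the raw HTTP response. The variable name `window.__SEARCH_DATA__` and the endpoint below are hypothetical placeholders and must be confirmed against a live KSL response.

```python
import json
import re

SEARCH_URL = "https://cars.ksl.com/search"  # hypothetical endpoint; confirm against the live site


def extract_listings(html):
    """Pull listing dicts out of a JSON blob embedded in the page HTML.

    Assumes (hypothetically) the server embeds results as
    `window.__SEARCH_DATA__ = {...};` in a <script> tag; the real
    variable name must be read from an actual response.
    """
    match = re.search(r"window\.__SEARCH_DATA__\s*=\s*(\{.*?\});", html, re.DOTALL)
    if not match:
        return []
    data = json.loads(match.group(1))
    return data.get("listings", [])


def fetch_search_page(page=0):
    """Fetch one search-results page over plain HTTP (no browser)."""
    import requests  # third-party; imported here so the parser above works without it

    resp = requests.get(SEARCH_URL, params={"page": page}, timeout=30)
    resp.raise_for_status()
    return extract_listings(resp.text)
```

If the results turn out not to be embedded as JSON, the same fetch can instead be parsed with an HTML parser such as lxml; either way no JavaScript rendering is involved.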
The scraper needs to feed the data into our existing MySQL 5.7 second-generation InnoDB database on Google Cloud Platform, which has the following table structure:
CREATE TABLE [login to view URL] (
id INT(11) NOT NULL AUTO_INCREMENT,
name VARCHAR(256) DEFAULT NULL,
source VARCHAR(64) DEFAULT NULL,
price INT(11) NOT NULL,
year INT(11) NOT NULL,
make VARCHAR(64) NOT NULL DEFAULT '',
model VARCHAR(64) NOT NULL DEFAULT '',
mileage INT(11) NOT NULL,
transmission VARCHAR(64) DEFAULT NULL,
num_cylinders INT(11) DEFAULT NULL,
drive_type VARCHAR(64) DEFAULT NULL,
body_type VARCHAR(64) DEFAULT NULL,
fuel_type VARCHAR(64) DEFAULT 'Other',
title_type VARCHAR(64) DEFAULT NULL,
vin VARCHAR(64) DEFAULT NULL,
trim VARCHAR(64) DEFAULT NULL,
color VARCHAR(128) DEFAULT NULL,
location VARCHAR(256) DEFAULT NULL,
source_id VARCHAR(24) DEFAULT '',
url VARCHAR(512) DEFAULT '',
date_listed DATETIME DEFAULT NULL,
date_found DATETIME DEFAULT NULL,
date_updated DATETIME DEFAULT NULL,
num_doors INT(11) DEFAULT NULL,
date_analyzed DATETIME DEFAULT NULL,
seller_type VARCHAR(15) DEFAULT NULL,
details LONGTEXT DEFAULT NULL,
PRIMARY KEY (id)
);
Below are a few notes regarding the fields:
id - will populate automatically (AUTO_INCREMENT)
name - listing title
date_analyzed - should be left null
details - this is where the listing description will be stored
date_listed - this is the date the listing was posted to KSL
date_updated - this is the date the record was updated in the database
source_id - this is the listing ID from KSL
source - should always be 'ksl'
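The notes above can be sketched as an insert-or-refresh statement builder. The table name is redacted in the posting ("[login to view URL]"), so `listings` below is a placeholder; the sketch also assumes a UNIQUE index on (source, source_id) will be added, since the posted DDL defines only the primary key and ON DUPLICATE KEY UPDATE needs such an index to deduplicate by KSL listing ID.

```python
# Columns we populate; id auto-increments and date_analyzed stays NULL,
# so neither appears here (per the field notes above).
COLUMNS = [
    "name", "source", "price", "year", "make", "model", "mileage",
    "transmission", "num_cylinders", "drive_type", "body_type",
    "fuel_type", "title_type", "vin", "trim", "color", "location",
    "source_id", "url", "date_listed", "date_found", "date_updated",
    "num_doors", "seller_type", "details",
]


def build_upsert(row):
    """Return (sql, params) for inserting or refreshing one listing.

    `listings` is a placeholder table name. On a duplicate (source,
    source_id) pair, every column except the identity columns and the
    original date_found is refreshed from the new row.
    """
    placeholders = ", ".join(["%s"] * len(COLUMNS))
    updates = ", ".join(
        f"{c} = VALUES({c})"
        for c in COLUMNS
        if c not in ("source", "source_id", "date_found")
    )
    sql = (
        f"INSERT INTO listings ({', '.join(COLUMNS)}) "
        f"VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )
    return sql, [row.get(c) for c in COLUMNS]
```

The (sql, params) pair is intended for a driver such as PyMySQL via `cursor.execute(sql, params)`; missing attributes fall through as NULL, matching the DEFAULT NULL columns.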
We need the script itself, and it must be able to run repeatedly on a Google Compute Engine virtual machine instance running Debian 9.
The script needs to support all of the search filters that [login to view URL] offers. Every filter must be optional, so that the script also runs with no filters applied.
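One way to keep every filter optional is to build the search URL from a dict and drop unset entries, so calling with no arguments yields an unfiltered search. The base URL and parameter names (`make`, `yearFrom`, ...) below are illustrative assumptions; the real names must be read from KSL's search form.

```python
from urllib.parse import urlencode

BASE = "https://cars.ksl.com/search"  # hypothetical; confirm against the live site


def build_search_url(filters=None):
    """Build a search URL from an optional dict of filters.

    Filters whose value is None are dropped, so build_search_url()
    with no arguments produces an unfiltered search.
    """
    params = {k: v for k, v in (filters or {}).items() if v is not None}
    if not params:
        return BASE
    return BASE + "?" + urlencode(sorted(params.items()))
```

For example, `build_search_url({"make": "Honda", "yearFrom": 2015})` appends only those two parameters, while `build_search_url()` returns the bare search URL.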
Each time the script sees a listing that already exists in the database, it needs to update that row with all data available from the search results page; it must not visit the individual listing URL when the listing is already in the database.
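The visit-or-skip policy above can be sketched as a small function. `known_ids`, `fetch_detail`, and `upsert` are illustrative names I am introducing here: the set of source_id values already stored (e.g. loaded once per run with a SELECT), a function that fetches one detail page, and a function that writes one row.

```python
def process_search_result(listing, known_ids, fetch_detail, upsert):
    """Apply the update-vs-visit rule to one search-page listing.

    Returns True if the detail page was fetched (new listing),
    False if the existing row was refreshed from search-page data only.
    The collaborators are injected so the policy is testable without
    a network connection or a database.
    """
    if listing["source_id"] in known_ids:
        # Already in the database: refresh with search-page fields
        # only, skipping the per-listing HTTP request.
        upsert(listing)
        return False
    # New listing: merge in the full detail-page attributes first.
    listing = {**listing, **fetch_detail(listing["url"])}
    upsert(listing)
    return True
```

Loading the known source_id set once per run keeps the hot path to a set-membership check instead of one database query per listing.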
17 freelancers are bidding on average $175 for this job