Summary of Project
This project requires converting data on a webpage to CSV format. You may use any language as long as the source is well documented; I would prefer Java. You may also use iMacros, since it works with Firefox and is a free add-on. That might reduce the cost, depending on your familiarity with iMacros.
This project also requires you to use the Zillow API to get the value of each property.
If you know scraping, this project should be very small.
The web-scraping application to be built will be called the "Program" hereafter.
Steps of data extraction:
1. The user will click the search button after entering some criteria manually.
2. The result set will be displayed on the webpage. Results will span more than one page. The complete source of a sample page is attached ( [login to view URL] ).
3. The user will initiate the web-scraping program built in this project.
The web-scraping program will:
A. Take each row of data and copy it into the CSV.
B. Click the 3rd hyperlink, which will open a pop-up HTML page (complete format attached: [login to view URL] ).
C. Parse the HTML page and copy each data element on the page into the same row written in step A. See [login to view URL] for a description. Basically, it is an HTML table from which you get values and copy them into the CSV.
4. Take the address fields from the current CSV row (the column name of each address field will be provided by the user in a properties file, or any other way convenient for the application) and calculate the Zillow value using the web service. Details at
[login to view URL]. Append the following Zestimate data returned by the result set:
* Zestimate (in $)
* Last updated date
* 30-day change (in $)
* Valuation range (high) (in $)
* Valuation range (low) (in $)
* Percentile Value
* Zillow Home Value Index
* Zillow Home Value Index 1-Yr change
You should restrict the number of requests made in one session, and the interval between requests, using numbers stored in a properties file provided by the user (or any other convenient way), e.g. 1000 requests per session and a 2-second interval between requests.
D. Go to the next row and repeat from step A. Loop until the last row, then click Next page until there are no more pages or rows.
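To illustrate the row-to-CSV step (A and C), here is a minimal Java sketch. It assumes each result row arrives as an HTML `<tr>` fragment; the regex-based cell extraction is only for illustration, and a real implementation would use a proper HTML parser such as jsoup:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RowToCsv {
    // Extracts the text of each <td> cell in one result row.
    // Regex parsing is a sketch only; jsoup would be more robust.
    public static List<String> parseRow(String trHtml) {
        List<String> cells = new ArrayList<>();
        Matcher m = Pattern.compile("<td[^>]*>(.*?)</td>", Pattern.DOTALL)
                           .matcher(trHtml);
        while (m.find()) {
            // Strip any nested tags (e.g. hyperlinks) and trim whitespace.
            cells.add(m.group(1).replaceAll("<[^>]+>", "").trim());
        }
        return cells;
    }

    // Joins cells into one CSV line, quoting per RFC 4180.
    public static String toCsvLine(List<String> cells) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cells.size(); i++) {
            if (i > 0) sb.append(',');
            String c = cells.get(i);
            if (c.contains(",") || c.contains("\"")) {
                c = "\"" + c.replace("\"", "\"\"") + "\"";
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String row = "<tr><td>123 Main St</td><td>Springfield, IL</td>"
                   + "<td><a href=\"#\">details</a></td></tr>";
        System.out.println(toCsvLine(parseRow(row)));
        // prints: 123 Main St,"Springfield, IL",details
    }
}
```

The fields returned by the Zillow lookup in step 4 would be appended as extra cells to the same list before the line is written.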
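The request limits described in step 4 could be driven by a properties file along these lines. The key names (`max.requests`, `request.interval.ms`) are assumptions for the sketch, not part of the spec; the user would supply the actual file:

```java
import java.util.Properties;

public class Throttle {
    private final int maxRequests;
    private final long intervalMs;
    private int made = 0;

    public Throttle(Properties p) {
        // Key names are illustrative; defaults match the example in the spec
        // (1000 requests per session, 2-second interval).
        this.maxRequests = Integer.parseInt(p.getProperty("max.requests", "1000"));
        this.intervalMs = Long.parseLong(p.getProperty("request.interval.ms", "2000"));
    }

    // Returns false once the per-session quota is exhausted; otherwise
    // sleeps for the configured interval and counts the request.
    public boolean acquire() {
        if (made >= maxRequests) return false;
        if (made > 0) {
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        made++;
        return true;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("max.requests", "2");
        p.setProperty("request.interval.ms", "10");
        Throttle t = new Throttle(p);
        System.out.println(t.acquire()); // true
        System.out.println(t.acquire()); // true (after a 10 ms pause)
        System.out.println(t.acquire()); // false: quota of 2 reached
    }
}
```

The Program would call `acquire()` before each Zillow web-service request and stop (or pause) the session when it returns false.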