Closed

Web Crawler and Scraper for Business Database

I need a customized web crawling program to scrape data from an extensive business contact database that contains millions of members. This program must be able to circumvent server detection, either by bandwidth throttling or another method. The database will require multiple templates for extraction; however, the end user will be able to set the specific crawling rules, keywords, and depth of crawl. The end product must be convertible into an Excel file, MSFT Access, or MySQL database.

--

Features that are desired:

1. Multiple Data Types in Single Extraction Template

(i.e., Free Text, Tabled Information, Multiple Tables)

2. Multiple Types of File Lists or Data Inputs

(i.e., Excel, Access, MySQL, SQL, etc.)

3. Multiple Extraction Datastores

(i.e., Excel, Access, MySQL, SQL, etc.)

4. Automatic Table Creation during Extraction

(Supported in Excel, MySQL, SQL)

5. SQL 2005 Express Instance

(Stores Meta-Data, Program Variables, and can store Extracted Data)

6. Comprehensive Meta-Data Logging

(For Auditing, Data Cleansing, and Data Joining)

7. Wizard driven DataSet Initialization String Creation

(The HTML for Extraction Area Start and Row Start)

8. Manual Editing of DataSet initialization String

(User Defined DataSet HTML)

9. Automatic Table Row Count Calculation

(Automatically Calculates Number of Tables Rows on each HTML Page)

10. Wizard driven Field Creation

(The HTML for Data Extraction Start and Column Start)

11. Manual Editing of Field Start and Stop HTML

(User Defined Start and Stop Tags)

12. Supports Optional Fields

(Accurate Extraction of data that appears in some rows, but not in others)

13. Built In Data Cleansing

(Remove HTML, Preserve Text Whitespace, Full URL from Relative, and more)

14. Test Extraction w/Step by Step Replay for Troubleshooting

(Expedites Troubleshooting)

15. One-Click Save to Datastore Option

(Extract while browsing in the DataPage Editor)

16. Basic Automation Wizard

(Simple Extraction Automation via File List from Excel, Access, MySQL and SQL)
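Several of the features above (row start HTML, user-defined field start and stop tags, optional fields, row counting, and data cleansing) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the sample HTML, and the `example.com` base URL are hypothetical, and the cleansing step collapses whitespace rather than preserving it.

```python
import html
import re
from urllib.parse import urljoin


def extract_between(text, start, stop):
    """Return the text between the first `start` and the next `stop`, or None."""
    begin = text.find(start)
    if begin == -1:
        return None
    begin += len(start)
    end = text.find(stop, begin)
    if end == -1:
        return None
    return text[begin:end]


def clean(fragment):
    """Strip tags, decode HTML entities, and collapse runs of whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", fragment)
    return re.sub(r"\s+", " ", html.unescape(no_tags)).strip()


def extract_rows(page, row_start, row_stop, fields):
    """Split `page` into rows and pull each field; absent fields become None."""
    rows = []
    for chunk in page.split(row_start)[1:]:
        row_html = chunk.split(row_stop)[0]
        record = {}
        for name, (start, stop) in fields.items():
            raw = extract_between(row_html, start, stop)
            record[name] = clean(raw) if raw is not None else None
        rows.append(record)
    return rows


# Hypothetical sample page: the second row has no phone cell (an optional field).
SAMPLE_PAGE = """
<table>
<tr><td class="name"><a href="/c/1">Acme &amp; Sons</a></td><td class="phone">555-0100</td></tr>
<tr><td class="name"><a href="/c/2">Beta  <b>Foods</b></a></td></tr>
</table>
"""

# User-defined start and stop tags for each field.
FIELDS = {
    "name": ('<td class="name">', "</td>"),
    "phone": ('<td class="phone">', "</td>"),
    "link": ('href="', '"'),
}

rows = extract_rows(SAMPLE_PAGE, "<tr>", "</tr>", FIELDS)
for row in rows:
    if row["link"]:
        # Turn a relative link into a full URL against an assumed base URL.
        row["link"] = urljoin("https://example.com/list", row["link"])

print(len(rows), "rows extracted")
for row in rows:
    print(row)
```

Plain string markers keep the template editable by hand, which is what the manual editing features describe; a production tool would more likely use an HTML parser for robustness.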

Packages

1. WinHTTP Stack

(Server-quality HTTP platform that allows downloading up to 10 pages per second)

2. Multi-Step Task Execution

(Simulate user tasks like Log-in, get Cookie or SessionID, Submit Searches)

3. Bandwidth Throttling

(Scale from 10 requests/second down to 1 request/hour to simulate a real user)

4. Download Images and Files

(Edit File Path and File Naming Conventions)

5. Customize User Agent, Referrer URL, Relative URL, Cookies, and more.

6. Powerful SQL based File List Manipulation and Concatenation

7. Package Run Scheduling

(Run Normally or Silently from Windows Scheduler or other program interface)

9. Create URL File Lists

(Manually or using Excel, Access, MySQL, and SQL) X

10. Advanced Web Crawler

(Control Depth, Number of pages, and parameters of Link to be crawled or ignored) X
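The crawl controls named in the last item (depth, number of pages, and which links are followed or ignored) can be sketched as a breadth-first traversal. This is only an illustration under stated assumptions: `crawl`, its parameters, and the in-memory `SITE` link map are hypothetical, and `get_links` stands in for whatever component would actually fetch a page and return its links.

```python
import re
from collections import deque


def crawl(start, get_links, max_depth, max_pages, follow=None, ignore=None):
    """Breadth-first crawl; returns the visited URLs in visit order."""
    follow_re = re.compile(follow) if follow else None
    ignore_re = re.compile(ignore) if ignore else None
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # do not expand links beyond the depth limit
        for link in get_links(url):
            if link in seen:
                continue
            if ignore_re and ignore_re.search(link):
                continue  # link matches the ignore rule
            if follow_re and not follow_re.search(link):
                continue  # link does not match the follow rule
            seen.add(link)
            queue.append((link, depth + 1))
    return visited


# Hypothetical link graph standing in for real pages.
SITE = {
    "/": ["/food", "/about", "/login"],
    "/food": ["/food/spices", "/food/snacks"],
    "/about": ["/"],
    "/food/spices": ["/food/spices/acme"],
}

pages = crawl("/", lambda u: SITE.get(u, []), max_depth=2, max_pages=10,
              ignore=r"/login")
print(pages)
```

With `max_depth=2` the page `/food/spices/acme` is never visited, and lowering `max_pages` cuts the traversal short in the same breadth-first order.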

--

This program must be able to avoid automated detection or blocking from the host. Remember, it must be able to extract entries in the millions at very high speeds. The database which I need to scrape is [url removed, login to view]

Payment will be transferred via escrow once three successful tests of the program have been completed to my full specifications.

I will use the web crawler to search for companies via U.S. NAICS industry descriptions ([url removed, login to view]). Once the listed companies under that industry appear in [url removed, login to view], I will require their business information to be scraped and put into my database.

The information required includes:

Company Name

Address

Alt. Company Name (DBA)

Phone Number

Location Type

Est. Annual Sales

Est. # of Employees

Est. # Employees at Loc.

Year Started

State of Incorp.

Contact's Name

Contact's Title

Parent Company

NAICS Description
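The field list above maps naturally onto a single record type. The sketch below is one possible shape, not a required schema: the `CompanyRecord` name, the snake_case field names, the choice to store every value as text, and the sample company are all assumptions. It writes CSV, which Excel opens directly and which Access and MySQL can import.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class CompanyRecord:
    # Only the company name is treated as mandatory (an assumption);
    # every other field may be missing from a given listing.
    company_name: str
    address: Optional[str] = None
    alt_company_name: Optional[str] = None  # DBA
    phone_number: Optional[str] = None
    location_type: Optional[str] = None
    est_annual_sales: Optional[str] = None
    est_employees: Optional[str] = None
    est_employees_at_location: Optional[str] = None
    year_started: Optional[str] = None
    state_of_incorporation: Optional[str] = None
    contact_name: Optional[str] = None
    contact_title: Optional[str] = None
    parent_company: Optional[str] = None
    naics_description: Optional[str] = None


def to_csv(records):
    """Serialize records to CSV text with one header row; None becomes empty."""
    buffer = io.StringIO()
    names = [f.name for f in fields(CompanyRecord)]
    writer = csv.DictWriter(buffer, fieldnames=names)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Made-up sample record for illustration only.
sample = CompanyRecord(
    company_name="Example Spice Co.",
    phone_number="555-0100",
    naics_description="Spices and Seasonings",
)
csv_text = to_csv([sample])
print(csv_text)
```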

For example, if I want to collect the business information of every company in the NAICS industry "Spices and Seasonings", I will enter a search in [url removed, login to view] for "Spices and Seasonings". Every related company will be listed, and that information will need to be scraped and compiled into my own database.

A company listed under "Spices and Seasonings" ([url removed, login to view])

For the test of the finished product I will require sample Excel, MSFT Access, and MySQL databases for the companies in the following industries: "Spices and Seasonings", "Soft Drink Manufacturing", and "Snack Food Manufacturing".

Some NAICS industry descriptions may return more results than Manta can list, which requires a specialized solution that crawls further by either category or description. For example, Snack Food Manufacturing has 4862 companies, but not all can be listed at one time. [url removed, login to view]
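One way to read this requirement is as a recursive split: if a search returns more results than can be listed, break it into narrower searches until each one fits. The sketch below assumes that approach. The listing cap of 2000, the sub-category names, and every count other than the 4862 total are invented for illustration, and `count_results` and `subcategories` are hypothetical callbacks that a real tool would have to implement against the site.

```python
def plan_queries(category, count_results, subcategories, cap):
    """Return the queries to run so that each result set fits within `cap`."""
    if count_results(category) <= cap:
        return [category]
    children = subcategories(category)
    if not children:
        # Nothing narrower exists; the caller has to handle this oversized leaf.
        return [category]
    plan = []
    for child in children:
        plan.extend(plan_queries(child, count_results, subcategories, cap))
    return plan


# Invented counts and sub-categories; only the 4862 total comes from the brief.
COUNTS = {
    "Snack Food Manufacturing": 4862,
    "Snack Food Manufacturing/a": 1900,
    "Snack Food Manufacturing/b": 2962,
    "Snack Food Manufacturing/b/1": 1500,
    "Snack Food Manufacturing/b/2": 1462,
}
TREE = {
    "Snack Food Manufacturing": [
        "Snack Food Manufacturing/a",
        "Snack Food Manufacturing/b",
    ],
    "Snack Food Manufacturing/b": [
        "Snack Food Manufacturing/b/1",
        "Snack Food Manufacturing/b/2",
    ],
}

plan = plan_queries(
    "Snack Food Manufacturing",
    count_results=COUNTS.__getitem__,
    subcategories=lambda c: TREE.get(c, []),
    cap=2000,
)
print(plan)
```

If the narrower searches overlap, the combined results would still need de-duplication, for example on company name and address.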

Skills: .NET, ASP, C Programming, JavaScript, Research

See more: manta scraper, scraper html tables access, crawler manta, scrape mantacom, web crawler program, windows phone for business, web templates images, web templates for free, web templates create, web templates business, web template food, web solution companies, web search remove, web search images, web searches database, web page template free, web pages templates html, web pages templates free, web html 5 template free, web crawler features, web-crawler, want to create my own database, user tests, user specifications template, user interface companies

About the employer:
(0 reviews) East Brunswick, United States

Project ID: #409692

7 freelancers are bidding on average $693 for this job

rhoware

I have already built this program before, so hopefully I can help you with this project. Thank you.

$750 USD in 1 day
(1 review)
1.0
project12

Hi! Please see PM. Contact me please, I can do your project.

$750 USD in 20 days
(0 reviews)
0.0
sg2009

We are ready, please check PMB.

$700 USD in 20 days
(0 reviews)
0.0
qrichtech

Delivery with quality is our motto at "Qrich". We will start work on your project and take in each and every piece of your feedback to reach that quality. Let's start. :) Note: work will be started with clear requirements from y More

$700 USD in 30 days
(0 reviews)
0.0
user200901

I have done live work on many ETL processes for an FMCG company using SSIS with data sources such as flat files, Excel, Access, Oracle, and SQL. I can do my best. For more, look at my profile.

$700 USD in 30 days
(0 reviews)
0.0
vvhasyagar

I did a similar project for a real estate company

$500 USD in 15 days
(0 reviews)
0.0
waynwill

[url removed, login to view]

$750 USD in 5 days
(0 reviews)
0.0