Keyword Data Collection

This project is to build a keyword database that continually updates itself. The target keywords will be product details extracted from various shopping sites.

The extractor will start with a handful of core shopping sites; additional sites will be added later for specific category-centric keyword databases. The sites that will have keywords extracted are:

- [url removed, login to view] (later adding their international sites)

- eBay (later adding international as well) – Special instructions for this below

- [url removed, login to view]

- Yahoo Shopping

Extraction data must be in the following format in the database:

- Category > Brand > Product > Model > Features

The data will be stored in the database with the following minimum fields:

Source – The Site it came from

Date Added – Date keyword was first added to db

Date Updated – Date keyword was last updated in the db (this means target site updated the term)

Date Expired – Date keyword was removed from db (this means the target site removed the term)

Category – The category the keyword goes under according to the source including sub categories if applicable. (eg. Digital Cameras might be found under Electronics > Cameras > Digital Cameras)

Brand – The brand the keyword goes under according to the source. (eg. Kodak)

Product – The main product name. (eg. Powershot)

Model – This relates to the product name and can come in variations. For example, a certain shirt might come in models small, medium, large. For a digital camera like the Powershot, it might be Powershot AS 750, with AS 750 being the model.

Features – This relates to the product and model. The main features included with the product are stored here.

Related Tags – If there are any tags related to the product, they are stored here.

Usage Tracking – This will be used by another program that utilizes the keywords. This part of the db will not apply to the extraction process, but is meant for later use.

Prefix – This also doesn’t relate to the extraction process, but will be used later when utilizing the keywords.

Suffix – This also doesn’t relate to the extraction process, but will be used later when utilizing the keywords.

Geo Target – This indicates if the target content is meant for specific country. For example, [url removed, login to view] source is US, while [url removed, login to view] is Canada.

Amazon ASIN – This field is only for Amazon source to record the ASIN of the product. This will make for easier use of Amazon AWS API interfacing the db data.
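The fields above could map onto a single table. The following is only a minimal sketch using Python's built-in sqlite3 module; the table name `keywords`, the column names, the `id` key, and the choice of SQLite are all assumptions made for illustration and are not specified by this brief.

```python
import sqlite3

# Hypothetical schema: one column per field listed in the brief.
SCHEMA = """
CREATE TABLE IF NOT EXISTS keywords (
    id             INTEGER PRIMARY KEY,
    source         TEXT NOT NULL,  -- site the keyword came from
    date_added     TEXT NOT NULL,  -- ISO date the keyword was first added
    date_updated   TEXT,           -- ISO date the target site last changed it
    date_expired   TEXT,           -- ISO date the target site removed it
    category       TEXT NOT NULL,  -- full path, e.g. 'Electronics > Cameras'
    brand          TEXT,
    product        TEXT,
    model          TEXT,
    features       TEXT,
    related_tags   TEXT,
    usage_tracking TEXT,           -- reserved for the consuming program
    prefix         TEXT,           -- reserved for later use
    suffix         TEXT,           -- reserved for later use
    geo_target     TEXT,           -- country the source targets
    amazon_asin    TEXT            -- only filled for the Amazon source
);
"""

def create_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Brand, model, and features are left nullable here so that sources which cannot supply them can still be stored.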

The keyword extractor must also have an option to clean up invalid characters (eg. /#,'-) from the extracted data.
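One way the cleanup option might look is sketched below. The function name `clean_keyword`, the `enabled` switch, and the decision to replace each invalid character with a space (rather than delete it) are assumptions; the character set is the one given as an example above.

```python
# Characters treated as invalid, taken from the example in the brief.
INVALID_CHARS = "/#,'-"

def clean_keyword(text, invalid=INVALID_CHARS, enabled=True):
    """Replace invalid characters with spaces and collapse repeated whitespace.

    When `enabled` is False the text is returned untouched, which keeps
    the cleanup optional as the brief requires.
    """
    if not enabled:
        return text
    table = str.maketrans({ch: " " for ch in invalid})
    return " ".join(text.translate(table).split())
```

For example, `clean_keyword("Digital/Cameras, #1")` yields `Digital Cameras 1`.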

The program must have proxy support to avoid possible blocking from target sites.
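Proxy support could be as simple as rotating through a list of proxies for successive requests. The sketch below uses only the Python standard library; the class name `ProxyPool` and the round-robin strategy are assumptions, and any proxy addresses passed in would be supplied by whoever runs the extractor.

```python
import itertools
import urllib.request

class ProxyPool:
    """Hand out URL openers that route through proxies in round-robin order."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)

    def next_opener(self):
        """Return (proxy, opener) for the next proxy in the rotation."""
        proxy = next(self._cycle)
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return proxy, urllib.request.build_opener(handler)
```

A caller would then fetch a page with something like `proxy, opener = pool.next_opener()` followed by `opener.open(url, timeout=30).read()`.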

We are dealing with a large amount of data. The Categories will be easy, the Brands as well, but once we get into Products, Models, and Features it will mean much more data. I would advise having the system progressively extract the complete data either by Category or by Brand.
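Progressive extraction might be structured as a loop over units of work (one Category or one Brand at a time) that remembers which units are finished, so an interrupted run can resume. This is only a sketch; `extract_progressively`, `extract_unit`, and `store` are hypothetical names, and the real extraction and storage routines would be plugged in by the developer.

```python
def extract_progressively(units, extract_unit, store, done=None):
    """Extract one unit (a category or a brand) at a time.

    units        -- iterable of category or brand names
    extract_unit -- callable returning the extracted rows for one unit
    store        -- callable(unit, rows) that saves the rows
    done         -- set of units finished in an earlier run; these are skipped
    """
    done = set() if done is None else done
    for unit in units:
        if unit in done:
            continue
        store(unit, extract_unit(unit))
        # Mark the unit finished only after its rows have been stored.
        done.add(unit)
    return done
```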

Special Instructions Regarding eBay – Using the following page:

[url removed, login to view]

The program only needs to extract the categories on these pages, and then also take the product titles which will be stored in the Products area. Nothing can be done in regards to brand, model, and features, so these areas will be left blank. However the other functions of the database will still apply.

After an initial product category or brand has been added to the database, the program will be scheduled to check for updates every two weeks. Updates include changed product data, tags, or deletion.
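The two-week update check could compare a fresh scrape against what is stored and set the date fields accordingly. The sketch below keeps records in plain dictionaries for brevity; the function names, the record layout, and the choice to clear the expiry date when a removed keyword reappears are assumptions, not requirements from this brief.

```python
from datetime import date, timedelta

UPDATE_INTERVAL = timedelta(weeks=2)

def is_due(last_checked, today):
    """True when at least two weeks have passed since the last check."""
    return today - last_checked >= UPDATE_INTERVAL

def reconcile(stored, scraped, today):
    """Apply a fresh scrape to the stored records.

    stored  -- dict: key -> {'data', 'date_added', 'date_updated', 'date_expired'}
    scraped -- dict: key -> data currently shown on the target site
    """
    for key, data in scraped.items():
        rec = stored.get(key)
        if rec is None:
            # New keyword: record when it was first seen.
            stored[key] = {"data": data, "date_added": today,
                           "date_updated": None, "date_expired": None}
        elif rec["data"] != data or rec["date_expired"] is not None:
            # Changed on the site, or came back after being removed (assumption).
            rec["data"] = data
            rec["date_updated"] = today
            rec["date_expired"] = None
    for key, rec in stored.items():
        if key not in scraped and rec["date_expired"] is None:
            # The target site no longer lists this keyword.
            rec["date_expired"] = today
    return stored
```

The check itself would be triggered by whatever scheduler the server provides, calling `is_due` for each category or brand already in the database.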

Some existing extractors/services that might help:

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view];query=Net%3A%3AAmazon

[url removed, login to view]

[url removed, login to view]


While the database will be used to work with other programs, it will also be useful to be able to extract the data in variations. For example, the ability to output:

Keyword list of all products under Electronics

Keyword list of all products under Kodak

Keyword list of particular feature set

Basically any combination you can think of that can be outputted / downloaded as a text/CSV file.
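A filtered export of this kind might be built as one function that accepts any combination of field filters and returns CSV text. The sketch assumes a SQLite table named `keywords` with `source`, `category`, `brand`, `product`, `model`, and `features` columns; those names, the function name `export_keywords`, and the substring (LIKE) matching are all assumptions for illustration.

```python
import csv
import io
import sqlite3

# Only these columns may be used as filters; this also keeps the
# column names interpolated into the SQL restricted to known values.
ALLOWED_FILTERS = {"source", "category", "brand", "product", "model", "features"}
COLUMNS = ["category", "brand", "product", "model", "features"]

def export_keywords(conn, **filters):
    """Return CSV text of keywords matching every given filter."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")
    sql = f"SELECT {', '.join(COLUMNS)} FROM keywords"
    params = []
    if filters:
        clauses = []
        for column, value in filters.items():
            clauses.append(f"{column} LIKE ?")
            params.append(f"%{value}%")
        sql += " WHERE " + " AND ".join(clauses)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    writer.writerows(conn.execute(sql, params))
    return out.getvalue()
```

Under these assumptions, `export_keywords(conn, category="Electronics")` would cover the first example above and `export_keywords(conn, brand="Kodak")` the second; filters can be combined freely.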

Remember that later we will want to add other sites for input to the db. It’s understood this will require another custom setup. That’s ok.

I have had another keyword db designed last year that was similar to this but the programmer is no longer available. That program only took a week to complete. I am hoping this can be completed within two weeks if not less. There are a LOT of extraction programs out there that can reduce the dev time on this. If you can use one for this then please do so.

This is a server-side program, which means you will be using C+, CGI, PHP, or Ruby. Because of the scheduled ongoing updates, a server environment is ideal. Also, the server is Linux only, so please do not propose anything that requires MicroSlow.

Payment terms should be based on escrow payment. If required, milestones with escrow payments released on completion will be acceptable also. Considering how short this project is though, I would prefer escrow and release on completion.

PM me if you have any questions regarding this project or require clarification.

Those that PM me with a boilerplate bragging about projects that have no relation to this project whatsoever will be considered spam. Though examples of work with databases and data extraction and custom interfaces will be acceptable.

Looking forward to working with the successful freelancer.

Skills: C Programming, Perl, PHP, Ruby on Rails, Web Scraping

About the employer:
( 11 reviews ) Toronto, Canada

Project ID: #499386

4 freelancers have bid on this job


Very much experience in Data Crawler & Scraping Scripts. For more details, kindly check PM.

$750 USD in 10 days
(248 reviews)

Check your PMB for details!!!

$750 USD in 12 days
(61 reviews)

I can do this work. Thanks, Suresh

$600 USD in 10 days
(533 reviews)

Dear Sir, Pls check your PM

$600 USD in 15 days
(0 reviews)