Recipe Crawler

I need a crawler that runs on Linux, is easy to install on multiple computers if needed, and crawls through a list of recipe sites that I provide. It should have the following features.

1. Download the page, including any images (recipe pictures etc.), and store them in a folder whose name is specified in the recipe database.

2. Process the downloaded page: put the ingredients in one database field, the description in another field, and other information in another field.

It should work like this: recipe ID 1, each ingredient linked to recipe ID 1, amount, quantity, etc. Have a look at phprecipebook; I want to mirror its structure for processing the data and storing it in a MySQL database, with a few extra fields for source name, source URL, image URL, and similar information.
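As a rough illustration of the structure described above, here is a minimal sketch. The table and column names are hypothetical (they are not taken from phprecipebook's actual schema), and it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a database server.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not phprecipebook's real tables.
SCHEMA = """
CREATE TABLE recipes (
    recipe_id    INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    description  TEXT,
    source_name  TEXT,
    source_url   TEXT,
    image_url    TEXT,
    image_folder TEXT
);
CREATE TABLE recipe_ingredients (
    id         INTEGER PRIMARY KEY,
    recipe_id  INTEGER NOT NULL REFERENCES recipes(recipe_id),
    ingredient TEXT NOT NULL,
    quantity   REAL,
    unit       TEXT
);
"""

RECIPE_FIELDS = ("name", "description", "source_name",
                 "source_url", "image_url", "image_folder")


def save_recipe(conn, recipe, ingredients):
    """Insert one recipe plus one row per ingredient; return the new recipe id."""
    cur = conn.execute(
        "INSERT INTO recipes (name, description, source_name, source_url, "
        "image_url, image_folder) VALUES (?, ?, ?, ?, ?, ?)",
        tuple(recipe.get(field) for field in RECIPE_FIELDS),
    )
    recipe_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO recipe_ingredients (recipe_id, ingredient, quantity, unit) "
        "VALUES (?, ?, ?, ?)",
        [(recipe_id, name, qty, unit) for name, qty, unit in ingredients],
    )
    conn.commit()
    return recipe_id


# In-memory database for demonstration only (stand-in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

Each ingredient gets its own row linked back to the recipe by `recipe_id`, which is the "recipe ID 1, ingredient linked to recipe ID 1" layout.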

3. It should be able to store quantities as well, including when the quantity sits in a textbox, as it does on some sites.

4. It should only record recipes. I want to build a database of millions of recipes, so this would essentially be a giant Google-style crawler (but only for recipes).

5. It should be speed limited, but also work in round-robin fashion. Instead of overloading one site by crawling it quickly, I should be able to keep a list of base domains and, under each domain, its URLs. The crawler should start on one URL within the first domain, then leave it and go to the next domain, then the third domain, and so on, so that it gets lots of information very quickly but from different domains.
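The round-robin behaviour in item 5 could be sketched roughly as follows. This is only an illustration under assumptions: the class name, the `min_delay` parameter, and the example domains used with it are all hypothetical, and the caller is expected to sleep for the returned wait time before fetching.

```python
import time
from collections import deque


class RoundRobinScheduler:
    """Hand out URLs one domain at a time, with a minimum delay per domain."""

    def __init__(self, urls_by_domain, min_delay=5.0, clock=time.monotonic):
        # One FIFO queue per domain; domains rotate in the order given.
        self.queues = deque(
            (domain, deque(urls))
            for domain, urls in urls_by_domain.items() if urls
        )
        self.min_delay = min_delay  # assumed per-domain speed limit, in seconds
        self.clock = clock
        self.last_hit = {}  # domain -> time its last URL is scheduled for

    def next_url(self):
        """Return (url, wait_seconds), or None when every queue is empty."""
        if not self.queues:
            return None
        domain, urls = self.queues.popleft()
        url = urls.popleft()
        if urls:
            # Domain still has work: send it to the back of the rotation.
            self.queues.append((domain, urls))
        now = self.clock()
        earliest = self.last_hit.get(domain, now - self.min_delay) + self.min_delay
        wait = max(0.0, earliest - now)
        self.last_hit[domain] = now + wait
        return url, wait
```

With three domains queued, the scheduler returns one URL from the first, then one from the second, then one from the third, and only then comes back to the first, so no single site is hit twice in a row while others still have pages waiting.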

It should be semi-template-based, so that it is easy to add new recipe sites and to modify what information is recorded if the layout of a site changes.
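To show what "template based" might look like, here is a small sketch using only Python's standard library. The domain, tag names, and CSS class names in `TEMPLATES` are invented for illustration; a real site would need its own entry, and a changed layout would mean editing only that entry.

```python
from html.parser import HTMLParser

# Hypothetical per-site templates: field name -> (tag, CSS class) to capture.
TEMPLATES = {
    "example-recipes.test": {
        "title": ("h1", "recipe-title"),
        "ingredients": ("li", "ingredient"),
        "description": ("div", "method"),
    },
}


class TemplateExtractor(HTMLParser):
    """Collect the text of every element that matches a template rule."""

    def __init__(self, template):
        super().__init__()
        self.template = template
        self.results = {field: [] for field in template}
        self.active = None  # (field, tag, nesting depth, text parts)

    def handle_starttag(self, tag, attrs):
        if self.active:
            field, open_tag, depth, parts = self.active
            if tag == open_tag:  # same tag nested inside the match
                self.active = (field, open_tag, depth + 1, parts)
            return
        classes = (dict(attrs).get("class") or "").split()
        for field, (want_tag, want_class) in self.template.items():
            if tag == want_tag and want_class in classes:
                self.active = (field, tag, 1, [])
                return

    def handle_endtag(self, tag):
        if not self.active or tag != self.active[1]:
            return
        field, open_tag, depth, parts = self.active
        if depth > 1:
            self.active = (field, open_tag, depth - 1, parts)
        else:
            # Matched element closed: store its text with whitespace collapsed.
            self.results[field].append(" ".join("".join(parts).split()))
            self.active = None

    def handle_data(self, data):
        if self.active:
            self.active[3].append(data)


def extract(html, domain):
    """Run the template registered for `domain` over one downloaded page."""
    parser = TemplateExtractor(TEMPLATES[domain])
    parser.feed(html)
    parser.close()
    return parser.results
```

Adding a new recipe site would then mean adding one more dictionary entry rather than writing new crawler code.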

6. It should be able to crawl recipe sites directly, or work through numerous proxy sites if my IP gets blocked. If it crawls through a proxy site, it should still record the source URL of the page being downloaded, without the proxy URL. So, say it goes through [url removed, login to view], it should record the source as [url removed, login to view]
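One possible way to keep the recorded source free of the proxy URL is sketched below. Everything specific here is an assumption: the proxy addresses are placeholders, and the sketch assumes a web proxy that receives the target page in a `url` query parameter, which may not match how a given proxy site actually works.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder proxy endpoints; each is ASSUMED to take the target in "url".
PROXIES = [
    "http://proxy-one.example/browse",
    "http://proxy-two.example/fetch",
]


def wrap(source_url, proxy_base):
    """Build the URL to request when going through a proxy."""
    return proxy_base + "?" + urlencode({"url": source_url})


def source_of(fetched_url):
    """Return the original recipe URL, whether or not a proxy was used."""
    parts = urlsplit(fetched_url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    if base in PROXIES:
        target = parse_qs(parts.query).get("url")
        if target:
            return target[0]
    return fetched_url
```

A simpler alternative would be to store the source URL alongside each crawl job before wrapping it, so nothing has to be parsed back out of the proxy URL afterwards.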

That's what I mean. I will provide a big list of recipe sites I want the system to crawl, and I want it to extract all information, including ingredients (one by one in the database), description, images, categories, related recipes, and any other descriptors for a recipe such as starter, dessert, or gluten free.
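For storing ingredients one by one with their quantities, a line such as "1 1/2 cups flour" has to be split into parts. The sketch below is a simplified illustration: the unit list is deliberately short and only an example, and real pages will contain formats it does not handle.

```python
import re
from fractions import Fraction

# Illustrative unit list only; a real crawler would need a much longer one.
UNITS = {"cup", "cups", "tbsp", "tsp", "g", "kg", "ml", "l", "oz", "lb"}

# Optional leading quantity: "1/2", "2", "2.5" or a mixed number like "1 1/2".
LINE = re.compile(
    r"^\s*(?P<qty>\d+/\d+|\d+(?:\.\d+)?(?:\s+\d+/\d+)?)?\s*(?P<rest>.*)$"
)


def parse_ingredient(line):
    """Split one ingredient line into (quantity, unit, ingredient name)."""
    match = LINE.match(line)
    qty_text = match.group("qty")
    rest = match.group("rest").strip()

    quantity = None
    if qty_text:
        # "1 1/2" -> Fraction(1) + Fraction(1, 2) -> 1.5
        quantity = float(sum(Fraction(part) for part in qty_text.split()))

    unit = None
    words = rest.split(None, 1)
    if words and words[0].lower().rstrip(".") in UNITS:
        unit = words[0].lower().rstrip(".")
        rest = words[1] if len(words) > 1 else ""
    return quantity, unit, rest
```

Lines without a recognisable quantity or unit come back with `None` in those positions, so the whole text can still be stored as the ingredient name.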

All information other than images should be stored in the MySQL database; images should be stored in a folder and referenced within the database. You can use open source crawlers or tools, but the system needs to be easy to run, easy to add new recipe sites to, and able to run on Linux. (Maybe even PHP is an idea? Up to you.)
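As a sketch of storing images in a folder and referencing them from the database, the functions below build a local path inside a recipe's folder and write the image bytes there. The function names and the hash-prefix naming scheme are assumptions made for illustration; the download itself is left out, so `data` is whatever bytes the crawler has already fetched.

```python
import hashlib
import os
from urllib.parse import urlsplit


def image_path(image_root, folder_name, image_url):
    """Build a local path for an image inside the recipe's folder."""
    name = os.path.basename(urlsplit(image_url).path) or "image"
    # Prefix a short hash of the full URL so that two different
    # "photo.jpg" files in the same folder cannot overwrite each other.
    digest = hashlib.sha1(image_url.encode("utf-8")).hexdigest()[:8]
    return os.path.join(image_root, folder_name, f"{digest}_{name}")


def store_image(image_root, folder_name, image_url, data):
    """Write already-downloaded image bytes to disk; return the path to record."""
    path = image_path(image_root, folder_name, image_url)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as handle:
        handle.write(data)
    return path
```

The returned path is what would go into the database next to the original image URL.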

Skills: Linux, SQL, Web Scraping, Windows Desktop


About the Employer:
( 33 reviews ) Beaconsfield Upper, Australia

Project ID: #485843

Awarded to:


Please view my pm.

$425 USD in 5 days
(0 reviews)

9 freelancers have bid an average of $508 on this job


We can help with your project; we have extensive experience in related scraping projects.

$750 USD in 7 days
(41 reviews)

I can do it in bash using wget, sgrep, etc.

$500 USD in 7 days
(39 reviews)

Kindly check PM for more details

$400 USD in 15 days
(36 reviews)

I have completely understood your requirements. I will give you a Java-based program so that it can run on both platforms. Regards, Umer

$750 USD in 20 days
(14 reviews)

Hello, please refer to your PMB. Thank you.

$300 USD in 7 days
(15 reviews)

Hi, please check PM.

$500 USD in 7 days
(2 reviews)


$500 USD in 10 days
(0 reviews)

Please see PMB.

$450 USD in 5 days
(0 reviews)