This project consists of developing a program/script to scrape this webpage: [url removed, login to view]
The goal is to extract the information of every company profile, from all pages of the website, into a CSV dataset.
The deliverables are the program and the CSV with all profile information.
The fields to extract are shown in the attached image (the field names are also listed below).
The attached CSV shows an example of the information extracted from that profile.
Profiles must be searched by selecting values in the Sector, Provincia (province) and Municipio (municipality) listboxes.
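As a rough sketch of how the search could be driven, the snippet below enumerates every Sector / Provincia / Municipio combination to visit. The sector and province values are the ones listed in this brief; `municipalities_by_province` is a hypothetical mapping that would have to be filled from the Municipio listbox of the site itself, and how each combination is actually submitted (HTTP request or browser automation) is left open.

```python
from itertools import product

# Values taken from the field list in this brief.
SECTORS = ["AGROALIMENTARIO", "CONSUMO", "INDUSTRIA", "SERVICIO"]
PROVINCES = ["ALMERIA", "CADIZ", "CORDOBA", "GRANADA",
             "HUELVA", "JAEN", "MALAGA", "SEVILLA"]


def search_combinations(municipalities_by_province):
    """Yield every (sector, provincia, municipio) triple to search.

    `municipalities_by_province` is a hypothetical dict such as
    {"ALMERIA": ["...", "..."]}, read from the site's Municipio listbox.
    Provinces missing from the dict are skipped.
    """
    for sector, province in product(SECTORS, PROVINCES):
        for municipality in municipalities_by_province.get(province, []):
            yield sector, province, municipality
```

Comparing the number of profiles collected for each province against the per-province counts given in this brief would be one way to check that no combination was missed.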
The number of company profiles per province is:
province Nº of profiles
The CSV must have:
- Separator: pipe -> '|'
- Encoding: Latin-1 (ISO 8859-1)
- Enclosure (for string fields): "field_string"
- Trim all fields
- Fields (all fields of a company profile):
f1 : sector (only 4 possible values: "AGROALIMENTARIO", "CONSUMO", "INDUSTRIA", "SERVICIO")
f2 : provincia (only 8 possible values: "ALMERIA", "CADIZ", "CORDOBA", "GRANADA", "HUELVA", "JAEN", "MALAGA", "SEVILLA")
f3 : municipio (the set of municipality values depends on the province)
f4 : razon_social
f5 : nombre_empresa
f6 : direccion
f7 : telefono
f7b : telefono2 (each additional phone number must be added as a new field with the suffix "2", "3", etc.)
f8 : fax
f8b : fax2 (each additional fax number must be added as a new field with the suffix "2", "3", etc.)
f9 : actividad (multiple values must be separated by ", ")
f10 : productos (multiple values must be separated by ", ")
f11 : codigo_postal
f12 : correo_electronico
f12b: correo_electronico2 (each additional email address must be added as a new field with the suffix "2", "3", etc.)
f13 : web
f14 : marcas (multiple values must be separated by ", ")
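The CSV format above could be produced with Python's standard `csv` module, roughly as sketched below. Several points are assumptions rather than requirements stated in this brief: that a header row is wanted, that quoting every field is acceptable (all values are written as strings), and that characters outside Latin-1 may be replaced with "?". `BASE_FIELDS` and `write_profiles` are illustrative names.

```python
import csv

# Column order follows the field list in this brief (f1 .. f14).
BASE_FIELDS = [
    "sector", "provincia", "municipio", "razon_social", "nombre_empresa",
    "direccion", "telefono", "telefono2", "fax", "fax2", "actividad",
    "productos", "codigo_postal", "correo_electronico",
    "correo_electronico2", "web", "marcas",
]


def write_profiles(path, profiles, fieldnames=BASE_FIELDS):
    """Write profile dicts as a pipe-separated, quoted, Latin-1 CSV."""
    # errors="replace" is an assumption: unencodable characters become "?".
    with open(path, "w", newline="", encoding="latin-1",
              errors="replace") as fh:
        writer = csv.DictWriter(
            fh,
            fieldnames=fieldnames,
            delimiter="|",
            quotechar='"',
            quoting=csv.QUOTE_ALL,  # assumption: quote every field
            restval="",             # missing fields are written as ""
        )
        writer.writeheader()        # assumption: a header row is wanted
        for profile in profiles:
            # Trim every value; None is written as an empty string.
            writer.writerow({
                key: "" if value is None else str(value).strip()
                for key, value in profile.items()
            })
```

A profile containing a key that is not in `fieldnames` makes `DictWriter` raise a `ValueError`, so extra fields (for example `telefono3`) have to be appended to `fieldnames` before writing.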
Additional rules for fields:
- If a profile has fields beyond those listed above, the new fields must be added to the CSV.
- If a field has a null value such as "-", it must be left empty: ""
- The "telefono", "telefono2", "fax" and "fax2" fields must be numeric values without any spaces " "
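The field rules above could be applied with a few small helpers, sketched here under stated assumptions. The function names are illustrative. Treating only "-" and the empty string as null, and stripping every non-digit character from phone and fax numbers (which also drops a leading "+" or dots), are interpretations of the rules rather than something this brief spells out.

```python
import re

# Assumption: only "-" and the empty string count as null markers.
NULL_MARKERS = {"-", ""}


def clean_value(value):
    """Trim a value and turn null markers such as "-" into ""."""
    text = (value or "").strip()
    return "" if text in NULL_MARKERS else text


def clean_phone(value):
    """Keep digits only (assumption: separators and "+" are dropped too)."""
    return re.sub(r"\D", "", clean_value(value))


def join_multi(values):
    """Join multiple values (actividad, productos, marcas) with ", "."""
    cleaned = (clean_value(v) for v in values)
    return ", ".join(v for v in cleaned if v)


def spread(base, values):
    """Map repeated values to base, base2, base3, ... skipping empty ones."""
    kept = [v for v in values if v]
    return {
        base if index == 0 else f"{base}{index + 1}": value
        for index, value in enumerate(kept)
    }
```

For example, `spread("telefono", [clean_phone(p) for p in phones])` would give the `telefono`, `telefono2`, `telefono3` columns for a profile that lists several phone numbers.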
Finally, for the payment of the project, you must issue an invoice with my company information, which I will give you.
PS: The correct sample CSV is attached with the name "company_profile_sample_new"
18 freelancers are bidding an average of €173 for this job
Expert in web scraping with 2 years of experience in this [url removed, login to view] done similar kinds of work like this. I have completed 160 projects [url removed, login to view] it's an easy task for me to do.
Hello, greetings. Yes, I can scrape all data from the given website. **** I will start ASAP **** I am open to doing a free sample **** I have completed similar projects in the past