Data Cleaning of Scraped Structured Data

We're building an automated structured data product that will scrape product/service features (in text form) from any website or webpage.

Because we're scraping the text written on the webpages, our results will sometimes contain "faulty" inputs. In our case, these are inputs unrelated to the product/service text, such as special characters with HTML tags, footer text, cookie prompt text, and many other irrelevant inputs. We need data cleaning methods built to address this challenge.
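As a rough illustration of one kind of cleanup this could involve, here is a minimal Python sketch that strips HTML tags and entities from a scraped string before any filtering. The function name `strip_markup` and the regex-based approach are assumptions for illustration only; a real pipeline might well use a proper HTML parser instead.

```python
import html
import re

# Matches anything that looks like an HTML tag, e.g. <li> or </div>.
TAG_RE = re.compile(r"<[^>]+>")
# Matches runs of whitespace (including non-breaking spaces).
WS_RE = re.compile(r"\s+")


def strip_markup(raw: str) -> str:
    """Remove HTML tags, decode entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", raw)        # replace tags with a space
    text = html.unescape(text)         # &amp; -> &, &nbsp; -> non-breaking space
    return WS_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(strip_markup("<li>Free&nbsp;shipping &amp; returns</li>"))
```

Running this step first means later checks see plain text rather than markup.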

For this project, you will work with one large dataset; we will later apply the data cleaning methods developed on this dataset to our automated scraping.

The tasks for this project include:

Reviewing our dataset manually for faulty inputs according to our already established instructions

Building a list of blacklisted keywords (where we reject the faulty input)

Building functions/methods to identify and discard faulty inputs that don't involve blacklisted keywords
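A keyword blacklist of the kind described in the tasks above could be sketched in Python roughly as follows. The keywords in `BLACKLIST` are hypothetical placeholders (the real list would come out of the manual review), and the helper names are assumptions, not part of any existing codebase.

```python
import re

# Hypothetical starter list; the real keywords would come from the manual review.
BLACKLIST = ["cookies", "privacy policy", "all rights reserved", "subscribe"]


def build_blacklist_pattern(keywords):
    """Compile one case-insensitive regex that matches any keyword as a whole word or phrase."""
    # Longest first, so longer phrases win over shorter overlapping ones.
    escaped = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


PATTERN = build_blacklist_pattern(BLACKLIST)


def is_blacklisted(text):
    """Return True if the text contains at least one blacklisted keyword."""
    return PATTERN.search(text) is not None


def filter_inputs(rows):
    """Keep only the rows that contain no blacklisted keyword."""
    return [row for row in rows if not is_blacklisted(row)]


if __name__ == "__main__":
    rows = [
        "Waterproof up to 50 m",
        "We use cookies to improve your experience",
    ]
    print(filter_inputs(rows))
```

Compiling all keywords into a single pattern keeps the check to one pass per input, which matters on a large dataset.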

I want the following:

List of blacklisted keywords (where we reject the faulty input)

Building functions/methods to identify and discard faulty inputs that don't involve blacklisted keywords
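For faulty inputs that contain no blacklisted keyword, one possible approach is a set of simple heuristics. The Python sketch below flags text that is too short or too long, still contains markup, or is mostly non-letter characters. The function `looks_faulty` and all thresholds are assumptions chosen for illustration; they would need tuning against the reviewed dataset.

```python
import re

# Leftover HTML tags or entities such as <div>, &nbsp;, &#160;
MARKUP_RE = re.compile(r"</?[a-zA-Z][^>]*>|&[a-zA-Z]+;|&#\d+;")


def alpha_ratio(text):
    """Share of characters that are letters or whitespace (0.0 for empty text)."""
    if not text:
        return 0.0
    return sum(ch.isalpha() or ch.isspace() for ch in text) / len(text)


def looks_faulty(text, min_words=2, max_words=40, min_alpha_ratio=0.7):
    """Heuristic check; all thresholds are illustrative assumptions."""
    text = text.strip()
    words = text.split()
    if not (min_words <= len(words) <= max_words):
        return True   # too short or too long to be a feature description
    if MARKUP_RE.search(text):
        return True   # markup residue survived scraping
    return alpha_ratio(text) < min_alpha_ratio  # mostly symbols or digits


if __name__ == "__main__":
    for sample in ["24/7 customer support included", "### | *** | ==="]:
        print(sample, "->", looks_faulty(sample))
```

Each rule is independent, so rules can be added or dropped as the manual review reveals new patterns.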

Skills: Data Science, Data Analysis, JavaScript


About the employer:
(0 reviews) Bucharest, Romania

Project ID: #30232963

13 freelancers have bid an average of $148 on this job

(53 reviews)

Hi, I am a very experienced statistician, data scientist and academic writer. I have completed several PhD-level thesis projects involving advanced statistical analysis of data. I have worked with data from several comp...

$250 USD in 7 days
(19 reviews)

Hello, I am ready to start the work. I assure you that I will provide the best quality of work within the required timeframe and budget. Let's get started! Regards, Shalu

$159 USD in 4 days
(6 reviews)

You are very lucky!!! I have rich experience in scraping, so if you want to see my previous projects, I can share them. I can finish your project in a short time without fail. **If you hire me, I will be your genie wh...

$200 USD in 7 days
(3 reviews)

Hi, I have extensive experience in web scraping and data cleaning, and I have a master's degree in data science. You can see from my reviews that I have done well on scraping projects. Your project is a challenge fo...

$30 USD in 1 day
(2 reviews)

Hi, your project is interesting. Contact me, thank you!

$140 USD in 7 days
(1 review)

Hi, I have 5+ years of experience in Python development, and I have experience building management, distributed, and database applications, with machine learning, ensemble learning, and deep learning implementations. Expertise in Cla...

$140 USD in 3 days
(1 review)

Hi, I am a Python expert with numerous Python certifications. I am an experienced data scientist and systems developer with qualifications from IBM and Stanford University. Further, my education background as a qualifi...

$110 USD in 7 days
(0 reviews)

This is Mohamed, a Python programmer and data analyst specializing in web scraping. I have 1+ years of experience in programming with Python, web scraping, data analysis, and data wrangling. I will use python scri...

$125 USD in 7 days
(0 reviews)

Junior data scientist looking forward to practicing my skills in data cleaning, data wrangling, and tidying.

$30 USD in 3 days
(0 reviews)

I have been a data scientist for the last 5 years, working at different organizations. I have worked on text data, image data, CSV files, and databases, so all kinds of data sources.

$140 USD in 7 days
(0 reviews)

I am a data analyst and an NLP researcher. I have worked in many areas related to textual data and feature extraction. Hence, I will handle your task effectively and efficiently.

$278 USD in 7 days
(0 reviews)

Hello, my name is Hazem Mohammed. I am a professional data analyst and a Python developer. I can see that you have two specific tasks that need to be solved. I encourage you to check my portfolio because I have done tw...

$100 USD in 4 days
(0 reviews)