Data Preparation Preprocessing (Data Mining)

This project is about preprocessing and preparing data for future analysis.

Details are below.

In this project, some steps will be done using Weka, and the others will be done using Java.

The project will use a modified (red) wine dataset from UCI (search for it online to see the data), although the program should work on any dataset that has numerical attributes plus a class attribute (the last attribute in the file). The modification maps tuples whose quality is <= 5 to class 0 and all other tuples to class 1. All non-class attributes are numerical. The input file is a CSV file. The attributes are called A1, A2, …, A11, in the left-to-right order given in the input file.
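The class mapping described above could be sketched in Java roughly as follows. The class name `QualityBinarizer` and its methods are hypothetical, and the sketch assumes a plain comma-separated row with no quoted fields and the quality value in the last column.

```java
final class QualityBinarizer {
    // Threshold taken from the task description:
    // quality <= 5 -> class 0, anything else -> class 1.
    static int toClass(double quality) {
        return quality <= 5 ? 0 : 1;
    }

    // Replaces the last field of a CSV row with its binary class label.
    // Assumption: fields are separated by commas and contain no quotes.
    static String relabelRow(String csvRow) {
        int lastComma = csvRow.lastIndexOf(',');
        double quality = Double.parseDouble(csvRow.substring(lastComma + 1).trim());
        return csvRow.substring(0, lastComma + 1) + toClass(quality);
    }
}
```

For example, `QualityBinarizer.relabelRow("7.4,0.7,5")` yields `7.4,0.7,0`.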

There are two tasks:

Task 1. Discretize all attributes as follows: use the entropy-based method to split each attribute into 2 intervals; if Weka does not produce a split for a given attribute, use the equal-density method to split that attribute into 4 intervals. The resulting bins will be stored in a CSV file called Bins.csv. For each attribute Ai with 2 bins, this file contains a row of the form

Ai, splitValue, countC0Bin1, countC1Bin1, countC0Bin2, countC1Bin2

For each attribute with 4 bins, this file contains a row with 12 fields (the attribute name, 3 split values, and 8 counts).
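For the Java-only route, an entropy-based binary split could be sketched as below. This is not Weka's implementation: it simply picks the cut with the highest information gain and applies no stopping criterion. The names `EntropySplit`, `bestSplit`, and `binsRow` are hypothetical, and the row is written without spaces after the commas, which is an assumption about the expected Bins.csv layout.

```java
import java.util.Arrays;
import java.util.Locale;

final class EntropySplit {
    // Shannon entropy (base 2) of a two-class count pair.
    static double entropy(int c0, int c1) {
        int n = c0 + c1;
        if (n == 0) return 0.0;
        double h = 0.0;
        for (int c : new int[] {c0, c1}) {
            if (c > 0) {
                double p = (double) c / n;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // Returns {splitValue, informationGain, c0Bin1, c1Bin1, c0Bin2, c1Bin2}
    // for the best binary cut, or null if all values are identical.
    static double[] bestSplit(double[] values, int[] labels) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));

        int total0 = 0, total1 = 0;
        for (int y : labels) { if (y == 0) total0++; else total1++; }
        double baseEntropy = entropy(total0, total1);

        double[] best = null;
        int left0 = 0, left1 = 0;
        for (int i = 0; i < n - 1; i++) {
            if (labels[order[i]] == 0) left0++; else left1++;
            double cur = values[order[i]], next = values[order[i + 1]];
            if (cur == next) continue; // cuts are only placed between distinct values
            int right0 = total0 - left0, right1 = total1 - left1;
            double nl = left0 + left1, nr = right0 + right1;
            double gain = baseEntropy
                    - (nl / n) * entropy(left0, left1)
                    - (nr / n) * entropy(right0, right1);
            if (best == null || gain > best[1]) {
                best = new double[] {(cur + next) / 2, gain, left0, left1, right0, right1};
            }
        }
        return best;
    }

    // Formats one Bins.csv row for a 2-bin attribute (assumed layout, no spaces).
    static String binsRow(String attribute, double[] split) {
        return String.format(Locale.ROOT, "%s,%s,%d,%d,%d,%d", attribute, split[0],
                (int) split[2], (int) split[3], (int) split[4], (int) split[5]);
    }
}
```

With values {1, 2, 3, 4} and labels {0, 0, 1, 1}, the best cut falls at 2.5 with an information gain of 1, and `binsRow("A1", ...)` gives `A1,2.5,2,0,0,2`.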

Task 2. Map the dataset into itemized form using the bins produced in Task 1. This step will produce two files, namely [login to view URL] and ItemizedData.csv. The contents of these two files should follow the format given in the attached [login to view URL] file.
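The core of the itemization step is looking up which bin each value falls into. A minimal sketch follows; the required output format is defined in the attached file and is not reproduced here, so the item label format `A<i>_b<bin>`, the class name `Itemizer`, and the convention that a value equal to a split point belongs to the lower bin are all assumptions made for illustration.

```java
final class Itemizer {
    // Returns the 1-based bin index of a value given ascending split points.
    // Assumption: a value equal to a split point falls into the lower bin.
    static int binIndex(double value, double[] splits) {
        int bin = 1;
        for (double s : splits) {
            if (value > s) bin++;
        }
        return bin;
    }

    // Maps one numeric row to a comma-separated list of items plus the class label.
    // The "A<i>_b<bin>" label format is hypothetical.
    static String itemize(double[] row, double[][] splitsPerAttribute, int classLabel) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            sb.append('A').append(i + 1)
              .append("_b").append(binIndex(row[i], splitsPerAttribute[i]))
              .append(',');
        }
        return sb.append(classLabel).toString();
    }
}
```

Here `splitsPerAttribute[i]` holds one split value for a 2-bin attribute and three for a 4-bin attribute, so the same lookup covers both cases.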

There are two possible ways to do the project: 1) Using Weka and Java, providing a Java program producing all results specified above. 2) Using Java only, providing a Java program producing all results specified above.

For option 1, you should submit your Java program and a jar file called P1.jar. The Java program should work assuming that Weka is installed on the machine. [login to view URL] when run should take the name of the dataset file as a parameter. You should assume that the input dataset is in the same folder where the jar file is run.

For option 2, you should submit your Java program, and a jar file called P1.jar. [login to view URL] when run should take the name of the dataset file and a minIG value as parameters. The program will use information gain (IG) to split an attribute if the IG is >= minIG, and use equal-frequency
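Assuming the fallback is the 4-interval equal-frequency split described in Task 1, the decision and the cut points could be sketched as follows. The names `EqualFrequency`, `cutPoints`, and `useEntropySplit` are hypothetical, and placing each cut at the midpoint between neighbouring sorted values is one of several reasonable conventions.

```java
import java.util.Arrays;

final class EqualFrequency {
    // Returns bins-1 cut points so that each bin holds roughly the same
    // number of values. Requires values.length >= bins. With many tied
    // values, neighbouring cut points may coincide.
    static double[] cutPoints(double[] values, int bins) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double[] cuts = new double[bins - 1];
        for (int k = 1; k < bins; k++) {
            int idx = k * n / bins; // first index of the next bin
            cuts[k - 1] = (sorted[idx - 1] + sorted[idx]) / 2; // midpoint convention
        }
        return cuts;
    }

    // The entropy-based split is kept only if its information gain reaches minIG.
    static boolean useEntropySplit(double informationGain, double minIG) {
        return informationGain >= minIG;
    }
}
```

For the eight values 1 through 8 and 4 bins, the cut points come out as 2.5, 4.5, and 6.5, leaving two values in each bin.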

Only the requested results are required.

Skills: Data Mining, Data Processing, Data Science, Java, Machine Learning (ML)


About the employer:
(0 reviews) Dayton, United States

Project ID: #23897114

10 freelancers have bid an average of $238 for this job


Hello, I can help you with your project - Data Preparation Preprocessing(Data Mining). I have gone through your job posting and become very much interested to work with you. I am an expert in this field. I have alrea…

$220 USD in 4 days
(237 reviews)

Hi, Reading your project description about pre-processing and preparing data for future analysis makes me feel like you had me in mind when creating the project. I have all it takes to deliver. Rather than fill this…

$200 USD in 3 days
(33 reviews)

Hello sir. As a machine/deep learning expert, I'm glad to see your project. If you check my profile, you can see I have deep knowledge in machine/deep learning algorithms with machine/deep learning tools. I also have f…

$300 USD in 10 days
(3 reviews)

Hello! I'd like to deliver preprocessed and quantized wine item properties. I'm familiar with theory of probabilities as well as computer science. I'll do the job blazingly fast. Please, give me a try!

$224 USD in 1 day
(5 reviews)

A Data Scientist with experience in SPSS, CALCULUS, Advanced Excel, R programming, R Shiny, R studio and anything related to data science and Python. Master in Engineering, Electrical and Electronic Engineer, who is dyn…

$277 USD in 2 days
(1 review)

Greetings! Nice to hear from you. I am a machine learning expert and can do this project for you. Waiting for your positive response. Best regards, thanks.

$111 USD in 15 days
(2 reviews)

Hello, I do data science using Java & Weka. I have done lots of data science and machine learning projects using Java & Weka. I can do your project; it's easy for me. Let's talk on chat for more details.

$300 USD in 5 days
(1 review)

-------- Pro Java / Data Processing / Machine Learning (ML) / Data Mining / Data Science Expert! -------- Hi, dear. Your project caught my attention because I'm very interested in it and also have rich experience and…

$50 USD in 7 days
(1 review)

Hello, how are you? Let me help you on this project. I have skimmed your proposal and think I am skillful enough for this project. As far as I understood your requirements, you want to do "data mining project with java…

$500 USD in 11 days
(0 reviews)

Sounds like you have a very well-defined project. Should be fairly easy to do. I spent several months doing deep learning research at multiple universities in the UK, including using Java and Weka for preprocessing data…

$200 USD in 3 days
(0 reviews)