Predictive Analysis and Classification Project

₹6000-7000 INR

Closed

Posted 2 months ago

Paid on delivery

Overview of the Task:

The original test sheets contain many data sets, each with 49 numbers. Each data set is a column. In each data set/column, 7 of the 49 numbers are selected as process numbers (also referred to as pattern numbers); these are shown in bold red. The last (rightmost) column is the target data set for prediction, and all other columns are data sets used for training the model. The project's ultimate objective is to predict the 7 process numbers of that last column/data set using machine learning models.

We are using 5 different types of ML models to predict these 7 pattern numbers from the target data set, i.e., the last column of each test sheet. During this process of prediction, we have come across certain observations. We need to address these observations and improve prediction accuracy using methods or approaches developed by expert data scientists. This task, named "Data analysis and classification", serves that objective.

We have predicted the 7 process numbers of approximately 50 data sets using these 5 ML models at various test sizes. These prediction results are presented in the Excel workbook named "Comparison of prediction results of 50 data sets". How to read and understand this workbook is explained below:

1) The workbook has 50 sheets. The leftmost sheet is named 388 and the rightmost is named 438. Of these 50 sheets, data has been filled in up to 431, totalling 44 data sets. Data for the remaining sheets will be filled in as it becomes available.
2) The sheet names are the numbers of the data sets, from 388 to 438. Each of these numbers is also the name of the target data set, i.e., the rightmost column of each test sheet.
3) Each data set can have 6 to 7 test sheets, named 388-1, 388-1A, 388-1B, 388-2, ... up to 388-5. Each test sheet has a varying number of training data sets and one target data set. The number of data sets in each test sheet is stated in the test sheet name.
4) A test sheet name starts with the number of the target variable (or target column) whose 7 numbers we have to predict.
5) Each of the 50 sheets of the workbook contains lists of 9 numbers predicted by the different ML models. The models used were RF (Random Forest classifier), SVML (SVM classifier with linear kernel), SVMR (SVM classifier with RBF kernel), SVMP (SVM classifier with polynomial kernel) and NB (Naive Bayes classifier).
6) The actual 7 values (pattern numbers) are given in the coloured cells at the top left of each sheet. Wherever these numbers occur in the prediction results, they are coloured with the respective colours.
7) You may also notice names such as 388-1, 388-2, 388-3, 388-4, etc. These are different variations of the test sheets for data set 388; in each of these 5 to 7 test sheets, 388 is the target column. We make predictions using each of these test sheets of various sizes.
8) Finally, we noticed better results when changing the test size in the train-test split. So we have also tested each model at different test sizes: 0.2, 0.3, 0.4, 0.5 and 0.6. These test size values are given in brackets against each test sheet name.
9) At the top left of each sheet you will also see 'Result type'. This describes a special data manipulation criterion: 'No column removal' means no columns are removed from the test sheet; 'Two column removal' means the first two columns are removed; 'Four column removal' means the first four columns (the first four training data sets) are removed, and so on. Column removal increased prediction accuracy slightly, so please pay attention to this variable.

The Task:

A.
First, look through the various predictions in each sheet (there are 150 predictions per sheet) and count and tabulate the facts available there, such as:
a) How many of the pattern numbers occur in each type of prediction?
b) Which type of prediction has the highest number of correct pattern numbers?
c) Which type of prediction gives a consistent result, i.e., a similar number of correct numbers repeatedly?
d) Variations in data set: Explore the variations of the same data set (e.g., 388-1, 388-2) and note any significant differences in prediction accuracy.
e) Effect of test sizes: Investigate the impact of the different test sizes (0.2, 0.3, 0.4, 0.5, 0.6) on the prediction accuracy of each model.
f) Influence of 'Result Type': Assess how the different result types affect accuracy, especially whether column removal enhances or hinders the predictions.
And so on. All such observations will help us determine which type of model, at which test size, performs best.

B.
Analyse each test sheet in detail using standard data science metrics to determine which characteristics of a test sheet, or of its target data set, give the best prediction result.
a) Prediction accuracy: Calculate the overall accuracy of the predictions for each test sheet, i.e., the ratio of correct predictions to the total number of predictions.
b) Precision, recall and F1 score: Break down the performance using these metrics. Precision measures the accuracy of positive predictions, recall measures the ability to capture all positive instances, and the F1 score is the harmonic mean of the two.
c) Feature importance: If applicable, analyse the importance of features in the prediction. This is particularly relevant if certain columns or variables significantly influence the model's performance. You may use the SHAP graphs generated with InterpretML for this.
d) Hyperparameter tuning: Explore the impact of hyperparameter tuning on model performance, and assess how adjusting the parameters influences predictive accuracy.

C.
Analyse each data set (each data set is a single column of 49 numbers) in detail using metrics that can be derived from the data set alone, without considering the prediction results.
a) Descriptive statistics: Compute basic descriptive statistics such as the mean, median, standard deviation, minimum and maximum. These give an initial picture of the central tendency and variability of the data set.
b) Data distribution: Visualise the distribution of the data set using histograms, box plots or kernel density plots. This helps identify skewness, outliers or patterns in the data.

The objective of this analysis and expected results:
After this detailed study and analysis, we expect to be able to:
I) Classify or categorise the test sheets into categories or classes such as:
a) Most friendly with SVM linear at ---- test size.
b) Needs removal or addition of data sets to bring its metric values to levels that give better prediction results.
c) ……
d) …..
II) Classify or categorise individual data sets into categories or classes such as:
a) Most friendly with SVM linear at ---- test size.
b) Needs removal or addition of data sets to bring its metric values to levels that give better prediction results.
c) ….
d) ……
III) Remove or add training data sets in a test sheet to get the highest possible number of correct predictions for each type of prediction model and test size.
IV) Take any other corrective actions that help achieve high prediction accuracy.

Plan of Action
To assess how precise these models' predictions are, we have to compute a few metrics that describe the efficiency of each model. These metrics are listed below:
- Accuracy: The proportion of correctly classified occurrences, as defined by the pattern set. Count the predictions that match the pattern set and compute the proportion. This also gives the error rate (1 - accuracy). A known threshold is then used to interpret the results.
- Confusion matrix: Accuracy alone is not enough to judge the efficiency of a model, so an in-depth analysis of the underlying counts is needed. The confusion matrix gives the true positives, true negatives, false positives and false negatives. These measures help us understand the variations and whether we can rely on a particular model.
- Sensitivity and specificity: Sensitivity shows what fraction of the actual pattern numbers are correctly predicted (true positives). Specificity shows what fraction of the non-pattern numbers are correctly identified as non-pattern numbers.
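Task A (a) to (c) boils down to counting, for every prediction, how many of the 9 predicted numbers are among the 7 actual pattern numbers, then comparing models by their average hit count (best) and by the spread of their hit counts (most consistent). The sketch below assumes the predictions have already been exported from the workbook as hypothetical `(model, test_size, result_type, predicted_numbers)` tuples; the record layout, the helper names and the example numbers are illustrative assumptions, not taken from the workbook.

```python
from statistics import mean, pstdev

# ASSUMPTION: each prediction is exported as a tuple
# (model, test_size, result_type, predicted_numbers); the real workbook layout may differ.


def count_hits(actual, predicted):
    """Number of distinct predicted numbers that are actual pattern numbers."""
    return len(set(actual) & set(predicted))


def summarise_by_model(actual, records):
    """Map each model to (list of hit counts, mean hits, population std of hits)."""
    hits_by_model = {}
    for model, _test_size, _result_type, predicted in records:
        hits_by_model.setdefault(model, []).append(count_hits(actual, predicted))
    return {
        model: (hits, mean(hits), pstdev(hits))
        for model, hits in hits_by_model.items()
    }


def best_model(summary):
    """Task A (b): model with the highest average number of correct pattern numbers."""
    return max(summary, key=lambda model: summary[model][1])


def most_consistent_model(summary):
    """Task A (c): model whose hit counts vary least (smallest standard deviation)."""
    return min(summary, key=lambda model: summary[model][2])


# Made-up example inputs for illustration only (not real workbook data).
ACTUAL = [3, 11, 17, 24, 30, 38, 45]
RECORDS = [
    ("RF", 0.2, "No column removal", [3, 5, 11, 20, 22, 30, 33, 40, 41]),
    ("RF", 0.3, "No column removal", [1, 2, 4, 6, 8, 9, 10, 12, 13]),
    ("SVML", 0.2, "No column removal", [3, 11, 17, 2, 4, 6, 8, 9, 10]),
    ("SVML", 0.3, "No column removal", [24, 30, 45, 1, 2, 5, 7, 12, 13]),
]

summary = summarise_by_model(ACTUAL, RECORDS)
for model, (hits, avg, spread) in summary.items():
    print(f"{model}: hits={hits} mean={avg:.2f} std={spread:.2f}")
print("best:", best_model(summary), "| most consistent:", most_consistent_model(summary))
```

Running the same summary separately for each test sheet variant (e.g. 388-1 vs 388-2) would be one way to approach Task A (d).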
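For Task A (e) and (f), the hit counts can be averaged per test size, per result type, or per (model, test size) combination to see which setting performs best. This is a minimal sketch assuming each prediction is available as a hypothetical `(model, test_size, result_type, predicted_numbers)` tuple; `mean_hits_by` and the example values are illustrative, not part of the project.

```python
from collections import defaultdict
from statistics import mean

# ASSUMPTION: records are (model, test_size, result_type, predicted_numbers) tuples.


def mean_hits_by(actual, records, key):
    """Average number of correct pattern numbers, grouped by key(model, test_size, result_type)."""
    actual = set(actual)
    groups = defaultdict(list)
    for model, test_size, result_type, predicted in records:
        groups[key(model, test_size, result_type)].append(len(actual & set(predicted)))
    return {group: mean(hits) for group, hits in groups.items()}


# Made-up example inputs for illustration only (not real workbook data).
ACTUAL = [3, 11, 17, 24, 30, 38, 45]
RECORDS = [
    ("RF", 0.2, "No column removal", [3, 11, 30, 1, 2, 4, 5, 6, 7]),
    ("RF", 0.4, "No column removal", [3, 1, 2, 4, 5, 6, 7, 8, 9]),
    ("SVML", 0.2, "Two column removal", [3, 11, 17, 24, 1, 2, 4, 5, 6]),
    ("SVML", 0.4, "Two column removal", [3, 11, 1, 2, 4, 5, 6, 7, 8]),
]

by_test_size = mean_hits_by(ACTUAL, RECORDS, key=lambda m, t, r: t)
by_result_type = mean_hits_by(ACTUAL, RECORDS, key=lambda m, t, r: r)
by_model_and_size = mean_hits_by(ACTUAL, RECORDS, key=lambda m, t, r: (m, t))

print(by_test_size)
print(by_result_type)
print("best (model, test size):", max(by_model_and_size, key=by_model_and_size.get))
```

In this toy data the result type is confounded with the model, so comparing result types within a single model (e.g. `key=lambda m, t, r: (m, r)`) would give a fairer picture on real data.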
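The metrics in Task B and the Plan of Action can be computed per prediction by treating each candidate number as a binary case: a pattern number (positive) or not (negative). The sketch below assumes the candidate numbers are the integers 1 to 49 (the `universe` default is an assumption; pass the actual 49 values of the target column if they differ), and `confusion_counts` and `classification_metrics` are hypothetical helper names.

```python
def confusion_counts(actual, predicted, universe=range(1, 50)):
    """Return (tp, fp, fn, tn) for one prediction.

    ASSUMPTION: the candidate numbers are the integers 1..49; pass the real
    set of 49 values as `universe` if the data uses a different range.
    """
    actual, predicted, universe = set(actual), set(predicted), set(universe)
    tp = len(actual & predicted)             # pattern numbers that were predicted
    fp = len(predicted - actual)             # predicted numbers that are not pattern numbers
    fn = len(actual - predicted)             # pattern numbers that were missed
    tn = len(universe - actual - predicted)  # non-pattern numbers correctly left out
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, error rate, precision, recall (sensitivity), specificity and F1."""
    def ratio(num, den):
        return num / den if den else 0.0

    total = tp + fp + fn + tn
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, total),
        "error_rate": ratio(fp + fn, total),
        "precision": precision,
        "recall": recall,                    # also called sensitivity
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Made-up example: 9 predicted numbers, 3 of which are among the 7 pattern numbers.
ACTUAL = [3, 11, 17, 24, 30, 38, 45]
PREDICTED = [3, 5, 11, 20, 22, 30, 33, 40, 41]

counts = confusion_counts(ACTUAL, PREDICTED)
print(counts)  # (3, 6, 4, 36)
print(classification_metrics(*counts))
```

The toy numbers also show why accuracy alone is misleading here: predicting no numbers at all gives tp=0, fp=0, fn=7, tn=42 and an accuracy of 42/49 (about 0.857), higher than the 39/49 (about 0.796) of the prediction above that actually hits 3 pattern numbers. Precision, recall and F1 are therefore more informative for this task.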
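Task C can be done on each 49-number column without any prediction results. This sketch uses only Python's standard `statistics` module; the skewness is the population (Fisher-Pearson) moment coefficient, and outliers are flagged with the 1.5 × IQR rule that box plots commonly use for their whiskers. The function name `describe_column` is hypothetical, the quartiles use `statistics.quantiles` with its default method (so they can differ slightly from a plotting library's), and drawing histograms or density plots is left out.

```python
from statistics import mean, median, quantiles, stdev


def describe_column(column):
    """Descriptive statistics and IQR-based outliers for one data set (column)."""
    avg = mean(column)
    count = len(column)
    # Population central moments for the Fisher-Pearson skewness coefficient.
    m2 = sum((x - avg) ** 2 for x in column) / count
    m3 = sum((x - avg) ** 3 for x in column) / count
    q1, _, q3 = quantiles(column, n=4)   # quartile cut points (default 'exclusive' method)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": avg,
        "median": median(column),
        "std": stdev(column),            # sample standard deviation
        "min": min(column),
        "max": max(column),
        "q1": q1,
        "q3": q3,
        "skewness": m3 / m2 ** 1.5 if m2 else 0.0,
        "outliers": [x for x in column if x < low or x > high],
    }


# Made-up example columns for illustration only.
symmetric = list(range(1, 50))             # 49 evenly spread values
with_outlier = list(range(1, 49)) + [500]  # 49 values, one extreme

print(describe_column(symmetric))
print(describe_column(with_outlier)["outliers"])  # [500]
```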

Statistical Analysis

Statistics

Machine Learning (ML)

Data Science

Predictive Analytics

Project ID: 37846723

About the project

6 proposals

Remote project

Active a month ago

6 freelancers are bidding on average ₹8,833 INR for this project

@mapsbox

Hello, I am very familiar with the requirements of your projects, and I can complete them quickly. Let's connect over chat to discuss this further. Regards,

₹7,000 INR in 7 days

5.0

(9 reviews)

4.0

@TheDataAnalysts

Hello, I have read through your project. I am a Senior Statistician and I can help you with this project. Kindly text me over chat so we can discuss further. Thank you. Note: The bid amount and timeline are just placeholders; the actual values will be provided after a 1-on-1 discussion over chat.

₹15,000 INR in 3 days

5.0

(6 reviews)

4.2

@Bhartifreelancer

Hi there, I have checked your project, which requires predictive analysis and classification. I'm a professional academic writer with 7 years' experience writing academic research papers, theses, essays and dissertations on various subjects. I am well skilled in numerous citation and referencing styles, including APA, MLA, Harvard, Chicago and Turabian. Kindly send a message in the chat box so I can share some samples with you. Regards, Bharti

₹8,000 INR in 2 days

4.3

(9 reviews)

3.5

@arunkm1309

Hi there! As a highly skilled Machine Learning and Data Science freelancer, I am excited to help you with your project. With intermediate-level expertise in predictive modeling, data analysis, and statistical modeling, I am confident that my skills and experience will add value to your team. My past work in developing predictive models for various industries, including healthcare and finance, has equipped me with the knowledge and expertise to deliver high-quality results for your project. I am committed to working collaboratively with you to understand your needs and deliver customized data-driven solutions. I am passionate about Machine Learning and Data Science, and I'm excited to put my skills and knowledge to work for you. Let's work together to achieve your project goals! Thank you for considering my proposal. Best regards, Arunkumar Maurya

₹6,000 INR in 2 days

0.0

(0 reviews)

0.0

@outmanHf

Hello! I'm Otmane, a data science freelancer. I'm excited to join your epilepsy-focused project, analyzing deep learning model predictions for pattern numbers. With expertise in statistics and machine learning, I'll provide actionable insights to enhance prediction accuracy. Let's collaborate for success! Best regards, Otmane

₹9,500 INR in 7 days

0.0

(0 reviews)

0.0

@NormanDaniel711

Hi, I have one year of experience in data science and analytics, and I graduated with a master's degree. I am ready to work at any time. I have conducted numerous analyses, and my expertise includes machine learning, time series analysis, data visualisation and programming languages (Python and R).

₹7,500 INR in 10 days