
Posts

Showing posts with the label Decision Tree Classifier

Genetic Variant Classifier : Random Forest Beats Deep Architecture (By a HUGE MARGIN)

Hello Readers! Welcome to yet another value prediction work! Today, we will be looking at an in-demand dataset, namely Genetic Variant Classifications. We will look at this dataset and go for its primary objective: classifying whether the two lab reports conflict or not. The Kernel you may want to look at for more information: Conflicting result classifications. As usual, we will be looking at the dataset with the aim of EDA, Feature Engineering and Predictions.

Exploratory Data Analysis

One would like to see the Chromosome vs Class distribution of this data, which can be plotted as counts per chromosome, split by class. As the resulting graph shows, the dataset happens to be heavily biased towards the non-conflicting genes, with CHROM == 2 standing out as the clear bias winner. Since the incidents where the genes are recorded to be conflict...
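The chromosome-versus-class count described in this excerpt can be sketched without any plotting library. This is only a minimal illustration, not the code from the original kernel: the rows below are made-up toy data, and the helper names `class_counts_by_chrom` and `class_share` are hypothetical. It assumes each record carries a `CHROM` value and a `CLASS` label where 1 means the lab reports conflict.

```python
from collections import Counter

# Hypothetical toy rows standing in for the real dataset: (CHROM, CLASS).
# Assumption: CLASS == 1 means the two lab reports conflict.
rows = [
    ("1", 0), ("1", 1), ("2", 0), ("2", 0), ("2", 0),
    ("2", 1), ("17", 0), ("17", 0), ("X", 1), ("X", 0),
]


def class_counts_by_chrom(records):
    """Count how many records fall into each (chromosome, class) pair."""
    return Counter(records)


def class_share(records):
    """Return the fraction of records belonging to each class."""
    totals = Counter(label for _, label in records)
    n = len(records)
    return {label: count / n for label, count in totals.items()}


counts = class_counts_by_chrom(rows)
shares = class_share(rows)

# Print one line per (chromosome, class) pair, then the overall class balance.
for (chrom, label), count in sorted(counts.items()):
    print(f"CHROM={chrom} CLASS={label}: {count}")
print(shares)
```

Checking the overall class share before modelling is useful here because a heavily skewed dataset lets a classifier score well simply by predicting the majority class.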

Kaggle Dataset Analysis : Is your Avocado organic or not?

Hey readers! Today, allow me to present yet another dataset analysis, this time of a rather gluttonous topic, namely Avocado price analysis. This dataset represents the historical data on avocado prices and sales volume in multiple US markets. Our prime objectives will be to visualize the dataset, pre-process it and ultimately test multiple sklearn classifiers to check out which one gives us the best confidence and accuracy in telling whether an avocado is organic! Note: I'd like to credit Shivam Negi for the kernel. All this code belongs to him.

Data Visualization

This script should produce the following jointplot. A similar joint plot can be drawn to draw conclusions about the relation between extra large bags and the small ones.

Pre Processing

The following script has been used for pre-processing the input data.

Model Definition and Comparisons

We will be looking mostly at three different models, namely ra...

Predicting Cost of Tender with 99.24% Accuracy : Miracle!

Data Science is reaching new levels and so are the models. But reaching a whopping 99.24% accuracy using simple feature engineering and a simple Decision Tree Classifier? That's new! Hello everyone, today I am going to present my model, which can predict the value range of a tender in Seattle Trade Permits with a whopping accuracy of 99.24% (with some obvious caveats which I will discuss in the end). My Kernel: Yet Another Value Prediction.

The Prediction Kernel

BASIC EDA

This time, I am going to use the plotly library in Python. This is literally the best option for interactive plots and if you actually visit the kernel, you will understand why. First of all, we will focus on checking out the Top Grossing Contractors in the Seattle area who have earned the most out of the tender acquisitions. This will lead to this interactive graph. Similarly, one could plot out another graph for Amount earned per project. But another thing which caug...
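Predicting a value range rather than an exact value means turning a numeric amount into a categorical label first. The sketch below shows one way that step might look, together with a majority-class baseline that helps put a headline accuracy figure in context. Everything here is an assumption made for illustration: the bin edges, the labels, the sample values and the function names `value_range` and `majority_baseline_accuracy` do not come from the original kernel.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges in dollars; the post's real ranges are not shown here.
BIN_EDGES = [1_000, 10_000, 100_000]
BIN_LABELS = ["<1k", "1k-10k", "10k-100k", ">=100k"]


def value_range(value):
    """Map a numeric tender value to its range label."""
    # bisect_right returns how many edges are <= value, which is the bin index.
    return BIN_LABELS[bisect_right(BIN_EDGES, value)]


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Made-up sample values, used only to exercise the two helpers.
values = [500, 750, 900, 2_500, 50_000, 250_000, 300, 999]
labels = [value_range(v) for v in values]
baseline = majority_baseline_accuracy(labels)

print(labels)
print(f"majority-class baseline accuracy: {baseline:.3f}")
```

If one range dominates the data, the baseline is already high, so a classifier's accuracy is best read relative to that number rather than on its own.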
