BREAST CANCER DETECTION USING MACHINE LEARNING

In this blog, we are going to learn the following things:

How to detect whether a patient has breast cancer using machine learning
Uploading our data using ipywidgets
Data understanding and visualization
Plotting various kinds of plots, such as heatmaps, using seaborn and matplotlib
Different machine learning techniques like RandomForestClassifier, KNeighborsClassifier, and SVC

The first thing we need is data on which the machine learning can be done. You can download it from Kaggle.

Download Whole Project on Github

The various parameters given in our dataset and their meanings

Attribute Information:

1) ID number

2) Diagnosis (M = malignant, B = benign)

3-32) Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)

b) texture (standard deviation of gray-scale values)

c) perimeter

d) area

e) smoothness (local variation in radius lengths)

f) compactness (perimeter^2 / area - 1.0)

g) concavity (severity of concave portions of the contour)

h) concave points (number of concave portions of the contour)

i) symmetry

j) fractal dimension ("coastline approximation" - 1)

The mean, standard error and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius.

All feature values are recoded with four significant digits.

Missing attribute values: none

Class distribution: 357 benign, 212 malignant
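
To make the field numbering above concrete, here is a small sketch that rebuilds the 30 feature names from the ten measurements and the three statistics. The naming style (for example radius_mean) is an assumption modelled on the column names commonly seen in the Kaggle CSV, so check it against your own file.

```python
# Ten base measurements, in the order listed above.
base_features = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal_dimension",
]
# Three statistics per measurement: mean, standard error, and "worst".
statistics = ["mean", "se", "worst"]

# Fields 1 and 2 are the ID and the diagnosis, so the features start at field 3.
# The "<measurement>_<statistic>" naming is an assumption, not taken from the file.
field_names = {
    3 + s * len(base_features) + b: f"{base}_{stat}"
    for s, stat in enumerate(statistics)
    for b, base in enumerate(base_features)
}

print(len(field_names))  # 30
print(field_names[3], field_names[13], field_names[23])
```

The last line prints the names for fields 3, 13 and 23, which correspond to Mean Radius, Radius SE and Worst Radius in the description above.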

Since we are not doctors, all you need to know is that malignant means the tumour is cancerous and benign means it is not.

The first thing we have to do is import the modules that we are going to use [we will import the machine learning modules later on].

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

import pandas as pd

Upload Our data into Jupyter Notebook

import ipywidgets as widgets

from IPython.display import display

uploader = widgets.FileUpload(

    accept='.csv',  # Accepted file extension e.g. '.txt', '.pdf', 'image/*', 'image/*,.pdf'

    multiple=False  # True to accept multiple files upload else False

)

display(uploader)

We will see an upload button once we run this code.

Show our data

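import io

import pandas as pd

value = uploader.value

# ipywidgets 7 returns a dict keyed by file name; ipywidgets 8 returns a tuple of file records

input_file = list(value.values())[0] if isinstance(value, dict) else value[0]

content = bytes(input_file['content'])  # the content may arrive as bytes or as a memoryview

content = io.StringIO(content.decode('utf-8'))

df = pd.read_csv(content)

df.head()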

The last line will show the first rows of our data in the Jupyter notebook, with all the parameters and the values they contain.
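
If you want to try the decoding step without the upload widget, the sketch below does the same thing on a few bytes typed by hand. The variable raw_bytes and its two rows are made up for illustration and only stand in for whatever the widget returns.

```python
import io

import pandas as pd

# Hypothetical stand-in for the raw bytes of an uploaded CSV file (toy rows).
raw_bytes = b"id,diagnosis,radius_mean\n1,M,17.99\n2,B,13.54\n"

# Decode the bytes to text, wrap the text in a file-like object, and parse it.
text_buffer = io.StringIO(raw_bytes.decode("utf-8"))
sample_df = pd.read_csv(text_buffer)

print(sample_df.head())
```

pd.read_csv accepts any file-like object, which is why wrapping the decoded text in io.StringIO is enough.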

Note: We have stored our data in a variable called df, but you can change the variable name as per your requirement.

After that we will run df.describe(). For each numeric parameter it shows the count of values, mean, standard deviation, minimum, 25%, 50% (the median), 75%, and maximum. It tells us how spread out our data is.
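
As a quick illustration of what the summary contains, here is a sketch on a tiny made-up column (the five values are invented, not taken from the dataset).

```python
import pandas as pd

# Toy column with invented values, only to show the shape of the summary.
toy = pd.DataFrame({"radius_mean": [10.0, 12.0, 14.0, 16.0, 18.0]})

summary = toy.describe()
print(summary)

# Each row label of the summary is one statistic.
print(list(summary.index))
```

The row labelled 50% is the median, and count is the number of non-null values in the column.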

df.shape  # the number of rows and columns in our data; in our case it is (569, 33)

df.info()  # shows the data type and non-null count of each column, the memory used by our data, and the number of rows and columns

df.isnull().sum()  # counts the null values in each column. Note: null values are useless in this case and we need to remove them

After running the last command, you will observe that the Unnamed: 32 column contains null values, so we will drop this column.

df = df.drop(columns='Unnamed: 32')  # this will drop the column
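
Instead of typing the column name by hand, we could also let pandas find the columns that are completely empty. This is only a sketch on a toy frame, and it assumes the unwanted column contains nothing but null values.

```python
import numpy as np
import pandas as pd

# Toy frame with one column that holds only missing values.
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "radius_mean": [17.9, 13.5, 20.1],
    "Unnamed: 32": [np.nan, np.nan, np.nan],
})

# Count nulls per column, then keep the names of columns that are null in every row.
null_counts = toy.isnull().sum()
empty_columns = null_counts[null_counts == len(toy)].index.tolist()

cleaned = toy.drop(columns=empty_columns)
print(empty_columns)
print(cleaned.columns.tolist())
```

A column that has only a few missing values would not be dropped by this check, so such columns would still need separate handling.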

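Data correlation (in simple terms, how one parameter affects another parameter). We will use a heatmap to visualise it with colour.

corr = df.select_dtypes('number').corr()  # diagnosis is still text at this point, so we correlate the numeric columns only

plt.figure(figsize=(20,10))

sns.heatmap(corr, annot=True)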
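
With 30 features the heatmap gets crowded, so it can help to also list the most strongly correlated pairs as text. The sketch below does this on synthetic data; the three columns are generated only for illustration and are not the real measurements.

```python
import numpy as np
import pandas as pd

# Synthetic data: perimeter is exactly proportional to radius, texture is unrelated.
rng = np.random.default_rng(0)
radius = rng.uniform(5, 25, size=200)
toy = pd.DataFrame({
    "radius": radius,
    "perimeter": 2 * np.pi * radius,
    "texture": rng.normal(20, 4, size=200),
})

corr = toy.corr()
cols = list(corr.columns)

# Visit each unordered pair once (upper triangle) and sort by absolute correlation.
pairs = [
    (cols[i], cols[j], float(corr.iloc[i, j]))
    for i in range(len(cols))
    for j in range(i + 1, len(cols))
]
pairs.sort(key=lambda pair: abs(pair[2]), reverse=True)

for first, second, value in pairs:
    print(f"{first:10s} {second:10s} {value:+.3f}")
```

The same loop works on the corr table computed from the real data, where pairs such as radius and perimeter would be expected near the top of the list.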

As we know, our diagnosis column contains two values, M = malignant and B = benign. But machine learning algorithms do not understand labels like M and B, so we have to convert them into integers before we can train the model [the parameters and the goal of this machine learning blog were already defined at the beginning].

# one hot encoding

df = pd.get_dummies(data=df, drop_first=True, dtype=int)  # this converts diagnosis into 1 and 0 form, which can easily be used by a machine learning algorithm
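
To see what the encoding does to the table, here is a sketch on three made-up rows. Passing dtype=int is a choice made here so that the new column prints as 1 and 0 on every pandas version.

```python
import pandas as pd

# Three invented rows, only to show the effect of the encoding.
toy = pd.DataFrame({
    "id": [101, 102, 103],
    "diagnosis": ["M", "B", "M"],
    "radius_mean": [17.9, 13.5, 20.1],
})

# drop_first=True drops the first category (B), leaving a single diagnosis_M column.
encoded = pd.get_dummies(data=toy, drop_first=True, dtype=int)
print(encoded)
```

The new diagnosis_M column is 1 for malignant and 0 for benign, and it is placed after the untouched numeric columns, so it ends up as the last column of the table.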

Just in case you want to plot more graphs and visualise the data, this simple command will do all the work and also create a report for you [my favourite command].

from pandas_profiling import ProfileReport  # newer releases of this package are published as ydata-profiling (import ydata_profiling)

df.profile_report()

We can also draw a pairplot in our Jupyter notebook by using this simple command:

sns.pairplot(df)

We have done enough visualisation; now it's show time for our machine learning.

This is a supervised machine learning classification project, so keep that in mind.

First we have to separate our data into the input features (x) and the target (y).

x = df.iloc[:, 1:-1].values  # all rows, and every column except id (the first) and the encoded diagnosis (the last)

y = df.iloc[:, -1].values  # all rows of the encoded diagnosis column

Next we split the dataset into training data and testing data. Notice that test_size is 0.2, which means 20% of all our data will be used as testing data, and random_state=0 means we get the same split every time we run the code. The four outputs are:

x_train: the features used to train the model
y_train: the diagnosis labels for x_train, also used for training
x_test: the held-out features that the model does not see during training
y_test: the true diagnosis labels for x_test, which we compare with the model's predictions to test how accurate our model is

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=0)
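
To check what the split produces, the sketch below runs the same call on placeholder arrays that have the same shape as our data (569 rows, 30 features). The arrays x_demo and y_demo are stand-ins filled with dummy values, not the real measurements.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as the real features and labels.
x_demo = np.zeros((569, 30))
y_demo = np.array([0] * 357 + [1] * 212)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=0
)
print(x_tr.shape, x_te.shape)  # (455, 30) (114, 30)

# Running the split again with the same random_state gives the same test labels.
_, _, _, y_te_again = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=0
)
print(np.array_equal(y_te, y_te_again))  # True
```

The 114 test rows are the same 114 that appear in the support column of the classification reports.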

First Machine Learning Model we are going to use RandomForestClassifier

from sklearn.ensemble import RandomForestClassifier

rfc=RandomForestClassifier()

rfc.fit(x_train,y_train)

Here we have stored our RandomForestClassifier in a variable called rfc, and rfc.fit(x_train, y_train) is where all the magic of machine learning happens: the model learns from the training data.
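
One way to peek at what the forest learned is its feature_importances_ attribute. The sketch below is an illustration only: it assumes scikit-learn's bundled copy of the Wisconsin dataset (load_breast_cancer) instead of the uploaded CSV, and the names forest and top_features are introduced just for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Assumption: scikit-learn's bundled copy of the dataset stands in for the CSV.
data = load_breast_cancer()

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data.data, data.target)

# Importances are normalised so that they sum to 1 across all features.
importances = forest.feature_importances_
order = np.argsort(importances)[::-1]
top_features = [(str(data.feature_names[i]), float(importances[i])) for i in order[:5]]

for name, score in top_features:
    print(f"{name:25s} {score:.3f}")
```

The exact ranking depends on the random seed, so treat the list as a rough guide rather than a fixed result.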

After training we have to see how good our model is

y_pred=rfc.predict(x_test)

from sklearn.metrics import classification_report,confusion_matrix,accuracy_score,mean_squared_error

print(classification_report(y_test,y_pred))

print(confusion_matrix(y_test,y_pred))

print("Training Score: ",rfc.score(x_train,y_train)*100)

              precision    recall  f1-score   support

           0       0.98      0.97      0.98        67
           1       0.96      0.98      0.97        47

    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114

[[65  2]
 [ 1 46]]

Training Score: 100.0

We can see different metrics for our model, such as accuracy, precision, recall, and the confusion matrix. Note that the training score of 100% is higher than the 97% accuracy on the test data.
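
The numbers in the report can be worked out by hand from the confusion matrix. Here is a short sketch using the matrix printed above, where class 1 (malignant) is treated as the positive class.

```python
import numpy as np

# Confusion matrix from the output above: rows are true classes, columns are predictions.
cm = np.array([[65, 2],
               [1, 46]])

# For a 2x2 matrix, ravel() gives true negatives, false positives,
# false negatives and true positives, in that order.
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision_1 = tp / (tp + fp)   # of the predicted malignant cases, how many were right
recall_1 = tp / (tp + fn)      # of the truly malignant cases, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(round(float(accuracy), 2))     # 0.97
print(round(float(precision_1), 2))  # 0.96
print(round(float(recall_1), 2))     # 0.98
print(round(float(f1_1), 2))         # 0.97
```

The single false negative (fn = 1) is a malignant case that the model predicted as benign, which is the kind of mistake recall is meant to expose.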

Machine Learning With KNeighborsClassifier

from sklearn.neighbors import KNeighborsClassifier

knn=KNeighborsClassifier(n_neighbors=7)

knn.fit(x_train,y_train)

y_pred=knn.predict(x_test)

from sklearn.metrics import classification_report,confusion_matrix,accuracy_score,mean_squared_error,r2_score

print(classification_report(y_test,y_pred))

print(confusion_matrix(y_test,y_pred))

print("Training Score: ",knn.score(x_train,y_train)*100)

print(knn.score(x_test,y_test))

              precision    recall  f1-score   support

           0       0.96      0.96      0.96        67
           1       0.94      0.94      0.94        47

    accuracy                           0.95       114
   macro avg       0.95      0.95      0.95       114
weighted avg       0.95      0.95      0.95       114

[[64  3]
 [ 3 44]]

Training Score: 93.4065934065934

0.9473684210526315
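
The introduction also mentions SVC, which follows the same fit and predict pattern as the two models above. Here is a rough sketch of how it could look. It assumes scikit-learn's bundled copy of the dataset (load_breast_cancer) instead of the uploaded CSV, and the StandardScaler step is an addition made for this example because SVC is sensitive to feature scale. Note that the bundled copy encodes the labels the other way round (0 = malignant, 1 = benign).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: the bundled dataset stands in for the CSV used in this post.
features, labels = load_breast_cancer(return_X_y=True)
x_tr, x_te, y_tr, y_te = train_test_split(
    features, labels, test_size=0.2, random_state=0
)

# Scale the features first, then fit a support vector classifier with an RBF kernel.
svc_model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
svc_model.fit(x_tr, y_tr)

svc_test_accuracy = svc_model.score(x_te, y_te)
print("Test accuracy:", svc_test_accuracy)
```

Putting the scaler and the classifier in one pipeline means the scaling is learned from the training rows only and then applied to the test rows.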

For project-related queries, contact me on Facebook

Engineer Know


Posted by Engineer Know
