Tuesday 4 October 2016

Silhouette Coefficient for K-means Clustering Using R

1. Import the dataset from a .csv or .txt file into a data frame


           >(dataframe <-  read.csv("F:\\mydata\\data.csv"))

            The command above imports the dataset into a data frame, which can then be used for clustering, classification, etc. The outer parentheses also print the result to the console.

2. Ignore the first row if it contains the headings

    >s0  <- dataframe[-1,]
    Alternatively, while reading the file, we can tell R whether the first line is a header by setting the header option to TRUE or FALSE:

    >read.csv(file, header = TRUE, sep = ",", quote = "\"",
             dec = ".", fill = TRUE, comment.char = "", ...)
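
To see what the header option changes, here is a minimal sketch. The temporary file and its two columns (weight, height) are made up for illustration and are not from the dataset used in this post.

```r
# Write a small illustrative CSV to a temporary file (hypothetical data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("weight,height", "10,1.5", "12,1.7"), tmp)

# header = TRUE: the first line becomes the column names.
with_header <- read.csv(tmp, header = TRUE)

# header = FALSE: the first line is kept as an ordinary data row,
# so the columns get default names (V1, V2) and are no longer numeric.
without_header <- read.csv(tmp, header = FALSE)

print(nrow(with_header))
print(nrow(without_header))
print(sapply(with_header, class))
```

Reading the headings as a data row is the usual reason the columns end up as character data, which is what the next step deals with.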

3. Convert all character columns to numeric

         >library( taRifx )
        >s00 <- japply( s0, which(sapply(s0, class)=="character"), as.numeric )
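
If taRifx is not installed, the same conversion can probably be done with base R alone. This is only a sketch of an alternative, not the japply call itself; the data frame s0_demo and its columns are hypothetical stand-ins for s0.

```r
# Hypothetical stand-in for s0: two character columns and one numeric column.
s0_demo <- data.frame(
  a = c("1.5", "2.0", "3.25"),
  b = c("4", "5", "6"),
  c = c(7, 8, 9),
  stringsAsFactors = FALSE
)

# Find the character columns, then convert only those with as.numeric.
char_cols <- sapply(s0_demo, is.character)
s00_demo <- s0_demo
s00_demo[char_cols] <- lapply(s0_demo[char_cols], as.numeric)

print(sapply(s00_demo, class))
```

Note that as.numeric turns any value it cannot parse into NA (with a warning), so it is worth checking for NAs before clustering.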

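4. Perform the K-means clustering for K = 2, ..., n

        >library("cluster")
       >s0_cluster <- kmeans(s00,3,20)
Here, the first argument is the data frame, the second is the number of clusters, and the third is the maximum number of iterations (iter.max). kmeans itself comes from the stats package that ships with R; the cluster package is loaded for the silhouette function used in the next step.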
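
The command above fits a single K. To cover a range of K values, one option is a simple loop. The sketch below assumes the built-in iris measurements as stand-in data and a range of K = 2 to 6; set.seed, scale, and nstart = 10 are my own choices, not part of the original commands.

```r
# kmeans starts from random centers, so fix the seed for repeatable results.
set.seed(42)

# Stand-in data: the four numeric columns of iris, standardized.
x <- scale(iris[, 1:4])

# Fit one k-means model per value of K.
k_values <- 2:6
fits <- lapply(k_values, function(k) {
  kmeans(x, centers = k, iter.max = 20, nstart = 10)
})
names(fits) <- k_values

# Total within-cluster sum of squares for each K.
print(sapply(fits, function(f) f$tot.withinss))
```

Each element of fits is an ordinary kmeans result, so fits[["3"]]$cluster gives the cluster labels for K = 3.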

5. Find the Silhouette Coefficient

       >sil <- silhouette(s0_cluster$cluster, dist(s00))
Here, s0_cluster$cluster holds the cluster labels, and the second argument, dist(s00), is the distance matrix for the dataset.
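
For each point i, the silhouette width is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the other points in its own cluster and b(i) is the mean distance to the points of the nearest other cluster. A common way to use it is to average s(i) over all points for each K and prefer the K with the largest average. The sketch below again assumes iris as stand-in data and K = 2 to 6; those choices are illustrative only.

```r
library(cluster)

set.seed(42)
x <- scale(iris[, 1:4])   # stand-in data, standardized
d <- dist(x)              # distance matrix, computed once and reused

# Average silhouette width for each K.
k_values <- 2:6
avg_width <- sapply(k_values, function(k) {
  fit <- kmeans(x, centers = k, iter.max = 20, nstart = 10)
  sil <- silhouette(fit$cluster, d)
  mean(sil[, "sil_width"])
})
names(avg_width) <- k_values

# K with the largest average silhouette width.
best_k <- k_values[which.max(avg_width)]
print(avg_width)
print(best_k)
```

Values close to 1 suggest well-separated clusters, values near 0 suggest overlapping clusters, and negative values suggest points that may sit in the wrong cluster.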

6. Plot the graph using the plot function
      
      >plot(sil, main ="Silhouette plot - K-means")
      Below is the graph for the plot.






       

    
