KnowledgeForum: Silhouette Coefficient for K-means Clustering Using R Programming

1. Import the dataset from a .csv or .txt file into a data frame

>(dataframe <- read.csv("F:\\mydata\\data.csv"))

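The command above imports the dataset into a data frame, which can then be used for clustering, classification, and so on. Wrapping the assignment in parentheses also prints the data frame. In a Windows path each backslash must be doubled, as shown; forward slashes (F:/mydata/data.csv) work as well.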
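The step heading mentions .txt files as well. Here is a minimal sketch, assuming the .txt file is tab-separated; the sample values and temporary file paths are hypothetical, made up only so the example runs on its own. read.delim() is base R's reader for tab-separated text, and read.csv() handles the comma-separated case:

```r
# Hypothetical sample data, written to temporary files so the example is self-contained
sample_data <- data.frame(x = c(1.2, 3.4, 5.6), y = c(7.8, 9.0, 2.1))

# Tab-separated .txt file: read.delim() defaults to header = TRUE and sep = "\t"
txt_path <- tempfile(fileext = ".txt")
write.table(sample_data, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)
txt_df <- read.delim(txt_path)

# Comma-separated .csv file: read.csv() defaults to header = TRUE and sep = ","
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_data, csv_path, row.names = FALSE)
csv_df <- read.csv(csv_path)

# Both readers should recover the same two numeric columns
str(txt_df)
str(csv_df)
```

If your .txt file uses a different separator (for example semicolons), read.table() with an explicit sep argument is the general-purpose option.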

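2. Remove the header row if it was read as data

If the first row of the data frame holds header text rather than data, drop it with a negative row index:

>s0 <- dataframe[-1,]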

Alternatively, when reading the file you can control whether the first line is used as column names by setting the header argument to TRUE or FALSE (the other arguments shown are read.csv's default values):

>s0 <- read.csv("F:\\mydata\\data.csv", header = TRUE, sep = ",", quote = "\"",
         dec = ".", fill = TRUE, comment.char = "")
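To see what the header argument changes, here is a small sketch that passes an inline CSV string (hypothetical data) through read.csv's text argument instead of a file. It assumes R 4.0 or later, where text columns are read as character rather than factor by default:

```r
csv_text <- "height,weight\n1.62,55.1\n1.80,72.4"

# header = TRUE: the first line becomes the column names, leaving 2 numeric data rows
with_header <- read.csv(text = csv_text, header = TRUE)

# header = FALSE: the first line is read as data, the columns get default names V1, V2,
# and they are read as character because they mix text ("height") with numbers
no_header <- read.csv(text = csv_text, header = FALSE)

# Dropping the first row removes the header text, but the columns stay character
trimmed <- no_header[-1, ]

str(with_header)
str(trimmed)
```

This is why, after dropping the first row, the character columns still have to be converted to numeric before clustering.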

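3. Convert character columns to numeric

kmeans() only works on numeric data, so any columns read in as character must be converted. The japply() function from the taRifx package applies as.numeric to the selected columns:

>library(taRifx)

>s00 <- japply(s0, which(sapply(s0, class) == "character"), as.numeric)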
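If taRifx is not installed, the same conversion can be done in base R. This is an alternative sketch, not the method used above, and the small data frame is hypothetical. It also covers factor columns (the read.csv default before R 4.0): calling as.numeric() directly on a factor returns its internal level codes, so the values go through as.character() first:

```r
# Hypothetical data frame with numbers stored as text, as after dropping a header row
s0_demo <- data.frame(a = c("1.5", "2.0", "3.25"), b = factor(c("4", "5", "6")),
                      c = c(10, 20, 30), stringsAsFactors = FALSE)

# Select columns stored as text: character or factor
text_cols <- vapply(s0_demo, function(col) is.character(col) || is.factor(col), logical(1))

# Convert only those columns; as.character() avoids turning factors into level codes
s0_demo[text_cols] <- lapply(s0_demo[text_cols], function(col) as.numeric(as.character(col)))

str(s0_demo)
```

Values that cannot be parsed as numbers become NA (with a warning), so it is worth checking for NAs before running kmeans().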

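4. Run k-means clustering

Load the cluster package, which provides silhouette(); kmeans() itself is part of base R's stats package:

>library(cluster)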

>s0_cluster <- kmeans(s00,3,20)

Here, the first argument is the data frame, the second is the number of clusters (centers), and the third is the maximum number of iterations (iter.max).
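The same call can be written with named arguments, which makes the meaning of 3 and 20 explicit. k-means starts from randomly chosen centers, so this sketch also calls set.seed() for reproducible results and uses nstart to try several random starts. The built-in iris measurements stand in for the article's data, and the seed and nstart values are arbitrary choices:

```r
# Numeric columns of the built-in iris data set (stand-in for s00)
x <- iris[, 1:4]

set.seed(123)  # arbitrary seed so the random initial centers are reproducible
km <- kmeans(x, centers = 3, iter.max = 20, nstart = 10)

km$size          # number of observations in each cluster
km$centers       # one row of column means per cluster
head(km$cluster) # cluster label assigned to each observation
km$tot.withinss  # total within-cluster sum of squares
```

With nstart greater than 1, kmeans() keeps the run with the lowest total within-cluster sum of squares.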

5. Compute the silhouette coefficients

>sil <- silhouette(s0_cluster$cluster, dist(s00))

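Here, s0_cluster$cluster is the vector of cluster labels, and the second argument, dist(s00), is the distance matrix for the dataset (Euclidean by default). Each row of sil gives an observation's cluster, its nearest neighboring cluster, and its silhouette width, which ranges from -1 to 1; values near 1 indicate a well-clustered observation.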
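For observation i, let a(i) be its mean distance to the other members of its own cluster and b(i) the smallest mean distance to the members of any other cluster; its silhouette width is s(i) = (b(i) - a(i)) / max(a(i), b(i)). The sketch below computes this by hand on the built-in iris data (a stand-in for s00, with an arbitrary seed) and compares it with silhouette(). It follows the usual convention that a point alone in its cluster gets width 0:

```r
library(cluster)

x <- as.matrix(iris[, 1:4])  # stand-in for s00
set.seed(123)                # arbitrary seed for reproducible clusters
cl <- kmeans(x, centers = 3, iter.max = 20, nstart = 10)$cluster
d <- as.matrix(dist(x))      # full Euclidean distance matrix

manual_sil <- vapply(seq_len(nrow(x)), function(i) {
  own <- cl == cl[i]
  own[i] <- FALSE                        # exclude the point itself
  if (!any(own)) return(0)               # singleton cluster: width 0 by convention
  a_i <- mean(d[i, own])                 # mean distance within its own cluster
  other <- setdiff(unique(cl), cl[i])
  # mean distance to each other cluster; the nearest one gives b(i)
  b_i <- min(vapply(other, function(k) mean(d[i, cl == k]), numeric(1)))
  (b_i - a_i) / max(a_i, b_i)
}, numeric(1))

sil <- silhouette(cl, dist(x))
all.equal(manual_sil, unname(sil[, "sil_width"]))  # compare hand-computed widths with silhouette()
mean(manual_sil)                                   # average silhouette width
```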

6. Plot the silhouette using the plot function

>plot(sil, main ="Silhouette plot - K-means")

The resulting silhouette plot is shown below.
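Besides the plot, summary() reports the average silhouette width overall and per cluster, which is handy for comparing runs. This sketch, again on the built-in iris data as a stand-in with an arbitrary seed, also writes the plot to a PDF file; the temporary file path is only for illustration:

```r
library(cluster)

x <- iris[, 1:4]  # stand-in for s00
set.seed(123)     # arbitrary seed for reproducible clusters
km <- kmeans(x, centers = 3, iter.max = 20, nstart = 10)
sil <- silhouette(km$cluster, dist(x))

sil_summary <- summary(sil)
sil_summary$avg.width        # average silhouette width over all observations
sil_summary$clus.avg.widths  # average width within each cluster

# Save the silhouette plot to a PDF file instead of drawing it on screen
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 7, height = 5)
plot(sil, main = "Silhouette plot - K-means")
dev.off()
```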