1. Import the dataset from a .csv or .txt file into a data frame
>(dataframe <- read.csv("F:\\mydata\\data.csv"))
The command above imports the dataset into a data frame, which can then be used for clustering, classification, and so on.
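Step 1 mentions .txt files but only shows read.csv. As a minimal sketch, assuming the .txt file is tab-delimited with a header line, read.delim can be used in the same way. The file contents and the column names weight and distance are made up for illustration.

```r
# Write a small hypothetical tab-delimited file to a temporary location.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("weight\tdistance", "10\t120", "15\t300"), txt_path)

# read.delim() is read.table() with tab-separated defaults and header = TRUE.
dataframe <- read.delim(txt_path)
str(dataframe)
```

For other separators, read.table with an explicit sep argument is the more general choice.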
2. Ignore the first row if it contains the headings
>s0 <- shipment1[-1,]
Alternatively, when reading the file, set the header option to TRUE or FALSE to say whether the first line holds the column names:
>read.csv(file, header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "", ...)
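The effect of the header option can be seen on a small example. This is only a sketch: the inline text stands in for a real file, and the column names weight and distance are hypothetical.

```r
# Hypothetical inline data standing in for a real file.
csv_text <- "weight,distance\n10,120\n15,300\n"

# header = TRUE: the first line supplies the column names.
with_header <- read.csv(text = csv_text, header = TRUE)

# header = FALSE: the first line is kept as a data row,
# and the columns get the default names V1, V2.
without_header <- read.csv(text = csv_text, header = FALSE)

str(with_header)
str(without_header)
```

With header = FALSE the heading text ends up mixed in with the numbers, so the columns are no longer read as numeric. That is the situation steps 2 and 3 work around.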
3. Convert all character columns to numeric
>library( taRifx )
>s00 <- japply( s0, which(sapply(s0, class)=="character"), as.numeric )
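If the taRifx package is not available, the same conversion can be sketched in base R. This is an alternative to japply, not the method used above, and the data frame below is hypothetical.

```r
# Hypothetical data frame with numbers stored as text.
s0 <- data.frame(weight = c("10", "15", "20"),
                 distance = c("120.5", "300", "42"),
                 stringsAsFactors = FALSE)

# Flag the character columns, then convert each of them to numeric.
char_cols <- sapply(s0, is.character)
s00 <- s0
s00[char_cols] <- lapply(s00[char_cols], as.numeric)

str(s00)
```

Values that cannot be parsed as numbers become NA with a warning, so it is worth checking the result before clustering.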
4. Perform K-means clustering for K = 2, ..., n
>library("cluster")
>s0_cluster <- kmeans(s00,3,20)
Here, the first argument is the data frame, the second is the number of clusters, and the third is the maximum number of iterations.
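The command above fits a single value of K. To cover a range of values, one option is to loop over K and keep every fit. The sketch below uses the built-in iris data in place of s00, and the range 2 to 6, the seed, and nstart = 10 are arbitrary choices for illustration.

```r
# iris ships with R; columns 1 to 4 are numeric measurements.
x <- iris[, 1:4]

# k-means starts from random centers, so fix the seed for repeatability.
set.seed(42)

ks <- 2:6
fits <- lapply(ks, function(k) {
  kmeans(x, centers = k, iter.max = 20, nstart = 10)
})
names(fits) <- paste0("k", ks)

# Total within-cluster sum of squares for each K.
wss <- sapply(fits, function(fit) fit$tot.withinss)
print(wss)
```

Each element of fits is an ordinary kmeans result, so fits$k3$cluster gives the labels for K = 3.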
5. Find the silhouette coefficient
>sil <- silhouette(s0_cluster$cluster, dist(s00))
Here, s0_cluster$cluster is the vector of cluster labels, and the second argument, dist(s00), is the distance matrix for the dataset.
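One common way to compare values of K is the average silhouette width, where a higher value means better separated clusters. The following is a rough sketch under the same assumptions as before: the built-in iris data stands in for s00, and the range of K, the seed, and nstart are illustrative choices.

```r
library(cluster)  # provides silhouette()

x <- iris[, 1:4]
d <- dist(x)  # Euclidean distance matrix, computed once and reused

set.seed(42)
ks <- 2:6
avg_width <- sapply(ks, function(k) {
  fit <- kmeans(x, centers = k, iter.max = 20, nstart = 10)
  sil <- silhouette(fit$cluster, d)
  mean(sil[, "sil_width"])  # average silhouette width for this K
})
names(avg_width) <- ks

print(round(avg_width, 3))

# K with the largest average silhouette width.
best_k <- ks[which.max(avg_width)]
print(best_k)
```

The silhouette object is a matrix with one row per observation, which is why the sil_width column can be averaged directly.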
6. Plot the graph using the plot function
>plot(sil, main ="Silhouette plot - K-means")
Below is the graph for the plot.