# Introduction

This post follows up on “MDS in theory”, published earlier on this blog. Here the theory is applied to an example: the Groceries dataset from the {arules} package.

# Load Groceries dataset

The Groceries dataset records products that were bought together. It contains 169 products and 9835 purchase transactions. In this post, we cluster the products using Jaccard dissimilarity and hierarchical clustering, and then inspect the clustering results with MDS. The code below loads the data and converts the transactions into a 0/1 matrix with one row per transaction and one column per product.

```
library(arules)
data(Groceries)
Groceries <- as(Groceries, "matrix")*1
```
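To make the Jaccard idea concrete before applying it to all 169 products, here is a minimal base R sketch on a made-up basket matrix. The matrix `toy` and its item names are assumptions for illustration only; the values computed by `dissimilarity(..., method = "jaccard")` later in this post should correspond to the same definition (1 minus intersection over union).

```r
# Hypothetical toy data: rows are baskets, columns are items (1 = item was bought).
toy <- matrix(c(1, 1, 0,
                1, 1, 0,
                0, 1, 1,
                0, 0, 1),
              ncol = 3, byrow = TRUE,
              dimnames = list(NULL, c("milk", "bread", "beer")))

# Number of baskets that contain both items of each pair (intersection sizes).
both <- crossprod(toy)
# Number of baskets that contain each single item.
n_item <- diag(both)
# Union size = |A| + |B| - |A and B|.
either <- outer(n_item, n_item, "+") - both
# Jaccard dissimilarity = 1 - intersection / union.
jac <- 1 - both / either
round(jac, 3)
```

Items that always appear in the same baskets get a dissimilarity near 0, and items that never appear together get exactly 1.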

# Hierarchical clustering on products

```
library(ggdendro)
library(dplyr)
library(ggplot2)

# Jaccard dissimilarity between products (the columns of Groceries); needs binary data
gr.jac <- dissimilarity(t(Groceries), method = "jaccard")
# Ward clustering, cut into 7 clusters
hc <- hclust(gr.jac, method = "ward.D2")
cut <- as.data.frame(cutree(hc, k = 7))
names(cut) <- "cut"
cut$names <- rownames(cut)
# Dendrogram data for ggplot2, with the cluster id attached to each label
hcdata <- dendro_data(hc, type = "triangle")
hcdata$labels <- left_join(hcdata$labels, cut, by = c("label" = "names"))
```

```
ggplot(hcdata$segments) +
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_text(data = hcdata$labels, aes(x, y, label = label, colour = factor(cut)),
            hjust = 1, size = 2.9) +
  scale_colour_discrete(name = "clusters") +
  labs(x = "", y = "") +
  coord_flip() + ylim(-1, 2) + xlim(0, 170) +
  theme_bw()
```
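If it is not obvious what `hclust()` and `cutree()` hand back, the following small sketch may help. It uses made-up one-dimensional data (the vector `x` is an assumption, not part of the Groceries analysis) and only base R.

```r
# Hypothetical toy data: two well-separated groups of points on a line.
x <- c(a = 0, b = 0.1, c = 0.2, d = 5, e = 5.1, f = 5.2)
d <- dist(x)

# Same linkage as used for the products.
hc_toy <- hclust(d, method = "ward.D2")

# cutree() returns a named integer vector: one cluster id per object,
# in the order of the original objects (not in dendrogram order).
groups <- cutree(hc_toy, k = 2)
groups

# Cluster sizes are a quick sanity check before plotting.
table(groups)
```

Because the cluster ids come back in the original object order, they can be attached to any table that keeps the same row order, which is what the MDS sections of this post rely on.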

# MDS

## MDS with cmdscale()

```
mds.cmdscale <- as.data.frame(cmdscale(as.matrix(gr.jac)))
mds.cmdscale$names <- rownames(mds.cmdscale)
mds.cmdscale$cut <- cut$cut
```

```
ggplot(mds.cmdscale, aes(V1, V2, label = names)) +
  geom_point(aes(colour = factor(cut)), size = 2.3) +
  geom_text(aes(colour = factor(cut)), check_overlap = TRUE, size = 2.2,
            hjust = "center", vjust = "bottom", nudge_x = 0, nudge_y = 0.005) +
  scale_colour_discrete(name = "clusters") +
  labs(x = "", y = "", title = "MDS by Jaccard and cmdscale()") +
  theme_bw()
```
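A two-dimensional picture is only useful if it keeps most of the structure of the dissimilarities. One way to check this for classical MDS is the `eig = TRUE` argument of `cmdscale()`, which also returns goodness-of-fit values. The sketch below uses made-up planar points (an assumption for illustration), so the fit is essentially perfect; for non-Euclidean dissimilarities the values may be well below 1.

```r
# Hypothetical toy data: five points in the plane, so a 2-D embedding can be exact.
pts <- cbind(x = c(0, 1, 2, 0, 3), y = c(0, 0, 1, 2, 3))
rownames(pts) <- letters[1:5]
d <- dist(pts)

# eig = TRUE returns a list (points, eig, GOF, ...) instead of a bare matrix.
fit <- cmdscale(d, k = 2, eig = TRUE)

# Two goodness-of-fit values based on the eigenvalues; close to 1 here.
fit$GOF

# Largest difference between input distances and distances in the embedding.
max(abs(as.matrix(dist(fit$points)) - as.matrix(d)))
```

For the products, the same call would presumably be `cmdscale(as.matrix(gr.jac), k = 2, eig = TRUE)`, with the coordinates then found in the `points` element.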

## MDS with smacofSym()

```
library(smacof)
mds.smacof <- smacofSym(as.matrix(gr.jac))
plotdata <- as.data.frame(mds.smacof$conf)
plotdata$names <- rownames(mds.smacof$conf)
plotdata$cut <- cut$cut
```

```
ggplot(plotdata, aes(D1, D2, label = names)) +
  geom_point(aes(colour = factor(cut)), size = 2.3) +
  geom_text(aes(colour = factor(cut)), check_overlap = TRUE, size = 2.2,
            hjust = "center", vjust = "bottom", nudge_x = 0, nudge_y = 0.015) +
  scale_colour_discrete(name = "clusters") +
  labs(x = "", y = "", title = "MDS by Jaccard and smacofSym()") +
  theme_bw()
```

## MDS with isoMDS()

```
library(MASS)
mds.isomds <- isoMDS(as.matrix(gr.jac), k=2)
```

```
plotdata <- mds.isomds$points
plotdata <- as.data.frame(plotdata)
plotdata$names <- rownames(mds.isomds$points)
plotdata$cut <- cut$cut
```

```
ggplot(plotdata, aes(V1, V2, label = names)) +
  geom_point(aes(colour = factor(cut)), size = 2.3) +
  geom_text(aes(colour = factor(cut)), check_overlap = TRUE, size = 2.2,
            hjust = "center", vjust = "bottom", nudge_x = 0, nudge_y = 0.05) +
  scale_colour_discrete(name = "clusters") +
  labs(x = "", y = "", title = "MDS by Jaccard and isoMDS()") +
  theme_bw()
```

Some points are not clearly separated on the plots, but overall this approach yields reasonable clusters. Of course, the clustering can still be improved. Possible areas for improvement:

- similarity/dissimilarity measure (Jaccard here)
- clustering method (hierarchical clustering with the ward.D2 method here)
- MDS method (here: classical, PCA-like MDS with cmdscale(), then smacofSym(), then isoMDS())
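
When trying out alternatives from the list above, it helps to have a number to compare, not only plots. Below is a minimal sketch of a normalized raw stress measure in base R. Note that `raw_stress()` is a hypothetical helper written for this post, not a function from any package, and it is a simplification: `smacofSym()` and `isoMDS()` report their own stress values, computed on transformed dissimilarities, so their numbers will not match this one.

```r
# Normalized raw stress: 0 means the configuration reproduces the
# dissimilarities exactly; larger values mean more distortion.
# raw_stress() is a hypothetical helper, not part of any package.
raw_stress <- function(diss, conf) {
  d_in  <- as.vector(as.dist(diss))  # input dissimilarities (lower triangle)
  d_out <- as.vector(dist(conf))     # distances between embedded points
  sqrt(sum((d_in - d_out)^2) / sum(d_in^2))
}

# Hypothetical toy data: ten points in 3-D, embedded in 2-D and in 3-D.
set.seed(1)
pts <- matrix(rnorm(30), ncol = 3)
d <- dist(pts)

raw_stress(d, cmdscale(d, k = 2))  # some distortion
raw_stress(d, cmdscale(d, k = 3))  # essentially zero
```

With the objects from this post, a call such as `raw_stress(gr.jac, mds.cmdscale[, c("V1", "V2")])` should work in the same way. Configurations from different MDS functions may be on different scales, so such values are best compared with some caution.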

Be happyR! 🙂
