# Introduction

Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. It refers to a set of related ordination techniques used in information visualization, in particular to display the information contained in a distance matrix. An MDS algorithm aims to place each object in an N-dimensional space such that the between-object distances are preserved as well as possible. Each object is then assigned coordinates in each of the N dimensions. The number of dimensions N can exceed 2 and is specified a priori; choosing N=2 optimizes the object locations for a two-dimensional scatterplot. Types and further details about MDS can be found on Wikipedia.

Suppose we have the matrix below and we want to see clusters of its rows in a 2D scatterplot. An MDS scatterplot is well suited both to finding clusters and to checking them.

```
A <- matrix(c(2,5,2,1,0,0,0,0,1,0,
              0,0,0,1,3,5,6,0,0,1,
              0,0,0,2,0,0,1,2,7,2,
              4,6,2,5,1,0,0,1,0,0,
              0,1,0,0,3,5,4,0,0,1,
              0,0,1,0,0,2,0,3,5,7,
              3,1,4,0,1,0,0,0,0,2,
              0,0,0,1,3,4,6,0,0,1), byrow=TRUE, nrow=8, ncol=10)
colnames(A) <- letters[1:10]
rownames(A) <- LETTERS[1:8]
print(A)
```

```
##   a b c d e f g h i j
## A 2 5 2 1 0 0 0 0 1 0
## B 0 0 0 1 3 5 6 0 0 1
## C 0 0 0 2 0 0 1 2 7 2
## D 4 6 2 5 1 0 0 1 0 0
## E 0 1 0 0 3 5 4 0 0 1
## F 0 0 1 0 0 2 0 3 5 7
## G 3 1 4 0 1 0 0 0 0 2
## H 0 0 0 1 3 4 6 0 0 1
```

# Distance matrix

Every type of MDS requires a distance matrix as its main argument. Fortunately, R offers many ways to create one, and there are many types of distance, so let's play a bit with them. In this post, Jaccard, cosine, and Euclidean distance matrices will be used.

```
#weighted Jaccard similarity matrix setup
sim.jac <- matrix(0, nrow=nrow(A), ncol=nrow(A))
rownames(sim.jac) <- rownames(A)
colnames(sim.jac) <- rownames(A)
#weighted Jaccard: sum of column-wise minima over sum of column-wise maxima
pairs <- t(combn(1:nrow(A), 2))
for (i in 1:nrow(pairs)){
  num <- sum(pmin(A[pairs[i,1],], A[pairs[i,2],]))
  den <- sum(pmax(A[pairs[i,1],], A[pairs[i,2],]))
  sim.jac[pairs[i,1],pairs[i,2]] <- num/den
  sim.jac[pairs[i,2],pairs[i,1]] <- num/den
}
sim.jac[which(is.na(sim.jac))] <- 0
diag(sim.jac) <- 1
#weighted Jaccard distance
dist.jac <- 1 - sim.jac
```
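The loop above can be cross-checked with a vectorized sketch (assuming the same min/max definition of weighted Jaccard): a small helper function plus `outer()` over the row indices produces the full distance matrix in one call. The matrix `B` here is a hypothetical toy example, not the post's data.

```
# Weighted Jaccard distance between two non-negative vectors:
# 1 minus (sum of element-wise minima / sum of element-wise maxima)
jaccard.dist <- function(x, y) 1 - sum(pmin(x, y)) / sum(pmax(x, y))

# Toy data for a self-contained check (any non-negative matrix works)
B <- matrix(c(2, 5, 0, 1,
              0, 0, 3, 5,
              2, 4, 0, 1), byrow = TRUE, nrow = 3)

# outer() needs a vectorized function, hence Vectorize()
D <- outer(1:nrow(B), 1:nrow(B),
           Vectorize(function(i, j) jaccard.dist(B[i, ], B[j, ])))
round(D, 3)
```

Rows 1 and 3 are nearly proportional, so their distance is small (1 - 7/8 = 0.125), while rows 1 and 2 barely overlap and come out close to 1.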

```
#weighted cosine distance: cosine() returns a similarity, so subtract it from 1
library(SnowballC); library(lsa)
dist.cos <- 1 - cosine(t(A))
diag(dist.cos) <- 0
```

```
#weighted euclidean distance
dist.eu <- as.matrix(dist(A, method = "euclidean"))
```

# MDS functions

There is a really good collection of metric MDS functions on R-bloggers by Gaston Sanchez.

Let's try some of them.

## Classical MDS with cmdscale() in the {stats} package

```
mds.jac <- as.data.frame(cmdscale(dist.jac))
mds.jac$names <- rownames(mds.jac)
mds.cos <- as.data.frame(cmdscale(dist.cos))
mds.cos$names <- rownames(mds.cos)
mds.eu <- as.data.frame(cmdscale(dist.eu))
mds.eu$names <- rownames(mds.eu)
```
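By default `cmdscale()` returns k = 2 coordinates. Passing `eig = TRUE` additionally returns the eigenvalues, and the `GOF` component indicates how much of the total variability the chosen dimensions capture. A minimal sketch on hypothetical random data:

```
# Classical MDS with goodness of fit: GOF close to 1 means the 2D
# configuration preserves the original distances well
set.seed(1)
X   <- matrix(rnorm(50), nrow = 10)        # toy data, 10 objects
fit <- cmdscale(dist(X), k = 2, eig = TRUE)
fit$points                                 # the 2D coordinates
fit$GOF                                    # two goodness-of-fit criteria in [0, 1]
```

Checking `GOF` before trusting a 2D plot is cheap insurance: a low value means the scatterplot distorts the distances considerably.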

```
library(ggplot2)
library(gridExtra)
plot1 <- ggplot(mds.jac, aes(V1, V2, label=names)) +
  geom_point(colour="blue", size=2) +
  geom_text(colour="blue", check_overlap = TRUE, size=2.5,
            hjust = "center", vjust = "bottom", nudge_x = 0, nudge_y = 0.025) +
  labs(x="", y="", title="MDS by Jaccard") + theme_bw()
plot2 <- ggplot(mds.cos, aes(V1, V2, label=names)) +
  geom_point(colour="red", size=2) +
  geom_text(colour="red", check_overlap = TRUE, size=2.5,
            hjust = "left", vjust = "bottom", nudge_x = 0.02, nudge_y = 0) +
  labs(x="", y="", title="MDS by cosine") + theme_bw()
plot3 <- ggplot(mds.eu, aes(V1, V2, label=names)) +
  geom_point(colour="green", size=2) +
  geom_text(colour="green", check_overlap = TRUE, size=2.5,
            hjust = "center", vjust = "bottom", nudge_x = 0, nudge_y = 0.4) +
  labs(x="", y="", title="MDS by Euclidean") + theme_bw()
grid.arrange(plot1, plot2, plot3, ncol=3)
```

Not so miraculously, the MDS by Euclidean distance matches the first two principal component scores of a PCA of the original matrix, just multiplied by (-1): classical MDS on Euclidean distances is equivalent to PCA up to a reflection of each axis.

```
prcomp(A)$x[,1:2]
```

```
##         PC1        PC2
## A -3.872709 -2.7896109
## B  5.863396 -0.9031562
## C -2.397281  4.9975096
## D -4.977864 -5.0032304
## E  4.491605 -0.9683761
## F -1.827234  6.7701307
## G -2.643388 -1.1950046
## H  5.363475 -0.9082621
```

```
mds.eu
```

```
##          V1         V2 names
## A  3.872709  2.7896109     A
## B -5.863396  0.9031562     B
## C  2.397281 -4.9975096     C
## D  4.977864  5.0032304     D
## E -4.491605  0.9683761     E
## F  1.827234 -6.7701307     F
## G  2.643388  1.1950046     G
## H -5.363475  0.9082621     H
```
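The sign-flip equivalence can be verified directly on hypothetical random data: classical MDS of the Euclidean distances reproduces the (centered) PCA scores up to a reflection of each axis, so the absolute values coincide.

```
# Classical MDS on Euclidean distances equals PCA scores up to a sign per axis
set.seed(42)
X  <- matrix(rnorm(40), nrow = 8)  # toy data, 8 objects
pc <- prcomp(X)$x[, 1:2]           # first two principal component scores
md <- cmdscale(dist(X), k = 2)     # classical MDS coordinates

# should be TRUE (equal up to numerical tolerance)
all.equal(abs(unname(pc)), abs(unname(md)), check.attributes = FALSE)
```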

## Non-metric MDS with isoMDS() in the {MASS} package

Non-metric MDS is performed by isoMDS() in the {MASS} package.

```
library(MASS)
iso.mds.jac <- isoMDS(dist.jac, k=2)
```

```
longData <- as.data.frame(iso.mds.jac$points)
longData$names <- rownames(iso.mds.jac$points)
plot1 <- ggplot(longData, aes(V1, V2, label=names)) +
  geom_point(colour="red", size=2) +
  geom_text(colour="red", check_overlap = TRUE, size=2.5,
            hjust = "left", vjust = "bottom", nudge_x = 0.02, nudge_y = 0) +
  labs(x="", y="", title="MDS by isoMDS()") + theme_bw()
plot2 <- ggplot(mds.jac, aes(V1, V2, label=names)) +
  geom_point(colour="blue", size=2) +
  geom_text(colour="blue", check_overlap = TRUE, size=2.5,
            hjust = "center", vjust = "bottom", nudge_x = 0, nudge_y = 0.01) +
  labs(x="", y="", title="MDS by cmdscale()") + theme_bw()
grid.arrange(plot1, plot2, ncol=2)
```

It seems that there are just 3 data points in the left figure, but all 8 are there. Looking at the coordinates you can recognize the overlapping points: A with D and G; B with E and H; C with F. These same pairs are also very close in the metric cmdscale() solution.

```
iso.mds.jac$points
```

```
##         [,1]        [,2]
## A  0.4313749 -0.32012400
## B -0.5286107 -0.09526603
## C  0.1460758  0.62326014
## D  0.4311805 -0.32029813
## E -0.5284727 -0.09541529
## F  0.1459975  0.62339164
## G  0.4310571 -0.32016603
## H -0.5286023 -0.09538230
```
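isoMDS() also reports a stress value (as a percentage), which measures how well the 2D configuration preserves the rank order of the original dissimilarities; by Kruskal's rule of thumb, values below about 5 are good and below about 10 fair. A self-contained sketch on hypothetical data:

```
# Non-metric MDS stress: lower means the 2D map better preserves
# the ordering of the original dissimilarities
library(MASS)
set.seed(7)
X   <- matrix(runif(60), nrow = 12)           # toy data, 12 objects
fit <- isoMDS(dist(X), k = 2, trace = FALSE)  # suppress iteration log
fit$stress                                    # Kruskal stress, in percent
```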

## Individual difference scaling

### With indscal() function in {SensoMineR} package

It is a special case in the MDS family that requires Napping data, i.e. products positioned on a tablecloth by panelists; their coordinates are used as input.

See the details on rdocumentation.org.

### With smacof algorithm (MDS by majorization)

The smacofRect() function does not require a distance matrix; it can work with any matrix or data frame of preferences, ratings, or dissimilarities. There is a good example on R-bloggers.

The smacofSym() function is the simplest function in the package, and the mds() function is the same. In this post I use this one.

The theoretical background of smacof is described on JSS.

Here we compare the smacof algorithm with the Jaccard distance matrix.

```
library(smacof)
mds.smacof <- smacofSym(dist.jac)
```

```
plotdata <- as.data.frame(mds.smacof$conf)
plotdata$names <- rownames(mds.smacof$conf)
plot1 <- ggplot(plotdata, aes(D1, D2, label=names)) +
  geom_point(colour="red", size=2) +
  geom_text(colour="red", check_overlap = TRUE, size=2.5,
            hjust = "left", vjust = "bottom", nudge_x = 0.02, nudge_y = 0) +
  labs(x="", y="", title="MDS by smacofSym()") + theme_bw()
plot2 <- ggplot(mds.jac, aes(V1, V2, label=names)) +
  geom_point(colour="blue", size=2) +
  geom_text(colour="blue", check_overlap = TRUE, size=2.5,
            hjust = "center", vjust = "top", nudge_x = 0, nudge_y = -0.01) +
  labs(x="", y="", title="MDS by cmdscale()") + theme_bw()
grid.arrange(plot1, plot2, ncol=2)
```
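smacofSym() also returns a normalized Stress-1 value in its `$stress` component, which gives a quantitative handle on the fit beyond the visual comparison. A minimal sketch on hypothetical data (assuming the smacof package is installed):

```
# Stress-1 of a smacof solution: a value in [0, 1], lower is better
library(smacof)
set.seed(3)
X   <- matrix(runif(40), nrow = 8)  # toy data, 8 objects
fit <- smacofSym(dist(X))           # MDS by majorization, 2 dimensions by default
fit$conf                            # the fitted 2D configuration
fit$stress                          # normalized Stress-1
```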

Be happyR! 🙂