Getting started¶
idendrogram integrates seamlessly with outputs from your favorite hierarchical clustering library, with one-line drop-ins for SciPy, HDBSCAN and scikit-learn agglomerative clustering. Dendrograms can be visualized in Plotly, Altair and Matplotlib (limited support), and switching visualization frontends is as simple as passing an argument to the plot() function.
Basic usage¶
import idendrogram
import scipy.cluster.hierarchy as sch
from sklearn.datasets import load_iris
from idendrogram.targets.altair import to_altair
# load example data (the iris dataset, as in the examples below)
data = load_iris(as_frame=True)
# cluster the data
linkage_matrix = sch.linkage(
    data['data'], method='single', metric='euclidean'
)
threshold = 0.8
flat_clusters = sch.fcluster(
    linkage_matrix, t=threshold, criterion='distance'
)
# wrap clustering outputs / parameters into a container
cl_data = idendrogram.ClusteringData(
    linkage_matrix=linkage_matrix,
    cluster_assignments=flat_clusters
)
# pass to idendrogram and visualize
idd = idendrogram.idendrogram()
idd.set_cluster_info(cl_data)
dendrogram = idd.create_dendrogram(truncate_mode='level', p=10)
to_altair(dendrogram=dendrogram, height=200, width=629)
Clustering library integration¶
scipy.cluster.hierarchy¶
idendrogram is built to support SciPy's hierarchical clustering data structures (linkage matrix and flat cluster assignments). As a result, using idendrogram is as simple as passing it the outputs of the scipy.cluster.hierarchy.linkage and scipy.cluster.hierarchy.fcluster functions.
import scipy.cluster.hierarchy as sch
from sklearn.datasets import load_iris
import idendrogram
# do the usual scipy hierarchical clustering
data = load_iris(as_frame=True)
linkage_matrix = sch.linkage(
    data['data'], method='single', metric='euclidean'
)
flat_clusters = sch.fcluster(
    linkage_matrix, t=0.8, criterion='distance'
)
# pass it to idendrogram and visualize
cl_data = idendrogram.ClusteringData(
    linkage_matrix=linkage_matrix,
    cluster_assignments=flat_clusters
)
idd = idendrogram.idendrogram()
idd.set_cluster_info(cl_data)
idd.create_dendrogram().plot(
    backend='altair',
    height=200, width=629,
)
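For readers less familiar with the two SciPy structures mentioned above, the following is a minimal sketch of what they look like. It uses a small synthetic dataset (an assumption made for illustration; the guide itself uses iris) and only SciPy and NumPy calls, so it says nothing about idendrogram's internals.

```python
import numpy as np
import scipy.cluster.hierarchy as sch

# Synthetic data (illustration only): two tight, well-separated blobs.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(5, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(5, 2)),
])

linkage_matrix = sch.linkage(points, method='single', metric='euclidean')
flat_clusters = sch.fcluster(linkage_matrix, t=0.8, criterion='distance')

# One row per merge: [child_a, child_b, merge_distance, size_of_new_cluster],
# so the matrix has shape (n_samples - 1, 4).
print(linkage_matrix.shape)

# One integer label per original observation; labels start at 1.
print(flat_clusters.shape)
print(np.unique(flat_clusters))
```

These two arrays are what the example above passes as linkage_matrix and cluster_assignments.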
Using previously created SciPy dendrogram objects¶
In some situations, you may have a dendrogram object created by SciPy that you want to visualize using idendrogram. That's possible, too.
# create a scipy dendrogram object
D = sch.dendrogram(
    linkage_matrix,
    p=4, truncate_mode="level",
    no_plot=True
)
If you have just the dendrogram object (and not the underlying linkage matrix), you cannot compute/plot the nodes in the dendrogram, but you can still use the available backends.
# pass it to idendrogram and visualize
idd = idendrogram.idendrogram()
idd.convert_scipy_dendrogram(D, compute_nodes=False).plot(
    backend='altair',
    height=200, width=629,
    show_nodes=False
)
Note that not all customization functionality is available when using SciPy's dendrogram objects. In most cases, it is recommended that you generate the dendrogram with idendrogram itself.
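To see what a SciPy dendrogram object actually carries, the sketch below inspects the dictionary returned by scipy.cluster.hierarchy.dendrogram with no_plot=True. The data is synthetic and used for illustration only. The dictionary holds drawing coordinates and leaf labels rather than the merge table itself, which is presumably why node information cannot be recovered from it alone; that reading is an inference, not a statement about idendrogram's implementation.

```python
import numpy as np
import scipy.cluster.hierarchy as sch

rng = np.random.default_rng(0)
points = rng.normal(size=(12, 2))  # synthetic data, illustration only
linkage_matrix = sch.linkage(points, method='single', metric='euclidean')

# no_plot=True returns the dictionary without drawing anything
D = sch.dendrogram(linkage_matrix, p=4, truncate_mode='level', no_plot=True)

print(sorted(D.keys()))

# Each link is a U-shape described by four x-coordinates ('icoord')
# and four y-coordinates ('dcoord').
print(len(D['icoord']), len(D['dcoord']))

# Leaf labels; truncated branches appear as counts in parentheses.
print(D['ivl'])
```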
sklearn.cluster.AgglomerativeClustering¶
To use scikit-learn agglomerative clustering outputs, wrap the fitted model object with idendrogram.ScikitLearnClusteringData before passing it to idendrogram.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris
import idendrogram
data = load_iris(as_frame=True)
# do the usual scikit-learn hierarchical clustering
model = AgglomerativeClustering(
    distance_threshold=0.8,
    linkage='single',
    n_clusters=None
).fit(data['data'])
# pass it to idendrogram and visualize
idd = idendrogram.idendrogram()
idd.set_cluster_info(idendrogram.ScikitLearnClusteringData(model))
idd.create_dendrogram().plot(
    backend='altair',
    height=200, width=629
)
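As background, a fitted AgglomerativeClustering model exposes children_, distances_ and labels_, which together contain the same information as a SciPy linkage matrix. The sketch below rebuilds such a matrix by hand, following the approach commonly shown in scikit-learn's own dendrogram example. It is not a description of what idendrogram.ScikitLearnClusteringData does internally, and the synthetic data is an assumption made for illustration.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from sklearn.cluster import AgglomerativeClustering

# Synthetic data (illustration only): two tight, well-separated blobs.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(5, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(5, 2)),
])

# Setting distance_threshold makes scikit-learn build the full tree
# and populate model.distances_.
model = AgglomerativeClustering(
    distance_threshold=0.8, linkage='single', n_clusters=None
).fit(points)

# Count the original observations under each merge node.
n_samples = len(model.labels_)
counts = np.zeros(model.children_.shape[0])
for i, (left, right) in enumerate(model.children_):
    for child in (left, right):
        # indices below n_samples are leaves; the rest are earlier merges
        counts[i] += 1 if child < n_samples else counts[child - n_samples]

# Columns: child_a, child_b, merge distance, cluster size
Z = np.column_stack([model.children_, model.distances_, counts]).astype(float)

print(sch.is_valid_linkage(Z))
print(model.n_clusters_)
```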
HDBSCAN¶
idendrogram can visualize HDBSCAN clustering results, too. Wrap the HDBSCAN model object with idendrogram.HDBSCANClusteringData before passing it as clustering information. The model object is available via the cluster_data.get_model() function in all callback functions (see the case studies for ideas on how it can be leveraged).
import hdbscan
# `data` is the iris dataset loaded in the previous example
clusterer = hdbscan.HDBSCAN()
clusterer.fit(data['data'])
# pass it to idendrogram and visualize
idd = idendrogram.idendrogram()
idd.set_cluster_info(idendrogram.HDBSCANClusteringData(clusterer))
idd.create_dendrogram().plot(
    backend='altair',
    height=200, width=629
)
Visualization backend support¶
idendrogram can visualize dendrograms in Plotly, Altair and Matplotlib (limited support). Switching visualization frontends is as simple as passing an argument to the plot() function.
Alternatively, you can use the helper functions available in idendrogram.targets (to_altair, to_plotly, to_matplotlib, to_json).
import scipy.cluster.hierarchy as sch
from sklearn.datasets import load_iris
import idendrogram
from matplotlib import pyplot as plt
# do the usual scipy hierarchical clustering
data = load_iris(as_frame=True)
linkage_matrix = sch.linkage(
    data['data'], method='single', metric='euclidean'
)
flat_clusters = sch.fcluster(
    linkage_matrix, t=0.8, criterion='distance'
)
# pass it to idendrogram and visualize
cl_data = idendrogram.ClusteringData(
    linkage_matrix=linkage_matrix,
    cluster_assignments=flat_clusters
)
idd = idendrogram.idendrogram()
idd.set_cluster_info(cl_data)
dendrogram = idd.create_dendrogram()
Altair¶
dendrogram.plot(
    backend='altair',
    height=200, width=629,
)
Plotly¶
dendrogram.plot(
    backend='plotly',
    height=400, width=650,
)
Matplotlib¶
dendrogram.plot(
    backend='matplotlib',
    height=300, width=750,
    show_nodes=True
)
plt.show()
Compatibility differences¶
idendrogram aims to produce identical outputs regardless of which visualization frontend is used (visualization library defaults notwithstanding). However, this is not always possible. Some caveats include:
- symlog scale type is not supported by Plotly;
- ClusterLink.strokedash property may need to be tweaked to achieve identical-looking results across the libraries;
- Matplotlib functionality is limited to static charts; an attempt is made to convert most size parameters to pixels (in line with Altair and Plotly), but some differences may remain.
If idendrogram does not produce the required result out of the box, see the Customizing other attributes section for guidance on making further customizations that fit your needs.
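On the Matplotlib caveat above: Matplotlib sizes figures in inches at a given DPI, whereas Altair and Plotly take pixel dimensions, so some conversion is needed to get comparable output. The helper below is a hypothetical sketch of that arithmetic (pixels_to_figsize is not part of idendrogram, and the library's actual conversion may differ). Its result could be passed to Matplotlib's figsize parameter together with a matching dpi.

```python
from typing import Tuple


def pixels_to_figsize(
    width_px: float, height_px: float, dpi: float = 100.0
) -> Tuple[float, float]:
    """Convert pixel dimensions to a (width, height) tuple in inches.

    Hypothetical helper for illustration; not an idendrogram function.
    """
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return (width_px / dpi, height_px / dpi)


# Same pixel dimensions as the Matplotlib example above
figsize = pixels_to_figsize(750, 300, dpi=100)
print(figsize)  # (7.5, 3.0)
```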