Interactive circle packing plots

Introduction

I was looking for a way to visualize hierarchical categorical data that goes beyond the usual bar graphs. This zoomable D3 circle packing visualization, built with the circlepackeR package, draws a series of nested circles that you can click on to zoom in and out.

To learn more, please see the official documentation by the package author.

If you want to try this yourself, click on "Remix" in the upper right corner to get a copy of the notebook in your own workspace. Please remember to import both the Python (circle_packing_Python) and R (circle_packing_R) runtimes from this notebook under "Runtime Settings" to ensure that you have all the installed packages and can start right away.

Import and pre-process data

As usual, we will use the IBM Telco customer churn dataset, which I have cleaned up in a previous post.

Since I'm quite a bit more comfortable with data wrangling in Python, I will first get the number of customers in each level of every categorical variable using pandas:

## Import data
import pandas as pd

df = pd.read_csv("https://github.com/nchelaru/data-prep/raw/master/telco_cleaned_renamed.csv")

## Get categorical column names
cat_list = [] 

for col in df.columns:
  if df[col].dtype == object:
    cat_list.append(col)
    
## Get all possible levels of every categorical variable and number of data points in each level
cat_levels = {}

for col in cat_list:
  levels = df[col].value_counts().to_dict()
  cat_levels[col] = levels
  
## Convert nested dictionary to dataframe
nestdict = pd.DataFrame(cat_levels).stack().reset_index()

nestdict.columns = ['Level', 'Category', 'Population'] 

## Output data to file
nestdict.to_csv("./results/nested_dict.csv")

## Preview dataframe
nestdict.head()
                       Level         Category  Population
0  Bank transfer (automatic)    PaymentMethod      1542.0
1                      Churn            Churn      1869.0
2    Credit card (automatic)    PaymentMethod      1521.0
3                        DSL  InternetService      2416.0
4                 Dependents       Dependents      2099.0
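
To make the reshaping step easier to follow, here is a minimal plain-Python sketch of the same idea: turning a nested dictionary of counts into one row per (Level, Category, Population). The helper name `flatten_levels` is hypothetical, and the toy dictionary holds only a few counts taken from the preview above, so it is a subset for illustration and not the full dataset. It approximates the shape of the pandas result without reproducing pandas itself.

```python
# Toy subset of cat_levels, using only counts shown in the preview above.
cat_levels = {
    "PaymentMethod": {
        "Bank transfer (automatic)": 1542,
        "Credit card (automatic)": 1521,
    },
    "InternetService": {"DSL": 2416},
}


def flatten_levels(nested):
    """Return one (Level, Category, Population) tuple per variable level."""
    rows = []
    for category, levels in nested.items():
        for level, population in levels.items():
            rows.append((level, category, population))
    # Sort by level name to mimic the ordering seen in the preview.
    return sorted(rows)


rows = flatten_levels(cat_levels)
for row in rows:
    print(row)
```

Each tuple corresponds to one row of the long-format table, which is the shape the plotting step expects.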

Create circle packing visualization

Now we will take the prepared data and move to R for making the plot:

## List installed (non-base) packages and their versions
ip <- as.data.frame(installed.packages()[,c(1,3:4)])
rownames(ip) <- NULL
ip <- ip[is.na(ip$Priority),1:2,drop=FALSE]
print(ip, row.names=FALSE)

## Import libraries
library(tidyverse)
library(circlepackeR)  
library(hrbrthemes)
library(htmlwidgets)
library(data.tree)

## Import data
## (adjust the path if needed; the Python step wrote the file to ./results/nested_dict.csv)
nestdict <- read.csv("nested_dict.csv")

## Prepare data format
nestdict$pathString <- paste("world", nestdict$Category, nestdict$Level, sep = "/")

population <- as.Node(nestdict)

## Make the plot
x <- circlepackeR(population, size = "Population", color_min = "hsl(56,80%,80%)", color_max = "hsl(341,30%,40%)")

## Save widget to HTML file for display
saveWidget(x, 'widget.html')
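
The pathString step joins a root label, the variable name, and the level name into a path such as world/InternetService/DSL, and as.Node reads those paths to build the tree. As a rough illustration only, the Python sketch below builds the same kind of nesting with plain dictionaries. The function name `build_tree` is hypothetical, the rows are a small subset of the preview, and the sketch assumes no name contains a "/". It is not how data.tree is implemented.

```python
def build_tree(rows, root="world"):
    """Nest (level, category, population) rows under root/category/level."""
    tree = {}
    for level, category, population in rows:
        # Same shape as the pathString column built in the R code.
        path = "/".join([root, category, level])
        parts = path.split("/")  # assumes names contain no "/"
        node = tree
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = population  # leaf holds the customer count
    return tree


# A few rows taken from the preview table (subset, for illustration).
toy_rows = [
    ("Bank transfer (automatic)", "PaymentMethod", 1542),
    ("Credit card (automatic)", "PaymentMethod", 1521),
    ("DSL", "InternetService", 2416),
]

tree = build_tree(toy_rows)
print(tree)
```

The root holds one child per categorical variable, and each variable holds one leaf per level, which mirrors the nesting of circles in the plot.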

Finally, move the HTML file to the results folder so we can visualize it. Try clicking on the circles!

mv widget.html ./results

At a glance, the sizes of the circles in the second level (one circle per level of a variable, sized by its customer count) give a quick overview of how customers are distributed across the levels of each categorical variable. Click on the circles to zoom in and out!
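
If you want numbers to go with the visual impression, the relative distribution within one variable is simply each level's count divided by the variable's total. A minimal sketch, where the function name `level_shares` and the counts are hypothetical and not taken from the Telco dataset:

```python
def level_shares(levels):
    """Return each level's share of the variable's total count."""
    total = sum(levels.values())
    return {name: count / total for name, count in levels.items()}


# Hypothetical counts for one made-up variable, not taken from the dataset.
toy_levels = {"A": 30, "B": 10}
shares = level_shares(toy_levels)
print(shares)
```

Applied to each entry of the nested counts dictionary, this would give the proportions that the circle sizes convey visually.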

When the occasion is right, this could be a really fun way to add some pizzazz to your visualizations. :)