Adventures in Computing: Shell Command Visualization
When you spend enough time on the command line, you start to notice
that you rely on a handful of commands. Things like cd, git, and
sudo come up constantly when you use Linux as your daily
driver. Inspired by this Reddit post, I decided to create a tool to
visualize that usage.
Streaming data
Different shells store their history in different file formats, so a
tool that parses the history files directly would be time-consuming to
write. But most shells provide a history command that prints numbered
output like this:
 9989  conda activate data_analytics
 9990  ipython
 9991  make
 9992  make clean
 9993  make
 9994  bat
 9995  make
 9996  cd ~/work
 9997  cd ~/work/byui/instructor-tools/grade-trends
 9998  ls
 9999  eog output/grade_trend_all.png
10000  ipython
This can be streamed to a tool with a Unix pipe:
history | some-tool
Here’s how I did this in Python:
import re
import sys

cmds = []
# Iterating over stdin visits every line, including the first one
for line in sys.stdin:
    # Each history line looks like "<number>  <command ...>"
    matches = re.match(r"\s*[0-9]+  (.*)", line.strip())
    if not matches:
        continue
    parts = matches.group(1).split(" ")
    cmd = parts.pop(0)
This works well enough, but in a *nix shell you can set environment
variables for the duration of a command like this:
ENV_VAR=foo some-command
We don’t want to see environment variables in the output, so we need to skip those:
# A token such as ENV_VAR=foo is an assignment, not the command itself
while re.match(r"[a-zA-Z_][a-zA-Z_0-9]*=", cmd):
    cmd = parts.pop(0) if parts else ""
At this point, I realized that sudo “hides”, in a sense, the
commands that are actually being executed. When I run sudo emacs,
I’m not running sudo to run sudo; I’m running Emacs with elevated
privileges, which is done with the sudo command. It might be nice to
be able to strip the sudo out and see the underlying command. I
added argument parsing with argparse and updated the loop that finds
the actual command:
while re.match(r"[a-zA-Z_][a-zA-Z_0-9]*=", cmd) or (
    args.strip_sudo and cmd == "sudo"
):
    # Fall back to an empty string when nothing follows the prefix
    cmd = parts.pop(0) if parts else ""
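The argument parser itself is not shown in this post. A minimal sketch of what it might look like follows, assuming the --strip-sudo flag used later on and an -n option for the number of commands to plot. The spelling of -n, its default of 20, and the build_parser helper name are assumptions for illustration only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two options used in this post (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        description="Visualize shell command usage from history output."
    )
    # store_true leaves args.strip_sudo False unless the flag is given
    parser.add_argument(
        "--strip-sudo",
        action="store_true",
        help="count the command that follows sudo instead of sudo itself",
    )
    # Assumed spelling and default; the post only shows that args.n exists
    parser.add_argument(
        "-n",
        type=int,
        default=20,
        help="number of most-used commands to plot",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv
args = build_parser().parse_args(["--strip-sudo", "-n", "10"])
print(args.strip_sudo, args.n)
```

argparse turns the dash in --strip-sudo into an underscore, which is why the loop reads args.strip_sudo.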
The full loop is now:
for line in sys.stdin:
    matches = re.match(r"\s*[0-9]+  (.*)", line.strip())
    if not matches:
        continue
    parts = matches.group(1).split(" ")
    cmd = parts.pop(0)
    # Skip leading VAR=value assignments and, optionally, sudo
    while re.match(r"[a-zA-Z_][a-zA-Z_0-9]*=", cmd) or (
        args.strip_sudo and cmd == "sudo"
    ):
        cmd = parts.pop(0) if parts else ""
    if cmd:
        cmds.append(cmd)
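To make the parsing logic easier to test in isolation, the body of the loop can be pulled into a function. The sketch below is one way to do that, not the script's actual structure. The names extract_command, ENV_ASSIGNMENT, and HISTORY_LINE are hypothetical, and it assumes history lines of the form shown earlier (a number, two spaces, then the command).

```python
import re
from typing import List, Optional

# A leading token such as FOO=bar is an environment assignment, not a command
ENV_ASSIGNMENT = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*=")
HISTORY_LINE = re.compile(r"\s*[0-9]+  (.*)")


def extract_command(line: str, strip_sudo: bool = False) -> Optional[str]:
    """Return the command name from one line of history output, or None."""
    matches = HISTORY_LINE.match(line.strip())
    if not matches:
        return None
    # split() with no argument also collapses repeated spaces
    parts: List[str] = matches.group(1).split()
    while parts:
        cmd = parts.pop(0)
        if ENV_ASSIGNMENT.match(cmd) or (strip_sudo and cmd == "sudo"):
            continue  # skip the prefix and look at the next token
        return cmd
    return None  # the line held only assignments (or a bare sudo)


sample = [
    " 9996  cd ~/work",
    " 9997  ENV_VAR=foo some-command",
    " 9998  sudo emacs /etc/hosts",
    "not a history line",
]
print([extract_command(line, strip_sudo=True) for line in sample])
```

Returning None for lines that do not parse keeps the caller simple: it only has to drop the None values before counting.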
Now we can count everything with collections.Counter and load
the result into a pandas DataFrame:
df = pd.DataFrame(Counter(cmds).items(), columns=["command", "count"])
The number of commands to plot can be controlled with a flag:
n_largest = df.nlargest(args.n, ["count"])
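If pandas is more than you need for this step, Counter.most_common from the standard library returns the same top-n selection directly. This is an alternative sketch rather than what the script does, and the sample command list is made up for illustration.

```python
from collections import Counter

# Made-up sample of parsed commands, standing in for the cmds list
cmds = ["make", "cd", "make", "ls", "make", "cd", "ipython"]

counts = Counter(cmds)
# most_common(n) returns (command, count) pairs, highest count first
top = counts.most_common(2)
labels = [command for command, _ in top]
values = [count for _, count in top]
print(labels, values)
```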
Now we’re on to the plotting. Here’s the function signature:
from typing import List

import matplotlib.pyplot as plt
import numpy as np


def circle_plot(
    data: np.ndarray,
    labels: List[str],
    max_length: int = 100,
    ylim_min: int = -50,
    cmap: str = "viridis",
    label_padding: int = 5,
    background_color: str = "gray",
):
    """ Produce circular plot of data
    Args:
      data (np.ndarray): Data array to plot
      labels (List[str]): String labels for data
      max_length (int, optional): Maximum length of bars, defaults to 100
      ylim_min (int, optional):
        Minimum y-value of plot, used to tune how close the bottoms of
        the bars are to each other, defaults to -50
      cmap (str, optional): Matplotlib colormap to use, defaults to 'viridis'
      label_padding (int, optional):
        Padding between labels and the end of bars, defaults to 5
      background_color (str, optional):
        Background color of plot, defaults to 'gray'
    """
We start by normalizing the data:
# Normalize data to the range [0, 1]
data = np.asarray(data)
data_max = np.max(data)
data_min = np.min(data)
# Note: this divides by zero if every value in data is the same
data_norm: np.ndarray = (data - data_min) / (data_max - data_min)
Normalizing the data gives us values in the closed interval
[0, 1], mapping the smallest count to 0 and the largest count
to 1. This makes the length of each bar relative to the others,
so the bars fill the same space no matter how large the raw counts
are. Keep in mind that the scaling is linear: a command that's used
much more often than the next-closest one will still tower over the
rest, and the least-used command in the plot gets a bar of zero length.
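As a worked example of min-max normalization, here is the same formula applied to a small list of counts in plain Python. The min_max_normalize helper and the sample counts are assumptions for illustration, and the guard for the case where all counts are equal is an addition that the snippet above does not have.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # All values are equal: avoid dividing by zero (assumed fallback)
        return [1.0 for _ in values]
    return [(value - lowest) / span for value in values]


counts = [50, 20, 10]
print(min_max_normalize(counts))  # [1.0, 0.25, 0.0]
```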
Next, we compute the bars:
# Compute bar characteristics
bar_width: float = 2 * np.pi / len(data)
bar_angles: np.ndarray = np.arange(1, len(data) + 1) * bar_width
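To see what these two lines produce, here is the same arithmetic in plain Python for a plot with four bars. The bar_layout helper is a hypothetical name used only for this sketch.

```python
import math
from typing import List, Tuple


def bar_layout(n_bars: int) -> Tuple[float, List[float]]:
    """Return the angular width of each bar and the angle of each bar, in radians."""
    bar_width = 2 * math.pi / n_bars
    # Bars sit at 1, 2, ..., n multiples of the width; the last lands on 2*pi
    bar_angles = [i * bar_width for i in range(1, n_bars + 1)]
    return bar_width, bar_angles


width, angles = bar_layout(4)
# Print the layout in degrees, rounded for readability
print(round(math.degrees(width)), [round(math.degrees(a)) for a in angles])
```

With four bars, each one spans a quarter of the circle, and the final bar sits at a full turn, which is the same direction as an angle of 0.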
To plot the bars in a circle, we use a polar projection, which changes
how the arguments to Axes.bar are handled, and then we plot the
bars:
fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
if isinstance(cmap, str):
    cmap = plt.get_cmap(cmap)
bars = ax.bar(
    bar_angles,
    data_norm * max_length,
    width=bar_width * 0.9,
    color=cmap(data_norm),
    zorder=10,
)
To get things nicely scaled, a y-axis limit was determined empirically, and the ticks are removed for aesthetics:
ylim_max: float = max_length * 2.3
ax.set_ylim(ylim_min, ylim_max)
ax.set_yticks([])
ax.set_xticks([])
To plot the labels, we do some quick trigonometry and figure out where on the unit circle the bar angle will be. This lets us set the rotation and alignment.
for label, count, bar_angle, bar in zip(labels, data, bar_angles, bars):
    rot = np.degrees(bar_angle)
    if np.pi / 2 <= bar_angle <= 3 * np.pi / 2:
        rot += 180
        rot %= 360
        alignment = "right"
    else:
        alignment = "left"
    ax.text(
        bar_angle,
        bar.get_height() + label_padding,
        s=f"{label} - {count}",
        va="center",
        rotation=rot,
        rotation_mode="anchor",
        ha=alignment,
    )
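The rotation rule in the loop above can be checked on its own, without drawing anything. The sketch below repeats the same branch in a small function; label_orientation is a hypothetical name, and it uses the math module in place of NumPy so it runs without extra dependencies.

```python
import math
from typing import Tuple


def label_orientation(bar_angle: float) -> Tuple[float, str]:
    """Return (rotation in degrees, horizontal alignment) for a bar's label."""
    rot = math.degrees(bar_angle)
    if math.pi / 2 <= bar_angle <= 3 * math.pi / 2:
        # Left half of the circle: flip the text so it is not upside down
        return (rot + 180) % 360, "right"
    return rot, "left"


# One angle from the right half, one on the left, one back on the right
for angle in (math.pi / 4, math.pi, 7 * math.pi / 4):
    print(label_orientation(angle))
```

Labels on the left half are flipped by 180 degrees and right-aligned, so their text still reads left to right while staying anchored to the end of the bar.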
Finally, we apply some additional aesthetics and return the figure and axes:
ax.grid(False)
ax.set_facecolor(background_color)
fig.set_facecolor(background_color)
for spine in ax.spines.values():
    spine.set_visible(False)
return fig, ax
The outputs of the script:

Running with the --strip-sudo flag:

The code is available here, if you’d like to try it out yourself!