Clojure src, test, and meta

One domain's metadata is another domain's data. What differentiates them is context. I argue in this post that we need to consider a new, additional source code context for our Clojure projects: in addition to the ubiquitous src and test folders, we need to add a meta folder.

Metadata is data that describes a context in which the data it annotates bears significance beyond the unadorned sum of its parts. Notably, metadata does not define the context, but a context. Values are the universals, whereas names we give to values and collections of them establish contexts through which we begin to synthesize information from our data.

For this reason, Clojure supports adding metadata to symbols and collections, but not to all arbitrary values, as expressed in Clojure's metadata reference documentation:

"Symbols and collections support metadata, a map of data about the symbol or collection. The metadata system allows for arbitrary annotation of data...[But metadata] is not considered to be part of the value of an object. As such, metadata does not impact equality (or hash codes). Two objects that differ only in metadata are equal...[but] an object with different metadata is a different object."
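To make the equality rule concrete, here is a minimal sketch at the REPL. The names plain and annotated and the :source key are mine, chosen purely for illustration.

```clojure
;; A plain vector and a copy of it that carries metadata.
(def plain [1 2 3])
(def annotated (with-meta plain {:source "sensor-a"}))

;; Metadata is not part of the value, so equality and hashing ignore it.
(= plain annotated)               ; => true
(= (hash plain) (hash annotated)) ; => true

;; But with-meta returns a new object rather than mutating the original.
(identical? plain annotated)      ; => false

;; The annotation is available to whoever asks for it.
(meta annotated)                  ; => {:source "sensor-a"}
```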

If you need a refresher on how Clojure itself leverages metadata, the following section summarizes it.

Clojure core metadata uses

The Clojure reader adds metadata to certain forms for tracking the :line and :column of the form's position in its source code file. Certain macros support adding :doc metadata when a string argument is encountered in the correct position. Clojure core forms include an :added entry to denote the Clojure version in which they were added to the language. Function definitions using defn receive an :arglists list based on the function signature, and accept a :tag entry to specify the return type of the function. Macro definitions, which otherwise look like functions, have an extra :macro metadata entry. The :inline and :inline-arities entries indicate conditions under which Clojure should inline a function call. Top-level vars receive automatic :name and :ns metadata; they can be tagged as :dynamic to denote they can be rebound dynamically via binding; they can be marked as :private to indicate they should not be referenced outside their defining namespace. Clojure's built-in testing support looks for a :test function in metadata.
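Several of these entries are easy to inspect at the REPL. The following sketch defines two throwaway vars (area and *units* are hypothetical names of my own) and then reads back the metadata that Clojure attached to them and to a couple of core vars.

```clojure
;; defn- marks the var :private; the docstring becomes :doc.
(defn- area
  "Returns the area of a rectangle."
  [w h]
  (* w h))

;; ^:dynamic is reader shorthand for the metadata map {:dynamic true}.
(def ^:dynamic *units* :cm)

(:arglists (meta #'area))      ; => ([w h])
(:doc (meta #'area))           ; => "Returns the area of a rectangle."
(:private (meta #'area))       ; => true
(:dynamic (meta #'*units*))    ; => true

;; Source position, recorded when the defining form was read.
(select-keys (meta #'area) [:line :column])

;; Core vars carry the same kinds of entries.
(:added (meta #'map))          ; => "1.0"
(:macro (meta #'when))         ; => true
(ns-name (:ns (meta #'map)))   ; => clojure.core
```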

As of Clojure 1.10, protocols that opt in via :extend-via-metadata can also be implemented through metadata on individual values that support metadata, rather than only by extending the protocol to types. This extension supports implementing clojure.core.protocols/nav such that specific entries within Clojure collections can be "navigated to," which in spirit returns a form of metadata about those items.
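As a minimal sketch of per-value protocol extension, consider the following. The Describable protocol and the order map are hypothetical, invented for this example. The protocol has to opt in with :extend-via-metadata, and the metadata key is the fully qualified symbol naming the protocol function, which syntax-quote produces for us.

```clojure
;; The protocol must opt in to metadata-based extension.
(defprotocol Describable
  :extend-via-metadata true
  (describe [this] "Returns a human-readable description."))

;; This one map implements describe through its metadata.
;; No type is extended, so other maps are unaffected.
(def order
  (with-meta {:id 42 :total 99.5}
    {`describe (fn [this] (str "Order #" (:id this)))}))

(describe order) ; => "Order #42"
```

Calling describe on a map without that metadata entry throws an IllegalArgumentException, because no implementation exists for the map type itself.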

In practice, Clojure metadata is often applied in situ—within an ns, def, or defn form. This serves a purpose, but presents the following challenges:

  1. The conceptual scope of this in-place metadata is naturally limited.
  2. Significant amounts of metadata add visual noise to the source code file, making it more onerous to work on the primary Clojure code responsible for the program's implementation.

What do I mean by conceptual scope of metadata? I mean primarily the audience, the human being(s) or the programs intended as consumers of the given metadata. Metadata describes contexts, but there are scopes to those contexts that are relevant to different audiences, at different times, while engaged in different phases of thinking, designing, building, and debugging.

Even though metadata is separate from its target data, some metadata intimately affects the way that a Clojure program can use that data. Such metadata should clearly be applied in place, as is done today, when the Clojure compiler is the audience.

But what of higher-level scopes? What of parent metadata scopes that encompass not only individual functions, but groups of functions, entire namespaces, whole programs, other projects, external systems, entirely different fields of intellectual endeavor?

If we limit ourselves to in-place metadata or even metadata applied via alter-meta! at a file's end, we deprive ourselves of sufficient breathing room to address these higher-level contexts. If we clear our digital desk and make a separate meta space dedicated to the essential work of annotating our programs for different audiences (often ourselves during different phases of work), we unfetter our src code and are free to apply metadata of arbitrary complexity and scope to forms defined in our src folder. And why stop at src? We can annotate tests, helper functions in a top-level dev folder, namespaces in third-party libraries we depend on, etc. Higher-level metadata should not be constrained by classpath borders. And while such metadata should be defined in one place, it often needs to be referenced from many disparate locations to ensure pertinent information is close at hand across a code base.
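Here is a rough sketch of what annotating from a distance could look like with nothing more than alter-meta!. The myapp.orders and myapp.orders-meta namespaces, the file paths in the comments, and the :myapp/... keys are all hypothetical. In a real project the two namespaces would live in separate files.

```clojure
;; src/myapp/orders.clj (hypothetical implementation namespace)
(ns myapp.orders)

(defn total
  "Sums the :price of each line item."
  [line-items]
  (reduce + (map :price line-items)))

;; meta/myapp/orders_meta.clj (hypothetical metadata namespace)
(ns myapp.orders-meta
  (:require [clojure.string]
            [myapp.orders]))

;; Annotate our own var without touching its source file.
(alter-meta! #'myapp.orders/total assoc
             :myapp/audience :finance-team
             :myapp/notes "Totals exclude tax and shipping.")

;; Vars in libraries we depend on can be annotated the same way.
(alter-meta! #'clojure.string/blank? assoc
             :myapp/notes "We treat nil and whitespace-only input as missing.")

(:myapp/audience (meta #'myapp.orders/total)) ; => :finance-team
```

Because alter-meta! with assoc merges into the existing map, the :doc and :arglists entries that defn added remain in place.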

I propose that meta be a top-level folder in every Clojure project. Where classpath and packaging needs dictate otherwise, separate <namespace>_meta.clj or <package>/meta.clj files can be used in the appropriate folders instead. Either way, the goal is a canvas of sufficient size and independence from individual code forms for articulating information about our systems to as many degrees and for as many audiences as needed. From that vantage point, we can then adorn all pertinent namespaces and vars with that metadata.
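For projects built with the Clojure CLI, one way to put a top-level meta folder on the classpath is sketched below. This deps.edn is an assumption on my part rather than a prescribed layout. It keeps meta out of the default paths and adds it through aliases, which is only one of several reasonable packaging choices.

```clojure
;; deps.edn (hypothetical project configuration)
{:paths ["src"]
 :aliases
 {;; Load metadata namespaces during interactive development.
  :dev  {:extra-paths ["dev" "meta"]}
  ;; Load them during test runs as well, so they can be exercised.
  :test {:extra-paths ["test" "meta"]}}}
```

With this layout, a file at meta/myapp/orders_meta.clj would define the namespace myapp.orders-meta, following the usual mapping from file paths to namespace names.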

Counterpoint: What kind of metadata deserves this treatment in practice and how would it be globally discoverable if authored as Clojure metadata in separate files? This very question exposes a causality dilemma: without sufficiently powerful tools to view, test, search through and query for metadata, I personally have opted to record information-rich metadata in the form of separate documentation, standalone tutorials, tests in automated suites, and unfortunately in the imperfect mind-meld of teams' tribal knowledge. For Clojure developers, it would be ideal for all that information to be available directly at the REPL, connected to and referenced from source code that underpins it, but that information should also be deliverable as published documentation.

Metazoa: View, test, search, and query Clojure metadata

I have released a new library called Metazoa to provide such tools for Clojure metadata. Metazoa features an extensible API for viewing metadata interactively, rendering it as documents, exercising and testing it to ensure ongoing validity, and discovering metadata on the classpath via full-text search and Datalog querying. It also ships with a handful of types of metadata (called metadata providers) that take advantage of those tools: executable examples, function tables, interactive tutorials at the REPL, and structured documents that can contain other metadata provider values as nodes.

To dive into Metazoa itself, pull in the library and try out its introductory tutorial at the REPL by evaluating:

(require '[glossa.metazoa :as meta])
(meta/view 'glossa.metazoa ::meta/tutorial)

I have also posted an introductory video on YouTube with questionable choices of font size and color scheme, wherein I walk and talk through the same tutorial.

The commit history should reveal that the project is young and there are still many aspects to be documented, but I hope you find it a useful starting point for taking better advantage of Clojure's metadata to capture and share knowledge about your systems.



Copyright © 2024 Daniel Gregoire