XCVB Internals

This file describes the current internals of XCVB.

Unless you check it out as of an official major release as tagged in git, it may or may not be up to date.

Section Design Notes is a general information dump about XCVB internals. Section Current Files attempts to actually be current in guiding you through the parts that constitute XCVB. Section Good places to extend XCVB offers general rules on how to extend XCVB. Section High-level description tries to give a high-level view of the concepts underlying XCVB.

Contents

Design Notes
Current Files
Good places to extend XCVB
High-level description

Design Notes

Introduction

This file tries to document the internals of XCVB. It is meant as a general guide to the code; the code itself should be pretty straightforward once you understand the underlying concepts.

For a general introduction to XCVB, see README.rest.

For a high-level view of the general concepts behind XCVB, see section High-level description. For a guide to the files currently in the build, see section Current Files. For a few ideas where to start extending XCVB, see section Good places to extend XCVB. This section Design Notes is for general purpose notes, and assumes you are familiar with XCVB already.

For known issues and things to actually work on, see TODO.rest.

Debugging XCVB

To debug XCVB, I use the following:

C-u M-x slime env SBCL_HOME=/usr/lib/sbcl xcvb repl

General Architecture of XCVB

1- Read command-line.

Choose the backend that corresponds to what the user requested.

2- Initialize the namespace by finding all the build.xcvb files

and identify which xcvb fullname maps to which source directory.

3- Do a first pass, that

focuses on grains (files and intermediate states of the computation), and in the style of denotational semantics, traverses on each grain's dependencies, making the grain's environment explicit, and recurses top-down to build a graph of all grains involved in the build. Along the way, create computation objects that each describe a command by which a list of output grains is computed from a list of input grains.

4- Do a second pass that iterates bottom-up on the the list of *computations*

produced by the first pass, and in the style of operational semantics, takes appropriate action: write them to a Makefile, write them a .asd file, actually effect them, or otherwise do whatever makes sense for the current backend.

Along the way, we use plenty of extensible helper functions and methods that take as first argument an env argument that allows to customize their behavior. The traversal strategy is characterized by an object of the traversal class that is passed around as this first argument. Said object contains context state for the current graph node.

Many of these extensible helper functions are evaluators for "little languages", and defined using define-simple-dispatcher (see below Simple dispatchers as extension points).

We use (multiple) inheritance for incremental development of our code and specializing the passes to the various stragegies available to the users; this maximizes code sharing between these various strategies but results in code that isn't modular: there is little hope of making further specializations without modifying the code, refactoring the class hierarchy and the existing methods. But that's OK, we don't expect non-XCVB hackers to reuse XCVB internals anyway.

Porting XCVB

To port XCVB to a new Common Lisp implementation, grep the sources for sbcl, which is the best supported implementation so far, and/or grep for clozure or clisp if your implementation is more similar to one of those. Then, everywhere that sbcl appears, unless it's a sbcl bug or workaround, add support for your implementation. Finally, run the tests with your implementation:

make test LISP=ecl

Of course, you favorite implementation will also necessitate adaptations that are unique for it. Don't hesitate to ask for help on the mailing-list.

Speaking of ECL, see the relevant section in the TODO file.

Fullnames

A fullname is used as a unique name to specify a component in the system. Both modules and nodes have fullnames -- usually a node will have the same fullname as the module it was generated from modulo a different file extension.

Eager vs Lazy Module Scanning

There are two ways that the search path may be search for modules, and we have to choose early on because of the naming implications.

In the eager scanning approach, the search paths are scanned for build modules at the beginning of the build session, avoiding recursion into version control repositories or other excluded patterns.

In the lazy scanning approach, the search paths are scanned only when a specific module is requested, and the path to scan relative to directories in the search path can be deduced trivially from the module specification, so only a small number of candidate paths need to be searched at any time.

In the eager approach, the mapping from module name to pathname could be arbitrary, developers or system integrators would be free to organize their sources as they wish and could easily override installed modules with the specific versions they need, without having to mimic a standard installation by moving things around or by using symlinks. Significant long hierarchical names could be used to disambiguate modules globally (but shorter nicknames could also be used if non-ambiguous). What would make a module toplevel would be the fact that it declares such a fullname. The downside would be that eager scanning introduces two levels of complexity. The first complexity is this naming indirection, whereby the whole hierarchy of installed modules would have to be scanned at every build, or a cache of the module map would have to be maintained with timestamp information. The second complexity is the fact that eager scanning requires lazy evaluation of the scanned modules, least the whole Lisp world needs be planned for (and conflicts resolved) everytime anything is built; it must be possible to extract naming information from the module declarations without evaluating any of the computations required to otherwise make sense of the module.

The lazy approach is much simpler, at the cost of constraining developers and/or system integrators to agree on a file hierarchy for their modules. This cost seems to be paid happily by developers of Perl, OCaml, Python, PLT Scheme, and many more languages, so it isn't that high. In this approach, what makes a module toplevel is simply that it's located at the top of a directory in the search-path.

The lazy approach allows to better scale to compilation of software using larger sets of libraries when each piece of software only uses a small subset of the libraries. It corresponds to what ASDF currently does. However, it seems harder to program, especially if we are to scan whole filesystem hierarchies and still handle conflicts gracefully; it is harder to configure if we are to avoid symlinks all over the place without forcing things into a centralized hierarchy like other languages do; and it opens a larger window to race conditions creating unpredictable behaviour if libraries are tweaked in the middle of a run.

On the other hand, the eager approach seems to better fit the destructured way that Common Lispers currently work with respect to filesystem organization (as with other topics) and may be a double pain (socially speaking) to first reject then to retrofit after the fact if we find it's required. Attila Lendvai notably favors the eager approach (which was also my initial reflex).

Filesystem Access

We currently use standard CL functions such as DIRECTORY to find build files.

We don't use CL-FAD because it is broken, and fixing it is an uphill battle that requires code for each implementation, each time with many system-dependent variants.

If we ever require deeper access to the filesystem (e.g. to manipulate timestamps, symlinks, etc.), we should be using IOLib. IOLib is not perfect, but to fix it you only need code for each OS, and there is a dwindling number of relevant OSes; implementation interactions are already handled by CFFI. So, if we ever port XCVB to anything else than Unix, the correct way will be to grow Windows (and JVM, Genera) support in IOLib. See my rant "Who's responsible for that moving part?": http://fare.livejournal.com/139755.html

Conditional Build

XCVB supports conditional compilation of some files.

Previously, we supported conditional reading with #+ and #- using the features from the target system, like people currently do with ASDF. Because the builder and buildee are separate, this means that XCVB must somehow extract the features of the buildee (by running the target lisp) before it reads any of the files.

However, that approach was a hack with several drawbacks:

It doesn't scale to larger builds where the same file would be compiled many times using different compilers and feature sets.

It prevents us from using our "normal" way of naming lisp files for the initial user-customization file from --setup, because we need to load such file for any customization of *features*, yet we currently need the target *features* before we may properly use #+ while reading and interning build files during (search-search-path), as needs be done before we may resolve "normal" names.

XCVB has a syntactic construct to specify conditional dependencies. mudballs had the following syntax: (:usocket (:for (:not (:lispworks)))) to conditionalize :usocket. That didn't directly fit the XCVB model, and so we instead use: (:when (:featurep (:not :lispworks)) "usocket"). (Note that we could imaginably hack the reader to have #+ and #- expand to that, but that would be a lot of pain for little gain).

By mandating such syntax and/or hacking the reader, we may intern the build files before any customization of *features*, then resolve the name of the setup file, use that file and acquire the target *features*, and finally process features during graph building.

Ideas from jknight

Ideas from discussion with jknight that were implemented:

allow python-like hierarchical installation paths instead of named build.xcvb (Isn't that already possible? Just have a top-level build.xcvb fullname'd "/" somewhere!) (The point was probably that we can avoid eager search if only we can agree on some naming hierarchy for both "module names" and filesystem relative paths.)

allow pre-built installation paths that are self-describing enough to make it work.

Makefile already doesn't like spaces in names (except implicit names from current path), so we must be using relative pathnames inside the Makefile by default.

allow xcvb.mk to be usable without the XCVB binary, for bootstrap purposes.

--object-directory=object option.

do away with all calls to TRUENAME.

Current Files

General TODO: remove cruft.

driver.lisp

A collection of utilities and trivial domain-specific languages to be compiled into Lisp processes so as to provide an abstraction layer for XCVB to compile files.

The driver may be used either as part of a master process using XCVB to build software that will be loaded into the current image, or as part of a slave process used by XCVB to actually build the software.

See the README section about Interactive Updates for the master mode whereby a process can drive an XCVB build. This provides an alternative to ASDF to build Lisp software into your image.

The driver includes support for ASDF. If ASDF is used, it may be loaded either before or after the driver is loaded, but must obviously be loaded before it is used. See also the XCVB-ASDF bridge.

The driver also includes some support for filtering error messages, support for debugging and profiling, etc.

Finally on supported implementations it includes helpers to fork subprocesses so that a controlling XCVB can copy the current state of the Lisp world before to compile further files and thus share previous computations.

Goals and Constraints:

The driver must not depend on any software beyond what each implementation provides; otherwise, compiling one of these dependencies could cause "interesting" complications due to non-atomic self-modification.

Design choice when forking, we only model fork(2) and not stream I/O through pipes; the farmer will use with-open-file to open specified named pipes created by the farmer.

It is OK to fail to reap a completely hosed process -- in these cases, it is up to the driving XCVB to take action.

pkgdcl.lisp

This file creates the XCVB package and exports the necessary symbols.

TODO: before we release 1.0, edit this file and make sure that we export all those defined symbols that we want users to use, and only those symbols. Or maybe not. Maybe it is best if we stop worrying about packages, and think of a better way to achieve modularity in CL. For instance, copy some advanced Scheme module system.

conditions.lisp

Conditions, errors, etc. This file is under-used.

specials.lisp

A collection of global variables, many of which can be used to customize the behavior of the user-visible parts XCVB.

macros.lisp

A collection of general-purpose macros.

profiling.lisp

Simple abstraction layer for profiling XCVB when running under a supported implementation (i.e. SBCL).

utilities.lisp

A collection of utility functions.

TODO: Most utilities should be moved to a library that XCVB would depend upon, e.g. alexandria.

logging.lisp

A trivial facility for logging messages directed to the user, depending on a varying degree of verbosity.

lisp-invocation.lisp

Handles the invocation of the target lisp system, abstracting away the differences between implementations.

main.lisp

The main entry file, with XCVB's command-line interface.

Handles common options, shared option definitions, simple commands, etc.

string-escape.lisp

Handles escaping strings, for use in shell invocations and/or Makefile commands.

virtual-pathnames.lisp

TODO: this is in flux and seriously broken.

Abstraction layer for pathnames between the XCVB frontend and the backend, so that the planning algorithm may compute abstract file locations independently from where the execution algorithm may or may not place them.

DESIGN IDEA:

XCVB will identify files by their vpn (virtual pathname)

A same file can be referenced as different roles in multiple rules. (e.g. as the source of a Lisp compilation, the target of a Lisp generator, etc.)

The backend will ensure that an actual output or input/output pathname is never referenced by two different files. (For input only, it's OK if we're using a content-addressed system.)

WORKFLOW:

At initialization, some toplevel vpn to pathname mappings are defined.

When build files are registered, they get a pathname and a trivial vpn.

During traversal, source files are probed as usual, but located by VPN. Files are located by vpn.

In the backend, VPNs are resolved as object file locations are found.

grain-interface.lisp

Definition of the main datastructures of XCVB and helper functions.

Grains are nodes in the planning graph representing files, phonies or transient states, and as far as the Makefile backend goes, correspond to Makefile targets.

For other backends, grains may be a more general state, including things like "a running image where such libraries are loaded". They are named "grains" after reading 'BUILD: A tool for maintaining consistency in modular systems' by Richard Elliot Robbins, 1985. ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-874.pdf

See High-level description.

registry.lisp

Handles the mapping from fullname to grain, and conflicts when multiple builds claim the same name.

search-path.lisp

Handles the search path where XCVB goes looking for available builds, reusing the infrastructure from ASDF 2. See the asdf manual about shell variable CL_SOURCE_REGISTRY.

computations.lisp

Computations are nodes in the planning graph representing computations that thake some grains as input and create some other grains as output. In the Makefile backend, they correspond to the Makefile rules.

manifest.lisp

Manage (cryptographic) digests of files. Notably used by the master and slave to determine what files (not) to (re)load. TODO: In the future, may be used by the farmer to detect when to (re)compile. TODO: In the future, use Ironclad instead of tthsum?

extract-target-properties.lisp

Functions to extract from the target Lisp (the one that will run code) various information such as the *features*, whether CFASLs are supported, etc.

grain-implementation.lisp

Handles parsing module forms in lisp grains, including the naming of other modules within a module form.

names.lisp

Handles the mapping from file to fullname and fullname to file. This includes recursing in the case that the fullname is inherited from a build.xcvb up the filesystem tree.

normalize-dependency.lisp

This file maps the user-level dependency language to an internal dependency language that is stricter, where all names have been resolved to a context-independent fullname, builds have been distinguished from Lisp files, etc.

Plus some utilities to deconstruct this internal representation.

traversal.lisp

This file contains the backend-independent API and support for traversing the implicit graph of grains and dependencies and create explicit computations out of it.

dependencies-interpreter.lisp

This file contains the other backend-independent part of the handling of XCVB's dependency little language: mapping from dependency expression to the traversal of grains and issuing of load commands for such.

static-traversal.lisp

Builds a static graph, for use with GNU Make, ASDF, and other static backends.

driver-commands.lisp

Prints text for driver commands from mini-language.

makefile-backend.lisp

This is our makefile backend, using the static backend traversal and iterating over computations to create Makefile rules.

simplifying-traversal.lisp

This is a simplifying traversal used by the ASDF backend and its associated non-enforcing Makefile backend. It conflates all different kinds of dependencies into one.

list-files.lisp

A traversal that finds all files in a build.

Based on this traversal, a function to remove XCVB forms from all files that are part of a build.

TODO: make it resilient to circularities that may appear when you eliminate conditionals.

asdf-backend.lisp

This is our ASDF backend, traversing the graph in a simplified way and iterating over computations to create an ASDF defsystem.

See doc/README.rest section Converting XCVB builds into ASDF systems. See doc/TODO.rest section Better ASDF backend.

ne-makefile-backend.lisp

This is our non-enforcing makefile backend. It uses variants of both the asdf-backend and the makefile-backend to enable fast builds that don't enforce dependencies.

asdf-converter.lisp

This file contains the code to take an ASDF system and write a build.xcvb file for it and add module forms to the top of all the source files in the system, so that XCVB can now be used to compile the system.

The entry point to this file is the function ASDF-TO-XCVB, (or the corresponding command-line command). It take as a parameter one SYSTEM or a list of several SYSTEMS to merge and convert together so they may be built with XCVB. The converter will first use asdf-dependency-grovel to compute the actual dependencies within your system(s), thus generating a list of components; then it will then process that list of components and create or update a build.xcvb and update all source files to insert a module statement, keeping any manual rules from a previously added module statement but otherwise overriding the main data.

slave.lisp

Entry point to XCVB when called by the Master. Will delegate compilation to the preferred backend (currently the Makefile backend) then output a list of files to load in the master image.

farmer.lisp

Code towards a future standalone backend.

cffi-grovel-support.lisp

Support for CFFI-Grovel. Actually built into XCVB, but written as if it were a plugin.

version.lisp

The version of the current XCVB release.

Good places to extend XCVB

Add more tests

See in test/runme.zsh and in the Makefile.

Extending the driver

So as to remain minimal, the driver includes a cruder mechanism for the semantics of its command language: the macro run maps keywords to functions by looking for the function definition of a symbol of the same name as given keyword in the package :xcvb-driver.

Therefore, to extend the command language for the driver's command language, just define functions in said package, and make sure the files defining these functions are loaded before these commands are used (e.g. by listing them first in dependencies):

(defun xcvb-driver::my-new-function (arguments ...) ...)

Simple dispatchers as extension points

XCVB defines many extensible "little languages" that are processed through uses of the macro define-simple-dispatcher. These little-languages are built atop the SEXP-syntax: an element is either an atom or list; the dispatching function distinguishes atom from lists; atoms have their own semantic function; the semantics of a list is dispatched to a function that depends on the head of the list (its first element, usually a keyword). Semantic functions take as first argument an environment, and the rest of their arguments is whatever follows the head of the list.

If you want extensibility for the atom semantic function, you can use a CLOS generic function. However, for easy readability and debuggability, most of the existing code tries to use simple languages that only use lists, keywords and strings as inputs and outputs of the various functions. We propose that you should stick to this convention for now.

Enriching the system

The simplest such extension point is normalize-dependency in normalize-dependency.lisp. Its input is utterances in the input language for dependencies, and its output is utterances in a restricted variant of this language where names are resolved: names do not depend anymore on the context of the grain in which they were uttered, build names are distinguished from fasl names, superseded ASDF systems are replaced by superseding builds. When you wish to add a new type of dependencies to the input language, you'll need to extend this function.

If you wish to extend the output language, you'll similarly need to extend the "simple dispatchers" that process this output language, currently the simple-dispatchers load-command-for and dependency-namestring. That will probably also mean adding new graph nodes, with appropriate simple-dispatchers in static-traversal.lisp and makefile-backend.lisp.

Finally, we have reserved a place for "extension forms" in the module syntax. Currently, we just use it for an experimental way to have generated files in XCVB. See the simple-dispatcher for handle-extension-form.

Along the way, you may have to modify more files, maybe create an additional one for your feature, maybe create more little languages, more simple-dispatchers, more global data-structures.

Dynamic XCVB extensions

The "X" of "XCVB", while not completely a lie, is not very developed at this time.

If you have the need for dynamic extensions, though, it would be easy enough to provide a command line interface for loading new code before to re-process the rest of the command-line arguments. The new code could extend or override existing simple-dispatchers, define new functions, push new command-line handlers. Loading extensions could also be done as part of parsing build modules.

Backwards Compatibility

Starting whenever we have users who actually use XCVB to bootstrap XCVB and we made an official stable release, we'll want /xcvb/driver to provide backwards compatibility.

Therefore we can no longer change the semantics of :compile-lisp. Instead, we must keep the old :compile-lisp and possibly provide new versions under a different name, such as :compile-lisp/2.

High-level description

At some point, this section should be updated and cleaned up.

XCVB's Architecture

It's OK that we started with hand-coding the behaviour of XCVB in Lisp but eventually, we should have a domain-specific language to describe the structure of the dependency graph and the operations used to build it.

For XCVB version 1, we want to cleanup the internals. [Done]

For XCVB version 2, we want to have a sensible basic design for arbitrary Lisp code.

For XCVB version 4, we want to have a fully generalized design that can handle arbitrary operations used to build projects using any kind of programming language or compiler, not just Lisp.

For the sake of avoiding ambiguity and starting from existing knowledge, we'll try to reuse the vocabulary defined in MIT AITR-874 to describe the concepts at hand.

Grains and Types

Grains are the units of data that the build deals with.

Grains come in many types themselves hierarchically organized in subtypes and supertypes. Main types include files and running processes; subtypes of files include Lisp source files, C source and header files, shell scripts, python scripts or perl scripts, ReST documents, FASL files (that come in as many variants as there are combinations of Lisp implementations and various options), linker object files (for many architectures), executable files (for as many architectures), test report files, etc. Types of processes include Lisp compilation processes, C compilation processes, etc.

Initial, Final and in-between

Some grains are the inputs of the overall build, they are the initial grains. They include the source code and data (usually kept under revision control and otherwise part of the source archive), user-specified configuration (edited by users to fit their local configuration), and any file (such as a compiler) or process (some daemon) in the running environment that is used during the build. Anything else is a derived grain.

Some grains are the outputs of the overall build, they are the final grains. They are anything sought for by the user, usually deliverable files needed for some software installation.

Some grains may be both initial and final, for instance data files that are copied over as is from the source archive to the final installation. Some grains are neither initial nor final, they are called intermediate grains, and are usually thrown away after the build.

For bootstrapping purposes, the content of a final grain is sometimes copied into what will be a source grain in future sessions. Within a given build session, though, we will distinguish the initial grain (that may be the copy of a final grain from a previous session) from the final grain (that may be copied as an initial grain for a next session).

Persistent vs Transient

A session is a given run of XCVB.

Grains are persistent if their contents remain accessible from session to session. They are stored on disk or some other permanent media. Typically, persistent grains are files. Persistent grains cannot spontaneously compute.

Grains are transient when their contents disappear between sessions. Typically, transient grains are processes. Transient grains do the computations. They can transform inputs into outputs, and in between maintain active state. They often are born and die within a session, and may not be trusted to survive until a next session (although if the build is used to monitoring updates to a live robust system, they may well be).

Intention vs Extension

We must distinguish file path from file contents, process id and process state.

File contents are pure extensional data, while file names are intentional anchors. Similarly, running processes are the intentional agents, whereas process states are the extensional states.

In a given session, the mapping of name (intention) to contents (extension) is assumed to not change. Otherwise, an error condition is detected, and the build may be aborted or restarted as a new session. In a given session, it may thus be assumed that a file behaves like persistent pure functional data, deterministically yielding the same contents every time it is read.

On the other hand, the mapping from processes (intention) to process states (extension) is likely to change a lot during a session (and a fortiori between sessions). The computation that happens in a given process consumes its input state that is lost as it produce its output state. Process state is thus behaves like a linear pure functional data. Processes may be created with an birth state, or die and cease to have a state.

Process state is often considered up to some gross equivalence that neglects details such as process id, etc. If you may neglect a small driver that forks the processes and executes requested tasks, or dumps images and resumes from them, then you may be able to explicitly duplicate the state of a running process. So as to be able to reinstate a desired state on another machine or in another session, the build needs to be able to reconstruct such state from persistent data (e.g. by dumping an image), up to the desired equivalence.

Names

A grain can have many names. Its fullname is how it is named in the build graph. A file's path indicates where in the filesystem the grain is stored. Its contenthash is a cryptographic checksum of its contents. Its buildhash is a cryptographic checksum of the abstract computation used to compute the contents of the file and of the buildhashes of any inputs used during the computation.

A cache mapping buildhash to contenthash and contenthash to contents can help save actual computation time in repeated or distributed builds.

Generated "source" code

Generated code files count as derived grains, not source, since they need not be part of the source archive and are not checked in to revision control system (unless they are indeed bootstrapped, in which case the derived final file of a build may be initial source file for the next).

They may make the build more complex in that the build needs to know the associated dependencies before it may plan how to build the derived grains, yet may not be able to know these dependencies before the Lisp file is generated.

In simple cases, the dependencies can be identified before the file is generated, as special off-file declarations in the generating source code.

In other cases, the build system may have to update its build graph after the file is computed, possibly doing a recursive call to the main build entry point.

Plans

Build Graph

The plan is a build graph that contains nodes that represent the intensional grains, and nodes that represent the intensional computations that will derive output grains from input grains, from the initial ones up to the final ones.

As the builder executes the plan, it computes the extensions of derived grains from the extensions of former grains.

At the beginning of a session, only the initial extensions are available. At the end of a session, the final extensions are computed.

The plan is static if it doesn't need to be updated based on the extension of any derived grain. The plan is dynamic if it new nodes need to be created during a given session.

Computations

Computations are nodes in the build graph, that join together a set of input grains to a set of the output grains.

A computation node represents an actual computation whereby the output grains can be computed from the input grains when the latter are available.

The computation may involve creating processes or talking to existing processes; these processes will change state, including being born and dying, as they run the specified commands, creating output grains as they consume input grains.

For caching purposes, a computation also specifies how to compute a buildhash identifying each of the outputs of the operation from the buildhashes of inputs.

Actionable operations

In a given backend (i.e. Makefile, ASDF, etc.), not all operations are permitted. For instance with make, you can't duplicate a process state with fork, whereas you could with a future native backend. Additionally, some Lisp compilers have operations that other compilers don't. For instance, you can't load then dump -- instead you link with a special process. An operation that can actually be executed is called actionable.

And so whether operations are decomposed into basic elements or grouped in actionable wholes depends on the specific target. Also, multiple operations may lead to the "same" outcome up to the desired equivalence, and operations may be combined in different ways. Some combinations may be preferred to others, especially if the some are valid operations on the target whereas the others are not, but also if some operations are reputedly slow whereas others are expected to be fast.

Complete plan

In a complete plan, all operations are actionable. Moreover, initial grains are never the output parameter of an operation, and derived grains appear as the output parameter of an operation exactly once in the build graph.

Plan transformation

From the module declarations, a model may be constructed of which modules contain compile-time or load-time references to which other modules, and otherwise depends on which modules at build-time, etc. This reference model constitutes a build graph, but may not be an actionable build graph, and may notably include intermediate grains representing process states that may be not be reachable in the specific target environment that the build session will take place in.

Thus, the build system may have to transform its plan both for the purpose of correctness and for purpose of performance optimization. A build session with start by computing the reference model from the extension of the initial grains, and from the build request, then apply configuration-dependent transformations to that plan to create a complete plan.

Reference model

We possibly may want to fully distinguish the reference model as something distinct from a build graph, just like the AITR-874 author. Then we could handle reference classes such as:

(:package-ref A B) - A uses a package defined in B (before to either compile or load A, load B).

(:macro-calls A B) - A uses a (reader)macro in B (before to compile A, load B).

(:calls A B) - code in A calls code in B (before you run code in A, load B).

(:setup-calls A B) - while loading A there are calls to code in B (load B before you load A).

Note that the current XCVB model of :depends, :load-depends, :compile-depends corresponds loosely to :package-ref, :setup-calls, :macro-calls. We may want to add a :run-depends to add the functionality of :calls.

Dependencies

The declared dependencies of a node (or declared inputs of an operation) are computed by the operator that creates the operation node, and include all the grains that must be available before the operation may be computed. They are established at the beginning of the session at the time the actionable plan is created, and do not change for a static plan, but may be updated for a dynamic plan, in which case an operation that will update the dependencies might itself be required as part of the plan before the operation may be computed.

The effective dependencies of a node (or effective inputs of an operation) include any grains that were actually consulted in computing the operation. In a static build, these effective dependencies will be the same as the declared dependencies. In a dynamic build, which dependencies were or weren't included may sometimes be discovered only after the fact. If they are found to contain grains that weren't available, a condition is raised, and the build session may be aborted, or the required grains may be computed before the operation may be restarted.

Note that when an object file itself hasn't changed as a result of a computation, but one its dependencies (or the list of dependencies itself) has changed, then one may have to recompile all files that depend on it. This is taken into account by digesting not just the direct dependencies, all the transitive dependencies loaded before the computation. And so, if building a propagation network for recompilation, the list of dependencies itself is to be part of the network that triggers updates to a compilation, not just the individual dependencies listed.

Operators

Abstracting Operations

operators are methods that create concrete operations out of various parameters.

Parameters may include input grains, output grains, options (which compiler/flags to use), etc.

Note however, that the inputs and outputs of one of the operations created by an operator need not be those given as parameter by the operator. For instance, an operator may create intermediate grains, add detected dependencies, recurse by calling other operators, etc. (Operators correspond loosely to what AITR-874 calls request-handlers and reference-handlers, or to what ASDF calls operations).

A same grain can be used as the parameter to multiple operations, or to the same operator in multiple instances with different other parameters. For instance, a file may be both compiled and processed for documentation extraction. Or it may be compiled for speed in one case, compiled for code coverage in another case, and compiled with serializable continuations in a third case.

Generating computation and hashes

The computation, the static buildhash, and the algorithm to compute the dynamic buildhash may all be obtained from the same specification of the relationship between outputs and inputs, at the time the operator is instantiated.

Say the operation has some specification like:

(:compile-lisp :loading (fasl1 fasl2)
               :source lisp3 :target fasl3
               :lisp-compiler image4 :options (o5 o6 o7))

Then the evaluator would invoke the specified Lisp image with the proper options to load the specified fasls and compile the source file into the target fasl.

The direct hash of a grain would be a cryptographic hash of its specification (type and module name) followed by the hash of its file contents (or for a process, the hashes of the binary images and configuration files used to start the daemon).

The static buildhash of each output grain can be obtained in advance of any computation with a cryptographic hash of the concatenation of the specification, something that identifies which output is considered (say "(1)" for the first output, or "(2 3)" for the third file in the second list of outputs, or just the output module name), and the successive static buildhashes of each of the inputs, with the static buildhash of an initial grain being its direct hash.

The dynamic buildhash of each output would be computed in a similar way right before (for a static computation) or right after (for a dynamic computation) the computation, by hashing the concatenation of the specification, of the output identifier, and the successive direct hashes of each declared input (for a static build) or effective input (for a dynamic build).

The hashing algorithm used for all hashing by XCVB will be a Tiger Tree Hash (TTH).

Hashes can be used to index a cache of already computed objects. Static builds with only static computations may use the static buildhash to index objects in the cache in advance of any computation -- and by looking for final objects directly at the top of the graph skipping unneeded intermediate grains. Dynamic builds or builds with dynamic computations must use dynamic buildhash to index objects in the cache and must rebuild all grains from the ground up. Builds with both static and dynamic elements are conceivable.

Graph building

A build starts with an initial request: an operator is called with a set of input parameters.

Notes

To preserve incremental compilation with a central dependency declaration, you could explode the central declaration. Each files at the time that you initialize when creating the xcvb.mk and update (if needed) each time you re-compile a file. Each compiled file would have a corresponding dependencies file caching which dependencies were used during the compilation of it, stored in the obj/ directory and updated when it changed. Lots of infrastructure, but doable.