The Orchestrator

Introduction

One of the most useful tools in a Unix-style environment is the make utility. It keeps a collection of files generated by various programs up-to-date by running only the programs needed to update the files downstream of a modified file. However, it has its limitations. First, the "language" used by make is not a full-fledged programming language, and the flexibility of a real programming language would be useful. Second, it would be useful for such a utility to be able to pull values out of a file and pass them as arguments to a program, rather than being limited to passing a whole file as an argument.

The Orchestrator is an attempt to provide such a utility. I wrote it mainly for my molecular simulation package BrownDye, but I think it could be useful for a wider range of applications. It is currently written in OCaml, and the following documentation reflects that. OCaml is my favorite language, but I realize that it is not widely used on this side of the Atlantic, so one of my goals for this project is to rewrite the Orchestrator in Python to make it accessible to more people. Even if you don't know OCaml, this documentation should give you an idea of what the Orchestrator does.

An Orchestrator script, which is analogous to a Makefile, is composed of various source objects. Each source has one or more inputs and one output. The output of one source can be an input to another, so the sources form a directed acyclic graph. Updating a source also updates all the sources upstream of it, that is, all the sources that ultimately feed into it. Each source has an associated time, which is the time of its last update. A source is up-to-date only if its time is greater (newer) than the time of every source upstream of it. Often, updating a source means running a program that processes files to create a new file.
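The update rule just described can be illustrated with a small, self-contained OCaml model. This is a sketch under simplifying assumptions, not the Orchestrator's implementation: the node type and the functions leaf, derived, and update are hypothetical, times come from an integer logical clock rather than file modification times, and "running a program" only increments a counter. Because every upstream source is updated before a node is checked, comparing a node against its direct upstream sources is enough to make it newer than everything upstream.

```ocaml
(* Toy model of the update rule: NOT the Orchestrator's implementation.
   Times come from an integer logical clock instead of file timestamps. *)

type node = {
  name : string;
  mutable time : int;    (* time of the last update; 0 = never built *)
  upstream : node list;  (* sources that feed directly into this one *)
  mutable runs : int;    (* how many times this node's "program" has run *)
}

let clock = ref 0
let tick () = incr clock; !clock

(* An input file: its time is fixed, like a file's modification time. *)
let leaf name time = { name; time; upstream = []; runs = 0 }

(* A derived source that has not been built yet. *)
let derived name upstream = { name; time = 0; upstream; runs = 0 }

(* Up to date: newer than every source directly upstream of it.
   Leaves have no upstream sources, so they are always up to date. *)
let up_to_date n = List.for_all (fun u -> n.time > u.time) n.upstream

(* Update everything upstream first, then rerun this node only if needed. *)
let rec update n =
  List.iter update n.upstream;
  if not (up_to_date n) then begin
    n.runs <- n.runs + 1;   (* stands in for running the program *)
    n.time <- tick ()
  end

(* Example: a.dat and b.dat feed c.out, which feeds d.out. *)
let () =
  clock := 10;                (* start the clock after the leaf times *)
  let a = leaf "a.dat" 1 and b = leaf "b.dat" 2 in
  let c = derived "c.out" [ a; b ] in
  let d = derived "d.out" [ c ] in
  update d;                   (* builds c.out, then d.out *)
  update d;                   (* everything is up to date: nothing runs *)
  a.time <- tick ();          (* simulate editing a.dat *)
  update d;                   (* c.out and d.out are rebuilt *)
  List.iter (fun n -> Printf.printf "%s ran %d time(s)\n" n.name n.runs) [ c; d ]
```

Running it reports that c.out and d.out each ran twice: once on the first update, not at all on the second, and once more after a.dat was "edited". A source shared by several downstream sources is simply checked again when it is reached a second time; since it is already up-to-date by then, it is not rerun.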

Data Types

The Orchestrator module introduces two new data types. The first is defined as

type data =
  | Int of int
  | Float of float
  | String of string
  | Float3 of float * float * float
  | Bool of bool
  | Null
This data type represents the data flowing between the sources. Because OCaml is statically typed, we have to explicitly define a data type that can hold integers, floating-point values, strings, and booleans. (For my own convenience, I also let it represent a 3D vector of floats.) The constructor Null represents the absence of a value. The Python version will not need such a type.
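Values pulled from files eventually have to be turned into command-line text, and a single pattern match over data handles every case. The helper to_args below is hypothetical (it is not part of the documented API) and is only a sketch of one way to do this: each value becomes a list of argument strings, a Float3 becomes three arguments, and Null becomes no arguments at all. The type definition is repeated so that the block compiles on its own.

```ocaml
type data =
  | Int of int
  | Float of float
  | String of string
  | Float3 of float * float * float
  | Bool of bool
  | Null

(* Hypothetical helper, not part of the Orchestrator API: render a value
   as zero or more command-line arguments. "%.17g" prints enough
   significant digits for any float to be read back exactly. *)
let to_args = function
  | Int i -> [ string_of_int i ]
  | Float x -> [ Printf.sprintf "%.17g" x ]
  | String s -> [ s ]
  | Float3 (x, y, z) -> List.map (Printf.sprintf "%.17g") [ x; y; z ]
  | Bool b -> [ string_of_bool b ]
  | Null -> []   (* absence of a value: contributes no arguments *)
```

For example, to_args (Float3 (1.0, 2.0, 0.5)) evaluates to ["1"; "2"; "0.5"]. The price of exact round-tripping is long output for values such as 0.1, which prints as 0.10000000000000001.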

The second data type is the source itself:

type source
Values of this type are created by the functions below.

Functions

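In the signatures below, each argument is written as type:name; for example, string:file is a string argument named file. (This is the reverse of the name:type order OCaml uses for labeled arguments.)

val new_in_file_source: string:file -> source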
This function takes the name of a file and returns a source representing that file. The time associated with this source is the time of the file's last modification.

val new_command_source: string:command -> string:output_file -> source
This function takes two string arguments. The first, command, is the name of the program that is run to generate the output. The second, output_file, is the name of the file to which the command's standard output is written. The inputs to the command source are added with the following functions:
val add_stdin_prereq: source:command -> source:prereq -> unit

val add_prereq: source:command -> string:tag -> source:prereq -> unit

val add_opt_prereq: source:command -> string:tag -> source:prereq -> unit
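The documentation of these three functions is unfinished, so their exact behavior is not specified here. The sketch below is a toy model built on explicit assumptions, not the Orchestrator's code. Its hypothetical functions add_stdin, add_arg, and add_opt_arg loosely mirror add_stdin_prereq, add_prereq, and add_opt_prereq. It ASSUMES that each tag is written on the command line just before its prerequisite's value, that an optional prerequisite without a value is left out, and that values are plain strings rather than data. The shell_command helper uses the standard library's Filename.quote_command (OCaml 4.10 or later) to add the stdin and stdout redirections.

```ocaml
(* Toy model of a command source. ASSUMPTIONS (not from the Orchestrator):
   a tag is written just before its prerequisite's value, and an optional
   prerequisite with no value is omitted from the command line. *)

type prereq = {
  tag : string;             (* e.g. a flag such as "-n" *)
  value : string option;    (* None when the prerequisite has no value *)
  optional : bool;
}

type command = {
  program : string;
  output_file : string;                (* receives the program's stdout *)
  mutable stdin_from : string option;  (* file fed to the program's stdin *)
  mutable prereqs : prereq list;       (* kept in the order they were added *)
}

let new_cmd program output_file =
  { program; output_file; stdin_from = None; prereqs = [] }

let add_stdin cmd file = cmd.stdin_from <- Some file

let push_prereq optional cmd tag value =
  cmd.prereqs <- cmd.prereqs @ [ { tag; value; optional } ]

let add_arg cmd tag value = push_prereq false cmd tag value
let add_opt_arg cmd tag value = push_prereq true cmd tag value

(* Arguments passed to the program, in order. A required prerequisite
   without a value is an error; an optional one is skipped. *)
let args cmd =
  let render p =
    match p.value, p.optional with
    | Some v, _ -> [ p.tag; v ]
    | None, true -> []
    | None, false -> failwith ("missing required prerequisite " ^ p.tag)
  in
  List.concat (List.map render cmd.prereqs)

(* Quoted shell command line with stdin/stdout redirections. *)
let shell_command cmd =
  Filename.quote_command cmd.program ?stdin:cmd.stdin_from
    ~stdout:cmd.output_file (args cmd)
```

Under these assumptions, a command built with add_arg c "-n" (Some "100"), add_opt_arg c "-seed" None, and add_arg c "-T" (Some "300") passes the arguments ["-n"; "100"; "-T"; "300"], and the unset optional -seed simply disappears.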
The function add_stdin_prereq

Still under construction