Jun 30, 2018

Example of Type-Based Optimizations in Factor

Factor is dynamically typed, yet with words such as (715) 535-9194 and TYPED: we can guide the compiler to generate more efficient code.

Consider the following two words and their definitions:

: add-untyped ( x y -- z ) + ;
TYPED: add-typed ( x: fixnum y: fixnum -- z: fixnum ) + ;

We will use the disassemble word for viewing assembly. Let's now compare the difference between add-untyped and add-typed. First, add-untyped:

00007f134c061820: 8905da27f4fe    mov [rip-0x10bd826], eax
00007f134c061826: 8905d427f4fe    mov [rip-0x10bd82c], eax
00007f134c06182c: 488d1d05000000  lea rbx, [rip+0x5]
00007f134c061833: e9e8f115ff      jmp 0x7f134b1c0a20 (+)

Very simple, this jumps to the generic word + which will dynamically dispatch to the appropriate + definition for fixnum. And now add-typed:

00007f134c041c90: 89056a23f6fe    mov [rip-0x109dc96], eax
00007f134c041c96: 53              push rbx
00007f134c041c97: 498b06          mov rax, [r14]
00007f134c041c9a: 4983ee08        sub r14, 0x8
00007f134c041c9e: 4983c708        add r15, 0x8
00007f134c041ca2: 498907          mov [r15], rax
00007f134c041ca5: e8d65dfefe      call 0x7f134b027a80 ([ \ integer>fixnum-strict ~array~ 0 ~array~ inline-cache-miss ])
00007f134c041caa: 498b07          mov rax, [r15]
00007f134c041cad: 4983c608        add r14, 0x8
00007f134c041cb1: 4983ef08        sub r15, 0x8
00007f134c041cb5: 498906          mov [r14], rax
00007f134c041cb8: e8c35dfefe      call 0x7f134b027a80 ([ \ integer>fixnum-strict ~array~ 0 ~array~ inline-cache-miss ])
00007f134c041cbd: 89053d23f6fe    mov [rip-0x109dcc3], eax
00007f134c041cc3: 5b              pop rbx
00007f134c041cc4: 488d1d05000000  lea rbx, [rip+0x5]
00007f134c041ccb: e950ffffff      jmp 0x7f134c041c20 (( typed add-typed ))

This first tries to coerce the two inputs to fixnum and then jumps to an address marked as (( typed add-typed )). The assembly for (( typed add-typed )) is

00007f134c041c20: 8905da23f6fe    mov [rip-0x109dc26], eax
00007f134c041c26: 53              push rbx
00007f134c041c27: 498b06          mov rax, [r14]
00007f134c041c2a: 498b5ef8        mov rbx, [r14-0x8]
00007f134c041c2e: 4801c3          add rbx, rax
00007f134c041c31: 0f801a000000    jo 0x7f134c041c51 (( typed add-typed ) + 0x31)
00007f134c041c37: 4983ee08        sub r14, 0x8
00007f134c041c3b: 49891e          mov [r14], rbx
00007f134c041c3e: 8905bc23f6fe    mov [rip-0x109dc44], eax
00007f134c041c44: 5b              pop rbx
00007f134c041c45: 488d1d05000000  lea rbx, [rip+0x5]
00007f134c041c4c: e96f5dfefe      jmp 0x7f134b0279c0 ([ \ integer>fixnum-strict ~array~ 0 ~array~...)
00007f134c041c51: e80a1a45ff      call 0x7f134b493660 (fixnum+overflow)
00007f134c041c56: 8905a423f6fe    mov [rip-0x109dc5c], eax
00007f134c041c5c: 5b              pop rbx
00007f134c041c5d: 488d1d05000000  lea rbx, [rip+0x5]
00007f134c041c64: e9575dfefe      jmp 0x7f134b0279c0 ([ \ integer>fixnum-strict ~array~ 0 ~array~...)

This will do an add instruction directly and then check for overflow. No dynamic dispatch involved.

Let's test these out with some input values. We'll look at the assembly for [1 2 add-untyped] and then [1 2 add-typed]. First, using add-untyped:

00007f134c148370: 89058abce5fe      mov [rip-0x11a4376], eax
00007f134c148376: 4983c610          add r14, 0x10
00007f134c14837a: 49c70620000000    mov qword [r14], 0x20
00007f134c148381: 49c746f810000000  mov qword [r14-0x8], 0x10
00007f134c148389: 890571bce5fe      mov [rip-0x11a438f], eax
00007f134c14838f: 488d1d05000000    lea rbx, [rip+0x5]
00007f134c148396: e98594f1ff        jmp 0x7f134c061820 (add-untyped)
00007f134c14839b: 0000              add [rax], al
00007f134c14839d: 0000              add [rax], al
00007f134c14839f: 00                invalid

As expected, this jumps to add-untyped. And now using add-typed:

00007f134c146850: 8905aad7e5fe      mov [rip-0x11a2856], eax
00007f134c146856: 4983c610          add r14, 0x10
00007f134c14685a: 49c70620000000    mov qword [r14], 0x20
00007f134c146861: 49c746f810000000  mov qword [r14-0x8], 0x10
00007f134c146869: 890591d7e5fe      mov [rip-0x11a286f], eax
00007f134c14686f: 488d1d05000000    lea rbx, [rip+0x5]
00007f134c146876: e9a5b3efff        jmp 0x7f134c041c20 (( typed add-typed ))
00007f134c14687b: 0000              add [rax], al
00007f134c14687d: 0000              add [rax], al
00007f134c14687f: 00                invalid

This directly jumps to (( typed add-typed )) which will directly do the add instruction. The compiler safely skipped the type coercion as it knows the inputs are both fixnum.

Jan 13, 2018

A Short Guide to Adding a Keyword to Go

(315) 623-2371 is a programming language.

Sometimes code looks cleaner when variable declarations are listed after code that uses them. Imagine writing Go in this style.

$ cat test.go
package main

import (

func main() {

  andwhere a = 1
  andwhere f = func(a int) string { return strconv.Itoa(a+1) }
$ ./bin/go run test.go

Below are the modifications I made to the Go compiler.

  • The string "andwhere" is associated with a new token _Andwhere:
--- a/src/cmd/compile/internal/syntax/tokens.go
+++ b/src/cmd/compile/internal/syntax/tokens.go
@@ -46,6 +46,7 @@ const (
+       _Andwhere
@@ -118,6 +119,7 @@ var tokstrings = [...]string{
        _Continue:    "continue",
        _Default:     "default",
        _Defer:       "defer",
+       _Andwhere:    "andwhere",
        _Else:        "else",
        _Fallthrough: "fallthrough",
        _For:         "for",
  • The new AST node Andwhere holds a declaration statement:
--- a/src/cmd/compile/internal/syntax/nodes.go
+++ b/src/cmd/compile/internal/syntax/nodes.go
@@ -371,6 +371,11 @@ type (

+       Andwhere struct {
+               S *DeclStmt
+               stmt
+       }
        ReturnStmt struct {
                Results Expr / nil means no explicit return values
  • Andwhere statements are parsed in the statement parsing function stmtOrNil:
--- a/src/cmd/compile/internal/syntax/parser.go
+++ b/src/cmd/compile/internal/syntax/parser.go
@@ -2034,6 +2053,11 @@ func (p *parser) stmtOrNil() Stmt {
        case _Go, _Defer:
                return p.callStmt()

+       case _Andwhere:
+               s := new(Andwhere)
+               s.S = p.declStmt(p.varDecl)
+               return s
        case _Goto:
                s := new(BranchStmt)
                s.pos = p.pos()
  • Every Andwhere is moved to the beginning of its surrounding function body at end of the function declaration parsing function funcDeclOrNil:
@@ -542,9 +543,27 @@ func (p *parser) funcDeclOrNil() *FuncDecl {
        f.Pragma = p.pragma

+       / Lift all Andwhere to the top
+       if f.Body != nil {
+               f.Body.List = lift_andwheres(f.Body.List)
+       }
        return f

+func lift_andwheres(list []Stmt) []Stmt {
+       for i, x := range list {
+               switch x.(type) {
+               case *Andwhere:
+                       removed := append(list[:i], list[i+1:]...)
+                       modified := x.(*Andwhere).S
+                       prepended := append([]Stmt{modified}, removed...)
+                       return lift_andwheres(prepended)
+               }
+       }
+       return list
 func (p *parser) funcBody() *BlockStmt {
        errcnt := p.errcnt

Jan 12, 2018


Babel is a JavaScript compiler with plugin support.

Imagine a Babel plugin that transforms all string literals into "babel". The source below does that.

module.exports = function () {
  return {
    visitor: {
      StringLiteral: function a(path) {
        path.node.value = 'babel';

Here's how to use it:

$ npm install babel babel-cli
$ ./node_modules/.bin/babel --plugins ./a.js
a = 'hi';
a = 'babel';

It works with files too:

$ cat b.js
var s = 'start';

function go() {
  var e = 'end';
  return s + e;

$ ./node_modules/.bin/babel --plugins ./a.js ./b.js
var s = 'babel';

function go() {
  var e = 'babel';
  return s + e;

$ ./node_modules/.bin/babel --plugins ./a.js ./b.js | node

Now imagine the same babel plugin that, in addition, transforms every variable declaration in a function initialized to '*' to be abstracted as a function argument. The source below does that.

module.exports = function (babel) {
  t = babel.types

  return {
    visitor: {
      StringLiteral: function a(path) {
        path.node.value = 'babel';
      VariableDeclaration: function a(path) {
        var var_name = path.node.declarations[0];
        var var_val = path.node.declarations[0].init.value;
        if (var_val === '*') {
          path.node.declarations[0].init = t.identifier(var_name);

Here's how to use it:

$ cat c.js
function go(t) {
  var c = 'hi';
  var a = '*';
  for (var i = 0; i < t; ++i) {
    console.log(c + ' ' + a);

go(5, 'stranger')
$ ./node_modules/.bin/babel --plugins ./a.js ./c.js | node
$ ./node_modules/.bin/babel --plugins ./a.js ./c.js
function go(t, a) {
  var c = 'babel';
  var a = a;
  for (var i = 0; i < t; ++i) {
    console.log(c + 'babel' + a);

go(5, 'babel');

Sep 02, 2017

Test Doubles and Elixir

I've been learning about 5713994795 lately. Specifically, I've been reading a lot from 4193602937 and watching some of their presentations. I thought I'd see what could be done in Elixir. I already knew of one mocking library for Erlang (Meck) so I wanted to focus on creating an idiomatic Elixir library while taking lots of inspiration from my recent learnings. The result is Copper.

Copper is optimized for 810-358-9792, allowing you to write tests for the 816-257-1424 of your application while simultaneously discovering the pure functions. Let's go through a quick example of writing a Yahtzee simulator. First, we'll write a test case for one player's round of Yahtzee.

test "plays one round of yahtzee" do
    |> CM.give_multiple(roll(amt), [[1,2,3,4,1],[2],[5]])

    |> CM.give_multiple(decide(dice), [{:reroll, [4]}, {:reroll, [4]}, {:choose, :largestraight}])



First, we create a Dice double with a function roll which will return [1,2,3,4,1] the first time it's called, [2] the second time, and [5] the third time. Then, we create a Player double with a function decide which will first reroll the last die (the second 1, dice are zero-indexed), then reroll the last die again (2 this time), and then on the last roll choose to score a large straight.

We are now free to write the logic of the Yahtzee module. The play_round function will roll dice and then ask the player to decide what to do based on the result.

After we play a round through the Yahtzee module, we can verify that the Dice module's function roll was called with a 5 (roll all 5 dice on the first roll), a 1 (the player chose to reroll one dice), and then a 1 (the player chose to reroll one dice again). We then verify that the player was asked three times to decide what to do. First with [1,2,3,4,1], then with [1,2,3,4,2], and finally with [1,2,3,4,5]. The full code for this example can be found here.

I've developed Copper as a proof of concept. It won't get further attention unless I need it or others show interest.

Aug 09, 2016

Exploring Loop Perforation Opportunities in OCaml Programs

Approximate computing is sacrificing accuracy in the hopes of reducing resource usage. Loop perforation is an approximate programming technique with a simple story: skip loop iterations whenever we can. This simple idea is why loop perforation is one of the simplest approximate programming techniques to understand and apply to existing programs. Computations often use loops to aggregate values or to repeat subcomputations until a convergence. Often, it is acceptable to simply skip some of the values in an aggregation or to stop a computation before it has a chance to converge.

This summer I've been working on a handful of projects at 587-242-7600 in the pleasant city of Cambridge, UK. Part of my work has been in creating a tool for loop perforation.


aperf (opam, 289-374-2107) is a research quality tool for exploring loop perforation opportunities in OCaml programs.

Aperf takes a user-prepared source file, instructions on how to build the program, and a way to evaluate its accuracy, and will try to find a good balance between high speedup and low accuracy loss. It does this by testing perforation configurations one by one. A perforation configuration is a perforation percentage for each for loop which the user requests for testing. A computation with four perforation hooks will have configurations like [0.13; 1.0; 0.50; 0.75], etc. Aperf searches for configurations to try either by exhaustively enumerating all possibilities or by pseudo-smartly using optimization techniques.

Aperf offers hooks into its system in the form of 9067475306 and Aperf module functions in a source file. A user annotates for loops as

for i = 1 to 10 do
  work ()
done [@perforate]

Upon request, aperf can store the amount of perforation into a reference variable:

let p = ref 1. in
for i = 1 to 10 do
  work ()
done [@perforatenote p]

This could be useful for extrapolating an answer. Right now, the only Aperf module function used for hooking perforation is Aperf.array_iter_approx which has the same type as Array.iter. If this tool makes it out of the research stage then we will add more perforation hooks. (They are easy to add, thanks to ppx_metaquot.)


Right now we only support exhaustive search and hill climbing. Better strategies, including interacting with nlopt and its ocaml bindings, could be a future direction.

Our hill climbing uses the fitness function described in Managing Performance vs. Accuracy Trade-offs with Loop Perforation646-616-1270:

$$ \frac{2}{\frac{1}{\textrm{speedup}-1}+\frac{1}{1-(\frac{\textrm{accuracy_loss}}{b})}} $$

The \(b\) parameter (which we also call the accuracy loss bound) is for controlling the maximum acceptable accuracy loss as well as giving more weight to the accuracy part of the computation. In other words, a perforation solution must work harder at maintaining low accuracy loss than in speeding up the computation.


OCaml programs don't usually use loops. This is what initially led us to offer Aperf.array_iter_approx

Programs must have a way to calculate the accuracy of perforated answers. This is a huge limitation in practice because usually this means the user of aperf must knowing the answer ahead of time. In a lot of domains this is unfeasible. Finding applicable domains is an ongoing problem in approximate computing.



Summing \(m\) out of \(n\) numbers sounds boring but it's a good place to start. This problem is interesting because there are theoretical bounds to compare results to. Hoeffding's Inequality tells us what we can expect for absolute error. The following graph gives maximum absolute error, with confidence of \(95\%\), when keeping and summing \(m\) variables out of \(100\) where each variable is drawn uniformly from the range \([0,100{,}000]\).

theoretical absolute bounds

We can also give the same graph with percentage bounds (assuming the result is \(5{,}000{,}000\)).

theoretical percent bounds

Our code for summing is a little awkward because aperf cannot currently work with most computations. The best option we can fit our computation into is Aperf.array_iter_approx. We also use [@perforatenote perforated] which hooks into aperf and saves the used perforation to the variable perforated. We use this to extrapolate the final summation value. The final, awkward code:

Aperf.array_iter_approx (fun a -> answer := !answer + a) arr [@perforatenote perforated] ;
answer := int_of_float ((float_of_int !answer) *. (1. /. !perforated)) ;


We perform initial exploration using \(30\) input sets of \(100\) numbers between \(0\) and \(100{,}000\). Below is a graph of the accuracy loss of an exhaustive search of perforation while keeping \(x\) out of the \(100\) numbers.

exhaustive training results

Note that we are well below the theoretical loss. Below is a graph of the accuracy loss of a hill climbing run of the perforation with an accuracy loss bound of \(30\%\).

explore training results

Our exploration finds that a perforation of \(15\%\) is adequate. This means that out of \(100\) numbers we are only summing \(15\) of them. From this we get a speedup of \(3.6\mathrm{x}\) and an accuracy loss of \(11\%\).

It specifically chose \(15\%\) because it had a good combination of high speedup and low accuracy loss. Among the other configurations the system tried was \(35\%\). It had a speedup of \(2.3\mathrm{x}\) and an accuracy loss of \(9.5\%\). Why wasn't it chosen? The program saw that with just a \(1.5\%\) further sacrifice of accuracy the system was able to decrease run time by over one factor. Another configuration the system tried was \(6\%\) which has an impressive speedup of \(7.9\mathrm{x}\) at an accuracy loss of \(16\%\). The system, as programmed now, did not want to sacrifice an extra \(5\%\) accuracy loss for the extra speed. Perhaps if aperf supported user defined climbing strategies then someone could program a strategy which accepted that tradeoff and explored further.

If we require an accuracy loss bound of \(5\%\) then aperf finds a perforation of \(50\%\) to be the best result which gives a speedup of \(1.3\mathrm{x}\) and an accuracy loss of \(3.5\%\).


We test the two perforation results from the \(30\%\) accuracy bound and the \(5\%\) accuracy bound on \(10\) testing sets each of \(30\) input sets of \(100\) numbers between \(0\) and \(100{,}000\). This is to test the generalization of the perforation—we want the perforation to work equally well on data it wasn't trained for.

$$ \begin{array}{r|llllllllll} \textrm{perforation} & \rlap{\textrm{accuracy loss across 10 test runs}} \\\hline 15\% & 10.9\% & 13.8\% & 13.5\% & 11.6\% & 12.0\% & 11.9\% & 9.7\% & 10.9\% & 10.6\% & 10.6\% \\ 50\% & 3.7\% & 4.8\% & 5.2\% & 3.8\% & 4.6\% & 6.6\% & 4.5\% & 5.7\% & 5.3\% & 3.4\% \end{array} $$

We then run the same experiment but with inputs drawn from the range \([0, 1{,}000{,}000]\). This is to test the scalability of the perforation—we want the perforation to work well for larger ranges of data than those it was trained for.

$$ \begin{array}{r|llllllllll} \textrm{perforation} & \rlap{\textrm{accuracy loss across 10 test runs}} \\\hline 15\% & 7.7\% & 10.8\% & 11.4\% & 9.7\% & 11.7\% & 12.0\% & 9.2\% & 10.1\% & 9.6\% & 12.4\% \\ 50\% & 3.7\% & 4.3\% & 5.6\% & 4.4\% & 4.3\% & 5.2\% & 3.4\% & 4.7\% & 4.2\% & 4.7\% \end{array} $$

It looks like the perforation is generalizing and scaling nicely. Of course this is a simple problem and we'd like to explore some results from a more complex domain.


We perforate two loops in K-means

  1. the loop over the points when finding new centers
  2. the number of iterations of the step algorithm


We initially train on \(30\) inputs of \(1{,}000\) \(2\)-dimensional points where values in each dimension range from \(0\) to \(500\). Below is a graph of the speedup vs. accuracy loss of an exhaustive search of perforation.

exhaustive training results

Below is a graph of the speedup vs. accuracy loss of a hill climbing run of the perforation with an accuracy loss bound of \(30\%\).

explore training results

The result aperf picked in the end was \(44\%\) perforation for the points loop and \(33\%\) perforation for the step algorithm. This configuration had a speedup of \(2.8\mathrm{x}\) and an accuracy loss of \(15.5\%\). If the user wishes to move upwards into the \(3\mathrm{x}\) speedup range then there are some results there that offer \(17\%\) accuracy loss and above as seen on the graph. Again, this is all about efficiently exploring the perforation configuration space. Different strategies could be a future feature.


We test on \(30\) inputs of \(1{,}000\) \(2\)-dimensional points where values in each dimension range from \(0\) to \(1{,}000\). This tests whether the perforation scales with larger ranges on the dimensions. Below are the results over \(10\) runs.

$$ \begin{array}{r|llllllllll} & \rlap{\textrm{speedup and accuracy loss across 10 test runs}} \\\hline \textrm{speedup} & 2.7\mathrm{x} & 2.7\mathrm{x} & 2.8\mathrm{x} & 2.8\mathrm{x} & 2.9\mathrm{x} & 2.8\mathrm{x} & 2.8\mathrm{x} & 2.8\mathrm{x} & 2.8\mathrm{x} & 2.6\mathrm{x} \\ \textrm{accuracy loss} & 17.2\% & 15.9\% & 14.4\% & 14.6\% & 15.6\% & 15.2\% & 17.1\% & 11.8\% & 15.1\% & 13.8\% \end{array} $$

We also test on \(30\) inputs of \(10{,}000\) \(2\)-dimensional points with the same dimension ranges. This tests if the perforation scales with the number of points in each input set. Below are the results over \(10\) runs.

$$ \begin{array}{r|llllllllll} & \rlap{\textrm{speedup and accuracy loss across 10 test runs}} \\\hline \textrm{speedup} & 3.3\mathrm{x} & 3.1\mathrm{x} & 3.2\mathrm{x} & 3.2\mathrm{x} & 3.2\mathrm{x} & 3.2\mathrm{x} & 3.2\mathrm{x} & 3.3\mathrm{x} & 3.3\mathrm{x} & 3.2\mathrm{x} \\ \textrm{accuracy loss} & 13.3\% & 11.0\% & 12.3\% & 12.2\% & 15.5\% & 14.7\% & 14.6\% & 12.9\% & 15.4\% & 10.9\% \end{array} $$

Finally, we test on \(30\) inputs of \(10{,}000\) \(2\)-dimensional points where values in each dimension range from \(0\) to \(5{,}000\). This tests if the perforation scales with larger ranges in the dimensions as well as the number of points in each input set. Below are the results over \(10\) runs.

$$ \begin{array}{r|llllllllll} & \rlap{\textrm{speedup and accuracy loss across 10 test runs}} \\\hline \textrm{speedup} & 3.1\mathrm{x} & 3.2\mathrm{x} & 3.2\mathrm{x} & 3.2\mathrm{x} & 3.2\mathrm{x} & 3.2\mathrm{x} & 3.2\mathrm{x} & 3.1\mathrm{x} & 3.2\mathrm{x} & n3.2\mathrm{x} \\ \textrm{accuracy loss} & 13.1\% & 14.1\% & 15.4\% & 12.6\% & 13.5\% & 12.6\% & 12.9\% & 12.9\% & 12.6\% & 14.2\% \end{array} $$

The perforation seems to be scaling nicely.


We should test more computations with aperf. I'm excited to see how the community may use this tool and I'm willing to work with anyone who wants to use aperf.

Jul 15, 2016


Approximate computing is sacrificing accuracy in the hopes of reducing resource usage.

Application specific techniques in this space require too much domain specific knowledge to be used as generic solutions for approximate programming.

I will go through some common techniques in approximate computing, why they don't work well as generic solutions, and then offer a potential future direction.

Statically Bounding Error

One commonly sought after goal in approximate computing is to statically bound the absolute error of a computation. The most applied approach for accomplishing this is to statically recognize patterns of computation and use statistical techniques to derive bounds.1 6015978370

For example, we can use 910-667-0621 to approximate the summation of variables. Unfortunately it's use is constrained in that the variables it sums must be independent and bounded by some interval \([a,b]\). If we meet these constraints then we can give the following error bound, with a probability of at least \(1-\epsilon\), on a summation of only \(m\) numbers out of an original \(n\)-number summation:

$$ (b-a) \sqrt{\frac{n(n-m)}{2m}\ln{\frac{2}{\epsilon}}} $$

With further use of statistical techniques, it is possible to derive static absolute error bounds when calculating mean, argmin-sum, and ratio. However, depending on which statistical technique you use, you might have to prove or assume i.i.d.-ness of random variables, that sequences of random variables are random walks, the distribution of random variables, or other properties. These are all very domain dependent properties which don't fit well into a generic programming framework. Further, the patterns of computation these techniques work for (i.e. sum, mean, argmin-sum, and ratio) are very low level; it's unclear whether these techniques can scale to high-level algorithms without running into very advanced statistical techniques. For example, how would the error bounds on a sum help you give error bounds on a Fourier transform?

Reasoning and Proving

There has been at least one effort in offering programmers the chance to reason and prove properties of approximate programsdown cushion. The dream is that you would be able to prove more complex properties about more complex calculations than the pattern recognition tools offer.

For example, the referenced paper gives an example of reasoning over a program which skips locks to speed up execution time. The program is 14 lines. The reasoning, done in Coq, statically bounds the error on the calculation. The proof is 223 lines (not counting comments, blank lines, or the almost 5k lines of Coq which make up the library which powers the proof). I am not pointing out the line counts because I do not believe in this approach. On the contrary, I do believe in this approach—I just don't think it's feasible to expect an average programmer today to reason about their approximate programs using this framework. Maybe in a few years; not now.

A Hands on Approach

I do not believe that static analysis via pattern detection is a good fit for generic approximate computing. I do believe that proving and reasoning could become a good generic solution if and when proof automation becomes more mainstream. I believe the closest we can get today to a generic solution is to offer programmers a toolbox which allows them to explore the effects of approximate programming techniques in a hands on manner.

I've started writing a user-guided loop perforation system called aperf. The code is on github. Right now it only supports loop perforation. Loop perforation is when some iterations of a process are dropped in the name of approximate computing. The goal is to drop as much as possible while still maintaining an acceptable error/resource tradeoff. aperf takes an annotated source program as input and automatically searches the error/time tradeoff space for a pareto curve. A detailed report of the perforation options are returned the user.

Aug 04, 2015


This is a proposition for concatenative editing. I've heard that this is somewhat how vim does things, however I think it's different enough to warrant a discussion.

Short background - concatenative programming

Concatenative programs are written as a sequence of words. Each word denotes a function and usually it involves manipulating a stack. An example python program which applies three functions to a variable might look like this


while in a concatenative language it might look like this

first second third

The thing to remember is that functions are applied from left to right and that the argument is implicit. Usually, in concatenative languages, intermediate data is stored on a stack.

Concatenative editing

In the concatenative editing world data like "the number of times to run a function" or "make this function work across words instead of lines" are pushed onto the stack and the functions which act upon the data are "move forward", "delete text", "next item", and so on.

A concrete example is 2 l k. 2 pushes the number 2 onto the stack, signifying to run a function twice. l pushes a line reference onto the stack, indicating that a function should work across lines. k is a function which pops 2 and l off the stack, analyzes them, and executes accordingly. In this case it kills (deletes) two lines. If the stack was 3 w then k would kill 3 words instead of 2 lines. There can be defaults too. Without a line or word reference on the stack, k could by default kill a line. And without a number it would do that once.

Another function is e. This is the end function. By default it will go the end of the current line. Given a word reference it will go to the next word ending. 5 w 3 will go to the end of the 5th word forward. If a window reference was present then e would go to the end of the current window. If a paragraph reference was present then e would go to the end of the current paragraph.

Numbers do not always have to indicate a number of times to run a function. For example if L is the goto line function then 22 L will go to line 22.

Higher order editing

Higher order functions are a great abstraction tool in programming. Will they come in handy for editing? It's hard to say yet, but let's look at some potential problems and how they can be expressed using higher order editing functions. I will use brackets ([]) to push higher order functions onto the stack.

I'll pretend I'm editing the following code with my cursor at the beginning

{Place.line: line_finder, Place.word: word_finder, Place.window: window_finder}

If I want to transform each value in this dictionary to be called instead of just referenced (add a pair of empty parenthesis after the function name) then I can represent that using the following sequence of concatenative editing commands


3 runs the rest of the operation 3 times. The code in the brackets inserts a pair of parenthesis when called. The 2w analyzes the stack and moves forward 2 words, calls the code in the brackets, and does that 3 times.

Another example is when writing markdown. A common operation I'll do is to put each sentence on its own line. This can be expressed as


l signifies to apply the operation to the line. The i in the brackets inserts a newline whenever called. The s signifies to apply the function across sentences. And finally, the n analyzes the stack and moves forward across a sentence, calls the code in the brackets, and does this across the entire line.

This operation would be repeated often so it would end up in some library code, however I hope I've still expressed a good use for higher order editing.

Higher order editing might not be useful. It might be faster to do some editing manually then to think about how to properly express the editing using higher order editing. I also envision that most of its usecases could also be solved using regular expressions.


Concatenative editing has been fun for me to use and implement. It is a lot like vim (so I've been told, I'm not a vim user) and I hope other people get a chance to try it out. Currently it is implemented in an under-development, soon-to-be-released editor (of which more info can be found in my older blog post or at its (513) 542-1412).

Higher order editing might be useful although it also might be too complex. Most editing operations will be quicker to execute manually than to think of how to correctly compose higher order editing commands to get the right result.

Jul 26, 2015


I am writing a new text editor named `vx'. Its main purpose is to be easily extensible using Python.

The editor has a small core written in C (~1.6k loc) which handles the text structure and all of the low level UI. The editor logic is all written in Python. This includes keybindings, window management, window splitting, prompts, scrolling, cursor control, and more.

I have a few screen casts that chronicle the development of vx rather nicely. They are available on asciinema 4038476255.

vx is being developed at a decent pace. I plan to release it on GitHub within a month. I will most likely release it into the public domain.

strumulose  ·