Category programming

Advent of Code, Day 7

Did this yesterday but didn’t write it up. This was a complete pig, because the test data set had all the dependencies in order: when read line by line, there were no data which relied on the output of data which hadn’t been read yet. Naturally I assumed the real data would be like that, but no. Which involved rewriting the whole damn thing.

The data is in the form

and the problem is to find the ultimate value of a variable, in this case a.

Naturally, we need to parse the input. I’ve written parsers for Angort before, but I thought I’d have a go at writing a combinatorial parser framework (for the lulz, as the young people said a few years ago). I’ll gloss over the framework for now, but the code to parse the basic elements looked like this:

Each parser is a function, and there are functions to wrap parsers inside other parsers and combine them in various ways to produce new parsers (hence “combinatorial”). For example:

creates a parser which consumes the regular expression “[0-9]+”. The next bit,

takes that parser, and modifies it to change its item into an integer.
The last bit

modifies it further to label its results items with the symbol `number, before saving it to the variable NumberParser.
We also have choice parsers:

generates a parser which accepts either identifiers or numbers, and labels the result as an “rvalue” – C jargon for a value which cannot be assigned to (essentially).

Now we create the command parser. This takes the elements above, using seq to make parsers which accept sequences of those elements, and uses choice to make a parser which can accept one of those. Each sequence corresponds to a command. We’re parsing line-by-line rather than the whole file, to make life easier, and so we want to do something when a command is accepted. To this end, each command sequence parser is modified with action, which takes a function to be performed on acceptance:

Finally, if no other thing matches in the choice, anything will always match, consume all remaining tokens and perform an action.

So what are the actions? These set values in a hash called Wires. Each entry is keyed by the name of the variable it assigns, and the value is a list consisting of an operation and the names of the variables it is to be performed on. Possible values are:

Let’s take a closer look at the binary operation parser:

The parser accepts a sequence consisting of an rvalue, a choice of some strings (the binary operation names), another rvalue, an arrow, and a variable name. Its action is to set the variable name to

in the hash. First, the action takes the incoming parsed item and runs itemdehash, which turns it into a considerably simpler kind of hash/list combination. It then runs dehashlist to simplify the result still further, so all we get is a list of the values read in. This process is necessary because everything the parser generates is in the form of parser items, which are hashes containing the item label and its content, and which are messy to work with directly. Now this is done, the action can simply assemble a list out of the new data, and then set that value into the Wires hash with the key also obtained from the list.

Now we need to use this hash to solve the problem. I elected to use the dumb method. A function called resolve repeatedly scans the hash, running an appropriate function for every variable it finds whose value is not known. Known values are stored in the Vars hash. Each subfunction looks to see if the variables it requires are known, and if so calculates and stores the result. This resolve function repeats until no more passes are required – all variables had known values.

Here is the relevant code:

Finally, we need to read the file in line-by-line, tokenising each line (splitting on white space) and parsing it. We then need to resolve the variables, and print the one we want:

Advent of Code, Day 6

Considerably more hairy. I could have used the same trick as Day 3, faking up a 2D array with a hash, but it would have been ludicrously slow. Instead, I added a C++ class for a 2D int array to the library I started in Day 4.

First, we need to write the class. This needs to be tracked for garbage collection, so we make it a subclass of GarbageCollected. This code goes after the %shared line, so we’re in the library’s namespace:

Very simple, mapping a 2D array onto a 1D one. I’m not even doing any bounds checking (yes, this is bad and wrong of me).

Now we need to define a type object, which will be a singleton to handle the interface between Angort and C++ for this type. This is a simple one, with no iterators because it’s not a container (well, it is, and I could have added an iterator, but I didn’t):

The constructor adds the type to the type system. The get() method takes an Angort value, typechecks it, and returns the underlying pointer to the Int2DArray if it’s correct. Otherwise it throws an Angort runtime error. The set() method is used to create a new array and set an Angort value to it. It clears any previous value (decrementing any refcount and perhaps deallocating data), sets the value’s type, creates and sets the array, and finally increments the reference count on the value.

Finally, we create the type object singleton on the heap, and use a directive to tell the preprocessor about the new type so we can use it in word definitions:

That second line says that the typecode int2d is associated with the type object tInt2D and wraps objects of the type IntArray2D.
Having done this, the preprocessor can automatically unpack parameters of our new type in words we define. The first word we can write is that to create a new array:

This takes to integer arguments (unwrapped to p0 and p1), and uses them to create a new array using our type object. This array is set into a new value pushed onto the top of the stack.

We can now write words to access the array:

In both these words, the definition uses the marker A – this means the first typecode after the bar, which is the int2d we set up in the %type directive. Thus, in int2dget, the arguments are two integers and a 2D int array. Parameters p0 and p1 are integers, and p2 will be an Int2DArray.

That’s all the new C++ code, and when this library is loaded in we’ll have our new garbage-collected 2D integer array.

The Angort code for part 1 is fairly simple. We start with loading the libraries, and creating our new array (storing it in the global B):

Now some words to access our global array, making life easier:

The first word, s, sets a value in the global, taking coordinates and a value to set. The next word looks odd because we’re not explicitly using a parameter list – the x,y coordinates are assumed to be on the stack so all the word needs to do is push the array and call the C++ getter word.

Now to write a more involved function (the terms ‘function’ and ‘word’ are interchangeable in Angort – ‘word’ comes from Forth). This function op will take a range of coordinates and run another function on all of them:

The called function f is assumed to take x,y on the stack.

Now we can write a function to take a line of text and process it:

Ugly. We split the line by spaces and/or colons using a pre-compiled regex, and then look at the first one or two elements to determine the command. Each case sets the function to one of three anonymous functions, and sets the split point from which to to get the numeric arguments (different for “toggle” and “turn on/off”). After the cases, we slice the string using the split point we stacked, convert and push the numeric arguments, and call op with the function we set. There are far better ways to do it.

We now run this function on all lines in the file, and count the number of non-zero elements in the array afterwards:

Part two involves modifying the anonymous functions in the cases, and adding a new function to clip to zero (just for ease of reading):

Runtimes aren’t great, 1m8s for part 1 and 1m47s for part 2. Angort is a silly language to do this sort of thing in. I could have made it a lot faster by writing the rectangle set/get/toggle operations in C++, of course.

Advent of Code, Day 5

This was pretty quick – Angort has POSIX regular expressions, so it was just a matter of working out the right ones (and counting the resulting matches). Here’s the code for both parts:

The only vaguely interesting bit is in part 1, where I map over a list of strings at compile time (that’s what << >> does) to produce a list of regexes to match, which I then iterate over.

Advent of Code, Day 4

This one’s a bit hairier in Angort, because I don’t have an MD5 facility. I could either reimplement MD5 in Angort, or use someone else’s C/C++ code and make a library. I decided to do the latter (because I’m not completely insane).

Writing a library in Angort is easy. First, I downloaded a C implementation of MD5 from here. I modified the include file so that I’d be able to use the code from C++:

Note the new extern "C" {} blocks, guarded by #ifdef. When included from C++, this will tell the compiler these are C functions.

Now the library code. Angort has a handy Perl script called to preprocess special directives in C++, making libraries easy. We write a file called advent.cpp using these directives. Here’s the first part:

Include a few files, specify we’re going to use the angort namespace, and then two directives: one giving the Angort namespace the library functions will be loaded into, and another saying we’re making a shared library plugin. Then the actual code for a new Angort function, advent$md5:

The first line specifies a word (Angort function) with arguments. The word is md5, and it has a single string argument (the “s”). Anything following that is a text description, and these typically start with a “stack picture” in brackets. Here, the stack picture specifies that before the word runs there should be a string on the stack, and afterwards there will also be a string on the stack (the result).

The code generated by the directive will pop the arguments into parameter variables p0,p1… and so on. In this case, we have a single string parameter, p0, declared as a const char *. We get its length and declare an output buffer out. Next comes the code which generates the hash into the output buffer. Then, we build a hex string using the ugly and evil sprintf(buf+strlen(buf),...) thing, which you should never do, but it’s OK here because looking at the code I can guarantee it’s always going to be exactly 32 hex digits. Finally, we push that string onto the Angort stack, which will return the value from our new word.

One last thing is an initialisation function, which in this case doesn’t do much – just prints a string giving the build date and time:

This isn’t required, but I always add one of these just to make sure I’m running a recent version of a library.

We need to compile this code. I’ve got a simple build script which looks like this:

It generates advent.plugin.cpp using the script, and then compiles both this and the MD5 file, specifying -fPIC. This tells the compiler to generate position independent code, which is required for shared libraries. Finally it links the files into a shared library called advent.angso, where .angso is the standard extension for an Angort plugin.

We can test the plugin with a simple one-liner on the Angort command line (assuming we build the library in a lib subdirectory:

Looking at the output from a web-based MD5 generator, this is fine.

Finally we can write our program. This is a simple one liner, which keeps a count on top of the stack. The value is concatenated to our key string (which produces a string) and MD5’d, and the first five characters of the result are compared with “00000”. The infinite loop exits when the the result is true, and then we print the count, on top of the stack:

We have to remember to include the library, either by invoking Angort with angort -llib/advent or putting this line in a script with "lib/advent" library drop at the top.
The only difficult thing about this is the use of over. This is an occasionally useful stack manipulation word, stolen from Forth, which has the stack picture “(a b — a b a)” : that is, it takes the value underneath the stack top and pushes a copy of it onto the stack.

Part two is just as easy, just changing the program to

On my pretty slow laptop, the 5-digit run takes 2.6 seconds, and the 6-digit run takes 76 seconds. MD5 isn’t quick.

Advent of Code, Day 3

Slightly tricker, but still easy. Here, I keep a hash of visited houses – for my own amusement, the “mark” function increments the count so I can see how often a house was visited. I convert the input into a list of chars using identical code to before, and map over this list with a hash get to produce a list of [deltax,deltay] tuples. I then just iterate over this, updating x and y and marking the hash accordingly (the hash keys are strings of the form “x/y”; ugly, but Angort can’t key a hash on a list).

The second part is a little more complex, because it involves using alternate entries in the list to update different pairs of coordinates, marking both. I could have done this with a boolean toggle, but instead used an explicit iterator object.

Advent of code

I don’t know whether I’m going to have time to do all these, but I thought I’d try at least the first few puzzles of Advent of Code in my pet language, Angort.

Here’s day 1, which involves counting parentheses. I do the first part by making a list of the characters in the input, mapping that list to get -1 or 1 from a hash of characters to integers, then summing the list. The second part is done by going through that same -1,1 list and keeping a running count.

Day 2 is a bit more involved. First I have a couple of functions – one finds the minimum of a list of numbers, the other finds the minimum of a list of lists [a,b] such that the result has the minimum a*b (i.e. if we pass in the sides of faces of a cube, we get the smallest face out). I then make a list of inputs, each of which is a list of 3 integer elements (length,width,height). Then I go through the list of lists, accumulating the first result; and then again accumulating the second result.

Angort joy

Yes, it’s my own language, and I use it largely out of bloodymindedness. But the more I use it, the more I find it’s actually bloody good. Here’s an example.

I’m working with a homemade dataglove (and perhaps more on that later) which requires a bit of postprocessing on the input. For example, it’s useful to smooth the accelerometer values, and do edge detection on the “buttons” (when the fingertips touch the thumb).

In Angort, we can write a general purpose “smoother” or low-pass filter which does

x_t = (1-k)i + kx_{t-1}

(i.e. the output is k times the last output plus 1-k times the input) by writing a function to make smoothing functions:

:mksmoother |factor:prev| 
        ?prev isnone if ?val!prev then
        ?val 1.0 ?factor -  * ?prev ?factor * +
        dup !prev

It might look a bit hairy, but it’s pretty straightforward once you’re used to the syntax.
Whenever we need to smooth an input, Angort allows us to create smoothers inline at compile time using the << >> syntax:

dataglove$x <<0.9 mksmoother>>@ !SX
dataglove$y <<0.9 mksmoother>>@ !SY
dataglove$z <<0.9 mksmoother>>@ !SZ

Each of these lines reads the accelerometer value for an axis, runs it through a smoother with a smoothing constant of 0.9, and stores it in another variable. Each smoother has its own state, through the joy of closures.

Similarly with edge detectors:

:mkposedgedetect |:prev|
        ?i ?prev not and if 1 else 0 then
        ?i !prev

    # use positive edge detectors to spot the button down.

        dataglove$ismiddledown &lt;&lt;mkposedgedetect&gt;&gt;@
          if <code>middle case
        dataglove$isindexdown &lt;&lt;mkposedgedetect&gt;&gt;@
          if </code>index case
        dataglove$isringdown &lt;&lt;mkposedgedetect&gt;&gt;@
          if <code>ring case
        dataglove$islittledown &lt;&lt;mkposedgedetect&gt;&gt;@
          if </code>little case
        none otherwise

Here we use the case syntax, which is more like if..elseif..elseif..else in other languages.

I might be wrong, but it strikes me that using and creating these anonymous functions inline in this way would be quite tricky in most other languages.

A little bit of Angort

Those of you who know me will know that I have a tendency to expand all my projects until they contain a crap programming language. It’s just common sense – I like my stuff to be infinitely reconfigurable, and the only way to guarantee the “infinite” part is to make it Turing-complete.

Also, programming languages fascinate me, and I don’t trust them. How does garbage collection work exactly? How does lexical closure work? As an old-school low-level programmer, I had to have answers to these questions before I could work in the languages with these facilities. So one on-going project over the last couple of years has been Angort, a language which does those things.

In fact, I wrote the code for my final year project in Angort. I kept thinking this was silly, and tried to write it in C or C++ a few times, but it got really messy. It was so much easier in Angort.

What Angort does

  • Fully garbage collected: minor garbage collection works by reference counting, while major garbage collection checks for cycles. If it’s good enough for Python, it’s good enough for me.
  • Functions are first-class objects.
  • Full lexical closure and anonymous functions.
  • A fair few functional programming facilities.
  • Hash tables and lists.
  • Plugins are written as shared libraries.
  • No dependencies (although the command line interpreter won’t be fun without GNU readline, and the optional plugins, provided as examples of what can be done, do need various libraries)
  • Pretty small executable – about 200K

What Angort doesn’t do

Parsing is boring (and I’d already written an earlier language, Lana, with lots of syntax), so Angort has a very simple – almost nonexistent – parser, in that 90% of the time each token encountered is directly compiled to an instruction. Angort is therefore a postfix language, rather like Forth or Postscript. This has disadvantages: notably that bugs are pretty easy to write, and code can be hard to read unless you’re used to it. But it does have some advantages too: understanding the compiler is easy, very little optimisation is required (which is good, because Angort doesn’t do any) and the code is incredibly compact.

As I said, Angort doesn’t optimise the bytecode it generates. This means you may end up writing some pretty suboptimal code, unless you’re smart – you’ll need to eliminate common subexpressions and fold your own constants.

Quick examples

Here’s a program to print the sum of all the squares of all the numbers from 1 to 100:

This does the following:

  • Stack the number zero
  • Stack a range from 1 to 101 (not including the 101)
  • Iterate over this range, doing:
    • stack the current item
    • duplicate it
    • multiply the top two stack items (effectively squaring)
    • add to the top item on the stack (initially the zero we put right at the start)
  • Finally, print the accumulated sum.

We could also have done it this way:

which works like this:

  • Stack the number zero
  • Stack a range from 1 to 101
  • map the range through the anonymous function (dup *) (which squares the number passed in) producing a list of all the squares
  • Use the functional programming technique “reduce” with the initial zero as the accumulator, the list we already have, and the anonymous function (+) to combine pairs of items
  • Print the result, which is the sum of all the items in the list.

And, of course, you can write functions. A lot of Angort is based on Forth, but Angort makes life much easier by providing straightforward parameters and local variables:

The local definition block, delimited by vertical bars, describes the parameters (before the colon) and the locals (after the colon). Internally, parameters are exactly the same as locals. The only difference is that parameters are popped off the stack and stored in the variables before the function runs. We can access variables by using ?name to get a variable, and !name to store.

Values are returned from functions simply by leaving them on the stack.


Big example

So now I’m going to show a fairly big example. This is a graphical program, using the fairly crude interface to SDL I knocked up over the weekend. It generates about 1500 particles (little square blobs) and explodes them out from the centre of the screen at random velocities, bouncing them off the screen sides. I’ll go through it bit by bit to demonstrate the fun.

First we’ll import the SDL plugin.

The sdl stacks the symbol "sdl" - a symbol is a single-word string which is manipulated internally as an integer for efficiency - this makes hash table indexing much faster. The library word then searches for a library of that name, loads it, and then stacks a reference to the namespace created. Then import will import that namespace, so we don't need to refer to the stuff defined inside it by putting sdl$ in front of everything.

Now we've got SDL loaded, we can start work. First we'll turn full garbage collection off - normally this happens every 100000 or so instructions, but we're going to be processing a lot of GC-able objects. This is a graphics demo, so even an imperceptible delay is undesirable. We'll just set the autogc interval to a negative number to disable it:

Now we define a function (often called a word, after the Forth terminology) called frand. This generates a random floating pointer number from 0 to 1. The first line ends with a comment (starting with a hash). This comment is in the form of a stack picture: this commenting convention shows in brackets the stack both before and after the function is called, separated by a long dash.

The word itself takes a large random integer, finds it mod 50000, divides it by 50000 (converting to a float in the process, because it's dividing by a float) so giving a random number from 0.0 to 1.0, and then multiplies that by the number that's still lying around on the stack from being passed in.

Now we just define a couple of constants:

Now for a couple more words. The first takes a parameter len and uses it to generate a random vector of that length:

It does this by:

  • stacking a random angle from 0 to TWOPI
  • duplicating this on the stack
  • finding the cosine of the angle and multiplying it by len
  • swapping the top two stack items (so we now have the other copy of the angle on top)
  • finding the sine of the angle and multiplying it by len

so we end up with the sine and cosine of the angle, each multiplied by the length, on the top of the stack. This is probably the hairiest piece of code in the entire example, because I'm being a bit clever with the stack to get more speed.

Now we write a word to generate a random colour, in the form of a list containing three integers from 128 to 255.

First we start the word definition, this time specifying no parameters but a local variable called f. We then define an anonymous function which generates a value from 128 to 255, and store this in the local variable. Finally, we build the list by calling this anonymous function three times - ?f@ fetches the value of f (using ?f) and then runs it (using @):

The square brackets delimit a list. Actually, it's cleverer than that - the [ word puts a new list on the stack. Then both , and ] put the topmost stack item onto the list below it on the stack, leaving the list there. There are a few little bits of hackery - the close bracket is ignored if it immediately follows an open bracket, and it also works with hash tables as we'll see below.

Now we have the most complex piece of code. This is a function called mkthingy, which returns a new "object" which flies around the screen. This object is in the form of a hash table.

First, we start the definition. No parameters, but three local variables: this, dx and dy.

Now the code. We generate a random number from 0 to 2, and then square it. Then we generate a random vector of that length and store the vector in dx and dy:

Now we put a new hash on the stack, ready to add things to - but first we copy the hash reference into the this local, ready for use later. We remember to leave a copy behind on the stack so we can still add things!

Next we add the "object's" fields' initial values to the hash - this consists of stacking the key (always a symbol here) followed by the value, and using a comma operator to write to the hash. These values consist of its position at the centre of the screen (determined from the screen size globals we'll set up on initialisation) and the random vector we generated earlier, which will be the velocity. We also set a random colour using the randc function.

Now we add an anonymous function to the hash. This is a method which will move the thingy by its velocity, and reflect it off the sides of the screen. Anonymous functions are delimited by parentheses and can take parameters and local variables - in this case we have the locals x and y.

The first thing the function does is add the x member to the dx member and store the result in the x local.

A few things to note: firstly, we can get access to a value in a hash table which has been stored using a symbol key using a bit of syntactic sugar: ?symbol. So here, ?this?x means "get the local variable this, then get the value of x stored within it (assuming it's a hash)".
Secondly, the this local variable is defined in the context of the outer function mkthingy -  it will therefore be stored in a closure, so this anonymous function will remember its value when the outer function exits. It contains a reference to the hash itself, so this "method" always manipulates the right "object." (As an aside, this effectively makes this anonymous function akin to a C# delegate: it contains both a reference to an object and the method to call on it. Useful!) We repeat the process for the Y coordinate:

Now we do a little bit of maths, to check to see if the new coordinates are going to out of range next time round. If they are, we negate the appropriate component of the velocity.

The "if...then" syntax in Angort is a little odd: rather than the more traditional if (condition) then (code) endif, Angort has (condition) if (code) then.

Finally, we write the local copies of the coordinates into the hash:

Now we add another method, or delegate, to the hash. This one draws the item.

Firstly, it iterates over the colour member (which is a list) to unpack the three items (r,g,b values) onto the stack. It then calls the SDL plugin's col function, which will set those values as the foreground colour.

Then we simply draw a 10x10 rectangle in the current foreground colour at the x,y position stored in the hash.

That completes the hash, and also the mkthingy function, which returns the new hash on the stack:

Next we come to the event handlers. First, we define what happens when a key is pressed:

Here, we have defined an anonymous function which takes a single argument x. If this is equal to the ASCII value of "q" (for "quit"), then we call the done function in the plugin, which will end the main loop. We then call onkeydown in the plugin, which sets the callback to call when a keydown event occurs.
Similarly, we define the draw callback. This will clear the screen, move and draw all the thingies in the global list Thingies, and then flip the front and back buffers:

Note how we call the methods: i ?move @ (to write it with spaces) will get the current hash (Thingies will be a list of hashes), fetch the `move value from it (which will be a method/delegate/closure or whatever) and call it.
Now we can initialise SDL, opening a full screen application at whatever the current resolution is:

We store the screen width and height (minus 10 so we don’t have to do that calculation in each move call) in globals called Width and Height:

Now we do a little bit of clever trickery with lists to make a list of 1500 thingies, storing it in the global Thingies:

Finally we start the main loop, and when it exits we close SDL and quit Angort:


Angort is available from this link here. The documentation is incomplete (in some places hilariously so), but there is some. To install it:

  • clone the repo and cd into it
  • mkdir build; cd build; cmake ..; make
  • then (if that works) run sudo make install to install Angort into /usr/local
  • if you want the plugins, make sure you have the required dependencies:
    • IO plugin has no dependencies and is probably the most useful example
    • SDL plugin (which the code described above uses) requires libsdl and libsdl-image development libraries (available as libsdl1.2-dev and libsdl-image1.2-dev packages in Ubuntu)
    • and the two plugins in MPC, which control Music Player Daemon and allow ID3 tagging of MP3 files, require libmpdclient and libtag. You probably won’t want them; I use them to control my music system.
  • to build a plugin, just cd to angort/plugins/(plugin name) and run ./build (once you’ve done the install make). This will build the plugin’s .angso file in the angort/libs directory. Running sudo make install from the angort/build directory again will install any plugins there into /usr/local/share/angort

UPDATED 23/7/14: Because of a recent language change (the self keyword for recursive anonymous function definitions), the local variable previously called self is now called this.



A good evening

  • Lana works again (i.e. it doesn’t cheerfully move the bytecode around in memory while you’re trying to run it)
  • I’ve implemented the Hough transform and the inverse Hough transform
  • The following ImageLana code runs:

    And I get the following images out:

    An initial test image

    The initial test image

    The Hough transform of the image

    The Hough transform of the image

    The inverse Hough transform

    The inverse Hough transform of the previous image

Moving part of a Subversion repository

I keep everything in Subversion. Lecture notes, bits of writing, everything. Including bits of miscellaneous code, the prototypes of which sometimes blossom (or metastasize, depending on your point of view) into full-sized projects.

To this end, I have a “miscellaneous code” repo with subdirectories for each new thingy. Most of these are very small, but sometimes they get large or I might want them to become public, in which case I have to turn part of this repo into a whole new repo.

To do this, I need to:

  • dump the repository
  • filter the repository to include only the bit I want
  • change paths
  • populate a new repo with the new data
Let’s assume we have a local repository (the only kind you can dump) in svn/misc_code and we want to filter for our project barry. import the whole thing into a new repository as trunk.
First, we do steps one and two:

This will give you a huge file you can now perform step 3 on : editing the paths. In our case, we want to change everything that was in the node path barry to be in the node path trunk. Just open the file and do a global replace of


Now you can create the new repo and load the data into it:

Also, pretty much none of this will work with binary data because of the hell that is md5 checksums being screwed up. But for text-only repos, it should be fine.

Copyright © Found
Jim Finnis' personal blog

Built on Notes Blog Core
Powered by WordPress