Friday, August 27, 2010

Old blog: Map in Java

(Found in my years old defunct blog. Less relevant now that Java is getting lambdas.)

Scheme
(map fn lst ...)

;;  ==> list
      

map is a higher order function that applies the function fn of n arguments to each element in the lists lst ...

Java has no map. Enhanced for isn't adequate for the following reasons:

  • No return value: Since for is a control structure, not a function, it doesn't return anything useful. This means a data structure has to be created beforehand, and added to at each iteration of the for loop.
  • No function arguments: This is not for's problem. Functions (methods) are not first class in Java.
  • One iterable only: The enhanced for loop can only iterate over one Iterable data structure. map can operate on any number of lists.

Examples

Simple one list mapping

Scheme
(define squares (list 1 4 9 16))

(define (sqrts nums)
  (map sqrt nums))

(sqrts squares)

;;  ==> (1 2 3 4)
      

Java
/* No generics for clarity.  Assume also the existance of a static
   method `toList' that takes an Array and returns an ArrayList. */

List squares = toList(new Integer[] {1, 4, 9, 16});

List sqrts(List nums) {
    List out = new ArrayList();
    for (Integer i : nums)
        out.add(Math.sqrt(i));
    return out;
}

sqrts(squares);

//  ==> [1, 2, 3, 4]
      
Multiple list mapping

Scheme
(define even (list 2 4 6 8))
(define odd (list 1 3 5 7))

(define (sums . lists)
  (apply map + lists))

(sums even odd)

;;  ==> (3 7 11 15)
      

Java
/* Assume the existance of a static method `isSameLength' that tests
   whether all Lists in an array are the same length. */

List even = toList(new Integer[] {2, 4, 6, 8});
List odd = toList(new Integer[] {1, 3, 5, 7});

List sums(List... lists) {
    List out = new ArrayList();
    if (!isSameLength(lists))
        throw new RuntimeException("Lists must have same length.");
    int numargs = lists.length;
    int length = lists[0].size();
    for (int i = 0; i < length; i++) {
        int val = 0;
        for (int k = 0; k < numargs; k++)
            val += lists[k].get(i);
        out.add(val);
    }
    return out;
}

sums(even, odd);

//  ==> [3, 7, 11, 15]
      

The problem with this is that it isn't generalised.

Map in Java

First we'll need a something that acts as a first-class function.

Java
public interface Proc {

    public Object apply(Object... args);

}
      

Next, a static map method.

Java
public static List map(Proc proc, List... lists) {
    List out = new ArrayList();
    if (!isSameLength(lists))
        throw new RuntimeException("`map' requires lists of identical length");
    int numargs = lists.length;
    int length = lists[0].size();
    for (int i = 0; i < length; i++) {
        Obj[] args = new Obj[numargs];
        for (int k = 0; k < numargs; k++)
            args[k] = lists[k].get(i);
        out.add(proc.apply(args));
    }
    return out;
}
      

Now, the multiple lists example again.

Java
List even = toList(new Integer[] {2, 4, 6, 8});
List odd = toList(new Integer[] {1, 3, 5, 7});

Proc sum = new Proc() {
    public Object apply(Object... args) {
        return (Integer)args[0] + (Integer)args[1];
    }
};

List sums(List... lists) {
    return map(sum, lists);
}

sums(even, odd);

//  ==> [3, 7, 11, 15]
      

Much shorter, but with some limitations:

  • It doesn't use generics, and therefore
  • requires casting in Proc implementations; and
  • it is hardcoded to return an ArrayList.
Parametrized map

First, Proc is changed.

Java
public interface Proc1<R, A> {

    public R apply(A arg);

}

public interface Proc2<R, A, B> {

    public R apply(A argA, B argB);

}

public interface ProcN<R, A, B,..., N> {

    public R apply(A argA, B argB,..., N argN);

}
      

And a parameterized map.

Java
public static <R, A, X extends List<R>> X map(Proc1<R, A> proc, X out, List<A> list) {
    for (A a : list)
        out.add(proc.apply(a));
    return out;
}

public static <R, A, B, X extends List<R>> X map(Proc2<R, A, B> proc,
        X out, List<A> listA, List<B> listB) {
    if (!isSameLength(listA, listB))
        throw new RuntimeException("`map' requires lists of identical length.");
    int length = listA.size();
    for (int i = 0; i < length; i ++)
        out.add(proc.apply(listA.get(i), listB.get(i));
    return out;

// And so on up to `map<R, A, B,..., N, X>.
      

The example would now be:

Java
List even = toList(new Integer[] {2, 4, 6, 8});
List odd = toList(new Integer[] {1, 3, 5, 7});

Proc2<Integer, Integer, Integer> sum = 
    new Proc<Integer, Integer, Integer>() {
        public Integer apply(Integer argA, Integer argB) {
            return argA + argB;
        }
    };

List sums(List listA, List listB) {
    return map(sum, new ArrayList(), listA, listB);
}

sums(even, odd);

//  ==> [3, 7, 11, 15]
      
The cost of parameterizing

The obvious limitation of the generic implementation is that arbitrary numbers of arguments are no longer supported in ProcN or map.

Reduce

reduce is easily written on top of Proc2.

Java
public static <R, A> R reduce(Proc2<R, R, A> proc, List<A> list, R initial) {
    int size = list.size();
    R out = initial;
    for (A a : list)
        out = fun.apply(out, a);
    return out;
}

List nums = toList(new Integer[] {1, 2, 3, 4});

Integer squareSum(List list) {
    return reduce(new Proc2<Integer, Integer, Integer>() {
        public Integer apply(Integer a, Integer b) {
            return a + (b * b);
        }
    }, list, 0);
}

squareSum(nums);

//  ==> 30
      

Friday, August 20, 2010

Why I like Factor

Factor is a relatively new language and implementation in the tradition of Forth, Lisp, and Smalltalk. Like Forth, Factor is concatenative and uses a postfix syntax. Also like Forth, Factor emphasises small procedures and constant refactoring. Like many Lisp and Smalltalk implementations, Factor compiles code into a loadable image. Like Lisp, Factor can perform arbitrary computation at compile time using macros while parse time evaluation allows the creation of new syntax forms.

Unlike Forth, Lisp, and Smalltalk, Factor is modern and unencumbered. Lisp suffers from an ossified specification, Smalltalk has lived inside its own image for so long it's out of touch with the rest of the world, and Forth provides few of Factor's higher level features like garbage collection and dynamic code updating.

Factor’s focus on correctness and efficiency is not commonly found in other modern ‘dynamic’ languages. The results speak for themselves.

1. Factor vocabularies are compiled to machine code. When using the interactive listener the code is compiled before execution (like SBCL).

2. Factor is fast. Not C fast, but SBCL fast.

3. Almost all structures are modifiable at runtime. This includes classes at any position in the hierarchy as well as FFI bindings. When reloading a modified vocabulary, definitions no longer present will be deleted from the running image. These changes are propagated to all dependent code and the changes to the running image are kept consistent.

4. A simple foreign function interface. No quirky header files or IDL necessary; everything is done from within Factor. No reloading of Factor necessary either: create bindings incrementally and test them as you go.

5. Deployment images. Factor’s deployment tool only loads code the resultant binary will run. Minimal image size for a hello world program is a few hundred kilobytes.

6. Common Lisp style condition system. Don’t let stack unwinding lose an exception’s context. Recover anywhere between where the exception is caught and where it was thrown.

Why having a competent designer pays off

In contrast to other popular languages like Ruby and Python, Factor does not suffer from limiting creeds like, “There should be one way to do it,” or from BDFLs who don’t see the value in useful language constructs discovered as early as the 70s. Here's the payoff...

7. Correct lexical scoping. Usually in Factor one uses its postfix syntax and data flow combinators, however it also sports a locals vocabulary defined entirely in factor which implements lexical scoping and lambdas. Despite the fact that this is an addition in a language where lexical variables are not the default, Factor's lexical scoping is correct and its lambdas are uncrippled. Python 2.x is unable to rebind a variable in a parent scope (3.x overcomes this by introducing a new keyword, ugh!), and Ruby only recently (1.9) gave blocks their own scope.

8. Usable higher order functions. Combinators are used in Factor all the time. For some unfathomable reason Ruby allows only one block argument per function while in Python the use of such esoterica as map and reduce is discouraged.

9. Tail call optimization. Yep, that’s right, that helpful recursion thingy. Another thing Python doesn't have because, oh I don’t know, ask Guido. Come to Factor and free your mind from loopiness!

10. Low, hi, and FFI. Factor scales well from highly abstracted garbage collected code, to micro optimization using inline ASM. Along the way, the FFI allows Factor quotations to be used as C callbacks, and provides factor-side memory allocation should you need to store values on the stack or in a memory pool.

11. Methods are orthogonal to classes. Generic words and their methods are defined outside classes. Adding a generic word to a preexisting class is as simple as defining it. Extensibility without nasty monkey patching or name clashes, and a nice fit with a hierarchy that can contain anything from tuples, to predicate classes and C structs. No hideously bloated base-class needed (I'm looking at you Smalltalk). If Programmer Pooh adds a display-in-opengl method to object he need not modify core code, nor will his method clash with any present or future methods on object.

Why having a smart community pays off

Smart, or at least willing to learn new things.

12. No whiners! Try educating a bunch of Java programmers in the utility of higher order functions or tail-recursion...

13. Macros, syntax, and combinators, oh my! If you don’t whine, you get cool stuff. Macros akin to those in Common Lisp, but hygienic like Scheme, because if you don’t have variables, you can’t capture them. Syntax words like Lisp’s reader macros, but better because no dispatch character craziness is necessary. Observe the usefulness of regular expressions that are syntax checked at parse time, or the EBNF vocabulary which compiles an inline EBNF description into parsing code. For higher order goodness check out Factor’s unique dataflow combinators. This is not your ancestor’s stack shuffling!

More stuff I like

14. Live documentation. Factor has live documentation ala Emacs, but prettier and better linked. The documentation is contained in separate files to the code it describes. For those of us who hate hunting for scratchings of code amid screeds of API comments, this is a good thing.

An acceptable Lisp?

Someone once wrote about Ruby being an acceptable Lisp. They were wrong. Ruby doesn’t have macros, reader macros, native compilation, or a REPL from which everything can be modified. (Oh, but it does have three different kinds of lambda!)

Factor isn’t an acceptable Lisp either. Factor is a mighty fine Lisp. Factor is better than Lisp. It has all the things that make Lisp great and more. Factor will make your code beautiful. Factor will cook you breakfast. Factor reads like English, (lisp (like (not))). All hail Factor!

Summation

I like Factor because it hits the sweet spot of pandering to nothing but being a great language. It lifts much of the good stuff from great languages of yore and gives them an improved spin. It hasn’t yet succumbed to the whining hordes of mediocrity. It is written by a team that really know what they are doing. If that doesn’t describe your language, then give Factor a try.