Friday, April 27, 2012

IFn vs __call__

I'm going to coin a phrase: "Those who don't understand the work of Rich Hickey are doomed to reinvent it, poorly". I have found this to be true so many times, I should write it on a post-it note and stick it on my workstation.

Unlike the CLR, the Python VM is a very different beast from the JVM. For instance, in Java there is no way to make an arbitrary object act like a function. To code around this restriction, Clojure implements IFn, which looks something like this:

interface IFn
{
    object invoke();
    object invoke1(object a);
    object invoke2(object a, object b);
    object invoke3(object a, object b, object c);
    etc....
}

At compile time, Clojure then figures out which invoke to call and dispatches accordingly.
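In Python, the same interface might be sketched like this (a rough illustration using the post's invokeN naming, not Clojure-Py's actual source):

```python
class IFn(object):
    """One method per arity; a concrete fn overrides the arities it supports."""
    def invoke0(self):
        raise NotImplementedError
    def invoke1(self, a):
        raise NotImplementedError
    def invoke2(self, a, b):
        raise NotImplementedError
    # ...and so on, up to some fixed maximum arity

class Adder(IFn):
    def invoke2(self, a, b):
        return a + b

# the compiler, knowing the arity at the call site, can emit invoke2 directly
assert Adder().invoke2(1, 2) == 3
```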

However, no such restriction exists in the Python VM (hereafter the PVM). Instead, Python defines the __call__ method:

class MyObject(object):
  def __call__(self, arg):
     return arg + 1

assert(MyObject()(1) == 2)

For quite some time now, Clojure-Py has handled multiple-arity functions thusly:

def foo(*args):
    if len(args) == 0:
        print "no arg"
    elif len(args) == 2:
        print "two args"
    elif len(args) >= 3:
        print "more than two args"

The best aspect of handling multiple-arity functions this way is that it's more "pythonic": Clojure functions are Python functions and act exactly the same. More recently, however, I encountered a form like this while attempting to translate core.match to Clojure-Py:

(deftype Foo
     IFn
     (invoke [x y] (+ x y)))


Well that's no good. Either we add some ugly compilation conditionals, move this deftype into a separate file that uses __call__ instead of invoke, or we switch to using IFn in Clojure-Py as well.

The more I thought about it, the more the use of IFn made sense. Consider this function:

(defn sum
   ([] 0)
   ([x] x)
   ([x y] (+ x y))
   ([x y & more] (reduce sum (+ x y) more)))

If we call (sum 1 2), the VM executes the following:

1) wrap 1 and 2 into a tuple
2) call sum with the tuple as an argument
3) run the tuple through 3 length checks to find the appropriate block to execute
4) assign the local variables x and y to tuple[0] and tuple[1]

However, if we were to use IFn, the compiler could determine the arity at compile time and instead call sum.invoke2(1, 2) directly. This saves us a memory allocation (the tuple creation), several if statements, and a few other checks. This should run much faster!

The reality is, from my testing it looks like switching to IFn will give us about a 10x speed-up for Clojure-Py code running on PyPy. On CPython it won't make much of a difference one way or the other, but it will allow for cleaner porting of Clojure-JVM programs. And we can still keep our old compatibility with Python:

class MyFn(IFn):
    def invoke2(self, x, y):
        return x + y
    def __call__(self, x, y):
        return self.invoke2(x, y)

But what about Python functions? We can't do tuple.invoke2(x, y)! The tuple function doesn't have an invoke2 method! In ClojureScript, the compiler simply adds invoke to each and every function in the system. In the PVM, we can't modify functions. We could infer the information at compile time, but that's no good in cases where the function could be replaced at run time.

Ah! So this is why Clojure-JVM never allows direct invocation of static methods; instead users must use the . form. So, instead of doing:

(py/tuple 1 2)

we will now require users to use the standard Clojure form of:

(. py (tuple 1 2))

This tells the compiler not to attempt an invoke2 call, but instead to invoke tuple via an interop call.

Users could get a handle to a Python function through several means, perhaps as a returned callback, or via a getattr call on a module. In these cases, users will have to use the wrap-fn function:

(let [x (wrap-fn (.-tuple py))]
  (x 1 2))
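A minimal Python sketch of what such a wrapper might do (hypothetical names; the real wrap-fn may differ): adapt any plain callable to the invokeN calling convention the compiler emits calls against.

```python
class WrappedFn(object):
    """Adapts a plain Python callable to the invokeN convention."""
    def __init__(self, f):
        self.f = f
    def invoke0(self):
        return self.f()
    def invoke1(self, a):
        return self.f(a)
    def invoke2(self, a, b):
        return self.f(a, b)
    def __call__(self, *args):        # stays callable from plain Python too
        return self.f(*args)

wrapped = WrappedFn(max)              # max stands in for any Python callable
assert wrapped.invoke2(1, 2) == 2
assert wrapped(3, 1) == 3
```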

All this to say, I've found (once again) that making Clojure-Py conform tightly to the standard Clojure way of implementing functions not only simplifies porting Clojure-JVM code, but also often makes the resulting code much faster.

In the future, maybe I should research why Clojure was written the way it was before blindly fitting it into the Python mold. Now, where's that post-it note...there it is....*starts writing*


Wednesday, April 18, 2012

Clojure-Py and Distributed Concurrency (Part 2)

In the last part of this series on Clojure-Py and distributed concurrency we discussed how tasklets are created. In this post, we will discuss the scheduler that manages message passing between tasklets. In these articles we use "process" and "tasklet" interchangeably: by "process" we don't mean an OS process, but rather a share-nothing Python tasklet based on a generator.

Using the Python yield bytecode, we are able to get data both into and out of tasklets. Whenever a tasklet yields, it returns a Python tuple in the following format:

(state-kw & args)

There are several states that each tasklet can take:

::spawn - creates a new tasklet
::send - sends a message to a tasklet (via a pid)
::recv - receives a message from a tasklet
::become - keeps the same pid, but swaps out the function being executed by the process
::next-msg - the message just delivered via ::recv was unhandled; re-queue it and try the next one
::die - kills the process
::yield - simply reschedules this tasklet for a new time-slice, useful inside loops that would otherwise block

The scheduler, then, is simply a loop that iterates over all the tasklets. If a ::recv command was given last, the scheduler looks for a message to hand to the tasklet. If the tasklet returns ::next-msg, the scheduler instantly passes in the next message, then the next, until a different state is returned; this allows tasklets to search the message queue. If no message can be found, the tasklet is put into a sleep state until another process sends it a message to awaken it.
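A stripped-down sketch of such a scheduler (illustrative only: state keywords are plain strings here, and only the send/recv/die states are handled):

```python
from collections import deque

def run(tasklets):
    """tasklets: dict pid -> generator. Each generator yields commands:
    ('recv',) to wait for a message, ('send', pid, msg), or ('die',)."""
    mailboxes = {pid: deque() for pid in tasklets}
    waiting = set()                                  # pids asleep on recv
    ready = deque((pid, None) for pid in tasklets)   # (pid, value to resume with)
    while ready:
        pid, value = ready.popleft()
        try:
            cmd = tasklets[pid].send(value)
        except StopIteration:
            continue                                 # tasklet finished normally
        if cmd[0] == 'send':
            _, target, msg = cmd
            mailboxes[target].append(msg)
            if target in waiting:                    # wake the sleeping receiver
                waiting.discard(target)
                ready.append((target, mailboxes[target].popleft()))
            ready.append((pid, None))
        elif cmd[0] == 'recv':
            if mailboxes[pid]:
                ready.append((pid, mailboxes[pid].popleft()))
            else:
                waiting.add(pid)                     # sleep until mail arrives
        elif cmd[0] == 'die':
            continue                                 # drop the tasklet

log = []

def logger():
    while True:
        msg = yield ('recv',)
        if msg == 'stop':
            yield ('die',)
        log.append(msg)

def main():
    yield ('send', 'log', 'hello')
    yield ('send', 'log', 'stop')

run({'log': logger(), 'main': main()})
assert log == ['hello']
```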

The ::become state is useful for creating what the Erlang people call "universal servers". For example:

(defn universal-server []
  (recv [fn args]
    (become fn args)))

A universal server is a server that simply listens for the first message it receives, retrieves a fn and args from that message, and then transforms itself into a server that runs that fn. The astute reader will recognize this as a form of tail-call optimization. If Clojure-Py had TCO, we could simply call (fn args) instead of (become fn args), but since Clojure-Py (like Clojure on the JVM) is limited by the VM it runs on, we must implement this using yield bytecodes.

From here, the other tasklet states should be rather clear. The scheduler is really nothing more than a loop that manages the mini state machines we know as tasklets. Putting it all together:

(defn crazy-server []
  (receive [:ping] (! spid :pong)                  ; spid is implicitly supplied by receive; yields (::send spid :pong)
           [:clone] (! spid (spawn crazy-server))  ; yields (::spawn crazy-server)
           [:become fn args] (become fn args)      ; yields (::become fn args)
           [:die] (die)))                          ; yields (::die)

Normally there will be one scheduler for each running instance of the Python VM. However, if in the future, PyPy gets full STM support, multiple VMs could be started.

In the next part, we will discuss fault tolerance, network communication, and OS process management.


Tuesday, April 10, 2012

Clojure-Py and Distributed Concurrency (Part 1)

As most users of Python are well aware, Python has a GIL. At a basic level, this means that no two bytecodes can be executing at the same time: while one thread is executing bytecodes, no other thread can be accessing the internals of the VM. For a language that embraces concurrency at such a basic level, this presents a problem.

To start with, we should state that there is absolutely no problem with implementing Clojure's STM primitives in Python; they simply won't be of much use. More correctly, they will work; they just won't allow for much of a performance improvement over code that doesn't use STM.

So what is the grand plan for concurrency in clojure-py? In short, we plan on bringing the ideas behind Erlang into the Clojure world: merging concurrency-oriented programming with Lisp. As an introduction, I suggest the reader watch this talk by one of the co-creators of Erlang, Joe Armstrong: http://www.infoq.com/presentations/erlang-software-for-a-concurrent-world

There are three main concepts behind Erlang:

Share-nothing light-weight processes
Message Passing
Fault Tolerance

---------

The Erlang VM implements some very light-weight "green threads". To properly understand this, the reader should know that threads in Python are OS-level threads. That is, the kernel allocates a separate stack for each thread, and these threads are quite heavy: on Linux, each one reserves around 4-8MB...and no, Windows is not much better. In addition, switching threads requires a jump into and back out of kernel space, and this context switch costs a rather high number of CPU cycles.

If we want to match the level of concurrency that Erlang provides, this simply won't do. We're not looking for hundreds of "threads", or even thousands; we're looking for millions. So how do we do this in Clojure-Py? This is where generators come into play.

The best way, perhaps, to understand generators is to see them in action:


>>> def gen_test():
...  yield 1
...  yield 2
...  yield 3
...
>>> z = gen_test()
>>> z
<generator object gen_test at 0x022987D8>
>>> z.next()
1
>>> z.next()
2
>>> z.next()
3
>>> z.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

The use of the yield statement is what triggers the creation of a generator in Python. When a yield executes, the stack frame of the generator is saved on the heap, and a handle to it is returned to the calling function. From there, invoking .next() on the generator runs to the next yield, or until the function terminates, at which point the generator throws a StopIteration exception to signal termination. By itself, this really does nothing for us. However, in Python 2.5 yield was converted from a statement to an expression. Since yield now returns a value, we can do something like this:


>>> def print_stuff():
...  while True:
...   x = yield "Ready"
...   if x == "Stop":
...    return
...   else:
...    print x
...
>>> z = print_stuff()
>>> z.next()
'Ready'
>>> z.send(1)
1
'Ready'
>>> z.send(1)
1
'Ready'
>>> z.send(4)
4
'Ready'
>>> z.send(5)
5
'Ready'
>>> z.send("Stop")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

These are what are known as coroutines. Many readers will recognize this pattern as a form of cooperative multitasking. With the yield expression, it is possible to quickly switch between coroutines without ever jumping into kernel space or taking the penalty of a context switch.
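The switch itself is just ordinary Python. A toy round-robin over two generators, for example, never leaves user space:

```python
def worker(name, out):
    for i in range(3):
        out.append((name, i))
        yield                      # voluntarily give up the time-slice

out = []
tasks = [worker('a', out), worker('b', out)]

# round-robin until every generator is exhausted
while tasks:
    task = tasks.pop(0)
    try:
        next(task)
        tasks.append(task)         # still alive: back of the line
    except StopIteration:
        pass                       # finished: drop it

assert out == [('a', 0), ('b', 0), ('a', 1), ('b', 1), ('a', 2), ('b', 2)]
```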

This is how we plan to solve the problem of concurrency in Clojure-Py. Using macros, we can construct some code like this:

(defn log-info [file]
  (loop [file (py/open file "w")]
    (receive [:log message] (do (.write file message)
                                (recur file))
             [:stop] (die))))


Clojure-Py will introduce the receive form. Receive's bindings will follow the semantics of core.match, and will in fact be built on that library. Receive generates a yield expression, takes the value the yield returns, and performs pattern matching on that message.

(defn log-hello-world []
  (let [file "file.log"
        pid (spawn log-info file)]
    (! pid "Hello")
    (! pid "World")
    (! pid :stop)))

(schedule log-hello-world)

This code creates a parent process, spawns a child log-info process, then sends information to it. The function ! is taken from Erlang (and may change in name at some point). Unlike Stackless Python, the scheduler puts messages into queues for each process; from there, processes are free to take from their queues at their leisure. Since all communication between processes happens via message passing, it is entirely possible to introduce a bit of serialization and networking to allow processes to send messages across machines.

(spawn fnc & args) ; spawns a task inside this python interpreter
(spawn-on fnc & args) ; spawns a task on this machine, but in (possibly) a different interpreter
(spawn-on machine-name fnc & args) ; spawns a task on a different machine

Using these functions we can very effectively get around the GIL. Spawn-on with no machine argument will automatically load-balance the task between any child processes on the same physical box. Each process will have its own GIL, but these GILs will work independently of each other.

To sum up the first part of this series on distributed programming in Clojure-Py: we aim to use coroutines to allow for efficient distributed computing from within a Clojure environment. These generator-based "tasklets" are very small and efficient; on a modern laptop, 1 million tasklets were created in about 3 seconds under Python 2.7. Through the use of core.match and serialization, it is possible to make the message-passing framework network-transparent.
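The creation cost is easy to sanity-check: instantiating a generator allocates a small frame on the heap but runs none of the body. (Scaled down to 100,000 here to keep the example quick; the 3-second figure above is the author's own measurement, and exact timings will vary by machine and Python version.)

```python
def tasklet():
    while True:
        yield

# creating a generator does not execute its body, so this is cheap
tasklets = [tasklet() for _ in range(100000)]
assert len(tasklets) == 100000
```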

Anyone interested in learning more about this should read up on Erlang a bit more. Clojure-Py's vision for concurrency oriented programming borrows heavily from Erlang.

In the next part we will discuss a bit more about how the scheduler will work, and what fault-tolerance will mean to the system.



Clojure-Py 0.2 Released!

We are proud to announce the release of Clojure-Py 0.2.


There are numerous changes in this version, a few highlights:

* implemented deftype with protocol support
* implemented defprotocol and extend
* implemented reify
* implemented defrecord
* binding is supported
* implemented defmulti and defmethod
* support for derive, supers, bases, etc.
* try/catch, try/finally, with-open
* python generators are seq'able (better interop)
* over 290 tests are implemented
* countless bug fixes

Known issues:

There is a bug in the compiler currently that does not support
try/catch/finally blocks. try/catch and try/finally work just fine,
it's only the combination of these three. This will be fixed in the
next release.

Source:

https://github.com/halgari/clojure-py

Also via easy_install:

easy_install clojure-py

Roadmap:

version 0.3 -
  Greatly improved compiler re-written for speed, and code simplicity.
  Python 3 support
  Full implementation of clojure.core (including STM features)

version 1.0 -
  Fault-tolerant, distributed programming via Erlang style
concurrency. This will allow us to "code around the GIL" and at the
same time have distributed capabilities not seen in any other Clojure
dialect.

Thanks to the many developers who have been working on this project
the past few weeks. It's exciting to see the clojure-py community
continue to grow!

Wednesday, March 28, 2012

Problems Creating .pyc Files


Having clojure-py create .pyc files should greatly improve start-up times, hopefully bringing them into the 0.2 sec range. Let me begin by explaining how Python generates .pyc files.

.pyc files are basically marshaled code objects. As such, all data contained in code.co_consts must conform to the restrictions of marshal: constants can only be ints, floats, strings, lists, tuples, etc. No user-defined classes are allowed! However, this restriction does not exist in the bytecode compiler itself. For instance, examine the following Clojure code:
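This restriction is easy to see directly (the exact error message varies by Python version):

```python
import marshal

# built-in constants marshal fine and round-trip
data = marshal.dumps((1, 2.0, "three", [4, 5]))
assert marshal.loads(data) == (1, 2.0, "three", [4, 5])

# instances of user-defined classes do not
class Symbol(object):
    pass

try:
    marshal.dumps(Symbol())
    raised = False
except ValueError:          # "unmarshallable object"
    raised = True
assert raised
```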


(py/print '(1 2))

And similar code in Python:

print (1, 2)

Here's how Python compiles the code:

LOAD_CONST 1
LOAD_CONST 2
BUILD_TUPLE 2
PRINT_ITEM

And how clojure-py compiles the code:

LOAD_CONST (1 2)
PRINT_ITEM

So Python builds the tuple every single time the expression executes. It has to do this because some objects, like lists or dicts, are mutable, so you actually want each invocation of the above code to return a completely different object. In clojure-py, however, we can save thousands of clock cycles simply by creating the list/hashmap/vector at compile time and then loading it in a single instruction. Sometimes, when we know the function to invoke at compile time, we even load functions via LOAD_CONST. This makes our code much faster, but it has one major problem...the code is now non-marshallable.
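You can watch CPython make this decision with the dis module. (Note: modern CPython folds tuples of constants into a single LOAD_CONST itself, so we use a variable here to force the runtime build.)

```python
import dis

def make_pair(x):
    return (x, 2)   # one element is a variable, so no constant folding

opnames = [ins.opname for ins in dis.Bytecode(make_pair)]
assert "BUILD_TUPLE" in opnames
```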

So what about using Pickle? Well, Pickle doesn't support code objects. We could extend Pickle (cPickle can't be extended in the way we need it to be), but that would be very slow.

One path I tried was storing all consts in globals. That is, we generate code that creates the const values, stores the results in globals, and loads them with LOAD_GLOBAL instead of LOAD_CONST. This kind of worked, but when I was done it was riddled with bugs, and the compiler had become far more complex. Compilers are already complex; making this one worse seemed like a bad idea.

Now for some performance notes. From my testing, the time spent compiling core.clj is split pretty evenly between three modules:

lispreader.py   -- reads lisp objects
compiler.py    -- creates bytecode from lisp objects
byteplay.py    -- compiles bytecode to actual binary data that Python understands

Now, the output of lispreader can quite easily be run through cPickle. This is the "low-hanging fruit", so to speak. With a few tweaks, we can pickle our parsed source and completely remove lispreader.py from the equation.

Secondly, I think I'll look into performance improvements in the compiler. There has to be some room for improvement there.

Finally, I'm not actually sure we'll ever be able to compile to .pyc files, and macros are a major reason. Imagine we have a macro in a.clj, and that macro is used in b.clj. If we change the macro, we want clojure-py to re-create the .pyc file for b.clj. This means we'd have to maintain a dependency graph for every file that tracks every module it depends on and performs automatic re-compilation.
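A sketch of the kind of invalidation such a dependency graph would drive (hypothetical structure; nothing like this exists in clojure-py):

```python
def needs_recompile(changed, deps):
    """deps maps a file to the files it depends on (e.g. for macros).
    Return every file that must be rebuilt when `changed` files change."""
    dirty = set(changed)
    grew = True
    # keep propagating until no new dependents are found
    while grew:
        grew = False
        for f, requires in deps.items():
            if f not in dirty and dirty & set(requires):
                dirty.add(f)
                grew = True
    return dirty

deps = {"b.clj": ["a.clj"], "c.clj": ["b.clj"], "d.clj": []}
# changing the macro in a.clj invalidates b.clj and, transitively, c.clj
assert needs_recompile({"a.clj"}, deps) == {"a.clj", "b.clj", "c.clj"}
```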

So anyway, this is why we still don't have .pyc files...and why it's taking some time.

Tuesday, March 13, 2012

How to use protocols

A few nights ago I finally got protocol support up and running in clojure-py. Most of the source code for what we're about to discuss can be found in clojure/lang/protocol.py

To start with, I should say that the meta-programming abilities of Python continue to astound me. I found this nifty method called __subclasses__() that resides in every type defined in Python. Using it, it's quite trivial to traverse entire hierarchies of classes, which comes in quite handy for protocols.
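Here is __subclasses__() in action (a trivial stand-in hierarchy; the class names are borrowed from the discussion below for illustration only):

```python
class Seqable(object):
    pass

class PersistentVector(Seqable):
    pass

class SubVector(PersistentVector):
    pass

def all_subclasses(cls):
    """Recursively walk __subclasses__() to collect the whole hierarchy."""
    result = set()
    for sub in cls.__subclasses__():
        result.add(sub)
        result |= all_subclasses(sub)
    return result

assert all_subclasses(Seqable) == {PersistentVector, SubVector}
```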

So to begin with, let's examine the following use case. In Clojure we have an interface called Seqable. This abstract class (or interface, as we call them at my day job programming C#) is responsible for turning a given class into a sequence. For some classes, like PersistentList, calling .seq() on them is basically a no-op: a PersistentList is a seq, so no action is needed. However, for something like a PersistentVector, we actually need to generate an IndexedSeq and return this new object instead.

Having the Seqable interface works just fine for any classes we define within our program. But what about tuples, strings, or unicode? It would be nice to be able to .seq() those. The IndexedSeq will work just fine as the actual implementation, but we need a function that dispatches on the input type and creates the needed seq objects.

In the past, we defined seq in the rt.py module thusly:

def seq(obj):
    if isinstance(obj, Seqable):
        return obj.seq()
    elif isinstance(obj, tuple):
        return IndexedSeq(obj, 0)
    # etc....


The problem with this code is that it's not extensible. If, later on, we want to extend seq to some other closed class, we're sunk. This is where protocols help.

Protocols consist of two types of classes:

Protocol - This class has pointers to the functions defined by the protocol, as well as a list of all classes that extend this protocol

ProtocolFn - this is an actual protocol function. It dispatches on the type of the first argument passed to it.
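A minimal sketch of what a ProtocolFn does (the real class in clojure/lang/protocol.py has more machinery, e.g. it also handles inheritance rather than the exact-type lookup shown here):

```python
class ProtocolFn(object):
    """Dispatches on the type of the first argument."""
    def __init__(self, name):
        self.name = name
        self.impls = {}               # type -> implementation fn
    def extend(self, tp, fn):
        self.impls[tp] = fn
    def __call__(self, first, *args):
        fn = self.impls.get(type(first))
        if fn is None:
            raise TypeError("No impl of %s for %r" % (self.name, type(first)))
        return fn(first, *args)

seq = ProtocolFn("seq")
seq.extend(tuple, lambda obj: list(obj))   # stand-in implementations
seq.extend(type(None), lambda obj: None)

assert seq((1, 2)) == [1, 2]
assert seq(None) is None
```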

protocol.py has some quick-and-easy helper functions:

protocolFromType(ns, tp) - takes the type passed in as tp and enumerates the public functions it defines (any function starting with _ is ignored). It then creates a ProtocolFn for each method in the type and adds these functions to the namespace passed in via ns. Finally, it tags the interface as being a protocol.

extendForAllSubclasses(tp) - if tp points to an interface that is tagged as a protocol, the protocol is automatically extended for tp and all subclasses of tp.

With these two functions it is then trivial to add seq support to clojure.core:


def _bootstrap_protocols():
    global protocols, seq
    from clojure.lang.protocol import protocolFromType, extendForAllSubclasses
    from clojure.lang.iseq import ISeq as iseq
    from clojure.lang.seqable import Seqable as seqable
    
    protocolFromType("clojure.protocols", seqable)
    extendForAllSubclasses(seqable)
    
    _extendSeqableForManuals()




def _extendSeqableForManuals():
    from clojure.lang.indexableseq import create as createIndexableSeq
    from clojure.lang.persistentvector import PersistentVector
    
    protocols.seq.extendForTypes([tuple, type([]), str, unicode],  lambda obj: createIndexableSeq(obj))
    protocols.seq.extend(type(None), lambda x: None)
    

Later on, when we define LazySeq within clojure.core, we simply have it inherit from Seqable, then call:

(clojure.lang.protocol/extendForAllSubclasses clojure.lang.seqable/Seqable)

And any new classes (LazySeq included) will be added to the protocol. 

Now granted, this doesn't quite follow the way protocols are created on the JVM, but in the next few days I'll wrap this all up in macros so the two will be identical.






Wednesday, March 7, 2012

Thoughts on Protocols

In the past week or so, I've been researching how to implement protocols in Clojure-Py. In order to implement protocols, we need an efficient way to dispatch a function on a given type. A simple solution would be something like this:


(def impls {py/int print-int
            py/float print-float
            mytype print-mytype})

(defn print [x & args]
  (apply (get impls (py/type x)) x args))

While this may look fairly simple, we have one problem: our implementation will be quite slow on PyPy compared to calling a normal bound method (e.g. foo.bar). This is because PyPy assumes that class attributes will not often be modified, and it optimizes attribute lookups accordingly. Therefore, this would be much better:

(defn install-fn [type name fn]
     (setattr type (str "__proto__" name) fn))

(install-fn py/int "print" print-int)
(install-fn py/float "print" print-float)
(install-fn mytype "print" print-mytype)

(defn print [x & args]
  (apply (.-__proto__print (py/type x)) x args))

This method will be extremely fast, but it has one flaw: we can't modify built-in classes. So the solution is a hybrid: we attempt to modify the class with setattr, and if that fails, we store the function in a hash-map. This way PyPy can optimize the lookup away for all our custom types, and everything else hits the hashmap. From my benchmarks, this hybrid approach is about 2x faster than using a straight hashmap.
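A sketch of that hybrid in Python (illustrative only; the attribute prefix and fallback details are made up, not Clojure-Py's actual code):

```python
fallback = {}                 # fn name -> {type -> impl}, for closed built-ins

def install_fn(tp, name, fn):
    """Try to attach the impl directly to the type; built-in types refuse
    setattr, so fall back to a plain dict keyed by type."""
    attr = "__proto__" + name
    try:
        setattr(tp, attr, fn)                      # fast path for our types
    except TypeError:
        fallback.setdefault(name, {})[tp] = fn     # slow path for built-ins

def dispatch(name, x, *args):
    attr = "__proto__" + name
    fn = getattr(type(x), attr, None)
    if fn is None:
        fn = fallback.get(name, {}).get(type(x))
    return fn(x, *args)

class MyType(object):
    pass

install_fn(MyType, "show", lambda self: "mytype")  # setattr succeeds
install_fn(int, "show", lambda self: "int")        # int is closed: dict path

assert dispatch("show", MyType()) == "mytype"
assert dispatch("show", 42) == "int"
```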