Tuesday, September 16, 2014

Embedding Python in Common Lisp

The Common Lisp libraries problem

The Common Lisp libraries problem is well known: the Lisp community is smaller than those of more mainstream languages like Python or Java. The problem is not as bad as it used to be (there are plenty of good libraries, and Quicklisp gathers them in one place), but sometimes there is no Lisp library for a specific task, or the one available is only partially implemented.

Reusing Python libraries

Python is a mainstream language with a big community and a bunch of solid libraries implemented for it. So, my idea was to try to access those libraries from Lisp.

I tried CLPython first. It is an implementation of Python written in Common Lisp that compiles Python code to Lisp. Although it is probably good and works in many cases, in my experience it didn't work out too well: I had problems compiling Python libraries, and when compilation did succeed I got runtime errors when executing the Python code.

So I looked for an alternative, and found Burgled Batteries. It is a Lisp library for accessing Python via its C API. From the library README: "While a number of other Python-by-FFI options exist, burgled-batteries aims for a CLPython-esque level of integration. In other words, deep integration. You shouldn’t have to care that the library you’re using was written in Python—it should Just Work." And indeed, it just works.

The Burgled Batteries API

At the low level, Burgled Batteries wraps the whole Python C API (or almost all of it) via CFFI and makes it accessible as normal Common Lisp functions.

At a higher level, it lets you run a string of Python code, and also declare Python functions and then call them from Lisp, as shown in the following example from the library README:
(asdf:load-system "burgled-batteries")
(in-package #:burgled-batteries)
(startup-python)

(run "1+1") ; => 2

(import "feedparser")
(defpyfun "feedparser.parse" (thing))
(documentation 'feedparser.parse 'function)
; => "Parse a feed from a URL, file, stream, or string"
(feedparser.parse "http://pinterface.livejournal.com/data/atom")
; => #<HASH-TABLE>

(shutdown-python)
As the example shows, data types are marshalled between Lisp and Python (the result of calling the "Python" function is a hash table). How this works, along with other peculiarities of this Python bridge (like memory handling), is described in the Burgled Batteries README.

While this high-level API may be enough for Python libraries with a simple API, it is not so good when what we need to do with Python is more involved. At the moment, Lisp functions are not generated automatically by introspecting a Python module, and manually defining every Python function one would like to use via defpyfun can be cumbersome.
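
To make the introspection idea concrete, here is a minimal sketch of what generating those declarations could look like from the Python side. It is not part of Burgled Batteries: the helper name defpyfun_forms is hypothetical, the emitted text only imitates the defpyfun form from the README example above, and it assumes that listing the required positional parameters is enough (keyword-only and defaulted parameters are ignored).

```python
import inspect
import json  # used only as an example module to introspect


def defpyfun_forms(module):
    """Build one defpyfun form (as text) per public function in `module`."""
    positional = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    forms = []
    # getmembers returns (name, function) pairs sorted by name.
    for name, func in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers
        # Keep only required positional parameters; keyword-only and
        # defaulted parameters are ignored in this sketch.
        args = [
            p.name
            for p in inspect.signature(func).parameters.values()
            if p.kind in positional and p.default is inspect.Parameter.empty
        ]
        arglist = " ".join(args)
        forms.append(f'(defpyfun "{module.__name__}.{name}" ({arglist}))')
    return forms
```

Calling defpyfun_forms(json) returns a list of strings, one per public function in the module, which could be pasted into a Lisp file. It only sees functions written in Python; builtins and classes would need separate handling.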

Embedded Python-like syntax

So I decided to try to improve the way Lisp communicates with the Python interpreter. Instead of generating Lisp functions by introspecting Python modules, I thought that providing an embedded Python-like syntax could be a good idea.

This is how the example above looks when using the embedded syntax:
(asdf:load-system "burgled-batteries.syntax")
(in-package #:burgled-batteries)
(burgled-batteries.syntax:enable-python-syntax)
(startup-python)

(import :feedparser)
[^feedparser.parse('http://pinterface.livejournal.com/data/atom')]
; => #<HASH-TABLE>

(shutdown-python)
Python syntax appears between brackets ([]). Note that it is no longer necessary to declare Python functions in the Lisp world. This is very similar to what Clojure does to access Java. The embedded syntax is implemented as a reader macro (of course), with ESRAP doing the parsing. In case you are interested in what is going on behind the scenes, you can inspect which calls to the C API are being made by quoting the Python expression:
PYTHON> '[^feedparser.parse('http://pinterface.livejournal.com/data/atom')]
(LET ((#:TRANSFORMED2520
       (CFFI:CONVERT-FROM-FOREIGN
        (CALL* (REF* "feedparser") "parse"
               (STRING.FROM-STRING*
                "http://pinterface.livejournal.com/data/atom"))
        'PYTHON.CFFI::OBJECT!)))
  #:TRANSFORMED2520)
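
To give a feel for the kind of translation involved, here is a small illustrative sketch in Python. It is not how the library works (the real implementation is a Lisp reader macro with an ESRAP grammar); it uses Python's own ast module to turn a method call on a name into text shaped like the inner form of the expansion above. The function names to_sexp and translate are hypothetical, the operator names are copied from that expansion, and only names, string literals and method calls are handled.

```python
import ast


def to_sexp(node):
    """Translate a tiny subset of Python expressions into Lisp-like text."""
    if isinstance(node, ast.Expression):
        return to_sexp(node.body)
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
        # obj.method(arg, ...) becomes (CALL* obj "method" arg ...)
        head = f'(CALL* {to_sexp(node.func.value)} "{node.func.attr}"'
        args = " ".join(to_sexp(arg) for arg in node.args)
        return f"{head} {args})" if args else f"{head})"
    if isinstance(node, ast.Name):
        # A bare name is looked up on the Python side.
        return f'(REF* "{node.id}")'
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        # String literals are converted to Python string objects.
        return f'(STRING.FROM-STRING* "{node.value}")'
    raise NotImplementedError(type(node).__name__)


def translate(source):
    """Parse one Python expression and return its Lisp-like translation."""
    return to_sexp(ast.parse(source, mode="eval"))
```

For the expression used above, translate("feedparser.parse('http://pinterface.livejournal.com/data/atom')") produces the same CALL* form that appears inside the quoted expansion. Anything outside this subset (numbers, keyword arguments, subscripts, assignments) raises NotImplementedError.
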
It is possible to access Lisp references from the Python syntax; that makes the integration quite easy. For instance:
PYTHON> (let (($url "http://pinterface.livejournal.com/data/atom"))
           [^feedparser.parse($url)])
=>

#<HASH-TABLE :TEST EQUAL :COUNT 12 {1005647793}> 
 
As you can see, Lisp variables referenced from the Python syntax start with the $ character.
What's more, the idea is that control flow is written in Lisp, with calls to Python made via the embedded syntax. Here is a more involved example that shows this in action:

PYTHON> (import :icalendar)
PYTHON> (import :datetime)
PYTHON> (let (($cal [icalendar.Calendar()]))
  [$cal.add('prodid', '-//My calendar product//mxm.dk//')]
  (let (($event [icalendar.Event()]))
    [$event.add('summary', 'Python meeting about calendaring')]
    [$event.add('dtstart', datetime.datetime(2005,4,4,8,0,0))]
    [$event.add('dtend', datetime.datetime(2005,4,4,10,0,0))]
    [$event.add('dtstamp', datetime.datetime(2005,4,4,0,10,0))]
    (let (($organizer [icalendar.vCalAddress('MAILTO: noone@example.com')]))
      [$organizer.params['cn'] = icalendar.vText('Max Rasmussen')]
      [$organizer.params['role'] = icalendar.vText('CHAIR')]
      [$event['organizer'] = $organizer]
      [$event['location'] = icalendar.vText('Odense, Denmark')]

      [$event['uid'] = '20050115T101010/27346262376@mxm.dk']
      [$event.add('priority', 5)]

      (let (($attendee [icalendar.vCalAddress('MAILTO:maxm@example.com')]))
         [$attendee.params['cn'] = icalendar.vText('Max Rasmussen')]
         [$attendee.params['ROLE'] = icalendar.vText('REQ-PARTICIPANT')]
         [$event.add('attendee', $attendee, encode=0)])

      (let (($attendee [icalendar.vCalAddress('MAILTO:the-dude@example.com')]))
         [$attendee.params['cn'] = icalendar.vText('The Dude')]
         [$attendee.params['ROLE'] = icalendar.vText('REQ-PARTICIPANT')]
         [$event.add('attendee', $attendee, encode=0)])

      [$cal.add_component($event)]
      [^$cal.to_ical()])))
=>

"BEGIN:VCALENDAR
PRODID:-//My calendar product//mxm.dk//
BEGIN:VEVENT
SUMMARY:Python meeting about calendaring
DTSTART;VALUE=DATE-TIME:20050404T080000
DTEND;VALUE=DATE-TIME:20050404T100000
DTSTAMP;VALUE=DATE-TIME:20050404T001000Z
UID:20050115T101010/27346262376@mxm.dk
LOCATION:Odense\\, Denmark
MAILTO:MAXM@EXAMPLE.COM:attendee
MAILTO:THE-DUDE@EXAMPLE.COM:attendee
ORGANIZER;CN=\"Max Rasmussen\";ROLE=CHAIR:MAILTO: noone@example.com
PRIORITY:5
END:VEVENT
END:VCALENDAR"


Here we used the iCalendar Python library. The result is the serialized calendar, returned to the Lisp world as a string.

A nice feature of this syntax is that it is quite compact and readable (doing the same via macros is possible, but not as compact and readable as this). It looks a lot like Python, with some minor differences, so it is very clear where the Python code is. Last but not least, it is very easy to copy and paste Python code and make it work with just a few modifications.

The syntax is not exactly that of Python, because we need to indicate whether we are referring to a Lisp or a Python binding, and there is also some syntax for indicating whether we want the marshalled object [^obj] or just the pointer to the object [obj]. The final syntax is not fully decided yet; I'm still playing with some ideas.

Conclusion

The embedded Python-like language integrates quite well with Lisp, and it avoids having to manually define the Python functions we want to access or to generate any glue code by introspection. Instead, FFI calls are made on the fly.

It is hard to imagine implementing an embedded language like this in a language other than Lisp. Access to the reader (the compiler's parser) and to PEGs made the implementation very easy.

The burgled-batteries.syntax contrib library is available here, and it is a work in progress.