.title-slide

# Python is not Haskell

.right[Andrey Vlasovskikh]
.right[JetBrains]
.right[PyCon Belarus 2015]

---

.center[![PyCharm](media/pycharm-logo.png)]

* I'm [@vlasovskikh](http://twitter.com/vlasovskikh) and [pirx.ru](http://pirx.ru/)
* From St. Petersburg, Russia
* I work for JetBrains
    * PyCharm: an IDE for Python and web development
    * [40% off](https://www.jetbrains.com/specials/backto2014.html) for .strong[Belarus], Ukraine, and Russia till 2015-02-05
* I organize FProg
    * Functional programming meetups
    * In FP since 2007

---

# FP is popular

* Functional programming languages
    * Scala, Clojure, Erlang, Haskell, F#
* Python
    * Not an FP language
    * Python devs express interest in FP

---

# Benefits of FP in Python?

* Should you use FP in Python or move to another language?
* Let me share my observations

---

.title-slide

# FP in Python

The results

---

# FP in Python

* Pythonic parts of FP
    * Pure functions
    * Comprehensions and generators
    * Type hints
* Python is not Haskell
    * Recursion and copying are expensive
    * FP code is dense
    * Math terminology doesn't help

---

.title-slide

# Example: funcparserlib

A functional parsing library

---

# Background

* Parsing little languages, external DSLs
* Competitors: Pyparsing, PLY
* Why I started it
    * The Pyparsing API is too large
    * The funcparserlib API is just ~10 functions
* Milestones
    * Started in 2009
    * 0.3.6, May 2013, Production/Stable
    * 12000 downloads/month from PyPI

---

# Parser combinators API

* Primitive parsers
    * `some(pred)`
    * `forward_decl()`, `finished`
* Composition
    * `p1 + p2`, `p1 | p2`, `p >> f`
    * `many(p)`, `maybe(p)`, `skip(p)`
* Abstraction
    * `Parser`
* The set of parsers is closed under the composition operations

---

# What is a parser

* A parser is a function `run()` plus some helpers
* Wrapped into a class to provide the operators `+`, `|`, `>>`

        class Parser:
            # Parse tokens
            def parse(self, tokens): ...
            def run(self, tokens, state): ...

            # Composition operators
            def __add__(self, other): ...
            def __or__(self, other): ...
            ...
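---

# Sketch: how composition could work

A minimal illustration of how `+` and `|` might chain `run()` functions. This is not the actual funcparserlib code: the state is simplified to an integer position and error reporting is omitted.

```python
# Illustrative sketch only: real funcparserlib tracks a richer State
# object; here the state is just an integer index into the tokens.

class NoParseError(Exception):
    pass


class Parser:
    def __init__(self, run):
        self.run = run  # run(tokens, pos) -> (value, new_pos)

    def __add__(self, other):
        # Sequence: run self, then other from where self stopped
        def run(tokens, pos):
            v1, pos1 = self.run(tokens, pos)
            v2, pos2 = other.run(tokens, pos1)
            return (v1, v2), pos2
        return Parser(run)

    def __or__(self, other):
        # Choice: try self, fall back to other on failure
        def run(tokens, pos):
            try:
                return self.run(tokens, pos)
            except NoParseError:
                return other.run(tokens, pos)
        return Parser(run)


def some(pred):
    # Primitive parser: consume one token satisfying pred
    def run(tokens, pos):
        if pos < len(tokens) and pred(tokens[pos]):
            return tokens[pos], pos + 1
        raise NoParseError('got unexpected token')
    return Parser(run)


f = some(lambda c: c == 'f')
o = some(lambda c: c == 'o')
print((f + o + o).run('foo', 0))  # ((('f', 'o'), 'o'), 3)
print((f | o).run('oo', 0))       # ('o', 1)
```

Every composition returns a new `Parser`, which is why the set of parsers stays closed under `+` and `|`.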
---

# Low-level usage

* A primitive parser of a single character

.python

    >>> from funcparserlib.parser import some, State
    >>> p = some(lambda x: x == 'f')

* Usage

.python

    >>> p.run('foo', State(0, 0))
    ('f', State(1, 1))
    >>> p.run('foo', State(1, 1))
    Traceback (most recent call last):
        ...
    NoParseError: got unexpected token

---

# Example: S-exp language

* Expressions
    * Symbols

        .no-highlight

            some-symbol

    * Lists of expressions

        .no-highlight

            (and (this is (a nested s-exp))
                 (the end))

---

# S-exp parser

    def tokenize(s): ...

    def parse(s):
        sym_token = some(lambda t: t.type == 'symbol')
        symbol = sym_token >> (lambda t: t.value)
        list = forward_decl()
        expr = symbol | list
        list.define(skip(op('(')) + many(expr) + skip(op(')')))
        return expr.parse(tokenize(s))

    def op(value):
        return some(lambda t: t.type == 'operator' and t.value == value)

---

# Usage

.python

    >>> parse('(and'
    ...       ' (this is (a nested s-exp))'
    ...       ' (the end))')
    ['and', ['this', 'is', ['a', 'nested', 's-exp']], ['the', 'end']]

---

.title-slide

# Pythonic parts of FP

---

.title-slide

# Pure functions

---

# Pure functions

* The same result for the same arguments
* No I/O, no mutable state

.python

    def fibonacci(n):
        if n < 2:
            return 1
        else:
            return fibonacci(n - 1) + fibonacci(n - 2)

* Simple is better than complex

---

# Local mutable state is fine

* Practicality beats purity

.python

    def fibonacci(n):
        a, b = 0, 1
        for i in range(n):
            a, b = b, a + b
        return b

---

# Easier to reason about

* Repeat `p` until it fails, return the list of results

.python

    def many(p):
        def run(tokens, state):
            result = []
            try:
                while True:
                    value, state = p.run(tokens, state)
                    result.append(value)
            except NoParseError as e:
                return result, State(state.pos, e.state.max)
        return Parser(run)

---

# Easier to test

* No environment setup, no mocks

.python

    class ParsingTest(unittest.TestCase):
        def test_many_backtracking(self):
            """Issue 31."""
            x = a('x')
            y = a('y')
            expr = many(x + y) + x + x
            self.assertEqual(expr.parse('xyxyxx'),
                             ([('x', 'y'), ('x', 'y')], 'x', 'x'))

---
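# Cross-checking the two versions

Both `fibonacci` definitions above are pure, so they can be checked against each other with plain assertions, no fixtures or mocks. A sketch (my example, not part of funcparserlib):

```python
# Recursive version from the "Pure functions" slide
def fib_rec(n):
    if n < 2:
        return 1
    else:
        return fib_rec(n - 1) + fib_rec(n - 2)


# Iterative version from the "Local mutable state is fine" slide
def fib_iter(n):
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return b


# Purity makes property-style cross-checks trivial
for n in range(15):
    assert fib_rec(n) == fib_iter(n)

print([fib_iter(n) for n in range(8)])  # [1, 1, 2, 3, 5, 8, 13, 21]
```

---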
# As pure as possible

* The overall structure is usually OOP-based
    * Uses I/O, mutable objects
* Make most of the smaller pieces pure
    * Parsing, transformations, analysis, etc.

---

.title-slide

# Comprehensions and generators

---

# Comprehensions

* List comprehensions

.python

    [x * x for x in range(10) if x % 2 == 0]

    map(lambda x: x * x,
        filter(lambda x: x % 2 == 0,
               range(10)))

* Dict, set, and lazy generator comprehensions

.python

    char_codes = {chr(x): x for x in range(256)}
    first_chars = {line[0] for line in lines if line}
    lazy_tokens = (t for t in tokens if t.type != 'space')

---

# Generator functions

* Lazy iterables with local mutable state
* Use them instead of multi-line lambdas and comprehensions

        def tokenize(str, token_specs):
            length = len(str)
            line, pos = 1, 0
            i = 0
            while i < length:
                token = match(token_specs, str, i, (line, pos))
                yield token
                line, pos = token.end
                i += len(token.value)

---

# Use generators instead of lists

* Good for performance
    * Lazy iterables reduce memory usage
    * They can be evaluated partially
* Generator comprehensions and functions
* See also itertools

---

.title-slide

# Type hints

---

# Functions are arrows between types

* Tokenization and parsing

.python

    str -> Sequence[Token] -> ParseTree

* Types are good for documentation

---

# Type hints used to be informal

* The standard library docs for `filter(function, iterable)`:

    Construct an .strong[iterator] from those elements of iterable for
    which .strong[function] returns .strong[true]. iterable may be either
    .strong[a sequence], .strong[a container which supports iteration],
    or .strong[an iterator]. If function is .strong[None], the identity
    function is assumed, that is, all elements of iterable that are false
    are removed.
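---

# The informal contract, pinned down

The `filter()` prose above is easier to scan as a couple of examples (standard-library behavior; the examples are mine):

```python
# function=None keeps only the truthy elements
print(list(filter(None, [0, 1, '', 'a', [], [2]])))
# [1, 'a', [2]]

# A predicate keeps the elements where it returns true
print(list(filter(lambda x: x % 2 == 0, range(10))))
# [0, 2, 4, 6, 8]
```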
---

# Various notations

* Haskell-like type comments in funcparserlib

        class Parser:
            """Parser(a, b)"""

            def parse(self, tokens):
                """Sequence(a) -> b"""

            def __or__(self, other):
                """Parser(a, c) -> Parser(a, b | c)"""

---

# PEP 484: Type hints

* Based on function annotations, Python 3.5
* Explicit is better than implicit

        from typing import *

        T = TypeVar('T')

        def filter(function: Optional[Callable[[T], Any]],
                   iterable: Iterable[T]) -> Iterator[T]: ...

* Gradual typing
    * Still dynamic typing, doesn't affect run-time behavior
    * Everything is of type `Any` by default

---

# Write type hints for your public API

* Good for documentation
    * FP-like higher-order functions, generic types, etc.
* Linters and IDEs can find bugs
    * Static analysis of your code

---

.title-slide

# Un-pythonic parts of FP

---

.title-slide

# Recursion and copying

---

# Function calls are expensive

* O(n) recursive Fibonacci
    * ~3 times slower than the iterative version
    * PyPy doesn't help

        def fibonacci(n):
            def rec(n, a, b):
                if n == 0:
                    return a
                else:
                    return rec(n - 1, b, a + b)
            return rec(n, 0, 1)

* No tail call optimization
    * An inconvenient way to write iterative functions

---

# Recursive descent parsing

* O(2<sup>n</sup>) for extreme grammars

.python

    x = a('x') >> raise_error
    y = a('y')
    expr = many(x + y) + x + x

    >>> expr.parse('xyxyxx')
    Traceback (most recent call last):
        ... parse
        ... _add
        ... _add
        ... _many
        ... _add
        ... _shift
    Exception: ...

---

# Copying lists is expensive

* Array-based lists, not linked lists
* No `x:xs` destructuring typical for FP

        def run(tokens: List[Token]) -> Tuple[T, List[Token]]: ...

* Track the current list index in `State` instead

        def run(tokens: List[Token], state: State) -> Tuple[T, State]: ...

---

# Don't overuse recursion or copying data structures

* Recursion is expensive
    * Replace linear recursion with iteration
    * Think twice about non-linear recursion
* Don't copy lists and dicts too often
    * Replace destructuring with accessing by index
    * Use lazy generators
    * There are some persistent data structures in fn.py

---

.title-slide

# FP code is dense

---

# Lambdas look dense

* The parser of `null` in a JSON grammar

.python

    a = lambda token: some(lambda t: t == token)
    value = lambda token: token.value
    name = lambda s: a(Token('name', s)) >> value
    const = lambda x: lambda y: x
    null = name('null') >> const(None)

---

# Partial and point-free

.python

    from functools import partial
    from operator import eq, attrgetter

* Partial application

.python

    a = lambda token: some(partial(eq, token))
    # some(lambda t: t == token)

* Point-free style

.python

    value = attrgetter('value')
    # lambda token: token.value

---

# Readable version

* Sparse is better than dense
* Readability counts

        def a(token):
            return some(lambda t: t == token)

        def value(token):
            return token.value

        def name(s):
            return a(Token('name', s)) >> value

        def make_none(x):
            return None

        null = name('null') >> make_none

---

# Write readable and sparse code

* Because it's pythonic
* FP tricks usually make it worse
    * They could be elegant, but are often unreadable

---

.title-slide

# Math terminology doesn't help

---

# FP uses math abstractions

* Abstract algebra, category theory
* Monoids, lattices, functors, monads, etc.
* Common patterns in "unrelated" data structures

---

# Monad

* The most discussed FP abstraction
    * Hundreds of monad tutorials

        class Monad(Generic[T]):
            @staticmethod
            def unit(x: T) -> 'Monad[T]':
                pass

            def bind(self, f: Callable[[T], 'Monad[V]']) -> 'Monad[V]':
                pass

* That's the monad
    * The algebraic laws for the monad are missing here

---

# List is a monad

* The standard `list` could be extended to become a monad

        class list:
            @staticmethod
            def unit(x):
                return [x]

            def bind(self, f):
                return list(self._bind(f))

            def _bind(self, f):
                for x in self:
                    for y in f(x):
                        yield y

---

# Parser is a monad

* A specific computation that returns a `T`

        class Parser:
            @staticmethod
            def unit(x):
                def run(tokens, state):
                    return x, state
                return Parser(run)

            def bind(self, f):
                def run(tokens, state):
                    value, state2 = self.run(tokens, state)
                    return f(value).run(tokens, state2)
                return Parser(run)

---

# Really abstract code

* It only knows about the monad
* It's `map()` for lists and `p >> f` for parsers

        def map(f: Callable[[T], V], monad: Monad[T]) -> Monad[V]:
            return monad.bind(lambda x: monad.unit(f(x)))

* It's not worth abstracting
    * Requires a math background
    * Hard to debug
    * Inefficient for most concrete monads

---

# Concrete code

* If the implementation is easy to explain, it may be a good idea

        class list:
            def map(self, f):
                return [f(x) for x in self]

        class Parser:
            def map(self, f):
                def run(tokens, state):
                    value, state2 = self.run(tokens, state)
                    return f(value), state2
                return Parser(run)

---

# Avoid math terminology unrelated to the problem domain

* Higher learning curve for Python contributors
* Not always useful as programming abstractions
    * The percentage of really abstract code is small
* Monads are slow in Python

---

.title-slide

# In conclusion

---

# Summary

* Pythonic parts of FP
    * Make your code as pure as possible
    * Use generators instead of lists
    * Write type hints for your public API
* Python is not Haskell
    * Don't overuse recursion or copying data structures
    * Write readable and sparse code
    * Avoid math terminology unrelated to the problem domain
* Feel free to ask any questions
    * [@vlasovskikh](http://twitter.com/vlasovskikh) and [pirx.ru](http://pirx.ru/)