.title-slide

# Python is not Haskell

.right[Andrey Vlasovskikh]
.right[JetBrains]
.right[PyCon Belarus 2015]

---

.center[![PyCharm](media/pycharm-logo.png)]

* I'm [@vlasovskikh](http://twitter.com/vlasovskikh) and [pirx.ru](http://pirx.ru/)
* From St. Petersburg, Russia
* I work for JetBrains
    * PyCharm: an IDE for Python and web development
    * [40% off](https://www.jetbrains.com/specials/backto2014.html) for .strong[Belarus], Ukraine, and Russia till 2015-02-05
* I organize FProg
    * Functional programming meetups
    * In FP since 2007

---

# FP is popular

* Functional programming languages
    * Scala, Clojure, Erlang, Haskell, F#
* Python
    * Not an FP language
    * Python devs express interest in FP

---

# Benefits of FP in Python?

* Should you use FP in Python or move to another language?
* Let me share my observations

---

.title-slide

# FP in Python

The results

---

# FP in Python

* Pythonic parts of FP
    * Pure functions
    * Comprehensions and generators
    * Type hints
* Python is not Haskell
    * Recursion and copying are expensive
    * FP code is dense
    * Math terminology doesn't help

---

.title-slide

# Example: funcparserlib

A functional parsing library

---

# Background

* Parsing little languages, external DSLs
* Competitors: Pyparsing, PLY
* Why I started it
    * The Pyparsing API is too large
    * The funcparserlib API is just ~10 functions
* Milestones
    * Started in 2009
    * 0.3.6, May 2013, Production/Stable
    * 12000 downloads/month from PyPI

---

# Parser combinators API

* Primitive parsers
    * `some(pred)`
    * `forward_decl()`, `finished`
* Composition
    * `p1 + p2`, `p1 | p2`, `p >> f`
    * `many(p)`, `maybe(p)`, `skip(p)`
* Abstraction
    * `Parser`
* The set of parsers is closed under the composition operations

---

# What is a parser

* A parser is a function `run()` plus some helpers
* Wrapped into a class to provide the operators `+`, `|`, `>>`

        class Parser:
            # Parse tokens
            def parse(self, tokens): ...
            def run(self, tokens, state): ...

            # Composition operators
            def __add__(self, other): ...
            def __or__(self, other): ...
            ...
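---

# Sketch: how composition could work

A minimal illustration of how `+` and `|` might chain `run()` functions. This is not the actual funcparserlib code: the state is simplified to an integer position and error reporting is omitted.

```python
# Illustrative sketch only: real funcparserlib tracks a richer State
# object; here the state is just an integer index into the tokens.

class NoParseError(Exception):
    pass


class Parser:
    def __init__(self, run):
        self.run = run  # run(tokens, pos) -> (value, new_pos)

    def __add__(self, other):
        # Sequence: run self, then other from where self stopped
        def run(tokens, pos):
            v1, pos1 = self.run(tokens, pos)
            v2, pos2 = other.run(tokens, pos1)
            return (v1, v2), pos2
        return Parser(run)

    def __or__(self, other):
        # Choice: try self, fall back to other on failure
        def run(tokens, pos):
            try:
                return self.run(tokens, pos)
            except NoParseError:
                return other.run(tokens, pos)
        return Parser(run)


def some(pred):
    # Primitive parser: consume one token satisfying pred
    def run(tokens, pos):
        if pos < len(tokens) and pred(tokens[pos]):
            return tokens[pos], pos + 1
        raise NoParseError('got unexpected token')
    return Parser(run)


f = some(lambda c: c == 'f')
o = some(lambda c: c == 'o')
print((f + o + o).run('foo', 0))  # ((('f', 'o'), 'o'), 3)
print((f | o).run('oo', 0))       # ('o', 1)
```

Every composition returns a new `Parser`, which is why the set of parsers stays closed under `+` and `|`.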
---

# Low-level usage

* A primitive parser of a single character

.python

    >>> from funcparserlib.parser import some, State
    >>> p = some(lambda x: x == 'f')

* Usage

.python

    >>> p.run('foo', State(0, 0))
    ('f', State(1, 1))
    >>> p.run('foo', State(1, 1))
    Traceback (most recent call last):
        ...
    NoParseError: got unexpected token

---

# Example: S-exp language

* Expressions
    * Symbols

        .no-highlight

            some-symbol

    * Lists of expressions

        .no-highlight

            (and (this is (a nested s-exp))
                 (the end))

---

# S-exp parser

    def tokenize(s): ...

    def parse(s):
        sym_token = some(lambda t: t.type == 'symbol')
        symbol = sym_token >> (lambda t: t.value)
        list = forward_decl()
        expr = symbol | list
        list.define(skip(op('(')) + many(expr) + skip(op(')')))
        return expr.parse(tokenize(s))

    def op(value):
        return some(lambda t: t.type == 'operator' and t.value == value)

---

# Usage

.python

    >>> parse('(and'
    ...       ' (this is (a nested s-exp))'
    ...       ' (the end))')
    ['and', ['this', 'is', ['a', 'nested', 's-exp']], ['the', 'end']]

---

.title-slide

# Pythonic parts of FP

---

.title-slide

# Pure functions

---

# Pure functions

* The same result for the same arguments
* No I/O, no mutable state

.python

    def fibonacci(n):
        if n < 2:
            return 1
        else:
            return fibonacci(n - 1) + fibonacci(n - 2)

* Simple is better than complex

---

# Local mutable state is fine

* Practicality beats purity

.python

    def fibonacci(n):
        a, b = 0, 1
        for i in range(n):
            a, b = b, a + b
        return b

---

# Easier to reason about

* Repeat `p` until it fails, return the list of results

.python

    def many(p):
        def run(tokens, state):
            result = []
            try:
                while True:
                    value, state = p.run(tokens, state)
                    result.append(value)
            except NoParseError as e:
                return result, State(state.pos, e.state.max)
        return Parser(run)

---

# Easier to test

* No environment setup, no mocks

.python

    class ParsingTest(unittest.TestCase):
        def test_many_backtracking(self):
            """Issue 31."""
            x = a('x')
            y = a('y')
            expr = many(x + y) + x + x
            self.assertEqual(expr.parse('xyxyxx'),
                             ([('x', 'y'), ('x', 'y')], 'x', 'x'))

---
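# Cross-checking the two versions

Both `fibonacci` definitions above are pure, so they can be checked against each other with plain assertions, no fixtures or mocks. A sketch (my example, not part of funcparserlib):

```python
# Recursive version from the "Pure functions" slide
def fib_rec(n):
    if n < 2:
        return 1
    else:
        return fib_rec(n - 1) + fib_rec(n - 2)


# Iterative version from the "Local mutable state is fine" slide
def fib_iter(n):
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return b


# Purity makes property-style cross-checks trivial
for n in range(15):
    assert fib_rec(n) == fib_iter(n)

print([fib_iter(n) for n in range(8)])  # [1, 1, 2, 3, 5, 8, 13, 21]
```

---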
# As pure as possible

* The overall structure is usually OOP-based
    * Uses I/O, mutable objects
* Make most of the smaller pieces pure
    * Parsing, transformations, analysis, etc.

---

.title-slide

# Comprehensions and generators

---

# Comprehensions

* List comprehensions

.python

    [x * x for x in range(10) if x % 2 == 0]

    map(lambda x: x * x,
        filter(lambda x: x % 2 == 0,
               range(10)))

* Dict, set, and lazy generator comprehensions

.python

    char_codes = {chr(x): x for x in range(256)}
    first_chars = {line[0] for line in lines if line}
    lazy_tokens = (t for t in tokens if t.type != 'space')

---

# Generator functions

* Lazy iterables with local mutable state
* Use them instead of multi-line lambdas and comprehensions

        def tokenize(str, token_specs):
            length = len(str)
            line, pos = 1, 0
            i = 0
            while i < length:
                token = match(token_specs, str, i, (line, pos))
                yield token
                line, pos = token.end
                i += len(token.value)

---

# Use generators instead of lists

* Good for performance
    * Lazy iterables reduce memory usage
    * They can be evaluated partially
* Generator comprehensions and functions
* See also itertools

---

.title-slide

# Type hints

---

# Functions are arrows between types

* Tokenization and parsing

.python

    str -> Sequence[Token] -> ParseTree

* Types are good for documentation

---

# Type hints used to be informal

* The standard library docs for `filter(function, iterable)`:

    Construct an .strong[iterator] from those elements of iterable for
    which .strong[function] returns .strong[true]. iterable may be either
    .strong[a sequence], .strong[a container which supports iteration],
    or .strong[an iterator]. If function is .strong[None], the identity
    function is assumed, that is, all elements of iterable that are false
    are removed.
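---

# The informal contract, pinned down

The `filter()` prose above is easier to scan as a couple of examples (standard-library behavior; the examples are mine):

```python
# function=None keeps only the truthy elements
print(list(filter(None, [0, 1, '', 'a', [], [2]])))
# [1, 'a', [2]]

# A predicate keeps the elements where it returns true
print(list(filter(lambda x: x % 2 == 0, range(10))))
# [0, 2, 4, 6, 8]
```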
---

# Various notations

* Haskell-like type comments in funcparserlib

        class Parser:
            """Parser(a, b)"""

            def parse(self, tokens):
                """Sequence(a) -> b"""

            def __or__(self, other):
                """Parser(a, c) -> Parser(a, b | c)"""

---

# PEP 484: Type hints

* Based on function annotations, Python 3.5
* Explicit is better than implicit

        from typing import *

        T = TypeVar('T')

        def filter(function: Optional[Callable[[T], Any]],
                   iterable: Iterable[T]) -> Iterator[T]: ...

* Gradual typing
    * Still dynamic typing, doesn't affect run-time behavior
    * Everything is of type `Any` by default

---

# Write type hints for your public API

* Good for documentation
    * FP-like higher-order functions, generic types, etc.
* Linters and IDEs can find bugs
    * Static analysis of your code

---

.title-slide

# Un-pythonic parts of FP

---

.title-slide

# Recursion and copying

---

# Function calls are expensive

* O(n) recursive Fibonacci
    * ~3 times slower than the iterative version
    * PyPy doesn't help

        def fibonacci(n):
            def rec(n, a, b):
                if n == 0:
                    return a
                else:
                    return rec(n - 1, b, a + b)
            return rec(n, 0, 1)

* No tail call optimization
    * An inconvenient way to write iterative functions

---

# Recursive descent parsing

* O(2<sup>n</sup>) for extreme grammars

.python

    x = a('x') >> raise_error
    y = a('y')
    expr = many(x + y) + x + x

    >>> expr.parse('xyxyxx')
    Traceback (most recent call last):
        ... parse
        ... _add
        ... _add
        ... _many
        ... _add
        ... _shift
    Exception: ...

---

# Copying lists is expensive

* Array-based lists, not linked lists
* No `x:xs` destructuring typical for FP

        def run(tokens: List[Token]) -> Tuple[T, List[Token]]: ...

* Track the current list index in `State` instead

        def run(tokens: List[Token], state: State) -> Tuple[T, State]: ...

---

# Don't overuse recursion or copying data structures

* Recursion is expensive
    * Replace linear recursion with iteration
    * Think twice about non-linear recursion
* Don't copy lists and dicts too often
    * Replace destructuring with accessing by index
    * Use lazy generators
    * There are some persistent data structures in fn.py

---

.title-slide

# FP code is dense

---

# Lambdas look dense

* The parser of `null` in a JSON grammar

.python

    a = lambda token: some(lambda t: t == token)
    value = lambda token: token.value
    name = lambda s: a(Token('name', s)) >> value
    const = lambda x: lambda y: x
    null = name('null') >> const(None)

---

# Partial and point-free

.python

    from functools import partial
    from operator import eq, attrgetter

* Partial application

.python

    a = lambda token: some(partial(eq, token))
    # some(lambda t: t == token)

* Point-free style

.python

    value = attrgetter('value')
    # lambda token: token.value

---

# Readable version

* Sparse is better than dense
* Readability counts

        def a(token):
            return some(lambda t: t == token)

        def value(token):
            return token.value

        def name(s):
            return a(Token('name', s)) >> value

        def make_none(x):
            return None

        null = name('null') >> make_none

---

# Write readable and sparse code

* Because it's pythonic
* FP tricks usually make it worse
    * They could be elegant, but are often unreadable

---

.title-slide

# Math terminology doesn't help

---

# FP uses math abstractions

* Abstract algebra, category theory
* Monoids, lattices, functors, monads, etc.
* Common patterns in "unrelated" data structures

---

# Monad

* The most discussed FP abstraction
    * Hundreds of monad tutorials

        class Monad(Generic[T]):
            @staticmethod
            def unit(x: T) -> 'Monad[T]':
                pass

            def bind(self, f: Callable[[T], 'Monad[V]']) -> 'Monad[V]':
                pass

* That's the monad
    * The algebraic laws for the monad are missing here

---

# List is a monad

* The standard `list` could be extended to become a monad

        class list:
            @staticmethod
            def unit(x):
                return [x]

            def bind(self, f):
                return list(self._bind(f))

            def _bind(self, f):
                for x in self:
                    for y in f(x):
                        yield y

---

# Parser is a monad

* A specific computation that returns a `T`

        class Parser:
            @staticmethod
            def unit(x):
                def run(tokens, state):
                    return x, state
                return Parser(run)

            def bind(self, f):
                def run(tokens, state):
                    value, state2 = self.run(tokens, state)
                    return f(value).run(tokens, state2)
                return Parser(run)

---

# Really abstract code

* It only knows about the monad
* It's `map()` for lists and `p >> f` for parsers

        def map(f: Callable[[T], V], monad: Monad[T]) -> Monad[V]:
            return monad.bind(lambda x: monad.unit(f(x)))

* It's not worth abstracting
    * Requires a math background
    * Hard to debug
    * Inefficient for most concrete monads

---

# Concrete code

* If the implementation is easy to explain, it may be a good idea

        class list:
            def map(self, f):
                return [f(x) for x in self]

        class Parser:
            def map(self, f):
                def run(tokens, state):
                    value, state2 = self.run(tokens, state)
                    return f(value), state2
                return Parser(run)

---

# Avoid math terminology unrelated to the problem domain

* Higher learning curve for Python contributors
* Not always useful as programming abstractions
    * The percentage of really abstract code is small
* Monads are slow in Python

---

.title-slide

# In conclusion

---

# Summary

* Pythonic parts of FP
    * Make your code as pure as possible
    * Use generators instead of lists
    * Write type hints for your public API
* Python is not Haskell
    * Don't overuse recursion or copying data structures
    * Write readable and sparse code
    * Avoid math terminology unrelated to the problem domain
* Feel free to ask any questions
    * [@vlasovskikh](http://twitter.com/vlasovskikh) and [pirx.ru](http://pirx.ru/)