.title-slide

# Static Analysis of Python

.right[Andrey Vlasovskikh]

.right[JetBrains]

.right[#PyConRU 2013]

---

.center[![PyCharm](media/pycharm-logo.png)]

* PyCharm from JetBrains
    * Python IDE
    * Web frameworks: Django, Google App Engine, Flask
    * Languages: JavaScript, CoffeeScript, HTML, CSS, SASS, LESS, etc.
* Licensing
    * Commercial
    * Free licenses for open-source projects and classrooms
    * Discounts for students

---

# Python is dynamic (1)

* No wordy declarations

    .python
    def main(args):
        xs = [arg for arg in args if arg.startswith('-')]

vs

    .java
    public static void main(String[] args) {
        final List<String> xs = new ArrayList<String>();
        for (String arg : args) {
            if (arg.startsWith("-")) {
                xs.add(arg);
            }
        }
    }

---

# Python is dynamic (2)

* No need to compile before running

.center[![Compiling!](media/compiling.png)]

.footnote[(Source: http://xkcd.com/303/)]

---

# Python is dynamic (3)

* Dynamic tricks
    * Dynamic typing: `x.foo()`
    * Attribute access: `__getattr__`, etc.
    * Customizing imports: `sys.path`, import hooks
    * Data as code: `eval`, `exec`
    * Etc.

---

# Runtime errors

* Dynamic is cool, but...
* No compile-time checks: `NameError`

    .python
    foo = 'hello'
    print(.strong[fo])    # NameError

* Dynamic typing: `TypeError`, `AttributeError`

    .python
    foo = 'foo'..strong[uppercase]()    # AttributeError
    x = 3
    return [1, 2].extend(.strong[x])    # TypeError

* Customizing imports: `ImportError`

    .python
    import .strong[foo]    # ImportError

---

# Find errors by looking at code?

* We can see errors in code

    def f(x, c):
        if c:
            result = x
        return result

* Surely a program can find them too

---

# Good news!

* Static analysis for Python
    * Source code is enough, no need to run it

![good-news](media/good-news.png)

* It's automatic!
    * Static analysis tools for your editor or IDE

---

.title-slide

# What to expect from tools (some examples)

---

# Undefined names

* Undefined variables, imports, attributes

    import .strong[foo]

    def f(x):
        s = 'hello'..strong[foo]()
        return .strong[foo]

---

# Unbound variables

* Not defined on some paths

    .python
    def f(xs):
        for x in xs:
            last = process(x)
        return .strong[last]

---

# Wrong call arguments

* Count mismatch, unexpected keyword args

    .python
    flag = True
    'foo'.lower(.strong[flag])

    fd = open(filename, .strong[encoding='UTF-8'])    # Python 2

---

# Type errors

* Type mismatches in function calls and operators

    .python
    index = 0
    'foo'.find(.strong[index])

    print('foo' .strong[+] 10)

---

# Warnings and code style

* Unused
    * Unused names, imports
    * Unreachable code
* Compatibility
    * Python version compatibility
    * Deprecated calls
* Code style
    * Mixed tabs and spaces
    * PEP 8: wrong spacing, visual indent, empty lines
* Etc.

---

# When tools don't help

* Dynamic tricks
* Calls to binary modules
* Types of function parameters
* Return values of complex functions

---

.title-slide

# The tools

---

# Available for popular editors

* Engines: plug-ins or bundled
    * Vim: PyLint, PyFlakes, PEP8
    * Sublime: PyFlakes, PEP8
    * PyCharm: own engine, PEP8
    * PyDev: own engine, PyLint
    * Etc.
* We'll look at them in more detail

---

# **Use Tools!**

* Tools are time-savers
    * Let the tool find errors for you
    * Spend more time on features, less on bugs
* Try them
    * Set up plugins for your editor
    * Use code inspections in your IDE

---

.title-slide

# How it works

---

# Plan

* Find errors automatically
    * Detectable kinds of errors
    * Available tools
* .current[How it works]
    * Code insight model
    * Parsing
    * Resolving references
    * Types
* Conclusion

---

# Layered code model

* Algorithms use the code insight engine

    .python
    Algorithms      Inspections / Refactorings

    Code insight    Resolving attributes
                    Type inference
                    Resolving names
                    Parsing
                    Lexing

* Engines in this talk
    * PEP8, PyFlakes, PyLint, PyCharm
    * Different goals and trade-offs

---

# Lexing and parsing

* Compiler theory class?

    .python
    Text -> List of tokens -> Syntax tree

* No, not really
    * Well-studied domain of computer science
    * We'll focus on practical aspects

---

# Lexer

* Turns a string of program text into a list of tokens
* Standard `tokenize` module

    def f(x):             NAME     'def'
        if x > 0:         NAME     'f'
            return x      OP       '('
        return 0          NAME     'x'
                          OP       ')'
                          OP       ':'
                          NEWLINE  '\n'
                          INDENT   '    '
                          NAME     'if'
                          ...

---

# PEP8 tool

* Adherence to the PEP 8 coding style
    * Spacing, indent, empty lines, simple idioms

![PEP8](media/pep8.png)

* Fast, focused on the lexical level
* PyCharm: PEP8 + own code formatter

---

# Parser

* Tokens to syntax tree

    def f(x):         NAME 'def'    Module
        if x > 0:     NAME 'f'        FunctionDef[f]
            return x  OP   '('          args=Name[x]
        return 0      NAME 'x'          If
                      OP   ')'            left=Name[x]
                      OP   ':'            op=GreaterThan
                      ...                 right=Num[0]
                                          Return
                                            Name[x]
                                        Return
                                          Num[0]

* Standard `ast` module

---

# Standard or custom?

* Built-in function `compile` and `ast` module
    * In PyFlakes, PyLint
    * Developed as a part of CPython
* Custom parser
    * In PyCharm
    * Error recovery (inspect while typing)
    * Multiple Python versions (compatibility inspections)
    * Common tree model for many languages

---

.title-slide

# Resolving references

---

# References

* Reference in syntax tree
    * From name usage to name definition
* Kinds of references
    * Local names: single file scope
    * Imported names: across files on the Python path
    * Attributes: dynamic typing

---

# Resolving local names

* Go to definition
    * Name reference, definition scope

    def f(x):           Module
        if x > 0:         FunctionDef[f] <---------+
            return x        args=Name[x] <------+  |
        return 0            If                  |  |
                              left=Name[x] -----+  |
    f(2)                      op=GreaterThan    |  |
                              right=Num[0]      |  |
                              Return            |  |
                                Name[x] --------+  |
                            Return                 |
                              Num[0]               |
                          Expr                     |
                            Call                   |
                              func=Name[f] --------+
                              args=Num[2]

---

# PyFlakes tool

* Unresolved and unused names
    * Also in PyLint, PyCharm
* Trade-offs
    * Fast, focused on definitions of names
    * Single file only, no imports, no attributes

---

# Dynamic challenge #1

* Dynamic names
    * `globals()` and stack frames
    * False positive errors

    globals()['foo'] = 1
    print(.strong[foo])

    def f():
        sys._getframe(0)\.f_locals['bar'] = 2
        return .strong[bar]

* In PyCharm
    * Provide ignore settings
    * Suppress for statement

---

# Resolving imports

* Unresolved imports
    * In PyLint, PyCharm
    * May be slow: imported files are analysed recursively

    .python
    import .strong[bad_module]

    import re
    re..strong[find]('o*', 'foo')

* In PyCharm
    * Index all files on the Python path

---

# Dynamic challenge #2

* Dynamic imports
    * `sys.path`, `__import__`, import hooks
* In PyCharm
    * Detect interpreter path dynamically
        * Virtualenv
    * Library-specific code insight
        * Flask: import hook for extensions
    * Open-source plugin: [github.com/JetBrains/intellij-plugins][intellij-plugins]

    .python
    # Resolves to flask_foo.py
    from flask.ext import foo

[intellij-plugins]: https://github.com/JetBrains/intellij-plugins

---

# Resolving attributes

* Attributes depend on types
    * Instance and class attributes

    def f(name):
        s = 'Hello {}'.format(name)
        if len(name) > 0:
            x = s
        else:
            x = 42
        return x.lower()

---

.title-slide

# Types

---

# Static types for Python

* Python is dynamically typed
    * No single type for a variable
    * Much less type info
* Constructs
    * Class and instance types: `int`, `list`, `MyClass`
    * Tuple types: `(int, int, str)`
    * Polymorphic types: `list of int`, `dict of (T, V)`
    * Union types: `str or int or list`

---

# Type information sources

* Literals, constructors, checks
    * Define or check types at run-time

    .python
    foo = [1, 2, 3]
    bar = MyClass()
    if isinstance(x, list):
        ...

---

# Static-only type info

* PyCharm: parameter and return types
    * Sphinx and Epydoc docstrings
    * Python 3 annotations
    * Library-specific analysis (stdlib, Django, NumPy, etc.)

    def foo(x):
        """Do foo with x.

        :type x: list of unicode
        """
        return x.pop().strip().lower()

---

# Type inference

* Return types and assignment chains

    def f(x):
        if isinstance(x, float):
            return x
        elif x:
            s1 = 'got {param}'.format(param=x)
            s2 = s1
            return s2
        else:
            raise TypeError()

    value = f('hello')

---

# Dynamic challenge #3

* Binary modules
    * Don't know return types of functions
* Dynamic attributes
    * Setting attributes outside of classes
* Parameter types
    * No information except arguments in calls
* Performance of type inference
    * Global inter-procedural analysis can be slow

---

# PyCharm and PyLint

* Many inspections overlap
* Trade-off: performance vs depth of analysis
* PyCharm
    * Live mode, shallow analysis
    * Library-specific analysis
* PyLint
    * Batch mode, deeper analysis
    * Global inter-procedural type inference

---

.title-slide

# Conclusion

---

# When tools rock (1)

* PyFlakes
    * Local unresolved and unused names
    * On-the-fly, must-have for Vim, Sublime, etc.
* PEP8
    * Coding style inspection
    * On-the-fly, enforces PEP 8 compliance if you follow it

---

# When tools rock (2)

* PyCharm
    * Wide range of inspections
    * On-the-fly, smart code completion, go to definition
* PyLint
    * Wide range of inspections
    * Batch mode, deep code analysis as a CI step

---

# Wrap up

* **Use tools!**
    * Tools are time-savers
    * Try them
* How they work
    * Syntax tree + references + types
    * Dynamic tricks are hard to analyse

---

.center.middle

# Questions?

[@vlasovskikh](http://twitter.com/vlasovskikh)

[pirx.ru](http://pirx.ru/)
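
---

# Backup: resolving names with `ast`

* A minimal sketch of the single-file check from the "Resolving local names" and "PyFlakes tool" slides, assuming Python 3
* Not how PyFlakes itself is implemented: `find_undefined_names` is a hypothetical helper that treats the whole file as one flat scope and ignores definition order, `except ... as` and `global`

```python
import ast
import builtins


def find_undefined_names(source):
    """Return sorted (line, name) pairs for names that are read but never bound.

    Hypothetical helper for illustration only: all scopes are collapsed
    into a single flat set of definitions.
    """
    tree = ast.parse(source)

    # Pass 1: collect every name the file binds, plus the builtins.
    defined = set(dir(builtins))
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            defined.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as b" binds "b".
                defined.add((alias.asname or alias.name).split('.')[0])

    # Pass 2: report every name that is read but was never bound.
    undefined = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            if node.id not in defined:
                undefined.append((node.lineno, node.id))
    return sorted(undefined)


if __name__ == '__main__':
    print(find_undefined_names("foo = 'hello'\nprint(fo)\n"))
```

* On the `NameError` example from the "Runtime errors" slide it reports `fo` on line 2
* Imports and attributes are out of reach for a check like this, which matches the PyFlakes trade-offs: fast, single file only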