.title-slide

# Static Analysis of Python

.right[Andrey Vlasovskikh]
.right[JetBrains]
.right[Moscow, 2016]

---

.center[
]

* I'm [@vlasovskikh](http://twitter.com/vlasovskikh) and [pirx.ru](http://pirx.ru/)
* From St. Petersburg, Russia
* PyCharm: IDE for Python and Web development
* I work for JetBrains as the PyCharm Community Lead

---

.title-slide

# Summer practice 2016

---

# Looking for candidates

* Get to know JetBrains
* Paid job for 2 months, 40-hour week at our offices
* Work on static code analysis of Python
* Get appreciated by your users!

---

# Requirements

* Being passionate about software development
* Basic skills in Java or Kotlin
* Extra
    * Willingness to relocate to St. Petersburg
    * Skills in parsers and compilers
    * Skills in Python

---

.title-slide

# Static analysis of a dynamic language

---

# Python is dynamic (1)

* No wordy declarations

.python
    def main(args):
        xs = [arg for arg in args if arg.startswith('-')]

vs

.java
    public static void main(String[] args) {
        final List<String> xs = new ArrayList<String>();
        for (String arg : args) {
            if (arg.startsWith("-")) {
                xs.add(arg);
            }
        }
    }

---

# Python is dynamic (2)

* No need to compile before run

.center[![Compiling!](media/compiling.png)]

.footnote[(Source: http://xkcd.com/303/)]

---

# Python is dynamic (3)

* Dynamic tricks and metaprogramming
    * Dynamic typing, customizing imports, attribute access, data as code

    class NodeVisitor(object):
        def visit(self, node):
            method = 'visit_' + node.__class__.__name__
            # Default to None so a missing visit_* method is skipped
            visitor = getattr(self, method, None)
            if visitor:
                return visitor(node)

---

# Downside: runtime errors

* Dynamic is cool, but...
* No compile-time checks:

.python
    foo = 'hello'
    print(.error[fooo])  # NameError

* Dynamic typing:

.python
    foo = 'foo'..error[uppercase]()  # AttributeError
    three = 3
    return [1, 2].extend(.error[three])  # TypeError

* Customizing imports:

.python
    import .error[foo]  # ImportError

---

# Keep it dynamic, get fewer runtime errors

    def f(x, c):
        if c:
            result = x
        return result

* Can we see errors in the code without running it?

---

# Solution: static analysis

* Source code is enough, no need to run it

![good-news](media/good-news.png)

* It's automatic!
* Static analysis tools for your text editor or IDE

---

# How it works

* How to create your static analyzer
    * Not the source code itself, but all the key ideas
* Features of existing analyzers
    * PEP8, PyFlakes, PyLint, PyCharm
* Share our experience

---

# The example

    from os import stat, walk, join

    class Path(Object):
        def __init__(self, name):
            self.name=name

        def stat(self):
            return stat(name)

    def all_files(path):
        for root, dirs, files in walk(path.filename):
            for file in files:
                yield Path(join(root, file))

    files = list(all_files(Path('.')))
    print(files[0].status())

---

# Errors in the example

    from os import stat, walk, .error[join]  # ImportError

    class Path(.error[Object]):  # NameError
        def __init__(self, name):
            self.name=name  # PEP 8

        def stat(self):
            return stat(.error[name])  # NameError

    def all_files(path):
        for root, dirs, files in walk(path..error[filename]):
            for file in files:
                yield Path(join(root, file))

    files = list(all_files(Path('.')))
    print(files[0]..error[status]())  # AttributeError

---

.title-slide

# Analysis tool architecture

---

# Layered code model

* Algorithms use code insight engine

.python
    Code insight
    Resolving attributes
    Type inference
    Resolving names
    Parsing
    Lexing

* Engines in this talk
    * PEP8, PyFlakes, PyLint, PyCharm
    * Different goals and trade-offs

---

# Lexer

* String with program text to list of tokens
* Standard `tokenize` module

    from os import stat                NAME    'from'
                                       NAME    'os'
    class Path(Object):                NAME    'import'
        def __init__(self, name):      NAME    'stat'
            self.name=name             NEWLINE '\n'
                                       NEWLINE '\n'
    ...                                NAME    'class'
                                       NAME    'Path'
                                       OP      '('
                                       NAME    'Object'
                                       ...
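---

# Lexer: a `tokenize` sketch

The lexing step can be tried directly with the standard `tokenize` module. This is a minimal sketch that assumes Python 3. The helper name `lex` and the shortened `SOURCE` string are illustrative only and are not part of any tool mentioned in this talk.

```python
import io
import tokenize

# A shortened version of the example program (assumption: Python 3 syntax).
SOURCE = "from os import stat\nclass Path(Object):\n    pass\n"


def lex(source):
    """Return (token type name, token text) pairs for a program text."""
    readline = io.StringIO(source).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)]


tokens = lex(SOURCE)
for kind, text in tokens[:9]:
    print(kind, repr(text))
```

The lexer reports `Object` as an ordinary `NAME` token. Whether that name is defined anywhere is a question for the later layers.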
---

# PEP8 tool

* Adherence to the PEP 8 coding style
    * Spacing, indent, empty lines, simple idioms

![PEP8](media/pep8.png)

* Fast, focused on the lexical level
* PyCharm: PEP8 + own code formatter

---

# Parser

* Tokens to syntax tree

    from os import stat               Module
                                        ImportFrom('os', 'stat')
    class Path(Object):                 ClassDef('Path')
                                          bases=Name('Object')
        def __init__(self, name):         FunctionDef('__init__')
                                            args=
                                              Name('self')
                                              Name('name')
            self.name = name                Assign
                                              Attribute('self.name')
                                              Name('name')
        def stat(self):                   FunctionDef('stat')
                                            args=Name('self')
            return stat(name)               Return
                                              Call
                                                Name('stat')
                                                args=Name('name')

---

# Standard or custom?

* Built-in function `compile` and `ast` module
    * In PyFlakes, PyLint
    * Developed as a part of CPython
* Custom parser
    * In PyCharm
    * Multiple Python versions (compatibility inspections)
    * Error recovery (inspect while typing)

---

.title-slide

# Unresolved local names (`NameError`)

---

# Resolving names

    from os import stat

    class Path(.error[Object]):
        def __init__(self, name):
            self.name = name

        def stat(self):
            return stat(.error[name])

* Resolved = found the definition or built-in
    * Built-in identifiers
    * Go from usage to definition
* Scopes of definitions: tree-like code structure
    * Variables, statements, functions, classes

---

# Resolving local names

* Built-ins, scopes of definitions

.python
    Module
      ImportFrom('os', 'stat') <---------------+
      ClassDef('Path')                         |
        bases=.error[Name('Object')]           |
        FunctionDef('__init__')                |
          args=                                |
            Name('self') <-----------------+   |
            Name('name') <-------------+   |   |
          Assign                       |   |   |
            Attribute('self.name') ----|---+   |
            Name('name') --------------+       |
        FunctionDef('stat')                    |
          args=Name('self')                    |
          Return                               |
            Call                               |
              Name('stat') --------------------+
              args=.error[Name('name')]

---

# PyFlakes tool

* Unresolved and unused names
    * Also in PyLint, PyCharm
* Trade-offs
    * Fast, focused on definitions of names
    * Single file only, no imports, no attributes

---

# Dynamic challenge #1

* Dynamic names
    * `globals()`, stack frames, `eval` and `exec`, `__builtins__`
* False positive errors = not real errors

.python
    import gettext
    gettext.install('messages')
    msg = .error[_]('Hello World')  # False positive error

* Solutions
    * Provide ignore settings, suppress for a statement

---

.title-slide

# Unresolved imported names (`ImportError`, `AttributeError`)

---

# Resolving imports

.python
    from os import stat, walk, .error[join]

    # Alternatively
    import os
    os..error[join]('foo', 'bar')

* Unresolved = module not found or it doesn't define this name
* Traverse Python paths in `sys.path`
* Parse imported modules, find definitions
* Handle imports recursively

---

# Dynamic challenge #2

* Binary modules
    * No source code to analyze

.python
    from .false-positive[sys] import .false-positive[argv]

* Solutions
    * Load and analyze dynamically or generate skeleton files

---

# Dynamic challenge #3

* Dynamic path

.python
    import sys
    sys.path.append('/path/to/foo')
    import .false-positive[foo]

* Solutions
    * Detect interpreter path dynamically
    * Virtualenv
    * Provide user path settings

---

# Dynamic challenge #4

* Import hooks
    * `find_module()`, `load_module()`
* Solutions
    * Library-specific code insight
    * Flask: import hook for extensions
    * Open-source PyCharm plugin: [github.com/JetBrains/intellij-plugins][intellij-plugins]

.python
    from flask.ext import foo -> flask_foo.py

[intellij-plugins]: https://github.com/JetBrains/intellij-plugins

---

.title-slide

# Unresolved attributes (`AttributeError`)

---

# The updated example

    import os

    class Path(object):
        def __init__(self, name):
            self.name = name

        def stat(self):
            return os.stat(self.name)

    def all_files(path):
        for root, dirs, files in os.walk(path..error[filename]):
            for file in files:
                yield Path(os.path.join(root, file))

    files = list(all_files(Path('.')))
    print(files[0]..error[status]())

---

# Resolving attributes

    class Path(object):
        ...
        def stat(self):
            return os.stat(self.name)

    def all_files(path):
        for root, dirs, files in os.walk(...):
            for file in files:
                yield Path(os.path.join(root, file))

    files = list(all_files(...))
    print(files[0]..error[status]())

* Unresolved = no such attribute in this instance or in its class
* What's the type of an expression?

---

# Primary type info sources

* Literals, constructors

.python
    foo = 'hello'
    bar = MyClass()

* Runtime type checks

.python
    if isinstance(baz, list):
        ...

* Type hints

.python
    def f(x: int) -> List[int]:
        ...

---

# Type inference

* Return types and assignment chains
* Here `value` is either `str` or `float`, a union type

    def f(x):
        if isinstance(x, float):
            return x
        elif x:
            s1 = 'hello'
            s2 = s1
            return s2
        else:
            raise TypeError()

    value = f('hello')

---

# Static type system for Python

* Simple types
    * `int`, `list`, `MyClass`
* Union types
    * `typing\.Union[str, int, list]`
* Parameterized types
    * `typing\.List[int]`, `typing\.Dict[T, V]`
* Tuple types
    * `typing\.Tuple[int, int, str]`

---

# Type inference in practice

    class Path(object):
        ...
        def stat(self):
            return os.stat(self.name)

    def all_files(path):
        for root, dirs, files in os.walk(...):
            for file in files:
                yield Path(os.path.join(root, file))

    files = list(all_files(...))
    print(files[0]..error[status]())

* Resolve `files[0].status`
    * The type of `all_files()` is `typing\.Iterable[Path]`
    * The type of `list()` is `typing\.List[Path]`
    * The type of `files[0]` is `Path`

---

# Static-only type info

* PyCharm: parameter and return types
* Type hints for Python 2 and 3
* Sphinx and Epydoc docstrings
* Type database for libraries
* Library-specific analysis (Django, Flask, NumPy, etc.)
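---

# Static-only type info: a type hints sketch

The two hint styles for Python 2 and 3 can be put side by side in a short sketch. It assumes Python 3.5+ with the standard `typing` module. The functions `paths_from_names` and `paths_from_names_py2` are hypothetical helpers invented for this illustration and are not part of the example program.

```python
from typing import Iterable, List, get_type_hints


class Path(object):
    def __init__(self, name: str) -> None:
        self.name = name


# Python 3 style: annotations are stored on the function object.
def paths_from_names(names: Iterable[str]) -> List[Path]:
    return [Path(name) for name in names]


# Python 2 compatible style: the hint lives in a comment, so only
# static tools see it; nothing is recorded at runtime.
def paths_from_names_py2(names):
    # type: (Iterable[str]) -> List[Path]
    return [Path(name) for name in names]


hints = get_type_hints(paths_from_names)
print(hints['return'])
print(get_type_hints(paths_from_names_py2))
```

With either style, an analyzer that knows the return type is `List[Path]` can conclude that an element of the result is a `Path` and check attribute access on it.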
---

# Dynamic challenge #5

* Overriding attribute access

    class C(object):
        def __getattr__(self, name):
            return name

    c = C()
    c..false-positive[foo]

* Solutions
    * Don't check attributes of classes with `__getattr__`

---

# Dynamic challenge #6

* Defining attributes outside of classes

    class C(object):
        def __init__(self, foo):
            self.foo = foo

    def set_bar(c, value):
        c.bar = value

    c = C(1)
    set_bar(c, 1)
    c..false-positive[bar]

* Solutions
    * Index attribute definitions if we can find them
    * Provide ignore settings

---

# Dynamic challenge #7

* Parameter types

    class Path(object):
        def __init__(self, name):
            self.name = name
        ...

    def all_files(path):
        for root, dirs, files in os.walk(path..error[filename]):
            for file in files:
                yield Path(os.path.join(root, file))

    files = list(all_files(Path('.')))

* Solutions
    * Infer types only when static type info is present
    * Check types based on used attributes

---

# More dynamic challenges

* Binary modules
    * Don't know return types of functions
* Performance of type inference
    * Global inter-procedural analysis can be slow
* Etc.

---

# The results

* Found 5 of 5 errors

    from os import stat, walk, .error[join]  # ImportError

    class Path(.error[Object]):  # NameError
        def __init__(self, name):
            self.name = name

        def stat(self):
            return stat(.error[name])  # NameError

    def all_files(path):
        for root, dirs, files in walk(path.filename):
            for file in files:
                yield Path(join(root, file))

    files = list(all_files(Path('.')))  # AttributeError
    print(files[0]..error[status]())  # AttributeError

---

.title-slide

# Conclusion

---

# Conclusion

* Static analysis for Python
    * Less powerful than for a static language
    * More helpful because the compiler checks are very basic
* Use tools
    * Tools will save you time, try them
* Develop your own code analysis inspections
    * Apply for the summer practice!
* Contact me: [@vlasovskikh](http://twitter.com/vlasovskikh) and [pirx.ru](http://pirx.ru/)
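
---

# Appendix: a minimal name inspection

As a starting point for developing your own inspection, here is a rough sketch of an unresolved-name check built on the standard `ast` module. It assumes Python 3. The function `unresolved_names` is a hypothetical helper, not the PyFlakes or PyCharm implementation. It deliberately ignores scopes: every definition found anywhere in the file counts as visible everywhere.

```python
import ast
import builtins

SOURCE = '''
from os import stat

class Path(Object):
    def __init__(self, name):
        self.name = name

    def stat(self):
        return stat(name)
'''


def unresolved_names(source):
    """Return sorted names that are read but never defined in the file."""
    tree = ast.parse(source)
    defined = set(dir(builtins))  # built-in identifiers always resolve
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split('.')[0])
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            defined.add(node.id)
    return sorted({node.id for node in ast.walk(tree)
                   if isinstance(node, ast.Name)
                   and isinstance(node.ctx, ast.Load)
                   and node.id not in defined})


print(unresolved_names(SOURCE))  # ['Object']
```

Because the sketch has no scopes, it finds `Object` but misses the unresolved `name` inside `stat()`: the parameter `name` of `__init__` is wrongly treated as visible there. Tracking a tree of scopes, as in the "Resolving local names" slide, is what closes that gap.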