Parsing Stages Illustrated

Author: Andrey Vlasovskikh
License: Creative Commons Attribution-Noncommercial-Share Alike 3.0
Library Homepage: http://code.google.com/p/funcparserlib/
Library Version: 0.4dev

Given some language, for example, the GraphViz DOT graph language (see its grammar), you can easily write your own parser for it in Python using funcparserlib.

Then you can:

  1. Take a piece of source code in this DOT language:

    >>> s = '''\
    ... digraph g1 {
    ...     n1 -> n2 ->
    ...     subgraph n3 {
    ...         nn1 -> nn2 -> nn3;
    ...         nn3 -> nn1;
    ...     };
    ...     subgraph n3 {} -> n1;
    ... }
    ... '''
    

    that stands for the graph:

    [Figure: the graph described by the DOT source above]

  2. Import your small parser (we use one shipped as an example with funcparserlib here):

    >>> import sys, os
    >>> sys.path.append(os.path.join(os.getcwd(), '../examples/dot'))
    >>> import dot as dotparser
    
  3. Transform the source code into a sequence of tokens:

    >>> toks = dotparser.tokenize(s)
    
    >>> print '\n'.join(unicode(tok) for tok in toks)
    1,0-1,7: Name 'digraph'
    1,8-1,10: Name 'g1'
    1,11-1,12: Op '{'
    2,4-2,6: Name 'n1'
    2,7-2,9: Op '->'
    2,10-2,12: Name 'n2'
    2,13-2,15: Op '->'
    3,4-3,12: Name 'subgraph'
    3,13-3,15: Name 'n3'
    3,16-3,17: Op '{'
    4,8-4,11: Name 'nn1'
    4,12-4,14: Op '->'
    4,15-4,18: Name 'nn2'
    4,19-4,21: Op '->'
    4,22-4,25: Name 'nn3'
    4,25-4,26: Op ';'
    5,8-5,11: Name 'nn3'
    5,12-5,14: Op '->'
    5,15-5,18: Name 'nn1'
    5,18-5,19: Op ';'
    6,4-6,5: Op '}'
    6,5-6,6: Op ';'
    7,4-7,12: Name 'subgraph'
    7,13-7,15: Name 'n3'
    7,16-7,17: Op '{'
    7,17-7,18: Op '}'
    7,19-7,21: Op '->'
    7,22-7,24: Name 'n1'
    7,24-7,25: Op ';'
    8,0-8,1: Op '}'
    
  4. Parse the sequence of tokens into a parse tree:

    >>> tree = dotparser.parse(toks)
    
    >>> from textwrap import fill
    >>> print fill(repr(tree), 70)
    Graph(strict=None, type='digraph', id='g1', stmts=[Edge(nodes=['n1',
    'n2', SubGraph(id='n3', stmts=[Edge(nodes=['nn1', 'nn2', 'nn3'],
    attrs=[]), Edge(nodes=['nn3', 'nn1'], attrs=[])])], attrs=[]),
    Edge(nodes=[SubGraph(id='n3', stmts=[]), 'n1'], attrs=[])])
    
  5. Pretty-print the parse tree:

    >>> print dotparser.pretty_parse_tree(tree)
    Graph [id=g1, strict=False, type=digraph]
    `-- stmts
        |-- Edge
        |   |-- nodes
        |   |   |-- n1
        |   |   |-- n2
        |   |   `-- SubGraph [id=n3]
        |   |       `-- stmts
        |   |           |-- Edge
        |   |           |   |-- nodes
        |   |           |   |   |-- nn1
        |   |           |   |   |-- nn2
        |   |           |   |   `-- nn3
        |   |           |   `-- attrs
        |   |           `-- Edge
        |   |               |-- nodes
        |   |               |   |-- nn3
        |   |               |   `-- nn1
        |   |               `-- attrs
        |   `-- attrs
        `-- Edge
            |-- nodes
            |   |-- SubGraph [id=n3]
            |   |   `-- stmts
            |   `-- n1
            `-- attrs
    
  6. And so on. Basically, you get full access to the tree-like structure of the DOT file.
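
As an illustration of what "full access" to the tree can look like, here is a minimal sketch that walks a tree of the shape printed in step 4 and collects the edges as pairs of names. The `Graph`, `SubGraph` and `Edge` namedtuples below are stand-ins that only mirror the repr shown above; they are not imported from the example parser. The helpers `node_name` and `edge_pairs` are hypothetical names introduced for this sketch. It also simplifies DOT semantics by treating a subgraph endpoint as a single node named by the subgraph id.

```python
from collections import namedtuple

# Stand-ins mirroring the repr shown in step 4 (assumed shapes, not imported
# from the example parser).
Graph = namedtuple('Graph', 'strict type id stmts')
SubGraph = namedtuple('SubGraph', 'id stmts')
Edge = namedtuple('Edge', 'nodes attrs')


def node_name(node):
    """Return a plain name for an edge endpoint (a string or a SubGraph)."""
    return node.id if isinstance(node, SubGraph) else node


def edge_pairs(stmts):
    """Collect (source, target) name pairs from a list of statements,
    descending into subgraphs, including those used as edge endpoints."""
    pairs = []
    for stmt in stmts:
        if isinstance(stmt, Edge):
            # Edges nested inside subgraph endpoints come first.
            for node in stmt.nodes:
                if isinstance(node, SubGraph):
                    pairs.extend(edge_pairs(node.stmts))
            # A chain a -> b -> c contributes (a, b) and (b, c).
            names = [node_name(n) for n in stmt.nodes]
            pairs.extend(zip(names, names[1:]))
        elif isinstance(stmt, SubGraph):
            pairs.extend(edge_pairs(stmt.stmts))
    return pairs


# The same tree as in step 4, rebuilt by hand from its repr.
tree = Graph(strict=None, type='digraph', id='g1', stmts=[
    Edge(nodes=['n1', 'n2', SubGraph(id='n3', stmts=[
        Edge(nodes=['nn1', 'nn2', 'nn3'], attrs=[]),
        Edge(nodes=['nn3', 'nn1'], attrs=[])])], attrs=[]),
    Edge(nodes=[SubGraph(id='n3', stmts=[]), 'n1'], attrs=[])])

pairs = edge_pairs(tree.stmts)
# pairs == [('nn1', 'nn2'), ('nn2', 'nn3'), ('nn3', 'nn1'),
#           ('n1', 'n2'), ('n2', 'n3'), ('n3', 'n1')]
```

Because the tree consists of ordinary Python values, plain `isinstance` checks and recursion are enough here; no special visitor machinery is assumed.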

See the source code of the DOT parser and the docs at the funcparserlib homepage for details.
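
For a rough idea of what the tokenizing stage in step 3 involves, the following standalone sketch splits a small subset of DOT into tokens with line and column spans, formatted like the listing in step 3. It does not reproduce the tokenizer of the shipped example: the `Token` record, the `TOKEN_RE` pattern, and the `tokenize` and `show` functions are all assumptions made for illustration, and only names and a few operators are recognized.

```python
import re
from collections import namedtuple

# Hypothetical token record for this sketch (not the library's own class).
Token = namedtuple('Token', 'type value start end')

# A deliberately tiny subset of DOT: names, a few operators, whitespace.
TOKEN_RE = re.compile(r'''
    (?P<Name>  [A-Za-z_][A-Za-z_0-9]* )
  | (?P<Op>    -> | -- | [{};,=\[\]] )
  | (?P<Space> \s+ )
''', re.VERBOSE)


def tokenize(text):
    """Return the non-whitespace tokens of text with (line, column) spans."""
    tokens = []
    line, line_start, pos = 1, 0, 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError('cannot tokenize at %d,%d'
                             % (line, pos - line_start))
        kind, value = m.lastgroup, m.group()
        if kind == 'Space':
            # Track where the current line starts so that columns
            # are reported relative to it.
            newlines = value.count('\n')
            if newlines:
                line += newlines
                line_start = pos + value.rindex('\n') + 1
        else:
            start = (line, pos - line_start)
            end = (line, m.end() - line_start)
            tokens.append(Token(kind, value, start, end))
        pos = m.end()
    return tokens


def show(tok):
    """Format a token in the style of the listing in step 3."""
    return '%d,%d-%d,%d: %s %r' % (tok.start + tok.end + (tok.type, tok.value))


tokens = tokenize('digraph g1 {\n    n1 -> n2;\n}\n')
lines = [show(tok) for tok in tokens]
# lines[0] == "1,0-1,7: Name 'digraph'"
# lines[4] == "2,7-2,9: Op '->'"
```

Keeping the position in every token is what later allows a parser to report errors in terms of source lines and columns rather than token indexes.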
