Pyxie



Little Python Language Spec

Little Python is a restricted subset of Python 3 (and 2.7).

This is a work in progress. The implementation does not yet match this spec, so the grammar below is slightly bogus in places. Hopefully you get the idea, though.

Semi-formal syntactic language features todo:

Built in types to be supported

Simple:

Harder:

Lexical Analysis TODO

Keywords: "and", "not", "or",
          "True", "False",
          "class", "def", "yield", "return",
          "while", "for", "in", "if", "elif", "else", "break", "continue",
          "from", "import",
          "pass",
          "print"

Punctuation: ','  '('  ')'  ':'  '*'  '/'  '+'  '-'  '**' **[TBD]**
             COMPARISON_OPERATOR      **[TBD]**
             ASSIGN

COMPARISON_OPERATOR: (<>|<=|>=|==|!=|<|>|not +in|in|is +not|is)
ASSIGN: '='
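When a pattern like COMPARISON_OPERATOR is handed to Python's `re` module, the alternation takes the first alternative that matches, so longer operators need to be listed before their prefixes. The following is a minimal sketch of that idea, not the Pyxie lexer itself. The `\b` word boundaries and the names `COMPARISON_OPERATOR_RE`, `comparison_operators` and `PREFIX_FIRST` are assumptions made for illustration.

```python
import re

# Longest-first ordering: '<=' and '<>' come before '<', and 'is +not'
# comes before 'is'. The \b anchors are an added assumption so that 'in'
# does not match inside an identifier such as 'index'.
COMPARISON_OPERATOR_RE = re.compile(
    r"<>|<=|>=|==|!=|<|>|\bnot +in\b|\bis +not\b|\bin\b|\bis\b"
)

def comparison_operators(text):
    """Return every comparison operator found in text, in order."""
    return [m.group(0) for m in COMPARISON_OPERATOR_RE.finditer(text)]

# A deliberately tiny counter-example: with the prefix listed first,
# '<=' is never matched as a whole.
PREFIX_FIRST = re.compile(r"<|<=")
```

For instance, `comparison_operators("a <= b")` yields the single operator `<=`, whereas `PREFIX_FIRST.match("<= b")` stops after `<`.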

Structural: EOL INDENT DEDENT
    EOL -- should be a logical end of line; currently a literal '\n'
    INDENT -- emitted after increased number of leading spaces after EOL
    DEDENT -- emitted after decreased number of leading spaces after EOL
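The structural tokens can be sketched with a stack of indentation widths. This is an illustrative sketch under stated assumptions, not the Pyxie implementation: the function name `structural_tokens`, the `LINE` placeholder token, and the decisions to skip blank lines and to close open blocks at end of input are all hypothetical.

```python
def structural_tokens(source):
    """Yield (kind, value) pairs: INDENT, DEDENT, LINE and EOL."""
    indents = [0]  # stack of currently open indentation widths
    for line in source.split("\n"):
        stripped = line.lstrip(" ")
        if not stripped:
            continue  # assumption: blank lines do not affect indentation
        width = len(line) - len(stripped)
        if width > indents[-1]:
            # More leading spaces than the enclosing block: open a block.
            indents.append(width)
            yield ("INDENT", width)
        while width < indents[-1]:
            # Fewer leading spaces: close one block per stacked level.
            # Inconsistent dedents are not diagnosed in this sketch.
            indents.pop()
            yield ("DEDENT", width)
        yield ("LINE", stripped)  # stands in for the line's real tokens
        yield ("EOL", "\n")
    while len(indents) > 1:  # close any blocks still open at end of input
        indents.pop()
        yield ("DEDENT", 0)
```

Because INDENT is emitted at the start of the line that follows the EOL, the token order lines up with rules of the form `COLON EOL statement_block`, where `statement_block` is `INDENT statements DEDENT`.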

Literals: IDENTIFIER NUMBER STRING
    IDENTIFIER:     [a-zA-Z_][a-zA-Z0-9_]*

    NUMBER: BINARY OCTAL HEX FLOAT INTEGER
        BINARY -- 0b\d+
        OCTAL -- 0o\d+
        HEX -- 0x([abcdef]|\d)+
        FLOAT -- \d+\.\d+
        INTEGER -- \d+

    STRING: DQUOTESTRING | SQUOTESTRING 
        DQUOTESTRING: "([^"]|.)*"
        SQUOTESTRING: '([^']|.)*'

    CHARACTER: SCHARACTER | DCHARACTER
        SCHARACTER: c'([^']|.)'
        DCHARACTER: c"([^"]|.)"

        I'm contemplating using b'<char>' instead, but that makes
        single-character byte strings tricky.  This will probably be
        revisited, but one thought is this: if a single-character byte
        string is actually required, write b'C'+b'' (that is, append an
        empty byte string).  The compiler would be special-cased to
        detect this and force the expression to be the single byte
        string b'C'.  It's a bit icky, so for the moment I've added a
        character literal instead to see what works better.

        This isn't ideal, but it deals with the fact that in embedded
        systems we often do want to work with plain C-style characters.
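The NUMBER patterns above only work as a prefix matcher if they are tried in the listed order, since INTEGER would otherwise claim the leading "0" of "0x1f" or the "3" of "3.14". A small sketch of that, with the patterns copied as written; the names `NUMBER_PATTERNS` and `first_number` are hypothetical.

```python
import re

# Patterns copied from the NUMBER rules, in the order they are listed.
NUMBER_PATTERNS = [
    ("BINARY", r"0b\d+"),
    ("OCTAL", r"0o\d+"),
    ("HEX", r"0x([abcdef]|\d)+"),
    ("FLOAT", r"\d+\.\d+"),
    ("INTEGER", r"\d+"),
]

def first_number(text):
    """Return (kind, lexeme) for the number at the start of text, or None."""
    for kind, pattern in NUMBER_PATTERNS:
        m = re.match(pattern, text)
        if m:
            return (kind, m.group(0))
    return None
```

Note that, taken literally, `\d+` after `0b` or `0o` also admits digits that are not valid in that base (for example `0b29`), so a later stage would need to reject those.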

Grammar todo

High Level

program              : statements

statements           : statement
                     | statement statements

statement_block      : INDENT statements DEDENT

statement            : EOL
                     | assignment_statement
                     | general_expression  **[PARTIAL]**
                     | while_statement
                     | break_statement
                     | continue_statement
                     | if_statement
                     | for_statement
                     | import_statement  **[TBD]**
                     | class_statement  **[TBD]**
                     | def_statement  **[TBD]**
                     | return_statement  **[TBD]**
                     | yield_statement  **[TBD]**
                     | pass_statement

NB: Previously this included a print_statement. Printing is now a function call, à la Python 3.

Note: general_expression [PARTIAL] means that general expressions are parsed, but not all types have appropriate functionality yet.

Open question:

(These are open questions because they are harder to implement on some levels. assert would be useful, though more so if try/except were implemented.)

non-specific statements

pass_statement       : PASS

Support for class definition [TBD]

class_statement      : CLASS identifier PARENL ident_list PARENR COLON EOL class_block
class_block          : INDENT class_statementlist DEDENT
class_statementlist  : def_statement
                     | assignment_statement

Support for function definition [TBD]

def_statement        : DEF identifier PARENL PARENR COLON EOL statement_block
                     | DEF identifier PARENL ident_list PARENR COLON EOL statement_block

yield_statement      : YIELD general_expression

return_statement     : RETURN
                     | RETURN general_expression

ident_list           : identifier
                     | identifier COMMA ident_list

Assignment Statement

assignment_statement : identifier ASSIGN general_expression

Namespace Functions [TBD]

import_statement     : FROM identifier IMPORT identifier
                     | IMPORT identifier

Block Statements [WIP]

Loops

All of these have been done - to a BARE level. That is:

Selection

if_statement : IF general_expression COLON EOL statement_block
             | IF general_expression COLON EOL statement_block extended_if_clauses

extended_if_clauses : else_clause
                    | elif_clause

elif_clause : ELIF general_expression COLON EOL statement_block
            | ELIF general_expression COLON EOL statement_block extended_if_clauses

else_clause : ELSE COLON EOL statement_block
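As an illustration of the selection rules, here is a short fragment that should be derivable from them and is also plain Python 3. It assumes that while_statement has the same `COLON EOL statement_block` shape as if_statement, which this spec does not show, and the variable names are arbitrary.

```python
count = 0
total = 0
while count < 10:
    count = count + 1
    if count == 3:
        continue
    elif count > 6:
        break
    else:
        total = total + count
print(total)
```

Each clause body is a statement_block, so the lexer would emit INDENT before it and DEDENT after it. The elif clause is followed by an else clause via extended_if_clauses.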

Expressions involving sub-expressions

general_expression : boolean_expression

boolean_expression : boolean_and_expression
                   | boolean_expression OR boolean_and_expression

boolean_and_expression : boolean_not_expression
                       | boolean_and_expression AND boolean_not_expression

boolean_not_expression : relational_expression
                       | NOT boolean_not_expression

relational_expression : relational_expression COMPARISON_OPERATOR expression
                      | expression

NOTE: Not all types are valid yet, and truthiness needs implementing

Core Expressions [WIP]

expression           : arith_expression  **[WIP]**
                     | arith_expression PLUS expression
                     | arith_expression MINUS expression
                     | arith_expression POWER expression  **[TBD]**

arith_expression     : expression_atom
                     | expression_atom TIMES arith_expression
                     | expression_atom DIVIDE arith_expression

expression_atom      : value_literal
                     | func_call
                     | PARENL general_expression PARENR

value_literal        : number
                     | identifier
                     | string
                     | character
                     | boolean

Note: These are done for ints, floats, and some strings (for example, "hello"+"world" using std::string).

Incomplete string support is why this section is not listed as done.
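To see what the expression rules imply as written, here is a small recursive-descent sketch that follows expression, arith_expression and expression_atom directly. It is an illustration under assumptions, not the Pyxie parser: the names `tokenize`, `parse` and the tuple tree shape are hypothetical, only integers and identifiers are handled, func_call is omitted, and parenthesised groups go straight to expression rather than through the boolean layers.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[a-zA-Z_][a-zA-Z0-9_]*|\*\*|[-+*/()])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError("unexpected input at position %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_expression(tokens):
    # expression : arith_expression
    #            | arith_expression (PLUS | MINUS | POWER) expression
    left = parse_arith(tokens)
    if tokens and tokens[0] in ("+", "-", "**"):
        op = tokens.pop(0)
        return (op, left, parse_expression(tokens))
    return left

def parse_arith(tokens):
    # arith_expression : expression_atom
    #                  | expression_atom (TIMES | DIVIDE) arith_expression
    left = parse_atom(tokens)
    if tokens and tokens[0] in ("*", "/"):
        op = tokens.pop(0)
        return (op, left, parse_arith(tokens))
    return left

def parse_atom(tokens):
    # expression_atom : value_literal | PARENL general_expression PARENR
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        inner = parse_expression(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return inner
    if tok.isdigit():
        return int(tok)
    if tok[0].isalpha() or tok[0] == "_":
        return tok  # identifier
    raise SyntaxError("unexpected token %r" % tok)

def parse(text):
    tokens = tokenize(text)
    tree = parse_expression(tokens)
    if tokens:
        raise SyntaxError("unexpected token %r" % tokens[0])
    return tree
```

Because the rules are right-recursive, `parse("1 - 2 - 3")` groups as `('-', 1, ('-', 2, 3))`, whereas Python itself groups the same text as `(1 - 2) - 3`. Multiplication still binds tighter than addition, so `parse("2 * 3 + 4")` gives `('+', ('*', 2, 3), 4)`.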

Core Literals

number               : INTEGER
                     | FLOAT
                     | HEX
                     | OCTAL
                     | BINARY
                     | MINUS number

string               : STRING

character            : CHARACTER

boolean              : TRUE | FALSE

identifier : IDENTIFIER

Function Calls

func_call            : IDENTIFIER PARENL PARENR
                     | IDENTIFIER PARENL expr_list PARENR

expr_list : general_expression
          | general_expression COMMA expr_list

Practical Details

C++ Interaction

Pyxie is intended to interact with C++, in that it compiles to C++ targeting embedded systems. To that end, it is useful to be able to pass commands through to C++. At present the pass-through supports ONLY #include pre-processor directives.

This is done through Python comments, so for example this is legal:

#include <stdio.h>

As is this:

#include <Arduino.h>

By definition this does not support every aspect that might be needed, but it's a useful start.
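One way such a pass-through could work is to scan the source for comment lines that have the shape of an #include directive and carry them over to the generated C++. The sketch below is an assumption about the approach, not a description of how Pyxie does it: the name `collect_includes`, the exact pattern, and the choice to drop duplicates are hypothetical.

```python
import re

# Assumption: a directive is a line consisting only of '#include' followed
# by <header> or "header"; every other comment stays an ordinary comment.
INCLUDE = re.compile(r'^\s*(#include\s+(?:<[^>]+>|"[^"]+"))\s*$')

def collect_includes(source):
    """Return the #include directives found in the source, in order."""
    includes = []
    for line in source.splitlines():
        m = INCLUDE.match(line)
        if m and m.group(1) not in includes:
            includes.append(m.group(1))
    return includes
```

Since the directives are ordinary comments to Python, the same source file still runs unchanged under a standard Python interpreter.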

Lexical Analysis Implementation

Lexical analyser has the following states:


Informal done list

(The changelog is a better place to look as to what specifically has been done)

Informal todo list

Language features NOT supported [TBC]

Note: Operator precedence needs ironing out [TBD]

Language features To be decided [TBD]



Updated: July 2015 (partially with release 0.0.14)
Copyright © 2016 Michael Sparks Open Source