Grammar railroad diagram #1

Open
mingodad opened this issue Jul 20, 2021 · 20 comments
@mingodad
mingodad commented Jul 20, 2021

I used the CocoR parser generator (https://github.com/mingodad/CocoR-CPP) to write a grammar for this project and to generate a kind of EBNF that https://www.bottlecaps.de/rr/ui accepts. To see the railroad diagram, copy and paste the EBNF shown below into the Edit Grammar tab at https://www.bottlecaps.de/rr/ui, then switch to the View Diagram tab.

//
// EBNF generated by CocoR parser generator to be viewed with https://www.bottlecaps.de/rr/ui
//

//
// productions
//

Neat ::= module EOF 
module ::= "module" ident_doted ";" ( parseModuleBody )* 
ident_doted ::= ident ( "." ident )* 
parseModuleBody ::= ( ( "public" | "private" ) )? ( parseClassDecl | parseIntfDecl | parseTemplateDecl | parseExtern | parseDeclaration | parseEnumDecl | parseMacroContinuation | parseFunction | parseUnitTest ) 
parseClassDecl ::= ( abstract_final )? "class" ident ( ":" ident ( "," ident )* )? "{" ( ( override_abstract )? ( access_modifier )? ( parseDeclaration | ( "this" | parseType ident ) ( "(" parseParamList ")" parseStatement | ( "," ident )* ";" ) ) )* "}" 
parseIntfDecl ::= "interface" ident ( ":" ident ( "," ident )* )? "{" ( ( parseDeclaration | parseType ident "(" parseParamList ")" ";" ) )* "}" 
parseTemplateDecl ::= "template" ident "(" ident ")" "{" ( parseClassDecl | parseFunction ) "}" 
parseExtern ::= "extern" "(" "C" ")" parseType ident "(" ( parseIdentifierList )? ")" ";" 
parseDeclaration ::= ( parseImportStatement | parseAliasDecl | parseStructDecl ) 
parseEnumDecl ::= "enum" ident "{" ident ( "," ident )* "}" 
parseMacroContinuation ::= "macro" "(" ident ")" ";" parseModuleBody 
parseFunction ::= parseType ident "(" ( parseIdentifierList )? ")" parseStatement 
parseUnitTest ::= "unittest" parseStatement 
parseImportStatement ::= ( "macro" )? "import" ( "package" "(" ident ")" "." )? ident_doted ":" ident ( string )? ( "," ident )* ";" 
abstract_final ::= ( "abstract" ( "final" )? | "final" ( "abstract" )? ) 
override_abstract ::= ( "override" ( "abstract" )? | "abstract" ( "override" )? ) 
access_modifier ::= ( "public" | "private" | "protected" ) 
parseType ::= parseLeafType ( "*" )* ( ( ( "function" | "delegate" ) "(" ( parseType ( "," parseType )* )? ")" | "[" "]" | "!" parseType ) )? 
parseParamList ::= parseParam ( "," parseParam )* 
parseStatement ::= ( parseReturn | parseIf | parseBreakCont | parseWhile | parseForPrefix | parseScope | parseDeclaration | parseNestedFunctionDecl | parseStatementStatement | parseMultiVarDecl | parseAssignStatement | parseWithStatement | parseEitherCaseStmt | parseExprStatement ) 
parseIdentifierList ::= parseIdentifierTyped ( "," parseIdentifierTyped )* 
parseAliasDecl ::= "alias" ident "=" parseType parseExpression ";" 
parseStructDecl ::= "struct" ident "{" ( ( static_public_private )? ( "this" | "~this" | parseType ident ) ( "(" parseParamList ")" | ( "," ident )* ) ";" )* "}" 
parseReturn ::= "return" ( parseExpression )? ";" 
parseIf ::= "if" "(" ( parseVarDecl | parseExpression ) ")" parseStatement ( "else" parseStatement )? 
parseBreakCont ::= ( "break" | "continue" ) ";" 
parseWhile ::= "while" "(" parseExpression ")" parseStatement 
parseForPrefix ::= "for" "(" ( parseExtFor | parseFor ) ")" parseStatement 
parseScope ::= "{" ( parseStatement )* "}" 
parseNestedFunctionDecl ::= parseFunction 
parseStatementStatement ::= "$stmt" ident ";" 
parseMultiVarDecl ::= ( "mut" )? ( "auto" | parseType ) parseVarInitialization ( "," parseVarInitialization )* ";" 
parseAssignStatement ::= parseAssignment ";" 
parseWithStatement ::= "with" "(" parseExpression ")" parseStatement 
parseEitherCaseStmt ::= parseExpressionLeaf "case" "{" ( parseType ident ":" parseStatement )* "}" 
parseExprStatement ::= parseExpression ";" 
parseParam ::= ( "this." | ( "mut" )? parseType ) ident 
parseLeafType ::= ( parseTupleType | "Vector" "(" parseType "," parseExpression ")" | "typeof" "(" parseExpression ")" | "long" | "int" | "short" | "char" | "ubyte" | "void" | "float" ) 
parseTupleType ::= "(" parseTupleTypeElm ( "," parseTupleTypeElm )* ")" 
parseExpression ::= parseArithmetic 
parseTupleTypeElm ::= parseType ident 
parseIdentifierTyped ::= ( "mut" )? parseType ( ident )? 
static_public_private ::= ( "static" )? ( "public" | "private" ) 
parseVarDecl ::= ( "mut" )? ( "auto" | parseType ) parseVarInitialization 
parseExtFor ::= ( "auto" | parseType ) ident "<-" parseExpression ( ".." parseExpression )? 
parseFor ::= parseVarDecl ";" parseExpression ";" parseAssignment 
parseAssignment ::= parseExpressionLeaf assign_ops parseExpression 
parseVarInitialization ::= ( "=" parseExpression | ident "=" parseExpression ) 
parseExpressionLeaf ::= ( ( "*" | "&" | "!" | "-" | "--" ) parseExpressionLeaf | "new" parseType "(" parseArgumentList ")" | "sizeof" "(" parseType ")" | "cast" "(" parseType ")" | parseTupleExpression | parseExpressionIncDec ( parseProperties )? ) 
assign_ops ::= ( "=" | "+=" | "-=" | "*=" | "/=" | "~=" ) 
parseArgumentList ::= ident_cmp_assign ( "," ident_cmp_assign )* 
parseTupleExpression ::= "(" parseExpression ( "," parseExpression )* ")" 
parseExpressionIncDec ::= ( ( "++" | "--" ) )? parseExpressionBase 
parseProperties ::= ( parseInstanceOf | parseEitherCaseExpr | parseCall | parseMember | parseIndex | parseTemplateInstantiation ) 
parseInstanceOf ::= "instanceOf" "(" parseType ")" 
parseEitherCaseExpr ::= "case" "(" parseEitherCaseElm ( "," parseEitherCaseElm )* ")" 
parseCall ::= "(" ( parseArgumentList )? ")" 
parseMember ::= "." ident 
parseIndex ::= "[" parseExpression ( ".." parseExpression )? "]" 
parseTemplateInstantiation ::= "!" parseType 
parseEitherCaseElm ::= parseType ident ":" "return" parseExpression 
ident_cmp_assign ::= parseExpression ( ( "=" | "==" ) parseExpression )? 
parseExpressionBase ::= ( "." ident | "__HERE__" | "super" | ident ( ":" ident )? | string | floatcon | intcon | charcon | parseStatementExpr | "(" parseExpression ")" | parseArrayLiteral ) 
parseStatementExpr ::= "({" ( parseStatement )* "})" 
parseArrayLiteral ::= "[" parseExpression ( "," parseExpression )* "]" 
parseArithmetic ::= parseExpressionLeaf ( parseBoolOr )? 
parseBoolOr ::= parseBoolAnd ( "||" parseBoolAnd )* 
parseBoolAnd ::= parseComparison ( "&&" parseComparison )* 
parseComparison ::= parseBitShift ( ( "==" | "!=" | "is" | "!is" | ">=" | ">" | "<=" | "<" ) parseBitShift )* 
parseBitShift ::= parseAddSubCat ( ( "<<" | ">>" | ">>>" ) parseAddSubCat )* 
parseAddSubCat ::= parseMulDiv ( ( "+" | "-" | "~" ) parseMulDiv )* 
parseMulDiv ::= parseBitOr ( ( "*" | "/" | "%" ) parseBitOr )* 
parseBitOr ::= parseBitAnd ( "|" parseBitAnd )* 
parseBitAnd ::= parseExpressionLeaf ( "&" parseExpressionLeaf )* 

//
// tokens
//
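
The EBNF above differs from the CocoR source only in notation: CocoR writes repetition as `{x}` and options as `[x]`, while the railroad tool expects `( x )*` and `( x )?`. CocoR produces this output itself; the following is only a minimal sketch of that mapping for a single production body, with the function name `cocor_to_ebnf` being a hypothetical helper and not part of CocoR.

```python
def cocor_to_ebnf(body: str) -> str:
    """Rewrite one CocoR production body in the EBNF style shown above.

    Hypothetical helper for illustration; it handles only the bracket
    notation and quoted literals, not comments or semantic actions.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch in "\"'":
            # Copy a quoted literal as a unit so that '{' or "[" inside
            # quotes is not mistaken for a repetition or option bracket.
            j = i + 1
            while j < len(body) and body[j] != ch:
                j += 2 if body[j] == "\\" else 1  # skip escaped characters
            literal = body[i + 1:j]
            # Prefer double quotes, as in the generated EBNF, unless the
            # literal itself contains a double quote.
            quote = "'" if '"' in literal else '"'
            out.append(quote + literal + quote)
            i = j + 1
        elif ch in "{[":
            out.append("( ")
            i += 1
        elif ch == "}":
            out.append(" )*")  # zero or more
            i += 1
        elif ch == "]":
            out.append(" )?")  # zero or one
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Applied to the body of `ident_doted`, `cocor_to_ebnf("ident {'.' ident}")` yields the same text as the generated production.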

The CocoR parser so far:

#include "Scanner.nut"

COMPILER Neat

TERMINALS
	T_SYMBOL

CHARACTERS
	letter     = 'A'..'Z' + 'a'..'z' + '_'.
	oct        = '0'..'7'.
	digit      = '0'..'9'.
	nzdigit    = '1'..'9'.
	hex        = digit + 'a'..'f' + 'A'..'F'.
	notQuote   = ANY - '"' - "\r\n".
	notApo     = ANY - '\'' - "\r\n".

	tab        = '\t'.
	cr         = '\r'.
	lf         = '\n'.
	newLine    = cr + lf.
	notNewLine = ANY - newLine .
	ws         = " " + tab + '\u000b' + '\u000c'.


TOKENS
	ident    = letter {letter | digit}.

	floatcon = ( '.' digit {digit} [('e'|'E')  ['+'|'-'] digit {digit}]
						 | digit {digit} '.' {digit} [('e'|'E')  ['+'|'-'] digit {digit}]
						 | digit {digit} ('e'|'E')  ['+'|'-'] digit {digit}
						 )
						 ['f'|'l'|'F'|'L'].

	intcon   = ( nzdigit {digit}
						 | '0' {oct}
						 | ("0x"|"0X") hex {hex}
						 )
						 {'u'|'U'|'l'|'L'}.

	string   = '"' {notQuote} '"'.        // no check for valid escape sequences

	charcon  = '\'' notApo {notApo} '\''. // no check for valid escape sequences

PRAGMAS

COMMENTS FROM "/*" TO "*/" NESTED
COMMENTS FROM "//" TO lf

IGNORE cr + lf + tab

/*-------------------------------------------------------------------------*/

PRODUCTIONS

Neat =
	module
	EOF
	.

module =
	"module" ident_doted ';' {parseModuleBody}
	.

parseModuleBody =
	["public" | "private"] (
		parseClassDecl
		| parseIntfDecl
		| parseTemplateDecl
		| parseExtern
		| parseDeclaration
		| parseEnumDecl
		| parseMacroContinuation
		| parseFunction
		| parseUnitTest
	)
	.

parseImportStatement =
	["macro"] "import" ["package" '(' ident ')' '.'] ident_doted ':' ident [string] {',' ident} ';'
	.

ident_doted =
	ident {'.' ident}
	.

parseClassDecl =
	[abstract_final] "class" ident [':' ident {',' ident}]
	'{' {
		[override_abstract]  [access_modifier] (
			parseDeclaration
			| ("this" | parseType ident) ('(' parseParamList ')' parseStatement | {',' ident} ';')
		)
	} '}'
	.

abstract_final =
	"abstract" ["final"]
	| "final" ["abstract"]
	.

override_abstract =
	"override" ["abstract"]
	| "abstract" ["override"]
	.

access_modifier =
	"public"
	| "private"
	| "protected"
	.

parseIntfDecl =
	"interface" ident [':' ident {',' ident}]
	'{' {
		parseDeclaration
		| parseType ident '(' parseParamList ')'  ';'
	} '}'
	.

parseTemplateDecl =
	"template" ident '(' ident ')'
	'{' (
		parseClassDecl
		| parseFunction
	) '}'
	.

parseExtern =
	"extern" '(' 'C' ')' parseType ident '(' [parseIdentifierList] ')' ';'
	.

parseDeclaration =
	parseImportStatement
	| parseAliasDecl
	| parseStructDecl
	.

parseEnumDecl =
	"enum" ident '{' ident {',' ident} '}'
	.

parseMacroContinuation =
	"macro" '(' ident ')' ';' parseModuleBody
	.

parseFunction =
	parseType ident '(' [parseIdentifierList] ')' parseStatement
	.

parseUnitTest =
	"unittest" parseStatement
	.

parseStatement =
	parseReturn
	| parseIf
	| parseBreakCont
	| parseWhile
	| parseForPrefix
	| parseScope
	| parseDeclaration
	| parseNestedFunctionDecl
	| parseStatementStatement
	| parseMultiVarDecl
	| parseAssignStatement
	| parseWithStatement
	| parseEitherCaseStmt
	| parseExprStatement
	.

parseParamList =
	parseParam {',' parseParam}
	.

parseParam =
	("this." | ["mut"] parseType) ident
	.

parseType =
	parseLeafType {"*"} [
		("function" | "delegate") '(' [parseType {',' parseType}] ')'
		| '[' ']'
		| '!' parseType
	]
	.

parseLeafType =
	parseTupleType
	| "Vector" '(' parseType ',' parseExpression ')'
	| "typeof" '(' parseExpression ')'
	//| ident {':' ident}
	| "long"
	| "int"
	| "short"
	| "char"
	| "ubyte"
	| "void"
	| "float"
	.

parseTupleType =
	'(' parseTupleTypeElm {',' parseTupleTypeElm} ')'
	.

parseTupleTypeElm =
	parseType ident
	.

parseIdentifierList =
	parseIdentifierTyped {',' parseIdentifierTyped}
	.

parseIdentifierTyped =
	["mut"] parseType [ident]
	.

parseAliasDecl =
	"alias" ident '=' parseType parseExpression ';'
	.

parseStructDecl =
	"struct" ident
	'{' {
		[static_public_private] ("this" | "~this" | parseType ident) ('(' parseParamList ')' | {',' ident}) ';'
	} '}'
	.

static_public_private =
	["static"] ("public" | "private")
	.

parseReturn =
	"return" [parseExpression] ';'
	.

parseIf =
	"if" '(' (parseVarDecl | parseExpression) ')' parseStatement ["else" parseStatement]
	.

parseBreakCont =
	("break" | "continue") ';'
	.

parseWhile =
	"while" '(' parseExpression ')' parseStatement
	.

parseForPrefix =
	"for" '(' (parseExtFor | parseFor) ')' parseStatement
	.

parseExtFor =
	("auto" | parseType) ident "<-" parseExpression [".." parseExpression]
	.

parseFor =
	parseVarDecl ';' parseExpression ';' parseAssignment
	.

parseScope =
	'{' {parseStatement} '}'
	.

parseNestedFunctionDecl =
	parseFunction
	.

parseStatementStatement =
	"$stmt" ident ';'
	.

parseMultiVarDecl =
	["mut"] ("auto" | parseType) parseVarInitialization {',' parseVarInitialization} ';'
	.

parseAssignStatement =
	parseAssignment ';'
	.

parseWithStatement =
	"with" '(' parseExpression ')' parseStatement
	.

parseEitherCaseStmt =
	parseExpressionLeaf "case"
	'{' {
		parseType ident ':' parseStatement
	} '}'
	.

parseExprStatement =
	parseExpression ';'
	.

parseVarDecl =
	["mut"] ("auto" | parseType) parseVarInitialization
	.

parseAssignment =
	parseExpressionLeaf assign_ops parseExpression
	.

assign_ops =
	'='
	| "+="
	| "-="
	| "*="
	| "/="
	| "~="
	.

parseVarInitialization =
	'=' parseExpression
	| ident '=' parseExpression
	.

parseExpressionLeaf =
	('*' | '&' | '!' | '-' | "--") parseExpressionLeaf
	| "new" parseType '(' parseArgumentList ')'
	| "sizeof" '(' parseType ')'
	| "cast" '(' parseType ')'
	| parseTupleExpression
	| parseExpressionIncDec [parseProperties]
	.

parseProperties =
	parseInstanceOf
	| parseEitherCaseExpr
	| parseCall
	| parseMember
	| parseIndex
	| parseTemplateInstantiation
	.

parseInstanceOf =
	"instanceOf" '(' parseType ')'
	.

parseEitherCaseExpr =
	"case" '(' parseEitherCaseElm {',' parseEitherCaseElm}  ')'
	.

parseEitherCaseElm =
	parseType ident ":" "return" parseExpression
	.

parseCall =
	'(' [parseArgumentList] ')'
	.

parseMember =
	'.' ident
	.

parseIndex =
	'[' parseExpression [".." parseExpression] ']'
	.

parseTemplateInstantiation =
	'!' parseType
	.

parseArgumentList =
	ident_cmp_assign {',' ident_cmp_assign}
	.

ident_cmp_assign =
	parseExpression [('=' | "==") parseExpression]
	//| parseExpression
	.

parseExpressionBase =
	'.' ident
	| "__HERE__"
	//| "$"
	| "super"
	| ident [':'ident]
	| string
	| floatcon
	| intcon
	| charcon
	| parseStatementExpr
	| '(' parseExpression ')'
	| parseArrayLiteral
	.

parseStatementExpr =
	"({" {parseStatement} "})"
	.

parseExpression =
	parseArithmetic
	.

parseArithmetic =
	parseExpressionLeaf [parseBoolOr]
	/*
	(
		parseBitAnd
		| parseBitOr
		| parseMulDiv
		| parseAddSubCat
		| parseBitShift
		| parseComparison
		| parseBoolAnd
		| parseBoolOr
	)
	*/
	.

parseBoolOr =
	parseBoolAnd {"||" parseBoolAnd}
	.

parseBoolAnd =
	parseComparison {"&&" parseComparison}
	.

parseComparison =
	parseBitShift {("==" | "!=" | "is" | "!is" | ">=" | ">" | "<=" | "<" ) parseBitShift}
	.

parseBitShift =
	parseAddSubCat {("<<" | ">>" | ">>>") parseAddSubCat}
	.

parseAddSubCat =
	parseMulDiv {('+' | '-' | '~') parseMulDiv}
	.

parseMulDiv =
	parseBitOr {('*' | '/' | '%') parseBitOr}
	.

parseBitOr =
	parseBitAnd {'|' parseBitAnd}
	.

parseBitAnd =
	parseExpressionLeaf {'&' parseExpressionLeaf}
	.

parseTupleExpression =
	'(' parseExpression {',' parseExpression} ')'
	.

parseExpressionIncDec =
	["++" | "--"] parseExpressionBase
	.

parseArrayLiteral =
	'[' parseExpression {',' parseExpression} ']'
	.

END Neat.
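
For readers who do not know CocoR's token notation, the `ident`, `intcon` and `floatcon` definitions in the TOKENS section can be approximated with regular expressions. This is a rough sketch under the assumption that a whole lexeme is classified at once (a CocoR scanner instead does longest-match scanning over the input); the names `IDENT`, `INTCON`, `FLOATCON` and `classify` are made up for the example.

```python
import re

# ident = letter {letter | digit}.
IDENT = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")

# intcon: decimal without leading zero, octal starting with 0, or hex,
# followed by any number of u/U/l/L suffix characters.
INTCON = re.compile(r"(?:[1-9][0-9]*|0[0-7]*|0[xX][0-9a-fA-F]+)[uUlL]*")

# floatcon: ".5", "1." or "1.5" with optional exponent, or "1e5",
# followed by an optional f/F/l/L suffix.
FLOATCON = re.compile(
    r"(?:\.[0-9]+(?:[eE][+-]?[0-9]+)?"
    r"|[0-9]+\.[0-9]*(?:[eE][+-]?[0-9]+)?"
    r"|[0-9]+[eE][+-]?[0-9]+)"
    r"[fFlL]?"
)


def classify(lexeme: str):
    """Return the token name matching the whole lexeme, or None."""
    for name, pattern in (("floatcon", FLOATCON),
                          ("intcon", INTCON),
                          ("ident", IDENT)):
        if pattern.fullmatch(lexeme):
            return name
    return None
```

One consequence of the `intcon` definition that is easy to miss in the grammar: a lexeme such as `09` matches none of the three tokens, because a leading `0` may only be followed by octal digits.
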
@FeepingCreature
Contributor

Aaaaa.

This is really cool!! But keeping in mind that macros can accept arbitrary syntax, I'm not sure if it's very viable for parsing?

@mingodad
Author

The point is to somehow get a formal grammar that helps guide development, documentation, and debugging, and helps other people understand the language.

@FeepingCreature
Contributor

Right, and my point is that any formal grammar will always be incomplete, because a macro can always come in and add new syntax.

@mingodad
Author

I can see that you already used rdparser in https://github.com/FeepingCreature/jerboa. In my opinion, CocoR is like rdparser with some nice extras.

@FeepingCreature
Contributor

Any parser generator will run into the issue that code may add new grammar rules at compiletime.
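
To make the concern concrete, here is a minimal sketch, assuming a recursive-descent parser that looks up statement rules in a table. It is not Neat's actual macro mechanism; the `Parser` class, its `register` method and the `unless` keyword are all hypothetical and exist only for this illustration.

```python
from typing import Callable, Dict, List

Tokens = List[str]
StatementRule = Callable[[Tokens], str]


class Parser:
    """Toy parser whose statement rules live in a mutable table."""

    def __init__(self) -> None:
        # Rules known up front. A grammar file written ahead of time
        # can only describe these.
        self.statement_rules: Dict[str, StatementRule] = {
            "return": lambda toks: "return-statement",
            "break": lambda toks: "break-statement",
        }

    def register(self, keyword: str, rule: StatementRule) -> None:
        # Stand-in for a macro: code being compiled adds a new rule.
        self.statement_rules[keyword] = rule

    def parse_statement(self, toks: Tokens) -> str:
        rule = self.statement_rules.get(toks[0])
        if rule is None:
            raise SyntaxError(f"no rule for {toks[0]!r}")
        return rule(toks)
```

Before `register("unless", ...)` is called, `parse_statement(["unless", "x", ";"])` raises `SyntaxError`; afterwards the same input parses. A generated parser fixes its rule set when the generator runs, so it corresponds to the table as it was before any such call.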

@mingodad
Author

I can see your point, but again, users will always start with (and can count on) the basic/standard syntax.

@FeepingCreature
Contributor

Sure. - I mean, by all means, have fun with it. :)

I'll maybe look into a way to embed railroad diagrams in markdown when I start getting into writing detailed documentation.

Though, keep in mind that this is pre-alpha and parts of the syntax will plausibly change.

@mingodad
Author

No problem, that's the idea: having fun, and if something useful comes out of it, that's a bonus!

@mingodad
Author

By the way, I'm trying to compile jerboa on Ubuntu 18.04 and I'm getting this so far (I'm looking at it now). Have you abandoned jerboa?

jerboa/src/vm/dump.c:122:10: error: ‘INSTR_CHECK_CONSTRAINT’ undeclared (first use in this function); did you mean ‘INSTR_SET_CONSTRAINT’?
  122 |     case INSTR_CHECK_CONSTRAINT:
      |          ^~~~~~~~~~~~~~~~~~~~~~
      |          INSTR_SET_CONSTRAINT
/home/mingo/dev/c/A_programming-languages/jerboa/src/vm/dump.c:122:10: note: each undeclared identifier is reported only once for each function it appears in

@mingodad
Author

There is no INSTR_CHECK_CONSTRAINT in the enum InstrType. Maybe you forgot to commit something?

@mingodad
Author

It seems that this project is another iteration of https://github.com/FeepingCreature/fcc

@mingodad
Author

I was thinking of http://www.cs.rhul.ac.uk/research/languages/projects/rdp.html when I saw rdparser in the jerboa repository.

@mingodad
Author

Manually adding INSTR_CHECK_CONSTRAINT to the enum InstrType allows the build to get further, but then it stops here:

Building C object CMakeFiles/repl.dir/src/vm/runtime.c.o
jerboa/src/vm/runtime.c:215:10: fatal error: vm/instrs/float_math.h: No such file or directory
  215 | #include "vm/instrs/float_math.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~

@FeepingCreature
Contributor

FeepingCreature commented Jul 20, 2021

Yeah man I haven't touched Jerboa in like three actual years, I think I left a bunch of stuff uncommitted. Maybe go back a few commits? One commit back should do.

Jerboa was never as fast as I wanted, so I lost interest in it.

rdparser is just a recursive descent parser lib.

This project only shares its name with fcc. I couldn't find a good language name, so I just reused the last one I used. But it's a totally separate codebase.

@mingodad
Author

I can see that this commit introduces the missing include:

FeepingCreature/jerboa@5ccdba2

@mingodad
Author

When I saw Neat-Lang, it looked like Nim in spirit with a more familiar syntax. I once attempted to do that with Nim, but their compiler/code generator code was not easy to follow (the bootstrap breaks if we don't follow some precautions).

I also once embraced D but was burnt by its constant changes without backward compatibility or bug fixes.

I hope you can achieve a simple programming language that is nice to work with in Neat. I once refactored tinycc to make it reentrant, and for easy development, using tinycc could be an interesting option.

@FeepingCreature
Contributor

FeepingCreature commented Jul 20, 2021

Yay! :)

Yeah we use D at my day job. I think it's pretty stable by now, honestly, and most of the really egregious errors have been fixed by now IMO, but the compiler is sometimes not a joy to work on.

Neat's bootstrap is pretty sensitive as well. I think at some point I should just throw it out and rebootstrap off the generated C files.

I tried to get Jerboa to work again, and it's just completely fucked with three years of bitrot. Sorry.

@mingodad
Author

I noticed that there is no individual char representation like in C (e.g. '\n'). Is this an intended feature of Neat, or is it just not implemented yet?

@FeepingCreature
Contributor

Yeah I just haven't gotten around to that yet. Lazyyy. I just use "\n"[0] everywhere rn.

@mingodad
Author

I just created a railroad diagram for D based on https://dlang.org/spec/grammar.html and posted it here: dlang/dlang.org#3070 (comment). Maybe it can help you somehow.
