Aglaia Language Spec v0.1a

Comments
Comments are sections of code that are treated as white space by the compiler.
They may be embedded anywhere in your source, except within strings.
There are three kinds of comments: single line, block, and nested.
Single line:
Single line comments start with // and are terminated by the end of a line.
//this is a comment
this is not
Block:
Block comments start with /* and are terminated by */. Block comments do not nest, but they may span multiple lines.
/* this is a comment
/* so is this */
this is not */
Nested:
Nested comments start with /+ and are terminated by +/. Nested comments can span multiple lines and can of course be nested.
/+ this is a comment
/+ so is this +/
/* as is this */
still a comment +/
not a comment

Whitespace
White space may be a space character, a tab character, the end of a line, or a comment.
White space is largely ignored by the compiler, except when it delimits two symbols that would otherwise be impossible to distinguish.
For example: intmyinteger doesn't make any sense to the computer, but int myinteger does.
An end of line is defined as a newline character, a carriage return character, or the combination of the two in any order.

Code Blocks
Code blocks are enclosed by a pair of curly braces {}.
Code blocks are divided into sections.
These sections have three basic types: contract, declaratory, and executable.
Sections are defined by the section name followed by a colon.
Declaratory sections contain things like function and variable declarations.
The types of declaratory sections are: declare, public, protected, and private.
Contract sections contain tests that verify the integrity of the program.
The types of contract sections are: invariant, enter, and exit.
Executable sections contain the code that will actually be executed by the program.
The only type of executable section is the body section.
These sections will be explained in more detail later.
Each code block defines its own scope.
Code in a block can access items that are defined in its own block, or in blocks that enclose its own block.
A code block cannot access "peer" blocks or blocks that it encloses unless the code is imported with the import keyword.
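As a sketch of these scope rules (the module, function, and variable names are hypothetical, and the layout follows the other examples in this spec):
module scopes
{
public:
    int shared;
    void first()
    {
    declare:
        int local;
    body:
        shared:=1; //legal, shared is defined in the enclosing block
        local:=2; //legal, local is defined in this block
        return;
    }
    void second()
    {
    body:
        shared:=3; //legal, shared is defined in the enclosing block
        //local:=4; would be illegal, local belongs to the peer block of first
        return;
    }
}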

Modules
Modules are the most basic building block of your program.
Modules can be nested.
The sections allowed in modules are: public, private, and invariant.
The invariant section is a contract that is validated every time a function or a property in the module is accessed.
The private and the public sections contain the actual code.
This code can include functions, variables, properties, imports, nested modules, classes, and other miscellaneous type definitions.
Items in the private section can only be accessed from within the module, or from friend modules or classes.
Items in the public section can be accessed from any place in the program that has imported the module.
There may be any number of each type of section in the source, but the compiler will treat them all as if they were in the same section.
Order of declaration is not important.
This means that a function can access a variable that is not declared until later.
module a_module
{
invariant:
    //assertions belong here
public:
    //"global" functions, properties and variables
private:
    //internal functions, properties and variables
}

Classes
Classes provide an easy to use framework for object oriented programming.
A class is defined with the class keyword, followed by an identifier which is then followed by a code block.
The code block of the class can contain these sections: invariant, public, protected, private.
These sections behave basically the same way as they do in a module code block.
You may have noticed the addition of the protected section.
This section behaves much like the private section, except that classes derived from the class also have access to code in this section.
In addition to importing classes and modules a class may also inherit another class.
This is accomplished through the use of the inherit keyword.
A class that is inherited by another class merges with the inheriting class.
This means that all the parent's members can be accessed as if they belonged directly to the child class.
A class may inherit any number of classes.
Declaring a function or a property of the same name as a function or property in a parent class is illegal.
However, one may override the function or property of the parent class.
Once a function or property is overridden, all references to that function or property refer to the item in the child class, not the item in the parent class.
This includes references from the parent class.
A class can include abstract functions or properties.
This is done by replacing the code block for that item with a semicolon.
A class that includes an abstract function that has not been overridden cannot be imported.
Once a class has been imported it becomes a type that can be instantiated just like a structure.
class A
{
public:
    int x;
    void abstractfunc();
}
//class A contains x and abstractfunc
class B
{
public:
    int y;
protected:
    inherit A;
    override void abstractfunc()
    {
    }
}
//class B contains x, y, and abstractfunc
module C
{
public:
    import class A; //illegal, class contains abstract member
    import class B; //legal, abstract member has been overridden
    B z;
    //code in this module can access z.y, but not z.x or z.abstractfunc
}
Variable, property and function members of a class may be static.
This means that all instances of the class share that resource.
Static functions and properties cannot be overridden.
Static members are declared by prepending the static keyword to the member declaration.
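A minimal sketch of a static member, assuming the static keyword goes directly before the type; the class and member names are hypothetical:
class widget
{
public:
    static int count; //one copy shared by every instance of widget
    int id; //each instance of widget has its own id
}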

Structures
Structures are similar to classes. A structure defines a new type. The code block may contain invariant, public, and private sections.
Just like a class a structure may contain functions, properties, variables, and nested structures.
A structure may not have a module or class nested within it.
A structure can import classes and modules.
A structure can be defined from within any declaratory section.
Unlike a class, a structure may not have abstract functions or properties.
Structures also do not support inheritance.
The main reason for including structures in addition to classes is to provide a memory layout mechanism.
For this reason compilers are forbidden to do things like align variables inside of structures to byte boundaries.
Structures may also be used as a lightweight class replacement.
module a
{
public:
    struct b
    {
    public:
        int x;
    private:
        int y;
    }
}

Enums
Enumerated types work much the same way as they do in C.
The syntax is the keyword enum followed by an identifier, followed by a code block.
The code block contains a comma delimited list of identifiers that may be explicitly given an integer value with the usual assignment operator.
enum bool{true:=1,false:=0}
enum bleh{a,b:=0,c:=3}
If there is no value explicitly associated with the identifier then the compiler will choose one arbitrarily.
The value chosen by a compiler will be unique.
If two identifiers are given the same value and they are both in the same enum declaration, then they may be considered equivalent.

Literals
There are two main types of literals: string literals and numeric literals.
Strings:
Strings may be surrounded by double quotes, or single quotes.
Single quoted strings may not include escape sequences, but they may include things like carriage returns.
Double quoted strings may include escape characters, but may not include carriage returns.
String literals can be concatenated by simple proximity. This means that if only whitespace separates two string literals they are automatically concatenated.
char[] x:="hello "'world';//x contains "hello world"
String literals do not include null terminators as they do in C.
Numerics:
Decimal numeric literals are simply written out, just as they are anywhere else.
int x:=10;//x contains ten
real x:=1.0;//x contains one
real x:=1;//x contains one
real x:=.1;//x contains one tenth
real x:=0.1;//x contains one tenth
Hexadecimal and binary literals work the same way that decimal literals do, except they must be prefixed by 0x and 0b respectively.
int x:=0b11;//x contains three
real x:=0b1.1;//x contains three halves
real x:=0x1.f;//x contains 31 sixteenths

Type Primitives
There are three "built-in" types: int, real, and char.
The int type represents integer numbers.
The default byte size of the integer is left up to the implementation, but should probably be what the target processor is most efficient at handling.
One may, however, specify a size by suffixing the type name with a number.
For example, int8 x; int64 y; specifies an eight bit number and a 64 bit number.
The real type represents real numbers (like the float type in C).
Like the int the default size of a real is implementation specific.
You may also specify a specific size by suffixing the type name with a number.
The char type is used to represent character data.
It is functionally almost the same as an int.
It is provided merely to make source code readable.
The default size of a char is one byte.
You may suffix numbers onto the type name to specify a different size when larger character sets are in use.
The modifier unsigned may be prepended to the real and int types to indicate that only nonnegative values are valid. For example: unsigned int x;
Note on sizes:
Recommended default sizes for the average platform are 32 bit int and real.
Recommended supported sizes are int8, int16, int32, int64, real32, real64, real128.
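The sizing rules above might be used as follows. This is only a sketch: the variable names are hypothetical, and it assumes the suffix on char counts bits as it does for int.
int a; //implementation defined size
int16 b; //16 bit integer
unsigned int32 c; //32 bit integer, only nonnegative values are valid
real64 d:=0.5; //64 bit real
char16 e; //16 bit character for a larger character set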

Statements
The source code is made up of a series of statements.
Each statement tells the compiler or the running program to take some action.
Statements take the form of a bit of code followed by a code block, a semicolon, or a colon, depending on the context.
With the exception of module and class declarations, all statements must reside inside of a code block.
Statements that define sections of a code block are terminated with a colon.
Variable declarations, assignments, etc. are terminated by a semicolon.
Conditional statements, type declarations and functions are in general terminated by a code block.

Generics
The language provides a method of generic programming.
Generics are implemented at the module and class level by passing parameters when importing.
The syntax is as follows:
module a(param1, char[param3] param2, int param3, ...)
{
public:
    param1[param3] x; //this declares an array of param3 elements of type param1
}
module b
{
public:
    import module inta:=a(int, "hello", 5);
    import module intb:=a(int, "hello", 5);
    //intb and inta share the same memory space, they are identical
}
When importing a module or class using generics you must provide an alias to refer to that specific type of module or class.
This is done by using the syntax shown above.
If an imported module shares the same type as the same module imported anywhere else in the program, then they share memory.
The same rule goes for static members of classes.
Parameters may be a type, or a value.
If the parameter is a type it appears in the parameter list merely as an identifier (param1).
If the parameter is a value then it appears just as it would in a function declaration.
Value parameters must be constants, determinable at compile time.

Functions
Functions are similar to functions in C.
A function declaration consists of a return type followed by an identifier (its name), followed by a parameter list, followed by a code block.
type funcname(...)
{
}
Return types:
A function may return any valid type.
A function may also return nothing by specifying a void return type.
The keyword return followed by a value (or nothing if void) is used to exit the function.
A function without a return statement along every possible execution path generates an error.
Parameter lists:
Parameters may be of any type and must be given an identifier. The keywords in, out, and inout may be prepended to any parameter in the list to define how that parameter should be used.
In parameters are used to pass values into the function.
Any modifications to the variable will not show up outside of the function.
Modifications to inout and out parameters will show up outside the function (unless literals or constants are passed rather than variables).
The semantic difference between inout and out is nonexistent; both are provided for the programmer, not the compiler.
A parameter may be made read only by using the const modifier.
This is useful when dealing with things like pointers.
This is legal with all three types of parameters, but only makes sense with in parameters.
Programmers are therefore encouraged to only define in parameters to be constant.
Parameters without modifiers are assumed to be in parameters.
Functions that are nonstatic members of classes, structures, etc. are implicitly passed a parameter called this.
In structures and classes this is a pointer to the instance under which the function was called.
Functions may be declared more than once, provided the parameter lists differ and the overloaded function declarations (with the exception of one primary function) are prefixed with the overload keyword.
Code Block:
Four sections are allowed in a function code block. These are: enter, exit, declare, body.
Every function must include a body section. If no section is specified then the entire code block is assumed to be part of the body section.
This section is where the actual executable code goes.
The enter section is a contract that is verified every time the function is called.
The exit section is a contract that is verified every time the function returns.
The declare section is where you must declare any local variables, nested functions, etc.
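The following sketch pulls the parameter modes and the four sections together. The function name and the contract conditions are hypothetical, and placing overload before the return type is an assumption based on the constructor example in this spec.
//adds amount to total and returns the new total
int accumulate(inout int total, in int amount)
{
enter:
    assert(total>=0);
    assert(amount>=0);
exit:
    assert(total>=amount);
declare:
    int result;
body:
    total+=amount; //visible to the caller because total is inout
    result:=total;
    return result;
}
//overloaded form that adds one
overload int accumulate(inout int total)
{
    total+=1;
    return total;
}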

Properties
Properties are a special kind of function.
They allow one to simulate member variables with functions.
Property functions come in two flavors: readable and writeable.
A writeable property takes one in parameter of the type of variable it is simulating.
It also must return a value of the type it is simulating.
This allows chaining of commands C style.
a.x:=a.y:=2;
Readable functions merely return a value of the type that is being simulated.
A property may include either or both types of functions, however, overloading is not allowed, and there may only be one type involved for each property.
property int fakeint(int x) //writeable
{
}
property int fakeint() //readable
{
}
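A fuller sketch of a property backed by a private variable; the class and member names are hypothetical:
class thermometer
{
private:
    int stored;
public:
    property int degrees(int x) //writeable
    {
        stored:=x;
        return stored; //returning the value allows chained assignment
    }
    property int degrees() //readable
    {
        return stored;
    }
}
Given two thermometer instances a and b, a.degrees:=b.degrees:=20; would store 20 in both, because the writeable function returns the value it stored.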

Constructors
Constructors are used to set up initial environments for classes, structures, and modules.
There are three types of constructors.
These are static, constant, and initialization.
Initialization constructors are only valid for classes and structures.
Static and constant constructors are valid for classes, structures, and modules.
Constant constructors are simple assignments in declaration blocks.
The value of each constant constructor should be able to be determined at compile time.
module a
{
public:
    int x:=0;
}
Initialization constructors are functions that have no return type and are declared with the name of the structure or class.
This type of constructor is called every time its parent type is instantiated.
Constructors may be overloaded, just like any other function.
Calling an overloaded constructor is done by providing parentheses enclosing the parameter list immediately after the instance declaration.
class a
{
public:
    a()
    {
    }
    overload a(int x)
    {
    }
}
module b
{
public:
    a x;
    a y(5);
}
Static constructors are used to initialize static elements in classes, structures, and modules (everything in a module is static).
Static constructors are called once when a program starts, before entering the entrypoint.
The syntax for static constructors is basically the same as for initialization constructors.
The keyword static must be prefixed to the function declaration and the parameter list must be empty.
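A sketch of a static constructor for a module, assuming it is named after the module in the same way that an initialization constructor is named after its class; the names are hypothetical:
module settings
{
public:
    int volume;
    static settings()
    {
        volume:=5; //runs once, before the entrypoint is entered
    }
}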

Destructors
There are two types of destructors: deallocation and static.
Static destructors are called just before a program exits.
Deallocation destructors are called when the item is deallocated, whether by exiting scope, garbage collection, or manual deallocation.
The syntax for destructors is the same as for initialization and static constructors.
Both types of destructors must have an empty parameter list.
Modules may only include static destructors, while classes and structures may include both types.

Loops
There is only one kind of loop structure.
It is called loop.
The syntax for a loop is the keyword loop followed by a code block.
The code block may include the following sections: declare, doing, body.
If no section is specified the entire code block is assumed to be a body section.
The declare section behaves as it does in modules, classes, functions, etc.
It is where you declare and initialize variables that are local to the loop.
The doing section is where you put code specific to the operation of the loop itself.
This may include counter incrementing, tests for the loop's end, etc.
The body section is where all the work that the loop is doing is placed.
There may be any number of doing and body statements in any order in the loop.
The compiler treats the doing and body sections as one big body section.
The doing section is provided for the programmer only, it has no effect on the code inside it.
There is a special keyword used only in loops.
It is until.
The until statement provides a way of breaking out of the loop.
The loop will continue to loop through all the code in its code block until it reaches an until statement that evaluates to true.
To demonstrate a loop in action here is a simple sort algorithm:
void sort(inout int[] list)
{
declare:
    int getsmallestvalue(int start)
    {
    declare:
        int returnVal:=start;
    body:
        loop
        {
        declare:
            int i:=start;
        doing:
            i+=1;
            until(i>=getlength(list));
        body:
            if(list[i]<list[returnVal]){returnVal:=i;}
        }
        return returnVal;
    }
body:
    loop
    {
    declare:
        int i:=0;
        int buffer;
        int smallVal;
    doing:
        until(i>=getlength(list));
    body:
        buffer:=list[smallVal:=getsmallestvalue(i)];
        list[smallVal]:=list[i];
        list[i]:=buffer;
    doing:
        i+=1;
    }
}
Note:
In order to enforce the proper usage of the doing and body sections it is recommended that compilers throw warnings when a statement other than an until in the doing section does not affect a variable declared in the loop.
It is also recommended that until statements in the body section throw warnings.

If..else
The if keyword is used to define conditional statements.
The syntax for the if statement is the if keyword followed by a pair of parentheses, followed by a code block.
The parentheses must contain an expression that returns an integer value.
If the value returned by the expression is non-zero then the code in the code block will be executed.
If the returned value is zero the code block will be skipped.
The else keyword, when present, must immediately follow an if statement's code block.
It will be executed if the expression in the if statement returned a zero.
Else must be followed by either a code block or another if statement.
Else statements are optional.
You may have an if without an else, but not an else without an if.
If and else code blocks may contain declare and body sections.
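A short sketch of an if statement chained with else; x and sign are hypothetical int variables assumed to be declared in an enclosing block:
if(x<0)
{
    sign:=-1;
}
else if(x==0)
{
    sign:=0;
}
else
{
    sign:=1;
}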

Switches
A switch statement is used to compare a value for equality with several other values, and then execute a specific section of code depending on the results of the comparison.
Syntax consists of the keyword switch followed by a pair of parentheses, followed by a code block.
The code block may consist of the following sections:
declare, case, default.
If the declare section is present it must appear first.
There may be any number of case sections, but only zero or one default section.
The parentheses contain an expression.
The result of that expression is compared for equality with the results of the expressions in the case statements in the code block.
Comparisons are done in the same order that they appear in the case statements.
Once a case expression is found that is equal to the switch expression, execution jumps to that case, and no more comparisons are done.
If no case expressions match the switch expression then execution jumps to the default section if present.
If there is no default section, and no case expressions match, then execution will skip the code block completely.
The switch statement exhibits fall through behavior.
This means that if a break does not occur at the end of a case or default block, then code in subsequent sections will be executed.
The syntax for defining a case section differs from other section definitions.
It consists of the keyword case followed by a pair of parentheses that contain an expression, followed by a colon.
switch(var)
{
case(1):
case(2):
    //code here will be executed if var==1 or var==2
    break;
case(3):
    //code here will be executed if var==3
    break;
default:
    //code here will be executed if var
    //is not equal to one or two or three
}

Labels
Any statement in an executable code block may be labelled.
The syntax for labels is an identifier followed by the @ sign, followed by the statement being labelled.
There is no goto statement, but labels may serve as targets for jumps in assembly and targets for breaks and continues.

Breaks
The break keyword can be used to break out of switches or loops.
The keyword by itself breaks out of the innermost switch or loop.
The keyword followed by a label identifier breaks out of the switch or loop with the corresponding label.
label@loop
{
declare:
    int i:=0;
doing:
    switch(i)
    {
    case(1):
        break;//this goes to blah
    case(2):
        break label;//this goes to foo
    }
    blah@i+=1;
}
foo@//stuff here

Continues
The keyword continue may be used in a loop.
The keyword by itself simply skips any code remaining to be executed in the current iteration of the innermost loop and transfers execution to the first statement in the first doing or body section.
The keyword followed by a label jumps to the beginning of the loop specified by the label.
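A sketch of continue inside a loop, modelled on the sort example in the Loops section. The function name is hypothetical, and getlength is assumed to behave as it does in that example.
//returns the sum of the nonnegative elements of list
int sumnonnegative(in int[] list)
{
declare:
    int sum:=0;
body:
    loop
    {
    declare:
        int i:=-1;
    doing:
        i+=1;
        until(i>=getlength(list));
    body:
        if(list[i]<0)
        {
            continue; //skips the addition and resumes at i+=1
        }
        sum+=list[i];
    }
    return sum;
}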

Exceptions
Exceptions are controlled by the try, catch and throw keywords.
Exceptions are thrown by using the throw keyword followed by an expression.
If the exception is thrown inside a try block then the code in the appropriate catch block is executed.
If the exception is thrown outside of a try block, or there is no appropriate catch block, then the stack unwinds until an appropriate catch block is found.
If no appropriate catch blocks are found, then the program crashes.
The syntax for a try block is the keyword try followed by a code block.
The code block may contain a declare section and must include a body section.
If no section is specified then a body section is assumed.
The try statement must be followed immediately by one or more catch blocks.
The syntax for a catch block is the keyword catch optionally followed by a pair of parentheses containing a type name and an identifier.
The catch block will handle all exceptions where the return type of the expression in the throw statement matches the type specified in the catch.
If there is no type specifier in the catch statement then all exceptions are caught, regardless of their type.
The catch statement is followed immediately by a code block.
The rules for the code block are the same as the rules for the try code block.
try
{
    try
    {
        throw 1;
    }
    //first throw 1; is caught here
    catch(int blah)
    {
        throw 2;
    }
    try
    {
        throw 3;
    }
    //throw 3; would not be caught here
    //it would be caught in the outer statement
    catch(char blah)
    {
    }
}
//second throw 2; is caught here
catch(int blah)
{
}

Assertions
Assertions provide a method for ensuring the integrity of the program.
An assert statement consists of the keyword assert followed by a pair of parentheses.
The parentheses contain an expression that evaluates to an integer value.
If that value is zero then an uncatchable exception is thrown.
Assertions will usually appear in contract sections, but may also be used in executable sections.
Assertions, along with any other code in the contract sections should only be compiled into debug builds.
They should be ignored in release builds.
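A sketch of assertions in contract sections; the module, variable, and function names are hypothetical:
module account
{
invariant:
    assert(balance>=0); //checked whenever a function or property of the module is accessed
public:
    int balance:=0;
    void withdraw(in int amount)
    {
    enter:
        assert(amount>0);
        assert(amount<=balance);
    body:
        balance-=amount;
        return;
    }
}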

Deprecation
The keyword deprecated may be prefixed to any module, class, structure, function, etc.
When any item marked as deprecated is accessed by code outside of that item, then the compiler should throw an error.
The compiler should provide a way to change this error to a warning.
This provides a method to safely phase out outdated code.
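A sketch of phasing out a function, assuming deprecated goes before the return type; the module and function names are hypothetical:
module math
{
public:
    deprecated int oldsum(in int a, in int b)
    {
        return a+b;
    }
    int sum(in int a, in int b)
    {
        return a+b;
    }
}
//a call to oldsum from outside the module should produce an error,
//or a warning if the compiler has been told to downgrade it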

Constants
Constants are special readonly variables.
After they are declared no values may be assigned to them.
If they are classes or structures then any member functions that are called must obey the same rule with respect to the member variables.
Of course there are a million ways around this, but constants are there for a reason and compilers should do their best to flag suspicious behavior.
Constants are declared by prefixing the const modifier to a variable declaration.
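For instance (the variable name is hypothetical):
const int limit:=10;
//limit:=11; is illegal, limit may not be assigned to after its declaration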

Pointers and arrays
Pointers work basically the same as they do in C.
A pointer is declared by postfixing an asterisk to the type in the variable declaration.
If you wish to get the address of a variable, then you should use the addressof operator.
Pointers are dereferenced with either * or -> just as they are in C++.
int* blah;
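A short sketch of taking an address and dereferencing it, assuming addressof is written with parentheses as it is in the operator table; the variable names are hypothetical:
int value:=5;
int* p:=addressof(value);
*p:=6; //value now contains 6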
Static arrays are given their length at compile time.
Static arrays are declared by postfixing a type with brackets containing an integer value.
Static arrays are really just constant pointers.
//as far as the casual programmer is concerned
//there is no difference between these two statements
int[5] blah;
const int* blah:=new int[5];
Elements of static arrays may be accessed by postfixing brackets containing an index expression to the array name.
Array indexes are zero based.
blah[0]:=1; //this assigns 1 to the first element of the array
Dynamic arrays contain a pointer and a length value, in that order.
They are declared by postfixing empty brackets to the type in the declaration.
Bounds checking should be done on these arrays when built for debugging.
Bounds checking is optional in release builds.
Element access is done the same way for both static and dynamic arrays.
Advanced array manipulation (resizing, copying, etc) will be delegated to the runtime library for now.
Advanced declarations may be created by starting on the right and working to the left.
//you want to declare an array of three pointers
//to a dynamic array of ints
//array of three
[3] blah;
//array of three pointers
*[3] blah;
//array of three pointers to a dynamic array
[]*[3] blah;
//array of three pointers to a dynamic array of ints
int[]*[3] blah;

Delegates
Delegates replace C-style function pointers.
A delegate contains a pointer to the function, as well as a stack frame pointer, or an object/structure pointer when necessary.
The syntax for declaring a delegate is nearly identical to declaring a function.
Simply replace the function name with the delegate keyword, then replace the code block with an identifier and a semicolon.
You can then call the original function using the delegate's identifier.
Note that there is nothing preventing you from doing something stupid like creating a delegate that references an object instance, then deleting that object and calling the delegate.
Also note that static functions are considered "constant".
Static functions include module level functions as well as static members of classes and structures.
This means that generic modules and classes may include delegates in their parameter lists.
int bleh()
{
}
int foo()
{
declare:
    int delegate() bar:=bleh;
body:
    return bar();
}

Identifiers
Identifiers start with a letter and are followed by any number of letters, numbers or _ in any order.
Identifiers may be arbitrarily long and are case sensitive.
Implementations that limit the length of identifiers should throw an error when encountering identifiers that are too long.
They should not under any circumstances discard the excess characters as is done in C.
Identifiers cannot be keywords.
No two identifiers in the same code block can be the same, even if they serve different purposes.
This means that a label can't have the same name as a variable, a variable can't have the same name as a function, etc.
The letters i-n-t, r-e-a-l, c-h-a-r followed by any number are considered keywords, even if that particular size is not implemented.
This means that int[3] int3; is illegal, even though I doubt there will ever be an implementation that supports three bit ints.
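A few declarations that illustrate these rules; the names are hypothetical:
int counter1; //legal
int my_value; //legal
//int _hidden; is illegal, identifiers must start with a letter
//int 2fast; is illegal for the same reason
//int int16; is illegal, int followed by a number is a keyword
//int loop; is illegal, loop is a keyword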

Memory Allocation/Deallocation
Memory is allocated with the new keyword.
This keyword works the same as it does in C++.
This language is garbage collected, which means that usually there is no need to manually deallocate memory.
However, when manual deallocation is required, the free keyword may be used.
This keyword works the same as C++'s delete.
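A minimal sketch of manual allocation and deallocation, assuming new and free take the same operand forms as new and delete do in C++; the variable name is hypothetical:
int* p:=new int; //allocate a single int
*p:=5;
free p; //manual deallocation, normally left to the garbage collector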

Inline Assembly
Any executable section may include inline assembly.
The assembly code block is defined by the asm keyword, followed by a pair of curly braces that enclose the assembly.
Mnemonics and so forth are going to be processor dependent, but the syntax should remain as true to the main language as possible.
This means lines should end with a semicolon.
Comments should be double slash, slash asterisk, slash plus style.
Labels should be an identifier followed by an at sign.
Literals should be written the same.
Etc, etc.
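A rough sketch of how those rules might look for an x86 target. The mnemonics and operand order are assumptions, since the spec leaves them to the processor; only the comment, label, literal, and semicolon conventions come from the rules above.
asm
{
    //load sixteen into eax
    start@mov eax, 0x10;
    /* add three */
    add eax, 0b11;
}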

Operators
The following is a table of all the operators for this language:
Symbol                              Meaning                 Example
:=                                  Assignment              int x:=0;//x contains 0
                                                            x:=3;//x contains 3
+                                   Addition                x:=3+3;//x contains 6
-                                   Subtraction/Negation    x:=3-2;//x contains 1
                                                            x:=3+-2;//x contains 1
*
/
<<
>>
%
&
|
~
^
+= -= *= /= %= <<= >>= &= |= ^=     Self arithmetic         x+=2;//same as x:=x+2;
&&
||
!
< > <= >= == !=
[]
()
cast()
addressof()
sizeof()
::
->

Version Control
Source level version control is possible.
This is done via the version statement.
The version statement may occur in any section in any code block.
The syntax for this statement is the keyword version followed by a code block.
The code block may contain any number of sections that are given any name.
There can only be one section with any given name.
The code in these sections must follow the rules of the section that the version code block is embedded in.
There is one special section defined by the default keyword.
Code in this section will be compiled in only if no other sections are satisfied.
Only one section is allowed to be used for any given version block.
If the name of the section is a numeric literal, then the highest number that is not greater than the target number is used.
Targets are specified at compile time.
If more than one section is satisfied, or if no section is satisfied and there is no default section, then the compiler must throw an error.
version
{
x86:
    asm
    {
        //x86 assembly goes here
    }
ppc:
    asm
    {
        //powerpc assembly goes here
    }
}
version
{
windows:
    //windows specific code goes here
linux:
    //linux specific code goes here
default:
    //code that works on all other platforms goes here
}
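A sketch of numeric section names, assuming the target number is supplied to the compiler as a plain integer:
version
{
1:
    //used when the target number is 1 or 2
3:
    //used when the target number is 3 or higher
default:
    //used when the target number is lower than 1
}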