Java代写Compiler | COMP0012 Compilers Lexing and Parsing Coursework

使用Java为Z sec编程语言构建词法分析器和解析器

COMP0012 Compilers Lexing and Parsing Coursework Late Submission Assessment
Submission Deadline Monday 26th August 2019 @ 11:55PM
The goal of this coursework is to build a lexer and parser for the Z ̃sec programming language. Use JFlex 1.6.1 and Cup version 11b-20160615, using only the specified versions, to automatically generate code for your scanner and parser1. This coursework broadly requires writing regular expressions covering all legal words in the language (Lexer.lex), and a context free grammar describing its rules (Parser.cup).
You must work on this coursework individually and any kind of collaboration, including sharing your own tests with others, is strictly forbidden. Please submit your work (JFLex/CUP specifications) before Monday 26th August 2019 @ 11:55PM. Detailed submission instructions are given at the end of the document.
“There will always be noise on the line.”
—Engineering folklore
Despite our best efforts, this project description contains errors2. Part of this coursework is then to start to develop the skills you will need to cope with such problems now. First, you think hard about issues you discover (of which only a small subset will be errors in this problem statement) and make a reasonable decision and document your decision and its justification accompanying your submission. Second, you can ask stakeholders for clarification.
1 Interpreting the Specification
Throughout your career in IT, you will have to contend with interpreting and understanding specifications. If you fail a test case because of some ambiguities in the specification, you must go on to explain why it is a problem and justify how you decide to resolve it. When marking, we will consider the issues you discover; if your justification is sound, you will get full marks for the relevant test cases.
We have numbered each paragraph, table and program listing in this specification. For each issue that you find, make a note in the following format:
Paragraph: 72
Problem: it is not clear whether we can omit both the start and end indices
in sequence slicing, like “foo := bar[:]”.
Our solution: this is possible in many other languages that support list
slicing (like Python), so our compiler accepts this syntax.
Paragraph: 99
Problem: the spec has an assignment using the equality operator,
“foo = bar;”.
Our solution: we think this is a mistake, and our compiler does not accept
this statement.
Call this file ambiguities.txt and turn it along with your implementation of the parser. 1Section 4 explains why I have imposed these constraints on the permitted versions of these tools.
2Nonetheless, this document is already a clearer specification than any you will find in your subsequent careers in industrial IT. 1

The Z ̃sec Language
You are to build a lexer and a parser for Z ̃sec.
2
§1 §2
§3 Z ̃sec has two types of comments. First, any text, other than a newline, following # to the end of the line isacomment.Second,anytext,includingnewlines,enclosedwithin/# … #/isacomment,andmayspan multiple lines.
§4 An identifier starts with a letter, followed by an arbitrary number of underscores, letters, or digits. Identifiers are case-sensitive. Punctuation other than underscore is not allowed.
§5 Z ̃sec is a security typed language3. Every primitive type must be labelled as either low or high4. A low label is declared with L and a high with H. Information is allowed to flow from L to H, but not vice versa. Security labels allow the compiler to check whether confidential information has been leaked publicly. Checking that the information flow is correct is not part of the coursework. We let S range over H and L in the remainder of this specification.
§6 A character is a single letter, punctuation symbol, or digit wrapped in ’’ and has type char S. The allowed punctuation symbols are space (See http://en.wikipedia.org/wiki/Punctuation) and the ASCII symbols, other than digits, on this page http://www.kerryr.net/pioneers/ascii3.htm.
§7 The boolean constants are T and F and have type bool.
§8 Numbers are integers (type int S), rationals (type rat S), or floats (type float S). Negative numbers are
represented by the ’-’ symbol before the digits. Examples of integers include 1 and -1234; examples of rationals include 1/3 and -345_11/3; examples of floats are -0.1 and 3.14.
§9 Sequences (type seq) are ordered containers of elements. Sequences have nonnegative length. A sequence has a type: its declaration specifies the type of elements it contains. For instance, l : seq<int S > := [1,2,3], and str : seq<char S > := [ ’f’, ’r’, ’e’, ’d’, ’d’, ’y’]. You can use the top S keyword to specify a sequence that contains any type, writing s : seq<top S > := [ 1, 1/2, 3.14, [ ’f’, ’o’, ’u’, ’r’] ].Thezerolengthlistis[].
§10 Z ̃sec sequences support the standard indexing syntax. For any sequence s, the operator len(s) : seq → N returns the length of s and the indices of s range from 0 to len(s)-1. The expression s[index] returns theelementinsatindex.Stringliteralsaresyntacticsugarforcharactersequences,so”abc”is[ ’a’, ’b’ , ’c’].Stringliteralsaretreatedaslowbydefault:theydonotrequireasecurityannotationS.Forthesequence s : seq<charL> := “hello world”,s[len(s)-1]returns‘d’ands[len(s)-1]returns‘d’and len(s) returns 11.
§11 Sequences in Z ̃sec also support sequence slicing as in languages like Python or Ruby: id[i:j] re- turns another sequence, which is a subsequence of id starting at id[i] and ending at id[j]. Given a := [1,2,3,4,5], a[1:3] is [2,3,4]. When the start index is not given, it implies that the subsequence starts from index 0 of the original sequence (e.g., a[:2] is [1,2]). Similarly, when the end index is not given, the subsequence ends with the last element of the original sequence (e.g., a[3:] is [4,5]). Finally, indices can be negative, in which case its value is determined by counting backwards from the end of the original sequence: a[2:-1] is equivalent to a[2:a.len-1] and, therefore, is [3,4,5], while s[-2] is 4. The lower index in a slice must be positive and smaller than the upper index, after the upper index has been subtracted from the sequences length if it was negative.
3See https://goo.gl/XN8ebs and https://goo.gl/ZawHTW. 4Language-Based Information-Flow Security, Sabelfeld and Myers, 2003.
A program in Z ̃sec consists of a list of declarations which defines global variables and new data types as well as functions. This list cannot be empty: it must have a definition for a main function.
2

Primitive Data Types Aggregate Data Types Literal
bool S, int S, rat S, float S, char S seq
string
Table 1: Z ̃sec data types.
Kind
Boolean Numeric
Defined Over
bool int,rat,float
Syntax
!, &&, ||
+ – * / ^
in, ::, len(s), s[i], s[i:j], s[i:], s[:i]
< <= = !=
Sequence seq
Comparison
int,rat,float bool, int, rat, float
Table 2: Z ̃sec operators.
§12 Table 1 defines Z ̃sec’s builtin data types. In Table 2, “!” denotes logical not, “&&” logical and and “||” logical or, as is typical in the C language family. Note that “=” is referential equality and “:=” is the assignment operatorinZ ̃sec.Theinoperatorcheckswhetheranelement(key)ispresentinasequence,asin2 in [1,2,3] andreturnsaboolean.Notethatinonlyoperatesontheoutermostsequence:3 in [[1],[2],[3]]isF,or false. “::” operator denotes concatenation, “s[i]” returns the ith entry in s and “len(s)” returns the length of s as defined in the discussion of sequences and their indexing above.
2.1 Declarations
§13 The syntax of field or variable declaration is “id : type”. A data type declaration is
tdef type_id { declaration_list } ;
where declaration_list is a comma-separated list of field/variable declarations. Once declared, a data type can be used as a type in subsequent declarations. The security label of a new data type is the least upper bound of the labels of its constituent fields. It is not declared by the programmer but calculated during the compiler’s semantic analysis (i.e. not in this coursework).
§14 For readability, Z ̃sec supports type aliasing: the directive “alias old_name new_name ;” can appear in a declaration list and allows the use of new_name in place of old_name.
alias seq<char L> string;
tdef person { name : string, surname : string, age : int H };
tdef family { mother : person, father : person, children : seq<person>
};
Listing 1: Z ̃sec data type declaration examples.
§15 Listing 2 shows the syntax of function declarations in Z ̃sec. Specifically, a function’s name is an identifier. The formal_parameter_list separates parameter declarations, which follow the variable/field declaration syntaxid : type,withcommas.Afunction’sbodycannotbeempty;itconsistsoflocalvariabledeclarations, if any, followed by statements. The return type of a function is returnType; it is omitted when the function is main, or does not return a value.
2.2 Expressions
§16 Z ̃sec expressions are applications of the operators defined above. Parentheses enforce precedence. For
user-defined data type definitions, field references are expressions and have the form id.field. Function calls 3

fdef name (formal_parameter_list) { body } : returnType ; fdef name (formal_parameter_list) { body } ;
Listing 2: Z ̃sec function declaration syntax.
Assumes“person p”previouslydeclared
p.age + 10
b – foo(sum(10, c),
s1 :: s2 :: [1,2]
bar()) = 30
Illustrates method calls Assumess1ands2havetypeseq<int S>
Table 3: Z ̃sec expression examples.
are expressions; the actual parameters of function calls are also expressions that, in the semantic phase (i.e. not this coursework), would be required to produce a type that can unify with the type of their parameter. Table 3 contains example expressions.
2.3 Statements
§17 In Table 4, var indicates a variable. An expression_list is a comma-separated list of expressions. As above, a body consists of local variable declarations (if any), followed by statements. Statements, apart from if-else and loop, terminate with a semicolon. The return statement appears in a function body, where it is optional. In any if statement, there can be zero or one else branch.
Assignment Input
Output Function Call
Control Flow
var := expression ;
read var ;
print expression ;
functionId ( expression_list );
if ( expression )then body fi
if ( expression ) then body else body fi loop body pool
break N; # N is optional and defaults to 1. return expression ;
Table 4: Z ̃sec statements.
§18 Variables may be initialised at the time of declaration: “id : type := init ;”. For newly defined data types, initialisation consists of a sequence of comma-separated values, each of which is assigned to the data type fields in the order of declaration. Listing 3 contains examples.
§19 The statement read var; reads a value from the standard input and stores it in var; the statement print prints evaluation of its expression parameter, followed by a newline.
§20 The if statement behaves like that in the C family language. The unguarded loop statement is the only loopconstructinZ ̃sec.Toexitaloop,onemustusebreak N,usuallycoupledwithanifstatement;theoptional argument N is a positive integer that specifies the number of nested loops to exit and defaults to 1. The use of break statement is forbidden outside a loop. Listing 4 shows how to use loop and break.
§21 Listing 5 shows an example program, contain two functions. The function main is the special Z ̃sec function where execution starts. Z ̃sec’s main returns no value.
4

3
§22
b : int H := 10;
c : string := “hello world!”;
d : person := “Shin”, “Yoo”, 30;
e : char L := ’a’;
f : seq<rat L> := [ 1/2, 3, 4_2/17, -7 ]; g : int L := foo();
Listing 3: Z ̃sec variable declaration and initialization examples.
a : seq<int L> := [1, 2, 3]; b : seq<int L> := [4, 5, 6]; i : int L := 0;
j : int L := 0;
loop
if (2 < i) then
break; fi
loop
if (2 < j) then
break; # break to the outer loop fi
if (b[j] < a[i]) then
break 2; # break out of two loops
fi
j := j + 1;
pool
i := i + 1;
j := 0;
pool
Error Handling
Listing 4: Z ̃sec loop example.
Your parser will be tested against a test suite of positive and negative tests. This testing is scripted; so it is important for your output to match what the script expects. Add the following function definition into the “parser code”sectionofyourCupfile,betweenits{:and:}delimiters.
public void syntax_error(Symbol current_token) { report_error(
“Syntax error at line ” + (current_token.left+1) + “, column ”
+ current_token.right + “. “, null );
}
Listing 6: Z ̃sec compiler error message format.
§23 The provided SC class uses a boolean field syntaxErrors of the parser object to decide whether parsing is successful. So please add such a public field to the Parser class and set it to true when a syntax error is generated.
5

• •
• •
• • •
main { # Main is not necessarily last.
a : seq<int L> := [1,2,3];
b : seq<int L> := reverse(a); # This is a declaration. print b; # This is the required statement.
};
fdef reverse (inseq : seq<top L>) { outseq : seq<top L> := [];
i : int L := 0;
loop
if (len(inseq) <= i) then break;
fi
outseq := inseq[i] :: outseq;
i := i + 1;
pool return outseq;
} : seq<top L>;
Listing 5: Z ̃sec example program. Submission Requirements and Instructions
Your scanner (lexer) must
Use JLex (or JFlex) to automatically generate a scanner for the Z ̃sec language;
Make use of macro definitions where necessary. Choose meaningful token type names to make your
specification readable and understandable;
Ignore whitespace and comments; and
Report the line and the column (offset into the line) where an error, usually unexpected input, first occurred. Use the code in Section 3, which specifies the format that will be matched by the grading script.
Your parser must
Use CUP to automatically produce a parser for the Z ̃sec language;
Resolve ambiguities in expressions using the precedence and associativity rules;
Print “parsing successful”, followed by a newline, if the program is syntactically correct.
Your scanner and parser must work together.
4 §24
§25
§26 §27
§28 I have provided a Makefile on Moodle. This makefile must build your project, from source, when make is issued,
• using JFlex 1.6.1
• using Cup version 11b-20160615 • using Java SE 8 Update 221
• on CentOS 7.6
Once the scanner and parser have been successfully produced using JFlex and CUP, use the provided SC class to test your code on the test files given on the course webpage.
6

§29 If your submission fails to build using this Makefile with these versions of the tools and on the specified operating system, your mark will be zero.
§30 The provided makefile has a test rule. The marking script will call this rule to run your parser against a set of test cases. This set of test cases includes public test cases provided via Moodle and private ones; they include positive tests, on which your parser must emit “parsing successful” followed by a newline and nothing else, and negative tests on which your parser must emit the correct line and column of the error, as specified in Section 3 above. Your mark will be the number of positive tests cases you correctly pass and the number of negative test cases you correctly fail divided by the total number of test cases.
§31 Each student must use the automatic testing system (detailed in Section 5) to submit the coursework.
§32 The deadline for this coursework is Monday 26th August 2019 @ 11:55PM. We strictly adhere to UCL
Academic Manual – Chapter 4: Assessment Framework for Taught Programmes and therefore impose the university’s late submission penalties, shown in ??. However, the automatic testing system will reject any submissions 5 working days after the deadline. If you still would like to submit after that, please attach your code in an email to David Kelly <[email protected]>.
§33 The deadline for this coursework is Monday 26th August 2019 @ 11:55PM. Please visit this page for more detail about UCL’s Late Summer Assessment policy.
5 Automatic Testing Guidelines
Your coursework should be developed using Git. We have created bare git repositories for each of you on a department server. Whenever you push a commit to them, a test suite consisting of a set of public (already given to you) and private test cases is run and then the results of the test run will be emailed to you. You can push your work to the repository as many times as you like before the deadline. All pushes will be automatically rejected 5 working days after the deadline. We will mark the last commit that you have pushed. We advise you to test access to your repository as soon as possible and to push your final commit well before the deadline.
5.1 SSH access to the Computer Science department
Ensure that you can SSH into the department before continuing. Run the following command:
ssh [email protected]
If you do not have access, you are probably using the wrong username: your CS username is different to your UCL username (which you use to log in to Moodle, Portico etc.) Do not try to log in more than three times with an incorrect username, as your IP address will be banned by the department. Please visit Tech Support on the 4th floor of the Malet Place Engineering Building to figure out what your CS username is.
5.2 Pushing your work
Decompress the provided project.tar.gz, which gives you a directory project/ with the following structure:
project/
src/
lib/
bin/
tests/
Initialisethisdirectoryintoagitrepositoryusinggit init.AddyourUCLrepositoryasaremote(thecommand below is a single line):
git remote add origin [email protected]:/cs/student/
comp0012_lsa/YOUR_CS_USERNAME
7

Develop your lexer (i.e. Lexer.lex) and parser (i.e. Parser.cup) in the src/ subdirectory. Stage and commit allyourlocalchangesusinggit addandgit commit.Onceyourcourseworkisbuildingandyouareready to submit, create a text file called student.txt in the root of your repository with a single line containing your student number, UCL email, and name, in this format:
12345678 <[email protected]> Grace Hopper
You MUST provide the correct student number and UCL email address; otherwise, your submission will be rejected. Next, stage and commit student.txt. At this point, your repository should contain at least the following files:
• student.txt
• src/SC.java
• src/Parser.cup • src/Lexer.lex
Push your repository via git push origin master. If all goes well, you should receive an email from “Compilerbot” after a few minutes, which contains your test scores. The server will reject submissions that do not contain a student.txt file or other important files; if your push is rejected, read the error message.
If you think there is an error with this automatic testing system, please contact Zheng Gao <[email protected] uk>.
8