The Java Compiler Kit

The Java Compiler Kit (JKit) is a straightforward implementation of a [[Java compiler]], designed with extensibility in mind. JKit differs from most Java compilers in that it provides a simple [[intermediate language]] suitable for [[static program analysis]]. In building JKit, the aims were:

  • To help with teaching compilers by considering one for a fully fledged language (Java), rather than a stripped-down imitation language.
  • To aid research in programming languages, compilers and verification.

JKit was started in 2008, and has since been used in a number of research projects.

Overview

As is common in compiler design, JKit uses a (configurable) staged pipeline for processing source files. The pipeline begins by reading and parsing the source file into an [[abstract syntax tree]] (AST). A series of stages is then executed in order to resolve, check and rewrite the AST, before it is converted into an [[intermediate language]] called the Java Intermediate Language (JIL). Finally, the JIL code is converted into [[Java Bytecode]] and written out as a [[Java Class File]]. The following gives a pictorial overview of the pipeline which highlights the most important stages:

Figure: List of all stages in the JKit pipeline.

The following briefly describes the purpose of each stage. The first 11 stages make up the [[Compiler#Front End|front end]], whilst the remainder make up the [[Compiler#Back End|back end]]:

  1. Java File Reader. This stage is responsible for reading a Java source file, parsing it and constructing an [[Abstract Syntax Tree]] (AST).  The ANTLR parser generator is used to construct an initial parse tree, and this is then refined to form the completed AST.
  2. Skeleton Discovery. This stage is responsible for identifying the names of all classes being compiled.  This includes the names of any inner classes which are declared.  This information is used in the following stage to determine what type a given name actually refers to.
  3. Type Resolution. This stage determines the full type of any variable declarations with incomplete types.  For example, a variable declaration String x has an incomplete type.  Most likely, the full type would be java.lang.String.  However, this depends on what classes are declared in the given source file, and on what import statements are present.
  4. Skeleton Building. At this point, more complete information regarding the classes being compiled is known.  In particular, the full type of each method return and parameter is available.  This stage collates all of this information and puts it into a skeleton.  The skeleton for a class records the full type name of the class, its superclass and any interfaces being implemented.  It also records the full type for all method parameters and return values.  This information is critical for type propagation, which occurs later.
  5. Scope Resolution. This stage determines the scope for all variables used in a method.  For example, suppose an expression x+y occurs in a method of some class being compiled.  The question is, what are x and y?  They could be local variables, fields of this class, fields of a superclass, or fields of an enclosing class (for non-static inner classes).  To determine this, we need to traverse the list of enclosing scopes and possibly the hierarchy of class skeletons we have constructed.  When a skeleton is needed for a class that is not among those being compiled, the ClassLoader will load it on demand using the current CLASSPATH.
  6. Type Propagation. This stage determines the type of all expressions which occur in a method being compiled.  For example, consider an expression x+y.  The type of this expression depends on the declared types of variables x and y.  The Java Language Specification incorporates the notion of binary numeric promotion to determine the proper type.
  7. Type Checking. This stage is responsible for checking that types are used appropriately in all methods being compiled.  Incorrect usages result in syntax errors being produced.  In some cases, automatic conversions may be applied where appropriate.  For example, an int variable may be assigned to a float variable, but this requires the compiler to insert an automatic conversion.
  8. Anon Class Rewrite. This stage is responsible for breaking down anonymous [[inner class|inner classes]].  The stage simply rewrites the source code to something which is equivalent, but doesn’t contain an anonymous inner class.  Instead, it will create a normal class with a special name and replace the anonymous inner class with that.
  9. Inner Class Rewrite. This stage is responsible for breaking down [[inner class|inner classes]]. The stage simply rewrites the source code to something which is equivalent, but doesn’t contain an inner class. Instead, it will create a normal class with a special name made up from the parent class name and the inner class name.
  10. Enum Rewrite. This stage is responsible for breaking down [[enumerated type|enumerations]]. The stage simply rewrites the source code to something which is equivalent, but doesn’t contain an enumeration. Instead, it will create a normal class with a special name made up from the parent class name and the inner class name.
  11. JIL Generation
  12. Dead-Code Elimination
  13. Definite Assignment
  14. Bypass Methods
  15. Bytecode Generation
  16. Peephole Optimisation
  17. ClassFile Generation
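
The Type Propagation stage above relies on binary numeric promotion to type an expression such as x+y. The following is a minimal sketch of that rule for primitive operands only, as described in the Java Language Specification. The Prim enum and the promote method are hypothetical names introduced for illustration and are not taken from JKit; unboxing of wrapper types such as Integer is deliberately left out.

```java
public class BinaryNumericPromotion {
    // Hypothetical model of Java's numeric primitive types (not JKit's own type classes).
    enum Prim { BYTE, SHORT, CHAR, INT, LONG, FLOAT, DOUBLE }

    // Result type of a binary arithmetic operator such as + applied to two
    // primitive operands: double wins, then float, then long; every other
    // combination is promoted to int.
    static Prim promote(Prim lhs, Prim rhs) {
        if (lhs == Prim.DOUBLE || rhs == Prim.DOUBLE) return Prim.DOUBLE;
        if (lhs == Prim.FLOAT || rhs == Prim.FLOAT) return Prim.FLOAT;
        if (lhs == Prim.LONG || rhs == Prim.LONG) return Prim.LONG;
        return Prim.INT;
    }

    public static void main(String[] args) {
        System.out.println(promote(Prim.INT, Prim.FLOAT));  // prints FLOAT
        System.out.println(promote(Prim.BYTE, Prim.SHORT)); // prints INT
        System.out.println(promote(Prim.LONG, Prim.CHAR));  // prints LONG
    }
}
```

Note that byte+short yields int rather than short, which is why a type checker must insert or demand a conversion when such a result is assigned back to a short variable.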

Getting Started

Command-Line Options

Further Reading

Java Intermediate Language