Native Intermediate Representation¶
NIR is a high-level, object-oriented, SSA-based representation. The core of the representation is a subset of LLVM instructions, types and values, augmented with a number of high-level primitives that are necessary to efficiently compile modern languages like Scala.
Introduction¶
Let's have a look at the textual form of NIR generated for a simple Scala module:
object Test {
  def main(args: Array[String]): Unit =
    println("Hello, world!")
}
This module maps to the following NIR:
pin(@Test$::init) module @Test$ : @java.lang.Object
def @Test$::main_class.ssnr.ObjectArray_unit : (module @Test$, class @scala.scalanative.runtime.ObjectArray) => unit {
  %src.2(%src.0 : module @Test$, %src.1 : class @scala.scalanative.runtime.ObjectArray):
    %src.3 = module @scala.Predef$
    %src.4 = method %src.3 : module @scala.Predef$, @scala.Predef$::println_class.java.lang.Object_unit
    %src.5 = call[(module @scala.Predef$, class @java.lang.Object) => unit] %src.4 : ptr(%src.3 : module @scala.Predef$, "Hello, world!")
    ret %src.5 : unit
}
def @Test$::init : (module @Test$) => unit {
  %src.1(%src.0 : module @Test$):
    %src.2 = call[(class @java.lang.Object) => unit] @java.lang.Object::init : ptr(%src.0 : module @Test$)
    ret unit
}
Here we can see a few distinctive features of the representation:
- At its core NIR is very much a classical SSA-based representation. The code consists of basic blocks of instructions. Instructions take value and type parameters. Control flow instructions can only appear as the last instruction of the basic block.
- Basic blocks have parameters. Parameters directly correspond to phi instructions in the classical SSA.
- The representation is strongly typed. All parameters have explicit type annotations. Instructions may be overloaded for different types via type parameters.
- Unlike LLVM, it supports high-level object-oriented features such as garbage-collected classes, traits and modules, which may contain methods and fields. There is no overloading and there are no access control modifiers, so names must be mangled appropriately.
- All definitions live in a single top-level scope indexed by globally unique names. During compilation they are lazily loaded until all reachable definitions have been discovered. pin and pin-if attributes are used to express additional dependencies.
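One way to build intuition for block parameters is the well-known correspondence between SSA basic blocks and tail-recursive functions: each block becomes a function, its parameters become function parameters, and a jump becomes a tail call. The Scala sketch below is only an analogy. The names `entry`, `loop` and `exit` are made up for illustration, and this is not NIR or compiler output. It sums the integers from 1 to n.

```scala
import scala.annotation.tailrec

object BlockParams {
  // entry(n): jump loop(n, 1, 0)
  def entry(n: Int): Int = loop(n, 1, 0)

  // loop(n, i, acc): the values of i and acc are supplied by whoever jumps
  // here, which is the role a phi instruction plays in classical SSA.
  @tailrec
  def loop(n: Int, i: Int, acc: Int): Int =
    if (i <= n) loop(n, i + 1, acc + i) // back edge: jump loop(n, i + 1, acc + i)
    else exit(acc)                      // jump exit(acc)

  // exit(result): ret result
  def exit(result: Int): Int = result
}
```

Calling `BlockParams.entry(4)` walks the "blocks" four times around the back edge before reaching `exit`. In classical SSA the same loop would need one phi instruction per loop-carried value at the head of the loop block.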
Definitions¶
Var¶
..$attrs var @$name: $type = $value
Corresponds to LLVM’s global variables when used in the top-level scope, and to fields when used as a member of classes and modules.
Const¶
..$attrs const @$name: $type = $value
Corresponds to LLVM’s global constant. Constants may only reside at the top level and cannot be members of classes and modules.
Declare¶
..$attrs def @$name: $type
Corresponds to LLVM’s declare when used on the top-level of the compilation unit and to abstract methods when used inside classes and traits.
Define¶
..$attrs def @$name: $type { ..$blocks }
Corresponds to LLVM’s define when used on the top-level of the compilation unit and to normal methods when used inside classes, traits and modules.
Trait¶
..$attrs trait @$name : ..$traits
Scala-like traits. May contain abstract and concrete methods as members.
Types¶
Pointer¶
ptr
Corresponds to LLVM’s pointer type, with the major distinction that it does not preserve the type of the memory being pointed at. LLVM has been moving to untyped (opaque) pointers as well.
Integer¶
i8
i16
i32
i64
Corresponds to LLVM integer types. Unlike LLVM, we do not support arbitrary-width integer types at the moment.
Struct¶
struct @$name
struct { ..$types }
Has two forms: named and anonymous. Corresponds to LLVM’s aggregate structure type.
Control-Flow¶
unreachable¶
unreachable
If execution reaches an unreachable instruction, the behaviour of the program is undefined from that point on. Corresponds to LLVM’s unreachable.
jump¶
jump $next(..$values)
Jumps to the next basic block with provided values for the parameters. Corresponds to LLVM’s unconditional version of br.
if¶
if $cond then $next1(..$values1) else $next2(..$values2)
Conditionally jumps to one of the basic blocks. Corresponds to LLVM’s conditional form of br.
switch¶
switch $value {
  case $value1 => $next1(..$values1)
  ...
  default => $nextN(..$valuesN)
}
Jumps to the basic block whose case value $valueN is equal to $value, or to the default block if no case matches. Corresponds to LLVM’s switch.
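The dispatch rule can be modelled in a few lines of Scala. This is a minimal sketch of the selection logic only, assuming integer case values and block names represented as strings. `SwitchSketch` and `target` are hypothetical names, not part of any Scala Native API.

```scala
object SwitchSketch {
  // Hypothetical model: each case pairs a constant with the name of the
  // basic block to jump to; `default` is taken when no case matches.
  def target(value: Int, cases: Seq[(Int, String)], default: String): String =
    cases
      .collectFirst { case (const, next) if const == value => next }
      .getOrElse(default)
}
```

For example, `SwitchSketch.target(2, Seq(1 -> "one", 2 -> "two"), "other")` selects `"two"`, while a value of `7` falls through to `"other"`.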
Operands¶
All non-control-flow instructions follow a general pattern of %$name = $opname[..$types] ..$values. Purely side-effecting operands like store produce a unit value.
call¶
call[$type] $ptr(..$values)
Calls the given function with the given function type and argument values. Corresponds to LLVM’s call.
elem¶
elem[$type] $ptr, ..$indexes
Compute derived pointer starting from given pointer. Corresponds to LLVM’s getelementptr.
extract¶
extract[$type] $aggrvalue, $index
Extract element from aggregate value. Corresponds to LLVM’s extractvalue.
insert¶
insert[$type] $aggrvalue, $value, $index
Create a new aggregate value based on existing one with element at index replaced with new value. Corresponds to LLVM’s insertvalue.
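The key point of extract and insert is that aggregates are values: insert never mutates its input. The Scala sketch below illustrates this with an immutable `Vector` standing in for an aggregate. The object and method names are hypothetical, and real aggregates may hold elements of different types.

```scala
object AggregateSketch {
  // An aggregate value is modelled as an immutable Vector of elements.
  def extract[A](aggr: Vector[A], index: Int): A =
    aggr(index)

  // Returns a new aggregate; the original is left untouched.
  def insert[A](aggr: Vector[A], value: A, index: Int): Vector[A] =
    aggr.updated(index, value)
}
```

After `val b = AggregateSketch.insert(a, 9, 1)`, the value `a` still holds its old element at index 1, which mirrors how each SSA value is defined exactly once.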
stackalloc¶
stackalloc[$type]()
Stack allocate a slot of memory big enough to store given type. Corresponds to LLVM’s alloca.
bin¶
$bin[$type] $value1, $value2
Where $bin is one of the following: iadd, fadd, isub, fsub, imul, fmul, sdiv, udiv, fdiv, srem, urem, frem, shl, lshr, ashr, and, or, xor. Depending on the type and signedness, maps to either integer or floating point binary operations in LLVM.
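The signed and unsigned variants differ only in how they interpret the same bits. As a rough illustration on 32-bit values, the Scala sketch below expresses a few of them with JVM operations. `BinSketch` is a hypothetical name, and the JVM semantics are an approximation: for instance, the JVM masks out-of-range shift amounts and throws on division by zero, whereas LLVM leaves such cases undefined.

```scala
object BinSketch {
  // Signed division and remainder truncate toward zero.
  def sdiv(a: Int, b: Int): Int = a / b
  def srem(a: Int, b: Int): Int = a % b

  // Unsigned variants reinterpret both operands as values in [0, 2^32).
  def udiv(a: Int, b: Int): Int = java.lang.Integer.divideUnsigned(a, b)
  def urem(a: Int, b: Int): Int = java.lang.Integer.remainderUnsigned(a, b)

  // Arithmetic shift right copies the sign bit; logical shift right fills with zeros.
  def ashr(a: Int, n: Int): Int = a >> n
  def lshr(a: Int, n: Int): Int = a >>> n
}
```

With `a = -8` and `b = 3`, `sdiv` gives `-2` but `udiv` gives `1431655762`, because the bit pattern of `-8` reads as `4294967288` when unsigned.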
comp¶
$comp[$type] $value1, $value2
Where $comp is one of the following: eq, neq, lt, lte, gt, gte. Depending on the type, maps to either icmp or fcmp with corresponding comparison flags in LLVM.
conv¶
$conv[$type] $value
Where $conv is one of the following: trunc, zext, sext, fptrunc, fpext, fptoui, fptosi, uitofp, sitofp, ptrtoint, inttoptr, bitcast. Corresponds to LLVM conversion instructions with the same name.
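For readers less familiar with the LLVM names, the Scala sketch below shows what a handful of these conversions do to concrete values, using JVM primitives as stand-ins for i8, i32, f32 and f64. The method names are hypothetical, and the JVM behaviour only matches for in-range inputs: out-of-range float-to-int conversions saturate on the JVM, while LLVM gives no such guarantee.

```scala
object ConvSketch {
  // trunc: keep only the low 8 bits of a 32-bit integer.
  def truncI32ToI8(x: Int): Byte = x.toByte

  // sext: widen while replicating the sign bit.
  def sextI8ToI32(x: Byte): Int = x.toInt

  // zext: widen while filling the high bits with zeros.
  def zextI8ToI32(x: Byte): Int = x & 0xFF

  // fptosi: floating point to signed integer, rounding toward zero.
  def fptosiF64ToI32(x: Double): Int = x.toInt

  // bitcast: reinterpret the same 32 bits as a different type.
  def bitcastF32ToI32(x: Float): Int = java.lang.Float.floatToRawIntBits(x)
}
```

The difference between sext and zext shows up on negative inputs: the byte `-1` widens to `-1` under sext but to `255` under zext.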
classalloc¶
classalloc @$name
Roughly corresponds to new $name in Scala. Performs allocation without calling the constructor.
Values¶
Attributes¶
Attributes allow one to attach additional metadata to definitions and instructions.
Linking¶
link¶
link($name)
Automatically put $name on a list of native libraries to link with if the given definition is reachable.
pin¶
pin(@$name)
Require $name to be reachable whenever the current definition is reachable. Used to introduce indirect linking dependencies. For example, module definitions depend on their constructors using this attribute.
pin-if¶
pin-if(@$name, @$cond)
Require $name to be reachable if the current definition and $cond are both reachable. Used to introduce conditional indirect linking dependencies. For example, class constructors conditionally depend on the methods overridden in the given class if the methods being overridden are reachable.
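To make the interaction of direct references, pin and pin-if concrete, here is a minimal sketch of a reachability computation in Scala. It is an illustrative model under simplifying assumptions, not the actual linker: definitions are identified by plain strings, and the `Defn` case class with its `refs`, `pins` and `pinIfs` fields is hypothetical.

```scala
object Reachability {
  // Hypothetical model of a definition and its dependencies.
  final case class Defn(
      name: String,
      refs: Set[String] = Set.empty,             // names referenced directly
      pins: Set[String] = Set.empty,             // pin(@name)
      pinIfs: Set[(String, String)] = Set.empty  // pin-if(@name, @cond)
  )

  def reachable(defns: Seq[Defn], entry: String): Set[String] = {
    val byName = defns.map(d => d.name -> d).toMap
    var reached = Set.empty[String]

    // Follow unconditional dependencies (refs and pins) from `start`.
    def visit(start: String): Unit = {
      var todo = List(start)
      while (todo.nonEmpty) {
        val name = todo.head
        todo = todo.tail
        if (!reached(name)) {
          reached += name
          byName.get(name).foreach(d => todo = (d.refs ++ d.pins).toList ::: todo)
        }
      }
    }

    visit(entry)
    // Re-check pin-if conditions until nothing new becomes reachable.
    var changed = true
    while (changed) {
      changed = false
      for {
        name <- reached.toList
        d <- byName.get(name).toList
        (target, cond) <- d.pinIfs
        if reached(cond) && !reached(target)
      } {
        visit(target)
        changed = true
      }
    }
    reached
  }
}
```

The outer loop is needed because a pin-if condition may only become true after some other definition has been discovered, so conditional dependencies are re-evaluated until a fixed point is reached.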