Native Intermediate Representation
NIR is a high-level, object-oriented, SSA-based representation. The core of the representation is a subset of LLVM instructions, types and values, augmented with a number of high-level primitives that are necessary to efficiently compile modern languages like Scala.
Introduction
Let’s have a look at the textual form of NIR generated for a simple Scala module:
object Test {
  def main(args: Array[String]): Unit =
    println("Hello, world!")
}
This would map to:
pin(@Test$::init) module @Test$ : @java.lang.Object

def @Test$::main_class.ssnr.ObjectArray_unit : (module @Test$, class @scala.scalanative.runtime.ObjectArray) => unit {
  %src.2(%src.0 : module @Test$, %src.1 : class @scala.scalanative.runtime.ObjectArray):
    %src.3 = module @scala.Predef$
    %src.4 = method %src.3 : module @scala.Predef$, @scala.Predef$::println_class.java.lang.Object_unit
    %src.5 = call[(module @scala.Predef$, class @java.lang.Object) => unit] %src.4 : ptr(%src.3 : module @scala.Predef$, "Hello, world!")
    ret %src.5 : unit
}

def @Test$::init : (module @Test$) => unit {
  %src.1(%src.0 : module @Test$):
    %src.2 = call[(class @java.lang.Object) => unit] @java.lang.Object::init : ptr(%src.0 : module @Test$)
    ret unit
}
Here we can see a few distinctive features of the representation:
At its core NIR is very much a classical SSA-based representation. The code consists of basic blocks of instructions. Instructions take value and type parameters. Control flow instructions can only appear as the last instruction of the basic block.
Basic blocks have parameters. Parameters directly correspond to phi instructions in the classical SSA.
The representation is strongly typed. All parameters have explicit type annotations. Instructions may be overloaded for different types via type parameters.
Unlike LLVM, it has support for high-level object-oriented features such as garbage-collected classes, traits and modules. They may contain methods and fields. There is no overloading or access control modifiers so names must be mangled appropriately.
All definitions live in a single top-level scope indexed by globally unique names. During compilation they are lazily loaded until all reachable definitions have been discovered.
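One way to build intuition for block parameters is to read each basic block as a local function and each jump as a call that passes arguments. The Scala sketch below is only an analogy under that assumption, not NIR output; the names `BlockParams`, `sumTo`, `loop` and `exit` are hypothetical.

```scala
import scala.annotation.tailrec

object BlockParams {
  // Sum of 0 until n, written so that each "basic block" is a local function
  // and every jump is a call that supplies values for the block's parameters.
  def sumTo(n: Int): Int = {
    // Loop header block: `i` and `acc` play the role of block parameters,
    // which classical SSA would express as two phi instructions.
    @tailrec
    def loop(i: Int, acc: Int): Int =
      if (i < n) loop(i + 1, acc + i) // jump back to the header with new values
      else exit(acc)                  // jump to the exit block

    // Exit block: receives the final accumulator as its only parameter.
    def exit(result: Int): Int = result

    loop(0, 0) // the entry block jumps to the header with initial values
  }
}
```

Each call site names the values that flow into the target block, which is the same information a phi instruction records per predecessor.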
Definitions
Var
..$attrs var @$name: $ty = $value
Corresponds to LLVM’s global variables when used in the top-level scope, and to fields when used as a member of classes and modules.
Const
..$attrs const @$name: $type = $value
Corresponds to LLVM’s global constant. Constants may only reside at the top level and cannot be members of classes and modules.
Declare
..$attrs def @$name: $type
Corresponds to LLVM’s declare when used on the top-level of the compilation unit and to abstract methods when used inside classes and traits.
Define
..$attrs def @$name: $type { ..$blocks }
Corresponds to LLVM’s define when used on the top-level of the compilation unit and to normal methods when used inside classes, traits and modules.
Struct
..$attrs struct @$name { ..$types }
Corresponds to LLVM’s named struct.
Trait
..$attrs trait @$name : ..$traits
Scala-like traits. May contain abstract and concrete methods as members.
Class
..$attrs class @$name : $parent, ..$traits
Scala-like classes. May contain vars, abstract and concrete methods as members.
Module
..$attrs module @$name : $parent, ..$traits
Scala-like modules (i.e. object $name). May only contain vars and concrete methods as members.
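As a rough illustration of how source-level constructs line up with the definition kinds above, here is a small Scala sketch. The comments name the definition kind each construct would plausibly correspond to according to the descriptions in this section; the exact mangled names and attributes the compiler emits are not shown, and all identifiers (`DefinitionsSketch`, `Shape`, `Square`, `Registry`) are hypothetical.

```scala
object DefinitionsSketch {
  trait Shape {                     // Trait definition
    def area: Double                // abstract method: Declare member
    def describe: String =          // concrete method: Define member
      "shape with area " + area
  }

  class Square(side: Double) extends Shape { // Class definition: parent plus traits
    var label: String = "square"             // Var member (a field)
    def area: Double = side * side           // Define member
  }

  object Registry {                 // Module definition
    var count: Int = 0              // Var member
    def register(s: Shape): Int = { // Define member
      count += 1
      count
    }
  }
}
```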
Types
Void
void
Corresponds to LLVM’s void.
Vararg
...
Corresponds to LLVM’s varargs. May only be nested inside function types.
Pointer
ptr
Corresponds to LLVM’s pointer type, with the major distinction of not preserving the type of the memory being pointed at. LLVM has since adopted untyped (opaque) pointers as well.
Boolean
bool
Corresponds to LLVM’s i1.
Integer
i8
i16
i32
i64
Correspond to LLVM’s integer types. Unlike LLVM, NIR does not support arbitrary-width integer types at the moment.
Float
f32
f64
Corresponds to LLVM’s floating point types.
Array
[$type x N]
Corresponds to LLVM’s aggregate array type.
Function
(..$args) => $ret
Corresponds to LLVM’s function type.
Struct
struct @$name
struct { ..$types }
Has two forms: named and anonymous. Corresponds to LLVM’s aggregate structure type.
Unit
unit
A reference type that corresponds to scala.Unit.
Nothing
nothing
Corresponds to scala.Nothing. May only be used as a function return type.
Class
class @$name
A reference to a class instance.
Trait
trait @$name
A reference to a trait instance.
Module
module @$name
A reference to a module.
Control-Flow
unreachable
unreachable
If execution reaches an unreachable instruction, the behaviour is undefined from that point on. Corresponds to LLVM’s unreachable.
ret
ret $value
Returns a value. Corresponds to LLVM’s ret.
jump
jump $next(..$values)
Jumps to the next basic block with provided values for the parameters. Corresponds to LLVM’s unconditional version of br.
if
if $cond then $next1(..$values1) else $next2(..$values2)
Conditionally jumps to one of the basic blocks. Corresponds to LLVM’s conditional form of br.
switch
switch $value {
  case $value1 => $next1(..$values1)
  ...
  default => $nextN(..$valuesN)
}
Jumps to the basic block whose case value is equal to $value, or to the default block if no case matches. Corresponds to LLVM’s switch.
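For intuition, the closest source-level analogue is a match on an integer with literal cases and a default. The Scala sketch below is such an analogue only; whether a given match is actually lowered to a NIR switch is an assumption not stated in this document, and `SwitchSketch` and `describe` are hypothetical names.

```scala
object SwitchSketch {
  // Each literal case plays the role of a `case $valueN => $nextN(...)` arm;
  // the wildcard plays the role of `default`.
  def describe(code: Int): String = code match {
    case 0 => "zero"
    case 1 => "one"
    case _ => "other"
  }
}
```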
invoke
invoke[$type] $ptr(..$values) to $success unwind $failure
Invokes a function pointer, jumps to $success if a value is returned, and unwinds to $failure if an exception is thrown. Corresponds to LLVM’s invoke.
throw
throw $value
Throws the value and starts unwinding.
try
try $succ catch $failure
Operands
All non-control-flow instructions follow a general pattern of %$name = $opname[..$types] ..$values. Purely side-effecting operands like store produce a unit value.
call
call[$type] $ptr(..$values)
Calls given function of given function type and argument values. Corresponds to LLVM’s call.
load
load[$type] $ptr
Load value of given type from memory. Corresponds to LLVM’s load.
store
store[$type] $ptr, $value
Store value of given type to memory. Corresponds to LLVM’s store.
elem
elem[$type] $ptr, ..$indexes
Compute derived pointer starting from given pointer. Corresponds to LLVM’s getelementptr.
extract
extract[$type] $aggrvalue, $index
Extract element from aggregate value. Corresponds to LLVM’s extractvalue.
insert
insert[$type] $aggrvalue, $value, $index
Create a new aggregate value based on existing one with element at index replaced with new value. Corresponds to LLVM’s insertvalue.
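The value semantics of extract and insert resemble reading and copying an immutable tuple. The following Scala sketch is an analogy only (tuples are heap objects in Scala, not LLVM-style aggregates), and `AggregateSketch` and its members are hypothetical names.

```scala
object AggregateSketch {
  val point: (Int, Int, Int) = (1, 2, 3)

  // extract-like: read one element of the aggregate value.
  val second: Int = point._2

  // insert-like: build a new aggregate with one element replaced;
  // the original value is left unchanged.
  val updated: (Int, Int, Int) = point.copy(_2 = 9)
}
```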
stackalloc
stackalloc[$type]()
Stack allocate a slot of memory big enough to store given type. Corresponds to LLVM’s alloca.
bin
$bin[$type] $value1, $value2
Where $bin is one of the following: iadd, fadd, isub, fsub, imul, fmul, sdiv, udiv, fdiv, srem, urem, frem, shl, lshr, ashr, and, or, xor. Depending on the type and signedness, maps to either integer or floating point binary operations in LLVM.
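Signedness is carried by the operation rather than by the integer type, which is why division, remainder and right shift each come in more than one flavour. The Scala sketch below shows the same bit pattern giving different results under signed and unsigned interpretations; pairing each line with a NIR operation name is an assumption made by analogy with the LLVM instructions of the same name, and `BinSketch` is a hypothetical name.

```scala
object BinSketch {
  // Signed vs unsigned division of the same 32-bit pattern.
  val signedDiv: Int   = -8 / 2                                   // sdiv-like: -4
  val unsignedDiv: Int = java.lang.Integer.divideUnsigned(-8, 2)  // udiv-like: 2147483644

  // Signed vs unsigned remainder.
  val signedRem: Int   = -7 % 3                                     // srem-like: -1
  val unsignedRem: Int = java.lang.Integer.remainderUnsigned(-7, 3) // urem-like: 0

  // Arithmetic vs logical right shift.
  val arithShift: Int   = -8 >> 1   // ashr-like: sign bit is replicated, -4
  val logicalShift: Int = -8 >>> 1  // lshr-like: zero is shifted in, 2147483644
}
```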
comp
$comp[$type] $value1, $value2
Where $comp is one of the following: eq, neq, lt, lte, gt, gte. Depending on the type, maps to either icmp or fcmp with corresponding comparison flags in LLVM.
conv
$conv[$type] $value
Where $conv is one of the following: trunc, zext, sext, fptrunc, fpext, fptoui, fptosi, uitofp, sitofp, ptrtoint, inttoptr, bitcast. Corresponds to LLVM conversion instructions with the same name.
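A few of these conversions have familiar counterparts in Scala’s primitive conversions. The sketch below shows them for in-range values; matching each line to a conversion name is an assumption by analogy with LLVM, and out-of-range behaviour differs (for example, Scala’s floating-point to integer conversion saturates). `ConvSketch` is a hypothetical name.

```scala
object ConvSketch {
  // trunc-like: keep only the low 8 bits of 300 (0x12C), giving 0x2C.
  val truncated: Byte = 300.toByte

  // sext-like: widening a signed byte preserves its value.
  val signExtended: Int = (-1: Byte).toInt

  // zext-like: widening while treating the byte as unsigned.
  val zeroExtended: Int = (-1: Byte) & 0xFF

  // fptosi-like: truncation toward zero for in-range values.
  val floatToInt: Int = (-3.9).toInt

  // sitofp-like: exact for small integers.
  val intToFloat: Double = 7.toDouble

  // bitcast-like: reinterpret the bits of a float as an integer.
  val bits: Int = java.lang.Float.floatToRawIntBits(1.0f)
}
```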
sizeof
sizeof[$type]
Returns the size of the given type.
classalloc
classalloc @$name
Roughly corresponds to new $name in Scala. Performs allocation without calling the constructor.
field
field[$type] $value, @$name
Returns a pointer to the given field of given object.
method
method[$type] $value, @$name
Returns a pointer to the given method of given object.
dynmethod
dynmethod $obj, $signature
Returns a pointer to the given method of given object and signature.
as
as[$type] $value
Corresponds to $value.asInstanceOf[$type] in Scala.
is
is[$type] $value
Corresponds to $value.isInstanceOf[$type] in Scala.
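The two operations are typically used together: a type test guards a cast. A minimal Scala sketch of that source-level pattern follows; `CastSketch` and `lengthIfString` are hypothetical names.

```scala
object CastSketch {
  // Returns the length of `x` if it is a String, or -1 otherwise.
  def lengthIfString(x: Any): Int =
    if (x.isInstanceOf[String])      // corresponds to is[...]
      x.asInstanceOf[String].length  // corresponds to as[...]
    else -1
}
```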
Values
Boolean
true
false
Corresponds to LLVM’s true and false.
Zero and null
null
zero $type
Corresponds to LLVM’s null and zeroinitializer.
Integer
Ni8
Ni16
Ni32
Ni64
Correspond to LLVM’s integer values.
Float
N.Nf32
N.Nf64
Corresponds to LLVM’s floating point values.
Struct
struct @$name {..$values}
Corresponds to LLVM’s struct values.
Array
array $ty {..$values}
Corresponds to LLVM’s array value.
Local
%$name
Named reference to result of previously executed instructions or basic block parameters.
Global
@$name
Reference to the value of top-level definition.
Unit
unit
Corresponds to () in Scala.
Null
null
Corresponds to null literal in Scala.
String
"..."
Corresponds to string literal in Scala.
Attributes
Attributes allow one to attach additional metadata to definitions and instructions.
Inlining
mayinline
mayinline
Default state: optimiser is allowed to inline given method.
inlinehint
inlinehint
Optimiser is incentivized to inline given methods but it is allowed not to.
noinline
noinline
Optimiser must never inline given method.
alwaysinline
alwaysinline
Optimiser must always inline given method.
Linking
link
link($name)
Automatically put $name on a list of native libraries to link with if the given definition is reachable.
pin
pin(@$name)
Require $name to be reachable whenever the current definition is reachable. Used to introduce indirect linking dependencies. For example, module definitions depend on their constructors using this attribute.
pin-if
pin-if(@$name, @$cond)
Require $name to be reachable if the current and $cond definitions are both reachable. Used to introduce conditional indirect linking dependencies. For example, class constructors conditionally depend on methods overridden in the given class if the methods being overridden are reachable.
pin-weak
pin-weak(@$name)
Require $name to be reachable if there is a reachable dynmethod with matching signature.
stub
stub
Indicates that the annotated method, class or module is only a stub without implementation. If the linker is configured with linkStubs = false, then these definitions will be ignored and a linking error will be reported. If linkStubs = true, these definitions will be linked.
Misc
dyn
dyn
Indication that a method can be called using a structural type dispatch.
pure
pure
Lets the optimiser assume that calls to the given method are effectively pure, meaning that if the same method is called twice with exactly the same argument values, the result of the first invocation can be reused without calling the method a second time.
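The effect of this assumption can be pictured as a source-level rewrite. The Scala sketch below shows a before and after pair that is equivalent only because the called method has no side effects; it illustrates the idea and is not a description of any specific optimiser pass. `PureSketch`, `square`, `original` and `reused` are hypothetical names.

```scala
object PureSketch {
  // A method that could safely carry the `pure` attribute:
  // no side effects, and the result depends only on the argument.
  def square(x: Int): Int = x * x

  // As written in the source: two calls with the same argument.
  def original(x: Int): Int = square(x) + square(x)

  // What an optimiser may produce when `square` is assumed pure:
  // one call whose result is reused.
  def reused(x: Int): Int = {
    val r = square(x)
    r + r
  }
}
```

If `square` also incremented a counter or printed a message, the two versions would no longer be interchangeable, which is why the attribute is an assumption the front end has to guarantee.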
extern
extern
Use C-friendly calling convention and don’t name-mangle given method.
override
override(@$name)
Attributed method overrides @$name method if @$name is reachable. $name must be defined in one of the super classes or traits of the parent class.