Native Intermediate Representation ¶

NIR is a high-level, object-oriented, SSA-based representation. The core of the representation is a subset of LLVM instructions, types and values, augmented with a number of high-level primitives that are necessary to efficiently compile modern languages like Scala.

Contents

Native Intermediate Representation
- Introduction
- Definitions
  - Var
  - Const
  - Declare
  - Define
  - Struct
  - Trait
  - Class
  - Module
- Types
  - Void
  - Vararg
  - Pointer
  - Boolean
  - Integer
  - Float
  - Array
  - Function
  - Struct
  - Unit
  - Nothing
  - Class
  - Trait
  - Module
- Control-Flow
  - unreachable
  - ret
  - jump
  - if
  - switch
  - invoke
  - throw
  - try
- Operands
  - call
  - load
  - store
  - elem
  - extract
  - insert
  - stackalloc
  - bin
  - comp
  - conv
  - sizeof
  - classalloc
  - field
  - method
  - dynmethod
  - as
  - is
- Values
  - Boolean
  - Zero and null
  - Integer
  - Float
  - Struct
  - Array
  - Local
  - Global
  - Unit
  - Null
  - String
- Attributes
  - Inlining
  - Linking
    - link
    - pin
    - pin-if
    - pin-weak
    - stub
  - Misc
    - dyn
    - pure
    - extern
    - override

Introduction ¶

Let's have a look at the textual form of NIR generated for a simple Scala module:

object Test {
  def main(args: Array[String]): Unit =
    println("Hello, world!")
}

This would map to:

pin(@Test$::init) module @Test$ : @java.lang.Object

def @Test$::main_class.ssnr.ObjectArray_unit : (module @Test$, class @scala.scalanative.runtime.ObjectArray) => unit {
  %src.2(%src.0 : module @Test$, %src.1 : class @scala.scalanative.runtime.ObjectArray):
    %src.3 = module @scala.Predef$
    %src.4 = method %src.3 : module @scala.Predef$, @scala.Predef$::println_class.java.lang.Object_unit
    %src.5 = call[(module @scala.Predef$, class @java.lang.Object) => unit] %src.4 : ptr(%src.3 : module @scala.Predef$, "Hello, world!")
    ret %src.5 : unit
}

def @Test$::init : (module @Test$) => unit {
  %src.1(%src.0 : module @Test$):
    %src.2 = call[(class @java.lang.Object) => unit] @java.lang.Object::init : ptr(%src.0 : module @Test$)
    ret unit
}

Here we can see a few distinctive features of the representation:

- At its core, NIR is very much a classical SSA-based representation. The code consists of basic blocks of instructions. Instructions take value and type parameters. Control-flow instructions can only appear as the last instruction of a basic block.
- Basic blocks have parameters. Parameters directly correspond to phi instructions in classical SSA.
- The representation is strongly typed. All parameters have explicit type annotations. Instructions may be overloaded for different types via type parameters.
- Unlike LLVM, it has support for high-level object-oriented features such as garbage-collected classes, traits and modules. They may contain methods and fields. There is no overloading and there are no access control modifiers, so names must be mangled appropriately.
- All definitions live in a single top-level scope indexed by globally unique names. During compilation they are lazily loaded until all reachable definitions have been discovered. The pin and pin-if attributes are used to express additional dependencies.
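
The correspondence between block parameters and phi instructions can be sketched in plain Scala by writing each basic block as a local function: the function's parameters play the role of block parameters, and a tail call plays the role of jump. This is only an analogy, not compiler output; the names BlockParams, sumTo, loop and exit are made up for the illustration.

```scala
import scala.annotation.tailrec

object BlockParams {
  // Computes 1 + 2 + ... + n.
  def sumTo(n: Int): Int = {
    // Exit block: takes one parameter and returns it (like `ret`).
    def exit(result: Int): Int = result

    // Loop block with two parameters. Classical SSA would instead place two
    // phi instructions at the top of this block, one for `i` and one for `acc`.
    @tailrec
    def loop(i: Int, acc: Int): Int =
      if (i <= n) loop(i + 1, acc + i) // like: jump loop(i + 1, acc + i)
      else exit(acc)                   // like: jump exit(acc)

    // Entry block: jumps to the loop with initial values for its parameters.
    loop(1, 0)
  }
}
```

Each call site supplies the values that a phi instruction would otherwise have to select based on the predecessor block.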

Definitions ¶

Var ¶

..$attrs var @$name: $type = $value

Corresponds to LLVM’s global variables when used in the top-level scope, and to fields when used as a member of classes and modules.

Const ¶

..$attrs const @$name: $type = $value

Corresponds to LLVM’s global constant. Constants may only reside at the top level and cannot be members of classes and modules.

Declare ¶

..$attrs def @$name: $type

Corresponds to LLVM’s declare when used at the top level of the compilation unit, and to abstract methods when used inside classes and traits.

Define ¶

..$attrs def @$name: $type { ..$blocks }

Corresponds to LLVM’s define when used on the top-level of the compilation unit and to normal methods when used inside classes, traits and modules.

Struct ¶

..$attrs struct @$name { ..$types }

Corresponds to LLVM’s named struct.

Trait ¶

..$attrs trait @$name : ..$traits

Scala-like traits. May contain abstract and concrete methods as members.

Class ¶

..$attrs class @$name : $parent, ..$traits

Scala-like classes. May contain vars, abstract and concrete methods as members.

Module ¶

..$attrs module @$name : $parent, ..$traits

Scala-like modules (i.e. object $name). May only contain vars and concrete methods as members.
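
As a rough illustration of how Scala source constructs line up with the definition kinds above, the following sketch annotates a small program with the kind of definition each line would plausibly give rise to. The comments are written by hand from the grammar in this section; real NIR output uses mangled names and carries attributes, so the exact text is an assumption. Shape, Square and Shapes are hypothetical names.

```scala
object DefinitionKinds {
  // trait @Shape
  trait Shape {
    def area: Double                      // declare: abstract method, no body
    def doubled: Double = area * 2        // define: concrete method
  }

  // class @Square : @java.lang.Object, @Shape
  class Square(var side: Double) extends Shape { // var: field member
    def area: Double = side * side               // define: concrete method
  }

  // module @Shapes$ : @java.lang.Object
  object Shapes {
    var created: Int = 0                  // var: field member of a module
    def unitSquare(): Square = {          // define: modules hold only concrete methods
      created += 1
      new Square(1.0)
    }
  }
}
```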

Types ¶

Void ¶

void

Corresponds to LLVM’s void.

Vararg ¶

...

Corresponds to LLVM’s varargs. May only be nested inside function types.

Pointer ¶

ptr

Corresponds to LLVM’s pointer type, with the major distinction of not preserving the type of the memory being pointed at. Pointers are going to become untyped in LLVM in the near future too.

Boolean ¶

bool

Corresponds to LLVM’s i1.

Integer ¶

i8
i16
i32
i64

Corresponds to LLVM integer types. Unlike LLVM, NIR does not support arbitrary-width integer types at the moment.

Float ¶

f32
f64

Corresponds to LLVM’s floating point types.

Array ¶

[$type x N]

Corresponds to LLVM’s aggregate array type.

Function ¶

(..$args) => $ret

Corresponds to LLVM’s function type.

Struct ¶

struct @$name
struct { ..$types }

Has two forms: named and anonymous. Corresponds to LLVM’s aggregate structure type.

Unit ¶

unit

A reference type that corresponds to scala.Unit.

Nothing ¶

nothing

Corresponds to scala.Nothing. May only be used as a function return type.
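
At the Scala level, Nothing is the return type of methods that never return normally, which is consistent with restricting it to function return types. A minimal sketch (NothingExample, fail and safeDiv are hypothetical names):

```scala
object NothingExample {
  // Never returns normally, so its return type is Nothing.
  def fail(msg: String): Nothing =
    throw new IllegalArgumentException(msg)

  // Nothing is a subtype of every type, so `fail` fits where an Int is expected.
  def safeDiv(a: Int, b: Int): Int =
    if (b == 0) fail("division by zero") else a / b
}
```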

Class ¶

class @$name

A reference to a class instance.

Trait ¶

trait @$name

A reference to a trait instance.

Module ¶

module @$name

A reference to a module.

Control-Flow ¶

unreachable ¶

unreachable

If execution reaches an unreachable instruction, the behaviour is undefined from that point on. Corresponds to LLVM’s unreachable.

ret ¶

ret $value

Returns a value. Corresponds to LLVM’s ret.

jump ¶

jump $next(..$values)

Jumps to the next basic block with the provided values for its parameters. Corresponds to the unconditional form of LLVM’s br.

if ¶

if $cond then $next1(..$values1) else $next2(..$values2)

Conditionally jumps to one of the basic blocks. Corresponds to LLVM’s conditional form of br.

switch ¶

switch $value {
   case $value1 => $next1(..$values1)
   ...
   default      => $nextN(..$valuesN)
}

Jumps to the basic block whose case value is equal to $value, or to the default block if none matches. Corresponds to LLVM’s switch.
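
In Scala source, the closest analogue is a match on integer literals with a wildcard case acting as the default. The sketch below only illustrates the shape of the construct; whether the compiler actually emits a switch for a particular match is an assumption, and SwitchExample and dayKind are hypothetical names.

```scala
object SwitchExample {
  def dayKind(day: Int): String = day match {
    case 5 => "friday"    // one case per literal value
    case 6 => "saturday"
    case 7 => "sunday"
    case _ => "weekday"   // plays the role of default
  }
}
```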

invoke ¶

invoke[$type] $ptr(..$values) to $success unwind $failure

Invokes the function pointer, jumps to $success if a value is returned, and unwinds to $failure if an exception is thrown. Corresponds to LLVM’s invoke.

throw ¶

throw $value

Throws the value and starts unwinding.

try ¶

try $succ catch $failure

Operands ¶

All non-control-flow instructions follow the general pattern %$name = $opname[..$types] ..$values. Purely side-effecting operands like store produce a unit value.

call ¶

call[$type] $ptr(..$values)

Calls the given function with the given function type and argument values. Corresponds to LLVM’s call.

load ¶

load[$type] $ptr

Load value of given type from memory. Corresponds to LLVM’s load.

store ¶

store[$type] $ptr, $value

Store value of given type to memory. Corresponds to LLVM’s store.

elem ¶

elem[$type] $ptr, ..$indexes

Compute derived pointer starting from given pointer. Corresponds to LLVM’s getelementptr.

extract ¶

extract[$type] $aggrvalue, $index

Extract element from aggregate value. Corresponds to LLVM’s extractvalue.

insert ¶

insert[$type] $aggrvalue, $value, $index

Create a new aggregate value based on existing one with element at index replaced with new value. Corresponds to LLVM’s insertvalue.

stackalloc ¶

stackalloc[$type]()

Stack allocate a slot of memory big enough to store given type. Corresponds to LLVM’s alloca.

bin ¶

$bin[$type] $value1, $value2

Where $bin is one of the following: iadd, fadd, isub, fsub, imul, fmul, sdiv, udiv, fdiv, srem, urem, frem, shl, lshr, ashr, and, or, xor. Depending on the type and signedness, maps to either integer or floating point binary operations in LLVM.
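
The signed and unsigned variants differ only in how the same bits are interpreted. The Scala sketch below shows that difference using standard JVM operations; the pairing of each Scala expression with an operation name in the comments illustrates the semantics and is not a statement about what the compiler emits. BinOps and its members are hypothetical names.

```scala
object BinOps {
  val a: Int = -8

  val sdiv: Int = a / 2                                  // signed division
  val udiv: Int = java.lang.Integer.divideUnsigned(a, 2) // same bits read as unsigned

  val ashr: Int = a >> 1   // arithmetic shift right: keeps the sign bit
  val lshr: Int = a >>> 1  // logical shift right: shifts in a zero

  val srem: Int = -7 % 3                                     // signed remainder
  val urem: Int = java.lang.Integer.remainderUnsigned(-7, 3) // unsigned remainder
}
```

Here sdiv and ashr both evaluate to -4, while udiv and lshr both evaluate to 2147483644.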

comp ¶

$comp[$type] $value1, $value2

Where $comp is one of the following: eq, neq, lt, lte, gt, gte. Depending on the type, maps to either icmp or fcmp with corresponding comparison flags in LLVM.

conv ¶

$conv[$type] $value

Where $conv is one of the following: trunc, zext, sext, fptrunc, fpext, fptoui, fptosi, uitofp, sitofp, ptrtoint, inttoptr, bitcast. Corresponds to LLVM conversion instructions with the same name.
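
For readers more familiar with Scala than with LLVM, the following sketch shows Scala expressions whose results match several of these conversions for in-range values. The pairing in the comments is an illustration of the semantics only; in particular, Scala's floating point to integer conversions are defined for out-of-range inputs, which LLVM's fptosi is not. ConvOps and its members are hypothetical names.

```scala
object ConvOps {
  val b: Byte = -1

  val trunc: Byte = 300.toByte   // keeps the low 8 bits of 0x12C
  val sext: Int = b.toInt        // sign extension preserves the value -1
  val zext: Int = b & 0xFF       // zero extension reads the bits as unsigned
  val fptosi: Int = 3.9.toInt    // rounds toward zero
  val sitofp: Double = 7.toDouble
  val fpext: Double = 1.5f.toDouble
  val fptrunc: Float = 1.5.toFloat

  // Reinterprets the bits of a float as an integer, similar in spirit to bitcast.
  val bitcast: Int = java.lang.Float.floatToRawIntBits(1.0f)
}
```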

sizeof ¶

sizeof[$type]

Returns the size of the given type.

classalloc ¶

classalloc @$name

Roughly corresponds to new $name in Scala. Performs allocation without calling the constructor.

field ¶

field[$type] $value, @$name

Returns a pointer to the given field of given object.

method ¶

method[$type] $value, @$name

Returns a pointer to the given method of given object.

dynmethod ¶

dynmethod $obj, $signature

Returns a pointer to the method of the given object that matches the given signature.

as ¶

as[$type] $value

Corresponds to $value.asInstanceOf[$type] in Scala.

is ¶

is[$type] $value

Corresponds to $value.isInstanceOf[$type] in Scala.
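
A short Scala sketch of the two source-level operations that as and is correspond to. The NIR fragments in the comments are hand-written from the syntax above and are an assumption, not compiler output; TypeTests and describe are hypothetical names.

```scala
object TypeTests {
  def describe(value: Any): String =
    if (value.isInstanceOf[String]) {     // is[class @java.lang.String] %value
      val s = value.asInstanceOf[String]  // as[class @java.lang.String] %value
      s"string of length ${s.length}"
    } else "not a string"
}
```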

Values ¶

Boolean ¶

true
false

Corresponds to LLVM’s true and false.

Zero and null ¶

null
zero $type

Corresponds to LLVM’s null and zeroinitializer.

Integer ¶

Ni8
Ni16
Ni32
Ni64

Corresponds to LLVM’s integer values.

Float ¶

N.Nf32
N.Nf64

Corresponds to LLVM’s floating point values.

Struct ¶

struct @$name {..$values}

Corresponds to LLVM’s struct values.

Array ¶

array $type {..$values}

Corresponds to LLVM’s array value.

Local ¶

%$name

Named reference to the result of a previously executed instruction or to a basic block parameter.

Global ¶

@$name

Reference to the value of a top-level definition.

Unit ¶

unit

Corresponds to () in Scala.

Null ¶

null

Corresponds to null literal in Scala.

String ¶

"..."

Corresponds to string literal in Scala.

Attributes ¶

Attributes allow one to attach additional metadata to definitions and instructions.

Inlining ¶

mayinline ¶

mayinline

Default state: optimiser is allowed to inline given method.

inlinehint ¶

inlinehint

Optimiser is encouraged to inline the given method, but is allowed not to.

noinline ¶

noinline

Optimiser must never inline given method.

alwaysinline ¶

alwaysinline

Optimiser must always inline given method.

Linking ¶

link ¶

link($name)

Automatically put $name on a list of native libraries to link with if the given definition is reachable.

pin ¶

pin(@$name)

Require $name to be reachable whenever the current definition is reachable. Used to introduce indirect linking dependencies. For example, module definitions depend on their constructors using this attribute.

pin-if ¶

pin-if(@$name, @$cond)

Require $name to be reachable if the current and $cond definitions are both reachable. Used to introduce conditional indirect linking dependencies. For example, class constructors conditionally depend on the methods overridden in the given class if the methods being overridden are reachable.
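
The way pin and pin-if extend the set of reachable definitions can be modelled as a small fixed-point computation. The sketch below is a toy model, not the actual linker: the Defn case class, its fields, the reachable function and the demo data are all hypothetical, and definitions are identified by plain strings.

```scala
import scala.annotation.tailrec

object Reachability {
  // Hypothetical, simplified view of a definition (not the real linker type).
  final case class Defn(
      refs: Set[String] = Set.empty,            // names referenced directly by the body
      pins: Set[String] = Set.empty,            // pin(@name)
      pinIfs: Set[(String, String)] = Set.empty // pin-if(@name, @cond)
  )

  // Grows the reachable set from `entry` until nothing new is discovered.
  def reachable(defns: Map[String, Defn], entry: String): Set[String] = {
    @tailrec
    def loop(seen: Set[String]): Set[String] = {
      val next = seen.flatMap { name =>
        val d = defns.getOrElse(name, Defn())
        // A pin-if target is required only once its condition is reachable too.
        val conditional = d.pinIfs.collect {
          case (target, cond) if seen(cond) => target
        }
        d.refs ++ d.pins ++ conditional + name
      }
      if (next == seen) seen else loop(next)
    }
    loop(Set(entry))
  }

  // Module M pins its constructor; C's constructor conditionally pins
  // C's overrides of P.foo and P.bar.
  val demo: Map[String, Defn] = Map(
    "main"   -> Defn(refs = Set("M", "C.init", "P.foo")),
    "M"      -> Defn(pins = Set("M.init")),
    "C.init" -> Defn(pinIfs = Set(("C.foo", "P.foo"), ("C.bar", "P.bar")))
  )
}
```

With this data, reachable(demo, "main") includes M.init (pinned unconditionally) and C.foo (its condition P.foo is reachable), but not C.bar, because P.bar is never reached.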

pin-weak ¶

pin-weak(@$name)

Require $name to be reachable if there is a reachable dynmethod with matching signature.

stub ¶

stub

Indicates that the annotated method, class or module is only a stub without implementation. If the linker is configured with linkStubs = false, then these definitions will be ignored and a linking error will be reported. If linkStubs = true, these definitions will be linked.

Misc ¶

dyn ¶

dyn

Indicates that a method can be called using structural type dispatch.

pure ¶

pure

Lets the optimiser assume that calls to the given method are effectively pure: if the same method is called twice with exactly the same argument values, the result of the first invocation can be reused without calling the method a second time.
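
The kind of rewrite this permits can be sketched at the Scala level. In the example, twice and reused are interchangeable only because square has no side effects; the same rewrite applied to countedSquare would change the observable value of calls. All names here are hypothetical, and the sketch shows the reasoning rather than what any particular optimiser pass does.

```scala
object PureExample {
  // No side effects: a reasonable candidate for the pure attribute.
  def square(x: Int): Int = x * x

  // Before: two calls with the same argument.
  def twice(x: Int): Int = square(x) + square(x)

  // After: the first result is reused, which is valid because square is pure.
  def reused(x: Int): Int = { val s = square(x); s + s }

  // Not pure: every call is observable through the counter.
  var calls: Int = 0
  def countedSquare(x: Int): Int = { calls += 1; x * x }
}
```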

extern ¶

extern

Use C-friendly calling convention and don’t name-mangle given method.

override ¶

override(@$name)

The attributed method overrides the @$name method if @$name is reachable. $name must be defined in one of the superclasses or traits of the parent class.