---
layout: post
title: "Primitives in Rust are Weird (and Cool)"
description: "but mostly weird."
category:
tags: [rust, c, java, python, x86]
---

I wrote a really small Rust program a while back because I was curious. I was 100% convinced it
couldn't possibly run:

```rust
fn main() {
    println!("{}", 8.to_string())
}
```

And to my complete befuddlement, it compiled, ran, and produced a completely sensible output.
The reason I was so surprised has to do with how Rust treats a special category of things
I'm going to call *primitives*. In the current version of the Rust book, you'll see them
referred to as [scalars][rust_scalar], and in older versions they'll be called [primitives][rust_primitive],
but we're going to stick with the name *primitive* for the time being. Explaining
why this program is so cool requires talking about a number of other programming languages,
and keeping a consistent terminology makes things easier.

**You've been warned:** this is going to be a tedious post about a relatively minor issue that involves
Java, Python, C, and x86 Assembly. And also me pretending like I know what I'm talking about with assembly.

# Defining primitives (Java)

The reason I'm using the name *primitive* comes from how much of my life is Java right now.
Spoiler alert: a lot of it. And for the most part I like Java, but I digress. In Java, there's a
special name for some specific types of values:

> ```
boolean   char    byte
short     int     long
float     double
```

They are referred to as [primitives][java_primitive]. And relative to the other bits of Java,
they have two unique features. First, they don't have to worry about the
[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions);
primitives in Java can never be `null`. Second: *they can't have instance methods*.
Remember that Rust program from earlier? Java has no idea what to do with it:

```java
class Main {
    public static void main(String[] args) {
        int x = 8;
        System.out.println(x.toString()); // Triggers a compiler error
    }
}
```

The error is:

```
Main.java:5: error: int cannot be dereferenced
        System.out.println(x.toString());
                            ^
1 error
```

The error makes sense once you know that Java's [`Object`](https://docs.oracle.com/javase/10/docs/api/java/lang/Object.html)
and things that inherit from it are pointers under the hood, and we have to dereference them before
the fields and methods they define can be used. In contrast, *primitive types are just values*;
there's nothing to be dereferenced. In memory, they're just a sequence of bits.

If we really want, we can turn the `int` into an
[`Integer`](https://docs.oracle.com/javase/10/docs/api/java/lang/Integer.html) and then
dereference it, but it's a bit wasteful:

```java
class Main {
    public static void main(String[] args) {
        int x = 8;
        Integer y = Integer.valueOf(x);
        System.out.println(y.toString());
    }
}
```

This creates the variable `y` of type `Integer` (which inherits `Object`), and at run time we
dereference `y` to locate the `toString()` function and call it. Rust obviously handles things a bit
differently, but we have to dig into the low-level details to see it in action.

# Low Level Handling of Primitives (C)

We first need to build a foundation for reading and understanding the assembly code the
final answer requires. Let's begin with showing how the `C` language (and your computer)
thinks about "primitive" values in memory:

```c
void my_function(int num) {}

int main() {
    int x = 8;
    my_function(x);
}
```

The [compiler explorer](https://godbolt.org/z/lgNYcc) gives us an easy way of showing off
the assembly-level code that's generated: <span style="font-size:.6em">whose output has been lightly edited</span>

```nasm
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16

        ; We assign the value `8` to `x` here
        mov     DWORD PTR [rbp-4], 8

        ; And copy the bits making up `x` to a location
        ; `my_function` can access (`edi`)
        mov     eax, DWORD PTR [rbp-4]
        mov     edi, eax

        ; Call `my_function` and give it control
        call    my_function

        mov     eax, 0
        leave
        ret

my_function:
        push    rbp
        mov     rbp, rsp

        ; Copy the bits out of the pre-determined location (`edi`)
        ; to somewhere we can use
        mov     DWORD PTR [rbp-4], edi
        nop

        pop     rbp
        ret
```

At the lowest level, we're just copying bits around using the [`mov`][x86_guide] instruction; nothing crazy.
But to show how similar Rust is, let's take a look at our program translated from C to Rust:

```rust
fn my_function(x: i32) {}

fn main() {
    let x = 8;
    my_function(x)
}
```

And the assembly generated when we stick it in the [compiler explorer](https://godbolt.org/z/cAlmk0):
<span style="font-size:.6em">again, lightly edited</span>

```nasm
example::main:
  push rax

  ; Look familiar? We're copying bits to a location for `my_function`
  ; The compiler just optimizes out holding `x` in memory
  mov edi, 8

  ; Call `my_function` and give it control
  call example::my_function

  pop rax
  ret

example::my_function:
  sub rsp, 4

  ; And copying those bits again, just like in C
  mov dword ptr [rsp], edi

  add rsp, 4
  ret
```

The generated Rust assembly is functionally pretty close to the C assembly:
*When working with primitives, we're just dealing with bits in memory*.
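
If you'd rather not take the assembly's word for it, here's a rough sketch that checks the same idea from inside Rust. `bits_of` is a helper I made up for the example, and `Box<i32>` is only my stand-in for Java's `Integer`, not an exact equivalent:

```rust
use std::mem::size_of;

// Hypothetical helper for this sketch: returns the raw bytes of an `i32`
// in the machine's native byte order.
fn bits_of(x: i32) -> [u8; 4] {
    x.to_ne_bytes()
}

fn main() {
    // An `i32` is exactly four bytes: no header, no pointer, just the value.
    assert_eq!(size_of::<i32>(), 4);

    // A boxed `i32` is a rough analogue of Java's `Integer`: the variable
    // holds a pointer, and the four bytes live on the heap.
    assert_eq!(size_of::<Box<i32>>(), size_of::<usize>());

    // Round-tripping through the raw bytes loses nothing.
    let x: i32 = 8;
    assert_eq!(i32::from_ne_bytes(bits_of(x)), x);

    println!("i32 is {} bytes", size_of::<i32>());
}
```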

In Java we have to dereference a pointer to call its functions; in Rust, there's no pointer to dereference. So what
exactly is going on with this `.to_string()` function call?

# impl primitive (and Python)

Now it's time to <strike>reveal my trap card</strike> show the revelation that tied all this together: *Rust has
implementations for its primitive types.* That's right, `impl` blocks aren't only for `struct`s and `trait`s;
primitives get them too. Don't believe me? Check out [u32](https://doc.rust-lang.org/std/primitive.u32.html),
[f64](https://doc.rust-lang.org/std/primitive.f64.html) and [char](https://doc.rust-lang.org/std/primitive.char.html)
as examples.
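
Here's a small sketch of what that looks like in practice. The first few calls use methods the standard library defines on primitives; `Describe` is a trait I invented for the example to show that we can attach our own methods to a primitive too:

```rust
// `Describe` is a made-up trait for this sketch; it is not part of `std`.
trait Describe {
    fn describe(&self) -> String;
}

// Nothing special is needed to implement our own trait for a primitive.
impl Describe for i32 {
    fn describe(&self) -> String {
        format!("the i32 value {}", self)
    }
}

fn main() {
    // Methods the standard library defines on primitive types.
    assert_eq!(2u32.pow(3), 8);
    assert_eq!(8u32.count_ones(), 1);
    assert_eq!(9.0f64.sqrt(), 3.0);
    assert!('a'.is_alphabetic());

    // And our own trait method, called directly on a literal.
    println!("{}", 8_i32.describe());
}
```

I've put type suffixes on the literals (`2u32`, `8_i32`) so the compiler doesn't have to guess which integer type is meant.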

But the really interesting bit is how Rust turns those `impl` blocks into assembly. Let's break out the
[compiler explorer](https://godbolt.org/z/6LBEwq) once again:

```rust
pub fn main() {
    8.to_string();
}
```

And the interesting bits in the assembly: <span style="font-size:.6em">heavily trimmed down</span>

```nasm
example::main:
  sub rsp, 24
  mov rdi, rsp
  lea rax, [rip + .Lbyte_str.u]
  mov rsi, rax

  ; Cool stuff right here
  call <T as alloc::string::ToString>::to_string@PLT

  mov rdi, rsp
  call core::ptr::drop_in_place
  add rsp, 24
  ret
```

Now, this assembly is a bit more complicated, but here's the big revelation: **we're calling
`to_string()` as a function that exists all on its own, and giving it the instance of `8`**.
Instead of thinking of the value 8 as an instance of `i32` (the type Rust picks for an integer
literal when nothing else constrains it) and then peeking in to find the location of the function
we want to call (like Java), we have a function that exists outside of the instance and just
give that function the value `8`.

This is an incredibly technical detail, but the interesting idea I had was this:
*if `to_string()` is a static function, can I refer to the unbound function and give it an instance?*

Better explained in code (and a [compiler explorer](https://godbolt.org/z/fJY-gA) link
because I seriously love this thing):

```rust
struct MyVal {
    x: u32
}

impl MyVal {
    fn to_string(&self) -> String {
        self.x.to_string()
    }
}

pub fn main() {
    let my_val = MyVal { x: 8 };

    // THESE ARE THE SAME
    my_val.to_string();
    MyVal::to_string(&my_val);
}
```

Rust is totally fine with either form: calling the function "bound" to the instance, or calling it
like a static function and handing it the instance.

MIND == BLOWN.
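
The same trick works on the primitive itself, not just on `MyVal`. A quick sketch, assuming nothing beyond the standard prelude (`stringify_all` is a name I made up for illustration):

```rust
// Hypothetical helper for this sketch: converts every number in a slice to
// a `String` by passing `to_string` around as an ordinary function.
fn stringify_all(nums: &[i32]) -> Vec<String> {
    // `iter()` yields `&i32`, which is exactly the `&self` that
    // `i32::to_string` expects, so no closure is needed.
    nums.iter().map(i32::to_string).collect()
}

fn main() {
    // Three spellings of the same call.
    let a = 8.to_string();
    let b = ToString::to_string(&8);
    let c = <i32 as ToString>::to_string(&8);
    assert_eq!(a, b);
    assert_eq!(b, c);

    // The function can also be stored in a variable like any other value.
    let f: fn(&i32) -> String = i32::to_string;
    assert_eq!(f(&8), "8");

    println!("{:?}", stringify_all(&[1, 2, 3]));
}
```

Being able to name the function without an instance is what makes the `map(i32::to_string)` line possible.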

Python does the same thing: I can call a function bound to its instance, or call it
unbound and give it the instance myself:

```python
class MyClass():
    x = 24

    def my_function(self):
        print(self.x)

m = MyClass()

m.my_function()
MyClass.my_function(m)
```

And Python tries to make you *think* that primitives can have instance methods...

```python
>>> dir(8)
['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__',
'__delattr__', '__div__', '__divmod__', '__doc__', '__float__', '__floordiv__',
...
'__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__',
...]

>>> # Theoretically `8.__str__()` should exist, but:

>>> 8.__str__()
  File "<stdin>", line 1
    8.__str__()
             ^
SyntaxError: invalid syntax

>>> # It will run if we assign it first though:
>>> x = 8
>>> x.__str__()
'8'
```

...but in practice it's a bit complicated.

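So while Python handles binding instance methods in a way similar to Rust, it can't run the example
we started with as written: the tokenizer reads `8.` as the start of a float literal, so the call
only parses once the literal is wrapped in parentheses, as in `(8).__str__()`.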
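
For contrast, here's a small sketch of how Rust reads the same kind of expression. As I understand it, a number followed by a dot and an identifier is treated as an integer plus a method call, not as the start of a float. `type_name_of` is a helper I made up to peek at the inferred type:

```rust
// Hypothetical helper for this sketch: reports the type the compiler
// inferred for a value.
fn type_name_of<T>(_: &T) -> &'static str {
    std::any::type_name::<T>()
}

fn main() {
    // Parsed as the integer `8`, a dot, and a method name.
    assert_eq!(8.to_string(), "8");

    // A float literal with its own digits after the dot works as well.
    assert_eq!(8.0.to_string(), "8");

    // With no other hints, the literals fall back to `i32` and `f64`.
    assert_eq!(type_name_of(&8), "i32");
    assert_eq!(type_name_of(&8.0), "f64");

    // A suffix picks a different type without changing the call.
    assert_eq!(8_u64.to_string(), "8");

    println!("{} {}", type_name_of(&8), type_name_of(&8.0));
}
```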

# Conclusion

This was a super-roundabout way of demonstrating it, but the way Rust handles incredibly minor details
like primitives leads to really cool effects. Primitives are optimized like C in how they have a
space-efficient memory layout, yet the language still has a lot of features I enjoy in Python
(like calling a method either bound to an instance or as a plain function that takes the instance).

And when you put it together, there are areas where Rust does cool things the other languages in this
post can't; as a quirky feature of Rust's type system, `8.to_string()` is actually valid code.

Now go forth and fool your friends into thinking you know assembly. This is all I've got.

[x86_guide]: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html
[java_primitive]: https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
[rust_scalar]: https://doc.rust-lang.org/book/second-edition/ch03-02-data-types.html#scalar-types
[rust_primitive]: https://doc.rust-lang.org/book/first-edition/primitive-types.html