--- layout: post title: "Primitives in Rust are Weird (and Cool)" description: "but mostly weird." category: tags: [rust, c, java, python, x86] --- I wrote a really small Rust program a while back because I was curious. I was 100% convinced it couldn't possibly run: ```rust fn main() { println!("{}", 8.to_string()) } ``` And to my complete befuddlement, it compiled, ran, and produced a completely sensible output. The reason I was so surprised has to do with how Rust treats a special category of things I'm going to call *primitives*. In the current version of the Rust book, you'll see them referred to as [scalars][rust_scalar], and in older versions they'll be called [primitives][rust_primitive], but we're going to stick with the name *primitive* for the time being. Explaining why this program is so cool requires talking about a number of other programming languages, and keeping a consistent terminology makes things easier. **You've been warned:** this is going to be a tedious post about a relatively minor issue that involves Java, Python, C, and x86 Assembly. And also me pretending like I know what I'm talking about with assembly. # Defining primitives (Java) The reason I'm using the name *primitive* comes from how much of my life is Java right now. Spoiler alert: a lot of it. And for the most part I like Java, but I digress. In Java, there's a special name for some specific types of values: > ``` bool char byte short int long float double ``` They are referred to as [primitives][java_primitive]. And relative to the other bits of Java, they have two unique features. First, they don't have to worry about the [billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions); primitives in Java can never be `null`. Second: *they can't have instance methods*. Remember that Rust program from earlier? Java has no idea what to do with it: ```java class Main { public static void main(String[] args) { int x = 8; System.out.println(x.toString()); // Triggers a compiler error } } ``` The error is: ``` Main.java:5: error: int cannot be dereferenced System.out.println(x.toString()); ^ 1 error ``` Specifically, Java's [`Object`](https://docs.oracle.com/javase/10/docs/api/java/lang/Object.html) and things that inherit from it are pointers under the hood, and we have to dereference them before the fields and methods they define can be used. In contrast, *primitive types are just values* - there's nothing to be dereferenced. In memory, they're just a sequence of bits. If we really want, we can turn the `int` into an [`Integer`](https://docs.oracle.com/javase/10/docs/api/java/lang/Integer.html) and then dereference it, but it's a bit wasteful: ```java class Main { public static void main(String[] args) { int x = 8; Integer y = Integer.valueOf(x); System.out.println(y.toString()); } } ``` This creates the variable `y` of type `Integer` (which inherits `Object`), and at run time we dereference `y` to locate the `toString()` function and call it. Rust obviously handles things a bit differently, but we have to dig into the low-level details to see it in action. # Low Level Handling of Primitives (C) We first need to build a foundation for reading and understanding the assembly code the final answer requires. Let's begin with showing how the `C` language (and your computer) thinks about "primitive" values in memory: ```c void my_function(int num) {} int main() { int x = 8; my_function(x); } ``` The [compiler explorer](https://godbolt.org/z/lgNYcc) gives us an easy way of showing off the assembly-level code that's generated: whose output has been lightly edited ```nasm main: push rbp mov rbp, rsp sub rsp, 16 ; We assign the value `8` to `x` here mov DWORD PTR [rbp-4], 8 ; And copy the bits making up `x` to a location ; `my_function` can access (`edi`) mov eax, DWORD PTR [rbp-4] mov edi, eax ; Call `my_function` and give it control call my_function mov eax, 0 leave ret my_function: push rbp mov rbp, rsp ; Copy the bits out of the pre-determined location (`edi`) ; to somewhere we can use mov DWORD PTR [rbp-4], edi nop pop rbp ret ``` At a really low level of memory, we're copying bits around using the [`mov`][x86_guide] instruction; nothing crazy. But to show how similar Rust is, let's take a look at our program translated from C to Rust: ```rust fn my_function(x: i32) {} fn main() { let x = 8; my_function(x) } ``` And the assembly generated when we stick it in the [compiler explorer](https://godbolt.org/z/cAlmk0): again, lightly edited ```nasm example::main: push rax ; Look familiar? We're copying bits to a location for `my_function` ; The compiler just optimizes out holding `x` in memory mov edi, 8 ; Call `my_function` and give it control call example::my_function pop rax ret example::my_function: sub rsp, 4 ; And copying those bits again, just like in C mov dword ptr [rsp], edi add rsp, 4 ret ``` The generated Rust assembly is functionally pretty close to the C assembly: *When working with primitives, we're just dealing with bits in memory*. In Java we have to dereference a pointer to call its functions; in Rust, there's no pointer to dereference. So what exactly is going on with this `.to_string()` function call? # impl primitive (and Python) Now it's time to reveal my trap card show the revelation that tied all this together: *Rust has implementations for its primitive types.* That's right, `impl` blocks aren't only for `structs` and `traits`, primitives get them too. Don't believe me? Check out [u32](https://doc.rust-lang.org/std/primitive.u32.html), [f64](https://doc.rust-lang.org/std/primitive.f64.html) and [char](https://doc.rust-lang.org/std/primitive.char.html) as examples. But the really interesting bit is how Rust turns those `impl` blocks into assembly. Let's break out the [compiler explorer](https://godbolt.org/z/6LBEwq) once again: ```rust pub fn main() { 8.to_string() } ``` And the interesting bits in the assembly: heavily trimmed down ```nasm example::main: sub rsp, 24 mov rdi, rsp lea rax, [rip + .Lbyte_str.u] mov rsi, rax ; Cool stuff right here call ::to_string@PLT mov rdi, rsp call core::ptr::drop_in_place add rsp, 24 ret ``` Now, this assembly is a bit more complicated, but here's the big revelation: **we're calling `to_string()` as a function that exists all on its own, and giving it the instance of `8`**. Instead of thinking of the value 8 as an instance of `u32` and then peeking in to find the location of the function we want to call (like Java), we have a function that exists outside of the instance and just give that function the value `8`. This is an incredibly technical detail, but the interesting idea I had was this: *if `to_string()` is a static function, can I refer to the unbound function and give it an instance?* Better explained in code (and a [compiler explorer](https://godbolt.org/z/fJY-gA) link because I seriously love this thing): ```rust struct MyVal { x: u32 } impl MyVal { fn to_string(&self) -> String { self.x.to_string() } } pub fn main() { let my_val = MyVal { x: 8 }; // THESE ARE THE SAME my_val.to_string(); MyVal::to_string(&my_val); } ``` Rust is totally fine "binding" the function call to the instance, and also as a static. MIND == BLOWN. Python does the same thing where I can both call functions bound to their instances and also call as an unbound function where I give it the instance: ```python class MyClass(): x = 24 def my_function(self): print(self.x) m = MyClass() m.my_function() MyClass.my_function(m) ``` And Python tries to make you *think* that primitives can have instance methods... ```python >>> dir(8) ['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__', '__delattr__', '__div__', '__divmod__', '__doc__', '__float__', '__floordiv__', ... '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', ...] >>> # Theoretically `8.__str__()` should exist, but: >>> 8.__str__() File "", line 1 8.__str__() ^ SyntaxError: invalid syntax >>> # It will run if we assign it first though: >>> x = 8 >>> x.__str__() '8' ``` ...but in practice it's a bit complicated. So while Python handles binding instance methods in a way similar to Rust, it's still not able to run the example we started with. # Conclusion This was a super-roundabout way of demonstrating it, but the way Rust handles incredibly minor details like primitives leads to really cool effects. Primitives are optimized like C in how they have a space-efficient memory layout, yet the language still has a lot of features I enjoy in Python (like both instance and late binding). And when you put it together, there are areas where Rust does cool things nobody else can; as a quirky feature of Rust's type system, `8.to_string()` is actually valid code. Now go forth and fool your friends into thinking you know assembly. This is all I've got. [x86_guide]: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html [java_primitive]: https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html [rust_scalar]: https://doc.rust-lang.org/book/second-edition/ch03-02-data-types.html#scalar-types [rust_primitive]: https://doc.rust-lang.org/book/first-edition/primitive-types.html