mirror of
				https://github.com/bspeice/speice.io
				synced 2025-11-03 18:10:32 -05:00 
			
		
		
		
	First posts from speice.io
This commit is contained in:
		
							
								
								
									
										323
									
								
								blog/2018-09-01-primitives-in-rust-are-weird/index.mdx
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
										323
									
								
								blog/2018-09-01-primitives-in-rust-are-weird/index.mdx
									
									
									
									
									
										Normal file
									
								
							@ -0,0 +1,323 @@
 | 
			
		||||
---
 | 
			
		||||
slug: 2018/09/primitives-in-rust-are-weird
 | 
			
		||||
title: "Primitives in Rust are weird (and cool)"
 | 
			
		||||
date: 2018-09-01 12:00:00
 | 
			
		||||
authors: [bspeice]
 | 
			
		||||
tags: []
 | 
			
		||||
---
 | 
			
		||||
 | 
			
		||||
I wrote a really small Rust program a while back because I was curious. I was 100% convinced it
 | 
			
		||||
couldn't possibly run:
 | 
			
		||||
 | 
			
		||||
```rust
 | 
			
		||||
fn main() {
 | 
			
		||||
    println!("{}", 8.to_string())
 | 
			
		||||
}
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
And to my complete befuddlement, it compiled, ran, and produced a completely sensible output.
 | 
			
		||||
 | 
			
		||||
<!-- truncate -->
 | 
			
		||||
 | 
			
		||||
The reason I was so surprised has to do with how Rust treats a special category of things I'm going to
 | 
			
		||||
call _primitives_. In the current version of the Rust book, you'll see them referred to as
 | 
			
		||||
[scalars][rust_scalar], and in older versions they'll be called [primitives][rust_primitive], but
 | 
			
		||||
we're going to stick with the name _primitive_ for the time being. Explaining why this program is so
 | 
			
		||||
cool requires talking about a number of other programming languages, and keeping a consistent
 | 
			
		||||
terminology makes things easier.
 | 
			
		||||
 | 
			
		||||
**You've been warned:** this is going to be a tedious post about a relatively minor issue that
 | 
			
		||||
involves Java, Python, C, and x86 Assembly. And also me pretending like I know what I'm talking
 | 
			
		||||
about with assembly.
 | 
			
		||||
 | 
			
		||||
## Defining primitives (Java)
 | 
			
		||||
 | 
			
		||||
The reason I'm using the name _primitive_ comes from how much of my life is Java right now. For the most part I like Java, but I digress. In Java, there's a special
 | 
			
		||||
name for some specific types of values:
 | 
			
		||||
 | 
			
		||||
> ```
 | 
			
		||||
> bool    char    byte
 | 
			
		||||
> short   int     long
 | 
			
		||||
> float   double
 | 
			
		||||
> ```
 | 
			
		||||
 | 
			
		||||
They are referred to as [primitives][java_primitive]. And relative to the other bits of Java,
 | 
			
		||||
they have two unique features. First, they don't have to worry about the
 | 
			
		||||
[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions);
 | 
			
		||||
primitives in Java can never be `null`. Second: *they can't have instance methods*.
 | 
			
		||||
Remember that Rust program from earlier? Java has no idea what to do with it:
 | 
			
		||||
 | 
			
		||||
```java
 | 
			
		||||
class Main {
 | 
			
		||||
    public static void main(String[] args) {
 | 
			
		||||
        int x = 8;
 | 
			
		||||
        System.out.println(x.toString()); // Triggers a compiler error
 | 
			
		||||
    }
 | 
			
		||||
}
 | 
			
		||||
````
 | 
			
		||||
 | 
			
		||||
The error is:
 | 
			
		||||
 | 
			
		||||
```
 | 
			
		||||
Main.java:5: error: int cannot be dereferenced
 | 
			
		||||
        System.out.println(x.toString());
 | 
			
		||||
                            ^
 | 
			
		||||
1 error
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Specifically, Java's [`Object`](https://docs.oracle.com/javase/10/docs/api/java/lang/Object.html)
 | 
			
		||||
and things that inherit from it are pointers under the hood, and we have to dereference them before
 | 
			
		||||
the fields and methods they define can be used. In contrast, _primitive types are just values_ -
 | 
			
		||||
there's nothing to be dereferenced. In memory, they're just a sequence of bits.
 | 
			
		||||
 | 
			
		||||
If we really want, we can turn the `int` into an
 | 
			
		||||
[`Integer`](https://docs.oracle.com/javase/10/docs/api/java/lang/Integer.html) and then dereference
 | 
			
		||||
it, but it's a bit wasteful:
 | 
			
		||||
 | 
			
		||||
```java
 | 
			
		||||
class Main {
 | 
			
		||||
    public static void main(String[] args) {
 | 
			
		||||
        int x = 8;
 | 
			
		||||
        Integer y = Integer.valueOf(x);
 | 
			
		||||
        System.out.println(y.toString());
 | 
			
		||||
    }
 | 
			
		||||
}
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
This creates the variable `y` of type `Integer` (which inherits `Object`), and at run time we
 | 
			
		||||
dereference `y` to locate the `toString()` function and call it. Rust obviously handles things a bit
 | 
			
		||||
differently, but we have to dig into the low-level details to see it in action.
 | 
			
		||||
 | 
			
		||||
## Low Level Handling of Primitives (C)
 | 
			
		||||
 | 
			
		||||
We first need to build a foundation for reading and understanding the assembly code the final answer
 | 
			
		||||
requires. Let's begin with showing how the `C` language (and your computer) thinks about "primitive"
 | 
			
		||||
values in memory:
 | 
			
		||||
 | 
			
		||||
```c
 | 
			
		||||
void my_function(int num) {}
 | 
			
		||||
 | 
			
		||||
int main() {
 | 
			
		||||
    int x = 8;
 | 
			
		||||
    my_function(x);
 | 
			
		||||
}
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
The [compiler explorer](https://godbolt.org/z/lgNYcc) gives us an easy way of showing off the
 | 
			
		||||
assembly-level code that's generated: <small>whose output has been lightly
 | 
			
		||||
edited</small>
 | 
			
		||||
 | 
			
		||||
```nasm
 | 
			
		||||
main:
 | 
			
		||||
        push    rbp
 | 
			
		||||
        mov     rbp, rsp
 | 
			
		||||
        sub     rsp, 16
 | 
			
		||||
 | 
			
		||||
        ; We assign the value `8` to `x` here
 | 
			
		||||
        mov     DWORD PTR [rbp-4], 8
 | 
			
		||||
 | 
			
		||||
        ; And copy the bits making up `x` to a location
 | 
			
		||||
        ; `my_function` can access (`edi`)
 | 
			
		||||
        mov     eax, DWORD PTR [rbp-4]
 | 
			
		||||
        mov     edi, eax
 | 
			
		||||
 | 
			
		||||
        ; Call `my_function` and give it control
 | 
			
		||||
        call    my_function
 | 
			
		||||
 | 
			
		||||
        mov     eax, 0
 | 
			
		||||
        leave
 | 
			
		||||
        ret
 | 
			
		||||
 | 
			
		||||
my_function:
 | 
			
		||||
        push    rbp
 | 
			
		||||
        mov     rbp, rsp
 | 
			
		||||
 | 
			
		||||
        ; Copy the bits out of the pre-determined location (`edi`)
 | 
			
		||||
        ; to somewhere we can use
 | 
			
		||||
        mov     DWORD PTR [rbp-4], edi
 | 
			
		||||
        nop
 | 
			
		||||
 | 
			
		||||
        pop     rbp
 | 
			
		||||
        ret
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
At a really low level of memory, we're copying bits around using the [`mov`][x86_guide] instruction;
 | 
			
		||||
nothing crazy. But to show how similar Rust is, let's take a look at our program translated from C
 | 
			
		||||
to Rust:
 | 
			
		||||
 | 
			
		||||
```rust
 | 
			
		||||
fn my_function(x: i32) {}
 | 
			
		||||
 | 
			
		||||
fn main() {
 | 
			
		||||
    let x = 8;
 | 
			
		||||
    my_function(x)
 | 
			
		||||
}
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
And the assembly generated when we stick it in the
 | 
			
		||||
[compiler explorer](https://godbolt.org/z/cAlmk0): <small>again, lightly
 | 
			
		||||
edited</small>
 | 
			
		||||
 | 
			
		||||
```nasm
 | 
			
		||||
example::main:
 | 
			
		||||
  push rax
 | 
			
		||||
 | 
			
		||||
  ; Look familiar? We're copying bits to a location for `my_function`
 | 
			
		||||
  ; The compiler just optimizes out holding `x` in memory
 | 
			
		||||
  mov edi, 8
 | 
			
		||||
 | 
			
		||||
  ; Call `my_function` and give it control
 | 
			
		||||
  call example::my_function
 | 
			
		||||
 | 
			
		||||
  pop rax
 | 
			
		||||
  ret
 | 
			
		||||
 | 
			
		||||
example::my_function:
 | 
			
		||||
  sub rsp, 4
 | 
			
		||||
 | 
			
		||||
  ; And copying those bits again, just like in C
 | 
			
		||||
  mov dword ptr [rsp], edi
 | 
			
		||||
 | 
			
		||||
  add rsp, 4
 | 
			
		||||
  ret
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
The generated Rust assembly is functionally pretty close to the C assembly: _When working with
 | 
			
		||||
primitives, we're just dealing with bits in memory_.
 | 
			
		||||
 | 
			
		||||
In Java we have to dereference a pointer to call its functions; in Rust, there's no pointer to
 | 
			
		||||
dereference. So what exactly is going on with this `.to_string()` function call?
 | 
			
		||||
 | 
			
		||||
## impl primitive (and Python)
 | 
			
		||||
 | 
			
		||||
Now it's time to <strike>reveal my trap card</strike> show the revelation that tied all this
 | 
			
		||||
together: _Rust has implementations for its primitive types._ That's right, `impl` blocks aren't
 | 
			
		||||
only for `structs` and `traits`, primitives get them too. Don't believe me? Check out
 | 
			
		||||
[u32](https://doc.rust-lang.org/std/primitive.u32.html),
 | 
			
		||||
[f64](https://doc.rust-lang.org/std/primitive.f64.html) and
 | 
			
		||||
[char](https://doc.rust-lang.org/std/primitive.char.html) as examples.
 | 
			
		||||
 | 
			
		||||
But the really interesting bit is how Rust turns those `impl` blocks into assembly. Let's break out
 | 
			
		||||
the [compiler explorer](https://godbolt.org/z/6LBEwq) once again:
 | 
			
		||||
 | 
			
		||||
```rust
 | 
			
		||||
pub fn main() {
 | 
			
		||||
    8.to_string()
 | 
			
		||||
}
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
And the interesting bits in the assembly: <small>heavily trimmed down</small>
 | 
			
		||||
 | 
			
		||||
```nasm
 | 
			
		||||
example::main:
 | 
			
		||||
  sub rsp, 24
 | 
			
		||||
  mov rdi, rsp
 | 
			
		||||
  lea rax, [rip + .Lbyte_str.u]
 | 
			
		||||
  mov rsi, rax
 | 
			
		||||
 | 
			
		||||
  ; Cool stuff right here
 | 
			
		||||
  call <T as alloc::string::ToString>::to_string@PLT
 | 
			
		||||
 | 
			
		||||
  mov rdi, rsp
 | 
			
		||||
  call core::ptr::drop_in_place
 | 
			
		||||
  add rsp, 24
 | 
			
		||||
  ret
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Now, this assembly is a bit more complicated, but here's the big revelation: **we're calling
 | 
			
		||||
`to_string()` as a function that exists all on its own, and giving it the instance of `8`**. Instead
 | 
			
		||||
of thinking of the value 8 as an instance of `u32` and then peeking in to find the location of the
 | 
			
		||||
function we want to call (like Java), we have a function that exists outside of the instance and
 | 
			
		||||
just give that function the value `8`.
 | 
			
		||||
 | 
			
		||||
This is an incredibly technical detail, but the interesting idea I had was this: _if `to_string()`
 | 
			
		||||
is a static function, can I refer to the unbound function and give it an instance?_
 | 
			
		||||
 | 
			
		||||
Better explained in code (and a [compiler explorer](https://godbolt.org/z/fJY-gA) link because I
 | 
			
		||||
seriously love this thing):
 | 
			
		||||
 | 
			
		||||
```rust
 | 
			
		||||
struct MyVal {
 | 
			
		||||
    x: u32
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
impl MyVal {
 | 
			
		||||
    fn to_string(&self) -> String {
 | 
			
		||||
        self.x.to_string()
 | 
			
		||||
    }
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
pub fn main() {
 | 
			
		||||
    let my_val = MyVal { x: 8 };
 | 
			
		||||
 | 
			
		||||
    // THESE ARE THE SAME
 | 
			
		||||
    my_val.to_string();
 | 
			
		||||
    MyVal::to_string(&my_val);
 | 
			
		||||
}
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Rust is totally fine "binding" the function call to the instance, and also as a static.
 | 
			
		||||
 | 
			
		||||
MIND == BLOWN.
 | 
			
		||||
 | 
			
		||||
Python does the same thing where I can both call functions bound to their instances and also call as
 | 
			
		||||
an unbound function where I give it the instance:
 | 
			
		||||
 | 
			
		||||
```python
 | 
			
		||||
class MyClass():
 | 
			
		||||
    x = 24
 | 
			
		||||
 | 
			
		||||
    def my_function(self):
 | 
			
		||||
        print(self.x)
 | 
			
		||||
 | 
			
		||||
m = MyClass()
 | 
			
		||||
 | 
			
		||||
m.my_function()
 | 
			
		||||
MyClass.my_function(m)
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
And Python tries to make you _think_ that primitives can have instance methods...
 | 
			
		||||
 | 
			
		||||
```python
 | 
			
		||||
>>> dir(8)
 | 
			
		||||
['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__',
 | 
			
		||||
'__delattr__', '__div__', '__divmod__', '__doc__', '__float__', '__floordiv__',
 | 
			
		||||
...
 | 
			
		||||
'__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__',
 | 
			
		||||
...]
 | 
			
		||||
 | 
			
		||||
>>> # Theoretically `8.__str__()` should exist, but:
 | 
			
		||||
 | 
			
		||||
>>> 8.__str__()
 | 
			
		||||
  File "<stdin>", line 1
 | 
			
		||||
    8.__str__()
 | 
			
		||||
             ^
 | 
			
		||||
SyntaxError: invalid syntax
 | 
			
		||||
 | 
			
		||||
>>> # It will run if we assign it first though:
 | 
			
		||||
>>> x = 8
 | 
			
		||||
>>> x.__str__()
 | 
			
		||||
'8'
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
...but in practice it's a bit complicated.
 | 
			
		||||
 | 
			
		||||
So while Python handles binding instance methods in a way similar to Rust, it's still not able to
 | 
			
		||||
run the example we started with.
 | 
			
		||||
 | 
			
		||||
## Conclusion
 | 
			
		||||
 | 
			
		||||
This was a super-roundabout way of demonstrating it, but the way Rust handles incredibly minor
 | 
			
		||||
details like primitives leads to really cool effects. Primitives are optimized like C in how they
 | 
			
		||||
have a space-efficient memory layout, yet the language still has a lot of features I enjoy in Python
 | 
			
		||||
(like both instance and late binding).
 | 
			
		||||
 | 
			
		||||
And when you put it together, there are areas where Rust does cool things nobody else can; as a
 | 
			
		||||
quirky feature of Rust's type system, `8.to_string()` is actually valid code.
 | 
			
		||||
 | 
			
		||||
Now go forth and fool your friends into thinking you know assembly. This is all I've got.
 | 
			
		||||
 | 
			
		||||
[x86_guide]: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html
 | 
			
		||||
[java_primitive]: https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
 | 
			
		||||
[rust_scalar]: https://doc.rust-lang.org/book/second-edition/ch03-02-data-types.html#scalar-types
 | 
			
		||||
[rust_primitive]: https://doc.rust-lang.org/book/first-edition/primitive-types.html
 | 
			
		||||
		Reference in New Issue
	
	Block a user