Mirror of https://github.com/bspeice/speice.io (synced 2025-10-31 17:40:28 -04:00)

Almost-final-draft of primitives post
		| @ -1,12 +1,13 @@ | |||||||
| --- | --- | ||||||
| layout: post | layout: post | ||||||
| title: "Rust's primitives are Weird (and cool)" | title: "Rust's Primitives are Weird (and Cool)" | ||||||
| description: "but mostly weird." | description: "but mostly weird." | ||||||
| category:  | category:  | ||||||
| tags: [rust, c, java, python, x86] | tags: [rust, c, java, python, x86] | ||||||
| --- | --- | ||||||
|  |  | ||||||
| I wrote a really small Rust program a while back that I was 100% convinced couldn't possibly run: | I wrote a really small Rust program a while back because I was curious. I was 100% convinced it | ||||||
|  | couldn't possibly run: | ||||||
|  |  | ||||||
| ```rust | ```rust | ||||||
| fn main() { | fn main() { | ||||||
| @ -14,7 +15,7 @@ fn main() { | |||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| And to my complete befuddlement, it compiled, it ran, and it produced a completely sensible output. | And to my complete befuddlement, it compiled, ran, and produced a completely sensible output. | ||||||
| The reason I was so surprised has to do with how Rust treats a special category of things | The reason I was so surprised has to do with how Rust treats a special category of things | ||||||
| I'm going to call *primitives*. In the current version of the Rust book, you'll see them | I'm going to call *primitives*. In the current version of the Rust book, you'll see them | ||||||
| referred to as [scalars](rust_scalar), and in older versions they'll be called [primitives](rust_primitive). | referred to as [scalars](rust_scalar), and in older versions they'll be called [primitives](rust_primitive). | ||||||
| @ -23,24 +24,13 @@ why this program is so cool requires talking about a number of other programming | |||||||
| and keeping a consistent terminology makes things easier. | and keeping a consistent terminology makes things easier. | ||||||
|  |  | ||||||
| **You've been warned:** this is going to be a tedious post about a relatively minor issue that involves | **You've been warned:** this is going to be a tedious post about a relatively minor issue that involves | ||||||
| a quick jaunt all the way through Java, Python, C, and x86 Assembly, but demonstrates a really cool | Java, Python, C, and x86 Assembly. And also me pretending like I know what I'm talking about with assembly. | ||||||
| way that Rust thinks differently about the world. |  | ||||||
|  |  | ||||||
| But because I'm not a monster, here's someone else who's just as excited as you are to learn about |  | ||||||
| primitives: |  | ||||||
|  |  | ||||||
|  |  | ||||||
| > [Unreasonably excited doggo][excited_doggo] |  | ||||||
|  |  | ||||||
| # Defining primitives (Java) | # Defining primitives (Java) | ||||||
|  |  | ||||||
| My day job is in Java. I'm continually amazed by how much of the world runs on Java, | The reason I'm using the name *primitive* comes from how much of my life is Java right now. | ||||||
| and somehow manages to continue functioning. Like, it can't be that good, because nothing | Spoiler alert: a lot of it. And for the most part I like Java, but I digress. In Java, there's a | ||||||
| in Computer Science functions that well. And yet, Java is maybe one of the few things | special name for some specific types of values: | ||||||
| CS people can high-five and say "you know what, we did a good thing." |  | ||||||
|  |  | ||||||
| But that's not what this post is about. In Java, there's a special name for |  | ||||||
| some specific types of values: |  | ||||||
|  |  | ||||||
| > ``` | > ``` | ||||||
| bool    char    byte | bool    char    byte | ||||||
| @ -72,12 +62,14 @@ Main.java:5: error: int cannot be dereferenced | |||||||
| 1 error | 1 error | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| The reason for this error is that only things inheriting from | Specifically, Java considers [`Object`](https://docs.oracle.com/javase/10/docs/api/java/lang/Object.html) | ||||||
| [`Object`](https://docs.oracle.com/javase/9/docs/api/java/lang/Object.html) | and things that inherit from it as pointers, and thus we have to dereference the pointer | ||||||
| can have instance methods, and the primitive types do not in fact inherit this. | before the fields and methods it defines can be used. In contrast, *primitive types are just values* - | ||||||
|  | there's nothing to be dereferenced. In memory, they're just a sequence of bits. | ||||||
|  |  | ||||||
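For comparison with Java's `int cannot be dereferenced`, here is a minimal Rust sketch. The helper name `describe` is hypothetical. It inspects an `i32` as raw bytes and still calls a method on it. `size_of` reports only the four bytes of the value itself, with no object header or pointer stored alongside them.

```rust
use std::mem::size_of;

/// Hypothetical helper: report an i32's size, its raw bytes, and the
/// result of calling a method directly on the value.
fn describe(x: i32) -> (usize, [u8; 4], String) {
    // An i32 is exactly 4 bytes; no header or pointer is stored with it.
    let size = size_of::<i32>();
    // The value's own bits, laid out little-endian.
    let bytes = x.to_le_bytes();
    // Method-call syntax works on the plain value (auto-referenced as `&x`).
    let text = x.to_string();
    (size, bytes, text)
}

fn main() {
    let (size, bytes, text) = describe(8);
    println!("size = {size}, bytes = {bytes:?}, text = {text}");
}
```

Running it prints `size = 4, bytes = [8, 0, 0, 0], text = 8`. The bytes are the value itself, not an address pointing somewhere else.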
| If we really want, we can turn the `int` into an | If we really want, we can turn the `int` into an | ||||||
| [`Integer`](https://docs.oracle.com/javase/9/docs/api/java/lang/Integer.html) and then | [`Integer`](https://docs.oracle.com/javase/10/docs/api/java/lang/Integer.html) and then | ||||||
| turn that into a `String` and print it, but that seems like a lot of work: | dereference it, but it's a bit wasteful: | ||||||
|  |  | ||||||
| ```java | ```java | ||||||
| class Main { | class Main { | ||||||
| @ -89,23 +81,15 @@ class Main { | |||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| This allows us to create the variable `y` of type `Integer`, and at run time peek into `y` | This creates the variable `y` of type `Integer` (which inherits `Object`), and at run time we | ||||||
| to locate the `toString()` function and call it. | dereference `y` to locate the `toString()` function and call it. Rust obviously handles things a bit | ||||||
|  | differently, but we have to look at some low-level details to see how differently it actually is. | ||||||
| So why do we have to jump through the extra hoops for this? The reason is partially that Java |  | ||||||
| treats the primitive values as just a "bag of bits"; there are no functions to call, no references |  | ||||||
| to maintain, it's just a set number of bits to represent a value. If you call a function using |  | ||||||
| `int` or `long` as an argument, internally Java will copy the bits across and your original value |  | ||||||
| can't be modified. |  | ||||||
|  |  | ||||||
| And if Rust has a similar "bag of bits" representation for its primitives (spoiler alert: it does), |  | ||||||
| that gives us our first question: how does Rust get away with calling the equivalent of instance methods? |  | ||||||
|  |  | ||||||
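Rust does have an explicit boxed form, `Box<i32>`, which is loosely comparable to Java's `Integer`. The sketch below uses hypothetical function names. It shows that boxing is not needed to call `to_string`, since the plain value and the boxed value produce the same text. Both calls still allocate a `String` for the result. The only difference is whether the integer itself sits behind a heap pointer.

```rust
/// Hypothetical helper: call `to_string` on a plain i32 value.
fn unboxed() -> String {
    let x: i32 = 8;
    x.to_string()
}

/// Hypothetical helper: the explicit heap-boxed version, loosely
/// comparable to Java's `Integer y = x;`. `Box<T>` implements `Display`
/// whenever `T` does, so `to_string` is available on it too.
fn boxed() -> String {
    let y: Box<i32> = Box::new(8);
    y.to_string()
}

fn main() {
    // Both produce "8"; only `boxed` puts the integer itself on the heap.
    println!("{} {}", unboxed(), boxed());
}
```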
| # Low Level Handling of Primitives (C) | # Low Level Handling of Primitives (C) | ||||||
|  |  | ||||||
| Now, I still want to show off the "bag of bits" representation of primitives in Rust. But to do that, | We first need to build a foundation for reading and understanding the assembly code the | ||||||
| we have to expose a bit of how your computer thinks about those values. Let's consider the following | final answer involves. Let's begin with showing how the `C` language (and your computer) | ||||||
| code in C: | thinks about "primitive" values in memory: | ||||||
|  |  | ||||||
| ```c | ```c | ||||||
| void my_function(int num) {} | void my_function(int num) {} | ||||||
| @ -116,21 +100,26 @@ int main() { | |||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| And to drive the point home (and pretend like I understand assembly), let's take a look at the result | The [compiler explorer](https://godbolt.org/z/lgNYcc) gives us an easy way of showing off | ||||||
| using the [compiler explorer](https://godbolt.org/z/lgNYcc): <span style="font-size:.6em">whose output has been lightly edited</span> | the assembly-level code that's generated: <span style="font-size:.6em">whose output has been lightly edited</span> | ||||||
|  |  | ||||||
| ``` | ```nasm | ||||||
| main: | main: | ||||||
|         push    rbp |         push    rbp | ||||||
|         mov     rbp, rsp |         mov     rbp, rsp | ||||||
|         sub     rsp, 16 |         sub     rsp, 16 | ||||||
|  |  | ||||||
|         ; We assign the value `8` to `x` here |         ; We assign the value `8` to `x` here | ||||||
|         mov     DWORD PTR [rbp-4], 8 |         mov     DWORD PTR [rbp-4], 8 | ||||||
|  |  | ||||||
|         ; And copy the bits making up `x` to a location |         ; And copy the bits making up `x` to a location | ||||||
|         ; `my_function` can access |         ; `my_function` can access (`edi`) | ||||||
|         mov     eax, DWORD PTR [rbp-4] |         mov     eax, DWORD PTR [rbp-4] | ||||||
|         mov     edi, eax |         mov     edi, eax | ||||||
|  |  | ||||||
|  |         ; Call `my_function` and give it control | ||||||
|         call    my_function |         call    my_function | ||||||
|  |  | ||||||
|         mov     eax, 0 |         mov     eax, 0 | ||||||
|         leave |         leave | ||||||
|         ret |         ret | ||||||
| @ -138,17 +127,18 @@ main: | |||||||
| my_function: | my_function: | ||||||
|         push    rbp |         push    rbp | ||||||
|         mov     rbp, rsp |         mov     rbp, rsp | ||||||
|         ; Copy the bits out of the pre-determined location |  | ||||||
|  |         ; Copy the bits out of the pre-determined location (`edi`) | ||||||
|         ; to somewhere we can use |         ; to somewhere we can use | ||||||
|         mov     DWORD PTR [rbp-4], edi |         mov     DWORD PTR [rbp-4], edi | ||||||
|         nop |         nop | ||||||
|  | 
 | ||||||
|         pop     rbp |         pop     rbp | ||||||
|         ret |         ret | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| At a really low level of memory, we're copying bits around; nothing crazy. That's what the `mov` instruction | At a really low level of memory, we're copying bits around using the [`mov`][x86_guide] instruction; nothing crazy. | ||||||
| is intended to do (use [this][x86_guide] as a reference). But to show how similar Rust is, let's take a look at the equivalent | But to show how similar Rust is, let's take a look at our program translated from C to Rust:  | ||||||
| Rust code in the [compiler explorer](https://godbolt.org/z/cAlmk0): <span style="font-size:.6em">again, lightly edited</span> |  | ||||||
|  |  | ||||||
| ```rust | ```rust | ||||||
| fn my_function(x: i32) {} | fn my_function(x: i32) {} | ||||||
| @ -159,38 +149,48 @@ fn main() { | |||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| ``` | And the assembly generated when we stick it in the [compiler explorer](https://godbolt.org/z/cAlmk0): | ||||||
|  | <span style="font-size:.6em">again, lightly edited</span> | ||||||
|  |  | ||||||
|  | ```nasm | ||||||
| example::main: | example::main: | ||||||
|   push rax |   push rax | ||||||
|  |  | ||||||
|   ; Look familiar? We're copying bits to a location for `my_function` |   ; Look familiar? We're copying bits to a location for `my_function` | ||||||
|   ; The compiler just optimizes out holding `x` in memory |   ; The compiler just optimizes out holding `x` in memory | ||||||
|   mov edi, 8 |   mov edi, 8 | ||||||
|  |  | ||||||
|  |   ; Call `my_function` and give it control | ||||||
|   call example::my_function |   call example::my_function | ||||||
|  |  | ||||||
|   pop rax |   pop rax | ||||||
|   ret |   ret | ||||||
|  |  | ||||||
| example::my_function: | example::my_function: | ||||||
|   sub rsp, 4 |   sub rsp, 4 | ||||||
|  |  | ||||||
|   ; And copying those bits again, just like in C |   ; And copying those bits again, just like in C | ||||||
|   mov dword ptr [rsp], edi |   mov dword ptr [rsp], edi | ||||||
|  |  | ||||||
|   add rsp, 4 |   add rsp, 4 | ||||||
|   ret |   ret | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| The generated Rust looks almost identical to C, and is the same as how Java thinks of primitives: just bits in memory. | The generated Rust assembly is functionally pretty close to the C assembly (and Java as well): | ||||||
|  | *When working with primitives, we're just dealing with bits in memory*.  | ||||||
|  |  | ||||||
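Primitives are just bits, so passing one to a function copies those bits. The `mov` into `edi` in the assembly above does exactly that. The small sketch below uses a hypothetical `bump` function. The callee changes only its own copy, and the caller's `x` stays usable because `i32` implements `Copy`.

```rust
/// Hypothetical function: takes an i32 by value, so it receives a copy of the bits.
fn bump(mut num: i32) -> i32 {
    num += 1; // changes only the local copy
    num
}

fn main() {
    let x = 8;
    let y = bump(x);
    // `x` is unchanged and still usable: i32 is `Copy`, so nothing was moved.
    println!("x = {x}, y = {y}");
}
```

Running it prints `x = 8, y = 9`.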
| And now that we're a bit more familiar with the low-level representation of primitives, it's time to answer: | In Java we have to dereference a pointer to call its functions; in Rust, there's no pointer to dereference. So what | ||||||
| how exactly does Rust manage to compile `8.to_string()`? | exactly is going on with this `.to_string()` function call? | ||||||
|  |  | ||||||
| # impl primitive (and Python) | # impl primitive (and Python) | ||||||
|  |  | ||||||
| Now it's time to reveal my <strike>trap card</strike> <strike>dirty secret</strike> revelation: *Rust has | Now it's time to <strike>reveal my trap card</strike> show the revelation that tied all this together: *Rust has | ||||||
| implementations for its primitive types.* That's right, `impl` blocks aren't only for `structs` and `traits`, | implementations for its primitive types.* That's right, `impl` blocks aren't only for `structs` and `traits`, | ||||||
| primitives get them too. Don't believe me? Check out [u32](https://doc.rust-lang.org/std/primitive.u32.html), | primitives get them too. Don't believe me? Check out [u32](https://doc.rust-lang.org/std/primitive.u32.html), | ||||||
| [f64](https://doc.rust-lang.org/std/primitive.f64.html) and [char](https://doc.rust-lang.org/std/primitive.char.html) | [f64](https://doc.rust-lang.org/std/primitive.f64.html) and [char](https://doc.rust-lang.org/std/primitive.char.html) | ||||||
| as examples. | as examples. | ||||||
|  |  | ||||||
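Those pages document *inherent* methods, which are written in `impl` blocks inside the standard library itself. The sketch below calls a few of them in two ways: with method syntax, and as a path on the type. The helper names are hypothetical; `pow`, `floor`, and `is_alphabetic` are real methods listed on the `u32`, `f64`, and `char` pages.

```rust
// Each helper calls the same inherent method two ways:
// method syntax on the value, and path syntax on the type.
fn square_method(n: u32) -> u32 {
    n.pow(2)
}

fn square_path(n: u32) -> u32 {
    u32::pow(n, 2)
}

fn floor_both(x: f64) -> (f64, f64) {
    (x.floor(), f64::floor(x))
}

fn alpha_both(c: char) -> (bool, bool) {
    (c.is_alphabetic(), char::is_alphabetic(c))
}

fn main() {
    println!("{} {}", square_method(8), square_path(8));
    println!("{:?}", floor_both(2.5));
    println!("{:?}", alpha_both('a'));
}
```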
| But the really interesting bit is how Rust turns the code we started with into assembly. Let's break out the | But the really interesting bit is how Rust turns those `impl` blocks into assembly. Let's break out the | ||||||
| [compiler explorer](https://godbolt.org/z/6LBEwq) once again: | [compiler explorer](https://godbolt.org/z/6LBEwq) once again: | ||||||
|  |  | ||||||
| ```rust | ```rust | ||||||
| @ -199,31 +199,32 @@ pub fn main() { | |||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| And the interesting bits in the assembly: | And the interesting bits in the assembly: <span style="font-size:.6em">heavily trimmed down</span> | ||||||
|  |  | ||||||
| ``` | ```nasm | ||||||
| example::main: | example::main: | ||||||
|   sub rsp, 24 |   sub rsp, 24 | ||||||
|   mov rdi, rsp |   mov rdi, rsp | ||||||
|   lea rax, [rip + .Lbyte_str.u] |   lea rax, [rip + .Lbyte_str.u] | ||||||
|   mov rsi, rax |   mov rsi, rax | ||||||
|  |    | ||||||
|   ; Bombshell right here |   ; Bombshell right here | ||||||
|   call <T as alloc::string::ToString>::to_string@PLT |   call <T as alloc::string::ToString>::to_string@PLT | ||||||
|  |  | ||||||
|   mov rdi, rsp |   mov rdi, rsp | ||||||
|   call core::ptr::drop_in_place |   call core::ptr::drop_in_place | ||||||
|   add rsp, 24 |   add rsp, 24 | ||||||
|   ret |   ret | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| Now, this assembly is far more complicated, but here's the big revelation: **we're calling | Now, this assembly is a bit more complicated, but here's the big revelation: **we're calling | ||||||
| `to_string()` as a function that isn't bound to the instance of `8`**. Instead of thinking | `to_string()` as a function that exists all on its own, and giving it the instance of `8`**. | ||||||
| of the value 8 as an instance of `u32` and then peeking in to find the location of the function | Instead of thinking of the value 8 as an instance of `u32` and then peeking in to find | ||||||
| we want to call, we have a function that exists outside of the instance and just give | the location of the function we want to call (like Java), we have a function that exists | ||||||
| that function the value `8`. | outside of the instance and just give that function the value `8`. | ||||||
|  |  | ||||||
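At the source level, the same idea can be spelled three ways. The sketch below uses `i32` because that is the default type of an unsuffixed integer literal, and its helper names are hypothetical. The method call, the trait path, and the fully qualified path all produce the same `String`. The last two pass the reference to the instance explicitly.

```rust
fn via_method(x: i32) -> String {
    x.to_string() // method syntax: `x` is auto-referenced to `&x`
}

fn via_trait_path(x: i32) -> String {
    ToString::to_string(&x) // trait path, with `Self` inferred as i32
}

fn via_qualified_path(x: i32) -> String {
    <i32 as ToString>::to_string(&x) // fully qualified: names the exact impl
}

fn main() {
    println!("{} {} {}", via_method(8), via_trait_path(8), via_qualified_path(8));
}
```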
| This is an incredibly technical detail, but the interesting idea I had was this: | This is an incredibly technical detail, but the interesting idea I had was this: | ||||||
| *if `to_string()` is a static function, can I refer to the unbound function and give | *if `to_string()` is a static function, can I refer to the unbound function and give it an instance?* | ||||||
| it an instance?* |  | ||||||
|  |  | ||||||
| Better explained in code (and a [compiler explorer](https://godbolt.org/z/fJY-gA) link | Better explained in code (and a [compiler explorer](https://godbolt.org/z/fJY-gA) link | ||||||
| because I seriously love this thing): | because I seriously love this thing): | ||||||
| @ -242,7 +243,7 @@ impl MyVal { | |||||||
| pub fn main() { | pub fn main() { | ||||||
|     let my_val = MyVal { x: 8 }; |     let my_val = MyVal { x: 8 }; | ||||||
|  |  | ||||||
|     // THESE ARE THE SAME |     // THESE ARE TOTALLY EQUIVALENT | ||||||
|     my_val.to_string(); |     my_val.to_string(); | ||||||
|     MyVal::to_string(&my_val); |     MyVal::to_string(&my_val); | ||||||
| } | } | ||||||
| @ -252,7 +253,7 @@ Rust is totally fine "binding" the function call to the instance, and also as a | |||||||
|  |  | ||||||
| MIND == BLOWN. | MIND == BLOWN. | ||||||
|  |  | ||||||
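Because `to_string` is an ordinary function, it can also be used as a value without naming any instance at all. This minimal sketch assumes a slice of `i32`, and its helper names are hypothetical. The unbound function is either stored as a function pointer or handed straight to `map`.

```rust
/// Hypothetical helper: the unbound function stored as a plain fn pointer.
fn stringify_all(values: &[i32]) -> Vec<String> {
    let f: fn(&i32) -> String = ToString::to_string;
    values.iter().map(f).collect()
}

/// Hypothetical helper: the same thing without the intermediate variable.
fn stringify_all_inline(values: &[i32]) -> Vec<String> {
    values.iter().map(ToString::to_string).collect()
}

fn main() {
    println!("{:?}", stringify_all(&[1, 8, 42]));
    println!("{:?}", stringify_all_inline(&[1, 8, 42]));
}
```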
| Python does something equivalent where I can both call functions bound to their instances | Python does the same thing where I can both call functions bound to their instances | ||||||
| and also call as an unbound function where I give it the instance: | and also call as an unbound function where I give it the instance: | ||||||
|  |  | ||||||
| ```python | ```python | ||||||
| @ -268,9 +269,9 @@ m.my_function() | |||||||
| MyClass.my_function(m) | MyClass.my_function(m) | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| That said, Python still doesn't treat "primitives" as things that can have instance methods: | And Python tries to make you *think* that primitives can have instance methods... | ||||||
|  |  | ||||||
| ``` | ```python | ||||||
| >>> dir(8) | >>> dir(8) | ||||||
| ['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__', | ['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__', | ||||||
| '__delattr__', '__div__', '__divmod__', '__doc__', '__float__', '__floordiv__', | '__delattr__', '__div__', '__divmod__', '__doc__', '__float__', '__floordiv__', | ||||||
| @ -285,27 +286,31 @@ That said, Python still doesn't treat "primitives" as things that can have insta | |||||||
|     8.__str__() |     8.__str__() | ||||||
|              ^ |              ^ | ||||||
| SyntaxError: invalid syntax | SyntaxError: invalid syntax | ||||||
|  |  | ||||||
|  | >>> # It will run if we assign it first though: | ||||||
|  | >>> x = 8 | ||||||
|  | >>> x.__str__() | ||||||
|  | '8' | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
|  | ...but in practice it's a bit complicated. | ||||||
|  |  | ||||||
| So while Python handles binding instance methods in a way similar to Rust, it's still not able | So while Python handles binding instance methods in a way similar to Rust, it's still not able | ||||||
| to run the example we started with. | to run the example we started with. | ||||||
|  |  | ||||||
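For contrast with Python's `SyntaxError`, here is a sketch of method calls made directly on Rust literals (helper names hypothetical). The last two functions show one caveat. Unary minus binds more loosely than a method call, so `-8i32.abs()` means `-(8i32.abs())`.

```rust
// Method calls directly on literals: no temporary variable needed.
fn literal_to_string() -> String {
    8.to_string()
}

fn float_literal_to_string() -> String {
    8.5_f64.to_string()
}

// Unary minus is applied after the method call: -(8.abs()) == -8.
fn abs_without_parens() -> i32 {
    -8i32.abs()
}

// Parenthesize to call the method on the negative value itself.
fn abs_with_parens() -> i32 {
    (-8i32).abs()
}

fn main() {
    println!(
        "{} {} {} {}",
        literal_to_string(),
        float_literal_to_string(),
        abs_without_parens(),
        abs_with_parens()
    );
}
```

Running it prints `8 8.5 -8 8`.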
| # Conclusion | # Conclusion | ||||||
|  |  | ||||||
| This was a super-roundabout way of demonstrating it, but the way Rust handles incredibly minor details | This was a super-roundabout way of demonstrating it, but the way Rust handles incredibly minor details | ||||||
| like primitives is one of the reasons I enjoy the language. It's optimized like C in how it lays out | like primitives leads to really cool effects. Primitives are optimized like C in how they have a | ||||||
| memory and is efficient ("bag of bits" representation). And it still has a lot of | space-efficient memory layout, yet the language still has a lot of features I enjoy in Python | ||||||
| the nice features I like in Python that make it easy to work with the language (late/static binding). | (like both instance and late binding). | ||||||
|  |  | ||||||
| And even given that, there are still areas where Rust shines that none of the other languages discussed do; | And when you put it together, there are areas where Rust does cool things nobody else can; | ||||||
| as a kinda quirky feature of Rust's type system, `8.to_string()` is actually valid code. | as a quirky feature of Rust's type system, `8.to_string()` is actually valid code. | ||||||
|  |  | ||||||
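Part of what makes `8.to_string()` valid is that the standard library implements `ToString` for every type that implements `Display`, and the integer primitives implement `Display`. User code can use the same blanket-impl pattern. The sketch below uses a hypothetical `Shout` trait for illustration.

```rust
use std::fmt::Display;

/// Hypothetical trait, for illustration only.
trait Shout {
    fn shout(&self) -> String;
}

// Blanket impl: every type that implements Display gets `shout`,
// the same pattern std uses to give every Display type `to_string`.
impl<T: Display + ?Sized> Shout for T {
    fn shout(&self) -> String {
        format!("{}!", self)
    }
}

fn main() {
    println!("{}", 8.shout()); // a primitive literal
    println!("{}", 'x'.shout()); // char
    println!("{}", "hi".shout()); // str, allowed by `?Sized`
}
```

Any type that gains a `Display` impl later picks up `shout` automatically. The same mechanism gives primitives their `to_string`.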
| There aren't too many grand lessons to be learned from this, the behavior I'm talking about is | Now go forth and fool your friends into thinking you know assembly. This is all I've got. | ||||||
| a relatively minor detail in the grand picture. But it's still something I learned where Rust |  | ||||||
| just gets the details right, and I love it. |  | ||||||
|  |  | ||||||
| [x86_guide]: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html | [x86_guide]: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html | ||||||
| [excited_doggo]: https://flic.kr/p/2jr8Zp |  | ||||||
| [java_primitive]: https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html | [java_primitive]: https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html | ||||||
| [compiler_explorer]: https://godbolt.org/ | [compiler_explorer]: https://godbolt.org/ | ||||||
| [rust_scalar]: https://doc.rust-lang.org/book/second-edition/ch03-02-data-types.html#scalar-types | [rust_scalar]: https://doc.rust-lang.org/book/second-edition/ch03-02-data-types.html#scalar-types | ||||||
| @ -93,7 +93,7 @@ a { | |||||||
|  |  | ||||||
|     position: relative; |     position: relative; | ||||||
|     display: inline-block; |     display: inline-block; | ||||||
|     padding: 5px 1px; |     padding: 1px 1px; | ||||||
|     transition: color ease 0.3s; |     transition: color ease 0.3s; | ||||||
|      |      | ||||||
|     /* Hover animation effect for all buttons */ |     /* Hover animation effect for all buttons */ | ||||||
| @ -166,10 +166,7 @@ hr { | |||||||
|  |  | ||||||
| pre { overflow: auto; } | pre { overflow: auto; } | ||||||
|  |  | ||||||
| code, pre { |  | ||||||
|  |  | ||||||
| } |  | ||||||
| small { | small { | ||||||
|         color: gray; |     color: gray; | ||||||
| } | } | ||||||
|  |  | ||||||
|  | |||||||