
Rust Lang in a Nutshell: 1 Introduction

 4 years ago
source link: https://www.softax.pl/blog/rust-lang-in-a-nutshell-1-introduction/

This is the first part of a mini-series of articles addressed to everyone who wants to get familiar with the Rust programming language. No previous background in Rust is expected, but an understanding of basic programming language concepts may help. The articles try to skip trivia so as not to bore the reader, but occasionally have to bring it up for clarification.

The articles will introduce core Rust concepts and programming techniques, and will examine a couple of pet projects, implemented not only for illustrative purposes but as standalone problems that should be fun for the reader to play with.

About Rust

Rust is a fairly young invention. It was born in 2006 as Graydon Hoare's spare-time hobby project, growing out of his personal interest in compiler design and programming language techniques. The project continued for the next three years and matured enough to show a new perspective on building safe concurrent software. In 2009 Mozilla, Hoare's employer at the time, took an interest in the new language and began sponsoring the project. In 2010, at the Mozilla Annual Summit, Rust was presented to a broader audience for the very first time. The project's first stable release was announced in 2015.

Rust is open source and backed by a strong community of enthusiasts. As of now there are hundreds of companies using Rust in production, among them Google, Microsoft, Amazon and Dropbox.

The language was created with simplicity, efficiency and highly concurrent environments in mind. Source code is compiled to native code with the help of LLVM, so Rust can target a wide range of the platforms LLVM supports. It is meant to be usable in all the cases that previously meant almost exclusive use of C or C++, and its performance is comparable to C++.

Rust is a general-purpose systems programming language that offers direct control over memory layout and allocation, access to the underlying hardware, and easy interfacing with C libraries.

Rust shines with its unique combination of low-level, C-like features and fairly new high-level programming concepts, which can give a skilled programmer a huge advantage over prevalent technologies.

All of this is guided by one of Rust's core principles: zero-cost abstractions. The idea is that the compiler should carry almost the whole burden of the abstractions used in the code and produce highly optimized code with no additional overhead compared to a careful manual implementation written with lower-level primitives. In other words, we should not pay extra for using high-level primitives. By contrast, a mandatory garbage collector, forced memory layouts, required indirect memory access or mandatory virtual method dispatch are all examples of non-zero-cost abstractions.

What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better. – Bjarne Stroustrup
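To make the idea a little more concrete, here is a small sketch comparing a high-level iterator chain with a hand-written loop. The function names are made up for illustration. In optimized (release) builds the two typically compile to very similar machine code, but that is an observation about the optimizer rather than a language guarantee; the sketch itself only checks that both compute the same result.

```rust
// High-level version: an iterator chain with closures.
fn sum_of_even_squares_iter(values: &[u64]) -> u64 {
    values
        .iter()
        .filter(|&&v| v % 2 == 0) // keep even numbers
        .map(|&v| v * v)          // square them
        .sum()
}

// Lower-level version: the same computation as an explicit loop.
fn sum_of_even_squares_loop(values: &[u64]) -> u64 {
    let mut total = 0;
    for &v in values {
        if v % 2 == 0 {
            total += v * v;
        }
    }
    total
}

fn main() {
    let data: [u64; 6] = [1, 2, 3, 4, 5, 6];
    // Both print 56 (4 + 16 + 36).
    println!("{}", sum_of_even_squares_iter(&data));
    println!("{}", sum_of_even_squares_loop(&data));
}
```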

In this article we will focus especially on ownership, one of Rust's most distinctive features. Together with other language rules, such as the rules of borrowing, it guarantees memory safety and prevents data races in concurrent code.

Then we will introduce the syntax of basic language items such as functions, data types, structs and methods.

Fasten your seatbelts and let’s begin.

Variables

In Rust, variables are immutable by default, which means you cannot change them after assignment. You also cannot use an uninitialized variable. To declare a variable, use the let keyword. You can also add a type annotation after a colon:

let x = 3;
let y : f64 = 9.0;

Because Rust is a statically typed language, every variable must have a type known at compile time. The compiler is smart enough, though: when there is no type annotation, it infers the type from the assigned value and from how the variable is used.
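A small illustration of inference: unannotated integer and floating-point literals default to i32 and f64, and other types are filled in from later use. The helper type_of is a hypothetical name built on std::any::type_name, whose output is intended for diagnostics and not formally guaranteed to be stable, although for primitive types it is the plain type name.

```rust
// Hypothetical helper: reports the type the compiler inferred for a value.
fn type_of<T>(_: &T) -> &'static str {
    std::any::type_name::<T>()
}

fn main() {
    let a = 3;              // integer literals default to i32
    let b = 9.0;            // floating-point literals default to f64
    let c: u8 = 255;        // an explicit annotation wins
    let mut v = Vec::new(); // element type not known yet...
    v.push(c);              // ...until this push makes it a Vec<u8>

    println!("{} {} {} {}", type_of(&a), type_of(&b), type_of(&c), type_of(&v[0]));
}
```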

The let keyword can also be used with patterns, for example to destructure a tuple:

let (a, b) = ("foo", "bar"); // now you can use a and b independently

There are good reasons to use immutable variables. You can pass them around your code with a guarantee that no code, in any module you may not even know about, will change their value. No one can write to any element of your array, extend or shrink your vectors, or invalidate iterators. What's more, you can safely use them in concurrent code and share them between threads without any data races during access.
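As a sketch of the concurrency point, the two threads below read disjoint halves of the same immutable slice at the same time. It assumes Rust 1.63 or newer for std::thread::scope, which lets spawned threads borrow local data; the function name parallel_sums is made up for the example. Because nothing can mutate the data while it is shared, the compiler accepts this without any locking.

```rust
use std::thread;

// Sums the two halves of `data` on two threads. Both threads only read the
// slice, so sharing it between them cannot cause a data race.
fn parallel_sums(data: &[u32]) -> (u32, u32) {
    let mid = data.len() / 2;
    thread::scope(|s| {
        let left = s.spawn(|| data[..mid].iter().sum::<u32>());
        let right = s.spawn(|| data[mid..].iter().sum::<u32>());
        // join waits for each thread and hands back its result.
        (left.join().unwrap(), right.join().unwrap())
    })
}

fn main() {
    let numbers: Vec<u32> = vec![1, 2, 3, 4, 5, 6];
    let (l, r) = parallel_sums(&numbers);
    println!("left = {}, right = {}", l, r); // left = 6, right = 15
}
```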

But immutability also has trade-offs. If only immutable variables are at hand, you have to copy resources over and over again in cases where you could modify them in place. This is immediately striking when working with strings and arrays. That is precisely why Rust lets you make variables mutable, but you have to be explicit about your intentions: add mut to the let declaration. Mutability allows you to change the value, but not the type of the variable.

let mut x = 3;
x = 7; // now it's ok to change the value

All variable bindings have a scope: they live in the block, enclosed in braces {}, where they are declared, and they are destroyed (dropped) when they leave that scope. Variables can also be shadowed, either in inner scopes or by a new let with the same name.
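The snippet below puts mutability, scopes and shadowing side by side; the function name shadowing_demo is only illustrative. Note the difference between mut and shadowing: a mut variable keeps its type, while a new let binding may introduce a different type under the same name.

```rust
fn shadowing_demo() -> (i32, i32, usize) {
    let mut x = 3;
    x += 4; // allowed: x is mutable, and it stays an i32
    // x = "seven"; // would not compile: mut does not allow a change of type

    let inner_seen;
    {
        let x = x * 10; // shadows the outer x inside this block only
        inner_seen = x;
    } // the inner x is dropped here, and the outer x is visible again

    let spaces = "   ";        // a &str...
    let spaces = spaces.len(); // ...shadowed in the same scope by a usize
    (x, inner_seen, spaces)
}

fn main() {
    let (x, inner, spaces) = shadowing_demo();
    println!("{} {} {}", x, inner, spaces); // 7 70 3
}
```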

Note 1: Rust uses the println! macro to print a line to standard output. The macro formats its arguments in a similar way to the well-known C printf function. You can write:

let x = 3; println!("My x is {}", x);

The macro accepts a format string and a variable number of arguments. Each {} in the format string is replaced by the next argument; this works for any type that implements the Display trait, such as numbers and strings.
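Because println! writes to the terminal, the sketch below uses its sibling macro format!, which takes the same format string and arguments but returns the formatted String, so the placeholder rules are easy to check. {} uses a type's Display implementation, {:?} its Debug implementation, {0} and {1} refer to arguments by position, and {name} captures a variable by name (Rust 1.58 or newer). The function name formatted is hypothetical.

```rust
// Builds the strings that println! would print for the same arguments.
fn formatted(x: i32, name: &str) -> [String; 4] {
    [
        format!("My x is {}", x),                     // Display
        format!("{0} and {0} again, by {1}", x, name), // positional arguments
        format!("{:?}", (x, name)),                   // Debug formatting of a tuple
        format!("{name} has x = {x}"),                // variables captured by name
    ]
}

fn main() {
    for line in formatted(3, "Rust") {
        println!("{}", line);
    }
}
```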

Ownership

Ownership is one of the most important concepts in Rust.

Every value has exactly one owner at a time, typically a variable binding (you may say a name used in the code) that lives in a specific scope. The owner is assigned when the value is created, and the value is destroyed when the owner goes out of scope.

Ownership can also be moved to another variable binding. The value then has a new owner, and the original binding cannot be used any more.

This is an adaptation of the RAII (Resource Acquisition Is Initialization) technique, turned into a basic language feature. An acquired resource is bound to its owner, and whenever the owner goes out of scope, the underlying resource is destroyed.
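The Drop trait is how a type runs its own clean-up code when its owner goes out of scope. Traits are only covered in the next article, so treat this as a preview: the hypothetical Resource type below appends a line to a shared log when it is dropped, which makes the destruction order visible. Rc and RefCell are used here only so that several values can write to the same log.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

// Hypothetical resource: records its own destruction in a shared log.
struct Resource {
    name: &'static str,
    log: Log,
}

impl Drop for Resource {
    // Called automatically when the owner of a Resource goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn run_scopes(log: &Log) {
    let _outer = Resource { name: "outer", log: Rc::clone(log) };
    {
        let _inner = Resource { name: "inner", log: Rc::clone(log) };
        log.borrow_mut().push("end of inner scope".to_string());
    } // _inner is dropped here
    log.borrow_mut().push("end of function".to_string());
} // _outer is dropped here

fn main() {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    run_scopes(&log);
    println!("{:?}", log.borrow());
}
```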

Let’s try some practical examples for better understanding.

let s1 = "Hello world!".to_string();
let s2 = s1;
println!("{}", s1);

This code does not compile. Ownership of the string “Hello world!” was moved from s1 to s2, so the binding s1 is invalidated and no longer accessible:

error[E0382]: borrow of moved value: `s1`
  --> src/main.rs:10:20
   |
1  |     let s1 = "Hello world!".to_string();
   |         -- move occurs because `s1` has type `std::string::String`, which does not implement the `Copy` trait
2  |     let s2 = s1;
   |              -- value moved here
3  |     println!("{}", s1);
   |                    ^^ value borrowed here after move

Most programming languages do not enforce a single owner of a resource. Assignments like s2 = s1 are common and result either in a copy of the resource or in an alias to it. Both s2 and s1 remain accessible, and without a deep dive into the code it is easy to lose track of what exactly has happened to the original resource.

Both copying and aliasing can be unwanted in various scenarios. Copying resources may be expensive, and aliasing can lead to data races in concurrent environments. Rust allows both, but again forces you to be explicit about your intentions.

Copying/Cloning resources

By default, the ownership rules treat the resource represented by a variable binding as “untouchable”. When we assign one binding to another, only this “abstract” ownership is transferred; the underlying resource (for example, the heap buffer of a String) is not copied or modified in any way.

For cheap resources like primitive numeric types or small data structures, it may be useful to allow copying instead, with no ownership transfer: the data is simply duplicated by copying its bits in memory.

To achieve that, Rust lets you tag data types with the Copy trait; in that case we say the type implements Copy. Copy is only a marker used by the compiler and has no methods of its own, although every Copy type must also implement Clone. The behaviour of Copy cannot be overloaded: it always means a plain bitwise copy of the value.

In Rust, the Copy trait is implemented by many types:

  • all primitive types: bool, char, i8, i16, i32, i64, u8, u16, u32, u64, usize, f32, f64,
  • arrays of all sizes if item type also implements Copy,
  • tuple types if each component type implements Copy,

and many more. Therefore you can write:

let s1 = "hello world"; // a &str: shared references are Copy
let s2 = s1; // copied, so s1 is still valid
let x1 = 2;
let x2 = x1; // copied, so x1 is still valid

println!("{} for the {}nd time!", s1, x1);

You can also tag your own types with the Copy trait (provided all their fields are Copy), allowing them to be copied instead of moved with an ownership change.
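For example, a small hypothetical Point type can opt in with #[derive(Clone, Copy)]. Copy requires Clone as well, and the derive only succeeds when every field is itself Copy (a String field, for instance, would make it fail).

```rust
// Hypothetical plain-data type; both fields are Copy, so the type can be too.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let a = Point { x: 1, y: 2 };
    let b = a; // a bitwise copy: no ownership transfer
    println!("{:?} {:?}", a, b); // a is still usable
}
```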

You can think of types implementing Copy as lightweight, plain-data values that own no heap memory, while strict ownership rules matter most for heavy, heap-allocated resources.

There is one more piece of the puzzle: we may have a heavy resource that we nevertheless want to copy, or a resource that has to be copied by a special procedure requiring additional code. For such cases Rust reserves another trait, Clone. Custom types have to implement Clone, by hand or with #[derive(Clone)], which means providing a clone() method.

When a type has a clone() method, we have to call it explicitly in the code. Rust will not clone any objects for us, which seems reasonable, because cloning is expected to be a potentially expensive operation.
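A sketch of both cases: calling clone() on a String is one way to make the failing s1/s2 example from the Ownership section compile, and the hypothetical Document type shows a hand-written Clone impl that runs extra code (here it only counts how many copies deep a value is).

```rust
// Hypothetical type whose cloning needs custom logic.
#[derive(Debug)]
struct Document {
    text: String,
    copies: u32,
}

impl Clone for Document {
    fn clone(&self) -> Self {
        Document {
            text: self.text.clone(), // deep copy of the heap-allocated text
            copies: self.copies + 1, // the extra step a derived Clone would not do
        }
    }
}

fn main() {
    let s1 = "Hello world!".to_string();
    let s2 = s1.clone(); // explicit, possibly expensive copy of the heap data
    println!("{} {}", s1, s2); // both bindings remain valid

    let original = Document { text: "draft".to_string(), copies: 0 };
    let copy = original.clone();
    println!("{:?}\n{:?}", original, copy);
}
```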

Borrowing

Ownership and move/copy semantics deal with a resource's lifetime and with whether the resource is unique or not. They take care of:

  • resource acquisition and destruction
  • the ways a resource can be shared with others: whether it can be duplicated, or has to stay unique and can therefore only be moved to the next owner.

But when we just want to use a resource that we know is available, we do not need to bother with ownership at all. We only need a way to access data that somebody else is responsible for. This is where borrowing comes into play.

Borrowing is simply obtaining a reference to data. It is indicated by & and comes in two flavours:

  • immutable (shared) borrows: &T
  • mutable borrows: &mut T

An immutable borrow looks like this:

let s1 = "Hello world!".to_string();
let s2 = &s1;
println!("{}", s1);
println!("{}", s2);

Taking a reference to s1 gives access to the data through the s2 binding and does not invalidate s1. Moreover, when s2 goes out of scope the string is not destroyed, because it is owned by s1. Naturally, if s1 went out of scope, the string would be destroyed.

Mutable references can only be taken to mutable data, that is, to a binding declared with mut. For example:

let mut s1 = "Hello world!".to_string();
let s2 = &mut s1; // mutable reference

s2.push_str(" That's all!"); // modification through the borrow
println!("{}", s1);

In this example s1 is a mutable binding and is modified via the mutable borrow s2. On the terminal we get:

Hello world! That's all!

There are some rules that borrows obey in Rust:

  • a borrow must not outlive the owner of the resource; in particular, a borrow cannot live in a wider scope than the owner,
  • you may have mutable or immutable borrows, but not both kinds at the same time

Moreover:

  • you may have one or more immutable borrows of the same resource at any time
  • you can have only one mutable borrow at a time

In this way Rust prevents data races: at any moment a resource has either any number of readers and no writer, or a single writer. All these rules are enforced by the compiler at compile time.
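The following sketch stays within those rules; the function name borrow_rules is only illustrative. Several shared borrows coexist, and a mutable borrow is taken only after the last use of the shared ones. In current Rust a borrow ends at its last use (non-lexical lifetimes), not at the end of the block. The commented-out line shows what the compiler would reject.

```rust
fn borrow_rules() -> String {
    let mut s = String::from("hello");

    let r1 = &s; // first shared borrow
    let r2 = &s; // a second shared borrow is fine
    let total_len = r1.len() + r2.len(); // last use of r1 and r2

    let m = &mut s; // allowed: r1 and r2 are no longer used
    m.push_str(" world");
    // println!("{}", r1); // error[E0502] if uncommented: r1 would still be alive

    format!("{} ({})", s, total_len)
}

fn main() {
    println!("{}", borrow_rules()); // hello world (10)
}
```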

Lifetimes

There is an extremely important consequence of Rust's ownership and borrowing system: lifetimes. Because owning a resource and referring to a borrowed resource are separate concerns, it is sometimes necessary to help the compiler determine how long references live.

We will deal with lifetime annotations in more detail later, but for now just look at an example:

fn test(x: &u8, y: &u8) -> &u8 {
    x
}

Without inspecting the body of this function (which is extremely simple, but assume for a moment that it is unknown), we cannot tell which reference is returned as the result: x or y. The compiler is in the same position, because it checks calls to a function using only its signature.

Compiling the above code fails with the error:

error[E0106]: missing lifetime specifier
  --> src/main.rs:1:28
   |
1  | fn test(x: &u8, y: &u8) -> &u8 {
   |                            ^ expected lifetime parameter
   |
   = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `x` or `y`

Because the compiler checks every borrow against the owner of the actual resource, you must give explicit lifetime annotations in the function signature:

fn test<'a, 'b>(x: &'a u8, y: &'b u8) -> &'a u8 {
    x
}

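The lifetime parameters 'a and 'b are declared in angle brackets after the function name. The signature says that the result lives as long as the argument x, independently of y.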
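Because the result is tied only to 'a, a caller may pass a y that lives for a shorter time than x. The sketch below relies on exactly that; pick_first_demo is a made-up name. If all three references shared a single lifetime 'a, the same call would be rejected, because y does not live long enough.

```rust
fn test<'a, 'b>(x: &'a u8, _y: &'b u8) -> &'a u8 {
    x // the result borrows from x only
}

fn pick_first_demo() -> u8 {
    let x = 1u8;
    let result;
    {
        let y = 2u8; // y lives only inside this block
        result = test(&x, &y);
    } // y is gone here, but result borrows only from x
    *result
}

fn main() {
    println!("{}", pick_first_demo()); // 1
}
```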

What is it all for

The ownership model, together with some additional language features and compiler checks, prevents many common programming errors, in particular:

  • use of dangling pointers, or freeing a resource more than once (double free),
  • buffer overflows: random access into a buffer is bounds-checked, and an out-of-bounds index stops the program with a panic instead of reading or writing foreign memory,
  • null pointer dereferencing: references in Rust are never null, and a possibly missing value is represented with the Option type.
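Two of these protections in a small sketch: indexing past the end of a vector panics instead of reading foreign memory, and the non-panicking accessor get returns an Option that the caller has to handle explicitly. The function name describe_item is only illustrative.

```rust
// Describes the element at `index`, or reports that there is none.
fn describe_item(items: &[&str], index: usize) -> String {
    match items.get(index) {
        Some(item) => format!("item {} is {}", index, item),
        None => format!("no item at {}", index), // no null, no out-of-bounds read
    }
}

fn main() {
    let items = vec!["apple", "pear"];
    println!("{}", describe_item(&items, 1)); // item 1 is pear
    println!("{}", describe_item(&items, 5)); // no item at 5
    // let x = items[5]; // compiles, but panics at run time: index out of bounds
}
```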

The idea here is that the compiler, not the programmer, should keep track of who is responsible for a piece of memory.

Functions

Named functions are declared with the fn keyword, followed by the name and parentheses. The arguments of a named function must be type-annotated, as must the return type, if there is one.

fn test(x: u32) {}
fn add(x: f64, y: f64) -> f64 { x + y }

fn own_and_forget(v: Vec<String>) {}
fn print(v: &Vec<String>) {}
fn change(v: &mut Vec<String>) {}

fn main() {}
fn main() -> Result<(), std::io::Error> { Ok(()) } // alternative signature; a program defines only one main

The main function has a special role, as it is the entry point of a Rust program. As you can see, it can have different signatures.

Arguments are passed to a function according to all the ownership and borrowing rules discussed earlier:

fn own_and_destroy(v: Vec<String>) {}     //takes ownership of v
fn inspect(v: &Vec<String>) {}            //references v to peek into
fn change_inplace(v: &mut Vec<String>) {} //references to mutate

In the case of the own_and_destroy function above, the vector becomes owned by v; when the function ends, v goes out of scope and the vector is therefore dropped.

let v : Vec<String> = Vec::new(); // create new vector
own_and_destroy(v);
// here we cannot use v anymore!
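A runnable version of the three signatures above, with small bodies added purely for illustration (the originals are empty): inspect only reads, change_inplace mutates the caller's vector, and own_and_destroy takes the vector over and drops it when it returns.

```rust
fn own_and_destroy(v: Vec<String>) {
    println!("got {} strings; they are freed when this function returns", v.len());
} // v goes out of scope here and the vector is dropped

fn inspect(v: &Vec<String>) {
    // Shared borrow: we can read the vector, but not modify it.
    println!("first element: {:?}", v.first());
}

fn change_inplace(v: &mut Vec<String>) {
    // Mutable borrow: the change is visible to the caller.
    v.push("added".to_string());
}

fn main() {
    let mut v: Vec<String> = vec!["hello".to_string()];
    inspect(&v);            // v is only borrowed...
    change_inplace(&mut v); // ...so it can still be used here
    inspect(&v);
    own_and_destroy(v);     // ownership moves into the function
    // inspect(&v);         // error[E0382] if uncommented: borrow of moved value `v`
}
```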

In Rust, functions return exactly one value (although it can be of a compound type, such as a tuple or an array). A value can be returned from the function body as its final expression (without a semicolon at the end), or with the return keyword for early returns; `return` also works on the last line, but using it there is considered poor style.

fn foo(x: i32) -> i32 {
    if x > 0 {
        return x;
    }
    x + 1
}

fn baz(x: i32) -> i32 {
    return x + 1; //poor style
}

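However, an early return written without return is not possible: an if without an else cannot produce a value. What we can do instead is rely on the fact that if/else is an expression. When every branch ends with an expression that has no ; at the end, the value of the branch taken becomes the value of the function:

fn foo(x: i32) -> i32 {
    if x > 0 {
        x
    } else {
        x + 1
    }
}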

Furthermore, we can use function pointers and execute stored function references later.

fn add(x: f64, y: f64) -> f64 { x + y }
// …
let f = add;
let res = f(5.0, 7.0); // arguments must be f64; f(5, 7) would not compile

let f2: fn(f64, f64) -> f64 = add; // with type declaration
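Function pointers become useful when a function is passed as an argument. In the sketch below, apply and mul are hypothetical helpers; apply accepts any function with the signature fn(f64, f64) -> f64, and both named functions and closures that capture nothing coerce to that type.

```rust
fn add(x: f64, y: f64) -> f64 {
    x + y
}

fn mul(x: f64, y: f64) -> f64 {
    x * y
}

// Hypothetical helper: calls whatever binary operation it is given.
fn apply(op: fn(f64, f64) -> f64, a: f64, b: f64) -> f64 {
    op(a, b)
}

fn main() {
    let ops: [fn(f64, f64) -> f64; 2] = [add, mul]; // a table of function pointers
    for op in ops {
        println!("{}", apply(op, 5.0, 7.0)); // 12, then 35
    }
    println!("{}", apply(|a, b| a - b, 5.0, 7.0)); // -2
}
```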

Note 1: Rust encourages you to use the snake_case convention for function names; the compiler warns about other styles.

Note 2: A function can be called from code that appears before its definition in the source file; the order of items does not matter.

Note 3: Functions can also be generic, appear inside trait implementations, and carry lifetime annotations. All these topics will be covered later.

Data types

Rust is a statically typed language, so data types must be known at compile time. However, the compiler can infer types in many cases, so there is no need to annotate everything.

Rust has a number of built-in data types:

  • primitive types : bool, char, i8, i16, i32, i64, u8, u16, u32, u64, usize, f32, f64, …
  • arrays – declared with [T;N] – fixed length arrays of N elements of type T.
  • tuples – heterogeneous sequences of elements (S, T,…)
  • slices: data types related to collections. A slice is a dynamically sized view into a contiguous sequence of elements of an underlying collection, used through a reference of type &[T] or &mut [T]. A slice does not own its elements; they are always borrowed from the original collection. What's more, you can make your own collection types indexable with the [] operator by implementing the Index and IndexMut traits.
  • str: the string slice, used in two variants, &str and &mut str. A str has the same memory layout as the byte slice [u8], but it always holds a valid UTF-8 sequence.
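A short sketch of slices in practice: one function taking &[i32] works for a whole array, a part of it, or a vector, and a function taking &str accepts both literals and borrowed String values. The function names are illustrative only.

```rust
// Accepts any contiguous sequence of i32: an array, a vector, or part of either.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

// &mut [T] lets a function modify elements in place (but not add or remove them).
fn double_all(values: &mut [i32]) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}

// Accepts both string literals and borrowed Strings.
fn first_word(text: &str) -> &str {
    text.split(' ').next().unwrap_or("")
}

fn main() {
    let mut array = [1, 2, 3, 4];
    let vector = vec![10, 20, 30];

    println!("{}", sum(&array));       // whole array: 10
    println!("{}", sum(&array[1..3])); // elements 1 and 2: 5
    println!("{}", sum(&vector));      // a &Vec coerces to a slice: 60

    double_all(&mut array[..2]);
    println!("{:?}", array);           // [2, 4, 3, 4]

    let owned = String::from("hello slices");
    println!("{} {}", first_word("hi there"), first_word(&owned)); // hi hello
}
```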

Note 1: When you declare a local array in your code using the [T; N] notation, the array is allocated on the stack.

Note 2: The standard library provides resizable arrays allocated on the heap: the std::vec::Vec type. Their size is not known at compile time, but their elements are guaranteed to be stored contiguously in memory. Internally, a vector is represented by three values:

  • pointer to the data
  • length
  • capacity

New elements can be pushed into a vector without reallocation until its length reaches its capacity; after that, the data is reallocated into a larger buffer. This type is provided by the standard library (it lives in the std namespace), but it is not a primitive type.
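The length/capacity distinction can be observed directly. The sketch below, with the illustrative name grows_in_place, reserves room up front and checks that pushes within the reserved capacity keep the buffer at the same address; the standard library documents that push does not reallocate while the reported capacity is sufficient. How much the capacity grows afterwards is an implementation detail, so the code does not rely on it.

```rust
// Pushes `n` numbers into a vector created with room for `n` elements and
// reports whether the heap buffer stayed at the same address the whole time.
fn grows_in_place(n: usize) -> bool {
    let mut v: Vec<u64> = Vec::with_capacity(n);
    let start = v.as_ptr();
    for i in 0..n as u64 {
        v.push(i); // len stays within capacity, so no reallocation is needed
    }
    v.as_ptr() == start && v.len() == n && v.capacity() >= n
}

fn main() {
    let mut v = Vec::new();
    for i in 0..10 {
        v.push(i);
        // Capacity grows in jumps; the exact values are not guaranteed.
        println!("len = {:2}, capacity = {}", v.len(), v.capacity());
    }
    println!("{}", grows_in_place(1000)); // true
}
```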

Note 3: The basic type for working with text is String, which is stored as a vector of bytes (Vec<u8>) with the additional guarantee that it is always a valid UTF-8 sequence. A String is not null-terminated and does not need to be, because length tracking and all reallocations are taken care of by the underlying Vec.
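A few lines make these guarantees visible. The Polish word below is just sample text and byte_and_char_counts is a made-up helper: len() counts the bytes of the UTF-8 encoding, not the characters, and building a String from raw bytes succeeds only if they are valid UTF-8.

```rust
// Returns (number of UTF-8 bytes, number of Unicode scalar values).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let word = String::from("zażółć");
    let (bytes, chars) = byte_and_char_counts(&word);
    println!("{} bytes, {} chars", bytes, chars); // 10 bytes, 6 chars

    let raw: Vec<u8> = word.clone().into_bytes(); // the underlying byte buffer
    println!("{:?}", String::from_utf8(raw));        // Ok: valid UTF-8
    println!("{:?}", String::from_utf8(vec![0xff])); // Err: 0xFF never occurs in UTF-8
}
```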

Note 4: String literals are represented as &str slices:

let hello: &'static str = "Hello, world!";

Literals have the 'static lifetime because they are stored directly in the final binary, and are therefore valid for the entire run of the program.

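&str slices can also be used to view into a String. The range is given in bytes, and a range that splits a multi-byte UTF-8 character causes a panic at run time: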

let s = String::from("hello world");
let hello = &s[0..5];

More advanced data types are built as combinations of built-in types; these are known as structs.

Structs

You can combine built-in data types into more complex formations using structs. A struct with named fields is similar to a tuple, except that each element has its own name. There are two kinds of structs in Rust:

  • tuple structs: the same as tuples, but with a name; they are most often used to represent fixed structures known in advance, such as a colour,
  • structs: like C structures. Fields, if present, must have a name and a type. If fields are of a borrowed type (references), they must also have lifetime annotations. A unit struct is the special case of a struct with no fields.

To define a structure, you use the struct keyword. Elementary examples look like this:

// A tuple struct
struct Color(u8, u8, u8);

// no fields – it’s unit struct
struct Nil;

struct Foo {
    text: String,
}

Structs can be instantiated, and their fields can be accessed by position (tuple structs) or by name (structs with named fields). They can be owned and borrowed just like simple variables:

let c = Color(255,0,0);
let r = c.0;

//unit struct instantiated
let nil = Nil; 

let foo = Foo{text: "Hello World!".to_string()}; // new instance
let borrow = &foo.text; //borrowed field
let own = foo.text; // now the field is owned by `own`; foo can no longer be used as a whole

Methods

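We can use structs in functions just by specifying our compound type in the argument list:

fn fuzz(f: Foo) -> Foo { f }                          // takes ownership and gives it back
fn fuzz2(mut f: Foo) -> Foo { f.text.push('!'); f }    // takes ownership through a mutable binding

fn fuzz3(f: &Foo) -> Foo { Foo { text: f.text.clone() } } // borrows, returns a new Foo
fn fuzz4(f: &mut Foo) -> &Foo { f.text.push('?'); f }     // borrows mutably, returns a shared view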

As the code base grows, maintaining many separate functions that operate on the same structures may require considerable effort, so Rust lets you group them around the structures as methods.

Methods have to be implemented within one (or more) impl blocks.

If a method needs to access instance data, its first parameter must refer to the instance in one of these ways:

&self
&mut self
[mut] self

Functions in an impl block can also be static and not refer to any instance; Rust calls these associated functions, and constructors are the typical example. Let's have a closer look at constructors:

impl Foo {
    fn new(t: String) -> Self {
        Foo{text: t}
    }

    fn from_nothing() -> Self {
        Foo{text: String::new()}
    }
}

The Self keyword (with a capital S) here means the current type, in our case Foo, and does not refer to any instance.

Rust has no function overloading: methods must have unique names regardless of their signatures, so you cannot have several constructors with the same name. It is customary to call the basic constructor new, while other constructors should follow a naming convention and begin with from_ or with_ (see https://deterministic.space/elegant-apis-in-rust.html for more information).

It is worth mentioning that you can still use the so-called struct literal syntax and instantiate structures with:

let foo = Foo{text: "Hello World!".to_string()};

This bypasses the constructor and may lead to unwanted consequences. For this reason Rust offers visibility modifiers for structs and their fields (covered later). To use the literal syntax you need access to all fields of the structure; code outside the defining module is often denied that access, and therefore it cannot bypass the constructor. This is precisely why you cannot instantiate a Vec by filling in its internal fields, but have to use Vec::new() (or the vec! macro, which creates and fills a vector in a single step).
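A minimal sketch of the idea, assuming a hypothetical module named model that keeps Foo's field private. Inside the module, struct literal syntax still works, because privacy in Rust is per module rather than per type; outside it, the only way to obtain a Foo is the public constructor.

```rust
mod model {
    // Only items marked pub are visible outside this module.
    pub struct Foo {
        text: String, // private field
    }

    impl Foo {
        pub fn new(t: String) -> Self {
            Foo { text: t } // literal syntax is fine inside the defining module
        }

        pub fn as_text(&self) -> &str {
            &self.text
        }
    }
}

use model::Foo;

fn main() {
    let foo = Foo::new("Hello World!".to_string()); // goes through the constructor
    // let bad = Foo { text: String::new() }; // error[E0451] if uncommented: `text` is private
    println!("{}", foo.as_text());
}
```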

Let’s add more methods to our Foo type:

impl Foo {
    fn into(self) -> String {
        self.text
    }

    fn sign(mut self) -> Self {
        self.text.push_str("[signature]");
        self
    }
}

These methods take self by value and therefore consume the original object. Calling foo.into() moves the foo instance into the method, which returns its inner String; the rest of the instance is dropped at the end of the method, and foo can no longer be used in the code.

In the case of the sign method, we consume the original instance as mutable, sign its content and return it as an instance of Foo, but every binding of foo used so far is invalidated:

let foo = Foo::new("foo".to_string());
let foo2 = foo.sign();
    
println!("{}", foo.text); // this won’t work!

The above code does not compile, because foo was moved when foo.sign() was called and finally ended up as foo2. Methods that consume self let you express operations after which the existing bindings must no longer be used, and the compiler checks this for you!

Methods can also borrow the instance (&self or &mut self), read or modify its contents, and possibly return some data.

impl Foo {
    fn as_text(&self) -> &str {
        &self.text
    }
    
    fn is_empty(&self) -> bool {
        self.text.is_empty()
    }   

    fn add(&mut self, t: &str) {
        self.text.push_str(t); // accepts string literals; a &String coerces to &str
    }
}
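Putting the pieces of this section together, here is one self-contained version of Foo with the constructors and methods shown above, followed by a short usage example. In this version add takes &str, the usual parameter type for borrowed text. Splitting the methods over several impl blocks, as the article does, would work the same way.

```rust
struct Foo {
    text: String,
}

impl Foo {
    fn new(t: String) -> Self {
        Foo { text: t }
    }

    fn from_nothing() -> Self {
        Foo { text: String::new() }
    }

    fn as_text(&self) -> &str {
        &self.text // shared borrow of the field
    }

    fn is_empty(&self) -> bool {
        self.text.is_empty()
    }

    fn add(&mut self, t: &str) {
        self.text.push_str(t); // modifies the instance in place
    }

    fn sign(mut self) -> Self {
        self.text.push_str("[signature]");
        self // ownership is handed back to the caller
    }

    fn into(self) -> String {
        self.text // consumes the instance, keeps only its text
    }
}

fn main() {
    let mut foo = Foo::from_nothing();
    println!("{}", foo.is_empty()); // true

    foo.add("Hello");
    println!("{}", foo.as_text()); // Hello

    let signed = foo.sign(); // foo is moved here and cannot be used any more
    let text: String = signed.into();
    println!("{}", text); // Hello[signature]

    let other = Foo::new("bye".to_string());
    println!("{}", other.as_text()); // bye
}
```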

Now let's think of other data types on which we need to check emptiness. In other words, we want a type interface that allows us to call fn is_empty(&self) -> bool.

This need leads us to another core Rust concept: traits. But more on that in the next article, so stay tuned.

