10

Vectors, strings, and slices

 4 years ago
source link: http://smallcultfollowing.com/babysteps/blog/2012/04/23/vectors-strings-and-slices/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Vectors, strings, and slices

Apr 23, 2012

We’ve been discussing a lot about how to manage vectors and strings in Rust. Graydon sent out an excellent proposal which allows for a great number of use cases to be elegant handled. However, I find the syntax somewhat misleading. I’ve proposed one alternative on the mailing list, but I now find I don’t like it, so I thought I’d brainstorm a bit and try to find something better.

There are really three use cases:

  • vectors and strings, which are either allocated on the task heap (@) or exchange heap (~);
  • slices, which are a cheap, stack-bound way to represent subvecs and substrings;
  • fixed-length vectors, which are mainly for C compatibility.

In this post I’m going to focus on the first two cases. The last use case (fixed-length vectors) is, I think, quite distinct from the first two, and we should separate it out. I have also omitted one use case which Graydon’s proposal included: fixed-length strings. I don’t think that the type str/10 is of much use, as it refers to a by-value string that is always 10 characters long. This is not like a fixed-length buffer that can hold strings up to 10 characters long. Rather, as I understand it, it can only store strings of exactly 10 characters. How often are we likely to want that?

Representation

The representation of a vector or string is something like:

struct rust_vec<T> {
    int fill;    // How many bytes are used
    int alloc;   // How many bytes are allocated
    T   data[0]; // Inline data
};

Note in particular that this structure does not have a fixed size. Rather, it will vary depending on how many items are present. This indirection is efficient but causes us a bit of trouble.

The representation of a slice which Graydon proposed is something like this:

struct slice<T> {
    T *data;
    int length;
};

Basically, the pair of a pointer and a length of memory.

As an aside, Fixed-length vectors, have yet a third representation: T[N]. In other words, just a C-like vector. So you can see that slices, “vectors as a whole”, and fixed-length vectors are quite different things to the compiler.

Proposal the first

One idea might be something like this:

Proposed   Graydon   Representation
[T]        [T]       slice<T>
vec<T>               rust_vec<T>
@vec<T>    [T]/@     rust_box<rust_vec<T>>*
~vec<T>    [T]/~     rust_vec<T>*

substr     str       slice<char>
str                  rust_vec<char>
@str       str/@     rust_box<rust_vec<char>>*
~str       str/~     rust_vec<char>*

The literal forms would basically stay the same as they are today. So [x1, x2] has the type vec<T> and "foo" has the type str.

I have intentionally drawn a big distinction between the type of a slice ([T]) and the type of a vector (vec<T>). I think these things are similar but different and people might be easily confused if the notation is too similar.

There are two types here that cannot be expressed in the original system: vec<T> and str. There is a good reason that these types are inexpressible: they do not have a fixed size. So, allowing them as types introduces a certain danger. For example, a function like the following could not be compiled:

 fn foo(x: @vec<T>) {
     let y = *x;
     ...
 }

After all, the size of the stack frame could not be correctly calculated, it would depend on how much data was in x. It is particularly annoying to deal with this situation due to the possibility of writing generic functions like

 fn gen_foo<U>(x: @U) {
     let y = *x;
     ...
 }

There are two solutions to this, which are really the same solution in different guises. The simplest solution is to say that type variables cannot be bound to the types vec<T> and str. This would prevent us from calling gen_foo() with a vector or a string. We’d also have some kind of special treatment around assignments so that let x = [1, 2, 3] ends up with a slice, I guess. Have to think a bit about that.

Alternatively, one could have a bound that indicates data of a known size. This kind would be required to manipulate instances of T by value. But this could rapidly become annoying. You might prefer to have the default be that types do have a known type and you have to say when the type variable might not. This is also ok although generally type bounds enable operators, not disable them.

Both solutions are somewhat annoying and I know that Graydon was trying to avoid them in his design.

Proposal the second

If we wanted to avoid the possibility of types whose size is not known, then we have to take a different tack. We can’t have the @ be a prefix anymore. I’d still rather it come near the front of the type, and not tacked on the end. So far, my preferred notation for this is something like the following:

Proposed   Graydon   Representation
[]T        [T]       slice<T>
[@]T       [T]/@     rust_box<rust_vec<T>>*
[~]T       [T]/~     rust_vec<T>*

""         str       slice<char>
"@"        str/@     rust_box<rust_vec<char>>*
"~"        str/~     rust_vec<char>*

Yes, that’s right, I just proposed using "" as the way to write the type for strings. Pretty wacky, I know. But it seems like we need a type name that has two parts, a begin and an end, so that we can stick the @ and ~ inside of them. An alternative might be to use more words (str vs tstr vs ustr or something).

The literal forms for vectors could be something like [@|...] and [~|...]. I don’t know about strings.

What about fixed-length vectors?

I don’t know. We could do [N]T, but then it kind of looks like a slice. I personally lean towards something like T * N. We also need an expression form. To be honest, I don’t care about this that much, it seems like macros could solve it too.

Summary…

None of these ideas seem perfect. I’m mostly tossing them out there to ensure we keep talking about it.

p.s., I just realized as I read over the post that I forgot about mutability. Something like [mut T] cannot be written as vec<mut T>, at least not today. Sigh. Well, I’ll post this blog post anyway. As I said, nothing here is perfect, just wanted to capture some of the things I’ve been thinking about.


Recommend

  • 6
    • smallcultfollowing.com 4 years ago
    • Cache

    Vectors, slices, and functions, oh my!

    Vectors, slices, and functions, oh my! May 14, 2012 I wanted to bring together the various ideas around vectors and function types into one post. The goals of these changes are to achieve orthogon...

  • 1
    • parsiya.net 4 years ago
    • Cache

    Go Slices and Their Oddities

    May 17, 2020 - 10 minute read - Comments - GoGo Slices and Their OdditiesA friend p...

  • 15

    Get slices of keys and values from a map · YourBasic GoGet slices of keys and values from a map yourbasic.org/golang You can use a range statement to extract slices of keys and values from a 

  • 11
    • yourbasic.org 4 years ago
    • Cache

    Make slices, maps and channels

    Make slices, maps and channels yourbasic.org/golang Slices, maps and

  • 9
    • www.devdungeon.com 4 years ago
    • Cache

    2D Arrays and Slices in Go

    2D Arrays and Slices in Go Submitted by NanoDano on Sun, 07/26/2015 - 22:47

  • 8

    Using Vectors in ActionScript 3 and Flash Player 10 Tuesday, August 19, 2008 One of the new ActionScript features included in the Flash Player 10 Public Beta is...

  • 23
    • www.svgrepo.com 4 years ago
    • Cache

    SVG Repo - Free SVG Vectors and Icons

    Browse 300.000+ SVG Vectors and IconsExplore, search and find the best fitting icons or vectors for your projects using wide variety vector library. Download free SVG Vectors for commercial use.Easy To Search...

  • 7

    Resources Floral Backgrounds, Patterns, and Vectors for Your Designs ByBrooke Arnold PublishedFebruary 25, 2022February 25, 2022...

  • 12

    Column Major and Row Major Vectors and Matrices (for games) Dec 26, 2022 This is yet another post about understanding the difference between column major and row major...

  • 4
    • www.sandordargo.com 2 years ago
    • Cache

    Vectors and unique pointers

    Vectors and unique pointers Sandor Dargo 8 hours ago2023-04-12T00:00:00+02:00In this post, I want to share some struggles I had twice during the last few months. For one of my examples, I wanted to initializ...

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK