Evolution of Ruby String - from 1.8 to 2.5
Introduction
In Ruby, a string is represented as an instance of the String class. This class evolved significantly between Ruby 1.8 and Ruby 2.5, so the purpose of this article is to detail the main changes introduced by each major release.
Feel free to leave a comment if you want to share additional information.
Before we start, feel free to have a look at my latest project HERE
1.8 to 1.9
Let’s have a look at the main differences in the String class between 1.8 and 1.9.
The first difference is that the Enumerable module is included in the String class in Ruby 1.8, whereas it is no longer included in Ruby 1.9.
The second difference is that a bunch of new instance methods are available for the String class in Ruby 1.9.
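The Enumerable change can be illustrated with a short snippet (run on Ruby 1.9 or later; on 1.8, `String.include?(Enumerable)` would return `true` and `each` would iterate over lines):

```ruby
# On Ruby 1.9+, String no longer includes Enumerable;
# the explicit per-line and per-character iterators are used instead.
puts String.include?(Enumerable)  # => false on 1.9+
p "a\nb".each_line.to_a           # => ["a\n", "b"]
p "ab".each_char.to_a             # => ["a", "b"]
```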
Feel free to read the Ruby Object Model article if you are unfamiliar with the Object class and the ancestor chain.
But the most important change is that in Ruby 1.8 strings are treated as sequences of bytes, whereas in Ruby 1.9 they are treated as sequences of codepoints.
A sequence of codepoints, coupled with a specific encoding, is what allows Ruby to handle encodings. On disk, a string is stored as a sequence of bytes, and an encoding simply specifies how to convert those bytes into codepoints.
So, from Ruby 1.9, Ruby natively handles string encodings, whereas in 1.8 the iconv library was required to do this job.
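A quick sketch of the bytes-versus-codepoints distinction, assuming a UTF-8 source file (the string value itself is arbitrary):

```ruby
s = "résumé"
p s.bytes.size   # => 8 — each "é" takes 2 bytes in UTF-8
p s.chars.size   # => 6 — six codepoints
# Reinterpreting the same bytes as binary yields one "character" per byte:
p s.dup.force_encoding("BINARY").chars.size  # => 8
```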
Note that in Ruby 1.9 the default source encoding (and thus the default encoding of string literals) is US-ASCII, unless a magic comment specifies otherwise.
Note also that the iconv library is deprecated in Ruby 1.9.
1.9 to 2.0
In Ruby 2.0, UTF-8 is the default encoding of string literals in a running program. In effect, the default source encoding changed from US-ASCII in Ruby 1.9 to UTF-8 in Ruby 2.0.
This behavior is somewhat similar to Java, which uses UTF-16 as its internal string encoding.
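This is easy to observe on Ruby 2.0+ (assuming no magic comment overrides the source encoding):

```ruby
# With no magic comment, a literal's encoding follows the source encoding:
# US-ASCII in 1.9, UTF-8 in 2.0+.
p "hello".encoding  # => #<Encoding:UTF-8> on Ruby 2.0+
p "héllo".encoding  # => #<Encoding:UTF-8>
```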
Note that from Ruby 2.0, the iconv library is no longer part of the standard library.
2.0 to 2.1
In Ruby 2.0, encoding a string from one encoding to the same encoding (UTF-8 to UTF-8, for example) results in a no-op.
So in Ruby 2.0, a UTF-8 string that we explicitly encode to UTF-8 is returned as-is, without replacing the invalid codepoints: the invalid: :replace option is simply ignored.
In Ruby 2.1, the invalid: :replace option is honored, and the default replacement character � replaces each invalid byte sequence.
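A sketch of the behavior on Ruby 2.1+, using the byte \xFF, which is invalid in UTF-8:

```ruby
s = "abc\xFF".force_encoding("UTF-8")
p s.valid_encoding?                     # => false
# Ruby 2.0: encoding to the same encoding is a no-op, the invalid byte survives.
# Ruby 2.1+: each invalid byte sequence is replaced with "\uFFFD" (�).
p s.encode("UTF-8", invalid: :replace)  # => "abc�" on 2.1+
```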
2.1 to 2.5
Since Ruby 2.1, in addition to many performance improvements, the String class gained two main features:
The frozen_string_literal: true magic comment (since Ruby 2.3)
Case conversion for non-ASCII strings (since Ruby 2.4)
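Both features can be sketched in a few lines (the strings are illustrative; the frozen? result assumes the magic comment is the first line of the file):

```ruby
# frozen_string_literal: true
# With the magic comment above (Ruby 2.3+), string literals are frozen:
p "foo".frozen?      # => true when the comment is the file's first line

# Unicode-aware case conversion (Ruby 2.4+):
p "éléphant".upcase  # => "ÉLÉPHANT" (non-ASCII letters were left unchanged before 2.4)
p "ÉLÉPHANT".downcase
```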
Benchmarking string allocations
The following benchmark uses the benchmark-ips gem and was run against each version of Ruby, from 1.8 to 2.5.
It shows that string allocation in Ruby 2.5 is about 4 times more efficient than in Ruby 1.8.
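A minimal version of such a benchmark can be sketched with the stdlib Benchmark module instead of the benchmark-ips gem used for the original measurements (the constant N and the measured block are illustrative):

```ruby
require "benchmark"

N = 1_000_000
# Measure how long N string literal allocations take.
seconds = Benchmark.realtime do
  N.times { "foo" }
end
puts format("%d allocations in %.3fs", N, seconds)
```

benchmark-ips goes further by running timed warmup and measurement phases and reporting iterations per second, which makes results comparable across Ruby versions.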
Voilà!
Thank you for taking the time to read this post :-)
Feel free to 👏 and share this article if it has been useful for you. 🚀
Here is a link to my last article: 5 Ruby tips you probably don’t know.