Everything You Need to Know About Text String Manipulation

 2 years ago
source link: https://hackernoon.com/everything-you-need-to-know-about-text-string-manipulation-3hgv35cs

@tom2Rutkat

Front End Engineer + Blockchain Advocate

Whether you are new to coding or an experienced coder, this guide details how to manipulate text strings just like the pros. It is useful if you haven't worked with strings or user-facing web applications. You will quickly go from beginner to expert using JavaScript, built-in methods, and powerful regular expressions.

Have you wondered how words get censored on the internet? Perhaps you want to know why your username on apps has to conform to specific rules? This is done through string manipulation using code such as JavaScript. A string is simply the name for a piece of data that contains text, and it can consist of letters, numbers, and symbols.

Why is it important? Every software application with a presentation layer, such as a web app, applies some form of string manipulation, and it is the foundation of many algorithms. Think about how it applies to business ideas as well. Grammarly is an excellent example of a business that is all about string manipulation.

Text And Strings

The first thing to consider is how to engage text manipulation from a visual perspective. For example, if you're a non-coder or just a human being, you know you can write text on paper, on your smartphone, computer, and even rice. Okay, maybe not rice. The writing can occur from left-to-right, top-to-bottom, right-handed, left-handed, etc. Afterward, you can manipulate what you wrote with an eraser, scratching it out, or tapping the backspace key.

From a coder's perspective, it doesn't work the same way, except when writing the actual code. The code instructions for manipulating strings have restrictions and specific methods. You will learn these methods here but let's start with a visual approach to envision how code will do the magical transformations.

Direction

Like writing, strings can be manipulated from left-to-right and right-to-left. The length of a string can range from a single space to pages of text, but most commonly in code, a string will not be longer than a sentence. A string can be a username, a phone number, a snippet of code, a poem, etc. When working with a specific coding language, there are built-in methods to use, or you can create your own custom method. A combination of these methods can manipulate text to do virtually whatever you want. You can become a string master with the force of practice.

Besides processing a string from left-to-right or right-to-left, it can be broken down and manipulated to individual characters using the number representing the position of any character. This is known as the index value of the string. For example, the string "Hello!" contains 6 characters, so your code can directly access any letter by indicating a corresponding index number.

"Hello!"
 123456 (number represents position)

Traversing

Several coding methods will process the string in this ascending numerical order; however, since computers count from zero, the first position is always 0. To be more accurate, I should say that the computer is traversing, not processing, strings. The difference is that "processing" indicates an effect happens, whereas "traversing" indicates a passage or travel across something. When dealing with code instructions, you should be conscious of the computing resources used: you may not need to process every character in a string, but rather traverse to the individual character you need to change.

For example, say your objective is to remove punctuation. You have several approaches to remove the "!" from "Hello!". You can use a method to find the position of "!" or you can access the last character of the string. The options include getting the length of the string, getting the index of "!", or traversing the string in reverse. If you use the length property, you have to remember to subtract 1 since counting starts at zero. Also, spaces count as part of the string and have an index position, thus increasing the length of the string.

The INDEX number represents the position of a character in a string.

"Hello!"
 012345 character positions

"Hello!".length - 1
Length is a property of a string.

Here are methods to get the position of a character in a string:

"Hello!".indexOf("!") 
Find the first position of a character searching from left-to-right.

"Hello!".lastIndexOf("!") 
Find the last position of a character searching from right-to-left.

"Hello!".length - 1
Find the last character in a string.

All give 5 as the result. You can do the opposite with the charAt() method which returns the character from a string specified by the position.

"Hello!".charAt(5)
Result is "!"
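The punctuation example above can be sketched in code. This is only a minimal illustration: the function names are hypothetical, and it borrows the slice method, which is covered later in this article, to cut the string once the position is known.

```javascript
// Hypothetical helper: drop the last character only if it is "!".
function removeTrailingBang(text) {
  const lastIndex = text.length - 1; // position of the last character
  if (text.charAt(lastIndex) === "!") {
    return text.slice(0, lastIndex); // keep everything before it
  }
  return text;
}

// Hypothetical helper: find the first "!" with indexOf and cut it out.
function removeFirstBang(text) {
  const position = text.indexOf("!"); // indexOf returns -1 when nothing is found
  if (position === -1) {
    return text;
  }
  return text.slice(0, position) + text.slice(position + 1);
}

console.log(removeTrailingBang("Hello!")); // "Hello"
console.log(removeFirstBang("Hi! there")); // "Hi there"
```

The first version only looks at one character, which fits the idea of traversing straight to the position you need instead of processing the whole string.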

One Character

Now you know the basics of traversing a string one character at a time, which are from the left, from the right, and from the end using index numbers. However, not all methods return the position of the character you seek. You may prefer a result as a boolean data type instead. Meaning your search is a test that returns true or false.

Boolean test methods: includes, startsWith, and endsWith.

"Hello!".includes("!")
Returns true

"Hello!".startsWith("!")
Returns false

"Hello!".endsWith("!")
Returns true
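As a small illustration of where true/false answers are enough, here is a sketch that asks three yes/no questions about a link. The function name, property names, and URL are made up for this example.

```javascript
// Hypothetical helper: answer three yes/no questions about a link.
function describeLink(url) {
  return {
    isSecure: url.startsWith("https://"), // true only if the text begins this way
    isPng: url.endsWith(".png"),          // true only if the text ends this way
    hasQuery: url.includes("?"),          // true if "?" appears anywhere
  };
}

console.log(describeLink("https://example.com/heart.png"));
// { isSecure: true, isPng: true, hasQuery: false }
```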

These character checks are not as useful as finding the position of a character, because a true/false answer does not tell your algorithm where to modify the string. Besides, there are more powerful methods for true/false checks, which will be described later. Up to this point, we have learned to traverse a string left-to-right and right-to-left, so what's the next step? Modification!

We can use several built-in methods or create our own for changing the text in a string. Let's start with the methods which don't require indicating a search query or index position. Since humans care more about uppercase and lowercase letters than computers, we can instantly transform an entire string using these two methods:

"Hello!".toUpperCase()
Result "HELLO!"

"Hello!".toLowerCase()
Result "hello!"

If you have seen a camel, then you know they have humps, and in programming, when code looksLikeThis, it is called camel case, because it has humps and no spaces (when the first letter is also capitalized, LikeThis, it is often called Pascal case). You will have to traverse and recognize this type someday. Changing the case makes text easier for humans to read, because who likes to read "a sEnTEnCe liKE ThiS!?" Case conversion is also useful for web apps like blogs, which take an article title and create part of a URL known as a slug.

Example:
Article name "Mastering String Manipulation"
Slug url "domain.com/mastering-string-manipulation/"
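One way the slug above could be produced is sketched here. The function name is hypothetical, and the sketch assumes the title contains only letters and spaces. It also uses a regular expression (/\s+/g, meaning "every run of whitespace"), a tool that is explained later in this article.

```javascript
// Hypothetical helper: turn an article title into a URL slug.
// Assumes the title contains only letters and spaces.
function toSlug(title) {
  return title
    .trim()                // drop spaces at both ends
    .toLowerCase()         // "Mastering" becomes "mastering"
    .replace(/\s+/g, "-"); // each run of whitespace becomes one hyphen
}

const slug = toSlug("Mastering String Manipulation");
console.log(slug);                  // "mastering-string-manipulation"
console.log(`domain.com/${slug}/`); // "domain.com/mastering-string-manipulation/"
```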

Since there are multiple methods to get the same result, let's begin with this example of combining strings into one. This is known as concatenation. You can use the + symbol or the concat method. Please note that JavaScript does not automatically enforce data types, so you should ensure that the values are strings, as opposed to numbers, arrays, or booleans, when using +. This topic deserves an entire article of its own. Without data type enforcement, erroneous output can occur as a result of type coercion, meaning the + sign can silently convert a number to a string.

"Hello" + "World"
Result "HelloWorld"

"Hello".concat("World")
Result "HelloWorld"

"12" + 12
Result "1212", not 24.
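To make the coercion pitfall concrete, here is a short sketch. The variable names are made up, and the text value stands in for something typed into a form field. Converting explicitly with Number() or String() is one common way to say which behavior you want.

```javascript
const formValue = "12"; // hypothetical text read from an input field
const bonus = 12;       // a real number

const glued = formValue + bonus;         // the number is coerced to a string
const added = Number(formValue) + bonus; // convert first, then add
const label = String(bonus) + " items";  // explicit conversion to a string

console.log(glued);        // "1212"
console.log(added);        // 24
console.log(label);        // "12 items"
console.log(typeof glued); // "string"
```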

A newer way to concatenate strings is using template literals, which wrap the text in back-ticks ` and place each value inside curly braces {} after the $ symbol. Yes, using those three symbols is required. You will see this technique behind emails as well as websites that customize their output based on the user's information.

var myString = "Hello"
var string2 = "World"
console.log(`${myString} ${string2}`)
Result "Hello World"
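Here is a sketch of the personalized-message idea. The user object and its field names are assumptions made up for this example. Note that any expression, not just a variable name, can go inside the braces.

```javascript
// Hypothetical user record; the field names are assumptions for this example.
const user = { firstName: "Tom", unreadCount: 3 };

const greeting = `Hello ${user.firstName}, you have ${user.unreadCount} unread messages.`;
const shout = `${user.firstName.toUpperCase()}!`; // a method call inside the braces

console.log(greeting); // "Hello Tom, you have 3 unread messages."
console.log(shout);    // "TOM!"
```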

Previously I stated that empty spaces count towards the length of a string. In other words, they occupy a space in a string and can be manipulated as well. Since we want to be efficient in saving data as well as making text easy to read, we want to prevent unnecessary blank space and this can be done with the trim method.

It removes empty spaces at the beginning and end of a string but not in the middle. If you want to remove empty space in the middle of a string, you have to utilize a more powerful method known as a "regular expression," which will be described later.

"  Hello World.  ".trim()
Result "Hello World."

To do the opposite, you can pad a string at the beginning or end with any character. Let's say your web app deals with sensitive information like credit cards, or you have ID numbers that have to conform to a specific length. You can use the padStart and padEnd methods for this. The first argument is the total length of the resulting string, not the number of characters to add. For example, a credit card number is saved in the app, but you only want to show the last four digits prefixed with the * symbol.

"4444".padStart(8, "*")
Result "****4444"

"1234".padStart(8, "0")
Result "00001234"
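The credit card scenario could look roughly like this. The function names and the sample card number are hypothetical, and the sketch borrows slice with a negative index, covered later in this article, to grab the last four digits.

```javascript
// Hypothetical helper: show only the last four digits of a card number.
function maskCardNumber(cardNumber) {
  const lastFour = cardNumber.slice(-4);            // last four characters
  return lastFour.padStart(cardNumber.length, "*"); // pad back to the full length
}

// Hypothetical helper: make every ID six characters long.
function formatId(idNumber) {
  return String(idNumber).padStart(6, "0");
}

console.log(maskCardNumber("4111111111114444")); // "************4444"
console.log(formatId(42));                       // "000042"
console.log("Total".padEnd(8, ".") + "$5");      // "Total...$5"
```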

Besides concatenating strings, you can also repeat them with a multiplier. It's uncommon to repeat text, so the method will be more useful for symbols such as periods. For example, when you need to truncate a string and indicate to the reader that the string continues, you can use ellipses like this... It could also be useful for songs where lyrics are repeated. Actually, it's rare to see this method in code.

"Hello-".repeat(3)
Result "Hello-Hello-Hello-"
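The truncation idea mentioned above might be sketched like this. The function name and the length limit are assumptions, and slice (covered later in this article) does the cutting.

```javascript
// Hypothetical helper: shorten long text and mark it with an ellipsis.
function truncate(text, maxLength) {
  if (text.length <= maxLength) {
    return text; // short enough, leave it alone
  }
  return text.slice(0, maxLength) + ".".repeat(3);
}

console.log(truncate("Everything You Need to Know", 10)); // "Everything..."
console.log(truncate("Hello", 10));                       // "Hello"
```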

Pizza Slice

Let's expand our character searches!

Using the previous search methods, we are only able to retrieve one character at a time from a string. What if we want to select a word or a section of a string using an index range? Well, we can do that by slicing a pizza and eating the slice we want. Almost! The string method is called slice, so a pizza slice is a good metaphor. For this, you pass in the start position and, optionally, the end position of your selection; the character at the end position is not included. Either position can be a negative number, which counts back from the end of the string. You may think, wouldn't it be easier to just match a word inside a string? Well, yes, but in some cases, coders may not be able to predict what strings they will encounter, or the string will have a pre-determined length.

"Hello World".slice(6)
Result "World"

"Hello World".slice(6, 8)
Result "Wo"

"Hello World".slice(-3)
Result "rld"
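Slicing by position suits strings with a pre-determined length. As a sketch, assume a phone number that is always written in a fixed 12-character format; the number and the variable names are made up.

```javascript
const phone = "555-867-5309"; // hypothetical number, always 12 characters long

const areaCode = phone.slice(0, 3); // positions 0, 1, 2 (position 3 is excluded)
const exchange = phone.slice(4, 7); // positions 4, 5, 6
const lastFour = phone.slice(-4);   // count back four from the end

console.log(areaCode); // "555"
console.log(exchange); // "867"
console.log(lastFour); // "5309"
```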

Up to this point, you have learned to traverse strings from the left and from the right, get character positions, do boolean tests, transform character cases, concatenate strings, remove empty space, pad, repeat strings, and extract substrings. How about we learn how to revise our strings with the replace method? Scenarios for this include removing explicit words, swapping the first name with the last name, or swapping "-" for an empty space " ".

The difference between replace and the previous methods in this article is that replace accepts both strings and regular expressions as search queries. It also accepts a function as a second parameter, but we won't go into custom functions at this time. With replace, you don't need to rely on index positions, but you do need to be familiar with regular expressions (regexp for short) because that is how you can replace multiple instances of the search query; a plain string query replaces only the first instance. Note the usage of a regular expression, with forward slashes surrounding the search term.

"Very bad word".replace("bad", "good")
Result "Very good word"

"Very bad bad word".replace("bad", "good")
Result "Very good bad word"

"Very bad bad word".replace(/bad/, "good")
Result "Very good bad word"

"Very bad bad word".replace(/bad/g, "good")
Result "Very good good word"
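Two of the scenarios mentioned earlier, swapping "-" for a space and hiding an unwanted word, can be sketched as follows. The function name and sample strings are made up, and the censored word is hard-coded to keep the example small.

```javascript
// Swap every hyphen for a space; the g flag replaces all of them.
const slugText = "mastering-string-manipulation";
const readable = slugText.replace(/-/g, " ");

// Hypothetical helper: hide every lowercase "bad" in a sentence.
function censorBad(text) {
  return text.replace(/bad/g, "***");
}

console.log(readable);                       // "mastering string manipulation"
console.log(censorBad("Very bad bad word")); // "Very *** *** word"
```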

Cryptic Patterns

Are you beginning to feel the power of string manipulation? You are slowly becoming an expert. A regexp is denoted by forward slashes / on either side of the search word, and the letter g after the second slash indicates a global search, which will replace multiple instances of the word inside the string. Generally, when searching for one instance of a word, it's better to use indexOf() and replace() with a plain string for faster execution speed.

Otherwise, to understand regular expressions, you have to memorize what the symbols on your keyboard mean - many symbols, and letter case matters too. In fact, there's nothing regular about "regular expressions". They should be called "cryptic patterns" because no human being can read them without looking up the meaning of the symbols used. In plain language, you can also say they are string-searching algorithms.

Magic Wand

Before I show you some of the characters used, I would like to paint you a picture of the traversing that happens using regexp. First, imagine a magic wand in your hand. Waving the magic wand releases magical stars onto the string which modify it to the desired string you want. Each star represents a symbol in the regular expression, and that is what you have to come up with as a search pattern.

Regular expressions are truly powerful search techniques. You can find a needle in a haystack instantly. Many input forms on the web use regular expressions to validate or convert text into specific formats such as zip codes, phone numbers, domain names, and currency values, and the list goes on. Do note that there are different regular expression engines depending on the programming language, and the following is specific to JavaScript.

/term/
A regexp literal is always contained inside two forward slashes. "A/B/C" is not a regexp. Many characters and symbols between the slashes represent something other than the symbol itself.

/abc/
Plain alphabetical characters without symbols match themselves, so this is equivalent to a regular search for the consecutive string "abc".

/\$/
An explicit search for a symbol has to be prefixed with a backward slash \, in this case, it's the dollar symbol. It's called escaping even though none of them will run away. The symbols still need to escape from the wrath of your cryptic search desires.
/^abc/
and
/abc$/
These symbols don't have to be escaped. They are the caret ^ and the dollar sign $. Their purpose is to restrict the search to the beginning and end of a string, respectively. This is known as anchoring, so they can be called anchors. In this case, it means that if "abc" is in the middle of "xyzabczyx", it will be ignored. ^ means the string must start with "abc" and $ means that the string must end with "abc". You can apply one or both.

What if you don't want to search for an alphabetical character or a symbol, but for formatting in the string? Just as an empty space has meaning in code, so do a tab, a new line, and a carriage return. These can be searched for using a combination of the backslash and one letter. For brevity, the surrounding slashes are excluded.

\n find a newline
\t find a tab
\r find a carriage return

This is mind-blowing, right? You can manipulate empty space and look for invisible metacharacters which control formatting using regexp. Let's try a regexp example based on what we know so far. We want to match a specific dollar amount at the beginning of a string, $10.xx, with any cent amount.

/^\$10\.\d\d/

We are using ^ to match the start
-then a backslash \ to escape the dollar $ sign
-the number 10 followed by an escaped period \.
-the escaped \d represents any digit 0-9, so we have it twice
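To try the pattern above, you can store it in a variable and check strings against it with the test method (which returns true or false) or pull out the matched text with the match method. The variable names and sample strings here are made up for illustration.

```javascript
const tenDollarPattern = /^\$10\.\d\d/;

const startsWithPrice = tenDollarPattern.test("$10.99 for shipping"); // price at the start
const priceInMiddle = tenDollarPattern.test("Total: $10.99");         // ^ rejects this one
const missingDigit = tenDollarPattern.test("$10.5");                  // needs two digits

const found = "$10.99 for shipping".match(tenDollarPattern)[0];

console.log(startsWithPrice); // true
console.log(priceInMiddle);   // false
console.log(missingDigit);    // false
console.log(found);           // "$10.99"
```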

As previously mentioned, adding a backslash to certain letters changes the search pattern. Here are some search patterns with the backslash and letter combination.

\w matches any word character (a letter, a digit, or an underscore)
\d matches any digit
\s matches any whitespace (a space, tab, or line break)

In addition to that, you can match the opposite with the capital-letter equivalents.

\W matches any character that is not a word character
\D matches any character that is not a digit
\S matches any character that is not whitespace
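These letter patterns make it possible to remove empty space from the middle of a string, which trim cannot do. Here is a brief sketch combining them with replace and the g flag; the sample strings and variable names are made up.

```javascript
// Remove every whitespace character, including those in the middle.
const compact = "Hello   World".replace(/\s/g, "");

// Keep only the digits of a phone number by removing every non-digit.
const digitsOnly = "(555) 867-5309".replace(/\D/g, "");

// Remove everything that is not a letter, digit, or underscore.
const wordCharsOnly = "user_name-42!".replace(/\W/g, "");

console.log(compact);       // "HelloWorld"
console.log(digitsOnly);    // "5558675309"
console.log(wordCharsOnly); // "user_name42"
```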

Globally Insensitive

Now that you are getting more comfortable with the possibilities of regular expressions, you need to be aware of the letters "g" and "i" at the ending of the regexp term, right after the second forward slash. These are known as flags that modify your search. The "g" means global, so it will return more than one result match if available, while the "i" means insensitive in regards to text case. Uppercase or lowercase will not matter using this flag.

/term/g Finds multiple instances, not just the first
/term/i Finds uppercase and lowercase characters
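The effect of the two flags is easiest to see side by side. In this sketch, the sample sentence and variable names are made up; the match method returns the matches it finds as a list.

```javascript
const sentence = "Bad habits are bad, BAD news.";

const firstOnly = sentence.match(/bad/)[0];      // no flags: first exact-case match
const allExactCase = sentence.match(/bad/g);     // g: every exact-case match
const allAnyCase = sentence.match(/bad/gi);      // g and i together
const fixed = sentence.replace(/bad/gi, "good"); // flags work with replace too

console.log(firstOnly);    // "bad"
console.log(allExactCase); // ["bad"]
console.log(allAnyCase);   // ["Bad", "bad", "BAD"]
console.log(fixed);        // "good habits are good, good news."
```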

To expand on your searches, here's the next addition of complexity. You may want to find a combination of letters, numbers, or symbols. You can do this by grouping inside parentheses () and brackets []. The brackets define a set of characters and can contain ranges such as 0-9, A-Z uppercase, or a-z lowercase.

You can use multiple dashes for multiple ranges inside a single set of brackets. The parentheses are not useful alone, but they are when you have additional search terms in one regexp. To throw in a monkey wrench, the caret ^ symbol at the start of a bracket set will negate the search.

/[abc]/ Matches any one of the letters a, b, or c.
/[0-7]/ Matches a digit 0-7 anywhere in the string.
/[^0-7]/ Matches any character that is not a digit 0-7, anywhere in the string.

[0-9] is identical to \d for digits, while \w is identical to [A-Za-z0-9_], which covers letters, digits, and the underscore.

Using parentheses () is useful when you want to search for more than one pattern, such as international phone numbers, while brackets [] are for searching sets. When using parentheses in your search, you may also include the pipe symbol | as an OR operator. This means your result can be the search pattern on either side of the pipe. This is known as alternation. Here are examples:

/[abc](123)/ matches a, b, or c, followed by 123
/gr[ae]y/ matches gray or grey
/(gray|grey)/ matches gray or grey
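The three patterns above can be checked against sample text like this. The sample strings and variable names are made up; the g flag is added so that match returns every hit.

```javascript
const colors = "gray grey groy";

const withBrackets = colors.match(/gr[ae]y/g);       // a set of two letters
const withAlternation = colors.match(/(gray|grey)/g); // two whole words

const setThenGroup = /[abc](123)/;
const startsWithB = setThenGroup.test("b123"); // b is in the set
const startsWithD = setThenGroup.test("d123"); // d is not

console.log(withBrackets);    // ["gray", "grey"]
console.log(withAlternation); // ["gray", "grey"]
console.log(startsWithB);     // true
console.log(startsWithD);     // false
```

Both spellings are found either way, and "groy" is skipped because o is neither in the set nor in one of the alternatives.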

Quantity to Match

Do you want to match a specific number of letters or numbers? Perhaps 0 or 1, 1 or many, or exactly 4. It's all possible with regular expression quantifiers. Here are the quantifier symbols and how you can use them. We will use the letter "a" as part of the example.

/a*/ Match 0 or more letter a
/a+/ Match 1 or more letter a
/a?/ Match 0 or 1 letter a
/a{4}/ Match exactly 4 consecutive letters a.
/a{2,3}/ Match between 2-3 letters a.
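Quantifiers are what make format checks such as zip codes possible. The following sketch assumes a five-digit zip code format and uses made-up variable names; it combines quantifiers with the anchors and \d described earlier.

```javascript
const zipPattern = /^\d{5}$/; // exactly five digits and nothing else (assumed format)
const validZip = zipPattern.test("90210");
const shortZip = zipPattern.test("9021");

const colorPattern = /^colou?r$/; // the u may appear 0 or 1 times
const american = colorPattern.test("color");
const british = colorPattern.test("colour");

const runOfA = "baaad".match(/a+/)[0];     // 1 or more: takes as many as it can
const upToThree = "aaaa".match(/a{2,3}/)[0]; // stops at the upper limit of 3
const zeroIsFine = /^ba*d$/.test("bd");      // 0 or more: no a at all still matches

console.log(validZip, shortZip);  // true false
console.log(american, british);   // true true
console.log(runOfA);              // "aaa"
console.log(upToThree);           // "aaa"
console.log(zeroIsFine);          // true
```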

The possibilities don't stop here. This is why algorithms utilize regular expressions regularly, so becoming an expert in them is going to take you a long way. In total, there are 11 metacharacters available for regular expressions.
They are:

\ ^ $ . | ? * + () [] {}

Each one has a purpose.

Another practical example is to find html tags because they are the foundation of websites. Let's think this through before typing out the expression. We need at least one letter because all tags start with a letter, and while it should be lowercase, we may encounter legacy html that is capitalized. Next, we shall expect more letters or a number, as in h1 tags. The * will match zero or more of these characters; if we wanted to limit the amount, we could use {} instead. The following will capture opening html tags without attributes:

/<[A-Za-z][A-Za-z0-9]*>/g Matches html tags
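Running the pattern against a small snippet shows what it does and does not pick up. The html string and variable names below are made up for illustration.

```javascript
const html = '<div><h1>Title</h1><p class="intro">Text</p></div>';
const tagPattern = /<[A-Za-z][A-Za-z0-9]*>/g;

const openingTags = html.match(tagPattern);

console.log(openingTags); // ["<div>", "<h1>"]
```

Closing tags such as </h1> are skipped because the character after < is a slash rather than a letter, and the p tag is skipped because the space before its attribute is not allowed by the pattern.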

Finally, there is another advanced concept, if regular expressions weren't advanced enough. It is called the lookahead. There's a positive and a negative lookahead. It must be placed inside parentheses and begin with a question mark ?. Essentially, a positive lookahead checks that the search pattern is followed by something without capturing that something, while a negative lookahead matches the pattern only when it is not followed by something else. This is useful when making a combined search pattern by grouping. To demonstrate, let's search for a dollar value in a string that is followed by "USD", but we don't want to capture the "USD". We will use the positive lookahead using (?= and the negative lookahead using (?!.

/\$30(?=USD)/ Matches $30 from "The product costs $30USD"
/\$30(?!USD)/ Matches $30 from "The USD value is $30"
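Here is a sketch of both lookaheads in action with the match method. The last pattern generalizes the idea to any dollar amount using \d+; that pattern and the sample strings are assumptions added for illustration.

```javascript
const withCurrency = "The product costs $30USD";
const withoutCurrency = "The USD value is $30";

const positiveHit = withCurrency.match(/\$30(?=USD)/)[0];    // "USD" is checked, not captured
const negativeMiss = withCurrency.match(/\$30(?!USD)/);      // followed by USD, so no match
const negativeHit = withoutCurrency.match(/\$30(?!USD)/)[0]; // nothing follows, so it matches

// Any dollar amount, but only when "USD" comes right after it.
const usdAmounts = "Pay $45USD or $50CAD".match(/\$\d+(?=USD)/g);

console.log(positiveHit);  // "$30"
console.log(negativeMiss); // null
console.log(negativeHit);  // "$30"
console.log(usdAmounts);   // ["$45"]
```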

Begin Your Journey

Now you have gone through the fundamentals of querying, matching, and modifying the data primitives of JavaScript known as strings. Just reading this won't give you the ability to work with these methods. You must put them into practice using code editors and internet browsers. The examples provided in this article can be used to test them for yourself, and you should retype them instead of copying and pasting. So go forth and build up your skills in coding with JavaScript.

Article Photo credit https://unsplash.com/@agni11
