
Extract substring between two markers in Python

 2 years ago
source link: https://thispointer.com/extract-substring-between-two-markers-in-python/

In this article, we will learn how to extract a substring between two markers in a string using Python. But first, what is a substring?

A substring is a contiguous sequence of characters that is part of a larger string. Substrings are often obtained by slicing or with the split() method. Now let's look at the ways to extract the substring that lies between two markers.


Extract substring between two markers using Regex

The first method uses the search() function of the re module. The name re stands for Regular Expression, and the module comes bundled with Python.

re.search(pattern, string) scans the string and returns a Match object for the first location where the pattern matches. It stops as soon as it finds that first match. If no match is found, it returns None. The matched text is read from the Match object, for example with group(1) for the first capturing group.

Let's see an example:

import re

sampleStr = 'ilearncodingfrom;thispointer.com/articles'

try:
    # here ; and / are our two markers 
    # in which string can be found. 
    marker1 = ';'
    marker2 = '/'
    regexPattern = marker1 + '(.+?)' + marker2
    str_found = re.search(regexPattern, sampleStr).group(1)
except AttributeError:
    # Attribute error is expected if string 
    # is not found between given markers
    str_found = 'Nothing found between two markers'

print(str_found)

OUTPUT :

thispointer.com

In the code above, re.search() finds the substring between the two markers (; and /) in the variable sampleStr. The group (.+?) captures one or more characters and is non-greedy, so it stops at the first occurrence of the second marker. If the pattern does not match, re.search() returns None, and calling .group(1) on None raises an AttributeError. The except block catches that error, so instead of crashing the program prints the message 'Nothing found between two markers'.
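The example above works because ; and / have no special meaning in a regular expression. As a minimal sketch, assuming the markers may contain regex metacharacters such as ( or ., the markers can be passed through re.escape() first. The helper name between_markers is hypothetical and not part of the original article; it also checks for None directly rather than catching AttributeError.

```python
import re

def between_markers(text, marker1, marker2):
    # Hypothetical helper, for illustration only.
    # re.escape() makes each marker match literally, even if it contains
    # characters that are special in a regular expression.
    pattern = re.escape(marker1) + '(.+?)' + re.escape(marker2)
    match = re.search(pattern, text)
    # re.search() returns None when the pattern does not match.
    return match.group(1) if match else None

print(between_markers('ilearncodingfrom;thispointer.com/articles', ';', '/'))
print(between_markers('price(42)usd', '(', ')'))
print(between_markers('no markers here', ';', '/'))
```

The three calls print thispointer.com, 42 and None respectively.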

Extract substring between two markers using find() and slicing

The second method combines the find() method with slicing. The find() method returns the index of the first occurrence of a marker in the string, or -1 if the marker is not found. Once we have the indexes of both markers, we use slice notation (string[start:end]) to take the substring between them. Let's see an example:

sampleStr = 'ilearncodingfrom;thispointer.com/articles'

# find() returns the index of the first marker;
# adding 1 moves just past it, to the start of the substring
mk1 = sampleStr.find(';') + 1

# search for the second marker, starting from index mk1
mk2 = sampleStr.find('/', mk1)

# slice out the text between the two markers
subString = sampleStr[mk1:mk2]

print(subString)

OUTPUT :

thispointer.com

In the code for method 2, find() and slicing are combined to extract the substring between the two markers. The index just after the first marker is stored in mk1 and the index of the second marker in mk2. Slicing from mk1 to mk2 then returns the substring, which is printed. Note that this code does not check for -1, so it assumes both markers are present in the string.
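Because find() returns -1 for a missing marker, the unchecked version can silently return a wrong result: with no first marker, mk1 becomes 0 and the slice starts at the beginning of the string; with no second marker, mk2 is -1 and the slice drops only the last character. A minimal sketch that guards against both cases follows. The helper name between_markers_find is hypothetical, and using len(marker1) instead of 1 is an added assumption so that multi-character markers also work.

```python
def between_markers_find(text, marker1, marker2):
    # Hypothetical helper, for illustration only.
    start = text.find(marker1)
    if start == -1:
        # first marker is missing
        return None
    # move past the whole first marker (it may be longer than 1 character)
    start += len(marker1)
    end = text.find(marker2, start)
    if end == -1:
        # second marker is missing after the first one
        return None
    return text[start:end]

print(between_markers_find('ilearncodingfrom;thispointer.com/articles', ';', '/'))
print(between_markers_find('thispointer.com/articles', ';', '/'))
print(between_markers_find('a;bcd', ';', '/'))
```

The first call prints thispointer.com; the other two print None because one of the markers is absent.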

Extract substring between two markers using split() method

The next method uses split() to extract the substring between two markers. The split() method splits a string at each occurrence of a given separator and returns a list of the resulting substrings.

It accepts two parameters:
separator : the separator used to split the string. If no separator is provided, the string is split on runs of whitespace.
maxsplit : the maximum number of splits to perform, so the result has at most maxsplit + 1 items. The default value is -1, which means there is no limit.

Let's see an example of this method:

sampleStr = 'ilearncodingfrom;thispointer.com/articles'

# here ; and / are our two markers 
# in which string can be found. 
subStr = sampleStr.split(';')[1].split('/')[0]

print(subStr)

OUTPUT :

thispointer.com

The code above is a single line that chains two split() calls. The first split() splits the string at the marker ';', and index 1 of the resulting list holds the part of the string after that marker. The second split() then splits that part at the marker '/', and index 0 of its result is the substring between the two markers. If the first marker is missing, the first list has only one item, so indexing it with [1] raises an IndexError.
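As a minimal sketch of the same idea with the maxsplit parameter and a check for missing markers, the version below splits at most once per marker and returns None instead of raising IndexError. The helper name between_markers_split is hypothetical and not part of the original article.

```python
def between_markers_split(text, marker1, marker2):
    # Hypothetical helper, for illustration only.
    # maxsplit=1 stops after the first occurrence of the marker,
    # so the list has at most two items.
    parts = text.split(marker1, 1)
    if len(parts) < 2:
        # first marker is missing
        return None
    rest = parts[1].split(marker2, 1)
    if len(rest) < 2:
        # second marker is missing after the first one
        return None
    return rest[0]

print('ilearncodingfrom;thispointer.com/articles'.split(';', 1))
print(between_markers_split('ilearncodingfrom;thispointer.com/articles', ';', '/'))
print(between_markers_split('no markers here', ';', '/'))
```

The first print shows the two-item list ['ilearncodingfrom', 'thispointer.com/articles'], the second prints thispointer.com, and the third prints None.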

Extract substring between two markers using partition() method

The last method uses partition(). The partition() method splits the string at the first occurrence of the separator and returns a tuple containing three items:

  • First : the string before the separator.
  • Second : the separator itself.
  • Third : the string after the separator.

It accepts only one parameter, the separator.

Let's see an example:

sampleStr = 'ilearncodingfrom;thispointer.com/articles'

before, mk1, after = sampleStr.partition(";")
subStr, mk2, after = after.partition("/")

print(subStr)

OUTPUT :

thispointer.com

The code above shows how partition() can be used to extract the substring between two markers. First we partitioned the string at the first marker. This split the string into three parts: the substring before the first marker, the first marker itself, and the substring after the first marker. We kept the last part, the substring after the first marker. Then we partitioned that part at the second marker and took the first item of the returned tuple, which is the substring before the second marker. The result is the substring between the two markers.
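When the separator is not found, partition() does not raise an error; it returns the whole string followed by two empty strings. A minimal sketch that uses this to detect missing markers follows. The helper name between_markers_partition is hypothetical and not part of the original article.

```python
def between_markers_partition(text, marker1, marker2):
    # Hypothetical helper, for illustration only.
    # The middle item of the tuple is '' when the separator is absent.
    _, found1, after = text.partition(marker1)
    if not found1:
        return None
    sub, found2, _ = after.partition(marker2)
    if not found2:
        return None
    return sub

print('no markers here'.partition(';'))
print(between_markers_partition('ilearncodingfrom;thispointer.com/articles', ';', '/'))
print(between_markers_partition('no markers here', ';', '/'))
```

The first print shows the tuple ('no markers here', '', ''), the second prints thispointer.com, and the third prints None.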

Summary

In this article, we learned about substrings and markers, and discussed four different methods to extract the substring between two markers. Of the examples shown, method 1 (regex) is the only one that handles the case where nothing is found, while method 3 (split()) has the shortest syntax and is easy to read. All four methods are useful, so try each of them and run the code on your own machine. We used Python 3.10.1 for the example code. To check your version, run python --version in your terminal.
