
Pandas Tutorial Part #5 – Add/Remove Series elements

 2 years ago
source link: https://thispointer.com/pandas-tutorial-part-5-add-remove-series-elements/

In this tutorial, we will learn how to add and subtract two Series objects, how to remove elements from a Series, and how to get the sum and the maximum of its values.

Adding/Merging Series together

In Pandas, the Series provides a function add() to add two Series objects together, i.e.

Series.add(other, fill_value=None)

It accepts another Series as an argument and adds it to the calling Series object, aligning the two by index label. Values that share a label are added together (binary add). A label that appears in only one of the Series is still included in the result, but its value is NaN unless fill_value is provided. It returns a new Series object with the combined content; neither of the original Series is modified. Let’s understand this with some examples,

import pandas as pd

# Create first Series object from a list
first = pd.Series(  [100, 200, 300, 400, 500],
                    index = ['a', 'b', 'e', 'f', 'g'])


# Create second Series object from a list
second = pd.Series( [11, 12, 13, 14],
                    index = ['a', 'b', 'h', 'i'])

# Add two Series objects together
total = first.add(second)

# Display the Series object
print(total)

Output:

a    111.0
b    212.0
e      NaN
f      NaN
g      NaN
h      NaN
i      NaN
dtype: float64

In this example, there are two Series objects, first and second. They have two labels in common, ‘a’ and ‘b’; every other label appears in only one of them. When we add these two Series objects using the add() function, the resulting Series has the following values,

  • Label ‘a’ is in both Series, so the two values were added together and the final value became 111.
  • Label ‘b’ is in both Series, so the two values were added together and the final value became 212.
  • Label ‘e’ is in the first Series only, so its value in the new Series is NaN.
  • Label ‘f’ is in the first Series only, so its value in the new Series is NaN.
  • Label ‘g’ is in the first Series only, so its value in the new Series is NaN.
  • Label ‘h’ is in the second Series only, so its value in the new Series is NaN.
  • Label ‘i’ is in the second Series only, so its value in the new Series is NaN.
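As a side note, the + operator on two Series follows the same label alignment as add() called without fill_value. The following is a small sketch that compares the two; the variable names by_method and by_operator are only illustrative and do not come from the original article.

```python
import pandas as pd

first = pd.Series([100, 200, 300, 400, 500],
                  index=['a', 'b', 'e', 'f', 'g'])
second = pd.Series([11, 12, 13, 14],
                   index=['a', 'b', 'h', 'i'])

# Add with the method and with the operator
by_method = first.add(second)
by_operator = first + second

# Series.equals() treats NaN values at the same position as equal
print(by_method.equals(by_operator))

# Labels that exist in only one Series end up as NaN
print(list(by_method[by_method.isna()].index))
```

The method form is needed only when you want to pass fill_value, which the operator cannot accept.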

So, values with matching labels were added together, but labels present in only one Series ended up as NaN. What if we want to keep the original values for those labels instead?

For that we need to use the fill_value parameter of the add() function. If provided, it is substituted for any missing (NaN) entry before the addition. So if we pass fill_value=0 to the add() function, it uses the value 0 wherever a label is missing from one of the Series. For example,

import pandas as pd

# Create first Series object from a list
first = pd.Series(  [100, 200, 300, 400, 500],
                    index = ['a', 'b', 'e', 'f', 'g'])


# Create second Series object from a list
second = pd.Series( [11, 12, 13, 14],
                    index = ['a', 'b', 'h', 'i'])

# Add two Series objects together
total = first.add(second, fill_value=0)

# Display the Series object
print(total)

Output:

a    111.0
b    212.0
e    300.0
f    400.0
g    500.0
h     13.0
i     14.0
dtype: float64

  • Label ‘a’ is in both Series, so the two values were added together and the final value became 111.
  • Label ‘b’ is in both Series, so the two values were added together and the final value became 212.
  • Label ‘e’ is in the first Series only, so fill_value (0) was used for the second Series and the final value became 300.
  • Label ‘f’ is in the first Series only, so fill_value (0) was used for the second Series and the final value became 400.
  • Label ‘g’ is in the first Series only, so fill_value (0) was used for the second Series and the final value became 500.
  • Label ‘h’ is in the second Series only, so fill_value (0) was used for the first Series and the final value became 13.
  • Label ‘i’ is in the second Series only, so fill_value (0) was used for the first Series and the final value became 14.
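Note that the result is still float64 even though both inputs hold integers and no NaN remains. If integer values are preferred, one option is to convert the result afterwards with astype(). This is a sketch of that idea, not something the original article does, and total_int is an illustrative name.

```python
import pandas as pd

first = pd.Series([100, 200, 300, 400, 500],
                  index=['a', 'b', 'e', 'f', 'g'])
second = pd.Series([11, 12, 13, 14],
                   index=['a', 'b', 'h', 'i'])

# With fill_value=0 no NaN is left in the result
total = first.add(second, fill_value=0)

# Safe to convert to integers only because there are no NaN values
total_int = total.astype('int64')

print(total_int)
```

The conversion would raise an error if any NaN were still present, so it only fits cases where fill_value covers every missing entry.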

Similarly, if either Series already contains a NaN value and fill_value is provided, then fill_value is used in place of that NaN during the addition. (If the value is missing in both Series for the same label, the result at that label is still NaN.) For example,

import pandas as pd
import numpy as np

# Create first Series object from a list
first = pd.Series(  [100, 200, 300, 400, 500],
                    index = ['a', 'b', 'e', 'f', 'g'])


# Create second Series object from a list
second = pd.Series( [11, np.nan, 13, 34],
                    index = ['a', 'b', 'h', 'i'])

# Add two Series objects together
total = first.add(second, fill_value=0)

# Display the Series object
print(total)

Output:

a    111.0
b    200.0
e    300.0
f    400.0
g    500.0
h     13.0
i     34.0
dtype: float64

While adding, the value 0 was used in place of the NaN at label ‘b’ in the second Series, so the result at ‘b’ is 200.
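One edge case is worth seeing: fill_value only helps when at least one side has a real value. When both Series hold NaN at the same label, the result at that label stays NaN. The small Series below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Label 'b' is NaN in both Series
first = pd.Series([100, np.nan], index=['a', 'b'])
second = pd.Series([11, np.nan], index=['a', 'b'])

total = first.add(second, fill_value=0)

# 'a' is added normally, 'b' remains missing
print(total)
print(pd.isna(total['b']))
```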

Subtracting two Series

In Pandas, the Series provides a function sub() to subtract one Series from another, i.e.

Series.sub(other, fill_value=None)

It accepts another Series as an argument and subtracts it from the calling Series object, aligning the two by index label. For labels present in both Series, the value in the other Series is subtracted from the value in the calling Series. A label that appears in only one of the Series is still included in the result, but its value is NaN unless fill_value is provided. It returns a new Series object with the result. Let’s understand this with some examples,

import pandas as pd

# Create first Series object from a list
first = pd.Series(  [100, 200, 300, 400, 500],
                    index = ['a', 'b', 'e', 'f', 'g'])


# Create a Series object from a list
second = pd.Series( [11, 12, 13, 14],
                    index = ['a', 'b', 'h', 'i'])

# Subtract second Series from first Series
finalObj = first.sub(second)

# Display the Series object
print(finalObj)

Output:

a     89.0
b    188.0
e      NaN
f      NaN
g      NaN
h      NaN
i      NaN
dtype: float64

In this example, there are two Series objects, first and second. They have two labels in common, ‘a’ and ‘b’; every other label appears in only one of them. When we subtract the second Series from the first using the sub() function, the resulting Series has the following values,

  • Label ‘a’ is in both Series, so the value in the second Series was subtracted from the first and the final value became 89.
  • Label ‘b’ is in both Series, so the value in the second Series was subtracted from the first and the final value became 188.
  • Label ‘e’ is in the first Series only, so its value in the new Series is NaN.
  • Label ‘f’ is in the first Series only, so its value in the new Series is NaN.
  • Label ‘g’ is in the first Series only, so its value in the new Series is NaN.
  • Label ‘h’ is in the second Series only, so its value in the new Series is NaN.
  • Label ‘i’ is in the second Series only, so its value in the new Series is NaN.

So, values with matching labels were subtracted, but labels present in only one Series ended up as NaN. What if we want to keep the original values for those labels instead?

For that we need to use the fill_value parameter of the sub() function. If provided, it is substituted for any missing (NaN) entry before the subtraction. So if we pass fill_value=0 to the sub() function, it uses the value 0 wherever a label is missing from one of the Series. For example,

import pandas as pd

# Create first Series object from a list
first = pd.Series(  [100, 200, 300, 400, 500],
                    index = ['a', 'b', 'e', 'f', 'g'])


# Create a Series object from a list
second = pd.Series( [11, 12, 13, 14],
                    index = ['a', 'b', 'h', 'i'])

# Subtract second Series from first Series
finalObj = first.sub(second, fill_value=0)

# Display the Series object
print(finalObj)

Output:

a     89.0
b    188.0
e    300.0
f    400.0
g    500.0
h    -13.0
i    -14.0
dtype: float64

  • Label ‘a’ is in both Series, so the value in the second Series was subtracted from the first and the final value became 89.
  • Label ‘b’ is in both Series, so the value in the second Series was subtracted from the first and the final value became 188.
  • Label ‘e’ is in the first Series only, so fill_value (0) was used for the second Series and the final value became 300.
  • Label ‘f’ is in the first Series only, so fill_value (0) was used for the second Series and the final value became 400.
  • Label ‘g’ is in the first Series only, so fill_value (0) was used for the second Series and the final value became 500.
  • Label ‘h’ is in the second Series only, so fill_value (0) was used for the first Series and the final value became -13.
  • Label ‘i’ is in the second Series only, so fill_value (0) was used for the first Series and the final value became -14.

Similarly, if either Series already contains a NaN value and fill_value is provided, then fill_value is used in place of that NaN during the subtraction.
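A short sketch of that NaN case for sub(), using smaller made-up Series than the ones above so that each result is easy to check by hand:

```python
import numpy as np
import pandas as pd

first = pd.Series([100, 200, 300], index=['a', 'b', 'c'])

# The second Series has a NaN at label 'b'
second = pd.Series([11, np.nan, 13], index=['a', 'b', 'd'])

result = first.sub(second, fill_value=0)

# a: 100 - 11, b: 200 - 0, c: 300 - 0, d: 0 - 13
print(result)
```

The NaN at ‘b’ and the labels missing from one side (‘c’ and ‘d’) are all treated the same way: 0 is used in their place.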

Deleting elements from a Series

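In Pandas, the Series provides a function drop() to delete elements based on index labels. It accepts a single index label or a list of labels and returns a new Series without the values associated with those labels. The original Series is not modified unless the result is assigned back to it. For example,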

import pandas as pd

# Create a Series object from a list
names = pd.Series(  ['Mark', 'Rita', 'Vicki', 'Justin', 'John', 'Michal'],
                    index = ['a', 'b', 'c', 'd', 'e', 'f'])

print('Original Series: ')
print(names)

# Delete elements at given index labels
names = names.drop(['b', 'c', 'e'])

print('Modified Series: ')
print(names)

Output:

Original Series: 
a      Mark
b      Rita
c     Vicki
d    Justin
e      John
f    Michal
dtype: object


Modified Series: 
a      Mark
d    Justin
f    Michal
dtype: object

It deleted the elements at index labels ‘b’, ‘c’ and ‘e’ from the Series.
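Two details of drop() are easy to miss: it leaves the calling Series untouched, and by default it raises a KeyError when a label does not exist. The sketch below illustrates both, along with the errors='ignore' option; the label ‘z’ and the names trimmed and safe are made up for this example.

```python
import pandas as pd

names = pd.Series(['Mark', 'Rita', 'Vicki'],
                  index=['a', 'b', 'c'])

# drop() returns a new Series; names itself still has 3 elements
trimmed = names.drop(['b'])
print(len(names), len(trimmed))

# Dropping a label that does not exist raises KeyError by default
try:
    names.drop(['z'])
except KeyError as exc:
    print('KeyError:', exc)

# errors='ignore' drops the labels that exist and skips the rest
safe = names.drop(['b', 'z'], errors='ignore')
print(list(safe.index))
```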

Get Sum of all values in the Series

In Pandas, the Series provides a function sum(), which returns the sum of the values in the Series. For example,

import pandas as pd

# Create a Series object from a list
numbers = pd.Series([100, 200, 300, 400, 500],
                    index = ['a', 'b', 'e', 'f', 'g'])


print(numbers)

# Get the sum of all numeric values in Series 
total = numbers.sum()

print('Sum is: ', total)

Output:

a    100
b    200
e    300
f    400
g    500
dtype: int64

Sum is:  1500

It returned the sum of all values in the Series.
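Since the earlier sections produced Series containing NaN, it helps to know how sum() treats missing values. By default it skips them; passing skipna=False makes the result NaN instead. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

numbers = pd.Series([100, np.nan, 300])

# NaN values are skipped by default
default_total = numbers.sum()
print('Sum (NaN skipped):', default_total)

# With skipna=False a single NaN makes the whole sum NaN
strict_total = numbers.sum(skipna=False)
print('Sum (skipna=False):', strict_total)
```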

Get the max value in the Series

In Pandas, the Series provides a function max(), which returns the maximum value from the Series. For example,

import pandas as pd

# Create a Series object from a list
numbers = pd.Series([110, 22, 78, 890, 200, 50, 600])


print(numbers)

# Get largest value from the Series
max_value = numbers.max()

print('Maximum value is: ', max_value)

Output:

0    110
1     22
2     78
3    890
4    200
5     50
6    600
dtype: int64
Maximum value is:  890

It returned the largest value from the Series. Similar to this, Series in Pandas provides several functions for statistical analysis.
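As a brief illustration of a few of those other functions, here is a sketch using min(), mean() and idxmax() on the same numbers. The selection is ours; the original article does not cover these functions.

```python
import pandas as pd

numbers = pd.Series([110, 22, 78, 890, 200, 50, 600])

# Smallest value in the Series
print('Minimum value is: ', numbers.min())

# Arithmetic mean of the values
print('Mean value is: ', numbers.mean())

# Index label at which the largest value occurs
print('Label of maximum value is: ', numbers.idxmax())
```

idxmax() returns the label rather than the value, which is handy when the index carries meaning, as in the labeled Series used earlier.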

Summary:

We learned about some of the basic operations provided by the Series.

