
Make a Bar Chart About Roman Emperors’ Rise to Power with Python

 4 years ago
source link: https://towardsdatascience.com/make-a-bar-chart-about-roman-emperors-rise-to-power-with-python-7d94e4131243?gi=ed8343d5cd1f


As an enthusiast of both ancient history and Python programming, when I stumbled upon this data set about Roman emperors, I knew what I had to do… use it to make a data visualization in Python!

Browsing the columns, I decided to chart the different ways the emperors rose to power. Sure, you could be born as the son of the emperor, but how often did “seizing power” actually work?

Feel free to code along with me to learn how to read a CSV file and make a bar chart in Python! This tutorial will assume knowledge of basic Python programming, but you can still follow along without it.

To see the data visualization, I’ll be coding in the Spyder IDE, which you can download as part of the Anaconda distribution.

Reading a CSV file with pandas

You can view the CSV file of data about Roman emperors here. It’s all prettified in a nice data table that hides what CSV actually means: comma-separated values.

[Image: the emperors CSV displayed as a formatted data table]

So pretty!

If you view the raw CSV, you’ll see all the data squished together, each column separated by only a comma. Yikes!

[Image: the raw CSV text, with values separated only by commas]

Egads!

But actually, CSV is a great file type. We can read CSV data easily with the Python library pandas.
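To make "comma-separated values" concrete before we bring in pandas, here is a minimal sketch using Python's built-in csv module. The two rows are a made-up sample that I invented for illustration, not rows from the real data set.

```python
import csv
import io

# A tiny made-up sample in CSV form (not rows from the real data set).
raw_text = "name,rise\nEmperor A,Birthright\nEmperor B,Seized Power\n"

# csv.reader splits each line on commas and yields one list per row.
rows = list(csv.reader(io.StringIO(raw_text)))

header = rows[0]     # the column names
records = rows[1:]   # the data rows

print(header)
print(records)
```

The first line of a CSV file usually holds the column names, and every later line is one record. pandas does this same splitting for us and adds a lot of convenience on top.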

Let’s get started by importing pandas (so we can read this CSV) and matplotlib, another library that will allow us to produce some publication-quality data visualizations.

At the top of your code, write:

import pandas as pd
import matplotlib.pyplot as plt

Using the shorthand pd for pandas and plt for matplotlib.pyplot is fairly standard, and it also saves us a lot of trouble when it comes to typing out long library names.

Now we can use pandas to read the CSV file:

df = pd.read_csv("https://raw.githubusercontent.com/zonination/emperors/master/emperors.csv", encoding='latin-1')

Yes, it’s that easy! OK, a few words about the code:

  • Why did I call the variable I’m storing the CSV in df? df is a pretty common variable name for a DataFrame object, which is what pandas creates when it reads a CSV file.
  • What’s up with that encoding='latin-1'? When I first tried to read this CSV file, I got a UnicodeDecodeError. Stack Overflow suggested that I try reading the file with a different encoding, such as latin-1. And voila, the computer was able to read the file!
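If you are curious why the encoding matters, the sketch below reproduces the same kind of error on a single word instead of a whole file. The word "Café" is just an example I picked; it is not taken from the emperors data.

```python
# "é" stored as a single Latin-1 byte; this byte sequence is not valid UTF-8.
raw_bytes = "Café".encode("latin-1")

try:
    text = raw_bytes.decode("utf-8")
except UnicodeDecodeError:
    # Fall back to the encoding that matches how the bytes were written.
    text = raw_bytes.decode("latin-1")

print(text)
```

One caveat: latin-1 can decode any byte without raising an error, so it will never crash, but it can quietly produce the wrong characters if the file was really saved in some other encoding. It is worth glancing at a few names with accents after loading.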

It’s always good to see what’s being stored inside your variables. We can see the top and bottom of the DataFrame df by running the following code:

print(df.head())
print(df.tail())

You can run programs in Spyder by pressing what looks like a green “play” button.

If you can see something in your console that looks vaguely like a data table with index numbers, you’re probably on the right track!

[Image: console output of df.head() and df.tail()]

In this printout of df.head(), you can only see the index and the last column, which credits a Reddit user for the data table. But there are actually 16 columns!
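If the console hides most of the columns, a couple of pandas calls can list them for you. Here is a rough sketch on a small stand-in DataFrame with made-up values and only three columns; with the real df you would skip the part that builds the table.

```python
import pandas as pd

# Hypothetical stand-in for the real DataFrame, with made-up values.
df = pd.DataFrame({
    "name": ["Emperor A", "Emperor B"],
    "rise": ["Birthright", "Seized Power"],
    "notes": ["", ""],
})

# Ask pandas not to hide columns when printing wide tables.
pd.set_option("display.max_columns", None)

column_names = df.columns.tolist()  # every column name, in order
n_rows, n_cols = df.shape           # (number of rows, number of columns)

print(column_names)
print(n_rows, n_cols)
print(df.head())
```

On the real data, the second number printed from df.shape should be the 16 columns mentioned above.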

Turning the data into a dictionary

All right, I have 16 columns of data. But as I mentioned before, the one column I’m really interested in is the “rise” column. I wanted to see if it was more common to be born an emperor’s son or to seize power through other means.

[Image: the rise column of the data table]

Yeah, it starts out with a lot of “birthright,” but then things get interesting…

I initially thought that I wanted to make a histogram, but as I learned from Python Graph Gallery, a histogram ONLY takes numerical data and just shows the distribution of it. I, on the other hand, had categorical data: different paths to power, like “Birthright” and “Seized Power.” I also needed to calculate a second, numerical variable: how many different emperors rose to power in that way.

With one categorical variable and one numerical variable (to be calculated), what I wanted to make was a bar chart.

With df["rise"], I could access the whole column of paths to power, a big long list that went like “birthright, birthright, appointment by senate, appointment by army,” etc. But I needed to boil down the paths to power to one of each while also calculating how many emperors took that path.

In short, I needed a dictionary that looked something like this:

rises = {
  "birthright": 10,
  "seized power": 9,
  "appointment by senate": 4
}

I decided to convert df["rise"] to a list and loop through it to create a dictionary. As I looped through it, I would count the number of occurrences of each path to power.

Add this code to create the dictionary of paths to power:

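# Count how many emperors took each path to power
data = {}
for path in list(df["rise"]):
    if path in data:
        data[path] += 1   # seen before: add one to its count
    else:
        data[path] = 1    # first time seen: start its count at 1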

Then, I used the dictionary to produce lists of my categorical and numerical values. Add this code below the dictionary:

paths = list(data.keys())
numbers = list(data.values())

Now, I had a list of unique paths to power, as well as the numbers that corresponded to each path.
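The counting loop is a good exercise, but Python's standard library can do the same tally in one line. Here is a minimal sketch with collections.Counter; the rise_column list is a made-up stand-in for list(df["rise"]).

```python
from collections import Counter

# Made-up stand-in for list(df["rise"]).
rise_column = ["Birthright", "Seized Power", "Birthright", "Election", "Birthright"]

# Counter tallies how many times each value occurs.
data = Counter(rise_column)

paths = list(data.keys())
numbers = list(data.values())

print(paths)
print(numbers)
```

A Counter behaves like a dictionary, so the rest of the tutorial works unchanged. pandas also has a value_counts() method on columns that produces the same kind of tally, if you prefer to stay inside pandas.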

Of course, it’s always a good idea to print the value of your variables, just to make sure they hold what you think they hold.
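One way to do that check is sketched below. The paths and numbers lists here are made-up stand-ins so the snippet runs on its own; in your program you would use the lists you just built.

```python
# Made-up stand-ins for the paths and numbers lists built above.
paths = ["Birthright", "Seized Power", "Election"]
numbers = [3, 1, 1]

# The two lists should line up one-to-one.
assert len(paths) == len(numbers)

# Print each path next to its count.
for path, number in zip(paths, numbers):
    print(f"{path}: {number}")

# The counts should add up to the length of the rise column.
total = sum(numbers)
print("Total emperors counted:", total)
```

If the total does not match the number of rows in your DataFrame, something went wrong in the counting step.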

Time to make the bar chart!

Making a bar chart with matplotlib.pyplot

The code for making a graph with Python is so simple, it feels like it can’t be real. But it is!

Add this code to produce a bar chart of the emperors’ paths to power in Python:

plt.title("Roman Emperors' Paths to Power")
plt.ylabel("Number of Emperors")
plt.xlabel("Paths to Power")
plt.bar(paths, numbers)
plt.show()

[Image: the resulting bar chart, with cramped x-axis labels]

Literally amazing (except for some squished text on the x-axis, which we’ll deal with soon). You’ll notice that the first three lines of code just add the title and label the axes. All the heavy lifting happens in the call to plt.bar() on the fourth line. The last line just shows the graph.

And here we can see that the “Birthright” bar seems way higher than the rest of them… so that’s probably a more reliable way of becoming emperor than “seizing power”!
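Comparing bar heights is easier when the bars are sorted from tallest to shortest. This is an optional tweak that goes beyond the chart above; the sketch uses made-up counts in place of the real dictionary.

```python
# Made-up counts standing in for the dictionary built earlier.
data = {"Seized Power": 2, "Birthright": 5, "Election": 1}

# Sort the (path, count) pairs by count, largest first.
ordered = sorted(data.items(), key=lambda item: item[1], reverse=True)

# Split the sorted pairs back into the two lists that plt.bar() expects.
paths = [path for path, _ in ordered]
numbers = [count for _, count in ordered]

print(paths)
print(numbers)
```

Passing these sorted lists to plt.bar(paths, numbers) draws the most common path first.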

Finishing touches

OK, so you can’t actually read the labels on the x-axis because they’re all squashed together.

Fortunately, you can rotate the labels on the x-axis with this line of code:

plt.xticks(rotation=90)

Be sure to add it in before calling plt.show().
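If vertical labels are hard on the neck, a slanted angle is a common alternative. The sketch below is one possible variation, not part of the chart above: it tilts the labels 45 degrees, right-aligns them so each label ends under its bar, and calls plt.tight_layout() so long labels are less likely to be cut off. The data is made up.

```python
import matplotlib.pyplot as plt

# Made-up data standing in for the real paths and numbers lists.
paths = ["Birthright", "Seized Power", "Appointment by Senate"]
numbers = [3, 2, 1]

bars = plt.bar(paths, numbers)

# Tilt the x-axis labels and right-align them under their bars.
plt.xticks(rotation=45, ha="right")

# Adjust the margins so the labels fit inside the figure.
plt.tight_layout()

# Call plt.show() after this point to display the chart.
```

As with rotation=90, these lines belong between plt.bar() and plt.show().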

[Image: the bar chart with rotated x-axis labels]

Lastly, we need to add some color. Our visualization should be eye-catching!

When you call plt.bar(), you can also specify the colors you want through the color parameter. For example, if I write:

plt.bar(paths, numbers, color=['m', 'c', 'b', 'k', 'y', 'r', 'g'])

That says I want the first bar to be magenta (m), the second to be cyan (c), then blue (b), black (k), yellow (y), red (r), and green (g). Then the order starts again. (Older matplotlib versions also accepted the single string 'mcbkyrg' here, but recent releases reject it, so a list is the safer choice. Read more about colors in matplotlib here.)
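To see what "the order starts again" means, here is a small sketch that builds one color per bar by hand with itertools.cycle. The eight path names are made up; the point is only that the eighth bar wraps around to the first color.

```python
from itertools import cycle

# Made-up category labels: eight bars but only seven color codes.
paths = [f"Path {i}" for i in range(1, 9)]
color_codes = ["m", "c", "b", "k", "y", "r", "g"]

# Pair each bar with the next color, wrapping around when the codes run out.
bar_colors = [color for _, color in zip(paths, cycle(color_codes))]

print(bar_colors)
```

You do not need to do this yourself for plt.bar(), but an explicit list like bar_colors is handy if you ever want to pick out one particular bar with its own color.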

[Image: the bar chart with colored bars]

Perfection!

Extensions

To add more features to your bar charts, or for inspiration to create a new one, check out the bar charts at Python Graph Gallery!

