source link: https://blogs.vmware.com/management/2021/04/monitoring-a-cassandra-database-fluentd-part-1.html

Monitoring a Cassandra Database Cluster with vRealize Operations and vRealize Log Insight/Fluentd – Part 1

April 21, 2021

In this two-part blog series, vRealize Log Insight (vRLI) and vRealize Operations work together to provide a one-stop shop for event monitoring, alerting, and metrics. Combining these two products lets us monitor everything from the application itself down to the physical hardware. We don't have to juggle multiple monitoring solutions to get a complete picture. In Part 1, we configure Fluentd, a powerful open-source log collector, to forward logs from a popular database application, Cassandra, to vRLI.

Configuring Fluentd:

To start, we install the Log Insight plugin for Fluentd and create our Fluentd configuration file. If you don't have the Fluentd agent installed, follow the agent installation instructions on the Fluentd website. In this example, I used td-agent, the packaged distribution of Fluentd, on Ubuntu.

Once the agent is installed, install the Log Insight plugin by running the following command:
• sudo td-agent-gem install fluent-plugin-vmware-loginsight

After the Log Insight plugin is installed, restart the td-agent service. You should see the plugin load in the td-agent log.

Now we can add our Cassandra and Log Insight configuration to the td-agent.conf file (on Ubuntu, /etc/td-agent/td-agent.conf). Remember to do this on every node where you installed Fluentd.
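The following is a minimal sketch of such a configuration, rendered from Python so that the Log Insight hostname can be filled in for your environment. It is not the author's exact file. It assumes the following:

- the source tails /var/log/cassandra/*.log, skips debug.log, tags events `cassandra.*`, and applies no parsing;
- the output uses the `vmware_loginsight` plugin with SSL verification off.

The pos_file path, the port and path values, and the helper name `render_td_agent_conf` are illustrative assumptions. The parameter names follow the Fluentd in_tail documentation and the plugin's README, so check both before relying on them.

```python
# Hypothetical helper that renders the td-agent.conf fragment described in this post.
# Parameter names follow the Fluentd in_tail docs and the
# fluent-plugin-vmware-loginsight README; the values are illustrative assumptions.

TD_AGENT_TEMPLATE = """\
<source>
  @type tail
  path /var/log/cassandra/*.log
  exclude_path ["/var/log/cassandra/debug.log"]
  pos_file /var/log/td-agent/cassandra.pos
  tag cassandra.*
  <parse>
    @type none
  </parse>
</source>

<match cassandra.**>
  @type vmware_loginsight
  scheme https
  ssl_verify {ssl_verify}
  host {li_host}
  port 9543
  path api/v1/events/ingest
  include_tag_key true
  tag_key tag
</match>
"""


def render_td_agent_conf(li_host: str, ssl_verify: bool = False) -> str:
    """Return the Cassandra source plus Log Insight output for one node."""
    return TD_AGENT_TEMPLATE.format(
        li_host=li_host,
        # Fluentd expects lowercase true/false
        ssl_verify=str(ssl_verify).lower(),
    )


if __name__ == "__main__":
    # "loginsight.lab.local" is a placeholder hostname
    print(render_td_agent_conf("loginsight.lab.local"))
```

The same rendered fragment is appended on each Cassandra node, because only the Log Insight host varies between environments.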

So, let's dissect what we added to td-agent.conf. First, we added a new source for the Cassandra logs in ‘/var/log/cassandra/’. We excluded the debug log so that we don't flood Log Insight with debug information. We also tag these events as Cassandra logs, which makes them easier to identify in Log Insight. No parsing is done here. If you're a wiz at regex, you could parse the logs before they reach Log Insight and send them in JSON format.
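To make the source behaviour concrete, here is a hedged sketch of what the tail source does with the files in that directory. It keeps files matching the include pattern, drops the excluded debug log, and expands `tag cassandra.*` by replacing `*` with the file path, with `/` turned into `.`. The function name, the `*.log` include pattern, and the sample file names are assumptions for illustration. Note also that `fnmatch`'s `*` crosses `/`, unlike Fluentd's glob, which is harmless for a flat directory like this one.

```python
import fnmatch
from typing import Dict, Iterable


def select_and_tag(paths: Iterable[str],
                   include: str = "/var/log/cassandra/*.log",
                   exclude: tuple = ("/var/log/cassandra/debug.log",),
                   tag_prefix: str = "cassandra") -> Dict[str, str]:
    """Approximate in_tail file selection and `tag cassandra.*` expansion."""
    tagged = {}
    for path in paths:
        if not fnmatch.fnmatch(path, include):
            continue  # not a Cassandra log file
        if any(fnmatch.fnmatch(path, pattern) for pattern in exclude):
            continue  # e.g. debug.log, kept out to avoid flooding Log Insight
        # '*' in the tag becomes the path with '/' replaced by '.'
        tagged[path] = tag_prefix + "." + path.strip("/").replace("/", ".")
    return tagged


# Hypothetical file list on one node
files = ["/var/log/cassandra/system.log",
         "/var/log/cassandra/debug.log",
         "/var/log/syslog"]
print(select_and_tag(files))
# → {'/var/log/cassandra/system.log': 'cassandra.var.log.cassandra.system.log'}
```

Because the tag embeds the file path, each event's tag later tells you which log file it came from.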

Second, we configured the output to the Log Insight server via the ingestion API. I disabled SSL verification because this is a lab. In production, you should keep it enabled.
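For readers curious about what "via the ingestion API" means on the wire, the following is a rough sketch using only the Python standard library. It is not the plugin's code. It assumes the Log Insight ingestion endpoint is an HTTPS POST to /api/v1/events/ingest/<agent ID> on port 9543, with a JSON body holding an `events` array of `text`, `timestamp` (epoch milliseconds), and `fields`. The host, agent ID, and function names are placeholders. The `tls_context` helper shows what "SSL verification disabled" amounts to in a client.

```python
import json
import ssl
import time
import urllib.request


def build_ingest_request(li_host: str, agent_id: str, tag: str, line: str,
                         port: int = 9543) -> urllib.request.Request:
    """Wrap one raw log line in an ingestion-API event (assumed payload shape)."""
    event = {
        "text": line,                                 # raw, unparsed line
        "timestamp": int(time.time() * 1000),         # epoch milliseconds
        "fields": [{"name": "tag", "content": tag}],  # Fluentd tag as a field
    }
    body = json.dumps({"events": [event]}).encode("utf-8")
    url = f"https://{li_host}:{port}/api/v1/events/ingest/{agent_id}"
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def tls_context(verify: bool) -> ssl.SSLContext:
    """verify=False mimics the lab setup; keep verification on in production."""
    ctx = ssl.create_default_context()
    if not verify:
        ctx.check_hostname = False       # must be cleared before CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE  # accept self-signed lab certificates
    return ctx


# Sending is left to the caller, for example:
# urllib.request.urlopen(req, context=tls_context(verify=False))
```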

For the full set of configuration options for the Log Insight plugin, see the GitHub page that VMware maintains for it: https://github.com/vmware/fluent-plugin-vmware-loginsight

Extracting Fields:

Once we have finished modifying the configuration file, we restart the td-agent service. Logs should then start flowing into Log Insight from each Cassandra node we configured. In the resulting events, the tag field shows which OS log file each event came from.

Now that we have logs, let's extract some fields and build some alerts and dashboards!

In this example, we create two extracted fields for our Cassandra logs so that we can easily query and alert on database status.

First, let’s create a field called ‘cass_node’ so that we can find our Cassandra nodes quickly. In Log Insight, search for the text ‘node’, with the source field set to one of the Cassandra nodes we just configured for log forwarding. Highlight the IP address between ‘Node /’ and ‘state’, then right-click and select ‘Extract field’.

An extracted field configuration dialog opens on the far right. The field looks for the text between ‘Node /’ and ‘state’ in our logs and places it in a custom field. You can make further tweaks, but we’re going to keep it simple.

Save that field, then extract a second field, ‘cass_db_status’, for the Cassandra database status. Follow the same steps as above, but set the pre-context to ‘state ’ (with a trailing space) and leave the post-context blank. This captures all text after ‘state ’ and gives us the database status.
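As a mental model of the two extractions, the following sketch captures the text between a pre-context and a post-context, and treats a blank post-context as "everything after". This is a plain string search and an approximation, not Log Insight's own extraction engine. The helper names are hypothetical, and the sample line is made up in the shape the post describes rather than copied from a real Cassandra log.

```python
from typing import Dict, Optional


def extract_field(text: str, pre: str, post: str = "") -> Optional[str]:
    """Return the text between `pre` and `post`.

    An empty `post` means "everything after `pre`", mirroring a blank
    post-context. Returns None when the context strings are not found.
    """
    start = text.find(pre)
    if start == -1:
        return None
    start += len(pre)
    if not post:
        return text[start:].strip()
    end = text.find(post, start)
    if end == -1:
        return None
    return text[start:end].strip()


def cassandra_fields(line: str) -> Dict[str, Optional[str]]:
    """Apply the two extractions described in this post to one log line."""
    return {
        "cass_node": extract_field(line, "Node /", "state"),
        "cass_db_status": extract_field(line, "state "),
    }


# Made-up line in the shape described above
sample = ("INFO  [GossipStage:1] 2021-03-24 10:02:11,512 Gossiper.java - "
          "Node /192.168.1.21 state shutdown")
print(cassandra_fields(sample))
# → {'cass_node': '192.168.1.21', 'cass_db_status': 'shutdown'}
```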

Save the second extracted field, and we now have two fields to alert on. If we run a field search for ‘cass_db_status’ contains ‘shutdown’, we see that both nodes were shut down during the 10:00 hour on March 24.
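The field search above amounts to a "contains" filter followed by a grouping. This hedged sketch applies the same idea to events that already carry the two extracted fields, bucketing matches by node and clock hour. The event dictionaries, node addresses, and the function name are hypothetical. Lowercasing both sides is this sketch's own choice, not a statement about how Log Insight compares field values.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List


def shutdown_hours(events: Iterable[dict],
                   needle: str = "shutdown") -> Dict[str, List[datetime]]:
    """Group events whose cass_db_status contains `needle` by node and hour."""
    hours = defaultdict(set)
    for ev in events:
        status = (ev.get("cass_db_status") or "").lower()
        if needle.lower() in status:  # 'contains' match
            bucket = ev["timestamp"].replace(minute=0, second=0, microsecond=0)
            hours[ev["cass_node"]].add(bucket)
    return {node: sorted(buckets) for node, buckets in hours.items()}


# Hypothetical events after field extraction
events = [
    {"timestamp": datetime(2021, 3, 24, 10, 5), "cass_node": "10.0.0.1", "cass_db_status": "shutdown"},
    {"timestamp": datetime(2021, 3, 24, 10, 40), "cass_node": "10.0.0.2", "cass_db_status": "shutdown"},
    {"timestamp": datetime(2021, 3, 24, 11, 2), "cass_node": "10.0.0.1", "cass_db_status": "NORMAL"},
]
print(shutdown_hours(events))
```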

Creating Widgets:

Great, now we have some extracted fields. Now what? Let’s create a dashboard widget that shows when this Cassandra event occurs, based on a query against the ‘cass_db_status’ field.

This is the easiest part: click the ‘Add current query to dashboard’ button, then give the new widget a name, type, and description.

Once we click ‘Add’, we can monitor Cassandra database shutdown events via the ‘My Dashboards’ section.

That’s it! Feel free to create as many extracted fields as you need for the events you ingest, but don’t go overboard. Too many extracted fields can degrade Log Insight cluster performance, depending on your hardware and VM sizing.

Conclusion:

We will want to alert on these conditions, but sending an email alert from Log Insight is a little boring. In Part 2 of this series, we will configure Log Insight to send the alert to vRealize Operations. There, we can monitor the database metrics with application monitoring, along with the Linux OS and ESX host stats, to get a full picture of our Cassandra database cluster.

Nico Guerrera

Nico Guerrera is a senior technical account manager who has been with VMware since 2016. He is a member of the cloud management TAM tech lead team and focuses on vRealize…


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK