Metropolis Microservices Platform services Monitoring

Revision as of 20:04, 9 August 2024 by Efernandez






Most use cases need platform monitoring: hardware usage, general network traffic, and some form of visualization. The Monitoring Microservices provide this out of the box.

Set up

First we need to start the services:

sudo systemctl start jetson-monitoring
sudo systemctl start jetson-sys-monitoring
sudo systemctl start jetson-gpu-monitoring
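
To confirm that the three services actually came up, we can ask systemd for their state. The following is a minimal sketch, not part of the platform tooling: the function name check_services is hypothetical, and it only assumes the unit names used in the commands above.

```shell
#!/bin/sh
# Unit names taken from the start commands above.
SERVICES="jetson-monitoring jetson-sys-monitoring jetson-gpu-monitoring"

# Print the state of each monitoring service.
# Returns 0 if all are active, 1 if at least one is not.
check_services() {
    failed=0
    for svc in $SERVICES; do
        # `systemctl is-active --quiet` exits 0 only when the unit is active.
        if systemctl is-active --quiet "$svc"; then
            echo "$svc: active"
        else
            echo "$svc: NOT active"
            failed=1
        fi
    done
    return "$failed"
}

# Usage: check_services || echo "at least one service is down"
```

Note that systemctl start does not persist across reboots; if the services should come up at boot, systemctl enable can be used on the same unit names.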

Then verify that the services are listed in the Ingress configuration at /opt/nvidia/jetson/services/ingress/config/platform-nginx.conf.
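
One way to check the Ingress configuration is to search it for the relevant entries. This is a sketch under assumptions: the helper name find_ingress_entries is hypothetical, and the search patterns (monitoring, grafana) are guesses at how the routes are named, so adjust them to what the file actually contains.

```shell
#!/bin/sh
# Path taken from the text above.
INGRESS_CONF="/opt/nvidia/jetson/services/ingress/config/platform-nginx.conf"

# Print matching lines (with line numbers) from an nginx config file.
# $1: search pattern (assumed default; adjust to the actual route names)
# $2: config file, defaults to the Ingress config
find_ingress_entries() {
    pattern="${1:-monitoring}"
    conf="${2:-$INGRESS_CONF}"
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 2
    fi
    # -n prints line numbers, -i ignores case; exit status 1 means no match.
    grep -n -i -e "$pattern" "$conf"
}

# Usage: find_ingress_entries grafana
```
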
After that, we should be able to access the dashboard at the URL <HOST_ADDR>:3000. It will ask for a login; use admin as both the username and the password if the credentials haven't been changed.
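
Before opening a browser, the dashboard can also be probed from the command line. The sketch below uses Grafana's /api/health endpoint, which does not require a login. The function names are hypothetical, and plain http on port 3000 is an assumption based on the URL above.

```shell
#!/bin/sh
# Build the Grafana health URL for a host.
# $1: host address, $2: port (defaults to 3000, the port used above).
# Assumes plain http; change the scheme if the deployment uses TLS.
grafana_health_url() {
    host="$1"
    port="${2:-3000}"
    echo "http://${host}:${port}/api/health"
}

# Query the health endpoint and print its JSON reply.
check_grafana() {
    # -f: fail on HTTP error responses, -sS: quiet but still show errors,
    # --max-time 5: give up after five seconds.
    curl -fsS --max-time 5 "$(grafana_health_url "$@")"
}

# Usage (HOST_ADDR is a placeholder for the board's address):
#   check_grafana "$HOST_ADDR" && echo "dashboard is reachable"
```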

Main dashboard

If we go to the monitoring URL and open the main dashboard, we should see a wide variety of metrics, ranging from hardware usage to CPU time and network usage.

 
Grafana main dashboard

Each panel can be modified as needed, and alert rules can be added. For example, the RAM used panel already comes with an alert built in. To view or create alerts in the UI, open the panel menu as shown:

 
Grafana panel edit

Then we can check the rule that is set for that metric and add more if needed:

 
Grafana memory usage alert

The alerts can be set to trigger different kinds of notifications, such as email, Slack, or Teams messages.

API

Like the other Microservices, the monitoring service also has an API that client applications can use to access its information. More details on each metric are available in the official documentation: https://docs.nvidia.com/moj/platform-services/monitoring.html#oss.
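
As an illustration of how a client could consume such metrics, the sketch below filters one metric out of a metrics dump. It assumes the endpoint returns the Prometheus text exposition format, which the text above does not state; check the official documentation for the actual endpoint and format. METRICS_URL, the metric name, and the function name metric_values are hypothetical placeholders.

```shell
#!/bin/sh
# Print every sample line of one metric from Prometheus-style text on stdin.
# $1: exact metric name (labels, if any, are ignored when matching).
metric_values() {
    awk -v name="$1" '
        # Skip HELP/TYPE comment lines and blank lines.
        /^#/ || /^[[:space:]]*$/ { next }
        # The metric name is the leading identifier, before any {labels}.
        match($0, /^[a-zA-Z_:][a-zA-Z0-9_:]*/) {
            if (substr($0, RSTART, RLENGTH) == name) print
        }'
}

# Usage (placeholders, not real endpoint or metric names):
#   curl -fsS "$METRICS_URL" | metric_values ram_used_bytes
```

Matching on the exact leading name means a metric such as ram_used_bytes_total is not returned when ram_used_bytes is requested.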