Scrafana

CUGRADER is a web application developed by my friend TK. Students enrolled in my faculty's Python Programming course use it to submit assignments and quizzes. When a Jupyter Notebook file is uploaded, the service automatically runs the assertion tests for that particular assignment, so students see their results immediately. That is a huge improvement over the old system, which relied on Google Forms. Extremely cool, beyond impressive.
The service runs on an AWS EC2 instance paid for by the faculty. It had excellent uptime until one strange day, September 1, 2025, when it mysteriously went down after students refreshed the page simultaneously, all eager to take their test. The outage was critical because the server was unreachable by any means; even a ping could not reach it. The service only came back to life after the professor issued a restart through the AWS dashboard, and that was well after the incident. In the meantime, a Google Form was created and used as an emergency fallback. Once the service was back up, I looked through the logs and found nothing useful. I only saw the snapd service struggling for life, and then everything stopped. No more logs.
At the time, we had no monitoring or logging tools on the server besides the system's own, so we had no idea what caused the incident. I volunteered to spin up my own Grafana and Prometheus instance on DigitalOcean, happily paid for by myself, to monitor the system's resource usage. I used Ansible to deploy the monitoring services to both servers (EC2 and DigitalOcean). The idea is to open a port on the EC2 instance so that Prometheus, hosted on DigitalOcean, can scrape the instance's resource usage metrics; Grafana, sitting right beside Prometheus, then uses that data for visualization. It is worth noting that while this approach makes setup easy, it is heavily discouraged for security reasons and should only be used when the data is inconsequential. With the release of Grafana Alloy, one should push the data to the database instead of letting the database scrape an openly exposed port. With that out of the way, I've attached the Ansible Playbook used for the deployment at the very top of this page for educational purposes.
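To make the pull model a bit more concrete, here is a minimal Python sketch of what the monitored side looks like: a small HTTP endpoint that returns metrics in the Prometheus text exposition format whenever it is scraped. This is not what the playbook deploys. The metric names (`demo_load1` and friends), the `serve` helper, and port 9100 are all assumptions made up for illustration, and `os.getloadavg()` assumes a Unix-like host such as the EC2 instance.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(samples):
    """Render {name: (help_text, value)} in the Prometheus text exposition format."""
    lines = []
    for name, (help_text, value) in samples.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect_samples():
    """Collect a few system readings (hypothetical metric names, Unix only)."""
    load1, load5, load15 = os.getloadavg()
    return {
        "demo_load1": ("1-minute load average", load1),
        "demo_load5": ("5-minute load average", load5),
        "demo_load15": ("15-minute load average", load15),
    }


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus scrapes a single path; anything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect_samples()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9100):
    """Listen on all interfaces; this is the 'open port' that gets scraped."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` and then requesting `/metrics` on that port should return the plain-text samples. Anyone who can reach the port can read them too, which is exactly why the pull setup is only acceptable for inconsequential data.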