Status Monitoring and Healthchecks Requirements
Background & Business Value
We want to ensure that the API Proxy endpoints are up and running. A healthcheck and monitoring tool can help by sending notifications if any proxy becomes unavailable (upcheck) or starts returning error messages (healthcheck).
Goals
- Establish a monitoring system for the API Proxy Endpoints
- Ensure it can notify the Team
- Establish Standards for types of Monitoring
  - Upcheck (required)
    - Simply checks if the API Proxy is up and listening. It does not check anything further than the API Gateway.
  - Healthcheck (opt-in)
    - Checks if the backend API service is healthy and responding. Check flows through the proxy to the backend application server.
- Establish procedures for setting up uptime and healthcheck monitoring
- Document the features
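
The distinction between the two monitor types in the goals above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `monitor_urls` and `evaluate_probe` are hypothetical, `/upcheck` and `/healthcheck` are assumed to be path suffixes on each proxy's base URL, and treating any 2xx response as "up" is an assumed rule rather than an agreed standard.

```python
from enum import Enum
from typing import List, Optional


class ProbeResult(Enum):
    UP = "up"
    DOWN = "down"


def monitor_urls(proxy_base_url: str, healthcheck_enabled: bool = False) -> List[str]:
    """Return the URLs to monitor for one API Proxy.

    The upcheck is always included (required); the healthcheck is added
    only when the API developer has opted in.
    Assumption: both checks live under the proxy's base path.
    """
    base = proxy_base_url.rstrip("/")
    urls = [f"{base}/upcheck"]
    if healthcheck_enabled:
        urls.append(f"{base}/healthcheck")
    return urls


def evaluate_probe(status_code: Optional[int]) -> ProbeResult:
    """Classify one probe.

    status_code is None when no HTTP response arrived at all
    (connection refused, timeout). Assumption: only 2xx counts as up.
    """
    if status_code is None:
        return ProbeResult.DOWN
    return ProbeResult.UP if 200 <= status_code < 300 else ProbeResult.DOWN
```

With this shape, a proxy that has not opted in is monitored only at the gateway level, and adding a healthcheck later is a one-flag change.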
Assumptions
- Will use UptimeRobot.
- Ask Steven Maglio for credentials
Out of Scope
- Being responsible for notifying the API service owners
- Being responsible for the uptime of backend API services
Requirements
| Ticket(s) | Title | User Story | Priority | Notes |
|---|---|---|---|---|
| | /upcheck | As an API Gateway Admin, I want to know if one of our API Proxies is no longer available (is no longer deployed). | MUST HAVE | |
| | /healthcheck | As an API Gateway Admin, I want to give the API service developers a standardized way to monitor the health of their applications through the API Gateway. (Testing that a call going through the API Gateway makes it all the way to the backend service, and verifying that the backend service is responding correctly.) | MUST HAVE | |
| | Healthcheck should be Opt-In | As an API Developer, I don't want to be forced to provide a /healthcheck endpoint. I do want the ability to provide one in the future. | MUST HAVE | |
User Interaction, Design & Architecture
Creating a new monitor
Examples and References
Questions
Below is a list of questions to be addressed as a result of this requirements document:
Question | Outcome | Decision Date |
---|---|---|
For the /healthcheck endpoints (the ones that flow through the API Gateway to the backend), should we secure them with an API key? Should it be a single API key that we use on UptimeRobot for all healthchecks? Would this mean that the shared flow accepting healthcheck requests checks only against that API key, so that other legitimate keys for the overall API Proxy would not work? | | |
When a /healthcheck reports itself as down, should we standardize that the notification is sent only to the functional/shared account address? Or do we want to be more flexible and let the API developer specify other addresses to send to? | | |
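
To make the API key question concrete, the two possible behaviors of the shared flow can be sketched as below. This is a hedged illustration, not a proposed implementation: the function `is_healthcheck_authorized`, its parameters, and the `accept_proxy_keys` switch are all hypothetical names, and the real check would live in the API Gateway's shared flow rather than in Python.

```python
import hmac
from typing import Iterable


def is_healthcheck_authorized(
    presented_key: str,
    monitor_key: str,
    proxy_keys: Iterable[str] = (),
    accept_proxy_keys: bool = False,
) -> bool:
    """Decide whether a /healthcheck request may pass.

    Strict mode (accept_proxy_keys=False): only the single monitoring key
    is accepted, so other legitimate proxy keys are rejected.
    Permissive mode (accept_proxy_keys=True): the monitoring key or any
    key already valid for the proxy is accepted.
    """
    if not presented_key:
        return False
    presented = presented_key.encode()
    # Constant-time comparison avoids leaking key contents through timing.
    if hmac.compare_digest(presented, monitor_key.encode()):
        return True
    if accept_proxy_keys:
        return any(hmac.compare_digest(presented, key.encode()) for key in proxy_keys)
    return False
```

In strict mode a client holding a valid proxy key still cannot call /healthcheck, which is the trade-off the first question asks about.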