Incident Report

...

Page Properties


Incident Date(s):
Incident Start Time (PST): ~1:50 AM
Incident End Time (PST): ~5:15 AM
Incident Duration (in Hours): ~3.5 hours
Est. # Users Affected:
Type of Users Affected: Students, Campus Systems


...

  • Uptime Robot was useful in the early detection of the problem.
  • Christian Montecino has previously reached out to Apigee to determine whether these post-call logging steps could be run in a way that would never impact the result of the actual API call being performed. He will most likely continue his conversation with Apigee to determine the best way to do this.
    • Notes
      • The main options for sending this information to Splunk are outlined in this Apigee Community Post.
      • Currently, we are using an Apigee Service Callout Policy (Option 1, "Log over HTTP"). The Service Callout Policy page states that the async attribute has been deprecated and will always be "false".
      • Option 2, "Log over TCP", uses the Message Logging Policy to make the call asynchronously. This seems to be the only option that can currently be asynchronous. However, the issue with this policy is that you cannot set an HTTP header for the call to Splunk. Our Splunk setup requires an authentication token to be sent in the header.
      • Option 3, "Log via JavaScript", uses the JavaScript Policy to make the call synchronously. The async attribute is marked as deprecated and will always be "false".
      • Option 4 on that page only applies to on-premise Apigee installations, and we are using the Cloud version.
    • We are going to go with another approach:
      • Update the Shared Flow to move the logging into a PostClientFlow.
      • Change the Service Callout Policy so that it does not use a <Response> element. When that element is absent, the policy should run as fire-and-forget, which should prevent logging errors from stopping the flow.
  • In the future, when campus-wide networking work is planned, the Campus API Team may want to send out notifications that it could disrupt service.
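
The Service Callout change described above might look roughly like the following. This is a minimal sketch, not the team's actual policy: the policy name, the Splunk URL, the token variable (private.splunk.token), the Authorization header format, and the payload fields are all assumptions made for illustration. The point it shows is that there is no <Response> element, so the proxy is not expected to wait for Splunk's reply.

```xml
<!-- Hypothetical policy name; URL, token variable, and payload are placeholders. -->
<ServiceCallout name="SC-Log-To-Splunk" continueOnError="true">
  <Request variable="splunkLogRequest">
    <Set>
      <Headers>
        <!-- Assumed header format; the token is read from a hypothetical flow variable. -->
        <Header name="Authorization">Splunk {private.splunk.token}</Header>
      </Headers>
      <Verb>POST</Verb>
      <!-- variablePrefix/variableSuffix avoid clashing with the JSON braces. -->
      <Payload contentType="application/json" variablePrefix="@" variableSuffix="#">
        {"event": {"proxy": "@apiproxy.name#", "status": "@response.status.code#"}}
      </Payload>
    </Set>
  </Request>
  <!-- Intentionally no <Response> element: the call is meant to be fire-and-forget. -->
  <HTTPTargetConnection>
    <URL>https://splunk.example.edu:8088/services/collector/event</URL>
  </HTTPTargetConnection>
</ServiceCallout>
```

Setting continueOnError="true" is an extra safeguard assumed here, so that a failure in the callout itself would not raise a fault in the flow. Whether it is needed alongside removing <Response> would have to be confirmed against the Apigee documentation and by testing.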

...