The Mysterious Case of the Disappearing Logs
Recently, a Sentinel instance that I'm responsible for showed a significant decrease in the volume of firewall logs being ingested. This drop coincided with an upgrade to the firewall firmware version, so I assumed there may have been a change in what logs were being sent by the firewalls.
However, after some investigation, the volume of logs coming out of the firewalls and hitting the syslog collector looked similar to normal. I started looking for errors on the log collector or warnings within the workspace in Azure, but there was nothing.
I began to suspect we were running into an issue I've seen before where a vendor altering a log format collides with Sentinel limits on table schemas. As per Microsoft's documentation, each table can have a maximum of 500 columns (actually it's more like 509 including the built-in fields).
It's easy to check how many columns are used in a specific table with a query like the following:
fortigate_CL
| getschema
| count
If you get back a count of more than 500 then it's likely that you're dropping some events.
To test this theory, I directed the firewall logs into a new custom table and within a few minutes, the ingestion was back to its full rate.
I have now added an analytics rule which alerts when the number of columns in the schema exceeds 500. However, it's not straightforward to do this across all tables without creating a large number of table-specific rules.
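As a rough sketch of what the query behind such a rule could look like for a single table, the following builds on the same getschema approach. The table name is the one from the example above. The 450 threshold and the ColumnCount name are my own assumptions for illustration, chosen so the rule fires before the limit is actually reached rather than after.

```kusto
// Count the columns in one table's schema and return a row only when
// the count crosses a warning threshold (450 is an assumed value).
fortigate_CL
| getschema
| summarize ColumnCount = count()
| where ColumnCount > 450
```

Because the query returns a row only when the threshold is crossed, the rule can be set to trigger on any result. Each table you want to watch would still need its own copy of the query.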
While this limit seems reasonable and well-documented, the lack of any errors or warnings is frustrating. When the total length of a single field exceeds the limit and is truncated, this is clearly shown in the Azure portal, but there's no similar message for the number of columns.