Wix Tackles Application-Level Security with Custom Detector

Website-building platform Wix recently created a simple but effective logic-based monitoring process that detects application-level vulnerabilities in production environments in real time.
Security vulnerabilities happen: even with shift-left practices and security testing during the development phase, some vulnerabilities still make it into an application's production environment. It's pretty wild to read about all the monitoring, observability, and security tools that keep coming out to combat the ever-growing number of security threats.
What Is Not Expected
Wix's custom security system detects vulnerabilities by looking for the opposite of what's expected from a secure application: it searches the application logs for exceptions that securely written code should never produce, according to a blog post written by Wix Security Engineer Moti Harmats.
Once a security exception is found in an application log, a member of the security team is notified and the threat is addressed. The logs are aggregated into a central database, which allows the custom solution to scale while keeping site load times and latencies low.
Wix plans to open source the new security software in the coming months.
Wix already had security processes in place, including shift-left practices and production-level processes focused on the runtime environment (server/OS) and the perimeter (WAF/HTTP access logs). But the Wix security team found that commercial solutions neglected the "application runtime", which they define as the behavior, logs, and metrics of the application stack. They set out to create a safety net for their shift-left practices.
Log Based Monitoring
The logic behind the new monitoring process was basic but solid: certain exceptions should never appear in securely written applications. For example, a SQL "syntax error" doesn't occur when a query is written "properly" (i.e., using a valid parameterized library). Such an exception may indicate that the syntax of the SQL query changed because of a runtime value that was unexpected or improperly handled, such as user input. These errors occur only with dynamic SQL queries (which are considered bad practice) when an input "breaks" the syntax, and a query that input can break is vulnerable to SQL injection.
To implement their security solution, they turned that logic on its head. Rather than running penetration tests to find injections and using logs as supporting evidence, they find the errors in the logs first, then reverse-engineer them into injection payloads that confirm the security vulnerabilities.
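To make the SQL signal concrete, here is a minimal sketch in Python using the standard-library sqlite3 module rather than the MySQL/Scala stack the article describes; the table, data, and function names are hypothetical. A value containing a quote breaks the dynamically concatenated query with a syntax error, while the parameterized version treats the same value purely as data.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """In-memory demo table (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO questions (title) VALUES (?)", ("O'Brien's question",))
    return conn

def find_dynamic(conn, title):
    # Dynamic SQL (bad practice): the input is concatenated into the query text,
    # so a quote in the input changes the query's syntax.
    return conn.execute(
        "SELECT id FROM questions WHERE title = '" + title + "'"
    ).fetchall()

def find_parameterized(conn, title):
    # Parameterized query: the driver passes the input as data, never as syntax.
    return conn.execute("SELECT id FROM questions WHERE title = ?", (title,)).fetchall()

def dynamic_query_breaks(conn, title) -> bool:
    """True if the dynamic query raises a SQL syntax error for this input."""
    try:
        find_dynamic(conn, title)
    except sqlite3.OperationalError as exc:
        return "syntax error" in str(exc)
    return False

if __name__ == "__main__":
    conn = make_db()
    print(find_parameterized(conn, "O'Brien's question"))    # [(1,)]
    print(dynamic_query_breaks(conn, "O'Brien's question"))  # True
```

The syntax error from the dynamic version is exactly the kind of exception that should never show up in logs from a securely written service.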
The image below illustrates an error containing a stack trace:
The first line contains the Exception Class and Exception Details.
Each subsequent line contains the full class name, the invoked method, the class file, and the line number in the source code.
The key to locating the SQL injection is to find the first non-generic class in the stack trace that leads to the exception. "Non-generic" means a custom class unique to the codebase: in this example, the non-generic classes match "com.wix.*", while the generic library classes match "com.mysql.*".
In this example, the suspicious query is fired from the getQuestions function in MySqlQuestionDao.scala, at line 324.
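The stack-trace walk described above can be sketched in Python. This sketch assumes the standard JVM frame format ("at package.Class.method(File:line)") and treats any class whose name starts with a configurable prefix (here com.wix.) as non-generic. The sample trace is hypothetical: apart from getQuestions in MySqlQuestionDao.scala at line 324, its class names and line numbers are invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One JVM stack frame: "at <full.Class>.<method>(<File>:<line>)"
FRAME_RE = re.compile(
    r"^\s*at\s+(?P<cls>[\w$.]+)\.(?P<method>[\w$<>]+)\((?P<file>[^:()]+):(?P<line>\d+)\)\s*$"
)

@dataclass
class Frame:
    full_class: str
    method: str
    source_file: str
    line: int

def parse_frames(trace: str) -> List[Frame]:
    """Parse every frame line; the first line (exception class + details) is skipped."""
    frames = []
    for raw in trace.splitlines()[1:]:
        m = FRAME_RE.match(raw)
        if m:
            frames.append(Frame(m["cls"], m["method"], m["file"], int(m["line"])))
    return frames

def first_non_generic(frames: List[Frame], own_prefix: str = "com.wix.") -> Optional[Frame]:
    """Return the first frame from the codebase's own packages, skipping library frames."""
    return next((f for f in frames if f.full_class.startswith(own_prefix)), None)

# Hypothetical trace; only getQuestions / MySqlQuestionDao.scala:324 come from the article.
SAMPLE = """java.sql.SQLSyntaxErrorException: You have an error in your SQL syntax
    at com.mysql.cj.jdbc.StatementImpl.executeQuery(StatementImpl.java:1200)
    at com.wix.questions.MySqlQuestionDao.getQuestions(MySqlQuestionDao.scala:324)
    at com.wix.questions.QuestionsService.list(QuestionsService.scala:88)"""

if __name__ == "__main__":
    print(first_non_generic(parse_frames(SAMPLE)))
```

Because frames are ordered from the throw site outward, the first match on the codebase prefix is the closest in-house code to the failing query, which is where a reviewer would start looking.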
With SQL injection understood, they extended the approach to other vulnerability classes whose exploitation can trigger unexpected errors in an application's runtime or core dependencies. Wix calls these errors application-level Indicators of Compromise (IOCs). Examples include deserialization bugs, XML external entity (XXE) injection, server-side template injection (SSTI), etc.
The same logic can be applied to other runtimes (Node.js, .NET, Python, etc.) as well.
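One way to picture a multi-class, multi-runtime rule set is as a table mapping each vulnerability class to error signatures. The sketch below is illustrative only: the patterns are examples of real error strings from MySQL, Python's sqlite3, the JVM, and Jinja2, but they are assumptions, not Wix's actual detection rules, and a production rule set would need far more care to avoid false positives.

```python
import re
from typing import List

# Illustrative IOC rules (assumption, not Wix's rule set): each vulnerability
# class maps to error signatures a securely written application should never log.
RULES = {
    "sql_injection": [
        re.compile(r"You have an error in your SQL syntax"),       # MySQL server message
        re.compile(r"sqlite3\.OperationalError: .*syntax error"),  # Python sqlite3
    ],
    "deserialization": [
        re.compile(r"java\.io\.(StreamCorruptedException|InvalidClassException)"),  # JVM
    ],
    "ssti": [
        re.compile(r"jinja2\.exceptions\.TemplateSyntaxError"),    # Python Jinja2
    ],
}

def classify(log_line: str) -> List[str]:
    """Return every vulnerability class whose signature appears in the log line."""
    return [name for name, patterns in RULES.items()
            if any(p.search(log_line) for p in patterns)]

if __name__ == "__main__":
    print(classify('sqlite3.OperationalError: near "Brien": syntax error'))  # ['sql_injection']
    print(classify("GET /health 200 OK"))                                    # []
```

Adding a runtime is then a matter of adding its signatures to the relevant class, which fits the article's point that the same logic carries over across stacks.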
Scaling up
Monitoring the logs was not simple, because there are a lot of them. Wix has over 3,000 services running on ~20k Kubernetes pods, and its central log database grows by approximately 35 terabytes per day. Wix also handles the monitoring and observability for several "external" first-class citizen production environments.
Applying the monitoring while keeping site load times and latencies low took several steps. Wix built an internal log processing pipeline to apply its detection rules to server logs at scale. Because the log volume was still enormous, the process keeps latency down further by filtering out only the subset of application server logs that contain suspicious application-level IOCs, extracting the relevant metadata, and sending a Slack alert when a log matches the predefined rule set. An application security specialist then analyzes each alert manually, and confirmed findings are filed as bugs.
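A quick back-of-envelope calculation shows why filtering matters at this scale. It uses the article's approximate figures and assumes decimal units (1 TB = 10^12 bytes) and an even spread of log volume over the day and across pods, which real traffic will not follow exactly.

```python
# Back-of-envelope log volume from the article's approximate figures.
# Assumptions: decimal units (1 TB = 10**12 bytes), volume spread evenly over
# the day and across pods.
DAILY_BYTES = 35 * 10**12      # ~35 TB of new log data per day
PODS = 20_000                  # ~20k Kubernetes pods
SECONDS_PER_DAY = 24 * 60 * 60

ingest_mb_per_s = DAILY_BYTES / SECONDS_PER_DAY / 10**6
per_pod_gb_per_day = DAILY_BYTES / PODS / 10**9

if __name__ == "__main__":
    print(f"{ingest_mb_per_s:.0f} MB/s sustained")         # 405 MB/s sustained
    print(f"{per_pod_gb_per_day:.2f} GB per pod per day")  # 1.75 GB per pod per day
```

Roughly 400 MB of logs per second is far too much to alert on line by line, which is why only the IOC-matching subset moves on to analysis.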
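The filter, extract, alert flow can be sketched as a small generator pipeline. This is a minimal sketch under stated assumptions: log records are JSON lines with hypothetical "service", "pod", and "message" fields, a single illustrative IOC rule stands in for a full rule set, and the alert is only shaped like a Slack incoming-webhook body ({"text": ...}); actually posting it to a webhook URL is left out.

```python
import json
import re
from typing import Dict, Iterable, Iterator, List

# Hypothetical single IOC rule; a real rule set would cover many classes.
SQLI_IOC = re.compile(r"You have an error in your SQL syntax")

def suspicious(records: Iterable[str]) -> Iterator[dict]:
    """Stage 1: keep only the small subset of log records that match an IOC rule."""
    for raw in records:
        record = json.loads(raw)
        if SQLI_IOC.search(record.get("message", "")):
            yield record

def to_alert(record: dict) -> Dict[str, str]:
    """Stage 2: pull out metadata and build a Slack incoming-webhook style body."""
    service = record.get("service", "unknown-service")
    pod = record.get("pod", "unknown-pod")
    return {"text": f"Possible SQL injection in {service} (pod {pod}): {record['message'][:200]}"}

def run_pipeline(records: Iterable[str]) -> List[Dict[str, str]]:
    """Stage 3: alerts that would be posted for manual triage by a security specialist."""
    return [to_alert(r) for r in suspicious(records)]

if __name__ == "__main__":
    logs = [
        '{"service": "questions-service", "pod": "questions-7f9c", "message": "GET /questions 200"}',
        '{"service": "questions-service", "pod": "questions-7f9c", '
        '"message": "java.sql.SQLSyntaxErrorException: You have an error in your SQL syntax"}',
    ]
    for alert in run_pipeline(logs):
        print(json.dumps(alert))
```

Keeping the filter as a lazy generator means non-matching records are dropped as they stream past, so the expensive steps only ever see the suspicious subset.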
After a year of refining and iterating on the security logic and processes, Wix reached a stable and reliable set of detection rules. False positive rates are generally low, but they ultimately depend on the vulnerability class and the accuracy of each detection rule.
The detection rate for SSTI and XXE is 100% with zero false positives.
The SQL injection numbers look a little different, with a detection rate of 26% and a false positive rate of 76%. Edge cases in the queries, such as typos or plain bugs like an empty IN () clause, lead to false positives.
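The empty IN () case shows why a syntax-error rule can fire without any injection. In the sketch below (hypothetical function and table names), the query is properly parameterized with the %s placeholder style used by common Python MySQL drivers, yet an empty ID list produces "IN ()", which MySQL rejects as a syntax error. The guarded variant avoids sending the query at all.

```python
from typing import Optional, Sequence, Tuple

def build_in_query(ids: Sequence[int]) -> Tuple[str, tuple]:
    """Parameterized IN query using the %s placeholder style of common MySQL drivers."""
    placeholders = ", ".join(["%s"] * len(ids))
    return f"SELECT id, title FROM questions WHERE id IN ({placeholders})", tuple(ids)

def build_in_query_guarded(ids: Sequence[int]) -> Optional[Tuple[str, tuple]]:
    """Same query, but skip the database entirely when there is nothing to match."""
    if not ids:
        return None  # caller can return an empty result without querying
    return build_in_query(ids)

if __name__ == "__main__":
    print(build_in_query([1, 2]))      # ('SELECT id, title FROM questions WHERE id IN (%s, %s)', (1, 2))
    print(build_in_query([]))          # ('SELECT id, title FROM questions WHERE id IN ()', ())
    print(build_in_query_guarded([]))  # None
```

No user input reaches the SQL text here, so an alert triggered by the empty list is a genuine bug but not an injection, which is exactly the kind of false positive the SQL rules have to live with.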
An alert example looks like this:
Conclusion and Future Plans
Application security is a multilayered process. Wix found that reverse-engineering application vulnerabilities from server logs, monitored at scale, was an incredibly valuable process for their applications, giving them a security solution that works at their scale.
Wix plans to release this solution as an open source product in the upcoming months to help organizations monitor their application stacks for vulnerabilities alongside a robust rule set for many vulnerability classes and runtime stacks.