Zardoz: a lightweight WAF, based on Pseudo-Bayes machine learning.
Zardoz is a small WAF that aims to filter out HTTP calls which are well known to end in some HTTP error. It behaves like a reverse proxy, running as a frontend: it intercepts calls, forwards them when appropriate, and learns how the server reacts from the status code.
After a while, the Bayes classifier is able to tell a "good" HTTP call from a bad one, based on the header contents.
It is designed to consume little memory and CPU, so you don't need powerful servers to keep it running, nor does it introduce high latency on the web server.
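Conceptually, the whole loop is small. Here is a minimal sketch of the idea in Go, not Zardoz's actual code: the classifier calls are hypothetical placeholders.

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	backend, _ := url.Parse("http://10.0.1.1:3000")
	proxy := httputil.NewSingleHostReverseProxy(backend)

	// Learn from the backend's reaction: 2xx/3xx means the request's
	// headers were "good", 4xx/5xx means they were "bad".
	proxy.ModifyResponse = func(resp *http.Response) error {
		if resp.StatusCode < 400 {
			// classifier.Learn(resp.Request.Header, "good") // hypothetical
		} else {
			// classifier.Learn(resp.Request.Header, "bad") // hypothetical
		}
		return nil
	}

	http.ListenAndServe(":17000", proxy)
}
```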
STATUS:
This is just an experiment I'm doing with Pseudo-Bayes classifiers. Run it in production at your own risk.
Compiling:
Requirements:
- golang >= 1.12.9
Build:
```
git clone https://git.keinpfusch.net/LowEel/zardoz
cd zardoz
go build
```
Starting:
Zardoz has no configuration file; it is configured entirely through environment variables.
In a Dockerfile, this maps to:
```
ENV REVERSEURL http://10.0.1.1:3000
ENV PROXYPORT :17000
ENV TRIGGER 0.6
ENV SENIORITY 1025
ENV DEBUG false
```
Using a bash script, this means something like:
```
export REVERSEURL=http://10.0.1.1:3000
export PROXYPORT=":17000"
export TRIGGER="0.6"
export SENIORITY="1025"
export DEBUG="true"
./zardoz
```
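For reference, this is a minimal sketch of how such an env-only configuration is typically read in Go; the variable names match the ones above, but the parsing details are illustrative, not necessarily what Zardoz does.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

func main() {
	reverseURL := os.Getenv("REVERSEURL") // backend to protect
	proxyPort := os.Getenv("PROXYPORT")   // listen address, e.g. ":17000"
	trigger, _ := strconv.ParseFloat(os.Getenv("TRIGGER"), 64)
	seniority, _ := strconv.Atoi(os.Getenv("SENIORITY"))
	debug := os.Getenv("DEBUG") == "true"

	fmt.Println(reverseURL, proxyPort, trigger, seniority, debug)
}
```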
Understanding Configuration:
REVERSEURL is the server Zardoz acts as a reverse proxy for: the IP and port of the backend you want to protect.
PROXYPORT is the address and port where Zardoz will listen. If you want Zardoz to listen on all interfaces, write just ":17000", meaning it will listen on every interface at port 17000.
TRIGGER: this is one of the trickiest parts. We can describe the behavior of Zardoz in quadrants, like:
|                           | BAD > GOOD  | BAD < GOOD |
|---------------------------|-------------|------------|
| \|GOOD - BAD\| > TRIGGER  | BLOCK       | PASS       |
| \|GOOD - BAD\| <= TRIGGER | BLOCK+LEARN | PASS+LEARN |
The value of TRIGGER can range from 0 to 1, like "0.5" or "0.6". The difference between BLOCK without learning and BLOCK with learning is execution time. From the user's point of view nothing changes (the user is blocked), but in the BLOCK+LEARN case the machine also tries to learn the lesson, since there is some ambiguity (the good probability is high too).
The same happens in the PASS+LEARN situation: in such a case both probabilities are low, so we are in a situation of ambiguity. Zardoz cannot say the request is bad, nor can it say it is good. In that case, it will allow the request, but it will learn from it to improve future decisions.
Personally, I've had good results setting the trigger to 0.6: it doesn't disturb users much, and at the same time it has filtered tons of malicious scans.
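In code, the quadrants above boil down to two comparisons. A minimal sketch, assuming the classifier returns per-class probabilities (the function name and signature are illustrative):

```go
package main

import (
	"fmt"
	"math"
)

// decide maps the classifier's scores onto the quadrants above:
// the column is chosen by which probability wins, the row by
// whether the gap between them clears TRIGGER.
func decide(good, bad, trigger float64) (block, learn bool) {
	block = bad > good                    // BAD > GOOD column -> block
	learn = math.Abs(good-bad) <= trigger // ambiguous gap -> learn too
	return block, learn
}

func main() {
	fmt.Println(decide(0.9, 0.1, 0.6)) // false false -> PASS
	fmt.Println(decide(0.1, 0.9, 0.6)) // true false  -> BLOCK
	fmt.Println(decide(0.5, 0.4, 0.6)) // false true  -> PASS+LEARN
}
```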
SENIORITY: since Zardoz learns what is good for your web server, it takes time to gain seniority. Starting Zardoz empty and leaving it to decide right away will generate some terrible behavior, because of false positives and false negatives. Besides, at the beginning Zardoz is supposed to ALWAYS learn.
The SENIORITY parameter is the number of requests Zardoz will keep in PASS+LEARN before activating the filtering. During this time, it will learn from real traffic. If you set it to 1025, it will learn from 1025 requests and then start actually filtering. The right number depends on many factors: if you serve a lot of pages and a lot of content, I suggest increasing it.
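As a sketch, the seniority gate is just a request counter checked before the trigger logic kicks in (type and field names here are illustrative, not Zardoz's actual code):

```go
package main

import "fmt"

// gate counts requests; until SENIORITY of them have been seen,
// every call stays in PASS+LEARN regardless of the scores.
type gate struct {
	seen, seniority int // seniority holds SENIORITY, e.g. 1025
}

func (g *gate) mature() bool {
	if g.seen < g.seniority {
		g.seen++
		return false // still building seniority: pass and learn
	}
	return true // enough traffic seen: start filtering
}

func main() {
	g := &gate{seniority: 3}
	for i := 0; i < 5; i++ {
		fmt.Println(g.mature()) // false false false true true
	}
}
```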
TROUBLESHOOTING:
If DEBUG is set to "false" or not set, every minute Zardoz will dump the sparse matrix describing the whole Bayesian learning state into a file named bayes.json. This contains the weighted matrix of calls and classes. If Zardoz is not behaving as you expect, you may take a look at this file. The format is a classic sparse matrix. WARNING: this file may contain cookies or other sensitive headers.
DEBUG: if set to "true", Zardoz will create a "logs" folder and log what happens, together with the dump of the sparse matrix. If set to "false" or not set, only the sparse matrix will be available on disk for post-mortem analysis.
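The dump itself amounts to little more than serializing the matrix once per minute. A sketch, with a plain map standing in for the real sparse-matrix type:

```go
package main

import (
	"encoding/json"
	"io/ioutil"
	"time"
)

// dumpMatrix serializes the weighted matrix of calls and classes
// to bayes.json once per minute, for post-mortem inspection.
func dumpMatrix(matrix map[string]map[string]float64) {
	for range time.Tick(time.Minute) {
		data, err := json.MarshalIndent(matrix, "", "  ")
		if err != nil {
			continue
		}
		ioutil.WriteFile("bayes.json", data, 0644)
	}
}

func main() {
	m := map[string]map[string]float64{
		"good": {"User-Agent: curl/7.64": 1}, // illustrative weights
	}
	go dumpMatrix(m)
	select {} // block forever while the dumper runs
}
```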
TODO:
- Loading Bayesian data from file.
- Better logging.
- Configurable block message.
- Usage statistics/metrics sent to InfluxDB/Prometheus/whatever.