# MapleGrid

# Scope:

Building a honeypot "network" with a central instance that aggregates the collected data.

# Idea:

The idea is to have as many "leaves" as we can, so we can sense the "wind" and send everything to a central "trunk" where the info is processed. This way we have our own little network of ssh honeypots and can do something with the data.

The "leaves" will leverage [Pshitt](https://github.com/regit/pshitt), made by [regit](https://github.com/regit), a python program that simulates an ssh server with user/password authentication. Unlike a real ssh server, Pshitt stores the data of the brute-force attempts in json format, for example:

```json
{"username": "user", "try": 1, "src_port": 47097, "software_version": "libssh2_1.4.2", "timestamp": "2022-05-25T19:50:00.123456", "src_ip": "123.123.123.123", "mac": "hmac-sha1", "cipher": "aes128-ctr", "password": "password"}
```

- username
- try: number of the attempt, out of the 3 allowed before disconnection
- src_port
- software_version
- timestamp
- src_ip
- mac
- cipher
- password

# Components:

### pshitt

our ssh honeypot

### filebeat

a lightweight log-shipper to be used on the leaves

### logstash

where we do a little processing of the data and add geo-ip data. I tried using Logstash directly on the leaves but it's too resource-hungry

### elasticsearch

where we store our data

### kibana

used for exploring data and creating dashboards

# Network configuration:

On the leaves I assume we will have a public ip address, so we can expose:

- port 22: pshitt
- port 22222: the real ssh server we need to manage the machine

The trunk should not need any exposed port (at least by design), since we should be able to route everything via a VPN or another protected connection; if that is not possible, the port we need to expose is:

- port 5044: logstash for filebeat

# How-To:

## "Trunk" node (ELK stack):

### Install:

install and configure Elasticsearch on a machine, preferably with some cores and some GB of ram (I am using 4 cores and 6GB of ram). Since I am not very keen on ELK (and I am leveraging this project to learn something), I have used these articles: [[0]] and [[1]]

NB: this is a single-node "quick and dirty" installation of Elasticsearch and Kibana for a POC.
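Since the trunk VM only has 6GB of ram, you may also want to cap the Elasticsearch JVM heap once the packages from the block below are installed; a minimal sketch (the `jvm.options.d` drop-in directory is the standard mechanism of recent 7.x packages, the 2GB sizing is just my assumption):

```bash
# optional: cap the Elasticsearch JVM heap so it doesn't eat the whole 6GB
# (do this after installing the packages below)
mkdir -p /etc/elasticsearch/jvm.options.d
cat > /etc/elasticsearch/jvm.options.d/heap.options << EOF
-Xms2g
-Xmx2g
EOF
systemctl restart elasticsearch
```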
```bash
#install ELK v7.x:
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | apt-key add -
apt-get install apt-transport-https
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | tee -a /etc/apt/sources.list.d/elastic-7.x.list
apt update
apt install elasticsearch kibana logstash filebeat #filebeat is only needed if you wish to host the honeypot on the trunk node

cat > /etc/elasticsearch/elasticsearch.yml << EOF
cluster.name: maplegrid
node.name: maplenode-01
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
discovery.type: single-node
node.ingest: true
EOF

cat > /etc/kibana/kibana.yml << EOF
server.port: 5601
server.host: "0.0.0.0"
server.name: "maplegrid-kibana"
elasticsearch.hosts: ["http://localhost:9200"]
EOF

systemctl enable elasticsearch
systemctl enable logstash
systemctl enable kibana
systemctl start elasticsearch
systemctl start logstash
systemctl start kibana
```

### Check if Elasticsearch is ok:

at this point you should be able to ask Elastic how it feels:

```bash
curl -XGET http://localhost:9200/_cluster/health?pretty
```

expecting an answer like this:

```json
{
  "cluster_name" : "maplegrid",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 15,
  "active_shards" : 15,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 3,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 83.33333333333334
}
```

and you should be able to connect to Kibana on the ip address of the server on port 5601.

### Logstash config:

now we can configure Logstash: edit /etc/logstash/conf.d/pshitt.conf

```
input {
  beats {
    port => 5044
  }
}

filter {
  # warn logstash that timestamp is the one to use
  if [type] == "json-log" {
    date {
      match => [ "timestamp", "ISO8601" ]
    }
  }

  geoip {
    default_database_type => "City"
    source => "src_ip"
    tag_on_failure => ["geoip-city-failed"]
  }

  geoip {
    default_database_type => "ASN"
    source => "src_ip"
    tag_on_failure => ["geoip-asn-failed"]
  }
}

output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "pshitt"
  }
}
```

As you can see, our input is not a file: we listen on port 5044 for our remote Filebeat clients. The "geoip" part is needed to add the geo-data starting from the attacker IP; the split configuration is needed because the filter can process "City" or "ASN" data, but not both at the same time, so we need two geoip blocks. Thanks to leandrojmp [[2]]. The output is our Elasticsearch node.

then restart Logstash:

```bash
systemctl restart logstash
```

in case you need to debug Logstash (I had to), you can read: /var/log/logstash/logstash-plain.log

## "Leaf" node (honeypot + logshipper):

I've created an Ansible playbook to configure everything, but we'll obviously go through the steps anyway; it's been tested on both Debian 9 and 10.
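The playbook and its variables file are linked right below. To run it against the leaves you will need an inventory; a hypothetical sketch (group name, hostnames and IPs are placeholders I made up, adapt them to your machines and to the variables file):

```bash
# hypothetical inventory -- adapt hostnames/IPs to your own leaves
# (add ansible_port=22222 once the real ssh has been moved off port 22)
cat > inventory.ini << EOF
[leaves]
leaf-01 ansible_host=198.51.100.10 ansible_user=root
leaf-02 ansible_host=198.51.100.11 ansible_user=root
EOF

ansible-playbook -i inventory.ini maplegrid.yml
```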
### Ansible Playbook:

Playbook: [maplegrid.yml](/img/20220525_01_maplegrid_honeypot_files/maplegrid.yml)

Variables file: [maplegrid_variables.yml](/img/20220525_01_maplegrid_honeypot_files/maplegrid_variables.yml)

### Add elastic repo and install filebeat:

```bash
curl -fsSL https://artifacts.elastic.co/GPG-KEY-elasticsearch | apt-key add -
apt-get install apt-transport-https
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | tee -a /etc/apt/sources.list.d/elastic-7.x.list
apt update
apt install filebeat
```

### change ports:

in /etc/ssh/sshd_config we'll change the port to 22222 (`Port 22222`)

in fail2ban (if you are using it, but you really should) we are gonna change the port of the ssh jail to 22222, look for the config:

```bash
[sshd]
port = ssh
```

and change it to:

```bash
[sshd]
port = 22222
```

!!!!!!! FIX YOUR FIREWALL TO ALLOW CONNECTIONS ON PORT 22222 !!!!!!!! (Ansible will do this for you if you are using iptables, otherwise it's your task)

then restart fail2ban and ssh:

```bash
systemctl restart fail2ban
systemctl restart ssh
```

your current connection will remain active, but this is the moment to check whether you can ssh into the machine on port 22222:

```bash
ssh user@machine -p 22222
```

### Pshitt:

install the dependencies of pshitt:

```bash
apt install python-pip
pip install python-daemon argparse paramiko
```

clone the repo of Pshitt in /srv/pshitt:

```bash
git clone https://github.com/regit/pshitt /srv/pshitt
```

create the systemd unit file we are going to use to start the service, create the log directory and enable+start pshitt:

```bash
cat > /etc/systemd/system/pshitt.service << EOF
[Unit]
Description=pshitt service
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/srv/pshitt/pshitt.py -p 22 -k /etc/ssh/ssh_host_rsa_key -o /var/log/pshitt/data.json -l /var/log/pshitt/log.log

[Install]
WantedBy=multi-user.target
EOF

mkdir /var/log/pshitt
systemctl enable pshitt
systemctl start pshitt
```

notice the "-k /etc/ssh/ssh_host_rsa_key": we are using the host keys of our real machine, and not the ones shipped with pshitt, to avoid being discovered.

at this point if you try to log into your honeypot:

```bash
ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no user@YOUR_PUB_IP -p 22
```

you should see an entry in /var/log/pshitt/data.json

### Filebeat:

now we can configure Filebeat to send the data to Logstash (change the strings starting with "YOUR_"):

```bash
cat > /etc/filebeat/filebeat.yml << EOF
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/pshitt/data.json
  json.keys_under_root: true

processors:
  - drop_fields:
      fields: ["beat", "source", "prospector", "offset", "host", "log", "input", "event", "fileset"]
  - add_fields:
      target: ''
      fields:
        pshitt_host: YOUR_LEAF_NODE_NAME

output.logstash:
  hosts: ["YOUR_LOGSTASH_IP:YOUR_LOGSTASH_BEAT_PORT"]

#Filebeat service logging:
logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0640
EOF

systemctl enable filebeat
systemctl start filebeat
```

at this point, if we try another login on the honeypot, we should see the data in Kibana.
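If nothing shows up, it can help to first verify on the trunk that documents are actually reaching the `pshitt` index before blaming Kibana; a quick sanity check using plain Elasticsearch APIs:

```bash
# run on the trunk node: is the index there, and what does the latest document look like?
curl -s 'http://localhost:9200/_cat/indices/pshitt?v'
curl -s 'http://localhost:9200/pshitt/_search?pretty&size=1&sort=@timestamp:desc'
```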
### Kibana:

log into Kibana and create a new index pattern: Management --> Kibana --> Index Patterns --> New

- Name: pshitt*
- Timestamp field: @timestamp

now go to: Analytics --> Discover, select your index pattern and you should see some data.

we can also create dashboards ( Analytics --> Dashboard ). I've tried to export what I did (it still lacks geo-ip data since I have some issues), but I don't know if the import works:

Kibana Dashboard: [kibana_dashboard.njson](/img/20220525_01_maplegrid_honeypot_files/kibana_dashboard.njson)

My dashboard:

This is the data from 1 hour, and we can see the ingestion from different "leaves". With the field "pshitt_host" we can tell which leaf sent the data; Filebeat actually adds some data of its own, so the same info, taken from the VM hostname of the leaf, is also found in "agent.hostname" and "agent.name". I've added the field anyway because in my first tests with Logstash I didn't have those additional fields.

# "Historical data":

one of the next steps would be importing the "historical" data: this data comes from when I had the predecessor of MapleGrid online, which collected more than 5 million unique records in the timespan 2016-2019 (with some service disruptions in the middle); at the time the idea was "retrieving" other people's wordlists.

Obviously the IP-related data is not very relevant today, but it might be interesting to see the statistics of usernames and passwords, besides testing everything with much more data than we have now.

In this regard, this is the Logstash configuration to ingest from a local json file:

```
input {
  file {
    start_position => "beginning"
    path => "/var/log/pshitt/data.json"
    sincedb_path => "/dev/null"
    codec => json
    type => "json-log"
  }
}

filter {
  if [type] == "json-log" {
    date {
      match => [ "timestamp", "ISO8601" ]
    }
  }

  geoip {
    default_database_type => "City"
    source => "src_ip"
    tag_on_failure => ["geoip-city-failed"]
  }

  geoip {
    default_database_type => "ASN"
    source => "src_ip"
    tag_on_failure => ["geoip-asn-failed"]
  }

  mutate {
    add_field => { "pshitt_host" => "YOUR_ORIGINAL_HOST" }
  }
}

output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "YOUR_ANOTHER_INDEX"
  }
}
```

and you can execute Logstash with a single config file like this:

```bash
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/YOUR_CONFIGFILE.conf
```

## TODO:

- [ ] kibana/logstash: fix geo-data
- [ ] elasticsearch: understand clustering/data redundancy
- [ ] elasticsearch: understand indexes
- [ ] enable passwords on Elasticsearch and Kibana
- [ ] import "historical" data
- [ ] add the ssh version in pshitt:
  - pshitt hardcodes "OpenSSH_6.6.1p1 Debian-5" as the ssh version; it should be changed to the real one of the vm (or randomly picked from a list) to avoid giving away that it's a honeypot
  - to retrieve the local ssh version: ssh -V 2>&1 >/dev/null | cut -f1 -d','
- [ ] add the public ip of the leaf to the data
  - this way we can apply geo-ip also to the dest_ip and try to plot that

## Notes:

- the "trunk" vm (ELK stack):
  - OS: Debian 10
  - CPU: 4 core
  - RAM: 6gb
  - Disk: 40gb
- 2 "leaves":
  - OS: Debian 9
  - CPU: 1 core
  - RAM: 0.5gb
  - Disk: 10gb
- 2 "leaves":
  - OS: Debian 10
  - CPU: 1 core
  - RAM: 0.5gb
  - Disk: 10gb

## links:

[0]: https://medium.com/devops-dudes/how-to-deploy-elasticsearch-5b1105e3063a
[1]: https://www.elastic.co/guide/en/elasticsearch/reference/current/deb.html
[2]: https://web.leandrojmp.com/posts/en/2020/10/logstash-geoip