Malware Analysis, Part 1: Understanding Code Obfuscation Techniques

Written by Thomas Gendron | December 04, 2018

In a previous article, we studied a malicious email and noted that code could be executed when its attachments were opened. Today we will focus on that malicious code: we will see what it does and which techniques it uses to complicate the work of analysts and to bypass protections.

Before we begin, here is a quick review of a few key terms:

A downloader is a file that downloads and executes malware.
A dropper is a file that directly embeds the malware in its own code, concealed to avoid detection. It drops the malware onto the computer and then executes it.
A macro, in Office documents (Word, Excel, PowerPoint, etc.), is a set of commands and instructions that can be executed automatically when the document is opened.

Today, attackers often use the same attack vectors for downloaders and droppers, namely macros in Office documents and JavaScript code in PDF files. An analyst or a security program can detect this code, extract it and analyze it to understand how it works and quickly block the threat. As a result, attackers have to use techniques that delay the detection of their malicious files for as long as possible; code obfuscation is one of the most widely used, and it is the subject of this article.

What is code obfuscation and what can it be used for?

Obfuscation, in computing, consists of making an executable program or source code unreadable and hard for a human to understand, while preserving its behavior. The goal is to evade static code analyzers as much as possible and to waste the time of the analysts who study the code. String obfuscation is one of the techniques most used by malware authors: it consists of hiding character strings, or making them incomprehensible, with an algorithm that decodes them when the code executes. This article focuses mainly on this technique as it is used in downloaders.
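To make the idea concrete, here is a minimal Python sketch of string obfuscation in general. It uses a single-byte XOR key, a scheme assumed purely for illustration (the sample studied below uses a different transformation): the readable string never appears in the program and only exists once the decoding routine has run.

```python
# Illustration only: single-byte XOR is an assumed scheme, not the one used by the sample below.
KEY = 0x5A  # hypothetical key

def obfuscate(text: str, key: int = KEY) -> bytes:
    """Encode a string so that it no longer appears in clear text in the program."""
    return bytes(b ^ key for b in text.encode("ascii"))

def deobfuscate(blob: bytes, key: int = KEY) -> str:
    """Decode the string at run time; XOR with the same key reverses the encoding."""
    return bytes(b ^ key for b in blob).decode("ascii")

hidden = obfuscate("cmd.exe")
print(hidden)               # unreadable bytes, which is what a static scanner would see
print(deobfuscate(hidden))  # prints cmd.exe, recovered only at run time
```

A static analyzer searching the program for "cmd.exe" would not find it, which is exactly the delay attackers are after.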

We will study two examples of code found in downloaders in recent months. We will begin with a simple example to understand how obfuscation works and why attackers use it. Then we will study a more complicated case. Simplicity is rare in malware analysis, so when it happens we have to take full advantage of it, and here is a perfect example to start with!

Obfuscation is not that obfuscated

File Type	SHA1	VirusTotal	VirusBay
PDF	39809b0836b3198e472e9e5a4f15f5e75ab49265	Link	Link
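The SHA1 fingerprint in the table is what lets analysts look the sample up on services such as VirusTotal. As a sketch, a file's SHA1 can be computed with Python's standard hashlib module, reading the file in chunks so that large samples do not need to fit in memory; the file name in the usage line is a placeholder.

```python
import hashlib

def sha1_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hexadecimal SHA1 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() keeps calling f.read() until it returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Usage (placeholder path): print(sha1_of_file("sample.pdf"))
```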

Without further ado, let’s get into the heart of the matter and look at this little bit of nasty code that comes from the malicious PDF file above.

At first glance, it is difficult to understand what this code does, although line 1 gives us a very big clue: a URL that points to an EXE file. When I analyze malicious code, my first step is to make it as readable as possible, for example by adding line breaks and spaces and by splitting the code into several parts.
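Part of this first cleanup pass can be automated. The following Python sketch is a naive beautifier that breaks a cmd.exe one-liner into one command per line at each '&' separator. It assumes that '&' is only ever used as a command separator; it does not handle quoting or escaped characters such as '^&', so its output still needs a manual review. The sample command line is hypothetical, not taken from the sample.

```python
def split_commands(cmdline: str) -> list[str]:
    """Naively split a cmd.exe one-liner into separate commands at each '&'."""
    return [part.strip() for part in cmdline.split("&") if part.strip()]

# Hypothetical one-liner, not taken from the sample.
sample = "cmd /c cd %TEMP% & @echo line1 >> out.vbs & @echo line2 >> out.vbs & start out.vbs"
for command in split_commands(sample):
    print(command)
```

Each command ends up on its own line, which is usually enough to start seeing the blocks of the script.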

Here is the result, a little more readable and comprehensible.

The code has been divided into 6 blocks, separated by empty lines:

Launches the Windows command prompt that will execute commands on the system, then moves to a temporary folder.
Declaration of variables.
Declaration and use of a first object.
Declaration and use of a second object.
Declaration of a function.
Execution and deletion of the file.

We can quickly see that parts 2, 3, 4 and 5 begin with '& @echo' and end with '>> N2o.vbs'. What does this mean?

The '&' indicates that another command will be executed. For example, 'command1 & command2' means that command1 will be executed and then command2.
The 'echo' command writes text to the Windows command prompt; the '@' prefix prevents the command itself from being displayed.
The '>> N2o.vbs' redirects what would be written to the Windows command prompt into the 'N2o.vbs' file without erasing what is already there (the content is appended to the end of the file each time).
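Put together, these three elements mean that the content of 'N2o.vbs' can be recovered without running anything: take every '@echo ... >> N2o.vbs' command and keep the text between 'echo' and the redirection. Here is a minimal Python sketch of that idea. The input line is hypothetical, and the regular expression assumes the simple form described above (it does not handle cmd escape characters, and trailing spaces before '>>' are dropped).

```python
import re

# Matches "@echo <text> >> N2o.vbs" and captures <text>. The capture is
# non-greedy, so each match stops at the first redirection to N2o.vbs.
ECHO_TO_FILE = re.compile(r"@echo\s(.*?)\s*>>\s*N2o\.vbs", re.IGNORECASE)

def rebuild_script(cmdline: str) -> str:
    """Return the lines that the echo commands would append to N2o.vbs, in order."""
    return "\n".join(ECHO_TO_FILE.findall(cmdline))

# Hypothetical input, not the actual sample.
cmdline = "cd %TEMP% & @echo Dim a >> N2o.vbs & @echo a = 1 >> N2o.vbs & cscript N2o.vbs"
print(rebuild_script(cmdline))
```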

To summarize, parts 2, 3, 4 and 5 are only commands that will write to the 'N2o.vbs' file.

For now, here is what we know and the steps that the code will perform:

The first line runs a Windows command prompt and moves to the temporary folder of the current session.
Then, several commands will write to the 'N2o.vbs' file.
The 'N2o.vbs' file is executed (line 27).
The file is then deleted (line 28).
A file named 'ITL.EXE' is executed (line 29).

To understand what the code does, we have to analyze the 'N2o.vbs' file:

Note that the function 'K4d' is called several times (lines 2, 4, 5 and 8), each time with an unreadable character string. Judging by the function and the rest of the code, we can deduce that it transforms its input into a readable string that the script then uses. Surprisingly, the most interesting string in the code, the URL, has not been obfuscated. It was probably an oversight…
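Because the URL was left in clear text, even a simple pattern search finds it. The Python sketch below pulls http(s) URLs out of a script with a deliberately loose regular expression; this pattern is an assumption that works for quoted URLs like the one here, not a complete URL grammar. The sample line is hypothetical and uses a placeholder domain.

```python
import re

# Loose pattern: "http://" or "https://" followed by anything up to whitespace or a quote.
URL_PATTERN = re.compile(r"https?://[^\s\"']+")

def find_urls(script: str) -> list[str]:
    """Return every http(s) URL that appears in clear text in the script."""
    return URL_PATTERN.findall(script)

# Hypothetical VBScript line with a placeholder domain.
line = 'x.Open K4d("jhw"), "http://example.com/payload.exe", False'
print(find_urls(line))
```

Obfuscated strings such as the argument of K4d() are, of course, invisible to this kind of search.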

Let's look at what this function does exactly:

Line 17: A loop is created that runs once for each character of the input string; for example, if the input string is 'toto', the loop runs 4 times.
Line 18: The MID() function returns a substring of a given length starting at a given position; for example, MID('toto', 2, 3) returns the substring 'oto'. The 2 sets the starting position (counting from 1) and the 3 the length.
Line 19: Here, several transformations are applied to the substring found on the previous line.
1. The 'Asc()' function returns the decimal representation of a character; for example, Asc("A") returns the number 65.
2. Then, 35 is subtracted from the number returned by the 'Asc()' function.
3. To finish, the 'Chr()' function uses the result of the two preceding operations to return a character; for example, Chr(65) returns the letter A.
Line 20: The character computed on line 19 is appended to the 'Z4o' string.
Line 22: Returns the new string created by the loop.
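To check this reading of the function, here is a Python transcription of K4d. It is a sketch based on the line-by-line description above, not the original VBScript: the helper vbs_mid is a hypothetical name that mimics VBScript's 1-based Mid(), and Python's ord() and chr() play the roles of Asc() and Chr() for ASCII text.

```python
def vbs_mid(s: str, start: int, length: int) -> str:
    """Mimic VBScript Mid(): positions start at 1, not 0."""
    return s[start - 1:start - 1 + length]

def k4d(encoded: str) -> str:
    """Decode a K4d-obfuscated string by subtracting 35 from each character code."""
    z4o = ""                                  # accumulator, like Z4o in the script
    for i in range(1, len(encoded) + 1):      # line 17: one pass per character
        char = vbs_mid(encoded, i, 1)         # line 18: take the i-th character
        z4o += chr(ord(char) - 35)            # lines 19-20: Asc, minus 35, Chr, append
    return z4o                                # line 22: return the decoded string

print(vbs_mid("toto", 2, 3))  # oto
print(k4d("jhw"))             # GET
```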

Let's step through the call to K4d on line 5, which takes "jhw" as its input string. The input string has a length of 3, so the loop runs 3 times.

1st round of the loop

Line 18: MID("jhw", 1, 1) → "j"
Line 19:
1. Asc("j") → 106
2. 106 - 35 → 71
3. Chr(71) → G
Line 20: Z4o = "G"

2nd round of the loop

Line 18: MID("jhw", 2, 1) → "h"
Line 19:
1. Asc("h") → 104
2. 104 - 35 → 69
3. Chr(69) → E
Line 20: Z4o = "GE"

3rd round of the loop

Line 18: MID("jhw", 3, 1) → "w"
Line 19:
1. Asc("w") → 119
2. 119 - 35 → 84
3. Chr(84) → T
Line 20: Z4o = "GET"

After three passes through the loop, the function returns the string "GET". In the same way, the calls on lines 2, 4 and 8 return the strings "ITL.EXE", "MSXML2.XMLHTTP" and "ADODB.STREAM" respectively.
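Since the transformation is a fixed shift of 35, it is trivially reversible. The sketch below adds 35 instead of subtracting it, which shows what the obfuscated form of each decoded string has to be under this scheme; these encoded values are computed here, not copied from the sample. The block defines its own copy of the decoder so that it runs on its own.

```python
SHIFT = 35  # the constant subtracted by K4d

def k4d(encoded: str) -> str:
    """Decode: subtract SHIFT from every character code."""
    return "".join(chr(ord(c) - SHIFT) for c in encoded)

def k4d_encode(plain: str) -> str:
    """Inverse of k4d: add SHIFT to every character code."""
    return "".join(chr(ord(c) + SHIFT) for c in plain)

for plain in ["GET", "ITL.EXE", "MSXML2.XMLHTTP", "ADODB.STREAM"]:
    encoded = k4d_encode(plain)
    assert k4d(encoded) == plain  # the round trip restores the original string
    print(f"{plain!r:18} <- {encoded!r}")
```

Encoding "GET" gives back "jhw", the argument of the call on line 5.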

We can now replace every obfuscated string and variable with its decoded value, and delete the function, which is no longer needed, to simplify the code as much as possible. Here is the final result, which is much simpler to read and understand.
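This replacement step can also be scripted. The Python sketch below rewrites every K4d("...") call in the script text as the string literal it decodes to, using a regular expression. It assumes each call takes a single literal string argument, as in this sample, and that the obfuscated text contains no double quote (which holds here, since '"' would decode to a character code of -1). Decoded quotes are doubled, as VBScript requires inside string literals. The input line is hypothetical.

```python
import re

# A K4d call with one string literal argument, e.g. K4d("jhw").
K4D_CALL = re.compile(r'K4d\("([^"]*)"\)')

def k4d(encoded: str) -> str:
    """Decode a K4d-obfuscated string (character code minus 35)."""
    return "".join(chr(ord(c) - 35) for c in encoded)

def inline_k4d_calls(script: str) -> str:
    """Replace each K4d("...") call with the VBScript literal it decodes to."""
    def to_literal(match: re.Match) -> str:
        decoded = k4d(match.group(1))
        return '"' + decoded.replace('"', '""') + '"'  # VBScript doubles embedded quotes
    return K4D_CALL.sub(to_literal, script)

# Hypothetical line in the style of the sample.
print(inline_k4d_calls('x.Open K4d("jhw"), url, False'))
```

Once every call has been inlined, the K4d function itself is dead code and can be deleted by hand.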

The remaining code is divided into 2 parts. Part 1 (lines 1 to 3) retrieves the data of the file 'albert.exe', then part 2 (lines 5 to 11) writes the retrieved data to a file named 'ITL.EXE'.

We now know all the steps the malicious code carries out:

Write the code into a file
Execute the file
Delete the file
Execute the downloaded program

URL	URL Scan
hxxp://ultimatefifa[ . ]com/po/albert[ . ]exe	Link
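The URL in the table is "defanged" ('http' replaced by 'hxxp', dots wrapped in brackets) so that it cannot be clicked or resolved by accident. Here is a small Python sketch that produces the notation used in this article, plus the reverse operation; the function names are illustrative.

```python
import re

def defang(url: str) -> str:
    """Make a URL non-clickable: http(s) -> hxxp(s) and '.' -> '[ . ]'."""
    url = re.sub(r"^http", "hxxp", url, flags=re.IGNORECASE)
    return url.replace(".", "[ . ]")

def refang(defanged: str) -> str:
    """Reverse defang() to get back a usable URL, e.g. for a lookup from a sandbox."""
    url = defanged.replace("[ . ]", ".")
    return re.sub(r"^hxxp", "http", url, flags=re.IGNORECASE)

print(defang("http://example.com/po/file.exe"))
```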

File	SHA1	VirusTotal	VirusBay
albert.exe	33985325dd64e06a3e3af0c540073eefd07d9596	Link	Link

This completes the first example. As we have seen, the obfuscation here is really quite light, all the more so because the URL of the downloaded file was readable from the start (probably a mistake), unlike the other character strings in the code.

In the second part of this series focused on malware analysis, we will see what it actually does and what level of obfuscation can be achieved.
