May 25, 2016

Classifying Malware using Import API and Fuzzy Hashing – impfuzzy –

Hello all, this is Shusei Tomonaga again.

Generally speaking, malware analysis begins with classifying whether it is known malware or not. In order to make comparison with the enormous number of known malware samples in the database in a speedy manner, hash values are used, derived by performing hash functions to the malware sample.

Among the different hash functions, traditional ones such as MD5 and SHA1 derive totally different hash values if the input data is different even by 1 bit. So this would not be useful when you have a sample that is similar to a known malware, and would like to classify it as “known”.

These days, malware used in attacks is mostly customised for each case, so it is ideal to use hash functions which can classify those samples as similar to known ones.

Therefore, fuzzy hash function (of which the hash value for modified codes is close to that of the original codes) is used if modifications are simple and mechanical, and for Windows executable file samples, imphash [1] (import hash) is used, which calculates values from PE (portable executable)’s import table.

One of the known examples of fuzzy hash is ssdeep [2].

However, there are various issues with these methods used for malware classification; hash values derived by ssdeep often do not coincide with the similarity of malware samples, while imphash derives different values once new functions are added to the malware.

This blog article proposes a new method “impfuzzy”, and demonstrates how it can accurately classify similar malware, in comparison to existing methods.

impfuzzy

The proposed method, as in imphash, calculates values from Import API, however, it also uses Fuzzy Hashing to calculate hash values of Import API, in order to supplement the shortcomings of imphash. With this process, a close value will be derived if just a part of Import API was added or modified.

Furthermore, it reduces time for calculation and enables efficient comparison by specifying the object of the hash value calculation to Import API (and not to the Fuzzy Hashing value of the whole executable file).

Implementing impfuzzy

“pyimpfuzzy”, a Python module for calculating and comparing impfuzzy, is available on GitHub (a public web service for software development projects). Feel free to download it from the following URL for your use:

JPCERTCC/aa-tools GitHub - impfuzzy

https://github.com/JPCERTCC/aa-tools/tree/master/impfuzzy/

Volatility Plugin is also released on the portal, which searches for similar files based on hash values of executable files loaded from memory images by using impfuzzy.

In order to use pyimpfuzzy, the following tools need to be installed.

For this implementation, ssdeep is used as Fuzzy Hashing.

pyimpfuzzy has the following functions:

  • get_impfuzzy: calculates hash values from selected PE files
  • get_impfuzzy_data: calculates hash values from PE files in a data format
  • hash_compare: compares two hash values and returns the degree of similarity within the range of integer 0-100 (100 when same)

The similarity of the two PE files is calculated as follows:

import pyimpfuzzy
import sys

hash1 = pyimpfuzzy.get_impfuzzy(sys.argv[1])
hash2 = pyimpfuzzy.get_impfuzzy(sys.argv[2])
print "ImpFuzzy1: %s" % hash1
print "ImpFuzzy2: %s" % hash2
print "Compare: %i" % pyimpfuzzy.hash_compare(hash1, hash2)

The following result is derived when executing the above script:

Impfuzzy

Evaluation of impfuzzy

This section describes the results of an experiment of comparing, and evaluating, how the 3 different methods (proposed impfuzzy, imphash and ssdeep) are capable in classifying similar types of malware.

For the experiment, 10 different samples for 20 types of malware (200 samples in total) were prepared. For all the combinations that select two samples out of the 200, the similarities of the samples were calculated using the three methods. The two samples were classified as the same if the calculated value was 30 and larger.

For the experiment, packed samples were unpacked beforehand.

Figure 1 shows the results of the experiment.

There were no false positives detected which classified different types of malware as the same.

Figure 1: Success rates of malware type classification among impfuzzy, imphash and ssdeep
Comp_eng

*1 The threshold value for classifying the same malware type is set as 30, for impfuzzy and ssdeep

*2 ssdeep’s target of comparison is the whole executable file

The results reveal that impfuzzy is more capable of classifying the same malware type than the other two methods. For the details of the results, please see Appendix A.

This method judges the samples’ similarities based on Import API, and therefore it is clear that the identification rates are higher for samples whose malware generation tools (builders) are publicly available (e.g. Pony and ZeuS), because their Windows API rarely change. On the other hand, samples whose functions are frequently updated or changed (e.g. Emdivi) have lower rates.

For the experiment, the threshold value for impfuzzy was set as 30, however, it may detect false positives depending on the executable files. We recommend adjusting the threshold value for each executable file.

In case impfuzzy is not applied effectively

Some malware have few Import APIs.

For example, some samples called downloader use only around 5 Windows APIs. For those samples, impfuzzy may not be able to compare the hash values accurately due to the limited size of data to compare. Also, samples generated by .NET Framework have different mechanisms from applications which directly execute Windows API. Therefore, hash values for those samples cannot be calculated.

Summary

In the current situation where huge numbers of malware samples emerge day by day, it is important to efficiently classify malware samples, and identify new ones which need to be analysed. We believe that the method introduced here will be one approach to support this process.

Also, Volatility Plugin that we released together with impfuzzy, is useful for memory forensics to check malware infection.

Since this tool can calculate hash values of samples which are unpacked in the memory, it is capable of judging the similarities of the samples, even if the malware is packed.

This Plugin is available at the following URL. We will introduce this tool on this blog at another occasion.

JPCERTCC/aa-tools GitHub – impfuzzy for Volatility

https://github.com/JPCERTCC/aa-tools/tree/master/impfuzzy/impfuzzy_for_Volatility

Thanks for reading.

- Shusei Tomonaga

(Translated by Yukako Uchida)


Reference

[1] FireEye - Tracking Malware with Import Hashing

https://www.fireeye.com/blog/threat-research/2014/01/tracking-malware-import-hashing.html

[2] Fuzzy Hashing and ssdeep

http://ssdeep.sourceforge.net/

Appendix A

Comparison of impfuzzy, imphash and ssdeep

Table 1: Malware Identification Rate
Typeimpfuzzy (%)imphash (%)ssdeep (%)
Agtid 100 35.6 26.7
BeginX Server 15.6 11.1 11.1
IXESHE 196 44.4 4.4 2.2
IXESHE 2 28.9 6.7 2.2
IXESHE 2sw 100 15.6 17.8
Daserf 35.6 4.4 4.4
Dyre 100 6.7 2.2
Fucobha 44.4 4.4 2.2
Gstatus 60 4.4 2.2
Hikit 64.4 35.6 6.7
Netwire 80 15.6 28.9
NfIpv6 24.4 4.4 4.4
Emdivi t17 37.8 0 0
Emdivi t20 4.4 0 0
plurk 15.6 6.7 11.1
Derusbi 80 8.9 8.9
Pony 95.6 15.6 0
sregister 100 11.1 15.6
Sykipot 8.9 0 2.2
ZeuS 80 35.6 35.6

*Types of malware used in this experiment are those analysed and classified by JPCERT/CC.

Decoding Obfuscated Strings in Adwind

From the latter half of 2015 to 2016, there have been an increasing number of cyber attacks worldwide using Adwind, a Remote Access Tool [1]. JPCERT/CC also received incident reports about emails with this malware in its attachment.

Adwind is malware written in Java language, and it operates in Windows and other OS as well. It has a variety of functions: to download and execute arbitrary files, send infected machine information to C&C servers and extend functions using plug-ins.

One of the characteristics of Adwind is its frequent updates. In an extreme case, an update was released merely in a two-week interval. When investigating Adwind-related incidents, it is important to correctly examine the functions of the Adwind version in use.

The challenge is, however, the strings stored within Adwind, which count up to about 500, are artfully obfuscated, and they need to be decoded in order to analyse the malware’s function. JPCERT/CC created a tool “adwind_string_decoder.py”, which efficiently decodes such obfuscated strings. This blog article describes how this tool works.

Although Adwind has multiple generations [1], this blog article and the tool created will examine the new Adwind versions which have been used in recent attacks since the latter half of 2015.

Obfuscated strings

Most of the strings that Adwind has are obfuscated as in Figure 1. The number of such strings differs depending on the Adwind’s version, but there are about 500. These strings look totally different in each Adwind version.

Figure 1: Decompiled codes containing obfuscated strings
1_obfuscated
Figure 2: Decompiled codes in another Adwind version
2_obfuscated_old

Figure 1 and Figure 2 describe codes which correspond to the same process. Both figures have line feeds inserted and are indented so that the decompiled codes can be read easier. Figure 2 is the Adwind version which was seen in mid-August 2015 and Figure 1 in late November 2015. As previously mentioned, Adwind gets updated frequently, and furthermore, the codes are obfuscated using different keys for each version.

The red-marked sections in Figure 1 and 2 indicate part of the process for collecting information to be sent to C&C servers, and contain a process for obtaining infected usernames. However, when calling the process for collecting information, the object (which is the username) is specified in the obfuscated strings. This makes it impossible to tell from the codes that the malware intends to collect infected usernames, unless they are decoded. Furthermore, it is difficult to decode the strings since the keys used for obfuscation are scattered and different for each Adwind version (as described later).

In incident analysis, damage caused by the infection and the next attack sequence can sometimes be predicted by specifying the information sent to C&C servers. From the analysis perspectives, it is important to closely examine what kind of information the malware is targeting. For this purpose, static code analysis has to be effectively conducted, and about 500 obfuscated strings for various Adwind versions need to be quickly decoded.

Analysing the string-decoding process

Generally in stream cipher, in order to encrypt data m (with a random length), usually a pseudo-random number sequence k (with the same length as m) is created using its encryption key, and an encrypted string is generated by m XOR k. By combining the XOR for m XOR k and the pseudo-random number sequence k (which is described as (m XOR k) XOR k) using XOR operation, the string is decrypted to derive the plaintext m.

Obfuscation in Adwind is conducted in a method which is similar to the abovementioned stream cypher. However, Adwind creates k in a different method, not from an encryption key. In this article, k for Adwind is referred to as an “obfuscating key”.

Codes in Adwind contain some functions which take an obfuscated string as an argument, and returns its decoded string. These functions are referred to as Fi hereafter. One thing to note here is that Fi returns different results even for the same input, if the caller method is different. This means that, in order to do static analysis and obtain the obfuscating key corresponding to a certain string, it is necessary to understand which Fi processes the strings, as well as in which method the string exists to call for Fi. The following describes what kind of obfuscating key Fi generates to process decoding.

Fi takes its caller’s method name and class name as the basis, and generates an obfuscating key by giving them a transform process derived from the following factors. This process is repeated until it gains a certain length required for a key.

Factor 1: Which comes first when concatenating the method name and class name

Factor 2: The value used in the operation for transforming the basis string to a completely different string

All Fi consist mostly of the same codes, however, only those corresponding to the above factors have different codes in each Fi. Furthermore, Factor 2 does not exist in the code as an immediate constant, and is derived through obfuscated codes including bit-operations.

Adwind contains at least 5 varieties of Fi and about 60 methods including obfuscated strings, which means that it has a combination of about 100 obfuscating keys. Although Fi consists of relatively simple codes, this number makes it fairly difficult to remove obfuscation.

Additionally, the two factors mentioned above vary in each Adwind version. Therefore, even if we create a decoding tool for a certain version of Adwind, it cannot be applied to other versions.

On the other hand, Fi has the following characteristics in common:

- Has one argument of a string object

- Is a static function that returns a decoded string as a string object

- Contains a certain API call to obtain the caller’s information

- Has limited varieties of instructions within the function

Based on the features, JPCERT/CC created a tool to automatically decode obfuscated strings using a method which does not rely on the Adwind version as much as possible.

adwind_string_decoder.py

This tool is available on GitHub. Feel free to download for your use.

JPCERTCC/aa-tools - adwind_string_decoder.py

https://github.com/JPCERTCC/aa-tools

In order to use adwind_string_decoder.py, a disassembler, javap, is required which is included in JDK (Java Development Kit). Users are required to set a path to javap, or configure so that the environment variable JAVA_HOME is pointed to the JDK folder.

adwind_string_decoder.py basically processes in the following sequence:

  1. Open the selected jar file and call a disassembler
  2. Scan all the disassembled codes, and extract functions which seem to be decoding functions from the arguments and types
  3. Judge if it really is a decoding process from the kinds of instructions and sequences that appear in the function
  4. If it is a decoding process, derive Factor 1 and 2 to generate obfuscating keys
  5. Scan all the codes again and extract parts which call for the decoding process
  6. Derive each method name and class name, and use them as the basis for obfuscating keys
  7. Generate obfuscating keys and decode the strings

Before using adwind_string_decoder.py – Unpacking Adwind

Typically, Adwind is packed, and its main jar file is hidden in the artifact’s jar file. Since adwind_string_decoder.py does not have the function to unpack Adwind, users are required to run Adwind in an analysis environment beforehand, and extract the jar image that appears in its memory. The jar image tends to disappear easily from the memory, however, it could be easier to extract it if you set a breakpoint in the API which reads the jar file, by using a Java debugger (e.g. jdb).

Executing adwind_string_decoder.py

To decode obfuscated strings, select the unpacked jar file and output file, and execute as follows:

python adwind_string_decoder.py sample.jar output.jasm

Then it outputs disassembled codes which contain decoded strings as comments, as in Figure 3.

Figure 3: Disassembled codes with some decoded strings inserted
3_decoded_disassembly

Also, if you execute without any output files, the output of the disassembled codes will be omitted, and you can output decoded strings only to the standard output, as in Figure 4.

python adwind_string_decoder.py sample.jar
Figure 4: Output of decoded strings only
4_decoded_strings

It is also possible to scan the java codes (outputs of the decompiler), and replace the function call and argument with the decoded string. This option only supports codes in Fully Qualified Name (FQN) format. For example, you can obtain the output in Figure 6 from codes as in Figure 5. Since adwind_string_decoder.py does not have a decompiling function, you need to output a file with the decompiler and store it in a folder beforehand. After selecting that folder and a new folder for outputting the decoded file, execute as follows:

python adwind_string_decoder.py sample.jar source_folder output_folder
Figure 5: Decompiled codes before decoding
5_obfuscated_code
Figure 6: Decompiled codes after decoding
6_decoded_code

Using these decoded strings, it is easy to understand what kind of information the malware intends to collect and send to C&C servers. It is also possible to find out which OS can be infected by the malware, and how the Adwind functions may differ in each OS.

Summary

Since early February 2016, attacks using these new Adwind versions have been less and less seen. This may be good news, however, it seems that there are still new samples found at the end of February 2016. We hope that the tool we introduced here would be of your help in case if you see any new versions of Adwind.

- Kenichi Imamatsu

(Translated by Yukako Uchida)


Reference:

[1] Adwind: FAQ - Securelist

https://securelist.com/blog/research/73660/adwind-faq/

Appendix

SHA-256 Hash value of the sample

  • 033db051fc98b61dab4a290a5d802abe72930338c4a0dd4705c74eacd84578d3
  • f8f99b405c932adb0f8eb147233bfef1cf3547988be4d27efd1d6b05a8817d46

 

Workshop and Training in Congo

Nice to see you!

My name is Jimmy, Hajime Komaba, working at Enterprise Support Group of JPCERT/CC, a department which takes care of Nippon CSIRT Association (NCA, a community of various enterprise and organizational CSIRTs in Japan) and Council of Anti-Phishing Japan (APC).

It’s been quite a while ago, but last November, I was given an opportunity to travel to the Republic of Congo with my colleague, Koichiro (Sparky) Komiyama. Today, I would like to share the experience from my first trip to Africa and about AfricaCERT’s event.

AfricaCERT is a forum of CSIRTs in Africa, with the aim of promoting cybersecurity on the Continent. One of their key activities is the trainings and lectures for their members. Last year, one of the trainings and lectures took place in Pointe Noire, the Republic of Congo, from 28th November to 1st December. JPCERT/CC, a supporting member of AfricaCERT, was invited as a trainer for the programs planned for the first two days. The sessions consisted of a lecture about CSIRTs in Japan, such as activities of NCA, followed by some hands-on trainings. Our sessions attracted about 28 trainees from various African countries including Cote d’Ivoire, Chad, Angola, Guinea, etc. The participants were engaged in cyber security missions as air traffic controllers, telecom company officials, and so on. For the sessions on the later days, CERT-FR from France, also a supporting member of AfricaCERT, was also invited as a trainer.

29th November (Morning session)

I was an assistant for this session, while Sparky was the main trainer. When he started talking about the various CSIRTs in Japan, the participants seemed to be interested in the topic at once. Lately in Africa, “CSIRT” is one of the hot topics, and there are many companies and organizations that are interested in launching one. Serving as the Secretariat for NCA, JPCERT/CC introduced its organization overview and activities. This included the fact that NCA’s members now count up to more than 100 teams (and more to come in 2016), which is almost double from 2014. The participants were surprised at this number. As well as aiming to support organizations who wish to launch a new CSIRT, NCA also has diverse Working Groups in order to provide opportunities for teams to exchange information on CSIRT operation, incident case studies and latest cyber threats.

Photo1final

(Photo of me at the training)

29th (Afternoon session) – 30th November

The next program was a hands-on session, mainly for log management using SSH server. Sparky and I introduced how to collect and sort information from a vast amount of log data using shell commands. The participants were engaged in each task enthusiastically. This picture above was taken while I was teaching how to use those commands and confirm the results.

Photo2

(Photo of Sparky with banner)

At the event, we found this banner (above) with the pictures of key persons who have assisted in spearheading Internet in Africa. This is Sparky pointing the picture of Dr. Suguru Yamaguchi1, a member of JPCERT/CC’s Board of Directors, and one of the initiators and a contributor of AfricaCERT. Of course, having travelled to Africa more than 12 times now, Sparky’s photo was on the banner as well.

What impressed me during the training were the bright and enthusiastic eyes of the participants. I felt that each attendee was actively engaged in each task and trying to make the most of the training. As a trainer, I also enjoyed conducting the training to such participants, and I recall those moments every now and then.

Various different trainings and events by AfricaCERT will keep going. I hope to return to Africa in the near future if any opportunity arises. In order to provide continuous assistance in CSIRT development in Africa, JPCERT/CC will continue such activities by making more visits.

I will never forget those bright eyes of the participants in the Republic of Congo, and will work on my projects here at JPCERT/CC until my next return.

Thank you for reading.

- Hajime Komaba

(Translated by Yukako Uchida)


1We are deeply saddened to announce that Dr. Suguru Yamaguchi, a member of the Board of Directors of JPCERT/CC, passed away on May 9, 2016.

Statement of Condolences

https://www.jpcert.or.jp/english/about/2016/PR20160512R.html

 

May 06, 2016

Some coordinated vulnerability disclosures in April 2016

Hello, Taki here. It has been a long time since I have written here.

Today, I will be writing about some activities within our Vulnerability Coordination Group.

Over the past few years, we have received some coordination requests directly from overseas researchers and other sources, in addition to the reports through the " Information Security Early Warning Partnership".

I would like to introduce some recent cases that we have published on Japan Vulnerability Notes (JVN - https://jvn.jp/) but due to system limitations, are not in English at the moment.

The first case is,

JVNVU#92116866 – Published on 2016/4/26
(https://jvn.jp/vu/JVNVU92116866/index.html)

Keitai Kit for Movable Type vulnerable to OS Command Injection.

An OS command injection vulnerability was reported to us in the Movable Type plugin, Keitai Kit. Leveraging this vulnerability may result in arbitrary OS commands being executed on the server.

Versions affected are 1.35 through 1.641.

Attacks in the wild leveraging this vulnerability have been confirmed.

The vendor has provided an updated version, 1.65 to address the vulnerability. For those that cannot update, emergency patches have been provided for affected versions.

This vulnerability has been assigned, CVE-2016-1204.

The second case is,

JVNVU#90405898 – Published on 2016/4/27
(https://jvn.jp/vu/JVNVU90405898/index.html)

ManageEngine Password Manager Pro fails to restrict access permissions.

An access permission bypass vulnerability was reported to us in ManageEngine Password Manager Pro. Leveraging this vulnerability may result in unauthorized users being able to access password entry records for other users.

Versions affected are 8.3.0 (Build 8303) and 8.4.0 (Build 8400, 8401, 8402).

The vendor has provided an update version, 8.4.0 (Build 8403) to address the vulnerability.

And this vulnerability has been assigned, CVE-2016-1159.

Internally, we are working to upgrade our systems so that we can get this information onto JVN in an advisory format. For the time being, I hope to update through this blog periodically on some of our coordinated disclosures.

If you have a vulnerability report and are having difficulty reaching a vendor or are trying to reach a Japanese vendor and having a language issue, feel free to contact us. While we may not always be successful, we can at least give it our best try to contact and coordinate with a vendor.

For coordination assistance, please contact us at vultures (at) jpcert (dot) or (dot) jp.

- Takayuki (Taki) Uchiyama

Apr 08, 2016

PHP Files in CMS, Targeted for Alteration

JPCERT/CC has been continuously observing cases where websites in Japan created with Content Management Systems (hereafter “CMS”) are defaced in a similar way, and the same kind of cases are also observed overseas [1], [2]. In these cases, part of the PHP files composing the CMS are altered, and this results in defacement of the website contents [3].

Based on the analysis of several cases, this entry today describes the alteration of such files composing CMS.

Altered Files

JPCERT/CC confirmed that the targeted CMS contained partially altered PHP files as components. The CMS names and altered PHP files in each are listed in Table 1.

Table 1: CMS names and altered PHP files
CMS NameAltered PHP Files
WordPress /wp-includes/nav-menu.php
/wp-admin/includes/nav-menu.php
Joomla! /includes/defines.php
/administrator/includes/defines.php
Drupal /includes/bootstrap.inc
MODX /manager/includes/protect.inc.php

Our study revealed that malicious codes were dynamically inserted into the response from the website for each access by the visitor, which was a result of the PHP file alteration.

How Altered PHP Files Insert Malicious Codes

Altered PHP files included malicious PHP codes in between “//istart” and “//iend” as in Figure 1 (we noted that malicious codes are obfuscated in some cases).

These malicious PHP codes have a function to insert codes obtained from outside. They receive malicious codes from a specific URL and insert them in a certain place.

Figure 1: Malicious PHP codes in between “//istart” and “//iend”
Pic1

Malicious Codes to be Inserted

When a visitor accesses such websites, the malicious PHP codes in the altered PHP files obtain malicious codes composed of “div” tag and JavaScript, and insert them in the response to the website visitor. The places to insert the codes differ in each CMS: right after the “body” tag in WordPress, and at the beginning of the HTML in Joomla!, Drupal and MODX.

Figure 2: Example of malicious codes to be inserted
Pic2

Malicious codes include obfuscated JavaScript, and if this is executed on the visitor’s browser, they generate tags such as iframe, etc., which redirect the visitor to the attacker’s site. Also, if the visitor’s web browser or its plug-in, etc., has certain vulnerability, their PC may be exposed to the risk of being infected with malware such as ransomware.

Summary

In such cases where malicious codes are dynamically inserted, it may be difficult for website administrators to realise that their websites are providing malicious contents. We recommend the administrators to confirm that there are no malicious PHP codes (as in Figure 1) in the PHP files described in Table 1 and others in their websites. We have not yet confirmed how these PHP files are altered, but vulnerabilities in the CMS or the plug-ins used by CMS may be leveraged for the alteration. We also recommend updating the CMS and plug-ins to the latest version.

Other than the instances introduced here, JPCERT/CC has been seeing cases where Japanese websites are defaced, and then leveraged as an entrance to the attacker’s website. We hope that each website administrator takes actions as updating the software and properly managing passwords, and be mindful that their website would not be abused for such attacks.

- Ayaka Funakoshi

(Translated by Yukako Uchida)


Reference:

[1] Sucuri Inc
WordPress Malware Causes Psuedo-Darkleech Infection
https://blog.sucuri.net/2015/03/pseudo-darkleech-server-root-infection.html

[2] DAVISEFORD.COM
DarkLeech: Finally Under Control?
http://daviseford.com/node/58

[3] JPCERT/CC
JPCERT/CC Incident Handling Report (October 1 – December 31 2015) (English)
http://www.jpcert.or.jp/english/doc/IR_Report2015Q3_en.pdf