To decrypt it, we need to tweak the code slightly so that the evil script reveals its true nature instead of silently executing the payload. As you can see, the injected code looks strange, but that alone does not tell us whether it is malicious or not:
There is not much to see here, except that the script is clearly obfuscated. For a security expert this kind of code is always highly suspicious, as it reveals that the author wanted to hide something for a good reason. If you indent the code properly, however, it shows something more to the human eye. You can then divide the code into two parts. In the first part there is only a very short function and a definition of a variable:
Note that we cut off the value of the variable as it was just too long and is not needed to understand this algorithm. The second part is another function and a call to the first function mentioned above:
As you can see, it calls function t(), which is only a wrapper around function z(), most probably as a light anti-de-obfuscation technique. Therefore we need to analyze only the second function. It is very easy to spot that it uses a simple substitution cipher, this time only for the letter 'Z'. It also uses character-code encoding, which it decodes with nothing more than the unescape() function. At the end of the script, you can see the eval() function call. This one needs to be replaced with a print() in order to display the de-obfuscated code:
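As a rough illustration of that change (not the sample's own code), the decoding step might look like the sketch below. It assumes that 'Z' stands in for the '%' of a percent-escape, which would fit the use of unescape(); the name decodeLayer and the payload are made up for the example, and console.log() plays the role of print() outside a standalone engine shell.

```javascript
// Hypothetical sketch: undo a single-letter substitution, then decode
// the percent-escapes. Assumes 'Z' replaces '%' in the payload.
function decodeLayer(payload) {
  const escaped = payload.replace(/Z/g, '%'); // reverse the substitution
  return unescape(escaped);                   // decode the %XX character codes
}

// Made-up payload, not taken from the sample.
const sample = 'Z61Z6cZ65Z72Z74Z28Z31Z29';

// Display the decoded code instead of handing it to eval().
console.log(decodeLayer(sample));
```

The only behavioral change is the last line: the decoded string is shown rather than executed.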
Have you noticed the clear URLs at the bottom? There is definitely something there to investigate. Again, we can use indentation tools (or to use their more fashionable name, code beautifiers) to see what is behind the scenes:
We can see the Twitter links, which are clean of course. We could easily come to the wrong conclusion that the script contains only clean URLs, and is therefore not malicious. However, as the bigger part of the script is still cryptic we have to suspect that there is much more behind the scenes.
If we look at that code snippet even more closely, we can see that it uses two of these callbacks. The first one is only to determine the date. It also fires up the second callback function, which then utilizes the eval() function in the middle. Note also the following document.write() function, which we will discuss later.
In order to be able to decipher this layer, we need to get to the stage where that eval() function is called with the proper parameters. For this we either need to completely emulate the browser environment including Twitter APIs, or need to modify the code a little bit to get over those callback tricks. For the sake of simplicity I just commented out the part of the code that is not needed just now, leaving only the eval() function call (replacing that with a print() of course). Let's see what we have:
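An alternative to editing the sample by hand is to shadow eval() with a recorder while the layer runs. The sketch below is only an illustration under stated assumptions: captureEval and the stubs argument are hypothetical names, the layer string is made up, and a script that reaches eval() indirectly would bypass this simple trick.

```javascript
// Hypothetical helper: run one layer with eval() shadowed by a recorder,
// so the next layer is collected as text instead of being executed.
// Everything outside eval() still runs, so use an isolated analysis machine.
function captureEval(layerSource, stubs = {}) {
  const captured = [];
  const fakeEval = (code) => { captured.push(String(code)); };
  const names = Object.keys(stubs);
  // Code built by the Function constructor is non-strict, so a parameter
  // named "eval" is allowed and hides the real eval() inside the layer.
  const run = new Function('eval', ...names, layerSource);
  run(fakeEval, ...names.map((name) => stubs[name]));
  return captured;
}

// Made-up layer standing in for the trimmed sample.
const layer = "var p = 'al' + 'ert(' + n + ')'; eval(p);";
console.log(captureEval(layer, { n: 1 }));
```

The stubs object is where static replacements for missing globals can be supplied, one property per name the layer expects.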
At this stage we are a bit closer to the finish, but the fun part just starts here. This bit uses global variables previously defined in the upper layers, so we need those. Once we copy over the variable set, we realize that the function named 'cz' interferes with the variable of the same name. If we quickly look back, we can see that the eval() function that generates this code snippet is embedded in the callback2() function. This means it runs in a function context, not the global one. That's why it is no problem for the script to override the original definition of 'cz' and redefine it as a function. This trick, however, makes it harder to emulate the code.
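The scoping behavior behind this trick can be reproduced in isolation. The following is a minimal sketch, not the sample's code: only the names cz and callback2 come from the text above, the values are invented, and the demo is wrapped in the Function constructor so that it runs as non-strict code, which is the mode where a declaration inside eval() leaks into the calling function.

```javascript
// Built with the Function constructor so the demo runs as non-strict code;
// strict mode would keep declarations made inside eval() to itself.
const scopeDemo = new Function(`
  var cz = 'data from an upper layer';   // stands in for the global variable
  function callback2() {
    // A function declaration inside a direct eval() lands in the
    // scope of the calling function, here callback2().
    eval('function cz() { return "decoder"; }');
    return typeof cz;                    // the local function wins here
  }
  return { inside: callback2(), outside: typeof cz };
`);

console.log(scopeDemo());
```

Inside callback2() the name refers to the new function, while the outer variable keeps its value, which is why copying everything into one flat scope makes the two definitions collide.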
In addition to this, it plays with the variable names inside the dw() function to fool simple de-obfuscation engines that rely on a context-free grammar only. Furthermore, it keeps reusing the variable $a in a nested eval() chain, just for total confusion. Once we solve this level, we get yet another one, which looks very similar to the previous one; however, there are differences:
Nothing special about this one: it looks like just another layer, except that it also uses an XOR algorithm to strengthen the encryption. Solving this part is very similar to the previous one, so we easily reach the final level, which looks something like this:
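For readers unfamiliar with XOR-based layers, the general idea can be sketched as below. This is an assumption-laden illustration, not the sample's routine: it supposes a single numeric key applied to every character code, and xorDecode, the key and the text are invented for the example.

```javascript
// Hypothetical single-key XOR: applying it twice with the same key
// returns the original text, so one function both encodes and decodes.
function xorDecode(text, key) {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    out += String.fromCharCode(text.charCodeAt(i) ^ key);
  }
  return out;
}

// Made-up data: hide a string, then recover it with the same key.
const hidden = xorDecode('document.write', 5);
console.log(xorDecode(hidden, 5));
```

Because XOR is its own inverse, recovering the key is all an analyst needs to peel off such a layer.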
Now this part could be a bit confusing as there are loads of things happening here. The good news is that we already can see some valuable information: a decrypted domain name. However, we can also see that there is a dynamic URL generation algorithm at the end. To decrypt it fully, we should tweak the code again in order to get it to work.
Basically we have to remove all the browser-specific code snippets, replacing their return values and variable initializations with static data. This way the script can decipher itself in a non-browser environment. Please note that the function returns different URLs depending on the computed values of the variables. Therefore to get the entire list of URLs, we would need to determine and feed all the possible combinations of the values.
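One way to organize this is to collect the static replacements in a single object and generate one object per combination to be tried. The sketch below is hypothetical: makeStaticInputs, allInputs, the dates and the titles are all invented, and the real sample may read different browser values.

```javascript
// Hypothetical static replacements for the live, browser-only inputs.
function makeStaticInputs(isoDate, tweetTitles) {
  const date = new Date(isoDate);        // fixed date instead of "now"
  return {
    day: date.getUTCDate(),
    month: date.getUTCMonth() + 1,
    year: date.getUTCFullYear(),
    titles: tweetTitles.slice(),         // copy, so a layer cannot alter our list
  };
}

// Every date/title pairing is a separate run of the de-obfuscated layer.
function* allInputs(isoDates, titleSets) {
  for (const isoDate of isoDates) {
    for (const titles of titleSets) {
      yield makeStaticInputs(isoDate, titles);
    }
  }
}

// Made-up dates and titles, for illustration only.
const runs = [...allInputs(['2011-03-01', '2011-03-02'],
                           [['trend one'], ['trend two']])];
console.log(runs.length, runs[0]);
```

Each generated object would then be fed to the modified script in place of the values it normally reads from the browser.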
However, for this we would need to fully understand the code. The basic idea is that it uses the date as well as the character code and the length of the Twitter titles to generate the URL. The sample could create two URLs each day, giving 730 in total over a whole year. The algorithm is a true mixture of Caesar and codebook ciphers: it uses the live Twitter data as a codebook and calculates a shift value from it. The resulting encrypted text is then used as the domain name for the URL.
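The combination of a codebook-derived shift and a Caesar cipher can be sketched as follows. The arithmetic in shiftFrom is purely an assumption, chosen only because it uses the three inputs named above (date, character code, title length); the seed string and the title are made up, and a non-empty title and a lowercase 26-letter alphabet are assumed.

```javascript
// Hypothetical shift formula; the sample's exact arithmetic is not shown here.
function shiftFrom(day, title) {
  return (day + title.charCodeAt(0) + title.length) % 26;
}

// Plain Caesar shift over lowercase letters; other characters pass through.
function caesarShift(text, shift) {
  const a = 'a'.charCodeAt(0);
  return text.replace(/[a-z]/g, (ch) =>
    String.fromCharCode(((ch.charCodeAt(0) - a + shift) % 26) + a));
}

// Made-up inputs: the title plays the role of the live "codebook" entry.
const shift = shiftFrom(1, 'example trend');
console.log(shift, caesarShift('seedname', shift));
```

The point of the design is that the shift is not stored anywhere in the script; it only exists once the live data has been fetched.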
Finally, you may notice that there is no eval(), document.write() or any other method in this piece of code to write the data back to the document or otherwise execute the decrypted code. Do you remember that this code is still running inside the callback2() function? There is the document.write($a) right after the eval() which then makes it happen.
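The division of labor between the two calls can be mimicked with a small stand-in. In this sketch only $a, eval() and document.write() come from the description above; callback2Sketch, documentStub, the layer string and the placeholder URL are hypothetical.

```javascript
// Hypothetical recorder standing in for the browser's document object.
const documentStub = {
  written: [],
  write(html) { this.written.push(String(html)); },
};

// Mirrors the structure described above: the eval()-ed layer only fills $a,
// and the document.write() that follows is what injects it into the page.
function callback2Sketch(layerSource, doc) {
  var $a = '';
  eval(layerSource);   // made-up layer: assigns markup to $a, nothing more
  doc.write($a);
}

callback2Sketch(
  "$a = '<iframe src=\"http://placeholder.example/\"></iframe>';",
  documentStub
);
console.log(documentStub.written);
```

Recording the argument of write() in this way gives the analyst the final markup without rendering it.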
After all this we got the results: a hidden iframe pointing to a possibly malicious site:
Wait a minute, why did we just say "possibly"? Can't we tell this for sure? Of course we can, once the URL has been correctly generated. However, as the original algorithm generates these URLs from live Twitter data, the URL list can't be guessed in advance, not even by the author of this threat. We can, however, mine the existing trends and calculate all the URLs that were used in the past. Another possibility is to predict the URLs by trying all possible values of the shift, which gives us a huge list of URLs to block.
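The brute-force option can be sketched in a few lines. This is an illustration under assumptions, not the sample's generator: it supposes a lowercase 26-letter alphabet, and candidateDomains, the seed and the '.example' suffix are invented placeholders.

```javascript
// Caesar shift over lowercase letters; other characters pass through.
function caesarShift(text, shift) {
  const a = 'a'.charCodeAt(0);
  return text.replace(/[a-z]/g, (ch) =>
    String.fromCharCode(((ch.charCodeAt(0) - a + shift) % 26) + a));
}

// Hypothetical enumeration: one candidate domain per possible shift value.
function candidateDomains(seed, suffix) {
  const domains = [];
  for (let shift = 0; shift < 26; shift++) {
    domains.push(caesarShift(seed, shift) + suffix);
  }
  return domains;
}

// Made-up seed and suffix; a real block list would repeat this per seed.
console.log(candidateDomains('seedname', '.example'));
```

The list grows with every seed and date that has to be covered, which is why it becomes large in practice.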
This sample is a clear example of how cyber criminals are trying to use increasingly advanced tricks to fool automated analysis. Using multiple levels of obfuscation and live data from the internet is problematic for traditional static detection algorithms and requires more advanced methods.