
Malicious PDFs: A summary of my VB2010 presentation

Last week I gave a talk at VB2010 that was well received in the room and on the wires. A number of people have requested copies of or links to my presentation and paper (thanks to Helen Martin of Virus Bulletin for permission). Reading a presentation without the commentary is difficult, so I will expand on a few slides here.

In the presentation I give five heuristics for detection and/or more in-depth parsing:

Heuristic 1

For my paper I gathered a corpus of ~130 000 PDF files, of which half were malicious. Scanning the corpus for the tag /JavaScript gave the following results:

For the presentation I gathered a larger corpus (concentrating on files with JavaScript) and that gave:

In the two corpora the definition of malicious differs (the first is overly aggressive and the second less so); however, it appears that:

“While JavaScript is not necessary for maliciousness, it is not necessary for the majority of clean files.”

Heuristic 1: If the PDF contains JavaScript look more closely
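
As a rough illustration of this kind of scan (not the tool used for the corpus figures), the sketch below searches the raw bytes of a file for the /JavaScript name. The function name contains_javascript is hypothetical, and the check deliberately stays naive: it will miss names hidden inside compressed streams or written with hash encoding.

```python
import re

# The /JavaScript name, not followed by further name characters
# (so a hypothetical /JavaScriptFoo would not match).
JAVASCRIPT_TAG = re.compile(rb"/JavaScript(?![A-Za-z0-9])")


def contains_javascript(data: bytes) -> bool:
    """Return True if the raw PDF bytes contain the /JavaScript tag."""
    return JAVASCRIPT_TAG.search(data) is not None
```

A file that returns True here is not necessarily malicious; it is simply a candidate for closer inspection.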

Heuristic 2

PDF files contain indirect objects of the form N R obj (where N is the object number and R is the revision number, which the PDF specification calls the generation number). Each indirect object is closed by an endobj tag. Indirect objects can hold binary data in a stream, and each stream tag is closed by an endstream tag.

Within the first corpus we see:

The second corpus shows:

“Because the PDFs are those reported to SophosLabs, some of them are actually corrupt. Writing a parser to know when the files are maliciously corrupt is non-trivial.”

Heuristic 2: If the objects or streams are mismatched look more closely
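
A minimal sketch of such a mismatch check, assuming a simple keyword count over the raw bytes is good enough as a first pass. The helper names count_keyword and has_mismatched_pairs are hypothetical. Raw stream data can contain these keywords by chance, so a mismatch is only a hint to look more closely, not a verdict.

```python
import re


def count_keyword(data: bytes, keyword: bytes) -> int:
    """Count standalone occurrences of a keyword in the raw bytes.

    The lookbehind stops "obj" from matching inside "endobj" and
    "stream" from matching inside "endstream".
    """
    pattern = rb"(?<![A-Za-z])" + re.escape(keyword) + rb"(?![A-Za-z])"
    return len(re.findall(pattern, data))


def has_mismatched_pairs(data: bytes) -> bool:
    """Return True if obj/endobj or stream/endstream counts differ."""
    objects_differ = count_keyword(data, b"obj") != count_keyword(data, b"endobj")
    streams_differ = count_keyword(data, b"stream") != count_keyword(data, b"endstream")
    return objects_differ or streams_differ
```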

Heuristic 3

There are two main ways of parsing a PDF file:

  • Use the Cross Reference (XRef) Table which points to the position of each object and build the tree
  • Brute force the file. Scan for starting and ending object tags and build the tree

Over the first corpus I attempted to validate the XRef table:

“It appears that readers and parsers must use both methods”

Heuristic 3: If the Cross-Reference (XRef) Table is invalid look more closely
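
One way to sketch an XRef validation in Python is shown below. It is an illustration under several assumptions, not the validator used for the corpus: it only looks at the last classic (plain-text) cross-reference table, it does not follow earlier tables, and it treats cross-reference streams as invalid. The function name xref_table_is_valid is hypothetical.

```python
import re


def xref_table_is_valid(data: bytes) -> bool:
    """Check that the last classic xref table points at real object headers."""
    # startxref is followed by the byte offset of the xref table.
    offsets = re.findall(rb"startxref\s+(\d+)", data)
    if not offsets:
        return False
    pos = int(offsets[-1])
    if data[pos:pos + 4] != b"xref":
        return False  # missing, misplaced, or an xref stream (not handled here)

    # Everything between "xref" and "trailer", split on whitespace:
    # subsection headers (first, count) followed by (offset, generation, flag).
    tokens = data[pos + 4:].split(b"trailer", 1)[0].split()
    if not tokens:
        return False
    try:
        i = 0
        while i < len(tokens):
            first, count = int(tokens[i]), int(tokens[i + 1])
            i += 2
            for number in range(first, first + count):
                offset, generation, flag = tokens[i:i + 3]
                i += 3
                if flag == b"f":
                    continue  # free entry, nothing to check
                if flag != b"n":
                    return False
                # An in-use entry must point at "<number> <generation> obj".
                header = re.compile(rb"%d\s+%d\s+obj" % (number, int(generation)))
                if header.match(data, int(offset)) is None:
                    return False
    except (ValueError, IndexError):
        return False  # truncated or garbled table
    return True
```

When this returns False, a parser would fall back to the brute-force scan described above.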

Heuristic 4

Binary data within streams can be stored using various Filters (Adobe parlance for compression methods). Scanning the first corpus for different types of data shows the following prevalence:

Over the second corpus we see slightly different results:

“LZWDecode is suggestive of older PDFs and DCTDecode is used to store certain graphics”

Heuristic 4: The presence of LZWDecode, ASCII85Decode, DCTDecode and Encrypt Filter is indicative of clean files
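
To show how a Filter count of this kind might be gathered, here is a small Python sketch. The names filter_counts and clean_hints are hypothetical, and the code makes two assumptions: that Filter names appear unobfuscated in the raw bytes, and that "Encrypt Filter" can be approximated by the presence of an /Encrypt key.

```python
import re
from collections import Counter

# Standard Filter names end in "Decode" (FlateDecode, LZWDecode, ...).
FILTER_NAME = re.compile(rb"/([A-Za-z0-9]+Decode)(?![A-Za-z0-9])")
ENCRYPT_KEY = re.compile(rb"/Encrypt(?![A-Za-z0-9])")

# The Filters that Heuristic 4 associates with clean files.
CLEAN_HINTS = {"LZWDecode", "ASCII85Decode", "DCTDecode", "Encrypt"}


def filter_counts(data: bytes) -> Counter:
    """Count how often each Filter name appears in the raw PDF bytes."""
    counts = Counter(name.decode("ascii") for name in FILTER_NAME.findall(data))
    if ENCRYPT_KEY.search(data):
        counts["Encrypt"] += 1
    return counts


def clean_hints(data: bytes) -> set:
    """Return the subset of CLEAN_HINTS that is present in the file."""
    return CLEAN_HINTS & set(filter_counts(data))
```

An empty result from clean_hints says nothing either way; a non-empty one is a weak signal that the file is clean.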

Heuristic 5

The standard allows Font names to contain non-ASCII characters. To do this, the non-ASCII characters are hash encoded, i.e. #61 is the hash followed by the hexadecimal number 61 (ASCII ‘a’). Scanning the corpus for Filters that use hash encoding gives:

“When I rescanned the 257 files with later data, they were all malicious.”

Heuristic 5: Hash (#) encoded tags are indicative of malicious files
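
A minimal sketch of spotting and decoding hash-encoded tags, assuming the tags are visible in the raw bytes. The names decode_name and hash_encoded_names are hypothetical. As with the other scans, binary stream data can trigger false positives.

```python
import re

# A tag: "/" followed by non-delimiter characters, with at least one #XX escape.
HASH_NAME = re.compile(
    rb"/[^\s/<>\[\](){}%]*#[0-9A-Fa-f]{2}[^\s/<>\[\](){}%]*"
)


def decode_name(name: bytes) -> bytes:
    """Replace each #XX escape with the byte it stands for."""
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        name,
    )


def hash_encoded_names(data: bytes) -> list:
    """Return (raw, decoded) pairs for every hash-encoded tag found."""
    return [(name, decode_name(name)) for name in HASH_NAME.findall(data)]
```

For example, /J#61vaScript decodes to /JavaScript, which a plain scan for the /JavaScript tag would have missed.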

Conclusions

Adobe have done a great deal of work to try and fix the problems of malicious PDFs: changed the update frequency of their products; changed the update mechanisms; and joined MAPP. Even so, there are still things they could improve:

Conclusion 1

Heuristic 1 suggests that JavaScript isn’t that common, so lightweight readers, and browser plugins in particular, shouldn’t implement it.

Conclusion 2

If a PDF runs code (via JavaScript or Flash), that code should be signed so you can have some level of trust. This isn’t a fail-safe method but it helps.

Conclusion 3

It would help if readers warned by default when asked to open corrupt files. Browser plugins should even try to.

Conclusion 4

Redesigning PDF has already begun, and PDF/A is actually a good start. History has shown that the problems with Microsoft Office macros went away with newer versions because of a redesign.

Conclusion 5

PDF Reader is being redesigned to have a sandbox, but care must be taken not to allow sloppy code that relies on the sandbox to catch errors.

I finished the presentation by stating:

This house believes that PDF as a file format is no longer fit for purpose and that a new SDF (Safe Document Format) should take its place.

Of the ~200 people in the room ~75% agreed with the statement and ~3% disagreed.

My colleague Mike Wood – who also presented at VB2010 – joined Chet and me in a podcast.

View full post on SophosLabs blog
