Introduction
This article will help you better understand the inner workings and structure of a PDF file, which is useful for PDF malware analysis or any other research work revolving around PDF.
Requirements
Before we get our hands dirty, we need the following tools: a PDF reader and a text editor such as Notepad++, both of which are used in the steps that follow.
Starting with the Corrupted PDF
Now download the sample document 'multipages.pdf' [Reference 2] and open it in the PDF reader. On launching, the reader reports an error instead of displaying the document.
Tracing and Fixing the Error in PDF
Let's start the investigation to see what went wrong with this PDF document. To get an inside view, open the corrupt PDF file in Notepad++. You will see the following contents:
1 0 obj
<< /Pages 2 0 R /Type /Catalog >>
endobj
2 0 obj
<< /Count 2 /Kids [ 3 0 R 5 0 R 7 0 R 9 0 R 11 0 R ] /Type /Pages >>
endobj
3 0 obj
<< /MediaBox [ 0 0 795 842 ] /Parent 2 0 R /Contents 4 0 R /Resources << /Font << /F1 << /Name /F1 /BaseFont /Helvetica /Subtype /Type1 /Type /Font >> >> >> /Type /Page >>
endobj
4 0 obj
<< /Length 55 >>stream
BT /F1 18 Tf 186 690 Td 20 TL (www.secsavvy.com) Tj ET
endstream
endobj
5 0 obj
<< /MediaBox [ 0 0 795 842 ] /Parent 2 0 R /Contents 6 0 R /Resources << /Font << /F1 << /Name /F1 /BaseFont /Helvetica /Subtype /Type1 /Type /Font >> >> >> /Type /Page >>
endobj
6 0 obj
<< /Length 45 >>stream
BT /F1 15 Tf 186 690 Td 20 TL (Page 1) Tj ET
endstream
endobj
7 0 obj
<< /MediaBox [ 0 0 795 842 ] /Parent 2 0 R /Contents 8 0 R /Resources << /Font << /F1 << /Name /F1 /BaseFont /Helvetica /Subtype /Type1 /Type /Font >> >> >> /Type /Page >>
endobj
8 0 obj
<< /Length 45 >>stream
BT /F1 15 Tf 186 690 Td 20 TL (Page 2) Tj ET
endstream
endobj
9 0 obj
<< /MediaBox [ 0 0 795 842 ] /Parent 2 0 R /Contents 10 0 R /Resources << /Font << /F1 << /Name /F1 /BaseFont /Helvetica /Subtype /Type1 /Type /Font >> >> >> /Type /Page >>
endobj
10 0 obj
<< /Length 45 >>stream
BT /F1 15 Tf 186 690 Td 20 TL (Page 3) Tj ET
endstream
endobj
11 0 obj
<< /MediaBox [ 0 0 795 842 ] /Parent 2 0 R /Content 12 0 R /Resources << /Font << /F1 << /Name /F1 /BaseFont /Helvetica /Subtype /Type1 /Type /Font >> >> >> /Type /Page >>
endobj
12 0 obj
<< /Length 47 >>stream
BT /F1 15 Tf 186 690 Td 20 TL (Password) Tj ET
endstream
endobj
xref
0 13
0000000000 65535 f
0000000010 00000 n
0000000067 00000 n
0000000161 00000 n
0000000398 00000 n
0000000510 00000 n
0000000747 00000 n
0000000849 00000 n
0000001086 00000 n
0000001188 00000 n
0000001426 00000 n
0000001529 00000 n
0000001768 00000 n
trailer
<< /Root 1 0 R /Size 13 >>
startxref
1873
%%EOF
A PDF file consists of 4 elements: the header, the body (the numbered objects), the cross-reference (xref) table, and the trailer.
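As a rough illustration of these four elements, the following Python sketch scans raw bytes for the marker of each one. The function name `find_pdf_elements` and the shortened sample bytes are made up for this example, and the check only covers the classic layout with an `xref` keyword (files that use cross-reference streams would need a different test).

```python
import re

def find_pdf_elements(data: bytes) -> dict:
    """Report which of the four structural elements appear in raw PDF bytes.

    This is a plain keyword scan, not a parser: it can be fooled by the
    same keywords appearing inside stream data.
    """
    return {
        "header": data.startswith(b"%PDF-"),
        "body": re.search(rb"\d+\s+\d+\s+obj\b", data) is not None,
        # The lookbehind avoids counting the "xref" inside "startxref".
        "xref": re.search(rb"(?<!start)xref\b", data) is not None,
        "trailer": b"trailer" in data,
    }

# Shortened, hypothetical stand-in for the corrupt file: no header line.
sample = (
    b"1 0 obj\n<< /Pages 2 0 R /Type /Catalog >>\nendobj\n"
    b"xref\n0 2\n0000000000 65535 f\n0000000010 00000 n\n"
    b"trailer\n<< /Root 1 0 R /Size 2 >>\nstartxref\n1873\n%%EOF\n"
)
print(find_pdf_elements(sample))
```

For the sample bytes, only the `header` entry comes back false, which matches what can be seen by eye in the dump.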
If you look closely, however, there is no header, so we will add a PDF header and try to open the file again.
%PDF-1.7
Let's add this missing header line at the beginning of the file. Now the file opens without a problem.
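The same edit can be scripted instead of being typed in Notepad++. This is a minimal sketch, assuming the whole file fits in memory; `add_missing_header` is a hypothetical helper name, and the file names in the commented usage are placeholders.

```python
def add_missing_header(data: bytes, version: str = "1.7") -> bytes:
    """Prepend a %PDF-x.y header line if the data does not start with one."""
    if data.startswith(b"%PDF-"):
        return data  # header already present, leave the bytes untouched
    return b"%PDF-" + version.encode("ascii") + b"\n" + data

# Hypothetical usage (paths are placeholders, not files from the article):
# from pathlib import Path
# raw = Path("multipages.pdf").read_bytes()
# Path("multipages_with_header.pdf").write_bytes(add_missing_header(raw))
```

Keep in mind that prepending bytes moves every object further into the file, so the byte offsets stored in the xref table may no longer match exactly. Many readers tolerate or repair this, but it is worth knowing when a stricter tool complains.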
Well, that's good, but not everything is right: the reader shows a total of only 2 pages. Let's investigate further by looking at the page-linking structure of this PDF file. The Catalog (object 1) points to the Pages object (object 2), whose /Kids array references five page objects (3, 5, 7, 9 and 11), while its /Count entry says 2.
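The mismatch between /Count and /Kids can also be checked with a few lines of Python. This is only a sketch: `count_vs_kids` is a made-up name, and it assumes a flat page tree like the one in this sample, where every entry in /Kids is a page and not another Pages node.

```python
import re

# Matches an indirect reference such as "11 0 R".
REF = re.compile(rb"(\d+)\s+(\d+)\s+R")

def count_vs_kids(pages_obj: bytes) -> tuple[int, int]:
    """Return (declared /Count, number of references listed in /Kids)."""
    count = int(re.search(rb"/Count\s+(\d+)", pages_obj).group(1))
    kids = re.search(rb"/Kids\s*\[(.*?)\]", pages_obj, re.S).group(1)
    return count, len(REF.findall(kids))

# The Pages object (object 2) from the corrupt sample.
pages = b"<< /Count 2 /Kids [ 3 0 R 5 0 R 7 0 R 9 0 R 11 0 R ] /Type /Pages >>"
declared, actual = count_vs_kids(pages)
print(declared, actual)  # prints: 2 5
```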
So this PDF actually has a total of 5 pages. Edit the Count from 2 to 5, so that the start of the file reads as follows, and open the PDF again.
%PDF-1.7
1 0 obj
<< /Pages 2 0 R /Type /Catalog >>
endobj
2 0 obj
<< /Count 5 /Kids [ 3 0 R 5 0 R 7 0 R 9 0 R 11 0 R ] /Type /Pages >>
endobj
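For completeness, here is how the Count edit might be scripted. The helper `set_page_count` is hypothetical and deliberately naive: it rewrites only the first /Count entry it finds, which is enough for a file with a single Pages object like this one.

```python
import re

def set_page_count(pdf: bytes, new_count: int) -> bytes:
    """Rewrite the first /Count value found in the raw PDF bytes."""
    return re.sub(
        rb"(/Count\s+)\d+",
        lambda m: m.group(1) + str(new_count).encode("ascii"),
        pdf,
        count=1,  # assumption: the first /Count belongs to the root Pages object
    )

before = b"2 0 obj\n<< /Count 2 /Kids [ 3 0 R 5 0 R 7 0 R 9 0 R 11 0 R ] /Type /Pages >>\nendobj\n"
after = set_page_count(before, 5)
print(after.decode("ascii"))
```

Because 2 and 5 are both a single digit, this particular edit does not change the file length, so the byte offsets of all later objects stay where they were.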
Now we can see all 5 pages, but the last page is blank, so we investigate further. The last page is referenced by the indirect object reference 11 0 R, and object 11 looks like this:
11 0 obj
<< /MediaBox [ 0 0 795 842 ] /Parent 2 0 R /Content 12 0 R /Resources << /Font << /F1 << /Name /F1 /BaseFont /Helvetica /Subtype /Type1 /Type /Font >> >> >> /Type /Page >>
endobj
In PDF, the 'Contents' key of a page object points to the content stream that describes what is drawn on the page. If this entry is absent, the page is empty. Here, in object 11, the key that references object 12 is written as 'Content' (note the missing 's' at the end). The PDF reader does not recognize the name Content, so it ignores the entry without reporting any error. To fix this, simply replace Content with Contents and open the PDF. Now you will be able to see all five pages. You can download the fixed PDF 'MultiplePages_Fixed' [Reference 2] and test it for yourself.
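A typo like this is easy to miss by eye, so a small script can help hunt for it. The sketch below is an assumption-laden illustration, not a general PDF parser: the names `find_misspelled_contents` and `fix_contents_key` are invented here, and the regular expressions assume uncompressed objects laid out as in this sample.

```python
import re

def find_misspelled_contents(pdf: bytes) -> list[int]:
    """Return object numbers of page objects that use /Content, not /Contents."""
    bad = []
    for m in re.finditer(rb"(\d+)\s+\d+\s+obj(.*?)endobj", pdf, re.S):
        body = m.group(2)
        # (?![A-Za-z]) keeps /Page from matching /Pages, and /Content from /Contents.
        is_page = re.search(rb"/Type\s*/Page(?![A-Za-z])", body)
        if is_page and re.search(rb"/Content(?![A-Za-z])", body):
            bad.append(int(m.group(1)))
    return bad

def fix_contents_key(pdf: bytes) -> bytes:
    """Replace every bare /Content key with /Contents."""
    return re.sub(rb"/Content(?![A-Za-z])", b"/Contents", pdf)

# Trimmed-down stand-ins for objects 3, 11 and 2 of the sample.
sample = (
    b"3 0 obj\n<< /Parent 2 0 R /Contents 4 0 R /Type /Page >>\nendobj\n"
    b"11 0 obj\n<< /Parent 2 0 R /Content 12 0 R /Type /Page >>\nendobj\n"
    b"2 0 obj\n<< /Count 5 /Type /Pages >>\nendobj\n"
)
print(find_misspelled_contents(sample))  # prints: [11]
```

Unlike the Count edit, this fix adds one byte to the file, so everything after object 11 shifts by one position relative to the offsets stored in the xref table.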
Video Demonstration
Here is the video demonstration of this entire analysis and fixing process.
http://vimeo.com/18075125
Reference
Conclusion
I hope you enjoyed this article and learned more about the working flow of a PDF document. If you are interested in reading more about PDF, I recommend visiting the excellent blog of Didier Stevens [Reference 3].
Wednesday, 29 February 2012
Investigating Corrupt/Malicious PDF Document - Author: Ayush Anand