Lab 1.3: Metadata Treasure Hunt

Objectives

  • To use ExifTool to analyze .xls, .doc, and .pdf files for information that will be useful in a penetration test
  • To gather recon information about usernames, email addresses, file system paths, and other sensitive data associated with a target organization

Lab Setup

The files you will examine in this lab are located in /home/sec560/coursefiles/metadata/. The files are:

  • WidgetStatisticalWhitepaper.doc
  • WidgetStatisticalAnalysis.xls
  • WidgetStatisticalWhitepaper.pdf

Please use the copies in your Linux VM.

The goal of this lab is to run exiftool and strings on each of these files, trying to answer the specific questions posed below.

A copy of each of these files is also included on the course USB drive in the coursefiles\metadata directory. You can open them in Windows and look at them if you’d like, but the lab should be performed in Linux, which has exiftool and strings installed.

ExifTool can be invoked on the VMware Linux image to analyze a file by running:

$ exiftool filename

To run strings against a file, you could simply use:

$ strings filename

Try this for each of the files, and record the data you discover that answers the questions below.

Also, remember that you can peek ahead at the answers and the approach used to determine them.

 

Questions:

  • What is the full name of user Bob? What is Bob’s nickname?

  • What is Bob’s email address?

  • What Personally Identifiable Information is located in the spreadsheet (.xls) file?

  • What information is associated with the organization’s firewall ruleset? Hint: The command below shows lines of output with the word "firewall" in a case-insensitive fashion.

$ strings filename | grep -i firewall

  • If you have some extra time, also look through the files to find all file system paths and URLs.

Hint 1: You should consider looking for forward slashes by piping your output through grep to search for a / character using the command below.

$ strings filename | grep /

Hint 2: To find lines containing a backslash, you could pipe your data through grep '\\'. Inside single quotes, the shell passes both backslashes to grep unchanged, and grep’s regular expression syntax then treats \\ as one literal backslash.

$ strings filename | grep '\\'
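
The quoting in Hint 2 can be tried without any lab files. The following is only a sketch: the three sample lines are made up for illustration, and the -F (fixed-string) variant is an alternative that this lab does not otherwise use.

```shell
# Hypothetical sample lines standing in for strings output.
sample='C:\Users\Bob\report.doc
http://example.tgt/index.html
plain text line'

# Regex form: single quotes pass \\ to grep unchanged, and grep's
# regex syntax reads \\ as one literal backslash.
regex_hits=$(printf '%s\n' "$sample" | grep '\\')

# Fixed-string form: -F turns off regex parsing, so one \ is enough.
fixed_hits=$(printf '%s\n' "$sample" | grep -F '\')

printf 'regex: %s\n' "$regex_hits"
printf 'fixed: %s\n' "$fixed_hits"
```

Both forms should select the same single line, the Windows-style path.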

Also, remember that you can peek ahead to the answers.

 

Walkthrough - Step-by-Step Instructions and Answers

Bob’s Full Name, Nickname, and Email Address

Bob created the .doc and .xls files in Microsoft Word and Microsoft Excel, respectively, so we can analyze the metadata of either file to determine Bob’s full name and nickname. Microsoft Office inserts usernames and author information in specific fields of the files it generates, so we can look for this structured metadata with ExifTool. You can run ExifTool against either the .doc or the .xls file.

sec560@slingshot:~/coursefiles/metadata$ exiftool WidgetStatisticalWhitepaper.doc
ExifTool Version Number         : 10.10
File Name                       : WidgetStatisticalWhitepaper.doc
Directory                       : .
File Size                       : 35 kB
File Modification Date/Time     : 2018:08:29 18:06:06+00:00
File Access Date/Time           : 2019:06:29 17:34:33+00:00
File Inode Change Date/Time     : 2019:06:29 17:28:35+00:00
File Permissions                : rwxr-xr-x
File Type                       : DOC
File Type Extension             : doc
MIME Type                       : application/msword
Title                           : Statistical Analysis Whitepaper
Subject                         :
Author                          : Bob the Awesome
Keywords                        :
Template                        : Normal
Last Modified By                : Bob Boberson
Revision Number                 : 23
Software                        : Microsoft Word 9.0
Total Edit Time                 : 22.0 minutes
Last Printed                    : 2009:12:30 16:22:00
Create Date                     : 2009:12:30 15:30:00
Modify Date                     : 2009:12:30 16:23:00
Pages                           : 1
Words                           : 219
Characters                      : 1253
Security                        : None
Company                         : 560 Global Conglomerate
Lines                           : 10
Paragraphs                      : 2
Char Count With Spaces          : 1538
App Version                     : 9.8968
Scale Crop                      : No
Links Up To Date                : No
Shared Doc                      : No
Hyperlinks Changed              : No
Title Of Parts                  : Statistical Analysis Whitepaper
Heading Pairs                   : Title, 1
Code Page                       : Windows Latin 1 (Western European)
Hyperlinks                      : \\webserver\wwwroot\images\560gc_logo.jpg, ..\My Pictures\chart.png
E-Mail                          : bob.boberson@560gc.tgt
Comp Obj User Type Len          : 24
Comp Obj User Type              : Microsoft Word Document

The interesting data from the above command is:

Author                          : Bob the Awesome
Last Modified By                : Bob Boberson
Hyperlinks                      : \\webserver\wwwroot\images\560gc_logo.jpg, ..\My Pictures\chart.png

The Author and Last Modified By fields give us two forms of Bob’s name, and the Hyperlinks field exposes file paths.

Next, examine WidgetStatisticalAnalysis.xls using ExifTool.

sec560@slingshot:~/coursefiles/metadata$ exiftool WidgetStatisticalAnalysis.xls
ExifTool Version Number         : 10.10
File Name                       : WidgetStatisticalAnalysis.xls
Directory                       : .
File Size                       : 32 kB
File Modification Date/Time     : 2018:08:29 18:06:06+00:00
File Access Date/Time           : 2019:06:29 17:34:33+00:00
File Inode Change Date/Time     : 2019:06:29 17:28:35+00:00
File Permissions                : rwxr-xr-x
File Type                       : XLS
File Type Extension             : xls
MIME Type                       : application/vnd.ms-excel
Title                           : Intense Statistical Analysis of Color Preferences
in 560 Global Conglomerate Customers
Author                          : Bob the Awesome
Last Modified By                : Bob Boberson
Software                        : Microsoft Excel
Create Date                     : 2009:12:30 14:37:51
Modify Date                     : 2009:12:30 15:55:14
Security                        : None
Company                         : 560 Global Conglomerate
App Version                     : 9.8968
Scale Crop                      : No
Links Up To Date                : No
Shared Doc                      : No
Hyperlinks Changed              : No
Title Of Parts                  : Trends
Heading Pairs                   : Worksheets, 1
Code Page                       : Windows Latin 1 (Western European)
E-Mail                          : bob.boberson@560gc.tgt
Comp Obj User Type Len          : 26
Comp Obj User Type              : Microsoft Excel Worksheet

The interesting data from the above command is:

Author                          : Bob the Awesome
Last Modified By                : Bob Boberson
E-Mail                          : bob.boberson@560gc.tgt

Bob’s full name is Bob Boberson (from the Last Modified By field).

Bob’s nickname appears to be Bob the Awesome as indicated in the Author field.

Bob’s email address appears to be bob.boberson@560gc.tgt as indicated in the E-Mail field.
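
When the full ExifTool listing is long, the identity-related fields can be filtered out of it with grep. The sketch below is one possible approach, not part of the original lab: it uses a few lines copied from the output above as stand-in input, and the variable names are illustrative. In the lab you would pipe the real exiftool output into the same grep and awk commands.

```shell
# A few lines copied from the exiftool output above, used as stand-in input.
exiftool_output='Title                           : Statistical Analysis Whitepaper
Author                          : Bob the Awesome
Template                        : Normal
Last Modified By                : Bob Boberson
Company                         : 560 Global Conglomerate
E-Mail                          : bob.boberson@560gc.tgt'

# Keep only the fields that identify a person or organization.
identity=$(printf '%s\n' "$exiftool_output" |
  grep -E '^(Author|Last Modified By|Company|E-Mail) +:')

# Pull out just the value of the E-Mail field.
email=$(printf '%s\n' "$exiftool_output" | awk -F' : ' '/^E-Mail/ {print $2}')

printf '%s\n' "$identity"
printf 'email only: %s\n' "$email"
```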

Personally Identifiable Information (PII)

To find PII in the .xls file, we can look for runs of consecutive printable characters. However, many files are littered with short, meaningless strings, so we’ll focus our search on longer strings of eight or more characters. Using the strings command with the -n 8 option, we find some interesting strings in the .xls file, as shown below (output truncated for brevity).

sec560@slingshot:~/coursefiles/metadata$ strings -n 8 WidgetStatisticalAnalysis.xls
Daniel Pendelino
ThisWorkbook
"$"#,##0_);\("$"#,##0\)
"$"#,##0_);[Red]\("$"#,##0\)
"$"#,##0.00_);\("$"#,##0.00\)
"$"#,##0.00_);[Red]\("$"#,##0.00\)
_("$"* #,##0_);_("$"* \(#,##0\);_("$"* "-"_);_(@_)
_(* #,##0_);_(* \(#,##0\);_(* "-"_);_(@_)
_("$"* #,##0.00_);_("$"* \(#,##0.00\);_("$"* "-"??_);_(@_)
_(* #,##0.00_);_(* \(#,##0.00\);_(* "-"??_);_(@_)
 $.' ",#
(7),01444
'9=82<.342
!22222222222222222222222222222222222222222222222222
%&'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz
&'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz
.xV{y!rv
Customer Color Preferences
Number of Customers
Customer #
Color Preference
Mrs. Boberson
111-11-1111
Sally Southers
222-22-2222

In the output, you’ll see strings with the full names of various people (Mrs. Boberson, Sally Southers, and more) along with data that appears to be Social Security numbers or some government-related identification numbers. This is likely PII that has leaked out of the target organization.
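
Scanning the output by eye works for a small file, but a pattern match scales better. The sketch below assumes the identifiers follow the NNN-NN-NNNN layout seen above; the sample input is copied from the strings output, and the variable names are illustrative. In the lab, the same grep could be fed by strings -n 8 on the .xls file.

```shell
# Lines copied from the strings output above, used as stand-in input.
strings_output='Customer Color Preferences
Number of Customers
Mrs. Boberson
111-11-1111
Sally Southers
222-22-2222'

# Match whole lines shaped like NNN-NN-NNNN (an assumed identifier format).
ssn_pattern='^[0-9]{3}-[0-9]{2}-[0-9]{4}$'
ssn_like=$(printf '%s\n' "$strings_output" | grep -E "$ssn_pattern")
ssn_count=$(printf '%s\n' "$strings_output" | grep -cE "$ssn_pattern")

printf '%s candidate identifiers:\n%s\n' "$ssn_count" "$ssn_like"
```

A match on the pattern only flags a candidate; whether a value is a real identification number still needs human review.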

Firewall Information

Next, we’ll look for information about the firewall of the target organization by running strings and grepping its output for the string "firewall" in a case-insensitive fashion.

sec560@slingshot:~/coursefiles/metadata$ strings WidgetStatisticalAnalysis.xls | grep -i firewall

There’s no output, which implies that there are no such ASCII strings in this document.

sec560@slingshot:~/coursefiles/metadata$ strings WidgetStatisticalWhitepaper.pdf | grep -i firewall

Again, we see no output.

sec560@slingshot:~/coursefiles/metadata$ strings WidgetStatisticalWhitepaper.doc | grep -i firewall
Note to self. Sandra asked to open port 8000 on the Windows Web server Firewall for
something called IceCast. Do this before lunch. Widget Color Analysis White Pbelow

Here we see output that mentions opening up port 8000 on the Windows Web Server Firewall for Icecast, which is a streaming audio service. Bob apparently made this comment to remind himself to take this action before lunch.
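
The same approach extends to more than one keyword. The sketch below is illustrative only: the keyword list is a hypothetical starting point rather than something this lab prescribes, and the input is a few lines taken from the outputs above. The -n option adds line numbers so a hit can be located in the full output.

```shell
# Stand-in for strings output from the .doc file (lines taken from above).
doc_strings='Statistical Analysis Whitepaper
Note to self. Sandra asked to open port 8000 on the Windows Web server Firewall for
something called IceCast. Do this before lunch.
Customer Color Preferences'

# Hypothetical keyword list; adjust it to the engagement.
keywords='firewall|password|vpn|port [0-9]+'

# -i ignores case, -n prefixes line numbers, -E enables the | alternation.
hits=$(printf '%s\n' "$doc_strings" | grep -inE "$keywords")

printf '%s\n' "$hits"
```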

Path and URL Information

If you have extra time, you can look for additional information—specifically, URLs and file system paths—in the files. These might be useful to a penetration tester who is looking to target specific valuable information assets in a target organization.

File system paths may be structured or unstructured metadata, so we’ll look for them using both ExifTool and strings.

We’ll start with ExifTool. To make our analysis more efficient, we’ll rely on a feature of ExifTool that lets us specify multiple files on the command line, one after another, and the tool will retrieve metadata from all files we specify.

First, let’s run ExifTool to look through each of our three files, grepping our output to find slashes (/):

sec560@slingshot:~/coursefiles/metadata$ exiftool WidgetStatisticalWhitepaper.doc
WidgetStatisticalAnalysis.xls WidgetStatisticalWhitepaper.pdf | grep /

Next, let’s look for backslashes. (The pattern '\\' makes grep match a single literal backslash.)

sec560@slingshot:~/coursefiles/metadata$ exiftool WidgetStatisticalWhitepaper.doc
WidgetStatisticalAnalysis.xls WidgetStatisticalWhitepaper.pdf | grep '\\'

Here we see file system paths of \\webserver\wwwroot\images\560gc_logo.jpg and ..\My Pictures\chart.png.

sec560@slingshot:~/coursefiles/metadata$ exiftool WidgetStatisticalWhitepaper.doc
WidgetStatisticalAnalysis.xls WidgetStatisticalWhitepaper.pdf | grep /
File Modification Date/Time     : 2018:08:29 18:06:06+00:00
File Access Date/Time           : 2019:06:29 18:25:30+00:00
File Inode Change Date/Time     : 2019:06:29 18:17:04+00:00
MIME Type                       : application/msword
File Modification Date/Time     : 2019:06:29 18:03:35+00:00
File Access Date/Time           : 2019:06:29 18:17:10+00:00
File Inode Change Date/Time     : 2019:06:29 18:17:04+00:00
MIME Type                       : application/vnd.ms-excel
File Modification Date/Time     : 2018:08:29 18:06:06+00:00
File Access Date/Time           : 2019:06:29 18:27:29+00:00
File Inode Change Date/Time     : 2019:06:29 18:17:04+00:00
MIME Type                       : application/pdf
Producer                        : \376\377\000B\000u\000l\000l\000z\000i\000p\000
 \000P\000D\000F\000 \000P\000r\000i\000n\000t\000e\000r\000 \000/\000 \000w\000w
 \000w\000.\000b\000u\000l\000l\000z\000i\000p\000.\000c\000o\000m\000 \000/\000
 \000F\000r\000e\000e\000w\000a\000r\000e\000 \000E\000d\000i\000t\000i\000o\000n
Format                          : application/pdf
sec560@slingshot:~/coursefiles/metadata$ exiftool WidgetStatisticalWhitepaper.doc
WidgetStatisticalAnalysis.xls WidgetStatisticalWhitepaper.pdf | grep '\\'
Hyperlinks                      : \\webserver\wwwroot\images\560gc_logo.jpg,
 ..\My Pictures\chart.png
Producer                        : \376\377\000B\000u\000l\000l\000z\000i\000p\000
 \000P\000D\000F\000 \000P\000r\000i\000n\000t\000e\000r\000 \000/\000 \000w\000w
 \000w\000.\000b\000u\000l\000l\000z\000i\000p\000.\000c\000o\000m\000 \000/\000
 \000F\000r\000e\000e\000w\000a\000r\000e\000 \000E\000d\000i\000t\000i\000o\000n
Title                           : \376\377\000W\000i\000d\000g\000e\000t\000 \000S
\000t\000a\000t\000i\000s\000t\000i\000c\000a\000l\000 \000W\000h\000i\000t\000e
\000p\000a\000p\000e\000r
Creator                         : \376\377\000B\000o\000b\000 \000B\000o\000b\000e
\000r\000s\000o\000n

The grep / command didn’t reveal any interesting information; it matched only the date/time labels, the MIME types, and the PDF Producer field. However, the grep '\\' command did reveal file paths.

Hyperlinks                      : \\webserver\wwwroot\images\560gc_logo.jpg, ..\My Pictures\chart.png

Next, we’ll look for ASCII strings in our files using the strings command, also taking advantage of the fact that strings supports multiple files on the command line. We’ll start by searching for strings of at least eight characters (-n 8), looking through the output for the / character:

sec560@slingshot:~/coursefiles/metadata$ strings -n 8 WidgetStatisticalWhitepaper.doc
WidgetStatisticalAnalysis.xls WidgetStatisticalWhitepaper.pdf | grep /

Here, we see a lot of strings in the output, including several URLs: http://www.w3.org/1999/02/22-rdf-syntax-ns#, http://purl.org/dc/elements/1.1/, and http://ns.adobe.com/xap/1.0/mm/. These are standard XML namespace URLs embedded in the PDF file’s metadata, and they point to items outside of our target scope.
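
Because the same namespace URLs repeat many times, it can help to reduce the output to a de-duplicated list. The sketch below is one way to do that, under assumptions: the input is a simplified stand-in based on the PDF metadata shown in this lab, and the character class that ends a URL is a rough heuristic.

```shell
# Simplified stand-in for strings output from the PDF metadata.
sample="<rdf:Description xmlns:pdf='http://ns.adobe.com/pdf/1.3/' />
<rdf:Description xmlns:dc='http://purl.org/dc/elements/1.1/'>
<rdf:Description xmlns:dc='http://purl.org/dc/elements/1.1/'>"

# -o prints only the matched text; a URL is assumed to end at a quote,
# space, or angle bracket. sort -u removes the duplicates.
urls=$(printf '%s\n' "$sample" | grep -oE "https?://[^'\" <>]+" | sort -u)

printf '%s\n' "$urls"
```

A short unique list makes it easier to separate well-known namespace URLs from anything that points into the target organization.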

sec560@slingshot:~/coursefiles/metadata$ strings -n 8 WidgetStatisticalWhitepaper.doc
WidgetStatisticalAnalysis.xls WidgetStatisticalWhitepaper.pdf | grep '\\' 
"$"#,##0_);\("$"#,##0\)
"$"#,##0_);[Red]\("$"#,##0\)
"$"#,##0.00_);\("$"#,##0.00\)
"$"#,##0.00_);[Red]\("$"#,##0.00\)
_("$"* #,##0_);_("$"* \(#,##0\);_("$"* "-"_);_(@_)
_(* #,##0_);_(* \(#,##0\);_(* "-"_);_(@_)
_("$"* #,##0.00_);_("$"* \(#,##0.00\);_("$"* "-"??_);_(@_)
_(* #,##0.00_);_(* \(#,##0.00\);_(* "-"??_);_(@_)
#C:\WIND
OWS\syst
em32\STD
Files\Mi
.G8Z!sU=/\
<rdf:Description rdf:about='27fc05ce-f7bb-11de-0000-4235e672b786' xmlns:pdf='htt
p://ns.adobe.com/pdf/1.3/' pdf:Producer='\\376\377\000B\000u\000l\000l\000z\000i\
000p\000 \000P\000D\000F\000 \000P\000r\000i\000n\000t\000e\000r\000 \000/\000 \
000w\000w\000w\000.\000b\000u\000l\000l\000z\000i\000p\000.\000c\000o\000m\000 \
000/\000 \000F\000r\000e\000e\000w\000a\000r\000e\000 \000E\000d\000i\000t\000i\
000o\000n' />
<rdf:Description rdf:about='27fc05cd-f7bb-11de-0000-4235e672b786' xmlns:dc='http
://purl.org/dc/elements/1.1/' dc:format='application/pdf'><dc:title><rdf:Alt><rd
f:li xml:lang='x-default'>\376\377\000W\000i\000d\000g\000e\000t\000 \000S\000t\
000a\000t\000i\000s\000t\000i\000c\000a\000l\000 \000W\000h\000i\000t\000e\000p\
000a\000p\000e\000r</rdf:li></rdf:Alt></dc:title><dc:creator><rdf:Seq><rdf:li>\3
76\377\000B\000o\000b\000 \000B\000o\000b\000e\000r\000s\000o\000n</rdf:li></rdf
:Seq></dc:creator></rdf:Description>
#

So our analysis looking for standard ASCII strings didn’t prove too useful. Let’s look for big-endian and little-endian Unicode strings to see if we get any more useful information that way.
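
To see why the default scan misses this text, it helps to look at how a 16-bit encoding lays out ASCII characters. The sketch below is illustrative only: it hand-encodes the made-up path C:\Temp as little-endian 16-bit characters, and stripping the NUL bytes is a shortcut for demonstration, not a replacement for the -e option.

```shell
# "C:\Temp" written by hand as little-endian 16-bit characters:
# every ASCII byte is followed by a NUL byte. (In big-endian order,
# the NUL byte would come first instead.)
utf16le_bytes() { printf 'C\000:\000\\\000T\000e\000m\000p\000'; }

# Seven characters take 14 bytes, and no two printable bytes are
# adjacent, so a single-byte scan never sees a run long enough to report.
byte_count=$(utf16le_bytes | wc -c | tr -d ' ')

# Deleting the NUL bytes recovers the ASCII text.
recovered=$(utf16le_bytes | tr -d '\000')

printf '%s bytes -> %s\n' "$byte_count" "$recovered"
```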

We’ll start by looking for big-endian strings eight characters or more in length that include a slash (/):

sec560@slingshot:~/coursefiles/metadata$ strings -n 8 -e b WidgetStatisticalWhitepaper.doc
WidgetStatisticalAnalysis.xls WidgetStatisticalWhitepaper.pdf | grep /

Our output is empty. Let’s look for little-endian Unicode strings with forward slashes:

sec560@slingshot:~/coursefiles/metadata$ strings -n 8 -e l WidgetStatisticalWhitepaper.doc
WidgetStatisticalAnalysis.xls WidgetStatisticalWhitepaper.pdf | grep /

Note: The character after -e is a lowercase L, not a one.

Again, nothing. Let’s look for big-endian Unicode strings with backslashes:

sec560@slingshot:~/coursefiles/metadata$ strings -n 8 -e b WidgetStatisticalWhitepaper.doc
WidgetStatisticalAnalysis.xls WidgetStatisticalWhitepaper.pdf | grep '\\'

This search turns up a potentially interesting piece of information, a file system path to the original file on Bob’s machine:

C:\Users\Bob Boberson\My Documents\WidgetStatisticalWhitepaper.doc.

Finally, let’s look for strings with little-endian Unicode backslashes:

sec560@slingshot:~/coursefiles/metadata$ strings -n 8 -e l WidgetStatisticalWhitepaper.doc
WidgetStatisticalAnalysis.xls WidgetStatisticalWhitepaper.pdf | grep '\\'

Note: Again, the above command uses a lowercase L, not a number one.

With this one, we’ve found numerous file system paths, including paths to a file on a web server, a Visual Basic for Applications DLL, the file system path to Office on the machine, and much more.

sec560@slingshot:~/coursefiles/metadata$ strings -n 8 -e b WidgetStatisticalWhitepaper.doc
WidgetStatisticalAnalysis.xls WidgetStatisticalWhitepaper.pdf | grep /
sec560@slingshot:~/coursefiles/metadata$ strings -n 8 -e l WidgetStatisticalWhitepaper.doc
WidgetStatisticalAnalysis.xls WidgetStatisticalWhitepaper.pdf | grep /
sec560@slingshot:~/coursefiles/metadata$ strings -n 8 -e b WidgetStatisticalWhitepaper.doc
WidgetStatisticalAnalysis.xls WidgetStatisticalWhitepaper.pdf | grep '\\'
..\My Pictures\Chart.png
\\webserver\wwwroot\images\560gc_logo.jpg
..\My Pictures\chart.png
Bob BobersonBC:\Users\Bob Boberson\My Documents\WidgetStatisticalWhitepaper.doc
Bob BobersonBC:\Users\Bob Boberson\My Documents\WidgetStatisticalWhitepaper.doc
Bob BobersonBC:\Users\Bob Boberson\My Documents\WidgetStatisticalWhitepaper.doc
Bob BobersonBC:\Users\Bob Boberson\My Documents\WidgetStatisticalWhitepaper.doc
Bob BobersonBC:\Users\Bob Boberson\My Documents\WidgetStatisticalWhitepaper.doc
Bob BobersonBC:\Users\Bob Boberson\My Documents\WidgetStatisticalWhitepaper.doc
Bob BobersonBC:\Users\Bob Boberson\My Documents\WidgetStatisticalWhitepaper.doc
Bob BobersonBC:\Users\Bob Boberson\My Documents\WidgetStatisticalWhitepaper.doc
Bob BobersonBC:\Users\Bob Boberson\My Documents\WidgetStatisticalWhitepaper.doc
Bob BobersonBC:\Users\Bob Boberson\My Documents\WidgetStatisticalWhitepaper.doc
C:\Users\Bob Boberson\My Pictures\560gc_logo.jpg

sec560@slingshot:~/coursefiles/metadata$ strings -n 8 -e l WidgetStatisticalWhitepaper.doc
WidgetStatisticalAnalysis.xls WidgetStatisticalWhitepaper.pdf | grep '\\'
\\webserver\wwwroot\images\560gc_logo.jpg
*\G{000204EF-0000-0000-C000-000000000046}#4.0#9#C:\PROGRA~1\COMMON~1\MICROS~1\VB
A\VBA6\VBE6.DLL#Visual Basic For Applications
*\G{00020813-0000-0000-C000-000000000046}#1.3#0#C:\Program Files\Microsoft Offic
e\Office\EXCEL9.OLB#Microsoft Excel 9.0 Object Library
*\G{00020430-0000-0000-C000-000000000046}#2.0#0#C:\WINDOWS\system32\STDOLE2.TLB#
OLE Automation
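
The little-endian output mixes network (UNC) paths with local drive paths buried inside #-delimited reference strings. The sketch below shows one way to separate them, under assumptions: the input is copied from the output above with the line wraps rejoined, and the variable names are illustrative.

```shell
# Lines copied from the little-endian output above (wraps rejoined).
le_strings='\\webserver\wwwroot\images\560gc_logo.jpg
*\G{00020813-0000-0000-C000-000000000046}#1.3#0#C:\Program Files\Microsoft Office\Office\EXCEL9.OLB#Microsoft Excel 9.0 Object Library
*\G{00020430-0000-0000-C000-000000000046}#2.0#0#C:\WINDOWS\system32\STDOLE2.TLB#OLE Automation'

# Split the #-delimited reference strings into one field per line.
fields=$(printf '%s\n' "$le_strings" | tr '#' '\n')

# Drive-letter paths start with a letter, a colon, and a backslash.
drive_paths=$(printf '%s\n' "$fields" | grep -E '^[A-Za-z]:\\')

# UNC paths start with two backslashes and name a remote host.
unc_paths=$(printf '%s\n' "$fields" | grep -E '^\\\\')

printf 'Local paths:\n%s\n' "$drive_paths"
printf 'Network paths:\n%s\n' "$unc_paths"
```

Network paths are worth noting separately because they name hosts, here webserver, that may be in scope for the test.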

 

Conclusion

In this lab, we’ve seen how we can use ExifTool and the strings command to pull data from files that may be useful to us in our penetration test. We’ve seen the advantages of structured data and ExifTool in pinpointing useful information.

We’ve also seen the advantages of looking for unstructured data with the strings command to find something that ExifTool isn’t designed to show: obscured fields and comments.

Finally, we’ve seen how to go beyond the default ASCII-only behavior of strings on Linux by using the -e option to look for Unicode strings, both big endian and little endian.