
Forensics in PII Detection


Feature

Computer Forensics Technologies for Personally Identifiable Information Detection and Audits
Yin Pan, Ph.D., is an associate professor in the department of networking, security and systems administration at the Rochester Institute of Technology (RIT) (New York, USA). Pan is actively involved in teaching and research in the security area, especially in IT security audits and computer forensics. She has published many papers in these fields.

Bill Stackpole is an assistant professor in the department of networking, security and systems administration at RIT. His teaching has focused on system administration, system security and computer system forensics. He has additional research interests in the areas of authentication and virtualization.

Luther Troell, Ph.D., is a professor in the department of networking, security and systems administration at RIT. Troell currently serves as the director of the school of informatics and teaches networking and information assurance classes. He has presented papers on curriculum development in information assurance as well as the development of an undergraduate program, Information Security and Forensics, and a graduate program, Computer Security and Information Assurance.

Identity theft has become more prevalent in recent years; about 10 million incidents occur each year.1 IT professionals must understand the need for personally identifiable information (PII) discovery to protect themselves and their company from the civil, legal and financial liabilities caused by data loss. As documents migrate to digital form from hard copy, sensitive personal information gets stored in a variety of places digitally. National and international laws are in place requiring companies to search for confidential data to ensure compliance. Some US examples include the Family Educational Rights and Privacy Act (FERPA) and the Health Insurance Portability and Accountability Act (HIPAA). At the state level in the US, New York State’s Disposal of Personal Records Law (2006) requires businesses to “properly dispose of records containing personal information,” implying that this information must be unreadable and unrecoverable. International privacy laws, many of which are more stringent than those in the US, require similar activity.2

To comply with these laws, security professionals use a variety of sensitive information discovery tools to find and remove readily available information stored on end-point devices. While current PII discovery tools can find information that is readily available, they are not capable of discovering information that has been encrypted, obfuscated, hidden or deleted, or that is otherwise not readily available. It is critical to note that the content and metadata of deleted files can be easily recovered using standard forensics tools. This paper will introduce computer forensics techniques to reveal sensitive data that are likely to be missed by PII tools, including data in RAM, graphics files, registry information or files marked as deleted.

PII BACKGROUND

Where Do Existing PII Tools Look for Sensitive Data?
PII tools typically search through the directory tree, looking at the content of allocated files, e-mail and the like for keywords or strings that indicate the file needs further investigation. For example, the tools might search for files that contain strings of 16 digits that could represent a credit card number. The strength of current PII tools is that they quickly find possible PII in locations in which data are visible to the operating system. However, they do not provide support for searching data outside of these traditional areas. For example, if a file containing the same string of 16 digits had been deleted, the current PII tools would not find the file even though it may be recoverable. Deleted files are considered unallocated and are not visible to PII tools.

Where Else Can Information Reside and Hide?
Data can reside in places that the operating system can see as well as places that are invisible to it. Most of these nontraditional locations are completely overlooked by PII tools but can still harbor sensitive information. Some examples of these areas include:
• Metadata: Metadata contains file information about the actual data in the file. Metadata can include who created the information, what the information is about, when the file was created/modified/accessed, and other information that can describe particulars about the data file. Metadata can provide a place for sensitive data to reside, but it is not particularly easy to remove.
• Recycle bin and unallocated space: When a file is deleted on a Windows machine, for example, the file appears in the recycle bin. The complete path and file name are stored in a hidden file called INFO2 in the recycled or recycler folder.

1 ISACA JOURNAL VOLUME 2, 2010


After emptying the recycle bin, the corresponding entries in INFO2 are also deleted. However, when the file is removed from the recycle bin, the system simply marks the space previously occupied by this file to be “available for use.” Metadata information as well as the original contents of the file remain on the hard drive, completely intact until the space is allocated to new files and the original information is overwritten with new data. As storage becomes less expensive and hard drives increase in size, deleted files are likely to remain in unallocated space longer.
• Alternate data streams3: Microsoft’s New Technology File System (NTFS) uses “streams” to store file content. A stream is a link to a data storage location. NTFS allows multiple streams of data to be associated with a file but only the default (main) stream is normally displayed to the user. One could hide sensitive data in any number of alternate streams associated with a host file with no apparent changes to the host file.
• Files with modified extensions: File extensions are the characters following the dot in a filename, e.g., [Link]. They indicate a file’s data type but can easily be changed by the user. Each file also has a file header that uniquely identifies the file type. Adversaries often attempt to disguise the true nature of a file by changing the file’s extension. Such changes can make it difficult for PII detection tools to find sensitive information in a file.
• Graphics and images: Pictures, images and graphics can contain information that may be considered private or sensitive. For example, a screen capture of a spreadsheet page with account or other character-based information will not be read as a character-based file.
• Print spool files: Printing involves a spooling process. For a print job, the file’s content is written to a spool file (.SPL) and a separate graphics file (.EMF) for each page. These print spool files are saved to disk until the print is done and then deleted.
• Link or shortcut files: Link files are shortcuts pointing to the actual files that allow users to launch programs, open files and folders, or connect to a URL. They contain path information to allow the operating system to navigate to the location of the actual data file.
• RAM and page files: Some sensitive information may be found only in the computer’s RAM and in its page files. At other times, such data will be written to disk if a machine is suspended or hibernated.
• E-mail: E-mail, instant messages and other communication traffic can contain sensitive information.
• Registry hive: The Windows registry contains settings specific to the hardware, applications, services, security and users on the system. Much potentially valuable information is stored there. Such information includes user and group identities and passwords and Internet history, including cookies and records of queries such as searches and lists of recently accessed files.

PII TOOLS AND FINDINGS

Existing PII Tools
Both open source and commercial products are available to assist in detecting sensitive data on digital media. Some examples include:
• Find_SSN
• Sensitive Number Finder (SENF)
• Spider
• Identity Finder

Find_SSN is a simple-to-use Windows application. Developed by Virginia Tech, Find_SSN searches for files that may contain data matching the known patterns of Social Security numbers. Find_SSN does not scan PDF files, Microsoft Outlook PST files or files larger than 100 megabytes. SENF is a tool written by the information security group at the University of Texas at Austin. Its search capabilities are similar to those of Find_SSN. Cornell University’s Spider is an open source, more sophisticated scanning tool that can scan archives, documents and spreadsheets after a target directory is selected. However, Spider does not scan PDF and Microsoft Outlook PST files. Identity Finder is a commercial search tool capable of locating and identifying personal information (such as Social Security, credit card and bank account numbers as well as passwords or dates of birth) in files, e-mail, databases, registry entries and web browser caches. However, even with Identity Finder, there are still many areas not in its search path.

Testing the PII Tools
To test the limits of the existing PII tools, a variety of files were created including .pdf, .xls, .doc, .txt and archive files on a USB drive. Two of these files were deleted to test whether the tools search for deleted files in unallocated space. The file [Link] was also renamed to [Link] to test whether the tools are capable of detecting sensitive data from a file with
its extension modified. A file was also created that did not contain Social Security numbers in its main content, but had these data hidden in an alternate data stream. Note that none of the deleted files had been overwritten.

In addition, the tools were run against a live Windows machine so they could interrogate files that could not be stored on the USB drive, such as deleted Microsoft Outlook (*.pst) files, the Windows registry, alternate data streams, files in the recycle bin, print spool files and RAM data.

To interrogate files from the USB test drive, each tool was directed to run against the drive image and the results were recorded. For other files not stored on the drive, the tool was directed to the location on the hard drive where those files would normally reside. All the test files are listed in figure 1.

Results From Running These Tools
The Find_SSN, SENF and Spider tools discovered only a limited subset of the conditions presented. Identity Finder, on the other hand, improved on the capability to identify PII when compared to the other PII tools tested. Identity Finder not only detected all of the conditions found with the other tools, but also discovered PII in renamed files, PDFs, registry entries and .pst files. However, all of these tools still had limited capability to detect sensitive information stored in metadata, alternate data streams, graphics files, printer spools, RAM and page files. Additionally, none of the tools could identify PII that exists in unallocated space (if the recycle bin has been emptied, for example) even though the PII remains intact on the disk. All of these areas represent potential data leakage areas for the current PII tools.

A summary of the results is shown in figure 1.

Limitations of PII Tools
The current PII identification tools search for sensitive data either through user-specified directories or starting at the root directory. They are not designed to find information that has been obfuscated, encrypted, deleted or otherwise hidden. Figure 1 shows that PII identification tools miss information in areas including unallocated space (i.e., deleted files that can be recovered), e-mail and deleted e-mail, files open and locked or auto-saved by other processes, system files (executables, DLLs, hibernation files), the Windows registry, application-specific database files, memory and page files, metadata, graphics files (screenshots, thumbnails, RPM, TIFF, JPG, EMF, etc.), intentionally hidden PII in digital data (digital steganography, for example), as well as links or indirect references to PII. These limitations may lead to inaccurate results with false positives and false negatives.

FORENSICS TOOLS AND FINDINGS
As mentioned previously, existing PII identification tools search only within the file system and are not capable of detecting information in deleted files, or from any location not normally accessible to a nonprivileged user. Modern forensic tools can help to bridge this gap. Forensic tools are designed to recover and analyze data that have been intentionally or unintentionally deleted or otherwise hidden. Originally, this class of tools was associated with activities in the law enforcement sector and was used exclusively to discover legal evidence. More recently, these tools have become widely used in business and corporate environments.

Forensic tools are capable of bypassing the limits imposed by the operating system and can find file content that has been deleted (i.e., is no longer available to the operating system) or has been stored in a place not typically accessible to a user, such as RAM or the registry. They can display files stored in a variety of formats, allowing a knowledgeable user to find information that would otherwise appear to be inaccessible. While forensic tools’ strength is their capability to search nontraditional areas in which data can be hidden, their weakness is that they can be time-consuming, as they search so much more of the system than PII tools.

Existing Forensics Tools
While many forensics tools may be capable of detecting PII, two commercial forensics tools, Forensic Toolkit (FTK)4 and EnCase,5 were studied.6

Both EnCase and FTK run on Windows and provide sophisticated digital evidence analysis functions such as recovering deleted files, keyword and regular-expression searching, registry viewing, e-mail and memory analysis, and much more. The Sleuth Kit (TSK), a popular, versatile and free forensics tool, used with an advanced interface, PTK,7 can also perform most of the same tasks. These tools can be used on offline as well as running systems.

Finding PII Using Forensics Tools
The forensic tools were applied to interrogate the same dataset tested by the PII tools. The results were then incorporated into figure 1.
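The claim that forensic tools can see data the operating system no longer exposes can be made concrete with a small sketch. It is not how FTK, EnCase or the Sleuth Kit are implemented; it only assumes that a raw image of the drive is available as an ordinary file and that the PII is stored as plain ASCII text. The names `SSN_BYTES`, `scan_image` and `demo`, and the file name `usb.img`, are hypothetical. Because the scan walks the image bytes and ignores the directory tree, a string left behind in unallocated space is reported just like one in an allocated file.

```python
import mmap
import os
import re
import tempfile

# SSN-like byte pattern (an assumption for illustration): 3-2-4 digits with
# optional hyphen or space separators, not embedded in a longer digit run.
SSN_BYTES = re.compile(rb"(?<!\d)\d{3}[- ]?\d{2}[- ]?\d{4}(?!\d)")


def scan_image(image_path):
    """Return (byte offset, matched text) pairs found anywhere in a raw image."""
    hits = []
    with open(image_path, "rb") as handle:
        # Memory-map the image so large files are not loaded into RAM at once.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            for match in SSN_BYTES.finditer(view):
                hits.append((match.start(), match.group().decode("ascii")))
    return hits


def demo():
    # Build a tiny stand-in for a drive image: filler bytes around one leftover
    # string, as if a deleted file's content had not yet been overwritten.
    with tempfile.TemporaryDirectory() as folder:
        image_path = os.path.join(folder, "usb.img")
        with open(image_path, "wb") as handle:
            handle.write(b"\x00" * 4096 + b"SSN 123-45-6789" + b"\x00" * 4096)
        return scan_image(image_path)
```

Text stored as UTF-16, or inside compressed or encrypted containers, would not match this byte pattern and would need to be decoded first.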
Figure 1: Test Files and the Results

| File Name | Description of the File | Find_SSN | Spider | SENF | Identity Finder | FTK | EnCase |
|---|---|---|---|---|---|---|---|
| My Recent Documents | A folder containing a link file that links to [Link] | No | No | No | No | FOUND by following link | FOUND by following link |
| SSN test [Link] | A PDF file containing Social Security numbers | No | No | No | FOUND | Viewable* | Viewable* |
| [Link] | A [Link] file, renamed to a JPEG file | No | No | No | FOUND | FOUND | FOUND |
| [Link] | Excel spreadsheet | FOUND | FOUND | FOUND | FOUND | FOUND | FOUND |
| PII detection [Link] | PowerPoint slides | FOUND | No | No | FOUND | FOUND | FOUND |
| SSN test [Link] | Word document with Social Security numbers in content | FOUND | FOUND | FOUND | FOUND | FOUND | FOUND |
| SSN test [Link] | Word document with Social Security numbers in summary (metadata) | No | No | No | No | FOUND | FOUND |
| | It was printed to generate a print spool file (*.emf) | No | No | No | FOUND | Viewable* | Viewable* |
| | Deleted Social Security numbers test file | No | No | No | No | FOUND | FOUND |
| [Link] | A screen shot containing Social Security numbers | No | No | No | No | Viewable* | Viewable* |
| [Link] | RTF | FOUND | FOUND | FOUND | FOUND | FOUND | FOUND |
| [Link] | Text file | FOUND | FOUND | FOUND | FOUND | FOUND | FOUND |
| [Link] | Deleted text file (after recycle bin emptied) | No | No | No | No | FOUND | FOUND |
| PII [Link] | Zip file containing [Link] and Social Security number test [Link] | FOUND | FOUND | No | FOUND | FOUND | FOUND |
| pst file | Outlook file with e-mail not deleted | No | No | No | FOUND (limited support) | FOUND | FOUND |
| deleted pst file | Outlook file with e-mail deleted | No | No | No | No | FOUND | FOUND |
| File with SSN in alternate stream | Word document with Social Security numbers in alternate data stream | No | No | No | No | FOUND | FOUND |
| File in recycle bin | File deleted but recycle bin not emptied | FOUND | FOUND | FOUND | FOUND | FOUND | FOUND |
| RAM and page files | Contents of memory with Social Security numbers in memory | No | No | No | Unknown | FOUND | FOUND |
| Windows registry | PII written to Windows registry | No | No | No | FOUND | FOUND | FOUND |

* While the tools are not capable of searching these files directly, they allow display of the enclosed image using gallery view.
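The renamed-file row in figure 1 rests on the idea that a file's leading bytes, not its extension, identify its type. The following is a minimal sketch of such a signature check, assuming a small hand-picked table of well-known header bytes; it is an illustration, not the signature database or algorithm used by FTK or EnCase. The names `SIGNATURES`, `extension_matches_header` and `demo`, and the sample file names, are hypothetical.

```python
import os
import tempfile

# Leading "magic" bytes for a few common formats (a small illustrative subset).
SIGNATURES = {
    ".jpg": [b"\xff\xd8\xff"],
    ".jpeg": [b"\xff\xd8\xff"],
    ".png": [b"\x89PNG\r\n\x1a\n"],
    ".gif": [b"GIF87a", b"GIF89a"],
    ".pdf": [b"%PDF-"],
    ".zip": [b"PK\x03\x04"],
}


def extension_matches_header(path):
    """Return True or False for known extensions, None when the extension is unknown."""
    extension = os.path.splitext(path)[1].lower()
    expected = SIGNATURES.get(extension)
    if expected is None:
        return None
    with open(path, "rb") as handle:
        header = handle.read(16)
    return any(header.startswith(magic) for magic in expected)


def demo():
    with tempfile.TemporaryDirectory() as folder:
        # A text file disguised with a .jpg extension, as in the test dataset.
        disguised = os.path.join(folder, "notes.jpg")
        with open(disguised, "wb") as handle:
            handle.write(b"SSN: 123-45-6789\n")
        # A file that really starts with the JPEG header bytes.
        genuine = os.path.join(folder, "photo.jpg")
        with open(genuine, "wb") as handle:
            handle.write(b"\xff\xd8\xff\xe0" + b"\x00" * 12)
        return extension_matches_header(disguised), extension_matches_header(genuine)
```

A file flagged as a mismatch can then be searched as raw text regardless of what its name claims, which is the step a scanner that trusts the extension skips.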



Both EnCase and FTK offer live and static data search functions that support pattern matching using regular expressions. For example, a search might be conducted to identify files containing Social Security numbers using the pattern: <\d\d\d[\- ]?\d\d[\- ]?\d\d\d\d\> (where \d represents a “digit”). Using this search, the forensic tools discovered Social Security numbers from all deleted files, the signature mismatch file, the ppt file, RAM and page file, e-mail including deleted e-mail, link files, and registry hives. Data that had been hidden in the alternate data stream were revealed as well as the PII stored in the metadata of the file. These are not trivial differences. A file containing PII in its metadata would not be discovered by any of the previously discussed PII tools, since they focus only on file content. Using a process called signature analysis, the PII hidden in a file with its file extension changed from .txt to .jpg is also easily discovered by the forensics tools. The signature analysis process identifies and corrects the changed extension by comparing the file extension with header information stored in a file.

Even though the powerful live search reveals many results that were missed by the PII identification tools, the sensitive information hidden in graphics files, a print spool file and a PDF file was not detected by live search. The authors also utilized other technologies (built into the forensic tools) to uncover the rest of the PII:
• Graphics files: To identify sensitive information hidden in graphics files, the authors used the Gallery/Graphics View feature. By clicking the Graphics/Gallery View tab, all graphic formats including EMF, BMP, TIFF, JPEG, PNG and GIF are displayed. The operator can then view the files to make a determination about the sensitivity of the image content.
  It may be possible to convert image files to text using optical character recognition (OCR) techniques. Regular-expression searches can then be employed to search for strings in the resulting files. This capability is not currently a feature of for-free or for-pay forensic tools, but such capabilities could improve PII scanning exercises as well as the forensics tools.
• Print spool files: Forensics tools support data carving that allows an investigator to search for EMF, GIF, JPEG, PDF, HTML and MS Office document files embedded in other files or unallocated space. This feature can recover deleted files embedded in unallocated space as well as the EMF files embedded in print spool files. Even though the EMF files are deleted after the print is done, forensics tools are capable of recovering deleted EMF files from unallocated space. Once the EMF file content has been recovered, the information is then viewable using the forensics Graphics View feature described previously.
• PDF files: Regular-expression live search does not support searching PDF files. PDF files must be converted to text using OCR techniques for the live search. For this research, the authors viewed all PDF files via EnCase/FTK’s Transcript/Native View tab. This is not a viable solution if there are many PDF files to be searched. Deleted PDF files can be recovered using the data carving feature.

In summary, using the forensic tools, all of the traditional locations for PII, as well as many areas in which PII could be intentionally or inadvertently hidden, can be searched. Forensics tools can add significantly to the results provided by PII tools.

CONCLUSIONS
A fundamental strength of forensics tools when used for detecting sensitive information is that they can recover deleted files and embedded images, list all alternate streams for each file, and perform signature analysis before conducting a search for sensitive information. A search using forensics tools not only checks the file content, but also examines file slack, metadata, links and other content. As a result, forensics tools are capable of discovering more information than the PII detection tools that were evaluated. They provide an additional source of information by which a PII search can become more effective and uncover more potential sources of PII.

However, there is no single tool capable of effectively finding all PII. Each of the tools has its strengths and weaknesses. This article was intended to address data that are likely to be missed by currently available PII tools. Forensics tools can be a powerful addition to the auditor’s toolbox, providing additional capabilities to reveal sensitive data. If an auditor uses only PII tools, it is possible that sensitive recoverable information will be overlooked.

Organizations concerned about unauthorized dissemination of sensitive information should be aware of the limitations of their current PII tools and use this knowledge to make decisions about potential data leakage or compromise. Otherwise, overlooked data might be interpreted as a failure to adhere to privacy laws or regulations, which could subject the organizations to legal liability or negative publicity.
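The Social Security number search pattern quoted in the discussion of live search can be tried outside a forensic suite. The sketch below is one possible Python rendering, offered as an assumption rather than as the syntax FTK or EnCase accept: the trailing `\>` is read as an end-of-word anchor, the leading `<` is assumed to be the matching start-of-word anchor, and both are replaced by Python's `\b`. The names `SSN_PATTERN`, `find_ssn_candidates` and `SAMPLE_TEXT` are hypothetical, and the sample values are made up.

```python
import re

# Three digits, two digits, four digits, each group optionally separated by a
# hyphen or a space; \b stands in for the word-boundary anchors of the pattern.
SSN_PATTERN = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")


def find_ssn_candidates(text):
    """Return every substring of text that has the shape of a Social Security number."""
    return SSN_PATTERN.findall(text)


SAMPLE_TEXT = (
    "Emp 123-45-6789, alt 123 45 6789, packed 123456789, "
    "phone 555-123-4567, id 1234567890"
)

print(find_ssn_candidates(SAMPLE_TEXT))
# prints ['123-45-6789', '123 45 6789', '123456789']
```

The phone number and the ten-digit identifier are rejected, but any isolated nine-digit run is accepted, so hits from a pattern like this are candidates for review rather than confirmed Social Security numbers.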
ACKNOWLEDGEMENT
The authors wish to acknowledge the reviewers for their
invaluable comments and suggestions that have improved the
overall quality of the paper.

ENDNOTES
1 [Link], “2009 Identity Theft Statistics,” [Link]/guide/2009-identity-theft-statistics
2 Information Shield, “International Privacy Laws,” [Link]/[Link]
3 Parker, Don; “Windows NTFS Alternate Data Streams,” Security Focus, 16 February 2005, [Link]/infocus/1822
4 Forensic Toolkit (FTK), [Link]
5 EnCase Forensic, [Link]
6 The authors do not endorse these products in any way.
7 PTK and Sleuth Kit, [Link]

