Computer Protection Program Berkeley Lab
Ernest Orlando Lawrence Berkeley National Laboratory
  TOOLS & SERVICES  
Detecting PII on Computers  

Overview

The loss of Personally Identifiable Information (PII) can lead to identity theft, loss of privacy, and other undesirable outcomes for individuals. To protect individuals from identity theft, State and Federal lawmakers have created laws that require entities to protect PII and to report when PII has been compromised. U.S. lawmakers have paid special attention to the Social Security number because it can easily be used to commit identity theft.

When PII is compromised, the entity that caused the compromise faces high reporting costs and significant damage to its reputation. PII disclosures also commonly make it into the media. For example, the Department of Veterans Affairs incident and the UC Berkeley incident both made national media headlines.

Berkeley Lab does not want to be the source of a PII disclosure. Therefore, Berkeley Lab has established policies to protect PII. These policies are listed on the CIO policy website. The relevant policy for this particular webpage is as follows:

"You may not collect and store Protected Information at LBNL to include Social Security Numbers, Personally Identifiable Health Information, Driver's License Numbers, or Financial Account Numbers without prior authorization from the Computer Protection Program."

You do not want to be the source of a PII disclosure. If you think you have PII on your workstation, in your email archives, in eRoom, or on a network drive, right now is the time to remove it. The Computer Protection Program has created a tool to assist you in identifying possible locations of PII on your workstation. This tool currently focuses on identifying files on your local workstation that may contain Social Security Numbers (SSNs). Instructions for using this tool, called Find_SSNs, are below.

About Find_SSNs

Find_SSNs is a standalone program that flags files that potentially contain SSNs; you do not need to install anything to run it. The tool was originally written by Brad Tilley at Virginia Tech and has been customized by CPP for Berkeley Lab. The tool only runs on Windows. If you need a similar tool for OSX or Linux, please contact cppm@lbl.gov.

Download Find_SSNs

The tool is not perfect. It will flag some files that do not actually contain SSNs; these are known as false positives. It may also miss some files that do contain SSNs; these are known as false negatives. The tool is meant to assist you, not to identify all possible PII accurately. Only you have the knowledge and expertise to properly comb your files and email to remove PII.
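
The two kinds of error are easy to see with a simple pattern match. The sketch below is a hypothetical Python illustration, not the matching logic Find_SSNs actually uses (this page does not describe it). The pattern SSN_LIKE and the function find_candidates are assumptions made for the example.

```python
import re

# Hypothetical pattern (an assumption, not the Find_SSNs rule): nine digits,
# optionally grouped 3-2-4 with hyphens or spaces, and not embedded in a
# longer run of digits.
SSN_LIKE = re.compile(r"(?<!\d)\d{3}[- ]?\d{2}[- ]?\d{4}(?!\d)")


def find_candidates(text):
    """Return every substring of text that looks like an SSN to the pattern."""
    return [match.group(0) for match in SSN_LIKE.finditer(text)]


samples = {
    "plausible SSN": "Employee SSN: 123-45-6789",
    "part number (false positive)": "Order part 987654321 today",
    "dotted SSN (false negative)": "SSN 123.45.6789 on file",
}

for label, text in samples.items():
    print(label, "->", find_candidates(text))
```

The nine-digit part number is flagged even though it is not an SSN, while the dotted form is missed because the pattern does not allow dots. Loosening the pattern catches more real SSNs but flags more harmless numbers, which is why the results always need a human review.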

Before You Begin

Before you begin, please take the following steps to reduce false positives and get the most value from Find_SSNs.

  • If you already know you have PII, remove it before running Find_SSNs. There is not much value in Find_SSNs informing you of what you already know. If you know you have PII on your computer, remove it. If you know you have SSNs stored in emails, remove them.
  • Empty your Recycle Bin and the Trash in your email client. Any files you deleted in the preceding step remain in your Recycle Bin and Trash until you empty them.
  • Clear your web browser's cache. Many of the temporary files your browser stores contain strings of numbers. Find_SSNs can flag these files, so it is best to remove them before you start.

Also, please consider involving your local technical support staff in running Find_SSNs. Your support staff can help ensure the procedures below are followed correctly and can provide valuable guidance in interpreting the results.

Running Find_SSNs

If you have not done so already, download Find_SSNs. When you download the Find_SSNs tool, you can locate it by looking for the following icon.

Double-click the icon to begin running Find_SSNs. Find_SSNs does not install anything on your computer; it is a standalone executable program. There are four steps to running Find_SSNs. Each of these steps is outlined below. Find_SSNs typically takes 30-45 minutes to run.

  1. The first step is to select the folder to scan. The default is to scan your C:\ drive, which is recommended. If you have files stored in other locations, such as the D: drive or the H: drive, you should scan those separately. You may also wish to target specific folders you know may contain PII. Targeting folders will reduce the time it takes for Find_SSNs to run, but you could miss some files. We do not recommend that you scan "My Computer" or "Computer", as this will scan all local and remote drives, which could take a very long time.



  2. In step 2, Find_SSNs counts the number of files to be scanned. This step is necessary to provide you with an accurate estimate of the time it will take Find_SSNs to finish. This step typically takes 2-3 minutes.



  3. In step 3, Find_SSNs builds the list of files to scan. This step typically takes 3-4 minutes.



    Find_SSNs pauses before the real work begins. This is your chance to quit Find_SSNs. Once you select 'Yes', Find_SSNs will begin to scan the contents of your files for SSNs. This will typically take 30-45 minutes. Your computer will be usable during this time, but it will likely run very slowly. We recommend that you not use your computer during this time so that Find_SSNs can finish as soon as possible.



  4. In step 4, the searching is performed. Find_SSNs searches each of your files for potential SSNs.



  5. When Find_SSNs finishes, the following dialog will be displayed. The dialog reminds you that the real work has just begun. When you click OK, the results file will open. This file needs to be reviewed for SSNs. For each file that contains SSNs, you must delete either the SSNs or the file itself. In the next section we offer some tips for reviewing the results.html file.
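
For readers who want a mental model of what happens between selecting a folder and seeing results, the steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Find_SSNs source: the function names build_file_list and scan_files and the pattern SSN_LIKE are hypothetical, and counting and list building (steps 2 and 3) are merged into one pass here.

```python
import os
import re

# Hypothetical pattern (an assumption): hyphenated 3-2-4 digits, matched on
# raw bytes so that any file type can be read without decoding.
SSN_LIKE = re.compile(rb"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def build_file_list(root):
    """Walk root and return the full path of every file beneath it."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return paths


def scan_files(paths):
    """Return the paths whose contents contain at least one SSN-like string."""
    flagged = []
    for path in paths:
        try:
            # Reads the whole file into memory; acceptable for a sketch only.
            with open(path, "rb") as handle:
                data = handle.read()
        except OSError:
            # Unreadable files (locked, permission denied) are skipped.
            continue
        if SSN_LIKE.search(data):
            flagged.append(path)
    return flagged
```

Calling build_file_list on a folder, taking the length of the returned list as the file count, and passing the list to scan_files mirrors the count, build, and search sequence. Because every file has to be opened and read, scanning a whole drive is slow, which is consistent with the 30-45 minute run time quoted above.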

 

Examining the Results

After Find_SSNs runs, it will open your default web browser to display the results file, called results.html. If results.html does not open automatically, locate the Scan_Results folder in the same folder as Find_SSNs. In that folder, find the file called results.html and double-click it to open it.

Below is an example of a results.html file. At the top of the file are details about when the Find_SSNs program was run and on what computer. The next two paragraphs describe the format of the file, which is "possible SSN masked" followed by the file location. The file locations are links, so in most cases you can click on the link to examine the file. Now the real work begins: you need to examine each of the files for potential SSNs. Below we have some tips for reviewing files.

File review tips: (we will add more as we get feedback)

  • URGENT=>. Microsoft Excel and Word files are flagged as URGENT and printed in red. Current usage of Find_SSNs suggests these files are the most likely to contain PII. Excel files are most worrisome since they could contain many SSNs. Please review the files marked URGENT first.
  • Use the masked SSN to your advantage. When reviewing files you can use the masked SSN to help you determine whether a file contains an SSN. Note the last four digits of the masked SSN, for example 7774. Search within the file for those digits. This will lead you to what Find_SSNs thinks is the problem. If the number is not an SSN or is part of a larger number, you can be fairly confident this is a false positive.
  • Only the first SSN is flagged. Find_SSNs only flags the first instance of a possible SSN in the file. So be aware that the file could contain other instances. This is particularly important when reviewing mail folders. If a mail folder is flagged, you should carefully review all the mail in that folder.
  • Mail locations. If you store email on your local drive, Find_SSNs can read the contents of the mail since it is stored in clear text. You should use your mail client to review the mail folders flagged by Find_SSNs. Find_SSNs cannot tell the particular piece of email flagged, only the folder. Again, you can use the masked SSN to your advantage here and search the folder.
  • Cookies. Cookies left by your web browser often contain long strings of numbers. Some of these numbers can look like SSNs. Although Find_SSNs tries to weed out these false positives, some may remain. In most cases you can safely ignore any cookies. If you are not sure, it is best to just delete the cookie.
  • CPP can help. If you have any questions, please send them to cppm@lbl.gov. Feel free to include a copy of your results.html file in the email. The more sample results.html files we receive, the better we can reduce false positives.
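
The "last four digits" tip above can also be applied programmatically to a text file. The following Python sketch is an illustration only; the function classify_hits is hypothetical and is not part of Find_SSNs. It finds each occurrence of the four digits, expands to the surrounding run of digits and hyphens, and reports whether that run has exactly nine digits.

```python
import re


def classify_hits(text, last_four):
    """Return (run, plausible) for each place last_four appears in text.

    run is the full stretch of digits and hyphens around the occurrence.
    plausible is True only when run contains exactly nine digits; any other
    length suggests the digits belong to a larger or smaller number.
    """
    hits = []
    for match in re.finditer(re.escape(last_four), text):
        start, end = match.start(), match.end()
        # Grow the window left and right over adjacent digits and hyphens.
        while start > 0 and (text[start - 1].isdigit() or text[start - 1] == "-"):
            start -= 1
        while end < len(text) and (text[end].isdigit() or text[end] == "-"):
            end += 1
        run = text[start:end].strip("-")
        digit_count = sum(ch.isdigit() for ch in run)
        hits.append((run, digit_count == 9))
    return hits


sample = "ID 555-12-7774 and ref 10077740021"
for run, plausible in classify_hits(sample, "7774"):
    print(run, "plausible SSN" if plausible else "likely false positive")
```

A nine-digit run is only a hint that the hit deserves a closer look. It does not prove the number is an SSN, so the file still needs to be reviewed by eye.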

Scan_Results folder

When Find_SSNs runs it creates a folder, in the same location as Find_SSNs.exe, with four files. One of the files, results.html, you are already familiar with. This is the file you reviewed in the previous step. You can also open results.html to review the results at a later time. The other files in this folder (analyzed.txt, exceptions.txt and skip.txt) are files produced by Find_SSNs with additional details about what was done. In most cases, you do not need to review these files. In some cases, CPP may ask you to send these files for troubleshooting.

Important: When you are done reviewing Find_SSNs results, delete the Scan_Results folder. The files in the folder may provide information about the location of files containing PII. You can always rerun Find_SSNs to reproduce the results.
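
If you prefer to script the cleanup, the following Python sketch shows one way to do it. It is an illustration under assumptions: the function remove_scan_results is hypothetical, and the folder you pass in must be the one that holds Find_SSNs.exe on your machine.

```python
import shutil
from pathlib import Path


def remove_scan_results(tool_folder):
    """Delete the Scan_Results folder inside tool_folder.

    Returns True if the folder existed and was removed, False otherwise.
    """
    results = Path(tool_folder) / "Scan_Results"
    if not results.is_dir():
        return False
    # shutil.rmtree deletes the folder and everything in it directly,
    # without moving anything to the Recycle Bin.
    shutil.rmtree(results)
    return True
```

Because the folder is removed without passing through the Recycle Bin, there is nothing left to empty afterwards. Make sure you have finished your review before running it.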

Logging

While Find_SSNs is running, it logs progress to the CPP syslog server. The logging provides CPP a mechanism to keep track of how many people have run the Find_SSNs tool, how long Find_SSNs takes to run on average, and feedback to detect failures. Nothing about results is logged. Below is an example of the logging produced.

Sep 25 16:11:11 128.3.128.26 find_ssns[0]: program started on neo
Sep 25 16:12:33 128.3.128.26 find_ssns[0]: step 1 - search path chosen C:\
Sep 25 16:17:39 128.3.128.26 find_ssns[0]: step 2 - counted files
Sep 25 16:17:39 128.3.128.26 find_ssns[0]: step 3 - built file list
Sep 25 17:09:19 128.3.128.26 find_ssns[0]: step 4 - user OK to begin hunt
Sep 25 17:23:26 128.3.128.26 find_ssns[0]: program finished on neo
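
To show how run times can be derived from entries like these, here is a minimal Python sketch that parses the sample lines above and computes the elapsed time between events. It is not CPP's actual tooling. The function names parse_line and elapsed_seconds are hypothetical, the year is an assumption because the log lines do not carry one, and the month abbreviation is assumed to be in English.

```python
from datetime import datetime

LOG = r"""Sep 25 16:11:11 128.3.128.26 find_ssns[0]: program started on neo
Sep 25 16:12:33 128.3.128.26 find_ssns[0]: step 1 - search path chosen C:\
Sep 25 16:17:39 128.3.128.26 find_ssns[0]: step 2 - counted files
Sep 25 16:17:39 128.3.128.26 find_ssns[0]: step 3 - built file list
Sep 25 17:09:19 128.3.128.26 find_ssns[0]: step 4 - user OK to begin hunt
Sep 25 17:23:26 128.3.128.26 find_ssns[0]: program finished on neo"""


def parse_line(line, year=2000):
    """Split one log line into (timestamp, message).

    year is a placeholder (an assumption): the log format omits it.
    """
    month, day, clock, _host, rest = line.split(maxsplit=4)
    when = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
    message = rest.split(": ", 1)[1]
    return when, message


def elapsed_seconds(events):
    """Return (message, seconds since the previous event) for each event."""
    gaps = []
    previous = events[0][0]
    for when, message in events:
        gaps.append((message, int((when - previous).total_seconds())))
        previous = when
    return gaps


events = [parse_line(line) for line in LOG.splitlines()]
for message, seconds in elapsed_seconds(events):
    print(f"{seconds:5d} s  {message}")

total = int((events[-1][0] - events[0][0]).total_seconds())
print(f"total run: {total} s")
```

Note that a gap measures wall-clock time, so it includes any time the program spent waiting for the user, for example at the confirmation prompt before the search begins.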

Other Tools

This section is a placeholder for other tools. You likely do not need anything from this section unless someone from CPP has directed you here.

Cornell Spider - OSX
Cornell Spider - Linux/UNIX and Documentation

 

Help/Feedback

If you have questions or comments about this website please contact the CPP group via email at cppm@lbl.gov.

If you need general computer assistance, please contact the LBNL Help Desk at x4357, help@lbl.gov, or online at http://www.lbl.gov/help.

 

 
