Best Practices: Handling Sensitive Data

Sensitive data is information that could harm the subjects of a study if it were disclosed. We often think first of the individuals and communities that can be affected, but protected fauna and flora and specific geographic regions or environments can also fall into the sensitive data category. Responsible stewardship is a core ethical and legal obligation for all UB researchers. Before collecting data, every researcher should ask: What is the potential for harm if this data is improperly disclosed, and what steps can I take to prevent that harm?

Know Your Data's Classification

Before you decide where and how to store your data, it's vital to understand its classification according to the University at Buffalo's Data Risk Classification Policy. This policy categorizes data into three levels based on its sensitivity and the potential impact of its unauthorized disclosure.

Key Principles for Sensitive Data

Beyond just storage, you must protect sensitive data throughout its entire lifecycle.

Practice Data Minimization

Collect and retain only the absolute minimum data necessary for your work.

Collect Only What You Need: Do not collect sensitive identifiers (like names or SSNs) if anonymous codes will suffice.
Delete What You Don't Need: As soon as data is no longer required for your project or to meet retention policies, dispose of it securely.

Encrypt Everything

Encryption scrambles data so it's unreadable without a key, making it your single most effective protection against theft or loss.

Encryption at Rest: This protects data where it is stored. Enable BitLocker (Windows) or FileVault (macOS) on all computers that handle sensitive data. This is a university requirement.
Encryption in Transit: This protects data as it moves across a network. When accessing UB resources from off-campus, always connect to the UB VPN. This creates a secure, encrypted tunnel for your data.

Control Access with the Principle of Least Privilege

Only grant access to individuals who have an explicit, job-related need to see the data.

In UBbox or OneDrive, share files with "Specific people" by entering their UBITName.
Set permissions to "View only" if the collaborator does not need to edit the file.
Set expiration dates on sharing links for temporary collaborations.
Regularly review folder permissions and remove individuals who no longer need access.

De-identify or Anonymize Data

For research, removing identifiers is a key strategy for reducing risk.

De-identification: You replace direct identifiers (like name, address) with a code. You maintain a secure "key" file that links the code back to the identifier. The key must be stored separately and with even higher security.
Anonymization: You strip all identifying information permanently. The data can never be re-linked to an individual.
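
The de-identification step described above can be sketched in Python. This is a minimal illustration, not a UB-endorsed tool: the function name `pseudonymize`, the field names, the `P-` code format, and the sample records are all hypothetical. It splits each record into a coded research row and a separate linking key, which would then be saved to a different, more tightly controlled location.

```python
import secrets

def pseudonymize(records, id_fields):
    """Split records into coded research rows and a separate linking key."""
    coded_rows = []
    key = {}  # code -> direct identifiers; store this apart from the research data
    for record in records:
        # Random code, not derived from the identifiers (hypothetical "P-" format)
        code = "P-" + secrets.token_hex(4)
        while code in key:  # regenerate in the unlikely event of a collision
            code = "P-" + secrets.token_hex(4)
        key[code] = {field: record[field] for field in id_fields}
        coded = {f: v for f, v in record.items() if f not in id_fields}
        coded["participant_code"] = code
        coded_rows.append(coded)
    return coded_rows, key

# Hypothetical sample records, for illustration only
records = [
    {"name": "A. Example", "address": "1 Sample St", "score": 42},
    {"name": "B. Example", "address": "2 Sample St", "score": 37},
]
coded_rows, key = pseudonymize(records, id_fields=("name", "address"))
```

Discarding `key` instead of storing it corresponds to the anonymization case, provided no indirect identifiers remain in the coded rows.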

Ensure Secure Disposal

Simply moving a file to the Trash or Recycle Bin is not enough to delete it securely.

Electronic Data: Use a secure file deletion utility that overwrites the data multiple times. For disposing of an entire computer or hard drive, contact UBIT for approved disposal procedures.
Physical Data: All paper records containing Category 1 data must be cross-cut shredded.

Data from Human Participants

This is the most common category of sensitive data at a university. It encompasses not only personal identifiers but also any information that could put an individual at risk if disclosed either directly or indirectly.

Types of Sensitive Human Participant Data:

Personally Identifiable Information (PII)
- Direct Identifiers: Name, Social Security Number, address, etc.
- Indirect Identifiers: ZIP code, birthdate, education, race, etc., which could be used in combination to identify an individual.
Protected Health Information (PHI)
- Data governed by HIPAA, including medical history, diagnoses, and treatment details.
FERPA-Protected Data
- Student records, grades, and schedules.
Other Sensitive Information
- Opinions on sensitive topics, sexual behavior, mental health information, criminal or illegal behavior, or any data that could lead to stigma or discrimination.

Information in a dataset can also be sensitive human participant data if linking it with outside information, from sources such as social media, administrative data, or other public datasets, would identify an individual.
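
One rough way to gauge the risk from indirect identifiers used in combination is to count how many records share each combination and flag the rare ones. The sketch below assumes a simple list of dictionaries. The function name `small_groups`, the threshold `k`, and the sample values are hypothetical, and passing this check does not by itself show that a dataset is safe to share.

```python
from collections import Counter

def small_groups(records, quasi_identifiers, k):
    """Return combinations of indirect identifiers shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}

# Hypothetical sample records, for illustration only
records = [
    {"zip": "14260", "birth_year": 1990, "race": "A"},
    {"zip": "14260", "birth_year": 1990, "race": "A"},
    {"zip": "14214", "birth_year": 1985, "race": "B"},
]
risky = small_groups(records, ("zip", "birth_year", "race"), k=2)
print(risky)  # {('14214', 1985, 'B'): 1}
```

Any combination that appears here describes a very small group, so those fields are candidates for generalizing (for example, a birth year range instead of a birth year) or removing before the data is shared.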

Key Governing Body: UB's Institutional Review Board (IRB) is the definitive authority. All research involving human participants must be reviewed and approved by the IRB before any data collection begins.

Data Management Practices:

Informed Consent: Your consent process must transparently inform participants how their data will be stored, used, protected, and potentially shared in a de-identified format.
De-identification: This is the primary method for protecting participant privacy. Replace direct and indirect identifiers with participant codes and maintain a secure, encrypted key file that links codes to identifiers. This key file must be stored separately from the research data, with extremely limited access.
- HIPAA Safe Harbor Method for de-identification.
- NLM-Scrubber: Clinical text de-identification tool developed by the National Library of Medicine.
Secure Storage: Use only IRB-approved, UB-sanctioned storage for this Category 1 data:
- UBbox and Microsoft OneDrive: Approved for storing de-identified data and for collaboration within the research team. Use specific, person-by-person sharing and never public links.
- Secure Departmental Servers: Often required by the IRB for storing identifiable data or the de-identification key file.
Access Control: Employ the "Principle of Least Privilege." Only research team members listed on the IRB protocol should have access to the data.

Be aware that, in addition to the content of the data, the agreement made with participants in your IRB-approved consent process can also limit the extent to which human subjects data can be shared.

Data About Communities

Sensitive data can also describe groups or communities, where disclosure could cause collective harm even if individuals are not named. When dealing with data about communities you should ask yourself: How would public release of this data, even if de-identified, impact the community?

Types of Sensitive Community Data:

Group-Identifiable Data: Data about a small, easily identifiable group (e.g., residents of a specific housing complex, members of a small cultural group) where individuals could be re-identified through inference.
Culturally Sensitive Information: Data on the cultural practices, traditions, or sacred knowledge of a group, especially Indigenous communities.
Data with Potential for Stigmatization: Information that could lead to stereotyping or negative impacts on a community (e.g., mapping crime or disease outbreaks at a granular, neighborhood level).

Data Management Practices:

Community Engagement: When research focuses on a specific community, consider a community-based participatory research approach. Engaging with community leaders can provide crucial context and help establish trust and appropriate data handling protocols. For research with Indigenous peoples, principles like OCAP® (Ownership, Control, Access, and Possession) should be respected.
Data Aggregation: To prevent re-identification and stigmatization, aggregate data to a broader geographic or demographic level before sharing (e.g., report findings at the county level instead of by census tract).
Geographic Obfuscation: Avoid publishing maps that pinpoint sensitive locations or patterns within a small, identifiable community.
Ethical Review: Your IRB protocol should address the potential for group harm and the steps you will take to mitigate it.
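
The data aggregation practice above can be sketched as follows. The function name `aggregate_by_county`, the tract and county labels, and the counts are hypothetical. The optional `min_count` suppression of small totals is an added assumption, not something the guidance above prescribes.

```python
from collections import defaultdict

def aggregate_by_county(tract_counts, tract_to_county, min_count=None):
    """Sum tract-level counts up to the county level before sharing."""
    totals = defaultdict(int)
    for tract, count in tract_counts.items():
        totals[tract_to_county[tract]] += count
    if min_count is None:
        return dict(totals)
    # Assumption: totals below min_count are withheld (None) rather than reported
    return {county: (n if n >= min_count else None) for county, n in totals.items()}

# Hypothetical counts, for illustration only
tract_counts = {"T1": 3, "T2": 9, "T3": 2}
tract_to_county = {"T1": "County A", "T2": "County A", "T3": "County B"}
result = aggregate_by_county(tract_counts, tract_to_county, min_count=5)
print(result)  # {'County A': 12, 'County B': None}
```

Reporting only the county totals keeps the tract-level pattern out of the shared file, and the suppression step avoids publishing a count so small that it points to a handful of people.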

Data About Fauna & Flora

In ecological and environmental science, locational data about species can be highly sensitive. In your data documentation, consider recording whether the research had Institutional Animal Care and Use Committee (IACUC) approval or other animal research approval, or whether approval was not required for this type of research.

Types of Sensitive Fauna & Flora Data:

Locations of Threatened or Endangered Species: Precise GPS coordinates of nests, dens, or habitats.
Locations of Commercially or Recreationally Valuable Species: Areas with high concentrations of species subject to poaching (e.g., certain orchids, turtles) or over-harvesting (e.g., ginseng, fish spawning sites).

Reason for Sensitivity: Public disclosure could lead directly to poaching, illegal collection, or habitat destruction from excessive human traffic.

Key Governing Body: UB's Institutional Animal Care and Use Committee (IACUC) is the definitive authority.

Data Management Practices:

Geographic Obfuscation (Geo-masking): This is the primary protection method. Do not share raw GPS coordinates. Instead, generalize the data by:
- Rounding coordinates to a lower degree of precision.
- Using broader locational information (e.g., "Allegany State Park" instead of a GPS point).
- Releasing data points within a larger grid to mask the exact location.
Data Use Agreements (DUAs): When sharing precise data with trusted partners (e.g., NYS Department of Environmental Conservation, U.S. Fish and Wildlife Service), use a formal DUA that legally binds the recipient to protect the data.
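
Two of the geo-masking options listed above, rounding coordinates and releasing points within a larger grid, can be sketched as below. The function names, the default precision and cell size, and the sample point are hypothetical. The right level of generalization depends on the species and the setting, so treat these values as placeholders.

```python
import math

def round_coordinates(lat, lon, decimals=1):
    """Reduce precision; one decimal place is roughly 11 km of latitude."""
    return round(lat, decimals), round(lon, decimals)

def snap_to_grid(lat, lon, cell_degrees=0.5):
    """Return the south-west corner of the grid cell that contains the point."""
    return (
        math.floor(lat / cell_degrees) * cell_degrees,
        math.floor(lon / cell_degrees) * cell_degrees,
    )

# Arbitrary sample point, not a real observation site
lat, lon = 42.12345, -78.76543
print(round_coordinates(lat, lon))  # (42.1, -78.8)
print(snap_to_grid(lat, lon))       # (42.0, -79.0)
```

Snapping to a grid makes every point in a cell report the same location, so the shared file shows that a species occurs in the cell without revealing where.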

Federal Data Sharing Mandates with Sensitive Data

Recent federal policies require federally funded researchers to share their data at the time of publication, which can pose problems for researchers working with sensitive data.

How do you comply with a mandate to share when your data is sensitive?

As Open as Possible, As Closed as Necessary. Federal funders recognize that not all data can be made fully public, because in some circumstances disclosure puts people, communities, or species at risk. For this reason, make decisions about your data when you begin your research: if the data cannot be shared, say so when you write your data management and sharing plan.

Data Management Practices:

Develop a Robust Data Management and Sharing Plan (DMSP): At the grant proposal stage, you must create a DMSP that prospectively outlines how you will manage sensitive data.
Justify Restrictions: Your DMSP is the place to provide a legitimate justification for limiting access to your data. Valid reasons include:
- Participant consent does not permit public sharing.
- The data cannot be sufficiently de-identified without compromising its scientific utility.
- Sharing could cause harm to a community or endangered species.
- There are legal or contractual restrictions on sharing the data.
Use Controlled-Access Repositories: The preferred method for sharing sensitive human data is through a controlled-access repository (e.g., NIH's dbGaP). These repositories do not make data public. Instead:
- You deposit your de-identified dataset.
- Other researchers must apply for access.
- A Data Access Committee reviews the request to ensure it is for legitimate scientific purposes.
- Approved researchers must sign a DUA, legally binding them to protect the data.
Design Forward-Thinking Consent Forms: Work with the IRB to craft consent language that informs participants their de-identified data may be shared for future research through a controlled-access repository.

Approved Storage for Sensitive Data at UB

Where you store sensitive data is your most important decision. Using unapproved services can lead to a data breach and severe penalties.

Storage Solution	Approved for Category 1 Data?	Key Security Practices
UBbox	Yes	Use specific, person-by-person sharing. NEVER use the "Anyone with the link" option for sensitive files. Regularly audit who has access to your folders. UB also has a UBbox Sensitive Data Storage Service.
Microsoft OneDrive (UB Account)	NO	OneDrive is not approved for storing any Category 1 UB data. It does not fall under the university's security agreement. OneDrive in conjunction with SharePoint security configurations can be used with Category 2 data.
Secure Departmental Servers / UBIT-Managed Servers	Yes	These are often configured specifically for research involving sensitive data (e.g., HIPAA-aligned servers). Consult with your departmental IT staff or UBIT.
Personal Computers / Laptops	Only if mandated security measures are in place.	Your entire device hard drive must be encrypted using BitLocker (Windows) or FileVault (macOS). The device must have up-to-date security software.
External Hard Drives / USB Drives	Only if the device is fully encrypted.	Use hardware-encrypted drives or software encryption like BitLocker To Go. Unencrypted portable media are a major source of data breaches and should never be used for Category 1 data.
Personal Cloud Storage (e.g., personal Google Drive, Dropbox)	NO	These services are not approved for storing any Category 1 UB data. They do not fall under the university's security agreements.
Email	NO	Email is not a secure storage or transfer mechanism for sensitive data. Do not send or store Category 1 data in your email account.
