This page explains the principles involved in securing IRB
permission for data sharing. If you already have IRB clearance and are
ready to contribute your data to TalkBank (CHILDES, AphasiaBank,
SLABank, etc.), you should follow these instructions
on how to actually submit your data.
1. IRB Principles
TalkBank members who are interested in
contributing their data need to make sure that they obtain IRB approval
for their study, along with informed consent from individual
participants. There are no standard forms for IRB applications, since
every university or institute creates their own forms, procedures, and
templates. For the purposes of contributing to TalkBank, the important
thing is to select the appropriate level of access to the data that
participants are being asked to grant. To help you determine this, we
have created an OPTIONS summary for the 9
options that are available. We would recommend that you ask
participants to permit unrestricted access with pseudonymization of the
transcripts (Options 1 and 2). You should include on your form the fact
that participants always have the right to request that parts or all of
the data in which they participate be removed from TalkBank at any time.
2. Contributions of Archival Data
Often researchers will wish to contribute data collected in projects that have
already been completed. In such cases, it may be difficult or impossible to
contact participants to obtain a new consent form. However, IRBs are allowed
to permit including these data in TalkBank, if certain conditions are met.
The original consent forms should not have exclusionary language such
as "These data will only be made available to Professor XYZ and her laboratory".
If the consent forms says something like "These data will only be made available to
qualified researchers," then inclusion in TalkBank should be allowed, as long as
only qualified researchers are given the necessary password. If the consent form
is still more general, then passwords may not be necessary.
It is important to emphasize that granting agencies stipulate that data collected with federal funds
should be made available to researchers, as long as anonymity is preserved.
3. GDPR Compliance
The General Data Protection Regulation (GDPR) establishes rules for personal data
on the web. The EU web site for GDPR issues is https://gdpr-info.eu/.
In regards to TalkBank, there are five core GDPR issues
Commercial purposes issue:
GDPR is designed to apply to data transferred for commercial purposes.
TalkBank has no commercial purposes. However, it could still apply if
TalkBank were to collect emails and addresses, which it does not do.
The scientific data issue:
A good summary of these issues can be found in
this Nature article which notes that, consent is given "to certain areas of
scientific research when in keeping with recognised ethical standards
for scientific research." Article 89 of the GDPR states that, "Where
personal data are processed for scientific or historical research
purposes or statistical purposes, Union or Member State law may provide
for derogations from the rights referred to in Articles 15, 16, 18 and
21 subject to the conditions and safeguards referred to in paragraph 1
of this Article in so far as such rights are likely to render impossible
or seriously impair the achievement of the specific purposes, and such
derogations are necessary for the fulfilment of those purposes." In other words,
data-sharing is allowed for research purposes. In addition,
Recital 113 allows for transfers of data from a limited number of
data subjects for scientific purposes for an increase of knowledge.
The informed consent Issue:
NIH IRB informed consent guidelines are in accord with the GDPR Consent
rules. Given this, if participants give consent for making data available to
qualified researchers, then this should be approved. GDPR emphasizes also that
this consent must be revocable and that there should be methods for allowing
participants to revoke consent.
The deidentification issue:
If data are deidentified, then they are not personal data and are not covered
by GDPR and IRB. Data are not be anonymous or deidentified if they have: name plus surname,
credit card, telephone, address, or number plate. First name alone is not
identifying. It has to be Name plus Surname. Anonymization must be
irreversible. This means that contributors should destroy participant
names. This holds in both EU and USA. However, the GDPR catch-22 here is that a
link to the data needs to be maintained to allow for data removal. The
solution for this is to make the information linking to a person only
available to a third party "honest broker". See below for a discussion of
identification based on voice samples.
The Code of Conduct issue:
Article 40 allows for development of a Code of Conduct to facilitate
data transfer to non-EU countries. In the case that an institution prefers
to have identifiable media stored on servers in the EU, it is possible to
implement CORS (cross origin resource sharing) from a CHAT file at CMU to
a media server in the EU. This is done by allowing access from https://*.talkbank.org.
4. Deidentification
In order to deidentify transcripts, it is important to replace any last names
with the word "Lastname" with a capital L. Also
addresses or local city names should be replaced with "Addressname" with a
capital A. Other forms include "Cityname", "Schoolname", "Hospitalname" and so on.
These same English words should be used even in other
languages. It is not crucial to replace children's first names unless
they are very unique.
The EU Amnesia project at https://amnesia.openaire.eu provides software for
deidentification of spreadsheet data.
For audio deidentfication, we can then use the
occurrences of the terms Lastname and Address in the transcripts to
guide the removal of the names and addresses from the corresponding
segment in the audio track. You can follow the suggestions in the section
of the CLAN manual on "Audio Anonymization" which are also available
here
Once this is done, children and others can
only be identified by people who already know them. Because of this,
contribution of audio is equivalent in IRB terms to contribution of
transcripts.
You can also save yourself a lot of trouble if you avoid using idetifying
information when making recordings.
Voiceprints
Researchers often ask about whether they
need to request additional IRB approval for contributing audio data.
The concern is that audio data may be less confidential than transcript
data. However, as long as identifying material is removed from both
transcripts and audio, they do not present additional confidentiality
issues.
Some reviewers and IRB committees believe that spoken data is identifiable
through voice recognition technology. However, this judgment is based on a confusion
between closed-set identification and open-set identification. Closed-set
identification relies on a pre-existing pool of voiceprints from a given group,
such as members of a company or subscribers to a service. Open-set identification
does not rely on this pre-existing pool of voiceprints. As noted by Togneri and
Pullella (2011), "in open-set identification the unknown individual can come from the
general population. However as identification is always carried out
against a finite, known pool of individuals it is not possible to
identify arbitrary people."
Togneri, R., & Pullella, D. (2011). An overview of speaker
identification: Accuracy and robustness issues. IEEE circuits and
systems magazine, 11(2), 23-61. pdf
As Yuan and Liberman (2008) discovered, speaker identification in even
a closed group of Supreme Court judges in TalkBank's SCOTUS corpus is still
very difficult.
Yuan, J., & Liberman, M. (2008). Speaker identification on the SCOTUS
corpus. Journal of the Acoustical Society of America, 123(5), 3878.
pdf
5. Contributions to CHILDES and PhonBank
Although each University and project will have different requirements, contributors
often ask for a generic contribution template form, so here is a
sample CHILDES/PhonBank consent form based
roughly on the local format at CMU.
6. Contributions to AphasiaBank/DementiaBank/TBIBank/RHDBank:
Research with subjects with
disabilities requires additional access restriction, such as password
protection. It may also require more complete IRB documentation. In
this regard, researchers working with the AphasiaBank protocol will find
these additional IRB-approved materials useful:
Contributions to the other three clinical databanks -- DementiaBank, RHDBank, and TBIBank
can follow formats similar those given above for AphasiaBank. The issues involved are generally
similar.
7. Contributions to FluencyBank
To protect subject confidentiality, all research
contributions to FluencyBank are restricted and require password to
access. We suggest that new projects use a graduated consent form
developed at the University of Maryland, that allows participants to
specify use of video, audio-only, or transcript-only in contributed
data.
When communicating with your IRB, you may find the suggestions in this
briefing sheet helpful.
For projects underway, or recently completed, or longitudinal
projects in which PIs would like to have an ongoing relationship before
making a contribution request of subjects, we have a
sample post-hoc consent form from the University of
Maryland.
For completed projects that have used video without permission to share the
video, we will work with
you to extract the audio tracks from your video files. (Please see Contributing
audio, above, for reasons why this may not require additional IRB
consideration). Please contact Brian MacWhinney or Nan Bernstein Ratner
to determine how best to handle your data.