Need access to de-identified medical record data for research?

BioVU is Vanderbilt's biorepository of DNA extracted from discarded blood collected during routine clinical testing and linked to de-identified medical records in the Synthetic Derivative. The goal of BioVU is to provide Vanderbilt investigators with a resource for studies of genotype-phenotype associations. Planning for BioVU began in mid-2004, and the first samples were collected in February 2007. Before DNA sample collection began, all aspects of the BioVU project were extensively tested. BioVU now accrues 500 to 1,000 samples per week and holds more than 225,000 DNA samples. Vanderbilt clinic patients may sign the BioVU Consent Form if they wish to donate their excess blood samples, or decline to sign it if they do not wish to participate.

Samples are scanned into a custom-developed sample acceptance program that applies automated exclusions based on specific criteria. Manual exclusions include poor blood sample quality, insufficient blood volume, and/or an unreadable label on the sample tube. Automated exclusions include opt-out, the absence of a signed form documenting notification of the program, duplicate samples not targeted for replenishment, and random exclusion. Once a sample passes these criteria, the program accepts it. Acceptance of a sample triggers the program to assign it a unique research ID, generated with the Secure Hash Algorithm SHA-512 (designed by the National Security Agency). SHA-512 produces a 512-bit value, written as a 128-character hexadecimal code, that serves as the research ID linking the DNA sample to the de-identified clinical data and the resulting genotype data. We have validated that the original medical record number (the input) cannot be regenerated from the research ID (the output).
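The acceptance flow described above can be sketched as follows. This is a minimal illustration, not BioVU's actual program: the `Sample` fields, the `accept_sample` and `research_id` helpers, and the random-exclusion rate are hypothetical. The page also does not say whether anything besides the medical record number (for example, a secret salt) is fed to the hash, so the sketch hashes the MRN alone. It relies on Python's standard `hashlib.sha512`, whose `hexdigest()` is 128 hexadecimal characters (512 bits).

```python
import hashlib
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    # Hypothetical fields; the defaults describe a sample that passes every check.
    mrn: str                            # medical record number
    good_quality: bool = True           # manual: blood sample quality acceptable
    sufficient_volume: bool = True      # manual: enough blood in the tube
    label_readable: bool = True         # manual: tube label can be read
    opted_out: bool = False             # automated: patient has opted out
    notification_on_file: bool = True   # automated: signed notification form exists
    is_duplicate: bool = False          # automated: patient already has a banked sample
    replenishment_target: bool = False  # automated: duplicate wanted to replenish stock


def research_id(mrn: str) -> str:
    """Return the SHA-512 digest of the MRN as 128 hex characters (512 bits)."""
    return hashlib.sha512(mrn.encode("utf-8")).hexdigest()


def accept_sample(
    s: Sample,
    random_exclusion_rate: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Apply the exclusion checks; return a research ID if accepted, else None."""
    # Manual exclusions (recorded by the person scanning the tube).
    if not (s.good_quality and s.sufficient_volume and s.label_readable):
        return None
    # Automated exclusions: opt-out or missing notification form.
    if s.opted_out or not s.notification_on_file:
        return None
    # Automated exclusion: duplicates unless targeted for replenishment.
    if s.is_duplicate and not s.replenishment_target:
        return None
    # Automated exclusion: random exclusion at an assumed rate.
    if rng is None:
        rng = random.Random()
    if rng.random() < random_exclusion_rate:
        return None
    # Acceptance triggers assignment of the research ID.
    return research_id(s.mrn)
```

Because SHA-512 is deterministic, the same MRN always maps to the same research ID, so the DNA sample and the clinical record for one patient can share a key without either store holding the MRN itself.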


With the help of Vanderbilt's bioinformatics expertise, a "mirror image" of the EMR, the Synthetic Derivative, was created. It contains records for over 2 million individual patients, with all available clinical information from more than the past ten years in searchable form. The Synthetic Derivative is scrubbed of HIPAA identifiers, with an error rate of ~0.01%. New clinical data are added to the database as they are created. Records in the Synthetic Derivative are labeled with the same 128-character identifier as the corresponding DNA sample, maintaining the link between the clinical data and the DNA.

DNA samples may be requested after a study proposal has been received and approved by the BioVU Review Committee and a user agreement has been signed. A record counter tool, which is available to all Vanderbilt investigators and does not require IRB approval, can be used to estimate the number of cases and controls. Searches can be conducted using billing and procedure codes, free-text searches of clinical notes (including histories, discharge summaries, laboratory reports, etc.), and laboratory results. Record counter searches can be restricted to estimate cohort size either across the entire Synthetic Derivative or only among records with associated DNA.
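A record-counter query of the kind described above can be sketched as follows. The record layout, the `Query` fields, and the `record_counter` function are hypothetical, not the tool's real interface. Real free-text search over clinical notes is considerably more involved than the case-insensitive substring match used here, and treating every non-matching record as a control is a simplification. Passing the set of research IDs that have DNA restricts the count to DNA-linked records, mirroring the filter option.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Record:
    research_id: str
    codes: set[str] = field(default_factory=set)          # billing/procedure codes
    notes: list[str] = field(default_factory=list)        # clinical note text
    labs: dict[str, float] = field(default_factory=dict)  # lab value by test name


@dataclass(frozen=True)
class Query:
    any_codes: frozenset[str] = frozenset()       # record must carry at least one code
    note_terms: tuple[str, ...] = ()              # each term must appear in some note
    lab_min: Optional[tuple[str, float]] = None   # (test name, minimum value)


def matches(rec: Record, q: Query) -> bool:
    """True if the record satisfies every criterion set in the query."""
    if q.any_codes and not (rec.codes & q.any_codes):
        return False
    lowered = [n.lower() for n in rec.notes]
    for term in q.note_terms:
        if not any(term.lower() in n for n in lowered):
            return False
    if q.lab_min is not None:
        test, threshold = q.lab_min
        value = rec.labs.get(test)
        if value is None or value < threshold:
            return False
    return True


def record_counter(
    records: Iterable[Record],
    case_query: Query,
    dna_ids: Optional[set[str]] = None,
) -> tuple[int, int]:
    """Return (cases, controls); if dna_ids is given, count only DNA-linked records."""
    pool = [r for r in records if dna_ids is None or r.research_id in dna_ids]
    cases = sum(1 for r in pool if matches(r, case_query))
    return cases, len(pool) - cases
```

For example, an illustrative type 2 diabetes case query might combine the ICD-9 code 250.00, the note term "diabetes", and an HbA1c of at least 6.5; calling `record_counter` with and without `dna_ids` then shows how many of those cases have DNA available.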

The initial evaluation of BioVU's feasibility and utility included an assessment of the sample-handling algorithms and a "demonstration project". The richness of the phenotypic data was examined in a sample set of 26,724 records at the outset of the program. These records contained 6,816 unique ICD-9 codes and a total of 261,953 ICD-9 codes, an average of about 10 (9.8) ICD-9 codes per patient. Natural language processing methods have been, and continue to be, developed to allow efficient generation of case and control cohorts.
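The richness measures above (unique codes, total codes, mean codes per patient) can be computed from a table of code assignments as sketched below. The input format, one (patient ID, ICD-9 code) pair per assignment, and the `code_richness` name are assumptions; the number of records is passed explicitly so that patients with no codes still count toward the mean.

```python
from typing import Iterable, Tuple


def code_richness(rows: Iterable[Tuple[str, str]], n_records: int) -> dict:
    """Summarize (patient_id, icd9_code) pairs, one pair per code assignment."""
    unique_codes = set()
    total_codes = 0
    for _patient_id, code in rows:
        unique_codes.add(code)
        total_codes += 1
    return {
        "records": n_records,
        "unique_codes": len(unique_codes),
        "total_codes": total_codes,
        # Mean over all records in the sample set, including code-less ones.
        "mean_per_patient": total_codes / n_records if n_records else 0.0,
    }
```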

BioVU served as a platform for Vanderbilt's successful application to join NHGRI's eMERGE (Electronic Medical Records and Genetics) Network. Vanderbilt is one of 9 nodes in the network that link electronic medical records with genetic information to perform a series of studies examining the association between genetics and phenotypes extracted from the electronic medical record.
