Hi, my name is

James Baker

PhD Candidate & Researcher

I am a PhD candidate and software developer in human genetics. Welcome to my portfolio.

About Me

I am a PhD student in the Human Genetics Program at Vanderbilt University. My research focuses on how we can leverage distant relatedness in large biobanks linked to electronic health data to explore the genetic causes of disease. To investigate this, I design workflows and software, mostly in Python and R, for both local HPC environments and the cloud using Terra.bio. I am also interested in using modern high-performance data science packages such as Polars and DuckDB to scale genomic workflows to large biobanks such as BioVU, All of Us, and UK Biobank.

Skills

Here are some of the technologies I work with:
  • Python
  • R
  • Pandas
  • Polars
  • DuckDB
  • ggplot
  • SQL
  • Docker
  • Git & GitHub
  • REGENIE
  • SAIGE
  • PLINK
  • BCFtools

Software Projects

DRIVE
Distant Relatedness for Identification and Variant Evaluation. A tool leveraging graph theory and identity-by-descent (IBD) in biobanks to identify networks of related individuals and evaluate genetic variants.
Contributions:
  • Conceptual design and programmatic implementation
  • Optimized memory usage and performance for large-scale biobank data using packages such as DuckDB and pandas
  • Wrote comprehensive documentation, available on both GitHub and Read the Docs
  • Wrote integration tests and set up continuous integration through GitHub Actions
  • Added performance tests and benchmarks using pytest
  • Simulated IBD data for the integration tests and examples
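The core idea of grouping related individuals can be illustrated with a small sketch. This is not DRIVE's actual implementation or clustering method: it assumes a hypothetical minimal segment schema (`IbdSegment`) and simply treats every IBD segment overlapping a locus as an edge, then returns the connected components of that graph using a union-find structure.

```python
from collections import defaultdict
from typing import NamedTuple


class IbdSegment(NamedTuple):
    """One pairwise IBD segment (hypothetical minimal schema, not DRIVE's input format)."""
    id1: str
    id2: str
    chrom: str
    start: int  # segment start position (bp)
    end: int    # segment end position (bp)


def find_networks(segments, chrom, position):
    """Return networks of related individuals as the connected components of
    the graph whose edges are IBD segments spanning chrom:position."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for seg in segments:
        # Keep only segments that overlap the locus of interest.
        if seg.chrom == chrom and seg.start <= position <= seg.end:
            union(seg.id1, seg.id2)

    groups = defaultdict(set)
    for person in parent:
        groups[find(person)].add(person)
    # Largest networks first.
    return sorted(groups.values(), key=len, reverse=True)


segments = [
    IbdSegment("A", "B", "1", 1_000, 5_000),
    IbdSegment("B", "C", "1", 2_000, 6_000),
    IbdSegment("D", "E", "1", 3_000, 4_000),
    IbdSegment("F", "G", "1", 7_000, 9_000),  # does not span position 3,500
]
networks = find_networks(segments, "1", 3_500)
print(networks)
```

Here A, B, and C form one network through the shared individual B, while F and G are excluded because their segment does not cover the locus. Real biobank data would call for a graph library and more careful clustering than plain connected components.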
IBDMap
A computational workflow for identity-by-descent (IBD) mapping. IBDMap identifies genomic regions associated with disease by detecting excess IBD segment sharing among affected individuals in biobank-scale datasets.
Contributions:
  • Restructured and refactored the Python IBDreduce code to use the PDM build system for reproducible installs
  • Containerized the application using a multi-stage Docker build
  • Tested the build and install pipelines for both the C++ and Python code
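To make "excess IBD sharing among affected individuals" concrete, here is a minimal sketch of the comparison at a single locus. It is not IBDMap's statistic: the function name and inputs are hypothetical, and it only computes the fraction of case-case pairs versus case-control pairs that share an IBD segment. A real analysis would also need a significance test, for example against a null distribution built by permuting case labels.

```python
from itertools import combinations


def ibd_sharing_rates(sharing_pairs, status):
    """Compare IBD sharing at one locus between case-case and case-control pairs.

    sharing_pairs: set of frozensets {id1, id2} that share an IBD segment at the locus.
    status: dict mapping individual ID -> True (case) or False (control).
    Returns (case-case sharing rate, case-control sharing rate).
    """
    cases = [i for i, is_case in status.items() if is_case]
    controls = [i for i, is_case in status.items() if not is_case]

    case_case = [frozenset(p) for p in combinations(cases, 2)]
    case_control = [frozenset((a, b)) for a in cases for b in controls]

    def rate(pairs):
        # Fraction of the given pairs that share IBD at this locus.
        if not pairs:
            return 0.0
        return sum(p in sharing_pairs for p in pairs) / len(pairs)

    return rate(case_case), rate(case_control)


status = {"A": True, "B": True, "C": True, "D": False, "E": False}
sharing = {frozenset({"A", "B"}), frozenset({"B", "C"}), frozenset({"A", "D"})}
cc_rate, cx_rate = ibd_sharing_rates(sharing, status)
print(cc_rate, cx_rate)
```

In this toy example two of the three case-case pairs share IBD at the locus but only one of the six case-control pairs does, which is the kind of enrichment an IBD mapping scan looks for across the genome.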
COMPADRE
Combined pedigree-aware distant relatedness estimation for improved pedigree reconstruction. This tool leverages genome-wide IBD sharing to accurately reconstruct families even with sparse or ungenotyped individuals.
Contributions:
  • Tested the conda and Docker installations
  • Refactored communication between the Perl frontend and the Python server to improve error handling and the user experience
Pubmed RAG
A command-line TUI that lets users search for abstracts related to a topic of interest. Abstracts are retrieved from PubMed, embedded, and stored in a vector database. A vector search then finds the most relevant abstracts, and an LLM summarizes them and returns the results. This project uses common technologies such as FastAPI, LangChain, Hugging Face (LLMs), Qdrant (vector store), and Bubble Tea (TUI).
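The retrieval step can be illustrated with a brute-force cosine-similarity search. This is only a sketch: the project itself uses Qdrant as the vector store, while the in-memory index, the 3-dimensional "embeddings", and the `pmid_*` keys below are made up for illustration. The top-ranked abstracts would then be passed to the LLM for summarization.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=2):
    """Brute-force nearest-neighbour search: return the IDs of the k documents
    whose embeddings are most similar to the query embedding."""
    scored = sorted(index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]


# Toy embeddings keyed by hypothetical PubMed IDs.
index = {
    "pmid_1": [0.9, 0.1, 0.0],
    "pmid_2": [0.8, 0.2, 0.1],
    "pmid_3": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.0, 0.0], index, k=2))  # ['pmid_1', 'pmid_2']
```

A dedicated vector store replaces this linear scan with an approximate nearest-neighbour index, which is what keeps search fast once thousands of abstracts are embedded.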

Experience

Aug 2019 - Present
Graduate Student Researcher
College of Basic Sciences - Vanderbilt University
Mentors: David Samuels, PhD & Jennifer Below, PhD
Projects:
  • DRIVE: Python CLI tool leveraging graph theory and identity-by-descent (IBD) in biobanks to identify networks of related individuals
  • IBDMap: A C++ and Python tool that tests genome-wide for enrichment of IBD sharing among case-case pairs compared with case-control pairs in a biobank
  • COMPADRE: Perl and Python tool for pedigree reconstruction and identification of the maximum unrelated set
Aug 2018 - Aug 2019
Undergraduate Student Researcher
Department of Chemistry - North Carolina State University
Mentor: Christian Melander, PhD
Projects:
  • Meridianin D Analogues Display Antibiofilm Activity against MRSA and Increase Colistin Efficacy in Gram-Negative Bacteria

Education

Aug 2019 - Present
PhD in Human Genetics
Vanderbilt University
Aug 2014 - May 2018
B.S. in Chemistry
North Carolina State University
Aug 2016 - May 2019
B.S. in Biochemistry
North Carolina State University

Publications

Common variant approaches to study Mendelian disease gene function identify novel phenome and pathways associated with PLOD3
Alexandra Scalici, James T Baker, Freida Blostein, Megan Shuey, Dharmendra Choudhary, Ela W Knapik, David C Samuels, Jennifer E Below, Lisa Bastarache, Tyne W Miller-Fleming, Nancy J Cox
medRxiv (2025)
Genome sequencing of 35,024 predominantly African ancestry persons addresses gaps in genomics and healthcare
Cecile Avery, Mojgan Babanejad, James Baker, Xavier Bledsoe, Freida Blostein, Robert W Corty, Kimberlyn Ellis, Adriana M Hung, Allison Lake, John Shelley, Quanhu Sheng, Vanderbilt University Medical Center and Alliance for Genomic Discovery Investigators, Melinda Aldrich, Melissa Basford, Lisa Bastarache, Jennifer Below, Alexander G Bick, Peter Embi, QiPing Feng, Eric Gamazon, Lide Han, Jibril Hirbo, Kayla Marginean, Jonathan Mosley, Jill Pulley, Dan M. Roden, Douglas M Ruderfer, Megan Shuey, Yu Shyr, C Michael Stein, Colin Walsh, Consuelo Wilkins
medRxiv (2025)
COMPADRE: Combined pedigree-aware distant relatedness estimation for improved pedigree reconstruction
Evans GF, Baker JT, Petty LE, Petty AS, Polikowsky HG, Bohlender RJ, Chen HH, Chou CY, Viljoen KZ, Beilby JM, Kraft SJ, Zhu W, Landman JM, Morrow AR, Bian D, Scartozzi AC, Huff CD, Below JE
American Journal of Human Genetics (2025)
Genetic study of von Willebrand factor antigen levels ≤ 50 IU/dL identifies variants associated with increased risk of VWD and bleeding
Friedman RK, Heath AS, Huffman JE, Baker JT, Hasbani NR, Gagliano Taliun SA, Chen MH, Howard TE, Lewis JP, Pankratz N, Patil S, Reiner AP, Thibord F, Yanek LR, Yao J, Chen HH, Curran JE, Faraday N, Guo X, Wheeler MM, Ryan KA, Zhou X, Cho K, Almasy L, Auer PL, Becker LC, Wilson PWF, Boerwinkle E, O'Connell JR, Rich SS, Samuels DC; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium; TOPMed Hematology & Hemostasis Working Group; VA Million Veteran Program; Blangero J, Fornage M, Kooperberg C, Mathias RA, Mitchell BD, Rotter JI, Johnson AD, Smith NL, Coban-Akdemir ZH, Below JE, Morrison AC, Johnsen JM, de Vries PS
Journal of Thrombosis and Haemostasis
Gene and phenome-based analysis of the shared genetic architecture of eye diseases
Scalici A, Miller-Fleming TW, Shuey MM, Baker JT, Betti M, Hirbo J, Knapik EW, Cox NJ
American Journal of Human Genetics (2025)
Detection of distant relatedness in biobanks to identify undiagnosed cases of Mendelian disease as applied to Long QT syndrome
Lancaster MC, Chen HH, Shoemaker MB, Fleming MR, Strickland TL, Baker JT, Evans GF, Polikowsky HG, Samuels DC, Huff CD, Roden DM, Below JE
Nature Communications (2025)
Machine-learning based classification of Frontotemporal dementia in electronic health records for genetic discovery
Below, J., Shaw, D., Evans, G., Baker, J., Bohlender, R., Petty, A., Petty, L., Roshani, R., Lifferth, J., Bastarache, L., Naj, A., Bush, W., Darby, R., McMillan, C., Samuels, D., Huff, C
Alzheimer’s & Dementia (2023)
2-Aminobenzimidazoles as antibiofilm agents against Salmonella enterica serovar Typhimurium
Huggins, W. M., Vu Nguyen, T., Hahn, N. A., Baker, J.T., Kuo, L. G., Kaur, D., Melander, R. J., Gunn, J. S., & Melander, C
MedChemComm (2018)
Meridianin D analogues display antibiofilm activity against MRSA and increase colistin efficacy in Gram-negative bacteria
Huggins, W. M., Barker, W. T., Baker, J.T., Hahn, N. A., Melander, R. J., & Melander, C
ACS Medicinal Chemistry Letters

Get in Touch

Feel free to reach out directly via email: