Aaron N. Brooks, Ph.D.
aaron.neil.brooks@gmail.com | www.aaronbrooks.info | scalefreegan | aaron-n-brooks | Google Scholar
Scientist and leader with a track record of transforming data into insight using machine learning. I build tools, processes and teams to
structure, integrate and distill biological data into formats for stakeholders to make effective decisions.
Python, R
scikit-learn, PyTorch, Hydra
DNA/RNA-Seq, multi-omics, genome assembly
AWS, GCP
SQL, NoSQL (ArangoDB, MongoDB), Neo4j
Dash, Django, Shiny
git, Docker
Snakemake, Nextflow
Jira, Confluence
German (A2), Spanish
Boston, Massachusetts (Remote)
Jan. 2022 - present
• Leading a team that innovates approaches to curation, characterization and communication of information about best-in-class genetic parts.
Outcome: Curated 229 DNA parts across 6 organisms yielding part sets with tunable expression and up to 30-fold increases in expression levels.
• Founded and developed a high-performing team to meet increasing business demands. Developed and implemented Agile processes and a
support service architecture (Jira) to execute on external requests and internal projects effectively.
• Prototyped AI-assisted knowledge management systems, including retrieval-augmented generation (RAG) and other embedding approaches.
• Predicted DNA synthesizability using machine learning (gradient boosting, random forest, logistic regression).
• Fine-tuned a DNA foundation model (HyenaDNA) for multiple, application-specific tasks, including regression and classification.
Boulder, Colorado (Remote)
Dec. 2020 - Dec. 2022
• Developed statistical and analytical software for interpretation of highly-multiplexed selection experiments with CRISPR-engineered cell
libraries. Outcome: 3 publications and 3 patent applications.
• Designed and implemented an interactive web service for submitting deep mutational scanning data for analysis and visualization (Dash).
• Supported interpretation of pooled selection experiments for external customers. Distilled complex data into actionable insights.
• Designed DNA libraries (promoter insertion) leveraging a pre-trained convolutional neural network (CNN).
• Led implementation of ETL processes for generation and querying of Knowledge Graphs.
Heidelberg, Germany
Jun. 2015 - Nov. 2020
• Established a synthetic biology research subgroup, directed day-to-day research activities and secured funding. Outcome: 1.4M Euro funding,
4 publications in top journals, including Science and Cell, and 1 patent application.
• Collected and analyzed hundreds of millions of Nanopore direct RNA sequencing reads on more than 60 highly-rearranged synthetic yeast
genomes.
• Wrote a semi-automated analysis pipeline (Snakemake) to perform all steps in a sequencing workflow on HPC infrastructure, from basecalling
to transcript quantification. Used this pipeline to process terabytes of sequencing data.
• Applied machine learning (gradient boosting) to disentangle multiple factors influencing transcript start and end sites in S. cerevisiae.
Cell
2023