Experienced data scientist and software engineer with 12 years spent designing, conducting, and
sharing results of complex research. Worked with faculty, staff, and students from dozens of top
universities to achieve excellence in the preparation and performance of research-related tasks,
including data management, hypothesis formulation, and both mathematical and statistical modeling.
Possesses a diverse set of software skills, including DevOps, full-stack app development,
performance benchmarking, ETL, and cloud deployments. Enjoys hosting workshops and hackathons to
train users in state-of-the-art technologies and best practices. Familiar with a wide range of
data formats, with a particular focus on migrating local file standards (e.g., binary, CSV, JPG)
to cloud-performant ones (HDF5, Zarr; sketched below). Knowledgeable about various mathematical and
statistical applications, including high-dimensional time series analysis, nonlinear dynamics, and
machine learning.
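A minimal sketch of this kind of format migration, assuming a hypothetical 2-D CSV recording; the file names, shape assumptions, and chunk sizes are illustrative, not from any specific project:

```python
# Minimal sketch: rewrite a local CSV recording as a chunked Zarr array.
# File names, shape assumptions, and chunk sizes are hypothetical.
import numpy as np
import zarr

data = np.loadtxt("recording.csv", delimiter=",")  # assumes a 2-D (samples x channels) table

z = zarr.open(
    "recording.zarr",
    mode="w",
    shape=data.shape,
    chunks=(10_000, data.shape[1]),  # chunked layout enables efficient cloud reads
    dtype=data.dtype,
)
z[:] = data  # Zarr compresses each chunk on write (Blosc by default)
```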
Relevant coursework: machine learning, Bayesian statistics, network science, stochastic analysis, time series analysis, partial differential equations, nonlinear dynamics
Relevant coursework: calculus, linear algebra, probability, statistics, discrete mathematics & combinatorics, numerical analysis, real analysis, mathematical physics, algorithms & data structures, cryptography, machine learning, symbolic logic, cognitive psychology, neurophysiology, advanced neuroscience, philosophy of mind, philosophy of science
Developed and maintained 5 open-source software repositories, ensuring their automated testing
suites, documentation, tutorials, and demos remained functional and up to date.
Created and maintained 12+ data processing pipelines for neuroscience labs, enabling their data
to flow seamlessly from acquisition to sharing.
Personally curated and submitted a total of 256 TB of high-value datasets to NIH data archives on
behalf of various research groups.
Managed the company's cloud resources on Amazon Web Services (AWS), including storage, compute,
and Identity and Access Management (IAM).
Supported the research community by providing technical assistance and resolving issues and
feature requests in a timely manner.
Facilitated user education across various platforms by running sessions at multiple conferences
and workshops, increasing user adoption and effective system utilization.
All software supported terabyte-scale data management, analysis, and visualization for the field
of neurophysiology.
Reconciled the computational properties of biologically realistic neural networks with artificial
machine learning models (such as those used in computer vision) through mathematical theory and
stochastic biophysical simulations; results were communicated through 3 journal publications and
a presentation at the high-impact COSYNE conference (top 4% of abstracts accepted).
Collaborated with several top experimentalists in neuroscience as a trainee in the NeuroNex
program, which focused on understanding how neural function emerges from underlying structure.
Ran 4 tutorial sections for 112 students, reinforcing course concepts and achieving a 96% satisfaction rate. Graded homework assignments and exams. Delivered main lectures in the professor's absence as needed.
Analyzed geological data from water samples in conjunction with the Indiana Department of Environmental Management to issue compliance permits under Indianapolis regulatory standards.
Reviewed and corrected over 300 pages of "An Idiot's Guide to Algebra II".
Explored the statistical effects of algorithmic curation (the use of automated filtering mechanisms in the delivery and display of information) by measuring properties of simulated social network models; presented results at the MIDSURE conference.
Examined information diffusion through large-scale simulations of G-protein signaling mechanisms on a high-performance computing (HPC) cluster, then presented results at an undergraduate symposium.
Developed an intuitive user interface for file management using interactive validation and
real-time suggestions, streamlining the process for data submission to NIH archives.
Built a robust testing suite involving multiple levels of integration, with user interactions
emulated using Puppeteer (sketched below), to enhance reliability and long-term maintainability.
Tracked and documented dozens of hands-on user tests to refine the user experience.
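A minimal sketch of one emulated user-interaction test. The original suite used Puppeteer (Node.js); to keep all examples in one language, this sketch uses pyppeteer, a Python port, and the dev-server URL and element IDs are hypothetical:

```python
# Minimal sketch of an emulated user-interaction test via pyppeteer.
# The URL and element selectors are hypothetical illustrations.
import asyncio
from pyppeteer import launch

async def test_file_submission_form():
    browser = await launch(headless=True)
    try:
        page = await browser.newPage()
        await page.goto("http://localhost:3000/submit")       # hypothetical dev server
        await page.type("#dataset-name", "example-session")   # hypothetical field ID
        await page.click("#validate-button")                  # hypothetical button ID
        await page.waitForSelector("#validation-success")     # assumed success marker
    finally:
        await browser.close()

asyncio.run(test_file_submission_form())
```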
Led the development of an automated data conversion tool capable of reading more than 40 distinct
data formats produced by neurophysiology experiment devices and automatically writing them to the
Neurodata Without Borders (NWB) standard.
Designed universal APIs that transparently handled each layer of complexity, simplifying the tasks
of tagging, grouping, metadata transcription, temporal alignment, asset linking, buffering,
chunking, and compression (see the sketch after this entry).
Implemented a distributed cloud deployment system to run large-scale, off-site, batched conversions
through Amazon Web Services (AWS).
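A minimal sketch of writing one converted recording to NWB with chunked, compressed storage; the data, metadata, and file names are illustrative stand-ins, and the real tool wrapped these steps behind the universal APIs described above:

```python
# Minimal sketch: write a converted recording to the NWB standard with
# chunking and compression. All data and metadata here are hypothetical.
from datetime import datetime, timezone
from uuid import uuid4

import numpy as np
from hdmf.backends.hdf5 import H5DataIO
from pynwb import NWBFile, NWBHDF5IO, TimeSeries

nwbfile = NWBFile(
    session_description="example converted session",  # hypothetical metadata
    identifier=str(uuid4()),
    session_start_time=datetime.now(timezone.utc),
)

raw = np.random.randn(30_000, 8)  # stand-in for data read from a source format
nwbfile.add_acquisition(
    TimeSeries(
        name="raw_voltage",
        data=H5DataIO(raw, chunks=(10_000, 8), compression="gzip"),  # chunked + compressed
        unit="volts",
        rate=30_000.0,  # sampling rate in Hz
    )
)

with NWBHDF5IO("session.nwb", mode="w") as io:
    io.write(nwbfile)
```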
Created a command-line tool, used by the NIH data archive to validate all data uploads, which
provides automated suggestions for metadata improvements that enhance data findability and
reuse.
Mirrored the design, style, and functionality of linting tools such as flake8, pydocstyle, and ruff.
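A minimal sketch of that linter-style interface; the rule code, message text, and JSON metadata layout are hypothetical illustrations rather than the actual tool's rules:

```python
# Minimal sketch of a flake8-style metadata validator CLI.
# Rule codes, messages, and the JSON layout are hypothetical.
import argparse
import json
import sys

def check_metadata(path):
    """Yield (line, code, message) findings in linting-tool fashion."""
    with open(path) as f:
        metadata = json.load(f)
    if not metadata.get("description"):
        yield 1, "META001", "missing 'description'; add one to improve findability"

def main():
    parser = argparse.ArgumentParser(description="Validate upload metadata.")
    parser.add_argument("paths", nargs="+", help="metadata files to validate")
    args = parser.parse_args()

    exit_code = 0
    for path in args.paths:
        for line, code, message in check_metadata(path):
            print(f"{path}:{line}: {code} {message}")  # flake8-style report line
            exit_code = 1
    sys.exit(exit_code)

if __name__ == "__main__":
    main()
```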