Our newly formed entrepreneurial team at Genentech (San Francisco) uses machine learning and visualizations to help scientists understand and solve cancer.

Past Work

In the past, I taught and developed machine learning course materials for an 8-week Data Science Fellowship and other educational programs at the Data Incubator.

I also led a small team at Method of Moments, which provided statistical analysis and consulting services to medical and other researchers in the Longwood Medical Area, Boston, a top research hub.

Past Research

I have a background in applied statistics, specifically in the field of public health. Until March 2015, I applied statistical methods to the design and analysis of clinical trials at Harvard School of Public Health.

There, I worked with Dr. Heather Ribaudo, Director of Biostatistics and Programming, on the design of REPRIEVE, the largest study ever in our network. The design work included simulation-based power analysis (in SAS) of Cox survival models in the presence of "competing risks." We also developed statistical and data monitoring plans with our collaborators at Massachusetts General Hospital and elsewhere. At the same time, I was involved in three other longitudinal studies, for which I combined and analyzed datasets to produce graphical summaries and reports for our investigators and the public at large.

While doing my master's at the University of Michigan, I modeled the hormonal response of cortisol using functional GLM (in R) with Prof. Brisa Sánchez. Instead of using a summary statistic such as the slope of the response or the area under the curve, we wanted to incorporate the entire functional form of the hormone's response in our analyses. In addition, I compared various Phase I clinical trial dose-escalation methods (a "very small data" problem) by running intensive simulations (also in R) with Prof. Tom Braun.

Teaching Experience

At The Data Incubator, I developed machine learning course materials for our fellowship (PhDs) and corporate programs. Two highlights:

  • I wrote a teaching module on anomaly detection using Isolation Forests and One-class SVMs [Python]
  • I taught 3-day machine learning classes to two Fortune 100 analytics teams [R, Python]

In graduate school (University of Michigan), I taught sections (labs), held office hours, graded papers, and directed review sessions for the following courses, and sometimes developed course materials and taught lectures.

    Biostatistics 503: Introduction to Biostatistics

    with Prof. Roderick J.A. Little
    University of Michigan - Fall 2012

    Epidemiology 743: Applied Linear Regression (SPSS)

    with Prof. Brenda Gillespie
    University of Michigan - Graduate Summer Session 2012

    Biostatistics 513: Applied Regression in Public Health (SPSS)

    with Prof. Brisa N. Sánchez
    University of Michigan - Spring 2012

    Biostatistics 553: Applied Biostatistics

    with Prof. Bhramar Mukherjee
    University of Michigan - Fall 2011

    Education


    M.S. Biostatistics (April 2013)

    University of Michigan School of Public Health, Ann Arbor, MI
    Coursework in statistical inference, linear and mixed models, machine learning, Bayesian statistics, survival analysis, algorithms (statistical computing), and other methods.

    B.S. Mathematics (May 2008)

    Summa cum laude, DeSales University, Center Valley, PA
    Minors: Biology, Chemistry
    Besides courses in my major/minors, I completed all pre-med requirements, and also took higher-level classes in philosophy (Plato's Republic, Aquinas), history (Ancient Greece, Middle Ages), and literature (20th century lit, Enlightenment lit).

    Skills + toolbox

    I started coding in HTML in seventh grade, and haven't looked back (ask me for a keyboard shortcut!)

    Venti: R, Emacs(+ESS)
    Grande: Python, scikit-learn, SQL, Unix, git, Tableau, Vi, SPSS, SAS, HTML/CSS
    Tall: C++, Spark, MapReduce, Stata, D3, Lisp