Sam Goodgame

Berkeley, CA

Hi, I'm Sam. I enjoy using data to empower people.


Head of Veteran Matching

I match transitioning military service members to tech jobs. I have the best job in the world: it's a hybrid of data science and operations. I'm responsible for designing data systems and writing algorithms to match veterans to jobs, and I'm also the client for those algorithms, which creates a virtuous circle.

I also serve as product manager for various data products that use machine learning and common sense to help Shift's users draw on large pools of data when making career decisions.

Shift is a tech startup in Berkeley, California. Its primary investors include Andreessen Horowitz, Structure Capital, and Tim Ferriss.

July 2017 - Present

Data Science Advisor

CommonLit is a literacy nonprofit growing at 500k users/month; I helped make it data-driven. My work included creating business intelligence workflows, teaching CommonLit staff SQL, conducting A/B testing, building data visualizations, automating analytics workflows, and building psychometric models in R. My favorite project was probably designing and implementing a complex randomized controlled trial (involving blocking and clustering) to successfully test the effectiveness of one of CommonLit's new features.

As my work at Shift ramped up, I needed to ramp down my involvement with CommonLit, but it's an awesome organization full of awesome people with whom I was fortunate to work.

September 2017 - April 2018

Data Architect

At GoldenKey, my role was to derive value from data. I was a hybrid data architect, A/B testing lead, product manager, business intelligence analyst, and data product builder.

I also interned for GoldenKey (then called SoloPro) for three months while I was still in the Army. I wrote about the military-to-startup internship model in Inc. The experience made me wonder why there wasn't a more formal program for work-trial fellowships for transitioning service members. Three years later, that's what Shift is building.

September 2016 - May 2017

Infantry Officer

U.S. Army

I built teams, led teams, planned operations, and executed operations in the 101st Airborne Division and The Old Guard. Here are some highlights:

  • Planned and led 55 combat missions in eastern Afghanistan
  • Engineered data collection system and statistical model that accurately predicted 800-soldier organization's top KPI six months into the future, allowing commanders to allocate resources and manpower more effectively
  • Innovated method to airlift 100 soldiers and 22 vehicles into combat with 95% fewer resources
  • Consolidated $57M of military equipment spread across Afghanistan with 100% accountability
  • U.S. Army Airborne Ranger qualified – Ranger School is the Army’s toughest course and premier leadership training

May 2011 - August 2016


University of California, Berkeley

Master of Information and Data Science

Focused on:

  • Natural Language Processing with Deep Learning: NLP, sentiment analysis, convolutional neural networks, language modeling, entity recognition, machine translation (TensorFlow)
  • Machine Learning at Scale: petabyte-scale ML pipelines in Python using MapReduce-style frameworks and cloud infrastructure (Spark, MrJob, Hadoop Streaming, Docker, AWS EMR, Altiscale)
  • Experiments and Causal Inference: establishing causal effects; practical experience designing and implementing a complex randomized controlled trial to test the effectiveness of a feature in the web app of a literacy nonprofit with 3 million users (R, SQL, Pandas)
  • Storing and Retrieving Data: ETL & data pipelines (AWS EC2 & S3, Google BigQuery, Hive, Storm, Spark, SQL, Python)
  • Machine Learning: classification, regression, clustering (Python/SciPy)
  • Visualization: interactive web-based visualizations (D3.js, Plotly Dash, Python, Tableau, Gephi, HTML/CSS/JS)
  • Statistics: descriptive & inferential, probability & Bayes; regression (R)

January 2017 - April 2018
Accelerated Course Load
(Because Who Needs Free Time?)

United States Military Academy at West Point

Bachelor of Science, International Relations

Majored in international relations, with a track in mechanical engineering and two years of Mandarin Chinese. Also suffered through lots of "fitness."

Grudgingly took lots of mandatory calculus/physics/chemistry/statistics courses; became thankful for them five years later.

July 2007 - May 2011

Georgetown University

Data Science Certificate

5-month in-person crash course on machine learning using Python.

January 2016 - May 2016

HBX | Harvard Business School

Credential of Readiness (CORe)

HBX CORe is essentially the first semester of an MBA program. I took three courses (business analytics, financial accounting, and economics for managers), delivered through HBS.

February 2015 - May 2016


Data Science Languages

Python
Primary language for scripting. Rapid prototyping for machine learning models, web applications, automating workflows, creating visualizations, and just about anything else I need it for. I generally prototype in Jupyter notebooks, then transfer work to proper modules.

SQL (PostgreSQL)
My first job out of the military involved trying to make business sense out of a messy OLTP database. I can turn just about any data question into a SQL query.

R
I generally use R over Python in two places. First, I'll use R when the majority of the task is standard/frequentist statistical analysis. For example, I find the analysis associated with conducting randomized controlled trials and A/B tests more straightforward in R than in Python. Second, I prefer R when the task at hand has better libraries in R than Python. One example is psychometric modeling; for some reason, all of the good psychometric/item response theory libraries (eRm, ltm, TAM, mirt) are in R.

Functional Areas
Machine Learning
I have broad experience with supervised/unsupervised methods across classification, regression, and clustering problem spaces. Where necessary, I'm comfortable chaining together different ML techniques to enrich the overall output.

For example, one recent workflow used a neural network to produce word embeddings, which I clustered with k-means to find a particular type of word. I used those words to train a custom named entity recognition system, ran that system on unseen text to find my custom entity, and fed the results into a Latent Dirichlet Allocation model that provided value to users.
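The first links of that chain can be sketched in miniature. This is a toy illustration under stated assumptions, not the original pipeline: the 2-D vectors in `EMBEDDINGS` are made up (real embeddings would come from a trained neural network), the clustering step is a plain Lloyd's-algorithm k-means, the trained NER model is swapped for a simple lexicon lookup (`tag_entities`), and the LDA stage is omitted. All names are hypothetical.

```python
import math
import random

# Hypothetical 2-D "word embeddings"; a real workflow would take these vectors
# from a trained neural network, typically with hundreds of dimensions.
EMBEDDINGS = {
    "sergeant": (0.90, 0.10),
    "lieutenant": (0.95, 0.05),
    "captain": (0.85, 0.15),
    "python": (0.10, 0.90),
    "sql": (0.05, 0.95),
    "spark": (0.15, 0.85),
}


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's-algorithm k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def entity_lexicon(embeddings, seed_word, k=2):
    """Cluster the word vectors and return every word in seed_word's cluster."""
    words = list(embeddings)
    labels = kmeans([embeddings[w] for w in words], k)
    target = labels[words.index(seed_word)]
    return {w for w, lab in zip(words, labels) if lab == target}


def tag_entities(text, lexicon):
    """Stand-in for a trained NER model: return tokens found in the lexicon."""
    tokens = (tok.strip(".,;:!?") for tok in text.lower().split())
    return [tok for tok in tokens if tok in lexicon]


ranks = entity_lexicon(EMBEDDINGS, "sergeant")
print(tag_entities("The captain and the sergeant both learned SQL.", ranks))
# → ['captain', 'sergeant']
```

Using the cluster directly as a lexicon keeps the sketch dependency-free; in a fuller version, the cluster members would more likely serve as seed labels for training a statistical NER model that can generalize to words the lexicon has never seen.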
Field Experiments & Causality
I have experience designing and implementing experiments for causal inference, ranging from simple UI optimization to complex designs with blocking, clustering, attrition, and spillover. I particularly enjoyed an RCT I implemented for CommonLit that successfully tested the effectiveness of a new feature. Allocating a feature to one population but not another creates obvious UX concerns; avoiding that poor UX introduced spillover and other experimental design issues that my team and I needed to mitigate through statistical trickery.
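One common way to contain the UX and spillover concerns described above is cluster randomization within blocks: whole groups of users (clusters) share one arm, and the split is balanced separately inside each stratum (block). The sketch below assumes, purely for illustration, that clusters are classrooms and blocks are grade levels; `assign_blocked_clusters` and the sample data are hypothetical, and this is not a record of the actual CommonLit design.

```python
import random
from collections import defaultdict


def assign_blocked_clusters(cluster_blocks, seed=0):
    """Randomize whole clusters to treatment or control, separately per block.

    cluster_blocks maps cluster_id -> block label (e.g. classroom -> grade).
    Returns a dict mapping cluster_id -> "treatment" or "control".
    """
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    by_block = defaultdict(list)
    for cluster_id, block in cluster_blocks.items():
        by_block[block].append(cluster_id)

    assignment = {}
    for block in sorted(by_block):
        members = sorted(by_block[block])  # fixed order before shuffling
        rng.shuffle(members)
        half = len(members) // 2  # with an odd count, control gets the extra cluster
        for cluster_id in members[:half]:
            assignment[cluster_id] = "treatment"
        for cluster_id in members[half:]:
            assignment[cluster_id] = "control"
    return assignment


# Hypothetical data: classrooms (clusters) grouped by grade level (blocks).
classrooms = {"6A": "grade6", "6B": "grade6", "7A": "grade7", "7B": "grade7"}
students = {"ana": "6A", "ben": "6A", "cai": "7B"}

arms = assign_blocked_clusters(classrooms, seed=42)
# Every student inherits their classroom's arm, so classmates see the same UI.
student_arms = {name: arms[room] for name, room in students.items()}
print(student_arms)
```

Because randomization happens at the cluster level, the later analysis should also account for clustering (for example, with cluster-robust standard errors); treating individual students as independent units would overstate precision.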
Data Architecture & Data Engineering
I'm fairly adept at acquiring and organizing data, as no interesting machine learning work is possible without quality data. In my experience, the best way to improve as a data engineer is simply to work on more real-world data engineering problems... and I don't foresee any shortage of those in my future.

Big Data & Cloud Tools
Web Development
Data / DevOps and Frameworks
AWS EC2 / EMR / S3
Google Compute Cloud
Plotly & Bokeh
Additional Favorite Data Science Libraries and Tools


I like to take pictures.

I post my photos on Unsplash.

Glacier Point