Ben Hoffman

Profile

Accomplished Principal Data Engineer with 10+ years of experience and a strong background in machine learning and data engineering. Proven track record of architecting and building scalable data lakes, warehouses, and pipelines using Snowflake, Databricks, HCL/Terraform, AWS, and dbt. Skilled in development, mentoring peers, and driving technical innovation. Experienced in machine learning and statistical analysis, with a strong emphasis on data modeling, NLP, entity resolution, and data processing. Proficient in multiple programming languages, including Python, PySpark, and Terraform, with a demonstrated ability to lead complex projects and teams to success.

Signature Strengths

Machine Learning	Model development, Feature Engineering, Method selection	Data Engineering	Pipelines and data ingestion, Data Lakes, Architecture
Test Design and Analysis	A/B Testing, Sample Design, Power Analysis	Research	Methods, Application, Development, Analysis
Statistical Analysis	Regression Modeling, Data Mining, Time Series	Programming	Python Development, APIs, Serverless, ML Model Deployment

Technology Snapshot

Programming Languages	Python, SQL, PySpark, HCL/Terraform, R	Databases	Snowflake, Postgres, SQLite, MSSQL, MySQL, MongoDB, Redshift, DuckDB
Cloud Providers	AWS, Azure	Other	Docker, Bash, Databricks, Markdown, HTML, Linux, dbt

Experience

@2022 July – Present > Principal Engineer > Graphical Methods > Denver, CO

Consulting and Development Firm

Founding; Founded an AI startup for A/B testing simulations. Wound down due to lack of product-market fit.
Development; Developed an LLM agent framework for demographic simulation applications. Designed web scraping and ingestion pipelines using a vector database for RAG applications.
Research; Product and feature development. Market research for product viability. User interviews for feature development and workflows. Pricing analysis for determining cost structures and MRR/ARR. Investment due diligence reporting.

@2024 Aug – 2024 Dec > Colleague > CU Boulder > Boulder, CO

Teaching; Python fundamentals, pandas, modern analytics. Wrote all materials for the class. Set up real-world examples, such as clinical data. Discussed AI/LLMs and product development.
Development; Pub/Sub application for interactive examples and lectures. Front end consisting of Flask, HTMX, and Bootstrap. Back end consisting of MQTT, Celery, and SQLite. Created games for demonstrating global application scope.

@2022 Sept – 2024 May > Principal Data Engineer > Data Clymer > Remote

Data Engineer Consulting

Data warehousing and architecting; architected and built data lakes using S3 and Snowflake, enabling fast migrations, disaster recovery, and external data processes (entity resolution, data pipelines). Migrated a Looker project to a data warehouse on Redshift using dbt and DMS, reducing complexity and improving performance. Full warehouse modeling from staging to marts. Provided architectural guidance and implementation of Airflow processes and practices.
Data pipelines development; built low-cost and reliable data pipelines using Databricks, Python, Terraform, and AWS. Choice project was developing AWS Lambda-based pipelines using dbt, Terraform, Docker, and Python. Scoped and implemented a migration from MySQL Jenkins pipelines to Databricks using DMS for replication. Created pipelines sourcing call center data for a Snowflake data warehouse using GCP Composer (Airflow).
Project Management; handled scoping and pointing of development work. Worked with PMs to identify project risks and remediation. Managed client communications on technical topics. Worked with direct manager on implementing PERT-style analysis to improve client deliverable estimates.
Mentorship; mentored peers on problem solving and technical skills. A highlight was helping a peer learn PySpark and programming best practices, which led to their promotion to senior engineer and Databricks certification.

@2022 Feb – 2022 Sept > Senior Product Analyst > Angi > Denver, CO

Home Services and Marketplace

Improved A/B testing methods; utilized linear/logistic regression and chi-square tests to create reports describing the estimated difference in revenue and conversion effects of test variants on the sample population. Testing methods resulted in improved power and interpretability.

@2019 Jan – 2022 Feb > Data Analyst > Samba Safety > Centennial, CO

Background Screening and Transportation

Ordering behavior analysis; created a data mart to pool together data distributed across multiple systems. Used run-length encoding to normalize and compress order history. The run-length encodings were then analyzed in SQL for streaks and co-occurrences of order types. The analysis highlighted areas needing process intervention and system visibility.
Created algorithms for record linkage; designed algorithms to clean and match text strings. Methods included Jaccard and Jaro distances. Custom cleaning algorithms sourced first name and surname data from the US Census and Social Security Administration.
Deep learning driver risk modeling; WOE, one-hot encoding, mixed-effects models for factor encoding, and automated regression trees for variable imputation were all implemented and explored for creating a predictive model of high-risk drivers. Multiple layers, epochs, dropout, and regularization were tested for effects on performance. Final output was recalibrated using Platt scaling. Work was performed in conjunction with the Principal Data Scientist for deploying a risk model to production.

@2018 Nov – 2018 Dec > Contracting Data Analyst > Centura Health > Centennial, CO

Healthcare and HR

Directed graph analysis with sentiment model; examined comments for strong bigram relationships. Added sentiment to graph to identify additional associations. Potential areas for improvement in communication were identified based on the common sentiment of keywords and strength of keyword relationships.
Designed comment similarity method; created a method for finding the most representative comments using a combination of TF-IDF and cosine similarity to summarize responses.

@2015 Mar – 2018 Oct > Data Analyst > SaleScout > Broomfield, CO

SaaS Startup and Lead Sourcing

Built and managed outsourced team; seeing the need for a larger and more flexible workforce, created and qualified a team of 20+ international workers.
Implemented algorithms for data cleansing and matching; trained a PAM ML model using the output from Monge-Elkan string distances. Final predictions were made with a 1-nearest-neighbor model for efficiency. Process had ~90% accuracy and allowed us to send targeted leads to our clients.

Education

@2011 Aug – 2014 Dec > Bachelor of Science in Mathematics > CU Denver

Minors: Physics and Computer Science

Strong emphasis on statistics, probability and applied mathematics.
Dean’s List Recipient.
Selected Coursework: Numerical Analysis, Experimental Design, ICM Competition, Graph Theory, Data Structures, Rule of Law Research

Sample Work

Serverless LLM Inference	https://github.com/graphicalmethods/serverless-llamas
Lifetime Value Estimation	https://examples.benhoffman.net/lifetime_value

Volunteering

COMBA; Buffalo Creek Bike Patroller. Assisting trail users with maps and water. Ensuring safe, fun, and responsible trail use.
RMFR; Helping cat rescue with kitten transportation and promotional events.
