About me
Hello!
My name is Kevin Kho. I am currently working on Fugue, a minimal interface to bring Python, Pandas, and SQL code to Spark, Dask, and Ray. Most recently, I was at Prefect as an Open Source Community Engineer, where I managed the Slack community and created content. Before working on open-source tooling, I was a data scientist for four years at Paylocity and Itron.
I am currently contracting part-time with Citi, helping them scale compute workflows with distributed computing. I am looking for more contract opportunities around big data and machine learning platforms.
Location
I am currently based out of Chicago. Always happy to meet people in person.
Blogs
Here are a few blog posts I've written. The full list can be found under Blogs.
Why Pandas-like Interfaces are Sub-optimal for Distributed Computing
The Simple Guide to Productionizing Data Workflows with Docker
Conference Talks
Here are a few talks about Fugue, Prefect, and distributed computing. The full list can be found under Talks.
SciPy 2022 - Introduction to Workflow Orchestration with Prefect
PyCon US 2022 - Comparing the Different Ways to Scale Python and Pandas Code
Databricks Summit 2022 - FugueSQL - The Enhanced SQL Interface for Pandas and Spark DataFrames
PyCon US 2021 - Large Scale Data Validation with Spark and Dask
Podcasts
Data Engineering Podcast - Build Your Python Data Processing Your Way And Run It Anywhere With Fugue
Community
I am involved in some other things:
DataKind - I volunteered for two projects helping non-profits with data science/data engineering work
Orlando Machine Learning and Data Science - I organized/co-organized this Meetup for 4 years
Adventurous Analytics - I often contract alongside them
Conference Involvement:
SciPy 2022 Data Life Cycle Track Co-chair
PyData Seattle 2023 Volunteer