Hello! 👋
My name is Kevin Kho. I am currently working on Fugue, a minimal interface to bring Python, Pandas, and SQL code to Spark, Dask, and Ray. Most recently, I was at Prefect as an Open Source Community Engineer where I managed the Slack community and created content. Before working on open-source tooling, I was a data scientist for four years across Paylocity and Itron.
I am currently contracting part-time with Citi, helping them scale compute workflows with distributed computing. I am looking for more contract opportunities around big data and machine learning platforms.
📭 Contact me!
Feel free to reach out to me for anything data-related. I talk to people about big data, data architecture, data engineering, and career advice. Always happy to speak at meetups or company knowledge-sharing sessions about the things I’m working on. More info in the Contact section.
Email: kdykho@gmail.com
LinkedIn: https://www.linkedin.com/in/kvnkho
GitHub: https://github.com/kvnkho
🌎 Location
I am currently based out of Chicago. Always happy to meet people in person.
📝 Blogs
Here are a few blogs I’ve written. The full list can be found under Blogs.
- Why Pandas-like Interfaces are Sub-optimal for Distributed Computing
- Using Pandera on Spark for Data Validation through Fugue
- The Simple Guide to Productionizing Data Workflows with Docker
- Introducing Fugue — Reducing PySpark Developer Friction
📢 Conference Talks
Here are a few talks about Fugue, Prefect, and distributed computing. The full list can be found under Talks.
- SciPy 2022 - Introduction to Workflow Orchestration with Prefect
- PyCon US 2022 - Comparing the Different Ways to Scale Python and Pandas Code
- Databricks Summit 2022 - FugueSQL - The Enhanced SQL Interface for Pandas and Spark DataFrames
- PyCon US 2021 - Large Scale Data Validation with Spark and Dask
🎤 Podcasts
- Data Engineering Podcast - Build Your Python Data Processing Your Way And Run It Anywhere With Fugue
💙 Community
I am involved in some other things:
- DataKind - I volunteered for two projects helping non-profits with data science/data engineering work
- Orlando Machine Learning and Data Science - I organized/co-organized this Meetup for four years
- Adventurous Analytics - I often contract alongside them
- Conference Involvement:
  - SciPy 2022 Data Life Cycle Track Co-chair
  - PyData Seattle 2023 Volunteer