About Me

This is Wei Lee. I'm a

I enjoy automating tedious tasks and writing high-quality code. I also enjoy participating in open-source communities and contributing to open-source projects. Traveling is another passion of mine, and I often use PyCon as an opportunity to explore new places. I have attended PyCon Taiwan 🇹🇼, PyCon US 🇺🇸, PyCon JP 🇯🇵, PyCon CA 🇨🇦, Remote Python Pizza 🍕, EuroPython (remotely) 🇪🇺, and PyCon APAC 🇵🇭.

I share my technical notes, book digests, and occasional thoughts here. If you're interested in cooking, anime, and traveling, I chat about them in "Those things no one cares about."


You can find me through

  • LinkedIn
  • Twitter
  • GitHub

I use

  • Neovim
  • Sublime Text
  • macOS
  • Firefox
  • Spotify
  • Apache Airflow

Skill

  • Programming Language: Python
  • Data Engineering: Snowflake, Redis, SQLite, PostgreSQL, MySQL, Redshift
  • MLOps: Apache Airflow, DVC, dbt, Great Expectations
  • Backend Development: FastAPI, Flask, Django
  • DevOps tools and others: GitHub Actions, Docker, Kubernetes, Jenkins, Git, AWS Services

Work Experience

[Aug 2024 - Current] Senior Software Engineer, Astronomer

  • apache-airflow
    • Add "DatasetAlias" for creating datasets or dataset events at runtime
    • Implement half of AIP-74 and part of AIP-75
  • ruff
    • Implement most of the AIR3XX rules to facilitate the migration from Airflow 2 to Airflow 3

[Feb 2023 - July 2024] Software Engineer, Astronomer

  • apache-airflow
    • Allow Airflow tasks to execute directly from the trigger
    • Add REST API endpoint to manipulate queued dataset events
    • Upgrade apache-airflow-providers-weaviate to 2.0.0 for weaviate-client >= 4.4.0 support
    • Add Azure managed identities support to apache-airflow-providers-microsoft-azure
    • Add default_deferrable configuration for easily turning on the deferrable mode of operators
  • astronomer-providers
    • Contribute existing operators/sensors back to apache-airflow and deprecate this project to reduce maintenance efforts
    • Automated the deployment of integration tests and testing against the release of the airflow provider (#987, #1107, #1139, #1110)
  • ask-astro
    • Set up local dev tools and fix various existing bugs

[Apr 2017 - Feb 2023] Machine Learning Engineer, Rakuten USA

  • Productionize machine learning projects
    • Implemented SQS and gRPC services for grouping emails with similar structures and extracting user-sensitive data to increase the amount of training data without violating customer privacy regulations.
    • Designed and implemented a two-stage labeling system that automatically communicates between Amazon Mechanical Turk and in-house experts to generate high-quality labeled data and enhance merchandise taxonomy to increase customer conversion rate.
    • Migrated and automated the deployment process of AWS Lambda procedures that process customer lifetime value, reducing the effort of maintenance and deployment.
  • Build and maintain data pipelines on Apache Airflow
    • Implemented a pipeline that processes data larger than 10 GB to infer personalized preferences to help increase customer satisfaction.
    • Migrated a legacy Airflow 1.x server on AWS EC2 to Airflow 2.0.2 on AWS MWAA, sparing developers from legacy dependency issues, and created a development Airflow environment for running experiments without affecting the production pipeline.
    • Refactored data writing mechanism and reduced the data write time and AWS S3 cost.
    • Built alerts and dashboards with Datadog, Prometheus, and Kibana to monitor pipeline metrics, minimizing troubleshooting effort.
  • Standardize and maintain software engineering practices
    • Created and maintained project templates with automated code quality checks, testing, containerization, versioning, releasing, and deployment, along with a standard workflow for updating tools in existing projects. This reduced project creation time and code review overhead, and gave developers an easy way to introduce new standards.
    • Implemented a life-cycle configuration management tool and a workflow for creating Amazon SageMaker notebook instances, saving data scientists time on engineering problems.
    • Improved container build time and reduced execution time by 70% for Jenkins CI/CD pipelines.
    • Maintained the core package used by most existing projects.
  • Optimized SQL in a data pipeline and reduced the execution time from infeasible to within half a day.
  • Cooperate with overseas teams in the US, Ukraine, and India

[Jan 2019 - March 2019] Project Manager, DLT Lab

  • Containerized and fixed legacy projects in The Mosquito Man
  • Introduced code review culture to a newly formed team
  • Set up a Drone CI/CD server and created CI pipelines for two ongoing projects

[May 2018 - Nov 2018] Chief Teaching Assistant, X-Village

  • Managed the executive team with 16 members
  • Organized two months of full-time courses and a one-semester 3-credit course
  • Reviewed the teaching proposal of the Python course, "Programming Design Foundation"
  • Designed exercises for "Data Structure," the first section of "Computer Science Foundations"
  • Lectured on "Web Programming, Database/Cloud Computing," the fourth section of "Computer Science Foundations"

X-Village is an experimental education program designed to equip students who do not major in computer science with computational thinking skills and to foster future collaboration between computer science and other disciplines.

I was the program executor and the leader of the teaching assistant team. I also designed a half-day exercise for Data Structure and taught a four-hour web backend course for Web Programming, Database/Cloud Computing.

[July 2015 - July 2016] Substitute Military Service, K-12 Education Administration, Ministry of Education

  • Maintained legacy systems implemented in multiple languages, including C#, VBScript, PHP, etc.
  • Developed automation programs for generating reports, which saved 80% of the manual labor time
  • Delivered a human resource management system using Django

Community Involvement

[Nov 2023 - Current] Volunteer, PyCon Taiwan

[Nov 2022 - Sep 2023] Marketing Team Lead, PyCon Taiwan 2023

[Nov 2021 - Sep 2022] Vice-Chairperson, PyCon APAC 2022

  • Coordinated three squads, including planning, sponsorship, and social media
  • Hosted the first Ask Me Anything event to promote the Call for Proposals

[Oct 2020 - Nov 2021] Chairperson, PyCon Taiwan 2021

  • Coordinated 9 teams and hosted the first online PyCon TW with 550 participants

[Dec 2019 - Sep 2020] Program Chair, PyCon Taiwan 2020

  • Coordinated around 20 team members and introduced community tracks and a speaker-dispatch program to increase the interaction between local communities.

[Jul 2019 - Nov 2019] Program Committee Member, PyCon Taiwan 2019

Talk and Tutorial

  • Hold on! You have a data team in PyCon Taiwan!
    1. 2025/07/ 🇨🇿 - EuroPython 2025
  • 朝聖之旅
    1. 2025/06/11 🇹🇼 - 工程師的搜尋紀錄 → slide
  • Airflow 3.0 The First Glance
    1. 2025/03/28 🇹🇼 黃金流沙饅頭營 → slide
  • 踏入開源的第一步
    1. 2025/03/16 💻 NetDB - Tech Day, Invited Talk → slide
  • Unleash the Chaos: Developing a Linter for Un-Pythonic Code!
    1. 2025/03/02 🇵🇭 PyCon APAC 2025 → slide
    2. 2024/09/21 🇹🇼 PyCon TW 2024 → slide, 🎬recording
  • Unlocking Python's Core Magic
    1. 2024/09/28 🇯🇵 PyCon JP 2024 → slide, 🎬recording
  • What If...? Running Airflow Tasks without the workers
    1. 2024/09/11 🇺🇸 Airflow Summit 2024 → slide, 🎬recording
  • Starts Airflow task execution directly from the triggerer
    1. 2024/05/08 💻 Airflow Town Hall → slide
  • Intro to Airflow - From Zero to Hero
    1. 2024/02/17 💻 源來適你 → slide
  • Atomic Commits: An Easy & Proven Way to Manage & Automate Release Process
    1. 2023/07/29 🇹🇼 COSCUP 2023 → slide, 🎬recording
  • Python Table Manners
    1. 2020/11/07 🇹🇼 Taichung.py → slide
    2. 2020/10/16 🇹🇼 Hualien.py → slide
    3. 2020/08/31 🇹🇼 Kaohsiung.py → slide
    4. 2020/07/24 💻 EuroPython 2020 → slide, 🎬recording
    5. 2019/11/17 🇨🇦 PyCon CA 2019 → slide
    6. 2019/10/24 🇹🇼 Taipei.py
  • commitizen-tools: What can we gain from crafting a git message convention?
    1. 2020/06/18 🇹🇼 Taipei.py → slide
    2. 2020/04/25 💻 Remote Python Pizza 2020 → slide
  • How to get more than PyCon in a PyCon
    1. 2019/09/16 🇯🇵 PyCon JP 2019 - Peer Reviewed Lightning Talk → slide
  • X-Village - 用不到兩個月準備兩個月的課程
    1. 2019/03/24 🇹🇼 SITCON 2019 → slide, 🎬recording
  • Intro to Python Data Science Tools
    1. 2018/03/12 🇹🇼 NCKU CSIE - Competitions in Data Sciences and Artificial Intelligence → slide
    2. 2018/02/27 🇹🇼 NCKU CSIE - Competitions in Data Sciences and Artificial Intelligence → slide
  • CRUD in Flask
    1. 2018/08/16 🇹🇼 X-Village - Web Course → slide
  • 資管講座 (一場工資管營的演講)
    1. 2017/01/22 2018成大工資管營 → slide
  • Bot Development
    1. 2016/12/08 🇹🇼 NCKU CSIE - Introduction to Knowledge Discovery and Data Engineering → slide
  • Keras Demo
    1. 2016/11/03 🇹🇼 深度之夜 → slide

For more slides, please check my Speaker Deck.

Podcast / Show

Development Sprint

Award

  • Honorable Mention, 2013 Railway Application Section Problem Solving Competition

Publication

  1. Wei Lee, Chien-Wei Chang, Po-An Yang, Chi-Hsuan Huang, Ming-Kuang Wu, Chu-Cheng Hsieh, Kun-Ta Chuang, "Effective Quality Assurance for Data Labels through Crowdsourcing and Domain Expert Collaboration," 21st International Conference on Extending Database Technology, Demo Track (EDBT-2018)
  2. I-Lin Wang, Wei Lee, Chiao-Yu Liao, "Effective Heuristics for Scheduling Hump and Pullback Engines in Railroad Yard Operational Plans," Proceedings of the 10th Annual Conference of the Operations Research Society at Taiwan (ORSTW 2013)

Education

[2016-2018]
Master, Computer Science and Information Engineering
National Cheng Kung University, Tainan
GPA: 4.16/4.3

[2011-2015]
Bachelor, Industrial and Information Management
Double Major: Computer Science and Information Engineering
National Cheng Kung University, Tainan
GPA: 3.77/4.0 (CSIE GPA: 3.87/4.0)

Tutorial and Study Note

Slide

Book Notes

MOOCs Note