About Me

This is Wei Lee.

I enjoy automating tedious tasks and writing high-quality code. I also enjoy participating in open-source communities and contributing to open-source projects. Traveling is another passion of mine, and I often use PyCon as an opportunity to explore new places. I have attended PyCon in Taiwan 🇹🇼, the United States 🇺🇸, Japan 🇯🇵, and Canada 🇨🇦, as well as Remote Python Pizza 🍕 and EuroPython (remotely) 🇪🇺.

I share my technical notes, book digests, and occasional thoughts here. If you're interested in cooking, anime, or travel, I write about those on "Those things no one cares about".


You can find me through

  • LinkedIn
  • Twitter
  • GitHub

I use

  • Neovim
  • Sublime Text
  • macOS
  • Firefox
  • Spotify
  • Apache Airflow

Skill

  • Programming Language: Python
  • Data Engineering: Snowflake, Redis, SQLite, PostgreSQL, MySQL, Redshift
  • MLOps: Apache Airflow, DVC, dbt, Great Expectations
  • Backend Development: FastAPI, Flask, Django
  • DevOps tools and others: GitHub Actions, Docker, Kubernetes, Jenkins, Git, AWS Services

Work Experience

[Aug 2024 - Current] Software Engineer, Astronomer

  • apache-airflow
    • Add "DatasetAlias" for creating datasets or dataset events at runtime
    • Implement half of AIP-74

[Feb 2023 - July 2024] Software Engineer, Astronomer

  • apache-airflow
    • Allow Airflow tasks to execute directly from the trigger
    • Add REST API endpoint to manipulate queued dataset events
    • Upgrade apache-airflow-providers-weaviate to 2.0.0 for weaviate-client >= 4.4.0 support
    • Add Azure managed identities support to apache-airflow-providers-microsoft-azure
    • Add default_deferrable configuration for easily turning on the deferrable mode of operators
  • astronomer-providers
    • Contribute existing operators/sensors back to apache-airflow and deprecate this project to reduce maintenance efforts
    • Automate the deployment of integration tests and testing against the release of the Airflow provider (#987, #1107, #1139, #1110)
  • ask-astro
    • Set up local dev tools and fix various existing bugs

[Apr 2017 - Feb 2023] Machine Learning Engineer, Rakuten USA

  • Productionize machine learning projects
    • Implemented SQS and gRPC services for grouping emails with similar structures and extracting user-sensitive data to increase the amount of training data without violating customer privacy regulations.
    • Designed and implemented a two-stage labeling system that automatically communicates between Amazon Mechanical Turk and in-house experts to generate high-quality labeled data and enhance merchandise taxonomy to increase customer conversion rate.
    • Migrated and automated the deployment process of AWS Lambda procedures that process customer lifetime value, reducing the effort of maintenance and deployment.
  • Build and maintain data pipelines on Apache Airflow
    • Implemented a pipeline that processes data larger than 10 GB to infer personalized preferences to help increase customer satisfaction.
    • Migrated a legacy Airflow 1.x server on AWS EC2 to Airflow 2.0.2 on AWS MWAA, saving developers the effort of dealing with legacy dependency issues, and created a development Airflow environment for running experiments without affecting the production pipeline.
    • Refactored data writing mechanism and reduced the data write time and AWS S3 cost.
    • Built alerts and dashboards with Datadog, Prometheus, and Kibana to monitor pipeline metrics, minimizing troubleshooting effort.
  • Standardize and maintain software engineering practices
    • Created and maintained project templates with automatic code quality checks, testing, containerization, project versioning, releasing, and deployment, along with a standard workflow for existing projects to update their tools. This reduced project creation time and communication overhead during code review, and gave developers an easy way to introduce new standards.
    • Implemented a life-cycle configuration management tool and a workflow for creating Amazon SageMaker notebook instances, saving data scientists time on engineering problems.
    • Improved container build time and reduced execution time by 70% for Jenkins CI/CD pipelines.
    • Maintained the core package used by most existing projects.
  • Optimized SQL in a data pipeline and reduced the execution time from infeasible to within half a day.
  • Cooperated with overseas teams in the US, Ukraine, and India

[Jan 2019 - March 2019] Project Manager, DLT Lab

  • Containerized and fixed legacy projects in The Mosquito Man
  • Introduced code review culture to a newly formed team
  • Set up a Drone CI/CD server and created CI pipelines for two ongoing projects

[May 2018 - Nov 2018] Chief Teaching Assistant, X-Village

  • Managed the executive team with 16 members
  • Organized two months of full-time courses and a one-semester, 3-credit course
  • Reviewed the teaching proposal of the Python course, "Programming Design Foundation"
  • Designed exercises for "Data Structure," the first section of "Computer Science Foundations"
  • Lectured "Web Programming, Database/Cloud Computing," the fourth section of "Computer Science Foundations"

X-Village is an experimental education program that aims to equip students not majoring in computer science with computational thinking and to enhance future cooperation between computer science and other fields.

I was the program executor and the leader of the teaching assistant team. I also designed a half-day exercise for Data Structure and lectured a four-hour web backend course for Web Programming, Database/Cloud Computing.

[July 2015 - July 2016] Substitute Military Service, K-12 Education Administration, Ministry of Education

  • Maintained legacy systems implemented in multiple languages, including C#, VBScript, PHP, etc.
  • Developed automation programs for generating reports, saving 80% of manual labor time
  • Delivered a human resource management system using Django

Community Involvement

[Nov 2023 - Current] Volunteer, PyCon Taiwan

[Nov 2022 - Sep 2023] Marketing Team Lead, PyCon Taiwan 2023

[Nov 2021 - Sep 2022] Vice-Chairperson, PyCon APAC 2022

  • Coordinated three squads, including planning, sponsorship, and social media
  • Hosted the first Ask Me Anything event to promote Call for Proposals

[Oct 2020 - Nov 2021] Chairperson, PyCon Taiwan 2021

  • Coordinated 9 teams and hosted the first online PyCon TW with 550 participants

[Dec 2019 - Sep 2020] Program Chair, PyCon Taiwan 2020

  • Coordinated around 20 team members and introduced community tracks and a speaker-dispatch program to increase the interaction between local communities.

[Jul 2019 - Nov 2019] Program Committee Member, PyCon Taiwan 2019

Talk and Tutorial

  • Unlocking Python's Core Magic
  • Unleash the Chaos: Developing a Linter for Un-Pythonic Code!
  • What If...? Running Airflow Tasks without the workers
  • Starts Airflow task execution directly from the triggerer
    • 2024/05/08 💻 Airflow Town Hall → slide
  • Intro to Airflow - From Zero to Hero
  • Atomic Commits: An Easy & Proven Way to Manage & Automate Release Process
  • Python Table Manners
  • commitizen-tools: What can we gain from crafting a git message convention?
  • How to get more than PyCon in a PyCon
    • 2019/09/16 🇯🇵 PyCon JP 2019 - Peer Reviewed Lightning Talk → slide
  • X-Village - 用不到兩個月準備兩個月的課程 [Preparing a Two-Month Course in Less Than Two Months]
  • Intro to Python Data Science Tools
    • 2018/03/12 🇹🇼 NCKU CSIE - Competitions in Data Sciences and Artificial Intelligence → slide
    • 2018/02/27 🇹🇼 NCKU CSIE - Competitions in Data Sciences and Artificial Intelligence → slide
  • CRUD in Flask
    • 2018/08/16 🇹🇼 X-Village - Web Course → slide
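  • 資管講座 (一場工資管營的演講) [Information Management Lecture (a talk at the Industrial and Information Management Camp)]
    • 2017/01/22 2018成大工資管營 [2018 NCKU Industrial and Information Management Camp] → slide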
  • Bot Development
    • 2016/12/08 🇹🇼 NCKU CSIE - Introduction to Knowledge Discovery and Data Engineering → slide
  • Keras Demo

For more slides, please check my Speaker Deck.

Podcast

Award

  • Honorable Mention, 2013 Railway Application Section Problem Solving Competition

Publication

  1. Wei Lee, Chien-Wei Chang, Po-An Yang, Chi-Hsuan Huang, Ming-Kuang Wu, Chu-Cheng Hsieh, Kun-Ta Chuang "Effective Quality Assurance for Data Labels through Crowdsourcing and Domain Expert Collaboration" 21st International Conference on Extending Database Technology, Demo Track (EDBT-2018)
  2. I-Lin Wang, Wei Lee, Chiao-Yu Liao "Effective Heuristics for Scheduling Hump and Pullback Engines in Railroad Yard Operational Plans" Proceedings of the 10th Annual Conference of the Operations Research Society at Taiwan (ORSTW 2013)

Education

[2016-2018]
Master, Computer Science and Information Engineering
National Cheng Kung University, Tainan
GPA: 4.16/4.3

[2011-2015]
Bachelor, Industrial and Information Management
Double Major: Computer Science and Information Engineering
National Cheng Kung University, Tainan
GPA: 3.77/4.0 (CSIE GPA: 3.87/4.0)

Tutorial and Study Note

Slide

Books Notes

MOOCs Note