This is Wei Lee. I'm a

I enjoy automating tedious tasks and creating high-quality code. I also enjoy participating in open-source communities and contributing to open-source projects. Traveling is another passion of mine, and I often use PyCon as an opportunity to explore new places. I have attended PyCon conferences in Taiwan 🇹🇼, the United States 🇺🇸, Japan 🇯🇵, and Canada 🇨🇦, as well as Remote Python Pizza 🍕 and EuroPython (remotely) 🇪🇺.

I share my technical notes, book digests, and occasional thoughts here. If you're interested in topics such as cooking, anime, and travel, I chat about those on "Those things no one cares about."


You can find me through


I use Neovim, Sublime Text, macOS, Firefox, Spotify, and Apache Airflow.


Skill

  • Programming Language: Python
  • Data Engineering: Snowflake, Redis, MySQL, PostgreSQL, Redshift
  • MLOps: Apache Airflow, DVC, dbt, Great Expectations
  • Backend Development: Flask, Django, FastAPI
  • DevOps tools and others: Docker, Kubernetes, Jenkins, GitHub Actions, Git, AWS Services

Work Experience

[Feb 2023 - Present] Software Engineer, Astronomer

  • apache-airflow
    • Fixed a circular import error prior to releasing new Airflow providers (#31379)
    • Fixed an Amazon provider bug caused by the new Airflow providers release (#31482)
    • Integrated the AsyncSensor logic into the corresponding Sensor counterparts, lessening the maintenance burden (#30014, #30227, #30231, #30235, #30250)
  • astronomer-providers
    • Reduced async operator overhead by adding checks before sending the task to the triggerer (issue 1102)
    • Automated the deployment of integration tests and testing against Airflow provider releases (#987, #1107, #1139, #1110)

[Apr 2017 - Feb 2023] Machine Learning Engineer, Rakuten USA

  • Productionize machine learning projects
    • Implemented SQS and gRPC services for grouping emails with similar structures and extracting user-sensitive data to increase the amount of training data without violating customer privacy regulations.
  • Designed and implemented a two-stage labeling system that automatically communicates between Amazon Mechanical Turk and in-house experts to generate high-quality labeled data and enhance merchandise taxonomy to increase customer conversion rate.
  • Migrated and automated the deployment process of AWS Lambda procedures that process customer lifetime value, reducing the effort of maintenance and deployment.
  • Build and maintain data pipelines on Apache Airflow
    • Implemented a pipeline that processes data larger than 10 GB to infer personalized preferences to help increase customer satisfaction.
    • Migrated a legacy Airflow 1.x server on AWS EC2 to Airflow 2.0.2 on AWS MWAA, saving developers the effort of dealing with legacy dependency issues, and created a development Airflow environment for running experiments without affecting the production pipeline.
    • Refactored data writing mechanism and reduced the data write time and AWS S3 cost.
    • Built alerts and dashboards with Datadog, Prometheus, and Kibana to monitor pipeline metrics, minimizing troubleshooting effort.
  • Standardize and maintain software engineering practices
    • Created and maintained project templates with automatic code quality checks, testing, containerization, project versioning, releasing, and deployment, plus a standard workflow for existing projects to update their tools. This reduced project creation time and communication overhead during code review, and gave developers an easy way to introduce new standards.
    • Implemented a life-cycle configuration management tool and a workflow for creating Amazon SageMaker notebook instances, saving data scientists time otherwise spent on engineering problems.
    • Improved container build time and reduced execution time by 70% for Jenkins CI/CD pipelines.
    • Maintained the core package used by most existing projects.
  • Optimized SQL in a data pipeline and reduced the execution time from infeasible to within half a day.
  • Cooperated with overseas teams in the US, Ukraine, and India

[Jan 2019 - Mar 2019] Project Manager, DLT Lab

  • Containerized and fixed legacy projects in The Mosquito Man
  • Introduced code review culture to a newly formed team
  • Set up a Drone CI/CD server and created CI pipelines for two ongoing projects

[May 2018 - Nov 2018] Chief Teaching Assistant, X-Village

  • Managed the executive team with 16 members
  • Organized two months of full-time courses and a one-semester, 3-credit course
  • Reviewed the teaching proposal of the Python course, "Programming Design Foundation"
  • Designed exercises for "Data Structure," the first section of "Computer Science Foundations"
  • Lectured "Web Programming, Database/Cloud Computing," the fourth section of "Computer Science Foundations"

X-Village is an experimental education program that aims to equip students who are not majoring in computer science with computational thinking and to enhance future cooperation between computer science and other fields.

I was the program executor and the leader of the teaching assistant team. I also designed a half-day exercise for Data Structure and lectured a four-hour web backend course for Web Programming, Database/Cloud Computing.

[July 2015 - July 2016] Substitute Military Service, K-12 Education Administration, Ministry of Education

  • Maintained legacy systems implemented in multiple languages, including C#, VBScript, PHP, etc.
  • Developed automation programs for generating reports, saving 80% of manual labor time
  • Delivered a human resource management system using Django

Community Involvement

[Nov 2023 - Current] Volunteer, PyCon Taiwan

[Nov 2022 - Sep 2023] Marketing Team Lead, PyCon Taiwan 2023

[Nov 2021 - Sep 2022] Vice-Chairperson, PyCon APAC 2022

  • Coordinated three squads, including planning, sponsorship, and social media
  • Hosted the first Ask Me Anything event to promote Call for Proposals

[Oct 2020 - Nov 2021] Chairperson, PyCon Taiwan 2021

  • Coordinated 9 teams and hosted the first online PyCon TW with 550 participants

[Dec 2019 - Sep 2020] Program Chair, PyCon Taiwan 2020

  • Coordinated around 20 team members and introduced community tracks and a speaker-dispatch program to increase the interaction between local communities.

[Jul 2019 - Nov 2019] Program Committee Member, PyCon Taiwan 2019

Talk and Tutorial

For more slides, please check my Speaker Deck.

Award

  • Honorable Mention, 2013 Railway Application Section Problem Solving Competition

Publication

  1. Wei Lee, Chien-Wei Chang, Po-An Yang, Chi-Hsuan Huang, Ming-Kuang Wu, Chu-Cheng Hsieh, Kun-Ta Chuang, "Effective Quality Assurance for Data Labels through Crowdsourcing and Domain Expert Collaboration," 21st International Conference on Extending Database Technology, Demo Track (EDBT-2018)
  2. I-Lin Wang, Wei Lee, Chiao-Yu Liao, "Effective Heuristics for Scheduling Hump and Pullback Engines in Railroad Yard Operational Plans," Proceedings of the 10th Annual Conference of the Operations Research Society at Taiwan (ORSTW 2013)

Education

[2016-2018]
Master, Computer Science and Information Engineering
National Cheng Kung University, Tainan
GPA: 4.16/4.3

[2011-2015]
Bachelor, Industrial and Information Management
Double Major: Computer Science and Information Engineering
National Cheng Kung University, Tainan
GPA: 3.77/4.0 (CSIE GPA: 3.87/4.0)
