This is Wei Lee. I'm a
- Pythonista
- PyCon Taiwan organizer
- commitizen-tools maintainer
- Apache Airflow committer
- opensource4you Airflow mentor(?)
- Traveler
- โบ Member of ๅฐๆนพ้ใฏใซโฒ
- Anime Lover
- Reader
- Ukulele Player
- Locker
I enjoy automating tedious tasks and writing high-quality code, and I like participating in open-source communities and contributing to open-source projects. Traveling is also a passion of mine, and I often use PyCon as an opportunity to explore new places. I have attended PyCon conferences in Taiwan 🇹🇼, the United States 🇺🇸, Japan 🇯🇵, and Canada 🇨🇦, as well as Remote Python Pizza and EuroPython (remotely) 🇪🇺.
I share my technical notes, book digests, and occasional thoughts here. If you're interested in cooking, anime, and travel, I chat about them on "Those things no one cares about."
Skill
- Programming Language: Python
- Data Engineering: Snowflake, Redis, SQLite, PostgreSQL, MySQL, Redshift
- MLOps: Apache Airflow, DVC, dbt, Great Expectations
- Backend Development: FastAPI, Flask, Django
- DevOps tools and others: GitHub Actions, Docker, Kubernetes, Jenkins, Git, AWS Services
Work Experience
[Aug 2024 - Current] Software Engineer, Astronomer
- apache-airflow
- Add "DatasetAlias" for creating datasets or dataset events at runtime
- Implement half of AIP-74
[Feb 2023 - Jul 2024] Software Engineer, Astronomer
- apache-airflow
- Allow Airflow tasks to execute directly from the triggerer
- Add REST API endpoint to manipulate queued dataset events
- Upgrade apache-airflow-providers-weaviate to 2.0.0 for weaviate-client >= 4.4.0 support
- Add Azure managed identity support to apache-airflow-providers-microsoft-azure
- Add the default_deferrable configuration option to easily turn on deferrable mode for operators
- astronomer-providers
- Contribute existing operators/sensors back to apache-airflow and deprecate this project to reduce maintenance effort
- Automate the deployment of integration tests and test against Airflow provider releases (#987, #1107, #1139, #1110)
- ask-astro
- Set up local development tools and fix various existing bugs
[Apr 2017 - Feb 2023] Machine Learning Engineer, Rakuten USA
- Productionize machine learning projects
- Implemented SQS and gRPC services for grouping emails with similar structures and extracting user-sensitive data to increase the amount of training data without violating customer privacy regulations.
- Designed and implemented a two-stage labeling system that automatically coordinates between Amazon Mechanical Turk workers and in-house experts to generate high-quality labeled data and enhance the merchandise taxonomy, increasing customer conversion rate.
- Migrated and automated the deployment of AWS Lambda functions that compute customer lifetime value, reducing maintenance and deployment effort.
- Build and maintain data pipelines on Apache Airflow
- Implemented a pipeline that processes data larger than 10 GB to infer personalized preferences to help increase customer satisfaction.
- Migrated a legacy Airflow 1.x server on AWS EC2 to Airflow 2.0.2 on Amazon MWAA, sparing developers from legacy dependency issues, and created a development Airflow environment for experiments that does not affect the production pipeline.
- Refactored data writing mechanism and reduced the data write time and AWS S3 cost.
- Built alerts and dashboards with Datadog, Prometheus, and Kibana to monitor pipeline metrics, reducing troubleshooting effort.
- Standardize and maintain software engineering practices
- Created and maintained project templates with automated code-quality checks, testing, containerization, versioning, releasing, and deployment, plus a standard workflow for updating tools in existing projects. This reduced project creation time and code-review communication overhead, and gave developers an easy way to adopt new standards.
- Implemented a lifecycle configuration management tool and a workflow for creating Amazon SageMaker notebook instances, saving data scientists time on engineering problems.
- Improved container build time and reduced the execution time of Jenkins CI/CD pipelines by 70%.
- Maintain the core package used across most existing projects
- Optimized SQL in a data pipeline and reduced the execution time from infeasible to within half a day.
- Collaborate with overseas teams in the US, Ukraine, and India
[Jan 2019 - Mar 2019] Project Manager, DLT Lab
- Containerized and fixed legacy projects in The Mosquito Man
- Introduced code review culture to a newly formed team
- Set up a Drone CI/CD server and created CI pipelines for two ongoing projects
[May 2018 - Nov 2018] Chief Teaching Assistant, X-Village
- Managed the executive team with 16 members
- Organized two months of full-time courses and a one-semester, three-credit course
- Reviewed the teaching proposal of the Python course, "Programming Design Foundation"
- Designed exercises for "Data Structure," the first section of "Computer Science Foundations"
- Lectured "Web Programming, Database/Cloud Computing," the fourth section of "Computer Science Foundations"
X-Village is an experimental education program that aims to equip students not majoring in computer science with computational thinking and to foster future collaboration between computer science and other fields.
I was the program executor and the leader of the teaching assistant team. I also designed a half-day exercise for Data Structure and lectured a four-hour web backend course for Web Programming, Database/Cloud Computing.
[Jul 2015 - Jul 2016] Substitute Military Service, K-12 Education Administration, Ministry of Education
- Maintained legacy systems implemented in multiple languages, including C#, VBScript, and PHP
- Developed automation programs for generating reports, saving 80% of manual labor time
- Delivered a human resource management system using Django
Community Involvement
[Nov 2023 - Current] Volunteer, PyCon Taiwan
- Maintain pycontw-blog
[Nov 2022 - Sep 2023] Marketing Team Lead, PyCon Taiwan 2023
- Migrated PyCon Taiwan Blog to pycontw-blog / https://conf.python.tw
[Nov 2021 - Sep 2022] Vice-Chairperson, PyCon APAC 2022
- Coordinated three squads, including planning, sponsorship, and social media
- Hosted the first Ask Me Anything event to promote Call for Proposals
[Oct 2020 - Nov 2021] Chairperson, PyCon Taiwan 2021
- Coordinated 9 teams and hosted the first online PyCon TW with 550 participants
[Dec 2019 - Sep 2020] Program Chair, PyCon Taiwan 2020
- Coordinated around 20 team members and introduced community tracks and a speaker-dispatch program to increase the interaction between local communities.
[Jul 2019 - Nov 2019] Program Committee Member, PyCon Taiwan 2019
- Contacted keynote speakers and financial aid applicants
- Contributed to the post-event report generator
Talk and Tutorial
- Unlocking Python's Core Magic
- 2024/09/28 🇯🇵 PyCon JP 2024 → slide
- Unleash the Chaos: Developing a Linter for Un-Pythonic Code!
- 2024/09/21 🇹🇼 PyCon TW 2024 → slide
- What If...? Running Airflow Tasks without the workers
- 2024/09/11 🇺🇸 Airflow Summit 2024 → slide
- Starts Airflow task execution directly from the triggerer
- 2024/05/08 💻 Airflow Town Hall → slide
- Intro to Airflow - From Zero to Hero
- 2024/02/17 💻 源來適你 (opensource4you) → slide
- Atomic Commits: An Easy & Proven Way to Manage & Automate Release Process
- 2023/07/29 🇹🇼 COSCUP 2023 → slide, 🎬 recording
- Python Table Manners
- 2020/11/07 🇹🇼 Taichung.py → slide
- 2020/10/16 🇹🇼 Hualien.py → slide
- 2020/08/31 🇹🇼 Kaohsiung.py → slide
- 2020/07/24 💻 EuroPython 2020 → slide, 🎬 recording
- 2019/11/17 🇨🇦 PyCon CA 2019 → slide
- 2019/10/24 Taipei.py
- commitizen-tools: What can we gain from crafting a git message convention?
- 2020/06/18 🇹🇼 Taipei.py → slide
- 2020/04/25 💻 Remote Python Pizza 2020 → slide
- How to get more than PyCon in a PyCon
- 2019/09/16 🇯🇵 PyCon JP 2019 - Peer Reviewed Lightning Talk → slide
- X-Village - 用一到兩個月準備兩個月的課程 (Preparing Two Months of Courses in One to Two Months)
- 2019/03/24 🇹🇼 SITCON 2019 → slide, 🎬 recording
- Intro to Python Data Science Tools
- CRUD in Flask
- Information Management Lecture (資管講座): a talk at an Industrial and Information Management camp
- 2017/01/22 2018 NCKU Industrial and Information Management Camp (2018成大工資管營) → slide
- Bot Development
- 2016/12/08 🇹🇼 NCKU CSIE - Introduction to Knowledge Discovery and Data Engineering → slide
- Keras Demo
- 2016/11/03 ๐น๐ผ ๆทฑๅบฆไนๅค โ slide
For more slides, please check my Speaker Deck.
Podcast
Award
- Honorable Mention, 2013 Railway Application Section Problem Solving Competition
Publication
- Wei Lee, Chien-Wei Chang, Po-An Yang, Chi-Hsuan Huang, Ming-Kuang Wu, Chu-Cheng Hsieh, Kun-Ta Chuang "Effective Quality Assurance for Data Labels through Crowdsourcing and Domain Expert Collaboration" 21st International Conference on Extending Database Technology, Demo Track (EDBT-2018)
- I-Lin Wang, Wei Lee, Chiao-Yu Liao "Effective Heuristics for Scheduling Hump and Pullback Engines in Railroad Yard Operational Plans" Proceedings of the 10th Annual Conference of the Operations Research Society at Taiwan (ORSTW 2013)
Education
[2016-2018]
Master, Computer Science and Information Engineering
National Cheng Kung University, Tainan
GPA: 4.16/4.3
[2011-2015]
Bachelor, Industrial and Information Management
Double Major: Computer Science and Information Engineering
National Cheng Kung University, Tainan
GPA: 3.77/4.0 (CSIE GPA: 3.87/4.0)
Tutorial and Study Note
Slide
- Git Tutorial
- example: Git-Tutorial-Sample