Skill
- Programming Language: Python
- Data Engineering: Snowflake, Redis, MySQL, PostgreSQL, Redshift
- MLOps: Apache Airflow, DVC, dbt, Great Expectations
- Backend Development: Flask, Django, FastAPI
- DevOps tools and others: Docker, Kubernetes, Jenkins, GitHub Actions, Git, AWS Services
Work Experience
[Apr 2017 - Present] Machine Learning Engineer, Rakuten USA
- Productionize machine learning projects
- Implemented an SQS service and gRPC services for grouping emails with similar structures and extracting user-sensitive data to increase the amount of training data without violating customer privacy regulations.
- Designed and implemented a two-stage labeling system that automatically communicates between Amazon Mechanical Turk and in-house experts to generate high-quality labeled data and enhance the merchandise taxonomy, with the goal of increasing the customer conversion rate.
- Migrated and automated the deployment process of AWS Lambda procedures that process customer lifetime value, reducing the effort of maintenance and deployment.
- Build and maintain data pipelines on Apache Airflow
- Implemented a pipeline that processes data larger than 10 GB to infer personalized preferences to help increase customer satisfaction.
- Migrated a legacy Airflow 1.x server on AWS EC2 to Airflow 2.0.2 on AWS MWAA, sparing developers from dealing with legacy dependency issues, and created a development Airflow environment for running experiments without affecting the production pipeline.
- Refactored the data-writing mechanism, reducing both data write time and AWS S3 cost.
- Built alerts and dashboards with DataDog, Prometheus, and Kibana to monitor pipeline metrics and minimize troubleshooting effort.
- Standardize and maintain software engineering practices
- Created and maintained project templates with automatic code quality checks, testing, containerization, project versioning, releasing, and deployment, plus a standard workflow for updating tools in existing projects. This reduced project creation time and communication overhead during code review, and gave developers an easy way to introduce new standards.
- Implemented a life-cycle configuration management tool and a workflow for creating Amazon SageMaker notebook instances, saving data scientists' time on engineering problems.
- Improved container build time and reduced execution time by 70% for Jenkins CI/CD pipelines.
- Maintain a core package used by most existing projects
- Optimized SQL in a data pipeline and reduced the execution time from infeasible to within half a day.
- Cooperate with overseas teams in the US, Ukraine, and India
[Jan 2019 - March 2019] Project Manager, DLT Lab
- Containerized and fixed legacy projects in The Mosquito Man
- Introduced code review culture to a newly formed team
- Set up a Drone CI/CD server and created CI pipelines for two ongoing projects
[May 2018 - Nov 2018] Chief Teaching Assistant, X-Village
- Managed the executive team with 16 members
- Organized two months of full-time courses and a one-semester, 3-credit course
- Reviewed the teaching proposal of the Python course, "Programming Design Foundation"
- Designed exercises for "Data Structure," the first section of "Computer Science Foundations"
- Lectured "Web Programming, Database/Cloud Computing," the fourth section of "Computer Science Foundations"
X-Village is an experimental education program that aims to equip students who do not major in computer science with computational thinking and to enhance future cooperation between computer science and other fields.
I was the executor of the program and the leader of the teaching assistant team. In addition, I designed a half-day exercise for Data Structure and delivered a four-hour web backend lecture for Web Programming, Database/Cloud Computing.
[July 2015 - July 2016] Substitute Military Service, K-12 Education Administration, Ministry of Education
- Maintained legacy systems implemented in multiple languages, including C#, VBScript, PHP, etc.
- Developed automation programs for generating reports, which saved 80% of human labor time
- Delivered a human resource management system using Django
Community Involvement
[Nov 2021 - Present] Vice-Chairperson, PyCon APAC 2022
- Coordinated 3 squads, including planning, sponsorship, and social media
- Hosted the first Ask Me Anything event for promoting Call for Proposals
[Oct 2020 - Nov 2021] Chairperson, PyCon Taiwan 2021
- Coordinated 9 teams and hosted the first online PyCon TW with 550 participants
[Dec 2019 - Sep 2020] Program Chair, PyCon Taiwan 2020
- Coordinated around 20 team members and introduced community tracks and a speaker-dispatch program to increase the interaction between local communities.
[Jul 2019 - Nov 2019] Program Committee Member, PyCon Taiwan 2019
- Contacted keynote speakers and financial aid applicants
- Contributed to the post-event report generator
Talk and Tutorial
- Python Table Manners
- 2020/11/7 Taichung.py: slide
- 2020/10/16 Hualien.py: slide
- 2020/08/31 Kaohsiung.py: slide
- 2020/07/24 EuroPython 2020
- 2019/11/17 PyCon CA 2019: slide
- 2019/10/24 Taipei.py
- commitizen-tools: What can we gain from crafting a git message convention?
- 2020/06/18 Taipei.py: slide
- 2020/04/25 Remote Python Pizza 2020: slide
- How to get more than PyCon in a PyCon
- 2019/09/16 PyCon JP 2019 - Peer Reviewed Lightning Talk: slide
- X-Village - Preparing a Two-Month Course in Less Than Two Months
- 2019/03/24 SITCON 2019
- Intro to Python Data Science Tools
- CRUD in Flask
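- Information Management Lecture (a talk at the Industrial and Information Management Camp)
- 2017/01/22 2018 NCKU Industrial and Information Management Camp: slide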
- Bot Development
- 2016/12/08 NCKU CSIE - Introduction to Knowledge Discovery and Data Engineering: slide
- Keras Demo
For more slides, please check my Speaker Deck.
Award
- Honorable Mention, 2013 Railway Application Section Problem Solving Competition
Publication
- Wei Lee, Chien-Wei Chang, Po-An Yang, Chi-Hsuan Huang, Ming-Kuang Wu, Chu-Cheng Hsieh, Kun-Ta Chuang "Effective Quality Assurance for Data Labels through Crowdsourcing and Domain Expert Collaboration" 21st International Conference on Extending Database Technology, Demo Track (EDBT-2018)
- I-Lin Wang, Wei Lee, Chiao-Yu Liao "Effective Heuristics for Scheduling Hump and Pullback Engines in Railroad Yard Operational Plans" Proceedings of the 10th Annual Conference of the Operations Research Society at Taiwan (ORSTW 2013)
Education
[2016-2018]
Master, Computer Science and Information Engineering
National Cheng Kung University, Tainan
GPA: 4.16/4.3
[2011-2015]
Bachelor, Industrial and Information Management
Double Major: Computer Science and Information Engineering
National Cheng Kung University, Tainan
GPA: 3.77/4.0 (CSIE GPA: 3.87/4.0)
Additional Experience
Open Source Contributions
- commitizen-tools: Maintainer
- git-extras
- pycontw
- mail_handler: Author
- pycontw-postevent-report-generator: Maintainer
- beeware
- flask
- open-edx
- wtforms-json
- pipreqs
- pip-check
- pelican-clean-blog
- templater
- Update templater to Python3 and release templater3
Web Service
- SITW Second-Hand Marketplace (Backend Development)
Chat Bot
Utility
Tutorial and Study Note
Slide
- Git Tutorial
- Sample: Git-Tutorial-Sample
Books
MOOCs
- Machine Learning (Coursera)
- Intro to Machine Learning
- Intro to Data Science Udacity
- Assignments for Udacity Deep Learning class with TensorFlow