Achieve 300 contributions in Apache Airflow

Category Tech

Following up on Achieve 200 contributions in Apache Airflow, here is a quick reflection on what I did between my 200th and 300th contributions.

It seems that most of the PRs still relate to Dataset/Asset. I thought I had spent more time on AIP-72 and AIP-83, but perhaps those are PRs created by others that I took over. Or maybe my memory is just failing me...

[Image: airflow-300-contributions]

In addition to the Airflow repo itself, I also spent a decent amount of time contributing to ruff's Airflow rules, which makes me the 29th-largest contributor to ruff. It's kind of weird, considering that I know almost nothing about Rust.

[Image: ruff]

Dataset / Asset Alias

  1. set "has_outlet_datasets" to true if "dataset alias" exists
  2. Add dataset alias unique constraint and remove wrong dataset alias removing logic
  3. docs(dataset): illustrate when dataset aliases are resolved
  4. fix DagPriorityParsingRequest unique constraint error when dataset aliases are resolved into new datasets
  5. fix(dag): avoid getting dataset next run info for unresolved dataset alias
  6. allow dataset alias to add more than one dataset event
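
Several of these PRs deal with the same underlying idea: a dataset alias is resolved into concrete datasets only at runtime, it may emit more than one dataset event, and the resolved pairs must stay unique. As a rough illustration of that behavior (plain Python, not Airflow's actual implementation; all names here are invented for the sketch):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """A concrete dataset, identified by its URI."""
    uri: str


@dataclass
class DatasetAlias:
    """An alias that is only resolved into concrete datasets at runtime."""
    name: str
    _events: list = field(default_factory=list)

    def add(self, dataset):
        # An alias may accumulate more than one dataset event per run.
        self._events.append(dataset)

    def resolve(self):
        # Deduplicate while preserving order, mimicking a unique
        # constraint on (alias, dataset) pairs.
        seen = set()
        unique = []
        for ds in self._events:
            if ds.uri not in seen:
                seen.add(ds.uri)
                unique.append(ds)
        return unique


alias = DatasetAlias("example-alias")
alias.add(Dataset("s3://bucket/a"))
alias.add(Dataset("s3://bucket/b"))
alias.add(Dataset("s3://bucket/a"))  # duplicate, dropped on resolve
print([d.uri for d in alias.resolve()])  # ['s3://bucket/a', 's3://bucket/b']
```

The real implementation stores these resolutions in the metadata database, which is why a unique constraint and queue entries for consuming DAGs show up as separate PRs.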

Dataset / Asset

  1. Add DatasetDagRunQueue to all the consuming DAGs of a dataset alias
  2. Fix dataset page cannot correctly load triggered dag runs due to lack of dagId
  3. Show only the source on the consumer DAG page and only triggered DAG run in the producer DAG page
  4. fix wrong link to the source DAG in consumer DAG's dataset event section
  5. feat(datasets): make strict_dataset_uri_validation default to True
  6. Rewrite how dag to dataset / dataset alias are stored
  7. Rewrite how DAG to dataset / dataset alias are stored
  8. fix(datasets/managers): fix error handling file loc when dataset alias resolved into new datasets

AIP-74, 75 - Data Asset and Asset Centric Syntax

  1. Rename dataset related python variable names to asset
  2. Rename Dataset database tables as Asset
  3. Rename dataset endpoints as asset endpoints
  4. fix(assets/managers): fix error handling file loc when asset alias resolved into new assets
  5. Rename dataset as asset in UI
  6. feat(providers/amazon): Use asset in common provider
  7. feat(providers/openlineage): Use asset in common provider
  8. feat(providers/fab): Use asset in common provider
  9. Add Dataset, Model asset subclasses
  10. fix(migration): fix dataset to asset migration typo
  11. Fix AIP-74 migration errors
  12. fix typo in dag_schedule_dataset_alias_reference migration file
  13. add migration file to rename dag_schedule_dataset_alias_reference constraint typo
  14. Resolve warning in Dataset Alias migration
  15. fix(providers/fab): alias is_authorized_dataset to is_authorized_asset
  16. fix(providers/amazon): alias is_authorized_dataset to is_authorized_asset
  17. remove the to-write asset active DAG warnings that already exist in the DB instead of those that do not exist
  18. Move Asset user facing components to task_sdk
  19. Add missing attribute "name" and "group" for Asset and "group" for AssetAlias in serialization, api and methods
  20. fix(scheduler_job_runner/asset): fix how asset dag warning is added
  21. Raise deprecation warning when accessing inlet or outlet events through str
  22. feat(dataset): allow "airflow.dataset.metadata.Metadata" import for backward compat
  23. feat(datasets): add backward compat for DatasetAll, DatasetAny, expand_alias_to_datasets and DatasetAliasEvent
  24. Respect Asset.name when accessing inlet and outlet events
  25. fix(providers/common/compat): add back add_input_dataset and add_output_dataset to NoOpCollector
  26. Raise deprecation warning when accessing metadata through str
  27. Fail a task if an inlet or outlet asset is inactive or an inactive asset is added to an asset alias
  28. feat(asset): change asset inactive warning to log Asset instead of AssetModel
  29. Combine asset events fetching logic into one SQL query and clean up unnecessary asset-triggered dag data
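
A large share of the AIP-74 work is mechanical renaming plus backward compatibility: old `Dataset` code paths must keep working while pointing at the new `Asset` types. The general shape of such a compat shim (a sketch of the pattern, not Airflow's actual code) is a subclass that warns on use:

```python
import warnings


class Asset:
    """The new name introduced by the rename."""

    def __init__(self, uri, name=None):
        self.uri = uri
        self.name = name or uri


class Dataset(Asset):
    """Deprecated alias kept so existing `Dataset(...)` calls keep working."""

    def __init__(self, uri, name=None):
        warnings.warn(
            "Dataset is deprecated; use Asset instead",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(uri, name=name)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ds = Dataset("s3://bucket/a")

print(isinstance(ds, Asset))  # True: old code transparently gets the new type
```

The same pattern explains entries like aliasing `is_authorized_dataset` to `is_authorized_asset` in providers: keep the old symbol, delegate to the new one, and warn.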

AIP-72 - Task Execution Interface

  1. feat(task_sdk): add support for inlet_events in Task Context

Tooling for migrating Airflow 2 to 3

  1. ci(github-actions): add uv to news-fragment action
  2. docs(newsfragment): fix typos in 41762, 42060 and remove unnecessary 41814
  3. docs(newsfragment): these deprecated things are functions instead of arguments
  4. docs(newsfragment): add template for significant newsfragments
  5. feat(cli): add "core.task_runner" and "core.enable_xcom_pickling" to unsupported config check to command "airflow config lint"
  6. Update existing significant newsfragments with the later introduced template format
  7. Extend and fix "airflow config lint" rules
  8. Backport "airflow config lint"
  9. Add newsfragment and migration rules for scheduler.dag_dir_list_interval → dag_bundles.refresh_interval configuration change
  10. Add missing significant newsfragments and migration rules needed
  11. ci(github-actions): add a script to check significant newsfragments
  12. docs(newsfragment): add significant newsfragment to PR 42252
  13. ci(github-actions): relax docutils version to support python 3.8
  14. docs(newsfragments): update migration rules status
  15. fix(task_sdk): add missing type column to TIRuntimeCheckPayload
  16. docs(newsfragments): update 46572 newsfragments content
  17. Fix significant format and update the checking script
  18. feat: migrate new config rules back to v2-10-test
  19. docs(newsfragments): update migration rules in newsfragments
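
Much of this tooling boils down to "airflow config lint": checking an Airflow 2 config against a table of options that were removed or renamed in Airflow 3. A minimal sketch of that rule table and check (the two rules shown come straight from the PRs above; everything else is simplified for illustration):

```python
# Options removed or renamed between Airflow 2 and 3 (illustrative subset).
# Keys are (section, option); a value of None means removed outright,
# otherwise the value is the (section, option) it was renamed to.
REMOVED_OR_RENAMED = {
    ("core", "task_runner"): None,
    ("core", "enable_xcom_pickling"): None,
    ("scheduler", "dag_dir_list_interval"): ("dag_bundles", "refresh_interval"),
}


def lint_config(config):
    """Return a human-readable finding for each bad (section, option) key."""
    findings = []
    for key in config:
        if key in REMOVED_OR_RENAMED:
            replacement = REMOVED_OR_RENAMED[key]
            if replacement is None:
                findings.append(f"removed option: {key[0]}.{key[1]}")
            else:
                findings.append(
                    f"renamed option: {key[0]}.{key[1]} -> "
                    f"{replacement[0]}.{replacement[1]}"
                )
    return findings


print(lint_config({
    ("core", "task_runner"): "StandardTaskRunner",
    ("scheduler", "dag_dir_list_interval"): "300",
    ("core", "parallelism"): "32",  # still valid, no finding
}))
```

Extending the rules and backporting the command then becomes a matter of growing that table, which is roughly what the PRs above do.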

AIP-83 amendment - Restore uniqueness for logical_date while allowing it to be nullable

  1. Set logical_date and data_interval to None for asset-triggered dags and forbid them to be accessed in context/template
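
The practical consequence of that PR is that asset-triggered runs carry no logical_date or data_interval, and the task context must fail loudly rather than hand templates a None. A toy version of that guard (not Airflow's actual Context class; names are illustrative):

```python
class Context(dict):
    """Template context that forbids fields unavailable to asset-triggered runs."""

    FORBIDDEN_WHEN_ASSET_TRIGGERED = {
        "logical_date",
        "data_interval_start",
        "data_interval_end",
    }

    def __init__(self, *, asset_triggered, **fields):
        super().__init__(**fields)
        self.asset_triggered = asset_triggered

    def __getitem__(self, key):
        if self.asset_triggered and key in self.FORBIDDEN_WHEN_ASSET_TRIGGERED:
            raise KeyError(f"{key} is not available for asset-triggered DAG runs")
        return super().__getitem__(key)


ctx = Context(asset_triggered=True, run_id="asset_triggered__2025")
print(ctx["run_id"])
# ctx["logical_date"] would raise KeyError instead of returning None
```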

Providers

  1. add missing sync_hook_class to CloudDataTransferServiceAsyncHook
  2. fix test_yandex_lockbox_secret_backend_get_connection_from_json by removing non-json extra
  3. handle ClientError raised after key is missing during DynamoDB table.get_item
  4. fix(providers/common/sql): add dummy connection setter for backward compatibility
  5. feat(providers/common/sql): add warning to connection setter
  6. fix(providers/databricks): remove additional argument passed to repair_run
  7. fix(provider/edge): add back missing method map
  8. docs(newsfragments): fix typo and improve significant newsfragment template

Misc

  1. fix(TriggeredDagRuns): fix wrong link in triggered dag run
  2. Return string representation if XComArgs existing during resolving and include_xcom is set to False
  3. Allowing DateTimeSensorAsync, FileSensor and TimeSensorAsync to start execution from trigger during dynamic task mapping
  4. refactor how triggered dag run url is replaced
  5. Change inserted airflow version of "update-migration-references" command from airflow_version='...' to airflow_version="..."
  6. Fix missing source link for the mapped task with index 0
  7. remove the removed --use-migration-files argument of "airflow db reset" command in run_generate_migration.sh
  8. docs(deferring): fix missing import in example and remove unnecessary example
  9. Set end_date and duration for triggers completed with end_from_trigger as True
  10. ci: improve check_deferrable_default script to cover positional variables
  11. Add warning that listeners can be dangerous
  12. ci: auto fix default_deferrable value with LibCST
  13. ci(pre-commit): lower minimum libcst version to 1.1.0 for python 3.8 support
  14. Autofix default deferrable with LibCST
  15. add "enable_tracemalloc" to log memory usage in scheduler
  16. ci(pre-commit): migrate pre-commit config
  17. fix(dag_warning): rename argument error_type as warning_type
  18. Add newsfragment PR 43393
  19. refactor(trigger_rule): remove deprecated NONE_FAILED_OR_SKIPPED
  20. Ensure check_query_exists returns a bool (#43978)

Okay, so that's it. If I ever write another summary like this, the next one will start with feat(api_fastapi): include asset ID in asset nodes when calling "/ui/dependencies" and "/ui/structure/structure_data" #47381.
