- [dagster-dbt] Customized asset keys can now be specified for dbt models via `meta.dagster.asset_key`. This field takes in a list of strings that are used as the components of the generated `AssetKey`.

  ```yaml
  version: 2

  models:
    - name: users
      config:
        meta:
          dagster:
            asset_key: ["my", "custom", "asset_key"]
  ```
- [dagster-dbt] Dagster groups can now be specified for dbt models via `meta.dagster.group`. This field takes in a string that is used as the Dagster group for the generated software-defined asset corresponding to the dbt model.

  ```yaml
  version: 2

  models:
    - name: users
      config:
        meta:
          dagster:
            group: "my_group"
  ```
- Fixed an issue where the `dagster-msteams` and `dagster-mlflow` packages could be installed with incompatible versions of the `dagster` package due to a missing pin.
- Fixed an issue where the `dagster-daemon run` command sometimes kept code server subprocesses open longer than it needed to, making the process use more memory.
- Previously, when using `@observable_source_asset`s with `AutoMaterializePolicy`s, it was possible for downstream assets to get “stuck”, not getting materialized when other upstream assets changed, or for multiple downstream materializations to be kicked off in response to the same version being observed multiple times. This has been fixed.
- [dagster-dbt] For the experimental `@dbt_assets`, `project_dir` and `target_path` in `DbtCliTask` are converted from type `str` to type `pathlib.Path`.
- [dagster-dbt] Output from the dbt process is now forwarded to `stdout`.
- [dagster-dbt] Freshness policies and auto-materialize policies for dbt models can now be configured with the `dagster` field under `+meta` configuration. The following are equivalent:

  Before:

  ```yaml
  version: 2

  models:
    - name: users
      config:
        dagster_freshness_policy:
          maximum_lag_minutes: 60
          cron_schedule: '0 9 * * *'
        dagster_auto_materialize_policy:
          type: 'lazy'
  ```

  After:

  ```yaml
  version: 2

  models:
    - name: users
      config:
        meta:
          dagster:
            freshness_policy:
              maximum_lag_minutes: 60
              cron_schedule: '0 9 * * *'
            auto_materialize_policy:
              type: 'lazy'
  ```
- Added support for Pythonic Config classes to the `@configured` API, which makes reusing op and asset definitions easier:

  ```python
  from dagster import Config, configured, op


  class GreetingConfig(Config):
      message: str


  @op
  def greeting_op(config: GreetingConfig):
      print(config.message)


  class HelloConfig(Config):
      name: str


  @configured(greeting_op)
  def hello_op(config: HelloConfig):
      return GreetingConfig(message=f"Hello, {config.name}!")
  ```
- Added `AssetExecutionContext` to replace `OpExecutionContext` as the context object passed in to `@asset` functions.
- `TimeWindowPartitionMapping` now contains an `allow_nonexistent_upstream_partitions` argument that, when set to `True`, allows a downstream partition subset to have nonexistent upstream parents.
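  A hedged sketch of how this might be used (the assets and offsets below are illustrative, not from the release notes):

  ```python
  from dagster import (
      AssetIn,
      DailyPartitionsDefinition,
      TimeWindowPartitionMapping,
      asset,
  )

  daily = DailyPartitionsDefinition(start_date="2023-01-01")


  @asset(partitions_def=daily)
  def upstream():
      ...


  # Each downstream partition depends on the prior day's upstream partition.
  # allow_nonexistent_upstream_partitions=True tolerates the very first
  # partition, whose day-before upstream partition does not exist.
  @asset(
      partitions_def=daily,
      ins={
          "upstream": AssetIn(
              partition_mapping=TimeWindowPartitionMapping(
                  start_offset=-1,
                  end_offset=-1,
                  allow_nonexistent_upstream_partitions=True,
              )
          )
      },
  )
  def downstream(upstream):
      ...
  ```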
- Unpinned the `alembic` dependency in the `dagster` package.
- [ui] A new “Assets” tab is available from the Overview page.
- [ui] The Backfills table now includes links to the assets that were targeted by the backfill.
- Dagster is now compatible with a breaking change introduced in `croniter==1.4.0`. Users of earlier versions of Dagster can pin `croniter<1.4`.
- Fixed an issue with `@run_failure_sensor`.
- Fixed an issue where `py.typed` was missing in the `dagster-graphql` package. Thanks @Tanguy-LeFloch!
- Evaluation history for `AutoMaterializePolicy`s will now be cleared after 1 week.
- [dagster-dbt] Several updates to the experimental `@dbt_assets`:
  - `profile` and `target` can now be customized on the `DbtCli` resource.
  - If a `partial_parse.msgpack` is detected in the target directory of your dbt project, it is now copied into the target directories created by `DbtCli` to take advantage of partial parsing.
  - The metadata of assets generated by `@dbt_assets` can now be customized by overriding `DbtManifest.node_info_to_metadata`.
  - Execution duration of dbt models is now added as default metadata to `AssetMaterialization`s.
- Fixed an issue where setting `serverK8sConfig.containerConfig.name` did not actually change the container name.
- `define_asset_job` now accepts a `hooks` argument.
- Added support for `sqlalchemy==2.x`.
- [helm] Added an `additionalInstanceConfig` key that allows you to supply additional configuration to the Dagster instance.
- [dagster-aws] The `EcsRunLauncher` now uses a different task definition family for each job, instead of registering a new task definition revision each time a different job is launched.
- [dagster-aws] The `EcsRunLauncher` now includes a `run_ecs_tags` config key that lets you configure tags on the launched ECS task for each run.
- Previously, if a sensor yielded a `SkipReason`, the `SkipReason` would be ignored. This has been fixed.
- Code servers started with the `dagster code-server start` CLI will now reload your code when it changes, instead of only reloading your code when you initiate a reload from the Dagster UI.
- Added a `metadata` parameter to `define_asset_job` (Thanks @Elliot2718!)
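  A minimal sketch (the job name, selection, and metadata values are illustrative):

  ```python
  from dagster import AssetSelection, define_asset_job

  # Attach arbitrary metadata to the job definition.
  nightly_job = define_asset_job(
      name="nightly_job",
      selection=AssetSelection.all(),
      metadata={"team": "data-platform"},
  )
  ```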
- [dagster-census] Updated `poll_sync_run` to handle the “preparing” status from the Census API (Thanks @ldnicolasmay!)
- `@observable_source_asset`-decorated functions can now return a `DataVersionsByPartition` to record versions for partitions.
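  A minimal sketch, assuming `@observable_source_asset` is given a partitions definition (the partition keys and version strings are illustrative):

  ```python
  from dagster import (
      DataVersionsByPartition,
      StaticPartitionsDefinition,
      observable_source_asset,
  )


  @observable_source_asset(partitions_def=StaticPartitionsDefinition(["us", "eu"]))
  def source_data():
      # Report an observed data version for each partition; Dagster uses these
      # versions to determine staleness of downstream assets.
      return DataVersionsByPartition({"us": "v1", "eu": "v2"})
  ```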
- [dagster-dbt] Several updates to the experimental `@dbt_assets`:
  - `DbtCliTask`s created by invoking `DbtCli.cli(...)` now have a method `.is_successful()`, which returns a boolean representing whether the underlying CLI process executed the dbt command successfully.
  - The descriptions of assets generated by `@dbt_assets` can now be customized by overriding `DbtManifest.node_info_to_description`.
  - Additional fixes and improvements for `@dbt_assets`.
- [dagster-cloud] The ECS agent now supports `server_ecs_tags` and `run_ecs_tags` config keys, which apply to each service or task created by the agent. See the docs for more information.
- `instance.get_run_partition_data` is now supported in Dagster Cloud.
- A `.env` file in the working directory when running `dagster dev` can now be used for Dagster system variables like `DAGSTER_HOME` or environment variables referenced in your `dagster.yaml` file using an `env:` key. Previously, setting a `.env` file only worked for environment variables referenced in your Dagster code.
- [dagster-graphql] `submit_job_execution` can now take in a `RunConfig` object. Previously, it could only take a Python dictionary with the run configuration.
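  A hedged sketch (the host, port, job name, and op config are illustrative):

  ```python
  from dagster import RunConfig
  from dagster_graphql import DagsterGraphQLClient

  client = DagsterGraphQLClient("localhost", port_number=3000)

  # A RunConfig object can now be passed directly in place of a raw dict.
  run_id = client.submit_job_execution(
      "my_job",
      run_config=RunConfig(ops={"my_op": {"config": {"param": "value"}}}),
  )
  ```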
- Fixed an issue with the `jobs` argument to `Definitions`.
- [ui] Fixed links that pointed at the old `/instance/runs` path instead of the new `/runs` path.
- [dagster-databricks] Added `create_databricks_run_now_op`, thanks @srggrs!
- Added a new `dagster code-server start` command that can be used to launch a code server, much like `dagster api grpc`. Unlike `dagster api grpc`, however, `dagster code-server start` runs the code in a subprocess, so it can reload code from the Dagster UI without needing to restart the command. This can be useful for jobs that load code from some external source and may want to reload job definitions without restarting the process.
- Added a `sensors.num_submit_workers` key to `dagster.yaml` that can be used to decrease latency when a sensor emits multiple run requests within a single tick. See the docs for more information.
- [dagster-k8s] The `k8s_job_executor` can now be used to launch each step of a job in its own Kubernetes pod, even if the Dagster deployment is not using the `K8sRunLauncher` to launch each run in its own Kubernetes pod.
- Fixed an issue when using the `io_manager_def` param on an asset.
- [helm] Fixed an issue where setting `maxResumeRunAttempts` to null in the helm chart would cause it to be set to a default value of 3 instead of disabling run retries.
- [dagster-k8s] Fixed an issue where the `k8s_job_executor` would sometimes fail with a 409 Conflict error after retrying the creation of a Kubernetes pod for a step, because the job had already been created during a previous attempt despite raising an error.
- [dagster-dbt] Fixed an issue where, if an `op_name` was passed to `load_assets_from_dbt_manifest` and a `select` parameter was specified, a suffix would be appended to the desired op name.
- Fixed an issue where `dagit` would sometimes raise JavaScript bundle loading errors.
- [dagster-dbt] Added a new implementation of the dbt resource, `DbtCli`, in `dagster_dbt.cli`. This new resource only supports `dbt-core>=1.4.0`.
- [dagster-dbt] Added a new decorator `@dbt_assets` in `dagster_dbt.asset_decorator` that allows you to specify a compute function for a selected set of dbt assets that are loaded as an `AssetsDefinition`.
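  A minimal sketch of the new decorator, assuming a manifest at `target/manifest.json` and the experimental `DbtCli`/`DbtManifest` APIs described above:

  ```python
  from dagster_dbt.asset_decorator import dbt_assets
  from dagster_dbt.cli import DbtCli, DbtManifest

  manifest = DbtManifest.read(path="target/manifest.json")


  @dbt_assets(manifest=manifest)
  def my_dbt_assets(context, dbt: DbtCli):
      # Invoke the dbt CLI and stream events back as materializations.
      yield from dbt.cli(["run"], manifest=manifest, context=context).stream()
  ```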
- Adding a `max_materializations_per_minute` parameter (with a default of 1) to `AutoMaterializePolicy.eager()` and `AutoMaterializePolicy.lazy()` allows you to set bounds on the volume of work that may be automatically kicked off for each asset. To restore the previous behavior, you can explicitly set this limit to `None`.
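  For example, to relax the limit for a particular asset (a sketch; the asset is illustrative):

  ```python
  from dagster import AutoMaterializePolicy, asset


  # Allow up to 10 auto-triggered materializations of this asset per minute,
  # instead of the default limit of 1.
  @asset(
      auto_materialize_policy=AutoMaterializePolicy.eager(
          max_materializations_per_minute=10,
      )
  )
  def my_asset():
      ...
  ```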
- `DailyPartitionsDefinition`, `HourlyPartitionsDefinition`, `WeeklyPartitionsDefinition`, and `MonthlyPartitionsDefinition` now support an `end_date` attribute.
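  For example (dates illustrative):

  ```python
  from dagster import DailyPartitionsDefinition

  # A bounded partition set: one partition per day for the first half of 2023.
  partitions_def = DailyPartitionsDefinition(
      start_date="2023-01-01",
      end_date="2023-07-01",
  )
  ```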
- Previously, when using `AutoMaterializePolicy`s or `build_asset_reconciliation_sensor`, a single new data version from an observable source asset could trigger multiple runs of the downstream assets. This has been fixed.
- Previously, it was possible to use `EnvVar` and `IntEnvVar` within raw run config, although they just returned the name of the env var rather than retrieving its value. This now raises an error directly.
- [dagster-dbt] Fixed an issue where `*` was being interpreted as a glob pattern, rather than as a dbt selector argument. We now explicitly set the default selection pattern as `fqn:*`.
- Fixed an issue where `dagster-cloud serverless deploy` did not create a unique image tag if the `--image` tag was not specified.
- [dagster-dbt] Added support for specifying `op_name` on `load_assets_from_dbt_project` and `load_assets_from_dbt_manifest` (thanks @wkeifenheim!)
- `TimeWindowPartitionMapping` objects are now current-time aware. Consequently, only upstream/downstream partitions that exist at the current time are returned.
- `ExecuteJobResult` was renamed to `JobExecutionResult` (`ExecuteJobResult` remains a deprecated alias).
- A new `AssetSelection.key_prefixes` method allows matching asset keys starting with a provided prefix.
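  For example (prefixes illustrative):

  ```python
  from dagster import AssetSelection

  # Match every asset whose key begins with the "raw" or "staging" prefix.
  selection = AssetSelection.key_prefixes("raw", "staging")
  ```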
- [dagster-azure] Added a new `ConfigurablePickledObjectADLS2IOManager` that uses Pythonic config.
- [dagster-gcp] The new `DataProcResource` follows the Pythonic resource system. The existing `dataproc_resource` remains supported.
- [dagster-snowflake] The `SnowflakeIOManager` now supports `private_key`s that have been `base64`-encoded to avoid issues with newlines in the private key. Non-`base64`-encoded keys are still supported. See the `SnowflakeIOManager` documentation for more information on `base64`-encoded private keys.
- Added a primary key constraint to the `kvs`, `instance_info`, and `daemon_heartbeats` tables for existing Postgres storage instances that migrated from before `1.2.2`. This should unblock users relying on the existence of a primary key constraint for replication.
- Fixed an issue with `SensorResult` evaluation where multipartitioned run requests containing a dynamic partition added in a dynamic partitions request object would raise an invalid partition key error.
- Fixed an issue with `TimeWindowPartitionMapping`.
- `ExecuteJobResult` is now a deprecated alias for the new name, `JobExecutionResult`.
- [dagster-airbyte] When providing an `airbyte_resource` to `load_assets_from_connections`, you may now provide an instance of the `AirbyteResource` class, rather than just `airbyte_resource.configured(...)` (thanks @joel-olazagasti!)
- [dagster-celery] Fixed an issue where the `dagster-celery` CLI accepted an inconsistent configuration format; it now matches the same format as the `celery_executor`. Thanks @boenshao!
- You can now set `ecs_timeout` in your ECS user code launcher config to extend how long the ECS agent polls for new code servers to start. Extending this timeout is useful if your code server takes an unusually long time to start up, for example because it uses a very large image.
- `load_assets_from_package_module` and the other core `load_assets_from_` methods now accept a `source_key_prefix` argument, which allows applying a key prefix to all the source assets that are loaded.
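  A sketch, assuming a package module containing your asset definitions (`my_project.assets` is hypothetical):

  ```python
  from dagster import load_assets_from_package_module

  from my_project import assets as assets_package  # hypothetical package

  # Prefix the keys of all source assets loaded from the package.
  package_assets = load_assets_from_package_module(
      assets_package,
      source_key_prefix="snowflake",
  )
  ```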
- `OpExecutionContext` now has an `asset_partitions_time_window_for_input` method.
- `RunFailureSensorContext` now has a `get_step_failure_events` method.
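  A hedged sketch of using it inside a run failure sensor (`notify` is a stand-in for your alerting integration):

  ```python
  from dagster import RunFailureSensorContext, run_failure_sensor


  def notify(message: str) -> None:
      # Stand-in for your alerting integration.
      print(message)


  @run_failure_sensor
  def report_failed_steps(context: RunFailureSensorContext):
      # get_step_failure_events returns the failure event for each failed step.
      failed_steps = [event.step_key for event in context.get_step_failure_events()]
      notify(f"Run {context.dagster_run.run_id} failed on steps: {failed_steps}")
  ```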
- The Pythonic resource system now supports a set of lifecycle hooks which can be used to manage setup and teardown:

  ```python
  from pydantic import PrivateAttr

  from dagster import ConfigurableResource


  class MyAPIClientResource(ConfigurableResource):
      api_key: str
      _internal_client: MyAPIClient = PrivateAttr()  # MyAPIClient: your own client class

      def setup_for_execution(self, context):
          # Construct the underlying client before execution begins.
          self._internal_client = MyAPIClient(self.api_key)

      def get_all_items(self):
          return self._internal_client.items.get()
  ```
- Added support for specifying input and output config on `ConfigurableIOManager`.
- `QueuedRunCoordinator` and `SubmitRunContext` are now exposed as public dagster exports.
- [ui] Downstream cross-location dependencies of all source assets are now visible on the asset graph. Previously these dependencies were only displayed if the source asset was defined as a regular asset.
- [ui] A new filtering experience is available on the Runs page after enabling the feature flag “Experimental Runs table view with filtering”.
- [dagster-aws] Allow the S3 compute log manager to specify a `show_url_only: true` config option, which will display a URL to the S3 file in dagit instead of the contents of the log file.
- [dagster-aws] `PickledObjectS3IOManager` now fully supports loading partitioned inputs.
- [dagster-azure] `PickledObjectADLS2IOManager` now fully supports loading partitioned inputs.
- [dagster-gcp] The new `GCSResource` and `ConfigurablePickledObjectGCSIOManager` follow the Pythonic resource system. The existing `gcs_resource` and `gcs_pickle_io_manager` remain supported.
- [dagster-gcp] The new `BigQueryResource` follows the Pythonic resource system. The existing `bigquery_resource` remains supported.
- [dagster-gcp] `PickledObjectGCSIOManager` now fully supports loading partitioned inputs.
- [dagster-postgres] The event watching implementation has been moved from a listen/notify-based approach to the polling watcher used by MySQL and SQLite.
- [dagster-slack] Added `monitor_all_repositories` to `make_slack_on_run_failure_sensor`, thanks @danielgafni!
- [dagster-snowflake] The new `SnowflakeResource` follows the Pythonic resource system. The existing `snowflake_resource` remains supported.
- Fixed a bug in `AssetMetadataValue.value` that would cause an infinite recursion error.
- [dagster-cloud] `volumes` and `volumeMounts` values have been added to the agent helm chart.
- [dagster-airbyte] `load_assets_from_airbyte_instance` and `load_assets_from_airbyte_project` now take a `connection_to_auto_materialize_policy_fn` argument for setting `AutoMaterializePolicy`s on Airbyte assets.
- [dagster-cloud] You can now run multiple replicas of the agent by setting the `NumReplicas` parameter on the agent template in CloudFormation, or the `dagsterCloudAgent.replicas` field in Helm.
- [dagster-cloud] Zero-downtime agent deployments can be enabled by setting the `enableZeroDowntimeDeploys` parameter to true in the CloudFormation stack for your agent.
- `AssetsDefinition.from_graph`, as well as the `@graph_asset` and `@graph_multi_asset` decorators, now support specifying `AutoMaterializePolicy`s.
- `async def` ops/assets no longer prematurely finalize async generators during execution.
- In certain situations, the asset reconciliation sensor (`build_asset_reconciliation_sensor`) could incorrectly launch new runs for partitions that already had an in-progress run. This has been fixed.
- Calling `run_request_for_partition` now throws an error. Instead, users should yield directly instantiated run requests via `RunRequest(partition_key=...)`.
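  For example, from within a schedule or sensor (a sketch; the partition key is illustrative):

  ```python
  from dagster import RunRequest

  # Instead of calling run_request_for_partition, construct the request directly:
  run_request = RunRequest(partition_key="2023-01-01")
  ```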
- `graph_asset` and `graph_multi_asset` now support specifying `resource_defs` directly (thanks @kmontag42)!
- [dagster-dbt] A `node_info_to_auto_materialize_policy_fn` param was added to the `load_assets_from_dbt_*` functions (thanks @askvinni)!
- Added a `partition_key` field to `RunStatusSensorContext` (thanks @pdstrnadJC)!
- Fixed an issue with the `latest_materialization_records_by_key` method on the multi-asset sensor context.
- Fixed an error that occurred when the `allPartitions` argument is provided when launching a backfill.