
Guide on Logging Asset Metadata in Dagster

Last updated: Aug 10, 2023
Tags: Dagster

Dagster lets us log metadata when we materialize our assets. The logged metadata is then displayed in the Dagster UI.

Logging asset metadata

To demonstrate, let's set up the same Dagster environment as we did in our Getting Started with Dagster guide. Our project structure is like so:

my_dagster_code_location
├── __init__.py
└── my_assets.py

Where the __init__.py is:

from dagster import Definitions, load_assets_from_modules
from . import my_assets

all_assets = load_assets_from_modules([my_assets])
defs = Definitions(assets=all_assets)

Copy and paste the following code into my_assets.py:

from dagster import asset, MetadataValue, Output
import pandas as pd

@asset(name="iris_data")
def get_iris_data():
    df = pd.read_csv("https://raw.githubusercontent.com/SkyTowner/sample_data/main/iris_data.csv")
    return df

@asset(name="setosa")
def get_setosa(iris_data):
    df_setosa = iris_data.query("species == 0")
    return Output(
        value=df_setosa,
        metadata={
            "n_rows": len(df_setosa),
            "preview": MetadataValue.md(df_setosa.head().to_markdown()),
        }
    )

Here, note the following:

  • instead of returning df_setosa directly, we wrap the return value in Dagster's Output object. This allows us to log metadata (n_rows and preview in our case).

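  • Dagster renders metadata as markdown when the value is wrapped in MetadataValue.md(-). We use the Pandas DataFrame's to_markdown() method to convert the DataFrame into a markdown string. Note that to_markdown() requires the optional tabulate package to be installed.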

Let's now launch our Dagster UI like so:

dagster dev -m my_dagster_code_location
2023-07-15 13:22:52 +0800 - dagit - INFO - Serving dagit on http://127.0.0.1:3000 in process 49252

In the Dagster UI, materialize the setosa asset. Click on setosa and we will see the meta information that we logged earlier:

Great, we see our markdown parsed elegantly as a table!

NOTE

The other way of logging asset metadata is to call the context.add_output_metadata(-) method in a custom IO manager. This approach is explored in this section. The advantage of using an IO manager is that we do not need to wrap the return value of our function in Dagster's Output(-). This is useful because wrapping the return value makes the function harder to unit test.

Visualizing changes in metadata over time

Using the Dagster UI, we can visualize the changes in the value of metadata over time. Suppose we materialized the setosa asset multiple times, each time adding a random integer to the n_rows metadata value.
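
The guide does not show how the random integer was added, so the following is only one possible sketch. The helper name jittered_row_count and the offset range of 0 to 10 are assumptions made for illustration.

```python
import random


def jittered_row_count(n_rows: int, rng: random.Random, max_offset: int = 10) -> int:
    """Return n_rows plus a random integer in [0, max_offset].

    Hypothetical helper: it only exists to make the logged value differ
    between materializations so that the plot has something to show.
    """
    return n_rows + rng.randint(0, max_offset)
```

In get_setosa(-), the metadata entry would then become "n_rows": jittered_row_count(len(df_setosa), random.Random()).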

We can view the metadata plots by clicking on the asset in the graph like so:

As of now, Dagster does not support changing the scale of the Timestamp axis, which is set to days by default. Technically, there are several points plotted in the above time series graph, but because the interval between each run is too short, we see more or less a vertical line.

We can also access a bigger plot by navigating to the Assets tab in the header and clicking on our setosa asset. Next, we click on the Plots tab to see the same time-series graph:

Again, we cannot change the x-axis scale here for now 😟.

Logging images

Since Dagster allows logging in markdown format, we can also log images! Copy and paste the following code into my_assets.py:

from dagster import asset, MetadataValue, Output
import matplotlib.pyplot as plt
import pandas as pd

# for handling image
import base64
from io import BytesIO

@asset(name="iris_data")
def get_iris_data():
    df = pd.read_csv("https://raw.githubusercontent.com/SkyTowner/sample_data/main/iris_data.csv")
    return df

def get_img_as_md(df_setosa):
    plt.figure(figsize=(10, 6))
    plt.title("Setosa sepal length vs petal length")
    plt.scatter(df_setosa["sepal_length"], df_setosa["petal_length"])
    buffer = BytesIO()
    plt.savefig(buffer, format="png")
    # close the figure so that repeated materializations do not accumulate open figures
    plt.close()
    image_data = base64.b64encode(buffer.getvalue())
    return f"![img](data:image/png;base64,{image_data.decode()})"

@asset(name="setosa")
def get_setosa(iris_data):
    df_setosa = iris_data.query("species == 0")
    return Output(
        value=df_setosa,
        metadata={
            "n_rows": len(df_setosa),
            "preview": MetadataValue.md(df_setosa.head().to_markdown()),
            "plot": MetadataValue.md(get_img_as_md(df_setosa))
        },
    )

Here, the get_img_as_md(-) function returns an image encoded as base64 in markdown format. More specifically, the image is first generated using matplotlib and then written to an in-memory buffer created with BytesIO(). We then encode the contents of the buffer as base64 and embed the result in a markdown image string.
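
To make the encoding step concrete, here is a small sketch that uses only the standard library and is independent of matplotlib and Dagster. The helper names png_bytes_to_markdown and markdown_to_png_bytes are hypothetical. The first mirrors the last two lines of get_img_as_md(-), and the second reverses it, which can be handy for checking in a unit test that the logged string really contains the original image bytes.

```python
import base64
import re


def png_bytes_to_markdown(png_bytes: bytes, alt_text: str = "img") -> str:
    """Embed raw PNG bytes in a markdown image tag as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"![{alt_text}](data:image/png;base64,{encoded})"


def markdown_to_png_bytes(markdown: str) -> bytes:
    """Recover the raw bytes from a markdown string built by the helper above."""
    match = re.fullmatch(
        r"!\[[^\]]*\]\(data:image/png;base64,([A-Za-z0-9+/=]+)\)", markdown
    )
    if match is None:
        raise ValueError("not a markdown image with a base64 PNG data URI")
    return base64.b64decode(match.group(1))
```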

Back in our Dagster UI, materialize the assets once more. Click on the setosa asset to see:

Click on Show Markdown in the plot field to see our image:

Great, we managed to log a scatter plot image as metadata!

Adding description and metadata to assets

Besides the name property, we can supply other properties such as description and metadata like so:

from dagster import Definitions, asset
import pandas as pd

@asset(name="iris_data", description="My description", metadata={"key1": "val1", "key2": "val2"})
def get_iris_data():
    return pd.read_csv("https://raw.githubusercontent.com/SkyTowner/sample_data/main/iris_data.csv")

The description of the assets will be displayed in multiple places. Firstly, it will be displayed in the data lineage:

It will also be displayed in the assets catalog, which can be found by clicking on the Assets header:

The description as well as the asset metadata will be displayed in the data lineage screen when clicking on the asset:

Note that the metadata property here is not intended to describe the content of the asset (e.g. the number of rows of the outputted DataFrame), but rather the nature of the asset (e.g. the name of the person who wrote the code).

NOTE

The description property is parsed as markdown. For instance, consider the following:

@asset(name="iris_data", description="My **description**")
def get_iris_data():
    return pd.read_csv("https://raw.githubusercontent.com/SkyTowner/sample_data/main/iris_data.csv")

This will be rendered in the UI like so:

Notice how the description in the graph is parsed incorrectly although the description in the right panel is parsed correctly. One quick fix is to write in plain text for the first line, then switch to markdown in the subsequent lines:

@asset(name="iris_data", description="My description\n\nI am a **bold text**")
def get_iris_data():
    return pd.read_csv("https://raw.githubusercontent.com/SkyTowner/sample_data/main/iris_data.csv")

This will render the following:

Published by Isshin Inada