Usage

To use Dask Azure Blob FileSystem in a project:

from azureblobfs.dask import DaskAzureBlobFileSystem

This import registers the Azure Blob filesystem with Dask, so that Dask recognizes abfs:// URLs. Next we import Dask's dataframe API to read our data:

import dask.dataframe as dd
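
If you want to confirm the registration, the filesystem class exposes the URL scheme it handles; a quick sanity check:

print(DaskAzureBlobFileSystem.protocol)  # the scheme used in URLs, i.e. abfs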

Then you load your data as usual:

data = dd.read_csv("abfs://account_name/mycontainer/weather*.csv",
    storage_options={"account_name": account_name,
        "account_key": account_key})

If you don't provide account_name or account_key, you need to set them via the environment variables AZURE_BLOB_ACCOUNT_NAME and AZURE_BLOB_ACCOUNT_KEY respectively, in which case your code becomes much simpler:

data = dd.read_csv("abfs://account_name/mycontainer/weather*.csv")
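
If the variables are not already exported in your shell, you can also set them from Python before reading; the values below are placeholders for your real credentials:

import os

# Placeholder credentials; replace with your real account name and key.
os.environ["AZURE_BLOB_ACCOUNT_NAME"] = "account_name"
os.environ["AZURE_BLOB_ACCOUNT_KEY"] = "account_key"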

The account_name in the URL is the same value as AZURE_BLOB_ACCOUNT_NAME, so you can remove even more of the hardcoding:

data = dd.read_csv("abfs://{account_name}/mycontainer/weather*.csv"
    .format(account_name=os.environ.get("AZURE_BLOB_ACCOUNT_NAME")

You don't even have to hardcode abfs://; you can take it from DaskAzureBlobFileSystem.protocol. The code becomes more verbose, but contains even less hardcoding:

data = dd.read_csv("{protocol}://{account_name}/mycontainer/weather*.csv"
    .format(protocol=DaskAzureBlobFileSystem.protocol,
        account_name=os.environ.get("AZURE_BLOB_ACCOUNT_NAME")
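
If you build many URLs this way, you could wrap the pattern in a small helper. abfs_url below is a hypothetical convenience function, not part of azureblobfs:

import os

import dask.dataframe as dd
from azureblobfs.dask import DaskAzureBlobFileSystem

def abfs_url(path):
    # Hypothetical helper: prefixes a container path with the protocol
    # and the account name taken from AZURE_BLOB_ACCOUNT_NAME.
    return "{protocol}://{account_name}/{path}".format(
        protocol=DaskAzureBlobFileSystem.protocol,
        account_name=os.environ["AZURE_BLOB_ACCOUNT_NAME"],
        path=path)

data = dd.read_csv(abfs_url("mycontainer/weather*.csv"))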