Usage¶
To use Dask Azure Blob FileSystem in a project:
from azureblobfs.dask import DaskAzureBlobFileSystem
This import registers the Azure Blob filesystem protocol with Dask. Next, import Dask's dataframe module to read the data:
import dask.dataframe as dd
Then you load your data as usual:
data = dd.read_csv("abfs://account_name/mycontainer/weather*.csv",
                   storage_options={"account_name": account_name,
                                    "account_key": account_key})
If you don’t provide account_name or account_key, you need to set them via the environment variables AZURE_BLOB_ACCOUNT_NAME and AZURE_BLOB_ACCOUNT_KEY, respectively. In that case your code becomes much simpler:
data = dd.read_csv("abfs://account_name/mycontainer/weather*.csv")
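For a quick local experiment, the variables can also be set from Python itself before the call; the values below are placeholders, not real credentials:

```python
import os

# Placeholder credentials for illustration only; in practice these would be
# set outside the process (shell profile, CI secrets store, etc.).
os.environ["AZURE_BLOB_ACCOUNT_NAME"] = "myaccount"
os.environ["AZURE_BLOB_ACCOUNT_KEY"] = "not-a-real-key"
```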
The account_name in the URL is the same value as AZURE_BLOB_ACCOUNT_NAME, so you can remove even more of the hardcoding:
import os

data = dd.read_csv("abfs://{account_name}/mycontainer/weather*.csv"
                   .format(account_name=os.environ.get("AZURE_BLOB_ACCOUNT_NAME")))
You don’t even have to hardcode abfs:// if you take it from DaskAzureBlobFileSystem.protocol. The code becomes more verbose, but has even less hardcoding:
data = dd.read_csv("{protocol}://{account_name}/mycontainer/weather*.csv"
                   .format(protocol=DaskAzureBlobFileSystem.protocol,
                           account_name=os.environ.get("AZURE_BLOB_ACCOUNT_NAME")))
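Putting the pieces together, a small helper can assemble the URL once and reuse it. This is a sketch, not part of the library: the `protocol` parameter defaults to the literal string "abfs" as a stand-in, whereas in real code you would pass DaskAzureBlobFileSystem.protocol:

```python
import os


def blob_url(container, pattern, protocol="abfs"):
    # Build an abfs:// URL from the account name stored in the environment.
    # "abfs" here is an assumed default; pass DaskAzureBlobFileSystem.protocol
    # in real code to avoid hardcoding it.
    account_name = os.environ["AZURE_BLOB_ACCOUNT_NAME"]
    return "{protocol}://{account_name}/{container}/{pattern}".format(
        protocol=protocol,
        account_name=account_name,
        container=container,
        pattern=pattern,
    )
```

The resulting string can then be passed straight to dd.read_csv, e.g. dd.read_csv(blob_url("mycontainer", "weather*.csv")).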