This article serves as reference for storage types provided by major cloud providers and trend I observe: using fuse to make “object storage” as “file storage”

Even “object storage” is very popular in cloud, there are lots of use cases to have shared “file storage” (e.g. multiple compute instances need to access the same filesystem, e.g. loading machine learning models, media files, etc.). Traditionally, all major cloud providers have file-share-like services, e.g. Azure files, AWS EFS, GCP Filestore. However, that usually means you need to store data on different places, e.g. object storage and file storage. Can you simplify the process and maintain one data repository? Can you store data in “object storage” and mount them as local filesystem. Recently, I found that Fuse comes to serve this purpose. Azure, AWS, GCP all have similar technologies. Also, on Azure, Azure storage keeps improving bandwidth, even higher than Azure files now. Next we will introduce Azure blobfuse.

Basic step

We will be using blobfuse v2

Install blobfuse v2

Blobfuse2 Build from source code

Installation DEB file

Download link: https://github.com/Azure/azure-storage-fuse/releases/tag/blobfuse2-2.0.2

Arch Linux

paru -S azure-storage-fuse
 
blobfuse2 --help
 
blobfuse2 --version

Configure BlobFuse2

Modify /etc/fuse.conf and uncomment user_allow_other

sed '/user_allow_other/s/^#//g' -i /etc/fuse.conf

Configure blobfuse v2 caching (blobfuse uses cache to speed up repeated file retrieval)

https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-how-to-deploy#configure-caching-for-smaller-files

  • RAM

    sudo mkdir /mnt/ramdisk
    sudo mount -t tmpfs -o size=2g tmpfs /mnt/ramdisk
    sudo mkdir /mnt/ramdisk/blobfuse2tmp
    sudo chown $USER /mnt/ramdisk/blobfuse2tmp

Set-up rsyslog

Set-up

sudo mkdir /etc/rsyslog.d
cd /etc/rsyslog.d
sudo wget https://raw.githubusercontent.com/Azure/azure-storage-fuse/main/setup/11-blobfuse2.conf

Set-up Logrotate

sudo wget -O blobfuse2 https://raw.githubusercontent.com/Azure/azure-storage-fuse/main/setup/blobfuse2-logrotate

Create mount folder

#+begin_src bash mkdir ~/mycontainer #+end_sre

Use this config file from blobfuse repo to populate config.yaml

# Refer ./setup/baseConfig.yaml for full set of config parameters
allow-other: false
logging:
  type: syslog
  level: log_debug
components:
  - libfuse
  - file_cache
  - attr_cache
  - azstorage
libfuse:
  attribute-expiration-sec: 120
  entry-expiration-sec: 120
  negative-entry-expiration-sec: 240
file_cache:
  path: /mnt/ramdisk/blobfuse2tmp
  timeout-sec: 120
  max-size-mb: 4096
attr_cache:
  timeout-sec: 7200
azstorage:
  type: block
  account-name: mystorageaccount
  account-key: mystoragekey
  endpoint: https://mystorageaccount.blob.core.windows.net
  mode: key
  container: mycontainer

Create group:

sudo groupadd fuse

Add to group:

sudo usermod -aG fuse yanboyang713

Mount with blobfuse

blobfuse2 mount /home/yanboyang713/mycontainer/ --config-file=/home/yanboyang713/fileCacheConfig-ok.yaml --ignore-open-flags --foreground=true --allow-other

NOTE: Ignoring invalid max threads value 4294967295 > max (100000). Please, go to Set the max threads value

Now you can access Blob through the mounted directory, and you can see the file in Blob

cd ~/mycontainer
mkdir test
echo "hello world" > test/blob.txt

To unmount

sudo blobfuse2 unmount ~/mycontainer

usr/bin/fusermount

blobfuse2 unmount ~/bf2a/ Error: failed to unmount home/yanboyang713/bf2a [exec: “fusermount”: executable file not found in $PATH]

Solution: sudo ln -s /usr/bin/fusermount3 /usr/bin/fusermount

Show mount

blobfuse2 mount list

Create User

sudo useradd -m azure

Create DIR

mkdir azure-storage-fuse
mkdir mntblobfuse

Create Blob Configure File: BlobConfigFile=/home/azure/azure-storage-fuse/blobfuse2.yaml

In modern Linux, systemd is to manage services in a robust way, providing fault-tolerance, proper initialization. Following is systemd example for blobfuse.

systemd

/etc/systemd/system/blobfuse2.service

Description=A virtual file system adapter for Azure Blob storage.
After=network.target
[Service]
# Configures the mount point.
Environment=BlobMountingPoint=<path/to/the/mounting/point>
# Config file path
Environment=BlobConfigFile=<path/to/the/config/file>
Type=forking
ExecStart=/usr/bin/blobfuse2 mount ${BlobMountingPoint} --config-file=${BlobConfigFile}
ExecStop=/usr/bin/blobfuse2 unmount ${BlobMountingPoint}
[Install]
WantedBy=multi-user.target

NOTE:

foreground: true

Start systemd unit

sudo systemctl daemon-reload
 
sudo systemctl start blobfuse2
 
sudo systemctl status blobfuse2
 
sudo systemctl enable blobfuse2

https://github.com/mikaelweave/blobfuse-automount/tree/master/etc https://github.com/Azure/azure-storage-fuse/tree/c8fa8aab4936dcfc32254b8d4f1de818b45bb7ac/systemd/without-config-file

Add an Existing User Account to a Group

usermod -a -G examplegroup exampleusername

How to make it more secure? You can see our storage account key is stored as plain text in a file. Keeping secret in a file is not that secure. While developers can securely store the secrets in Azure Key Vault, services need a way to access Azure Key Vault. Managed identities provide an automatically managed identity in Azure Active Directory for applications to use when connecting to resources that support Azure Active Directory (Azure AD) authentication. Applications can use managed identities to obtain Azure AD tokens without having to manage any credentials. Lots of Azure services support managed identities, e.g. you can assign managed identity to Azure VM, then the VM can use managed identity to access Azure resources (think about not VM accessing resources, but a specific application (therefore multiple VMs forming an application accessing services))

Use managed identity

https://techcommunity.microsoft.com/t5/azure-paas-blog/mount-blob-storage-on-linux-vm-using-managed-identities-or/ba-p/1821744

Troubleshoot

/var/log/blobfuse2.log https://github.com/Azure/azure-storage-fuse/blob/main/TSG.md

Reference List

  1. https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-what-is
  2. https://learn.microsoft.com/en-us/azure/storage/blobs/storage-how-to-mount-container-linux
  3. https://github.com/Azure/azure-storage-fuse
  4. https://aur.archlinux.org/packages/azure-storage-fuse
  5. https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-configuration
  6. https://toggen.com.au/it-tips/blobfuse2/