Managing Storage
TIDE storage is a shared resource, and we want to empower users to manage their storage allocations effectively. This guide covers options for managing your storage capacity for block, file, and object storage. If you find that you need more capacity, you can request an increase to existing storage through the process detailed in the Requesting Storage guide. When you no longer need your storage allocation, please use the same request form to let the TIDE Support Team know.
This documentation assumes that you are familiar with the content from the Storage Services overview and that you have some experience with the Linux command line. If you are not familiar with the Linux command line, we recommend this beginner’s guide; you can use a terminal in JupyterHub to follow along.
Managing Block and File Storage
Block and file storage are mounted as Linux volumes and can be managed in a similar fashion using several common Linux commands. It is important to check your usage of block and file storage periodically to avoid reaching full capacity, which can cause a variety of issues that may disrupt your workflow.
Checking Storage Capacity
You can check your storage capacity using the `df` command, which shows the total, used, and available capacity for storage volumes.

For example, on JupyterHub your persistent storage is provided by block storage, which is mounted at the path `/home/jovyan`. You can see your capacity by running this command:

```
df -h /home/jovyan
```

- Note: The `-h` flag prints storage sizes in human-readable format

Example output:

```
Filesystem      Size  Used Avail Use% Mounted on
/dev/drbd1024    75G  9.1G   66G  13% /home/jovyan
```

In this case, we have 75 GB total, 9.1 GB used, and 66 GB remaining.
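As a sketch of how you might automate this check, the short script below parses `df` output and warns when usage crosses a threshold. The target path and the 90% threshold are illustrative choices, not TIDE defaults; on JupyterHub you would point `TARGET` at `/home/jovyan`.

```shell
#!/bin/sh
# Warn when a volume is nearly full. TARGET and THRESHOLD are
# illustrative defaults; adjust them for your environment.
TARGET="${TARGET:-/}"
THRESHOLD="${THRESHOLD:-90}"

# -P forces POSIX single-line output, so the Use% value is always field 5
usage=$(df -P "$TARGET" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')

if [ "$usage" -ge "$THRESHOLD" ]; then
  echo "WARNING: $TARGET is ${usage}% full"
else
  echo "OK: $TARGET is ${usage}% full"
fi
```

You could run a script like this periodically (for example from a login script) to catch a filling volume before it disrupts your work.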
Checking Storage Usage
You can check how much storage is used by a specific directory and its subdirectories using the `du` command.

For example, if you wanted to see how much space your home directory in JupyterHub is using, you could run:

```
du -h /home/jovyan
```

- Note: The `-h` flag prints storage sizes in human-readable format

Example output:

```
2.1M    /home/jovyan/tutorial/LOGS
568K    /home/jovyan/tutorial/photos
 60M    /home/jovyan/tutorial
 20K    /home/jovyan/.scm_gui/userscripts
 64K    /home/jovyan/.scm_gui
   0    /home/jovyan/.scm/packages/AMS2024.1.packages
   0    /home/jovyan/.scm/packages
   0    /home/jovyan/.scm
8.6G    /home/jovyan
```

- Note: This output has been truncated for brevity
You could then inspect the directories to see if you still need the data that is stored there.
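To quickly spot where space is going, you can sort the `du` output by size. This is a small sketch; it assumes GNU coreutils, whose `sort -h` understands the human-readable sizes printed by `du -h`.

```shell
# Show the five largest entries under a path, biggest last.
# DIR defaults to the current directory; on JupyterHub you might
# point it at /home/jovyan instead.
DIR="${DIR:-.}"
du -h "$DIR" 2>/dev/null | sort -h | tail -n 5
```

The largest directories appear at the bottom of the output, which makes good candidates for cleanup easy to find.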
Managing Object Storage
Object storage is managed differently from block and file storage, since it is not mounted as a Linux volume. TIDE object storage is S3-compatible, so many S3 tools can be used to manage it. Like other S3-compatible systems, TIDE object storage organizes data into buckets.
Checking Usage
The National Research Platform (NRP) maintains a useful storage page that provides a high-level view of the object storage use for TIDE. This information is updated about once every 24 hours, so there may be a delay; you can check when the information was last updated using the “LastChecked” column.
To access this information follow these steps:
- Navigate to the storage page
- Sign in using your single sign-on credentials
- Click the “Pool” drop down and select “TIDE S3”
- Click the tidesupport user
- Peruse the object storage buckets and storage consumption
Using Rclone
Rclone is a popular command-line tool that offers file transfer support for several cloud providers. We recommend it here because it supports S3-compatible storage providers like TIDE’s Ceph object store. Rclone’s syntax is inspired by Linux file management commands, so it may feel somewhat familiar, though it does have some differences.
Configuring Rclone
Assuming that you have requested and been granted TIDE object storage access, you can use the following text guide to configure Rclone on your local computer. This text guide may be supplemented with this recording for configuring Rclone.
- Note: Rclone is updated frequently and the configuration options may change order; the important thing is to choose the “S3 Compliant” and “Ceph object storage” options
- Start the Rclone configuration: `rclone config`
- Create a new remote: `n`
- Enter a name; we recommend a short one for typing and for use in scripts: `s3`
- Search for the number associated with “Amazon S3 Compliant Storage Providers …”
  - At the time of writing this was option 4: `4`
- Search for the number associated with “Ceph Object Storage”
  - At the time of writing this was option 4: `4`
- Enter AWS credentials
  - At the time of writing this was option 1: `1`
  - Note: These will be displayed in plain text
- Enter your Access Key
- Enter your Secret Key
- Leave the Region empty (hit enter)
- Since we are setting this up on a local computer, we are external to the cluster, so we will use the external endpoint: `https://s3-tide.nrp-nautilus.io`
  - Note: If you are configuring this for use inside the cluster, i.e. in a batch job or JupyterHub, then the endpoint is `http://rook-ceph-rgw-tide.rook-tide`
  - Note: If you received a different set of endpoints when you were granted your S3 credentials, then please use those instead
- Leave the location constraint empty (hit enter)
- For the ACL, use “Owner gets FULL_CONTROL”
  - At the time of writing this was option 1: `1`
- Leave server-side encryption empty (hit enter)
- Leave the KMS ID empty (hit enter)
- Do not edit the advanced config: `n`
- Keep this remote: `y`
- You should now see a list of your remotes:

```
Current remotes:

Name                 Type
====                 ====
s3                   s3
```

- Quit rclone config: `q`
- Test your configuration: `rclone ls s3:my-bucket`
  - Note: Replace `s3` with your chosen remote name and `my-bucket` with your bucket name
  - If you did not get an error message, then your configuration was successful!
  - Note: You may not get any output; this just means that your bucket is empty
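If you prefer to script the setup rather than answer the interactive prompts, the same remote can be created in one command with `rclone config create`. This is a sketch: the remote name `s3` and the key values are placeholders you should replace with your own credentials and endpoint, and `acl private` corresponds to the “Owner gets FULL_CONTROL” choice above.

```shell
# Non-interactive equivalent of the interactive walkthrough above.
# Replace YOUR_ACCESS_KEY and YOUR_SECRET_KEY with your real credentials.
rclone config create s3 s3 \
  provider Ceph \
  access_key_id YOUR_ACCESS_KEY \
  secret_access_key YOUR_SECRET_KEY \
  endpoint https://s3-tide.nrp-nautilus.io \
  acl private
```

Note that, as with the interactive flow, the credentials end up in plain text in your Rclone configuration file.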
Common Rclone Commands
After configuring Rclone for TIDE object storage, you can use the following commands to transfer and interact with files. This text guide may be supplemented with this recording for how to use rclone. For a full list of available Rclone commands, please see the official rclone commands page.
Checking Endpoints
You can check your configured remotes in Rclone. Remotes are the endpoints (destinations) that you can transfer files to and from.
Example command:

```
rclone config
```

Example output:

```
Current remotes:

Name                 Type
====                 ====
s3                   s3
tide-s3              s3

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q>
```

You can exit this command by typing `q` and hitting enter.
Listing Directories and Files
You can list all of the directories in your bucket.
Example command:

```
rclone lsd s3:bucket
```

- Note: Replace `bucket` above with your bucket name

Example output:

```
0 1999-12-31 16:00:00        -1 rclone-recipe
```
You can list all of the files in your bucket, but we recommend only listing files in a specific directory if you have a lot of files.
Example command:

```
rclone ls s3:bucket
```

- Note: Replace `bucket` above with your bucket name
- Note: To `ls` a particular directory, supply the directory name after the bucket, like so: `s3:bucket/my-directory`

Example output:

```
121 rclone-recipe/copy-train-push.sh
101 rclone-recipe/hello.txt
173 rclone-recipe/train.py
```

- Note: The numbers in the first column are the file sizes in bytes
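Since the first column is a size in bytes, you can total a listing with a small `awk` pipeline. The sketch below runs against the sample output shown above; with a real bucket you would pipe `rclone ls s3:bucket` into the same `awk` command instead of the `printf`.

```shell
# Sum the first column (sizes in bytes) of an `rclone ls`-style listing.
# The printf simply reproduces the sample listing from above.
printf '%s\n' \
  '      121 rclone-recipe/copy-train-push.sh' \
  '      101 rclone-recipe/hello.txt' \
  '      173 rclone-recipe/train.py' |
awk '{ total += $1 } END { print total " bytes total" }'
# → 395 bytes total
```

For a whole-bucket total, Rclone also has a built-in `rclone size s3:bucket` command that reports the object count and total size directly.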
Copying Directories and Files
You can copy individual files to or from your bucket.
Example command:

```
rclone copy my-file.txt s3:bucket
```

- Note: Replace `bucket` above with your bucket name
- Note: To `copy` into a particular directory, supply the directory name after the bucket, like so: `s3:bucket/my-directory`
- Note: To copy from your bucket to the machine you are running Rclone on, simply reverse the order of the arguments above
- Note: For large transfers you can see progress by supplying the `-P` flag immediately following `rclone`

Example output:

(no output)

- Note: If you see output, please check it as it may indicate an error
You can copy entire directories to or from your bucket.
Example command:

```
rclone copy directory/ s3:bucket/directory
```

- Note: Replace `bucket` above with your bucket name
- Note: To copy from your bucket to the machine you are running Rclone on, simply reverse the order of `directory/` and `s3:bucket/directory` above

Example output:

(no output)

- Note: If you see output, please check it as it may indicate an error
Removing Directories and Files
You can remove files from your bucket.
Example command:

```
rclone deletefile s3:bucket/my-file.txt
```

- Note: Replace `bucket` above with your bucket name
- Note: TIDE object storage is not backed up, so please be careful when deleting data

Example output:

(no output)

- Note: If you see output, please check it as it may indicate an error
You can remove entire directories from your bucket.
- Note: TIDE object storage is not backed up, so please be careful when deleting data; you can preview what would be removed by adding the `--dry-run` flag

Example command:

```
rclone delete s3:bucket/directory
```

- Note: Replace `bucket` above with your bucket name

Example output:

(no output)

- Note: If you see output, please check it as it may indicate an error