Vault Basics

Creating Vaults

You can create a vault as long as its name is unique within your account domain. Vault and object names are case-insensitive. Once you create a vault, you can add folders, upload files, and create datasets in it. To avoid an error when the vault may already exist, use get_or_create_by_full_path, which retrieves the vault by name if it exists and creates it otherwise:

Python and R examples:

from solvebio import Vault

# Create a vault by name (only if it doesn't exist) in your account domain
vault_x = Vault.get_or_create_by_full_path('Vault X')

# Create a vault (fails if it already exists)
vault_x = Vault.create(name='Vault X')

library(solvebio)

# Create a vault by name (only if it doesn't exist)
vault_x <- Vault.get_or_create_by_full_path('Vault X')

# Create a vault (fails if it already exists)
vault_x <- Vault.create(name='Vault X')
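Because vault names must be unique within a domain and are compared case-insensitively, "Vault X" and "vault x" refer to the same vault. The sketch below models the get-or-create behavior locally so you can see the rule in isolation. It does not call the SolveBio API; VaultRegistry is a hypothetical in-memory stand-in.

```python
from dataclasses import dataclass, field


@dataclass
class VaultRegistry:
    """Hypothetical in-memory stand-in for a domain's vaults (not the SolveBio API)."""
    vaults: dict = field(default_factory=dict)  # casefolded name -> display name

    def create(self, name):
        key = name.casefold()  # names are compared case-insensitively
        if key in self.vaults:
            raise ValueError("A vault named {!r} already exists".format(name))
        self.vaults[key] = name
        return name

    def get_or_create(self, name):
        # Return the existing vault if one matches (ignoring case); otherwise create it
        existing = self.vaults.get(name.casefold())
        return existing if existing is not None else self.create(name)


registry = VaultRegistry()
registry.get_or_create("Vault X")  # creates "Vault X"
registry.get_or_create("vault x")  # returns the existing "Vault X"
```

Calling create("vault x") at this point would raise, which mirrors why Vault.create fails when the vault already exists.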

Retrieving Vaults

You can retrieve any vault shared with you by its name or full path (e.g. domain:name). Your personal vault is a special case: its name is ~, which is also its full path. If the vault is shared with you from another organization, you must retrieve it by its full path (e.g. solvebio:public). Vault names are case-insensitive. You can also retrieve all vaults that match an advanced search query (e.g. user:username).

Python and R examples:

from solvebio import Vault

# Retrieve your personal vault
my_vault = Vault.get_personal_vault()

# Your personal vault also has the shortcut `~`
my_vault = Vault.get_by_full_path('~')

# Retrieve a shared vault by name
vault_x = Vault.get_by_full_path('Vault X')

# Retrieve a vault from a different domain
public_vault = Vault.get_by_full_path('solvebio:public')

# Retrieve a vault by ID
public_vault = Vault.retrieve('2956')

# Retrieve all vaults which match a given Advanced search query
specific_user_vaults = Vault.all(query='user:john')

library(solvebio)

# Retrieve your personal vault
my_vault <- Vault.get_personal_vault()

# Your personal vault also has the shortcut `~`
my_vault <- Vault.get_by_full_path('~')

# Retrieve a shared vault by name
vault_x <- Vault.get_by_full_path('Vault X')

# Retrieve a vault from a different domain
public_vault <- Vault.get_by_full_path('solvebio:public')

# Retrieve a vault by ID
public_vault <- Vault.retrieve('2956')

# Retrieve all vaults which match a given Advanced search query
specific_user_vaults <- Vault.all(query='user:john')
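The full paths used in this article take a few forms: ~ or ~/path for your personal vault, domain:vault for a vault, and domain:vault:/path for an object inside a vault (e.g. solvebio:public:/MEDLINE/2.3.3-2018). The sketch below splits those forms into their parts. parse_full_path and FullPath are hypothetical helpers written for illustration, not the client's own parser, and they do not work out which domain a bare vault name belongs to.

```python
from typing import NamedTuple, Optional


class FullPath(NamedTuple):
    domain: Optional[str]  # None when no domain is given (your own account domain)
    vault: str             # "~" denotes your personal vault
    path: str              # object path within the vault; "/" is the root


def parse_full_path(full_path):
    """Hypothetical parser for the full path forms shown in this article."""
    if full_path == "~" or full_path.startswith("~/"):
        # Personal vault shortcut, optionally followed by an object path
        return FullPath(None, "~", full_path[1:] or "/")
    # Split "domain:vault" (or "vault") from the ":/path" part, if present
    vault_part, sep, path = full_path.partition(":/")
    path = "/" + path if sep else "/"
    if ":" in vault_part:
        domain, vault = vault_part.split(":", 1)
        return FullPath(domain, vault, path)
    return FullPath(None, vault_part, path)
```

For example, parse_full_path('solvebio:public') gives domain "solvebio", vault "public", and path "/".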

Creating Folders

You can create folders in any vault for which you have write-level permissions. Folder names are case-insensitive. If you create a folder with a name that already exists, the vault appends an incrementing number to the name (e.g. folder, folder-1, folder-2, ...).

Python and R examples:

from solvebio import Vault

# First, retrieve the vault
vault = Vault.get_personal_vault()

# Create the folder at the root of the vault (path is optional)
folder = vault.create_folder('new-folder', path='/')

library(solvebio)

# First, retrieve the vault
vault <- Vault.get_personal_vault()

# Create the folder at the root of the vault
folder <- Vault.create_folder(vault$id, '/new-folder')
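The duplicate-name rule described above can be sketched as a small function that finds the next free name. This is an illustration only: next_available_name is hypothetical, it assumes suffixes start at -1 and that names are compared case-insensitively, and the server's exact behavior (for example, for file names with extensions) may differ.

```python
def next_available_name(name, existing_names):
    """Return `name`, or `name-1`, `name-2`, ... if that name is already taken.

    Hypothetical helper; the comparison ignores case because vault object
    names are case-insensitive.
    """
    taken = {n.casefold() for n in existing_names}
    candidate = name
    counter = 0
    while candidate.casefold() in taken:
        counter += 1
        candidate = "{}-{}".format(name, counter)
    return candidate
```

With existing names ["folder", "Folder-1"], the next folder named "folder" would become "folder-2".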

Uploading Files

You can upload files into any vault for which you have write-level access. File names are case-insensitive. If you upload a file with the same name as an existing file or folder, the new file's name is auto-incremented (e.g. file, file-1, file-2, ...).

Upload size limits

The maximum upload size is 5 GB. We recommend gzip-compressing large files before uploading them. If your files are larger than 5 GB, contact SolveBio Support.

Python and R examples:

from solvebio import Vault

# First retrieve the vault
vault = Vault.get_personal_vault()

# Upload your file into the root of the vault
vault.upload_file('data.csv', '/')

library(solvebio)

# First retrieve the vault
vault <- Vault.get_personal_vault()

# Upload your file into the root of the vault
Object.upload_file('./data.csv', vault$id, '/')

# You can also specify a new filename for the uploaded file
Object.upload_file('./data.csv', vault$id, '/', 'data_with_a_description.csv')
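Following the recommendation to gzip large files, the sketch below compresses a file with Python's standard gzip module and checks the result against the 5 GB limit before you upload it. The helper name gzip_for_upload is hypothetical, and reading "5 GB" as 5 × 10^9 bytes is an assumption; the exact server-side limit may be computed differently.

```python
import gzip
import os
import shutil

# Assumption: "5 GB" is read as 5 * 10**9 bytes; the server-side limit may differ.
MAX_UPLOAD_BYTES = 5 * 10**9


def gzip_for_upload(path):
    """Compress `path` to `path + '.gz'` and return the new path (hypothetical helper)."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        # Stream in chunks so large files are never loaded into memory at once
        shutil.copyfileobj(src, dst)
    if os.path.getsize(gz_path) > MAX_UPLOAD_BYTES:
        raise ValueError("{} is still larger than 5 GB; contact SolveBio Support".format(gz_path))
    return gz_path
```

You could then pass the returned path to the upload call, e.g. vault.upload_file(gzip_for_upload('data.csv'), '/').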

Batch Uploading (Python Only)

If you have many files to upload at once, you can use the upload command included with SolveBio's Python module. This command is designed to be idempotent: if it is run more than once, it cross-checks the files and uploads only the local files and folders that do not yet exist in the vault.

# Upload all the CSV files into the root of your personal vault
solvebio upload --full-path "~/" ./*.csv

# Create the target path if it does not exist
solvebio upload --full-path "~/some-non-existent-path" --create-full-path ./*.csv

# Upload CSV files, but exclude some of them by name
solvebio upload --full-path "~/" --exclude "old-csv-files/*" ./*.csv

# Run in dry-run mode to preview what would be uploaded
solvebio upload --full-path "~/" --exclude "old-csv-files/*" ./*.csv --dry-run

Note that files are compared by file name and by MD5 checksum.

For full usage, run:

solvebio upload --help
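The name-and-MD5 comparison that makes batch uploads idempotent can be sketched with Python's hashlib. This is not the CLI's code: it assumes a file counts as already uploaded only when both its name and its MD5 match, and remote_md5s is a hypothetical mapping of remote file names to MD5 digests that you would obtain separately.

```python
import hashlib
import os


def md5sum(path, chunk_size=1024 * 1024):
    """Compute a file's MD5 hex digest, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_upload(local_paths, remote_md5s):
    """Return the local paths whose name is missing remotely or whose MD5 differs.

    `remote_md5s` maps remote file names to MD5 digests (hypothetical input).
    """
    pending = []
    for path in local_paths:
        name = os.path.basename(path)
        if remote_md5s.get(name) != md5sum(path):
            pending.append(path)
    return pending
```

Running the same selection twice against an unchanged vault yields an empty list the second time, which is what makes repeated runs safe.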

Downloading Files

You can download any existing file from a vault (requires read access to the vault):

Python and R examples:

from solvebio import Object

# Retrieve an existing file from your personal vault
csv_file = Object.get_by_full_path('~/data.csv')

# Download it to the current directory
csv_file.download('./')

library(solvebio)

# Retrieve an existing file from your personal vault
csv_file <- Object.get_by_full_path('~/data.csv')

# Get the download URL for the file and download it
url <- Object.get_download_url(csv_file$id)
download.file(url, 'data.csv')

To download multiple files, for example every file in a folder or every file that matches a search:

Python and R examples:

from solvebio import Object, Vault

# Retrieve a vault
vault = Vault.get_personal_vault()

# Download every file in a folder
folder = Object.get_by_full_path("vault:/path/to/folder")
for file_ in folder.files():
    file_.download()

# Search the vault for files matching 'xyz' and download each one
files = vault.search('xyz', object_type='file')
for file_ in files:
    file_.download()

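library(solvebio)

# Retrieve a vault
vault <- Vault.get_personal_vault()

# Search the vault for files matching 'xyz' and download each one
files <- Vault.search(vault$id, 'xyz', object_type='file')

for (file in files) {
    url <- Object.get_download_url(file$id)
    download.file(url, file$name)
}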

For more information on searching for files in a vault, refer to Searching within Vaults below.

Downloading using the Python client

The solvebio command-line tool that ships with the Python client can also download individual files or entire folders.

# Download a single file
solvebio download "~/path/to/file.txt" .

# Download a folder
solvebio download --recursive "~/path/to/folder" local_folder

# Download a folder, but exclude hidden files and folders
solvebio download --recursive "~/path/to/folder" local_folder --exclude "*/.*"

# Download a folder, but exclude .DS_Store files
solvebio download --recursive "~/path/to/folder" local_folder --exclude "*/.DS_Store"

# Download only PDF files within a folder
# --include always supersedes --exclude
solvebio download --recursive "~/path/to/folder" local_folder --exclude "*" --include "*.pdf"

# The --delete flag will delete local files that do not match
# those found in the vault. Always use the --dry-run mode first
# with this option as it will delete files permanently.
solvebio download --recursive "~/path/to/folder" local_folder --delete --dry-run

For full usage, run:

solvebio download --help
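The --exclude/--include rules above, including "--include always supersedes --exclude", can be modeled with Python's fnmatch module. This is a sketch under the assumption that patterns are Unix shell-style wildcards matched against each file's relative path; should_download is a hypothetical helper, not the CLI's implementation.

```python
from fnmatch import fnmatchcase


def should_download(path, excludes=(), includes=()):
    """Decide whether `path` is kept, given exclude and include patterns.

    A path matching any include pattern is always kept (include supersedes
    exclude); otherwise it is dropped if it matches any exclude pattern.
    fnmatchcase is used so matching is case-sensitive on every platform.
    """
    if any(fnmatchcase(path, pattern) for pattern in includes):
        return True
    return not any(fnmatchcase(path, pattern) for pattern in excludes)
```

Note that in fnmatch-style matching "*" also matches "/", so a pattern like "*/.*" excludes hidden files as well as everything inside hidden folders, and "*" alone excludes everything not rescued by an --include pattern.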

Creating Datasets

While all vault providers support flat files, only vaults created with the "SolveBio" vault provider (the default provider) support datasets.

Please see the dataset documentation for information on working with datasets.

Metadata and Tags

You may add tags and metadata to any vault object (files, folders, and datasets).

Tags

Tags are case-insensitive strings stored as a list on each object. You can use tags to filter and search for objects.

Python and R examples:

from solvebio import Vault

# Upload a file or retrieve one
vault = Vault.get_personal_vault()
csv_file = vault.upload_file('data.csv', '/')

# Add some tags to the object
csv_file.tags = ['tag1', 'tag2']
csv_file.save()

# There are also shortcuts to add and remove tags
csv_file.tag('tag3')
csv_file.untag('tag1')

library(solvebio)

# Upload a file
vault <- Vault.get_personal_vault()
object <- Object.upload_file('./analysis.tsv', vault$id, '/')

# Add tags to the object
Object.update(object$id, tags=list("tag1", "tag2"))
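Because tags are case-insensitive, adding "TAG2" to an object that already has "tag2" should not create a second tag. The sketch below models tag and untag on a plain Python list under that assumption; add_tag and remove_tag are hypothetical local helpers, not the client's methods, and the sketch does not say which casing the server keeps.

```python
def add_tag(tags, tag):
    """Return a new tag list with `tag` added unless it is already present (ignoring case)."""
    if any(existing.casefold() == tag.casefold() for existing in tags):
        return list(tags)
    return list(tags) + [tag]


def remove_tag(tags, tag):
    """Return a new tag list without `tag`, matching case-insensitively."""
    return [existing for existing in tags if existing.casefold() != tag.casefold()]


tags = ["tag1", "tag2"]
tags = add_tag(tags, "TAG2")   # unchanged: "tag2" is already present
tags = add_tag(tags, "tag3")
tags = remove_tag(tags, "Tag1")
```

After these calls the list holds "tag2" and "tag3", matching the end state of the Python example above.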

Metadata

Metadata is stored as key/value pairs. Nested values are allowed, but we recommend keeping the metadata structure flat.

Python and R examples:

from solvebio import Vault

# Upload a file or retrieve one
vault = Vault.get_personal_vault()
csv_file = vault.upload_file('data.csv', '/')

# Add metadata to the object
csv_file.metadata = {'file_type': 'CSV', 'project': 'My Project'}
csv_file.save()

library(solvebio)

# Upload a file
vault <- Vault.get_personal_vault()
object <- Object.upload_file('./analysis.tsv', vault$id, '/')

# Add metadata to the object
Object.update(object$id, metadata=list(file_type="CSV", project="My Project"))
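If your metadata starts out nested, you can flatten it before saving, in line with the recommendation above. This is a sketch; flatten_metadata is a hypothetical helper, and joining keys with a dot (e.g. project.name) is an assumed convention, not a SolveBio requirement.

```python
def flatten_metadata(metadata, parent_key="", sep="."):
    """Flatten nested dicts into a single level, joining keys with `sep`."""
    flat = {}
    for key, value in metadata.items():
        full_key = "{}{}{}".format(parent_key, sep, key) if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the joined key prefix
            flat.update(flatten_metadata(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat
```

The flattened dict can then be assigned to csv_file.metadata and saved as in the Python example above.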

Metadata Links

Any metadata values that contain links are displayed as links in the SolveBio UI.

Searching within Vaults

You can search for files, folders, and datasets within any vault by name or by other attributes. Use the advanced search query syntax (e.g. user:username) to build more specific queries.

Python and R examples:

from solvebio import Object, Vault

# Retrieve a vault
vault = Vault.get_personal_vault()

# Search across files, folders, and datasets in the vault
objects = vault.search('xyz')

# Search for a particular object type: file/folder/dataset
files = vault.search('xyz AND type:file')

# List all datasets in a vault
datasets = vault.datasets()

# List all datasets in a folder
folder = next(vault.folders())
datasets = folder.datasets()

# Find all objects matching an exact filename
data_objects = vault.objects(filename='data.csv')

# Find files that contain a string
samples = vault.files(query='tumor_sample_x')

# Find files with a specific path
samples = vault.files(query='/brca/october/samples')

# Find datasets
public_vault = Vault.get_by_full_path('solvebio:public')
clinvar = public_vault.datasets(query='clinvar')

# List all the child folders of a specific folder (subfolders)
path = 'solvebio:public:/MEDLINE/2.3.3-2018'
folder = Object.get_by_full_path(path)
child_folders = [i.filename for i in folder.folders()]

# Search for all XML files
xml_files = [i.filename for i in folder.search('*.xml.gz AND type:file')]

# Get all the files in a folder recursively
path = 'solvebio:public:/MEDLINE'
folder = Object.get_by_full_path(path)
files = folder.files(recursive=True)

library(solvebio)

# Retrieve a vault
vault <- Vault.get_personal_vault()

# Search across files, folders, and datasets in the vault
objects <- Vault.search(vault$id, query='xyz')

# Search for a particular object type: file/folder/dataset
files <- Vault.search(vault$id, 'xyz', object_type='file')

# List all datasets in a vault
datasets <- Vault.datasets(vault$id)

# Find all objects matching an exact filename
data_objects <- Vault.objects(vault$id, filename='data.csv')

Advanced search

You can list all objects within a vault that match a specific pattern (e.g. all the files within a certain folder) by passing a case-insensitive regular expression to the regex parameter. Unless a regular expression is truly necessary, we strongly recommend using Object.search() instead.

Python and R examples:

from solvebio import Vault
from solvebio import Object

# Get the public vault
public_vault = Vault.get_by_full_path('solvebio:public')
# Find datasets using regex
clinvar_v2 = public_vault.datasets(regex='/ClinVar/2.*')

# List the dataset ids of every dataset that has Outcome somewhere in the path
all_outcomes = [d.id for d in Object.all(regex=".*Outcome.*", type='dataset')]

# List the filenames of all xml files within a specific path
path = 'solvebio:public:/MEDLINE/2.3.3-2018/updatefiles'
folder = Object.get_by_full_path(path)
xml_files = [i.filename for i in folder.files(regex="{}.*.xml.gz".format(folder.path))]
# Unix-style wildcards are also supported through the glob parameter
xml_files = [i.filename for i in folder.files(glob="{}*.xml.gz".format(folder.path))]

library(solvebio)

# Retrieve the SolveBio public vault
public <- Vault.get_by_full_path('solvebio:public')
# List all datasets within a specific folder using regex
all_clinvar_datasets <- Vault.datasets(public$id, regex='/ClinVar/.*')
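When you build a regular expression from a folder path, remember that an unescaped "." matches any character, so a pattern like "{}.*.xml.gz" can match more than intended. The sketch below compares a loose regex, an escaped regex, and a glob using only Python's standard library on made-up sample paths. It assumes full-string, case-insensitive matching; how the SolveBio server anchors a regex is not stated here, so treat this as an illustration of pattern semantics rather than of the API.

```python
import re
from fnmatch import fnmatchcase

folder_path = "/MEDLINE/2.3.3-2018/updatefiles"
paths = [
    "/MEDLINE/2.3.3-2018/updatefiles/a.xml.gz",
    "/MEDLINE/2.3.3-2018/updatefiles/bxml.gz",  # made-up name with no dot before "xml"
    "/MEDLINE/2.3.3-2018/updatefiles/c.txt",
]

# Unescaped: each "." in the pattern matches any character, so "bxml.gz" also matches
loose = "{}.*.xml.gz".format(folder_path)
# Escaped: re.escape makes the folder path literal, and "\." requires real dots
strict = re.escape(folder_path) + r"/.*\.xml\.gz"

loose_hits = [p for p in paths if re.fullmatch(loose, p, flags=re.IGNORECASE)]
strict_hits = [p for p in paths if re.fullmatch(strict, p, flags=re.IGNORECASE)]
# Glob: "*" matches any run of characters and "." is literal
glob_hits = [p for p in paths if fnmatchcase(p, folder_path + "/*.xml.gz")]
```

Here the loose regex picks up both .gz names, while the escaped regex and the glob keep only a.xml.gz; this is one reason the glob form is often the simpler choice.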

Move files between folders

You can find files in one folder using the search queries described above and move them to another folder by updating each file's parent_object_id.

Python and R examples:

from solvebio import Object

# Retrieve (or create) the destination folder and the current folder that holds your files
new_folder = Object.get_or_create_by_full_path("~/my/new/folder", object_type="folder")
current_folder = Object.get_or_create_by_full_path("~/my/existing/folder", object_type="folder")

# Query current folder for the specific files
files = current_folder.files(query="my_search_string")

# Change the parent ID of each file to move it into the new folder
for file_ in files:
    file_.parent_object_id = new_folder.id
    file_.save()

library(solvebio)

# Get the vault
vault_x <- Vault.get_or_create_by_full_path('Vault X')
# Create the folder in the vault
new_folder <- Vault.create_folder(vault_x$id, '/new-folder')

# Query current vault for the specific files
files <- Vault.search(vault_x$id, 'xyz', object_type='file')

# Update parent_object_id in order to move the file to the new folder
for (file in files) {
    Object.update(file$id, parent_object_id=new_folder$id)
}

Deleting Vaults and Objects

Deletions cannot be undone

Deleting vaults and folders will irreversibly delete all the objects within them. Deleting files and datasets will result in loss of access to the underlying data and cannot be undone.

You can delete any vault or object (file, folder, or dataset) that you have admin-level permissions on. Deleting a vault or folder will automatically delete all its contents.

Python and R examples:

from solvebio import Vault

# Create an empty folder in your personal vault
vault = Vault.get_personal_vault()
folder = vault.create_folder('test-delete-folder', path='/')

# Deletion of any object requires a confirmation from the user.
# You can disable this confirmation by passing the `force=True` flag.
folder.delete()
# Prompts: Are you sure you want to delete this object? [y/N]

library(solvebio)

# Create an empty folder in your personal vault
vault <- Vault.get_personal_vault()
folder <- Vault.create_folder(vault$id, '/test-delete-folder')

# Delete the folder
Object.delete(folder$id)

Last updated 2022-12-07.

Have questions or comments about this article? Get in touch with SolveBio Support by submitting a ticket or by sending us an email.