
Vault Basics

Creating Vaults

You can create a vault as long as its name is unique within your account domain. Vault and object names are case-insensitive. Once you have created a vault, you can add folders, upload files, and create datasets. Because creating a vault fails if the name is already taken, a get-or-create method is provided that returns the existing vault when there is one:

from solvebio import Vault

# Create a vault by name (only if it doesn't exist) in your account domain
vault_x = Vault.get_or_create_by_full_path('Vault X')

# Create a vault (fails if it already exists)
vault_x = Vault.create(name='Vault X')

library(solvebio)

# Create a vault by name (only if it doesn't exist)
vault_x <- Vault.get_or_create_by_full_path('Vault X')

# Create a vault (fails if it already exists)
vault_x <- Vault.create(name='Vault X')
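
The difference between the two calls can be illustrated with a small in-memory model. This is only a sketch of the semantics described above, not the SolveBio client: `VaultRegistry`, `VaultExistsError`, and their methods are hypothetical names, and names are compared case-insensitively as the text states.

```python
class VaultExistsError(Exception):
    """Raised when creating a vault whose name is already taken."""


class VaultRegistry:
    """Hypothetical in-memory stand-in for one account domain."""

    def __init__(self):
        # Keys are lower-cased so lookups are case-insensitive.
        self._vaults = {}

    def create(self, name):
        key = name.lower()
        if key in self._vaults:
            raise VaultExistsError(name)
        self._vaults[key] = {"name": name}
        return self._vaults[key]

    def get_or_create(self, name):
        # Return the existing vault if there is one, otherwise create it.
        existing = self._vaults.get(name.lower())
        return existing if existing is not None else self.create(name)


registry = VaultRegistry()
first = registry.get_or_create("Vault X")
second = registry.get_or_create("vault x")  # same vault, different case
```

Calling `registry.create("VAULT X")` at this point would raise `VaultExistsError`, which mirrors why the get-or-create form is the safer choice in scripts that may run more than once.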

Retrieving Vaults

You can retrieve any shared vault by name or full path (e.g. domain:name). The only exception is your personal vault, which has a special name, ~, that is also its full path. If a vault is shared with you from another organization, you must retrieve it by its full path (e.g. solvebio:public). Vault names are case-insensitive.

from solvebio import Vault

# Retrieve your personal vault
my_vault = Vault.get_personal_vault()

# Your personal vault also has the shortcut `~`
my_vault = Vault.get_by_full_path('~')

# Retrieve a shared vault by name
vault_x = Vault.get_by_full_path('Vault X')

# Retrieve a vault from a different domain
public_vault = Vault.get_by_full_path('solvebio:public')

# Retrieve a vault by ID
public_vault = Vault.retrieve('2956')

library(solvebio)

# Retrieve your personal vault
my_vault <- Vault.get_personal_vault()

# Your personal vault also has the shortcut `~`
my_vault <- Vault.get_by_full_path('~')

# Retrieve a shared vault by name
vault_x <- Vault.get_by_full_path('Vault X')

# Retrieve a vault from a different domain
public_vault <- Vault.get_by_full_path('solvebio:public')

# Retrieve a vault by ID
public_vault <- Vault.retrieve('2956')
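
To make the full-path forms used on this page concrete (`Vault X`, `~`, `solvebio:public`, `~/data.csv`, `solvebio:public:/MEDLINE/...`), here is a minimal parsing sketch. `FullPath` and `split_full_path` are hypothetical helpers, not part of the SolveBio client, and the sketch assumes vault names contain neither `:` nor `/`.

```python
from typing import NamedTuple, Optional


class FullPath(NamedTuple):
    domain: Optional[str]  # None: resolve within your own account domain
    vault: str             # vault name, or "~" for the personal vault
    path: Optional[str]    # object path inside the vault, if present


def split_full_path(full_path):
    """Split 'domain:vault:/path' style strings into their parts."""
    # Everything from the first "/" onward is the object path.
    head, slash, tail = full_path.partition("/")
    path = "/" + tail if slash else None
    # Drop the ":" that separates the vault from the object path.
    head = head.rstrip(":")
    # The last ":"-separated piece is the vault; anything before it is the domain.
    domain, _, vault = head.rpartition(":")
    return FullPath(domain or None, vault, path)


print(split_full_path("solvebio:public"))
print(split_full_path("~/data.csv"))
```

A missing domain is reported as `None`, which corresponds to the case where a vault is looked up by name within your own account domain.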

Creating Folders

You can create folders in any vault for which you have write-level permissions. Folder names are case-insensitive. If you attempt to create a folder with a duplicate name, the vault will append an incrementing number to the name (e.g. folder, folder-1, folder-2, ...).

from solvebio import Vault

# First, retrieve the vault
vault = Vault.get_personal_vault()

# Create the folder at the root of the vault (path is optional)
folder = vault.create_folder('new-folder', path='/')

library(solvebio)

# First, retrieve the vault
vault <- Vault.get_personal_vault()

# Create the folder at the root of the vault
folder <- Vault.create_folder(vault$id, '/new-folder')
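
The duplicate-name rule can be sketched locally as follows. The renaming itself happens on the server and its exact algorithm is not documented here, so `next_available_name` is a hypothetical illustration that only reproduces the `folder, folder-1, folder-2` pattern with case-insensitive comparison.

```python
def next_available_name(name, existing_names):
    """Return `name`, or `name-N` with the lowest free N >= 1."""
    taken = {n.lower() for n in existing_names}  # names are case-insensitive
    if name.lower() not in taken:
        return name
    suffix = 1
    while "{}-{}".format(name, suffix).lower() in taken:
        suffix += 1
    return "{}-{}".format(name, suffix)


print(next_available_name("folder", ["Folder", "folder-1"]))
```

If the final name matters to later steps of a script, it may be safer to read it back from the object returned by the create call rather than to assume the requested name was used.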

Uploading Files

You can upload files into any vault to which you have write-level access. File names are case-insensitive. Uploading a file with a duplicate name (or the same name as a folder) will cause the new file's name to be auto-incremented (e.g. file, file-1, file-2, ...).

Upload size limits

The maximum upload size is 5GB. We recommend gzipping large files before uploading. Contact SolveBio Support if your files are larger than 5GB.

from solvebio import Vault

# First retrieve the vault
vault = Vault.get_personal_vault()

# Upload your file into the root of the vault
vault.upload_file('data.csv', '/')

library(solvebio)

# First retrieve the vault
vault <- Vault.get_personal_vault()

# Upload your file into the root of the vault
Object.upload_file('./data.csv', vault$id, '/')

# You can also specify a new filename for the uploaded file:
Object.upload_file('./data.tsv', vault$id, '/', 'data_with_a_description.csv')
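
For the size limit noted above, a pre-upload check might look like the sketch below. It uses only the Python standard library; `gzip_file` and `fits_upload_limit` are hypothetical helper names, and reading "5GB" as 5 * 1024**3 bytes is an assumption, since the page does not say whether the limit is binary or decimal.

```python
import gzip
import os
import shutil

# Assumption: "5GB" is read as 5 * 1024**3 bytes.
MAX_UPLOAD_BYTES = 5 * 1024 ** 3


def gzip_file(src_path):
    """Write a gzipped copy of src_path next to it and return its path."""
    dst_path = src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in chunks, so large files are fine
    return dst_path


def fits_upload_limit(path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```

A script could call `gzip_file('data.csv')`, confirm the result with `fits_upload_limit`, and only then pass the compressed path to `vault.upload_file`.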

Batch Uploading (Python Only)

If you have many files to upload at once, you can use the upload command built into SolveBio's Python module. This command is designed to be idempotent: if called more than once, it cross-checks the files and uploads only the local files and folders that do not yet exist in the vault.

# Upload all the CSV files into the root of your personal vault
solvebio upload --full-path "~/" ./*.csv

Note that comparison is performed by file name and not by file contents.
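
The name-based comparison can be pictured with a short sketch. It is not the CLI's actual implementation: `files_to_upload` is a hypothetical function, and comparing names case-insensitively is an assumption based on the statement that file names are case-insensitive.

```python
def files_to_upload(local_names, remote_names):
    """Return local names with no same-named remote counterpart."""
    # Names are compared case-insensitively; contents are never inspected.
    remote = {name.lower() for name in remote_names}
    return [name for name in local_names if name.lower() not in remote]


local = ["a.csv", "b.csv", "c.csv"]
remote = ["A.csv", "b.csv"]
pending = files_to_upload(local, remote)  # ["c.csv"]

# A second pass after uploading finds nothing left to do.
assert files_to_upload(local, remote + pending) == []
```

One consequence of comparing by name only is that a local file whose contents changed since the last upload is skipped, because a file with that name already exists in the vault.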

Downloading Files

You can download any existing file from a vault (requires read access to the vault):

from solvebio import Object

# Retrieve an existing file from your personal vault
csv_file = Object.get_by_full_path('~/data.csv')

# Download it to the current directory
csv_file.download('./')

library(solvebio)

# Retrieve an existing file from your personal vault
vault <- Vault.get_personal_vault()
csv_file <- Object.get_by_full_path(paste(vault$full_path, '/data.csv', sep=':'))

# Get the download URL for the file and download it
url <- Object.get_download_url(csv_file$id)
download.file(url, 'data.csv')

Creating Datasets

While all vault providers support flat files, only vaults created with the "SolveBio" vault provider (the default provider) support datasets.

Please see the dataset documentation for information on working with datasets.

Metadata and Tags

You may add tags and metadata to any vault object (files, folders, and datasets). Tags are lists of strings and metadata are represented by key/value pairs:

from solvebio import Vault

# Upload a file or retrieve one
vault = Vault.get_personal_vault()
csv_file = vault.upload_file('data.csv', '/')

# Add metadata and tags to the object
csv_file.metadata = {'file_type': 'CSV', 'project': 'My Project'}
csv_file.tags = ['CSVs', 'project files']
csv_file.save()

library(solvebio)

# Upload a file
vault <- Vault.get_personal_vault()
object <- Object.upload_file('./data.csv', vault$id, '/')

# Add metadata and tags to the object
Object.update(object$id, metadata=list(file_type="CSV", project="My Project"), tags=list("CSVs", "project files"))

Searching Vaults

You can search for files, folders, and datasets by name within any vault. You can also list all objects within a vault that match a specific pattern (e.g. all the files within a certain folder) by providing a case-insensitive regular expression to the regex parameter:

from solvebio import Object, Vault

# Retrieve a vault
vault = Vault.get_personal_vault()

# Search across files, folders, and datasets in the vault
objects = vault.search('xyz')

# Search for a particular object type: file/folder/dataset
files = vault.search('xyz', object_type='file')

# List all datasets in a vault
datasets = vault.datasets()

# Find all objects matching an exact filename
data_objects = vault.objects(filename='data.csv')

# Find files that contain a string
samples = vault.files(query='tumor_sample_x')

# Find files with a specific path
samples = vault.files(query='/brca/october/samples')

# Find datasets
public_vault = Vault.get_by_full_path('solvebio:public')
clinvar = public_vault.datasets(query='clinvar')

# Find datasets using regex
clinvar_v2 = public_vault.datasets(regex='/ClinVar/2.*')

# List the dataset ids of every dataset that has Outcome somewhere in the path
all_outcomes = [d.id for d in Object.all(regex=".*Outcome.*", type='dataset')]

# List the filenames of all xml files within a specific path
path = 'solvebio:public:/MEDLINE/2.3.3-2018/updatefiles'
folder = Object.get_by_full_path(path)
xml_files = [i.filename for i in folder.vault.files(regex=r"{}.*\.xml\.gz".format(folder.path))]

# List all the child folders of a specific folder (subfolders)
path = 'solvebio:public:/MEDLINE/2.3.3-2018'
folder = Object.get_by_full_path(path)
children_folders = [i.filename for i in folder.vault.folders() if i.parent_object_id == folder.id]

library(solvebio)

# Retrieve a vault
vault <- Vault.get_personal_vault()

# Search across files, folders, and datasets in the vault
objects <- Vault.search(vault$id, query='xyz')

# Search for a particular object type: file/folder/dataset
files <- Vault.search(vault$id, 'xyz', object_type='file')

# List all datasets in a vault
datasets <- Vault.datasets(vault$id)

# Find all objects matching an exact filename
data_objects <- Vault.objects(vault$id, filename='data.csv')

# Retrieve the SolveBio public vault
# List all datasets within a specific folder using regex
public <- Vault.get_by_full_path('solvebio:public')
all_clinvar_datasets <- Vault.datasets(public$id, regex='/ClinVar/.*')
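
To get a feel for how such patterns behave, they can be tried locally with Python's `re` module before being sent to the API. This is a sketch under stated assumptions: the server's regex dialect is not specified on this page, full-path matching is assumed because the examples above use `.*Outcome.*` to mean "somewhere in the path", and the object paths below are made up for illustration. `match_paths` is a hypothetical helper.

```python
import re


def match_paths(pattern, paths):
    """Return the paths that `pattern` matches in full, ignoring case."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return [p for p in paths if compiled.fullmatch(p)]


# Hypothetical object paths, for illustration only.
paths = [
    "/ClinVar/2.0.0/Variants",
    "/clinvar/3.7.4/Variants",
    "/MEDLINE/updatefiles/a.xml.gz",
    "/MEDLINE/updatefiles/a_xml_gz",
]

clinvar_v2 = match_paths("/ClinVar/2.*", paths)
# An unescaped "." matches any character, so this also accepts "a_xml_gz".
loose = match_paths("/MEDLINE/updatefiles/.*.xml.gz", paths)
# Escaping the dots restricts the match to a literal ".xml.gz" suffix.
strict = match_paths(r"/MEDLINE/updatefiles/.*\.xml\.gz", paths)
```

Here `clinvar_v2` contains only the 2.0.0 path, `loose` contains both MEDLINE entries, and `strict` contains only the one that really ends in `.xml.gz`.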

Deleting Vaults and Objects

Deletions cannot be undone

Deleting vaults and folders will irreversibly delete all the objects within them. Deleting files and datasets will result in loss of access to the underlying data and cannot be undone.

You can delete any vault or object (file, folder, or dataset) that you have admin-level permissions on. Deleting a vault or folder will automatically delete all its contents.

from solvebio import Vault

# Create an empty folder in your personal vault
vault = Vault.get_personal_vault()
folder = vault.create_folder('test-delete-folder', path='/')

# Deletion of any object requires a confirmation from the user.
# You can disable this confirmation by passing the `force=True` flag.
folder.delete()
# The call prompts: Are you sure you want to delete this object? [y/N]
# Answer y to proceed.

library(solvebio)

# Create an empty folder in your personal vault
vault <- Vault.get_personal_vault()
folder <- Vault.create_folder(vault$id, '/test-delete-folder')

# Delete the folder
Object.delete(folder$id)
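
The confirm-unless-forced behaviour described in the Python comments above can be sketched as a small helper. `confirm_delete` is hypothetical and is not how the client implements its prompt; accepting "yes" in addition to "y" is an assumption. The `ask` parameter exists only so the prompt can be replaced in tests.

```python
def confirm_delete(description, force=False, ask=input):
    """Return True if the deletion should proceed."""
    if force:
        return True  # skip the prompt entirely
    answer = ask("Are you sure you want to delete {}? [y/N] ".format(description))
    # "N" is the default: anything other than an explicit yes aborts.
    return answer.strip().lower() in ("y", "yes")


# Non-interactive example: simulate a user pressing Enter (the default, "N").
proceed = confirm_delete("this object", ask=lambda prompt: "")
```

Defaulting to "no" suits an operation that cannot be undone: an empty or mistyped answer leaves the data in place.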