Vault Basics
Creating Vaults
You can create a vault as long as its name is unique within your account domain. Vault and object names are case-insensitive. Once you create a vault, you can add folders, upload files, and create datasets. Because creating a vault fails if one with the same name already exists, a special method is provided that retrieves the vault by name if it exists and creates it otherwise:
Python:

    from solvebio import Vault

    # Create a vault by name (only if it doesn't exist) in your account domain
    vault_x = Vault.get_or_create_by_full_path('Vault X')

    # Create a vault (fails if it already exists)
    vault_x = Vault.create(name='Vault X')
R:

    library(solvebio)

    # Create a vault by name (only if it doesn't exist)
    vault_x <- Vault.get_or_create_by_full_path('Vault X')

    # Create a vault (fails if it already exists)
    vault_x <- Vault.create(name='Vault X')
Retrieving Vaults
You can retrieve any shared vault by name or by full path (e.g. domain:name). The only exception is your personal vault, which has the special name ~, which is also its full path. If a vault is shared with you from another organization, you must retrieve it by its full path (e.g. solvebio:public). Vault names are case-insensitive. You can also retrieve multiple vaults that match an advanced search query (e.g. user:username).
Python:

    from solvebio import Vault

    # Retrieve your personal vault
    my_vault = Vault.get_personal_vault()

    # Your personal vault also has the shortcut `~`
    my_vault = Vault.get_by_full_path('~')

    # Retrieve a shared vault by name
    vault_x = Vault.get_by_full_path('Vault X')

    # Retrieve a vault from a different domain
    public_vault = Vault.get_by_full_path('solvebio:public')

    # Retrieve a vault by ID
    public_vault = Vault.retrieve('2956')

    # Retrieve all vaults which match a given advanced search query
    specific_user_vaults = Vault.all(query='user:john')
R:

    library(solvebio)

    # Retrieve your personal vault
    my_vault <- Vault.get_personal_vault()

    # Your personal vault also has the shortcut `~`
    my_vault <- Vault.get_by_full_path('~')

    # Retrieve a shared vault by name
    vault_x <- Vault.get_by_full_path('Vault X')

    # Retrieve a vault from a different domain
    public_vault <- Vault.get_by_full_path('solvebio:public')

    # Retrieve a vault by ID
    public_vault <- Vault.retrieve('2956')

    # Retrieve all vaults which match a given advanced search query
    specific_user_vaults <- Vault.all(query='user:john')
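To make the full-path forms used on this page easier to read (`~`, `Vault X`, `solvebio:public`, `solvebio:public:/MEDLINE/2.3.3-2018`), the following is a minimal local sketch that splits such a string into its parts. `split_full_path` and `VaultPath` are hypothetical names introduced for illustration; they are not part of the SolveBio client, which performs its own parsing and validation.

```python
from collections import namedtuple

# Hypothetical container for the three parts of a full path.
VaultPath = namedtuple("VaultPath", ["domain", "vault", "path"])


def split_full_path(full_path):
    """Split a full path string into (domain, vault, path).

    Assumption: vault names contain neither "/" nor ":".
    `domain` is None when no domain is named; `path` is None when the
    string refers to the vault itself rather than an object inside it.
    """
    # Everything from the first "/" onward is the object path, if any.
    head, slash, rest = full_path.partition("/")
    path = slash + rest if slash else None

    # What remains is "vault" or "domain:vault" (possibly with a trailing ":").
    parts = head.rstrip(":").split(":")
    if len(parts) == 1:
        domain, vault = None, parts[0]
    elif len(parts) == 2:
        domain, vault = parts
    else:
        raise ValueError("unexpected full path: {!r}".format(full_path))
    return VaultPath(domain, vault, path)


for example in ["~", "Vault X", "solvebio:public", "solvebio:public:/MEDLINE/2.3.3-2018"]:
    print(example, "->", split_full_path(example))
```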
Creating Folders
You can create folders only in vaults for which you have write-level permissions. Folder names are case-insensitive. If you attempt to create a folder with a duplicate name, the vault appends an incrementing number to the name (e.g. folder, folder-1, folder-2, ...).
Python:

    from solvebio import Vault

    # First, retrieve the vault
    vault = Vault.get_personal_vault()

    # Create the folder at the root of the vault (path is optional)
    folder = vault.create_folder('new-folder', path='/')
R:

    library(solvebio)

    # First, retrieve the vault
    vault <- Vault.get_personal_vault()

    # Create the folder at the root of the vault
    folder <- Vault.create_folder(vault$id, '/new-folder')
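The duplicate-name rule can be illustrated locally. The sketch below assumes the vault picks the smallest unused suffix and compares names case-insensitively; `next_available_name` is a hypothetical helper, and the actual naming is decided server-side and may differ in detail.

```python
def next_available_name(name, existing_names):
    """Return `name`, or `name-N` with the smallest unused N >= 1.

    Names are compared case-insensitively. This is a local illustration
    only; it does not call the SolveBio API.
    """
    taken = {existing.lower() for existing in existing_names}
    if name.lower() not in taken:
        return name
    counter = 1
    while "{}-{}".format(name, counter).lower() in taken:
        counter += 1
    return "{}-{}".format(name, counter)


existing = ["folder", "Folder-1"]
print(next_available_name("folder", existing))   # folder-2
print(next_available_name("reports", existing))  # reports
```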
Uploading Files
You can upload files into any vault for which you have write-level access. File names are case-insensitive. Uploading a file with a duplicate name (or the same name as a folder) causes the new file's name to be auto-incremented (e.g. file, file-1, file-2, ...).
Upload size limits
The maximum upload size is 5GB. We recommend gzipping large files before uploading them. Contact SolveBio Support if your files are larger than 5GB.
Python:

    from solvebio import Vault

    # First retrieve the vault
    vault = Vault.get_personal_vault()

    # Upload your file into the root of the vault
    vault.upload_file('data.csv', '/')
R:

    library(solvebio)

    # First retrieve the vault
    vault <- Vault.get_personal_vault()

    # Upload your file into the root of the vault
    Object.upload_file('./data.csv', vault$id, '/')

    # You can also specify a new filename for the uploaded file:
    Object.upload_file('./data.tsv', vault$id, '/', 'data_with_a_description.csv')
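As a sketch of the gzip recommendation, the helpers below compress a local file with the Python standard library and check the result against the size limit before you upload it. The names `gzip_file`, `fits_upload_limit`, and `MAX_UPLOAD_BYTES` are hypothetical, and reading "5GB" as 5 * 1024**3 bytes is an assumption.

```python
import gzip
import os
import shutil

# Assumption: the 5GB limit is interpreted as 5 * 1024**3 bytes.
MAX_UPLOAD_BYTES = 5 * 1024 ** 3


def gzip_file(src_path):
    """Write a gzip-compressed copy of `src_path` next to it; return its path."""
    dest_path = src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dest:
        # Stream in chunks so large files are never held in memory at once.
        shutil.copyfileobj(src, dest)
    return dest_path


def fits_upload_limit(path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```

The compressed path returned by `gzip_file` can then be passed to the upload call shown above.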
Batch Uploading (Python Only)
If you have many files to upload at once, you can use the upload command built into SolveBio's Python module.
This command is designed to be "idempotent", which means that if called more than once it will cross-check the files and upload only the local files and folders that do not yet exist in the vault.
    # Upload all the CSV files into the root of your personal vault
    solvebio upload --full-path "~/" ./*.csv

    # Create the target path if it does not exist
    solvebio upload --full-path "~/some-non-existent-path" --create-full-path ./*.csv

    # Upload CSV files, but exclude some of them by name
    solvebio upload --full-path "~/" --exclude old-csv-files/* ./*.csv

    # Run in dry-run mode to preview the upload before running it
    solvebio upload --full-path "~/" --exclude old-csv-files/* ./*.csv --dry-run
Note that files are compared by name and by MD5 checksum.
For full usage, run:
    solvebio upload --help
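The name-and-MD5 comparison can be sketched locally as follows. This is not the command's implementation: `files_to_upload` is a hypothetical helper, `remote_md5_by_name` stands in for a listing of what is already in the vault, and treating a same-name file with a different checksum as needing upload is an assumption.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_upload(local_paths, remote_md5_by_name):
    """Return local paths with no remote file of the same name and MD5.

    `remote_md5_by_name` is a hypothetical dict mapping remote file names
    to MD5 hex digests.
    """
    pending = []
    for path in local_paths:
        name = os.path.basename(path)
        if remote_md5_by_name.get(name) != md5_of_file(path):
            pending.append(path)
    return pending
```

Running the check twice against an up-to-date remote listing yields an empty list the second time, which is the sense in which the upload is idempotent.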
Downloading Files
You can download any existing file from a vault (requires read access to the vault):
Python:

    from solvebio import Object

    # Retrieve an existing file from your personal vault
    csv_file = Object.get_by_full_path('~/data.csv')

    # Download it to the current directory
    csv_file.download('./')
R:

    library(solvebio)

    # Retrieve an existing file from your personal vault
    csv_file <- Object.get_by_full_path('vault:/data.csv')

    # Get the download URL for the file and download it
    url <- Object.get_download_url(csv_file$id)
    download.file(url, 'data.csv')
If you want to download more than one file in the same folder:
Python:

    from solvebio import Object, Vault

    # Retrieve a vault
    vault = Vault.get_personal_vault()

    # Download every file in a folder
    folder = Object.get_by_full_path("vault:/path/to/folder")
    for file_ in folder.files():
        file_.download()

    # Search for particular files in the vault and download them
    files = vault.search('xyz', object_type='file')
    for file_ in files:
        file_.download()
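R:

    library(solvebio)

    # Retrieve a vault
    vault <- Vault.get_personal_vault()

    # Search for particular files in the vault and download them
    files <- Vault.search(vault$id, 'xyz', object_type='file')
    for (file in files) {
        url <- Object.get_download_url(file$id)
        download.file(url, file$name)
    }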
For more information on searching for files in a vault, refer to Searching within Vaults.
Downloading using the Python client
The Python client's command-line interface can also be used to download individual files or entire folders.
    # Download a single file
    solvebio download "~/path/to/file.txt" .

    # Download a folder
    solvebio download --recursive "~/path/to/folder" local_folder

    # Download a folder, but exclude hidden files and folders
    solvebio download --recursive "~/path/to/folder" local_folder --exclude "*/.*"

    # Download a folder, but exclude .DS_Store files
    solvebio download --recursive "~/path/to/folder" local_folder --exclude "*/.DS_Store"

    # Download only PDF files within a folder
    # --include always supersedes --exclude
    solvebio download --recursive "~/path/to/folder" local_folder --exclude "*" --include "*.pdf"

    # The --delete flag will delete local files that do not match
    # those found in the vault. Always use the --dry-run mode first
    # with this option as it will delete files permanently.
    solvebio download --recursive "~/path/to/folder" local_folder --delete --dry-run
For full usage, run:
    solvebio download --help
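The rule that `--include` supersedes `--exclude` can be sketched with Python's `fnmatch` module. This is an illustration under the assumption that patterns are shell-style globs matched against the whole path; `should_download` is a hypothetical helper, and the command's own matching may differ.

```python
from fnmatch import fnmatch


def should_download(path, include=(), exclude=()):
    """Decide whether `path` passes the filters; include supersedes exclude."""
    # A path matching any include pattern is always kept.
    if any(fnmatch(path, pattern) for pattern in include):
        return True
    # Otherwise a path matching any exclude pattern is dropped.
    if any(fnmatch(path, pattern) for pattern in exclude):
        return False
    # Paths that match neither list are kept by default.
    return True


paths = ["folder/report.pdf", "folder/notes.txt", "folder/.DS_Store"]
pdf_only = [p for p in paths if should_download(p, include=["*.pdf"], exclude=["*"])]
no_hidden = [p for p in paths if should_download(p, exclude=["*/.*"])]
print(pdf_only)
print(no_hidden)
```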
Creating Datasets
While all vault providers support flat files, only vaults created with the "SolveBio" vault provider (the default provider) support datasets.
Please see the dataset documentation for information on working with datasets:
- Learn more about creating datasets →
- Learn more about importing data →
- Learn more about transforming data →
- Learn more about querying datasets →
- Learn more about exporting data →
Metadata and Tags
You may add tags and metadata to any vault object (files, folders, and datasets).
Tags
Tags are case-insensitive lists of strings. Tags can be used to filter and search for objects.
Python:

    from solvebio import Vault

    # Upload a file or retrieve one
    vault = Vault.get_personal_vault()
    csv_file = vault.upload_file('data.csv', '/')

    # Add some tags to the object
    csv_file.tags = ['tag1', 'tag2']
    csv_file.save()

    # There are also shortcuts to add and remove tags
    csv_file.tag('tag3')
    csv_file.untag('tag1')
R:

    library(solvebio)

    # Upload a file
    vault <- Vault.get_personal_vault()
    object <- Object.upload_file('./analysis.tsv', vault$id, '/')

    # Add tags to the object
    Object.update(object$id, tags=list("tag1", "tag2"))
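Because tags are case-insensitive, it can help to deduplicate a tag list locally before saving it. The following is a minimal sketch with hypothetical helpers (`add_tag`, `remove_tag`) that operate on plain Python lists; it does not describe how the `tag()` and `untag()` shortcuts are implemented.

```python
def add_tag(tags, new_tag):
    """Return a copy of `tags` with `new_tag` appended unless already present.

    Presence is checked case-insensitively, so 'Tumor' and 'tumor' count
    as the same tag.
    """
    if new_tag.lower() in (tag.lower() for tag in tags):
        return list(tags)
    return list(tags) + [new_tag]


def remove_tag(tags, old_tag):
    """Return a copy of `tags` without any case-insensitive match of `old_tag`."""
    return [tag for tag in tags if tag.lower() != old_tag.lower()]


tags = add_tag(["tag1", "tag2"], "TAG1")  # unchanged: already present
tags = add_tag(tags, "tag3")
tags = remove_tag(tags, "Tag1")
print(tags)  # ['tag2', 'tag3']
```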
Metadata
Metadata are represented as key/value pairs. While nested key/value pairs are allowed, we recommend using a flat metadata structure.
Python:

    from solvebio import Vault

    # Upload a file or retrieve one
    vault = Vault.get_personal_vault()
    csv_file = vault.upload_file('data.csv', '/')

    # Add metadata to the object
    csv_file.metadata = {'file_type': 'CSV', 'project': 'My Project'}
    csv_file.save()
R:

    library(solvebio)

    # Upload a file
    vault <- Vault.get_personal_vault()
    object <- Object.upload_file('./analysis.tsv', vault$id, '/')

    # Add metadata to the object
    Object.update(object$id, metadata=list(file_type="CSV", project="My Project"))
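If your metadata starts out nested, one way to follow the flat-structure recommendation is to flatten it before assigning it to the object. The sketch below is a hypothetical helper (`flatten_metadata`) that joins nested keys with a dot; the separator and the key names in the example are assumptions chosen for illustration.

```python
def flatten_metadata(metadata, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict with joined keys.

    Assumes all keys are strings. Non-dict values are copied unchanged.
    """
    flat = {}
    for key, value in metadata.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the joined key prefix.
            flat.update(flatten_metadata(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


nested = {"file_type": "CSV", "project": {"name": "My Project", "year": 2018}}
print(flatten_metadata(nested))
# {'file_type': 'CSV', 'project.name': 'My Project', 'project.year': 2018}
```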
Metadata Links
Any metadata values that contain links are displayed as clickable links in the SolveBio UI.
Searching within Vaults
You can search for files, folders, and datasets within any vault by name or by other attributes. Use the advanced search query syntax (e.g. user:username) to search for anything.
Python:

    from solvebio import Object, Vault

    # Retrieve a vault
    vault = Vault.get_personal_vault()

    # Search across files, folders, and datasets in the vault
    objects = vault.search('xyz')

    # Search for a particular object type: file/folder/dataset
    files = vault.search('xyz AND type:file')

    # List all datasets in a vault
    datasets = vault.datasets()

    # List all datasets in a folder
    folder = next(vault.folders())
    datasets = folder.datasets()

    # Find all objects matching an exact filename
    data_objects = vault.objects(filename='data.csv')

    # Find files that contain a string
    samples = vault.files(query='tumor_sample_x')

    # Find files with a specific path
    samples = vault.files(query='/brca/october/samples')

    # Find datasets
    public_vault = Vault.get_by_full_path('solvebio:public')
    clinvar = public_vault.datasets(query='clinvar')

    # List all the child folders of a specific folder (subfolders)
    path = 'solvebio:public:/MEDLINE/2.3.3-2018'
    folder = Object.get_by_full_path(path)
    child_folders = [i.filename for i in folder.folders()]

    # Search for all XML files
    xml_files = [i.filename for i in folder.search('*.xml.gz AND type:file')]

    # Get all the files in a folder recursively
    path = 'solvebio:public:/MEDLINE'
    folder = Object.get_by_full_path(path)
    files = folder.files(recursive=True)
R:

    library(solvebio)

    # Retrieve a vault
    vault <- Vault.get_personal_vault()

    # Search across files, folders, and datasets in the vault
    objects <- Vault.search(vault$id, query='xyz')

    # Search for a particular object type: file/folder/dataset
    files <- Vault.search(vault$id, 'xyz', object_type='file')

    # List all datasets in a vault
    datasets <- Vault.datasets(vault$id)

    # Find all objects matching an exact filename
    data_objects <- Vault.objects(vault$id, filename='data.csv')
Advanced search
You can list all objects within a vault that match a specific pattern (e.g. all the files within a certain folder) by passing a case-insensitive regular expression to the regex parameter.
We highly recommend using Object.search() instead of searching by regular expression unless a regular expression is absolutely necessary.
Python:

    from solvebio import Vault
    from solvebio import Object

    # Get the public vault
    public_vault = Vault.get_by_full_path('solvebio:public')
    # Find datasets using regex
    clinvar_v2 = public_vault.datasets(regex='/ClinVar/2.*')

    # List the dataset ids of every dataset that has Outcome somewhere in the path
    all_outcomes = [d.id for d in Object.all(regex=".*Outcome.*", type='dataset')]

    # List the filenames of all xml files within a specific path
    path = 'solvebio:public:/MEDLINE/2.3.3-2018/updatefiles'
    folder = Object.get_by_full_path(path)
    xml_files = [i.filename for i in folder.files(regex="{}.*.xml.gz".format(folder.path))]
    # Unix style wildcards are supported too
    xml_files = [i.filename for i in folder.files(glob="{}*.xml.gz".format(folder.path))]
R:

    library(solvebio)

    # Retrieve the SolveBio public vault
    public <- Vault.get_by_full_path('solvebio:public')
    # List all datasets within a specific folder using regex
    all_clinvar_datasets <- Vault.datasets(public$id, regex='/ClinVar/.*')
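To build intuition for how a case-insensitive regular expression differs from a Unix-style wildcard, the sketch below matches both against a small list of paths using only the Python standard library. The paths are made up for illustration, and whether the server anchors a pattern at the start of the path (as `re.match` does here) is an assumption.

```python
import fnmatch
import re

# Hypothetical object paths, for illustration only.
paths = [
    "/ClinVar/2.0.0/Variants",
    "/clinvar/2.1.0/Variants",
    "/ClinVar/3.0.0/Variants",
    "/MEDLINE/2.3.3-2018/updatefiles/a.xml.gz",
]

# Case-insensitive regex: both "/ClinVar/2..." and "/clinvar/2..." match.
clinvar_v2 = re.compile(r"/ClinVar/2.*", re.IGNORECASE)
regex_hits = [p for p in paths if clinvar_v2.match(p)]

# Unix-style wildcard: "*" matches any run of characters, including "/".
glob_hits = [p for p in paths if fnmatch.fnmatchcase(p, "/MEDLINE/*.xml.gz")]

# In a regex, an unescaped "." matches any character; escape it to require
# a literal dot in the file extension.
loose = re.compile(r".*.xml.gz$")
strict = re.compile(r".*\.xml\.gz$")

print(regex_hits)
print(glob_hits)
print(bool(loose.match("/tmp/fooxml_gz")), bool(strict.match("/tmp/fooxml_gz")))
```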
Move files between folders
You can search for files in one folder using the search methods described above and then move them to another folder.
Python:

    from solvebio import Object

    # Get (or create) the current folder and the new folder you want to move your files to
    new_folder = Object.get_or_create_by_full_path("~/my/new/folder", object_type="folder")
    current_folder = Object.get_or_create_by_full_path("~/my/existing/folder", object_type="folder")

    # Query current folder for the specific files
    files = current_folder.files(query="my_search_string")

    # Change the parent id of each file in order to move it to the new folder
    for file_ in files:
        file_.parent_object_id = new_folder.id
        file_.save()
R:

    library(solvebio)

    # Get the vault
    vault_x <- Vault.get_or_create_by_full_path('Vault X')

    # Create the folder in the vault
    new_folder <- Vault.create_folder(vault_x$id, '/new-folder')

    # Query current vault for the specific files
    files <- Vault.search(vault_x$id, 'xyz', object_type='file')

    # Update parent_object_id in order to move the file to the new folder
    for (file in files) {
        Object.update(file$id, parent_object_id=new_folder$id)
    }
Deleting Vaults and Objects
Deletions cannot be undone
Deleting vaults and folders will irreversibly delete all the objects within them. Deleting files and datasets will result in loss of access to the underlying data and cannot be undone.
You can delete any vault or object (file, folder, or dataset) that you have admin-level permissions on. Deleting a vault or folder will automatically delete all its contents.
Python:

    from solvebio import Vault

    # Create an empty folder in your personal vault
    vault = Vault.get_personal_vault()
    folder = vault.create_folder('test-delete-folder', path='/')

    # Deletion of any object requires a confirmation from the user.
    # You can disable this confirmation by passing the `force=True` flag.
    folder.delete()
    # >>> Are you sure you want to delete this object? [y/N] y
R:

    library(solvebio)

    # Create an empty folder in your personal vault
    vault <- Vault.get_personal_vault()
    folder <- Vault.create_folder(vault$id, '/test-delete-folder')

    # Delete the folder
    Object.delete(folder$id)