
Python: Global Search

Global Search allows you to search for vaults, files, folders, and datasets by name, tags, user, date, and other customizable metadata. For more information about Global Search, see the Global Search Overview section.

As in the web application, Global Search is also available through the SolveBio Python and R clients.

GitHub examples

The full example is available in the solvebio-python repository on GitHub: global search example notebook.

Importing the SolveBio library and logging in

# Importing SolveBio library
from solvebio import login
from solvebio import Filter
from solvebio import GlobalSearch

# Log in to SolveBio
login()

GlobalSearch performs a search based on the provided parameters (filters, entities, query, limit, ordering, etc.):

  • query (optional): Query string for an advanced search.
  • filters (optional): A Filter object or a list of Filter objects.
  • entities (optional): List of (entity type, entity) pairs to filter on.
  • ordering (optional): List of fields to order the results by.
  • limit (optional): Maximum number of results to return.
  • page_size (optional): Number of results to fetch per page.
  • result_class (optional): Class of the objects returned by the query.
  • debug (optional): Whether to send debug information to the API.
  • raw_results (optional): Whether to return the raw API response instead of casting results to Vault and Object instances.
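When search parameters come from user input, it can help to collect them in a dictionary and drop the ones that were not set before calling GlobalSearch. The helper below is a hypothetical sketch (build_search_kwargs is not part of the solvebio client); it only uses parameter names from the list above and normalizes entity pairs to the [type, value] list form used in the entity examples in this guide.

```python
def build_search_kwargs(query=None, entities=None, limit=None, ordering=None):
    """Hypothetical helper: collect GlobalSearch keyword arguments, skipping unset ones."""
    kwargs = {}
    if query:
        kwargs["query"] = query
    if entities:
        # Normalize (entity type, entity) pairs to [type, value] lists
        kwargs["entities"] = [[entity_type, entity] for entity_type, entity in entities]
    if limit is not None:
        kwargs["limit"] = limit
    if ordering:
        kwargs["ordering"] = list(ordering)
    return kwargs


kwargs = build_search_kwargs(query="test", entities=[("gene", "BRCA2")], limit=10)
print(kwargs)
# With a live SolveBio session, GlobalSearch(**kwargs) would run the search.
```

Passing only the parameters that were actually set keeps the call equivalent to writing the keyword arguments by hand.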

All parameters are optional. A search without any parameters is equivalent to running Global Search on SolveBio with no filters, so it returns all objects:

# No filters applied
search_results = GlobalSearch()
print('Returned {} objects.'.format(len(search_results)))
Returned 1449 objects.

Printing the search results shows the attributes of the first result object:

print(search_results)
|                    Fields | Data                                             |
|---------------------------+--------------------------------------------------|
|                   _errors | {  "samples": "'Dataset query failed: datase ... |
|                       _id | dataset-1426183806524170528                      |
|                created_at | 2021-01-12T17:02:38.336007+00:00                 |
|                 full_path | solvebio:public:/ClinVar/5.2.0-20210110/Variants-|
|                        id | 1426183806524170528                              |
|                indexed_at | 2021-11-25T17:22:55.233690+00:00                 |
|                      name | Variants-GRCH37-1                                |
|                    parent | 5.2.0-20210110                                   |
|                 parent_id | 1426114255474932968                              |
|                      path | solvebio:public:/ClinVar/5.2.0-20210110/Variants-|
| postproc_template_version |                                                  |
|                   samples | []                                               |
|             storage_class | Archive                                          |
|                      tags | ['fuji', 'clinvar', 'public data']               |
|                      type | dataset                                          |
|                updated_at | 2021-02-05T20:18:12.301835+00:00                 |
|                      user | Jeff                                             |
|                     vault | public                                           |
|                  vault_id | 7205                                             |

... 1448 more results.

Use the limit parameter to limit the number of returned objects:

# No filters applied with limit parameter
search_results = GlobalSearch(limit=200)
print('Returned {} objects.'.format(len(search_results)))
Returned 200 objects.

By default, each result is either a Vault instance or an Object instance:

# Type of results
type(search_results[0])
solvebio.resource.object.Object
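To get a quick overview of what a search returned, you can count results by their type field (vault, folder, dataset, and so on, as shown in the printed result table). This is a minimal sketch that assumes each result is dict-like and exposes type via .get(); count_by_type is a hypothetical helper, and the sample list below only stands in for real results.

```python
from collections import Counter


def count_by_type(results):
    """Hypothetical helper: count search results by their 'type' field.

    Assumes each result is dict-like and exposes the 'type' field shown in
    the printed result table; results without it are counted as 'unknown'.
    """
    return Counter(result.get("type", "unknown") for result in results)


# Stand-in results shaped like the printed table; with a live session you
# could pass GlobalSearch(limit=200) instead.
sample = [{"type": "dataset"}, {"type": "dataset"}, {"type": "vault"}, {}]
print(count_by_type(sample))
```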

Advanced search query

You can run an advanced search, as you would on SolveBio, by using the query argument:

# Advanced search
advanced_query_results = GlobalSearch(query="test")
print('Returned {} objects.'.format(len(advanced_query_results)))
Returned 16 objects.

Because query is the first positional argument, you can also pass the query string directly:

# Advanced search
advanced_query_results = GlobalSearch("fuji")
print('Returned {} objects.'.format(len(advanced_query_results)))
Returned 1408 objects.

Datasets that have the global beacon enabled can be found with an entity search. The entity can be passed either as a keyword argument or through the entity() method:

# Keyword-based entity search example
GlobalSearch(entities=[["gene", "BRCA2"]])

# Method-based entity search example (equivalent)
GlobalSearch().entity(gene="BRCA2")
|                    Fields | Data                                             |
|---------------------------+--------------------------------------------------|
|                   _errors | {  "samples": "Dataset query failed: dataset ... |
|                       _id | dataset-1658666726768179211                      |
|                created_at | 2021-11-29T11:24:42.093240+00:00                 |
|                 full_path | solvebio:public:/beacon-test-dataset             |
|                        id | 1658666726768179211                              |
|                indexed_at | 2022-01-13T09:59:14.378879+00:00                 |
|                      name | beacon-test-dataset                              |
|                    parent |                                                  |
|                 parent_id |                                                  |
|                      path | solvebio:public:/beacon-test-dataset             |
| postproc_template_version |                                                  |
|                   samples | []                                               |
|             storage_class | Standard-IA                                      |
|                      tags | ['fuji', 'test', 'public data', 'other tag']     |
|                      type | dataset                                          |
|                updated_at | 2022-01-13T09:59:14.268177+00:00                 |
|                      user | Nikola                                           |
|                     vault | public                                           |
|                  vault_id | 7205                                             |

... 2 more results.
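The two calls above express the same search. A small hypothetical helper makes the correspondence explicit by turning keyword arguments such as gene="BRCA2" into the [entity type, entity] pairs accepted by the entities parameter. entities_from_keywords is illustrative only and is not part of the solvebio client.

```python
def entities_from_keywords(**entities):
    """Hypothetical helper: convert entity(gene="BRCA2")-style keywords into
    the [entity type, entity] pairs used by GlobalSearch(entities=...)."""
    return [[entity_type, value] for entity_type, value in entities.items()]


pairs = entities_from_keywords(gene="BRCA2")
print(pairs)  # [['gene', 'BRCA2']]
# With a live session: GlobalSearch(entities=pairs)
```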

If no datasets contain the provided entity, the search returns no results:

# Entity search example
GlobalSearch(entities=[["variant", "GRCH38-7-140753336-140753336-T"]])
Query returned 0 results.

You can combine multiple parameters to narrow down the search results. For example, you can set the entities and query parameters together:

# Multiple search parameters
GlobalSearch(entities=[["gene","BRCA2"]], query="test")
|                    Fields | Data                                             |
|---------------------------+--------------------------------------------------|
|                   _errors | {  "samples": "Dataset query failed: dataset ... |
|                       _id | dataset-1658666726768179211                      |
|                created_at | 2021-11-29T11:24:42.093240+00:00                 |
|                 full_path | solvebio:public:/beacon-test-dataset             |
|                        id | 1658666726768179211                              |
|                indexed_at | 2022-01-13T09:59:14.378879+00:00                 |
|                      name | beacon-test-dataset                              |
|                    parent |                                                  |
|                 parent_id |                                                  |
|                      path | solvebio:public:/beacon-test-dataset             |
| postproc_template_version |                                                  |
|                   samples | []                                               |
|             storage_class | Standard-IA                                      |
|                      tags | ['fuji', 'test', 'public data', 'other tag']     |
|                      type | dataset                                          |
|                updated_at | 2022-01-13T09:59:14.268177+00:00                 |
|                      user | Nikola                                           |
|                     vault | public                                           |
|                  vault_id | 7205                                             |

... 0 more results.

Getting the Global Search subjects

You can also retrieve the list of subjects and the number of subjects:

# Get list of subjects for the entity search
search = GlobalSearch(entities=[["gene","BRCA2"]])
search.subjects()

# Subjects count
search.subjects_count()
[{'access': True,
    'dataset_id': '1589830521744205858',
    'dataset_path': 'solvebio:public:/HGNC/3.3.0-2019-07-22/HGNC-1',
    'subject': 'U43746'}]

1
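subjects() returns a list of dictionaries, as shown in the output above. The sketch below extracts the subject identifiers and skips entries whose access flag is False; treating that flag as "the current user can access this dataset" is an assumption based on its name. accessible_subjects is a hypothetical helper, and the sample data is copied from the output above.

```python
def accessible_subjects(subjects):
    """Hypothetical helper: return subject identifiers from a subjects() result,
    keeping only entries whose 'access' flag is True."""
    return [item["subject"] for item in subjects if item.get("access")]


# Sample data copied from the subjects() output above
subjects = [{'access': True,
             'dataset_id': '1589830521744205858',
             'dataset_path': 'solvebio:public:/HGNC/3.3.0-2019-07-22/HGNC-1',
             'subject': 'U43746'}]
print(accessible_subjects(subjects))  # ['U43746']
```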

Global Search supports the same filtering mechanism used for filtering fields in a dataset (see the dataset querying docs for the table of "filter actions" and examples):

# Global Search object
search = GlobalSearch()

# Equals match (in list)
vaults = search.filter(type__in=["vault"])
print('Found {} vaults.'.format(len(vaults)))

# Equals match (in list)
folders = search.filter(type__in=["folder"])
print('Found {} folders.'.format(len(folders)))

# Date range
objects = search.filter(created_at__range=["2021-11-28","2021-12-28"])
print('Found {} objects.'.format(len(objects)))
Found 4 vaults.
Found 90 folders.
Found 5 objects.
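Rather than hard-coding the dates passed to created_at__range, you can compute them. This minimal sketch builds a [start, end] pair of ISO dates (YYYY-MM-DD), the same format used in the example above; last_n_days_range is a hypothetical helper, and whether the API treats the end date as inclusive is not assumed either way.

```python
from datetime import date, timedelta


def last_n_days_range(n_days, today=None):
    """Hypothetical helper: build a [start, end] pair of ISO dates covering
    the last n_days, for use with a created_at__range filter."""
    end = today or date.today()
    start = end - timedelta(days=n_days)
    return [start.isoformat(), end.isoformat()]


print(last_n_days_range(30, today=date(2021, 12, 28)))  # ['2021-11-28', '2021-12-28']
# With a live session: GlobalSearch().filter(created_at__range=last_n_days_range(30))
```

Passing an explicit today value makes the range reproducible, which is useful when rerunning a notebook.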

You can also combine filters to build more complex searches. The logic is the same as when combining filters in dataset queries (see the dataset querying docs):

# Search for all datasets created by the user Nikola
f = Filter(type="dataset") & Filter(user="Nikola")
results = GlobalSearch(filters=f)
print('Found {} objects.'.format(len(results)))
Found 4 objects.

Chaining search requests

The following examples chain multiple method calls to perform successive search requests:

s = GlobalSearch()

# Entity search
print("Results:")
for result in s.entity(gene="BRCA2"):
    print("\t" + result.id)

# Get subjects with BRCA2 in public vault
print("Subjects:")
for subject in s.filter(vault="public").entity(gene="BRCA2").subjects():
    print("\t" + subject["subject"])

# Get subjects count with BRCA2 in public vault
subjects_count = s.filter(vault="public").entity(gene="BRCA2").subjects_count()
print("{} subjects found.".format(subjects_count))

# Get all vaults with BRCA2 datasets, with the number of matching datasets per vault
print("Facets:")
facets = s.entity(gene="BRCA2").facets("vault")
print(facets)
Results:
    1658666726768179211
    1453602241738607801
    1589830521744205858
Subjects:
    U43746
1 subjects found.
Facets:
{'vault': [['public', 3]]}
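facets() returns a dictionary that maps the requested field to a list of [value, count] pairs, as in the output above. If you prefer a plain value-to-count mapping, a minimal sketch is shown below; facet_counts is a hypothetical helper and the sample data is copied from the output above.

```python
def facet_counts(facets, field):
    """Hypothetical helper: convert facets output such as
    {'vault': [['public', 3]]} into a {value: count} dict for one field."""
    return {value: count for value, count in facets.get(field, [])}


facets = {'vault': [['public', 3]]}  # sample copied from the output above
print(facet_counts(facets, "vault"))  # {'public': 3}
```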