Python: Global Search¶
Global Search lets you search for vaults, files, folders, and datasets by name, tags, user, date, and other customizable metadata. For more information, please see the Global Search Overview section.
As in the web application, Global Search is also available through the SolveBio Python and R clients.
GitHub examples
Please find the full example in the solvebio-python repository on GitHub: global search example notebook.
Importing SolveBio library and logging in¶
    # Importing SolveBio library
    from solvebio import login
    from solvebio import Filter
    from solvebio import GlobalSearch

    # Logging in to SolveBio
    login()
Performing Global Search¶
GlobalSearch performs a search based on the provided set of parameters:
query (optional): A query string (advanced search).
filters (optional): Filter or list of Filter objects.
entities (optional): List of entity tuples to filter on (entity type, entity).
ordering (optional): List of fields to order the results by.
limit (optional): Maximum number of query results to return.
page_size (optional): Number of results to fetch per query page.
result_class (optional): Class of object returned by the query.
debug (optional): Sends debug information to the API.
raw_results (optional): Whether to return the raw API response or to cast logical objects to Vault and Object instances.
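Because every parameter is optional, one way to keep call sites tidy is to collect only the options that were actually set and pass them on as keyword arguments. The sketch below is plain Python and is not part of the solvebio client; `build_search_kwargs` and `ALLOWED` are hypothetical names, and only the parameter names listed above come from this page.

```python
# Hypothetical helper (not part of the solvebio client): keep only the
# GlobalSearch options that were actually provided.
ALLOWED = ("query", "filters", "entities", "ordering", "limit",
           "page_size", "result_class", "debug", "raw_results")


def build_search_kwargs(**options):
    """Return a dict of the non-None options, rejecting unknown names."""
    unknown = sorted(set(options) - set(ALLOWED))
    if unknown:
        raise ValueError(
            "Unknown GlobalSearch option(s): {}".format(", ".join(unknown)))
    return {name: value for name, value in options.items()
            if value is not None}


kwargs = build_search_kwargs(query="test", limit=200, ordering=None)
print(kwargs)

# With the client installed and after login(), the dict could be passed on:
# search_results = GlobalSearch(**kwargs)
```

Rejecting unknown names early is a design choice for this sketch; it catches typos such as `page_sise` before any request is made.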
All of these parameters are optional. A search without any parameters is equivalent to a global search on SolveBio without any filters, so it returns all objects:
    # No filters applied
    search_results = GlobalSearch()
    print('Returned {} objects.'.format(len(search_results)))

    Returned 1449 objects.
Each result object has the following attributes:
    print(search_results)

    | Fields                    | Data                                             |
    |---------------------------+--------------------------------------------------|
    | _errors                   | { "samples": "'Dataset query failed: datase ...  |
    | _id                       | dataset-1426183806524170528                      |
    | created_at                | 2021-01-12T17:02:38.336007+00:00                 |
    | full_path                 | solvebio:public:/ClinVar/5.2.0-20210110/Variants-|
    | id                        | 1426183806524170528                              |
    | indexed_at                | 2021-11-25T17:22:55.233690+00:00                 |
    | name                      | Variants-GRCH37-1                                |
    | parent                    | 5.2.0-20210110                                   |
    | parent_id                 | 1426114255474932968                              |
    | path                      | solvebio:public:/ClinVar/5.2.0-20210110/Variants-|
    | postproc_template_version |                                                  |
    | samples                   | []                                               |
    | storage_class             | Archive                                          |
    | tags                      | ['fuji', 'clinvar', 'public data']               |
    | type                      | dataset                                          |
    | updated_at                | 2021-02-05T20:18:12.301835+00:00                 |
    | user                      | Jeff                                             |
    | vault                     | public                                           |
    | vault_id                  | 7205                                             |

    ... 1448 more results.
You may use the limit parameter to limit the number of returned objects:
    # No filters applied with limit parameter
    search_results = GlobalSearch(limit=200)
    print('Returned {} objects.'.format(len(search_results)))

    Returned 200 objects.
By default, each result is either a Vault instance or an Object instance:
    # Type of results
    type(search_results[0])

    solvebio.resource.object.Object
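Since results carry a `type` field (see the attribute table above), a quick way to get an overview of a result set is to tally it by type. The following is a minimal sketch that uses stand-in objects so it runs without a SolveBio account; `count_by_type` is a hypothetical helper, and it assumes results expose `type` as an attribute, which may not hold when `raw_results` is enabled.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_type(results):
    """Count results per value of their `type` attribute."""
    return Counter(getattr(result, "type", "unknown") for result in results)


# Stand-in objects for illustration only; real results would come from
# GlobalSearch() after login().
sample_results = [
    SimpleNamespace(type="dataset", name="Variants-GRCH37-1"),
    SimpleNamespace(type="folder", name="example-folder"),
    SimpleNamespace(type="dataset", name="beacon-test-dataset"),
]

counts = count_by_type(sample_results)
print(dict(counts))
```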
Advanced search query¶
You may perform an advanced search, as you would in the SolveBio web application, by using the query argument:
    # Advanced search
    advanced_query_results = GlobalSearch(query="test")
    print('Returned {} objects.'.format(len(advanced_query_results)))

    Returned 16 objects.
The query argument is the first positional argument, so you can pass the query string directly:
    # Advanced search
    advanced_query_results = GlobalSearch("fuji")
    print('Returned {} objects.'.format(len(advanced_query_results)))

    Returned 1408 objects.
Global Beacon Search¶
For all datasets that have the global beacon enabled, you can perform an entity search and see those datasets in the results:
    # Keyword based entity search example
    GlobalSearch(entities=[["gene", "BRCA2"]])

    # Function based entity search example
    GlobalSearch().entity(gene="BRCA2")

    | Fields                    | Data                                             |
    |---------------------------+--------------------------------------------------|
    | _errors                   | { "samples": "Dataset query failed: dataset ...  |
    | _id                       | dataset-1658666726768179211                      |
    | created_at                | 2021-11-29T11:24:42.093240+00:00                 |
    | full_path                 | solvebio:public:/beacon-test-dataset             |
    | id                        | 1658666726768179211                              |
    | indexed_at                | 2022-01-13T09:59:14.378879+00:00                 |
    | name                      | beacon-test-dataset                              |
    | parent                    |                                                  |
    | parent_id                 |                                                  |
    | path                      | solvebio:public:/beacon-test-dataset             |
    | postproc_template_version |                                                  |
    | samples                   | []                                               |
    | storage_class             | Standard-IA                                      |
    | tags                      | ['fuji', 'test', 'public data', 'other tag']     |
    | type                      | dataset                                          |
    | updated_at                | 2022-01-13T09:59:14.268177+00:00                 |
    | user                      | Nikola                                           |
    | vault                     | public                                           |
    | vault_id                  | 7205                                             |

    ... 2 more results.
If no dataset contains the provided entity, the search returns no results:
    # Entity search example
    GlobalSearch(entities=[["variant", "GRCH38-7-140753336-140753336-T"]])

    Query returned 0 results.
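The `entities` argument takes a list of (entity type, entity) pairs, as in the examples above. When entity values come from elsewhere in a script, a small helper can build that list. This is a sketch only: `entities_from_mapping` is a hypothetical name, and how the API combines several pairs in one request is not stated on this page, so check that before relying on multi-pair searches.

```python
def entities_from_mapping(mapping):
    """Turn {"gene": "BRCA2"} into [["gene", "BRCA2"]].

    A value may be a single string or a list of strings.
    """
    pairs = []
    for entity_type, values in mapping.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            pairs.append([entity_type, value])
    return pairs


entities = entities_from_mapping({"gene": "BRCA2"})
print(entities)

# With the client installed and after login():
# GlobalSearch(entities=entities)
```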
You may combine multiple parameters to narrow down the search results. For example, you can set the entities and query parameters together:
    # Multiple search parameters
    GlobalSearch(entities=[["gene","BRCA2"]], query="test")

    | Fields                    | Data                                             |
    |---------------------------+--------------------------------------------------|
    | _errors                   | { "samples": "Dataset query failed: dataset ...  |
    | _id                       | dataset-1658666726768179211                      |
    | created_at                | 2021-11-29T11:24:42.093240+00:00                 |
    | full_path                 | solvebio:public:/beacon-test-dataset             |
    | id                        | 1658666726768179211                              |
    | indexed_at                | 2022-01-13T09:59:14.378879+00:00                 |
    | name                      | beacon-test-dataset                              |
    | parent                    |                                                  |
    | parent_id                 |                                                  |
    | path                      | solvebio:public:/beacon-test-dataset             |
    | postproc_template_version |                                                  |
    | samples                   | []                                               |
    | storage_class             | Standard-IA                                      |
    | tags                      | ['fuji', 'test', 'public data', 'other tag']     |
    | type                      | dataset                                          |
    | updated_at                | 2022-01-13T09:59:14.268177+00:00                 |
    | user                      | Nikola                                           |
    | vault                     | public                                           |
    | vault_id                  | 7205                                             |

    ... 0 more results.
Getting the Global Search subjects¶
We can also retrieve the list of subjects for an entity search, along with the subject count:
    # Get list of subjects for the entity search
    search = GlobalSearch(entities=[["gene","BRCA2"]])
    search.subjects()

    # Subjects count
    search.subjects_count()

    [{'access': True,
      'dataset_id': '1589830521744205858',
      'dataset_path': 'solvebio:public:/HGNC/3.3.0-2019-07-22/HGNC-1',
      'subject': 'U43746'}]

    1
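Each subject entry in the output above is a dictionary with `access`, `dataset_id`, `dataset_path`, and `subject` keys. As a sketch of how such a list might be post-processed, the hypothetical helper below groups subject identifiers by dataset path. It runs on a copy of the sample output rather than a live query, and the choice to skip entries whose `access` flag is false is an assumption of this example.

```python
from collections import defaultdict


def subjects_by_dataset(subjects, accessible_only=True):
    """Group subject identifiers by the dataset path they were found in."""
    grouped = defaultdict(list)
    for entry in subjects:
        if accessible_only and not entry.get("access"):
            continue
        grouped[entry["dataset_path"]].append(entry["subject"])
    return dict(grouped)


# Copied from the sample output above; a live list would come from
# GlobalSearch(entities=[["gene", "BRCA2"]]).subjects()
sample_subjects = [
    {"access": True,
     "dataset_id": "1589830521744205858",
     "dataset_path": "solvebio:public:/HGNC/3.3.0-2019-07-22/HGNC-1",
     "subject": "U43746"},
]

grouped = subjects_by_dataset(sample_subjects)
print(grouped)
```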
Applying filters for Global Search¶
As with filtering fields in a dataset (please see the table and examples of how to use "filter actions"), you may apply the same filtering mechanism to Global Search:
    # Global Search object
    search = GlobalSearch()

    # Equals match (in list)
    vaults = search.filter(type__in=["vault"])
    print('Found {} vaults.'.format(len(vaults)))

    # Equals match (in list)
    folders = search.filter(type__in=["folder"])
    print('Found {} folders.'.format(len(folders)))

    # Date range
    objects = search.filter(created_at__range=["2021-11-28","2021-12-28"])
    print('Found {} objects.'.format(len(objects)))

    Found 4 vaults.
    Found 90 folders.
    Found 5 objects.
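The date range filter above takes a pair of ISO-formatted date strings. When the window is relative, for example "the 30 days before a given date", the pair can be computed with the standard library. This is a minimal sketch; `date_range` is a hypothetical helper, and whether the API treats the end points as inclusive is not stated on this page.

```python
from datetime import date, timedelta


def date_range(end, days):
    """Return [start, end] as ISO date strings, with start = end - days."""
    start = end - timedelta(days=days)
    return [start.isoformat(), end.isoformat()]


created_range = date_range(date(2021, 12, 28), 30)
print(created_range)

# With the client installed and after login():
# objects = GlobalSearch().filter(created_at__range=created_range)
```

For the inputs shown, the helper reproduces the range used in the example above, `["2021-11-28", "2021-12-28"]`.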
You may also combine filters to create more complex searches. Please see the docs on combining filters for dataset querying; similar logic applies here:
    # Search for all datasets that were created by the user Nikola
    f = Filter(type="dataset") & Filter(user="Nikola")
    results = GlobalSearch(filters=f)
    print('Found {} objects.'.format(len(results)))

    Found 4 objects.
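When the number of conditions is not known in advance, the `&` combination shown above can be applied across a list with `functools.reduce`. The sketch below is generic Python: `combine_all` is a hypothetical helper that works with any objects supporting `&`, and it is demonstrated with sets (where `&` means intersection) so it runs without the solvebio client. Using it with `Filter` objects relies only on the `&` operator shown in the example above.

```python
from functools import reduce
import operator


def combine_all(filters):
    """AND together a non-empty sequence of objects that support `&`."""
    filters = list(filters)
    if not filters:
        raise ValueError("At least one filter is required")
    return reduce(operator.and_, filters)


# Stand-in demonstration with sets, which also implement `&`.
combined = combine_all([{"a", "b", "c"}, {"b", "c"}, {"c", "d"}])
print(combined)

# With the client installed:
# f = combine_all([Filter(type="dataset"), Filter(user="Nikola")])
# results = GlobalSearch(filters=f)
```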
Chaining search requests¶
The following examples show how to chain multiple method calls to perform successive search requests:
    s = GlobalSearch()

    # Entity search
    print("Results:")
    for result in s.entity(gene="BRCA2"):
        print("\t" + result.id)

    # Get subjects with BRCA2 in public vault
    print("Subjects:")
    for subject in s.filter(vault="public").entity(gene="BRCA2").subjects():
        print("\t" + subject["subject"])

    # Get subjects count with BRCA2 in public vault
    subjects_count = s.filter(vault="public").entity(gene="BRCA2").subjects_count()
    print("{} subjects found.".format(subjects_count))

    # Get all vaults with BRCA2 datasets
    print("Facets:")
    facets = s.entity(gene="BRCA2").facets("vault")
    print(facets)

    Results:
        1658666726768179211
        1453602241738607801
        1589830521744205858
    Subjects:
        U43746
    1 subjects found.
    Facets:
    {'vault': [['public', 3]]}
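The facets output above maps a field name to a list of [value, count] pairs. For lookups it can be convenient to turn that into a plain dictionary. The following sketch assumes the shape shown in the sample output and runs on a copy of it; `facet_counts` is a hypothetical helper, not a client method.

```python
def facet_counts(facets, field):
    """Convert the [[value, count], ...] list under `field` to {value: count}."""
    return {value: count for value, count in facets.get(field, [])}


# Copied from the sample output above; a live value would come from
# GlobalSearch().entity(gene="BRCA2").facets("vault")
sample_facets = {"vault": [["public", 3]]}

vault_counts = facet_counts(sample_facets, "vault")
print(vault_counts)
```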