R: Global Beacon and Global Search¶
GitHub examples
The full example is available in the solvebio-r repository on GitHub: global search and global beacon example notebook.
Global Beacon¶
Global Beacon lets anyone in your organization find datasets based on the entities they contain (e.g. variants, genes, targets). Only datasets that contain entities can be indexed.
For more information about Global Beacons, see the Global Beacons Overview section.
```r
# Log in to SolveBio with an API key
library("solvebio")
login(api_key="YOUR_API_KEY")
```
First, let’s enable Global Beacon on a dataset:
```r
# Dataset ID
dataset_id <- "1658666726768179211"

# Turn on Global Beacon on the selected dataset
Object.enable_global_beacon(dataset_id)
```

```
## $id
## [1] 110
##
## $datastore_id
## [1] 7
##
## $dataset_id
## [1] 1.658667e+18
##
## $status
## [1] "indexing"
##
## $progress_percent
## [1] 0
##
## $is_deleted
## [1] FALSE
##
## $`_url`
## [1] "https://solvebio.api-stag.solvebio.com/v2/objects/1658666726768179211/beacon"
```
Now let’s check the status of Global Beacon indexing for the dataset:
```r
# Wait a minute for indexing to complete
Sys.sleep(60)

# Get the status of Global Beacon on the dataset
Object.get_global_beacon_status(dataset_id)
```

```
## $id
## [1] 110
##
## $datastore_id
## [1] 7
##
## $dataset_id
## [1] 1.658667e+18
##
## $status
## [1] "completed"
##
## $progress_percent
## [1] 100
##
## $is_deleted
## [1] FALSE
##
## $`_url`
## [1] "https://solvebio.api-stag.solvebio.com/v2/objects/1658666726768179211/beacon"
```
As the response shows, indexing is complete, so we can now search on the dataset’s entities.
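A fixed `Sys.sleep(60)` may be too short for a large dataset and needlessly long for a small one. As a minimal sketch, the loop below polls the status until it reads "completed". The helper `wait_for_beacon` and the stub are hypothetical and not part of the solvebio package. The sketch assumes only the "indexing" and "completed" status values seen in the responses above, and the stub stands in for `Object.get_global_beacon_status` so the example runs without an API connection.

```r
# Hypothetical helper (not part of the solvebio package): call a status
# function repeatedly until it reports "completed" or attempts run out.
wait_for_beacon <- function(dataset_id, get_status, interval = 10, max_tries = 30) {
  for (attempt in seq_len(max_tries)) {
    status <- get_status(dataset_id)
    if (identical(status$status, "completed")) {
      return(status)
    }
    Sys.sleep(interval)
  }
  stop("Global Beacon indexing did not complete after ", max_tries, " checks")
}

# Stub standing in for Object.get_global_beacon_status (assumption: the real
# function returns a list with `status` and `progress_percent`, as shown above).
# It reports "indexing" twice and "completed" from the third call on.
make_stub_status <- function() {
  calls <- 0
  function(dataset_id) {
    calls <<- calls + 1
    done <- calls >= 3
    list(
      dataset_id = dataset_id,
      status = if (done) "completed" else "indexing",
      progress_percent = if (done) 100 else 0
    )
  }
}

final <- wait_for_beacon("1658666726768179211", make_stub_status(), interval = 0)
print(final$status)
```

In a live session you would presumably pass `Object.get_global_beacon_status` as `get_status` and keep a non-zero `interval`.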
You can disable Global Beacon on a dataset in the same way, using the disable_global_beacon function:
```r
Object.disable_global_beacon("1676139881237207342")
```
Global Search¶
Global Search allows you to search for vaults, files, folders, and datasets by name, tags, user, date, and other metadata which can be customized.
For more information about Global Search, see the Global Search Overview section.
The same search functionality that Global Search offers in the web application is also available through the SolveBio R client.
Global Search functions¶
The GlobalSearch module provides three functions, all of which accept the same set of parameters (filters, entities, query, limit, offset, etc.):
1. GlobalSearch.search¶
Performs a global search based on the provided filters, entities, and query (advanced query) and returns an R data frame containing the results from the API response. To get the full list of results, set the paginate argument to TRUE:
```r
GlobalSearch.search(filters = '[{"and":[["type__in",["dataset"]]]}]')
GlobalSearch.search(query = "fuji")
GlobalSearch.search(query = "fuji", paginate=TRUE)
```
2. GlobalSearch.subjects¶
Like the search function, the subjects function returns its results as an R data frame. The returned data frame contains subjects:
```r
GlobalSearch.subjects(entities = '[["gene","BRCA2"]]')
```
3. GlobalSearch.request¶
Performs a low-level global search based on the provided filters, similar to a search from the web application. The response contains the following attributes:
- results - list of vault objects (datasets, files, folders and vaults). These are the same objects shown in the “Results” tab of the Global Search page on Mesh.
- total - number of objects in the search results
- vaults - list of vaults
- subjects - list of subjects
- subjects_count - number of subjects in the subjects list
- took - amount of time the search took
- offset - pagination offset, i.e. the position of the first result you want to fetch
You can call the request function with any of the following arguments: filters, entities, query (advanced search query), limit, offset:
```r
GlobalSearch.request(query = "fuji", limit=200)
GlobalSearch.request(entities = '[["gene","BRCA2"]]')
GlobalSearch.request(entities = '[["gene","BRCA2"]]', filters = '[{"and":[{"and":[["created_at__range",["2021-11-28","2021-12-28"]]]},["type__in",["dataset"]]]}]')
```
Recommended functions to use:¶
- GlobalSearch.search - for getting the search results.
- GlobalSearch.subjects - for getting the subjects.
Search Examples¶
1. Global Beacon Search¶
Since we indexed the dataset earlier, we should be able to perform an entity search and see that dataset in the results.
```r
results <- GlobalSearch.search(entities = '[["gene","BRCA2"]]')
results
```

```
  _id                          name                             parent            tags       created_at
1 dataset-1453602241738607801  Genes-94-human-mouse-wOXjFS-new  NA                <chr [2]>  2021-02-19T12:58:10.501840+00:00
2 dataset-1589830521744205858  HGNC-1                           3.3.0-2019-07-22  <chr [2]>  2021-08-26T11:59:26.985030+00:00
2 rows | 1-5 of 20 columns
```
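To confirm programmatically that a particular dataset shows up in an entity search, you can compare its ID against the `_id` column. The sketch below is a hypothetical helper, not part of the solvebio package. It assumes that `_id` values for datasets follow the `dataset-<id>` pattern visible in the output above, and it uses a small stand-in data frame built from those two values so that it runs offline. Keeping IDs as character strings, as the examples on this page do, avoids the precision loss that occurs when integers this large are stored as doubles (note the `1.658667e+18` in the status response earlier).

```r
# Stand-in for the data frame returned by GlobalSearch.search, holding only
# the two `_id` values shown in the output above.
results <- data.frame(
  `_id` = c("dataset-1453602241738607801", "dataset-1589830521744205858"),
  check.names = FALSE,       # keep the column name "_id" unchanged
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if the dataset ID appears among the results.
# Assumption: dataset rows carry an `_id` of the form "dataset-<id>".
dataset_in_results <- function(results, dataset_id) {
  paste0("dataset-", dataset_id) %in% results[["_id"]]
}

found <- dataset_in_results(results, "1589830521744205858")
missing <- dataset_in_results(results, "1658666726768179211")
print(c(found = found, missing = missing))
```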
Each result object has the following attributes:
```r
names(results)
```

```
##  [1] "_id"                       "name"
##  [3] "parent"                    "tags"
##  [5] "created_at"                "indexed_at"
##  [7] "updated_at"                "id"
##  [9] "parent_id"                 "postproc_template_version"
## [11] "vault_id"                  "user"
## [13] "samples"                   "storage_class"
## [15] "path"                      "vault"
## [17] "type"                      "full_path"
## [19] "_errors.samples"
```
2. Applying filters for Global Search¶
Search only for vaults:
```r
response <- GlobalSearch.search(filters = '[{"and":[["type__in",["vault"]]]}]')
response
```

```
  _id         name                  parent  tags       created_at
1 vault-8167  user-8677             NA      <chr [0]>  2021-11-22T11:53:33.237948+00:00
2 vault-7205  public                NA      <chr [2]>  2017-07-11T00:02:46.269448+00:00
3 vault-8029  public-noncommercial  NA      <chr [1]>  2018-12-10T23:22:57.959306+00:00
4 vault-8144  s3_select_test        NA      <chr [0]>  2020-09-24T15:11:19.258504+00:00
4 rows | 1-6 of 19 columns
```
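Writing the filter string by hand is easy to get wrong because of the nested brackets. As a small sketch, the hypothetical helpers below (not part of the solvebio package) build the same `type__in` filter from an R character vector using base R only. Passing more than one type is an assumption based on the `__in` suffix; the examples on this page only show single-element lists.

```r
# Hypothetical helper: quote each value and join the values into a JSON array.
# Assumption: values contain no characters that need JSON escaping.
json_array <- function(values) {
  paste0("[", paste0('"', values, '"', collapse = ","), "]")
}

# Hypothetical helper: build the filter string used with the `filters` argument.
type_filter <- function(types) {
  sprintf('[{"and":[["type__in",%s]]}]', json_array(types))
}

vault_filter <- type_filter("vault")
multi_filter <- type_filter(c("dataset", "vault"))

cat(vault_filter, "\n")
cat(multi_filter, "\n")
```

The first string is identical to the one passed to `GlobalSearch.search` above, so it could be supplied as `filters = vault_filter`.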
Search based on date created:
```r
response <- GlobalSearch.search(filters = '[{"and":[{"and":[["created_at__range",["2021-11-21","2021-12-28"]]]}]}]')
response
```

```
  _id
  <chr>
1 vault-8167
2 file-1658666341029984987
3 dataset-1658666726768179211
4 dataset-1660154687706419046
5 dataset-1658744573117757007
6 folder-1661598621921253160
6 rows | 1-2 of 19 columns
```
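Date ranges are often computed rather than typed, for example "everything created in the weeks before a given date". The sketch below shows one way to generate the `created_at__range` filter string from R `Date` values. The helper `date_range_filter` is hypothetical and not part of the solvebio package; the `YYYY-MM-DD` format is taken from the example above.

```r
# Hypothetical helper (not part of the solvebio package): build a
# created_at__range filter string from two dates.
date_range_filter <- function(from, to) {
  from <- as.Date(from)
  to <- as.Date(to)
  if (from > to) {
    stop("`from` must not be later than `to`")
  }
  sprintf(
    '[{"and":[{"and":[["created_at__range",["%s","%s"]]]}]}]',
    format(from, "%Y-%m-%d"),
    format(to, "%Y-%m-%d")
  )
}

# A window starting 37 days before 2021-12-28
end <- as.Date("2021-12-28")
filters <- date_range_filter(end - 37, end)
cat(filters, "\n")
```

The resulting string matches the filter used in the search above and could be passed as `filters = filters`.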
3. Advanced search query¶
The search function has built-in pagination: setting the paginate argument to TRUE makes it fetch all results (warning: this operation may be costly and time-consuming).
An advanced search using the query argument returns only the first 100 objects by default. The warning in the following output explains why:
```r
# Advanced search
response <- GlobalSearch.search(query = "fuji")

# Number of objects in the response
dim(response)
```

```
## Warning in GlobalSearch.search(query = "fuji"): This call returned only
## the first page of records. To retrieve more pages automatically, please set
## paginate=TRUE when calling GlobalSearch.search(). FALSE

## [1] 100  18
```
We can call the request function to get the full API response and see how many results there are in total:
```r
response <- GlobalSearch.request(query = "fuji")
response$total
```

```
## [1] 1407
```
There are 1407 objects in total, but only 100 of them are returned by default. To get all the results, use the parameter paginate = TRUE (please note that retrieving all objects may take a while):
```r
results <- GlobalSearch.search(query = "fuji", paginate = TRUE)
print(dim(results))
```

```
## [1] 1407   19
```
Alternatively, you may use the limit parameter instead of paginate. Here we’re setting limit to 500 objects:
```r
results <- GlobalSearch.search(query = "fuji", limit=500)
print(dim(results))
```

```
## Warning in GlobalSearch.search(query = "fuji", limit = 500): This call returned
## only the first page of records. To retrieve more pages automatically, please set
## paginate=TRUE when calling GlobalSearch.search(). FALSE

## [1] 500  19
```
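Between fetching a single page and fetching everything with paginate = TRUE, there is a middle ground: requesting pages yourself with limit and offset. As a rough sketch, the code below works out which offsets would cover all 1407 results in pages of 500. The helper `page_offsets` is hypothetical and not part of the solvebio package, and the sketch assumes that offset is zero-based and counts results rather than pages, which this page does not state explicitly.

```r
# Hypothetical helper (not part of the solvebio package): offsets needed to
# cover `total` results in pages of `limit`.
# Assumption: offsets are zero-based and count results, not pages.
page_offsets <- function(total, limit) {
  if (total <= 0) {
    return(numeric(0))
  }
  seq(0, total - 1, by = limit)
}

total <- 1407   # response$total from the request call above
limit <- 500

offsets <- page_offsets(total, limit)
page_sizes <- pmin(limit, total - offsets)  # rows expected on each page

print(data.frame(offset = offsets, rows = page_sizes))

# With a live session, each page could then be fetched along these lines
# (not run here, since it needs an API connection):
# pages <- lapply(offsets, function(o) {
#   GlobalSearch.request(query = "fuji", limit = limit, offset = o)
# })
```

Fetching page by page lets you stop early once you have found what you need, at the cost of handling the individual responses yourself.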
4. Getting the Global Search subjects¶
Just as the search function returns the result objects in the previous sections, the subjects function returns a data frame containing only subjects:
```r
GlobalSearch.subjects(entities = '[["gene","BRCA2"]]')
```

```
  access  dataset_id           dataset_path                                   subject
1 TRUE    1589830521744205858  solvebio:public:/HGNC/3.3.0-2019-07-22/HGNC-1  U43746
1 row
```