R: Global Beacon and Global Search

GitHub examples

Please find the full example in the solvebio-r repository on GitHub: global search and global beacon example notebook.

Global Beacon

Global Beacon lets anyone in your organization find datasets based on the entities they contain (e.g. variants, genes, targets). Only datasets that contain entities can be indexed.

For more information about Global Beacons, see the Global Beacons Overview section.

# Login to SolveBio via API key
library("solvebio")
login(api_key="YOUR_API_KEY")

First, enable Global Beacon on the dataset:

# Dataset ID
dataset_id <- "1658666726768179211"

# Turn on Global Beacon on the selected dataset
Object.enable_global_beacon(dataset_id)
## $id
## [1] 110
## 
## $datastore_id
## [1] 7
## 
## $dataset_id
## [1] 1.658667e+18
## 
## $status
## [1] "indexing"
## 
## $progress_percent
## [1] 0
## 
## $is_deleted
## [1] FALSE
## 
## $`_url`
## [1] "https://solvebio.api-stag.solvebio.com/v2/objects/1658666726768179211/beacon"
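
Note that the response prints dataset_id as 1.658667e+18: the ID came back as a double, which cannot represent every 19-digit integer exactly. The minimal sketch below, using only base R, illustrates why it is safer to keep IDs as character strings, as the example above does. The second ID is made up for the comparison.

```r
# Two different 19-digit IDs kept as strings (the second one is hypothetical).
id_a <- "1658666726768179211"
id_b <- "1658666726768179212"

# As strings the two IDs are distinct.
distinct_as_strings <- id_a != id_b

# Converted to doubles they collapse to the same value, because a double
# carries only 53 bits of precision (roughly 15-16 significant digits).
equal_as_doubles <- as.numeric(id_a) == as.numeric(id_b)

print(c(distinct_as_strings = distinct_as_strings,
        equal_as_doubles = equal_as_doubles))
```

Both flags are TRUE: the IDs differ as strings but become indistinguishable once converted to numbers.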

Now check the status of Global Beacon indexing for the dataset:

# Waiting a minute until indexing is complete
Sys.sleep(60)

# Getting the status of global beacon on the dataset
Object.get_global_beacon_status(dataset_id)
## $id
## [1] 110
## 
## $datastore_id
## [1] 7
## 
## $dataset_id
## [1] 1.658667e+18
## 
## $status
## [1] "completed"
## 
## $progress_percent
## [1] 100
## 
## $is_deleted
## [1] FALSE
## 
## $`_url`
## [1] "https://solvebio.api-stag.solvebio.com/v2/objects/1658666726768179211/beacon"

As the response shows, indexing is complete, so we can now search on the dataset's entities.
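
A fixed one-minute sleep may be too short for large datasets and needlessly long for small ones. As a rough sketch, the status can be polled until it reads "completed" (the status value seen in the output above). The helper wait_for_beacon and its parameters are hypothetical and not part of the solvebio package; a stub stands in for the API call so the example runs offline.

```r
# Poll a status function until it reports "completed" or attempts run out.
# `get_status` is any zero-argument function returning a list with a
# `status` field, for example:
#   function() Object.get_global_beacon_status(dataset_id)
wait_for_beacon <- function(get_status, max_attempts = 10, interval_sec = 30) {
  for (attempt in seq_len(max_attempts)) {
    status <- get_status()
    if (identical(status$status, "completed")) {
      return(status)
    }
    # Pause between attempts, but not after the final one.
    if (attempt < max_attempts) Sys.sleep(interval_sec)
  }
  stop("Global Beacon indexing did not complete after ",
       max_attempts, " attempts")
}

# Offline stand-in for the API: "indexing" twice, then "completed".
calls <- 0
fake_status <- function() {
  calls <<- calls + 1
  done <- calls >= 3
  list(status = if (done) "completed" else "indexing",
       progress_percent = if (done) 100 else 0)
}

final <- wait_for_beacon(fake_status, interval_sec = 0)
print(final$status)
```

With the real client, the stub would be replaced by a function wrapping Object.get_global_beacon_status(dataset_id).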

You can also disable Global Beacon on a dataset with the Object.disable_global_beacon function:

Object.disable_global_beacon("1676139881237207342")

Global Search

Global Search allows you to search for vaults, files, folders, and datasets by name, tags, user, date, and other customizable metadata.

For more information about Global Search, see the Global Search Overview section.

The search functionality available in the web application is also available through the SolveBio R client.

Global Search functions

The GlobalSearch module provides three functions, all of which accept the same set of parameters (filters, entities, query, limit, offset, etc.):

1. GlobalSearch.search

Performs a global search based on the provided filters, entities, and queries (advanced query) and returns an R data frame containing the results from the API response. To retrieve the full list of results, set the paginate argument to TRUE:

GlobalSearch.search(filters = '[{"and":[["type__in",["dataset"]]]}]')
GlobalSearch.search(query = "fuji")
GlobalSearch.search(query = "fuji", paginate=TRUE)

2. GlobalSearch.subjects

Like the search function, the subjects function returns its results as an R data frame. The returned data frame contains subjects:

GlobalSearch.subjects(entities = '[["gene","BRCA2"]]')
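
The entities argument is a JSON string of [type, value] pairs. Writing it by hand is error-prone, so a small helper can build it from R vectors. This is only a sketch: entity_filter is a hypothetical helper, not part of the solvebio package, and passing more than one pair is an assumption based on the nested-list shape of the argument.

```r
# Build the JSON string expected by the `entities` argument from parallel
# vectors of entity types and values. Values are inserted as-is, so they
# must not contain double quotes or backslashes.
entity_filter <- function(type, value) {
  stopifnot(length(type) == length(value), length(type) > 0)
  pairs <- sprintf('["%s","%s"]', type, value)
  paste0("[", paste(pairs, collapse = ","), "]")
}

single <- entity_filter("gene", "BRCA2")
cat(single, "\n")
```

The resulting string could then be passed on, for example as GlobalSearch.subjects(entities = entity_filter("gene", "BRCA2")).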

3. GlobalSearch.request

Performs a low-level global search based on the provided filters, similar to a search in the web application. The response contains the following attributes:

  • results - list of vault objects (datasets, files, folders, and vaults). These are the same objects shown in the “Results” tab of the Global Search page on Mesh.
  • total - number of objects in the search results
  • vaults - list of vaults
  • subjects - list of subjects
  • subjects_count - number of subjects in the subjects list
  • took - amount of time the search took
  • offset - pagination offset, i.e. the position of the first result to fetch

You can call the request function with any of the following arguments: filters, entities, query (advanced search query), limit, and offset:

GlobalSearch.request(query = "fuji", limit=200)
GlobalSearch.request(entities = '[["gene","BRCA2"]]')
GlobalSearch.request(entities = '[["gene","BRCA2"]]', filters = '[{"and":[{"and":[["created_at__range",["2021-11-28","2021-12-28"]]]},["type__in",["dataset"]]]}]')

Recommended functions:

  • GlobalSearch.search - for getting the search results.
  • GlobalSearch.subjects - for getting the subjects.

Search Examples

Because we indexed the dataset earlier, an entity search should now be able to find it in the results.

results <- GlobalSearch.search(entities = '[["gene","BRCA2"]]')
results

    _id                         name                                parent              tags            created_at
1   dataset-1453602241738607801 Genes-94-human-mouse-wOXjFS-new     NA                  <chr [2]>       2021-02-19T12:58:10.501840+00:00
2   dataset-1589830521744205858 HGNC-1                              3.3.0-2019-07-22    <chr [2]>       2021-08-26T11:59:26.985030+00:00
2 rows | 1-5 of 20 columns

Each result object has the following attributes:

names(results)
##  [1] "_id"                       "name"                     
##  [3] "parent"                    "tags"                     
##  [5] "created_at"                "indexed_at"               
##  [7] "updated_at"                "id"                       
##  [9] "parent_id"                 "postproc_template_version"
## [11] "vault_id"                  "user"                     
## [13] "samples"                   "storage_class"            
## [15] "path"                      "vault"                    
## [17] "type"                      "full_path"                
## [19] "_errors.samples"
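
Because the result is an ordinary data frame, it can be filtered with base R. The sketch below uses an offline stand-in built from two rows of the output above, and assumes created_at arrives as a character column of ISO 8601 timestamps, as the printed output suggests.

```r
# Offline stand-in for a GlobalSearch.search() result: two rows and three
# of the columns, copied from the output shown above.
results <- data.frame(
  `_id` = c("dataset-1453602241738607801", "dataset-1589830521744205858"),
  name = c("Genes-94-human-mouse-wOXjFS-new", "HGNC-1"),
  created_at = c("2021-02-19T12:58:10.501840+00:00",
                 "2021-08-26T11:59:26.985030+00:00"),
  check.names = FALSE,      # keep the column name "_id" unchanged
  stringsAsFactors = FALSE
)

# The first 10 characters of each timestamp are the calendar date.
results$created_date <- as.Date(substr(results$created_at, 1, 10))

# Keep only objects created on or after 1 August 2021.
recent <- results[results$created_date >= as.Date("2021-08-01"), ]
print(recent[["name"]])
```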

Search only for vaults:

response <- GlobalSearch.search(filters = '[{"and":[["type__in",["vault"]]]}]')
response

    _id         name                    parent      tags        created_at
1   vault-8167  user-8677               NA          <chr [0]>   2021-11-22T11:53:33.237948+00:00
2   vault-7205  public                  NA          <chr [2]>   2017-07-11T00:02:46.269448+00:00
3   vault-8029  public-noncommercial    NA          <chr [1]>   2018-12-10T23:22:57.959306+00:00
4   vault-8144  s3_select_test          NA          <chr [0]>   2020-09-24T15:11:19.258504+00:00
4 rows | 1-6 of 19 columns
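
The type filter string follows a fixed pattern, so it can be generated rather than typed. A minimal sketch, assuming the same JSON shape as in the call above; type_filter is a hypothetical helper, not part of the solvebio package, and combining several types in one type__in list is an assumption.

```r
# Build a `filters` JSON string that restricts results to the given object
# types (e.g. "vault", "dataset", "file", "folder").
type_filter <- function(types) {
  stopifnot(is.character(types), length(types) > 0)
  quoted <- paste0('"', types, '"', collapse = ",")
  sprintf('[{"and":[["type__in",[%s]]]}]', quoted)
}

vault_filter <- type_filter("vault")
cat(vault_filter, "\n")
```

The string could then be used as GlobalSearch.search(filters = type_filter("vault")).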

Search based on date created:

response <- GlobalSearch.search(filters = '[{"and":[{"and":[["created_at__range",["2021-11-21","2021-12-28"]]]}]}]')
response

    _id
1   vault-8167
2   file-1658666341029984987
3   dataset-1658666726768179211
4   dataset-1660154687706419046
5   dataset-1658744573117757007
6   folder-1661598621921253160
6 rows | 1-2 of 19 columns
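
The date-range filter can be generated in the same way, with the added benefit that the dates are validated before the request is sent. This is a sketch under the assumption that the filter shape shown above is what the API expects; date_range_filter is a hypothetical helper, not part of the solvebio package.

```r
# Build a `filters` JSON string selecting objects created between two dates.
date_range_filter <- function(start, end) {
  # Parse the inputs as dates so malformed values are caught early.
  start <- as.Date(start)
  end <- as.Date(end)
  stopifnot(!is.na(start), !is.na(end), start <= end)
  sprintf('[{"and":[{"and":[["created_at__range",["%s","%s"]]]}]}]',
          format(start, "%Y-%m-%d"), format(end, "%Y-%m-%d"))
}

range_filter <- date_range_filter("2021-11-21", "2021-12-28")
cat(range_filter, "\n")
```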

3. Advanced search query

The search function has built-in pagination: setting paginate = TRUE fetches all results (warning: this operation can be slow and costly).

An advanced search using the query argument returns only the first 100 objects by default. The warning in the following output explains why:

# Advanced search
response <- GlobalSearch.search(query = "fuji")

# Number of objects in the response
dim(response)
## Warning in GlobalSearch.search(query = "fuji"): This call returned only
## the first page of records. To retrieve more pages automatically, please set
## paginate=TRUE when calling GlobalSearch.search(). FALSE

## [1] 100  18

Calling the request function returns the full API response, which includes the total number of results:

response <- GlobalSearch.request(query = "fuji")
response$total
## [1] 1407

There are 1407 objects in total, but only 100 of them are returned by default. To get all the results, set paginate = TRUE (note that retrieving all objects may take a while):

results <- GlobalSearch.search(query = "fuji", paginate = TRUE)
print(dim(results))
## [1] 1407   19
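
If you prefer to page through results manually with limit and offset, the offsets can be computed from the total reported by the request function. The sketch below only does the arithmetic and makes no API calls; page_offsets is a hypothetical helper, and it assumes offsets are zero-based.

```r
# Compute the offset of each page needed to cover `total` results when
# fetching `limit` results per request (offsets assumed to start at 0).
page_offsets <- function(total, limit = 100) {
  stopifnot(limit > 0)
  if (total <= 0) return(numeric(0))
  seq(0, total - 1, by = limit)
}

offsets <- page_offsets(total = 1407, limit = 100)
print(length(offsets))  # number of requests needed
print(range(offsets))   # first and last offset
```

For the 1407 results above this yields 15 pages, with offsets running from 0 to 1400. Each offset could then be passed to a call such as GlobalSearch.request(query = "fuji", limit = 100, offset = ...).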

Alternatively, use the limit parameter instead of paginate. Here we set limit to 500 objects:

results <- GlobalSearch.search(query = "fuji", limit=500)
print(dim(results))
## Warning in GlobalSearch.search(query = "fuji", limit = 500): This call returned
## only the first page of records. To retrieve more pages automatically, please set
## paginate=TRUE when calling GlobalSearch.search(). FALSE

## [1] 500  19

4. Getting the Global Search subjects

Just as the search function returned result objects in the previous sections, the subjects function returns a data frame containing only subjects:

GlobalSearch.subjects(entities = '[["gene","BRCA2"]]')

    access  dataset_id          dataset_path                                    subject
1   TRUE    1589830521744205858 solvebio:public:/HGNC/3.3.0-2019-07-22/HGNC-1   U43746
1 row