Beacons

What are Beacons?

Beacons are specialized search endpoints that enable discovery of datasets that contain information of interest, such as genes, genetic variants, or other entities, without fully exposing sensitive data. Beacons can be created for any SolveBio dataset and shared with other people in your account, even if they don't have direct access to the full dataset. When queried, a beacon will only return the number of matching results.

For example, a user may ask: "do you have any information about variant X?", and get a yes/no answer from a beacon. Beacons are a great way to search across a wide range of public, private, and commercial SolveBio datasets. For large organizations that continuously generate and collect complex molecular datasets, beacons help answer the increasingly common "have we seen this before?" question.
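
The idea of answering "have we seen this?" without exposing records can be sketched with a toy model. This is only an illustration, not SolveBio's implementation: the `ToyBeacon` class, its record layout, and the second variant ID are hypothetical.

```python
class ToyBeacon:
    """Illustrative stand-in for a beacon: it holds records privately
    and reveals only how many of them match a query."""

    def __init__(self, title, records):
        self.title = title
        self._records = list(records)  # never returned to the caller

    def query(self, variant_id):
        # Count matches without exposing the matching records themselves.
        count = sum(1 for r in self._records if r["variant"] == variant_id)
        return {"title": self.title, "count": count, "found": count > 0}


# Hypothetical records; the phenotype field stays hidden from callers.
beacon = ToyBeacon("private cohort", [
    {"variant": "GRCH37-7-117199644-117199647-A", "phenotype": "confidential"},
    {"variant": "GRCH37-13-32900000-32900000-T", "phenotype": "confidential"},
])

hit = beacon.query("GRCH37-7-117199644-117199647-A")
miss = beacon.query("GRCH37-1-100-100-G")
```

The caller learns that a match exists and how many there are, but never sees the underlying fields.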

Beacons can also make SolveBio reports more informative. The variant, gene, and literature reports automatically show all beacon sets visible to the current user.

Beacon Basics

On SolveBio, beacons are organized into "beacon sets". Currently, each beacon can belong to only one beacon set, but a beacon set can contain any number of beacons.

In the following example, we'll create a series of beacons that query different versions of the ClinVar dataset. ClinVar is a public dataset that maintains relationships between genotypes and clinical phenotypes. It is updated monthly. A beacon set like this can show when a variant or gene first appeared in ClinVar, among other things.

First, create a new beacon set:

Python:
from solvebio import BeaconSet

beacon_set = BeaconSet.create(title="ClinVar Over Time")

R:
library(solvebio)

beacon_set <- BeaconSet.create(title="ClinVar Over Time")

Your new beacon set will now be visible on any entity report, such as the BRCA2 report.

Now make the beacon set useful by adding a beacon for each of several versions of the ClinVar dataset:

Python:
from solvebio import Beacon, Dataset

# Create one beacon for each of three versions of the ClinVar variants dataset

# 3.1.0-2015-01-13
dataset = Dataset.get_by_full_path('solvebio:public:/ClinVar/3.1.0-2015-01-13/Variants-GRCh37')
Beacon.create(beacon_set_id=beacon_set.id, vault_object_id=dataset.id, title='2015-01-13')

# 3.7.3-2016-10-03
dataset = Dataset.get_by_full_path('solvebio:public:/ClinVar/3.7.3-2016-10-03/Variants-GRCh37')
Beacon.create(beacon_set_id=beacon_set.id, vault_object_id=dataset.id, title='2016-10-03')

# 3.7.4-2017-02-28
dataset = Dataset.get_by_full_path('solvebio:public:/ClinVar/3.7.4-2017-02-28/Variants-GRCh37')
Beacon.create(beacon_set_id=beacon_set.id, vault_object_id=dataset.id, title='2017-02-28')

R:
# Create one beacon for each of three versions of the ClinVar variants dataset

# 3.1.0-2015-01-13
dataset <- Dataset.get_by_full_path('solvebio:public:/ClinVar/3.1.0-2015-01-13/Variants-GRCh37')
Beacon.create(beacon_set_id=beacon_set$id, vault_object_id=dataset$id, title='2015-01-13')

# 3.7.3-2016-10-03
dataset <- Dataset.get_by_full_path('solvebio:public:/ClinVar/3.7.3-2016-10-03/Variants-GRCh37')
Beacon.create(beacon_set_id=beacon_set$id, vault_object_id=dataset$id, title='2016-10-03')

# 3.7.4-2017-02-28
dataset <- Dataset.get_by_full_path('solvebio:public:/ClinVar/3.7.4-2017-02-28/Variants-GRCh37')
Beacon.create(beacon_set_id=beacon_set$id, vault_object_id=dataset$id, title='2017-02-28')
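
The three near-identical steps above could also be written as a loop. The following is a minimal Python sketch under stated assumptions: the helper names `beacon_specs` and `create_beacons` are hypothetical, and the path template simply generalizes the three full paths shown above. Only `Dataset.get_by_full_path` and `Beacon.create` come from the example itself.

```python
# Path template generalized from the three full paths used above.
CLINVAR_PATH = "solvebio:public:/ClinVar/{version}/Variants-GRCh37"


def beacon_specs(versions):
    """Return (full_path, title) pairs, titled by each version's release date."""
    specs = []
    for version in versions:
        # "3.1.0-2015-01-13" -> title "2015-01-13"
        _, release_date = version.split("-", 1)
        specs.append((CLINVAR_PATH.format(version=version), release_date))
    return specs


def create_beacons(beacon_set_id, versions):
    """Create one beacon per version (needs solvebio and valid credentials)."""
    # Imported lazily so beacon_specs can be used without the client installed.
    from solvebio import Beacon, Dataset

    for full_path, title in beacon_specs(versions):
        dataset = Dataset.get_by_full_path(full_path)
        Beacon.create(beacon_set_id=beacon_set_id,
                      vault_object_id=dataset.id, title=title)


specs = beacon_specs(["3.1.0-2015-01-13", "3.7.3-2016-10-03", "3.7.4-2017-02-28"])
```

With a beacon set in hand, `create_beacons(beacon_set.id, [...])` would then perform the same calls as the step-by-step version.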

If you look at this example variant report, you'll see that this particular variant was present in the earlier versions of ClinVar but not in the version released in February 2017. The variant was incorrectly normalized by the NCBI and released in that version.

When you query a beacon set, all of its beacons are queried in parallel. Here's how to query the beacon set from a script:

Python:
beacon_set.query('GRCH37-7-117199644-117199647-A', entity_type='variant')

R:
BeaconSet.query(beacon_set$id, 'GRCH37-7-117199644-117199647-A', entity_type='variant')
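
To picture what "queried in parallel" means, here is a small Python sketch of a fan-out over several beacons. It is a conceptual illustration only, not how SolveBio runs queries on its servers: the `query_all` and `count_matches` helpers are hypothetical, and the beacon contents are made up to mirror the ClinVar example above.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matches(variants, variant_id):
    """Count how many entries in one beacon's data equal the queried variant."""
    return sum(1 for v in variants if v == variant_id)


def query_all(beacons, variant_id):
    """Query every beacon concurrently; return {title: match_count}."""
    with ThreadPoolExecutor(max_workers=max(1, len(beacons))) as pool:
        # Submit one task per beacon, then collect the results by title.
        futures = {title: pool.submit(count_matches, variants, variant_id)
                   for title, variants in beacons.items()}
        return {title: future.result() for title, future in futures.items()}


# Made-up contents: the variant is absent from the last release only.
target = "GRCH37-7-117199644-117199647-A"
beacons = {"2015-01-13": [target], "2016-10-03": [target], "2017-02-28": []}
counts = query_all(beacons, target)
```

Each beacon contributes one count, so a single query shows at a glance which versions contain the variant.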

Flexible Querying with Beacons

Beacons can also search across the contents of a dataset's string and text fields. To do this, provide only the query string and omit the entity_type parameter:

Python:
beacon_set.query('*cancer*')

R:
BeaconSet.query(beacon_set$id, '*cancer*')

You will see that the query matches all versions of ClinVar. Learn more about the syntax of query strings.
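
As a rough intuition for what a wildcard query such as `*cancer*` does, the Python sketch below checks a pattern against every string field of a record. It is an analogy only: SolveBio's query-string syntax is richer than shell-style wildcards, case-insensitive matching is an assumption here, and the `text_field_matches` helper and the record's field names are hypothetical.

```python
from fnmatch import fnmatchcase


def text_field_matches(record, pattern):
    """Return True if any string-valued field matches the wildcard pattern."""
    pattern = pattern.lower()
    return any(
        isinstance(value, str) and fnmatchcase(value.lower(), pattern)
        for value in record.values()
    )


# Hypothetical record; non-string fields are skipped by the check.
record = {
    "gene": "BRCA2",
    "phenotype": "Hereditary breast and ovarian cancer syndrome",
    "review_stars": 2,
}

matched = text_field_matches(record, "*cancer*")
unmatched = text_field_matches(record, "*diabetes*")
```

Because no entity type is given, the pattern is free to match in any text field rather than in one designated entity field.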