
Expression Recipes

The recipes below assume the following context:

{
    "record": {
        "variant": "GRCH37-5-36241637-36241637-C",
        "gene": "NADK2",
        "genomic_coordinates": {
            "build": "GRCH37",
            "chromosome": "5",
            "start": 36241637,
            "stop": 36241637
        }
    }
}
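
To experiment with the pure-Python parts of these recipes outside the expression engine, the context can be mimicked locally. The following is only a sketch: it assumes that attribute access such as `record.variant` can be emulated with `types.SimpleNamespace`, and it makes no claim about how the engine represents the record internally. Engine functions such as `dataset_field_values` are not available in this setting.

```python
import json
from types import SimpleNamespace

# The context from above, as a JSON string.
CONTEXT_JSON = """
{
    "record": {
        "variant": "GRCH37-5-36241637-36241637-C",
        "gene": "NADK2",
        "genomic_coordinates": {
            "build": "GRCH37",
            "chromosome": "5",
            "start": 36241637,
            "stop": 36241637
        }
    }
}
"""

# object_hook turns every JSON object into a namespace, so nested
# fields can be read with dots (an assumption about local emulation only).
context = json.loads(CONTEXT_JSON, object_hook=lambda d: SimpleNamespace(**d))
record = context.record

print(record.variant)                     # GRCH37-5-36241637-36241637-C
print(record.gene)                        # NADK2
print(record.genomic_coordinates.start)   # 36241637
```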

Retrieve a variant's allele frequency from ExAC

dataset_field_values('solvebio:public:/ExAC/1.3.0-r0.3/ExAC-GRCh37', 'af', entities=[('variant', record.variant)])

Retrieve a variant's clinical significance from ClinVar

dataset_field_values('solvebio:public:/ClinVar/3.7.4-2017-01-30/Combined-GRCh37', 'clinical_significance', entities=[('variant', record.variant)])

Calculate the prevalence of a gene within a multi-sample dataset

prevalence('solvebio:public:/TCGA/1.2.0-2015-02-11/SomaticMutations-GRCh37', entity=('gene', record.gene), sample_field='patient_barcode')
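
As an illustration of what such a number can mean, the sketch below computes a prevalence over a small in-memory table. It assumes that prevalence is the fraction of distinct samples (identified by `sample_field`) that have at least one matching record. That reading is an assumption, not a statement of how the `prevalence` function is implemented, and `toy_prevalence` and the sample rows are made up for the example.

```python
def toy_prevalence(records, matches, sample_field):
    """Fraction of distinct samples with at least one matching record.

    Hypothetical helper: the denominator only counts samples that
    appear in `records` at all.
    """
    all_samples = {r[sample_field] for r in records}
    if not all_samples:
        return 0.0
    hit_samples = {r[sample_field] for r in records if matches(r)}
    return len(hit_samples) / len(all_samples)


# Made-up rows, one per mutation call.
records = [
    {"patient_barcode": "P1", "gene": "NADK2"},
    {"patient_barcode": "P1", "gene": "TP53"},
    {"patient_barcode": "P2", "gene": "TP53"},
    {"patient_barcode": "P3", "gene": "NADK2"},
    {"patient_barcode": "P3", "gene": "NADK2"},  # same sample, counted once
    {"patient_barcode": "P4", "gene": "KRAS"},
]

value = toy_prevalence(records, lambda r: r["gene"] == "NADK2", "patient_barcode")
print(value)  # 0.5 (2 of 4 samples)
```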

Normalize a variant (trim and left-align the variant)

normalize_variant(record.variant)
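
To show what "trim" means here, the sketch below removes bases shared by the reference and alternate alleles. It is a simplified illustration that assumes VCF-style `(position, ref, alt)` input rather than the variant ID format used above, and it does not reproduce `normalize_variant`. The position-shifting half of normalization needs the reference sequence and is left out. `trim_alleles` is a hypothetical helper.

```python
def trim_alleles(pos, ref, alt):
    """Trim bases shared by ref and alt, keeping at least one base in each."""
    # Drop shared trailing bases first.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then drop shared leading bases, advancing the position each time.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


print(trim_alleles(100, "CAGT", "CAT"))  # (101, 'AG', 'A')
print(trim_alleles(10, "GCA", "GTA"))    # (11, 'C', 'T')
```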

Beacon public datasets for a variant

beacon(record.variant, 'variant', visibility='public')

Calculate the top terms for a string field in a dataset

dataset_field_top_terms('solvebio:public:/ClinVar/3.7.4-2017-01-30/Combined-GRCh37', 'clinical_significance')

Calculate statistics about a numeric field in a dataset

dataset_field_stats('solvebio:public:/ExAC/1.3.0-r0.3/ExAC-GRCh37', 'af')

Predict the effects of a variant on genes, transcripts, and proteins

predict_variant_effects(record.variant)

Retrieve the sequence of a particular genomic region

genomic_sequence('GRCH37-5-36241600-36241660')

# output
# CCAGCTGCTTCAGGTCCTCCTCCGAGAGCTCCGCGTAACGGTACCGCTGCTGCTCGAACTC
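
The region string in this recipe appears to follow a `BUILD-CHROMOSOME-START-STOP` pattern, using the same four values found in `record.genomic_coordinates`. Assuming that pattern holds (it is inferred from the example, not from a specification), a region with flanking bases around the record's position could be assembled as sketched below. `region_id` is a hypothetical helper written in plain Python.

```python
def region_id(coords, upstream=0, downstream=0):
    """Build a 'BUILD-CHROMOSOME-START-STOP' string, optionally padded.

    The string layout is an assumption based on the example region.
    """
    start = coords["start"] - upstream
    stop = coords["stop"] + downstream
    if start < 1:
        raise ValueError("region would start before position 1")
    return "{}-{}-{}-{}".format(coords["build"], coords["chromosome"], start, stop)


coords = {"build": "GRCH37", "chromosome": "5", "start": 36241637, "stop": 36241637}

print(region_id(coords))                              # GRCH37-5-36241637-36241637
print(region_id(coords, upstream=37, downstream=23))  # GRCH37-5-36241600-36241660
```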

Get the reverse-complemented sequence of a particular genomic region

''.join(reversed([{
    'A': 'T',
    'T': 'A',
    'C': 'G',
    'G': 'C'
}.get(nuc, 'N') for nuc in genomic_sequence('GRCH37-5-36241600-36241660')]))

# output
# GAGTTCGAGCAGCAGCGGTACCGTTACGCGGAGCTCTCGGAGGAGGACCTGAAGCAGCTGG
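
The same operation can be written in plain Python with a translation table, which also covers `N` and lowercase bases. This is a sketch for use outside the expression engine. Whether `str.maketrans` and `str.translate` are available inside expressions is not confirmed here, and `reverse_complement` is a hypothetical helper name.

```python
# Translation table: each base maps to its complement; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


seq = "CCAGCTGCTTCAGGTCCTCCTCCGAGAGCTCCGCGTAACGGTACCGCTGCTGCTCGAACTC"
print(reverse_complement(seq))
# GAGTTCGAGCAGCAGCGGTACCGTTACGCGGAGCTCTCGGAGGAGGACCTGAAGCAGCTGG
```

Applying the function twice returns the original sequence, which is a quick sanity check for any reverse-complement implementation.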

Find the GENCODE genes that overlap a genomic region

dataset_field_values('solvebio:public:/GENCODE/2.2.0-24/GENCODE-GRCh37', 'gene_symbol', entities=[('genomic_region', record.genomic_coordinates)], filters=[('feature', 'gene'), ('gene_type', 'protein_coding'), ('gene_status', 'KNOWN')])

Split a string by whitespace or a specific delimiter

split(record.variant, '-')
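
For the example variant, splitting on `-` yields five tokens. The sketch below uses Python's built-in `str.split` to show the result and to name the pieces. The field names are assumptions made by comparing the tokens with `record.genomic_coordinates`, and reading the last token as the allele is likewise assumed.

```python
variant = "GRCH37-5-36241637-36241637-C"

# maxsplit=4 keeps anything after the fourth dash together in the last token.
tokens = variant.split("-", 4)
print(tokens)  # ['GRCH37', '5', '36241637', '36241637', 'C']

build, chromosome, start, stop, allele = tokens
parsed = {
    "build": build,
    "chromosome": chromosome,
    "start": int(start),   # positions are numeric in the context above
    "stop": int(stop),
    "allele": allele,      # assumed meaning of the final token
}
print(parsed["start"])  # 36241637

# With no delimiter, str.split breaks on any run of whitespace.
print("NADK2  TP53\tKRAS".split())  # ['NADK2', 'TP53', 'KRAS']
```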