Expression Functions

SolveBio expressions can use Python-like functions to pull data from any dataset, calculate statistics, or run advanced algorithms.

Function List

All available functions are listed below:

| Function | Data Type | Description | Syntax |
| --- | --- | --- | --- |
| annotate | object | Annotate a record with a template. | annotate(record, template, debug, include_errors) |
| beacon | object | Retrieves the beacon results for any entity. | beacon(entity, entity_type, beacon_set, datasets, visibility) |
| classify_variant | object | Classify a variant using one of multiple classifiers. | classify_variant(variant, classifier) |
| coerce_list | auto (list) | Coerce a value to a list. Single items will become a single value list. Lists will remain lists. None will return an empty list. | coerce_list(value) |
| concat | string | Combine text from multiple lists or strings. | concat(values, delimiter) |
| crossmap | string | Convert a variant or genomic region entity between different genome builds using the Ensembl CrossMap tool. The functionality of this expression is the same as UCSC's liftOver tool. | crossmap(entity, target_build) |
| dataset_count | integer | Calculate the total number of results (or "hits") for a given query. Returns the number of results. | dataset_count(dataset, entities, filters, query) |
| dataset_entity_top_terms | string (list) | Retrieve the top entities for any entity field in a dataset. Returns a list of strings in order of occurrence, or None if the dataset cannot be queried by this entity. | dataset_entity_top_terms(dataset, entity, limit, filters, query) |
| dataset_field_percentiles | object | Calculates the percentiles for any integer field. Returns an object containing the desired percentiles. | dataset_field_percentiles(dataset, field, percents, entities, filters, query) |
| dataset_field_stats | object | Calculates statistics for any numeric field. Returns an object containing field statistics. | dataset_field_stats(dataset, field, entities, filters, query) |
| dataset_field_terms_count | integer | Retrieve the number of unique terms for any string field in a dataset. Returns the number of unique terms. | dataset_field_terms_count(dataset, field, entities, filters, query) |
| dataset_field_top_terms | object (list) | Retrieve the top terms for any string field in a dataset. Returns a list of objects containing the term and number of times it occurs, in order of occurrence. | dataset_field_top_terms(dataset, field, limit, entities, filters, query) |
| dataset_field_values | auto (list) | Retrieves a list of non-empty values for a dataset field. Returns a list of values from the specified field. | dataset_field_values(dataset, field, limit, entities, filters, query) |
| dataset_query | object (list) | Query any dataset with optional filters and/or entities. Returns a list of results. | dataset_query(dataset, fields, limit, entities, filters, query) |
| datetime_format | string | Format datetime strings. By default, it returns an ISO 8601 format date time string. To override, provide an optional input_format or output_format to be used. | datetime_format(value, input_format, output_format) |
| entity_ids | string | Retrieve one or more normalized entity IDs for a query. | entity_ids(entity_type, entity) |
| error | error | Raise a FunctionError. | error(message) |
| explode | object (list) | Split N values from M list fields into N records. If _id is in the original record, each new record will have an integer appended to the _id with the index of each exploded record. | explode(record, fields) |
| findall | string (list) | Returns all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, returns a list of groups. | findall(pattern, string, regex_ignorecase, regex_dotall, regex_multiline) |
| genomic_sequence | string | Retrieves a specific sequence from the genome. | genomic_sequence(genomic_region) |
| get | auto | Get the value at any depth of a nested object based on the path described by path. If path doesn't exist, default is returned. | get(obj, path, default) |
| melt | object (list) | Convert a wide dataset to a long dataset by "melting" one or more fields into "key" and "value" fields. All fields must have the same data type. | melt(record, fields, key_field, value_field, melt_list_values) |
| normalize_aa_change | string | Normalize an amino acid change (beta). | normalize_aa_change(aa_change, ref, alt) |
| normalize_variant | string | Normalize a variant ID (minimal representation and left shifting). | normalize_variant(variant) |
| now | string | Retrieves the current date and time. | now(timezone, template) |
| predict_variant_effects | object (list) | Predict the effects of a variant using Veppy. | predict_variant_effects(variant, default_transcript, gene_model) |
| prevalence | double | Calculates the frequency that a value occurs within a population. Typically used to calculate the prevalence of variants or genes across samples in a dataset. Returns the frequency of occurrence. Please note: in large datasets the result is approximate and can have an error of up to 5%. | prevalence(dataset, entity, sample_field, filters) |
| search | boolean | Scan through string looking for the first location where the regular expression pattern produces a match. Returns True on a match and False if no position in the string matches the pattern. | search(pattern, string, regex_ignorecase, regex_dotall, regex_multiline) |
| search_groups | string (list) | Scan through string looking for the first location where the regular expression pattern produces a match. Returns a list of strings corresponding to the groups in the pattern. | search_groups(pattern, string, regex_ignorecase, regex_dotall, regex_multiline) |
| split | string (list) | Split text based on a delimiter and optionally strip whitespace. | split(value, delimiter, regex, strip, regex_ignorecase, regex_dotall, regex_multiline) |
| sub | string | Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn't found, string is returned unchanged. | sub(pattern, repl, string, count, regex_ignorecase, regex_dotall, regex_multiline) |
| tabulate | object (list) | Converts a list of objects into a table (i.e. a two-dimensional array). | tabulate(objects, fields, header) |
| today | string | Returns the current date. | today(timezone, template) |
| translate_variant | object | Translate variant into a protein change. | translate_variant(variant, gene_model, transcript, include_effects) |
| user | object | Returns the currently authenticated user. | user() |

annotate

Annotate a record with a template.

Output data type: object

Syntax

annotate(record, template, debug, include_errors)
  • record: (object) The record to be annotated
  • template: (str) The ID of the template
  • debug: (bool) Enable debug mode (default: False)
  • include_errors: (bool) Include errors in output (default: True)

beacon

Retrieves the beacon results for any entity.

Output data type: object

Output object properties:

  • failed_count: The number of datasets that failed (timed-out)
  • failed: List of datasets that failed (timed-out)
  • not_found_count: The number of datasets without results
  • found_count: The number of datasets with results
  • found: List of datasets with results
  • not_found: List of datasets without results

Syntax

beacon(entity, entity_type, beacon_set, datasets, visibility)
  • entity: The entity value
  • entity_type: A valid entity type
  • beacon_set (optional): A valid beacon set ID
  • datasets (optional): A list of datasets to beacon
  • visibility (optional): Which datasets to beacon (default: vault)

classify_variant

Classify a variant using one of multiple classifiers.

Output data type: object

Syntax

classify_variant(variant, classifier)
  • variant: The variant
  • classifier: The desired classifier (default: "germline")

coerce_list

Coerce a value to a list. Single items will become a single value list.

Lists will remain lists. None will return an empty list.

Output data type: auto (list)

Syntax

coerce_list(value)
  • value: The value to coerce to a list
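
The three cases described above can be mirrored in plain Python. This is only a local sketch of the documented behavior, not SolveBio's implementation: the name coerce_list_sketch is hypothetical, and wrapping non-list sequences such as tuples as a single item is an assumption.

```python
def coerce_list_sketch(value):
    """Local stand-in for the documented coerce_list behavior (hypothetical)."""
    if value is None:
        # None becomes an empty list.
        return []
    if isinstance(value, list):
        # Lists are returned unchanged.
        return value
    # Assumption: anything else (including tuples) is wrapped as one item.
    return [value]


print(coerce_list_sketch("BRCA2"))             # ['BRCA2']
print(coerce_list_sketch(["BRCA1", "BRCA2"]))  # ['BRCA1', 'BRCA2']
print(coerce_list_sketch(None))                # []
```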

concat

Combine text from multiple lists or strings.

Output data type: string

Syntax

concat(values, delimiter)
  • values: The list of values to concatenate
  • delimiter (default: ""): The character to use in between values
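
As a rough illustration of combining "multiple lists or strings", the sketch below joins a mixed list in plain Python. It is not SolveBio's code: concat_sketch is a hypothetical name, and both the one-level flattening of nested lists and the conversion of non-string items with str() are assumptions.

```python
def concat_sketch(values, delimiter=""):
    """Join strings and (one level of) nested lists into a single string."""
    parts = []
    for value in values:
        if isinstance(value, (list, tuple)):
            # Assumption: nested lists are flattened one level.
            parts.extend(str(item) for item in value)
        else:
            # Assumption: non-string items are converted with str().
            parts.append(str(value))
    return delimiter.join(parts)


print(concat_sketch(["chr", "7"]))                   # chr7
print(concat_sketch(["BRCA1", ["BRCA2", "TP53"]], ","))  # BRCA1,BRCA2,TP53
```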

crossmap

Convert a variant or genomic region entity between different genome builds using the Ensembl CrossMap tool. The functionality of this expression is the same as UCSC's liftOver tool.

Output data type: string

Syntax

crossmap(entity, target_build)
  • entity: The entity (either a valid SolveBio variant BUILD-CHROMOSOME-START-STOP-ALT or genomic region BUILD-CHROMOSOME-START-STOP)
  • target_build: The target genome build (GRCH37 or GRCH38)

Examples

crossmap("GRCH38-13-32338647-32338647-T", "GRCH37")

dataset_count

Calculate the total number of results (or "hits") for a given query.

Returns the number of results.

Output data type: integer

Syntax

dataset_count(dataset, entities, filters, query)
  • dataset: Any dataset with query permissions
  • entities (optional): A list of entity tuples: [(entity_type, entity)]
  • filters (optional): A valid filter block
  • query (optional): A query string

dataset_entity_top_terms

Retrieve the top entities for any entity field in a dataset.

Returns a list of strings in order of occurrence, or None if the dataset cannot be queried by this entity.

Output data type: string (list)

Syntax

dataset_entity_top_terms(dataset, entity, limit, filters, query)
  • dataset: Any dataset with query permissions
  • entity: The entity_type to return within the dataset
  • limit (optional): The number of terms to retrieve (default: 1000)
  • filters (optional): Dataset filters
  • query (optional): A query string

Examples

dataset_entity_top_terms("solvebio:public:/ClinVar/5.1.0-20200720/Variants-GRCH38", "gene")

dataset_field_percentiles

Calculates the percentiles for any integer field.

Returns an object containing the desired percentiles.

Output data type: object

Syntax

dataset_field_percentiles(dataset, field, percents, entities, filters, query)
  • dataset: Any dataset with query permissions
  • field: The field within the dataset
  • percents: The percentiles to calculate (default: 1, 5, 25, 50, 75, 95, 99)
  • entities (optional): A list of entity tuples: [(entity_type, entity)]
  • filters (optional): Dataset filters
  • query (optional): A query string

dataset_field_stats

Calculates statistics for any numeric field.

Returns an object containing field statistics.

Output data type: object

Output object properties:

  • count: The total number of values
  • max: The maximum value observed
  • sum: The sum of all values
  • avg: The average value
  • min: The minimum value observed

Syntax

dataset_field_stats(dataset, field, entities, filters, query)
  • dataset: Any dataset with query permissions
  • field: The field within the dataset
  • entities (optional): A list of entity tuples: [(entity_type, entity)]
  • filters (optional): Dataset filters
  • query (optional): A query string
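
To make the shape of the returned object concrete, the sketch below computes the same five properties over an in-memory list of records. It is an illustration only: field_stats_sketch and the sample records are hypothetical, the real function runs against a dataset on the server, and the values returned for an empty field are an assumption.

```python
def field_stats_sketch(records, field):
    """Compute count, max, sum, avg and min for one numeric field locally."""
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        # Assumption: what the real function returns here is not documented.
        return {"count": 0, "max": None, "sum": 0, "avg": None, "min": None}
    total = sum(values)
    return {
        "count": len(values),
        "max": max(values),
        "sum": total,
        "avg": total / len(values),
        "min": min(values),
    }


records = [{"depth": 10}, {"depth": 20}, {"depth": 60}, {"depth": None}]
print(field_stats_sketch(records, "depth"))
# {'count': 3, 'max': 60, 'sum': 90, 'avg': 30.0, 'min': 10}
```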

dataset_field_terms_count

Retrieve the number of unique terms for any string field in a dataset.

Returns the number of unique terms.

Output data type: integer

Syntax

dataset_field_terms_count(dataset, field, entities, filters, query)
  • dataset: Any dataset with query permissions
  • field: The field within the dataset
  • entities (optional): A list of entity tuples: [(entity_type, entity)]
  • filters (optional): Dataset filters
  • query (optional): A query string

Examples

dataset_field_terms_count("solvebio:public:/ClinVar/5.1.0-20200720/Variants-GRCh38", "clinical_significance")

dataset_field_top_terms

Retrieve the top terms for any string field in a dataset.

Returns a list of objects containing the term and number of times it occurs, in order of occurrence.

Output data type: object (list)

Output object properties:

  • count: Number of times it occurs
  • term: Term value

Syntax

dataset_field_top_terms(dataset, field, limit, entities, filters, query)
  • dataset: Any dataset with query permissions
  • field: The field within the dataset
  • limit (optional): The number of terms to retrieve (default: 10)
  • entities (optional): A list of entity tuples: [(entity_type, entity)]
  • filters (optional): Dataset filters
  • query (optional): A query string

Examples

dataset_field_top_terms("solvebio:public:/ClinVar/5.1.0-20200720/Variants-GRCh38", "clinical_significance")
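
The output is a list of objects with term and count properties. A local approximation of that shape, using Python's collections.Counter over hypothetical in-memory records, might look like the following. The helper name, the sample data, and the reading of "order of occurrence" as most frequent first are assumptions.

```python
from collections import Counter


def top_terms_sketch(records, field, limit=10):
    """Return [{'term': ..., 'count': ...}], most frequent term first."""
    counts = Counter(r[field] for r in records if r.get(field))
    return [{"term": term, "count": count}
            for term, count in counts.most_common(limit)]


records = [
    {"clinical_significance": "Benign"},
    {"clinical_significance": "Pathogenic"},
    {"clinical_significance": "Benign"},
]
print(top_terms_sketch(records, "clinical_significance"))
# [{'term': 'Benign', 'count': 2}, {'term': 'Pathogenic', 'count': 1}]
```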

dataset_field_values

Retrieves a list of non-empty values for a dataset field.

Returns a list of values from the specified field.

Output data type: auto (list)

Syntax

dataset_field_values(dataset, field, limit, entities, filters, query)
  • dataset: Any dataset with query permissions
  • field: The field within the dataset
  • limit (optional): The number of values to return (default: 10)
  • entities (optional): A list of entity tuples: [(entity_type, entity)]
  • filters (optional): Dataset filters
  • query (optional): A query string

dataset_query

Query any dataset with optional filters and/or entities.

Returns a list of results.

Output data type: object (list)

Syntax

dataset_query(dataset, fields, limit, entities, filters, query)
  • dataset: Any dataset with query permissions
  • fields (optional): Fields to retrieve (default: all)
  • limit (optional): The number of values to return (default: 1)
  • entities (optional): A list of entity tuples: [(entity_type, entity)]
  • filters (optional): Dataset filters
  • query (optional): A query string

Examples

dataset_query("solvebio:public:/ClinVar/5.1.0-20200720/Variants-GRCh38", fields=["clinical_significance"], query="*cancer*")

dataset_query("solvebio:public:/ClinVar/5.1.0-20200720/Variants-GRCh38", entities=[["variant", "GRCH38-13-32357842-32357842-TA"]])

datetime_format

Format datetime strings. By default, it returns an ISO 8601 format date time string.

To override, provide an optional input_format or output_format to be used.

Output data type: string

Syntax

datetime_format(value, input_format, output_format)
  • value: (str) A string containing a date/time stamp
  • input_format: (str) The input format of the date (e.g. "%d/%m/%y %H:%M")
  • output_format: (str) The output format of the date (ISO 8601 format is the default: "%Y-%m-%dT%H:%M:%S")
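
The format strings use the same directives as Python's datetime module, so the conversion can be approximated locally. In this sketch datetime_format_sketch is a hypothetical name, and parsing the input as ISO 8601 when no input_format is given is an assumption; the real function may accept a wider range of inputs.

```python
from datetime import datetime

ISO_8601 = "%Y-%m-%dT%H:%M:%S"


def datetime_format_sketch(value, input_format=None, output_format=ISO_8601):
    """Parse a date/time string and re-emit it in the requested format."""
    if input_format is None:
        # Assumption: without an input_format, the value is ISO 8601.
        parsed = datetime.fromisoformat(value)
    else:
        parsed = datetime.strptime(value, input_format)
    return parsed.strftime(output_format)


print(datetime_format_sketch("20/07/20 14:30", input_format="%d/%m/%y %H:%M"))
# 2020-07-20T14:30:00
print(datetime_format_sketch("2020-07-20", output_format="%d %b %Y"))
# 20 Jul 2020
```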

entity_ids

Retrieve one or more normalized entity IDs for a query.

Output data type: string

Syntax

entity_ids(entity_type, entity)
  • entity_type: The entity type to retrieve
  • entity: The entity or query string

error

Raise a FunctionError.

Output data type: error

Syntax

error(message)
  • message: An error message to raise

explode

Split N values from M list fields into N records.

If _id is in the original record, each new record will have an integer appended to the _id with the index of each exploded record.

Output data type: object (list)

Syntax

explode(record, fields)
  • record: (object) The record to be split
  • fields: (list or tuple) The field IDs
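
One way to picture the explode operation is the plain-Python sketch below. It is not SolveBio's implementation: explode_sketch is a hypothetical name, and the sketch assumes that all listed fields hold lists of the same length N, that the appended index is zero-based, and that it is joined to the _id with an underscore. None of these details are stated above.

```python
def explode_sketch(record, fields):
    """Turn one record with N-item list fields into N records."""
    n = len(record[fields[0]])  # assumption: all list fields have length N
    rows = []
    for i in range(n):
        row = dict(record)
        for field in fields:
            row[field] = record[field][i]
        if "_id" in record:
            # Assumption: zero-based index joined with "_".
            row["_id"] = f"{record['_id']}_{i}"
        rows.append(row)
    return rows


record = {"_id": "r1", "gene": "BRCA2", "alt": ["A", "T"], "freq": [0.1, 0.2]}
for row in explode_sketch(record, ["alt", "freq"]):
    print(row)
```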

findall

Returns all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, returns a list of groups.

Output data type: string (list)

Syntax

findall(pattern, string, regex_ignorecase, regex_dotall, regex_multiline)
  • pattern: The regular expression pattern
  • string: The string to search
  • regex_ignorecase (default: None): With a "regex" pattern, performs case-insensitive matching.
  • regex_dotall (default: None): With a "regex" pattern, makes the "." special character match any character at all, including a newline; without this flag, "." matches anything except a newline.
  • regex_multiline (default: None): With a "regex" pattern, when specified, the pattern character "^" matches at the beginning of the string and at the beginning of each line (immediately following each newline), and the pattern character "$" matches at the end of the string and at the end of each line (immediately preceding each newline). By default, "^" matches only at the beginning of the string, and "$" only at the end of the string and immediately before the newline (if any) at the end of the string.
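
The description closely follows Python's re.findall, so the behavior can be tried locally. The sketch below assumes the expression function maps its regex_* arguments onto the corresponding re flags; findall_sketch is a hypothetical wrapper, not the SolveBio function itself.

```python
import re


def findall_sketch(pattern, string, regex_ignorecase=None,
                   regex_dotall=None, regex_multiline=None):
    """Wrap re.findall, translating the regex_* options into re flags."""
    flags = 0
    if regex_ignorecase:
        flags |= re.IGNORECASE
    if regex_dotall:
        flags |= re.DOTALL
    if regex_multiline:
        flags |= re.MULTILINE
    return re.findall(pattern, string, flags)


text = "rs123, RS456 and rs789"
print(findall_sketch(r"rs\d+", text))                         # ['rs123', 'rs789']
print(findall_sketch(r"rs\d+", text, regex_ignorecase=True))  # ['rs123', 'RS456', 'rs789']
# With a group in the pattern, only the group is returned for each match.
print(findall_sketch(r"c\.(\d+)", "c.35delG and c.1521_1523del"))  # ['35', '1521']
```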

genomic_sequence

Retrieves a specific sequence from the genome.

Output data type: string

Syntax

genomic_sequence(genomic_region)
  • genomic_region: A valid genomic region in the form: BUILD-CHROMOSOME-START-STOP

Examples

genomic_sequence("GRCh37-5-36241400-36241700")

get

Get the value at any depth of a nested object based on the path described by path. If path doesn't exist, default is returned.

Output data type: auto

Syntax

get(obj, path, default)
  • obj: (list|dict) The object to process
  • path: (str|list) A list of keys or a "."-delimited string describing the path.
  • default (keyword): Default value to return if path doesn't exist. Defaults to None.
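
The path lookup can be sketched in plain Python as follows. This is a hedged illustration rather than the actual implementation: get_sketch is a hypothetical name, and treating numeric path segments as list indices is an assumption.

```python
def get_sketch(obj, path, default=None):
    """Walk a nested dict/list along path; return default if a step is missing."""
    keys = path.split(".") if isinstance(path, str) else list(path)
    current = obj
    for key in keys:
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list):
            # Assumption: numeric segments index into lists.
            try:
                current = current[int(key)]
            except (ValueError, IndexError, TypeError):
                return default
        else:
            return default
    return current


record = {"gene": {"symbol": "BRCA2", "aliases": ["FANCD1", "BRCC2"]}}
print(get_sketch(record, "gene.symbol"))                        # BRCA2
print(get_sketch(record, "gene.aliases.1"))                     # BRCC2
print(get_sketch(record, ["gene", "missing"], default="n/a"))   # n/a
```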

melt

Convert a wide dataset to a long dataset by "melting" one or more fields into "key" and "value" fields. All fields must have the same data type.

Output data type: object (list)

Syntax

melt(record, fields, key_field, value_field, melt_list_values)
  • record: (object) The record to be melted
  • fields: (list or tuple) The field IDs
  • key_field: (str) key field (default: "key")
  • value_field: (str) value field (default: "value")
  • melt_list_values: (bool) (default: False)
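
A wide-to-long conversion of a single record might look like the sketch below. It is illustrative only: melt_sketch is a hypothetical name, carrying the non-melted fields over into every output record is an assumption, and melt_list_values is left out because its behavior is not described here.

```python
def melt_sketch(record, fields, key_field="key", value_field="value"):
    """Emit one record per melted field, with the field name and its value."""
    # Assumption: fields that are not melted are copied into every output row.
    kept = {k: v for k, v in record.items() if k not in fields}
    return [{**kept, key_field: field, value_field: record[field]}
            for field in fields]


wide = {"sample": "S1", "af_eur": 0.1, "af_afr": 0.3}
for row in melt_sketch(wide, ["af_eur", "af_afr"]):
    print(row)
# {'sample': 'S1', 'key': 'af_eur', 'value': 0.1}
# {'sample': 'S1', 'key': 'af_afr', 'value': 0.3}
```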

normalize_aa_change

Normalize an amino acid change (beta).

Output data type: string

Syntax

normalize_aa_change(aa_change, ref, alt)
  • aa_change: The amino acid change
  • ref (optional): Reference allele
  • alt (optional): Alternate allele

normalize_variant

Normalize a variant ID (minimal representation and left shifting).

Output data type: string

Syntax

normalize_variant(variant)
  • variant: The variant

now

Retrieves the current date and time.

Output data type: string

Syntax

now(timezone, template)
  • timezone (default: EST): The timezone to use for the date
  • template (default: ISO 8601): The format in which to represent the date/time (ISO 8601 is %Y-%m-%dT%H:%M:%S)

predict_variant_effects

Predict the effects of a variant using Veppy.

Output data type: object (list)

Output object properties:

  • so_term: The Sequence Ontology term
  • impact: The effect impact
  • so_accession: The Sequence Ontology accession number
  • transcript: The affected transcript ID
  • lof: True if the mutation is predicted to cause the protein to lose its function

Syntax

predict_variant_effects(variant, default_transcript, gene_model)
  • variant: The variant
  • default_transcript (optional): If True, return effects for just the default transcript. If a specific transcript, then limits results to this transcript only. Otherwise returns effects for all transcripts.
  • gene_model (optional): The desired gene model: refseq (default) or ensembl

Examples

predict_variant_effects("GRCH38-7-117559590-117559593-A")

prevalence

Calculates the frequency that a value occurs within a population.

Typically used to calculate the prevalence of variants or genes across samples in a dataset. Returns the frequency of occurrence.

Please note: in large datasets the result is approximate and can have an error of up to 5%.

Output data type: double

Syntax

prevalence(dataset, entity, sample_field, filters)
  • dataset: Any dataset with discover permissions
  • entity: A single entity tuple: (entity_type, entity)
  • sample_field: The field containing the sample IDs
  • filters (optional): Filters to apply on the dataset

search

Scan through string looking for the first location where the regular expression pattern produces a match. Returns True on a match and False if no position in the string matches the pattern.

Output data type: boolean

Syntax

search(pattern, string, regex_ignorecase, regex_dotall, regex_multiline)
  • pattern: The regular expression pattern
  • string: The string to search
  • regex_ignorecase (default: None): With a "regex" pattern, performs case-insensitive matching.
  • regex_dotall (default: None): With a "regex" pattern, makes the "." special character match any character at all, including a newline; without this flag, "." matches anything except a newline.
  • regex_multiline (default: None): With a "regex" pattern, when specified, the pattern character "^" matches at the beginning of the string and at the beginning of each line (immediately following each newline), and the pattern character "$" matches at the end of the string and at the end of each line (immediately preceding each newline). By default, "^" matches only at the beginning of the string, and "$" only at the end of the string and immediately before the newline (if any) at the end of the string.
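
The effect of regex_multiline on "^" is easiest to see with a two-line string. The sketch below assumes the function behaves like Python's re.search reduced to a boolean; search_sketch is a hypothetical wrapper.

```python
import re


def search_sketch(pattern, string, regex_ignorecase=None,
                  regex_dotall=None, regex_multiline=None):
    """Return True if pattern matches anywhere in string, else False."""
    flags = 0
    if regex_ignorecase:
        flags |= re.IGNORECASE
    if regex_dotall:
        flags |= re.DOTALL
    if regex_multiline:
        flags |= re.MULTILINE
    return re.search(pattern, string, flags) is not None


text = "chr7 pathogenic\nchr13 benign"
print(search_sketch(r"^chr13", text))                        # False
print(search_sketch(r"^chr13", text, regex_multiline=True))  # True
```

Without the flag, "^" anchors only at the start of the whole string, so the second line is never considered a line start.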

search_groups

Scan through string looking for the first location where the regular expression pattern produces a match. Returns a list of strings corresponding to the groups in the pattern.

Output data type: string (list)

Syntax

search_groups(pattern, string, regex_ignorecase, regex_dotall, regex_multiline)
  • pattern: The regular expression pattern
  • string: The string to search
  • regex_ignorecase (default: None): With a "regex" pattern, performs case-insensitive matching.
  • regex_dotall (default: None): With a "regex" pattern, makes the "." special character match any character at all, including a newline; without this flag, "." matches anything except a newline.
  • regex_multiline (default: None): With a "regex" pattern, when specified, the pattern character "^" matches at the beginning of the string and at the beginning of each line (immediately following each newline), and the pattern character "$" matches at the end of the string and at the end of each line (immediately preceding each newline). By default, "^" matches only at the beginning of the string, and "$" only at the end of the string and immediately before the newline (if any) at the end of the string.
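
Capture groups are handy for pulling the parts out of a variant ID. The sketch below uses Python's re.search and Match.groups() as a stand-in; search_groups_sketch is hypothetical, and returning an empty list when nothing matches is an assumption, since the no-match result is not documented above.

```python
import re


def search_groups_sketch(pattern, string, regex_ignorecase=None,
                         regex_dotall=None, regex_multiline=None):
    """Return the groups of the first match as a list of strings."""
    flags = 0
    if regex_ignorecase:
        flags |= re.IGNORECASE
    if regex_dotall:
        flags |= re.DOTALL
    if regex_multiline:
        flags |= re.MULTILINE
    match = re.search(pattern, string, flags)
    # Assumption: no match yields an empty list.
    return list(match.groups()) if match else []


variant = "GRCH38-13-32338647-32338647-T"
print(search_groups_sketch(r"(GRCH\d+)-(\w+)-(\d+)-(\d+)-(\w+)", variant))
# ['GRCH38', '13', '32338647', '32338647', 'T']
```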

split

Split text based on a delimiter and optionally strip whitespace.

Output data type: string (list)

Syntax

split(value, delimiter, regex, strip, regex_ignorecase, regex_dotall, regex_multiline)
  • value: The string to split
  • delimiter (default: any whitespace): The character(s) to split on
  • regex (default: None): A valid Python regular expression pattern to split on.
  • strip (default: True): Strip whitespace from each resulting value
  • regex_ignorecase (default: None): With a "regex" pattern, performs case-insensitive matching.
  • regex_dotall (default: None): With a "regex" pattern, makes the "." special character match any character at all, including a newline; without this flag, "." matches anything except a newline.
  • regex_multiline (default: None): With a "regex" pattern, when specified, the pattern character "^" matches at the beginning of the string and at the beginning of each line (immediately following each newline), and the pattern character "$" matches at the end of the string and at the end of each line (immediately preceding each newline). By default, "^" matches only at the beginning of the string, and "$" only at the end of the string and immediately before the newline (if any) at the end of the string.
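
The interplay of delimiter, regex and strip can be approximated with str.split and re.split. In this sketch split_sketch is a hypothetical name, the regex flags are omitted for brevity, and letting regex take precedence over delimiter when both are given is an assumption.

```python
import re


def split_sketch(value, delimiter=None, regex=None, strip=True):
    """Split on a literal delimiter (default: whitespace) or a regex pattern."""
    if regex is not None:
        # Assumption: a regex pattern overrides the literal delimiter.
        parts = re.split(regex, value)
    else:
        parts = value.split(delimiter)
    return [part.strip() for part in parts] if strip else parts


print(split_sketch("BRCA1, BRCA2 ,TP53", delimiter=","))
# ['BRCA1', 'BRCA2', 'TP53']
print(split_sketch("BRCA1, BRCA2 ,TP53", delimiter=",", strip=False))
# ['BRCA1', ' BRCA2 ', 'TP53']
print(split_sketch("a;b|c", regex=r"[;|]"))
# ['a', 'b', 'c']
```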

sub

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn't found, string is returned unchanged.

Output data type: string

Syntax

sub(pattern, repl, string, count, regex_ignorecase, regex_dotall, regex_multiline)
  • pattern: The regular expression pattern
  • repl: The string to replace matches with
  • string: The string to search
  • count (default: 0): The maximum number of pattern occurrences to be replaced. If zero, all occurrences will be replaced.
  • regex_ignorecase (default: None): With a "regex" pattern, performs case-insensitive matching.
  • regex_dotall (default: None): With a "regex" pattern, makes the "." special character match any character at all, including a newline; without this flag, "." matches anything except a newline.
  • regex_multiline (default: None): With a "regex" pattern, when specified, the pattern character "^" matches at the beginning of the string and at the beginning of each line (immediately following each newline), and the pattern character "$" matches at the end of the string and at the end of each line (immediately preceding each newline). By default, "^" matches only at the beginning of the string, and "$" only at the end of the string and immediately before the newline (if any) at the end of the string.
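
The count argument is the part most easily misread, so here is a small local sketch built on Python's re.sub, which the description mirrors. sub_sketch is a hypothetical wrapper, and only regex_ignorecase is wired up to keep the example short.

```python
import re


def sub_sketch(pattern, repl, string, count=0, regex_ignorecase=None):
    """Replace up to count matches of pattern (0 means replace all)."""
    flags = re.IGNORECASE if regex_ignorecase else 0
    return re.sub(pattern, repl, string, count=count, flags=flags)


variant = "GRCH38-7-117559590-117559593-A"
print(sub_sketch("-", ":", variant))            # GRCH38:7:117559590:117559593:A
print(sub_sketch("-", ":", variant, count=2))   # GRCH38:7:117559590-117559593-A
print(sub_sketch(r"^chr", "", "CHR7", regex_ignorecase=True))  # 7
```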

tabulate

Converts a list of objects into a table (i.e. a two-dimensional array).

Output data type: object (list)

Syntax

tabulate(objects, fields, header)
  • objects: The list of objects
  • fields (optional): List of fields to include (default: all)
  • header (optional): Include a header row (default: True)
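
The resulting two-dimensional array can be pictured with the plain-Python sketch below. It is not SolveBio's implementation: tabulate_sketch is a hypothetical name, and both the column order (first appearance across the objects) and the use of None for missing fields are assumptions.

```python
def tabulate_sketch(objects, fields=None, header=True):
    """Convert a list of dicts into a list of rows, optionally with a header."""
    if fields is None:
        # Assumption: default columns are all keys, in order of first appearance.
        fields = []
        for obj in objects:
            for key in obj:
                if key not in fields:
                    fields.append(key)
    # Assumption: a field missing from an object becomes None.
    rows = [[obj.get(field) for field in fields] for obj in objects]
    return [list(fields)] + rows if header else rows


objects = [
    {"gene": "BRCA2", "chrom": "13"},
    {"gene": "CFTR", "chrom": "7", "impact": "HIGH"},
]
for row in tabulate_sketch(objects):
    print(row)
# ['gene', 'chrom', 'impact']
# ['BRCA2', '13', None]
# ['CFTR', '7', 'HIGH']
```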

today

Returns the current date.

Output data type: string

Syntax

today(timezone, template)
  • timezone (default: EST): The timezone to use for the date
  • template (default: YYYY-MM-DD): The format in which to represent the date

translate_variant

Translate variant into a protein change.

Output data type: object

Output object properties:

  • protein_length: Number of amino acids in the protein
  • cdna_change: cDNA change
  • protein_change: Protein change
  • protein_coordinates: A dictionary containing start and stop coordinates and the affected transcript ID
  • gene: HUGO gene symbol
  • transcript: The transcript ID
  • effects: List of effects

Syntax

translate_variant(variant, gene_model, transcript, include_effects)
  • variant: The variant
  • gene_model (optional): The desired gene model: refseq (default) or ensembl
  • transcript (optional): Limits results to this transcript only
  • include_effects (optional): Returns the effects of the variant using Veppy

Examples

translate_variant("GRCH38-7-117559590-117559593-A")

translate_variant("GRCH38-7-117559590-117559593-A", gene_model="ensembl")

translate_variant("GRCH38-7-117559590-117559593-A", transcript="NM_000492.3")

translate_variant("GRCH38-7-117559590-117559593-A", include_effects=True)

user

Returns the currently authenticated user.

Output data type: object

Output object properties:

  • name: The user's full name.
  • email: The user's email address.