Expressions Overview

Looking for expression functions?

Jump to the expression function reference.

SolveBio expressions are Python-like formulas that can be used to pull data from datasets, calculate statistics, or run advanced algorithms. They are typically used when transforming datasets but have many other uses such as building web application widgets, pulling SolveBio data into Excel or Google Sheets, or when augmenting databases outside SolveBio.

To jump right into some examples, see the Recipes page.

Expression Syntax

Expressions resemble a single line of Python code. An expression must be exactly one valid expression, although it can contain line breaks for readability. As a result, expressions do not support variable or class declarations.
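
To illustrate the single-expression rule, here is a minimal sketch that uses Python's standard `ast` module to check whether a piece of source parses as one expression. The helper name `is_single_expression` is made up for this example, and SolveBio's own validation may differ from plain Python parsing.

```python
import ast


def is_single_expression(source):
    """Return True if `source` parses as exactly one Python expression."""
    try:
        # mode="eval" accepts a single expression and rejects statements
        ast.parse(source.strip(), mode="eval")
    except SyntaxError:
        return False
    return True


# Line breaks inside parentheses are fine: this is still one expression
multi_line = """
sum(
    i for i in range(100)
)
"""
print(is_single_expression(multi_line))       # True
print(is_single_expression("x = 1"))          # False: assignment is a statement
print(is_single_expression("class A: pass"))  # False: class declaration
```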

Expressions can use the built-in library of expression functions. If you have an idea for a new function, feel free to contact SolveBio at any time with your suggestion.

Expressions can be a simple static value, such as a number or string:

# Numeric expression
1 + (2 * 3)
# output: 7

# String expression
"hello" + " world"
# output: "hello world"

Expressions can also reference context values provided during evaluation or annotation:

# String expression with context: {"record": {"a": "hello"}}
record["a"] + " world"
# output: "hello world"
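
As a rough local analogy for how context values become names inside an expression, the sketch below evaluates an expression string with plain Python `eval`. The helper `evaluate_locally` and the `SAFE_BUILTINS` whitelist are assumptions made for illustration. This is not how SolveBio evaluates expressions, and `eval` should not be treated as a security sandbox.

```python
# Built-ins exposed to the expression (mirrors the ones named in this page)
SAFE_BUILTINS = {
    "len": len, "min": min, "max": max,
    "sum": sum, "round": round, "range": range,
}


def evaluate_locally(expression, context=None):
    """Evaluate one expression string with the given context values as names."""
    namespace = {"__builtins__": SAFE_BUILTINS}
    namespace.update(context or {})
    return eval(expression.strip(), namespace)


print(evaluate_locally('record["a"] + " world"', {"record": {"a": "hello"}}))
# hello world
print(evaluate_locally("1 + (2 * 3)"))
# 7
```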

You can also use a number of built-in Python functions such as len, min, max, sum, round, and range, as well as a wide range of SolveBio-specific functions. In addition, you can wrap functions in other functions and iterate through lists. This makes it possible to construct advanced expressions that pull and manipulate data from other datasets:

# Numeric expression using built-in functions
sum(i for i in range(100))
# output: 4950

# Numeric expression using a SolveBio function
dataset_field_stats("solvebio:public:/ClinVar/3.7.4-2017-01-30/Combined-GRCh37", "review_status_star")["avg"]
# output: 0.883874789018
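
The nesting and iteration described above can be tried in plain Python first. In this sketch the `record` dictionary and its `genes` field are hypothetical, and each assigned value is a single expression of the kind that could be pasted into an expression field.

```python
# Hypothetical context value with a list field
record = {"genes": ["BRCA1", "TTN", "TP53"]}

# Iterate through a list inside a built-in function
longest_name = max(len(gene) for gene in record["genes"])

# Wrap functions in other functions: round(sum(...) / len(...))
mean_length = round(
    sum(len(gene) for gene in record["genes"]) / float(len(record["genes"])),
    2,
)

print(longest_name)  # 5
print(mean_length)   # 4.0
```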

Data Types and Lists

Expressions always have a return value. The value's data type depends on the expression and is one of the following:

Data Type | Description
string (default) | A valid UTF-8 string with up to 32,766 characters.
text | A valid UTF-8 string of any length.
blob | A valid UTF-8 string of any length (this data type is not indexed for search).
date | A string in ISO 8601 format, for example: "2017-03-29T14:52:01".
integer | A signed 32-bit integer with a minimum value of -2^31 and a maximum value of 2^31 - 1.
long | A signed 64-bit integer with a minimum value of -2^63 and a maximum value of 2^63 - 1.
float | A single-precision 32-bit IEEE 754 floating point.
double | A double-precision 64-bit IEEE 754 floating point.
boolean | Casts the result to a boolean: True or False. Uses Python's truth value testing rules.
object | A key/value, JSON-like object, similar to a Python dictionary.
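
When choosing between integer and long, or when producing a date value, it can help to check the value locally. The sketch below is plain Python that uses the signed 32-bit and 64-bit bounds and the ISO 8601 format from the table. The helper `smallest_integer_type` is a made-up name, not a SolveBio function.

```python
from datetime import datetime

# Bounds of signed 32-bit and 64-bit integers
INTEGER_MIN, INTEGER_MAX = -2**31, 2**31 - 1
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1


def smallest_integer_type(value):
    """Return the narrowest integer data type that can hold `value`."""
    if INTEGER_MIN <= value <= INTEGER_MAX:
        return "integer"
    if LONG_MIN <= value <= LONG_MAX:
        return "long"
    raise OverflowError("value does not fit in a signed 64-bit integer")


print(smallest_integer_type(2**31 - 1))  # integer
print(smallest_integer_type(2**31))      # long

# A date value is a string in ISO 8601 format
print(datetime(2017, 3, 29, 14, 52, 1).isoformat())  # 2017-03-29T14:52:01
```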

Expressions can be set up to return a single value (default) or a list of values. Enabling list mode always casts the return value to a list; disabling it always returns a single value.

Type Casting

Each data type "casts" the result of an expression, ensuring compatibility with the underlying dataset system. If the result of an expression is incompatible with the data type, an error will be raised for that record.

For return values compatible with the required data type, the final result should be straightforward. It is important to note that expressions make a distinction between null values (i.e. Python None), empty strings, and empty lists. The following table shows what to expect when encountering these types of values for different data types:

Expression Result | Data Type | As Value | As List
"" | string/text | "" | [""]
[""] | string/text | "" | [""]
None | string/text | None | [None]
[None] | string/text | None | [None]
[] | string/text | None | []
"" | integer/float | None | [None]
[""] | integer/float | None | [None]
None | integer/float | None | [None]
[None] | integer/float | None | [None]
[] | integer/float | None | []
"" | boolean | False | [False]
[""] | boolean | False | [False]
None | boolean | None | [None]
[None] | boolean | None | [None]
[] | boolean | None | []
float("inf") | float | Infinity | [Infinity]
float("-inf") | float | -Infinity | [-Infinity]
float("NaN") | float | None | [None]
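
One way to internalize the table is a toy model that reproduces its rows. The functions `cast_element` and `cast_result` below are assumptions written for illustration only. They are consistent with the rows shown above but are not SolveBio's implementation, and they cover only the data types listed in the table.

```python
import math


def cast_element(value, data_type):
    """Toy per-element cast that mirrors the rows of the table."""
    if value is None:
        return None  # null stays null for every data type
    if data_type in ("string", "text"):
        return str(value)
    if data_type == "boolean":
        return bool(value)  # Python truth value testing: "" -> False
    if data_type in ("integer", "float"):
        if value == "":
            return None  # an empty string has no numeric value
        number = int(value) if data_type == "integer" else float(value)
        if isinstance(number, float) and math.isnan(number):
            return None  # NaN becomes null; infinities are kept
        return number
    raise ValueError("unsupported data type: %s" % data_type)


def cast_result(result, data_type, is_list):
    """Apply the cast in list mode or in single-value mode."""
    items = result if isinstance(result, list) else [result]
    casted = [cast_element(item, data_type) for item in items]
    if is_list:
        return casted
    # Single-value mode keeps only the first element (None if empty)
    return casted[0] if casted else None


print(cast_result("", "boolean", is_list=True))   # [False]
print(cast_result([], "integer", is_list=False))  # None
```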

Using Expressions

Most commonly, expressions are used to transform datasets.

Dataset imports and migrations are asynchronous tasks that can take time to run. There are two ways to run expressions in real time:

  • Evaluation: run a single expression with custom context values.
  • Annotation: run one or more expressions on an arbitrary list of records.

Evaluate an Expression

The ability to evaluate a single expression is helpful when testing new expressions, or in the context of an application view that needs a very specific piece of information.

Python:
from solvebio import Expression

# Static expression
expr = Expression('"hello" + " " + "world"')
expr.evaluate(data_type='string', is_list=False)
# Response: 'hello world'

# Expression with a context variable "my_field"
expr = Expression('"hello" + " " + my_field')
data = {'my_field': 'world'}
expr.evaluate(data=data, data_type='string', is_list=False)
# Response: 'hello world'

R:
require(solvebio)

# Static expression
Expression.evaluate('"hello" + " " + "world"')
# Response: 'hello world'

# Expression with a context variable "my_field"
Expression.evaluate(
    '"hello" + " " + my_field',
    data=list(my_field='world')
)
# Response: 'hello world'

JavaScript:
// Static expression
SolveBio.Expression('"hello" + " " + "world"', 'string')
    .evaluate({})
    .then(function(response) {
        // "hello world"
        console.log(response.result);
    });

// Expression with a context variable "my_field"
SolveBio.Expression('"hello" + " " + my_field', 'string')
    .evaluate({my_field: 'world'})
    .then(function(response) {
        // "hello world"
        console.log(response.result);
    });

Annotate a List of Records

You can annotate a list of records in real time (i.e. without saving them to a dataset) using the annotate endpoint. This provides a way to quickly test one or more expressions on a list of records. To annotate an entire dataset, see Transforming Datasets.

Python:
from solvebio import Annotator

# Define a set of records
records = [
    {'gene': 'BRCA1'},
    {'gene': 'BRCA2'},
    {'gene': 'BRAF'},
    {'gene': 'TTN'},
    {'gene': 'TP53'}
]

# Define the fields to annotate
fields = [
    {
        # How many times is the gene in ClinVar?
        'name': 'clinvar_count',
        'data_type': 'integer',
        'expression': """
            dataset_count(
                "solvebio:public:/ClinVar/3.7.4-2017-01-30/Combined-GRCh37",
                entities=[("gene", record.gene)]
            )
        """
    },
    {
        # What chromosome is the gene on?
        'name': 'chromosome',
        'data_type': 'string',
        'expression': """
            dataset_query(
                "solvebio:public:/GENCODE/2.2.0-24/GENCODE-GRCh37",
                entities=[("gene", record.gene)],
                filters=[("feature", "gene")]
            )[0]["genomic_coordinates"]["chromosome"]
        """
    },
    {
        # Set the current date and time.
        'name': 'date_evaluated',
        'data_type': 'date',
        'expression': 'now()'
    }
]


for r in Annotator(fields=fields).annotate(records):
    print(r)
# Response:
# {
#   'results': [
#       {
#           'chromosome': '17',
#           'clinvar_count': 4692,
#           'date_evaluated': '2017-03-29T15:32:02',
#           'gene': 'BRCA1'
#       },
#       ...
#   ]
# }

R:
require(solvebio)

records = list(
    list(gene="BRCA1"),
    list(gene="BRCA2"),
    list(gene="BRAF"),
    list(gene="TTN"),
    list(gene="TP53")
)

# Define the fields to annotate
fields = list(
    list(
        # How many times is the gene in ClinVar?
        name="clinvar_count",
        data_type="integer",
        expression="
            dataset_count(
                 'solvebio:public:/ClinVar/3.7.4-2017-01-30/Combined-GRCh37',
                 entities=[('gene', record.gene)]
            )
        "
    ),
    list(
        # What chromosome is the gene on?
        name="chromosome",
        data_type="string",
        expression="
            dataset_query(
                'solvebio:public:/GENCODE/2.2.0-24/GENCODE-GRCh37',
                entities=[('gene', record.gene)],
                filters=[('feature', 'gene')]
            )[0]['genomic_coordinates']['chromosome']
        "
    ),
    list(
        # Set the current date and time.
        name="date_evaluated",
        data_type="date",
        expression="now()"
    )
)

Annotator.annotate(records=records, fields=fields)

JavaScript:
// Define a set of records
var records = [
    {'gene': 'BRCA1'},
    {'gene': 'BRCA2'},
    {'gene': 'BRAF'},
    {'gene': 'TTN'},
    {'gene': 'TP53'}
]

// Define the fields to annotate
var fields = [
    {
        // How many times is the gene in ClinVar?
        'name': 'clinvar_count',
        'data_type': 'integer',
        'expression': "" +
        "    dataset_count(" +
        "        'solvebio:public:/ClinVar/3.7.4-2017-01-30/Combined-GRCh37'," +
        "        entities=[('gene', record.gene)]" +
        "    )"
    },
    {
        // What chromosome is the gene on?
        'name': 'chromosome',
        'data_type': 'string',
        'expression': "" +
        "    dataset_query(" +
        "        'solvebio:public:/GENCODE/2.2.0-24/GENCODE-GRCh37'," +
        "        entities=[('gene', record.gene)]," +
        "        filters=[('feature', 'gene')]" +
        "    )[0]['genomic_coordinates']['chromosome']"
    },
    {
        // Set the current date and time.
        'name': 'date_evaluated',
        'data_type': 'date',
        'expression': 'now()'
    }
]


SolveBio.Annotator(fields)
    .annotate(records)
    .then(function(response) {
        console.log(response);
    });
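
Conceptually, annotation evaluates each field's expression once per record and attaches the result under the field's name. The sketch below imitates that flow locally with plain Python `eval`. The helper `annotate_locally` and the two example fields are assumptions made for illustration, SolveBio functions such as dataset_count are not available in it, and it uses `record["gene"]` because a plain dictionary has no attribute access.

```python
def annotate_locally(records, fields):
    """Evaluate every field expression against every record (local toy model)."""
    annotated = []
    for record in records:
        result = dict(record)  # keep the original record fields
        for field in fields:
            namespace = {"__builtins__": {"len": len}, "record": record}
            result[field["name"]] = eval(field["expression"].strip(), namespace)
        annotated.append(result)
    return annotated


records = [{"gene": "BRCA1"}, {"gene": "TTN"}]

# Hypothetical fields that need no remote dataset
fields = [
    {"name": "gene_length", "expression": 'len(record["gene"])'},
    {"name": "is_brca", "expression": 'record["gene"].startswith("BRCA")'},
]

for row in annotate_locally(records, fields):
    print(row)
```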

Common Issues

Using expressions does require some basic knowledge of Python. Due to the condensed nature of an expression, syntax errors can be hard to spot. Selecting the correct data type can also be confusing at times.

The data type defines the final output of an expression.

One common issue is confusing the return type of a particular function with the final output of an expression. When working with a function, check its documentation to see what it returns. If it returns a list, make sure your expression handles that, even if list mode is disabled.
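
For example, indexing a list result with [0] fails in Python when the list is empty. The sketch below shows one way to guard against that within a single expression. The function `lookup_chromosomes` is a hypothetical stand-in for any function that returns a list, and whether a given idiom is accepted by SolveBio expressions should be confirmed by evaluating it first.

```python
def lookup_chromosomes(gene):
    """Hypothetical stand-in for a function that returns a list of matches."""
    table = {"BRCA1": ["17"], "TP53": ["17"]}
    return table.get(gene, [])


# `or [None]` substitutes a one-item list when the result is empty,
# so taking index 0 never raises IndexError.
found = (lookup_chromosomes("BRCA1") or [None])[0]
missing = (lookup_chromosomes("UNKNOWN") or [None])[0]

print(found)    # 17
print(missing)  # None
```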

Some data types are incompatible with some functions.

If a function returns a string but you have set the expression's data type to integer, double, or object, the result may fail to cast and raise an error for that record.

When list mode is disabled, an expression never returns a list.

If list mode is disabled but the expression returns a list, only the first value will be returned. Conversely, if list mode is enabled, the value will always be cast to a list.