Dataset Charts

Overview

Charts are an essential part of working with datasets: they make it easy to see numerical distributions and the most common terms at a glance. You can use the SolveBio Web UI to plot various charts for a dataset, respecting any dataset filters that have been applied. Generated plots can be exported as images using the Export to PNG button in the upper right corner of the chart.

The chart types available for a given dataset depend on the data types found in the dataset.

| Chart type | Description | Supported field types |
| --- | --- | --- |
| Bar chart | Plot the number of occurrences of each unique value in a string field. | string |
| Box plot | Plot the minimum, first quartile, median, third quartile, and maximum value in a numerical field. | float, double |
| Genomic Scatter | Plot the values of a numeric field over genomic coordinates. | integer, long, date, float, double |
| Histogram | Plot the distribution of values in a numerical or date field. | integer, long, date |
| Prevalence | Plot the frequency of a value (variant or gene) within a study. | string |
| Scatter Plot | Plot the values of a numeric field over an optional numeric or string field. | integer, long, date, float, double |

Bar Chart

Plot the number of occurrences of each unique value in a string field. This plot shows you the most common values of the selected field. The 10 most common values are loaded by default, but you can load more using the button below the chart.
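To make the counting concrete, here is a minimal sketch in plain Python of what a bar chart summarizes: occurrences per unique value, limited to the most common 10. It does not use the SolveBio API; the `top_values` helper, the `records` list, and the `gene` field name are hypothetical and exist only for illustration.

```python
from collections import Counter


def top_values(records, field, limit=10):
    """Return (value, count) pairs for the most common values of a field.

    Records where the field is missing or None are skipped.
    """
    counts = Counter(
        rec[field] for rec in records if rec.get(field) is not None
    )
    return counts.most_common(limit)


# Hypothetical records; "gene" is an illustrative field name.
records = [
    {"gene": "BRCA1"},
    {"gene": "TP53"},
    {"gene": "TP53"},
    {"gene": "EGFR"},
    {"gene": "TP53"},
    {"gene": "BRCA1"},
    {"gene": None},
]

top = top_values(records, "gene")
for value, count in top:
    print(f"{value}: {count}")
```

Raising `limit` corresponds loosely to loading more values below the chart.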

Box Plot

A box plot is a convenient way of summarizing a group of values by five numbers: the minimum, first quartile, median, third quartile, and maximum. You can split the box plot into multiple plots by a selected category field, one per unique value of that field, up to a maximum of 10. Only string fields can be used as a category.
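The five values behind each box can be sketched with the Python standard library. This is an illustration under stated assumptions, not the Web UI's implementation: the quartile convention (`method="inclusive"`), the choice to keep the first 10 categories encountered, and the `tissue` and `expr` field names are all assumptions made for the example.

```python
import statistics
from collections import defaultdict


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least two values."""
    ordered = sorted(values)
    # Quartile conventions differ between tools; "inclusive" is one choice.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median, q3, ordered[-1])


def summaries_by_category(records, value_field, category_field, max_groups=10):
    """Group values by a string category and summarize each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[category_field]].append(rec[value_field])
    # Assumption: keep the first max_groups categories in encounter order.
    kept = list(groups)[:max_groups]
    return {cat: five_number_summary(groups[cat]) for cat in kept}


# Hypothetical records with an illustrative category and numeric field.
records = [
    {"tissue": "lung", "expr": 1.0},
    {"tissue": "lung", "expr": 2.0},
    {"tissue": "lung", "expr": 3.0},
    {"tissue": "lung", "expr": 4.0},
    {"tissue": "lung", "expr": 5.0},
    {"tissue": "liver", "expr": 10.0},
    {"tissue": "liver", "expr": 20.0},
    {"tissue": "liver", "expr": 30.0},
]

summaries = summaries_by_category(records, "expr", "tissue")
for tissue, summary in summaries.items():
    print(tissue, summary)
```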

The Y-axis scale can be switched between linear and logarithmic by clicking the linear or logarithmic button in the top right corner of the chart page.

Genomic Scatter

Plot the values of a selected numeric field over the entire genome in a single chart. Due to the amount of data required, this chart can take some time to load.

Histogram

Plot the distribution of values in a numerical or date field. The histogram divides the range of values into intervals (bins) and counts how many values fall into each bin.
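The binning step can be sketched as follows. The equal-width bins and the `bin_count` parameter are assumptions for this example; the source does not state how the Web UI chooses its bins.

```python
def histogram(values, bin_count=5):
    """Count values into equal-width bins; return (edges, counts)."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1 when every value is identical.
    width = (hi - lo) / bin_count or 1
    counts = [0] * bin_count
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        idx = min(int((v - lo) / width), bin_count - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bin_count + 1)]
    return edges, counts


# Hypothetical numeric values from an integer field.
values = [1, 2, 2, 3, 5, 8, 9, 10]
edges, counts = histogram(values, bin_count=3)
for left, right, n in zip(edges, edges[1:], counts):
    print(f"[{left}, {right}): {n}")
```

Date fields can be handled the same way after converting each date to a number, for example a timestamp.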

Prevalence Plot

This chart is typically used to plot the frequency of gene mutations across all subjects within a study. It shows the prevalence of each value from a primary selected field across all values of a selected second (category) field. Both fields must be string fields.

Please note that in large datasets the results are approximated and can have an error of up to 5%.
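One plausible reading of prevalence, sketched below, is the fraction of distinct category values (for example, subjects) in which each primary value (for example, a gene) appears at least once. This interpretation, the `prevalence` helper, and the `gene` and `subject` field names are assumptions for illustration. The sketch computes exact figures, whereas the Web UI may approximate them on large datasets.

```python
from collections import defaultdict


def prevalence(records, primary_field, category_field):
    """Fraction of distinct categories in which each primary value occurs."""
    all_categories = set()
    seen = defaultdict(set)
    for rec in records:
        category = rec[category_field]
        all_categories.add(category)
        # A set ensures repeated hits in one category count only once.
        seen[rec[primary_field]].add(category)
    total = len(all_categories)
    return {value: len(cats) / total for value, cats in seen.items()}


# Hypothetical variant records: one row per mutation observed in a subject.
records = [
    {"subject": "S1", "gene": "TP53"},
    {"subject": "S1", "gene": "TP53"},
    {"subject": "S1", "gene": "BRCA1"},
    {"subject": "S2", "gene": "TP53"},
    {"subject": "S3", "gene": "EGFR"},
    {"subject": "S4", "gene": "TP53"},
]

result = prevalence(records, "gene", "subject")
for gene, fraction in result.items():
    print(f"{gene}: {fraction:.0%}")
```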

Scatter Plot

Plot the values of a numeric field over an optional string or numeric field. If no category field is selected, all data will be shown at a single X-axis point.

You can color the points by selecting an additional field from the Color by dropdown. For performance reasons, coloring by a field with more than 25 distinct values is not recommended. The scatter plot can also draw lines connecting the dots that share the same color. For better results, disable the jitter effect when displaying lines on the plot.
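Jitter, in general, is a small random horizontal offset that keeps points with the same X value from overlapping. The sketch below illustrates the idea and why disabling it helps when lines are drawn: with no jitter, points in the same category line up on one X position. The `jitter_positions` helper, its `width` and `seed` parameters, and the uniform offset are assumptions for this example, not a description of how the Web UI applies jitter.

```python
import random


def jitter_positions(categories, width=0.3, seed=0):
    """Map each category to an integer X slot and add a random offset.

    Returns (xs, index), where index maps category -> base X position.
    """
    rng = random.Random(seed)  # seeded so the layout is reproducible
    index = {}
    xs = []
    for cat in categories:
        # Assign slots 0, 1, 2, ... in order of first appearance.
        base = index.setdefault(cat, len(index))
        xs.append(base + rng.uniform(-width, width))
    return xs, index


# Hypothetical category values for six points.
categories = ["A", "A", "B", "A", "B", "B"]

jittered, index = jitter_positions(categories, width=0.3)
aligned, _ = jitter_positions(categories, width=0.0)  # jitter disabled

print(index)
print([round(x, 2) for x in jittered])
print(aligned)
```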

You can select an optional third categorical field (string fields only) to split the chart into multiple charts, one per unique value of the category field, up to a maximum of 100 (for example, one scatter plot per sample). You can also set the size of the dots displayed in the chart (the default size is 2).