Charts are an essential part of working with datasets: they make it easy to see numerical distributions and the most common terms at a glance. You can use the SolveBio Web UI to plot various charts for a dataset, and the charts respect any dataset filters that have been applied. Generated plots can be exported as images using the Export to PNG button in the upper right corner of the chart.
The chart types available for a given dataset depend on the data types found in the dataset.
|Chart type||Description||Supported field types|
|Bar chart||Plot the number of occurrences of each unique value in a string field.||string|
|Box plot||Plot the minimum, first quartile, median, third quartile, and maximum value in a numerical field.||float, double|
|Genomic Scatter||Plot the values of a numeric field over genomic coordinates.||integer, long, date, float, double|
|Histogram||Plot the distribution of values in a numerical or date field.||integer, long, date|
|Prevalence||Plot the frequency of a value (variant or gene) within a study.||string|
|Scatter Plot||Plot the values of a numeric field over an optional numeric or string field.||integer, long, date, float, double|
The bar chart plots the number of occurrences of each unique value in a string field, showing you the most common values of the selected field.
The 10 most common values are loaded by default, but you can load more using the button below the chart.
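As an illustration of what the bar chart counts, here is a minimal Python sketch using only the standard library. It is not SolveBio's implementation: the function name `top_values` and the sample data are hypothetical, and the default of 10 simply mirrors the default described above.

```python
from collections import Counter


def top_values(values, limit=10):
    """Return the `limit` most common values with their counts.

    Missing values (None) are skipped; this is an assumption of the sketch.
    """
    counts = Counter(v for v in values if v is not None)
    return counts.most_common(limit)


# Hypothetical values from a string field
genes = ["BRCA1", "TP53", "TP53", "EGFR", "TP53", "BRCA1"]
print(top_values(genes))
# [('TP53', 3), ('BRCA1', 2), ('EGFR', 1)]
```

Raising `limit` corresponds to loading more values below the chart.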
A box plot is a convenient way of summarizing a group of numerical values by showing the minimum, first quartile, median, third quartile, and maximum value.
It is possible to split the box plot into multiple plots by a selected category (one per unique value of the category field, up to a maximum of 10).
Only string fields can be used as a category.
The Y-axis can be switched to a logarithmic scale by clicking the logarithmic button in the top right corner of the chart page.
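The five values that a box plot summarizes can be computed with Python's standard library, as in the sketch below. The function name `five_number_summary` and the sample data are hypothetical. Quartile conventions differ between tools, so the values the Web UI shows may not match the "inclusive" method assumed here.

```python
import statistics


def five_number_summary(values):
    """Return the minimum, quartiles, and maximum of a list of numbers.

    Needs at least two values. Uses the "inclusive" quartile method,
    which is an assumption of this sketch.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}


# Hypothetical values from a numerical field
scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
print(five_number_summary(scores))
# {'min': 1.0, 'q1': 3.0, 'median': 5.0, 'q3': 7.0, 'max': 9.0}
```

Splitting by category amounts to computing this summary once per unique category value.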
The genomic scatter chart plots the values of a selected numeric field over the entire genome in a single chart. Due to the amount of data required, this chart can take some time to load.
The histogram plots the distribution of values in a numerical or date field. It does so by creating intervals (bins) and counting how many values fall into each bin.
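The binning step can be sketched in a few lines of Python. This is only an illustration using equal-width bins: the function name `histogram`, the bin count, and the sample data are hypothetical, and the Web UI may choose its bins differently.

```python
def histogram(values, bin_count):
    """Count how many values fall into each of `bin_count` equal-width bins.

    Returns (edges, counts), where `edges` has bin_count + 1 entries.
    """
    lo, hi = min(values), max(values)
    # Fall back to a width of 1 when all values are identical
    width = (hi - lo) / bin_count or 1
    counts = [0] * bin_count
    for v in values:
        # The maximum value belongs to the last bin, not to a new one
        index = min(int((v - lo) / width), bin_count - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bin_count + 1)]
    return edges, counts


# Hypothetical values from an integer field
edges, counts = histogram([1, 2, 2, 3, 5, 8, 9, 10], bin_count=3)
print(edges)   # [1.0, 4.0, 7.0, 10.0]
print(counts)  # [4, 1, 3]
```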
The prevalence chart is typically used to plot the frequency of gene mutations across all subjects within a study. It shows the prevalence of each value from a primary selected field across all values of a second selected (category) field. Both fields must be string fields.
Please note that in large datasets the results are approximated and can have an error of up to 5%.
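One way to read "prevalence" is the fraction of distinct category values (for example, samples) in which each primary value (for example, a gene) appears. The sketch below computes that exactly for a small list of records. It assumes this definition and is not SolveBio's implementation: the function name `prevalence` and the field names `gene` and `sample` are hypothetical.

```python
from collections import defaultdict


def prevalence(records, value_field, category_field):
    """Return, for each primary value, the fraction of distinct
    category values in which it occurs at least once."""
    all_categories = set()
    categories_by_value = defaultdict(set)
    for record in records:
        category = record[category_field]
        all_categories.add(category)
        categories_by_value[record[value_field]].add(category)
    total = len(all_categories)
    return {value: len(cats) / total for value, cats in categories_by_value.items()}


# Hypothetical records: one row per mutation
records = [
    {"gene": "TP53", "sample": "S1"},
    {"gene": "TP53", "sample": "S2"},
    {"gene": "TP53", "sample": "S2"},  # a repeat in the same sample counts once
    {"gene": "EGFR", "sample": "S1"},
    {"gene": "BRCA1", "sample": "S3"},
    {"gene": "KRAS", "sample": "S4"},
]
print(prevalence(records, "gene", "sample"))
# {'TP53': 0.5, 'EGFR': 0.25, 'BRCA1': 0.25, 'KRAS': 0.25}
```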
The scatter plot plots the values of a numeric field over an optional string or numeric category field. If no category field is selected, all data are shown at a single X-axis point.
Points can be colored by selecting an additional field from the Color by dropdown.
For performance reasons, coloring by a field with more than 25 distinct values is not recommended.
The scatter plot also has the ability to draw lines that will connect the dots that have the same color.
For better results you can disable the jitter effect when displaying lines on the plot.
You can select an optional third categorical field (string field only) to split the chart into multiple charts, one per unique value of the category field, up to a maximum of 100 (for example, one scatter plot per sample).
You can also select the size of the dots displayed in the chart (default size is 2).