Dataset Charts

Overview

Charts are an essential part of working with datasets: they make it easy to see numerical distributions and the most common terms at a glance. You can use the SolveBio Web UI to plot various charts for a dataset. Charts respect any dataset filters that have been applied. Generated plots can be exported as images using the Export to PNG button in the upper right corner of the chart.

The chart types available for a given dataset depend on the data types found in the dataset.

| Chart type | Description | Supported field types |
| --- | --- | --- |
| Bar chart | Plot the number of occurrences of each unique value in a `string` field. | string |
| Box plot | Plot the minimum, first quartile, median, third quartile, and maximum value in a numerical field. | float, double |
| Genomic Scatter | Plot the values of a numeric field over genomic coordinates. | integer, long, date, float, double |
| Histogram | Plot the distribution of values in a numerical or date field. | integer, long, date |
| Prevalence | Plot the frequency of a value (variant or gene) within a study. | string |
| Scatter Plot | Plot the values of a numeric field against an optional numeric or `string` field. | integer, long, date, float, double |

Bar Chart

Plot the number of occurrences of each unique value in a string field. This plot shows you the most common values of the selected field. The 10 most common values are loaded by default, but you can load more using the button below the chart.
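The counting behind a bar chart can be sketched in a few lines of Python. This is only an illustration of the idea, not how the SolveBio Web UI computes it. The function name `top_values` and the sample gene values are hypothetical.

```python
from collections import Counter

def top_values(values, limit=10):
    """Return (value, count) pairs for the most common values, most frequent first."""
    return Counter(values).most_common(limit)

# Hypothetical values of a string field
genes = ["BRCA1", "TP53", "TP53", "EGFR", "TP53", "BRCA1"]

# Keep only the two most common values
print(top_values(genes, limit=2))  # [('TP53', 3), ('BRCA1', 2)]
```

Raising `limit` corresponds loosely to loading more values into the chart.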

Box Plot

A box plot is a convenient way of summarizing a group of values by showing its minimum, first quartile, median, third quartile, and maximum. You can split the box plot into multiple plots by a selected category field (one plot per unique value of the category field, up to a maximum of 10). Only string fields can be used as a category.
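As a rough sketch of the five statistics a box plot displays, the following Python computes them for a list of numbers and then per category. The function names and sample records are hypothetical, and the quartile method (`inclusive`) is an assumption: quartile conventions differ between tools, so the values shown in the Web UI may not match exactly.

```python
import statistics
from collections import defaultdict

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers."""
    data = sorted(values)
    # Assumption: "inclusive" quartiles; other tools may use a different method.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

def summaries_by_category(records, value_field, category_field):
    """Group records by a string category field and summarize each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[category_field]].append(record[value_field])
    return {name: five_number_summary(vals) for name, vals in groups.items()}

# Hypothetical records with a numeric field and a string category field
records = [
    {"depth": 1.0, "sample": "A"}, {"depth": 2.0, "sample": "A"},
    {"depth": 3.0, "sample": "A"}, {"depth": 4.0, "sample": "A"},
    {"depth": 5.0, "sample": "A"}, {"depth": 10.0, "sample": "B"},
    {"depth": 20.0, "sample": "B"},
]
print(summaries_by_category(records, "depth", "sample"))
```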

The Y-axis scale can be switched between linear and logarithmic by clicking the linear or logarithmic button in the top right corner of the chart page.

Genomic Scatter

Plot the values of a selected numeric field over the entire genome in a single chart. Due to the amount of data required, this chart can take some time to load.

Histogram

Plot the distribution of values in a numerical or date field. The histogram divides the range of values into intervals (bins) and counts how many values fall into each bin.
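The binning step can be illustrated with a minimal Python sketch. It assumes equal-width bins spanning the minimum to the maximum value. How the Web UI chooses bin boundaries or the number of bins is not documented here, so `histogram` and `bin_count` are hypothetical names used for illustration only.

```python
def histogram(values, bin_count):
    """Count values in equal-width bins between min(values) and max(values).

    Returns (edges, counts), where edges has bin_count + 1 entries.
    """
    lo, hi = min(values), max(values)
    # Fall back to a width of 1 when all values are identical
    width = (hi - lo) / bin_count or 1
    counts = [0] * bin_count
    for v in values:
        # The maximum value is placed in the last bin rather than a new one
        index = min(int((v - lo) / width), bin_count - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bin_count + 1)]
    return edges, counts

edges, counts = histogram([1, 2, 2, 3, 5, 8, 9, 10], bin_count=3)
print(edges)   # [1.0, 4.0, 7.0, 10.0]
print(counts)  # [4, 1, 3]
```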

Prevalence Plot

This chart is typically used to plot the frequency of gene mutations across all subjects within a study. It shows the prevalence of each value from a primary selected field across all values of a selected second (category) field. Both fields must be string fields.

Please note that in large datasets the results are approximated and can have an error of up to 5%.
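One way to read "prevalence" is the fraction of distinct category values (for example, subjects) in which a given primary value (for example, a gene) appears at least once. The Python sketch below computes that exactly for a small in-memory list. This interpretation, the function name `prevalence`, and the field names are assumptions made for illustration; the Web UI may define or approximate the calculation differently.

```python
from collections import defaultdict

def prevalence(records, primary_field, category_field):
    """Return {primary value: fraction of distinct categories containing it}."""
    categories_by_value = defaultdict(set)
    all_categories = set()
    for record in records:
        category = record[category_field]
        all_categories.add(category)
        categories_by_value[record[primary_field]].add(category)
    total = len(all_categories)
    return {
        value: len(categories) / total
        for value, categories in categories_by_value.items()
    }

# Hypothetical mutation records: one row per (subject, gene) observation
records = [
    {"subject": "S1", "gene": "TP53"},
    {"subject": "S1", "gene": "TP53"},   # repeated observation, counted once
    {"subject": "S2", "gene": "TP53"},
    {"subject": "S2", "gene": "BRCA1"},
    {"subject": "S3", "gene": "EGFR"},
    {"subject": "S4", "gene": "TP53"},
]
print(prevalence(records, "gene", "subject"))
# {'TP53': 0.75, 'BRCA1': 0.25, 'EGFR': 0.25}
```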

Scatter Plot

Plot the values of a numeric field against an optional string or numeric category field. If no category field is selected, all data is shown at a single X-axis point.

To color the points, select an additional field from the Color by dropdown. For performance reasons, coloring by a field with more than 25 distinct values is not recommended. The scatter plot can also draw lines connecting the dots that have the same color. For better results, disable the jitter effect when displaying lines on the plot.
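Jitter generally means adding a small random horizontal offset to each point so that points sharing the same X position do not overlap. The sketch below shows the general technique in Python and why it interferes with connecting lines: each point moves by a different random amount, so lines between them zigzag. The function name, the `width` parameter, and its default are hypothetical and do not describe the Web UI's actual settings.

```python
import random

def jitter_positions(category_indices, width=0.2, seed=None):
    """Shift each X position by a random offset in [-width, width]."""
    rng = random.Random(seed)
    return [i + rng.uniform(-width, width) for i in category_indices]

# Four points in category 0 and two in category 1
x_positions = [0, 0, 0, 0, 1, 1]

# With jitter, points in the same category spread out horizontally
print(jitter_positions(x_positions, width=0.2, seed=42))

# With jitter disabled (width 0), points stay aligned on their category
print(jitter_positions(x_positions, width=0))
```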

You can select an optional third categorical field (string fields only) to split the chart into multiple charts, one per unique value of the category field, up to a maximum of 100 (for example, one scatter plot per sample). You can also set the size of the dots displayed in the chart (the default size is 2).

Last updated 2022-12-07.
