Current release: ReMM v0.4

Description	Link (Size)	Tabix Index (Size)	MD5 check sums
ReMM scores for GRCh38	(15 GB)	(2.7 MB)	(4 KB)
ReMM scores for GRCh37	(16 GB)	(2.7 MB)	(4 KB)

Previous release: ReMM v0.31.post1

Description	Link (Size)	Tabix Index (Size)	MD5 check sums
ReMM scores for GRCh38	(17 GB)	(2.7 MB)	(4 KB)
ReMM scores for GRCh37	(12 GB)	(2.4 MB)	(4 KB)

All these files can be also downloaded from Zenodo:

For downloading large files, we highly recommend a download manager or another tool that allows you to continue interrupted downloads (e.g. wget -c).

ReMM score changelog

0.4:

Features:

For missing values using genome mean of feature for sequence and conservaton features. 1 for p-value. All other features have zero as missing value
Updating DGVCount to 02/25/2020 on hg19/hg38
Update dbVARCount to 10/20/2021 on hg19/hg38
Update ISCApath to 11/03/2021 on hg19/hg38
Replace tfbsConsSites with UCSC table encRegTfbsClustered on hg19/hg38

Software:

Using parSMURF for training

Complete retraining of hg19 and hg38 builds (hg19: AUROC=0.993; AUPRC=0.394; hg38: AUROC=0.996; AUPRC=0.610)

0.3.1.post1:

New hg38 release. Completely retrained on the new genome build.

Training data

Liftover positives. No change in size.
Negatives used from CADD v1.4 GRCh38 (human derived), filtered as described in the original paper (https://doi.org/10.1016/j.ajhg.2016.07.005) Size slightly different (hg38: 13,902,234; hg19: 14,755,199).

Features

Same size as in hg19: 26 features.
We tried to use the same features as in hg19. Sometimes new versions of data have to be used (e.g. DGV, ISCA, dbVAR).

Training was done with the parSMURF implementation of hyperSMURF.

Same hg19 parameters are used.

Metrics via 10-fold cytoband cross-validation (same cytoband to fold map).

Area under the ROC curve: 0.996 (hg19: 0.989, see https://doi.org/10.1016/j.ajhg.2016.07.005).
Area under the precision recall curve: 0.548 (hg19: 0.441,, see see https://doi.org/10.1016/j.ajhg.2016.07.005).

Scores for hg19 in this release are the same as version 0.3.1. Only the files have been renamed.

0.3.1:

Bugfix of region chr17:79759050-81195210. Region is missing in older versions.

0.3:

First official public version.
Values for positions in training data are computed by cytoband-aware 10 fold cross-validation.
Other position scores are compted by a generalized model of all training data.
This version was used in the Genomiser publication (Smeley et.al. A Whole-Genome Analysis Framework for Effective Identification of Pathogenic Regulatory Variants in Mendelian Disease. AHJG. 2016).