Current release: ReMM v0.4
Description | Link (Size) | Tabix Index (Size) | MD5 checksums |
---|---|---|---|
ReMM scores for GRCh38 | (15 GB) | (2.7 MB) | (4 KB) |
ReMM scores for GRCh37 | (16 GB) | (2.7 MB) | (4 KB) |
Previous release: ReMM v0.3.1.post1
Description | Link (Size) | Tabix Index (Size) | MD5 checksums |
---|---|---|---|
ReMM scores for GRCh38 | (17 GB) | (2.7 MB) | (4 KB) |
ReMM scores for GRCh37 | (12 GB) | (2.4 MB) | (4 KB) |
All these files can also be downloaded from Zenodo:
For downloading large files, we highly recommend a download manager or another tool that allows you to continue interrupted downloads (e.g. wget -c).
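After a long or resumed download, it is worth checking the file against the published MD5 checksum. The following is a minimal sketch, assuming the checksum file uses the common `md5sum` layout (`<md5>  <filename>` per line); that layout and the helper names are assumptions, not something this page specifies.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_file(path):
    """Parse lines of the form '<md5>  <filename>' into {filename: md5}.

    Assumption: the checksum file follows the md5sum output layout.
    """
    expected = {}
    for line in Path(path).read_text().splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            checksum, name = parts
            # md5sum marks binary mode with a leading '*' on the file name.
            expected[name.strip().lstrip("*")] = checksum.lower()
    return expected


def verify_downloads(md5_path, directory="."):
    """Return {filename: bool}; False means missing file or digest mismatch."""
    results = {}
    for name, checksum in parse_md5_file(md5_path).items():
        target = Path(directory) / name
        results[name] = target.is_file() and md5_of_file(target) == checksum
    return results
```

Calling `verify_downloads` with the path to the downloaded checksum file and the download directory returns one entry per listed file; any `False` value suggests the download should be resumed or repeated.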
ReMM score changelog
0.4:
- Features:
  - Missing values are filled with the genome-wide mean of the feature for sequence and conservation features, and with 1 for p-values. All other features use zero as the missing value.
  - Update DGVCount to 02/25/2020 on hg19/hg38
  - Update dbVARCount to 10/20/2021 on hg19/hg38
  - Update ISCApath to 11/03/2021 on hg19/hg38
  - Replace tfbsConsSites with UCSC table encRegTfbsClustered on hg19/hg38
- Software:
  - Use parSMURF for training
  - Complete retraining of hg19 and hg38 builds (hg19: AUROC=0.993; AUPRC=0.394; hg38: AUROC=0.996; AUPRC=0.610)
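The missing-value rule listed under Features for 0.4 can be sketched as follows. This is only an illustration: the feature names `gc_content`, `conservation_score`, and `some_p_value` are placeholders, the grouping of features is an assumption, and the genome-wide means are made-up numbers.

```python
import math

# Placeholder feature groups (assumptions, not the actual ReMM feature lists).
MEAN_IMPUTED = {"gc_content", "conservation_score"}  # sequence/conservation
P_VALUE_FEATURES = {"some_p_value"}                  # p-value features


def missing_value_default(feature, genome_means):
    """Return the fill value for a feature according to the 0.4 rule."""
    if feature in MEAN_IMPUTED:
        return genome_means[feature]  # genome-wide mean of that feature
    if feature in P_VALUE_FEATURES:
        return 1.0                    # a missing p-value becomes 1
    return 0.0                        # all other features default to zero


def impute(record, features, genome_means):
    """Fill absent, None, or NaN values in one position's feature record."""
    filled = {}
    for feature in features:
        value = record.get(feature)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            value = missing_value_default(feature, genome_means)
        filled[feature] = value
    return filled
```

For example, a record with no `DGVCount` entry would receive 0, while a missing `gc_content` would receive the supplied genome-wide mean.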
0.3.1.post1:
- New hg38 release. Completely retrained on the new genome build.
- Training data
  - Positives were lifted over. No change in size.
  - Negatives taken from CADD v1.4 GRCh38 (human-derived), filtered as described in the original paper (https://doi.org/10.1016/j.ajhg.2016.07.005). The size differs slightly (hg38: 13,902,234; hg19: 14,755,199).
- Features
  - Same size as in hg19: 26 features.
  - We tried to use the same features as in hg19. Sometimes new versions of data have to be used (e.g. DGV, ISCA, dbVAR).
- Training was done with the parSMURF implementation of hyperSMURF.
  - The same parameters as for hg19 are used.
- Metrics via 10-fold cytoband cross-validation (same cytoband-to-fold map).
  - Area under the ROC curve: 0.996 (hg19: 0.989, see https://doi.org/10.1016/j.ajhg.2016.07.005).
  - Area under the precision-recall curve: 0.548 (hg19: 0.441, see https://doi.org/10.1016/j.ajhg.2016.07.005).
- Scores for hg19 in this release are the same as version 0.3.1. Only the files have been renamed.
0.3.1:
- Bug fix for region chr17:79759050-81195210, which is missing in older versions.
0.3:
- First official public version.
- Values for positions in the training data are computed by cytoband-aware 10-fold cross-validation.
- Scores for all other positions are computed by a general model trained on all training data.
- This version was used in the Genomiser publication (Smedley et al. A Whole-Genome Analysis Framework for Effective Identification of Pathogenic Regulatory Variants in Mendelian Disease. AJHG. 2016).
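The idea behind cytoband-aware cross-validation is that all variants from the same cytoband fall into the same fold, so a model is never evaluated on a position whose neighbours it saw during training. The sketch below illustrates this with a simple round-robin assignment of cytobands to folds; the actual cytoband-to-fold map used for ReMM is not given here, so the assignment scheme and function names are assumptions.

```python
def cytoband_fold_map(cytobands, n_folds=10):
    """Assign every distinct cytoband to exactly one fold.

    Assumption: round-robin over the sorted band names; the real ReMM
    cytoband-to-fold map may differ.
    """
    unique_bands = sorted(set(cytobands))
    return {band: i % n_folds for i, band in enumerate(unique_bands)}


def train_test_indices(samples, fold_map, held_out):
    """Split sample indices so the held-out fold's cytobands are test-only.

    samples: list of (variant_id, cytoband) tuples.
    """
    train, test = [], []
    for index, (_, band) in enumerate(samples):
        (test if fold_map[band] == held_out else train).append(index)
    return train, test


def cross_validation_splits(samples, n_folds=10):
    """Yield (fold, train_indices, test_indices) for each fold in turn."""
    fold_map = cytoband_fold_map([band for _, band in samples], n_folds)
    for fold in range(n_folds):
        train, test = train_test_indices(samples, fold_map, fold)
        yield fold, train, test
```

In such a scheme, each training position would be scored by the model for which its fold was held out, while positions outside the training data would be scored by a model trained on all training data.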