# Usage
**scATS** can analyze both single-end and paired-end 5'-end scRNA-seq data, enabling direct quantification using Seurat objects, while incorporating **RNA degradation** (RD) modeling through expectation-maximization (EM).
The main functions of scATS are: `TSSCDF`, `FindMarkersByTheta`,`FindMarkers`, `ThetaByGroup`,`psi`, `Sashimi`
## TSS inference and quantification
The input files include:
- Seurat object (R object)
- Alignment file (bam file)
- Annotation file (gtf file)
```r
# load packages
suppressMessages({
library(scATS)
library(Seurat)
library(data.table)
})
# load input files
seu <- readRDS("demo/demo_seurat.Rds")
Genes <- rownames(seu)
gtfFile <- file.path("demo/demo.gtf")
bams <- "demo/demo.bam"
file.exists(bams)
# quantification using wrapper function
scats <- TSSCDF(object = seu, bam = bams, gtf = gtfFile, genes = Genes, verbose = TRUE, scDR = TRUE, UTROnly = TRUE)
scats
```
class: scATSDataSet
dim: 6 2000
metadata(2): version parameters
assays(4): counts psi theta alpha
rownames(6): OR4F5@1:69071:+ OR4F5@1:69090:+ ... SCNN1D@1:1287217:+ SCNN1D@1:1287223:+
rowData names(12): TSS gene_id ... alpha theta
colnames(2000): TGGACGCTCCTTCAAT-1 CGAGCACGTCAGAATA-1 ... ATCGAGTAGCGGCTTC-1 CTAGAGTAGTGCCATT-1
colData names(4): orig.ident nCount_RNA nFeature_RNA CellType
The quantification results are stored in `scats@rowRanges` and contain the following columns:
| column name | content |
| ---------------- | ------------------------------------------------------------------------------------------ |
| seqnames | The chromosomal name of the gene to which TSS belongs. |
| ranges | The genomic locus of TSS, where the growth rate is highest. |
| strand | The genomic strand of the gene to which TSS belongs. |
| gene_id | The Ensembl ID of the gene to which TSS belongs. |
| gene_name | The HGNC Symbol of the gene to which TSS belongs. |
| TSS | The ID of the TSS in the format of [seqnames:ranges:strand]. |
| Region | The growth area of the sorted 5'-starts of read 1, or it can also be interpreted as the
TSS cluster. Another region must show a significant increase in the number of reads. |
| PSI | The percent spliced in (PSI) of TSS. |
| Percent | The ratio of TSS reads. |
| Annotated | The nearest annotated TSS locus that exists in the region, if it is NA,
indicates that there are no annotated TSS loci in the region. |
| Greedy | Greedy for the first TSS. If it is TRUE, it indicates that the first TSS is quantified
even if it does not meet the specified condition." |
| beta | The area under the cumulative distribution curve : close to 1, indicates no degradation,
while close to 0 indicates severe degradation. |
| alpha | The degradation index fitted using the EM algorithm: the larger the value, the more severe the degradation. |
| theta | The PSI value fitted using the EM algorithm. |
| AllReads | The total number of reads used for quantification. |
***Note : All the following analyses are based on the `scats` object.***
## Finding differentially expressed ATSs
### Finding markers of all group
**Identifying ATS markers based on `θ` value.**
```r
DE <- scATS::FindMarkersByTheta(object = scats, groupBy = "CellType", group1 = NULL, group2 = NULL, cores = 20)
DE
```
gene TSS group1 theta1 theta2 cell1 cell2 percent1 percent2 p
1: OR4F5 OR4F5@1:69063:+ A 0.3090495 0.2476250 14 9 1.3725490 0.9183673 0.2824086
2: OR4F5 OR4F5@1:69063:+ B 0.2476250 0.3090495 9 14 0.9183673 1.3725490 0.2824086
3: OR4F5 OR4F5@1:69071:+ A 0.2092985 0.3593406 12 17 1.1764706 1.7346939 0.1862959
4: OR4F5 OR4F5@1:69071:+ B 0.3593406 0.2092985 17 12 1.7346939 1.1764706 0.1862959
5: OR4F5 OR4F5@1:69080:+ A 0.1622722 0.1533179 9 9 0.8823529 0.9183673 0.9162350
6: OR4F5 OR4F5@1:69080:+ B 0.1533179 0.1622722 9 9 0.9183673 0.8823529 0.9162350
The `DE` object contains following columns:
| column name | content |
| ---------------- | ------------------------------------------------------------------------------------------ |
| gene | The HGNC Symbol of the gene to which TSS belongs. |
| TSS | The ID of the TSS in the format of [gene_name:seqnames:ranges:strand]. |
| group1 | The levels of group. |
| theta1/2 | The theta value of the TSS in the group. |
| cell1/2 | The number of cell-type corresponding to group. |
| percent1/2 | The ratio of TSS reads. |
| p | The p-value from the Wilcoxon test for differences in psi values among groups. |
**Identifying ATS markers based on `ψ` value.**
```r
DE <- scATS::FindMarkers(object = scats, groupBy = "CellType", group1 = NULL, group2 = NULL, cores = 20, majorOnly = F)
DE
```
TSS G1 G2 n1 n2 N1 N2 Cells1 Cells2 PSI1 PSI2 PseudobulkPSI1
1: OR4F5@1:69071:+ A Other 66 58 226 207 1020 980 0.2306640 0.2385772 0.2010582
2: OR4F5@1:69071:+ B Other 58 66 207 226 980 1020 0.2385772 0.2306640 0.3260394
PseudobulkPSI2 wald.test wilcox.test prop.test
1: 0.3260394 0.9230017 0.9005852 1.279065e-09
2: 0.2010582 0.9230017 0.9005852 1.279065e-09
The `DE` object contains following columns:
| column name | content |
| ---------------- | ------------------------------------------------------------------------------------------ |
| TSS | The ID of the TSS in the format of [gene_name:seqnames:ranges:strand]. |
| G1/2 | The levels of group. |
| n1/2 | The expression number of the TSS in the G1/2. |
| N1/2 | The expression number of the host gene in the G1/2. |
| Cells1/2 | The number of cell-type corresponding to G1/2. |
| PSI1/2 | The average sum of all individual cell PSIs. |
| PseudobulkPSI1/2 | The PSI value calculated by combining all reads and treating them as a pseudobulk sample. |
| wald.test | The p-value from the Wald test for differences in psi values among groups. |
| wilcox.test | The p-value from the Wilcoxon test for differences in psi values among groups. |
| prop.test | The p-value from the Proportion test for differences in psi values among groups. |
### Finding markers of given group
```r
### based on θ value
DE <- scATS::FindMarkersByTheta(object = scats, groupBy = "CellType",group1 = "A", group2 = "B", cores = 20)
### based on ψ value
DE <- scATS::FindMarkers(object = scats, groupBy = "CellType", group1 = "A", group2 = "B", cores = 20, majorOnly = F,gene = "OR4F5")
```
In addition, you can specify the host genes used in the calculation by setting the `gene` parameter.
## Calculating PSI
**Calculating PSI based on `θ` value.**
```r
theta <- scATS::ThetaByGroup(object = scats, gene = "OR4F5",groupBy = "CellType")
theta
```
Group TSS alpha theta
1: A 1:69090:+ 0.3766315 0.4308618
2: A 1:69071:+ 0.9681916 0.5691382
3: B 1:69090:+ 0.2349261 0.1561318
4: B 1:69071:+ 0.6822974 0.8438682
**Calculating PSI based on `ψ` value.**
```r
psi <- scATS::psi(object = scats, groupBy = "CellType", TSS=c("OR4F5@1:69071:+", "OR4F5@1:69090:+"))
psi
```
TSS groupBy Cells N n mean sd se ci median Q1 Q3 mad iqr
1 OR4F5@1:69071:+ A 1020 105 66 0.5813757 0.4859188 0.04742081 0.09403726 1 0 1 0 1
2 OR4F5@1:69071:+ B 980 96 58 0.5856689 0.4889869 0.04990702 0.09907795 1 0 1 0 1
3 OR4F5@1:69090:+ A 1020 105 46 0.4186243 0.4859188 0.04742081 0.09403726 0 0 1 0 1
4 OR4F5@1:69090:+ B 980 96 42 0.4143311 0.4889869 0.04990702 0.09907795 0 0 1 0 1
PseudobulkPSI
1 0.4935065
2 0.6436285
3 0.5064935
4 0.3563715
The `theta` or `psi` object contains following columns:
| column name | content |
| ---------------- | ------------------------------------------------------------------------------------------ |
| Group/groupBy | The level of group. |
| TSS | The ID of the TSS in the format of [gene_name:seqnames:ranges:strand]. |
| alpha | The degradation index fitted using the EM algorithm. |
| Cells | The number of cell-type corresponding to a given groups (The same applies to the following.). |
| N | The expression of the host gene. |
| n | The expression of the TSS. |
| mean | The mean of PSI. |
| sd | The standard deviation (std) of PSI. |
| se | The standard error (SE) of PSI. |
| ci | The confidence interval (CI) of PSI. |
| median | The median of PSI. |
| Q1 | The first quartile (Q1) of PSI. |
| Q3 | The third quartile (Q3) of PSI. |
| mad | The median absolute deviation (MAD) of PSI. |
| iqr | The interquartile range (IQR) of PSI. |
| theta/PseudobulkPSI | The PSI value calculated by combining all reads and treating them as a pseudobulk sample. |
## Sashimi plots
```r
# Take SCNN1D gene as an example
gene_symbol <- "SCNN1D"
tss <- GenomicRanges::start(scats@rowRanges[scats@rowRanges$gene_name==gene_symbol])
scATS::Sashimi(object = scats,
bam = bams,
xlimit = c(1280352, 1287325),
gtf = gtfFile,
gene = gene_symbol,
TSS = tss, # TSS位点
free_y = T,#是否scale
base_size = 12, #read部分字体大小
rel_height=0.9, #注释/read ,小于1 read部分比例更大
fill.color = "grey",
line.color = "red",
line.type = 3) -> p
p
```

```r
sessionInfo()
```
```r
R version 4.5.2 (2025-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0 LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=zh_CN.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=zh_CN.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=zh_CN.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C
time zone: Asia/Shanghai
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] scATS_0.5.6
loaded via a namespace (and not attached):
[1] RcppAnnoy_0.0.22 splines_4.5.2
[3] later_1.4.4 BiocIO_1.20.0
[5] filelock_1.0.3 bitops_1.0-9
[7] tibble_3.3.0 R.oo_1.27.1
[9] polyclip_1.10-7 graph_1.88.1
[11] rpart_4.1.24 XML_3.99-0.20
[13] fastDummies_1.7.5 httr2_1.2.2
[15] lifecycle_1.0.4 OrganismDbi_1.52.0
[17] ensembldb_2.34.0 globals_0.18.0
[19] lattice_0.22-5 MASS_7.3-65
[21] backports_1.5.0 magrittr_2.0.4
[23] rmarkdown_2.30 Hmisc_5.2-4
[25] plotly_4.11.0 yaml_2.3.12
[27] httpuv_1.6.16 otel_0.2.0
[29] Seurat_5.3.1 sctransform_0.4.2
[31] spam_2.11-1 sp_2.2-0
[33] spatstat.sparse_3.1-0 reticulate_1.44.1
[35] ggbio_1.58.0 cowplot_1.2.0
[37] pbapply_1.7-4 DBI_1.2.3
[39] RColorBrewer_1.1-3 abind_1.4-8
[41] Rtsne_0.17 GenomicRanges_1.62.1
[43] mixtools_2.0.0.1 purrr_1.2.0
[45] R.utils_2.13.0 AnnotationFilter_1.34.0
[47] biovizBase_1.58.0 BiocGenerics_0.56.0
[49] RCurl_1.98-1.17 nnet_7.3-20
[51] rappdirs_0.3.3 VariantAnnotation_1.56.0
[53] IRanges_2.44.0 S4Vectors_0.48.0
[55] ggrepel_0.9.6 irlba_2.3.5.1
[57] listenv_0.10.0 spatstat.utils_3.2-0
[59] goftest_1.2-3 RSpectra_0.16-2
[61] spatstat.random_3.4-3 fitdistrplus_1.2-4
[63] parallelly_1.46.0 codetools_0.2-19
[65] DelayedArray_0.36.0 tidyselect_1.2.1
[67] UCSC.utils_1.6.1 farver_2.1.2
[69] BiocFileCache_3.0.0 matrixStats_1.5.0
[71] stats4_4.5.2 base64enc_0.1-3
[73] spatstat.explore_3.6-0 Seqinfo_1.0.0
[75] GenomicAlignments_1.46.0 jsonlite_2.0.0
[77] progressr_0.18.0 Formula_1.2-5
[79] ggridges_0.5.7 survival_3.8-3
[81] progress_1.2.3 segmented_2.1-4
[83] tools_4.5.2 ica_1.0-3
[85] Rcpp_1.1.0 glue_1.8.0
[87] gridExtra_2.3 SparseArray_1.10.8
[89] xfun_0.55 MatrixGenerics_1.22.0
[91] GenomeInfoDb_1.46.2 dplyr_1.1.4
[93] BiocManager_1.30.27 fastmap_1.2.0
[95] digest_0.6.39 R6_2.6.1
[97] mime_0.13 colorspace_2.1-2
[99] scattermore_1.2 tensor_1.5.1
[101] biomaRt_2.66.0 dichromat_2.0-0.1
[103] spatstat.data_3.1-9 RSQLite_2.4.5
[105] cigarillo_1.0.0 R.methodsS3_1.8.2
[107] tidyr_1.3.2 generics_0.1.4
[109] data.table_1.17.8 rtracklayer_1.70.0
[111] prettyunits_1.2.0 httr_1.4.7
[113] htmlwidgets_1.6.4 S4Arrays_1.10.1
[115] uwot_0.2.4 pkgconfig_2.0.3
[117] gtable_0.3.6 blob_1.2.4
[119] lmtest_0.9-40 S7_0.2.1
[121] XVector_0.50.0 htmltools_0.5.9
[123] dotCall64_1.2 RBGL_1.86.0
[125] ProtGenerics_1.42.0 SeuratObject_5.3.0
[127] scales_1.4.0 Biobase_2.70.0
[129] png_0.1-8 spatstat.univar_3.1-5
[131] rstudioapi_0.17.1 knitr_1.51
[133] reshape2_1.4.5 rjson_0.2.23
[135] checkmate_2.3.3 nlme_3.1-168
[137] curl_7.0.0 cachem_1.1.0
[139] zoo_1.8-15 stringr_1.6.0
[141] KernSmooth_2.23-26 parallel_4.5.2
[143] miniUI_0.1.2 foreign_0.8-90
[145] AnnotationDbi_1.72.0 restfulr_0.0.16
[147] pillar_1.11.1 grid_4.5.2
[149] vctrs_0.6.5 RANN_2.6.2
[151] VGAM_1.1-14 promises_1.5.0
[153] dbplyr_2.5.1 xtable_1.8-4
[155] cluster_2.1.8.1 htmlTable_2.4.3
[157] evaluate_1.0.5 GenomicFeatures_1.62.0
[159] cli_3.6.5 compiler_4.5.2
[161] Rsamtools_2.26.0 rlang_1.1.6
[163] crayon_1.5.3 future.apply_1.20.1
[165] mclust_6.1.2 plyr_1.8.9
[167] stringi_1.8.7 viridisLite_0.4.2
[169] deldir_2.0-4 BiocParallel_1.44.0
[171] txdbmaker_1.6.0 Biostrings_2.78.0
[173] lazyeval_0.2.2 spatstat.geom_3.6-1
[175] Matrix_1.7-4 BSgenome_1.78.0
[177] RcppHNSW_0.6.0 hms_1.1.4
[179] patchwork_1.3.2 bit64_4.6.0-1
[181] future_1.68.0 ggplot2_4.0.1
[183] KEGGREST_1.50.0 shiny_1.12.1
[185] SummarizedExperiment_1.40.0 kernlab_0.9-33
[187] ROCR_1.0-11 igraph_2.2.1
[189] memoise_2.0.1 bit_4.6.0
```