gencfu - Linux
Overview
gencfu is a tool used primarily in bioinformatics to generate clustered features (CFUs) from single-cell sequencing data. CFUs are groups of cells that share similar expression profiles and are believed to belong to the same cell type or subpopulation.
Syntax
gencfu [options] <input.tsv> <output.tsv>
Options/Flags
- -c, --clusters: Number of clusters to generate. Default: 10
- -m, --metric: Distance metric to use for clustering. Default: euclidean. Options include:
  - euclidean
  - cosine
  - pearson
- -t, --threshold: Minimum distance threshold for merging clusters. Default: 0.5
- -i, --iterations: Number of iterations to run the clustering algorithm. Default: 100
- -s, --seed: Random seed for the clustering algorithm. If not specified, a random seed will be used.
- -o, --output-stats: Write additional statistics to the output file.
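For reference, the three metric names conventionally denote the distances below between two expression vectors x and y of length n. This is a sketch that assumes gencfu follows the standard definitions; the option descriptions above do not confirm the exact formulas.

```latex
\begin{align*}
d_{\text{euclidean}}(x, y) &= \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \\
d_{\text{cosine}}(x, y) &= 1 - \frac{\sum_{i=1}^{n} x_i y_i}
  {\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}} \\
d_{\text{pearson}}(x, y) &= 1 - \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
  {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\end{align*}
```

Under these definitions the cosine and Pearson distances lie between 0 and 2, while the Euclidean distance is unbounded and depends on the scale of the data, so the same -t value can mean very different things under different metrics.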
Examples
- Generate 15 CFUs from a single-cell RNA-seq dataset, leaving all other options at their defaults:
gencfu -c 15 input.tsv output.tsv
- Generate 20 CFUs using the cosine distance metric with a threshold of 0.8:
gencfu -c 20 -m cosine -t 0.8 input.tsv output.tsv
- Run the clustering algorithm for 200 iterations with a fixed seed of 1234 and write additional statistics to the output file:
gencfu -i 200 -s 1234 -o input.tsv output.tsv
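To compare several thresholds on the same input, the commands above can be wrapped in a loop. The following is a minimal sketch that assumes only the flags documented on this page; the function name sweep_thresholds, the GENCFU variable, and the output naming scheme are illustrative and not part of gencfu.

```shell
#!/bin/sh
# Hypothetical helper: run gencfu once per threshold, writing each result
# to its own file. GENCFU may be overridden to point at another binary.
GENCFU="${GENCFU:-gencfu}"

sweep_thresholds() {
    input="$1"; shift
    for t in "$@"; do
        # Fixed seed (-s) so that runs differ only in the threshold.
        # Output name is an illustrative convention, e.g. cfu_t0.5.tsv
        "$GENCFU" -c 15 -s 1234 -t "$t" "$input" "cfu_t${t}.tsv" || return 1
    done
}

# Usage: sweep_thresholds input.tsv 0.3 0.5 0.8
```

Holding the seed constant is a design choice here: it keeps random initialization from being confused with the effect of the threshold.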
Common Issues
- If the clustering algorithm does not converge, try increasing the number of iterations (-i).
- If the resulting CFUs are too broad or too fine-grained, adjust the merge threshold (-t) or the number of clusters (-c) and rerun.
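The advice on iterations can be automated with a retry loop. This sketch rests on an assumption the page does not state: that gencfu exits with a non-zero status when clustering fails to converge. If non-convergence is reported some other way, the check needs to be adapted. The function name run_with_retries and the upper bound of 800 are illustrative.

```shell
#!/bin/sh
# ASSUMPTION: gencfu exits non-zero when clustering does not converge.
# This is not confirmed by the documentation; adjust the check if needed.
GENCFU="${GENCFU:-gencfu}"

run_with_retries() {
    input="$1"
    output="$2"
    iters=100        # documented default for -i
    max_iters=800    # illustrative upper bound
    while [ "$iters" -le "$max_iters" ]; do
        if "$GENCFU" -i "$iters" -s 1234 "$input" "$output"; then
            echo "succeeded with -i $iters"
            return 0
        fi
        iters=$((iters * 2))    # double the iteration budget and retry
    done
    echo "no success up to -i $max_iters" >&2
    return 1
}

# Usage: run_with_retries input.tsv output.tsv
```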
Integration
gencfu can be integrated into analysis pipelines for single-cell RNA-seq data. It can be used as a preprocessing step to identify cell types and subpopulations, which can then be used for downstream analysis, such as gene expression analysis or cell trajectory reconstruction.
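As a sketch of such a pipeline step, the function below wraps one gencfu call with basic input and output checks. Only the gencfu flags come from this page; the function name cluster_step, the file names, and the downstream_analysis command in the usage line are placeholders.

```shell
#!/bin/sh
# Hypothetical pipeline step around gencfu; names are placeholders.
GENCFU="${GENCFU:-gencfu}"

cluster_step() {
    input="$1"
    output="$2"
    # Fail early if the input file is missing or empty.
    [ -s "$input" ] || { echo "missing or empty input: $input" >&2; return 1; }
    # Fixed seed so that reruns of the pipeline use the same random state.
    "$GENCFU" -c 15 -m cosine -s 1234 -o "$input" "$output" || return 1
    # Guard against an empty result before downstream tools consume it.
    [ -s "$output" ] || { echo "no output written: $output" >&2; return 1; }
}

# Usage: cluster_step counts.tsv cfus.tsv && downstream_analysis cfus.tsv
```

Returning a non-zero status on any failure lets the calling script stop before later stages read a missing or empty CFU file.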