handling of DCC output for single-ended inputs #210

@yerahko

Description of the bug

Hi there. The DCC module expects circtools detect to produce a file matching "_tmp_circtools/tmp_printcirclines*", but as far as I can tell circtools only writes this file when paired-end FASTQs are analyzed. As a result, the pipeline fails at line 42 of circrna/modules/local/dcc/dcc/main.nf when the input files are single-end:

  mv: can't rename '_tmp_circtools/tmp_printcirclines.[0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]': No such file or directory
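The raw pattern shows up in the error message because a POSIX shell leaves a glob untouched when nothing matches it, so mv receives the pattern itself as a file name. A minimal sketch of a guard that makes this case explicit is below. The function name `move_junction_reads` and its error message are hypothetical and not part of nf-core/circrna; the sketch only detects the missing file and does not decide what the module should emit for single-end input.

```shell
# Hypothetical helper, not part of nf-core/circrna.
# Moves the circtools junction-read temp file in directory $1 to path $2.
# Returns non-zero with a readable message if no such file exists.
move_junction_reads() {
    tmp_dir=$1
    target=$2
    # If nothing matches, the shell keeps the pattern as a literal string,
    # so the -f test below is what tells a real file from an unmatched glob.
    for candidate in "$tmp_dir"/tmp_printcirclines.[0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]; do
        if [ -f "$candidate" ]; then
            mv "$candidate" "$target"
            return 0
        fi
    done
    echo "no tmp_printcirclines.* file found in $tmp_dir" >&2
    return 1
}
```

Called as `move_junction_reads _tmp_circtools fust1_3_reads.junctions`, this would still fail for the single-end run shown here, but with a message that names the directory that was searched.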

I'm not too familiar with DCC/circtools, so I'm not sure whether another output from DCC/circtools would be better to use. Here are the output specifications of circtools detect: https://github.com/dieterich-lab/circtools/blob/c8b7f8447faa2d8081fcbbb13e91cb8e8f18a88c/docs/Detect.rst#output-files

Output files

The output of circtools detect consists of the following four files: CircRNACount, CircCoordinates, LinearCount and CircSkipJunctions.

  • CircRNACount: a table containing read counts for the circRNAs detected. The first three columns are chr, circRNA start, circRNA end. From the fourth column on come the circRNA read counts, one sample per column, in the order given in your samplesheet.

  • CircCoordinates: circular RNA annotations in BED format. The columns are chr, start, end, genename, junctiontype (based on STAR; 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT), strand, circRNA region (startregion-endregion), overall regions (the genomic features circRNA coordinates interval covers).

  • LinearCount: host gene expression count table, with the same layout as the CircRNACount file.

  • CircSkipJunctions: circSkip junctions. The first three columns are the same as in LinearCount/CircRNACount; the following columns represent the circSkip junctions found for each sample. circSkip junctions are given as chr:start-end:count, e.g. chr1:1787-6949:10. Multiple circSkip junctions may be found for one circRNA because the circular RNA may arise from different isoforms. In this case, the circSkip junctions are delimited with semicolons. A 0 implies that no circSkip junctions have been found for this circRNA.
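To make the chr:start-end:count encoding above concrete, here is a small sketch that splits one CircSkipJunctions cell into one line per junction. The function name `parse_circskip` is hypothetical, and the sketch assumes only what the description states: semicolon-delimited junctions and a bare 0 for "none found".

```shell
# Hypothetical helper: split one CircSkipJunctions cell into
# tab-separated lines of chr, start, end, count (one line per junction).
parse_circskip() {
    cell=$1
    # A bare 0 means no circSkip junctions were found for this circRNA.
    [ "$cell" = "0" ] && return 0
    printf '%s\n' "$cell" | tr ';' '\n' | while IFS= read -r junction; do
        [ -n "$junction" ] || continue
        count=${junction##*:}     # text after the last colon
        region=${junction%:*}     # chr:start-end
        coords=${region##*:}      # start-end
        chr=${region%:*}          # chromosome name
        printf '%s\t%s\t%s\t%s\n' "$chr" "${coords%-*}" "${coords#*-}" "$count"
    done
}

# Example from the description above; prints the four fields
# chr1, 1787, 6949 and 10 separated by tabs.
parse_circskip 'chr1:1787-6949:10'
```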

Command used and terminal output

nextflow run nf-core/circrna -r dev -profile test,singularity --tools dcc --input samples_single.csv




-[nf-core/circrna] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_CIRCRNA:CIRCRNA:BSJ_DETECTION:DCC:MAIN (fust1_3)'

Caused by:
  Process `NFCORE_CIRCRNA:CIRCRNA:BSJ_DETECTION:DCC:MAIN (fust1_3)` terminated with an error exit status (1)


Command executed:

  printf "paired.junctions" > samplesheet


  circtools detect @samplesheet  -D -an gtf.filtered.gtf  -F -M -k -Nr 1 1 -A chrI.fa -N -T 4

  mv _tmp_circtools/tmp_printcirclines.[0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z] fust1_3_reads.junctions
  mv CircCoordinates fust1_3_coordinates.tsv
  mv CircRNACount fust1_3_counts.tsv

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CIRCRNA:CIRCRNA:BSJ_DETECTION:DCC:MAIN":
      circtools: $(circtools -V)
  END_VERSIONS

Command exit status:
  1

Command output:
  Output folder ./ already exists, reusing
  circtools 2.0 started
  28 CPU cores available, using 4
  started circRNA detection from file paired.junctions
  	=> locating circRNAs (unstranded mode) [paired.junctions]
  	=> sorting circRNAs (unstranded mode) [paired.junctions]
  finished circRNA detection from file paired.junctions
  WARNING: non-stranded data, the strand of circRNAs guessed from the strand of host genes
  Combining individual circRNA read counts
  Using files _tmp_DCC/tmp_circCount and _tmp_DCC/tmp_coordinates for filtering
  Filtering by read counts
  Remove ChrM
  Count CircSkip junctions

Command error:
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  Output folder ./ already exists, reusing
  circtools 2.0 started
  28 CPU cores available, using 4
  started circRNA detection from file paired.junctions
  	=> locating circRNAs (unstranded mode) [paired.junctions]
  	=> sorting circRNAs (unstranded mode) [paired.junctions]
  finished circRNA detection from file paired.junctions
  WARNING: non-stranded data, the strand of circRNAs guessed from the strand of host genes
  Combining individual circRNA read counts
  Using files _tmp_DCC/tmp_circCount and _tmp_DCC/tmp_coordinates for filtering
  Filtering by read counts
  Remove ChrM
  Count CircSkip junctions
  mv: can't rename '_tmp_circtools/tmp_printcirclines.[0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]': No such file or directory

Work dir:
  /*****/projects/nf_circ_test/bug/work/15/7ec763644d95e985dbf218370d3707

Container:
  /*****/scratch/nxf_sing_cache/depot.galaxyproject.org-singularity-circtools-2.0--pyhdfd78af_0.img

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

 -- Check '.nextflow.log' file for details

Relevant files

samples_single.csv

System information

No response

Labels: bug
