This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
The ANL Sequencing Service is a bioinformatics pipeline for processing Illumina sequencing data. It handles BCL to FASTQ conversion, data upload to SHOCK storage, and SRA (Sequence Read Archive) submissions. The service is containerized and designed to run both as standalone scripts and as Docker containers.
- bin/: Shell scripts for sequencing workflow execution
  - master-script.sh: Main BCL to FASTQ conversion orchestrator
  - SHOCK_functions.sh: Reusable functions for SHOCK data storage API interactions
  - PROJECT_settings.sh: Project-wide configuration settings
  - Various shock-*.sh scripts for data transfer operations
- lib/: Libraries in Go and Python
  - lib/go/: Go module (v1.15) with node management functionality
  - lib/python/: Python classes for run folder management and SRA operations
    - RunFolder.py: Manages sequencing run folder metadata
    - SRA.py: Main entry point for SRA submission workflows
- Docker/: Multi-platform container configurations
  - base.dockerfile: Base container for anl-seq-service
  - Freyja.dockerfile: Specialized container for Freyja analysis
  - build-anl-seq-service.sh: Multi-arch build script for containers
- Input: Illumina sequencing run folders with BCL files
- Processing: BCL to FASTQ conversion using bcl2fastq2
- Storage: Upload to SHOCK distributed storage system
- Output: FASTQ files and metadata for downstream analysis

Build multi-architecture containers:

    # Build and push latest version
    cd Docker && ./build-anl-seq-service.sh

    # Build specific containers (Docker image names must be lowercase)
    docker build --platform linux/amd64 -t freyja:latest -f Freyja.dockerfile .

Run BCL to FASTQ conversion:

    # Basic usage
    bin/master-script.sh -i <input_dir> -o <output_dir> -s <sample_sheet>

    # With additional options
    bin/master-script.sh -i <input_dir> -o <output_dir> -s <sample_sheet> -m -t s_1

Run SRA submission (main Docker entry point):

    python3 lib/python/SRA.py --help

The service integrates with SHOCK (distributed storage system) through functions in bin/SHOCK_functions.sh:
- Authentication via /usr/local/share/anl-seq-service/auth.env
- MD5 checksum validation for uploads
- Duplicate file detection
- Metadata management through JSON attributes
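
The checksum and duplicate checks listed above can be illustrated with a short Python sketch. It does not reproduce bin/SHOCK_functions.sh: the function names `md5_of_file`, `upload_is_valid`, and `is_duplicate` are hypothetical, and it assumes the storage server reports an MD5 hex digest for each stored file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_is_valid(local_path, remote_md5):
    """True if the local checksum matches the one reported for the upload.

    `remote_md5` is assumed to be a hex digest returned by the server.
    """
    return md5_of_file(local_path) == remote_md5.lower()


def is_duplicate(local_path, known_md5s):
    """True if a file with the same checksum is already known to be stored."""
    return md5_of_file(local_path) in known_md5s
```

Reading in chunks keeps memory use flat for large run-folder archives, which is why the sketch avoids loading the whole file at once.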

- AUTH: SHOCK authentication header (loaded from auth.env)
- SHOCK_SERVER: SHOCK server URL
- OWNER: Set to "ANL-SEQ-Core" in project settings
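
For Python code that needs the same settings as the shell scripts, one option is to parse auth.env directly. The sketch below assumes the file holds shell-style `KEY=VALUE` lines (optionally prefixed with `export` and optionally quoted); that format, and the helper names `parse_env_text` and `missing_settings`, are assumptions rather than something the repository confirms.

```python
def parse_env_text(text):
    """Parse shell-style KEY=VALUE lines into a dict (assumed file format)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_settings(values, required=("AUTH",)):
    """Return the names in `required` that are absent or empty."""
    return [name for name in required if not values.get(name)]
```

A caller could read /usr/local/share/anl-seq-service/auth.env, pass its text to `parse_env_text`, and stop early if `missing_settings` returns anything.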
- Default thread count: 8
- Supported data types: run-folder-archive-thumbnails, run-folder-archive-fastq, run-folder-archive-sav, run-folder-archive-raw
- Adapter files located in share/adapter-directory/
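
The thread default and the data types listed above lend themselves to a small validation helper. In this sketch the constants come from the list, while `build_attributes` and the JSON key names (`data_type`, `owner`, `run_id`) are hypothetical and do not reflect the attribute schema the scripts actually use.

```python
import json

DEFAULT_THREADS = 8
SUPPORTED_DATA_TYPES = (
    "run-folder-archive-thumbnails",
    "run-folder-archive-fastq",
    "run-folder-archive-sav",
    "run-folder-archive-raw",
)


def build_attributes(data_type, run_id, owner="ANL-SEQ-Core"):
    """Return a JSON attributes string, rejecting unsupported data types.

    The key names are illustrative assumptions, not the real schema.
    """
    if data_type not in SUPPORTED_DATA_TYPES:
        raise ValueError(f"unsupported data type: {data_type!r}")
    attributes = {"data_type": data_type, "owner": owner, "run_id": run_id}
    return json.dumps(attributes, sort_keys=True)
```

Rejecting unknown types before an upload starts keeps mislabeled archives out of storage, where they would be harder to find later.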
- Ubuntu base image
- bcl2fastq2 (Illumina BCL conversion tool)
- Python 3 with pysftp, cwlref-runner
- Standard bioinformatics tools: bowtie2, idba
- System utilities: jq, curl, wget
- Go 1.15
- github.com/go-delve/delve v1.6.0 (debugging)
- All SHOCK uploads include MD5 validation
- The service supports various Illumina index configurations (Nextera, TruSeq, dual index)
- Scripts are designed for both single and paired-end sequencing runs
- Container entry point defaults to SRA.py with --help
- Authentication credentials must be properly configured in auth.env for SHOCK operations
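
To illustrate the single versus paired-end note above, the following sketch groups FASTQ files by sample and lane and reports the layout of each group. It assumes bcl2fastq2's default output naming (`<sample>_S<n>_L<lane>_<read>_001.fastq.gz`, with lane splitting enabled); `group_reads` and `layout` are hypothetical helpers, not part of the repository.

```python
import re
from collections import defaultdict

# Default bcl2fastq2 naming, e.g. Sample1_S1_L001_R1_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[RI][12])_001\.fastq\.gz$"
)


def group_reads(filenames):
    """Map (sample, lane) to a dict of read label -> filename."""
    groups = defaultdict(dict)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            continue  # ignore files that do not follow the naming scheme
        key = (match["sample"], match["lane"])
        groups[key][match["read"]] = name
    return dict(groups)


def layout(reads):
    """Return 'paired' when both R1 and R2 are present, else 'single'."""
    return "paired" if "R1" in reads and "R2" in reads else "single"
```

Index reads (I1, I2) are kept in each group but do not affect the layout decision, since only R1 and R2 carry the sequenced fragments.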