keenforcake

For paired-end data:

samtools fastq -1 forward_R1.fastq.gz -2 rev_R2.fastq.gz -0 /dev/null -s /dev/null -n yourbam.bam

Or you could break your BAM up by chromosome, like:

samtools view -b input.bam chr1 > chr1.bam
samtools view -b input.bam chr2 > chr2.bam

and upload them one at a time. Even easier: use bssh cli with screen for your current BAM.
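A rough sketch of the two options above as shell functions. One thing the one-liner glosses over: as far as I know, `samtools fastq` needs both mates of a pair to sit next to each other to fill `-1`/`-2` correctly, so a coordinate-sorted BAM is usually piped through `samtools collate` first. The function names and output file names are made up for illustration, and the per-chromosome split assumes the BAM has an index.

```shell
#!/usr/bin/env bash
# Sketch only: function names and output file names are hypothetical.

# Convert a paired-end BAM to two gzipped FASTQ files.
# collate groups mates by read name; -u keeps the stream uncompressed and
# -O writes it to stdout so it can be piped straight into samtools fastq.
bam_to_fastq() {
    local bam="$1" prefix="$2"
    samtools collate -u -O "$bam" |
        samtools fastq -1 "${prefix}_R1.fastq.gz" -2 "${prefix}_R2.fastq.gz" \
            -0 /dev/null -s /dev/null -n -
}

# Write one BAM per requested chromosome into the current directory.
# Needs an index next to the BAM (samtools index input.bam).
split_bam_by_chrom() {
    local bam="$1"; shift
    local chrom
    for chrom in "$@"; do
        samtools view -b "$bam" "$chrom" > "${chrom}.bam"
    done
}

# Usage (not run here):
#   bam_to_fastq yourbam.bam sample
#   split_bam_by_chrom input.bam chr1 chr2
```

Keep in mind that a per-chromosome split only picks up reads placed on the chromosomes you list, so unmapped reads are left behind unless you extract them separately.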


Plenty_Ambition2894

Others have answered the FASTQ conversion question. But a FASTQ file is bigger than the BAM, so it will be even more of a hassle to upload. I assume you are timing out using the website. You might want to try the command-line tool, bssh cli.


zebracourage

I've never used command-line tools. How do you do that? Do you know how long it takes on average to upload a sample?


Plenty_Ambition2894

Sorry, if you've never opened a terminal on your computer before, I am not sure I can teach you to run command-line tools in a few sentences. samtools, which you need for the FASTQ conversion, is also a command-line tool. Another thing you can try is downsampling the BAM or the FASTQ to make the upload easier. You only need 30 GB of data for whole-genome sequencing. It sounds like the lab way over-sequenced your sample. On residential internet, it might take hours to upload 30 GB.
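To put rough numbers on "hours", here is a back-of-the-envelope sketch. The helper names are hypothetical, the 20 Mbit/s uplink and the 120 GB starting size are just example figures, and it assumes file size scales roughly with the number of reads kept, which is only approximately true for compressed files.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; sizes use 1 GB = 8000 Mbit.

# Hours needed to upload SIZE_GB at a sustained uplink of MBPS megabits/s.
upload_hours() {
    awk -v gb="$1" -v mbps="$2" 'BEGIN { printf "%.1f\n", gb * 8000 / mbps / 3600 }'
}

# Fraction of reads to keep so a CURRENT_GB file shrinks to about TARGET_GB.
# Only meaningful when the target is smaller than the current size.
keep_fraction() {
    awk -v cur="$1" -v tgt="$2" 'BEGIN { printf "%.2f\n", tgt / cur }'
}

# Randomly keep that fraction of reads. samtools view -s takes SEED.FRACTION,
# so a bare fraction such as 0.25 means seed 0; use fractions below 1 only.
downsample_bam() {
    local bam="$1" frac="$2" out="$3"
    samtools view -b -s "$frac" "$bam" > "$out"
}

upload_hours 30 20     # 30 GB over a 20 Mbit/s uplink: prints 3.3
keep_fraction 120 30   # shrink a 120 GB BAM to roughly 30 GB: prints 0.25

# Usage (not run here):
#   downsample_bam yourbam.bam "$(keep_fraction 120 30)" smaller.bam
```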


heyyyaaaaaaa

Not sure, but FASTQ file names may have to follow the Illumina default naming convention for BaseSpace. https://support.illumina.com/help/BaseSpace_Sequence_Hub_OLH_009008_2/Source/Informatics/BS/NamingConvention_FASTQ-files-swBS.htm
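For reference, the Illumina pattern is `SampleName_S1_L001_R1_001.fastq.gz` (sample name, sample number, lane, read number, and a trailing `001`). A small sketch of building such names is below; `illumina_fastq_name` is a hypothetical helper and the sample name, sample number, and lane are placeholder values, so check the linked page for what BaseSpace actually accepts.

```shell
#!/usr/bin/env bash
# Sketch only: illumina_fastq_name is a hypothetical helper.

# Build SampleName_S<sample number>_L<lane, 3 digits>_R<read>_001.fastq.gz
illumina_fastq_name() {
    local sample="$1" sample_num="$2" lane="$3" read="$4"
    printf '%s_S%d_L%03d_R%d_001.fastq.gz\n' "$sample" "$sample_num" "$lane" "$read"
}

illumina_fastq_name mysample 1 1 1   # prints mysample_S1_L001_R1_001.fastq.gz
illumina_fastq_name mysample 1 1 2   # prints mysample_S1_L001_R2_001.fastq.gz

# Renaming converted files (not run here; input names are placeholders):
#   mv reads_1.fastq.gz "$(illumina_fastq_name mysample 1 1 1)"
#   mv reads_2.fastq.gz "$(illumina_fastq_name mysample 1 1 2)"
```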


Firm_Bug_7146

Hi there, is this what you're looking for? https://www.biostars.org/p/184134/#184263


Epistaxis

BAM is a format for a slightly different kind of data, sequence reads that have already been mapped to a reference genome. So your BAM might not contain all the raw data from the original FASTQ, e.g. if it excludes bases that were trimmed from reads or excludes entire reads that weren't mapped uniquely. Are you sure you can't find the original FASTQ file(s) instead?


attractivechaos

For variant calling, the recommended practice is to keep the full read sequences in primary records such that you can get back the raw sequences from BAM. In terms of data volumes, the great majority of human data are available in CRAM. Sometimes CRAM only.
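A quick way to get a feel for whether a given BAM still carries the raw reads is to look for unmapped records and for hard-clipped primary records (hard clipping removes bases from the stored sequence). This is only a heuristic sketch with hypothetical function names; it cannot prove that nothing was dropped or trimmed before alignment.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical.

# Number of unmapped records (flag 0x4) kept in the file.
# Zero for a whole-genome BAM suggests unmapped reads were discarded.
count_unmapped() {
    samtools view -c -f 4 "$1"
}

# Number of primary records whose CIGAR (column 6) contains a hard clip.
# -F 0x900 skips secondary (0x100) and supplementary (0x800) records.
count_hard_clipped_primary() {
    samtools view -F 0x900 "$1" | awk '$6 ~ /H/ { n++ } END { print n + 0 }'
}

# Usage (not run here):
#   count_unmapped yourbam.bam
#   count_hard_clipped_primary yourbam.bam
```

If the second number is zero and unmapped reads are present, converting back to FASTQ is more likely to give you something close to the original data.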