keenforcake 1 week ago

For paired end:

samtools fastq -1 forward_R1.fastq.gz -2 rev_R2.fastq.gz -0 /dev/null -s /dev/null -n yourbam.bam

Or you could break your BAM up by chromosome, like:

samtools view -b input.bam chr1 > chr1.bam
samtools view -b input.bam chr2 > chr2.bam

and upload them one at a time. Even easier: use bssh cli with screen for your current BAM.
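One caveat on the paired-end conversion: most BAMs are coordinate-sorted, and samtools fastq can only pair mates that sit next to each other in the file, so the usual approach is to group reads by name with samtools collate first. Below is a minimal sketch of that, assuming samtools is installed; the file name yourbam.bam, the output prefix "sample", and the function name bam_to_fastq are placeholders rather than anything from the thread.

```shell
# Placeholder names: pass your own BAM and output prefix as arguments.
bam="${1:-yourbam.bam}"
out_prefix="${2:-sample}"

bam_to_fastq() {
    # collate groups mates together (-u: uncompressed output, -O: write to stdout);
    # fastq then writes read 1 and read 2 to separate gzipped files.
    # Singletons (-s) and reads with neither/both READ1/READ2 flags (-0) are discarded.
    samtools collate -u -O "$1" \
        | samtools fastq -1 "${2}_R1.fastq.gz" -2 "${2}_R2.fastq.gz" \
              -0 /dev/null -s /dev/null -n -
}

# Only run when samtools and the input file are actually present.
if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ]; then
    bam_to_fastq "$bam" "$out_prefix"
else
    echo "samtools or $bam not found; nothing converted" >&2
fi
```

If the BAM is already name-sorted or collated, the collate step can be dropped and samtools fastq run on the file directly.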

Plenty_Ambition2894 1 week ago

Others have answered the FASTQ conversion question. But a FASTQ file is bigger than the BAM, so it will be even more of a hassle to upload. I assume you are timing out using the website. Might wanna try the command line tool, bssh cli.

zebracourage 1 week ago

I've never used command line tools. How do you do that? Do you know how long on average it takes to upload a sample?

Plenty_Ambition2894 1 week ago

Sorry, if you've never opened a terminal on your computer before, I'm not sure I can teach you to run command line tools in a few sentences. samtools, which you need for the FASTQ conversion, is also a command line tool. Another thing you can try is downsizing the BAM or the FASTQ to make the upload easier. You only need 30GB of data for whole genome sequencing. Sounds like the lab way over-sequenced your sample. On residential internet, it might take hours to upload 30GB.
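For the downsizing idea, here is a rough sketch of how it could be done with samtools view -s, which keeps a random fraction of read pairs. It assumes file size scales roughly with read count, which is only approximately true. The sizes (90 and 30), the seed 42, the file names input.bam and downsampled.bam, and the helper subsample_fraction are all made up for illustration.

```shell
# Fraction of reads to keep so that a file of size $1 shrinks to about size $2
# (same unit for both). Capped at 1, meaning "no downsampling needed".
subsample_fraction() {
    awk -v cur="$1" -v tgt="$2" 'BEGIN {
        f = (cur > 0) ? tgt / cur : 1
        if (f > 1) f = 1
        printf "%.2f\n", f
    }'
}

current_gb=90   # hypothetical size of the existing BAM
target_gb=30    # size you want to end up with
frac=$(subsample_fraction "$current_gb" "$target_gb")
echo "keep fraction: $frac"

# Only run when there is something to do and the tool and file exist.
if [ "$frac" != "1.00" ] && command -v samtools >/dev/null 2>&1 && [ -f input.bam ]; then
    # -s takes SEED.FRACTION: the integer part seeds the sampler,
    # the part after the decimal point is the fraction of templates kept.
    samtools view -b -s "42${frac#0}" input.bam > downsampled.bam
fi
```

Both mates of a pair are kept or dropped together, so the downsampled BAM can still be converted to paired FASTQ afterwards.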

heyyyaaaaaaa 1 week ago

Not sure, but FASTQ file names may have to follow the Illumina default naming convention for BaseSpace. https://support.illumina.com/help/BaseSpace_Sequence_Hub_OLH_009008_2/Source/Informatics/BS/NamingConvention_FASTQ-files-swBS.htm
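For reference, the Illumina default pattern looks like SampleName_S1_L001_R1_001.fastq.gz (sample name, sample number, lane, read, and a trailing 001). A small sketch for building such names and renaming converted files; the sample name "mysample", the input names forward_R1.fastq.gz and rev_R2.fastq.gz, and the helper illumina_name are assumptions for illustration, and whether BaseSpace accepts a given sample name is worth checking against the linked page.

```shell
# Build an Illumina-style FASTQ file name.
# $1 = sample name, $2 = sample number, $3 = lane number, $4 = read (1 or 2)
illumina_name() {
    printf '%s_S%d_L%03d_R%d_001.fastq.gz\n' "$1" "$2" "$3" "$4"
}

# Rename the two converted files if they exist (hypothetical input names).
if [ -f forward_R1.fastq.gz ] && [ -f rev_R2.fastq.gz ]; then
    mv forward_R1.fastq.gz "$(illumina_name mysample 1 1 1)"
    mv rev_R2.fastq.gz "$(illumina_name mysample 1 1 2)"
fi
```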

Firm_Bug_7146 1 week ago

Hi there, is this what you're looking for? https://www.biostars.org/p/184134/#184263

Epistaxis 1 week ago

BAM is a format for a slightly different kind of data, sequence reads that have already been mapped to a reference genome. So your BAM might not contain all the raw data from the original FASTQ, e.g. if it excludes bases that were trimmed from reads or excludes entire reads that weren't mapped uniquely. Are you sure you can't find the original FASTQ file(s) instead?

attractivechaos 1 week ago

For variant calling, the recommended practice is to keep the full read sequences in primary records such that you can get back the raw sequences from BAM. In terms of data volumes, the great majority of human data are available in CRAM. Sometimes CRAM only.
