GTEx corrupted data

I downloaded some data from GTEx following the instructions at https://anvilproject.org/learn/reference/gtex-v8-free-egress-instructions
Using the command:
gen3-client download-multiple --profile=${profile_name} -manifest=${manifest_name}.json --no-prompt --numparallel=20 --protocol=s3 --download-path=${download_path} --skip-completed
However, after downloading all of the WES data and some of the RNA-seq data, I found that a considerable number of files are corrupted. Specifically, when running samtools view on these BAM files, samtools gave the following error:

[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
[E::bgzf_uncompress] Inflate operation failed: progress temporarily not possible, or in() / out() returned an error
[E::bgzf_read] Read block operation failed with error 1 after 1 of 267 bytes
[main_samview] truncated file.
16838939

So I checked the last 28 bytes of the file, which should be the EOF marker of a BAM file:
tail -c 28 ${bam_file_name}.bam | xxd -p
This shows that the last 28 bytes are all zeros instead of the EOF marker. I then checked the tail of every BAM file affected by the issue and found that all of them end in large chunks of zeros.
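For anyone who wants to repeat this check without xxd, here is a minimal Python sketch of it. It assumes the standard 28-byte BGZF EOF block defined in the SAM/BAM specification; the function names `has_bgzf_eof` and `trailing_zero_bytes` are made up for illustration and are not part of samtools or gen3-client.

```python
import os

# 28-byte empty BGZF block that terminates a complete BAM file (SAM/BAM spec).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    " 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return fh.read() == BGZF_EOF


def trailing_zero_bytes(path, chunk_size=1 << 20):
    """Count the run of 0x00 bytes at the end of the file, reading backwards."""
    zeros = 0
    pos = os.path.getsize(path)
    with open(path, "rb") as fh:
        while pos > 0:
            start = max(0, pos - chunk_size)
            fh.seek(start)
            chunk = fh.read(pos - start)
            stripped = chunk.rstrip(b"\x00")
            zeros += len(chunk) - len(stripped)
            if stripped:
                # Hit a non-zero byte, so the zero run ends here.
                break
            pos = start
    return zeros
```

Note that an intact marker itself ends in 9 zero bytes, so a small count from `trailing_zero_bytes` is expected for a good file; only a long run of zeros together with a missing marker matches the problem described here.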

All of the downloads mentioned above were performed on an HPC cluster running centos-release-7-8.2003.0.el7.centos.x86_64. I then downloaded one of the BAM files on my local Mac, and that copy is affected by the same issue.

Can someone help check what is going on here? Thanks a lot!

Hi @Yuhe_Cheng,

Welcome to the forum, and thank you for the question.
We have also received your email via our Help Desk. We'd be grateful if you could provide us a complete list of the troublesome BAM files via email and we will look into this.

Many thanks and happy holidays!

Hi, I have the same issue when downloading GTEx RNA-seq data.
So far I have downloaded 925 files, and 137 of them have EOF errors according to "samtools quickcheck".
What is the correct email address to which I can send the list of 137 affected BAM files?
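In case it helps others build a similar list, here is a rough Python sketch that scans a download directory and writes out the BAM files lacking the EOF marker. It only looks at the last 28 bytes, so it is a narrower test than "samtools quickcheck", which also validates the header. The names `find_truncated_bams` and `write_list`, and the idea of one path per line, are assumptions for illustration only.

```python
from pathlib import Path

# 28-byte empty BGZF block that terminates a complete BAM file (SAM/BAM spec).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    " 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def ends_with_eof(path):
    """Return True if the file ends with the BGZF EOF marker."""
    path = Path(path)
    size = path.stat().st_size
    if size < len(BGZF_EOF):
        return False
    with path.open("rb") as fh:
        fh.seek(size - len(BGZF_EOF))
        return fh.read() == BGZF_EOF


def find_truncated_bams(download_dir):
    """Return sorted paths of *.bam files under download_dir lacking the marker."""
    return sorted(
        str(p) for p in Path(download_dir).rglob("*.bam") if not ends_with_eof(p)
    )


def write_list(paths, out_file):
    """Write one path per line, e.g. to attach to a support email."""
    Path(out_file).write_text("".join(p + "\n" for p in paths))
```

Something like `write_list(find_truncated_bams(download_path), "truncated_bams.txt")` would then produce the list, where `download_path` is whatever directory was passed to --download-path.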

Hi @Juheon_Maeng,

The support email is support@datacommons.io.
