Egress Fees when downloading data to AWS instance

I have been using gen3 to download .bam files onto my AWS cloud instances for analysis. When looking at my bill, I noticed that I am incurring some AWS egress fees even though I am downloading data onto my cloud instances (~0.8% of the data I have downloaded/ingressed using Gen3).

Is this data egress due to some acknowledgement/verification protocol that is running in the background of the gen3-client?

Is it possible for me to avoid or reduce this egress fee during data download using the gen3-client into my AWS instances?

Thank you!

Hi, ktan8! Thanks for reaching out. Can you help me better understand what you mean when you say "~0.8% of the data I have downloaded/ingressed using Gen3"? Do you mean that only 0.8% of the data you have downloaded from your Gen3 instance has incurred an egress charge?

I do know that AWS charges for egress because outbound data transfer uses bandwidth. This could be a helpful resource for you: AWS Data Transfer Charges for Server and Serverless Architectures | AWS Partner Network (APN) Blog.

Let me ask our team if there's a strategy that could help you reduce the egress fee.

-- Sara

Hi Sara,

Thanks for your reply. I have been trying to download a fair amount of data (e.g. ~100 TB) with the Gen3 client onto an AWS instance for analysis.

However, due to the nature of the network protocol (TCP, I presume), my AWS instance has to send a request to the Gen3 data server before the server sends the requested data back to my AWS instance. These requests are made so frequently that they add up to ~0.8% of the total data I try to download. Thus, when I download 100 TB of data from the Gen3 server to my AWS instance, my AWS instance sends out 800 GB (0.8% * 100 TB) of requests to the Gen3 data server. AWS treats these 800 GB of outgoing requests as egress and charges for them. I was not charged for the 100 TB of ingress into AWS.

These request charges add up quickly and can cost quite a bit over time. I therefore wonder if there's a way to modify the network protocol (e.g. packet size??) so that we can minimize the requests made by the gen3-client to the data server? With that, I would not have to pay as much for the egress.
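For a rough sense of whether ordinary TCP acknowledgements alone could explain a figure near 0.8%, here is a minimal back-of-the-envelope sketch. The segment size, ACK size, and ACK frequency are assumed typical values, not measurements of the gen3-client, and the function names are hypothetical.

```python
def ack_overhead_ratio(mss_bytes=1460, ack_bytes=40, segments_per_ack=2):
    """Estimate bytes sent back as bare ACKs per byte of payload received.

    Assumed defaults (not measured from gen3-client traffic):
      mss_bytes        - TCP payload per segment on a 1500-byte MTU path
      ack_bytes        - IPv4 + TCP headers of an ACK without options
      segments_per_ack - delayed ACKs typically cover two segments
    """
    return ack_bytes / (mss_bytes * segments_per_ack)


def estimated_egress_gb(download_tb, ratio):
    """Convert a download size in TB to outbound GB (decimal units)."""
    return download_tb * 1000 * ratio


if __name__ == "__main__":
    ratio = ack_overhead_ratio()
    print(f"ACK overhead: {ratio:.2%} of downloaded bytes")
    print(f"Outbound for 100 TB: {estimated_egress_gb(100, ratio):.0f} GB")
```

Under these assumptions the overhead is on the order of 1%, which is in the same range as the ~0.8% reported here. If that holds, the outbound traffic may be mostly inherent to TCP rather than something specific to the gen3-client.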

Thank you so much for your help and for checking with your team!

Hi, ktan8, thanks for your patience! I have not been able to determine whether there is a way to modify the network protocol to reduce your charges. However, I was able to collect a couple of other ideas for reducing your egress charges:

  1. (If your Gen3 server is also in AWS) Put your source and destination in the same AWS region: egress fees generally don’t apply when the transfer stays within one AWS region. (See the Data Transfer tab here.) Check the regions used by your AWS S3 service and your AWS Gen3 server. How you check in the AWS console will probably vary slightly by service. For S3, this could help you figure out the region: Get the Region where the Amazon S3 bucket resides using an AWS SDK - Amazon Simple Storage Service. This may be helpful for EC2: Regions and Zones - Amazon Elastic Compute Cloud.

  2. Leverage discounts from Internet2: If your institution is a part of Internet2, you should reach out to Internet2 or find an AWS Reseller who will offer you Internet2 terms, which include a decent egress discount.
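As an illustration of the region check in item 1, here is a minimal sketch using the S3 `get_bucket_location` call from boto3. The bucket name in the usage comment is hypothetical, and the call only succeeds on buckets your credentials are allowed to inspect, so it may not work against a data bucket owned by someone else.

```python
def bucket_region(s3_client, bucket_name):
    """Return the AWS region of a bucket via the S3 GetBucketLocation call.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    response = s3_client.get_bucket_location(Bucket=bucket_name)
    location = response.get("LocationConstraint")
    if location is None:
        # S3 reports no constraint for buckets in us-east-1.
        return "us-east-1"
    if location == "EU":
        # Legacy value that S3 may return for eu-west-1.
        return "eu-west-1"
    return location


# Usage (requires boto3 and AWS credentials; the bucket name is hypothetical):
#   import boto3
#   print(bucket_region(boto3.client("s3"), "my-analysis-bucket"))
```

Comparing the result with the region of the EC2 instance shows whether the two are colocated.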

If I hear of other strategies, I will add to this conversation.

However -- you may want to ask the Gen3 community what strategies they have found successful, and get a crowd-sourced answer. For that -- I invite you to join our community Slack channel, filled with other Gen3 users with plenty of varied experience. You can request access to join through the Google form linked there.

Thanks!
-- Sara

Hi Sara,

This is really helpful!

  1. I am not sure which region the data is located in, and therefore am not sure which AWS region I should use for data colocation. I was trying to download ICGC genomics data from the Bionimbus Protected Data Cloud (PDC) (https://docs.icgc.org/download/downloading-data/#downloading-data-from-pdc-repository) using the gen3 client.

Also, on my bill, it simply describes the charge as "$0.090 per GB - first 10 TB / month data transfer out beyond the global free tier"

  2. The point about Internet2 is noted. This is really informative. We will look into it.
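To put the billing line quoted above in concrete terms, here is a minimal sketch of the arithmetic, using only the rate and tier named on the bill ($0.090 per GB for the first 10 TB per month). The free-tier allowance is left as a parameter because its size is not stated in this thread, and the function name is hypothetical.

```python
FIRST_TIER_RATE_USD_PER_GB = 0.09   # rate quoted on the bill
FIRST_TIER_LIMIT_GB = 10_000        # "first 10 TB / month", decimal units


def first_tier_egress_cost(egress_gb, free_tier_gb=0.0):
    """Estimate the monthly egress charge within the first pricing tier.

    free_tier_gb is an assumption left to the caller; the thread does not
    state how large the global free tier is.
    """
    billable_gb = max(egress_gb - free_tier_gb, 0.0)
    if billable_gb > FIRST_TIER_LIMIT_GB:
        # Rates for higher tiers are not given in the thread.
        raise ValueError("egress exceeds the first 10 TB tier")
    return billable_gb * FIRST_TIER_RATE_USD_PER_GB


if __name__ == "__main__":
    cost = first_tier_egress_cost(800)  # 0.8% of a 100 TB download
    print(f"Estimated egress charge: ${cost:.2f}")
```

At this rate, the 800 GB of outbound traffic from a 100 TB download comes to roughly $72 if it all falls within one month and no free allowance applies.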

Thank you so much for your help!