AMAZON Software Development Engineer, EC2 Instance Networking in Santa Clara, CA

Description

Join our team building the scale-out networking backbone that powers the world's largest AI training clusters. We're developing high-performance RDMA and RoCE solutions that enable distributed training of trillion-parameter models across thousands of compute nodes on AWS infrastructure.

Our team is responsible for creating the networking software that connects massive AI accelerator clusters, focusing on SmartNIC integration, collective communication optimization, and ultra-high-bandwidth inter-rack connectivity. You'll be working at the intersection of cloud infrastructure and state-of-the-art AI hardware to solve some of the most challenging networking problems in distributed computing.


Key job responsibilities
- Design and develop high-performance networking software solutions utilizing RDMA and RoCE technologies for large-scale AI clusters
- Integrate SmartNIC acceleration hardware with EC2 control plane systems and APIs
- Implement and optimize collective communication patterns for distributed AI training workloads
- Develop comprehensive performance monitoring, metrics collection, and benchmarking tools for high-bandwidth cluster interconnects
- Create automated testing frameworks and stress testing tools for multi-rack distributed systems
- Debug complex system-level issues across hardware acceleration, kernel networking, and distributed applications
- Collaborate on architecture decisions for next-generation scale-out AI infrastructure
- Participate in design reviews, code reviews, and technical documentation

About the team
Utility Computing (UC)
AWS Utility Computing (UC) provides product innovations from foundational services such as Amazon's Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2), to consistently released new product innovations that continue to set AWS's services and features apart in the industry. As a member of the UC organization, you'll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we're building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise so you feel empowered to take on more complex tasks in the future.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn't followed a traditional path, or includes alternative experiences, don't let it stop you from applying.
About AWS
Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating. That's why customers, from the most successful startups to Global 500 companies, trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it's in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (diversity) conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there's nothing we can't achieve in the cloud.
Mentorship & Career Growth
We're continuously raising our performance bar as we strive to become Earth's Best Employer. That's why you'll find endless knowledge-sharing, mentorship, and other career-advancing resources here to help you develop into a better-rounded professional.

Basic Qualifications

- 3 years of non-internship professional software development experience
- 2 years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- Strong programming skills in C/C++ with a focus on high-performance systems
- Experience with RDMA technologies and RoCE implementations
- Familiarity with collective communication libraries (NCCL, RCCL, OneCCL, MPI)
- Experience with Linux networking, kernel development, and distributed systems
- Understanding of high-performance computing clusters and parallel programming

Preferred Qualifications

- 3 years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
- Bachelor's degree in computer science or equivalent
- Experience with SmartNIC programming and network acceleration hardware APIs
- Knowledge of large-scale AI training infrastructure and multi-rack cluster networking
- Experience with performance optimization, benchmarking, and system-level debugging
- Understanding of AI accelerator architectures and scale-out communication patterns
- Experience with cloud infrastructure integration and virtualization technologies
- Bachelor's degree in Computer Science, Computer Engineering, or related field
- Strong problem-solving skills and experience with complex distributed systems
- Proficiency in design and analysis of algorithms and data structures
- Linux operating system knowledge
- In-depth knowledge of TCP/IP
- Kernel or embedded development, particularly Linux kernel
- Strong knowledge of Computer Science fundamentals in data structures, algorithm design, problem solving, and complexity analysis
- Knowledge of at least one modern programming language such as C, C++, Rust, Python, or Perl
- Experience developing complex software systems that have been successfully delivered to customers
- Knowledge of professional software engineering practices & best practices for the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
- Ability to take a project from scoping requirements through actual launch of the project
- Experience in communicating with users, other technical teams, and management to collect requirements, describe software product features, and technical designs
- Experience mentoring junior software development engineers and driving engineering excellence

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company's reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at
USA, CA, Santa Clara - 165,200.00 - 223,600.00 USD annually

Local Job Bulletin is an independent Job Search Engine. Local Job Bulletin is not endorsed, sponsored or affiliated with the actual employer of the job. All trademarks, service marks, logos, domain names, and job descriptions are the property of their respective holder.
 
 