Software Engineer - XR Codec Interactions and Avatars Team

Company	Meta
Location	Redmond, WA, USA; Pittsburgh, PA, USA
Salary	$56.25 – $173,000
Type	Full-Time
Degrees	Bachelor’s
Experience Level	Mid Level, Senior

Minimum Qualifications

Currently has, or is in the process of obtaining, a Bachelor’s degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
3+ years of experience with UNIX/Linux and a clear understanding of TCP/IP networking fundamentals
5+ years of experience coding in at least one of the following languages: C++, Python, or Rust
Experience with software development practices such as source control, code reviews, unit testing, debugging and profiling
Experience with capacity planning for Internet service architectures and/or handling urgent capacity augmentation needs
Knowledge of common web technologies and/or Internet service architectures (such as LAMP or MEAN stacks, CDNs, load-balancing techniques, etc.)
Experience configuring and running infrastructure-level applications, such as Kubernetes, Terraform, MySQL, SLURM, etc.

Responsibilities

Leverage the scale and complexity of the larger Meta infrastructure to accelerate our Codec Interaction and Avatars projects
Influence outcomes within your immediate team, peer engineering teams, and with cross-functional stakeholders
Work independently, handle multiple large projects simultaneously, and prioritize the team roadmap and deliverables by balancing required effort against expected impact
Own Research Super Cluster back-end services that handle fleet management, the infrastructure components that drive Meta’s advances in AI, core services used by every team at XRCIA, networking systems, and everything in between
Author and review code, develop documentation and capacity plans, and debug the hardest problems live on some of the largest and most complex systems in the world
Share an on-call rotation with your engineering team, provide on-call support, and act as a final escalation point for service incidents. Lead incident root-cause analysis across multiple system layers (compute, storage, network) for GPU clusters

Preferred Qualifications

Thorough understanding of the Linux operating system, including the networking subsystem
Experience in distributed system performance measurement, logging, and optimization
Experience with Python package management systems such as Conda
Prior experience in cluster on-call operations, including troubleshooting server/scheduler/storage errors, maintaining compute/storage environments/libraries/tools, helping onboard users to the cluster, and answering general questions from users
Prior experience in cluster coordination and strategy planning, including collecting and understanding user needs, developing tools to improve user experience, providing guidance on best practices, forecasting compute/storage needs, and developing long-term user experience/compute/storage strategies
Prior experience building tooling for monitoring and telemetry
Prior experience in developing/managing distributed network file systems
Prior experience in network security