Grid Engineer (Systems Engineer III)


Job Details


Job Description

Serve as a Grid Engineer on the High Performance Computing (HPC) team, providing expert IT technical services to a federal client. The position supports the deployment and sustainment of Grid technologies/Parallel Cluster in Amazon Web Services. Responsible for developing custom Amazon Machine Images (AMIs) and automating the build and provisioning of clusters. Collaborates with a highly technical user base on how to properly use the Grid and Slurm. Provides engineering support for Artificial Intelligence and Machine Learning. Develops user documentation, guides, and training sessions for staff.

Key Tasks and Responsibilities

Experience administering, configuring, operating, and maintaining a clustered system run by a resource manager such as Slurm, YARN (Hadoop), Torque, Condor, PBS, Sun Grid Engine, or equivalent.

Experience supporting statistical or research applications such as MATLAB, STATA, SAS, Mathematica, or similar.

Experience managing the configuration and compliance of Linux Systems in a clustered environment using a DevOps technology such as Ansible or Puppet.

Experience deploying systems in on-premises (physical equipment) and cloud environments, including Amazon Web Services (AWS).

Experience in advanced Linux administration of production systems, including strong Linux command-line skills, working knowledge of or experience with hardware and software security practices, and experience scripting in Bash, Perl, Python, or similar languages.

Experience with building and deploying containerized, GPU-enabled applications in Docker, Singularity, or Kubernetes.

Experience with programming and implementing scientific and physics M&S algorithms, Big Data, and Data Science.

Provide engineering support for Artificial Intelligence and Machine Learning, including architecture, strategy, training, and operational support.

Ability to create and maintain system documentation.




Education & Experience

8 or more years of IT experience

Bachelor's degree (or an equivalent number of years of experience) in Computer Science, Data Analytics, or a related field

Specialized experience must include all of the following:

Experience with Slurm

Experience with Grid or Parallel Cluster

Experience with Red Hat Enterprise Linux

Experience with Ansible or equivalent

Experience using tools such as JIRA, Git, Atlassian Stash, Bamboo, etc.

Experience with Amazon Web Services CloudFormation and Lambda functions is a plus.

Certifications

None

Security Clearance

Public Trust High (Tier 4/BI) Risk Level

Must be a US citizen or Lawful Permanent Resident (LPR)

Other (Travel, Work Environment, DoD 8570 Requirements, Administrative Notes, etc.)

Remote Work or D.C. Office

Computer World Services is an affirmative action and equal employment opportunity employer. Current employees and/or qualified applicants will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, protected veteran status, genetic information, or any other characteristic protected by local, state, or federal laws, rules, or regulations.

Computer World Services is committed to the full inclusion of all qualified individuals. As part of this commitment, Computer World Services will ensure that individuals with disabilities (IWD) are provided reasonable accommodations. If reasonable accommodation is needed to participate in the job application or interview process, to perform essential job functions, and/or to receive other benefits and privileges of employment, please contact Aaron McClellan in Human Resources at 314.###.#### or





 Computer World Services (CWS) Corporation

 06/16/2024

 Washington, DC