We are looking for a skilled Linux System Administrator to support our Machine Learning and Artificial Intelligence operations. The successful candidate will be responsible for ensuring the stability, scalability, and security of our Linux-based infrastructure, which includes, but is not limited to, clusters, grids, and clouds. This role requires strong technical expertise in Linux system administration, as well as experience with containerization (e.g., Docker) and orchestration (e.g., Kubernetes). The ideal candidate will have a passion for ML/AI and be eager to collaborate with our data science and engineering teams to optimize our workflows.
Responsibilities:
Manage and maintain the health of our Linux-based infrastructure, including servers, clusters, grids, and clouds.
Ensure system uptime, performance, and security by monitoring logs, metrics, and alerts.
Implement automation tools (e.g., Ansible, SaltStack) to streamline system deployment, configuration, and management.
Collaborate with data science and engineering teams to design and implement optimized workflows for ML/AI workloads.
Provide technical guidance on Linux system administration best practices and standards.
Troubleshoot complex system issues and resolve them promptly.
Develop and maintain documentation of system configurations, procedures, and troubleshooting guides.
Requirements:
In-depth knowledge of Linux distributions (e.g., Ubuntu, CentOS), including kernel tuning, system configuration, and troubleshooting.
Experience with containerization using Docker and orchestration using Kubernetes.
Experience with configuration management tools (e.g., Ansible, SaltStack).
Excellent problem-solving skills, with the ability to work independently and as part of a team.
Strong communication and documentation skills.
It would be nice if you also had:
Experience with ML/AI frameworks and libraries (e.g., TensorFlow, PyTorch).
Knowledge of data storage solutions (e.g., HDFS, Ceph).
Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).