- Experience: 6+ years
- Project: Offshore
- Mandatory Skills: LLM, NLP, MLOps, Python, Machine Learning, Databricks, Terraform
- Salary: 75000
Job Overview:
We are seeking an experienced Machine Learning Engineer with expertise in Databricks, AWS, Large Language Models (LLMs), model deployment, and model monitoring. This role focuses on developing, optimizing, deploying, and monitoring ML models at scale, ensuring high performance, reliability, and seamless integration with production systems. The ideal candidate has a strong background in LLM fine-tuning, MLOps, and cloud-based AI infrastructure, along with experience monitoring models to detect drift, degradation, and tuning opportunities.
Key Responsibilities
Model Development & Optimization
- Design, train, and fine-tune LLMs (GPT, LLaMA, Falcon, Mistral, etc.) on large-scale datasets.
- Implement feature engineering, model optimization, and hyperparameter tuning to enhance performance.
- Use MLflow on Databricks for experiment tracking, model versioning, and performance monitoring.
- Work with transformer architectures, embeddings, and retrieval-augmented generation (RAG) techniques.
- Optimize models for inference efficiency using quantization, pruning, and distillation techniques.
Model Deployment & MLOps
- Deploy ML models using AWS SageMaker, Databricks Model Serving, and Kubernetes-based solutions.
- Build and manage scalable inference pipelines for real-time and batch processing.
- Automate ML workflows using CI/CD pipelines, Terraform, and Databricks Workflows.
- Implement automated retraining pipelines based on performance monitoring metrics.
- Ensure model reproducibility and governance by tracking datasets, code, and model artifacts.
Model Monitoring & Drift Detection
- Implement real-time model monitoring to track inference performance, latency, and errors.
- Develop model drift detection pipelines to identify shifts in data distributions and feature importance.
- Set up alerting systems for data drift, concept drift, and model degradation.
- Use tools like Evidently AI, WhyLabs, Arize AI, Databricks ML Monitoring, or SageMaker Model Monitor for continuous model evaluation.
- Design and implement feedback loops to improve model accuracy over time.
Cloud & Infrastructure
- Design and implement ML infrastructure using AWS services (S3, SageMaker, Lambda, Bedrock, ECS, EKS).
- Utilize Databricks for distributed training, data processing, and collaborative model development.
- Optimize GPU/TPU-based training and inference for large-scale ML workloads.
- Ensure security, compliance, and access control for deployed AI systems.
Collaboration & Research
- Work closely with data engineers, product teams, and researchers to develop AI-driven solutions.
- Stay up to date with advancements in LLMs, generative AI, and deep learning frameworks (PyTorch, TensorFlow, Hugging Face).
- Contribute to research and experimentation in multi-modal AI, embeddings, and reinforcement learning.
Job Category: Development
Job Type: Full Time
Job Location: Remote