LLM Data Engineer | United States | Fully Remote Job at Halo Media, Florida, FL

  • Halo Media
  • Florida, FL

Job Description

We are seeking an experienced AI/LLM Data Engineer to build and maintain the data pipeline for our Generative AI platform. The ideal candidate will be well-versed in the latest Large Language Model (LLM) technologies and have a strong background in data engineering, with a focus on Retrieval-Augmented Generation (RAG) and knowledge-base techniques.

This role sits in the AI COE within DX Tech & Digital. As an AI/LLM Data Engineer, you will report to the Director, AI Solutions & Development, who oversees the AI COE. You will work on highly visible strategic projects, collaborating with cross-functional teams to define requirements and deliver high-quality AI solutions. The ideal candidate will have a passion for Generative AI and LLMs, with a proven track record of delivering innovative AI applications.

Responsibilities

  • Design, implement, and maintain an end-to-end multi-stage data pipeline for LLMs, including Supervised Fine Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) data processes
  • Identify, evaluate, and integrate diverse data sources and domains to support the Generative AI platform
  • Develop and optimize data processing workflows for chunking, indexing, ingestion, and vectorization for both text and non-text data
  • Benchmark and implement various vector stores, embedding techniques, and retrieval methods
  • Create a flexible pipeline supporting multiple embedding algorithms, vector stores, and search types (e.g., vector search, hybrid search)
  • Implement and maintain auto-tagging systems and data preparation processes for LLMs
  • Develop tools for text and image data crawling, cleaning, and refinement
  • Collaborate with cross-functional teams to ensure data quality and relevance for AI/ML models
  • Work with data lakehouse architectures to optimize data storage and processing
  • Integrate and optimize workflows using Snowflake and various vector store technologies

Qualifications

  • Master's degree in Computer Science, Data Science, or a related field
  • 3-5 years of work experience in data engineering, preferably in AI/ML contexts
  • Proficiency in Python, JSON, and related tools
  • Strong understanding of LLM architectures, training processes, and data requirements
  • Experience with RAG systems, knowledge base construction, and vector databases
  • Familiarity with embedding techniques, similarity search algorithms, and information retrieval concepts
  • Hands-on experience with data cleaning, tagging, and annotation processes (both manual and automated)
  • Knowledge of data crawling techniques and associated ethical considerations
  • Strong problem-solving skills and ability to work in a fast-paced, innovative environment
  • Familiarity with Snowflake and its integration in AI/ML pipelines
  • Experience with various vector store technologies and their applications in AI
  • Understanding of data lakehouse concepts and architectures
  • Excellent communication, collaboration, and problem-solving skills
  • Ability to translate business needs into technical solutions
  • Passion for innovation and a commitment to ethical AI development
  • Experience building LLM pipelines using frameworks like LangChain, LlamaIndex, Semantic Kernel, and OpenAI functions
  • Familiarity with LLM parameters such as temperature, top-k, and repeat penalty, and with metrics and methodologies for evaluating LLM output

Preferred Skills

  • Experience with popular LLM/RAG frameworks
  • Familiarity with distributed computing platforms (e.g., Apache Spark, Dask)
  • Knowledge of data versioning and experiment tracking tools
  • Experience with cloud platforms (AWS, GCP, or Azure) for large-scale data processing
  • Understanding of data privacy and security best practices
  • Practical experience implementing data lakehouse solutions
  • Proficiency in optimizing queries and data processes in Snowflake or Databricks
  • Hands-on experience with different vector store technologies

US employee benefits package.

Job Tags

Full time, Work experience placement, Flexible hours
