Data Engineer

Post Date

Jun 08, 2026

Location

Houston, Texas

ZIP/Postal Code

77079

Job Type

Contract-to-perm

Job Description

• Design, build, and maintain data ingestion pipelines from SQL Server into Azure Databricks using CDC-based and batch patterns
• Implement and manage medallion architecture (Bronze, Silver, Gold) using Delta Lake and Unity Catalog
• Write and optimize complex T-SQL queries, stored procedures, and CDC configurations on Microsoft SQL Server
• Develop PySpark and Spark SQL transformations for large-scale data processing and curated analytical layers
• Build and maintain vector database pipelines that transform structured and unstructured source data into embeddings for downstream AI and search applications
• Collaborate with BI and analytics teams to deliver curated data models, dashboards, and AI/BI Genie experiences
• Configure and troubleshoot connectivity between SQL Server, Databricks, and third-party connector applications
• Monitor pipeline health, implement alerting, and resolve data quality and performance issues proactively
• Maintain technical documentation for pipelines, schemas, and architectural decisions

Pay Rate: $70-80/hr

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Required Skills & Experience

SQL Server (MSSQL)
• 3+ years of hands-on experience with Microsoft SQL Server (2016 or later)
• Proficiency in T-SQL including CTEs, window functions, dynamic SQL, and query optimization
• Experience configuring and managing Change Data Capture (CDC) for incremental data extraction
• Ability to read execution plans; experience with indexing strategies and statistics management
• Working knowledge of isolation levels (including RCSI and snapshot isolation), locking behavior, and blocking resolution
• Understanding of replication topologies and their impact on downstream pipeline design
Azure Databricks
• 2+ years building notebooks, jobs, and workflows in Azure Databricks
• Hands-on experience implementing Bronze/Silver/Gold medallion architecture using Delta Lake
• Proficiency in PySpark and Spark SQL for large-scale data transformation
• Experience with Unity Catalog for governance and access control
• Experience implementing CDC-based incremental pipelines using watermarks or Delta merge patterns
• Familiarity with cluster configuration, compute management, and job scheduling
Data integration & connectors
• Experience configuring JDBC/ODBC connectivity between SQL Server and cloud compute platforms
• Familiarity with one or more connector, orchestration, or transformation platforms (Azure Data Factory, Fivetran, dbt, or similar)
• Understanding of Azure networking (VNet peering, NSGs, private endpoints)
• Experience with secret management using Azure Key Vault and Databricks secret scopes
Vector databases & AI infrastructure
• Experience designing and managing vector databases (Pinecone, Weaviate, Chroma, pgvector, or similar)
• Understanding of embedding models and how to generate, store, and query vector representations of data
• Familiarity with similarity search concepts (cosine similarity, approximate nearest neighbor (ANN) indexing such as HNSW or IVF)
• Experience integrating vector stores into retrieval pipelines (RAG patterns, semantic search, or recommendation systems)
General engineering
• Proficiency in Python for pipeline development and automation
• Version control experience using Git
• Strong written communication skills; ability to produce and maintain technical documentation
• Comfortable working within Azure cloud environments

Nice to Have Skills & Experience

• Experience with Tableau, Power BI, or similar BI tools as a downstream data consumer
• Familiarity with ERP source systems (field service management, inventory, or financial platforms)
• Experience with dbt for transformation layer management
• Knowledge of cloud cost optimization strategies for Databricks compute
• Microsoft Certified: Azure Data Engineer Associate or equivalent certification
• Exposure to LLM orchestration frameworks such as LangChain or LlamaIndex

Benefit packages for this role will start on the 1st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.