Design and implement scalable, reliable data pipelines to ingest, process, and store diverse data at scale, using technologies such as Databricks, Apache Spark, Hadoop, and Kafka (an illustrative sketch follows this list).
Work within cloud environments such as AWS (preferred) or Azure, leveraging services including EC2, RDS, S3, Lambda, and Azure Data Lake for efficient data handling and processing.
Develop and optimize data models and storage solutions (SQL, NoSQL, Data Lakes) to support operational and analytical applications, ensuring data quality and accessibility.
Utilize ETL tools and frameworks (e.g., Apache Airflow, Talend) to automate data workflows, ensuring efficient data integration and timely availability of data for analytics.
Implement pipelines with a high degree of automation, minimizing manual steps in deployment and operation.
Collaborate closely with data scientists, providing the data infrastructure and tools needed for complex analytical models, leveraging Python or R for data processing scripts.
Ensure compliance with data governance and security policies, implementing best practices in data encryption, masking, and access controls within a cloud environment.
Monitor and troubleshoot data pipelines and databases for performance issues, applying tuning techniques to optimize data access and throughput.
Stay abreast of emerging technologies and methodologies in data engineering, advocating for and implementing improvements to the data ecosystem.
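By way of illustration, the sketch below shows the kind of ingestion pipeline this role involves: a minimal Spark Structured Streaming job that reads events from Kafka and lands them in object storage. This is a sketch under stated assumptions, not a prescribed implementation; the broker address, topic name, and S3 paths are hypothetical placeholders, not details from this posting.

    # Illustrative sketch only: a minimal Spark Structured Streaming job that
    # ingests events from Kafka and lands them in object storage. Assumes the
    # spark-sql-kafka connector package is on the classpath; the broker, topic,
    # and paths below are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("event-ingest").getOrCreate()

    # Read the raw event stream from Kafka.
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
        .option("subscribe", "events")                      # placeholder topic
        .load()
    )

    # Kafka delivers the payload as binary; cast it to a string column.
    parsed = events.select(col("value").cast("string").alias("payload"))

    # Land the stream as Parquet, with checkpointing for fault tolerance.
    query = (
        parsed.writeStream
        .format("parquet")
        .option("path", "s3a://example-bucket/raw/events/")         # placeholder
        .option("checkpointLocation", "s3a://example-bucket/chk/")  # placeholder
        .trigger(processingTime="1 minute")
        .start()
    )

    query.awaitTermination()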
We are a company committed to creating inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity employer that believes everyone matters. Qualified candidates will receive consideration for employment opportunities without regard to race, religion, sex, age, marital status, national origin, sexual orientation, citizenship status, disability, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request via the Human Resources Request Form. The EEOC "Know Your Rights" Poster is available here.
To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Bachelor's or Master's degree in Computer Science, MIS, or another business discipline, or an equivalent combination of education and/or experience.
8+ years of experience in data engineering, with a proven track record in designing and operating large-scale data pipelines and architectures.
Expertise in developing ETL/ELT workflows (see the sketch after this list).
Fluent in infrastructure-as-code paradigms (e.g., Terraform).
Comprehensive knowledge of platforms and services such as Databricks, Dataiku, and AWS-native data offerings.
Solid experience with big data technologies (Databricks, Apache Spark, Hadoop, Kafka) and cloud services (AWS, Azure) related to data processing and storage.
Strong experience with AWS (preferred) or Azure cloud services, with hands-on experience integrating cloud storage and compute services with Databricks.
Proficient in SQL and programming languages relevant to data engineering (Python, Java, Scala).
Hands-on RDBMS experience (data modeling, analysis, programming, stored procedures).
Familiarity with machine learning model deployment and management practices is a plus.
Fluent with CI/CD workflows and automation.
Strong communication skills, capable of collaborating effectively across technical and non-technical teams.
Databricks Certified Associate Developer for Apache Spark, AWS Certified Solutions Architect, or other relevant certifications.
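As an illustration of the ETL/ELT workflow expertise referenced above, here is a minimal sketch of an Apache Airflow DAG that chains an extract task to a load task. It assumes Airflow 2.4 or later (which accepts the "schedule" argument); the DAG id, schedule, and task bodies are hypothetical placeholders rather than anything specified by this posting.

    # Illustrative sketch only: a minimal Airflow DAG chaining an extract task
    # to a load task. Assumes Airflow 2.4+; the DAG id, schedule, and task
    # bodies are hypothetical placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        # Placeholder: pull data from a source system.
        print("extracting")

    def load():
        # Placeholder: write prepared data to the warehouse.
        print("loading")

    with DAG(
        dag_id="example_etl",          # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)

        # Run extract before load.
        extract_task >> load_task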
Benefit packages for this role will start on the 31st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.