Data Engineer (PySpark)

Post Date

Dec 10, 2025

Location

Hillandale, Maryland

ZIP/Postal Code

20903
US

Job Type

Contract

Category

Programmer / Developer

Req #

DC0-ff79fc90-42c7-4aaf-949f-f2cbfb39c84b

Pay Rate

$50 - $62 (hourly estimate)

Job Description

• Responsible for developing, expanding, and optimizing our data and data pipeline architecture, as well as optimizing data flow and collection for cross-functional teams.
• Support software developers, database architects, data analysts, and data scientists on data initiatives, and ensure an optimal data delivery architecture is consistent across ongoing projects.
• Create new pipelines and maintain existing ones, update Extract, Transform, Load (ETL) processes, add new ETL features, and build PoCs with Redshift Spectrum, Databricks, AWS EMR, SageMaker, etc. (see the PySpark sketch after this list).
• Implement large-dataset engineering with the support of project data specialists: data augmentation, data quality analysis, data analytics (anomalies and trends), data profiling, and data algorithms; measure and develop data maturity models and develop data strategy recommendations.
• Operate large-scale data processing pipelines and resolve business and technical issues related to processing and data quality.
• Assemble large, complex data sets that meet functional and non-functional business requirements.
• Identify, design, and implement internal process improvements, including redesigning data infrastructure for greater scalability, optimizing data delivery, and automating manual processes.
• Build the infrastructure required for optimal extraction, transformation, and loading of data from various data sources using AWS and SQL technologies.
• Build analytical tools that use the data pipeline to provide actionable insight into key business performance metrics, including operational efficiency and customer acquisition.
• Work with stakeholders, including data, design, product, and government stakeholders, and assist them with data-related technical issues.
• Write unit and integration tests for all data processing code.
• Work with DevOps engineers on continuous integration (CI), continuous delivery (CD), and infrastructure as code (IaC).
• Read specifications and translate them into code and design documents.
• Perform code reviews and develop processes for improving code quality.
• Perform other duties as assigned.
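
For illustration, below is a minimal PySpark ETL sketch of the kind of pipeline work described above: extract raw data, apply basic quality checks and transformations, and load partitioned output. The paths, column names, and dataset here are hypothetical placeholders, not part of any actual project codebase.

```python
# Minimal PySpark ETL sketch: read raw CSV, apply basic quality checks
# and transformations, and write partitioned Parquet.
# All paths and column names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims-etl-sketch").getOrCreate()

# Extract: load a raw source file (the S3 path is an assumption for illustration).
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("s3://example-bucket/raw/claims/")
)

# Transform: normalize column names, drop records failing a basic quality rule,
# and derive a partition column from the claim date.
clean = (
    raw.select([F.col(c).alias(c.strip().lower().replace(" ", "_")) for c in raw.columns])
    .filter(F.col("claim_id").isNotNull())
    .withColumn("claim_date", F.to_date("claim_date"))
    .withColumn("claim_year", F.year("claim_date"))
)

# Simple data-quality signal: count duplicate claim IDs for downstream review.
dupes = clean.groupBy("claim_id").count().filter(F.col("count") > 1)
print(f"duplicate claim ids: {dupes.count()}")

# Load: write partitioned Parquet for downstream analytics consumers.
(
    clean.write
    .mode("overwrite")
    .partitionBy("claim_year")
    .parquet("s3://example-bucket/curated/claims/")
)

spark.stop()
```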

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Required Skills & Experience

• Minimum of 8 years of data engineering or hands-on software development experience, with at least 4 of those years using Python, Java, and cloud technologies for data pipelines.
• A Bachelor’s degree in Computer Science, Information Systems, Engineering, Business, or another related scientific or technical discipline. With ten years of general information technology experience and at least eight years of specialized experience, a degree is not required.
• Expert data pipeline builder and data wrangler who enjoys optimizing data systems and building them from the ground up.
• Self-sufficient and comfortable supporting the data needs of multiple teams, systems, and products.
• Experienced in designing data architecture for shared services, scalability, and performance.
• Experienced in designing data services, including APIs, metadata, and data catalogs.
• Experienced in data governance processes to ingest (batch and stream), curate, and share data with upstream and downstream data users.
• Ability to build and optimize data sets, ‘big data’ pipelines, and architectures.
• Ability to perform root cause analysis on external and internal processes and data to identify opportunities for improvement and answer questions.
• Excellent analytic skills for working with unstructured datasets.
• Ability to build processes that support data transformation, workload management, data structures, dependencies, and metadata.
• Demonstrated understanding of and experience with software and tools, including big data tools such as Spark and Hadoop; relational databases including MySQL and Postgres; workflow management and pipeline tools such as Apache Airflow and AWS Step Functions (see the Airflow sketch after this list); AWS cloud services including Redshift, RDS, EMR, and EC2; stream-processing systems such as Spark Streaming and Storm; and object-oriented/functional scripting languages including Scala, Java, and Python.
• Flexible and willing to accept a change in priorities as necessary.
• Ability to work in a fast-paced, team-oriented environment.
• Experience with Agile methodology, using test-driven development.
• Experience with GitHub and Atlassian Jira/Confluence.
• Excellent command of written and spoken English.
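
As a companion to the workflow-management requirement above, here is a minimal Apache Airflow sketch showing how a daily run of a PySpark job might be orchestrated. The DAG id, schedule, retry settings, and spark-submit command are assumptions for illustration only, not a prescribed setup for this role.

```python
# Minimal Airflow DAG sketch: orchestrate a daily PySpark ETL run.
# DAG id, schedule, and the submit command are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="claims_etl_daily",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    # Submit the PySpark job sketched earlier (cluster details are assumptions).
    run_etl = BashOperator(
        task_id="run_claims_etl",
        bash_command="spark-submit --deploy-mode cluster etl/claims_etl.py",
    )
```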

Exact compensation may vary based on several factors, including skills, experience, and education.
Employees in this role will enjoy a comprehensive benefits package starting on day one of employment, including options for medical, dental, and vision insurance. Eligibility to enroll in the 401(k) retirement plan begins after 90 days of employment. Additionally, employees in this role will have access to paid sick leave and other paid time off benefits as required under the applicable law of the worksite location.

Nice to Have Skills & Experience

• Databricks certification, Google Professional Data Engineer certification, IBM Certified Data Engineer – Big Data certification, or Cloudera CCP Data Engineer certification.
• Experience with healthcare quality data including Medicaid and CHIP provider data, beneficiary data, claims data, and quality measure data.

Benefit packages for this role will start on the 1st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.