diverse sources, transform it into usable formats, and load it into data warehouses, data lakes or lakehouses. Big Data Technologies: Utilize big data technologies such as Spark, Kafka, and Flink for distributed data processing and analytics. Cloud Platforms: Deploy and manage data solutions on cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP), leveraging cloud-native services More ❯
with a focus on data quality and reliability. Design and manage data storage solutions, including databases, warehouses, and lakes. Leverage cloud-native services and distributed processing tools (e.g., Apache Flink, AWS Batch) to support large-scale data workloads. Operations & Tooling Monitor, troubleshoot, and optimize data pipelines to ensure performance and cost efficiency. Implement data governance, access controls, and security … pipelines and data architectures. Hands-on expertise with cloud platforms (e.g., AWS) and cloud-native data services. Comfortable with big data tools and distributed processing frameworks such as Apache Flink or AWS Batch. Strong understanding of data governance, security, and best practices for data quality. Effective communicator with the ability to work across technical and non-technical teams. Additional … following prior to applying to GSR? Experience level, applicable to this role? Select How many years have you designed, built, and operated stateful, exactly-once streaming pipelines in Apache Flink (or an equivalent framework such as Spark Structured Streaming or Kafka Streams)? Select Which statement best describes your hands-on responsibility for architecting and tuning cloud-native data lake More ❯
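For context on the stateful, exactly-once streaming work this listing asks about, below is a minimal PyFlink sketch. It only illustrates the checkpointing setup that exactly-once state consistency relies on; the in-memory source, interval, and job name are illustrative placeholders, not anything from the posting.

```python
from pyflink.datastream import StreamExecutionEnvironment

# Enable periodic checkpoints; Flink's default checkpointing mode is exactly-once,
# which is what lets stateful operators recover consistent state after a failure.
env = StreamExecutionEnvironment.get_execution_environment()
env.enable_checkpointing(60_000)  # checkpoint every 60 seconds

# Toy stateful computation: a keyed running sum over a small in-memory source.
# A production job would read from a durable, replayable source such as Kafka.
source = env.from_collection([("a", 1), ("b", 2), ("a", 3)])
summed = (
    source
    .key_by(lambda record: record[0])
    .reduce(lambda acc, cur: (acc[0], acc[1] + cur[1]))
)
summed.print()

env.execute("exactly_once_sketch")  # hypothetical job name
```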
Arlington, Virginia, United States Hybrid / WFH Options
Full Visibility LLC
ETL tools and data workflow orchestration (e.g., Apache Airflow, Luigi, Prefect) Strong programming skills in Python, SQL, or Scala Experience with open-source data processing tools (e.g., Kafka, Spark, Flink, Hadoop) Familiarity with database technologies (PostgreSQL, MySQL, or NoSQL solutions) Ability to work in a fast-paced environment with large-scale datasets Preferred: • Experience with forensic data processing or More ❯
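As a hedged illustration of the workflow-orchestration experience this listing asks for, here is a minimal Apache Airflow (2.x) DAG sketch; the dag_id, schedule, and task bodies are placeholders rather than anything from the posting.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder task bodies; a real pipeline would extract from a source system,
# transform the records, and load them into a warehouse or lake.
def extract():
    return [{"id": 1, "value": 10}]

def transform():
    pass

def load():
    pass

with DAG(
    dag_id="example_etl",            # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",      # Airflow 2.x style scheduling
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Declare the dependency chain: extract, then transform, then load.
    extract_task >> transform_task >> load_task
```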
Head of Data & Analytics Architecture and AI (Chiswick Park, full time, job requisition id JR19765). Want to help us bring More ❯
Experience working in environments with AI/ML components or interest in learning data workflows for ML applications. Bonus if you have exposure to Kafka, Spark, or Flink. Experience with data compliance regulations (GDPR). What you can expect from us: Opportunity for annual bonuses Medical Insurance Cycle to work scheme Work from home and wellbeing More ❯
are recognised by industry leaders like Gartner's Magic Quadrant, Forrester Wave and Frost Radar. Our tech stack: Superset and similar data visualisation tools. ETL tools: Airflow, DBT, Airbyte, Flink, etc. Data warehousing and storage solutions: ClickHouse, Trino, S3. AWS Cloud, Kubernetes, Helm. Relevant programming languages for data engineering tasks: SQL, Python, Java, etc. What you will be doing More ❯
Experience working in environments with AI/ML components or interest in learning data workflows for ML applications. Bonus if you have exposure to Kafka, Spark, or Flink. Experience with data compliance regulations (GDPR). What you can expect from us: Salary 65-75k Opportunity for annual bonuses Medical Insurance Cycle to work scheme Work More ❯
with demonstrated ability to solve complex distributed systems problems independently Experience building infrastructure for large-scale data processing pipelines (both batch and streaming) using tools like Spark, Kafka, Apache Flink, Apache Beam, and with proprietary solutions like Nebius Experience designing and implementing large-scale data storage systems (feature stores, timeseries DBs) for ML use cases, with strong familiarity with More ❯
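To make the batch-and-streaming requirement above concrete, here is a small Spark Structured Streaming sketch that consumes a Kafka topic and maintains windowed counts; the broker address, topic name, and checkpoint path are assumptions for illustration only, and running it requires the spark-sql-kafka connector package.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("stream_sketch").getOrCreate()

# Read a Kafka topic as a streaming DataFrame (broker and topic are placeholders).
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Count messages per key in 1-minute event-time windows.
counts = (
    events.selectExpr("CAST(key AS STRING) AS key", "timestamp")
    .groupBy(window(col("timestamp"), "1 minute"), col("key"))
    .count()
)

# Write incremental updates to the console; the checkpoint directory is what
# lets the query resume from where it left off after a restart.
query = (
    counts.writeStream.outputMode("update")
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/stream_sketch")
    .start()
)
query.awaitTermination()
```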
generation (SSG) in Next.js Experience with testing frameworks like Jest, Cypress, or React Testing Library. Experience with authentication strategies using OAuth, JWT, or Cognito Familiarity with Apache Spark/Flink for real-time data processing is an advantage. Hands-on experience with CI/CD tools Commercial awareness and knowledge of public sector. Excellent communicator, able to interact with More ❯
Out in Science, Technology, Engineering, and Mathematics
challenges of dealing with large data sets, both structured and unstructured Used a range of open source frameworks and development tools, e.g. NumPy/SciPy/Pandas, Spark, Kafka, Flink Working knowledge of one or more relevant database technologies, e.g. Oracle, Postgres, MongoDB, ArcticDB. Proficient on Linux Advantageous: An excellent understanding of financial markets and instruments An understanding of More ❯
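As a toy illustration of the Pandas-style work on structured data this listing mentions (with a market-data flavour), the snippet below resamples synthetic ticks into one-minute OHLC bars; the prices are randomly generated, not real market data.

```python
import numpy as np
import pandas as pd

# Synthetic tick prices standing in for market data (purely illustrative).
idx = pd.date_range("2024-01-02 09:30", periods=1_000, freq="s")
ticks = pd.DataFrame(
    {"price": 100 + np.cumsum(np.random.normal(0, 0.01, len(idx)))},
    index=idx,
)

# Resample ticks into 1-minute OHLC bars, a common first step in market-data pipelines.
bars = ticks["price"].resample("1min").ohlc()
print(bars.head())
```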
to cross-functional teams, ensuring best practices in data architecture, security and cloud computing Proficiency in data modelling, ETL processes, data warehousing, distributed systems and metadata systems Utilise Apache Flink and other streaming technologies to build real-time data processing systems that handle large-scale, high-throughput data Ensure all data solutions comply with industry standards and government regulations … not limited to EC2, S3, RDS, Lambda and Redshift. Experience with other cloud providers (e.g., Azure, GCP) is a plus In-depth knowledge and hands-on experience with Apache Flink for real-time data processing Proven experience in mentoring and managing teams, with a focus on developing talent and fostering a collaborative work environment Strong ability to engage with More ❯
Hands-on experience with SQL, Data Pipelines, Data Orchestration and Integration Tools Experience in data platforms on-premises/cloud using technologies such as: Hadoop, Kafka, Apache Spark, Apache Flink, object, relational and NoSQL data stores. Hands-on experience with big data application development and cloud data warehousing (e.g. Hadoop, Spark, Redshift, Snowflake, GCP BigQuery) Expertise in building data More ❯
Reston, Virginia, United States Hybrid / WFH Options
CGI
leveraging S3, Redshift, AWS Glue, EMR, Azure Data Lake, and Power BI to deliver secure, high-performance solutions and self-service BI ecosystems. Skilled in leveraging Apache Airflow, Apache Flink and other data tools Experienced in distributed data compute architecture using Apache Spark and PySpark. Education: Bachelor's degree in Computer Science, Information Systems or related field CGI is More ❯
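A minimal PySpark sketch of the kind of distributed batch compute this role describes, assuming a hypothetical S3 bucket and column names; reading from S3 additionally requires the appropriate Hadoop/AWS connector configuration on the cluster.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch_aggregation").getOrCreate()

# Bucket, path, and column names are placeholders for illustration only.
orders = spark.read.parquet("s3://example-bucket/orders/")

# Aggregate order amounts into a daily revenue mart.
daily_revenue = (
    orders.groupBy(F.to_date("order_ts").alias("order_date"))
    .agg(F.sum("amount").alias("revenue"))
)

# Write the result back as partition-friendly Parquet for downstream BI tools.
daily_revenue.write.mode("overwrite").parquet("s3://example-bucket/marts/daily_revenue/")
```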
PySpark Experience deploying and maintaining Cloudera or Apache Spark clusters Experience designing and maintaining Data Lakes or Data Lakehouses Experience with big data tools such as Spark, NiFi, Kafka, Flink, or similar tools at multi-petabyte scale Experience in designing and maintaining ETL or ELT data pipelines utilizing storage/serialization formats and schemas such as Parquet and Avro Experience administering and maintaining More ❯
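To illustrate the columnar serialization formats mentioned here (Parquet in particular), a short pandas/pyarrow sketch that writes and reads a Parquet file; the columns are made up for the example, and pyarrow (or fastparquet) must be installed.

```python
import pandas as pd

# Toy dataset standing in for a pipeline's staged extract (columns are illustrative).
df = pd.DataFrame(
    {
        "event_id": [1, 2, 3],
        "event_type": ["click", "view", "click"],
        "ts": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02"]),
    }
)

# Columnar Parquet output with compression; the schema travels with the file.
df.to_parquet("events.parquet", engine="pyarrow", compression="snappy", index=False)

# Reading it back preserves column types, unlike a plain CSV round-trip.
restored = pd.read_parquet("events.parquet")
print(restored.dtypes)
```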
Java, C, C++ for distributed systems, with proficiency in networking, multi-threading and implementation of REST APIs Experience with the Spring framework, messaging frameworks (Kafka, RabbitMQ), streaming analytics (Apache Flink, Spark), management of containerized applications (Kubernetes). Experience with enabling tools (Git, Maven, Jira), DevOps (Bamboo, Jenkins, GitLab CI/Pipelines), Continuous Monitoring (ELK Stack (ElasticSearch, Logstash and Kibana More ❯
learning techniques and the key parameters that affect their performance 2+ years of experience with Big Data programming technologies, including Hadoop Distributed File System (HDFS), Apache Spark, or Apache Flink 2+ years of experience working with a wide range of predictive and decision models and tools for developing models Experience in natural language processing topics, including tagging, syntactic parsing More ❯
non-technical stakeholders • A background in software engineering, MLOps, or data engineering with production ML experience Nice to have: • Familiarity with streaming or event-driven ML architectures (e.g. Kafka, Flink, Spark Structured Streaming) • Experience working in regulated domains such as insurance, finance, or healthcare • Exposure to large language models (LLMs), vector databases, or RAG pipelines • Experience building or managing More ❯
deploying and integrating containerized software applications using container orchestration platforms, including Kubernetes 2+ years of experience implementing event-driven or streaming architectures leveraging Kafka, Amazon SNS, RedPanda, or Apache Flink 2+ years of experience with running, troubleshooting, and debugging applications on Linux systems 1+ year of experience with building or maintaining production-grade RESTful APIs or software interfaces Experience More ❯
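A hedged sketch of the event-driven pattern this listing refers to, using the kafka-python client to publish JSON events; the broker address and topic name are placeholders, and the same code works against a RedPanda broker since it speaks the Kafka protocol.

```python
import json

from kafka import KafkaProducer  # kafka-python package

# Broker address and topic name are illustrative placeholders.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # wait for the full in-sync replica set for stronger durability
)

# Publish a small domain event; downstream consumers react to it asynchronously.
producer.send("orders", {"order_id": 42, "status": "created"})
producer.flush()
```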
with big data technologies (e.g., Spark, Hadoop) Background in time-series analysis and forecasting Experience with data governance and security best practices Real-time data streaming is a plus (Kafka, Beam, Flink) Experience with Kubernetes is a plus Energy/maritime domain knowledge is a plus What We Offer Competitive salary commensurate with experience and comprehensive benefits package (medical, dental, vision) Significant More ❯
in data processing and reporting. In this role, you will own the reliability, performance, and operational excellence of our real-time and batch data pipelines built on AWS, Apache Flink, Kafka, and Python. You'll act as the first line of defense for data-related incidents, rapidly diagnose root causes, and implement resilient solutions that keep critical reporting systems … on-call escalation for data pipeline incidents, including real-time stream failures and batch job errors. Rapidly analyze logs, metrics, and trace data to pinpoint failure points across AWS, Flink, Kafka, and Python layers. Lead post-incident reviews: identify root causes, document findings, and drive corrective actions to closure. Reliability & Monitoring Design, implement, and maintain robust observability for data … batch environments. Architecture & Automation Collaborate with data engineering and product teams to architect scalable, fault-tolerant pipelines using AWS services (e.g., Step Functions, EMR, Lambda, Redshift) integrated with Apache Flink and Kafka. Troubleshoot & Maintain Python-based applications. Harden CI/CD for data jobs: implement automated testing of data schemas, versioned Flink jobs, and migration scripts. Performance More ❯
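As a small, hedged example of the observability work described above, the snippet below publishes a custom pipeline-lag metric to Amazon CloudWatch with boto3 so an alarm can page on staleness; the namespace, metric, dimension, and pipeline names are invented for illustration.

```python
import boto3

# Region, namespace, and metric names are placeholders for illustration.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

def report_consumer_lag(pipeline: str, lag_seconds: float) -> None:
    """Publish a custom freshness/lag metric that a CloudWatch alarm can watch."""
    cloudwatch.put_metric_data(
        Namespace="DataPipelines",
        MetricData=[
            {
                "MetricName": "ConsumerLagSeconds",
                "Dimensions": [{"Name": "Pipeline", "Value": pipeline}],
                "Value": lag_seconds,
                "Unit": "Seconds",
            }
        ],
    )

# Example usage with a hypothetical pipeline name and measured lag.
report_consumer_lag("orders-stream", 42.0)
```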
Vortexa is a fast-growing international technology business founded to solve the immense information gap that exists in the energy industry. By using massive amounts of new satellite data and pioneering work in artificial intelligence, Vortexa creates an unprecedented view More ❯
Baltimore, Maryland, United States Hybrid / WFH Options
OneMain Financial
as but not limited to: Python, Typescript, Scala, SQL. 5 years of hands-on cloud computing experience in AWS. Deep functional experience with EKS, Aurora, MSK, DBT, Airflow, and Flink is a strong plus. In-depth RDBMS development experience (e.g., PostgreSQL, MySQL, Aurora). Experienced in designing and implementing CI/CD pipelines and Infrastructure-as-Code. Experience with More ❯