ingestion pipelines. Requirements: Proven experience working with Python, Java, or C#. Experience working with ETL/ELT technologies such as Airflow, Argo, Dagster, Spark, and Hive. Strong technical expertise, especially in data processing and exploration, with a willingness to learn new technologies. A passion for automation and driving continual …
technology to automate data pipelines and build analytical warehouses · Deep understanding of cloud-based data platforms (Azure SQL DB, Azure Synapse, ADLS, AWS, Hadoop, Spark, Snowflake, NoSQL, etc.) · Proficient scripting in programming languages such as Java, Python, Scala · Expert in SQL · Machine Learning: Good basic understanding of the … reference to the required data structures, formats and hygiene · Au fait with how current machine learning tools and platforms (Python, IBM, KNIME, GitHub, R, Spark, Weka, Amazon ML, Azure ML, etc.) integrate as part of the analytical ecosystem · Experience with the tools used to validate, monitor and deploy ML …
Maidstone, Kent, United Kingdom Hybrid / WFH Options
Worley
technology to automate data pipelines and build analytical warehouses · Deep understanding of cloud-based data platforms (Azure SQL DB, Azure Synapse, ADLS, AWS, Hadoop, Spark, Snowflake, NoSQL, etc.) · Proficient scripting in programming languages such as Java, Python, Scala · Expert in SQL · Machine Learning: Good basic understanding of the … reference to the required data structures, formats and hygiene · Au fait with how current machine learning tools and platforms (Python, IBM, KNIME, GitHub, R, Spark, Weka, Amazon ML, Azure ML, etc.) integrate as part of the analytical ecosystem · Experience with the tools used to validate, monitor and deploy ML …
Luton, England, United Kingdom Hybrid / WFH Options
Ventula Consulting
science and analytics team in deploying pipelines. Coach and mentor the team to improve development standards. Key requirements: Strong hands-on experience with Databricks, Spark, SQL or Scala. Proven experience designing and building data solutions on a cloud-based, big data distributed system (AWS/Azure, etc.). Hands-on … models and following best practices. The ability to develop pipelines using SageMaker, MLflow or similar frameworks. Strong experience with data programming frameworks such as Apache Spark. Understanding of common Data Science and Machine Learning models, libraries and frameworks. This role provides a competitive salary plus an excellent benefits package. In …
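As a concrete illustration of the MLflow side of that requirement, here is a minimal experiment-tracking sketch in Python; the experiment name, parameters and metric are hypothetical stand-ins, not the employer's actual setup.

```python
# Minimal MLflow tracking sketch (hypothetical names and values).
import mlflow

mlflow.set_experiment("demo-churn-model")  # hypothetical experiment name

with mlflow.start_run():
    # Log the hyperparameters used for this (imaginary) training run.
    mlflow.log_param("max_depth", 8)
    mlflow.log_param("learning_rate", 0.1)
    # Log an evaluation metric produced by the run.
    mlflow.log_metric("auc", 0.91)
```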
data platform from a legacy system to one based on AWS EMR, with Amazon RDS and DynamoDB ingestion converted to Parquet files, queryable through Spark and MapReduce. This modern platform will support rapid data insight generation, data experiments for new product development, our live Machine Learning solutions and live … to-target mappings) to testing and service optimisation. Good familiarity with our developing key services/applications: Amazon RDS, Amazon DynamoDB, AWS Glue, MapReduce, Hive, Spark, YARN, Airflow. Ability to work with a range of structured, semi-structured and unstructured file formats including Parquet, JSON, CSV, PDF, JPG. Accomplished data …
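The RDS/DynamoDB-to-Parquet conversion described above could look something like this PySpark sketch; the bucket paths, partition column and CSV-export format are hypothetical assumptions rather than the platform's actual design.

```python
# PySpark sketch: convert an exported table dump to Parquet (hypothetical paths).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rds-to-parquet").getOrCreate()

# Read a CSV export of a relational table (stand-in for an RDS extract).
df = spark.read.csv("s3://example-bucket/exports/orders/", header=True, inferSchema=True)

# Write back out as partitioned Parquet for Spark/MapReduce querying;
# "order_date" is an assumed partition column.
df.write.mode("overwrite").partitionBy("order_date").parquet("s3://example-bucket/lake/orders/")
```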
Complexio is Foundational AI: it works to automate business activities by ingesting whole-company data – both structured and unstructured – and making sense of it. Using proprietary models and algorithms, Complexio forms a deep understanding of how humans are interacting and …
in a technical and analytical role Experience of Data Lake/Hadoop platform implementation Hands-on experience in implementation and performance tuning of Hadoop/Spark implementations Experience with Apache Hadoop and the Hadoop ecosystem Experience with one or more relevant tools (Sqoop, Flume, Kafka, Oozie, Hue, ZooKeeper, HCatalog, Solr … Avro) Experience with one or more SQL-on-Hadoop technologies (Hive, Impala, Spark SQL, Presto) Experience developing software code in one or more programming languages (Java, Python, etc.) Preferred Qualifications: Master's or PhD in Computer Science, Physics, Engineering or Math Hands-on experience leading large-scale global data warehousing …
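For the SQL-on-Hadoop tooling listed here, a minimal Spark SQL sketch against a Hive-metastore table might look like the following; the table and column names are hypothetical.

```python
# Spark SQL sketch: query a Hive-managed table (hypothetical table/columns).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("sql-on-hadoop")
    .enableHiveSupport()  # lets Spark read tables registered in the Hive metastore
    .getOrCreate()
)

# Aggregate a hypothetical "events" table.
daily_counts = spark.sql(
    "SELECT event_date, COUNT(*) AS n FROM events GROUP BY event_date"
)
daily_counts.show()
```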
Python). Expert in key data engineering platforms such as Kafka or other streaming technologies, data lakes (AWS S3, Iceberg, Parquet), analytics technologies (Trino, Spark), automation technologies (Airflow, MLflow) and data governance (DataHub). People management and technical leadership experience. Are passionate about agile software delivery with a … Have an excellent working knowledge of AWS services (EMR, ECS, IAM, EC2, S3, DynamoDB, MSK). Our Technology Stack: Scala and Python; Kafka, Spark, Kafka Streams, Kinesis, Akka and KSQL; AWS, S3, Iceberg, Parquet, Glue and Spark/EMR for our Data Lake; Elasticsearch, DynamoDB and Redis …
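To make the Kafka-plus-Spark pairing concrete, here is a minimal Spark Structured Streaming sketch that consumes a Kafka topic; the broker address and topic name are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath.

```python
# Spark Structured Streaming sketch: consume a Kafka topic (hypothetical broker/topic).
# Assumes the spark-sql-kafka connector package is available to the session.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

# Subscribe to a hypothetical "events" topic.
stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
)

# Kafka values arrive as bytes; cast to string and print micro-batches to the console.
query = (
    stream.selectExpr("CAST(value AS STRING) AS value")
    .writeStream.format("console")
    .start()
)
query.awaitTermination()
```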
improvements Key Skills: 3+ years of Python experience. Highly statistical and analytical. Exposure to Google Cloud Platform (BigQuery, GCS, Datalab, Dataproc, Cloud ML) desirable. Spark & Hadoop experience. Strong communication skills. Good problem-solving skills. Qualifications: Bachelor's degree or equivalent experience in a quantitative field (Statistics, Mathematics, Computer Science … classification techniques, and algorithms. Fluency in a programming language (Python, C, C++, Java, SQL). Familiarity with Big Data frameworks and visualization tools (Cassandra, Hadoop, Spark, Tableau). This is a permanent position and offers flexibility with hybrid working, 2-3 days per week in the office, depending on workload …
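As an illustration of the BigQuery exposure mentioned above, a minimal query via the google-cloud-bigquery Python client could look like this; the project, dataset, table and columns are hypothetical.

```python
# BigQuery sketch: run a query with the official Python client (hypothetical table).
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

# Count rows per day in a hypothetical events table.
sql = """
    SELECT DATE(created_at) AS day, COUNT(*) AS n
    FROM `example-project.analytics.events`
    GROUP BY day
    ORDER BY day
"""
for row in client.query(sql).result():
    print(row.day, row.n)
```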
Senior Data Scientist Mobysoft is one of the fastest-growing SaaS providers in the UK and has been shortlisted in the "Top 50 fastest growing technology companies in the North" for four successive years. Mobysoft provides predictive analytical software that …
Cognism is a market leader in international sales intelligence. Access to our premium data has helped a wide variety of global revenue teams change their approach to prospecting, resulting in predictable and prosperous outcomes. Following multiple successful funding rounds and …
Greater London, England, United Kingdom Hybrid / WFH Options
Hunter Bond
My client is looking for a talented and motivated Big Data Architect (Azure, Databricks, Spark) to be based in their London office. You'll be responsible for providing technical leadership in architecting and designing end-to-end solutions for the organisation's data lake initiatives, as they provide increasing numbers … improvements in design, processes, and implementation to improve operational management, scalability, and extensibility. The following skills/experience is essential: Strong implementation experience using Spark and Databricks. Strong Cloud experience (ideally Azure). Previously heavily involved in an implementation programme. Data Warehouse experience. Strong stakeholder management experience. Excellent IT background, ideally …
develop innovative solutions. Data engineering skills: proficiency in designing, building, developing, and optimizing data pipelines, as well as experience with big data processing tools like Apache Spark, Hadoop, and Dataflow. Experience in designing & operating Operational Datastore/Data Lake/Data Warehouse platforms at scale with high availability. Data integration: familiarity with data integration tools and techniques, including ETL (Extract, Transform, Load) processes and real-time data streaming (e.g., using Apache Kafka, Kinesis, or Pub/Sub), exposing data sets via GraphQL. Cloud platforms expertise: deep understanding of GCP/AWS services, architectures, and best practices, with hands-on experience in …
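For the real-time streaming point above, here is a minimal Kafka producer sketch using the confluent-kafka Python client; the broker address, topic and event payload are hypothetical.

```python
# Kafka producer sketch with confluent-kafka (hypothetical broker and topic).
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_delivery(err, msg):
    # Called once per message to report delivery success or failure.
    if err is not None:
        print(f"delivery failed: {err}")

# Publish a hypothetical order event.
event = {"order_id": 123, "status": "created"}
producer.produce("orders", json.dumps(event).encode("utf-8"), callback=on_delivery)
producer.flush()  # block until all queued messages are delivered
```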
at scale utilising best-of-breed Cloud services and technologies. So, what tools and technologies will you be using? AWS, Python, Databricks/Spark, Trino, Airflow, Docker, CloudFormation/Terraform, SQL/NoSQL. We provide you with the opportunity to think freely and work creatively and right now … Other skills we are looking for you to demonstrate include: Experience of data storage technologies: Delta Lake, Iceberg, Hudi. Sound knowledge and understanding of Apache Spark, Databricks or Hadoop. Ability to take business requirements and translate these into tech specifications. Knowledge of Architecture best practices and patterns. Competence …
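A minimal Delta Lake sketch in PySpark, touching the storage technologies listed above; the table path and rows are hypothetical, and the delta-spark package is assumed to be installed.

```python
# Delta Lake sketch: write and read a Delta table (hypothetical path/data).
# Assumes the delta-spark package is installed.
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("delta-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "label"])

# Write as a Delta table; Delta adds an ACID transaction log over Parquet files.
df.write.format("delta").mode("overwrite").save("/tmp/delta/labels")

# Read it back; time travel to earlier versions is also possible via options.
spark.read.format("delta").load("/tmp/delta/labels").show()
```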
City Of Bristol, England, United Kingdom Hybrid / WFH Options
Anson McCade
and product development, encompassing experience in both stream and batch processing. Designing and deploying production data pipelines, utilizing languages and frameworks such as Java, Python, Scala, Spark, and SQL. In addition, you should have proficiency or familiarity with: Scripting and data extraction via APIs, along with composing SQL queries. Integrating data …
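A small sketch of scripted data extraction via an HTTP API, as the listing mentions; the endpoint, parameters and response shape are all hypothetical.

```python
# API extraction sketch with requests (hypothetical endpoint and fields).
import requests

def fetch_records(page: int) -> list[dict]:
    # Pull one page of records from a hypothetical paginated JSON API.
    resp = requests.get(
        "https://api.example.com/v1/records",
        params={"page": page, "page_size": 100},
        timeout=30,
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing bad bodies
    return resp.json()["results"]

if __name__ == "__main__":
    rows = fetch_records(page=1)
    print(f"fetched {len(rows)} records")
```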
Bristol, England, United Kingdom Hybrid / WFH Options
Made Tech
and able to guide how one could deploy infrastructure into different environments. Knowledge of handling and transforming various data types (JSON, CSV, etc.) with Apache Spark, Databricks or Hadoop. Good understanding of possible architectures involved in modern data system design (Data Warehouse, Data Lakes, Data Meshes). Ability to …
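To illustrate the JSON/CSV handling skill above, a minimal PySpark transformation sketch; the input path, fields and filter condition are hypothetical.

```python
# PySpark sketch: transform JSON records and write CSV (hypothetical paths/fields).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("json-to-csv").getOrCreate()

# Read newline-delimited JSON with a hypothetical user_id/amount/status schema.
df = spark.read.json("/data/raw/payments/")

# Keep completed payments and total them per user.
totals = (
    df.filter(F.col("status") == "completed")
    .groupBy("user_id")
    .agg(F.sum("amount").alias("total_amount"))
)

totals.write.mode("overwrite").option("header", True).csv("/data/curated/payment_totals/")
```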
Manchester, England, United Kingdom Hybrid / WFH Options
Made Tech
and able to guide how one could deploy infrastructure into different environments. Knowledge of handling and transforming various data types (JSON, CSV, etc.) with Apache Spark, Databricks or Hadoop. Good understanding of possible architectures involved in modern data system design (Data Warehouse, Data Lakes, Data Meshes). Ability to …
Google Cloud Professional Cloud Architect or Professional Cloud Developer certification. Very desirable to have hands-on experience with ETL tools, Hadoop-based technologies (e.g., Spark), and batch/streaming data pipelines (e.g., Beam, Flink). Proven expertise in designing and constructing data lakes and data warehouse solutions utilising technologies …
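For the Beam/Flink pipeline point, a minimal Apache Beam batch sketch in Python; the input values are hypothetical, and the same pipeline shape would run on Dataflow or Flink by swapping the runner.

```python
# Apache Beam sketch: a small batch pipeline (hypothetical data; local DirectRunner).
import apache_beam as beam

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Create" >> beam.Create(["alpha", "beta", "gamma"])
        | "Lengths" >> beam.Map(lambda word: (word, len(word)))
        | "Print" >> beam.Map(print)
    )
```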
science, Information Technology, or a related field. Experience with containerisation and orchestration technologies (e.g., Docker, Kubernetes). Knowledge of big data technologies and frameworks (e.g., Hadoop, Spark). Familiarity with other cloud platforms (e.g. AWS, Google Cloud) and PaaS providers (e.g. Snowflake). Knowledge of the inner source or open source paradigm and way of …
Experience with data structures/algorithms, building Data Platforms, Data Lake and Business Intelligence solutions. Experience as a data engineer: implementing data pipelines (using PySpark, Spark SQL, Scala, etc.), orchestration tools/services (e.g. Airflow, Data Factory) and testing frameworks. 1+ years of experience in a technical leadership role. Experience in …
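A minimal Airflow orchestration sketch for the pipeline-plus-orchestration combination above; the DAG id, schedule and task body are hypothetical.

```python
# Airflow sketch: a one-task DAG (hypothetical DAG id, schedule and task; Airflow 2.4+).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Stand-in for a real extraction step (e.g. submitting a PySpark job).
    print("extracting...")

with DAG(
    dag_id="daily_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
```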
data lake/warehouse/hub built in GCP. You are confident using the full suite of Google data products, IaC, CI/CD, Spark and Kafka. Our core toolbox includes Google Cloud Big Data technologies, Scala, Java & Python, and Jenkins, amongst others. We value first-principles reasoning to select …
Engineering experience. Demonstrate in-depth knowledge of large-scale data platforms (Databricks, Snowflake) and cloud-native tools (Azure Synapse, Redshift). Experience of analytics technologies (Spark, Hadoop, Kafka). Familiarity with Data Lakehouse architecture, SQL Server, DataOps, and data lineage concepts. Demonstrate in-depth knowledge of large-scale data platforms …
ETL/ELT tools. Experience with NoSQL-type environments, Data Lakes and Lakehouses (Cassandra, MongoDB or Neptune). Experience with distributed storage and processing engines such as Apache Hadoop and Apache Spark. Experience with message brokering/stream processing services such as Apache Kafka, Confluent, Azure Stream Analytics. Experience in Test-Driven …
objectives. So each team leverages the technology that fits their needs best. You'll see us working with data processing/streaming frameworks like Apache Flink and Spark; database technologies like MySQL, PostgreSQL, DynamoDB and Redis; and breaking things using in-house chaos principles and tools such as … latency, near-real-time products: Java and Scala based Web Services, Databricks Data Lakes (Delta Lakes), AWS Kinesis and MSK, AWS ElasticSearch, AWS RDS, Apache Flink & Spark, scripting using Python, Terraform for infrastructure as code. The interview process: Our interview aims to take a relaxed & practical approach …
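To round out the Flink mention, a minimal PyFlink DataStream sketch; the in-memory source and event tuples are hypothetical stand-ins for a real Kafka-fed stream.

```python
# PyFlink sketch: a tiny DataStream job (hypothetical in-memory source).
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# In production this source would be Kafka or Kinesis; a collection suffices here.
events = env.from_collection([("checkout", 1), ("checkout", 1), ("refund", 1)])

# Running count per event type, printed as records flow through.
(
    events.key_by(lambda e: e[0])
    .reduce(lambda a, b: (a[0], a[1] + b[1]))
    .print()
)

env.execute("pyflink-sketch")
```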