Big Data Engineer
Creating actionable insights from the information it holds is a key business enabler for Sainsbury’s Argos, and Argos IT is pioneering platforms and solutions that deliver on this vision. Feeding insights gleaned from big data into the decision-making process can drive a retail business in promising new ways.
Creating a single view of what our customers want from us can streamline the customer journey and help ensure that products are available and dynamically priced, and that stock is optimally managed. Building and maintaining the platforms and tools that facilitate this is a challenging undertaking in a diverse, ever-evolving enterprise.
Big Data Engineers are expected to lead in some areas and are required to understand and, in many cases, contribute to the big picture of our big data journey.
Central to this is the establishment of a Hadoop cluster and an associated ecosystem of tools. The Hadoop cluster will provide a basis for a number of jobs, ranging from traditional batch to real-time analytics and machine learning.
What you’ll be doing
* Assisting in the build, maintenance, enhancement and support of big data platforms, including Hadoop and Kafka
* Working with service providers, such as providers of on-premises data centres, to ensure that platforms remain operational and performant, are sized for current and anticipated workloads, and are generally fit for purpose
* Knowledge transfer with third-party, offshore 24x7 support and ops teams
* Defining and adhering to cyber security guidelines for data at rest and in transit
* Involvement in security audits as appropriate
* Identifying data of value, working with data stewards within the business to help determine its format and veracity, cleanse and transform it, and make it usable for big data jobs
* Working to define and develop strategies for ingesting data into the big data pipeline from various sources within the business, utilising existing services and middleware in some cases
* Working within engineering to define and schedule jobs for big data analysis, optimising jobs within the cluster and ensuring that business users have access to the information they need at all times
* Developing big data applications and creating modules for data ingestion, transformation, and batch and real-time data streaming
* Working in an agile environment and accommodating changing requirements
* Working on integration projects across Sainsbury’s Argos that include information sharing and making information accessible in a secure way within a governance framework
* Apache Hadoop
* Cloudera Hadoop and its ecosystem - HDFS, YARN, HBase, Hive, Impala, Apache Kylin
* Distributed, in-memory big data technologies, such as Spark (2.x)
* Distributed, scalable message brokers, in particular Apache Kafka
* Experience of data ingestion in a big data environment, including third-party tooling, e.g. Talend, Pentaho
* Know-how in traditional ETL activities
* Knowledge of batch processing, MapReduce, and export into traditional data warehouses, SQL or NoSQL, such as MongoDB
* Appreciation of machine learning techniques and strategies, ML algorithms, training data sets, and supervised learning
* Knowledge of programming languages such as Java, Python, Scala
* Knowledge of resource managers in a microservices environment such as Mesos/Marathon
* Knowledge of microservice architectural design patterns
* Knowledge of cloud computing, specifically AWS
* Knowledge of DevOps practices – continuous development, Docker, Jenkins