Semester 1: Introduction to Data Engineering
This Professional Certificate is for anyone who wants to develop job-ready skills, tools, and a portfolio for an entry-level data engineer position. Throughout the self-paced online courses, you will immerse yourself in the role of a data engineer and acquire the essential skills you need to work with a range of tools and databases to design, deploy, and manage structured and unstructured data.
What you'll learn
—Create, design, and manage relational databases and apply database administration (DBA) concepts to RDBMSs such as MySQL, PostgreSQL, and IBM Db2.
—Implement ETL and data pipelines with Bash, Airflow, and Kafka; architect, populate, and deploy data warehouses; create BI reports and interactive dashboards.
Skills you'll gain
—Relational Database Management System (RDBMS)
—ETL & Data Pipelines
—NoSQL and Big Data
—Apache Spark
—SQL
—Data Science
—Database (DBMS)
—NoSQL
—Python Programming
—Data Analysis
—Pandas
—NumPy
CURRICULUM:
—What is Data Engineering?
—The Data Engineering Ecosystem
—Data Engineering Lifecycle
—Career Opportunities and Data Engineering in Action
Semester 2: Python Project for Data Engineering
Showcase your Python skills in this Data Engineering Project! This short course is designed to apply your basic Python skills through the implementation of various techniques for gathering and manipulating data.
You will take on the role of a Data Engineer by extracting data from multiple sources, converting it into specific formats, and preparing it for loading into a database for analysis. You will also demonstrate your knowledge of web scraping and of using APIs to extract data.
By the end of this hands-on project, you will have demonstrated proficiency in the skills needed to Extract, Transform, and Load (ETL) data using an IDE and, of course, Python.
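To give a feel for the kind of work involved, here is a minimal ETL sketch in Python. It is an illustration only, not the course's actual project: the sample data, the `people` table, and every function name are hypothetical, and an in-memory SQLite database stands in for the target database.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample inputs standing in for downloaded files or API responses.
CSV_TEXT = "name,height_in\nalex,65.0\nsam,70.5\n"
JSON_TEXT = '[{"name": "kim", "height_in": 68.0}]'


def extract():
    """Extract: read records from two differently formatted sources."""
    rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
    rows += json.loads(JSON_TEXT)
    return rows


def transform(rows):
    """Transform: capitalize names and convert inches to metres."""
    return [
        (row["name"].title(), round(float(row["height_in"]) * 0.0254, 2))
        for row in rows
    ]


def load(records, conn):
    """Load: write the cleaned records to a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS people (name TEXT, height_m REAL)")
    conn.executemany("INSERT INTO people VALUES (?, ?)", records)
    conn.commit()


def run_etl():
    """Run the three stages in order and return what ended up in the table."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite as the target
    load(transform(extract()), conn)
    return conn.execute(
        "SELECT name, height_m FROM people ORDER BY name"
    ).fetchall()
```

Calling `run_etl()` returns the loaded rows, so each stage can be checked in isolation before the pieces are combined.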
Upon completion of this course, you will also have a great new addition to your portfolio!
What you'll learn
—Demonstrate your skills in Python for working with and manipulating data
—Implement web scraping and use APIs to extract data with Python
—Play the role of a Data Engineer working on a real project to extract, transform, and load data
—Use Jupyter notebooks and IDEs to complete your project
Skills you'll gain
—Python Programming
—Information Engineering
—Extract Transform and Load (ETL)
—Data Engineer
—Web Scraping
CURRICULUM:
—Extract, Transform, Load (ETL)
—Final Project
Semester 3: Introduction to Relational Databases (RDBMS)
Are you ready to dive into the world of data engineering? In this beginner-level course, you will gain a solid understanding of how data is stored, processed, and accessed in relational databases (RDBMSes). You will work with different types of databases that are appropriate for various data processing requirements.
You will begin with an introduction to relational database concepts and to several industry-standard relational databases, including IBM Db2, MySQL, and PostgreSQL. Next, you’ll use professional RDBMS tools such as phpMyAdmin and pgAdmin to create and maintain relational databases. You will also use the command line and SQL statements to create and manage tables.
This course incorporates hands-on, practical exercises to help you demonstrate your learning. You will work with real databases and explore real-world datasets. You will create database instances and populate them with tables and data.
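As a rough illustration of creating and populating tables with SQL statements, the sketch below uses Python's built-in `sqlite3` module. SQLite is an assumption made so the example runs anywhere; it is not one of the databases named above, and the `authors`/`books` schema and function names are hypothetical.

```python
import sqlite3


def build_library_db():
    """Create two related tables and populate them with a few rows."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE authors (
            author_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE books (
            book_id   INTEGER PRIMARY KEY,
            title     TEXT NOT NULL,
            author_id INTEGER NOT NULL REFERENCES authors(author_id)
        );
        """
    )
    conn.execute("INSERT INTO authors (author_id, name) VALUES (1, 'Ada')")
    conn.executemany(
        "INSERT INTO books (title, author_id) VALUES (?, ?)",
        [("Notes", 1), ("Letters", 1)],
    )
    conn.commit()
    return conn


def titles_by_author(conn, name):
    """Join the two tables to list one author's titles alphabetically."""
    rows = conn.execute(
        """
        SELECT b.title
        FROM books AS b
        JOIN authors AS a ON a.author_id = b.author_id
        WHERE a.name = ?
        ORDER BY b.title
        """,
        (name,),
    ).fetchall()
    return [title for (title,) in rows]
```

The foreign key on `books.author_id` is what makes the two tables relational: inserting a book for an author that does not exist is rejected by the database rather than by application code.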
What you'll learn
—Describe data, databases, relational databases, and cloud databases.
—Describe information and data models, relational databases, and relational model concepts (including schemas and tables).
—Explain an Entity Relationship Diagram and design a relational database for a specific use case.
—Develop a working knowledge of popular DBMSes including MySQL, PostgreSQL, and IBM Db2.
Skills you'll gain
—Database (DB) Design
—PostgreSQL
—Relational Database Management System (RDBMS)
—Database Architecture
—MySQL
CURRICULUM:
—Relational Database Concepts
—Using Relational Databases
—MySQL and PostgreSQL
—Final Project and Assessment
Semester 4: Relational Database Administration (DBA)
Get started with Relational Database Administration and Database Management in this self-paced course!
This course begins with an introduction to database management, covering the Database Management Lifecycle, the roles of a Database Administrator (DBA), and database storage. You will then discover some of the activities, techniques, and best practices for managing a database.
You will also learn about database optimization, including updating statistics, identifying slow queries, types of indexes, and index creation and usage. You will learn about configuring and upgrading database server software and related products. You’ll also learn about database security: how to implement user authentication, assign roles, and assign object-level permissions. Finally, you will gain an understanding of how to perform backup and restore procedures in case of system failures.
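The effect of an index on a query can be sketched with Python's built-in `sqlite3` module. This is an assumption-laden illustration, not course material: SQLite stands in for a server RDBMS, the `orders` table and `idx_orders_customer` index are hypothetical, and the exact wording of the query plan differs between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as a single readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


def demo_index():
    """Compare the plan for the same query before and after indexing."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(i % 100, i * 1.5) for i in range(1000)],
    )
    conn.commit()

    sql = "SELECT total FROM orders WHERE customer_id = ?"
    before = query_plan(conn, sql, (42,))  # no index yet: full table scan

    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    conn.execute("ANALYZE")  # refresh the statistics the planner relies on
    after = query_plan(conn, sql, (42,))  # the planner can now use the index
    return before, after
```

Printing the two strings returned by `demo_index()` shows the plan changing from a scan of the whole table to a search that names the new index.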
What you'll learn
—Create, query, and configure databases and access and build system objects such as tables.
—Perform basic database management including backing up and restoring databases as well as managing user roles and permissions.
—Monitor and optimize important aspects of database performance.
—Troubleshoot database issues such as connectivity, login, and configuration, and automate functions such as reports, notifications, and alerts.
Skills you'll gain
—Database Security
—Database (DBMS)
—Database Servers
—Database Administration
—Relational Database
CURRICULUM:
—Introduction to Database Management
—Monitoring and Optimization
—Troubleshooting & Automation
Semester 5: ETL and Data Pipelines with Shell, Airflow and Kafka
Delve into the two different approaches to converting raw data into analytics-ready data. One approach is the Extract, Transform, Load (ETL) process. The other contrasting approach is the Extract, Load, and Transform (ELT) process. ETL processes apply to data warehouses and data marts. ELT processes apply to data lakes, where the data is transformed on demand by the requesting/calling application.
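The difference in ordering between the two approaches can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: plain lists stand in for the warehouse and the data lake, and the sample events and function names are hypothetical.

```python
# Hypothetical raw events as they might arrive from a source system.
RAW_EVENTS = [
    {"user": " Ana ", "amount": "12.50"},
    {"user": "bo", "amount": "7"},
]


def transform(event):
    """Clean one record: trim and lowercase the user, parse the amount."""
    return {
        "user": event["user"].strip().lower(),
        "amount": float(event["amount"]),
    }


def etl(events):
    """ETL: transform first, then load, so the store holds only clean rows."""
    warehouse = [transform(event) for event in events]
    return warehouse


def elt_load(events):
    """ELT, step 1: load the raw records unchanged into the lake."""
    return list(events)


def elt_query(lake):
    """ELT, step 2: transform on demand, when an application reads."""
    return [transform(event) for event in lake]
```

Both paths yield the same cleaned records. What differs is where the raw data lives and when the transformation cost is paid: once at load time for ETL, or at read time for ELT.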
In this course, you will learn about the different tools and techniques that are used with ETL and Data pipelines. Both ETL and ELT extract data from source systems, move the data through the data pipeline, and store the data in destination systems. During this course, you will experience how ELT and ETL processing differ and identify use cases for both. You will identify methods and tools used for extracting the data, merging extracted data either logically or physically, and for loading data into data repositories.
You will also define transformations to apply to source data to make the data credible, contextual, and accessible to data users. You will be able to outline several methods for loading data into the destination system, verifying data quality, monitoring load failures, and using recovery mechanisms in case of failure.
What you'll learn
—Describe and contrast Extract, Transform, Load (ETL) processes and Extract, Load, Transform (ELT) processes.
—Explain batch vs concurrent modes of execution.
—Implement ETL workflows using Bash and Python functions.
—Describe data pipeline components, processes, tools, and technologies.
Skills you'll gain
—Extract Transform and Load (ETL)
—Data Engineer
—Apache Kafka
—Apache Airflow
—Data Pipelines
CURRICULUM:
—Data Processing Techniques
—ETL & Data Pipelines: Tools and Techniques
—Building Data Pipelines using Airflow
—Building Streaming Pipelines using Kafka
Semester 6: Introduction to NoSQL Databases
Get started with NoSQL Databases with this beginner-friendly introductory course! This course will provide technical, hands-on knowledge of NoSQL databases and Database-as-a-Service (DaaS) offerings. With the advent of Big Data and agile development methodologies, NoSQL databases have gained a lot of relevance in the database landscape. Their main advantage is the ability to handle scalability and flexibility issues modern applications raise.
You will start this course by learning the history and the basics of NoSQL databases (document, key-value, column, and graph) and discover their key characteristics and benefits. You will learn about the four categories of NoSQL databases and how they differ. You’ll also explore the differences between the ACID and BASE consistency models, the pros and cons of distributed systems, and when to use RDBMS and NoSQL. You will also learn about vector databases, an emerging class of databases popular in AI.
Next, you will explore the architecture and features of several implementations of NoSQL databases, namely MongoDB, Cassandra, and IBM Cloudant. You will learn about the common tasks that they each perform and their key and defining characteristics.
What you'll learn
—Differentiate among the four main categories of NoSQL repositories.
—Describe the characteristics, features, benefits, limitations, and applications of the more popular Big Data processing tools.
—Perform common tasks using MongoDB, including create, read, update, and delete (CRUD) operations.
—Execute keyspace, table, and CRUD operations in Cassandra.
Skills you'll gain
—Cloud Database
—MongoDB
—Cassandra
—NoSQL
—Cloudant
CURRICULUM:
—Introducing NoSQL
—Introducing MongoDB: An Open-Source NoSQL Database
—Introducing Apache Cassandra: An Open-Source NoSQL Database
—Final Assignment: Working with NoSQL Database
EXTRA: BI Dashboards with IBM Cognos Analytics and Google Looker
CURRICULUM:
—IBM Cognos Analytics for Data Analysis and Visuals
—Data Visuals and Dashboards with Google Looker Studio
—Final Project and Course Wrap-Up