
📖 What is AWS Glue?

AWS Glue is a fully managed ETL (Extract, Transform, Load) service designed to discover, prepare, and integrate data for analytics. It provides a data catalog, automatically detects schemas, and generates ETL code, simplifying data preparation for data warehouses and analytics applications.

🥋 Sensei Says:

"The exam emphasizes Glue’s data catalog and its role in enabling other analytics services like Athena and Redshift Spectrum. Understand the difference between Glue crawlers and Glue jobs. Be prepared to identify scenarios where Glue is the optimal ETL solution."

📚 Certification: AWS Certified Cloud Practitioner (CLF-C02)

🔑 What are the Key Concepts of AWS Glue?

▸ AWS Glue Data Catalog is a central metadata repository storing schema information, making data discoverable and queryable by other AWS analytics services.
▸ Glue Crawlers automatically scan data sources and infer schemas, eliminating manual schema definition and simplifying data integration processes.
▸ Glue Jobs run ETL scripts (Python or Scala) that transform data; they can be scheduled, triggered by events, or started on demand, and they support both batch and streaming workloads.
▸ Glue integrates seamlessly with other AWS services like S3, Redshift, Athena, and EMR, providing a comprehensive data analytics pipeline.
▸ Glue supports both serverless Spark and Ray execution environments, offering flexibility and scalability for ETL workloads.
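
To make the Data Catalog idea concrete, here is a minimal sketch of reading table schemas back out of the catalog. It assumes a client that behaves like boto3's Glue client (`boto3.client("glue")`) and uses its `get_tables` call; the function name `list_catalog_columns` and the database name in the usage note are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def list_catalog_columns(glue: Any, database: str) -> Dict[str, List[Tuple[str, str]]]:
    """Return {table_name: [(column_name, column_type), ...]} for one catalog database.

    `glue` is expected to behave like boto3.client("glue"). It is passed in as a
    parameter so the function can be exercised without AWS credentials.
    """
    tables: Dict[str, List[Tuple[str, str]]] = {}
    kwargs: Dict[str, Any] = {"DatabaseName": database}
    while True:
        page = glue.get_tables(**kwargs)
        for table in page.get("TableList", []):
            columns = table.get("StorageDescriptor", {}).get("Columns", [])
            tables[table["Name"]] = [(c["Name"], c.get("Type", "")) for c in columns]
        # get_tables is paginated: keep going while a NextToken is returned.
        token = page.get("NextToken")
        if not token:
            return tables
        kwargs["NextToken"] = token
```

With real credentials this could be called as `list_catalog_columns(boto3.client("glue"), "sales_db")`, where `sales_db` is a placeholder database name. Services such as Athena read the same table definitions when they plan a query.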

🎯 How does AWS Glue appear on the CLF-C02 Exam?

You may be asked to identify the AWS service best suited for automatically discovering the schema of data stored in an S3 bucket before querying it with Athena.
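
As a rough illustration of that scenario, the sketch below builds the request payloads for the two steps: crawl an S3 prefix into a catalog database, then query the resulting table with Athena. The keyword names follow boto3's `create_crawler` and `start_query_execution` calls; the helper names, role ARN, bucket paths, and table name are all placeholders.

```python
from typing import Any, Dict


def crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> Dict[str, Any]:
    """Keyword arguments for glue.create_crawler: crawl one S3 prefix into a catalog database."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def athena_query_request(sql: str, database: str, output_s3: str) -> Dict[str, Any]:
    """Keyword arguments for athena.start_query_execution against a catalog database."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }


# Hypothetical usage (requires boto3 and AWS credentials):
#   glue = boto3.client("glue")
#   glue.create_crawler(**crawler_request(
#       "orders-crawler", "arn:aws:iam::123456789012:role/GlueCrawlerRole",
#       "sales_db", "s3://example-bucket/orders/"))
#   glue.start_crawler(Name="orders-crawler")
#   ...after the crawler finishes...
#   boto3.client("athena").start_query_execution(**athena_query_request(
#       "SELECT * FROM orders LIMIT 10", "sales_db", "s3://example-bucket/athena-results/"))
```

For the exam, the point is the division of labor: Glue discovers the schema and records it in the Data Catalog, and Athena uses that catalog entry to run SQL directly over the files in S3.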

A scenario might describe a company that needs to build a data lake and prepare data for analysis; you would need to determine which service provides both the ETL capabilities and the data catalog.

Expect questions about how Glue Crawlers can be used to maintain an up-to-date data catalog as new data is added to S3 buckets.
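
One way to keep the catalog current is to put the crawler on a schedule. The sketch below builds keyword arguments for boto3's `update_crawler` call, assuming the crawler already exists; the helper name and the 02:00 UTC default are illustrative choices, not recommendations from the source.

```python
from typing import Any, Dict


def scheduled_crawler_update(name: str, cron_fields: str = "0 2 * * ? *") -> Dict[str, Any]:
    """Keyword arguments for glue.update_crawler that add a schedule and a change policy.

    Glue schedules use six cron fields: minutes, hours, day-of-month, month,
    day-of-week, year. The default here means "every day at 02:00 UTC".
    """
    return {
        "Name": name,
        "Schedule": f"cron({cron_fields})",
        "SchemaChangePolicy": {
            # Update table definitions in the catalog when the crawler sees schema changes.
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            # Only log objects that have disappeared instead of deleting their tables.
            "DeleteBehavior": "LOG",
        },
    }


# Hypothetical usage: boto3.client("glue").update_crawler(**scheduled_crawler_update("orders-crawler"))
```

Each scheduled run rescans the S3 target, so new files and partitions added since the previous run show up in the catalog without manual schema edits.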

❓ Frequently Asked Questions

When would I choose Glue over other ETL tools like AWS Data Pipeline?

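Glue is serverless and scales automatically, so there is no infrastructure to provision, which suits variable or unpredictable workloads. AWS Data Pipeline was aimed at orchestrating scheduled workflows with explicit dependencies on resources you manage, such as EC2 instances or EMR clusters. AWS has since closed Data Pipeline to new customers, so Glue is the usual choice for new ETL work.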

What is the difference between a Glue Crawler and a Glue Job, and when do I use each?

Crawlers discover schemas and populate the Data Catalog. Jobs perform the actual data transformation using code. You typically run a Crawler before a Job so that the Job can read up-to-date table definitions from the Data Catalog.
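
The crawler-then-job ordering can be sketched as a small helper. It assumes a client that behaves like boto3's Glue client and uses its `start_crawler`, `get_crawler`, and `start_job_run` calls; the function name, polling interval, and timeout are assumptions made for illustration.

```python
import time
from typing import Any, Callable


def run_crawler_then_job(
    glue: Any,
    crawler_name: str,
    job_name: str,
    poll_seconds: float = 30.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start a crawler, wait until it reports READY again, then start a job.

    Returns the JobRunId of the started job. A READY state only means the crawler
    is idle; a production version would also inspect the last crawl's status.
    """
    glue.start_crawler(Name=crawler_name)
    for _ in range(max_polls):
        state = glue.get_crawler(Name=crawler_name)["Crawler"]["State"]
        if state == "READY":
            break
        sleep(poll_seconds)
    else:
        # The loop ran out of polls without seeing READY.
        raise TimeoutError(f"crawler {crawler_name!r} did not finish in time")
    return glue.start_job_run(JobName=job_name)["JobRunId"]
```

In practice the same ordering is often expressed with Glue triggers or workflows rather than a polling loop; the loop is used here only because it makes the dependency easy to see.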

Can Glue handle semi-structured and columnar data like JSON and Parquet?

Yes. Glue handles semi-structured formats such as JSON as well as columnar formats such as Parquet and ORC. Its crawlers can infer schemas from these formats automatically, which simplifies ETL processes.
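
To show what "inferring a schema" means in principle, here is a toy sketch that derives column names and coarse types from JSON lines. It is not Glue's classifier logic, only a simplified illustration; the function name and the type names it returns are assumptions chosen to resemble catalog column types.

```python
import json
from typing import Dict, Iterable


def infer_columns(json_lines: Iterable[str]) -> Dict[str, str]:
    """Toy schema inference: map each top-level JSON key to a coarse type name."""
    type_names = {bool: "boolean", int: "bigint", float: "double",
                  str: "string", list: "array", dict: "struct"}
    columns: Dict[str, str] = {}
    for line in json_lines:
        for key, value in json.loads(line).items():
            if value is None:
                continue  # nulls carry no type information
            seen = type_names.get(type(value), "string")
            previous = columns.setdefault(key, seen)
            if previous != seen:
                # Conflicting types across records: widen numerics, else fall back to string.
                columns[key] = "double" if {previous, seen} == {"bigint", "double"} else "string"
    return columns
```

A crawler does this kind of work at scale and across many files, then writes the result to the Data Catalog so that jobs and query engines do not need a hand-written schema.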
