Amazon Athena is an interactive query service that lets data analysts use standard SQL to query data held in objects stored in Amazon S3. It can be accessed through the AWS Management Console, a JDBC driver, an ODBC driver, or an API.
The service eliminates the need for complex ETL tasks to prepare data for analysis, making it easy for anyone with SQL skills to quickly analyse large-scale datasets. The user only needs to point to the data in S3, define the schema, and start running SQL queries. Athena can process logs, run interactive queries, and perform ad-hoc analysis. It can also process structured, semi-structured, and unstructured data sets.
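As a rough illustration of the "point to the data, define the schema" step, the sketch below builds a CREATE EXTERNAL TABLE statement as a plain string. The helper name build_create_table_ddl, the table web_logs, its columns, and the bucket s3://example-bucket/logs/ are all hypothetical, and the statement assumes the data is stored as Parquet; other formats need a different storage clause.

```python
def build_create_table_ddl(table, columns, s3_location, partitions=None):
    """Return a CREATE EXTERNAL TABLE statement for Parquet data in S3.

    columns and partitions are lists of (name, sql_type) pairs.
    All names passed in here are illustrative, not taken from a real dataset.
    """
    cols = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns)
    ddl = f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
    if partitions:
        parts = ", ".join(f"`{name}` {sql_type}" for name, sql_type in partitions)
        ddl += f"PARTITIONED BY ({parts})\n"
    ddl += "STORED AS PARQUET\n"  # assumption: the objects are Parquet files
    ddl += f"LOCATION '{s3_location}'"
    return ddl


# Hypothetical table over log files kept under one S3 prefix.
ddl = build_create_table_ddl(
    table="web_logs",
    columns=[("request_time", "timestamp"), ("status", "int"), ("url", "string")],
    s3_location="s3://example-bucket/logs/",
    partitions=[("dt", "string")],
)
print(ddl)
```

Once a statement like this has been run, the table can be queried with ordinary SELECT statements; no data is copied out of S3.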
Benefits of Amazon Athena
Ease of use
Athena uses Presto, an SQL query engine designed to run interactive analytic queries against data sources of all sizes. It also supports a range of data formats including Parquet, JSON, Avro, CSV, and ORC. This allows Athena to run quick ad-hoc analysis as well as more complex requests involving window functions, nested queries, large joins, and arrays.
Athena is serverless. This means that the user can quickly query data without having to configure or manage any infrastructure. The user also doesn’t have to worry about failures, software updates, or scaling servers or data warehouses as the datasets and number of users grow.
The interactive query service allows data analysts to tap into their data in Amazon S3 without having to create processes to extract, transform and load the data.
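For readers who want to use the API rather than the console, here is a minimal sketch of submitting a query and waiting for it to finish. It assumes a client object that behaves like boto3's Athena client (start_query_execution and get_query_execution); the helper name run_query and the polling interval are illustrative choices, not part of the service.

```python
import time

# States after which a query will not change any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a query and block until it reaches a terminal state.

    client is expected to behave like boto3.client("athena").
    Returns a (state, bytes_scanned) tuple.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    while True:
        response = client.get_query_execution(QueryExecutionId=query_id)
        execution = response["QueryExecution"]
        state = execution["Status"]["State"]
        if state in TERMINAL_STATES:
            scanned = execution.get("Statistics", {}).get("DataScannedInBytes", 0)
            return state, scanned
        time.sleep(poll_seconds)  # simple fixed-interval polling
```

With boto3 installed and AWS credentials configured, a call might look like run_query(boto3.client("athena"), "SELECT status, count(*) FROM web_logs GROUP BY status", "default", "s3://example-bucket/athena-results/"), where the table and bucket names are placeholders. The returned byte count is the figure that per-query billing is based on.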
Pay per query
With Athena, users pay only for the queries they run, based on the amount of data each query scans. Users can significantly reduce their charges by partitioning, compressing, and converting their data into columnar formats, all of which cut the amount of data a query has to read. Athena itself adds no storage charges, since the queries run directly against data in S3.
It’s worth mentioning that failed queries are not charged. However, if the query is cancelled manually, the user will be charged for the amount of data scanned before the query was cancelled.
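To make the pricing model concrete, the sketch below estimates the charge for a single query from the number of bytes it scans. The rate of 5.00 USD per terabyte, the 10 MB per-query minimum, and the use of binary units are assumptions for illustration; check the current Athena pricing page for your region. The tenfold reduction used in the example is likewise hypothetical.

```python
TB = 1024 ** 4  # assumption: binary terabyte
MB = 1024 ** 2


def estimate_query_cost(bytes_scanned, price_per_tb=5.00, min_bytes=10 * MB):
    """Estimate the cost in USD of one successful query.

    price_per_tb and min_bytes are assumed defaults, not authoritative values.
    """
    billable = max(bytes_scanned, min_bytes)  # apply the per-query minimum
    return billable / TB * price_per_tb


# Hypothetical comparison: the same query over raw CSV versus a compressed,
# columnar copy where only a tenth of the bytes has to be read.
raw_scan = 1 * TB
columnar_scan = raw_scan // 10

print(f"raw CSV:  ${estimate_query_cost(raw_scan):.2f}")
print(f"columnar: ${estimate_query_cost(columnar_scan):.2f}")
```

Because the charge scales linearly with bytes scanned, any change that lets a query skip data (partition pruning, compression, reading only the needed columns) lowers the bill in the same proportion.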
This interactive query tool is optimized for fast performance with S3. It can perform queries in parallel, allowing users to get results within seconds.
Amazon Athena integrates with a variety of tools, including AWS Glue, Amazon QuickSight, and AWS Key Management Service (KMS). Integrating Athena with Glue gives users access to the Glue Data Catalog, allowing them to create a unified metadata repository across different services. Integrating the service with Amazon QuickSight allows data analysts to easily visualize data stored in S3.
When to Use Athena Versus Other Big Data Services
Amazon’s query, data processing, and data warehouse services all address different needs. Athena, for instance, is a great tool for running ad-hoc queries on smaller datasets. It’s also a good choice for companies with complex types of data – structs, maps, and arrays.
Amazon Redshift, an Amazon data warehouse, provides fast query performance for workloads that involve complex SQL with subqueries and multiple joins. Amazon Elastic MapReduce (EMR), a sophisticated data processing service, is a cost-effective option for running highly distributed frameworks such as Presto, Spark, and Hadoop when compared to on-premises software.