#AwsLakeFormation

Latest posts tagged with #AwsLakeFormation on Bluesky

Posts tagged #AwsLakeFormation

AWS Lake Formation enhances cross-account sharing

AWS Lake Formation now enhances cross-account sharing, allowing you to share hundreds of thousands of tables across accounts. You can centralize permissions in Lake Formation for resources such as catalogs, databases, and tables in multi-account analytics environments that require fine-grained access controls at scale. You can share Data Catalog resources (databases, tables, and columns) with external IAM principals, AWS accounts, AWS Organizations, and organizational units (OUs). Lake Formation sets up a single AWS Resource Access Manager resource share for an unlimited number of tables shared with another account, eliminating the previous per-resource-type limits on resource associations. To get started, upgrade to cross-account version 5 through the Lake Formation console or API. Any new cross-account permission grants will automatically use wildcard patterns in the AWS Resource Access Manager resource shares instead of individual resource associations. All existing cross-account shares continue to function, and all existing Lake Formation APIs remain compatible. To learn more, visit the AWS Lake Formation product page (https://aws.amazon.com/lake-formation/) and documentation (https://docs.aws.amazon.com/lake-formation/latest/dg/cross-account-permissions.html). For AWS Lake Formation Region availability, see the AWS Region table (https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/).

#AWS #AwsLakeFormation

AWS Lake Formation is now available in Asia Pacific (New Zealand) Region

AWS Lake Formation is now available in the Asia Pacific (New Zealand) Region, enabling you to centrally manage and scale fine-grained data access permissions and share data securely within and outside your organization. AWS Lake Formation (https://aws.amazon.com/lake-formation/) is a service that allows you to define where your data resides and what data access and security policies you want to apply. Your users can then access the centralized AWS Glue Data Catalog, which describes available data sets and their appropriate usage, and use these data sets with their choice of analytics and machine learning services, such as Amazon EMR for Apache Spark, Amazon Redshift, AWS Glue, Amazon QuickSight, and Amazon Athena. To learn more about Lake Formation, visit the documentation (https://docs.aws.amazon.com/lake-formation/). For AWS Lake Formation Region availability, see the AWS Region table (https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/).

#AWS #AwsLakeFormation

🆕 AWS Lake Formation is now available in Asia Pacific (New Zealand), enabling centralized data access management and secure data sharing. Users can leverage AWS Glue Data Catalog with analytics services like Amazon EMR, Redshift, Glue, QuickSight, and Athena.

#AWS #AwsLakeFormation

The Amazon SageMaker lakehouse architecture now supports tag-based access control for federated catalogs

The Amazon SageMaker lakehouse architecture now supports tag-based access control (TBAC) for managing fine-grained data access across federated catalogs. This capability, previously available only for default AWS Glue Data Catalog resources, is now available across Amazon S3 Tables, Amazon Redshift data warehouses, and federated data sources including Amazon DynamoDB, PostgreSQL, and SQL Server. TBAC simplifies permission management by logically grouping catalog resources using tags, scales permissions across datasets with a minimal set of grants, and facilitates data sharing across accounts. It replaces direct resource-level permissions with tag-based grants: instead of manually assigning permissions to individual tables or columns, administrators control access through tags that resources inherit automatically. Because of this inheritance, new tables automatically receive appropriate fine-grained access controls without additional policy modifications. You can get started with TBAC through the AWS Lake Formation console. Create tags using key-value pairs, associate them with databases, tables, or columns, and grant permissions to principals based on specific tags. Users can then access tagged resources through Amazon Athena, Amazon Redshift, Amazon EMR, or Amazon SageMaker Unified Studio. This feature is available through the AWS Management Console, AWS CLI, and AWS SDKs in all commercial AWS Regions (https://docs.aws.amazon.com/lake-formation/latest/dg/supported-regions.html). To learn more, read the blog (https://aws.amazon.com/blogs/big-data/the-amazon-sagemaker-lakehouse-architecture-now-supports-tag-based-access-control-for-federated-catalogs/) and visit the Lake Formation Tags documentation (https://docs.aws.amazon.com/lake-formation/latest/dg/tag-based-access-control.html).

#AWS #AwsLakeFormation
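
The tag inheritance and tag-based grants described in the post above can be illustrated with a small toy model. This is only a sketch of the idea, not the Lake Formation API: the `Resource` class, the `is_allowed` function, and all tag keys and values are hypothetical names made up for the example. It assumes that a child resource inherits its parent's tags unless it overrides a key, and that a grant matches when every tag key in the grant has one of the listed values on the resource.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Resource:
    """A catalog resource (database, table, or column) in a toy model."""
    name: str
    parent: Optional["Resource"] = None
    own_tags: Dict[str, str] = field(default_factory=dict)

    def effective_tags(self) -> Dict[str, str]:
        # Start from the parent's tags, then let this resource's own
        # assignments override inherited values for the same key.
        tags = self.parent.effective_tags() if self.parent else {}
        return {**tags, **self.own_tags}


def is_allowed(expression: Dict[str, Set[str]], resource: Resource) -> bool:
    """True if every key in the expression matches one of its allowed values."""
    tags = resource.effective_tags()
    return all(tags.get(key) in values for key, values in expression.items())


# Hypothetical catalog: a database, a table, and one sensitive column.
sales_db = Resource("sales_db", own_tags={"domain": "sales", "sensitivity": "public"})
orders = Resource("orders", parent=sales_db)
card_number = Resource("card_number", parent=orders,
                       own_tags={"sensitivity": "restricted"})

# Hypothetical grant: sales data that is public or internal.
analyst_grant = {"domain": {"sales"}, "sensitivity": {"public", "internal"}}

print(is_allowed(analyst_grant, orders))       # True: tags inherited from sales_db
print(is_allowed(analyst_grant, card_number))  # False: column overrides sensitivity
```

In this model a table added later under `sales_db` would match the same grant without any new permission being written, which is the scaling benefit the announcement describes.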

🆕 Amazon SageMaker lakehouse adds tag-based access control for federated catalogs, extending data access management to Amazon S3 Tables, Redshift, DynamoDB, and other federated sources. It simplifies permission scaling and cross-account data sharing, and is available via the AWS Lake Formation console.

#AWS #AwsLakeFormation

AWS Glue enables enhanced Apache Spark capabilities for AWS Lake Formation tables with full table access

AWS Glue now supports read and write operations from AWS Glue 5.0 Apache Spark jobs on AWS Lake Formation registered tables when the job role has full table access. This capability enables data definition and data manipulation statements, including CREATE, ALTER, DELETE, UPDATE, and MERGE INTO, on Apache Hive and Iceberg tables from within the same Apache Spark application. While Lake Formation's fine-grained access control (FGAC) offers granular security controls at row, column, and cell levels, many ETL workloads simply need full table access. This new feature enables AWS Glue 5.0 Spark jobs to directly read and write data when full table access is granted, removing limitations that previously restricted certain extract, transform, and load (ETL) operations. You can now use advanced Spark capabilities, including Resilient Distributed Datasets (RDDs), custom libraries, and user-defined functions (UDFs), with Lake Formation tables. Additionally, data teams can run complex, interactive Spark applications through SageMaker Unified Studio in compatibility mode while maintaining Lake Formation's table-level security boundaries. This feature is available in all AWS Regions where AWS Glue and AWS Lake Formation are supported. To learn more, visit the AWS Glue product page (https://aws.amazon.com/glue/) and documentation (https://docs.aws.amazon.com/glue/latest/dg/security-access-control-fta.html).

#AWS #AwsGlue #AwsLakeFormation

🆕 AWS Glue now supports full table access for Spark jobs on AWS Lake Formation tables, enabling DML operations and advanced Spark features, simplifying ETL workloads and enhancing data manipulation capabilities. Available in all supported regions.

#AWS #AwsGlue #AwsLakeFormation

Announcing fine-grained access control via AWS Lake Formation with EMR on EKS

We are excited to announce the general availability of fine-grained data access control (FGAC) via AWS Lake Formation for Apache Spark with Amazon EMR on EKS. This enables you to enforce full FGAC policies (database, table, column, row, and cell-level) defined in Lake Formation for your data lake tables from EMR on EKS Spark jobs. We are also announcing the general availability of Glue Data Catalog views with EMR on EKS for Spark workflows. Lake Formation simplifies building, securing, and managing data lakes by allowing you to define fine-grained access controls through grant and revoke statements, similar to an RDBMS. The same Lake Formation rules now apply to Spark jobs on EMR on EKS for Hudi, Delta Lake, and Iceberg table formats, further simplifying data lake security and governance. AWS Glue Data Catalog views with EMR on EKS allow customers to create views from Spark jobs that can be queried from multiple engines without requiring access to the referenced tables. Administrators can control underlying data access using the rich SQL dialect provided by EMR on EKS Spark jobs. Access is managed with AWS Lake Formation permissions, including named resource grants, data filters, and Lake Formation tags. All requests are logged in AWS CloudTrail. Fine-grained access control for Apache Spark batch jobs on EMR on EKS is available with the EMR 7.7 release in all Regions where EMR on EKS is available. To get started, see Using AWS Lake Formation with Amazon EMR on EKS (https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/security_iam_fgac-lf.html).

#AWS #AwsGovcloudUs #AwsLakeFormation #AmazonEmr

Amazon SageMaker Lakehouse integrated access controls now available in Amazon Athena federated queries

Connect, discover, and govern data across silos with Amazon SageMaker Lakehouse's new data catalog and permissions capabilities, enabling centralized access and fine-grained controls.

#AWS #AmazonAthena #Announcements #AwsLakeFormation #Featured #Launch #News

Amazon SageMaker Lakehouse integrated access controls now available in Amazon Athena federated queries

Amazon SageMaker now supports connectivity, discovery, querying, and enforcement of fine-grained data access controls on federated sources when querying data with Amazon Athena. Athena is a query service that makes it simple to analyze your data lake and federated data sources such as Amazon Redshift, Amazon DynamoDB, or Snowflake using SQL, without extract, transform, and load (ETL) scripts. Now, data workers can connect to and unify these data sources within SageMaker Lakehouse. Federated source metadata is unified in SageMaker Lakehouse, where you apply fine-grained policies in one place, helping to streamline analytics workflows and secure your data. Log in to Amazon SageMaker Unified Studio, connect to a federated data source in SageMaker Lakehouse, and govern data with column- and tag-based permissions that are enforced when querying federated data sources with Athena. In addition to SageMaker Unified Studio, you can connect to these data sources through the Athena console and API. To help you automate and streamline connector setup, the new user experiences let you create and manage connections to data sources. Organizations can now extract insights from a unified set of data sources while strengthening their security posture, wherever their data is stored. The unification and fine-grained access controls on federated sources are available in all AWS Regions where SageMaker Lakehouse is available. To learn more, visit the SageMaker Lakehouse documentation (https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/lakehouse.html).

#AWS #AmazonAthena #AwsLakeFormation #AmazonSagemaker

AWS Lake Formation now supports named LF-Tag expressions

Today, AWS announces the general availability of named LF-Tag expressions in AWS Lake Formation (https://aws.amazon.com/lake-formation/). With this launch, customers can create and manage named combinations of LF-Tags, and can write permission expressions that better represent complex business requirements. Customers use LF-Tags to create complex data grants based on attributes, and want to manage the combinations of LF-Tags they use. Now, when customers want to grant the same combination of LF-Tags to multiple users, they can create a named LF-Tag expression and grant that expression to multiple users rather than providing the full expression for every grant. Additionally, when a customer's LF-Tag ontology changes, for example because business requirements change, the customer can update a single expression instead of every permission that used the changed LF-Tags. Named LF-Tag expressions are generally available in commercial AWS Regions where AWS Lake Formation is available and in the AWS GovCloud (US) Regions. To get started with this feature, visit the AWS Lake Formation documentation (https://docs.aws.amazon.com/lake-formation/latest/dg/managing-tag-expressions.html).

#AWS #AwsGovcloudUs #AwsLakeFormation
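
The benefit of naming an expression, as described in the post above, can be sketched with plain dictionaries. This is a conceptual illustration only and does not call Lake Formation: the expression name `sales-readers`, the principals, the tag keys, and the `can_access` helper are all hypothetical. It assumes a grant stores only the expression's name, so editing the expression once affects every grant that refers to it.

```python
from typing import Dict, Set

Expression = Dict[str, Set[str]]

# Hypothetical named expression: one definition, referenced by name.
named_expressions: Dict[str, Expression] = {
    "sales-readers": {"domain": {"sales"}, "sensitivity": {"public"}},
}

# Grants store only the expression name, so many principals share one definition.
grants: Dict[str, str] = {"alice": "sales-readers", "bob": "sales-readers"}


def can_access(principal: str, resource_tags: Dict[str, str]) -> bool:
    """Check a resource's tags against the expression named in the grant."""
    expression = named_expressions[grants[principal]]
    return all(resource_tags.get(k) in v for k, v in expression.items())


internal_table = {"domain": "sales", "sensitivity": "internal"}
before = [can_access(p, internal_table) for p in ("alice", "bob")]

# One update to the named expression changes the outcome for every grant using it.
named_expressions["sales-readers"]["sensitivity"].add("internal")
after = [can_access(p, internal_table) for p in ("alice", "bob")]

print(before, after)  # [False, False] [True, True]
```

Without the named expression, the same change would mean rewriting the full tag combination in each principal's grant.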

AWS Glue Data Catalog now supports Apache Iceberg automatic table optimization through Amazon VPC

AWS Glue Data Catalog (https://docs.aws.amazon.com/glue/latest/dg/catalog-and-crawler.html) now supports automatic optimization of Apache Iceberg tables that can only be accessed from a specific Amazon Virtual Private Cloud (VPC) environment. You can enable automatic optimization by providing a VPC configuration to optimize storage and improve query performance while keeping your tables secure. AWS Glue Data Catalog supports compaction, snapshot retention, and unreferenced file management, which help you reduce metadata overhead, control storage costs, and improve query performance. Customers whose governance and security configurations require an Amazon S3 bucket to reside within a specific VPC can now use it with the Glue Data Catalog. This gives you broader capabilities for automatic management of your Apache Iceberg data, regardless of where it is stored on Amazon S3. Automatic optimization for Iceberg tables through Amazon VPC is available in 13 AWS Regions: US East (N. Virginia, Ohio), US West (Oregon), Europe (Ireland, London, Frankfurt, Stockholm), Asia Pacific (Tokyo, Seoul, Mumbai, Singapore, Sydney), and South America (São Paulo). Customers can enable this through the AWS Console, AWS CLI, or AWS SDKs. To get started, provide the Glue network connection as an additional configuration along with optimization settings such as the default retention period and the number of days to keep unreferenced files. The AWS Glue Data Catalog will use the VPC information in the Glue connection to access Amazon S3 buckets and optimize Apache Iceberg tables. To learn more, read the blog (https://aws.amazon.com/blogs/big-data/aws-glue-data-catalog-supports-automatic-optimization-of-apache-iceberg-tables-through-your-amazon-vpc/) and visit the AWS Glue Data Catalog documentation (https://docs.aws.amazon.com/glue/latest/dg/table-optimizers.html).

#AWS #AwsLakeFormation #AwsGlue

AWS Lake Formation is now available in the Asia Pacific (Malaysia) Region

AWS Lake Formation (https://aws.amazon.com/lake-formation/) is a service that allows you to set up a secure data lake in days. A data lake is a centralized, curated, and secured repository that stores your data, both in its original form and prepared for analysis. A data lake enables you to break down data silos and combine different types of analytics to gain insights and guide better business decisions. Creating a data lake with Lake Formation allows you to define where your data resides and what data access and security policies you want to apply. Your users can then access the centralized AWS Glue Data Catalog, which describes available data sets and their appropriate usage, and use these data sets with their choice of analytics and machine learning services, such as Amazon EMR for Apache Spark, Amazon Redshift Spectrum, AWS Glue, Amazon QuickSight, and Amazon Athena. For a list of Regions where AWS Lake Formation is available, see the AWS Region Table (https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/).

#AWS #AwsLakeFormation

AWS Glue Data Catalog now supports scheduled generation of column level statistics

AWS Glue Data Catalog (https://docs.aws.amazon.com/glue/latest/dg/catalog-and-crawler.html) now supports the scheduled generation of column-level statistics for Apache Iceberg tables and file formats such as Parquet, JSON, CSV, XML, ORC, and ION. With this launch, you can simplify and automate the generation of statistics by creating a recurring schedule in the Glue Data Catalog. These statistics are integrated with the cost-based optimizer (CBO) from Amazon Redshift Spectrum and Amazon Athena, resulting in improved query performance and potential cost savings. Previously, to set up a recurring statistics generation schedule, you had to call AWS services using a combination of AWS Lambda and Amazon EventBridge Scheduler. With this new feature, you can provide the recurring schedule as an additional configuration to the Glue Data Catalog, along with the sampling percentage. For each scheduled run, the number of distinct values (NDV) is collected for Apache Iceberg tables, and additional statistics such as the number of nulls, maximum, minimum, and average length are collected for other file formats. As the statistics are updated, Amazon Redshift and Amazon Athena use them to optimize queries, applying optimizations such as optimal join order or cost-based aggregation pushdown. You have visibility into the status and timing of each statistics generation run, as well as the updated statistics values. To get started, schedule statistics generation using the AWS Glue Data Catalog console or AWS Glue APIs. Support for scheduled generation of AWS Glue Data Catalog statistics is generally available in all Regions where Amazon EventBridge Scheduler is available. Visit the AWS Glue Data Catalog documentation (https://docs.aws.amazon.com/glue/latest/dg/column-statistics.html) to learn more.

#AWS #AwsGlue #AwsLakeFormation
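
To make the statistics named in the post above concrete, here is a minimal sketch that computes them for one column held in memory. It is an illustration under simple assumptions, not how AWS Glue computes or stores statistics: the `column_statistics` function and its result keys are hypothetical, the distinct count is exact rather than estimated, and average length is reported only for string columns.

```python
from typing import Any, Dict, List, Optional


def column_statistics(values: List[Optional[Any]]) -> Dict[str, Any]:
    """Compute simple column-level statistics over a list of values."""
    non_null = [v for v in values if v is not None]
    stats: Dict[str, Any] = {
        "distinct_values": len(set(non_null)),   # exact NDV for this sample
        "nulls": len(values) - len(non_null),
        "minimum": min(non_null) if non_null else None,
        "maximum": max(non_null) if non_null else None,
    }
    # Average length only makes sense for string columns.
    if non_null and all(isinstance(v, str) for v in non_null):
        stats["average_length"] = sum(len(v) for v in non_null) / len(non_null)
    return stats


# Hypothetical sample of a string column with one null.
region_codes = ["nz", "my", None, "nz", "usa"]
print(column_statistics(region_codes))
```

A query planner can use numbers like these to estimate how many rows a filter or join will produce, which is what makes choices such as join order cost-based.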
