Open Access March 22, 2025

Enhancing Scalability and Performance in Analytics Data Acquisition through Spark Parallelism

Abstract Data acquisition is a critical component of modern data architecture, and REST API integration is one of the most common approaches for sourcing external data. This study evaluates the efficiency of various methodologies for collecting data via REST APIs and benchmarks their performance. It explores how the Spark distributed computing platform can optimize large-scale REST API calls, improving scalability and processing speed to meet the demands of high-volume data workflows.
Review Article
Open Access November 24, 2022

Bridging Traditional ETL Pipelines with AI Enhanced Data Workflows: Foundations of Intelligent Automation in Data Engineering

Abstract Machine Learning (ML) and Artificial Intelligence (AI) are having an increasingly transformative impact on all industries and are already used in many mission-critical production use cases, bringing considerable value. Data engineering, which combines ETL pipelines with other workflows managing data and machine learning operations, is also significantly affected. The Intelligent Data Engineering and Automation framework offers the groundwork for intelligent automation processes. However, ML and AI are not the only disruptive forces; new Big Data technologies inspired by Web 2.0 companies are also reshaping the Internet. Companies with the largest Big Data footprints not only provide applications with a Big Data operational model but also derive their competitive advantage from data in the form of AI services and, consequently, affect the cost/performance equilibrium of ETL pipelines. These technologies and factors help explain why traditional ETL pipeline design should adapt to current and emerging technologies and may be enhanced through artificial intelligence.
Article

Query parameters

Keyword:  Data Workflow
