Abstract
Data acquisition serves as a critical component of modern data architecture, with REST API integration emerging as one of the most common approaches for sourcing external data. This study evaluates the efficiency of various methodologies for collecting data via REST APIs and benchmark their performance. It explores how leveraging the Spark distributed computing platform can optimize large scale [...] Read more.
Data acquisition serves as a critical component of modern data architecture, with REST API integration emerging as one of the most common approaches for sourcing external data. This study evaluates the efficiency of various methodologies for collecting data via REST APIs and benchmark their performance. It explores how leveraging the Spark distributed computing platform can optimize large scale REST API calls, enabling enhanced scalability and improved processing speeds to meet the demands of high volume data workflows.