Improving Data Quality and Lineage in Regulated Financial Data Platforms

September 28, 2020
November 21, 2020
December 20, 2020
December 27, 2020

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Abstract

Data quality and data lineage are critical concerns for organizations mandated to comply with stringent regulatory regimes. This paper analyses the latest developments in the governance of data quality and data lineage within a regulated financial services organization. It sets out the underlying regulatory context, describes the concepts employed in the business environment, summarizes how data quality is captured and monitored, examines the artefacts that record data lineage, reviews the roles and responsibilities of staff who implement the necessary processes, and maps areas where improvements are possible. The internal organization and processes of regulated data platforms are shaped not only by the capabilities prescribed by their technical architecture but also by the regulatory regimes under which they operate. These mandates, in particular, require rigorous examination of four aspects of data quality (accuracy, completeness, consistency, and timeliness) and detailed documentation of how data arrives in its final form in the repository. Although data monitoring, alerting, assessment, and remediation are well established, provenance capture remains an area ripe for further investment.
1. Introduction
Data platforms supporting regulated activities are subject to significant external scrutiny, and it is imperative that the data they produce is accurate, complete, consistent, timely, auditable, and trustworthy. Improving data quality is therefore a pivotal endeavour for organizations that have data lineage and quality issues. While these issues are fundamentally problem domain‐specific, three points are common across all solution approaches:
- Data quality frameworks must support the data management requirements imposed by external regulators;
- A data quality framework supported by the business ownership of data quality‐related components offers the greatest chance of success;
- Metrics and alerting considerations usually determine the overall effectiveness of the data quality solution.
These points apply to many regulated environments, although they have been validated in the context of financial services. In many regulated environments, the data monitoring needs of the organization can be distilled into four major areas: Governance, Accuracy, Lineage, and Monitoring (GALM) [1].
1.1. Overview of Data Governance in Financial Services
The data management quality, governance, and lineage processes of regulated financial platforms are guided by stringent requirements to ensure the data are correct and properly stewarded throughout their lifecycle. Regulatory frameworks imposed by local government agencies or central banks, as well as self-imposed data management policies, prescribe a solid and formalized data organization approach within the financial institution. To achieve the required level of quality, the institution's data must be guaranteed both internally and externally. Internally, quality and completeness (or, in certain cases, accuracy) must be accredited on a timely basis before a source becomes a formally certified data source. Certified sources are not merely conduits through which data flows in and out; they may also serve as the foundation for the correct execution of models or simulations (e.g., stress tests) run by the institution. Externally, a set of accepted data sources (authoritative sources) is established as inputs to different calculation processes, and data must be guaranteed against technical issues and fallbacks [2].
To fulfill these needs, clearly defined data responsibilities and the creation and monitoring of data quality metrics that specifically address business and regulatory requirements are fundamental to enforcing compliance. A clear articulation of the data custodians, and of the technical and business lines of defense against poor data quality, is also required. Guarantees, designs, and role definitions should come together in a solid data lineage concept, with technical lineage providing the automation and support for the business-oriented view. Automation in defining data quality also enhances execution and maintainability; incorporating data quality indicators into the same design and technical implementation enriches policies and simplifies compliance monitoring [3].
2. Regulatory Context and Data Governance Requirements
Financial data platforms must comply with formal industry-specific legislation, which acts as the primary catalyst for data governance processes and resources. Financial institutions face many different obligations related to data management. The Basel Committee on Banking Supervision (BCBS) has issued the BCBS 239 principles for effective risk data aggregation and risk reporting, and the Financial Stability Board (FSB) has defined key attributes of effective resolution regimes. The requirements articulated by the United States Commodity Futures Trading Commission (CFTC) and the European Market Infrastructure Regulation (EMIR) encompass a broad spectrum of data attributes, including the management of data reference attributes. The Financial Industry Regulatory Authority (FINRA) and the Financial Conduct Authority (FCA, formerly the Financial Services Authority), responsible for the regulation of investment banking and capital markets in the UK, have also published data management requirements [4].
In addition to these formal obligations, financial institutions increasingly recognize the importance of effective data management in order to satisfy key stakeholders, including senior management and the board of directors, as well as the wider investment community. Investors are devoting greater levels of resources to due diligence and increasingly focusing on the quality of data and its fitness for purpose. Special emphasis is placed on the data quality framework implemented by investment firms, as inaccuracies in performance and expense data driven by a lack of appropriate processes and resources raise significant concerns regarding the management and control of funds [5].
Equation 1) Step-by-step derivations of the key equations

A. Completeness (field-level → record-level → dataset-level)

Step 1: Define the dataset.
- Let the dataset have $N$ records (rows), indexed by $i = 1, \dots, N$.
- Let there be $M$ required attributes, indexed by $j = 1, \dots, M$.

Step 2: Define a "present and acceptable" indicator:
$$c_{ij} = \begin{cases} 1, & \text{if attribute } j \text{ of record } i \text{ is present and acceptable} \\ 0, & \text{otherwise.} \end{cases}$$

Step 3: Field-level completeness for attribute $j$, counting how many records have the field present:
$$C_j = \frac{1}{N} \sum_{i=1}^{N} c_{ij}$$

(Record-level completeness follows analogously by averaging within a record: $R_i = \frac{1}{M} \sum_{j=1}^{M} c_{ij}$.)

Step 4: Dataset-level completeness, averaging across required fields:
$$C = \frac{1}{M} \sum_{j=1}^{M} C_j$$
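To make the derivation concrete, the following is a minimal Python sketch of the completeness calculation; the record layout, field names, and the present-and-acceptable rule are illustrative assumptions rather than prescribed definitions.

```python
from typing import Any

def is_present(value: Any) -> int:
    """Indicator c_ij: 1 if the field is present and acceptable, else 0.
    Here 'acceptable' simply means not None and not an empty string;
    a real platform would apply attribute-specific validation rules."""
    return 0 if value is None or value == "" else 1

def completeness(records: list, required_fields: list) -> dict:
    n = len(records)
    # Field-level completeness C_j = (1/N) * sum_i c_ij
    per_field = {
        f: sum(is_present(r.get(f)) for r in records) / n
        for f in required_fields
    }
    # Dataset-level completeness C = (1/M) * sum_j C_j
    dataset = sum(per_field.values()) / len(required_fields)
    return {"per_field": per_field, "dataset": dataset}

records = [
    {"trade_id": "T1", "notional": 1_000_000, "currency": "USD"},
    {"trade_id": "T2", "notional": None, "currency": "EUR"},
]
print(completeness(records, ["trade_id", "notional", "currency"]))
# per_field: trade_id=1.0, notional=0.5, currency=1.0 -> dataset ≈ 0.833
```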
2.1. Regulatory Frameworks for Data Management Compliance
A review of the established data management literature reveals that few regulatory frameworks address data management directly. Three frameworks stand out: the Sarbanes-Oxley Act, the Basel Accord, and the General Data Protection Regulation. Together, they provide the foundation for data governance in regulated financial environments. The Sarbanes-Oxley Act was created in response to the wave of corporate financial reporting scandals that swept the United States in the early 2000s. Among its provisions, the Act mandates that corporate officers be held accountable for their company's financial statements. Recognizing that accurate financial statements depend on accurate underlying data, the Act implicitly requires companies to use data governed according to best practices. As part of the Basel Accord process for regulating banking institutions, banks are required to implement risk measurement frameworks for managing operational risk within their overall risk management processes. As these frameworks include a focus on data quality, Basel requirements have a direct impact on financial data governance activities [6].
The General Data Protection Regulation (GDPR) is a legal framework that sets guidelines for the collection and processing of the personal information of individuals within the European Union (EU). The GDPR aims primarily to give citizens and residents control over their personal data and to simplify the regulatory environment for international business by unifying the regulation within the EU. Personal data must be processed lawfully, fairly, and in a transparent manner in relation to the data subject. Data quality is an area of GDPR risk that relates back to enterprise data governance: the GDPR explicitly states that personal data must be accurate and kept up to date, and it also mandates data minimization [7].
3. Data Quality Frameworks in Financial Platforms
Although accurate and complete data underpins regulated financial platforms, consistently maintained data quality remains a challenge. An intensive effort is required to achieve sound data accuracy and completeness as well as consistency and timeliness. Regulatory obligations demand that the data sources feeding the models are as accurate and complete as possible. The concept of lineage‐aware data quality metrics provides a framework for assessing data quality through the lens of data lineage. Key indicators are defined that measure important dimensions of data quality (accuracy, completeness, consistency and timeliness) and they are given a practical implementation in a generic architecture with alerting depending on the level of violation [8].
Key indicators, combined with alerting mechanisms, may help govern the data quality of approved data sets. In a regulated financial environment, where role definitions, data management workflows, and compliance reporting are agreed upon, data can be kept as accurate and complete as possible. Nevertheless, the repercussions of a data quality issue may vary according to the usage of the data. The concept of a lineage-aware data quality metric allows the risk of issues affecting a consumer application to scale the alerting mechanism. The implementation combines data quality monitoring and lineage, and proposes a monitoring metadata description that is easily deployable in common information management platforms, regardless of their underlying technology [9].
3.1. Data Accuracy and Completeness
To meet regulatory obligations, data on financial platforms must exhibit both high accuracy and high completeness. Accuracy denotes that the data is correct, whereas completeness indicates that the data set covers all required records and attributes. Together, they determine whether sufficient, reliable detail is present to support trustworthy business decision-making.
Timeliness affects the value of any data set or product. Data that loses topicality should be removed from the active data set. Data aging and out-of-date monitoring processes are mandatory for any data quality architecture within a corporation's responsibility [10]. Based on regulatory requirements, aged-out information should be grouped as soon as possible. The grouped information should be rendered historical and separated from the current data available for reporting or operational processing of business activities. Different data aging groups may be defined so that the key regulatory requirements are met while providing the highest-quality information for the organization, as sketched below. Timeliness can also determine the need for additional redundancy monitoring. For example, when financial information is flagged or categorized as being potentially inaccurate, additional timely redundancy is introduced into the monitoring process until the notification is removed [11].
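A minimal Python sketch of such age-based grouping; the group names and day boundaries are illustrative assumptions, not regulatory values:

```python
from datetime import date

def aging_group(as_of: date, record_date: date) -> str:
    """Assign a record to an aging group; aged-out data is separated
    from the active set used for reporting and operations."""
    age_days = (as_of - record_date).days
    if age_days <= 90:
        return "active"
    if age_days <= 365:
        return "aging"
    return "historical"  # rendered historical, excluded from active reporting

print(aging_group(date(2020, 12, 1), date(2020, 11, 15)))  # active
print(aging_group(date(2020, 12, 1), date(2019, 6, 1)))    # historical
```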
The definition and capture of the complete data lineage is the starting point of the data provenance architecture. A properly implemented and maintained data lineage defined by business owners and endorsed by the data managers of the regulated platform of a large corporation is essential for good quality information on products, especially in the context of decision-making and risk management [12].
3.2. Data Consistency and Timeliness
The internal and cross-system consistency of a data platform is another important aspect of data correctness. Data should not be contradictory between different fields in the same record or between different records, and the relations and dependencies inherent in the data, as defined by business rules, should be preserved [13]. For example, orders cannot have a delivery date that is before the order date, nor can the aggregate cost be lower than the sum of its component costs. Some of these relations might also be expressed directly in the structure of the databases using primary keys and their restrictions, unique constraints, and foreign keys. It is also important for the platform not to return contradictory data. Indeed, two different users should never receive contradicting answers to the same question posed over the data in the platform. A flagged return value of "not available" may occur under different circumstances, such as during the initial testing phase of a platform, the absence of records complying with the filtering parameters, the failure of one or more back-end processes, or others. Finally, timeliness refers to the fulfillment of the service-level agreement on the maximum time to fulfill a request.
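Where such relations cannot be pushed into database constraints, they can be validated in application code. A minimal Python sketch of the two example rules above, with an assumed order-record layout:

```python
from datetime import date

def consistency_violations(order: dict) -> list:
    """Check business-rule consistency within a single order record."""
    violations = []
    # Rule 1: the delivery date must not precede the order date.
    if order["delivery_date"] < order["order_date"]:
        violations.append("delivery_date precedes order_date")
    # Rule 2: the aggregate cost must not be lower than the sum of component costs.
    if order["total_cost"] < sum(order["component_costs"]):
        violations.append("total_cost below sum of component costs")
    return violations

order = {
    "order_date": date(2020, 3, 1),
    "delivery_date": date(2020, 2, 27),
    "total_cost": 90.0,
    "component_costs": [50.0, 45.0],
}
print(consistency_violations(order))
# ['delivery_date precedes order_date', 'total_cost below sum of component costs']
```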
Most dimensions of data quality can be addressed by monitoring the values of a relatively small number of key data quality indicators (DQIs). Each DQI is a quantitative representation of a critical aspect of data quality; the actual dimensions being addressed vary from industry sector to industry sector, or even from company to company, while the monitoring of the DQIs can be implemented through different approaches [14]. The association of DQIs with a monitoring architecture facilitates the detection of data quality issues and often even permits forecasting the need for a dirty data correction process, i.e., a process dedicated to fixing detected data quality issues once triggered.
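A minimal sketch of threshold-based DQI monitoring, with a naive degrading-trend test standing in for the forecasting idea; the threshold value and window size are assumptions:

```python
def check_dqi(history: list, threshold: float, window: int = 5) -> dict:
    """Evaluate the latest DQI value against a threshold and flag a degrading trend.

    history: chronological DQI observations in [0, 1], latest last.
    """
    latest = history[-1]
    breached = latest < threshold
    # Naive forecast: if the recent window is strictly decreasing,
    # a dirty-data correction process is likely to be needed soon.
    recent = history[-window:]
    degrading = len(recent) >= 2 and all(a > b for a, b in zip(recent, recent[1:]))
    return {"latest": latest, "breached": breached, "degrading_trend": degrading}

print(check_dqi([0.99, 0.98, 0.97, 0.95, 0.94], threshold=0.90))
# {'latest': 0.94, 'breached': False, 'degrading_trend': True}
```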
4. Data Lineage Concepts and Implementation
Beyond the need to maintain high-quality data, regulations in many sectors specify that organizations must be able to demonstrate comprehensive data lineage. As principles, requirements, and specific implementations vary significantly across domains, it is prudent to separate these concerns. Practical and field-tested concepts that supported data-lineage records for one particular regulated platform are described here. Technical lineage describes a detailed capture of all of the transformations applied to data as it flows from the point of collection to the point of consumption. Business lineage abstractly depicts from where a data asset is sourced and where it is used. The two types of lineage records serve different purposes: technical lineage records are primarily of interest to system designers and developers, while business lineage records are primarily for consumers and data quality assessors [15].
A technical lineage record is often automatically captured because many of the data transformations (such as joins, filters, calculations, and merges) are implemented within metadata-managed pipelines or workflows that maintain these records as metadata. Business lineage records are often created manually by business team members working from business knowledge, capturing flow direction and links between data elements, data sets, and processes. Ensuring that required business links and flow directionality are maintained as data sets and processes change requires governance. One proven approach is to require business lineage capture as part of new dataset and processing requests, with periodic review and update of the records. Adding provenance capture mechanisms beyond those typically found in data management tools can also support business and/or certified data director and consumer assurance, for example by capturing the signatures of the business data directors (BDDs) who covered the data's generation site and time [16].
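As a sketch of what such a manually curated business lineage record might look like, including the sign-off described above; all field names and values are illustrative assumptions:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class BusinessLineageRecord:
    """Manually curated link between a data asset, its sources, and its uses."""
    asset: str                 # e.g. "risk_mart.exposure_summary"
    sourced_from: list         # upstream data sets (flow direction: in)
    consumed_by: list          # downstream reports/processes (flow direction: out)
    business_definition: str
    approved_by: str           # sign-off captured for assurance
    approved_at: datetime = field(default_factory=datetime.utcnow)

record = BusinessLineageRecord(
    asset="risk_mart.exposure_summary",
    sourced_from=["trades.raw_positions", "refdata.counterparties"],
    consumed_by=["reports.bcbs239_aggregation"],
    business_definition="Aggregated counterparty exposure by legal entity",
    approved_by="business.data.director@example.com",
)
```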
Equation B) Accuracy (comparison against a reference truth set or authoritative source)

Step 1: Define a truth/expected value.
- Let $t_{ij}$ be the "true" (or authoritative) value for record $i$, attribute $j$, and let $x_{ij}$ be the platform's stored value.

Step 2: Define a match indicator:
$$a_{ij} = \begin{cases} 1, & \text{if } x_{ij} = t_{ij} \\ 0, & \text{otherwise.} \end{cases}$$

Step 3: Attribute-level accuracy:
$$A_j = \frac{1}{N} \sum_{i=1}^{N} a_{ij}$$

Step 4: Dataset accuracy across $K$ critical fields. If only a subset $\mathcal{K}$ of critical fields (with $|\mathcal{K}| = K$) is regulated/important:
$$A = \frac{1}{K} \sum_{j \in \mathcal{K}} A_j$$
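A corresponding Python sketch for the accuracy derivation, assuming the stored records can be aligned one-to-one with an authoritative truth set; the field names are illustrative:

```python
def accuracy(stored: list, truth: list, critical_fields: list) -> dict:
    """Attribute-level accuracy A_j and dataset accuracy A over critical fields.
    'stored' and 'truth' are assumed to be aligned record-for-record."""
    n = len(stored)
    # A_j = (1/N) * sum_i a_ij, where a_ij = 1 if the stored value matches truth
    per_field = {
        f: sum(int(s.get(f) == t.get(f)) for s, t in zip(stored, truth)) / n
        for f in critical_fields
    }
    # A = (1/K) * sum over the K critical fields
    return {"per_field": per_field,
            "dataset": sum(per_field.values()) / len(critical_fields)}

stored = [{"isin": "XS1", "rating": "AA"}, {"isin": "XS2", "rating": "A"}]
truth  = [{"isin": "XS1", "rating": "AA"}, {"isin": "XS2", "rating": "BBB"}]
print(accuracy(stored, truth, ["isin", "rating"]))
# per_field: isin=1.0, rating=0.5 -> dataset=0.75
```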
4.1. Technical Lineage versus Business Lineage
Technical lineage capture operates at database-table-column granularity, recording all transformations from data sources to reporting outputs. It allows tracing of raw reporting database values back to upstream systems, ensuring that the stored values are consistent with the original data sources. For a full capture of the technical lineage footprint, every process associated with an ETL job should properly register its transformation details in the metadata repository. Tools have been developed to make such information available in an automated manner.
Business lineage capture differs from technical lineage in that it covers only the key facts and metrics in the reporting database. Business lineage captures the business definition of these facts and metrics, the specific test points defined by the business, and the upstream data sources feeding into the fact and metric tables, allowing it to provide business stakeholders with key information about report footings. Reports can also be collected from upstream applications to enable business lineage update processes. When a report is run, stakeholders can check the business lineage to identify the relevant business definition, test point owners, and upstream sources [17].
4.2. Provenance capture mechanisms
Provenance information can be captured in a data management platform by multiple mechanisms, by collecting data during the creation, visualization, and data exportation phases. In the modern technology era, where everything is implemented and stored in cloud-based services, solutions can be provided by leveraging tools present within cloud platforms, such as metadata extraction through triggers in SQL databases or capturing events in data warehouses through event-driven architectures. Using these tools allows the automation of the data lineage generation process. This becomes even more critical considering that the majority of the data within the cloud is manipulated through ETL workflows and pipelines. Organizing cloud-based architectures using the same approach allows the use of ad-hoc data lineage tools that can make use of the metadata present in cloud platforms [18].
For example, in an AWS architecture, the use of CloudTrail (the AWS service for monitoring and logging events) allows the identification of all S3 object actions, Lambda function calls, and API calls made to the AWS Glue Data Catalog; Microsoft Azure and Google Cloud provide similar capabilities. Moreover, in data and cloud environments where data is manipulated using ETL engines and SQL-based transformations, the single implementation of technical data lineage is transformed into the business lineage visualized in analytic dashboards and data mart reports. Whenever an ETL and data pipeline is visualized, the user can trigger or run a complete data pipeline that prepares the data for analytics needs. By architectural design, the data is generated in temporary storage. Thus, when the data is ready to be visualized, analysed by business users, or exposed through an API, notifications are fired to capture the business records that provide the needed information for data lineage and data quality control [19].
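As one hedged illustration of the CloudTrail approach, recent events that reference a placeholder S3 bucket can be retrieved with boto3 and folded into a lineage log. Note that object-level S3 actions appear only if CloudTrail data-event logging is enabled; the bucket name and time window below are assumptions:

```python
from datetime import datetime, timedelta

import boto3  # AWS SDK for Python; credentials and region assumed configured

cloudtrail = boto3.client("cloudtrail")

# Look up recent events that reference a placeholder S3 bucket.
# (Object-level S3 actions require CloudTrail data-event logging.)
response = cloudtrail.lookup_events(
    LookupAttributes=[
        {"AttributeKey": "ResourceName", "AttributeValue": "example-data-lake-bucket"}
    ],
    StartTime=datetime.utcnow() - timedelta(days=1),
    EndTime=datetime.utcnow(),
    MaxResults=50,
)

# Fold each raw event into a minimal lineage log line.
for event in response["Events"]:
    print(event["EventTime"], event["EventName"], event.get("Username", "n/a"))
```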
5. Data Quality Metrics and Monitoring
Data quality is not an idle concern, and hence quality metrics must be defined and a monitoring architecture put in place. Data quality indicators appear in the regulatory requirements or are derived from business needs. In either case, the metrics must be measurable and quantifiable. In practice, alerting can therefore be designed based on explicit thresholds defined for each indicator [20].
A suitable approach is to leverage enterprise application performance monitoring, commonly known as APM. This captures telemetry data across the deployed technology stack, typically covering service call latencies and error rates, third-party service availability or response times, database query performance, and resource utilization such as CPU, memory, disk I/O, host availability, etc. It is natural to extend APM to cover other aspects, including data quality, for regulated environments. In this context, data quality checks are incorporated as clear indicators in the APM framework [21].
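The paper does not name a particular APM product; as a sketch of the "data quality as an APM indicator" idea, the prometheus_client library can expose DQI values on a scrape endpoint alongside conventional telemetry. The metric and dataset names below are assumptions:

```python
import time

from prometheus_client import Gauge, start_http_server

# Data quality indicators exposed next to conventional APM metrics.
dq_completeness = Gauge(
    "dq_completeness_ratio", "Dataset-level completeness C", ["dataset"]
)
dq_accuracy = Gauge(
    "dq_accuracy_ratio", "Dataset-level accuracy A", ["dataset"]
)

def publish_dq_metrics(dataset: str, completeness: float, accuracy: float) -> None:
    """Set the current DQI values for one dataset."""
    dq_completeness.labels(dataset=dataset).set(completeness)
    dq_accuracy.labels(dataset=dataset).set(accuracy)

if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics on port 8000 for scraping
    publish_dq_metrics("trades", completeness=0.98, accuracy=0.95)
    while True:
        time.sleep(60)  # keep the process (and the scrape endpoint) alive
```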
5.1. Defining Key Data Quality Indicators
The effectiveness of data quality management systems hinges on the precise definition and measurement of key data quality indicators (KDQIs), metrics that reflect the degree of agreement between the data and the field's requirements. Various experts have suggested KDQIs tailored for different conditions; for instance, the European Banking Authority provides a set aimed at financial data services. While some definitions specifically target accuracy or completeness, data quality comprises a constellation of diverse aspects, each with its own distinct metrics. Despite such nuances, a concise set of dimensions characterizes the bulk of the literature in this domain [22].
A consolidated data quality framework integrates Timeliness, Completeness, Business Integrity, Uniqueness, and Accuracy. Timeliness addresses the requisite currency of data for compliance with consumer requirements, including data availability and update frequency.
Completeness assesses the proportion of non-missing data compliant with business rules, thereby answering whether a business requirement is met. Business Integrity relates to the degree to which data adheres to predefined structures dictated by the business environment, such as semantic and referential integrity checks [23]. Uniqueness captures the absence of data duplication, reflecting the number of distinct lines in a dataset compared to the total number of lines. Accuracy measures the data's truthfulness or validity. Organizations often apply techniques like root cause analysis (RCA) to assess the dimension behind an alarm for a given variable; the analysis also draws on the master data management (MDM) tool to pinpoint specific "bad" records [24].
5.2. Monitoring architectures and alerting
The second core monitoring function is the definition and implementation of a consistent alerting mechanism that relies on the monitoring layer to detect various types of data quality issues and proactively notify the relevant stakeholders. Alerts generally target unplanned events, such as identification of duplicates, sudden increase in empty fields, exceeding thresholds for business rules, and identification of data issues raised by downstream systems. They can also provide information about potentially underlying problems detected by the data quality checks, such as increase in latency for data processing tasks. In a similar manner to data quality checks, data alerts can be based on the notion of data quality incidents and can be classified according to severity, actions required, stakeholders responsible, and remediation deadlines [25].
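A minimal sketch of the incident-style alert classification described above; the severity scale, stakeholder names, and deadlines are illustrative assumptions:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum

class Severity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

@dataclass
class DataQualityAlert:
    """A data quality incident routed to the responsible stakeholder."""
    check: str                       # e.g. "duplicate_detection", "empty_field_rate"
    severity: Severity
    stakeholder: str                 # owning role, per the governance model
    action_required: str
    remediation_deadline: datetime

alert = DataQualityAlert(
    check="empty_field_rate",
    severity=Severity.HIGH,
    stakeholder="data-steward-trading",
    action_required="Investigate upstream feed for missing counterparty IDs",
    remediation_deadline=datetime.utcnow() + timedelta(hours=24),
)
```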
Notifications are also an integral part of lineage monitoring mechanisms and track the successful completion of input data provenance subscriptions, triggering alerts in case of problems with the lineage capture for critical data consumed by regulated systems. Monitoring capabilities are built into the metadata capture solution to check the connectivity and capture status of individual metadata sources and also the overall lineage system [26].
6. Data Governance Roles and Organizational Structures
Clear delineation of responsibilities and tasks ensures that obligations for managing data quality and lineage are assigned to individuals or groups with the appropriate skills and motivations. On the Zuora Data Services platform, the following roles have proven effective in a fintech environment [27].
* Data owners are persons or roles that have the authority and responsibility for defining data quality rules and for confirming that data adheres to those rules. Data owners are also accountable for the impact of data attributes on dependent business processes. This role is usually associated with specific use cases or domain areas. For example, the head of data sciences may be the data owner for data quality accuracy during model training, while the head of risk or compliance would assume the role for critical model attributes used for approval-modeling or audit purposes [28].
* Data stewards have primary responsibility for defining the data quality rules (key data quality indicators, or KDQIs) for which other users are accountable. Data stewards typically sit within the business domains that are users of the impacted data. For example, business users of the department's data would be responsible for basic tracking of data count and completeness, while the relevant data science expert could be accountable for more technical metrics such as data range, model drift, data bias, feature stability, or test set leakage [29].
* Data custodians are usually technical staff or data engineers responsible for governing data quality within their technical area(s) of expertise. For example, a member of the engineering team building the data lake would be responsible for monitoring the integrity of data captured from external providers and a back-end software engineer supporting data movement between the Data Services and Data Warehouse platforms would be responsible for confirming data record counts before and after transfer [30].
Equation C) Uniqueness (duplicate-free rate)

Step 1: Define total rows. Let $N$ be the total number of rows in the dataset.

Step 2: Define the number of distinct records. Using a business key (or a full-row hash), let $D$ be the number of distinct records.

Step 3: Uniqueness score:
$$U = \frac{D}{N}$$

(So, if 2% of the rows are duplicates, $U = 0.98$.)
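A Python sketch of the uniqueness score, computed over either a business key or a full-row hash; the key choice and record layout are assumptions:

```python
import hashlib
import json
from typing import Optional

def uniqueness(records: list, business_key: Optional[list] = None) -> float:
    """U = D / N, with D counted over a business key or a full-row hash."""
    def key(r: dict) -> str:
        basis = {k: r[k] for k in business_key} if business_key else r
        payload = json.dumps(basis, sort_keys=True, default=str).encode()
        return hashlib.sha256(payload).hexdigest()

    n = len(records)
    distinct = len({key(r) for r in records})  # D
    return distinct / n

rows = [{"trade_id": "T1"}, {"trade_id": "T2"}, {"trade_id": "T1"}]
print(uniqueness(rows, business_key=["trade_id"]))  # 2/3 ≈ 0.667
```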
6.1. Data Owners, Stewards, and Custodians
Data owners are responsible for quality assurance and the implementation of control measures. They define the policy framework governing the quality of data used in their areas of responsibility, specify the business glossary entry requirements, establish a definition for Functional Data Quality Metrics (FDQMs), assign Data Quality Responsibilities (DQRs) related to the monitoring of FDQMs, and are involved in audit and certification activities. Data owners act as the key interface with regulators and external auditors [31].
Data stewards are responsible for the day-to-day quality of data within their remit. They execute the control measures to ensure the quality of data entering Data Consumable Systems and monitor data quality by reviewing trends of FDQM results. By ensuring the entry control measures are executed, they also contribute to the achievement of required Data Input Quality Goals (DIQGs) [32]. Data custodians monitor and maintain the Data Consumable Systems used in Business Lines in accordance with the procedures defined by the relevant Data Owners. Data custodians are the first point of contact for users from Business Lines and regularly inform Data Owners of problems encountered in these systems.
6.2. Policy Management and Compliance Reporting
Policy management and compliance reporting are core functions within a data governance program. A typical organization creates policies that broadly govern data management using the language of its regulatory framework [33]. To demonstrate adherence to these policies, the organization articulates supporting procedures that incorporate regulatory obligations into the detail of everyday work. A map between the policies and supporting procedures is needed to establish clear accountability and ensure routine reviews of procedure effectiveness.
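One lightweight way to hold the policy-to-procedure map is as structured metadata that can be checked programmatically for review currency; every name and date below is illustrative:

```python
from datetime import date

# Each policy maps to the procedures that implement it, with an
# accountable owner and the date of the last effectiveness review.
policy_procedure_map = {
    "DG-POL-001 Data Quality Policy": {
        "owner": "Chief Data Officer",
        "procedures": ["DQ-PROC-010 KDQI monitoring", "DQ-PROC-011 Issue remediation"],
        "last_review": date(2020, 6, 30),
    },
    "DG-POL-002 Data Lineage Policy": {
        "owner": "Head of Data Governance",
        "procedures": ["DL-PROC-020 Business lineage capture"],
        "last_review": date(2020, 9, 15),
    },
}

# Flag policies overdue for their routine review (an annual cycle is assumed).
for policy, meta in policy_procedure_map.items():
    overdue = (date(2020, 12, 27) - meta["last_review"]).days > 365
    print(policy, "-> review overdue" if overdue else "-> up to date")
```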
Compliance monitoring is typically performed by an internal auditing function. As it does so, it may examine whether data is subject to the relevant regulatory requirements during its processing lifecycle, whether the relevant data processes are subject to the established data governance policies and supporting procedures, and whether the day-to-day execution of such processes is currently aligned with those procedures [34].
7. Conclusion
The analysis discusses the regulatory context, data governance, and the quality and lineage practices associated with compliant producer and consumer environments in the financial services sector [35]. Compliance requires effective management and change across people, process, technology, and organizational policies. Regulatory obligations provide a minimal foundation for data quality and governance within data-centric ecosystems, as evidenced by attempts, frequently poorly executed, to minimize or eliminate repeated sanctions and the associated compliance failures. Investment banks, exchanges, regulatory authorities, and other global players in the financial services ecosystem no longer operate in relative isolation [36].
With increasing globalization, both data production and consumption have expanded beyond company, regional, and even continental boundaries. Financial-sector actors now share live, enriched, approved, and in some cases even original data to enable fast, compliant, profitable, data-driven systems and applications. However, the associated technical, governance, and quality-related issues are often complex and difficult to manage effectively [37].
7.1. Summary and Future Directions
Data compliance, quality, and governance are important aspects of managing regulated financial data platforms. Regulatory frameworks describe the general requirements for data governance, supported by dedicated data quality frameworks. Data lineage development is an equally pressing concern, ensuring the traceability and consistency of data across its lifecycle [38].
Data quality governance embraces the definition of standard data quality indicators and their continuous monitoring, with alerting systems to detect breaches. A well-defined governance organization considers the roles of data owners, stewards, and custodians, with corresponding policies to maintain compliance with the defined indicators [39].
The analysis presented represents only an initial step. Many financial data platforms are still unaware of the breadth of such data governance activities, especially large-scale or global data services. Other platform aspects not explored here require careful consideration: the increasing complexity of data provenance solutions and the evolving landscape of data fabric technologies demand further investigation into their workload distribution. The growing adoption of cloud solutions, and the need to preserve service performance and responsiveness under a consistent quality of experience, add to the analysis challenge. Finally, the hybrid platform model, which integrates data center and cloud computing paradigms, presents yet another area for future study [40].
References
- Ackermann, F., Howick, S., Quigley, J., Walls, L., & Houghton, T. (2014). Systemic risk elicitation: Using causal maps to engage stakeholders and build a comprehensive view of risks. European Journal of Operational Research, 238(1), 290–299.
- Meda, R. (2020). Designing Self-Learning Agentic Systems for Dynamic Retail Supply Networks. Online Journal of Materials Science, 1(1), 1–20.
- Aggarwal, C. C., & Wang, H. (2010). Managing and mining graph data. Springer.
- Machine Learning Applications in Regulatory Compliance Monitoring for Industrial Operations. (2020). Global Research Development (GRD), 5(12), 75–95. https://doi.org/10.70179/tqqm2y82
- Albanese, D., Lo Duca, M., & Visintainer, R. (2019). Data quality and data governance in financial crime analytics. Journal of Financial Regulation and Compliance, 27(4), 475–492.
- Inala, R. Designing Scalable Technology Architectures for Customer Data in Group Insurance and Investment Platforms.
- Basel Committee on Banking Supervision. (2013). Principles for effective risk data aggregation and risk reporting (BCBS 239). Bank for International Settlements.
- Segireddy, A. R. (2020). Cloud Migration Strategies for High-Volume Financial Messaging Systems.
- Bennett, K. P., & Campbell, C. (2000). Support vector machines: Hype or hallelujah? SIGKDD Explorations, 2(2), 1–13.
- Botlagunta, P. N., & Sheelam, G. K. (2020). Data-Driven Design and Validation Techniques in Advanced Chip Engineering. Global Research Development (GRD), 5(12), 243–260.
- Böhme, R., & Moore, T. (2012). The economics of cybersecurity. Journal of Cybersecurity, 1(1), 3–7.
- Gottimukkala, V. R. R. (2020). Energy-Efficient Design Patterns for Large-Scale Banking Applications Deployed on AWS Cloud. Power, 9(12).
- Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
- Amistapuram, K. (2020). Energy-Efficient System Design for High-Volume Insurance Applications in Cloud-Native Environments. International Journal of Innovative Research in Electrical, Electronics, Instrumentation and Control Engineering (IJIREEICE). https://doi.org/10.17148/IJIREEICE.2020.81209
- Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys, 41(3), 1–58.
- Vadisetty, R., Polamarasetti, A., Guntupalli, R., Rongali, S. K., Raghunath, V., Jyothi, V. K., & Kudithipudi, K. (2020). Generative AI for Cloud Infrastructure Automation. International Journal of Artificial Intelligence, Data Science, and Machine Learning, 1(3), 15–20.
- Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6), 377–387.
- Varri, D. B. S. (2020). Automated Vulnerability Detection and Remediation Framework for Enterprise Databases. Available at SSRN 5774865.
- Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
- Rongali, S. K. (2020). Predictive Modeling and Machine Learning Frameworks for Early Disease Detection in Healthcare Data Systems. Current Research in Public Health, 1(1), 1–15.
- DAMA International. (2017). DAMA-DMBOK: Data management body of knowledge (2nd ed.). Technics Publications.
- Chakilam, C., Koppolu, H. K. R., Chava, K. C., & Suura, S. R. (2020). Integrating Big Data and AI in Cloud-Based Healthcare Systems for Enhanced Patient Care and Disease Management. Global Research Development (GRD), 5(12), 19–42.
- Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1), 107–113.
- Dwork, C., & Roth, A. (2014). The algorithmic foundations of differential privacy. Now Publishers.
- Integrating Big Data and AI in Cloud-Based Healthcare Systems for Enhanced Patient Care and Disease Management. (2020). Global Research Development (GRD), 5(12), 19–42. https://doi.org/10.70179/g32nmm07
- Evans, D., & Over, M. (2019). Sanctions compliance and operational risk. Journal of Banking Regulation, 20(3), 243–257.
- Annapareddy, V. N. (2020). Integrating Solar Infrastructure with Cloud Computing for Scalable Energy Solutions. Global Research Development (GRD), 5(12), 152–170.
- Fawcett, T., & Provost, F. (1997). Adaptive fraud detection. Data Mining and Knowledge Discovery, 1(3), 291–316.
- Pamisetty, A. (2019). Big Data Engineering for Real-Time Inventory Optimization in Wholesale Distribution Networks. Available at SSRN 5267328.
- Gill, M., & Spriggs, A. (2018). Sanctions screening technology in global banking. Journal of Financial Crime, 25(4), 1031–1045.
- Goldreich, O. (2009). Foundations of cryptography: Volume 2, basic applications. Cambridge University Press.
- Pamisetty, V. (2020). Optimizing Unclaimed Property Management through Cloud-Enabled AI and Integrated IT Infrastructures. Universal Journal of Finance and Economics, 1(1), 1–20. https://www.scipublications.com/journal/index.php/ujfe/article/view/1338
- Groth, P., & Moreau, L. (2013). PROV-overview: An overview of the PROV family of documents. Future Generation Computer Systems, 29(1), 158–165.
- Gadi, A. L. The Role of Digital Twins in Automotive R&D for Rapid Prototyping and System Integration.
- Harman, D. (1992). Relevance feedback and other query modification techniques. Information Processing & Management, 28(5), 561–575.
- Adusupalli, B., Singireddy, S., & Pandiri, L. Implementing Scalable Identity and Access Management Frameworks in Digital Insurance Platforms.
- Hu, W., & Liu, B. (2012). Mining and summarizing customer reviews. Proceedings of the ACM SIGKDD Conference, 1–9.
- Recharla, M. (2020). Targeted Gene Therapy for Spinal Muscular Atrophy: Advances in Delivery Mechanisms and Clinical Outcomes. International Journal of Science and Research (IJSR), 1921–1934. https://doi.org/10.21275/sr20126161624y
- Jensen, D., & Neville, J. (2002). Linkage and autocorrelation cause feature selection bias. Proceedings of ICML, 259–266.
- Kaulwar, P. K. (2020). Designing Secure Data Pipelines for Regulatory Compliance in Cross-Border Tax Consulting. International Journal of Innovative Research in Electrical, Electronics, Instrumentation and Control Engineering (IJIREEICE). https://doi.org/10.17148/IJIREEICE.2020.81208