Data
Glossary
ABAC, or Attribute-Based Access Control, is an information access control model that uses attributes as the basis for authorization decisions. Unlike the traditional Role-Based Access Control (RBAC) model, which focuses on user roles, ABAC takes into account attributes such as user identity, role, time, location, and other relevant information. These attributes are evaluated by an access policy engine to determine whether a user is authorized to access a particular resource or perform a particular action. ABAC offers finer granularity and greater flexibility for managing authorizations under specific conditions, improving security and access management in complex IT environments.
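For illustration only, a minimal sketch of the idea in Python: a policy is simply a predicate over user, resource, and context attributes. The attribute names (department, clearance, within_office_hours) are hypothetical and do not come from any particular policy engine.

```python
# Minimal ABAC sketch: access is granted only if every attribute-based
# condition holds. All attribute names below are illustrative.
def abac_allows(user: dict, resource: dict, context: dict) -> bool:
    return (
        user["department"] == resource["owning_department"]   # organizational attribute
        and user["clearance"] >= resource["classification"]   # sensitivity attribute
        and context["within_office_hours"]                    # environmental attribute
    )

user = {"department": "finance", "clearance": 3}
resource = {"owning_department": "finance", "classification": 2}
context = {"within_office_hours": True}

print(abac_allows(user, resource, context))  # True: all conditions are satisfied
```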
ADINT, or Advertising Intelligence, is the process of tracking personal information about end customers or consumers in order to gather data on their profiles, habits, and buying behavior. This methodology carries an acquisition cost, since it relies on the purchase of databases. Once this customer information has been collected, analyzed, and processed, companies can perform fine-grained segmentation of their audience and run targeted or even personalized acquisition or retention marketing campaigns.
AUDITABILITY refers to the ability of a system, process, or operation to be reliably and traceably examined and verified. This usually involves setting up tracking and logging mechanisms (logs) that record all data-related interactions and activities. It means creating and managing complete, transparent audit trails for all data-related interactions and activities, to ensure transparency, compliance, security, and confidence in the use and handling of data.
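As a rough sketch of such a logging mechanism, the snippet below appends one JSON record per data-related action to an append-only file; the field names and the audit.log path are assumptions for illustration.

```python
# Illustrative audit trail: one JSON record per data-related action,
# appended to a log file. Field names and the file path are assumptions.
import json
from datetime import datetime, timezone

def audit(user: str, action: str, dataset: str, log_path: str = "audit.log") -> None:
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "dataset": dataset,
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")  # append-only, one record per line

audit("alice", "read", "customers_2023")
```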
BIG DATA, a term used to describe a complex, massive data set that requires advanced collection, storage, processing, and analysis techniques.
DATA is raw information, often in digital form, that can be processed and analyzed to produce useful insights.
DATA ANALYST – A professional who collects, processes, analyzes, and interprets data to provide actionable information for the organization.
DATA BACKUP – The process of regularly copying and backing up data to ensure its availability in the event of a hardware failure, data corruption, or disaster.
DATA BREACH – An incident in which confidential data is compromised, usually as a result of a security breach, which can have harmful consequences for individuals or organizations.
DATA CATALOG is an organized, centralized directory that exhaustively lists the data sets available within an organization. The catalog provides a detailed description of the data, including metadata, content, sources, formats, and access rights. Its goal is to facilitate efficient discovery, access, and use of data by internal or external users. By providing a holistic view of available data resources, the data catalog helps improve decision-making, collaboration, and data governance within the organization.
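A hypothetical, much-simplified catalog could be represented as follows; the dataset names, fields, and search helper are purely illustrative.

```python
# Hypothetical catalog entries: each dataset is described by metadata
# (owner, source, format, access rights, tags) so it can be discovered.
catalog = [
    {"name": "sales_2023", "owner": "finance", "source": "ERP export",
     "format": "parquet", "access": "restricted", "tags": ["sales", "revenue"]},
    {"name": "web_logs", "owner": "marketing", "source": "CDN",
     "format": "json", "access": "internal", "tags": ["traffic", "clickstream"]},
]

def search_catalog(keyword: str) -> list:
    """Return every dataset whose name or tags mention the keyword."""
    return [d for d in catalog if keyword in d["name"] or keyword in d["tags"]]

print(search_catalog("sales"))  # finds the sales_2023 entry
```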
DATA CLEANING – The process of detecting and correcting errors, inconsistencies, and duplicates in data sets to improve their quality and reliability for analysis.
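As an illustrative sketch (assuming the pandas library as tooling), the snippet below normalizes inconsistent values, fills a missing field, and removes duplicates.

```python
# Cleaning sketch with pandas (an assumed tooling choice): normalize
# inconsistent values, fill a missing field, remove exact duplicates.
import pandas as pd

df = pd.DataFrame({
    "customer": ["Alice", "alice ", "Bob", "Bob"],
    "country":  ["FR", "FR", None, "FR"],
})

df["customer"] = df["customer"].str.strip().str.title()  # fix casing and stray whitespace
df["country"] = df["country"].fillna("unknown")          # make missing values explicit
df = df.drop_duplicates()                                # drop exact duplicate rows

print(df)
```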
DATA COLLECTION, in the context of data governance, refers to the systematic and organized process of gathering information from various sources with the aim of building usable datasets. This process involves clearly defining the data to be collected, identifying reliable and relevant sources, and putting in place mechanisms to ensure the integrity, quality, and compliance of the data collected. Data governance plays a crucial role in data collection by establishing policies and procedures to ensure confidentiality, compliance with regulatory standards, and responsible use of the collected information.
DATA COMPRESSION – A technique for reducing the size of data files by eliminating redundancy or using compression algorithms, saving storage space and facilitating data transmission.
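A minimal example using Python's built-in gzip module: the redundancy in the payload is what allows the compressed form to be much smaller, and decompression restores the original bytes exactly.

```python
# Lossless compression with the standard-library gzip module.
import gzip

raw = b"data governance " * 1000          # highly redundant payload
compressed = gzip.compress(raw)           # DEFLATE-based compression
restored = gzip.decompress(compressed)    # recovers the original bytes exactly

print(len(raw), len(compressed), restored == raw)
```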
DATA-DRIVEN DECISION MAKING – A decision-making process based on the analysis of relevant data and information to improve the accuracy, speed, and quality of strategic and operational decisions.
DATA ENCRYPTION – The process of converting data into an unreadable format, known as ciphertext, to protect its confidentiality and security during storage or transmission.
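A small sketch of symmetric encryption, assuming the third-party cryptography package is available: the plaintext becomes an unreadable token (the ciphertext) that only holders of the key can reverse.

```python
# Symmetric encryption sketch using the third-party `cryptography` package
# (an assumed dependency): only holders of the key can read the token back.
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # secret key, to be stored and managed securely
cipher = Fernet(key)

token = cipher.encrypt(b"sensitive customer record")  # unreadable ciphertext
plaintext = cipher.decrypt(token)                     # authorized decryption

print(plaintext)
```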
DATA ETHICS: A set of principles and moral standards that govern the collection, use, and management of data to ensure the responsible and ethical use of personal and sensitive information.
DATA EXPLORATION is the data mining phase that follows the data preparation step. It is the process by which businesses can interactively explore a significant amount of data, often presented through data visualization tools such as charts, diagrams, and dashboards, to gain a clearer, more comprehensible, and more comprehensive view of the data and to identify potential correlations within it for analytical purposes.
DATA GOVERNANCE – A set of policies, processes, and standards to ensure the quality, security, confidentiality, and integrity of data in an organization.
DATA GOVERNANCE FRAMEWORK – An organizational and methodological structure that establishes policies, processes, and responsibilities for the effective management and control of data in an organization.
DATA INTEGRATION – The process of combining data from different sources to create a unified, consistent view of information, making analysis and decision-making easier.
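As a minimal sketch (again assuming pandas), two sources describing the same customers are joined on a shared key to produce one unified view; column names are illustrative.

```python
# Integration sketch with pandas: two sources are joined on a shared key
# to produce a single unified view. Column names are illustrative.
import pandas as pd

crm = pd.DataFrame({"customer_id": [1, 2], "name": ["Alice", "Bob"]})
billing = pd.DataFrame({"customer_id": [1, 2], "total_spent": [120.0, 75.5]})

unified = crm.merge(billing, on="customer_id", how="left")  # one record per customer
print(unified)
```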
DATALAKE is a data storage architecture that collects, stores, and manages large amounts of raw, semi-structured, and unstructured data from various sources within an organization. Unlike traditional data warehouses, a datalake allows data to be kept in its original form, without a priori structuring, thus providing flexibility for exploration and analysis.
DATA LITERACY: The ability of individuals to understand, interpret, and effectively use data in their work or personal context, enabling them to make informed decisions and fully exploit the potential of available information.
DATA MARKING (Tag) refers to the act of attaching specific metadata to a data item or data set. Metadata is information that describes or characterizes the data but is not part of the content itself. Tags are used to organize, categorize, search, and track data more efficiently and can take different forms (labels, keywords, codes, etc.).
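A toy example of attaching tags to a data item without altering its content; the tag values are hypothetical.

```python
# Tagging sketch: metadata labels sit alongside the data item without
# altering its content. Tag values are purely illustrative.
from dataclasses import dataclass, field

@dataclass
class DataItem:
    content: str
    tags: set = field(default_factory=set)  # metadata, not part of the content itself

record = DataItem(content="2023 payroll export")
record.tags.update({"confidential", "HR", "retention:5y"})

print(record.tags)
```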
DATA MINING – The process of extracting meaningful patterns and relationships from large data sets, often used to uncover hidden trends and insights.
DATA MIGRATION – The process of moving data from one system or platform to another, usually as part of a technology upgrade or infrastructure consolidation.
DATA MONETIZATION – The process of creating economic value from data by transforming it into marketable products, services, or information, which can generate new revenue or business opportunities for organizations.
DATA OWNERSHIP – Assigns responsibility and legal rights over data to a specific entity or person, defining who can access, edit, or share the data.
DATA PREPARATION (also called data readiness) is the first step in a Business Intelligence project. It is the process of collecting, combining, structuring, and organizing information for analysis in data visualization and analysis applications. The objective of data preparation is to ensure the consistency and quality of data by transforming raw data into information that is useful to the people who have to make decisions. By preparing data for analysis well in advance, organizations can maximize the value of their information and the relevance of the decisions they make.
DATA PRIVACY – Protects personal and sensitive information from unauthorized access, disclosure, or misuse in accordance with data privacy laws and regulations.
DATA PRIVACY REGULATION – Government laws and regulations that govern the collection, storage, processing, and dissemination of personal data, such as the GDPR (General Data Protection Regulation) in Europe or the California Consumer Privacy Act (CCPA) in the United States.
DATA QUALITY – A measure of the accuracy, consistency, reliability, and relevance of data, often evaluated against specific criteria set by the organization.
DATA REPOSITORY, a “Data Repository” generally refers to a centralized and organized structure that stores and manages metadata and critical data information within an organization. It acts as a reference point for describing, cataloging, and documenting different data sources, field definitions, data relationships, etc. It is intended to improve data management, facilitate collaboration between teams, reduce redundancy and errors in data documentation, and foster a common understanding of information across the organization.
DATA SECURITY – A set of measures and practices designed to protect data from threats such as hacking, malware, unauthorized access, and information leakage.
DATA SCIENTIST: An expert in data analysis who uses advanced statistical, computational, and mathematical techniques to explore and interpret data, often to solve complex problems.
DATA SOURCE, a data source refers to any origin or provider of data that feeds information into an organization. These sources can include internal databases, enterprise systems, third-party applications, web services, external files, and more. Managing data sources is critical to ensuring data quality, security, compliance, and traceability throughout the data lifecycle. A catalog of data sources allows the various sources to be documented and tracked, thereby facilitating their responsible and relevant use in the organization’s activities.
DATA SOVEREIGNTY, “data sovereignty” refers to the principle that an organization or a country must exercise full control over its data and protect it from unauthorized access, use or storage by third parties. This means that data belonging to an entity remains under its control and is not subject to foreign laws or regulations that could compromise the confidentiality or security of sensitive information. Data sovereignty has become crucial in an increasingly connected world, where the protection of personal, business, and government data is essential to preserve the trust, security, and autonomy of all involved.
DATA VISUALIZATION – Use of charts, tables, and other visual tools to represent data in ways that make trends and patterns easier to understand.
DATA WAREHOUSE: A centralized, integrated data store that holds large amounts of data from different sources, making analysis and decision-making easier.
DOCUMENT, in the context of a database, a “document” is a fundamental unit of information that gathers and represents data in a structured form. It can be a single record, containing fields or attributes that define its specific characteristics. Documents are usually organized according to a data model, and stored in appropriate formats.
FININT, better known as Finance Intelligence or Financial Intelligence, is the collection of information about organizations’ financial affairs to understand their nature and volumes and predict their intentions. This method is generally used to detect money laundering, often as part of or as a result of other criminal activity.
GEOINT, or Geospatial Intelligence, refers to information derived from the analysis of images or data relating to a specific spatial location. This imagery was originally used to study and evaluate human activity and geography across the globe for military projects. Its use has since diversified into other areas, such as academic research or business issues for private-sector companies.
HETEROGENEOUS DATA, “heterogeneous data” refers to data from different sources or types, with varying formats, structures, and characteristics. This data can be collected from different computer systems, incompatible databases, various software, sensors, or from external sources such as files, documents, media, etc. Because of this diversity, heterogeneous data can be difficult to integrate and analyze in a homogeneous manner. Managing and analyzing heterogeneous data requires tools and techniques that are tailored to ensure interoperability, consistency, and optimal use of the data in a given context.
MACHINE LEARNING: A branch of artificial intelligence that enables computer systems to automatically learn and improve from data without being explicitly programmed.
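A tiny sketch, assuming the scikit-learn library: the model infers the input-output relationship from examples rather than being hand-coded.

```python
# Learning-from-data sketch with scikit-learn (an assumed dependency):
# the model infers the mapping from examples rather than being hand-coded.
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]   # input feature, e.g. month number
y = [10, 20, 30, 40]       # observed target, e.g. sales

model = LinearRegression().fit(X, y)   # learn the pattern from the data
print(model.predict([[5]]))            # extrapolates to roughly 50
```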
METADATA: Structured information that describes the characteristics, attributes, and context of data, making it easy to find, organize, and understand.
MULTI-INT, or multi-intelligence, refers to the combination of several intelligence disciplines and collection techniques (such as OSINT, SIGINT, GEOINT, or SOCMINT) within a single analysis. The Argonos Data Operating System is a MULTI-INT system.
NEED TO KNOW is an information security principle that limits access to confidential data or information only to those who have a legitimate reason to access it in the course of their professional duties. This principle aims to restrict access to sensitive information only to authorized users, thereby reducing the risk of unauthorized disclosure, breach of confidentiality or data leakage. By putting in place appropriate access controls and enforcing the need to know, organizations can better protect their sensitive information and ensure that it is used responsibly and in accordance with applicable policies and regulations.
ONTOLOGY, in the context of a datalake, an ontology is an organized semantic representation of the concepts, relationships, and data schemas used to describe and structure the information stored in the datalake. It defines business-specific terms, classifications, and data linkages, providing a common and unified understanding of the datalake content. Ontology facilitates user search, navigation, and interpretation of data, improving discovery of relevant data and consistency of analysis. In addition, it plays a critical role in data governance by establishing rules and standards for the use, integrity, and quality of data in the datalake.
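A deliberately simplified sketch of what such a semantic representation might look like; the concepts and relations are invented for illustration and do not come from any particular datalake.

```python
# Simplified ontology sketch: business concepts, their hierarchy, and the
# relations linking them. All names are invented for illustration.
ontology = {
    "concepts": {
        "Customer": {"is_a": "Party"},
        "Order":    {"is_a": "Transaction"},
        "Product":  {"is_a": "Asset"},
    },
    "relations": [
        ("Customer", "places", "Order"),
        ("Order", "contains", "Product"),
    ],
}

# Which relations directly involve "Order"?
print([r for r in ontology["relations"] if "Order" in (r[0], r[2])])
```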
OSINT stands for Open Source Intelligence. It is an intelligence technique used by internal security agencies for military and civilian investigations, as well as by journalists in the course of their own investigations, for example. It exploits information such as the web, traditional search engines, social networks, blogs, forums, newspapers, magazines, etc.: any online publication made freely available to the general public. OSINT makes it possible to take advantage of the enormous flow of information available from open sources, select the most relevant items, and process them in order to reach the most accurate conclusions possible.
REGULATORY FRAMEWORK, this refers to the set of laws, rules, regulations, and standards that govern the use, collection, processing, storage, and protection of data, as well as the development of software, while respecting the rights and privacy of individuals. This framework aims to ensure a balance between technological innovation and the protection of individual rights. It can evolve over time to adapt to technological advances and to new concerns of states.
SIGINT, or signals intelligence, is intelligence gathered by intercepting signals, whether communications between persons (communications intelligence, or COMINT) or electronic signals not directly used in communication (electronic intelligence, or ELINT). Because classified and sensitive information is typically encrypted, signals intelligence in turn involves the use of cryptanalysis to decrypt messages. Traffic analysis – the study of who communicates with whom and in what volume – is also used to process the information.
SOCMINT, or Social Media Intelligence, is a sub-branch of Open Source Intelligence (OSINT). It is a methodology that collects data available on social networks, whether public (e.g. public Facebook or LinkedIn posts) or private. This information can be in text, image, or video format. Private information – such as content shared with friends – cannot be accessed without the owner’s prior permission.
STRUCTURED DATA, “structured data” is data organized in a predefined and consistent format, with defined rules for how information is stored and interrelated. It is typically presented in tables, files, or relational databases, where information is stored in rows and columns and each record can be identified by a unique key. Structured data is easy to interpret and analyze, making it ideal for query, filtering, and aggregation operations, as well as for performing calculations and statistics.
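For illustration, the snippet below builds a small relational table with Python's standard sqlite3 module: a fixed schema, a unique key, and straightforward aggregation; table and column names are illustrative.

```python
# Structured data sketch with the standard-library sqlite3 module:
# a fixed schema, a unique key, and simple aggregation. Names are illustrative.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
con.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "Alice", "FR"), (2, "Bob", "DE")])

print(con.execute("SELECT country, COUNT(*) FROM customers GROUP BY country").fetchall())
```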
The General Data Protection Regulation (GDPR) was adopted by the European Parliament in 2016 and became effective in 2018. It establishes a legal framework for the protection of personal data in Europe. Foreign organizations, acting as data controllers or processors and processing personal data from the European Union (EU), must also apply this regulation. Data controllers are responsible for ensuring that their activities comply with the regulation and must be able to demonstrate this to the supervisory authority (in France, the CNIL) if necessary.
TRACEABILITY (Data Lineage) refers to the ability to systematically track and document the origin, path, and transformations of data throughout its life cycle. This includes understanding how data is extracted, transformed, and loaded (ETL), and how it changes over time.
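A minimal lineage sketch: each transformation step records its input, operation, and output, so the path of a dataset can be reconstructed afterwards; all names are illustrative.

```python
# Lineage sketch: each step records its input, operation, and output so the
# full path of a dataset can be reconstructed. All names are illustrative.
lineage = []

def record_step(source: str, operation: str, output: str) -> None:
    lineage.append({"source": source, "operation": operation, "output": output})

record_step("crm_export.csv", "extract", "raw_customers")
record_step("raw_customers", "deduplicate and normalize", "clean_customers")
record_step("clean_customers", "load", "warehouse.customers")

for step in lineage:
    print(f'{step["source"]} --[{step["operation"]}]--> {step["output"]}')
```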
UNSTRUCTURED DATA, “unstructured data” is information that does not follow a predefined format or rigid organization, making its storage and analysis more complex than structured data. This data is not organized into tables or diagrams, and may include free text, images, videos, audio files, e-mails, web pages, PDFs, social media pages, etc. Due to its unstructured nature, this data requires specific analysis techniques, such as natural language processing (NLP) or computer vision, to extract meaningful and useful information.