Actors from the state and sovereign sphere, as well as those specialised in defence and security, scan the digital environment to identify risks that could threaten the safety of people, places and organisations.

To do this, they draw on a wide range of sensors, information sources and data types, which vary in availability, openness and structure, in order to carry out surveillance that is as reliable, secure and detailed as possible. The sector has its own specific lexicon, full of acronyms that newcomers often find hard to decipher; this glossary aims to decode and simplify them for your understanding.

We propose to review the main acronyms below and unpack their meaning. All of them rest on the same methodology: drawing on a wide variety of information sources and following the same process, namely collecting data so that it can be analysed and processed to inform a more or less strategic decision.

GEOINT, or Geospatial Intelligence, refers to intelligence derived from the analysis of imagery or data relating to a specific location. This imagery was initially used to study and evaluate human activity and geography across the globe for military purposes. Its use has since diversified into other fields, such as academic research and commercial applications for private-sector companies.

OSINT stands for Open Source Intelligence. It is an intelligence technique used by security agencies in both military and civilian investigations, and also by journalists, for example in investigative reporting. It exploits sources such as the web, traditional search engines, social networks, blogs, forums, newspapers and magazines: in short, any publication made freely available to the general public. OSINT makes it possible to take advantage of the enormous flow of information available in open sources, to select the most relevant items, and to process them in order to reach the most accurate conclusions possible.

SOCMINT, or Social Media Intelligence, is a sub-branch of Open Source Intelligence (OSINT). It is a methodology for collecting data available on social networks, whether public (e.g. public posts on Facebook or LinkedIn) or private. The information may be in text, image or video format. Private information, such as content shared only with friends, cannot be viewed without the owner’s prior permission.

ADINT, or Advertising Intelligence, consists of tracking personal information about end customers or consumers in order to collect data on their profiles, habits and purchasing behaviour. This methodology carries an acquisition cost, since it involves buying databases. Once this customer information has been collected, analysed and processed, companies can segment their audience finely and run targeted, or even personalised, marketing campaigns to win new customers or build loyalty.

SIGINT, or signals intelligence, aims to gather intelligence by intercepting signals, whether person-to-person communications (communications intelligence, or COMINT) or electronic signals not directly used in communication (electronic intelligence, or ELINT). As classified and sensitive information is usually encrypted, signals intelligence also relies on cryptanalysis to decrypt messages. Traffic analysis, the study of who is signalling whom and how much, is likewise used to derive information.
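To illustrate the traffic-analysis idea, here is a minimal sketch in Python. The "intercepts" are invented metadata records (sender, receiver): without reading any message content, simply counting who contacts whom already reveals structure.

```python
from collections import Counter

# Invented intercept metadata: (sender, receiver) pairs only, no content.
intercepts = [
    ("node_a", "node_b"),
    ("node_a", "node_b"),
    ("node_c", "node_b"),
    ("node_a", "node_d"),
]

# Volume per sender-receiver pair, and the busiest link.
volume = Counter(intercepts)
busiest = volume.most_common(1)[0]
print(busiest)  # → (('node_a', 'node_b'), 2)
```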

FININT, or Financial Intelligence, refers to the collection of information about the financial affairs of organisations in order to understand their nature and volume and to anticipate their intentions. This method is typically used to detect money laundering, which is often carried out as part of, or as a result of, other criminal activity.

MULTI-INT brings together several of these intelligence techniques. The Argonos Data Operating System is a MULTI-INT system.

DATA PREPARATION is the first step in a Business Intelligence project. It is the process of collecting, combining, structuring and organising information so that it can be analysed in data visualisation and analysis applications. The objective is to ensure data consistency and quality by transforming raw data into information that is useful to decision-makers. By preparing data for analysis well in advance, companies and organisations can maximise the value of their information and the relevance of their strategic decisions.
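The collect-combine-structure steps above can be sketched in a few lines of Python. This is a minimal illustration with invented records from two hypothetical sources: values are normalised, the sources are merged, and duplicates are removed.

```python
# Invented raw records from two hypothetical sources.
raw_crm = [{"name": " Alice ", "city": "paris"}, {"name": "Bob", "city": None}]
raw_web = [{"name": "alice", "city": "Paris"}]

def clean(record):
    """Normalise whitespace, casing and missing values so records compare."""
    return {
        "name": (record["name"] or "").strip().lower(),
        "city": (record["city"] or "unknown").strip().lower(),
    }

# Combine both sources, then deduplicate on the "name" key.
prepared = {}
for record in map(clean, raw_crm + raw_web):
    prepared.setdefault(record["name"], record)

print(sorted(prepared))  # → ['alice', 'bob']
```

Real projects would of course use dedicated tooling, but the logic is the same: raw, inconsistent inputs become one consistent, analysable set.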

DATA EXPLORATION is the phase that follows data preparation. It is the process by which businesses interactively explore large amounts of data, often via data visualization tools such as graphs, charts and dashboards, to gain a clearer, more comprehensive view of the data and identify potential correlations for analytical purposes.

Data catalog refers to an organized, centralized repository that comprehensively lists the data sets available within an organization. The catalog provides a detailed description of the data, including metadata, content, sources, formats and access rights. Its aim is to facilitate the efficient discovery, access and use of data by internal and external users. By providing a global view of available data resources, the data catalog helps to improve decision-making, collaboration and data governance within the entity concerned.
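A toy catalog can make the idea concrete: each dataset is described by metadata (description, source, format, access rights) rather than by its content. The dataset and team names below are invented for illustration.

```python
# Hypothetical catalog: metadata about datasets, not the data itself.
catalog = {
    "sales_2023": {
        "description": "Monthly sales figures",
        "source": "ERP export",
        "format": "CSV",
        "access": ["finance", "management"],
    },
    "web_logs": {
        "description": "Raw web server logs",
        "source": "nginx",
        "format": "text",
        "access": ["it"],
    },
}

def discoverable_by(team):
    """Return the datasets a given team is allowed to discover and use."""
    return [name for name, meta in catalog.items() if team in meta["access"]]

print(discoverable_by("finance"))  # → ['sales_2023']
```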

Document: in the context of a database, a document is a fundamental unit of information that gathers and represents data in structured form. It may be a single record containing fields or attributes that define its specific characteristics. Documents are generally organized according to a data model and stored in suitable formats.
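A document of this kind, as stored in a document-oriented database, can be pictured as one self-contained record whose named fields describe a single entity. The field names below are invented for illustration.

```python
# One self-contained record; each field describes the same entity.
document = {
    "_id": "report-42",
    "title": "Quarterly risk assessment",
    "author": "analysis team",
    "tags": ["risk", "q3"],
    "created": "2023-10-01",
}

# Fields are read by name, and "_id" serves as the unique key.
print(document["_id"])          # → report-42
print(len(document["tags"]))    # → 2
```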

Data source refers to any origin or supplier of data that feeds information into an organization. Sources can include internal databases, enterprise systems, third-party applications, web services, external files and many others. Managing data sources is essential to ensure data quality, security, compliance and traceability throughout the data’s lifecycle. A catalog of data sources makes it possible to document and track the various sources, facilitating their responsible and relevant use within the framework of the organization’s activities.

Datalake refers to a data storage architecture that enables the collection, storage and management of large quantities of raw, semi-structured and unstructured data from a variety of sources within an organization. Unlike traditional data warehouses, a datalake preserves data in its original form, without imposing a structure a priori, thus offering flexibility of exploration and analysis.

Data collection, in the context of data governance, refers to the systematic and organized process of gathering information from various sources, with the aim of creating usable data sets. This process involves clearly defining the data to be collected, identifying reliable and relevant sources, and putting in place mechanisms to ensure the integrity, quality and conformity of the data collected. Data governance plays a crucial role in data collection, by establishing policies and procedures to guarantee confidentiality, compliance with regulatory standards and responsible use of the information collected.

Ontology: in the context of a datalake, an ontology is an organized semantic representation of the concepts, relationships and data schemas used to describe and structure the information stored in the datalake. It defines business-specific terms, classifications and links between data, providing a common, unified understanding of the datalake’s content. The ontology makes it easier for users to search, navigate and interpret data, improving the discovery of relevant data and the consistency of analyses. It also plays an essential role in data governance, establishing rules and standards for the use, integrity and quality of data in the datalake.
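In its simplest form, an ontology is a set of concepts plus typed links between them. The concept and relation names in this sketch are invented; the point is that the links give a shared vocabulary for navigating the data.

```python
# Toy ontology: business concepts and typed relations between them.
concepts = {"Person", "Organisation", "Document"}
relations = [
    ("Person", "works_for", "Organisation"),
    ("Person", "authored", "Document"),
    ("Organisation", "published", "Document"),
]

def related_to(concept):
    """All concepts reachable from `concept` via one relation."""
    return {obj for subj, _, obj in relations if subj == concept}

print(sorted(related_to("Person")))  # → ['Document', 'Organisation']
```

Production ontologies are usually expressed in dedicated standards (e.g. RDF/OWL), but they follow this same concept-and-relation structure.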

Need to know is an information security principle that limits access to confidential data or information to those people who have a legitimate justification for accessing it as part of their professional duties. It aims to restrict access to sensitive information to authorized users only, reducing the risk of unauthorized disclosure, confidentiality breaches or data leaks. By implementing appropriate access controls and enforcing the need-to-know principle, organizations can better protect their sensitive information and ensure that it is used responsibly and in compliance with applicable rules and regulations.

ABAC, or Attribute-Based Access Control, is an access control model that uses attributes as the basis for authorization decisions. Unlike the traditional RBAC (Role-Based Access Control) model, which focuses on user roles, ABAC takes into account attributes such as user identity, role, time, location and other relevant information. These attributes are evaluated by a policy engine to determine whether a user is authorized to perform a particular action or access a particular resource. ABAC offers finer granularity and greater flexibility for managing authorizations under specific conditions, improving security and access management in complex IT environments.
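The evaluation logic can be sketched very simply: a policy is a set of conditions over request attributes, and access is granted when some policy matches. The attribute names and the policy itself are invented for illustration.

```python
# Minimal ABAC sketch: each policy matches an action and resource type,
# and adds a condition over the request's attributes.
policies = [
    {   # Analysts may read reports, but only during office hours.
        "action": "read",
        "resource_type": "report",
        "condition": lambda attrs: attrs["role"] == "analyst"
                                   and 8 <= attrs["hour"] < 18,
    },
]

def is_authorized(attrs):
    """Grant access if any policy matches the request's attributes."""
    return any(
        p["action"] == attrs["action"]
        and p["resource_type"] == attrs["resource_type"]
        and p["condition"](attrs)
        for p in policies
    )

request = {"role": "analyst", "hour": 10,
           "action": "read", "resource_type": "report"}
print(is_authorized(request))                  # → True
print(is_authorized({**request, "hour": 22}))  # → False
```

Note how the same user is allowed or denied depending on a contextual attribute (the hour), something a pure role-based model cannot express.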

Data sovereignty refers to the principle that an organization or country must exercise complete control over its data and protect it from unauthorized access, use or storage by third parties. This implies that data belonging to an entity remains under its control, and is not subject to foreign laws or regulations that could compromise the confidentiality or security of sensitive information. Data sovereignty has become a crucial issue in an increasingly connected world, where the protection of personal, commercial and government data is essential to preserve the trust, security and autonomy of the players involved.

Cognitive services refers to a category of artificial intelligence technologies that enable computer systems to process data more intelligently and simulate certain human capabilities. These services are designed to analyze, interpret and understand data in an advanced way, by simulating cognitive abilities such as perception, natural language comprehension and speech recognition.

NLP, or Natural Language Processing, is a branch of artificial intelligence that focuses on communication between computers and human beings through natural language, as used in speech and writing. This includes tasks such as machine translation, sentiment analysis, text generation, automatic speech recognition, parsing and question answering, among others.
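As a taste of what these tasks involve, here is a deliberately tiny sketch of tokenisation plus lexicon-based sentiment scoring. Real NLP systems use trained statistical models; the word lists here are invented for illustration.

```python
# Invented sentiment lexicons for the sketch.
POSITIVE = {"good", "excellent", "reliable"}
NEGATIVE = {"bad", "risky", "unreliable"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return [w.strip(".,!?") for w in text.lower().split()]

def sentiment(text):
    """Positive-minus-negative word count: >0 positive, <0 negative."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

print(sentiment("The source is reliable and the data is good."))  # → 2
print(sentiment("This feed looks risky."))                        # → -1
```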

Structured data is data organized in a predefined, consistent format, with defined rules for how the information is stored and linked together. It is usually presented in the form of tables, files or relational databases, where information is stored in columns and rows and each item can be identified by a unique key. Structured data is easy to interpret and analyze, making it ideal for query, filtering and aggregation operations, as well as for calculations and statistics.
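The point about easy filtering and aggregation can be shown with a small invented "table": a fixed set of columns, rows that all share that schema, and a unique key per row.

```python
# A tiny table: fixed schema, one tuple per row, "id" as the unique key.
columns = ("id", "country", "amount")
rows = [
    (1, "FR", 120),
    (2, "DE", 80),
    (3, "FR", 45),
]

# Because the structure is fixed, filtering and aggregation are trivial.
fr_total = sum(amount for _id, country, amount in rows if country == "FR")
print(fr_total)  # → 165
```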

Unstructured data is information that does not follow a predefined format or rigid organization, making it more complex to store and analyze than structured data. It is not organized into tables or schemas, and can include free text, images, videos, audio files, e-mails, web pages, PDF documents, social media pages and so on. Because of its unstructured nature, this data requires specific analysis techniques, such as natural language processing (NLP) or computer vision, to extract meaningful and useful information.

Heterogeneous data refers to data from different sources or of different natures, with varying formats, structures and characteristics. It may be collected from different computer systems, incompatible databases, various software programs, sensors, or external sources such as files, documents and media. Because of this diversity, heterogeneous data can be difficult to integrate and analyze seamlessly; managing and analyzing it requires appropriate tools and techniques to ensure interoperability, consistency and optimal use in a given context.

Data tagging refers to the act of attaching specific metadata to a datum or dataset. Metadata is information that describes or characterizes data without being part of its actual content. Tags, in various forms (labels, keywords, codes, etc.), are used to organize, categorize, search and track data more efficiently.
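A minimal sketch of the idea: tags live alongside the content without altering it, and make filtering straightforward. The tag names here are invented examples.

```python
# Tags are metadata attached alongside the datum, not part of its content.
dataset = {
    "content": "2023 field survey results",
    "tags": {"confidential", "survey", "2023"},
}

def has_tag(item, tag):
    """Filtering by tag never needs to inspect the content itself."""
    return tag in item["tags"]

print(has_tag(dataset, "survey"))  # → True
print(has_tag(dataset, "public"))  # → False
```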

Source code refers to the structured set of instructions written in a specific programming language by developers to describe in detail the processes and actions to be performed by a computer system. Source code plays a central role in the software development process, enabling developers to design, implement, test and maintain computer applications.

Regulatory framework: this refers to all the laws, rules, regulations and standards that govern the use, collection, processing, storage and protection of data, as well as the development of software, while respecting the rights and privacy of individuals. This framework aims to strike a balance between technological innovation and the protection of individuals’ rights, and may evolve over time to adapt to technological advances and new public concerns.

GDPR: the General Data Protection Regulation was adopted by the European Parliament in 2016 and came into force in 2018. It establishes a legal framework for the protection of personal data in Europe. Foreign organizations acting as data controllers or processors and processing personal data originating from the European Union (EU) must also comply with it. Data controllers are responsible for ensuring that their activities comply with the regulation, and must be able to demonstrate this where necessary.

AI Act: the Artificial Intelligence Act is a European Union regulation that aims to establish a regulatory framework for placing artificial intelligence systems on the market, taking into account safety, health and fundamental rights. The regulation classifies artificial intelligence systems according to their level of risk, ranging from “minimal” to “unacceptable”. It prohibits certain uses that run counter to European values, such as “social credit systems” or mass video surveillance. “High-risk” artificial intelligence systems must comply with the strictest regulatory regime in terms of transparency, risk management and data governance.

Data Repository: a centralized, organized structure that stores and manages metadata and essential information about data within an organization. It acts as a reference point for describing, cataloguing and documenting the various data sources, field definitions, data relationships, etc. It aims to improve data management, facilitate collaboration between teams, reduce redundancy and errors in data documentation, and foster a common understanding of information within the organization.

Auditability refers to the ability of a system, process or operation to be examined and verified in a reliable and traceable way. This usually involves setting up tracking and logging mechanisms that record all data-related interactions and activities, creating complete and transparent audit trails that ensure transparency, compliance, security and trust in the use and handling of data.
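The logging mechanism described above can be sketched as an append-only trail: every access is recorded with who, what and when, and nothing is ever overwritten. Field and user names are illustrative.

```python
from datetime import datetime, timezone

audit_log = []

def record_access(user, action, resource):
    """Append one trail entry; existing entries are never modified."""
    audit_log.append({
        "user": user,
        "action": action,
        "resource": resource,
        "at": datetime.now(timezone.utc).isoformat(),
    })

record_access("alice", "read", "sales_2023")
record_access("bob", "export", "web_logs")

# Auditors can later reconstruct exactly who did what.
print([(e["user"], e["action"]) for e in audit_log])
# → [('alice', 'read'), ('bob', 'export')]
```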

Data lineage refers to the ability to systematically track and document the origin, path and transformations of data throughout its entire life cycle. This includes understanding how data is extracted, transformed and loaded (ETL), how it evolves through the various processing stages, and how it is used in applications and analyses. The main aim is to ensure the transparency, quality and integrity of data throughout its life cycle.
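A minimal way to picture lineage: each derived dataset records its parent and the transformation applied, so any value's path can be walked back to its origin. The dataset and step names below are invented.

```python
# dataset name -> (parent dataset, transformation applied)
lineage = {}

def derive(parent, child, transformation):
    """Register that `child` was produced from `parent`."""
    lineage[child] = (parent, transformation)

derive("raw_logs", "clean_logs", "remove malformed lines")
derive("clean_logs", "daily_stats", "aggregate by day")

def trace(dataset):
    """Walk back through the lineage to the original source."""
    path = [dataset]
    while path[-1] in lineage:
        path.append(lineage[path[-1]][0])
    return path

print(trace("daily_stats"))  # → ['daily_stats', 'clean_logs', 'raw_logs']
```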