DATA GOVERNANCE

Last update: January 17, 2025

1. DATA TREATMENT

The availability of large amounts of data, especially in companies' annual reports, gives us an unprecedented opportunity to apply current mathematical models. ICEBERG DATA LAB has unique expertise in this field. The use of syntactic analysis, machine learning and even deep learning algorithms allows us to extract information such as emissions, raw material purchases or a company's output. This data lake feeds our engine, which computes corporate impacts.

This system allows us to significantly improve the efficiency of data collection and spares analysts from spending hours looking for data in lengthy reports. Although this approach is already showing interesting results at Iceberg Data Lab, machine learning algorithms are not 100% accurate and need to be trained. Therefore, the role of the analyst will not disappear but will become part of an interaction in which the system proposes the available data and the analysts pick the right ones. These choices tell the system which data are the most relevant, so it becomes more accurate step by step. Once this virtuous loop is established, the analyst's efficiency will keep improving until we reach the stage where the machine proposes a straightforward solution and the analyst reviews and validates it.

 

2. TRANSPARENCY

2.1. Sources of information

We will base our assessments on public sources of information, such as annual or environmental reports from corporates. We may have access to private information from the corporates, but the results of the calculation will always be public. Clients may have access, under specific terms, to more granular datapoints. In case of uncertainty about a calculation, the analysts are authorized to disclose, on an exceptional basis, the underlying assumptions of our calculation.

2.2. Customer support

Training

We have implemented an onboarding process in which the customer receives a one-hour training session from a dedicated analyst, who explains the ICEBERG DATA LAB methodology. This dedicated analyst remains the client's point of contact after the first dataset is delivered and ensures, for two weeks after delivery, that the client has no issues with the dataset.

After-Sales support

After this onboarding stage, a Relationship Manager will be the contact point of the client. Training documents and a FAQ will be made available to the client through the client’s platform.

Customer Group

On top of the bilateral discussion with their Relationship Manager, all clients under licence will be invited every 3 months to a conference call to inform them of updates in the coverage, metrics and sectoral methodology. 30 minutes will be dedicated to a Q&A to allow customers to raise specific questions or concerns about the approach or the dataset.

 

3. QUALITY

3.1. Quality

In the data calculation process, mistakes can come from many sources (corrupted data published by the client, mistakes during data collection, misunderstanding by the analyst, methodological bias, etc.). We cannot guarantee a database free of mistakes, especially considering the target we aim at in terms of coverage (several thousand lines). However, what we can commit to delivering to our clients is a process aligned with the best standards, so that we reduce the number of mistakes, eliminate the most consequential ones and remain transparent about our quality control process and the correction of mistakes.

Quality Control

No analyst will be authorized to validate an analysis or an automated calculation before their training period is over. Until that point, they will be under the supervision of an experienced analyst. Automated controls will be programmed to detect errors based on gaps within the sector, gaps within the universe, evolution from last year, evolution since the last change, etc.
A systematic control is performed by the Head of Analyst Team on the lines validated and published on the database at the end of each month. The results of this audit are reviewed by the Management Team of Iceberg Data Lab (CEO, CTO, Head of Research), which decides on remediation actions.
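
The automated controls described above could take many forms. The following is a minimal sketch of two of them (gap within the sector, evolution from last year), not Iceberg Data Lab's actual implementation. The function name `flag_suspect_values`, the input layout and both thresholds are assumptions chosen for illustration.

```python
from statistics import median


def flag_suspect_values(current, previous, sector_of,
                        max_sector_ratio=10.0, max_yearly_change=0.5):
    """Return {company: [reasons]} for datapoints that deserve a manual review.

    current / previous: dict mapping company -> value for this year / last year.
    sector_of: dict mapping company -> sector name.
    Both thresholds are illustrative assumptions, not production settings.
    """
    # Median value per sector, used as the reference for the sector-gap check.
    values_by_sector = {}
    for company, value in current.items():
        values_by_sector.setdefault(sector_of[company], []).append(value)
    sector_median = {s: median(v) for s, v in values_by_sector.items()}

    flags = {}
    for company, value in current.items():
        reasons = []
        reference = sector_median[sector_of[company]]
        # Gap within the sector: value far above or far below the sector median.
        if reference > 0 and (value > reference * max_sector_ratio
                              or value < reference / max_sector_ratio):
            reasons.append("sector_gap")
        # Evolution from last year: relative change above the threshold.
        last = previous.get(company)
        if last and abs(value - last) / abs(last) > max_yearly_change:
            reasons.append("year_on_year")
        if reasons:
            flags[company] = reasons
    return flags
```

A flagged datapoint is only a candidate error: in the process described here, it would still be reviewed by an analyst before any correction.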

Auditability

Our contract will include an audit clause allowing our clients to audit us, which is the best guarantee of transparency we can offer. On top of that, the results of the weekly quality control performed on the production will be made available to our clients on request.

Treatment of errors

Two different issues should be distinguished:

  • Mistakes (for instance the misallocation of a sector to a corporate) will be corrected without delay. The analysis will then be versioned with an explicit mention that the previous version was corrupted.
  • Methodological biases will be corrected through the process for methodological updates. No retroactive correction of the datapoints will be made.

Audit trail

Traceability is instrumental in understanding mistakes. For that reason, and in accordance with the Gold Data Standards, all manual validations of the dataset will be recorded, along with the ID of the person performing each validation. Moreover, the source of information will be recorded, and assumptions made in the course of an analysis will also be stored (on the production tool, not accessible to clients) in order to allow the establishment of an audit trail.

3.2. Methodological updates

Each of our sectors has a lead analyst in charge. The lead analyst's responsibilities include identifying methodological biases, based on their expertise. We systematically review the methodology and model every year for each high-stake sector, every 2 years for medium-stake sectors and every 4 years for low-stake sectors. This review will be presented for advice to the Iceberg Data Lab Scientific Committee. This Committee, established to supervise the initial extension of our sectoral coverage, will continue advising on the methodological updates needed to stay abreast of scientific progress and the development of standards. Iceberg Data Lab will participate in relevant committees to promote the development of Research and Standards related to Biodiversity and Socially Responsible Investing more broadly (PRI, F4T, FIR, etc.).

3.3. Versioning of the methodologies

The version used for each methodology will be numbered along the following taxonomy:
XX.YY.ZZZZ

  • XX is the version of the methodology. A change of version signals that the results are not comparable with the results obtained from the older version and that previous results can no longer be reproduced with the methodology.
  • YY signals a new feature in the methodology, such as a coverage extension.
  • ZZZZ signals the correction of an error.
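
As an illustration of how a consumer of the dataset might use this numbering, here is a minimal sketch that parses a version string and checks whether two sets of results are comparable. It assumes that each component is a zero-padded number of exactly the width shown in XX.YY.ZZZZ, which the text does not state explicitly. The names `MethodologyVersion`, `parse_version` and `results_comparable` are hypothetical.

```python
import re
from typing import NamedTuple


class MethodologyVersion(NamedTuple):
    methodology: int  # XX: results are not comparable across different values
    feature: int      # YY: new feature, such as a coverage extension
    fix: int          # ZZZZ: correction of an error


# Assumption: two digits, two digits, four digits, separated by dots.
_VERSION_PATTERN = re.compile(r"(\d{2})\.(\d{2})\.(\d{4})")


def parse_version(text):
    """Parse a string such as '02.01.0003' into a MethodologyVersion."""
    match = _VERSION_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a XX.YY.ZZZZ version: {text!r}")
    return MethodologyVersion(*(int(group) for group in match.groups()))


def results_comparable(version_a, version_b):
    """Results are comparable only when the methodology version (XX) is the same."""
    return parse_version(version_a).methodology == parse_version(version_b).methodology
```

Because `MethodologyVersion` is a tuple, two parsed versions can also be ordered directly with `<` and `>`.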

 

4. CUSTOMIZATION

Our system allows our clients a broad extent of customization to fit their specific needs. For instance, it is possible to customize the financial ratio to be calculated (sales or EV instead of capital employed) if this is useful for a stronger alignment with a client's existing reporting standards. The calculation itself can also be customized and fed with a customer's own data. For instance, if a client has existing climate or pollution metrics, or wants to replace our financial or production data with its own and so change the input of our calculation, this customization of the production process will be possible, subject to a specific budget and schedule.

 

5. CONNECTIVITY

Our platform and dataset are designed to connect easily with the IT systems of our clients and to provide a safe and simple interface.

5.1. Nomenclature

We have chosen to design our data architecture based on a public nomenclature. This non-proprietary design will ease the matching between our sectoral assessment and other systems. We will collect and aggregate the ID of securities and corporates (ISIN, LEI) and match them with our unique ID which will be allocated to each assessed entity.
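
The matching of public identifiers to an internal ID can be sketched as follows. This is an illustration under stated assumptions, not a description of the production system: the class `EntityRegistry`, the `ENT-000001` ID format and the method names are hypothetical. The ISIN check itself follows the public ISIN rule (letters converted to numbers, then a Luhn check digit).

```python
def isin_is_valid(isin):
    """Check the length, character set and Luhn check digit of an ISIN."""
    isin = isin.strip().upper()
    if len(isin) != 12 or not isin.isascii() or not isin.isalnum():
        return False
    # Letters become two-digit numbers (A=10 ... Z=35); digits are kept as they are.
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


class EntityRegistry:
    """Maps external identifiers (ISIN, LEI) to one internal ID per entity."""

    def __init__(self):
        self._by_external = {}  # (scheme, code) -> internal ID
        self._next_number = 1

    def register(self, isins=(), lei=None):
        codes = [isin.strip().upper() for isin in isins]
        for code in codes:
            if not isin_is_valid(code):
                raise ValueError(f"invalid ISIN: {code!r}")
        internal_id = f"ENT-{self._next_number:06d}"  # hypothetical ID format
        self._next_number += 1
        for code in codes:
            self._by_external[("ISIN", code)] = internal_id
        if lei:
            self._by_external[("LEI", lei.strip().upper())] = internal_id
        return internal_id

    def resolve(self, scheme, code):
        """Return the internal ID for an external identifier, or None if unknown."""
        return self._by_external.get((scheme.upper(), code.strip().upper()))
```

Keeping several securities (ISINs) under one entity ID reflects the text's distinction between securities and corporates. LEI validation is left out of this sketch.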

5.2. SaaS & API

Our solution is SaaS, with a client platform that allows clients to connect using a unique ID, upload their portfolios and self-assess them, and download datapoints. Its functionality is still being progressively expanded. We have also developed an API that feeds our clients' IT systems directly, without connecting to the platform.

 

6. INTEGRITY & SECURITY

6.1. Conflict of Interest

Iceberg Data Lab has no advisory business with any corporates. We will not work with issuers, for instance, to support a Green Bond issuance or to improve their ESG score. As a result, we will be free of any conflict of interest with issuers. If a corporation wants to know about its assessment, we will open access to its analysis, free of charge, and it will be authorized to post comments and suggest the changes it deems fit. All changes and comments will be reviewed by an analyst. If a corporation disagrees with its assessment, it will have the possibility to escalate to an ad-hoc committee made up of the Head of Research, a member of our Scientific Committee and a representative of the Customer Group.
However, Iceberg Data Lab calculates the biodiversity footprint of institutions which are also its clients. A disclaimer mentions this conflict of interest whenever it arises. Every analysis performed on an institution which is a client of Iceberg Data Lab will have to be validated by the analyst in charge of the sector and the Head of Research and/or the CEO.

6.2. Data Location and Protection

Iceberg Data Lab is a European fintech. Our teams are based in Paris, London and Frankfurt, and our servers are based in France (except for encrypted backups, which can be hosted in other countries). Apart from these backups, no data is stored outside of the EU. In addition, no incoming client portfolio will be stored on a server outside of the EU and, therefore, sensitive client data will stay under the umbrella of the EU rules regarding data protection. The company neither handles nor stores any personal information.

6.3. IT Security

In order to guarantee the confidentiality of data exchanges, all of our external and internal exchanges are encrypted by SSL.
On the application side, all user and server requests are authenticated via JSON Web Tokens (RFC 7519). All API requests between servers are also validated before they can successfully return data. This ensures that only authorized servers can communicate with each other; they will not respond to unknown requests.
On the infrastructure side, all of our internal networks are compartmentalized at several levels, by customer and by type of service. These networks are all protected by our firewalls, which are also capable of detecting network intrusions.
All computers used by our teams have antivirus software and EDR (Endpoint Detection and Response) software installed.
These computers can only access our different platforms and applications through VPNs. Each VPN only gives access to the applications that match the access level of the teammate concerned.
We orchestrate all the security logs and events through our SIEM (Security Information and Event Management) platform.

6.4. SLA

We commit to a 99% SLA for our clients. Within our infrastructure, we have three types of monitoring:

Security

We have SIEM and XDR (Extended Detection and Response) systems with agents installed on all of our nodes, VMs, servers and firewalls, capable of threat and malware hunting, behavior analysis, vulnerability detection and file integrity checking.

Health

We use Nagios with agents to monitor the health of our servers and to ensure that all needed services are up and working. Each service is polled every 5 seconds for its status. If it returns anything other than a successful response (200), we log the service as down and start counting the time of unavailability.
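
The arithmetic behind this health monitoring can be sketched as follows. This is not Nagios configuration, only an illustration of how polled status codes could be turned into an availability figure and compared with a 99% target. The function names and the rule that each failed poll counts as one full polling interval of downtime are assumptions.

```python
POLL_INTERVAL_SECONDS = 5  # polling period stated in the text


def downtime_seconds(status_codes, interval=POLL_INTERVAL_SECONDS):
    """Count each poll that did not return 200 as one interval of unavailability."""
    return sum(interval for code in status_codes if code != 200)


def availability(status_codes, interval=POLL_INTERVAL_SECONDS):
    """Fraction of the observed period during which the service answered 200."""
    observed = len(status_codes) * interval
    if observed == 0:
        raise ValueError("no samples to evaluate")
    return 1 - downtime_seconds(status_codes, interval) / observed


def allowed_downtime_seconds(period_days, sla=0.99):
    """Downtime budget for a period, given an availability target."""
    return period_days * 24 * 3600 * (1 - sla)
```

Under these assumptions, a 99% target over a 30-day period corresponds to a downtime budget of about 25,920 seconds, or roughly 7.2 hours.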

Performance

We also monitor the performance of our networks, disks and other parameters to detect any bottlenecks and malfunctions.

6.5. Backups

We back up all our data, code and infrastructure configuration every day. Daily backups are rotated and retained for the last 30 days. We also keep monthly and yearly backups.
We use asymmetric file encryption for all our data backups based on a zero-trust backup architecture.
These backups are hosted on servers located in different countries (France, Singapore, Australia), and a set of backups is also kept in cold storage.
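
A rotation policy of this kind can be sketched as a simple classification of each backup by date. The text only states the 30-day window for daily backups. Everything else here is an assumption made for illustration: the monthly backup is taken to be the one dated the first of the month and kept for 365 days, and the yearly backup is taken to be the one dated 1 January and kept without expiry. The function name `classify_backup` is hypothetical.

```python
from datetime import date


def classify_backup(backup_date, today, daily_days=30, monthly_days=365):
    """Return 'yearly', 'monthly', 'daily' or 'expired' for one backup.

    daily_days comes from the text; monthly_days and the choice of the
    1st of the month / 1 January as reference dates are assumptions.
    """
    age_days = (today - backup_date).days
    if age_days < 0:
        raise ValueError("backup date is in the future")
    if backup_date.month == 1 and backup_date.day == 1:
        return "yearly"   # kept without expiry in this sketch
    if backup_date.day == 1 and age_days < monthly_days:
        return "monthly"
    if age_days < daily_days:
        return "daily"
    return "expired"      # eligible for deletion by the rotation job
```

A rotation job would then delete only the backups classified as `expired` and leave all other categories untouched.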

6.6. Audits

Our internet exposure and our client applications are audited on a yearly basis at a white-box assessment level, where the tester has full knowledge of the system and network being tested, including the source code, network diagrams and configuration files. This allows the tester to perform a more comprehensive assessment: knowing how the system works and how vulnerabilities could be exploited, they can carry out penetration attempts or DoS attacks.

6.7. Data Recovery

All data uploaded to our platform are available for download on the platform. Upon request, the uploaded data can be packaged and sent back to the client before deletion. IDL then has 30 business days to deliver a zipped file with the required data.
In case of termination of the company's activity, a zipped file with Client data will be provided to every Client under licence at the date of termination of activity.

6.8. Continuity of Services

Our infrastructure is based on the principle of IaC (Infrastructure as Code). As such, our infrastructure is created by code and scripts grouped into projects that are versioned on our own internal Git servers. These projects are maintained, updated and tested regularly as our infrastructure expands.
These Git servers are in turn backed up, like any other server of our infrastructure, on different backup servers. We also have a secondary infrastructure, or disaster recovery site, with a minimal configuration composed of firewalls and Git servers containing all that is needed for an eventual redeployment of our servers.

As the first step of our disaster recovery plan, in the case of any damage to the data center, cyberattack or any other issue, an analysis will be made to determine the cause and measure the extent of the damage before any action is taken.

In the worst-case scenario, where the data center housing our infrastructure is lost and a complete reinstallation is needed, the plan steps are as follows:

  • to restore services to our customers,
  • to allow the restoration of services to our analysts,
  • to restore the rest of our services.

By switching to our secondary site (which can be recreated as needed from our backups), we launch infrastructure CI/CD pipelines to deploy the servers necessary for the first step. Then a second group of scripts repopulates these servers with data from our backups.
The same process is then repeated to complete steps two and three of our disaster recovery plan.
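
The ordering described above (deploy, then repopulate, one group of services at a time) can be sketched as a small loop. This is only an illustration of the sequence. The phase names are paraphrased from the three plan steps, and `deploy` and `repopulate` are hypothetical callables standing in for the CI/CD pipelines and the data restoration scripts.

```python
RECOVERY_PHASES = (
    "customer services",   # step one of the plan
    "analyst services",    # step two
    "remaining services",  # step three
)


def run_recovery(deploy, repopulate, phases=RECOVERY_PHASES):
    """Restore each group of services in priority order.

    deploy(phase) stands in for launching the infrastructure pipelines,
    repopulate(phase) for restoring data from backups onto the new servers.
    A failure in either call stops the sequence, so a later phase never
    starts before an earlier one has completed.
    """
    completed = []
    for phase in phases:
        deploy(phase)
        repopulate(phase)
        completed.append(phase)
    return completed
```

Passing the two operations in as callables keeps the ordering logic separate from the tooling, so the sequence can be rehearsed with stand-in functions during a recovery drill.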

 

7. CRITICAL SUPPLIERS

7.1. Dependency on critical IT suppliers and remediation actions

Supplier: OVH
Description: OVH is our hosting provider. It owns 30 data centers in 19 countries and hosts more than 300,000 servers.
Remediation: In case of a failed data center, we are able to redeploy all the infrastructure needed to carry on our clients' operations in a different OVH data center.

 

The OVH security policy has been shared with IDL. The document is confidential, but its principles are summarized here:

https://corporate.ovhcloud.com/en/trusted-cloud/security-certifications/