# Latest Developments

TreeKipedia is currently in the Data Collection and Cleaning phase. We have gathered over 12,000,000 tree species occurrence records from global sources such as GBIF (Global Biodiversity Information Facility), iNaturalist, iDigBio, and natural history museums. Our team is actively cleaning this data to ensure its accuracy by removing duplicates, correcting errors, and standardizing formats.

### **Key Activities in This Stage:**

**Data Cleaning:** Ensuring all collected data is accurate and consistent. This involves removing duplicate records, correcting errors, and standardizing data formats to create a reliable foundation for the TreeKipedia database.
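Duplicate removal can be sketched in Python. The snippet assumes each record is a dict with Darwin Core-style keys (`scientificName`, `decimalLatitude`, `decimalLongitude`, `eventDate`). It treats two records as duplicates when the normalized name, the coordinates rounded to four decimal places, and the date all match. This matching rule and the helper names are hypothetical illustrations, not TreeKipedia's actual deduplication criteria.

```python
from typing import Iterable


def dedup_key(record: dict) -> tuple:
    """Build the key used to detect duplicates (hypothetical matching rule)."""
    # Collapse repeated whitespace and ignore case in the species name.
    name = " ".join(str(record["scientificName"]).split()).lower()
    # Round coordinates so tiny precision differences between sources still match.
    lat = round(float(record["decimalLatitude"]), 4)
    lon = round(float(record["decimalLongitude"]), 4)
    return (name, lat, lon, record["eventDate"])


def remove_duplicates(records: Iterable[dict]) -> list:
    """Return the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for rec in records:
        key = dedup_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


records = [
    {"scientificName": "Quercus robur", "decimalLatitude": "51.50001",
     "decimalLongitude": "-0.12", "eventDate": "2021-05-01", "source": "GBIF"},
    {"scientificName": "quercus  robur", "decimalLatitude": 51.50004,
     "decimalLongitude": -0.12, "eventDate": "2021-05-01", "source": "iNaturalist"},
    {"scientificName": "Fagus sylvatica", "decimalLatitude": 48.1,
     "decimalLongitude": 11.6, "eventDate": "2022-06-10", "source": "iDigBio"},
]
unique = remove_duplicates(records)
print([r["source"] for r in unique])  # ['GBIF', 'iDigBio']
```

Keeping the first record seen means the source listed first wins when the same observation appears in several providers. A rounding-based rule like this can still miss near-duplicates that straddle a rounding boundary.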

**Column Selection:** We are in the process of selecting the most relevant columns from each dataset, focusing on providing essential information for various user groups, such as observers, planters, and data validators.
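Per-group column selection can be illustrated with a small lookup. The sketch below maps each user group to the columns it needs, takes the union for a set of groups, and projects raw records onto it. The group-to-column mapping is entirely hypothetical (the real selection is still in progress), though the column names are standard Darwin Core terms.

```python
# Hypothetical mapping from user group to the Darwin Core columns it needs;
# the real column selection is still in progress.
USER_GROUP_COLUMNS = {
    "observer": ["scientificName", "decimalLatitude", "decimalLongitude", "eventDate"],
    "planter": ["scientificName", "countryCode"],
    "validator": ["occurrenceID", "scientificName", "basisOfRecord", "eventDate"],
}


def columns_for(groups: list) -> list:
    """Union of the columns needed by the given groups, in first-seen order."""
    columns = []
    for group in groups:
        for col in USER_GROUP_COLUMNS[group]:
            if col not in columns:
                columns.append(col)
    return columns


def select_columns(record: dict, columns: list) -> dict:
    """Keep only the chosen columns; missing ones become None."""
    return {col: record.get(col) for col in columns}


raw = {
    "occurrenceID": "example-1", "scientificName": "Quercus robur",
    "decimalLatitude": 51.5, "decimalLongitude": -0.12, "eventDate": "2021-05-01",
    "countryCode": "GB", "basisOfRecord": "HUMAN_OBSERVATION", "license": "CC_BY_4_0",
}
cols = columns_for(["observer", "validator"])
print(select_columns(raw, cols))
```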

**Data Integration:** Preparing the cleaned and selected data for integration into the TreeKipedia platform. This step is crucial for creating a cohesive and functional database that supports all user activities.
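Integrating several providers mostly means renaming each source's fields into one shared schema. In the sketch below, the field mappings are assumptions for illustration: the GBIF side uses Darwin Core terms, the iNaturalist side assumes its CSV export column names, and the target schema (`recordID`, `latitude`, and so on) is hypothetical rather than TreeKipedia's actual model.

```python
# Assumed source field names; iNaturalist columns follow its CSV export
# naming, and both mappings are illustrative rather than TreeKipedia's schema.
FIELD_MAPS = {
    "gbif": {
        "gbifID": "recordID", "scientificName": "scientificName",
        "decimalLatitude": "latitude", "decimalLongitude": "longitude",
        "eventDate": "eventDate",
    },
    "inaturalist": {
        "id": "recordID", "scientific_name": "scientificName",
        "latitude": "latitude", "longitude": "longitude",
        "observed_on": "eventDate",
    },
}


def to_common(source: str, record: dict) -> dict:
    """Rename one source record's fields into the shared schema."""
    mapping = FIELD_MAPS[source]
    row = {target: record.get(field) for field, target in mapping.items()}
    # Prefix IDs with the source so IDs from different providers never collide.
    row["recordID"] = f"{source}:{row['recordID']}"
    row["source"] = source
    return row


def integrate(batches: dict) -> list:
    """Flatten per-source batches into one list of common-schema rows."""
    return [to_common(source, rec) for source, recs in batches.items() for rec in recs]
```

Prefixing each ID with its source keeps records traceable to their provider after merging.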

### Completed Steps

#### Step 1: Data Acquisition and Ingestion

**Data Source Selection:** Prioritize suitable data sources like GBIF, Wikidata, and other relevant repositories.

**Data Extraction:** Develop web scraping techniques to extract the desired tree-related data from these sources.

<div><figure><img src="/files/FttwmVAByZ5t2iwMA1OP" alt=""><figcaption></figcaption></figure> <figure><img src="/files/hQHWcuOhHUbiS3zI37UG" alt=""><figcaption></figcaption></figure></div>
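The extraction step describes web scraping. For GBIF specifically, a public REST API also exists, and the sketch below swaps in that approach. It pages through the occurrence search endpoint (`https://api.gbif.org/v1/occurrence/search`) using its `limit` and `offset` parameters and stops when a response reports `endOfRecords`. The page size, the record cap, and the injectable `fetch` function (used so the paging logic can be tested offline) are assumptions for illustration. This is not a description of the project's scraper.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def fetch_json(url: str) -> dict:
    """Download one page of results and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_occurrences(
    params: dict,
    page_size: int = 100,
    max_records: int = 1000,
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield occurrence records page by page until the API reports the end
    of the result set or max_records have been yielded."""
    offset = 0
    while offset < max_records:
        query = urllib.parse.urlencode({**params, "limit": page_size, "offset": offset})
        page = fetch(f"{GBIF_OCCURRENCE_SEARCH}?{query}")
        results = page.get("results", [])
        # Never yield more than the remaining budget.
        yield from results[: max_records - offset]
        if page.get("endOfRecords", True) or not results:
            break
        offset += len(results)
```

For example, `iter_occurrences({"scientificName": "Quercus robur"}, max_records=500)` would request pages over the network and yield up to 500 raw occurrence dicts for that species.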

**Data Cleaning and Standardization:** Clean and standardize the extracted data to ensure consistency, accuracy, and compatibility.
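A minimal standardization pass might normalize names, dates, and coordinates together. The rules below are assumptions chosen for illustration:

* Names are reduced to "Genus epithet", so authorship, infraspecific ranks, and hybrid markers are not handled.
* Dates are converted to ISO 8601, trying two hypothetical fallback layouts.
* Coordinates outside valid ranges, or exactly at (0, 0), are rejected.

```python
from datetime import datetime
from typing import Optional

# Non-ISO date layouts assumed to appear in source data (day-first, then year-first).
FALLBACK_DATE_FORMATS = ("%d/%m/%Y", "%Y/%m/%d")


def normalize_name(raw: str) -> Optional[str]:
    """Reduce a name to 'Genus epithet', dropping authorship (assumed rule)."""
    parts = raw.split()
    if len(parts) < 2:
        return None
    return f"{parts[0].capitalize()} {parts[1].lower()}"


def normalize_date(raw: str) -> Optional[str]:
    """Convert a date string to ISO 8601 (YYYY-MM-DD), or None if unparseable."""
    try:
        return datetime.fromisoformat(raw).date().isoformat()
    except ValueError:
        pass
    for fmt in FALLBACK_DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def valid_coordinates(lat: float, lon: float) -> bool:
    """Reject out-of-range values and the (0, 0) placeholder often left by errors."""
    in_range = -90 <= lat <= 90 and -180 <= lon <= 180
    return in_range and not (lat == 0 and lon == 0)


def standardize(record: dict) -> Optional[dict]:
    """Return a standardized copy of the core fields, or None if any is invalid."""
    name = normalize_name(str(record.get("scientificName", "")))
    event_date = normalize_date(str(record.get("eventDate", "")))
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
    except (KeyError, TypeError, ValueError):
        return None
    if name is None or event_date is None or not valid_coordinates(lat, lon):
        return None
    return {"scientificName": name, "eventDate": event_date,
            "decimalLatitude": lat, "decimalLongitude": lon}
```

Returning `None` for invalid records lets a caller count or log rejections separately instead of silently dropping them.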

**Data Ingestion:** Store the cleaned data in a suitable data storage system, such as GraphDB.
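GraphDB stores RDF, so ingestion implies turning each record into triples. The sketch below serializes records as N-Triples, one `dwc:Occurrence` node per record, using the Darwin Core terms namespace. The `example.org` base IRI, the choice of fields, and typing coordinates as `xsd:double` are assumptions, not the project's actual data model.

```python
import urllib.parse

DWC = "http://rs.tdwg.org/dwc/terms/"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespace for TreeKipedia occurrence IRIs.
BASE = "http://example.org/treekipedia/occurrence/"


def escape(text: str) -> str:
    """Escape characters that N-Triples string literals do not allow raw."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def literal(value) -> str:
    """Render a Python value as an N-Triples literal."""
    if isinstance(value, float):
        return f'"{value!r}"^^<{XSD}double>'
    return f'"{escape(str(value))}"'


def record_to_ntriples(record: dict) -> list:
    """One dwc:Occurrence node per record, one triple per non-empty field."""
    subject = f"<{BASE}{urllib.parse.quote(str(record['recordID']), safe='')}>"
    triples = [f"{subject} <{RDF_TYPE}> <{DWC}Occurrence> ."]
    for field in ("scientificName", "eventDate", "decimalLatitude", "decimalLongitude"):
        if record.get(field) is not None:
            triples.append(f"{subject} <{DWC}{field}> {literal(record[field])} .")
    return triples
```

Joining the lines for all records with newlines and saving them to a `.nt` file produces input that GraphDB can import, since it accepts N-Triples among other RDF serializations.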

#### Step 2: Ontology Development and Data Enrichment

**Ontology Creation:** Develop a comprehensive ontology that defines the terms, concepts, and relationships relevant to tree data. This ontology provides a structured framework for organizing and understanding the information.

<div><figure><img src="/files/ex805EFDPlG5FeeCHslh" alt=""><figcaption></figcaption></figure> <figure><img src="/files/Pt5l5N3jXyAZwZmLTCc4" alt=""><figcaption></figcaption></figure> <figure><img src="/files/2Ct24oVUbQ92yDTBkIWw" alt=""><figcaption></figcaption></figure></div>
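The core of an ontology's class structure can be illustrated in plain Python. The sketch mirrors the semantics of `rdfs:subClassOf` (transitive class hierarchy) and `rdfs:domain` (which classes a property applies to). Every class and property name here is a hypothetical example, not a term from TreeKipedia's ontology. In a triplestore with RDFS reasoning enabled, such as GraphDB, this inference would be performed by the store itself.

```python
from collections import deque

# Hypothetical class hierarchy: child class -> direct parent classes.
SUBCLASS_OF = {
    "Tree": ["Plant"],
    "BroadleafTree": ["Tree"],
    "ConiferTree": ["Tree"],
    "OakTree": ["BroadleafTree"],
    "PineTree": ["ConiferTree"],
}

# Hypothetical properties and the class each applies to (its domain).
PROPERTY_DOMAIN = {
    "hasLeafType": "Tree",
    "hasConeType": "ConiferTree",
}


def ancestors(cls: str) -> set:
    """All superclasses of cls, following subclass links transitively (BFS)."""
    seen = set()
    queue = deque(SUBCLASS_OF.get(cls, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(SUBCLASS_OF.get(parent, []))
    return seen


def is_a(cls: str, target: str) -> bool:
    """True if cls is target or one of its subclasses."""
    return cls == target or target in ancestors(cls)


def property_applies(prop: str, cls: str) -> bool:
    """A property applies to every subclass of its domain."""
    return is_a(cls, PROPERTY_DOMAIN[prop])
```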

**Data Enrichment:** Use the ontology to annotate and enrich the ingested data with additional context and meaning, such as taxonomic information, geographic coordinates, or other relevant details.
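Taxonomic enrichment can be sketched as a lookup keyed on genus that adds higher ranks to each record. The hard-coded table is a hypothetical stand-in: in a full pipeline, these values would come from a taxonomic backbone and be attached through ontology-defined relationships rather than plain dict fields.

```python
# Hypothetical genus-level lookup; a real pipeline would query a taxonomic
# backbone instead of hard-coding values.
GENUS_TAXONOMY = {
    "Quercus": {"family": "Fagaceae", "order": "Fagales"},
    "Fagus": {"family": "Fagaceae", "order": "Fagales"},
    "Pinus": {"family": "Pinaceae", "order": "Pinales"},
}


def enrich(record: dict) -> dict:
    """Return a copy of the record with genus, family, and order added."""
    enriched = dict(record)  # do not mutate the input record
    genus = record["scientificName"].split()[0]
    enriched["genus"] = genus
    # Unknown genera get explicit None values so gaps are easy to find later.
    enriched.update(GENUS_TAXONOMY.get(genus, {"family": None, "order": None}))
    return enriched
```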

#### Step 3: Data Processing and Analysis

**Data Transformation:** Apply transformations to the data, such as cleaning, normalization, and feature engineering, using RStudio.
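The project performs this step in RStudio, so the snippet below is only a Python illustration of the two named techniques, not a port of the project's R code. It derives year, month, and absolute latitude from cleaned records as example features (the feature choice is an assumption). It then applies min-max normalization, x' = (x - min) / (max - min), to the latitude feature.

```python
from datetime import date


def min_max(values: list) -> list:
    """Scale values linearly into [0, 1]; constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def engineer_features(records: list) -> list:
    """Derive simple model features from cleaned occurrence records."""
    rows = []
    for rec in records:
        observed = date.fromisoformat(rec["eventDate"])  # expects YYYY-MM-DD
        rows.append({
            "year": observed.year,
            "month": observed.month,
            # Distance from the equator in degrees, ignoring hemisphere.
            "abs_latitude": abs(rec["decimalLatitude"]),
        })
    if rows:
        scaled = min_max([row["abs_latitude"] for row in rows])
        for row, value in zip(rows, scaled):
            row["abs_latitude_scaled"] = value
    return rows
```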

**Data Analysis:** We plan to use statistical methods, machine learning algorithms, or other analytical techniques to extract insights and patterns from the data. This may involve tasks such as species identification, habitat analysis, or trend analysis.
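As one simple example of trend analysis, the sketch counts a species' records per year and fits an ordinary least-squares slope (records per year). It is a hypothetical illustration, not the project's planned method. Years with no records are simply absent rather than counted as zero.

```python
from collections import Counter


def yearly_counts(records: list, species: str) -> Counter:
    """Count records of one species per year (years with no records are absent)."""
    return Counter(
        int(rec["eventDate"][:4]) for rec in records if rec["scientificName"] == species
    )


def trend_slope(counts: Counter) -> float:
    """Ordinary least-squares slope of count versus year (records per year)."""
    years = sorted(counts)
    if len(years) < 2:
        raise ValueError("need at least two distinct years to fit a trend")
    ys = [counts[y] for y in years]
    mean_x = sum(years) / len(years)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, ys))
    return sxy / sxx
```

Raw occurrence counts also reflect observer effort and data availability. A positive slope therefore indicates more records, not necessarily more trees.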

### Additional Supporting Images

Database filtering process:

<figure><img src="/files/y3XS5nnZvtHbsXkyc0SN" alt=""><figcaption></figcaption></figure>

Graph Database Setup:

<div><figure><img src="/files/JwJwcUIQsYATmW9mGLfR" alt=""><figcaption></figcaption></figure> <figure><img src="/files/N0JHRzMkSJTzsCJ2nK2A" alt=""><figcaption></figcaption></figure></div>


