Latest Developments
Current Stage: 2024 Q3 - Data Collection and Cleaning
Last updated
TreeKipedia is currently in the Data Collection and Cleaning phase. We have gathered more than 12,000,000 tree species occurrence records from global sources such as GBIF (Global Biodiversity Information Facility), iNaturalist, iDigBio, and natural history museums. Our team is now cleaning this data to ensure its accuracy by removing duplicates, correcting errors, and standardizing formats.
Data Cleaning: Ensuring all collected data is accurate and consistent. This involves removing duplicate records, correcting errors, and standardizing data formats to create a reliable foundation for the TreeKipedia database.
Column Selection: We are in the process of selecting the most relevant columns from each dataset, focusing on providing essential information for various user groups, such as observers, planters, and data validators.
Data Integration: Preparing the cleaned and selected data for integration into the TreeKipedia platform. This step is crucial for creating a cohesive and functional database that supports all user activities.
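The cleaning and column-selection steps above can be sketched in a few lines of Python. This is a minimal illustration, not the TreeKipedia pipeline itself. The field names follow Darwin Core terms (`scientificName`, `decimalLatitude`, `decimalLongitude`, `eventDate`), the `source` field and the `KEEP_COLUMNS` subset are hypothetical, and the duplicate rule (same name, coordinates, and date) is an assumption.

```python
from typing import Iterable, List, Optional

# Hypothetical column subset; the columns TreeKipedia actually keeps are not stated.
KEEP_COLUMNS = ("scientificName", "decimalLatitude", "decimalLongitude", "eventDate", "source")


def standardize(record: dict) -> Optional[dict]:
    """Return a normalized copy of one occurrence record, or None if it is unusable."""
    # Collapse repeated whitespace and capitalize the genus initial.
    name = " ".join(str(record.get("scientificName") or "").split())
    if not name:
        return None
    try:
        lat = round(float(record["decimalLatitude"]), 5)
        lon = round(float(record["decimalLongitude"]), 5)
    except (KeyError, TypeError, ValueError):
        return None
    # Reject coordinates outside the valid geographic range.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    cleaned = {
        "scientificName": name[0].upper() + name[1:],
        "decimalLatitude": lat,
        "decimalLongitude": lon,
        "eventDate": str(record.get("eventDate") or "").strip() or None,
        "source": record.get("source"),
    }
    # Column selection: keep only the agreed subset, in a fixed order.
    return {column: cleaned[column] for column in KEEP_COLUMNS}


def clean(records: Iterable[dict]) -> List[dict]:
    """Standardize records and drop duplicates, keeping the first occurrence."""
    seen = set()
    result = []
    for raw in records:
        record = standardize(raw)
        if record is None:
            continue
        # Assumed duplicate rule: same name, coordinates, and date.
        key = (
            record["scientificName"].lower(),
            record["decimalLatitude"],
            record["decimalLongitude"],
            record["eventDate"],
        )
        if key in seen:
            continue
        seen.add(key)
        result.append(record)
    return result
```

Calling `clean()` on a mixed list of GBIF and iNaturalist records would return one normalized row per unique observation. A production pipeline would probably need a fuzzier duplicate rule, since the same observation can arrive from two sources with slightly different coordinates.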
Data Source Selection: Prioritize suitable data sources such as GBIF, Wikidata, and other relevant repositories.
Data Extraction: Develop web scraping techniques to extract the desired tree-related data from these sources.
Data Cleaning and Standardization: Clean and standardize the extracted data to ensure consistency, accuracy, and compatibility.
Data Ingestion: Store the cleaned data in a suitable data storage system, such as GraphDB.
Ontology Creation: Develop a comprehensive ontology that defines the terms, concepts, and relationships relevant to tree data. This ontology provides a structured framework for organizing and understanding the information.
Data Enrichment: Use the ontology to annotate and enrich the ingested data with additional context and meaning, for example by adding taxonomic information, geographical coordinates, or other relevant details.
Data Transformation: Apply transformations to the data, such as cleaning, normalization, and feature engineering, using RStudio.
Data Analysis: We plan to use statistical methods, machine learning algorithms, or other analytical techniques to extract insights and patterns from the data. This might involve tasks like species identification, habitat analysis, or trend analysis.
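To make the ontology, enrichment, and ingestion steps above more concrete, here is a minimal sketch of how one cleaned occurrence record might be expressed as RDF triples in N-Triples format, which triple stores such as GraphDB can import. It is an illustration under stated assumptions, not TreeKipedia's actual ontology. The base IRI `https://example.org/treekipedia/occurrence/` is hypothetical, and reusing Darwin Core terms as predicates is an assumption rather than a documented design decision.

```python
from typing import List
from urllib.parse import quote

DWC = "http://rs.tdwg.org/dwc/terms/"  # Darwin Core term namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
# Hypothetical base IRI; the real TreeKipedia namespace is not given in the source.
BASE = "https://example.org/treekipedia/occurrence/"


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples string literals require escaping."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def occurrence_to_ntriples(occurrence_id: str, record: dict) -> List[str]:
    """Turn one cleaned occurrence record into a list of N-Triples lines."""
    # Percent-encode the identifier so the subject is a valid IRI.
    subject = f"<{BASE}{quote(occurrence_id, safe='')}>"
    name = escape_literal(record["scientificName"])
    triples = [
        f"{subject} <{RDF_TYPE}> <{DWC}Occurrence> .",
        f'{subject} <{DWC}scientificName> "{name}" .',
    ]
    for field in ("decimalLatitude", "decimalLongitude"):
        # Fixed-point formatting avoids exponent notation, which xsd:decimal forbids.
        value = f"{float(record[field]):f}"
        triples.append(f'{subject} <{DWC}{field}> "{value}"^^<{XSD_DECIMAL}> .')
    return triples
```

Writing the returned lines to a `.nt` file, one per line, gives a file that can be loaded through the triple store's import tooling. Enrichment then amounts to adding further triples about the same subject, such as links to taxonomic identifiers.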
Database filtering process:
Graph Database Setup: