Arachne 1.0
Arachne - the perpetual stitcher of Wikidata entities.
Accumulates entity IDs into per-kind batches and organizes groups.
#include <include/arachne.hpp>

Private Member Functions | |
| void | select_group (std::string name) |
| Select an existing group or create it on demand. | |
| bool | enqueue (std::string_view id, corespace::entity_kind kind, bool interactive) const |
| Decide whether an entity should be enqueued for fetching. | |
| bool | touch_entity (const std::string &id_with_prefix) noexcept |
| Increment the touch counter for a single full ID (prefix REQUIRED). | |
| size_t | add_entity (const std::string &id_with_prefix, bool force=false, std::string name="") |
| Enqueue a full (prefixed) ID string and add it to a group. | |
Static Private Member Functions | |
| static bool | ask_update (std::string_view id, corespace::entity_kind kind, std::chrono::milliseconds age) |
| Placeholder for interactive staleness confirmation. | |
Private Attributes | |
| std::array< std::unordered_set< std::string >, batched_kind_count > | main_batches |
| std::array< std::unordered_set< std::string >, batched_kind_count > | extra_batches |
| std::unordered_map< std::string, std::unordered_set< std::string > > | groups |
| std::unordered_map< std::string, int > | candidates |
| const size_t | batch_threshold = 50 |
| Typical unauthenticated entity-per-request cap. | |
| const int | candidates_threshold = 50 |
| Intentional high bar for curiosity-driven candidates. | |
| std::string | current_group |
| std::chrono::milliseconds | staleness_threshold = 24h |
| corespace::interface | ui = corespace::interface::command_line |
| pheidippides | phe_client |
Public API | |
| bool | new_group (std::string name="") |
| Create or select a group and make it current. | |
| size_t | add_ids (std::span< const int > ids, corespace::entity_kind kind, std::string name="") |
| Enqueue numeric IDs with a given kind and add them to a group. | |
| int | touch_ids (std::span< const int > ids, corespace::entity_kind kind) |
| Batch variant of touch for numeric IDs. | |
| bool | flush (corespace::entity_kind kind=corespace::entity_kind::any) |
| Flush (send) up to batch_threshold entities of a specific kind. | |
| int | queue_size (corespace::entity_kind kind) const noexcept |
| Get the number of queued (pending) entities tracked in the main batch containers. | |
| static std::string | entity_root (const std::string &id) |
| Extract the lexeme root from a full ID string. | |
| static corespace::entity_kind | identify (const std::string &entity) noexcept |
| Determine the kind of a full ID string. | |
| static bool | parse_id (const std::string &entity, size_t &pos, int &id) |
| Parse a full ID string and extract the numeric portion. | |
| static std::string | normalize (int id, corespace::entity_kind kind) |
| Normalize a numeric ID with the given kind to a prefixed string. | |
Accumulates entity IDs into per-kind batches and organizes groups.
Invariants:
Definition at line 47 of file arachne.hpp.
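| size_t arachnespace::arachne::add_entity | ( | const std::string & | id_with_prefix, |
| bool | force = false, | ||
| std::string | name = "" ) |
private |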
Enqueue a full (prefixed) ID string and add it to a group.
The ID must include its prefix (e.g., "Q123", "L77-F2"). It is validated via identify(); an invalid ID or an unknown prefix throws std::invalid_argument. For "L...-F..."/"L...-S..." IDs, the group receives the verbatim string while the batch queue stores the lexeme root ("L...") so that fetches target the parent lexeme.
| id_with_prefix | Full ID with prefix. |
| force | If true, bypass freshness/existence checks and enqueue anyway. |
| name | Group name; empty targets the current/anonymous group (auto-created if needed). |
| std::invalid_argument | if the ID is invalid or has an unknown prefix. |
Definition at line 235 of file arachne.cpp.
References entity_root(), flush(), identify(), and select_group().
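The split between what a group stores and what the queue stores can be sketched as follows. This is a minimal illustration, not the class's code: `stitch_sketch`, `fetch_target`, and `add_full_id` are hypothetical names, and the sketch assumes the ID has already been validated.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the fetch queue and a single group.
struct stitch_sketch {
    std::unordered_set<std::string> queue;  // IDs that will be fetched
    std::unordered_set<std::string> group;  // IDs exactly as the caller gave them
};

// For a form/sense ID such as "L77-F2" returns the parent lexeme "L77";
// any other ID is returned unchanged. Assumes a previously validated ID.
inline std::string fetch_target(const std::string& id) {
    const auto dash = id.find('-');
    if (!id.empty() && id[0] == 'L' && dash != std::string::npos)
        return id.substr(0, dash);
    return id;
}

// The group keeps the verbatim ID; the queue keeps the fetchable root.
inline void add_full_id(stitch_sketch& s, const std::string& id) {
    s.group.insert(id);
    s.queue.insert(fetch_target(id));
}
```

Because the queue is a set, adding several forms of the same lexeme still yields a single pending fetch for that lexeme.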

| size_t arachnespace::arachne::add_ids | ( | std::span< const int > | ids, |
| corespace::entity_kind | kind, | ||
| std::string | name = "" ) |
Enqueue numeric IDs with a given kind and add them to a group.
Numeric IDs are normalized by adding the kind prefix.
If kind is form or sense, normalization maps to the lexeme prefix ("L<id>"); no warning is emitted yet (logging TODO).
| ids | Span of numeric IDs. |
| kind | Entity kind (must NOT be any/unknown). |
| name | Group name; empty targets the current/anonymous group (auto-created if needed). |
| std::invalid_argument | if kind is any/unknown. |
Definition at line 42 of file arachne.cpp.
References corespace::any, select_group(), and corespace::unknown.

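| bool arachnespace::arachne::ask_update | ( | std::string_view | id, |
| corespace::entity_kind | kind, | ||
| std::chrono::milliseconds | age ) |
static private |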
Placeholder for interactive staleness confirmation.
The current implementation is non-interactive and always returns false. A future version is expected to prompt the user when cached data is stale and return the user's decision.
| id | Entity identifier under consideration. |
| kind | Detected kind of the entity. |
| age | Age of the cached entry. |
Definition at line 194 of file arachne.cpp.
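| bool arachnespace::arachne::enqueue | ( | std::string_view | id, |
| corespace::entity_kind | kind, | ||
| bool | interactive ) const |
private |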
Decide whether an entity should be enqueued for fetching.
This placeholder implementation always returns true, effectively requesting a fetch for every entity. The expected behavior is to consult storage state (exist, last) and return true only when an update is required.
| id | Canonical identifier (e.g., "Q123" or "L7"). |
| kind | Entity kind (lexeme for forms/senses). |
Definition at line 201 of file arachne.cpp.
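The expected (not yet implemented) decision could look roughly like the sketch below. It assumes storage can report whether a cached copy exists and when it was last fetched; `stored_state` and `needs_fetch` are hypothetical names, and the real storage interface is not shown on this page.

```cpp
#include <chrono>

using clock_type = std::chrono::system_clock;

// Hypothetical view of what storage knows about one entity.
struct stored_state {
    bool exists = false;             // is there any cached copy?
    clock_type::time_point last{};   // when the copy was last fetched
};

// Returns true only when a fetch is actually required.
inline bool needs_fetch(const stored_state& st,
                        clock_type::time_point now,
                        std::chrono::milliseconds staleness_threshold) {
    if (!st.exists) return true;                   // never fetched
    return (now - st.last) > staleness_threshold;  // cached copy is stale
}
```

With the documented default of 24h for staleness_threshold, a copy fetched an hour ago would be skipped and one fetched 25 hours ago would be refreshed.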
| std::string arachnespace::arachne::entity_root | ( | const std::string & | id | ) |
static |
Extract the lexeme root from a full ID string.
For IDs beginning with "L" followed by digits, returns "L<digits>". For other prefixes or malformed strings, returns an empty string.
| id | Identifier to inspect (e.g., "L7-F1"). |
Definition at line 74 of file arachne.cpp.
References corespace::any, corespace::form, identify(), corespace::sense, and corespace::unknown.
Referenced by add_entity(), and touch_entity().
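A minimal sketch of the documented contract follows. The real function also consults identify(); this illustration only inspects the leading "L<digits>" and does not validate any suffix. `lexeme_root_sketch` is a hypothetical name.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns "L<digits>" when `id` starts with 'L' followed by at least one
// digit; returns an empty string otherwise.
inline std::string lexeme_root_sketch(const std::string& id) {
    if (id.size() < 2 || id[0] != 'L') return {};
    std::size_t end = 1;
    while (end < id.size() &&
           std::isdigit(static_cast<unsigned char>(id[end]))) {
        ++end;
    }
    if (end == 1) return {};   // 'L' with no digits after it
    return id.substr(0, end);  // drops "-F1", "-S2", and similar suffixes
}
```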


| bool arachnespace::arachne::flush | ( | corespace::entity_kind | kind = corespace::entity_kind::any | ) |
Flush (send) up to batch_threshold entities of a specific kind.
For kind != any, attempts a single-batch flush for that kind (up to the threshold). For kind == any, a round-robin strategy over batchable kinds is used.
| kind | Entity kind selector or entity_kind::any. |
Definition at line 99 of file arachne.cpp.
Referenced by add_entity().
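One possible reading of the two flush modes is sketched below: `take_batch` drains at most a threshold's worth of IDs from one kind, and `flush_next` rotates a cursor over the kinds so that each call serves the next non-empty batch. Both names, and the cursor-based rotation itself, are assumptions; the page does not describe how the round-robin is implemented or how the batch is sent.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Removes up to `limit` IDs from `pending` and returns them as one batch.
inline std::vector<std::string> take_batch(
        std::unordered_set<std::string>& pending, std::size_t limit) {
    std::vector<std::string> batch;
    batch.reserve(std::min(limit, pending.size()));
    auto it = pending.begin();
    while (it != pending.end() && batch.size() < limit) {
        batch.push_back(*it);
        it = pending.erase(it);  // erase returns the next valid iterator
    }
    return batch;
}

// Round-robin over kinds: starting at `cursor`, drain the first non-empty
// batch and move the cursor past it. Returns an empty batch if all are empty.
template <std::size_t N>
std::vector<std::string> flush_next(
        std::array<std::unordered_set<std::string>, N>& batches,
        std::size_t& cursor, std::size_t limit) {
    for (std::size_t step = 0; step < N; ++step) {
        const std::size_t k = (cursor + step) % N;
        if (!batches[k].empty()) {
            cursor = (k + 1) % N;
            return take_batch(batches[k], limit);
        }
    }
    return {};
}
```

In the class itself the limit would be batch_threshold (50), and the returned batch would be handed to the fetch client.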

| corespace::entity_kind arachnespace::arachne::identify | ( | const std::string & | entity | ) |
static noexcept |
Determine the kind of a full ID string.
Accepts prefixed IDs (e.g., "Q123", "L77-F2"). Returns unknown if the string is not a valid ID. The function does not throw.
| entity | Full ID with prefix. |
Definition at line 122 of file arachne.cpp.
References corespace::form, corespace::lexeme, arachnespace::prefixes, corespace::sense, and corespace::unknown.
Referenced by add_entity(), entity_root(), and touch_entity().
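The classification can be illustrated with a small non-throwing parser. This is a sketch under assumptions: `kind_sketch` stands in for corespace::entity_kind, and the prefix set (Q, P, L) is chosen for illustration because the contents of arachnespace::prefixes are not shown here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for corespace::entity_kind.
enum class kind_sketch { item, property, lexeme, form, sense, unknown };

// Consumes one or more digits starting at `pos`; returns false if none.
inline bool eat_digits(const std::string& s, std::size_t& pos) noexcept {
    const std::size_t start = pos;
    while (pos < s.size() &&
           std::isdigit(static_cast<unsigned char>(s[pos]))) {
        ++pos;
    }
    return pos > start;
}

inline kind_sketch identify_sketch(const std::string& id) noexcept {
    if (id.empty()) return kind_sketch::unknown;
    std::size_t pos = 1;  // skip the one-letter prefix
    if (!eat_digits(id, pos)) return kind_sketch::unknown;
    if (pos == id.size()) {  // plain "<prefix><digits>"
        switch (id[0]) {
            case 'Q': return kind_sketch::item;
            case 'P': return kind_sketch::property;
            case 'L': return kind_sketch::lexeme;
            default:  return kind_sketch::unknown;
        }
    }
    // Only lexemes may carry a "-F<digits>" or "-S<digits>" suffix.
    if (id[0] != 'L' || id.size() < pos + 3 || id[pos] != '-')
        return kind_sketch::unknown;
    const char tag = id[pos + 1];
    pos += 2;
    if (!eat_digits(id, pos) || pos != id.size()) return kind_sketch::unknown;
    if (tag == 'F') return kind_sketch::form;
    if (tag == 'S') return kind_sketch::sense;
    return kind_sketch::unknown;
}
```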

| bool arachnespace::arachne::new_group | ( | std::string | name = "" | ) |
Create or select a group and make it current.
If name is empty, creates a new anonymous group with a random name and makes it current. If name exists, it becomes current but is NOT cleared. If it doesn't exist, the group is created and then selected.
| name | Group name or empty for an anonymous group. |
Definition at line 31 of file arachne.cpp.
References current_group, and corespace::random_hex().
Referenced by select_group().
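The create-or-select behavior can be sketched with a plain map of groups. Everything named here is hypothetical: `random_hex_sketch` is a stand-in for corespace::random_hex() (whose signature is not shown), the "g_" name prefix is invented, and the meaning given to the boolean result (true when a group was created) is an assumption.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

using group_map =
    std::unordered_map<std::string, std::unordered_set<std::string>>;

// Hypothetical replacement for corespace::random_hex().
inline std::string random_hex_sketch(std::size_t digits = 8) {
    static std::mt19937 gen{std::random_device{}()};
    std::uniform_int_distribution<int> nibble(0, 15);
    std::string out;
    for (std::size_t i = 0; i < digits; ++i)
        out += "0123456789abcdef"[nibble(gen)];
    return out;
}

// Makes `name` the current group, creating it if needed. An existing group
// keeps its members. Returns true if a new group was created (assumed).
inline bool select_or_create(group_map& groups, std::string& current,
                             std::string name = "") {
    if (name.empty()) name = "g_" + random_hex_sketch();  // anonymous group
    const bool created = groups.try_emplace(name).second;
    current = std::move(name);
    return created;
}
```

The sketch does not guard against a random name colliding with an existing group; a real implementation may need to.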


| std::string arachnespace::arachne::normalize | ( | int | id, |
| corespace::entity_kind | kind ) |
static |
Normalize a numeric ID with the given kind to a prefixed string.
Examples:
| id | Numeric identifier. |
| kind | Kind to prefix with (must not be any/unknown). |
| std::invalid_argument | if id is negative or kind is any/unknown. |
Definition at line 165 of file arachne.cpp.
References corespace::any, corespace::form, corespace::lexeme, and corespace::unknown.
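The normalization rule, including the form/sense-to-lexeme mapping described for add_ids() and touch_ids(), might be sketched as below. `prefix_kind` is a hypothetical stand-in for corespace::entity_kind, and the Q/P/L letters are assumed from the usual Wikidata prefixes rather than taken from this page.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for corespace::entity_kind.
enum class prefix_kind { item, property, lexeme, form, sense, any, unknown };

inline std::string normalize_sketch(int id, prefix_kind kind) {
    if (id < 0) throw std::invalid_argument("id must be non-negative");
    char prefix = 0;
    switch (kind) {
        case prefix_kind::item:     prefix = 'Q'; break;
        case prefix_kind::property: prefix = 'P'; break;
        case prefix_kind::lexeme:
        case prefix_kind::form:     // forms and senses collapse to the lexeme
        case prefix_kind::sense:    prefix = 'L'; break;
        default:
            throw std::invalid_argument("kind must not be any/unknown");
    }
    return prefix + std::to_string(id);
}
```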
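| bool arachnespace::arachne::parse_id | ( | const std::string & | entity, |
| size_t & | pos, | ||
| int & | id ) |
static |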
Parse a full ID string and extract the numeric portion.
| entity | Full ID (e.g., "Q123", "L7-F1", "L7-S2"). |
| pos | In/out index of the first digit within entity. On success the index is advanced past the number. |
| id | Out parameter for the parsed integer portion. |
Definition at line 149 of file arachne.cpp.
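The in/out `pos` contract can be illustrated with std::from_chars. This is a sketch, not the actual implementation: `parse_number_at` is a hypothetical name, and leaving `pos` and `id` untouched on failure is an assumption the page does not confirm.

```cpp
#include <charconv>
#include <cstddef>
#include <string>
#include <system_error>

// Parses the run of digits that starts at entity[pos]. On success stores the
// value in `id`, advances `pos` past the number, and returns true.
inline bool parse_number_at(const std::string& entity, std::size_t& pos,
                            int& id) {
    if (pos >= entity.size()) return false;
    const char* first = entity.data() + pos;
    const char* last = entity.data() + entity.size();
    if (*first < '0' || *first > '9') return false;  // reject signs, letters
    int value = 0;
    const auto res = std::from_chars(first, last, value);
    if (res.ec != std::errc{}) return false;         // e.g. out of range
    id = value;
    pos = static_cast<std::size_t>(res.ptr - entity.data());
    return true;
}
```

For "L7-F1" with `pos` starting at 1, the call yields 7 and leaves `pos` at 2, pointing at the '-' that introduces the form suffix.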
| int arachnespace::arachne::queue_size | ( | corespace::entity_kind | kind | ) const |
noexcept |
Get the number of queued (pending) entities tracked in the main batch containers.
| kind | Specific kind, or entity_kind::any to return the sum across all batchable kinds. |
Definition at line 107 of file arachne.cpp.
References corespace::any.
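A sketch of the counting rule over an array of per-kind sets, mirroring the shape of main_batches. The names are hypothetical, the number of batched kinds (3) is invented for the example, and a negative index stands in for entity_kind::any.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <unordered_set>

constexpr std::size_t kind_count_sketch = 3;  // assumed number of kinds
using batches_sketch =
    std::array<std::unordered_set<std::string>, kind_count_sketch>;

// kind_index < 0 plays the role of entity_kind::any: sum over all kinds.
inline int pending_count(const batches_sketch& main_batches,
                         int kind_index) noexcept {
    if (kind_index >= 0) {
        const auto k = static_cast<std::size_t>(kind_index);
        if (k >= main_batches.size()) return 0;  // not a batchable kind
        return static_cast<int>(main_batches[k].size());
    }
    std::size_t total = 0;
    for (const auto& batch : main_batches) total += batch.size();
    return static_cast<int>(total);
}
```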
| void arachnespace::arachne::select_group | ( | std::string | name | ) |
private |
Select an existing group or create it on demand.
An empty name selects/creates the anonymous group. A non-empty name is delegated to new_group, which creates the group if necessary.
| name | Group name to activate; empty targets the anonymous group. |
Definition at line 184 of file arachne.cpp.
References current_group, and new_group().
Referenced by add_entity(), and add_ids().


| bool arachnespace::arachne::touch_entity | ( | const std::string & | id_with_prefix | ) |
private noexcept |
Increment the touch counter for a single full ID (prefix REQUIRED).
If the entity is already queued or already has data, returns false (no increment). If the counter reaches candidates_threshold and the entity is not queued, it is moved into the queue. For "L…-F…"/"L…-S…", the exact ID is enqueued (no mapping).
| id_with_prefix | Full ID with prefix. |
Definition at line 224 of file arachne.cpp.
References entity_root(), and identify().
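The counter-and-promotion logic can be sketched as follows. The names are hypothetical, the "already has data" check is omitted because the storage interface is not shown, and removing the counter once an ID is promoted is an assumption.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

struct touch_sketch {
    std::unordered_set<std::string> queue;            // pending fetches
    std::unordered_map<std::string, int> candidates;  // touch counters
    int candidates_threshold = 50;                    // documented default
};

// Returns true if the counter for `id` was incremented.
inline bool touch(touch_sketch& s, const std::string& id) {
    if (s.queue.count(id) != 0) return false;  // already queued
    const int hits = ++s.candidates[id];       // starts from 0 on first touch
    if (hits >= s.candidates_threshold) {      // promote to the queue
        s.queue.insert(id);
        s.candidates.erase(id);
    }
    return true;
}
```

With the default threshold, an entity has to be touched 50 times before it is fetched, which matches the "intentional high bar" described for candidates_threshold.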

| int arachnespace::arachne::touch_ids | ( | std::span< const int > | ids, |
| corespace::entity_kind | kind ) |
Batch variant of touch for numeric IDs.
Each numeric ID is normalized using kind. If kind is form/sense, a warning is recorded and normalization yields "L<id>" (lexeme).
| ids | Span of numeric IDs. |
| kind | Normalization kind (must not be any/unknown). |
| std::invalid_argument | if kind is any/unknown. |
Definition at line 59 of file arachne.cpp.
References corespace::any, and corespace::unknown.
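| const size_t arachnespace::arachne::batch_threshold = 50 |
private |
Typical unauthenticated entity-per-request cap.
Definition at line 283 of file arachne.hpp.
| std::unordered_map< std::string, int > arachnespace::arachne::candidates |
private |
Definition at line 280 of file arachne.hpp.
| const int arachnespace::arachne::candidates_threshold = 50 |
private |
Intentional high bar for curiosity-driven candidates.
Definition at line 285 of file arachne.hpp.
| std::string arachnespace::arachne::current_group |
private |
Definition at line 290 of file arachne.hpp.
Referenced by new_group(), and select_group().
| std::array< std::unordered_set< std::string >, batched_kind_count > arachnespace::arachne::extra_batches |
private |
Definition at line 273 of file arachne.hpp.
| std::unordered_map< std::string, std::unordered_set< std::string > > arachnespace::arachne::groups |
private |
Definition at line 277 of file arachne.hpp.
| std::array< std::unordered_set< std::string >, batched_kind_count > arachnespace::arachne::main_batches |
private |
Definition at line 271 of file arachne.hpp.
| pheidippides arachnespace::arachne::phe_client |
private |
Definition at line 293 of file arachne.hpp.
| std::chrono::milliseconds arachnespace::arachne::staleness_threshold = 24h |
private |
Definition at line 291 of file arachne.hpp.
| corespace::interface arachnespace::arachne::ui = corespace::interface::command_line |
private |
Definition at line 292 of file arachne.hpp.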