Arachne 1.0
Arachne - the perpetual stitcher of Wikidata entities.
arachnespace::pheidippides Class Reference

Batch courier for Wikidata/Commons: collects IDs, issues HTTP requests, and returns a merged JSON payload.

#include <include/pheidippides.hpp>

Public Member Functions

nlohmann::json fetch_json (const std::unordered_set< std::string > &batch, corespace::entity_kind kind=corespace::entity_kind::any)
 Fetch metadata for a set of entity IDs and return a merged JSON object.
nlohmann::json sparql (const corespace::sparql_request &request)
 Execute a SPARQL query according to the provided request.
nlohmann::json wdqs (std::string query)
 Convenience wrapper to run a raw SPARQL query string.
const corespace::network_metrics & metrics_info () const
 Access aggregated network metrics of the underlying client.
corespace::call_preview preview (const corespace::sparql_request &request) const
 Produce a call preview describing the HTTP request that would be made.

Static Public Member Functions

static std::string join_str (std::span< const std::string > ids, std::string_view separator="|")
 Join a span of strings with a separator (no encoding or validation).

Private Member Functions

corespace::call_preview build_call_preview (const corespace::sparql_request &request) const

Private Attributes

corespace::options opt {}
 Request shaping parameters (chunking, fields, base params).
corespace::http_client client {}
 Reused HTTP client (not safe to share across threads).
corespace::wdqs_options wdqs_opt {}
 Defaults for WDQS calls (GET/POST length threshold, timeout, Accept override).

Detailed Description

Batch courier for Wikidata/Commons: collects IDs, issues HTTP requests, and returns a merged JSON payload.

Responsibilities:

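  • Pick the endpoint based on entity kind:
    • for M (MediaInfo): https://commons.wikimedia.org/w/api.php
    • for others: https://www.wikidata.org/w/api.php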
  • Build request parameters:
    • for E (EntitySchema): action=query, titles=EntitySchema:<id>, prop=<joined opt.prop>
    • for others: action=wbgetentities, ids=<id>|<id>..., props=<joined opt.props>
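  • Split the input set into chunks of at most opt.batch_threshold IDs.
  • Within each chunk, keep only IDs whose arachne::identify(id) equals the requested kind.
  • Merge the per-chunk JSON responses with nlohmann::json::merge_patch, skipping any response that is not a JSON object.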
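The parameter-building rules above can be sketched with the standard library alone. This is a simplified model, not the Arachne implementation: entity_kind, parameter_list, join(), planned_request and plan_chunk() are hypothetical stand-ins, and the base parameters that fetch_json() copies from opt.params are omitted.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real corespace::entity_kind and
// corespace::parameter_list may have more members or different definitions.
enum class entity_kind { any, item, property, lexeme, mediainfo, entity_schema };
using parameter_list = std::vector<std::pair<std::string, std::string>>;

// "|"-join, matching what pheidippides::join_str does by default.
std::string join(const std::vector<std::string>& parts, const std::string& sep = "|") {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) out += sep;
        out += parts[i];
    }
    return out;
}

struct planned_request {
    std::string url;
    parameter_list params;
};

// Endpoint and parameters for one chunk of IDs that already passed the kind filter.
// The base parameters from opt.params are omitted in this sketch.
planned_request plan_chunk(const std::vector<std::string>& ids, entity_kind kind,
                           const std::vector<std::string>& fields) {
    planned_request req;
    req.url = kind == entity_kind::mediainfo ? "https://commons.wikimedia.org/w/api.php"
                                             : "https://www.wikidata.org/w/api.php";
    if (kind == entity_kind::entity_schema) {
        // EntitySchemas are pages, so they go through action=query with prefixed titles.
        std::vector<std::string> titles;
        for (const auto& id : ids) titles.push_back("EntitySchema:" + id);
        req.params = {{"action", "query"}, {"titles", join(titles)}, {"prop", join(fields)}};
    } else {
        req.params = {{"action", "wbgetentities"}, {"ids", join(ids)}, {"props", join(fields)}};
    }
    return req;
}
```

For example, plan_chunk({"Q1", "Q2"}, entity_kind::item, {"labels", "claims"}) targets the Wikidata endpoint with ids=Q1|Q2 and props=labels|claims.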

Thread-safety:

  • Not thread-safe; the instance owns a reusable http_client (single easy handle). Use one instance per calling thread.
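One way to follow the "one instance per calling thread" rule is a thread_local instance. This is a sketch under assumptions: courier is a hypothetical stand-in for pheidippides, and fetch() is a placeholder for fetch_json()/sparql(); the pattern, not the type, is the point.

```cpp
#include <thread>
#include <vector>

// Hypothetical stand-in for pheidippides: a type that owns state (such as a
// single HTTP handle) that must not be shared between threads.
struct courier {
    int calls = 0;
    void fetch() { ++calls; }  // placeholder for fetch_json()/sparql()
};

// Returns the calling thread's own courier, constructed on first use in that thread.
courier& thread_courier() {
    thread_local courier instance;
    return instance;
}

// Runs `workers` threads; each makes `per_thread` calls and records how many
// calls its own courier saw. Each thread writes only its own slot of `seen`.
std::vector<int> run_workers(int workers, int per_thread) {
    std::vector<int> seen(workers, 0);
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&seen, w, per_thread] {
            for (int i = 0; i < per_thread; ++i) thread_courier().fetch();
            seen[w] = thread_courier().calls;
        });
    }
    for (auto& t : pool) t.join();
    return seen;
}
```

Because no instance is shared, every worker sees exactly its own call count.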
Note
The implementation issues one request per chunk even when filtering leaves the chunk empty, for example when kind == entity_kind::any (the default argument of fetch_json()). The server response for the empty identifier list is merged as-is.
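The empty-chunk behavior can be reproduced with a small model of the chunk-then-filter loop. Assumptions: identify() here is a hypothetical, prefix-only stand-in for arachne::identify() that never returns entity_kind::any, the enum and planned_ids() are illustrative, and a std::vector replaces the std::unordered_set so chunk order is deterministic.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical kinds; `unknown` is added only for this sketch.
enum class entity_kind { any, item, property, lexeme, mediainfo, entity_schema, unknown };

// Simplified, hypothetical stand-in for arachne::identify(): it looks only at the
// first letter and never returns entity_kind::any.
entity_kind identify(const std::string& id) {
    if (id.empty()) return entity_kind::unknown;
    switch (id.front()) {
        case 'Q': return entity_kind::item;
        case 'P': return entity_kind::property;
        case 'L': return entity_kind::lexeme;
        case 'M': return entity_kind::mediainfo;
        case 'E': return entity_kind::entity_schema;
        default: return entity_kind::unknown;
    }
}

// Returns the "ids" value of every request the loop would send:
// one entry per chunk, including chunks the filter emptied.
std::vector<std::string> planned_ids(const std::vector<std::string>& batch,
                                     entity_kind kind, std::size_t threshold) {
    std::vector<std::string> per_request;
    if (threshold == 0) return per_request;  // chunk size must be positive
    for (std::size_t start = 0; start < batch.size(); start += threshold) {
        const std::size_t end = std::min(start + threshold, batch.size());
        std::string ids;
        for (std::size_t i = start; i < end; ++i) {
            if (identify(batch[i]) != kind) continue;  // equality filter, as in fetch_json()
            if (!ids.empty()) ids += '|';
            ids += batch[i];
        }
        per_request.push_back(ids);  // pushed even when empty
    }
    return per_request;
}
```

With kind == entity_kind::any, every chunk comes back as an empty string, yet each one still corresponds to an HTTP request.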

Definition at line 59 of file pheidippides.hpp.

Member Function Documentation

◆ build_call_preview()

corespace::call_preview arachnespace::pheidippides::build_call_preview ( const corespace::sparql_request & request) const
private

Definition at line 143 of file pheidippides.cpp.

{
    using namespace corespace;

    call_preview preview;
    const auto& profile = get_service_profile(service_kind::wdqs);
    preview.url = profile.base_url;

    const std::size_t threshold
        = request.length_threshold == sparql_request::service_default
        ? wdqs_opt.length_threshold
        : request.length_threshold;

    const auto method = choose_http_method(request, threshold);
    preview.method = method;

    preview.timeout_sec
        = request.timeout_sec >= 0 ? request.timeout_sec : wdqs_opt.timeout_sec;

    preview.accept = resolve_accept(request, profile, wdqs_opt.accept_override);

    if (method == http_method::get) {
        preview.query_params.emplace_back("query", request.query);
        append_common_params(service_kind::wdqs, method, preview.query_params);
    } else {
        const auto [content_type, use_form_body]
            = resolve_body_strategy(request);

        preview.content_type = content_type;
        preview.use_form_body = use_form_body;
        if (preview.use_form_body) {
            preview.form_params.emplace_back("query", request.query);
            sort_parameters(preview.form_params);
        } else {
            preview.body = request.query;
        }
        append_common_params(service_kind::wdqs, method, preview.query_params);
    }

    return preview;
}
Helpers called in this listing (corespace):
  • const service_profile & get_service_profile(const service_kind kind): retrieves the service profile for a given service kind (utils.cpp:87).
  • http_method choose_http_method(const sparql_request &request, const std::size_t threshold): chooses the appropriate HTTP method for a SPARQL request (utils.cpp:42).
  • std::string resolve_accept(const sparql_request &request, const service_profile &profile, const std::string_view override_accept): resolves the Accept header value for a SPARQL request (utils.cpp:55).
  • std::pair< std::string, bool > resolve_body_strategy(const sparql_request &request): determines the body content type and whether the query is sent as a form body (utils.cpp:69).
  • void sort_parameters(parameter_list &params): sorts the parameter list in place by key (utils.cpp:96).
  • void append_common_params(const service_kind kind, const http_method method, parameter_list &params): appends the common parameters required for a service and HTTP method (utils.cpp:105).

References corespace::service_profile::base_url, corespace::call_preview::body, corespace::choose_http_method(), corespace::call_preview::content_type, corespace::get, corespace::get_service_profile(), corespace::wdqs_options::length_threshold, corespace::call_preview::method, corespace::sparql_request::query, corespace::call_preview::timeout_sec, corespace::sparql_request::timeout_sec, corespace::wdqs_options::timeout_sec, corespace::call_preview::url, corespace::call_preview::use_form_body, corespace::wdqs, and wdqs_opt.

Referenced by preview(), and sparql().
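The two fallback rules in the listing (a sentinel threshold falls back to wdqs_opt.length_threshold; a negative timeout falls back to wdqs_opt.timeout_sec) can be checked in isolation. The structs below are hypothetical mirrors: the actual value of sparql_request::service_default, the field types, and the default numbers are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical mirrors of the relevant fields; the real corespace types may differ.
struct sparql_request {
    // Sentinel assumed for this sketch; the real service_default value may differ.
    static constexpr std::size_t service_default = std::numeric_limits<std::size_t>::max();
    std::size_t length_threshold = service_default;
    int timeout_sec = -1;  // negative: fall back to the service default
};

struct wdqs_options {
    std::size_t length_threshold = 2048;  // illustrative value only
    int timeout_sec = 60;                 // illustrative value only
};

// Same rule as build_call_preview(): the sentinel selects the service-wide threshold.
std::size_t resolve_threshold(const sparql_request& r, const wdqs_options& o) {
    return r.length_threshold == sparql_request::service_default ? o.length_threshold
                                                                 : r.length_threshold;
}

// Same rule as build_call_preview(): any non-negative timeout is taken as explicit.
int resolve_timeout(const sparql_request& r, const wdqs_options& o) {
    return r.timeout_sec >= 0 ? r.timeout_sec : o.timeout_sec;
}
```

Note that a timeout of 0 counts as an explicit value and is passed through rather than replaced by the default.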

◆ fetch_json()

nlohmann::json arachnespace::pheidippides::fetch_json ( const std::unordered_set< std::string > & batch,
corespace::entity_kind kind = corespace::entity_kind::any )

Fetch metadata for a set of entity IDs and return a merged JSON object.

Behavior:

  • An empty batch returns an empty JSON object without issuing any request.
  • For kind == entity_kind::entity_schema, IDs are prefixed with EntitySchema: and fields come from opt.prop.
  • For other kinds, fields come from opt.props.
  • Within each chunk, only IDs for which arachne::identify(id) == kind are sent; if the filter removes every ID in a chunk, the request still executes with an empty identifier list and its response is merged.
  • Chunk responses are merged into a single object via merge_patch; responses that parse to a non-object JSON value are skipped.

Errors:

  • Transport or HTTP errors are handled by the internal http_client retry policy; terminal failures throw std::runtime_error.
  • Invalid JSON payloads propagate nlohmann::json::parse_error from nlohmann::json::parse.
Parameters
batch: Set of full IDs (e.g., "Q123", "L7-F1", "E42").
kind: Target entity kind (selects API, fields, and filtering).
Returns
Merged JSON object with fetched data.

Definition at line 29 of file pheidippides.cpp.

{
    if (batch.empty()) {
        return nlohmann::json::object();
    }
    std::string url
        (kind != corespace::entity_kind::mediainfo
             ? "https://www.wikidata.org/w/api.php"
             : "https://commons.wikimedia.org/w/api.php");
    std::string props
        (kind != corespace::entity_kind::entity_schema ? join_str(opt.props)
                                                       : join_str(opt.prop));

    corespace::parameter_list base_params { opt.params };
    if (kind == corespace::entity_kind::entity_schema) {
        base_params.emplace_back("action", "query");
    } else {
        base_params.emplace_back("action", "wbgetentities");
    }

    std::string prefix {};
    if (kind == corespace::entity_kind::entity_schema) {
        prefix = "EntitySchema:";
    }
    nlohmann::json combined = nlohmann::json::object();
    for (auto&& chunk : batch | std::views::chunk(opt.batch_threshold)) {
        std::vector<std::string> chunk_vec;
        for (const auto& id : chunk) {
            if (arachne::identify(id) != kind) {
                continue;
            }
            chunk_vec.emplace_back(prefix + id);
        }
        corespace::parameter_list params { base_params };
        auto entities = join_str(chunk_vec);

        if (kind == corespace::entity_kind::entity_schema) {
            params.emplace_back("titles", entities);
            params.emplace_back("prop", props);
        } else {
            params.emplace_back("ids", entities);
            params.emplace_back("props", props);
        }
        auto r = client.get(url, params);
        auto data = nlohmann::json::parse(r.text, nullptr, true);
        if (!data.is_object()) {
            continue;
        }
        combined.merge_patch(data);
    }
    return combined;
}
Symbols used in this listing:
  • static corespace::entity_kind arachne::identify(const std::string &entity) noexcept: determines the kind of a full ID string (arachne.cpp:122).
  • corespace::entity_kind::mediainfo: IDs prefixed with 'M' (utils.hpp:51).
  • corespace::entity_kind::entity_schema: IDs prefixed with 'E' (utils.hpp:52).
  • corespace::parameter_list, an alias for std::vector< parameter >: ordered list of query parameters appended to the URL (utils.hpp:63).

References corespace::entity_schema, and corespace::mediainfo.

◆ join_str()

std::string arachnespace::pheidippides::join_str ( std::span< const std::string > ids,
std::string_view separator = "|" )
static

Join a span of strings with a separator (no encoding or validation).

Edge cases:

  • Empty input yields an empty string.
  • Separator defaults to "|" (useful for MediaWiki multi-ID parameters).
Parameters
ids: Input strings to join.
separator: Separator between elements (default: "|").
Returns
Concatenated string.

Definition at line 128 of file pheidippides.cpp.

{
    if (ids.empty()) {
        return {};
    }
    auto it = ids.begin();
    std::string result = *it;
    for (++it; it != ids.end(); ++it) {
        result.append(separator);
        result.append(*it);
    }
    return result;
}

◆ metrics_info()

const corespace::network_metrics & arachnespace::pheidippides::metrics_info ( ) const
nodiscard

Access aggregated network metrics of the underlying client.

Returns
Const reference to the underlying client's aggregated network metrics.

Definition at line 119 of file pheidippides.cpp.

{
    return client.metrics_info();
}

◆ preview()

corespace::call_preview arachnespace::pheidippides::preview ( const corespace::sparql_request & request) const
nodiscard

Produce a call preview describing the HTTP request that would be made.

The returned call_preview contains all information necessary to perform the request without actually executing it: resolved URL, HTTP method, query/form parameters, content type/body, Accept header, timeout, and whether the body should be sent as form data.

Parameters
request: SPARQL request used to compute the preview.
Returns
A filled corespace::call_preview describing the planned call.

Definition at line 124 of file pheidippides.cpp.

{
    return build_call_preview(request);
}

References build_call_preview().

◆ sparql()

nlohmann::json arachnespace::pheidippides::sparql ( const corespace::sparql_request & request)

Execute a SPARQL query according to the provided request.

Builds the call preview for request, issues the HTTP call through the internal http_client (get for GET, post_form when the query is sent as a form body, post_raw otherwise), and parses the returned payload as JSON.

Errors:

  • Transport or HTTP failures propagate from the internal http_client (may throw std::runtime_error on terminal failure).
  • Malformed JSON in the response propagates nlohmann::json::parse_error.
Parameters
request: Structured SPARQL request (query text, method hint, accept/content-type overrides, timeout, etc.).
Returns
Parsed JSON object containing the service response.

Definition at line 84 of file pheidippides.cpp.

{
    const auto
        [method, url, query_params, form_params, body, content_type, accept,
         timeout_sec, use_form_body]
        = build_call_preview(request);
    if (method == corespace::http_method::get) {
        return nlohmann::json::parse(
            client.get(url, query_params, accept, timeout_sec).text, nullptr,
            true
        );
    }
    if (use_form_body) {
        return nlohmann::json::parse(
            client
                .post_form(url, form_params, query_params, accept, timeout_sec)
                .text,
            nullptr, true
        );
    }
    return nlohmann::json::parse(
        client
            .post_raw(
                url, body, content_type, query_params, accept, timeout_sec
            )
            .text,
        nullptr, true
    );
}

References build_call_preview(), and corespace::get.
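The dispatch in sparql() depends only on two preview fields, and build_call_preview() decides where the query text lands. The sketch below models both steps; call_preview, place_query() and client_call() are hypothetical simplifications, and the common parameters added by append_common_params() are left out because their contents are not documented here.

```cpp
#include <string>
#include <utility>
#include <vector>

enum class http_method { get, post };
using parameter_list = std::vector<std::pair<std::string, std::string>>;

// Simplified, hypothetical mirror of corespace::call_preview (only the fields used here).
struct call_preview {
    http_method method = http_method::get;
    parameter_list query_params;
    parameter_list form_params;
    std::string body;
    bool use_form_body = false;
};

// Places the query text where build_call_preview() puts it for each strategy.
call_preview place_query(const std::string& query, http_method method, bool use_form_body) {
    call_preview p;
    p.method = method;
    if (method == http_method::get) {
        p.query_params.emplace_back("query", query);  // URL query string
    } else if (use_form_body) {
        p.use_form_body = true;
        p.form_params.emplace_back("query", query);   // form-encoded body
    } else {
        p.body = query;                               // raw request body
    }
    return p;
}

// Names the http_client member that sparql() calls for a given preview.
std::string client_call(const call_preview& p) {
    if (p.method == http_method::get) return "get";
    return p.use_form_body ? "post_form" : "post_raw";
}
```

GET is checked first, so use_form_body only matters once the method is POST.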

◆ wdqs()

nlohmann::json arachnespace::pheidippides::wdqs ( std::string query)

Convenience wrapper to run a raw SPARQL query string.

Constructs a default sparql_request with the provided query and forwards to sparql().

Parameters
query: SPARQL query string to execute.
Returns
Parsed JSON object containing the service response.

Definition at line 113 of file pheidippides.cpp.

{
    corespace::sparql_request request;
    request.query = std::move(query);
    return sparql(request);
}

References corespace::sparql_request::query.

Member Data Documentation

◆ client

corespace::http_client arachnespace::pheidippides::client {}
private

Reused HTTP client (not safe to share across threads).

Definition at line 161 of file pheidippides.hpp.

{};

◆ opt

corespace::options arachnespace::pheidippides::opt {}
private

Request shaping parameters (chunking, fields, base params).

Definition at line 159 of file pheidippides.hpp.

{};

◆ wdqs_opt

corespace::wdqs_options arachnespace::pheidippides::wdqs_opt {}
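private

Defaults for WDQS calls: the query-length threshold used to choose between GET and POST, the request timeout, and the Accept header override.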

Definition at line 162 of file pheidippides.hpp.

{};

Referenced by build_call_preview().


The documentation for this class was generated from the following files: