Lightweight asset retrieval (findRaw)
Read-only API that returns MediaHub asset data as plain PHP arrays for high-volume galleries, JSON endpoints, sitemaps, and exports without loading full Page objects.
MediaHub assets are stored as individual ProcessWire pages. Each one carries its own template, fields, image data, label assignments, collection memberships, and crop relationships. When you iterate over a MediaHub field with a standard foreach loop, ProcessWire instantiates a full Page object for every asset in the list, a Pageimage for each image file, and loads all the linked pages those objects reference: labels, collections, crops, master images. At 50 assets this adds more than 100 database queries. At 200 it is over 400. At 1,000 it is over 2,000 queries, and the memory cost can push a request past a shared host's PHP memory limit.
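findRaw(), findRawStream(), and the ->raw() shortcut bypass all of that. They return asset metadata as plain PHP arrays using a small fixed set of SQL queries (8 to 10 in the benchmarks below) regardless of how many assets you are reading. No Page objects. No Pageimage objects. No hook registration overhead. For findRaw() that means O(1) queries instead of O(n); findRawStream() adds a couple of queries per additional batch, so its count grows with the number of batches rather than the number of assets.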
The API is read-only. Use it wherever you need URLs, dimensions, alt text, labels, crop data, or other metadata at scale. Keep the standard object API for anything that needs on-the-fly image resizing, lazy crop generation, or writes.
When to use it
Use findRaw when you are rendering a gallery or list and only need URLs, dimensions, title, alt, labels, crops, and similar metadata. It is the right choice for JSON API responses, sitemap generation, CSV exports, or any job where loading hundreds or thousands of asset pages would be too slow or memory-heavy.
When not to use it
Keep using the standard object API when you need:
- $asset->image()->size(400, 300) or any chained Pageimage method for on-the-fly resizing
- $asset->ensureCropImage('square') for on-demand crop generation
- Methods on the asset page class such as usageCount()
- Saving or modifying asset pages
findRaw returns associative arrays, not objects. The two APIs complement each other: use findRaw on hot read paths and fall back to object iteration where you need image manipulation or writes.
Why not use ProcessWire's built-in $pages->findRaw()?
ProcessWire ships its own $pages->findRaw(), which returns raw database column values directly. It is a great fit for general-purpose page queries and custom field schemas. MediaHub's findRaw is purpose-built for the asset domain — it knows about the extra tables, relationships, and assembly steps that MediaHub assets require:
- No assembled URLs. ProcessWire's version returns raw filenames from the database (my-photo.jpg). To get a usable URL you have to reconstruct it manually from the page ID, files path, and filename. MediaHub's findRaw returns ready-to-use relative and absolute URLs.
- No image dimensions. Width and height live in a separate variations table. ProcessWire's version does not join that table. MediaHub's version does.
- No crop data. Crops are stored in MediaHub's own tables and require several joins to resolve correctly, including the field-level and library-level resolution chain. ProcessWire's version has no knowledge of any of this.
- No per-reference metadata. Per-field description overrides and display tags are stored in a junction table keyed to the page, field, and asset. ProcessWire's version cannot surface these without manual queries.
- No multi-language COALESCE. ProcessWire's version returns the raw database column, typically the default language only. MediaHub's version generates COALESCE(data{langId}, data) for every language-aware column so the active language is returned with a graceful fallback.
- No labels or collection names. These are page references stored in separate tables. ProcessWire's version returns raw page IDs. MediaHub's version resolves them to name strings in the same query batch.
In short: for MediaHub assets, $pages->findRaw() would leave you with raw database fragments you then have to assemble yourself — at which point you are reimplementing most of what MediaHub::findRaw() already does. Use the right tool for the domain.
Quick patterns
1. Render an image gallery
The ->raw() shortcut works directly on any MediaHub field value. Output order matches the editor's sort order.
```php
// Standard loop: fine for small sets, heavy at scale
foreach ($page->gallery as $asset) {
    $img = $asset->image();
    if (!$img) continue; // skip assets without an image
    echo "<img src='{$img->url}' alt='{$asset->altText()}'>";
}

// findRaw: same output shape, bounded cost at scale
foreach ($page->gallery->raw() as $asset) {
    if (!$asset['isImage']) continue;
    $alt = htmlspecialchars((string) $asset['alt'], ENT_QUOTES); // raw values: escape before output
    echo "<img src='{$asset['url']}' width='{$asset['width']}' height='{$asset['height']}' alt='{$alt}'>";
}
```
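Because each row is a plain array, the markup can live in a small helper that you can test without ProcessWire. A minimal sketch, assuming only the core keys listed under Return shape; renderRawImage() is a hypothetical helper name, not part of MediaHub:

```php
<?php
/**
 * Hypothetical helper (not part of MediaHub): renders one findRaw row as an <img> tag.
 * Returns an empty string for non-image assets, whose width/height are null.
 */
function renderRawImage(array $row): string
{
    if (empty($row['isImage'])) {
        return '';
    }
    // Row values are plain strings, so escape them before they go into attributes
    $esc = fn($v) => htmlspecialchars((string) $v, ENT_QUOTES, 'UTF-8');
    return sprintf(
        '<img src="%s" width="%d" height="%d" alt="%s">',
        $esc($row['url']),
        (int) $row['width'],
        (int) $row['height'],
        $esc($row['alt'] ?? '')
    );
}

// Sample rows shaped like findRaw output
$rows = [
    ['isImage' => true, 'url' => '/site/assets/files/1234/paris-tower.jpg',
     'width' => 1920, 'height' => 1280, 'alt' => "Eiffel Tower's lights"],
    ['isImage' => false, 'url' => '/site/assets/files/1235/brochure.pdf',
     'width' => null, 'height' => null, 'alt' => ''],
];
foreach ($rows as $row) {
    echo renderRawImage($row), "\n";
}
```

In a template the loop body reduces to echo renderRawImage($asset) inside foreach ($page->gallery->raw() as $asset).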
2. Build a JSON API endpoint
Because findRaw returns plain arrays, json_encode handles the result directly with no extra serialisation step.
```php
header('Content-Type: application/json');

$pageId = (int) wire('input')->get('id');
$sourcePage = wire('pages')->get($pageId);

// 404 for missing pages, pages the visitor cannot view, or pages without the field
if (!$sourcePage->id || !$sourcePage->viewable() || !$sourcePage->hasField('gallery')) {
    http_response_code(404);
    exit(json_encode(['error' => 'Page not found']));
}

$assets = $sourcePage->gallery->raw([
    'fields' => ['focusPoint', 'labels', 'collections'],
]);

echo json_encode($assets);
```
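Rows can carry more than a public client needs (path, if requested, is a server-side filesystem path). A minimal sketch of whitelisting keys before encoding; publicPayload() and the key list are hypothetical choices for illustration:

```php
<?php
/**
 * Hypothetical helper: keeps only the keys a public JSON client should see.
 * findRaw rows are plain arrays, so array_intersect_key is enough.
 */
function publicPayload(array $rows, array $allowedKeys): array
{
    $allowed = array_flip($allowedKeys);
    // array_values keeps the output a JSON list rather than an object
    return array_values(array_map(
        fn(array $row) => array_intersect_key($row, $allowed),
        $rows
    ));
}

// Sample row shaped like findRaw output with 'path' requested
$rows = [
    ['id' => 1234, 'url' => '/site/assets/files/1234/paris-tower.jpg',
     'alt' => 'The Eiffel Tower', 'path' => '/var/www/site/assets/files/1234/paris-tower.jpg'],
];
echo json_encode(publicPayload($rows, ['id', 'url', 'alt', 'focusPoint', 'labels']));
```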
3. Generate a sitemap of every published asset
findRawStream() is a PHP generator: it yields one row at a time and loads rows in batches, so peak memory stays low however many assets match.
```php
$mediaHub = $modules->get('MediaHub');

header('Content-Type: application/xml; charset=utf-8');
echo "<?xml version='1.0' encoding='UTF-8'?>\n";
// The sitemap protocol namespace includes the /sitemap/ path segment
echo "<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>\n";

$selector = 'template=pkd-mediahub-asset, status<2048, sort=id';
foreach ($mediaHub->findRawStream($selector, ['fields' => ['httpUrl', 'modified']]) as $asset) {
    echo "  <url>\n";
    echo "    <loc>" . htmlspecialchars($asset['httpUrl']) . "</loc>\n";
    echo "    <lastmod>{$asset['modified']}</lastmod>\n";
    echo "  </url>\n";
}
echo "</urlset>\n";
```
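The sitemap protocol allows at most 50,000 URLs per file, so a stream that can exceed that needs splitting across several sitemaps listed in a sitemap index. A minimal sketch of grouping any iterable (including the findRawStream() generator) into fixed-size batches without materialising the whole set; chunkIterable() and fakeStream() are hypothetical, the latter a stand-in for findRawStream():

```php
<?php
/**
 * Hypothetical helper: groups an iterable into arrays of at most $size items,
 * holding only one group in memory at a time.
 */
function chunkIterable(iterable $items, int $size): Generator
{
    $chunk = [];
    foreach ($items as $item) {
        $chunk[] = $item;
        if (count($chunk) === $size) {
            yield $chunk;
            $chunk = [];
        }
    }
    if ($chunk !== []) {
        yield $chunk; // final, partially filled group
    }
}

// Stand-in for findRawStream(): a generator of row arrays
function fakeStream(int $n): Generator
{
    for ($i = 1; $i <= $n; $i++) {
        yield ['id' => $i, 'httpUrl' => "https://example.com/site/assets/files/$i/a.jpg"];
    }
}

foreach (chunkIterable(fakeStream(5), 2) as $fileNo => $rows) {
    echo "sitemap-" . ($fileNo + 1) . ".xml: " . count($rows) . " URLs\n";
}
```

With real data you would pass $mediaHub->findRawStream($selector, [...]) and a size of 50000, writing each group to its own file.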
4. Render a per-collection thumbnail grid
Pass a collection page directly as the source. The API resolves the asset list automatically. Use a page ID in production code — it survives collection renames; a path does not.
```php
$mediaHub = $modules->get('MediaHub');

// By path (readable, but breaks if the collection is renamed or moved):
// $collection = wire('pages')->get('/photos/landscapes/');

// By ID (survives renames and moves; use this in production):
$collection = wire('pages')->get(1234);

$assets = $mediaHub->findRaw($collection, [
    'crops' => ['square'],
    'limit' => 24,
]);

foreach ($assets as $asset) {
    $thumbUrl = $asset['crops']['square']['url'] ?? $asset['url'];
    $alt = htmlspecialchars((string) $asset['alt'], ENT_QUOTES);
    echo "<a href='{$asset['url']}'>";
    echo " <img src='{$thumbUrl}' alt='{$alt}'>";
    echo "</a>";
}
```
5. Export every asset to CSV (keyset pagination)
afterId pagination avoids SQL OFFSET — cost stays constant-time per chunk regardless of dataset size, which matters for large exports on shared hosting.
```php
$mediaHub = $modules->get('MediaHub');

$handle = fopen('/tmp/mediahub-export.csv', 'w');
if ($handle === false) {
    throw new \RuntimeException('Cannot open export file');
}
fputcsv($handle, ['ID', 'Title', 'URL', 'Labels', 'Collections']);

$lastId = 0;
$chunkSize = 500;
// Keyset pagination relies on ascending id order, hence sort=id
$selector = 'template=pkd-mediahub-asset, status<2048, sort=id';

do {
    $chunk = $mediaHub->findRaw($selector, [
        'afterId' => $lastId,
        'limit'   => $chunkSize,
        'fields'  => ['labels', 'collections'],
    ]);
    foreach ($chunk as $asset) {
        fputcsv($handle, [
            $asset['id'],
            $asset['title'],
            $asset['url'],
            implode('|', $asset['labels']),
            implode('|', $asset['collections']),
        ]);
        $lastId = $asset['id'];
    }
} while (count($chunk) === $chunkSize);

fclose($handle);
```
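The export loop assumes labels and collections are always present as arrays. A defensive sketch that turns one row into CSV cells, treating a missing key as empty; csvCells() is hypothetical, and the sample writes to php://temp so it runs without a file:

```php
<?php
/**
 * Hypothetical helper: converts one findRaw row into CSV cells.
 * List values (labels, collections) are joined with '|'; missing keys become ''.
 */
function csvCells(array $row, array $keys): array
{
    $cells = [];
    foreach ($keys as $key) {
        $value = $row[$key] ?? '';
        $cells[] = is_array($value) ? implode('|', $value) : (string) $value;
    }
    return $cells;
}

// Sample row without a 'collections' key
$row = ['id' => 1234, 'title' => 'Eiffel Tower at dusk', 'labels' => ['Events', 'Paris']];
$out = fopen('php://temp', 'w+');
// Escape character passed explicitly ('\\' is the long-standing default)
fputcsv($out, csvCells($row, ['id', 'title', 'labels', 'collections']), ',', '"', '\\');
rewind($out);
echo stream_get_contents($out);
fclose($out);
```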
Methods
MediaHubPageArray::raw(array $options = []): array
Shortcut on any MediaHub field value. Delegates to findRaw() with the field as the source, so per-reference metadata overrides work automatically. The owning page ID is captured so cropResolve => 'field' works without extra arguments.
```php
$rows = $page->gallery->raw([
    'fields'      => ['focusPoint', 'labels'],
    'crops'       => ['square', '16_9'],
    'cropResolve' => 'field',
]);
```
MediaHub::findRaw($source, array $options = []): array
Returns all matching rows in one array. Source may be any of:
- A MediaHub field value ($page->gallery)
- A ProcessWire selector string ('template=pkd-mediahub-asset, …')
- An array of asset page IDs
- A collection page object
```php
$mediaHub = $modules->get('MediaHub');
$rows = $mediaHub->findRaw('template=pkd-mediahub-asset, status<2048', [
    'limit'  => 24,
    'fields' => ['labels', 'collections'],
]);
```
MediaHub::findRawStream($source, array $options = []): Generator
Same arguments and row shape as findRaw(), but yields one row at a time. Internally it processes IDs in batches of chunkSize (default 100) and releases each batch before loading the next, so the row data held in memory is bounded by the batch size rather than the result size. Use it for sitemaps, migrations, or exports where the result set could grow unbounded.
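```php
$mediaHub = $modules->get('MediaHub');
$selector = 'template=pkd-mediahub-asset, status<2048, sort=id';

foreach ($mediaHub->findRawStream($selector, ['fields' => ['httpUrl', 'modified']]) as $asset) {
    // Only one batch (chunkSize rows) of row data is held at a time
}
```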
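The batching behaviour described above can be pictured as a generator that loads one batch of IDs at a time. This is a minimal sketch of the general pattern, not MediaHub's implementation; streamInBatches() and the $loadBatch callable are hypothetical stand-ins for the module's internal queries:

```php
<?php
/**
 * Hypothetical sketch of batched streaming: $loadBatch receives up to $chunkSize IDs
 * and returns their rows. Only one batch of rows is held at a time.
 */
function streamInBatches(array $ids, callable $loadBatch, int $chunkSize = 100): Generator
{
    foreach (array_chunk($ids, $chunkSize) as $batchIds) {
        $rows = $loadBatch($batchIds);
        foreach ($rows as $row) {
            yield $row;
        }
        unset($rows); // release this batch before the next load
    }
}

// Fake loader that counts how many batches were requested
$batchCalls = 0;
$load = function (array $ids) use (&$batchCalls): array {
    $batchCalls++;
    return array_map(fn(int $id) => ['id' => $id], $ids);
};

$seen = 0;
foreach (streamInBatches(range(1, 250), $load, 100) as $row) {
    $seen++;
}
echo "$seen rows in $batchCalls batches\n";
```

Work then scales with the number of batches rather than the number of rows, which matches the streaming query counts in the benchmarks below (8, 10, and 26 queries at 50, 200, and 1,000 assets).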
Options
| Option | Default | What it does |
|---|---|---|
| fields | [] | Opt-in extra keys to include; see the table below |
| crops | false | true for all crops, or an array of preset keys, e.g. ['square', '16_9'] |
| cropResolve | 'library' | 'library' returns only library-level crops. 'field' returns a field-level crop when one exists for the current page and field, falling back to the library crop. See Crops API. |
| cropCategory | none | Filter crops by their preset group, e.g. 'Social' |
| language | current language | Force a language (name or Language object) for title, alt, about, and description |
| start | 0 | Skip the first N resolved IDs |
| limit | none | Return at most N rows |
| afterId | 0 | Keyset pagination: only rows with id > afterId. Prefer this over start for large exports |
| chunkSize | 100 | Streaming batch size; findRawStream() only |
Opt-in fields
Core fields are always returned. Everything else is opt-in via the fields array — this keeps the default fast path as lean as possible.
| fields value | Type | Notes |
|---|---|---|
| 'about' | string | Per-asset description text |
| 'httpUrl' | string | Full URL with scheme and host. Needed for sitemaps and Open Graph tags |
| 'path' | string | Server-side filesystem path |
| 'focusPoint' | array | ['x' => 50, 'y' => 30] percent coordinates. Images only; omitted for files |
| 'labels' | array | Library label names (strings). For library organisation; rarely needed on the front end |
| 'collections' | array | Collection names this asset belongs to |
| 'collectionPaths' | array | Full collection paths, e.g. '/photos/landscapes/' |
| 'displayTags' | array | Per-reference display tags set by editors in the field drawer (e.g. hero, mobile-header) |
| 'created' | string | ISO 8601 timestamp |
| 'modified' | string | ISO 8601 timestamp |
| 'favourite' | bool | Site-wide favourite flag set in the MediaHub admin |
| 'description' | string | Per-reference alt-text override; falls back to the global alt when not set |
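The focusPoint percentages map directly onto CSS object-position, which keeps the subject in frame when an image is cropped by object-fit: cover. A minimal sketch, assuming the documented ['x' => 50, 'y' => 30] shape; focusStyle() is hypothetical:

```php
<?php
/**
 * Hypothetical helper: turns a findRaw focusPoint (percent coordinates) into
 * CSS declarations. Returns '' when the key is absent (e.g. non-image assets).
 */
function focusStyle(array $row): string
{
    if (!isset($row['focusPoint']['x'], $row['focusPoint']['y'])) {
        return '';
    }
    // Clamp to 0..100 so a bad value cannot push the subject out of frame
    $x = max(0, min(100, (float) $row['focusPoint']['x']));
    $y = max(0, min(100, (float) $row['focusPoint']['y']));
    return sprintf('object-fit: cover; object-position: %s%% %s%%;', $x, $y);
}

echo focusStyle(['focusPoint' => ['x' => 50, 'y' => 30]]), "\n";
```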
Labels vs display tags. labels are the library-organisation labels set on the asset inside MediaHub admin ("Events", "Headshots", etc.) — for the librarian, not the visitor. displayTags are the per-field-per-page tags an editor sets in the drawer when placing an asset ("hero", "mobile-header") — these are the ones you output on the front end. Same word in conversation; two separate concepts in code.
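Because displayTags are per-reference, a template can pick out the asset an editor tagged for a slot (for example hero) and render the rest as the gallery. A minimal sketch, assuming rows fetched with fields: ['displayTags']; splitByDisplayTag() is hypothetical:

```php
<?php
/**
 * Hypothetical helper: separates rows carrying $tag in displayTags from the rest,
 * preserving the editor's sort order in both groups.
 */
function splitByDisplayTag(array $rows, string $tag): array
{
    $tagged = [];
    $rest = [];
    foreach ($rows as $row) {
        if (in_array($tag, $row['displayTags'] ?? [], true)) {
            $tagged[] = $row;
        } else {
            $rest[] = $row;
        }
    }
    return [$tagged, $rest];
}

// Sample rows in editor order
$rows = [
    ['id' => 1, 'displayTags' => []],
    ['id' => 2, 'displayTags' => ['hero', 'mobile-header']],
    ['id' => 3, 'displayTags' => ['mobile-header']],
];
[$heroes, $gallery] = splitByDisplayTag($rows, 'hero');
echo 'hero: ', $heroes[0]['id'] ?? 'none', "\n";
```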
Return shape
Every row includes these core keys:
```php
[
    'id'       => 1234,
    'type'     => 'asset',          // 'asset' or 'crop'
    'isImage'  => true,             // false for PDFs, docs, audio, video
    'name'     => 'paris-tower',    // ProcessWire page name
    'title'    => 'Eiffel Tower at dusk',
    'filename' => 'paris-tower.jpg',
    'ext'      => 'jpg',
    'mime'     => 'image/jpeg',
    'url'      => '/site/assets/files/1234/paris-tower.jpg',
    'filesize' => 184320,
    'width'    => 1920,             // null for non-image assets
    'height'   => 1280,             // null for non-image assets
    'alt'      => 'The Eiffel Tower lit golden at sunset',
]
```
For non-image assets, isImage is false, width and height are null (not 0 — the semantic is "doesn't apply"), and focusPoint and crops are omitted even if requested.
When crops is enabled, each preset appears under $row['crops']['square'] with its own url, dimensions, and metadata. Missing crops are omitted — findRaw never auto-generates crops.
```php
$thumbUrl = $asset['crops']['square']['url'] ?? $asset['url'];
```
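Since width and height are null rather than 0 for non-image assets, layout code should treat null as "no intrinsic size" instead of dividing by it. A minimal sketch; aspectRatio() is hypothetical:

```php
<?php
/**
 * Hypothetical helper: width/height ratio for image rows, or null when dimensions
 * do not apply (non-image assets report null, not 0).
 */
function aspectRatio(array $row): ?float
{
    $w = $row['width'] ?? null;
    $h = $row['height'] ?? null;
    if ($w === null || $h === null || (int) $h === 0) {
        return null;
    }
    return $w / $h;
}

var_dump(aspectRatio(['width' => 1920, 'height' => 1280]));
var_dump(aspectRatio(['width' => null, 'height' => null]));
```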
Multi-language
findRaw respects ProcessWire's multi-language fields automatically. For each language-aware column it generates COALESCE(data{langId}, data) so empty translations fall back to the default language without any extra work from you.
```php
// Use the current user's language (default)
$assets = $page->gallery->raw();

// Force a specific language by name
$assets = $page->gallery->raw(['language' => 'de']);

// Or pass a Language object
$assets = $page->gallery->raw(['language' => wire('languages')->get('de')]);
```
Language detection happens at runtime, per table, so installs that mix language-aware fields (such as FieldtypeTextLanguage) with single-language fields work without configuration.
Crops and findRaw
findRaw reads crops that already exist — it never generates them. If a crop doesn't exist on disk, it is simply omitted from the response. Use one of these complementary paths to ensure crops exist before you read them:
| Method | When | Effect |
|---|---|---|
| Upload Automation in module config | At upload | Pre-generates configured presets so they exist when findRaw runs. Recommended for high-volume sites. |
| autoGeneratePresetsForAsset hook | At upload | Override the preset list per asset or per field. See Hooks below. |
| ensureCropImage('square') | At read | Lazy generation in templates that are not on the hot path |
| resolvedCropImage() | At read | Full resolution chain: field crop → library crop → auto-generate |
The recommended workflow for high-volume galleries: define the crop presets you use on the front end, tick them in Upload Automation, and findRaw reads them in a single batched query — no render-time resizing, no per-asset cost.
Pagination patterns
Known size: start + limit
```php
$page1 = $page->gallery->raw(['limit' => 20]);
$page2 = $page->gallery->raw(['start' => 20, 'limit' => 20]);
```
start is implemented in PHP after the ID list is resolved, not as SQL OFFSET, so it stays cheap even as the offset grows.
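Both pagination options can be pictured as operations on the resolved ID list: afterId filters by value, while start and limit slice by position. A minimal PHP model of the documented semantics, not MediaHub's code; pageOfIds() is hypothetical, and applying afterId before start is an assumption, since the order in which MediaHub combines them is not documented here:

```php
<?php
/**
 * Hypothetical model: afterId keeps IDs greater than the given value,
 * then start/limit slice the remaining list by position (null limit = no limit).
 */
function pageOfIds(array $ids, int $start = 0, ?int $limit = null, int $afterId = 0): array
{
    if ($afterId > 0) {
        $ids = array_values(array_filter($ids, fn(int $id) => $id > $afterId));
    }
    return array_slice($ids, $start, $limit);
}

$ids = [101, 102, 105, 110, 111, 120];
print_r(pageOfIds($ids, 2, 2));       // third and fourth IDs
print_r(pageOfIds($ids, 0, 2, 105));  // first two IDs after 105
```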
Large exports: afterId (keyset pagination)
afterId avoids SQL OFFSET entirely. Cost stays constant-time per chunk regardless of how deep into the dataset you are.
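```php
$mediaHub = $modules->get('MediaHub');
// Keyset pagination relies on ascending id order, hence sort=id
$selector = 'template=pkd-mediahub-asset, status<2048, sort=id';
$lastId = 0;
do {
    $chunk = $mediaHub->findRaw($selector, ['afterId' => $lastId, 'limit' => 500]);
    foreach ($chunk as $row) {
        // process row…
        $lastId = $row['id'];
    }
} while (count($chunk) === 500);
```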
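The loop above can be wrapped in a generator so callers simply iterate rows. A minimal sketch; keysetRows() is hypothetical, and $fetch stands in for a call such as $mediaHub->findRaw($selector, ['afterId' => $afterId, 'limit' => $limit]). Like the loop above, it assumes rows come back in ascending id order:

```php
<?php
/**
 * Hypothetical helper: iterates all rows via keyset pagination.
 * $fetch(int $afterId, int $limit) must return rows with id > $afterId, ascending by id.
 */
function keysetRows(callable $fetch, int $limit = 500): Generator
{
    $afterId = 0;
    do {
        $chunk = $fetch($afterId, $limit);
        foreach ($chunk as $row) {
            yield $row;
            $afterId = $row['id'];
        }
    } while (count($chunk) === $limit);
}

// In-memory stand-in for findRaw with afterId + limit
$all = array_map(fn(int $id) => ['id' => $id], [3, 7, 8, 15, 21]);
$calls = 0;
$fetch = function (int $afterId, int $limit) use ($all, &$calls): array {
    $calls++;
    $matches = array_filter($all, fn(array $r) => $r['id'] > $afterId);
    return array_slice(array_values($matches), 0, $limit);
};

$ids = [];
foreach (keysetRows($fetch, 2) as $row) {
    $ids[] = $row['id'];
}
echo implode(',', $ids), " in $calls fetches\n";
```

When the total is an exact multiple of the limit, the generator makes one final fetch that returns nothing, the same as the do/while loop it wraps.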
Hooks
All core methods are ProcessWire-hookable so you can extend behaviour from your own site code without modifying MediaHub.
MediaHub::findRaw — caching example
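```php
// site/ready.php
// Caches selector-string and ID-array sources for 5 minutes.
// Assumes findRaw is implemented as ___findRaw(), ProcessWire's convention for hookable methods.
$wire->addHookBefore('MediaHub::findRaw', function(HookEvent $event) {
    $source  = $event->arguments(0);
    $options = $event->arguments(1) ?: [];

    // Field values and page objects don't serialise reliably; leave them uncached
    if (!is_string($source) && !is_array($source)) return;

    // Key on source, options, and the language the rows will be returned in
    $keyOptions = $options;
    if (isset($keyOptions['language']) && is_object($keyOptions['language'])) {
        $keyOptions['language'] = $keyOptions['language']->id;
    }
    $lang = wire('user')->language;
    $cacheKey = 'mh_raw_' . md5(serialize([$source, $keyOptions, $lang ? $lang->id : 0]));

    // WireCache runs the closure only on a miss and stores its result
    $event->replace = true;
    $event->return = wire('cache')->get($cacheKey, 300, function() use ($event, $source, $options) {
        return $event->object->___findRaw($source, $options);
    });
});
```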
MediaHub::buildRawItem — add computed fields
```php
// 'created' is opt-in: request fields: ['created'] or ageInDays will be null
$wire->addHookAfter('MediaHub::buildRawItem', function(HookEvent $event) {
    $row = $event->return;
    $created = strtotime($row['created'] ?? '');
    $row['ageInDays'] = $created ? (int) floor((time() - $created) / 86400) : null;
    $event->return = $row;
});
```
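Computed fields are easiest to test when the calculation lives in a plain function that the hook calls. A minimal sketch adding a hypothetical orientation key derived from the core width and height; withOrientation() and the key name are illustrative, not part of MediaHub:

```php
<?php
/**
 * Hypothetical computed field: adds 'orientation' => 'landscape' | 'portrait' | 'square',
 * or null for rows without dimensions (non-image assets).
 */
function withOrientation(array $row): array
{
    $w = $row['width'] ?? null;
    $h = $row['height'] ?? null;
    $row['orientation'] = ($w === null || $h === null) ? null
        : ($w > $h ? 'landscape' : ($w < $h ? 'portrait' : 'square'));
    return $row;
}

echo withOrientation(['width' => 1920, 'height' => 1280])['orientation'], "\n";
```

Inside a buildRawItem after-hook this becomes $event->return = withOrientation($event->return).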
MediaHub::autoGeneratePresetsForAsset — per-field crop rules
Override which presets are auto-generated at upload time on a per-field, per-template, or per-role basis — without changing the global module config.
```php
$wire->addHookAfter('MediaHub::autoGeneratePresetsForAsset', function(HookEvent $event) {
    $field = $event->arguments(1);
    $presets = $event->return;
    if ($field && $field->name === 'hero_gallery') {
        $presets[] = 'mobile_500';
        $presets[] = 'tablet_800';
    }
    // array_values re-indexes the list after array_unique removes duplicates
    $event->return = array_values(array_unique($presets));
});
```
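As the per-field rules grow, a lookup table is easier to maintain than a chain of if statements. A minimal sketch; presetsForField(), the field names, and the preset keys are hypothetical:

```php
<?php
/**
 * Hypothetical helper: merges extra presets for a field into the default list.
 * $rules maps field names to the presets they need in addition to the defaults.
 */
function presetsForField(array $defaults, ?string $fieldName, array $rules): array
{
    $extra = $fieldName !== null ? ($rules[$fieldName] ?? []) : [];
    // array_values re-indexes after array_unique so the result is a clean list
    return array_values(array_unique(array_merge($defaults, $extra)));
}

$rules = [
    'hero_gallery' => ['mobile_500', 'tablet_800'],
    'team_photos'  => ['square'],
];
print_r(presetsForField(['square'], 'team_photos', $rules));
print_r(presetsForField(['square'], 'hero_gallery', $rules));
```

In the hook above the body would become $event->return = presetsForField($event->return, $field ? $field->name : null, $rules).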
Performance
Benchmarks run on MAMP (MySQL, PHP 8.3) with image-only fixtures. Each result is the median of 3 runs. Approaches:
- A. Standard object iteration: foreach ($page->gallery as $asset)
- B. findRaw(), core fields only
- C. findRaw(), core fields + all crops
- D. findRaw(), core fields + four opt-in extras
- E. findRawStream(), generator iteration
50 assets
| Approach | Peak memory | Wall time | Queries |
|---|---|---|---|
| A. Object iteration | 0.40 MB | 54.9 ms | 112 |
| B. findRaw, core | 0.26 MB | 11.5 ms | 8 |
| C. findRaw + crops | 0.26 MB | 13.2 ms | 9 |
| D. findRaw + opt-in fields | 0.29 MB | 11.5 ms | 10 |
| E. findRawStream | 0.26 MB | 10.5 ms | 8 |
findRaw (B) vs object iteration (A): 4.8× faster, 14× fewer queries.
200 assets
| Approach | Peak memory | Wall time | Queries |
|---|---|---|---|
| A. Object iteration | 1.51 MB | 212.4 ms | 412 |
| B. findRaw, core | 1.01 MB | 36.3 ms | 8 |
| C. findRaw + crops | 1.02 MB | 53.4 ms | 9 |
| D. findRaw + opt-in fields | 1.14 MB | 35.6 ms | 10 |
| E. findRawStream | 0.76 MB | 34.6 ms | 10 |
findRaw (B) vs object iteration (A): 5.9× faster, 51× fewer queries. Streaming (E) used half the peak memory of A.
1,000 assets
| Approach | Peak memory | Wall time | Queries |
|---|---|---|---|
| A. Object iteration | 7.28 MB | 1,144.8 ms | 2,012 |
| B. findRaw, core | 4.88 MB | 172.0 ms | 8 |
| C. findRaw + crops | 4.88 MB | 185.1 ms | 9 |
| D. findRaw + opt-in fields | 5.49 MB | 169.4 ms | 10 |
| E. findRawStream | 2.72 MB | 174.7 ms | 26 |
findRaw (B) vs object iteration (A): 6.7× faster, 251× fewer queries. Streaming (E) used 2.7× less peak memory than A.
What the numbers show
The headline is query reduction, not memory. Object iteration fires roughly 2 queries per asset (one to load the page, one per image field for variation metadata) — that's O(n). findRaw() fires a fixed 8–10 queries regardless of result size — O(1). On shared hosting where each query carries connection, parse, and network overhead beyond pure SQL execution time, going from 2,012 to 8 queries is the dominant saving. That's what produces the 5–7× wall-time speedup.
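The query counts in the three tables fit simple formulas. These are fitted to the benchmark figures above (chunkSize 100), not derived from the implementation, so treat them as a rule of thumb:

```latex
\begin{aligned}
Q_A(n) &\approx 2n + 12 && (112,\ 412,\ 2012 \text{ at } n = 50,\ 200,\ 1000)\\
Q_B(n) &= 8\\
Q_E(n) &\approx 8 + 2\left(\left\lceil n/100 \right\rceil - 1\right) && (8,\ 10,\ 26)
\end{aligned}
```

Because Q_A grows linearly while Q_B stays fixed, the ratio widens with n: 112/8 = 14 at 50 assets and 2012/8 ≈ 251 at 1,000.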
Memory savings are modest for the non-streaming path (roughly 1.3–1.5×) because ProcessWire's lazy page loading is more efficient than it might seem. The real memory tool is findRawStream(): it releases each batch before loading the next, and it had the lowest peak memory in every run (tied at 50 assets), 2.72 MB against 7.28 MB for object iteration at 1,000 assets. Use it for sitemaps, exports, and any operation where the result set could grow unbounded.
Crops and opt-in fields add little at every size tested: crops => true adds exactly one query, and four opt-in fields add two. Both are batched and bounded, and neither moved wall time by more than about 17 ms relative to core fields only (B).
Migration guide
Most templates need only small changes inside the loop: call ->raw() on the field, then swap property and method calls for array key lookups.
Before
```php
foreach ($page->gallery as $asset) {
    $img = $asset->image();
    if (!$img) continue;
    echo "<img src='{$img->url}' width='{$img->width}' height='{$img->height}' alt='{$asset->altText()}'>";
}
```
After
```php
foreach ($page->gallery->raw() as $asset) {
    if (!$asset['isImage']) continue;
    $alt = htmlspecialchars((string) $asset['alt'], ENT_QUOTES); // raw values: escape before output
    echo "<img src='{$asset['url']}' width='{$asset['width']}' height='{$asset['height']}' alt='{$alt}'>";
}
```
Field name translation
| Object API | findRaw array key |
|---|---|
| $asset->title | $asset['title'] |
| $asset->altText() | $asset['alt'] |
| $asset->image()->url | $asset['url'] |
| $asset->image()->width | $asset['width'] |
| $asset->image()->height | $asset['height'] |
| $asset->image()->filesize | $asset['filesize'] |
| $asset->mime | $asset['mime'] |
| $asset->about | $asset['about'] (requires fields: ['about']) |
| $asset->focusPoint() | $asset['focusPoint'] as ['x' => …, 'y' => …] (requires fields: ['focusPoint']) |
| Label names | $asset['labels'] (requires fields: ['labels']) |
| $page->gallery->description($asset) | $asset['description'] (requires fields: ['description']) |
| $page->gallery->getDisplayTags($asset) | $asset['displayTags'] (requires fields: ['displayTags']) |
| $asset->cropImage('square')->url | $asset['crops']['square']['url'] (with crops => true) |
| $asset->resolvedCropImage('square', $page->id, $field->name) | $asset['crops']['square']['url'] with cropResolve => 'field' |
What does not translate
These have no findRaw equivalent. Keep the object API for these cases:
- $image->size($w, $h): generates a sized variant. findRaw returns existing URLs only.
- $asset->ensureCropImage(…): lazy crop generation at render time.
- $asset->usageCount(), $asset->usageDescription(): analytics features.
- Anything that writes: $asset->save(), $asset->set(…), etc.
FAQ
Why is findRaw not the default? Templates that call ->size() on images still need the object API. Defaulting to arrays would break those without warning. The two APIs are designed to complement each other.
Can I call findRaw from a hook? Yes — it has no prerequisites beyond the MediaHub module being loaded.
Does findRaw respect access control? It applies the same status filtering as normal reads, excluding trashed and unpublished pages. It does not run per-user permission checks per row. Add a MediaHub::findRaw after-hook if you need per-user gating.
What happens with an empty source? You get back an empty array. No warnings, no exceptions.
Multi-language sites? Title, alt, about, and description use the active language with automatic fallback to the default when a translation is empty. Pass language in options to force a specific language.
Are crops in the response in any particular order? Crops within a row are keyed by preset key ($asset['crops']['square']) and appear in preset definition order, not creation time. Look crops up by key rather than by position.
What if a crop preset I requested doesn't exist on an asset? It is simply omitted from $asset['crops']. Always use ?? $asset['url'] as a fallback.
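For the per-user gating mentioned in the access-control answer, the filtering itself can be a plain function called from a MediaHub::findRaw after-hook. A minimal sketch assuming a hypothetical library label named "Members only" and rows fetched with fields: ['labels']; membersOnlyFilter() is also hypothetical. Rows without a labels key pass through, so the gate only works where labels are requested:

```php
<?php
/**
 * Hypothetical gate: hides rows carrying a restricted library label from guests.
 * Requires rows fetched with fields: ['labels'].
 */
function membersOnlyFilter(array $rows, bool $isLoggedIn, string $restrictedLabel = 'Members only'): array
{
    if ($isLoggedIn) {
        return $rows;
    }
    return array_values(array_filter(
        $rows,
        fn(array $row) => !in_array($restrictedLabel, $row['labels'] ?? [], true)
    ));
}

// Wiring it up in site/ready.php (ProcessWire only, not run here):
// $wire->addHookAfter('MediaHub::findRaw', function (HookEvent $e) {
//     $e->return = membersOnlyFilter($e->return, wire('user')->isLoggedin());
// });

$rows = [
    ['id' => 1, 'labels' => ['Events']],
    ['id' => 2, 'labels' => ['Members only']],
];
echo count(membersOnlyFilter($rows, false)), "\n";
```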
See also
- Template API: object-based field access
- Querying Assets: selector-based discovery
- Crops API: crop resolution and hooks
- Per-Reference Metadata: description overrides and display tags surfaced in findRaw
- Focus Point: focus data stored and used in crop generation