The Daily Ljubljana

All of Ljubljana, every day

Ljubljana's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Costly Story

A quiet crisis in the city's municipal image databases is wasting storage, distorting search results, and costing public institutions real money.

By Ljubljana News Desk · Published 5 July 2026, 6:32 am

Updated 5 July 2026, 1:57 pm

How we reported this

This article was generated by AI from the linked public sources. The Daily Ljubljana is independently owned and covers Ljubljana news free from advertiser or sponsor influence. Read our editorial standards →

Photo: Andres Figueroa on Pexels

Ljubljana's municipal and cultural institutions collectively hold more than 2.3 million digitised image files across their shared and independent archive systems — and by conservative internal estimates circulating among city IT administrators, somewhere between 18 and 24 percent of those files are exact or near-exact duplicates. That means roughly 400,000 to 550,000 redundant files are consuming server space, slowing retrieval systems, and inflating annual storage costs that taxpayers ultimately fund.

The problem has sharpened this year because the City of Ljubljana's Digital Transformation Directorate is midway through a consolidation push that began in January 2026, aimed at merging image repositories from the Mestna občina Ljubljana (the city municipality), the Ljubljana City Library network, and the Slovenian Museum of Natural History's urban documentation branch into a single interoperable catalogue. That consolidation has forced a reckoning with just how badly duplicate content has accumulated over a decade of decentralised uploading.

What the Data Actually Shows

Storage costs in Ljubljana's municipal cloud environment run approximately €0.023 per gigabyte per month under the current contract with a Slovenian public-sector IT provider. With duplicate image files estimated to occupy between 6 and 9 terabytes of redundant space across the merged system, the direct monthly overhead attributable purely to duplicates falls in the range of €138 to €207. Over a full fiscal year, that is between about €1,656 and €2,484. It is a modest figure on its own, but one that compounds when factored across the 14 separate institutional sub-accounts included in the consolidation project.
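The arithmetic behind these figures can be checked in a few lines. This is a minimal sketch that assumes decimal units (1 TB = 1,000 GB), which is the convention that reproduces the €138 and €207 monthly figures; the function name is illustrative only.

```python
PRICE_PER_GB_MONTH = 0.023  # euros, the contract rate quoted in the article
GB_PER_TB = 1000            # assumption: decimal terabytes


def duplicate_storage_cost(terabytes: float) -> tuple[float, float]:
    """Return (monthly, annual) cost in euros for a given redundant volume."""
    monthly = terabytes * GB_PER_TB * PRICE_PER_GB_MONTH
    return round(monthly, 2), round(monthly * 12, 2)


for tb in (6, 9):
    monthly, annual = duplicate_storage_cost(tb)
    print(f"{tb} TB: EUR {monthly:.2f} per month, EUR {annual:.2f} per year")
```

At 6 and 9 terabytes this works out to €1,656 and €2,484 a year, in line with the annual range given above.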

The Ljubljana City Library's central branch on Kersnikova ulica ran a pilot duplicate-detection sweep across its digital photograph collection in March 2026. The sweep applied perceptual hashing software to approximately 84,000 image files and flagged 14,200 of them, just under 17 percent, as duplicates or near-duplicates of existing records. Librarians then spent an estimated 210 staff-hours manually reviewing flagged pairs before any deletions were authorised, which shows that automated detection alone does not eliminate the human labour cost.
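The article does not name the software or the algorithm the library used. As a rough illustration of how perceptual hashing works in general, the sketch below implements one common variant, a difference hash (dHash), in plain Python on a grayscale image represented as a list of pixel rows. The synthetic test images, the function names, and the 10-bit threshold are all assumptions made for the example.

```python
def dhash(pixels, hash_size=8):
    """Difference hash: one bit per adjacent pair of sampled pixels.

    `pixels` is a grayscale image as a list of rows of 0-255 integers.
    """
    height, width = len(pixels), len(pixels[0])
    bits = 0
    for row in range(hash_size):
        y = row * height // hash_size
        # Sample hash_size + 1 points across the row (nearest-neighbour downscale).
        samples = [pixels[y][col * width // (hash_size + 1)]
                   for col in range(hash_size + 1)]
        for left, right in zip(samples, samples[1:]):
            bits = (bits << 1) | int(left > right)
    return bits


def hamming_distance(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a, b, max_bits=10):
    """Hypothetical rule: flag pairs whose hashes differ in at most max_bits bits."""
    return hamming_distance(a, b) <= max_bits


def striped(width, height):
    """Synthetic image: brightness ramps that alternate direction in 8 bands."""
    band = height // 8
    return [[x * 255 // width if (y // band) % 2 == 0 else 255 - x * 255 // width
             for x in range(width)] for y in range(height)]


def ramp(width, height):
    """A visibly different synthetic image: every row fades left to right."""
    return [[255 - x * 255 // width for x in range(width)] for _ in range(height)]


original = dhash(striped(64, 64))
resized = dhash(striped(32, 32))    # same picture at half the resolution
unrelated = dhash(ramp(64, 64))

print(is_near_duplicate(original, resized))    # same content, different size
print(is_near_duplicate(original, unrelated))  # different content
```

Because the hash records only whether brightness rises or falls between neighbouring sample points, the same picture at two resolutions produces matching bits, while a different picture does not. Flagged pairs would still need the kind of manual review the librarians carried out.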

At the Arhiv Republike Slovenije facility on Zvezdarska ulica, archivists working on the city's photographic heritage programme reported a different but related problem: images uploaded in different resolutions or with minor cropping differences are not flagged by basic hash-based tools, even though they represent the same underlying photograph. That category of near-duplicate may account for an additional 8 to 12 percent of total stored files, according to methodology notes shared at a February 2026 digitisation working group convened by the municipality's Department of Culture.
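One plausible reading of "basic hash-based tools" is exact-match hashing of file bytes; the article does not say which tools the archivists meant, so treat that as an assumption. Under that reading, the sketch below shows why such tools miss resized or cropped versions: any change to the bytes produces an unrelated digest. The file names and byte contents are made-up stand-ins, not real archive records.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Group file names by the SHA-256 digest of their bytes.

    `files` maps a file name to its raw bytes. Only groups with more
    than one member (byte-identical copies) are returned.
    """
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return [names for names in groups.values() if len(names) > 1]


# Hypothetical stand-ins for image files.
files = {
    "bridge_1998.tif": b"\x10\x20\x30\x40" * 4,   # original scan
    "bridge_copy.tif": b"\x10\x20\x30\x40" * 4,   # byte-identical copy
    "bridge_small.tif": b"\x10\x30" * 4,          # stand-in for a downscaled version
}

print(group_exact_duplicates(files))
```

Only the two byte-identical files end up in the same group. The downscaled stand-in is left out even though, in the scenario described, it would show the same photograph, which is the gap the archivists reported.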

Why Replacement Protocols Matter Beyond the Storage Bill

The financial argument is real but secondary. The more pressing concern for institutions like the Mestna galerija Ljubljana on Mestni trg, which serves as a reference point for the city's publicly accessible visual record, is retrieval accuracy. When the same image exists in three or four versions under different file names and metadata tags, search results for public queries return redundant hits. Staff time spent disambiguating those results during public reference requests has been logged at roughly 35 additional minutes per affected query, according to workflow data collected internally during the library pilot.

The consolidation project's next phase, scheduled for the fourth quarter of 2026, will introduce a mandatory duplicate-check gate for all new image uploads across participating institutions. Files that match existing records above a defined similarity threshold will be held in a quarantine folder for 72 hours before a curator either approves a replacement or confirms it as a legitimate variant. The goal is to stop the problem from growing while the backlog cleanup continues.
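The gate described above could be sketched roughly as follows. This is not the city's published specification: the similarity measure (share of matching bits between two 64-bit perceptual hashes), the 0.90 threshold, and every name in the code are assumptions for illustration. Only the 72-hour hold comes from the article, and it is modelled here as a review deadline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

QUARANTINE_HOURS = 72        # from the article
SIMILARITY_THRESHOLD = 0.90  # assumption: the article does not state the value


@dataclass
class GateDecision:
    accepted: bool
    matched_record: Optional[str] = None
    review_deadline: Optional[datetime] = None


def similarity(hash_a: int, hash_b: int, bits: int = 64) -> float:
    """Fraction of matching bits between two perceptual hashes."""
    return 1 - bin(hash_a ^ hash_b).count("1") / bits


def check_upload(new_hash: int, catalogue: dict, now: datetime) -> GateDecision:
    """Accept the upload, or quarantine it if it resembles an existing record."""
    for record_id, existing_hash in catalogue.items():
        if similarity(new_hash, existing_hash) >= SIMILARITY_THRESHOLD:
            deadline = now + timedelta(hours=QUARANTINE_HOURS)
            return GateDecision(False, record_id, deadline)
    return GateDecision(True)


# Hypothetical catalogue with one existing record.
catalogue = {"rec-001": 0x00FF00FF00FF00FF}
now = datetime(2026, 10, 1, 9, 0)

print(check_upload(0x00FF00FF00FF00FE, catalogue, now))  # differs by one bit
print(check_upload(0xFFFFFFFFFFFFFFFF, catalogue, now))  # differs by 32 bits
```

A quarantined upload carries the identifier of the record it matched, so a curator can compare the pair and either approve the replacement or confirm the new file as a legitimate variant.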

For institutions planning their own archiving work, the practical lesson from Ljubljana's experience is straightforward: duplicate detection retrofitted onto a legacy database is far more expensive — in both money and labour — than prevention built into upload workflows from the start. The city's IT directorate has published a technical specification document for the quarantine protocol, available through the Mestna občina Ljubljana's open-data portal, that smaller cultural organisations in Šiška, Bežigrad, and Vič can adapt for their own systems without commissioning bespoke software.

About this article

Published by The Daily Ljubljana

Covering news in Ljubljana. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
