The Daily Ljubljana


Ljubljana's Digital Archives Are Drowning in Duplicate Images — The Numbers Tell a Stark Story

A growing backlog of repeated photographs is clogging the city's public records systems, and new municipal data reveals just how large the problem has become.


By Ljubljana News Desk · Published 5 July 2026, 6:02 am


Updated 5 July 2026, 1:47 pm

How we reported this

This article was generated by AI from the linked public sources. The Daily Ljubljana is independently owned and covers Ljubljana news free from advertiser or sponsor influence.

Photo: Andres Figueroa on Pexels

Ljubljana's municipal digital archive holds more than 340,000 image files across four active databases — and an internal audit completed in May 2026 found that roughly one in five of those files is an exact or near-exact duplicate. The finding, circulated within the Mestna občina Ljubljana's IT directorate, has prompted an emergency procurement process for automated deduplication software, with a budget ceiling of €48,000 set for the contract.

The timing matters. The city is midway through digitising physical planning records from the Bežigrad and Šiška urban districts — a project running under the 2024–2027 Digital Ljubljana strategic framework — and administrators say feeding duplicated source material into the new system will compound errors that are already expensive to correct. Every redundant file that enters the workflow requires staff time to verify, tag and either archive or delete, and the audit estimated that duplicates were costing the city's records team approximately 210 working hours per month in manual review.

Where the Backlog Is Piling Up

The worst-affected system is the city's urban planning image repository, maintained by the Urbanistični inštitut Republike Slovenije at Trnovski pristan 2. That database, which holds aerial photographs, construction permit scans and architectural drawings dating back to 1998, had a duplication rate of 23.7 percent as of the May audit. The second most affected system belongs to the Ljubljana City Museum on Gosposka ulica, whose digitised photographic collection of approximately 61,000 items showed a duplication rate of 18.4 percent, partly because a 2021 bulk upload from the Jakopič Gallery archive was not deduplicated before ingestion.

Smaller in scale but still significant: the Mestna knjižnica Ljubljana, with branches from Krakovo to Šiška, runs a shared digital asset system for event photography and promotional materials. That system logged 4,200 duplicate files in the first quarter of 2026 alone, the highest quarterly figure since the library network consolidated its digital operations in 2019. Librarians at the Pionirska — Center za mladinsko književnost in knjižničarstvo on Einspielerjeva ulica have been tasked with manually reviewing flagged files each Thursday afternoon, a process that staff say has absorbed time previously spent on cataloguing new acquisitions.

What Deduplication Actually Costs — and What It Saves

The economics of the problem are not abstract. Server storage at the city's primary data centre, housed within the Tehnološki park Ljubljana facility in the Brdo business zone, costs the municipality approximately €0.034 per gigabyte per month under its current infrastructure contract. Duplicate image files — many of them high-resolution TIFFs ranging from 80 to 400 megabytes each — are consuming an estimated 12.6 terabytes of redundant storage, translating to a monthly storage cost of around €429 for data the city effectively holds twice or more.
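The storage figure can be checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 1,000 GB), which the article does not state. Under that assumption the result is roughly €428 per month, consistent with the "around €429" quoted; the small gap is plausibly due to rounding in the 12.6-terabyte estimate.

```python
# Figures quoted in the article.
PRICE_EUR_PER_GB_MONTH = 0.034
REDUNDANT_STORAGE_TB = 12.6

# Assumption: decimal terabytes. The article does not say which convention applies.
GB_PER_TB = 1000


def monthly_cost_eur(terabytes, price_per_gb=PRICE_EUR_PER_GB_MONTH, gb_per_tb=GB_PER_TB):
    """Monthly storage cost in euros for a given volume in terabytes."""
    return terabytes * gb_per_tb * price_per_gb


monthly = monthly_cost_eur(REDUNDANT_STORAGE_TB)
annual = monthly * 12

print(f"monthly: EUR {monthly:.2f}")
print(f"annual:  EUR {annual:.2f}")
```

Over a year the same redundant storage comes to a little over €5,000, which supports the article's point that storage is the smaller part of the cost.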

That figure is modest on its own. The larger cost is workflow degradation. The May audit calculated that if duplication rates continue at their current pace through the end of 2026, the total volume of redundant files across all four systems will exceed 80,000 items — a threshold the report described as making manual remediation operationally impractical without additional staffing.

The proposed automated solution, currently out to tender through the public procurement portal (portal javnih naročil), would use perceptual hashing to identify near-duplicate images: files that differ byte for byte but look the same, such as one photograph saved at different resolutions or with minor colour corrections applied. Similar systems have been deployed by the municipal archives in Vienna and Prague over the past three years.
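To illustrate the idea, here is a minimal sketch of one simple perceptual hash, the average hash. The tender does not name an algorithm, so this choice is an assumption, as are the synthetic test images, the helper names and the distance threshold. The sketch shrinks each image to an 8x8 grid, records which cells are brighter than the mean, and compares the resulting 64-bit fingerprints by counting differing bits. A cryptographic hash of the raw bytes is included for contrast, since it treats any altered copy as a different file.

```python
import hashlib


def shrink(pixels, size=8):
    """Downscale a grayscale image (rows of 0-255 values) to size x size
    by block averaging. Assumes the image is at least size x size pixels."""
    h, w = len(pixels), len(pixels[0])
    small = []
    for r in range(size):
        r0, r1 = r * h // size, (r + 1) * h // size
        row = []
        for c in range(size):
            c0, c1 = c * w // size, (c + 1) * w // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels, size=8):
    """Return a size*size-bit integer: one bit per cell, set when the cell
    is brighter than the mean of all cells."""
    cells = [v for row in shrink(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (v > mean)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def sha256_of(pixels):
    """Exact-match fingerprint of the raw pixel bytes."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()


# Synthetic 64x64 images, hypothetical stand-ins for archive scans.
original = [[50 + x + 2 * y for x in range(64)] for y in range(64)]
half_size = [row[::2] for row in original[::2]]          # same picture at 32x32
brighter = [[v + 10 for v in row] for row in original]   # minor tonal correction
stripes = [[200 if (x // 8) % 2 == 0 else 40 for x in range(64)] for _ in range(64)]

THRESHOLD = 5  # illustrative cut-off, not a value from the tender

for name, img in [("half_size", half_size), ("brighter", brighter), ("stripes", stripes)]:
    d = hamming(average_hash(original), average_hash(img))
    same_bytes = sha256_of(img) == sha256_of(original)
    print(f"{name}: byte-identical={same_bytes}, hamming={d}, near-duplicate={d <= THRESHOLD}")
```

In this sketch the resized and brightened copies are not byte-identical to the original, yet their perceptual hashes match it, while the unrelated striped image falls well outside the threshold. A production system would more likely rely on an established imaging library and a tuned threshold than on hand-rolled code like this.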

The procurement deadline is 28 July 2026. Whichever vendor wins the contract will have until 31 October to complete an initial deduplication pass across all four databases, with a maintenance and monitoring agreement expected to run through 2028. City administrators say they plan to publish aggregate results — total files removed, storage recovered, staff hours freed — on the Ljubljana open data portal at podatki.ljubljana.si once the first phase is complete.

About this article

Published by The Daily Ljubljana

Covering news in Ljubljana. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
