
Pre- Vs. Post-Classification For Records Metadata by Juerg Meier

18/08/2016

At a recent Records and Information Management conference, I watched a demo of the IBM Watson natural language processing (NLP) tool that showed how police used it in criminal investigations. I was struck by the thought that this fascinating tool may have led some Records and Information Managers in the audience to believe it was the new, disruptive way to manage records.

CAN YOU RUN A RECORDS AND INFORMATION MANAGEMENT PROGRAM ON POST-CLASSIFICATION AND MACHINE LEARNING?

IBM Watson can be seen as a post-classification tool, representative of a whole group of NLP tools often marketed under the Big Data umbrella. Post-classification tools use algorithms to classify an information asset after the asset has been created or stored. In fact, every major supplier, as well as the open source community, offers similar tools. The sheer number of options can seem overwhelming, even to those in the eDiscovery space.

The demo was really impressive. It was almost as if Watson went through the test documents by magic, identifying people, organizations, places, phone numbers and frequently used terms. Clicking one of the filtered terms lists its occurrences across the documents. The tool can also produce graphics that connect interrelated terms visually. This is, of course, quite a departure from the rather minimalistic user interfaces typically found in electronic records management systems (ERMS).

What makes such an analytics tool attractive for Records and Information Management is its potential to reduce the complexity of information management processes, in particular metadata capture. It could:

save us from lengthy discussions with information asset creators about which metadata they should provide, and how and in what format they should deliver it

eliminate the need to design metadata structures for document types, simplifying forms management and optical character recognition (OCR) during scanning

allow users to search on criteria that were not originally available for lookup

make it possible to find important emails in the email archive, or important content in unstructured documents.

No doubt, such a tool can look like a silver bullet to Records and Information Managers.

However, two things should be kept in mind. With a tool like Watson, you (a) buy into its algorithms and (b) rely on those algorithms to build the structure that was not created and provided up front, i.e. during an on-boarding analysis process. Or, to sum up both issues in one question: what sort of structure do such algorithms actually use or produce?

NLP and text mining algorithms (with impressive names like Latent Semantic Indexing or Naïve Bayes) work on a mathematical-statistical basis, building models from the frequency and proximity of terms in a given document or set of documents. Typically, they don't work with pre-existing ontologies, and they don't really understand what they recognize. The algorithms can identify amounts in documents, but what sort of amounts are they? That is left to human review to sort out, but at least these tools can give us hints.
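To make the "mathematical-statistical basis" concrete, here is a minimal from-scratch sketch of a multinomial Naïve Bayes text classifier. It is not how Watson or any particular product works; it simply learns word counts per label from a few training snippets that are invented for illustration, with no ontology and no notion of what an "amount" is.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercased alphanumeric tokens; numbers become opaque tokens with no meaning attached.
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.doc_counts = Counter()              # label -> number of training documents
        self.word_counts = defaultdict(Counter)  # label -> token -> frequency
        self.vocab = set()

    def fit(self, docs: list[tuple[str, str]]) -> "NaiveBayes":
        for text, label in docs:
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = "", -math.inf
        for label, n_docs in self.doc_counts.items():
            # log P(label) + sum over tokens of log P(token | label)
            score = math.log(n_docs / total_docs)
            total_tokens = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_tokens + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training snippets and labels, invented for illustration.
training = [
    ("invoice amount due 1200 payment terms 30 days", "invoice"),
    ("invoice total 950 due on receipt", "invoice"),
    ("employment contract salary 85000 start date", "contract"),
    ("contract term renewal notice period", "contract"),
]
model = NaiveBayes().fit(training)
print(model.predict("amount due 400"))           # invoice
print(model.predict("contract renewal notice"))  # contract
```

The model picks "invoice" only because "amount" and "due" co-occurred with that label in its training data. To the classifier, "400" is just another token: it could equally be a sum, a client ID or a transaction number.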

PRE-CLASSIFICATION IS STILL THE WAY TO GO FOR RECORDS AND INFORMATION MANAGEMENT

The classification structure for Watson is determined by its algorithms, whose outcomes are somewhat unpredictable. Tools like Watson are widespread in the forensics and eDiscovery communities, as these domains are often interested in unknown bits of information in datasets. But for Records and Information Management, this approach is simply not good enough. Precision and recall seem too arbitrary to truly satisfy the findability requirement stipulated in, for example, ARMA's Generally Accepted Recordkeeping Principles.
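As a reminder of what the two measures mean: precision is the share of retrieved documents that are actually relevant, and recall is the share of relevant documents that were actually retrieved. The sketch below computes both for a single hypothetical search; the document IDs are invented, and returning 0.0 for an empty set is a convention chosen here.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query, treating results as unordered sets."""
    hits = retrieved & relevant  # true positives
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the records that truly belong to one client,
# versus what a post-classification tool returned for that client.
relevant = {"doc1", "doc2", "doc3", "doc4"}
retrieved = {"doc1", "doc2", "doc7", "doc8", "doc9"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.40 recall=0.50
```

For a lookup such as "all records for this client", a recall of 0.50 means half of the records are simply not found, which is the kind of gap that makes a purely algorithmic structure hard to rely on.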

For simplicity, I am defining a record as something that is created by a business process. Consequently, to the extent that business processes are defined and executed in a controlled environment, the properties found in records and their metadata are also well known, and their possible values are restricted. Here are some of those properties:

client ID

transaction number

transaction type

privacy level

It is fairly intuitive what the metadata terms above mean. The size of their value sets (i.e. their possible values) may differ dramatically, from a handful (privacy level) to perhaps billions (transaction numbers), but all of them are restricted value sets, not free, unstructured text. These properties carry semantics. Additionally, client IDs and transaction numbers may overlap in format and value, which creates serious problems for post-classification algorithms.
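The overlap problem is easy to reproduce. This sketch assumes, purely for illustration, that client IDs and transaction numbers are both eight-digit numbers (a hypothetical format). A regular-expression extractor stands in, in a much simplified way, for the entity-recognition step of a post-classification tool: it finds the numbers but cannot say which property each one belongs to. A pre-classified record carries that answer in named fields.

```python
import re

# Hypothetical format: both identifiers are eight digits, so their value sets overlap.
ID_PATTERN = re.compile(r"\b\d{8}\b")


def extract_candidates(text: str) -> list[str]:
    """Post-classification view: find every 8-digit token, with no idea what it denotes."""
    return ID_PATTERN.findall(text)


document_text = "Re: 40213377 - please reverse 40213378 and confirm by phone."

# Pre-classification view: the producing application states what each value means.
record_metadata = {
    "client_id": "40213377",
    "transaction_number": "40213378",
    "transaction_type": "reversal",
    "privacy_level": "confidential",
}

candidates = extract_candidates(document_text)
print(candidates)  # ['40213377', '40213378']

# The text alone cannot tell a client ID from a transaction number;
# only the declared metadata resolves each value to a property.
for value in candidates:
    roles = [key for key, v in record_metadata.items() if v == value]
    print(value, "->", roles)
```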

Pre-classification, or classifying information before creating or storing it, is clearly still the way to go for Records and Information Management professionals. Declaring the semantics of each required metadata property that describes a record and makes it findable, and having the business process owner and the producing application supply the correct values, remains indispensable, even in the age of machine learning.
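Declaring semantics up front can be as simple as a schema the producing application must satisfy before a record is stored. Below is a minimal sketch using Python dataclasses and enums. The field names follow the example properties listed earlier; the allowed values and the ID formats (a "C" prefix for client IDs, a "T" prefix for transaction numbers) are hypothetical placeholders that a real program would take from its own business process definitions.

```python
import re
from dataclasses import dataclass
from enum import Enum


class PrivacyLevel(Enum):
    # A handful of allowed values (hypothetical labels).
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"


class TransactionType(Enum):
    PAYMENT = "payment"
    REVERSAL = "reversal"
    REFUND = "refund"


# Hypothetical formats: a value set can be very large and still be well defined.
CLIENT_ID_FORMAT = re.compile(r"C\d{8}")
TRANSACTION_NUMBER_FORMAT = re.compile(r"T\d{10}")


@dataclass(frozen=True)
class RecordMetadata:
    client_id: str
    transaction_number: str
    transaction_type: TransactionType
    privacy_level: PrivacyLevel

    def __post_init__(self) -> None:
        # Reject bad values at capture time instead of guessing at them later.
        if not CLIENT_ID_FORMAT.fullmatch(self.client_id):
            raise ValueError(f"invalid client_id: {self.client_id!r}")
        if not TRANSACTION_NUMBER_FORMAT.fullmatch(self.transaction_number):
            raise ValueError(f"invalid transaction_number: {self.transaction_number!r}")


def capture(raw: dict[str, str]) -> RecordMetadata:
    """Build validated metadata from values supplied by the producing application."""
    return RecordMetadata(
        client_id=raw["client_id"],
        transaction_number=raw["transaction_number"],
        transaction_type=TransactionType(raw["transaction_type"]),  # ValueError if unknown
        privacy_level=PrivacyLevel(raw["privacy_level"]),            # ValueError if unknown
    )


meta = capture({
    "client_id": "C40213377",
    "transaction_number": "T0040213378",
    "transaction_type": "reversal",
    "privacy_level": "confidential",
})
print(meta.transaction_type.value, meta.privacy_level.value)
```

Giving each identifier its own declared format is one way to remove the overlap between client IDs and transaction numbers; just as important, the producing application is told exactly what to supply, so nothing has to be inferred afterwards.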

That said, I would not encourage you to dismiss NLP or text mining. These tools can be very helpful if you need to classify, say, millions of office documents on shared drives or SharePoint sites. But make sure it is you, the information professional, who supplies the terms, taxonomies, ontologies and document types of the knowledge domain to the analytics tool, so that the classification does not become arbitrary. It's all about creating well-understood, intended structure.
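One way to keep bulk classification of shared-drive content from becoming arbitrary is to let the classifier choose only among document types the information professional has defined, and to send anything it cannot place to human review. The sketch below does this with simple keyword scoring; the taxonomy, keywords, file names and threshold are all hypothetical, and in practice the same controlled list would more likely be handed to a trained model or an analytics product as its label set.

```python
import re

# Document types and indicative terms supplied by the information professional
# (hypothetical taxonomy for illustration).
TAXONOMY: dict[str, set[str]] = {
    "invoice": {"invoice", "amount", "due", "payment", "vat"},
    "contract": {"contract", "agreement", "term", "renewal", "party"},
    "policy": {"policy", "procedure", "scope", "compliance"},
}
MIN_MATCHES = 2  # hypothetical threshold below which a human decides


def classify(text: str) -> str:
    """Return a document type from TAXONOMY, or 'needs_review' if the evidence is weak."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scores = {doc_type: len(tokens & terms) for doc_type, terms in TAXONOMY.items()}
    best_score = max(scores.values())
    best_types = [t for t, s in scores.items() if s == best_score]
    # Too few matching terms, or a tie between types, is not resolved automatically.
    if best_score < MIN_MATCHES or len(best_types) > 1:
        return "needs_review"
    return best_types[0]


files = {
    "Q3_invoice_ACME.docx": "Invoice 1042: amount due on receipt, payment by transfer.",
    "renewal_draft.docx": "This agreement renews the contract for a further term.",
    "notes.txt": "Lunch on Friday?",
}
for name, text in files.items():
    print(name, "->", classify(text))
```

The design choice that matters is the output space: this classifier can only ever return one of the document types you declared, or defer to a person.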
LINK: http://blogs.ironmountain.com/2016/service-lines/information-managemen...