Pre- vs. Post-Classification for Records Metadata by Juerg Meier

18/08/2016

At a Records and Information Management conference, I recently watched a demo of the IBM Watson natural language processing (NLP) tool that showed how police use it for criminal investigations. I was struck by the thought that this fascinating tool may have led some Records and Information Managers in the audience to believe that this was the new, disruptive way to manage records.

CAN YOU RUN A RECORDS AND INFORMATION MANAGEMENT PROGRAM ON POST-CLASSIFICATION AND MACHINE LEARNING?

IBM Watson can be viewed as a post-classification tool, representing a whole group of NLP tools often marketed under the Big Data umbrella. Post-classification tools use algorithms to produce a classification for an information asset after the asset is created or stored. In fact, every major supplier and the open source community offer similar tools. The range can seem overwhelming, even for those in the eDiscovery space.

The demo was really impressive. It was almost as if Watson went through the test documents by magic, identifying people, organizations, places, phone numbers and frequently used terms. A click on one of the filtered terms lists its occurrences across various documents. The tool can also produce graphics that visually connect interrelated terms. This is, of course, quite a departure from the rather minimalistic user interfaces typically found in electronic records management systems (ERMS).

What is attractive about potentially using such an analytics tool for Records and Information Management is its ability to reduce the complexity of information management processes, in particular, capturing metadata. It could:

save us from lengthy discussions with information asset creators about which metadata they must provide, and how and in what format they should deliver it

eliminate the need to design metadata structures for document types, simplifying forms management and the optical character recognition (OCR) process during scanning

allow users to search by criteria that were not originally available for lookup

provide the ability to search for important emails in the email archive or unstructured documents.

No doubt, such a tool can look like a silver bullet to Records and Information Managers.

However, two things should be kept in mind: with a tool like Watson, you (a) buy into algorithms and (b) require these algorithms to build the structure that has not been created and provided upfront, i.e. during an on-boarding analysis process. Or, to summarize the two issues: what sort of structure is actually used or produced by such algorithms?

NLP and text mining algorithms (with impressive names like Latent Semantic Indexing or Naïve Bayes) work on a mathematical-statistical basis, building trees based on the frequency and proximity of terms in a given document or set of documents. Typically, they don't work with pre-existing ontologies, and they don't really understand what they recognize. The algorithms are able to identify amounts in documents, but what sort of amounts are they? This is left to human review to sort out, but at least these tools can provide us hints.
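To make the "amounts without meaning" point concrete, here is a minimal sketch in Python. It is not Latent Semantic Indexing, Naïve Bayes, or anything Watson does; it only counts terms and extracts number-like tokens with a regular expression. The sample text and the function names are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical sample text, invented for illustration only.
TEXT = (
    "Invoice total 1,250.00 due in 30 days. "
    "Client paid 1,250.00 on account 30 after the invoice was sent."
)

def term_frequencies(text: str) -> Counter:
    """Count lowercase alphabetic tokens: a crude stand-in for term statistics."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def find_amounts(text: str) -> list[str]:
    """Return every number-like token; the pattern carries no notion of meaning."""
    return re.findall(r"\d+(?:,\d{3})*(?:\.\d+)?", text)

frequencies = term_frequencies(TEXT)
amounts = find_amounts(TEXT)

print(frequencies.most_common(3))
print(amounts)  # ['1,250.00', '30', '1,250.00', '30']
```

The extraction finds "30" twice, but nothing in the statistics says that one occurrence is a payment term in days and the other an account number. That distinction is exactly what is left to human review.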

PRE-CLASSIFICATION IS STILL THE WAY TO GO FOR RECORDS AND INFORMATION MANAGEMENT

The classification structure for Watson is determined by its algorithms, whose outcomes are somewhat unpredictable. Tools like Watson are widespread in the forensics and eDiscovery communities, as these domains are often interested in unknown bits of information in datasets. But for Records and Information Management usage, this approach is simply not good enough. Precision and recall seem too arbitrary to truly satisfy the findability requirement stipulated in ARMA's Generally Accepted Recordkeeping Principles, for example.
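For readers less familiar with the two measures: precision is the share of retrieved documents that are actually relevant, and recall is the share of relevant documents that were actually retrieved. The following Python sketch computes both for a made-up search result; the document identifiers are hypothetical.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Compute precision and recall for one query from two sets of document IDs."""
    hits = len(retrieved & relevant)  # relevant documents that were found
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: four records should be found, the tool returns three.
relevant = {"doc1", "doc2", "doc3", "doc4"}
retrieved = {"doc1", "doc2", "doc7"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.50
```

In this example, half of the records that should be found are missed, which may be acceptable in an exploratory investigation but hardly in a records program that must produce a specific record on request.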

For simplicity, I am defining a record as something that is created by a business process. Consequently, inasmuch as business processes are defined and executed in a controlled environment, the properties found in records and their metadata are also well known, and their possible values are actually restricted. Here are some of those properties:

client ID

transaction number

transaction type

privacy level

It is fairly intuitive what the metadata terms above mean. The size of their value sets (i.e. possible values) may differ dramatically, from a handful (privacy level) to perhaps billions (transaction numbers), but both represent restricted value sets and not free, unstructured text. There are semantics behind these properties. Additionally, client IDs and transaction numbers may overlap in terms of format and value, creating serious problems for post-classification algorithms.
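The overlap problem can be illustrated with a small Python sketch. It assumes, purely for illustration, that both client IDs and transaction numbers are plain 8-digit numbers; the sample sentence and the values in it are invented.

```python
import re

# Assumption for illustration: client IDs and transaction numbers
# are both plain 8-digit numbers in this hypothetical organization.
EIGHT_DIGITS = re.compile(r"\b\d{8}\b")

def guess_identifiers(text: str) -> list[str]:
    """Return all 8-digit tokens; the pattern cannot tell their roles apart."""
    return EIGHT_DIGITS.findall(text)

text = "Payment 20160818 booked for 10024567."
candidates = guess_identifiers(text)
print(candidates)  # ['20160818', '10024567']

# With metadata declared at creation time, the roles are never in doubt.
declared = {"transaction_number": "20160818", "client_id": "10024567"}
print(declared["client_id"])  # 10024567
```

A pattern-based tool finds both numbers but has to guess which is which (the first could even be read as a date), whereas the declared metadata states it outright.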

Pre-classification, or classifying information before creating or storing it, is clearly still the way to go for Records and Information Management professionals. Declaring the semantics of each required metadata property that describes a record and promotes finding it, and having the business process owner and the producing application supply the correct values, remains an indispensable activity even in times of machine learning.
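As a rough sketch of what declaring the semantics up front could look like, the following Python example defines the four properties named earlier as a validated structure. The identifier formats (a "C" or "T" prefix followed by digits) and the enumerated values are assumptions made up for this example, not a prescribed standard.

```python
import re
from dataclasses import dataclass
from enum import Enum

class PrivacyLevel(Enum):
    # Hypothetical value set: a handful of allowed values.
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"

class TransactionType(Enum):
    # Hypothetical value set for illustration.
    PAYMENT = "payment"
    REFUND = "refund"

@dataclass(frozen=True)
class RecordMetadata:
    client_id: str
    transaction_number: str
    transaction_type: TransactionType
    privacy_level: PrivacyLevel

    def __post_init__(self) -> None:
        # Assumed formats: distinct prefixes keep the two identifiers apart.
        if not re.fullmatch(r"C\d{8}", self.client_id):
            raise ValueError(f"invalid client ID: {self.client_id!r}")
        if not re.fullmatch(r"T\d{12}", self.transaction_number):
            raise ValueError(f"invalid transaction number: {self.transaction_number!r}")
        if not isinstance(self.transaction_type, TransactionType):
            raise TypeError("transaction_type must be a TransactionType")
        if not isinstance(self.privacy_level, PrivacyLevel):
            raise TypeError("privacy_level must be a PrivacyLevel")

# The producing application supplies the values when the record is created.
meta = RecordMetadata(
    client_id="C10024567",
    transaction_number="T000020160818",
    transaction_type=TransactionType.PAYMENT,
    privacy_level=PrivacyLevel.CONFIDENTIAL,
)
print(meta.privacy_level.value)  # confidential
```

A record whose metadata does not fit the declared value sets is rejected at capture time instead of being misfiled and discovered later.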

That said, I would not encourage you to dismiss NLP or text mining. These tools can be very helpful if you need to classify, say, millions of office documents on shared drives or SharePoint sites. But make sure it is you, the information professional, who supplies the terms, taxonomies, ontologies and document types of the knowledge domain to the analytics tool so the classification does not become arbitrary. It's all about creating well-understood, intended structure.
LINK: http://blogs.ironmountain.com/2016/service-lines/information-managemen...