
At a Records and Information Management conference, I recently watched a demo of the IBM Watson natural language processing (NLP) tool that showed how it was used by police for criminal investigations. I was struck by the fact that this fascinating tool may have led some Records and Information Managers in the audience to think that this was the new, disruptive way to manage records.
CAN YOU RUN A RECORDS AND INFORMATION MANAGEMENT PROGRAM ON POST-CLASSIFICATION AND MACHINE LEARNING?
IBM Watson can be looked at as a post-classification tool, representing a whole group of NLP tools often marketed under the Big Data umbrella. Post-classification tools use algorithms to produce a classification for an information asset after the asset is created or stored. In fact, every major supplier and the open source community offer similar tools. The choice can even seem overwhelming for those in the eDiscovery space.
The demo was really impressive. It was almost as if Watson went through the test documents by magic, identifying people, organizations, places, phone numbers and frequently used terms. A click on one of the filtered terms lists its occurrences across various documents. The tool can also produce graphics that visually connect interrelated terms. This is, of course, quite a departure from the rather minimalistic user interfaces typically found in electronic records management systems (ERMS).
What is attractive about potentially using such an analytics tool for Records and Information Management is its ability to reduce the complexity of information management processes, in particular, capturing metadata. It could:
save us from lengthy discussions with information asset creators about providing the right metadata, and about how and in what format to deliver it
eliminate the need to design metadata structures for document types, simplifying forms management and the optical character recognition (OCR) process during scanning
allow users to search for criteria originally not available for lookup
provide the ability to search for important emails in the email archive or unstructured documents.
No doubt, such a tool can look like a silver bullet for Records and Information Managers.
However, two things should be kept in mind: with a tool like Watson, you (a) buy into algorithms and (b) require these algorithms to build the structure that has not been created and provided upfront, i.e. during an on-boarding analysis process. Or to summarize the two issues: what sort of structure is actually used or produced by such algorithms?
NLP and text mining algorithms (with impressive names like Latent Semantic Indexing or Naïve Bayes) work on a mathematical-statistical basis, building trees based on the frequency and proximity of terms in a given document or set of documents. Typically, they don't work with pre-existing ontologies, and they don't really understand what they recognize. The algorithms are able to identify amounts in documents, but what sort of amounts are they? This is left to human review to sort out, but at least these tools can provide us hints.
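To make the idea of "frequency and proximity" concrete, here is a minimal sketch in Python. It is not how Watson or any particular product works; the sample text, the function names and the window size are all assumptions chosen for illustration. It counts how often terms occur and which terms appear close to each other, and it shows that numbers are found without any indication of what they mean.

```python
import re
from collections import Counter

# Hypothetical sample text; not taken from any real dataset.
TEXT = (
    "Invoice 4711 for client Acme totals 1500 EUR. "
    "Acme paid 1500 EUR on account 4711 after the invoice was sent."
)

def tokenize(text):
    """Lower-case the text and split it into word and number tokens."""
    return re.findall(r"[a-z]+|\d+", text.lower())

def term_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)

def cooccurrences(tokens, window=2):
    """Count unordered token pairs that appear within `window` positions."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs

tokens = tokenize(TEXT)
freq = term_frequencies(tokens)
pairs = cooccurrences(tokens)

# The statistics find the numbers, but not whether "4711" is an invoice
# number, an account number or an amount. That is left to human review.
numbers = sorted({t for t in tokens if t.isdigit()})
print(freq.most_common(5))
print(pairs.most_common(3))
print(numbers)
```

The counts hint that "1500" and "eur" belong together, which is useful, but nothing in the output says that one number is an amount and the other an identifier.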
PRE-CLASSIFICATION IS STILL THE WAY TO GO FOR RECORDS AND INFORMATION MANAGEMENT
The classification structure for Watson is determined by its algorithms, whose outcomes are somewhat unpredictable. Tools like Watson are widespread in the forensics and eDiscovery communities, as these domains are often interested in unknown bits of information in datasets. But for Records and Information Management usage, this approach is simply not good enough. Precision and recall seem too arbitrary to truly satisfy the findability requirement stipulated in ARMA's Generally Accepted Recordkeeping Principles, for example.
For simplicity, I am defining a record as something that is created by a business process. Consequently, insofar as business processes are defined and executed in a controlled environment, the properties found in records and their metadata are also well known, and their possible values are actually restricted. Here are some of those properties:
client ID
transaction number
transaction type
privacy level
It is fairly intuitive what the metadata terms above mean. The size of their value sets (i.e. possible values) may differ dramatically, from a handful (privacy level) to perhaps billions (transaction numbers), but both represent restricted value sets and not free, unstructured text. There are semantics behind these properties. Additionally, client IDs and transaction numbers may overlap in terms of format and value, creating serious problems for post-classification algorithms.
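The overlap problem can be sketched in a few lines of Python. The formats below are assumptions for illustration only (both identifiers are taken to be 8-digit numbers), and the names `candidate_properties` and `record_metadata` are hypothetical.

```python
import re

# Assumed formats, for illustration only: both identifiers are 8-digit numbers.
CLIENT_ID = re.compile(r"\d{8}")
TRANSACTION_NUMBER = re.compile(r"\d{8}")

def candidate_properties(value):
    """Return every property whose format the value matches."""
    patterns = {
        "client_id": CLIENT_ID,
        "transaction_number": TRANSACTION_NUMBER,
    }
    return sorted(name for name, p in patterns.items() if p.fullmatch(value))

# A value picked out of a document after the fact: format alone cannot
# decide which property it represents.
guesses = candidate_properties("20481516")
print(guesses)

# With pre-classification, the producing application states the meaning
# of each value when the record is created, so no guessing is needed.
record_metadata = {"client_id": "20481516", "transaction_number": "73009912"}
print(record_metadata["client_id"])
```

A post-classification tool looking only at the text sees two equally plausible labels for the same value; the declared metadata carries the meaning explicitly.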
Pre-classification, or classifying information before creating or storing it, is clearly still the way to go for Records and Information Management professionals. Declaring the semantics of each required metadata property that describes a record and promotes finding it, and having the business process owner and the producing application supply the correct values, remains an indispensable activity even in times of machine learning.
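As a rough sketch of what declaring metadata properties upfront could look like, the following Python example defines a small schema for the four properties listed earlier and checks a record's metadata against it before the record is stored. The value sets, the 8-digit and 10-digit formats, and all names (`Property`, `SCHEMA`, `validate`) are assumptions made for this illustration, not part of any ERMS product.

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Property:
    name: str
    description: str                 # the declared meaning of the property
    is_valid: Callable[[str], bool]  # membership test for the value set

# Assumed value sets, for illustration only.
PRIVACY_LEVELS = {"public", "internal", "confidential"}
TRANSACTION_TYPES = {"payment", "refund", "transfer"}

SCHEMA = [
    Property("client_id", "Identifier of the client",
             lambda v: re.fullmatch(r"\d{8}", v) is not None),
    Property("transaction_number", "Identifier of the transaction",
             lambda v: re.fullmatch(r"\d{10}", v) is not None),
    Property("transaction_type", "Kind of business transaction",
             lambda v: v in TRANSACTION_TYPES),
    Property("privacy_level", "Privacy classification of the record",
             lambda v: v in PRIVACY_LEVELS),
]

def validate(metadata):
    """Return a list of problems; an empty list means the record is accepted."""
    problems = []
    declared = {prop.name for prop in SCHEMA}
    for prop in SCHEMA:
        value = metadata.get(prop.name)
        if value is None:
            problems.append(f"missing {prop.name}")
        elif not prop.is_valid(value):
            problems.append(f"invalid {prop.name}: {value!r}")
    for name in metadata:
        if name not in declared:
            problems.append(f"undeclared {name}")
    return problems

good = {"client_id": "20481516", "transaction_number": "0073009912",
        "transaction_type": "payment", "privacy_level": "internal"}
bad = {"client_id": "20481516", "transaction_type": "payment",
       "privacy_level": "secret"}

print(validate(good))
print(validate(bad))
```

The point of the design is that the producing application must supply values that fit the declared value sets at capture time, rather than leaving an algorithm to infer them afterwards.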
That said, I would not encourage you to dismiss NLP or text mining. These tools can be very helpful if you need to classify, say, millions of office documents on shared drives or SharePoint sites. But make sure it is you, the information professional, who supplies the terms, taxonomies, ontologies and document types of the knowledge domain to the analytics tool so the classification does not become arbitrary. It's all about creating well-understood, intended structure.