As artificial intelligence (AI) reshapes how knowledge is produced and organized, the library sector faces a far-reaching transformation. AI tools are rapidly entering every aspect of metadata creation, yet the profession lacks a unified, systematic framework for evaluating, deploying, and managing the complex patterns of human-machine collaboration. This theoretical gap has left practical experiments fragmented and without strategic guidance. To address it, this article proposes a conceptual model, the "Library Metadata Creation Scale" (LMCS), intended to provide a clear analytical frame of reference for this transformation.
I. Proposal of the LMCS Framework
The rise of generative artificial intelligence has brought a paradigm-level impact to the field of library metadata. It heralds unprecedented possibilities for efficiency improvement and service innovation while simultaneously triggering profound concerns in the industry regarding metadata quality, academic integrity, and professional role positioning. In practice, discussions surrounding AI tools often quickly fall into a polarized dilemma: either adhering to tradition and rejecting any AI intervention in a "fully manual" mode, or embracing technology and pursuing extreme efficiency in a "fully automated" vision. This binary opposition not only fails to address real issues but also exacerbates practitioners' anxiety and decision-making confusion.
The "Library Metadata Creation Scale" (LMCS) was conceived and proposed against this backdrop. Its core purpose is to transcend the simplistic dichotomy of "allow or prohibit" and provide the library sector with a more nuanced and operationally viable structured framework. This framework aims to offer a common language for managers, catalogers, and technology developers to clearly define and communicate the boundaries, patterns, and responsibilities of human-machine collaboration in different scenarios. Its theoretical construction is primarily based on the following considerations:
- Responding to the "binary dilemma" in practice, advocating for refined governance: The birth of LMCS is, first and foremost, a direct response to the simplistic binary discourse prevalent in current industry discussions. It recognizes that viewing AI as a single, homogeneous concept is erroneous. Instead, the application of AI in metadata creation is a continuous spectrum. LMCS attempts to deconstruct this spectrum into five clear, manageable levels, evolving from an initial "traffic light" model (such as "prohibit AI," "partially allow," "fully allow") into a more instructive hierarchical system. This allows libraries to make differentiated strategic choices based on resource types, importance, and processing goals, rather than a one-size-fits-all approach.
- Drawing on historical experiences of technology integration to provide a forward-looking path: Throughout history, every disruptive technology—from the photocopier and computer to the internet—was initially viewed by libraries as a threat to traditional skills and workflows. However, these technologies ultimately found paths to integrate with professional practice and became indispensable infrastructure. LMCS draws on this historical perspective, suggesting that rather than passively resisting or hastily accepting, it is better to proactively design a gradual, clearly defined integration path. It provides a predictable developmental ladder for AI tools, evolving from auxiliary "consultants" (Level 2) to deep "partners" (Levels 4-5).
- Reconciling the inherent theoretical tensions within the industry to achieve theoretical coherence: The LMCS framework is deeply rooted in the century-long theoretical debates that have persisted in the field of library cataloging. It seeks to systematically reconcile two core value pursuits: on one hand, the "normative ideal" represented by Charles Ammi Cutter, which strives for perfection in individual records (corresponding to Levels 1-2 of LMCS); on the other hand, the realism principle that emphasizes "usability first" in response to the vast amounts of information, such as the "More Product, Less Process" (MPLP) philosophy in the archival community (corresponding to Levels 3-5 of LMCS). LMCS does not aim to judge which is superior but acknowledges the coexisting value of these two theories in different contexts and provides the possibility for their coexistence within a unified strategic framework.
In summary, LMCS aims to serve as a strategic tool that integrates diagnostic, planning, and communication functions. It not only provides guidance for current practice but, more importantly, seeks to reshape discussions about AI from a defensive discourse centered on "threat" and "replacement" to a constructive dialogue focused on "collaboration," "enhancement," and "professional evolution."
The scale divides human-machine collaboration in metadata creation into five progressive levels, ranging from complete reliance on human intelligence to fully autonomous machine operation.
Level | Name | Core Description | Key Requirements & Librarian's Responsibility | Typical Application Scenarios |
---|---|---|---|---|
1 | Original Cataloging | Metadata records are created entirely by hand by catalogers, without any AI generation tools. Catalogers rely on traditional tools and standards such as RDA, MARC21, and LCSH. | Catalogers bear full responsibility for the accuracy, completeness, and compliance of every field in the record. This is the benchmark of traditional cataloging work. | Original cataloging for unique collections (e.g., manuscripts, archives, theses); creating high-standard "gold" records for national bibliographies or authoritative agencies; training new catalogers to master the basic rules and reasoning of cataloging. |
2 | AI-Assisted Suggestion | AI serves as a consulting tool, providing suggestions or options for specific fields, but does not directly generate complete records. | Catalogers are responsible for critically evaluating all AI suggestions, making the final choices, and completing records manually. AI is an aid to thinking, and catalogers remain the sole creators of the records. | AI recommends subject terms (LCSH/FAST) or classification numbers (DDC/LCC) based on titles, abstracts, or full texts; AI extracts possible keywords or entities (names, places) from the text; AI suggests applicable MARC field tags. |
3 | AI-Assisted Enhancement & Cleanup | AI enhances, corrects, or formats an existing record that is incomplete or of low quality (e.g., vendor records, abridged records). | Catalogers provide the initial record and must review every modification made by AI, ensuring that it is accurate, leaves the core semantics unchanged, and complies with local policies. Catalogers act as "editors" and "proofreaders." | Automatically correcting punctuation and subfield codes in MARC records; automatically normalizing names of persons and corporate bodies against authority files (e.g., VIAF); automatically expanding abbreviations or translating abstracts into another language; enriching records, such as automatically adding contents notes (MARC field 505) based on the resource's content. |
4 | Machine-Generated Record, Human Review | AI automatically generates a complete, reviewable metadata record from the resource itself (e.g., scanned text, PDF files, audio, video). | The core responsibility of catalogers shifts from "creation" to "review and validation." They must carefully check the AI-generated preliminary record, correct errors, fill in omissions, and ultimately approve it. This is the primary mode of human-machine collaboration. | Rapid cataloging of large batches of e-books or journal articles, with AI automatically extracting authors, titles, ISBNs, abstracts, etc.; automatically generating descriptive metadata for digitized image collections (e.g., identifying image content, extracting EXIF data); converting unstructured bibliographic information (e.g., reference lists) into structured MARC records. |
5 | Fully Automated Metadata Generation | AI autonomously completes metadata creation, validation, and ingestion, triggering human intervention only for unprocessable exceptions or low-confidence results. | The role of catalogers shifts to "system managers" and "quality monitors." They are responsible for configuring AI rules, monitoring overall system performance, conducting regular sampling audits of record quality, and addressing issues reported by AI. | Real-time processing of large-scale publisher data streams or open-access repositories, automatically generating metadata and loading it into discovery systems; automatically creating metadata records for submissions to institutional repositories (e.g., preprints); automatically tagging and classifying user-generated content (e.g., photos, videos). |
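The human-review rule implied by the table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RecordStatus` fields, the `needs_human_review` function, and the 0.9 confidence threshold are hypothetical and are not part of the LMCS framework itself. The only thing taken from the table is the rule that at Levels 1-4 a cataloger creates or reviews every record, while at Level 5 a person is involved only for exceptions or low-confidence output.

```python
from dataclasses import dataclass
from enum import IntEnum


class LMCSLevel(IntEnum):
    """The five LMCS levels, numbered as in the table."""
    ORIGINAL_CATALOGING = 1
    AI_ASSISTED_SUGGESTION = 2
    AI_ASSISTED_ENHANCEMENT_AND_CLEANUP = 3
    MACHINE_GENERATED_HUMAN_REVIEW = 4
    FULLY_AUTOMATED = 5


@dataclass
class RecordStatus:
    """Hypothetical per-record signals reported by an AI pipeline."""
    confidence: float            # assumed model score in [0, 1]
    unprocessable: bool = False  # assumed flag for records the pipeline could not handle


def needs_human_review(level: LMCSLevel, status: RecordStatus,
                       min_confidence: float = 0.9) -> bool:
    """Return True if a cataloger must look at this individual record."""
    # Levels 1-4: a cataloger creates or reviews every record.
    if level <= LMCSLevel.MACHINE_GENERATED_HUMAN_REVIEW:
        return True
    # Level 5: escalate only exceptions and low-confidence records.
    return status.unprocessable or status.confidence < min_confidence
```

In such a setup the threshold would be a local policy choice, and records that skip individual review at Level 5 would still fall under the periodic sampling audits described in the table.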
II. Discussion
The value of LMCS extends beyond its practicality as an operational guide. It also serves as a theoretical prism that refracts, and attempts to reconcile, the long-standing fundamental tensions within the library profession, and from this reconciliation it derives a coherent path for redefining the profession.
The five levels of LMCS are not merely a technical ladder but a systematic encoding and response to the core theoretical debates in the history of library cataloging. The essence of this debate has always revolved around the tension between the "normative ideal" and "efficiency reality."
- The inheritance and limitation of the "normative ideal": Levels 1-2 of LMCS directly reflect, in a contemporary setting, Charles Ammi Cutter's "Bibliographic Objectives" principle. This ideal strives to create a perfect record for each resource, emphasizing the core role of human intelligence in semantic understanding, knowledge connections, and authority control. Such "craftsmanship" is the cornerstone of library professionalism, ensuring in-depth description of core collections and high-value knowledge assets. However, the LMCS framework also recognizes that applying this ideal to all resources is neither realistic nor necessary in an age of information explosion. By limiting this model to specific scenarios (e.g., rare books, manuscripts), LMCS preserves its value and avoids the systemic collapse that unlimited generalization would cause.
- The integration and elevation of the "efficiency reality": Levels 3-5 of LMCS absorb and develop the archival community's pragmatic "More Product, Less Process" (MPLP) philosophy. MPLP acknowledges that, for vast backlogs of collections, "good enough" metadata is far better than no metadata at all. LMCS elevates this principle from a makeshift response to backlogs to a proactive, graded strategic choice. It is no longer the opposite of "perfection" but a complementary strategy that serves different information discovery needs.
More importantly, LMCS signifies a fundamental theoretical shift: from "bibliographic control" to "bibliographic governance." Traditional "bibliographic control" emphasizes a centralized, institution-led authoritative production and gatekeeping of individual records. Under the LMCS framework, the role of libraries shifts to that of "governors" of a metadata ecosystem. "Governance" means that libraries are no longer the sole producers of all metadata but coordinators of diverse production entities, including people, machines, vendors, and even user-generated content. Its core task shifts from "creation" to designing and supervising a credible, quality-controllable, human-machine collaborative metadata production system. This represents a higher-dimensional control, a systematic governance based on rules, strategies, and quality audits.
Based on the above theoretical analysis, LMCS outlines a clear practical path for the professional evolution of librarians, which is essentially a profound transfer of "professional jurisdiction" and may give rise to changes in organizational forms and service paradigms.
- The transfer of professional jurisdiction and skill reconstruction: The core jurisdiction of traditional catalogers lies in the expert interpretation and manual application of cataloging rules. At the higher levels of LMCS, machines take on much of the work of rule application, and the librarian's new core jurisdiction lies in the "design, validation, and ethical supervision" of automated processes. The role shifts from "craftsman on the production line" to "architect of knowledge systems." This evolution requires a systematic reconstruction of the skill stack:
  - At Levels 1-2, value lies in content knowledge: deep expertise in cataloging, subject headings, classification systems, and the like.
  - At Levels 3-4, value lies in process knowledge: data evaluation, pattern recognition, and efficient human-machine interaction.
  - At Level 5, value lies in metacognitive knowledge: systems thinking, data analysis, strategic planning, and ethical decision-making.
- The inevitable transformation of organizational structure: The transfer of professional jurisdiction will inevitably impact traditional departmental structures based on homogeneous tasks. Libraries that fully adopt LMCS will see their technical services departments evolve from a single "cataloging department" into a functionally differentiated "metadata strategy center." This center may include:
  - Special Collections and Original Cataloging Group (focusing on Levels 1-2): Composed of senior experts responsible for handling unique, complex, high-value collections, inheriting core professional skills.
  - Bulk Processing and Data Enhancement Group (focusing on Levels 3-4): The main force of human-machine collaboration, responsible for processing large-scale digital and physical resources, emphasizing the balance between efficiency and quality.
  - Metadata Systems and Strategy Group (focusing on Level 5): Responsible for formulating overall metadata policies, evaluating and configuring AI tools, monitoring the quality and ethical compliance of automated processes, serving as the "brain" of the entire system.
- The expansion of the "Metadata as a Service" (MaaS) concept: The transformation of organizational structure enables the metadata department to shift from an internal production unit to a "service provider" for internal and external users. With the support of AI capabilities, the connotation of "Metadata as a Service" can be greatly expanded. For example, it can provide "on-demand metadata generation" services for researchers at the institution, quickly processing their research datasets; or utilize AI for large-scale metadata analysis to support decision-making for subject services; or even offer metadata cleaning and enhancement consulting services to small cultural institutions lacking technical capabilities, thereby expanding the social value of libraries.
This evolution signifies an organizational transformation in the technical services department, shifting from a "production line" model based on task homogeneity to a "portfolio management" model based on LMCS levels and resource types. Different teams will focus on different LMCS levels, forming a complementary professional ecosystem composed of "special collections cataloging experts" (Levels 1-2), "data enhancement and quality control teams" (Levels 3-4), and "metadata strategy and systems analysts" (Level 5).
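As a rough illustration of what "portfolio management" by LMCS level and resource type might look like in practice, the sketch below maps a resource type to a level and then to the responsible team. The team names come from the structure described above, and the example resource types follow the typical scenarios in the LMCS table. The identifiers (`LEVEL_BY_RESOURCE_TYPE`, `assign`) and the choice to default unknown resource types to Level 1 are assumptions made for illustration, not prescriptions of the framework.

```python
from typing import NamedTuple

# Team responsible for each LMCS level, following the three groups described above.
TEAM_BY_LEVEL = {
    1: "Special Collections and Original Cataloging Group",
    2: "Special Collections and Original Cataloging Group",
    3: "Bulk Processing and Data Enhancement Group",
    4: "Bulk Processing and Data Enhancement Group",
    5: "Metadata Systems and Strategy Group",
}

# Hypothetical local policy: which level a library chooses per resource type.
LEVEL_BY_RESOURCE_TYPE = {
    "manuscript": 1,
    "vendor_record": 3,
    "ebook_batch": 4,
    "repository_preprint": 5,
}


class Assignment(NamedTuple):
    level: int
    team: str


def assign(resource_type: str, default_level: int = 1) -> Assignment:
    """Look up the LMCS level and team for a resource type.

    Unknown types fall back to default_level; defaulting to Level 1
    (fully manual) is an assumed conservative choice.
    """
    level = LEVEL_BY_RESOURCE_TYPE.get(resource_type, default_level)
    return Assignment(level, TEAM_BY_LEVEL[level])
```

Keeping the policy in a plain lookup table makes the strategic choice explicit and easy to revise, which fits the framing of LMCS as a diagnostic tool rather than a fixed ladder.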
III. Critical Examination
As a theoretical model, the elegant simplicity of LMCS also conceals risks that warrant caution. A critical examination reveals four core challenges it may face in practice.
- The illusion of "linear progress": Viewing the five levels as an evolutionary ladder from "backward" to "advanced" is a dangerous form of technological determinism. We must emphasize that LMCS is a "diagnostic toolbox" applicable to different contexts, rather than an "evolutionary goal" that must be achieved. For a medieval manuscript, Level 1 will always be a more "advanced" and appropriate choice than Level 5. The value of work should not be defined by the degree of automation; otherwise, it will lead to a devaluation of professional judgment and "craftsmanship," eroding the core values of libraries.
- The ethical crisis of the "algorithmic black box": High levels of automation heavily rely on AI models, which may have systemic biases in their training data (e.g., linguistic, cultural, regional biases). When the role of librarians shifts from "creators" to "reviewers," will their ability to identify and correct these deeply embedded, more covert epistemological biases within algorithms diminish? This is not only a technical issue but also an ethical crisis concerning knowledge equity and epistemic justice, directly challenging libraries' social commitment as neutral and inclusive guardians of knowledge.
- The risk of "hollowing out" professional skills: If the new generation of librarians works long-term in a Level 3-4 environment without systematic training in Levels 1-2, they may "know what" but not "know why," failing to grasp the underlying logic and complex rules that support the entire professional edifice. When AI makes mistakes, they may be unable to make fundamental corrections. Over time, this could lead to intergenerational loss of professional skills, ultimately causing us to lose knowledge dominance and professional authority in collaboration with machines, descending from "architects" to "repair workers."
- Exacerbating the new "digital divide": High-quality AI cataloging tools and services, whether commercially procured or self-developed, require substantial financial and technical investment. This is likely to create a new divide within the library sector. Well-funded university libraries can readily achieve efficient automation at Levels 4-5, while cash-strapped public libraries or local institutions may remain at Levels 1-2. This "differentiation in metadata productivity" is likely to produce significant disparities in how thoroughly information resources are described and made discoverable, ultimately widening into a gap in service quality and user access rights, contrary to the fundamental mission of libraries to promote information equity.
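One way the "algorithmic black box" concern above could be made operational is a stratified sampling audit: draw the same number of AI-generated records from each group (for example, each language), have catalogers review them, and compare error rates across groups. The following is a minimal sketch under stated assumptions. The record shape (a dict with a `language` key), the function names, and the 0.05 tolerance are all hypothetical, and a real audit would need sample sizes large enough for the comparison to be meaningful.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, per_group, seed=0):
    """Draw up to per_group records from each group defined by records[i][key]."""
    rng = random.Random(seed)  # fixed seed so an audit sample can be reproduced
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return {
        group: rng.sample(items, min(per_group, len(items)))
        for group, items in groups.items()
    }


def error_rates(review_flags):
    """review_flags maps group -> list of bools (True = reviewer found an error)."""
    return {
        group: sum(flags) / len(flags)
        for group, flags in review_flags.items()
        if flags
    }


def flag_disparities(rates, tolerance=0.05):
    """Return groups whose error rate exceeds the best group's by more than tolerance."""
    if not rates:
        return []
    best = min(rates.values())
    return sorted(group for group, rate in rates.items() if rate - best > tolerance)
```

A group flagged this way is only a signal for closer human examination. It does not by itself establish bias, and it cannot surface problems in dimensions the audit does not stratify on.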
Conclusion
The "Library Metadata Creation Scale" (LMCS) provides a powerful tool for examining and navigating metadata practices in the AI era. Its deeper significance, however, is that it compels us to confront the core contradictions of the profession and to rethink the professional value of librarians.
The future path does not lie in making a binary choice between "fully manual" and "fully automated." The real challenge is whether librarians can transcend being mere rule executors and become critical designers and ethical guardians of human-machine collaborative systems. This means that we must embrace the efficiency brought by automation while defending thoughtful human judgment, maintaining fairness in knowledge representation, and ensuring that professional wisdom is inherited and elevated in the new technological ecosystem. Only in this way can we truly harness technology in the intelligent era, rather than be defined by it.