World Library  

Marc-8

Article Id: WHEBN0024870820

Title: Marc-8  
Author: World Heritage Encyclopedia
Language: English
Subject: Character encodings, HP roman8, MacArabic encoding, KEIS, JEF codepage
Collection: Character Sets
Publisher: World Heritage Encyclopedia

Marc-8

The MARC-8 charset is a MARC standard character encoding used in MARC-21 library records.[1] The MARC formats are standards for the representation and communication of bibliographic and related information in machine-readable form, and they are frequently used in library computer systems. The encoding now known as MARC-8 was introduced in 1968, when use of the MARC format began. Over the years it has grown to include code points for a large repertoire of characters, including the Latin, Cyrillic, Arabic, Hebrew and Greek scripts and over 15,000 characters used in writing Chinese, Japanese and Korean. If a MARC-21 record contains a character that cannot be represented in MARC-8, the record must be encoded in UTF-8 instead; UTF-8 can represent many more characters than MARC-8. MARC-8 is rarely used outside of library records.

Contents

  • 1 Technical Details
    • 1.1 Code structure
    • 1.2 Custom set extension
  • 2 References
  • 3 External links

Technical Details

MARC-8 uses a variant of the ISO/IEC 2022 encoding. Escape sequences select which character sets are in use, which lets a record contain characters beyond the 7-bit ASCII range.

Text in bidirectional scripts such as Hebrew and Arabic is generally stored in the same logical order as in Unicode.

Combining characters and base characters are stored in a different order than in Unicode: MARC-8 places combining diacritics before the base character they modify, whereas Unicode places them after it. Some examples follow. When a character carries several diacritics, MARC-8 does not always store them in the reverse of the Unicode normalization order. The MARC-21 standard describes the issues in MARC-8 to Unicode conversion in more detail.

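Displayed character | Unicode NFD | MARC-8
á | a + combining acute (U+0301) | combining acute + a
ậ | a + combining dot below (U+0323) + combining circumflex (U+0302) | combining circumflex + combining dot below + a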
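The reordering in the examples above can be sketched in Python. This is a minimal illustration, not a full converter: it assumes the default MARC-8 setup (Basic Latin in G0, Extended Latin (ANSEL) in G1) and the ANSEL code values E2 (acute), E3 (circumflex) and F2 (dot below), handles no escape sequences, and the function name marc8_latin_to_nfd is hypothetical. A real converter would use the complete Library of Congress code tables.

```python
import unicodedata

# Illustrative subset of the MARC-8 Extended Latin (ANSEL) G1 set: only the
# three combining marks from the examples above (assumed code values).
ANSEL_COMBINING = {
    0xE2: "\u0301",  # combining acute accent
    0xE3: "\u0302",  # combining circumflex accent
    0xF2: "\u0323",  # combining dot below
}


def marc8_latin_to_nfd(data: bytes) -> str:
    """Convert ASCII bytes plus ANSEL combining marks (no escapes) to NFD."""
    out: list[str] = []
    pending: list[str] = []  # marks seen before their base character
    for b in data:
        if b in ANSEL_COMBINING:
            pending.append(ANSEL_COMBINING[b])
        elif b < 0x80:
            # Emit the base character first, then the buffered marks,
            # which gives Unicode's base-before-mark order.
            out.append(chr(b))
            out.extend(pending)
            pending.clear()
        else:
            raise ValueError(f"byte 0x{b:02X} is outside this sketch's table")
    if pending:
        raise ValueError("combining mark with no following base character")
    # NFD applies canonical ordering, which may reorder stacked marks.
    return unicodedata.normalize("NFD", "".join(out))


# MARC-8 order for the second example: circumflex, dot below, a.
print(marc8_latin_to_nfd(b"\xE3\xF2a") == unicodedata.normalize("NFD", "\u1EAD"))  # True
```

The sketch moves stacked marks after the base character as a group and leaves their relative order to NFD canonical ordering; whether that matches the intended rendering in every case is the kind of question the MARC-21 conversion notes address.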

Code structure

The ISO/IEC 2022 coding specifies a two-layer mapping between character codes and displayed characters. In MARC-8, character codes from the 7-bit ASCII graphic range (0x20–0x7F) are referred to as "G0" codes, while codes from the "high ASCII" range (0xA0–0xFF) are referred to as "G1" codes. Graphic character sets are designated and invoked by a multiple-byte escape sequence of the form ESC I F, consisting of the escape character (ESC), a sequence of intermediate characters (I) and a final character (F).
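Because ISO/IEC 2022 reserves separate byte ranges for intermediate bytes (0x20–0x2F) and final bytes (0x30–0x7E), an ESC I F sequence can be split mechanically. A minimal Python sketch; split_escape is a hypothetical helper name, and it only separates the bytes without checking that the resulting designation is one MARC-8 defines.

```python
ESC = 0x1B


def split_escape(data: bytes, pos: int = 0) -> tuple[bytes, int, int]:
    """Split the ESC I F sequence at data[pos] into (I, F, next position).

    Uses the ISO/IEC 2022 byte classes: intermediate bytes lie in 0x20-0x2F
    and the final byte lies in 0x30-0x7E.
    """
    if pos >= len(data) or data[pos] != ESC:
        raise ValueError("no escape sequence at this position")
    i = pos + 1
    while i < len(data) and 0x20 <= data[i] <= 0x2F:
        i += 1  # consume intermediate bytes
    if i >= len(data) or not 0x30 <= data[i] <= 0x7E:
        raise ValueError("truncated or malformed escape sequence")
    return data[pos + 1:i], data[i], i + 1


inter, final, nxt = split_escape(b"\x1b$1\x21\x30\x64")
print(inter, hex(final), nxt)  # b'$' 0x31 3
```

Under these byte classes the 0x21 in the Extended Latin designation ESC ( ! E is read as a second intermediate byte, and a custom-set escape such as ESC b yields an empty intermediate segment.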

The following table shows the intermediate bytes that follow the ESC byte (hexadecimal 1B), together with the corresponding ASCII characters. SBCS denotes a single-byte character set and MBCS a multibyte character set.

Intermediate Bytes[2]
Designation | G0 set, SBCS | G0 set, MBCS | G1 set, SBCS | G1 set, MBCS
Normal ISO-2022 | 28 ( | 24 $ | 29 ) | 24 29 $)
Alternate ISO-2022 (additional 63+16 sets) | 2C , | 24 2C $, | 2D - | 24 2D $-
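The intermediate-byte table can be transcribed directly into a lookup from intermediate bytes to register and set width. A minimal Python sketch; the dictionary and the classify_intermediate name are illustrative. The handling of a trailing 0x21 follows the note on Extended Latin (ANSEL) in the final-byte table of this section.

```python
# (register, set width) for each intermediate-byte sequence in the table above.
INTERMEDIATES = {
    b"(": ("G0", "SBCS"), b",": ("G0", "SBCS"),
    b"$": ("G0", "MBCS"), b"$,": ("G0", "MBCS"),
    b")": ("G1", "SBCS"), b"-": ("G1", "SBCS"),
    b"$)": ("G1", "MBCS"), b"$-": ("G1", "MBCS"),
}


def classify_intermediate(inter: bytes) -> tuple[str, str]:
    """Return (G0/G1, SBCS/MBCS) for the intermediate bytes of a designation."""
    # Extended Latin (ANSEL) adds a 0x21 ("!") to the intermediate segment;
    # strip it so the register and width come from the table entry.
    core = inter[:-1] if inter.endswith(b"!") else inter
    if core not in INTERMEDIATES:
        raise ValueError(f"unknown intermediate bytes {inter!r}")
    return INTERMEDIATES[core]


print(classify_intermediate(b"$)"))  # ('G1', 'MBCS')
```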

The following table shows the final bytes that follow the intermediate bytes, in hexadecimal and as the corresponding ASCII characters.

Final Bytes[3]
Bytes | Characters | Name | Type | Comment
31 | 1 | Chinese, Japanese, Korean (EACC) | MBCS |
32 | 2 | Basic Hebrew | SBCS |
33 | 3 | Basic Arabic | SBCS |
34 | 4 | Extended Arabic | SBCS |
42 | B | Basic Latin (ASCII) | SBCS |
21 45 | !E | Extended Latin (ANSEL) | SBCS | Technically, the 21 (hex) is the second byte of the intermediate segment of this escape sequence.
4E | N | Basic Cyrillic | SBCS |
51 | Q | Extended Cyrillic | SBCS |
53 | S | Basic Greek | SBCS |
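Combining the intermediate-byte and final-byte tables gives a way to name the register and character set selected by a complete designation escape. A minimal Python sketch, assuming the two dictionaries below are faithful transcriptions of the tables; describe_designation is a hypothetical name. The width check encodes the statement that EACC is MARC-8's only multibyte set, so only it may use a "$" intermediate.

```python
# Final bytes from the table above; "!E" is matched as a two-byte unit.
FINAL_BYTES = {
    b"1": "Chinese, Japanese, Korean (EACC)",
    b"2": "Basic Hebrew",
    b"3": "Basic Arabic",
    b"4": "Extended Arabic",
    b"B": "Basic Latin (ASCII)",
    b"!E": "Extended Latin (ANSEL)",
    b"N": "Basic Cyrillic",
    b"Q": "Extended Cyrillic",
    b"S": "Basic Greek",
}
# Register selected by each intermediate sequence (normal and alternate forms).
REGISTER = {b"(": "G0", b",": "G0", b"$": "G0", b"$,": "G0",
            b")": "G1", b"-": "G1", b"$)": "G1", b"$-": "G1"}


def describe_designation(seq: bytes) -> tuple[str, str]:
    """Return (register, set name) for a complete MARC-8 designation escape."""
    if not seq.startswith(b"\x1b"):
        raise ValueError("a designation starts with ESC (0x1B)")
    body = seq[1:]
    final = b"!E" if body.endswith(b"!E") else body[-1:]
    inter = body[:len(body) - len(final)]
    if inter not in REGISTER or final not in FINAL_BYTES:
        raise ValueError(f"unrecognised designation {seq!r}")
    # EACC is the only multibyte set, so only it may use a "$" intermediate.
    if (final == b"1") != inter.startswith(b"$"):
        raise ValueError(f"set width does not match intermediate in {seq!r}")
    return REGISTER[inter], FINAL_BYTES[final]


print(describe_designation(b"\x1b)!E"))  # ('G1', 'Extended Latin (ANSEL)')
```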

EACC is the only multibyte character set in MARC-8; it encodes each CJK character as three bytes in the 7-bit ASCII range.

For example, encoding the CJK character 人 (U+4EBA) requires the following bytes:

 \x1B\x24\x31\x21\x30\x64

Here \x1B\x24\x31 (ESC $ 1) switches the G0 set to EACC, and \x21\x30\x64 is the EACC code that corresponds to U+4EBA.
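The byte sequence in the example can be produced by designating EACC into G0 and appending one three-byte code per character. A minimal Python sketch: the code 0x213064 for 人 is taken from the example above, mapping EACC codes to and from Unicode would need the Library of Congress mapping tables (not shown), and the helper names encode_eacc and split_eacc are hypothetical. As a design choice, not something the example requires, the sketch ends with ESC ( B so that G0 returns to Basic Latin for any text that follows.

```python
ESC = b"\x1b"
DESIGNATE_EACC_G0 = ESC + b"$1"   # ESC $ 1: EACC into G0, as in the example
DESIGNATE_ASCII_G0 = ESC + b"(B"  # ESC ( B: Basic Latin (ASCII) back into G0


def encode_eacc(codes: list[int]) -> bytes:
    """Encode 3-byte EACC code values as a G0 run, then restore ASCII."""
    out = bytearray(DESIGNATE_EACC_G0)
    for code in codes:
        # Raises OverflowError if the value does not fit in 3 unsigned bytes.
        triple = code.to_bytes(3, "big")
        # Assumption: each EACC byte is a 7-bit graphic byte (0x21-0x7E).
        if not all(0x21 <= b <= 0x7E for b in triple):
            raise ValueError(f"0x{code:06X} is not a valid G0 triple")
        out += triple
    out += DESIGNATE_ASCII_G0
    return bytes(out)


def split_eacc(run: bytes) -> list[int]:
    """Split the bytes between the escapes back into 3-byte EACC codes."""
    if len(run) % 3:
        raise ValueError("EACC runs must be a multiple of three bytes")
    return [int.from_bytes(run[i:i + 3], "big") for i in range(0, len(run), 3)]


encoded = encode_eacc([0x213064])  # EACC code for U+4EBA, per the example
print(encoded.hex(" "))  # 1b 24 31 21 30 64 1b 28 42
```

Decoding reverses the steps: after ESC $ 1, the reader consumes bytes three at a time until the next escape sequence.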

Custom set extension

In addition to the ISO-2022 character sets, the following custom sets are also available. Each is selected by a final byte placed directly after the escape byte (hexadecimal 1B), with no intermediate byte.

Final Bytes[4]
Bytes | Characters | Name | Type | Comment
62 | b | Subscript set | SBCS |
67 | g | Greek Symbol set | SBCS | The alpha, beta and gamma characters normally do not round-trip when mapped to Unicode.
70 | p | Superscript set | SBCS |
73 | s | Basic Latin (ASCII) | SBCS |
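Because the custom sets use a two-byte escape with no intermediate byte, a reader can treat ESC followed by b, g, p or s as a simple mode switch. A minimal Python sketch with a hypothetical tag_custom_sets helper: it assumes the data starts in Basic Latin, rejects all other escapes, and records only which set was active for each byte, without interpreting what the byte means in that set.

```python
ESC = 0x1B

# Custom sets from the table above: ESC followed directly by one final byte.
CUSTOM_SETS = {
    ord("b"): "Subscript set",
    ord("g"): "Greek Symbol set",
    ord("p"): "Superscript set",
    ord("s"): "Basic Latin (ASCII)",
}


def tag_custom_sets(data: bytes,
                    start: str = "Basic Latin (ASCII)") -> list[tuple[str, int]]:
    """Pair each non-escape byte with the custom set active when it was read.

    Only the two-byte custom escapes are handled; any other escape raises.
    """
    current = start
    tagged: list[tuple[str, int]] = []
    i = 0
    while i < len(data):
        if data[i] == ESC:
            if i + 1 >= len(data) or data[i + 1] not in CUSTOM_SETS:
                raise ValueError(f"unsupported escape at offset {i}")
            current = CUSTOM_SETS[data[i + 1]]  # switch sets, emit nothing
            i += 2
        else:
            tagged.append((current, data[i]))
            i += 1
    return tagged


for name, byte in tag_custom_sets(b"H\x1bb2\x1bsO"):
    print(name, hex(byte))
```

In the usage line, the byte 0x32 is read while the Subscript set is active and the following O is read after ESC s has restored Basic Latin.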

References

  1. http://www.loc.gov/marc/specifications/speccharintro.html
  2. http://www.loc.gov/marc/specifications/speccharmarc8.html
  3. http://www.loc.gov/marc/specifications/speccharmarc8.html
  4. http://www.loc.gov/marc/specifications/speccharmarc8.html

External links

  • MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media - The official MARC-8 standard as maintained by the US Library of Congress
This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply.