
Module:Citation/CS1/Identifiers: Difference between revisions

m (1 revision imported)
(sync from sandbox;)
Line 250: Line 250:
--[[--------------------------< N O R M A L I Z E _ L C C N >--------------------------------------------------
--[[--------------------------< N O R M A L I Z E _ L C C N >--------------------------------------------------


LCCN normalization (http://www.loc.gov/marc/lccn-namespace.html#normalization)
LCCN normalization (https://www.loc.gov/marc/lccn-namespace.html#normalization)
1. Remove all blanks.
1. Remove all blanks.
2. If there is a forward slash (/) in the string, remove it, and remove all characters to the right of the forward slash.
2. If there is a forward slash (/) in the string, remove it, and remove all characters to the right of the forward slash.
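The two normalization steps visible in this hunk can be sketched as follows. This is a minimal illustration only: the function name `normalize_lccn_partial` is hypothetical, it covers just steps 1 and 2, and any later steps of the normalization that fall outside this hunk are not attempted.

```lua
-- Hypothetical helper (not the module's own function): applies only
-- steps 1 and 2 of the LCCN normalization quoted above.
local function normalize_lccn_partial (lccn)
	-- step 1: remove all blanks (%s also strips tabs and newlines)
	lccn = lccn:gsub ('%s', '');
	-- step 2: find the first forward slash using a plain-text search
	local slash = lccn:find ('/', 1, true);
	if slash then
		-- drop the slash and everything to its right
		lccn = lccn:sub (1, slash - 1);
	end
	return lccn;
end

print (normalize_lccn_partial ('n 78-890351 /AC'));
```

The hyphen is left in place here because hyphen handling is not among the steps shown.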
Line 287: Line 287:
--[[--------------------------< A R X I V >--------------------------------------------------------------------
--[[--------------------------< A R X I V >--------------------------------------------------------------------


See: http://arxiv.org/help/arxiv_identifier
See: https://arxiv.org/help/arxiv_identifier


format and error check arXiv identifier.  There are three valid forms of the identifier:
format and error check arXiv identifier.  There are three valid forms of the identifier:
Line 381: Line 381:
Validates (sort of) and formats a bibcode ID.
Validates (sort of) and formats a bibcode ID.


Format for bibcodes is specified here: http://adsabs.harvard.edu/abs_doc/help_pages/data.html#bibcodes
Format for bibcodes is specified here: https://adsabs.harvard.edu/abs_doc/help_pages/data.html#bibcodes


But, this: 2015arXiv151206696F is apparently valid so apparently, the only things that really matter are length, 19 characters
But, this: 2015arXiv151206696F is apparently valid so apparently, the only things that really matter are length, 19 characters
Line 527: Line 527:
and terminal punctuation may not be technically correct but it appears, that in practice these characters are rarely
and terminal punctuation may not be technically correct but it appears, that in practice these characters are rarely
if ever used in DOI names.
if ever used in DOI names.
https://www.doi.org/doi_handbook/2_Numbering.html -- 2.2 Syntax of a DOI name
https://www.doi.org/doi_handbook/2_Numbering.html#2.2.2 -- 2.2.2 DOI prefix


]]
]]
Line 573: Line 576:
'^[^1-9]%d%d%d$', -- 4 digits without subcode (0xxx); accepts: 1000–9999
'^[^1-9]%d%d%d$', -- 4 digits without subcode (0xxx); accepts: 1000–9999
'^%d%d%d%d%d%d+', -- 6 or more digits
'^%d%d%d%d%d%d+', -- 6 or more digits
'^%d%d?%d?$', -- less than 4 digits without subcode (with subcode is legitimate)
'^%d%d?%d?$', -- less than 4 digits without subcode (3 digits with subcode is legitimate)
'^%d%d?%.[%d%.]+', -- 1 or 2 digits with subcode
'^5555$', -- test registrant will never resolve
'^5555$', -- test registrant will never resolve
'[^%d%.]', -- any character that isn't a digit or a dot
'[^%d%.]', -- any character that isn't a digit or a dot
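As a rough illustration of how the patterns in this hunk behave, the sketch below runs a registrant code against the new revision's patterns and rejects it on the first match. It assumes the registrant is whatever sits between `10.` and the first `/`; the names `bad_registrant_patterns` and `registrant_looks_valid` are hypothetical, and the list contains only the patterns visible in this hunk, so the module's real list may be longer.

```lua
-- Patterns copied from the new revision of this hunk; a match means
-- the registrant code is treated as malformed.
local bad_registrant_patterns = {
	'^[^1-9]%d%d%d$',   -- 4 characters, first one not 1-9, no subcode
	'^%d%d%d%d%d%d+',   -- 6 or more digits
	'^%d%d?%d?$',       -- fewer than 4 digits without subcode
	'^%d%d?%.[%d%.]+',  -- 1 or 2 digits with subcode
	'^5555$',           -- test registrant
	'[^%d%.]',          -- any character that isn't a digit or a dot
};

-- Hypothetical helper: true when no pattern above matches the registrant.
local function registrant_looks_valid (doi)
	-- assumption: registrant is the text between '10.' and the first '/'
	local registrant = doi:match ('^10%.([^/]+)/');
	if not registrant then
		return false;
	end
	for _, pattern in ipairs (bad_registrant_patterns) do
		if registrant:match (pattern) then
			return false;
		end
	end
	return true;
end

print (registrant_looks_valid ('10.1000/182'));
print (registrant_looks_valid ('10.12.34/example'));
```

Under these patterns a three-digit registrant with a subcode (for example `123.45`) passes, while a one- or two-digit registrant with a subcode is rejected by the line added in the new revision.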
Line 621: Line 625:
if ever used in HDLs.
if ever used in HDLs.


Query string parameters are named here: http://www.handle.net/proxy_servlet.html.  query strings are not displayed
Query string parameters are named here: https://www.handle.net/proxy_servlet.html.  query strings are not displayed
but since '?' is an allowed character in an HDL, '?' followed by one of the query parameters is the only way we
but since '?' is an allowed character in an HDL, '?' followed by one of the query parameters is the only way we
have to detect the query string so that it isn't URL-encoded with the rest of the identifier.
have to detect the query string so that it isn't URL-encoded with the rest of the identifier.
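The detection described in this comment could look roughly like the sketch below. It is an assumption-laden illustration: `split_hdl_query` and `known_params` are hypothetical names, the parameter list holds only the two names visible elsewhere in this diff, and URL-encoding of the handle part is left to the caller.

```lua
-- Illustrative subset of the proxy query parameter names.
local known_params = {'noredirect', 'ignore_aliases'};

-- Hypothetical helper: splits an HDL into the handle proper and a
-- trailing query string, if '?' is followed by a known parameter name.
local function split_hdl_query (id)
	local earliest;
	for _, param in ipairs (known_params) do
		-- plain-text search, so '?' is not treated as a pattern character
		local start = id:find ('?' .. param, 1, true);
		if start and (not earliest or start < earliest) then
			earliest = start;
		end
	end
	if earliest then
		return id:sub (1, earliest - 1), id:sub (earliest);
	end
	return id, nil;  -- no recognised query string; '?' stays in the handle
end

local handle, query = split_hdl_query ('20.1000/100?noredirect');
print (handle, query);
```

A '?' that is not followed by a known parameter name stays part of the handle, which matches the point that '?' is itself an allowed HDL character.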
Line 631: Line 635:
local access = options.access;
local access = options.access;
local handler = options.handler;
local handler = options.handler;
local query_params = { -- list of known query parameters from http://www.handle.net/proxy_servlet.html
local query_params = { -- list of known query parameters from https://www.handle.net/proxy_servlet.html
'noredirect',
'noredirect',
'ignore_aliases',
'ignore_aliases',
Line 800: Line 804:


Determines whether an ISMN string is valid.  Similar to ISBN-13, ISMN is 13 digits beginning 979-0-... and uses the
Determines whether an ISMN string is valid.  Similar to ISBN-13, ISMN is 13 digits beginning 979-0-... and uses the
same check digit calculations.  See http://www.ismn-international.org/download/Web_ISMN_Users_Manual_2008-6.pdf
same check digit calculations.  See https://www.ismn-international.org/download/Web_ISMN_Users_Manual_2008-6.pdf
section 2, pages 9–12.
section 2, pages 9–12.
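For readers unfamiliar with the ISBN-13 style check digit mentioned here, a minimal sketch follows. The function name `is_valid_ismn` is hypothetical, and stripping spaces and hyphens before checking is an assumption of this sketch rather than something shown in the hunk.

```lua
-- Hypothetical helper: checks the 979-0 prefix, the 13-digit length,
-- and the ISBN-13 style check digit (weights alternate 1, 3, 1, 3, ...).
local function is_valid_ismn (ismn)
	-- assumption: separators are spaces or hyphens and may be dropped
	local digits = ismn:gsub ('[%s%-]', '');
	if not digits:match ('^9790%d%d%d%d%d%d%d%d%d$') then
		return false;
	end
	local sum = 0;
	for i = 1, 13 do
		local d = tonumber (digits:sub (i, i));
		-- odd positions weigh 1, even positions weigh 3
		sum = sum + d * ((i % 2 == 1) and 1 or 3);
	end
	-- the weighted sum, check digit included, must be a multiple of 10
	return sum % 10 == 0;
end

print (is_valid_ismn ('979-0-2600-0043-8'));
```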


Line 849: Line 853:
like this:
like this:


|issn=0819 4327 gives: [http://www.worldcat.org/issn/0819 4327 0819 4327] -- can't have spaces in an external link
|issn=0819 4327 gives: [https://www.worldcat.org/issn/0819 4327 0819 4327] -- can't have spaces in an external link
This code now prevents that by inserting a hyphen at the ISSN midpoint.  It also validates the ISSN for length
This code now prevents that by inserting a hyphen at the ISSN midpoint.  It also validates the ISSN for length
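The hyphen insertion described here can be sketched as below. This is only an illustration under stated assumptions: `format_issn` is a hypothetical name, separators are taken to be spaces or hyphens, the final character is taken to be a digit or an uppercase X, and no check digit verification is attempted.

```lua
-- Hypothetical helper: returns the ISSN with a hyphen at its midpoint,
-- or nil when the compacted value is not 8 characters of the expected form.
local function format_issn (issn)
	-- assumption: spaces and hyphens are the only separators to remove
	local compact = issn:gsub ('[%s%-]', '');
	-- length check: seven digits followed by a digit or 'X'
	if not compact:match ('^%d%d%d%d%d%d%d[%dX]$') then
		return nil;
	end
	-- insert the hyphen between the fourth and fifth characters
	return compact:sub (1, 4) .. '-' .. compact:sub (5);
end

print (format_issn ('0819 4327'));
```

With the space replaced by a hyphen, the value can be placed in an external link without breaking it.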
Line 953: Line 957:
Format LCCN link and do simple error checking.  LCCN is a character string 8-12 characters long. The length of
Format LCCN link and do simple error checking.  LCCN is a character string 8-12 characters long. The length of
the LCCN dictates the character type of the first 1-3 characters; the rightmost eight are always digits.
the LCCN dictates the character type of the first 1-3 characters; the rightmost eight are always digits.
http://info-uri.info/registry/OAIHandler?verb=GetRecord&metadataPrefix=reg&identifier=info:lccn/
https://oclc-research.github.io/infoURI-Frozen/info-uri.info/info:lccn/reg.html


length = 8 then all digits
length = 8 then all digits