[R] Parsing XML?
Spencer Graves
spencer.graves at effectivedefense.org
Wed Jul 27 22:50:55 CEST 2022
Hello, All:
What would you suggest I do to parse the following XML file into a
list that I can understand:
XMLfile <-
"https://chroniclingamerica.loc.gov/data/bib/worldcat_titles/bulk5/ndnp_Alabama_all-yrs_e_0001_0050.xml"
This is the first of 6666 XML files containing "U.S. Newspaper
Directory" maintained by the US Library of Congress discussed in the
thread below. I've tried various things using the XML and xml2 packages:
XMLdata <- xml2::read_xml(XMLfile)
str(XMLdata)
XMLdat <- XML::xmlParse(XMLdata)
str(XMLdat)
XMLtxt <- xml2::xml_text(XMLdata)
nchar(XMLtxt)
#[1] 29415
Someplace there's a schema for this. I don't know if it's embedded
in this XML file or in a separate file. If it's in a separate file, how
could I describe it to my contacts with the Library of Congress so they
would understand what I need and could help me get it?
Thanks,
Spencer Graves
p.s. All 29415 characters in XMLtxt appear in the thread below.
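One way to get from this file to a list that can be inspected is to
query the XML tree with XPath instead of flattening it with xml_text().
The following is only a minimal sketch, assuming the xml2 package is
installed. It uses a short excerpt of the first record in place of the
real download. The helper name marc_to_df and the prefixes srw and marc
are made up for illustration; the two namespace URIs are the ones
declared in the file itself.

```r
library(xml2)

# Short excerpt standing in for the real file (assumption: the full file
# has the same structure, with up to 50 outer <record> elements).
xml_string <- '<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
<version>1.1</version>
<numberOfRecords>2250</numberOfRecords>
<records>
<record>
<recordSchema>info:srw/schema/1/marcxml</recordSchema>
<recordPacking>xml</recordPacking>
<recordData>
<record xmlns="http://www.loc.gov/MARC21/slim">
<controlfield tag="001">1030438981</controlfield>
<datafield ind1="0" ind2="0" tag="245">
<subfield code="a">Selma sun.</subfield>
</datafield>
<datafield ind1=" " ind2="1" tag="264">
<subfield code="a">Selma, AL :</subfield>
<subfield code="b">North Shore Press, LLC</subfield>
<subfield code="c">2016-</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="310">
<subfield code="a">Weekly</subfield>
</datafield>
</record>
</recordData>
</record>
</records>
</searchRetrieveResponse>'

doc <- read_xml(xml_string)

# The prefixes are arbitrary labels; the URIs must match those in the file.
ns <- c(srw  = "http://www.loc.gov/zing/srw/",
        marc = "http://www.loc.gov/MARC21/slim")

# Hypothetical helper: one row per subfield of a single MARC record.
marc_to_df <- function(rec) {
  fields <- xml_find_all(rec, "./marc:datafield", ns)
  rows <- lapply(seq_along(fields), function(i) {
    sf <- xml_find_all(fields[[i]], "./marc:subfield", ns)
    data.frame(tag   = rep(xml_attr(fields[[i]], "tag"), length(sf)),
               code  = xml_attr(sf, "code"),
               value = xml_text(sf),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# One data frame per newspaper title, collected in a plain list.
recs <- xml_find_all(doc, ".//marc:record", ns)
fields_by_record <- lapply(seq_along(recs),
                           function(i) marc_to_df(recs[[i]]))
str(fields_by_record)
```

With the real file, read_xml(XMLfile) would replace read_xml(xml_string);
the rest should carry over if the other records follow the same layout.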
-------- Forwarded Message --------
Subject: [Newspapers and Current Periodicals] How can I get counts of
the numbers of newspapers by year in the US, and preferably also
elsewhere? A search of "U.S. Newspaper Directory,
Date: Wed, 27 Jul 2022 14:59:03 +0000
From: Kerry Huller <serials using ask.loc.gov>
To: Spencer Graves <spencer.graves using effectivedefense.org>
CC: twes using loc.gov
------------------------------------------------------------------------
Newspapers and Current Periodicals Reference Librarian
Jul 27 2022, 10:59am via System
Hello Spencer,
So, when I view the xml, I'm actually looking at it in XML editor
software, so I can view the tags and it's structured neatly. I've copied
and pasted the text from the beginning of the file and the first
newspaper title below from my XML editor:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type='text/xsl'
href='/webservices/catalog/xsl/searchRetrieveResponse.xsl'?>
<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/"
xmlns:oclcterms="http://purl.org/oclc/terms/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:diag="http://www.loc.gov/zing/srw/diagnostic/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<version>1.1</version>
<numberOfRecords>2250</numberOfRecords>
<records>
<record>
<recordSchema>info:srw/schema/1/marcxml</recordSchema>
<recordPacking>xml</recordPacking>
<recordData>
<record xmlns="http://www.loc.gov/MARC21/slim">
<leader>00000nas a22000007i 4500</leader>
<controlfield tag="001">1030438981</controlfield>
<controlfield tag="008">180404c20159999aluwr n 0 a0eng
</controlfield>
<datafield ind1=" " ind2=" " tag="010">
<subfield code="a"> 2018200464</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="040">
<subfield code="a">DLC</subfield>
<subfield code="e">rda</subfield>
<subfield code="c">DLC</subfield>
<subfield code="b">eng</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="012">
<subfield code="m">1</subfield>
</datafield>
<datafield ind1="0" ind2=" " tag="022">
<subfield code="a">2577-5316</subfield>
<subfield code="2">1</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="032">
<subfield code="a">021110</subfield>
<subfield code="b">USPS</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="037">
<subfield code="b">711 Alabama Avenue, Selma, AL 36701</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="042">
<subfield code="a">nsdp</subfield>
<subfield code="a">pcc</subfield>
</datafield>
<datafield ind1="1" ind2="0" tag="050">
<subfield code="a">ISSN RECORD</subfield>
</datafield>
<datafield ind1="1" ind2="0" tag="082">
<subfield code="a">071</subfield>
<subfield code="2">15</subfield>
</datafield>
<datafield ind1=" " ind2="0" tag="222">
<subfield code="a">Selma sun</subfield>
</datafield>
<datafield ind1="0" ind2="0" tag="245">
<subfield code="a">Selma sun.</subfield>
</datafield>
<datafield ind1=" " ind2="1" tag="264">
<subfield code="a">Selma, AL :</subfield>
<subfield code="b">North Shore Press, LLC</subfield>
<subfield code="c">2016-</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="310">
<subfield code="a">Weekly</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="336">
<subfield code="a">text</subfield>
<subfield code="b">txt</subfield>
<subfield code="2">rdacontent</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="337">
<subfield code="a">unmediated</subfield>
<subfield code="b">n</subfield>
<subfield code="2">rdamedia</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="338">
<subfield code="a">volume</subfield>
<subfield code="b">nc</subfield>
<subfield code="2">rdacarrier</subfield>
</datafield>
<datafield ind1="1" ind2=" " tag="362">
<subfield code="a">Began in 2015.</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="588">
<subfield code="a">Description based on: Volume 2, Issue 40
(October 5, 2017) (surrogate); title from caption.</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="588">
<subfield code="a">Latest issue consulted: Volume 2, Issue 40
(October 5, 2017).</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="752">
<subfield code="a">United States</subfield>
<subfield code="b">Alabama</subfield>
<subfield code="c">Dallas</subfield>
<subfield code="d">Selma.</subfield>
</datafield>
</record>
</recordData>
</record>
When I view the records in the XML editor, these 2 lines below do begin
each of the records for each individual title, but of course this is
including the xml tags:
<recordSchema>info:srw/schema/1/marcxml</recordSchema>
<recordPacking>xml</recordPacking>
Hopefully this helps you decide where to break or parse each record.
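In R, the record boundaries described above do not have to be found by
text matching, because each title sits in its own <record> element. As
a rough sketch, assuming the xml2 package, the titles and cities of two
records could be pulled out as follows. The excerpt is trimmed to a few
fields per record, and the helper name get_sub and the prefixes srw and
marc are invented for this illustration.

```r
library(xml2)

# Trimmed excerpt of the first two records (most fields omitted).
xml_string <- '<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
<records>
<record>
<recordSchema>info:srw/schema/1/marcxml</recordSchema>
<recordPacking>xml</recordPacking>
<recordData>
<record xmlns="http://www.loc.gov/MARC21/slim">
<datafield ind1="0" ind2="0" tag="245">
<subfield code="a">Selma sun.</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="752">
<subfield code="a">United States</subfield>
<subfield code="b">Alabama</subfield>
<subfield code="c">Dallas</subfield>
<subfield code="d">Selma.</subfield>
</datafield>
</record>
</recordData>
</record>
<record>
<recordSchema>info:srw/schema/1/marcxml</recordSchema>
<recordPacking>xml</recordPacking>
<recordData>
<record xmlns="http://www.loc.gov/MARC21/slim">
<datafield ind1="0" ind2="0" tag="245">
<subfield code="a">St. Clair County news.</subfield>
</datafield>
</record>
</recordData>
</record>
</records>
</searchRetrieveResponse>'

doc <- read_xml(xml_string)
ns <- c(srw  = "http://www.loc.gov/zing/srw/",
        marc = "http://www.loc.gov/MARC21/slim")

# One node per newspaper title: the inner MARC record of each outer record.
recs <- xml_find_all(
  doc,
  "/srw:searchRetrieveResponse/srw:records/srw:record/srw:recordData/marc:record",
  ns)

# Hypothetical helper: first subfield `code` of datafield `tag` in each
# record; records without that field give NA.
get_sub <- function(recs, tag, code) {
  xp <- sprintf("./marc:datafield[@tag='%s']/marc:subfield[@code='%s']",
                tag, code)
  xml_text(xml_find_first(recs, xp, ns))
}

titles <- data.frame(title = get_sub(recs, "245", "a"),
                     city  = get_sub(recs, "752", "d"),
                     stringsAsFactors = FALSE)
print(titles)
```

The second record has no 752 field in the source file, so its city
comes back as NA rather than shifting the columns, which is the main
advantage over splitting a flattened string.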
On another note, I just noticed as well that at the top of this first
file it lists the total number of records for the Alabama grouping -
2250. This also appeared to be the case for the Alaska records when I
took a look at the first one for that state. I imagine that should be
consistent throughout each "grouping" of records.
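That total can be read programmatically and compared with the number of
records actually present in a file. The sketch below assumes the xml2
package and uses a stub with two empty records in place of a real file;
the prefix srw is an arbitrary label, and the figure of 50 records per
file is taken from the file name and the discussion in this thread.

```r
library(xml2)

# Stub with the header of the first Alabama file and two empty records.
xml_string <- '<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
<version>1.1</version>
<numberOfRecords>2250</numberOfRecords>
<records>
<record>
<recordSchema>info:srw/schema/1/marcxml</recordSchema>
<recordPacking>xml</recordPacking>
</record>
<record>
<recordSchema>info:srw/schema/1/marcxml</recordSchema>
<recordPacking>xml</recordPacking>
</record>
</records>
</searchRetrieveResponse>'

doc <- read_xml(xml_string)
ns <- c(srw = "http://www.loc.gov/zing/srw/")

# Total for the whole state grouping, as stated at the top of the file.
total <- as.integer(xml_text(xml_find_first(
  doc, "/srw:searchRetrieveResponse/srw:numberOfRecords", ns)))

# Number of records actually contained in this one file.
in_file <- length(xml_find_all(
  doc, "/srw:searchRetrieveResponse/srw:records/srw:record", ns))

# Files expected for the grouping, assuming at most 50 records per file.
files_needed <- ceiling(total / 50)

cat("stated total:", total, "\n")
cat("records in this file:", in_file, "\n")
cat("files expected for this grouping:", files_needed, "\n")
```

Summing in_file over all files of a grouping and comparing the sum with
total would be one way to check that no file was missed.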
Let me know if you have follow-up questions!
Best wishes,
Kerry Huller
Newspaper & Current Periodical Reading Room
Serial & Government Publications Division
Library of Congress
------------------------------------------------------------------------
Newspapers and Current Periodicals Reference Librarian
Jul 27 2022, 10:21am via Email
Hi, Kerry:
Thanks. I understand the chunking in files of at most 50. I've read
the first file "ndnp_Alabama_all-yrs_e_0001_0050.xml" into a string of
29415 characters, copied below. Might you have any suggestions on the
next step in parsing this? Staring at it now, it looks like splitting on
"info:srw/schema/1/marcxmlxml" might convert the 29415 characters into
shorter chunks, each of which could then be parsed further.
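An alternative to splitting the flattened string is to keep the XML
tree and read fixed-position fields from it. Since the aim in the
subject line is counts of newspapers by year, the sketch below reads
control field 008 from three records in this file. It assumes the xml2
package and the MARC 21 layout of field 008 (character 7 is the
publication status, characters 8-11 the first date, 12-15 the second
date, counting from 1). The excerpt keeps only the 008 field of each
record, and the prefixes srw and marc are arbitrary labels.

```r
library(xml2)

# Only the 008 control fields of three records; everything else omitted.
xml_string <- '<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
<records>
<record><recordData>
<record xmlns="http://www.loc.gov/MARC21/slim">
<controlfield tag="008">180404c20159999aluwr n 0 a0eng</controlfield>
</record>
</recordData></record>
<record><recordData>
<record xmlns="http://www.loc.gov/MARC21/slim">
<controlfield tag="008">080428d19981998aluwr ne | 0eng c</controlfield>
</record>
</recordData></record>
<record><recordData>
<record xmlns="http://www.loc.gov/MARC21/slim">
<controlfield tag="008">191021c20uu9999aluwr ne | a0eng</controlfield>
</record>
</recordData></record>
</records>
</searchRetrieveResponse>'

doc <- read_xml(xml_string)
ns <- c(srw  = "http://www.loc.gov/zing/srw/",
        marc = "http://www.loc.gov/MARC21/slim")

recs <- xml_find_all(doc, ".//marc:record", ns)
f008 <- xml_text(xml_find_first(recs, "./marc:controlfield[@tag='008']", ns))

# Dates containing "u" (unknown digit) become NA instead of a guess.
years <- data.frame(
  status = substr(f008, 7, 7),
  start  = suppressWarnings(as.integer(substr(f008, 8, 11))),
  end    = suppressWarnings(as.integer(substr(f008, 12, 15))),
  stringsAsFactors = FALSE)

# 9999 marks a title that is still being published, not a real year.
years$end[which(years$end == 9999L)] <- NA_integer_
print(years)
```

Only the first 15 characters of each 008 value are used here, so any
whitespace lost further along the field when the text was pasted into
email should not affect the result.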
This is not as bad as reading ancient Egyptian hieroglyphics without
the Rosetta Stone, but I wondered if you might have something that could
make this work easier and more reliable? I guess I could compare with
what I already read as JSON ;-)
Thanks,
Spencer Graves
"1.12250info:srw/schema/1/marcxmlxml00000nas a22000007i
45001030438981180404c20159999aluwr n 0 a0eng
2018200464DLCrdaDLCeng12577-53161021110USPS711 Alabama Avenue, Selma, AL
36701nsdppccISSN RECORD07115Selma sunSelma sun.Selma, AL :North Shore
Press,
LLC2016-WeeklytexttxtrdacontentunmediatednrdamediavolumencrdacarrierBegan in
2015.Description based on: Volume 2, Issue 40 (October 5, 2017)
(surrogate); title from caption.Latest issue consulted: Volume 2, Issue
40 (October 5, 2017).United
StatesAlabamaDallasSelma.info:srw/schema/1/marcxmlxml00000cas a22000007a
4500502150053100127c20109999aluwr n 0 a0eng
2010200019DLCengDLCDLCOCLCQ112153-18111750USPSB & C Publishing, LLC,
3514 Martin St. S. Ste 104, Cropwell, AL 35054pccnsdpISSN RECORDSt.
Clair County news (Cropwell, Ala.)St. Clair County news(Cropwell,
Ala.)St. Clair County news.Cropwell, AL :B & C Pub.WeeklyBegan in
2010.Description based on: Nov. 4, 2010 (surrogate); title from
caption.info:srw/schema/1/marcxmlxml00000cas a22000007a
4500426491872090720c20099999alumr n 0 a0eng
2009203372DLCengDLCOCLCQ12150-346X2150-346X1AU using 000044489617NZ116076352Devon
Applewhite/Applewhite Publishing Co., 1910 Honeysuckle Rd., #N183,
Dothan, AL 36305mscnsdpISSN RECORD30514Triangle tribune(Dothan,
Ala.)Triangle tribune.Dothan, AL :Applewhite Pub. CoMonthlyBegan with
vol. 1, issue 1 (May 2009).\"Connecting the Tri-State African -American
Community.\"Description based on: Vol. 1, issue 1 (May 2009); title from
masthead.Applewhite, Devon.United StatesAlabama.United
StatesGeorgia.United StatesFlorida.info:srw/schema/1/marcxmlxml00000cas
a22000007a 4500289017315081219c20089999aluwr n | a0eng c
2008213218NSDengNSDOCLCQDLCOCLCQ111945-93191945-93191005270USPSSpringhill Publications,
LLC, P.O. Box 186, Greenville, AL 36037nsdppccISSN RECORD07014Greenville
standardThe Greenville standard.Greenville, AL :Springhill
PublicationsWeeklytexttxtrdacontentunmediatednrdamediaBegan with vol. 1,
issue 1 (Sept. 3, 2008)Description based on surrogate of: Vol. 1, no. 15
(Dec. 18, 2008); title from masthead (publisher's Web site, viewed Dec.
19, 2008).Latest issue consulted: Vol. 1, no. 99 (July 27, 2011)
(surrogate).info:srw/schema/1/marcxmlxml00000cas a22000007a
4500123539969070426c20079999aluwr ne 0 a0eng c
2007212138NSDengNSDNSDOCLCQ101936-95571936-95571The Western Tribune,
1530 Third Ave. N., Bessemer, AL 35020mscnsdpISSN RECORDWestern tribune
(Bessemer, Ala.)The Western tribune(Bessemer, Ala.)The Western
tribune.Bessemer, Ala. :D-Med, Inc.v.WeeklyBegan in 2007.Description
based on: May 23, 2007 (surrogate); title from
caption.AU using 000041575341info:srw/schema/1/marcxmlxml00000cas a22000007a
4500226300653080425c20079999aluwr ne | a0eng
2008212112NSDengNSDNSDOCLCQ11942-20751942-20751nsdppccISSN RECORDThe
corridor messengerThe corridor messenger.Carbon Hill, AL :Corridor
Messenger, Inc.WeeklyBegan with vol. 1, issue (10.03.2007).Description
based on: 1st issue.United StatesAlabamaWalkerCarbon
Hill.http://www.corridormessenger.cominfo:srw/schema/1/marcxmlxml00000cas a22000007a
450077560432070109c20069999aluwr ne 0 a0eng c
2007213400NSDengNSDOCLCQAUBRNOCLCOOCLCFa01935-37901935-37901AU using 000041190283The
Auburn Villager, P.O. Box 1633, Auburn, AL 36831-1633pccnsdpISSN
RECORDThe Auburn villagerThe Auburn villager.Auburn, AL :Auburn
Villagerv.WeeklyBegan in 2006.Description based on: Vol. 1, no. 4 (July
20, 2006) (surrogate); title from caption.Auburn (Ala.)Newspapers.Lee
County (Ala.)Newspapers.AlabamaAuburn.fast(OCoLC)fst01209634AlabamaLee
County.fast(OCoLC)fst01211930Newspapers.fast(OCoLC)fst01423814United
StatesAlabamaLeeAuburn.info:srw/schema/1/marcxmlxml00000cas a2200000Ii
4500872286785m o d s cr mn|---a||||140311c20069999alucr n o b
s0 a0eng cABCengrdaABCABCOCLCFLD59.13University of Alabama at
Birmingham.The eReporter.[Birmingham, Alabama] :The University of
Alabama at Birmingham,[2006]-[Birmingham, Alabama] :Offices of Public
Relations & Marketing and Information Technology1 online resource2
issues weeklytexttxtrdacontentcomputercrdamediaonline
resourcecrrdacarrierSeptember 19, 2006-\"The eReporter is an official
communication of The University of Alabama at Birmingham, companion to
the UAB Reporter and recommended alternative to mass e-mails.\"Issues
for <March 11, 2014- published and distributed via e-mail subscription
on Tuesdays and Fridays.Description based on: September 19, 2006; title
from title screen (viewed March 12, 2014).University of Alabama at
BirminghamPeriodicals.Periodicals.fast(OCoLC)fst01411641University of
Alabama at Birmingham.fast(OCoLC)fst00645114University of Alabama at
Birmingham.Office of Public Relations and Marketing.University of
Alabama at Birmingham.Information Technology.2006-2012, companion
to:University of Alabama at Birmingham.UAB
reporter.(OCoLC)32435748Archived
issueshttp://hatteras.dpo.uab.edu/cgi-bin/ereporter.cgiinfo:srw/schema/1/marcxmlxml00000cas
a22000007a 4500166387050070829c20059999aluwr ne | a0eng c
2007215501NSDengNSDOCLCQ11939-68991939-68991The Wilkie Clark Memorial
Foundation, P.O. Box 514, Roanoke, AL 36274$30.00nsdpmscISSN
RECORD305.89614People's voice (Roanoke, Ala.)The people's voice(Roanoke,
Ala.)The people's voice.Roanoke, AL :Wilkie Clark Memorial
Foundationv.WeeklyBegan with vol. 1, no. 1 in 2005.Description based on:
Vol. 2, no. 20 (Apr. 20, 2007); title from caption.Wilkie Clark Memorial
Foundation.United
StatesAlabamaRandolphRoanoke.AU using 000042141390info:srw/schema/1/marcxmlxml00000nas
a22000007i 45001124677787191021c20uu9999aluwr ne | a0eng
2019202521DLCengrdaDLC12689-3258122730USPSNorth Jackson Press, 42950 Hwy
72, Suite 406, Stevenson, AL 35772nsdppccISSN RECORD071.323North Jackson
pressNorth Jackson press.Stevenson, AL :Caney Creek Publications
LLCWeeklytexttxtrdacontentunmediatednrdamediavolumencrdacarrierDescription
based on surrogate of: Volume 1, number 36 (October 11, 2019); title
from masthead.Latest issue consulted: Volume 1, number 36 (October 11,
2019) (Surrogate).United
StatesAlabamaJacksonStevensoninfo:srw/schema/1/marcxmlxml00000cas
a2200000 a 4500226315099080428d19981998aluwr ne | 0eng c
2008233691GUAengGUAOCLCQOCLCFOCLCO39911644pccn-us-gaThe Dekalb
news.Birmingham, Ala. :Community newspaper holdings Inc.v.WeeklyBegan
with 1st year, no. 1 (Apr. 1, 1998); ceased with 1st year, no. 31 (Oct.
28, 1998).Final issue consulted.Description based on first issue; title
from caption.Decatur (Ga.)Newspapers.DeKalb County
(Ga.)Newspapers.Newspapers.fast(OCoLC)fst01423814GeorgiaDecatur.fast(OCoLC)fst01226234GeorgiaDeKalb
County.fast(OCoLC)fst01215288United
StatesGeorgiaDeKalbDecatur.Decatur-DeKalb news/era(DLC)sn
89053661(OCoLC)19946163info:srw/schema/1/marcxmlxml00000cas a2200000 i
450050263311m o d cr cn|||||||||020730c19979999alu x neo
0 a0eng c
2015238492AMHengrdapnAMHOCLCQOCLCFOCLCOIULOCLHTMOCLCQCOODLC66460694810970435082687-93791AU using 000050711528OCLCS45109pccnsdpn-us---AP2.B5707023Birmingham
weekly (Online)Birmingham weekly(Online)Birmingham weekly.Birmingham, AL
:Birmingham Weekly1 online resourceIrregular,Feb. 16-28,
2012-Weekly,Sept. 4-11, 1997-Feb. 9-16,
2012texttxtrdacontentcomputercrdamediaonline resourcecrrdacarrierBegan
with vol. 1, issue 1 (Sept. 4-11, 1997).\"City news, views &
entertainment\"--Cover.Numbering dropped in Mar. 2012.Also issued in
print.Description based on: Publication information from ProQuest; title
from web page (viewed June 18, 2015).Latest issue consulted: Aug. 15-20,
2012.Birmingham (Ala.)Newspapers.Internet resources.Electronic
journals.AlabamaBirmingham.fast(OCoLC)fst01204958Newspapers.fast(OCoLC)fst01423814United
StatesAlabamaBirmingham.Print version:Birmingham
Weekly(OCoLC)39271050http://apw.softlineweb.com/http://WC2VB5MT8E.search.serialssolutions.com/?sid=sersol&SS_jc=JC_000051895&title=Birmingham+Weeklyinfo:srw/schema/1/marcxmlxml00000cas
a22000007a 450031471314941116d19941995aluwr ne 0 a0eng csn
94003083
NSDengNSDANEOCLCQOCLCFOCLCOOCLCQ11079-65411079-65411nsdppccn-us-akSoutheast
shopperSoutheast shopper.Juneau, Alaska :Kemper
Communications,1994-volumesWeeklytexttxtrdacontentunmediatednrdamediavolumencrdacarrierVol.
1, no. 1 (Nov. 16, 1994)-Ceased in Feb. 1995.Juneau
(Alaska)Newspapers.AlaskaJuneau.fast(OCoLC)fst01213587Newspapers.fast(OCoLC)fst01423814United
StatesAlaskaJuneau.AU using 000011356572info:srw/schema/1/marcxmlxml00000cas
a22000008a 450027910515930413c19949999alumr n 0 a0eng dsn
93002581 NSDengNSDOCLCQ11069-06621Birmingham Tribune, 216 Ave. T. Pratt
City, Birmingham, AL 35214nsdpBirmingham tribuneBirmingham
tribune.Birmingham, Ala. :Kervin
Fondren9501volumesMonthlytexttxtrdacontentunmediatednrdamediavolumencrdacarrierPREPUB:
publication expected Jan.
1995AU using 000025863987info:srw/schema/1/marcxmlxml00000cas a22000007a
450026199931920716d19922013alumr ne 0 a0eng csn 92003357
NSDengNSDOCLOCLCQDLC011064-01341064-01341Black & White, POB 13215,
Birmingham, AL 35202-3215nsdppccBlack & white (Birmingham, Ala.)Black &
white(Birmingham, Ala.)Black & white.Black and whiteBirmingham, Ala.
:Black & White, Inc.v.Biweekly,Oct. 2, 1997-Monthly,May 1, 1992-Sept.
1997Began in May 1992; ceased with Jan. 10, 2013.\"Birmingham's New City
paper.\"Description based on: June 1992.Latest issue consulted: No. 67
(Oct. 16, 1997) (surrogate).info:srw/schema/1/marcxmlxml00000cas
a2200000 a 450032145723950314d19901999alumr ne 0 a0eng csn
95068755
MGNengMGNNSDCLUOCLCQOCLCFOCLCOOCLCA971211082-34841082-34841AU using 000011579542nsdppccn-us-alF335.J5S68The
Southern shofarThe Southern shofar.Birmingham, AL :L. Brook,-[1999]v.
:ill. ;35 cm.MonthlyBegan in 1990.-v. 9, issue 9 (Aug./Sept. 1999).\"The
monthly newspaper of Alabama's Jewish community.\"Some issues also
available on the Internet via the World Wide Web.Description based on:
Vol. 3, issue 11 (Oct. 1993).Jewish newspapersAlabama.Jewish
newspapers.fast(OCoLC)fst00982872Alabama.fast(OCoLC)fst01204694United
StatesAlabamaJeffersonBirmingham.Deep South Jewish voice(DLC)sn
99018499(OCoLC)42431704CLUhttp://bibpurl.oclc.org/web/719http://www.bham.net/shofar/info:srw/schema/1/marcxmlxml00000cas
a22000007a 450021265141900326c19909999aluwr ne 0 a0eng csn
90099004 AARengAARCPNNSDOCLCQ11050-08981050-08981005022USPSE.O.N., Inc.,
Main St., Eclectic, AL 36024pccnsdpISSN RECORDThe Eclectic observerThe
Eclectic observer.Eclectic, Ala. :E.O.N., Inc.,1990-v.WeeklyVol. 1, no.
1 (Feb. 22, 1990)-Published by: Price Publications, Inc., <2006->Latest
issue consulted: Vol. 17, no. 1 (Jan. 5, 2006).United
StatesAlabamaElmoreEclectic.AU using 000040212446info:srw/schema/1/marcxmlxml00000cas
a22000007a 450021214781900314c19909999aluir ne 0 a0eng csn
90002457 AAAengAAANSDOCLCQ111050-20841050-20841931180USPSClanton
Newspapers, 1109 Seventh St., N., PO Box 1379, Clanton, AL
35045nsdppccn-us-alThe Clanton advertiserThe Clanton
advertiser.AdvertiserClanton, Ala. :Clanton Newspapersv. :ill. ;58
cm.Three no. a week,<May 13, 1992->Semiweekly,<Apr. 4, 1990->Began in
Jan. 1990.Description based on: Vol. 19, no. 27 (Wed., Apr. 4,
1990).Latest issue consulted: Vol. 22, no. 58 (May 13, 1992).United
StatesAlabamaChiltonClanton.Independent advertiser (Clanton,
Ala.)(OCoLC)21214732AU using 000025908452info:srw/schema/1/marcxmlxml00000cas
a2200000 a 450021214814900314c19909999aluwr ne 0 a0eng dsn
90099009 AAAengAAACPNNSDOCLCQ11056-32881056-32881505740USPSThe Blount
Countian, 3rd St. at Washington Ave., PO Box 310, Oneonta, AL
35121mscnsdpn-us-alThe Blount countianThe Blount countian.Oneonta, Ala.
:Southern Democrat, Inc.,1990-v. :ill.WeeklyVol. 1, no. 1 (Jan. 3,
1990)-Editor: Molly Howard Ryan, 1990-Latest issue consulted: Vol. 1,
no. 36 (Sept. 5, 1990).Ryan, Molly Howard.United
StatesAlabamaBlountOneonta.Southern Democrat(DLC)sn
85044741(OCoLC)12038577AU using 000025884049info:srw/schema/1/marcxmlxml00000cas
a22000007a 450022413044900920c19909999aluwr ne 0 a0eng dsn
90099011
AARengAARCPNNSDNSTOCLCQ92081707011191053-91231053-91231314240USPSmscnsdpThe
Clay times-journalThe Clay times-journal.Lineville, Ala. :C.L.
Proctor,1990-v.WeeklyVol. 1, no. 1 (Sept. 6, 1990)-United
StatesAlabamaClayLineville.Ashland progress(DLC)sn 85044701Lineville
tribune(DLC)sn 85044702AUinfo:srw/schema/1/marcxmlxml00000cas a22000007a
450021265218900326c19909999aluwr ne 0 0eng dsn 90099005
AARengAARCPNOCLCQmscTrussville news-journal.Trussville, Ala. :Mike
Mitchell,1990-v.BimonthlyVol. 1, no. 1 (Feb. 20, 1990)-United
StatesAlabamaJeffersonTrussville.info:srw/schema/1/marcxmlxml00000cas
a22000007a 450022301035900831c19909999aluwr ne 0 0eng dsn
90099010 AARengAARCPNOCLCQmscWeaver tribune.Oxford, Ala. :Cheaha
Pub.,1990-v.WeeklyVol. 1, no. 1 (July 19, 1990)-United
StatesAlabamaCalhounWeaver.United
StatesAlabamaCalhounOxford.info:srw/schema/1/marcxmlxml00000cas
a22000007a 450015155895870205c19879999aludr ne 0 a0eng csn
87050045
AAAengAAACPNNSDDLCCPNNSDDLCCPNDLCOCLDLCOCLCQOCLCFOCLCQ19261126829944596670892-44570892-44571AU using 000020456714360980USPSThe
Advertiser, P.O. Box 1000, Montgomery, AL
36192pccnsdpn-us-alNewspaperMontgomery advertiser (Montgomery, Ala. :
1987)The Montgomery advertiser(1987)The Montgomery advertiser.Montgomery
advertiser & the Alabama journalSunday Montgomery advertiserMontgomery,
Ala. :Advertiser Co.,1987-volumes
:illustrationsDailytexttxtrdacontentunmediatednrdamediavolumencrdacarrier160th
year, no. 1 (Jan. 2, 1987)-On Saturdays, Sundays and holidays a combined
edition is published with the Alabama journal, and called: Montgomery
advertiser and the Alabama journal, Jan. 3, 1987, and: Alabama journal
and Montgomery advertiser, Jan. 4, 1987-Feb. 25, 1990.Issues for Sunday
called: Sunday Montgomery advertiser, Mar. 4, 1990-Issues for Saturday,
Sunday and holidays have their own numbering, Jan. 3, 1987-Feb. 25,
1990.Montgomery
(Ala.)Newspapers.AlabamaMontgomery.fast(OCoLC)fst01202689Newspapers.fast(OCoLC)fst01423814United
StatesAlabamaMontgomeryMontgomery.Advertiser (Montgomery,
Ala.)0745-3221(DLC)sn 82008412(OCoLC)9049482Alabama journal (Montgomery,
Ala. : 1940)0745-323X(DLC)sn
87062018(OCoLC)2666111info:srw/schema/1/marcxmlxml00000cas a2200000 a
450016942287871105c19879999aludn ne 0 a0eng dsn 88050149
AAAengAAACPNNSDOCLCQy1044-00701044-0070746--32780746-32781565580USPSTroy
Publications, Inc., 113 North Market St., Troy, AL 36081mscnsdpMessenger
(Troy, Ala.)The Messenger(Troy, Ala.)The Messenger.Troy, Ala. :Troy
Pub.,1987-v.Daily (Sunday, Tuesday, Thursday and Friday)Vol. 121, no.
166 (July 1, 1987)-Sunday, Apr. 2, 1989 misprinted as v. 113.Latest
issue consulted: Vol. 113 [sic 123], no. 96 (Sunday, Apr. 2,
1989).United StatesAlabamaPikeTroy.Troy messenger0746-3278(DLC)sn
83009935(OCoLC)9921908info:srw/schema/1/marcxmlxml00000cas a22000007a
450017799786880415c19879999aluir ne 0 a0eng dsn 88050086
AARengAARCPNNSDOCLCQ1p1044-03801044-03800745-75961441520USPSThe
Prattville Progress, 152 W. 3rd St., Prattville, AL
36067mscnsdpPrattville progress (Prattville, Ala. : 1987)The Prattville
progress(Prattville, Ala.)The Prattville progress.Prattville, Ala.
:James C. Seymour,1987-v.Three times a weekVol. 102, no. 8 (Jan. 20,
1987)-Latest issue consulted: Vol. 105, no. 153 (Wednesday, Dec. 26,
1990).United StatesAlabamaAutaugaPrattville.Progress (Prattville,
Ala.)0745-7596(DLC)sn
83007623(OCoLC)9428489info:srw/schema/1/marcxmlxml00000cas a22000007a
450015344667870319c19869999aluwr ne 0 a0eng dsn 87000284
NSDengNSDCPNOCLCQy0893-07670893-07671431800USPSPickens County Herald,
P.O. Drawer E, Carrollton, AL 35447nsdpPickens County heraldPickens
County herald.Pickens County herald and west AlabamianCarrollton, Ala.
:Pickens Newspapers, Inc.,1986-WeeklyVol. 138, no. 40 (Oct. 2,
1986)-United StatesAlabamaPickensCarrollton.Pickens County herald and
west Alabamian0746-0473(DLC)sn
83008141AU using 000040635809info:srw/schema/1/marcxmlxml00000cas a22000007a
450018917586881217c19869999aluwr ne 0 0eng dsn 88050225
CPNengCPNOCLCQmscThe Oxford sun/times.Oxford, Ala.
:[s.n.],1986-v.WeeklyVol. 1, no. 1 (Jan. 16, 1986)-Editor: Andy
Goggans.Numbering is irregular.United StatesAlabamaCalhounOxford.Oxford
sun (Oxford, Ala.)(DLC)sn
85045023AU using 000025803813info:srw/schema/1/marcxmlxml00000cas a22000007a
450013991168860731c19869999aluwr ne 0 0eng dsn 86050322
CPNengCPNOCLCQmscIndependent (Brewton, Ala.)The Independent.Brewton,
Ala. :Jim Thornton,1986-v. :ill. ;58 cm.WeeklyVol. 1, no. 1 (June 19,
1986)-United
StatesAlabamaEscambiaBrewton.info:srw/schema/1/marcxmlxml00000cas
a22000007a 450018957493881231c19859999aluwr ne 0 0eng dsn
88050247 CPNengCPNOCLCQmscPiedmont journal-independent (Piedmont,
Ala.)The Piedmont journal-independent.Journal independentPiedmont, Ala.
:Lane Weatherbee,1985-v.WeeklyVol. 4, no. 52 (Dec. 24, 1985)-Sometimes
published as: Journal independent.United
StatesAlabamaCalhounPiedmont.Journal-independent(DLC)sn
85045014info:srw/schema/1/marcxmlxml00000cas a22000007a
450012715821851024d19841985aluwr ne 0 a0eng dsn 85045014
CPNengCPNNSDCPNOCLCQmscThe Journal-independent.Piedmont, Ala.
:Journal-Independent, Inc.,1984-1985.volumes :illustrations ;58
cmWeeklytexttxtrdacontentunmediatednrdamediavolumencrdacarrierVol. 3,
no. 27 (July 3, 1984)- v. 4, no. 51 (Dec. 18, 1985).Carries the same
vol. numbering as the Piedmont journal-independent.United
StatesAlabamaCalhounPiedmont.Piedmont
journal-independent0890-6017(DLC)sn 85045013Piedmont journal-independent
(Piedmont, Ala.)(DLC)sn 88050247info:srw/schema/1/marcxmlxml00000cas
a22000007a 450012691448851018c19839999aludr ne 0 0eng dsn
85045007 CPNengCPNOCLCQmscTimesDaily.Times dailyFlorence, Ala. :T.S.P.
Newspapers, Inc.,1983-volumes :illustrations ;58
cmDailytexttxtrdacontentunmediatednrdamediavolumencrdacarrierVol. 114,
no. 226 (Aug. 14, 1983)-United StatesAlabamaLauderdaleFlorence.Florence
times + tri-cities daily(DLC)sn
85044995info:srw/schema/1/marcxmlxml00000cas a22000007a
45009428489830420d19831987aluir ne 0 a0eng dsn 83007623
NSDengNSDCPNNSDNSTOCLCQ89090d0745-75960745-75961The Progress, 152 W. 3rd
St., Prattville, AL 36067nsdpmscProgress (Prattville, Ala.)The
Progress(Prattville, Ala.)The Progress.Prattville, Ala. :The Prattville
Progress,1983-1987.volumes :illustrations ;58 cmThree times a
weektexttxtrdacontentunmediatednrdamediavolumencrdacarrierVol. 98, no.
32 (Mar. 17, 1983)-v. 102, no. 7 (Jan. 17, 1987).United
StatesAlabamaAutaugaPrattville.Prattville progress(DLC)sn
85044740Prattville progress (Prattville, Ala.)1044-0380(DLC)sn
88050086(OCoLC)12254317AAPinfo:srw/schema/1/marcxmlxml00000cas a2200000
a 45009867255830831c19839999aludr ne 0 a0eng dsn 84008052
AAAengAAANSDOCLOCLCQX0743-15110743-15111617760USPST.S.P. Newspapers,
Inc., 219 W. Tennessee St., Florence, AL 35630nsdpTimesDaily (Shoals
edition)TimesDaily(Shoals ed.)TimesDaily.Times dailyShoals ed.Florence,
Ala. :T.S.P. Newspapersvolumes
:illustrationsDailytexttxtrdacontentunmediatednrdamediavolumencrdacarrierBegan
with: Vol. 114, no. 226 (Aug. 14,
1983).\"Florence/Sheffield/Tuscumbia/Muscle Shoals.\"Shoals ed. and
Regional ed. combined on Sundays.Description based on: Vol. 114, no. 346
(Monday, Dec. 12, 1983).United
StatesAlabamaLauderdaleFlorence.TimesDaily (Regional
edition)0743-152XTimes Tri-cities dailyUnknownDec. 12,
1983info:srw/schema/1/marcxmlxml00000cas a22000007a
450010536023840319c19839999aludr ne 0 a0eng dsn 84008051
NSDengNSDOCLCQ1x0743-152X0743-152X1617760USPST.S.P. Newspapers, Inc.,
219 W. Tennessee St., Florence, AL 35630nsdpTimesDaily (Regional
edition)TimesDaily(Regional ed.)TimesDaily.Times dailyRegional
ed.Florence, Ala. :T.S.P.
NewspapersDailytexttxtrdacontentunmediatednrdamediaBegan with: Vol. 114,
no. 226 (Aug. 14, 1983).Shoals ed. and Regional ed. combined on
Sundays.Description based on: Vol. 114, no. 346 (Monday, Dec. 12,
1983).United StatesAlabamaLauderdaleFlorence.TimesDaily (Shoals
edition)0743-1511Times Tri-cities dailyDec. 12,
1983AU using 000025818125info:srw/schema/1/marcxmlxml00000cas a22000007a
45009049482821213d19821987aludn ne 0 a0eng csn 82008412
AAAengAAANSDNPWCPNDLCCPNNSDDLCNSDDLCCPNNVFDLCOCLCQCRLOCLCFOCLCQ1d0745-32210745-32211nsdppccn-us-alNewspaperAdvertiser
(Montgomery, Ala.)The Advertiser(Montgomery, Ala.)The advertiser.Alabama
journal and advertiserMontgomery, Ala. :Advertiser Co.,1982-1987.volumes
:illustrationsDailytexttxtrdacontentunmediatednrdamediavolumencrdacarrier155th
year, no. 232 (Nov. 22, 1982)- ; -v. 14-3, Jan. 1, 1987.On Saturdays,
Sundays and holidays published as: The Alabama journal and advertiser,
Nov. 27, 1982-Jan. 1, 1987.Saturday, Sunday and holiday issues have
their own numbering.Montgomery
(Ala.)Newspapers.AlabamaMontgomery.fast(OCoLC)fst01202689Newspapers.fast(OCoLC)fst01423814United
StatesAlabamaMontgomeryMontgomery.Montgomery advertiser (Montgomery,
Ala. : Daily)(DLC)sn 84020645(OCoLC)2685433Montgomery advertiser
(Montgomery, Ala. : 1987)0892-4457(DLC)sn
87050045(OCoLC)15155895AU using 000020281746info:srw/schema/1/marcxmlxml00000cas
a2200000 a 45009237931830218c19829999aluwr ne 0 0eng dsn
86050139 AAAengAAACPNOCLOCLCQmscThe Randolph leader.Roanoke, Ala. :David
S. Stevenson,1982-volumes :illustrations ;58
cmWeeklytexttxtrdacontentunmediatednrdamediavolumencrdacarrierVol. 91,
no. 1 (Oct. 6, 1982)-United StatesAlabamaRandolphRoanoke.Roanoke
leader(DLC)sn 86050137Randolph press(DLC)sn
86050138info:srw/schema/1/marcxmlxml00000cas a22000007a
450012715815851024d19821984aluwr ne 0 a0eng dsn 85045013
CPNengCPNNSDCPNOCLCQ110890-60170890-60171432080USPSThe Piedmont
Journal-Independent, 115 N. Center Ave., Piedmont, AL 36272mscnsdpThe
Piedmont journal-independentThe Piedmont journal-independent.Piedmont,
Ala. :Piedmont Journal-Independent, Inc.,1982-1984.volumes
:illustrations ;58
cmWeeklytexttxtrdacontentunmediatednrdamediavolumencrdacarrierVol. 1,
no. 1 (Mar. 31, 1982)-v. 3, no. 26 (June 27, 1984).Latest issue
consulted: Vol. 5, no. 31 (August 20, 1986).United
StatesAlabamaCalhounPiedmont.Piedmont journal(DLC)sn
85045012Journal-independent(DLC)sn
85045014(OCoLC)12715821AU using 000045312916info:srw/schema/1/marcxmlxml00000cas
a22000007a 45009183905830202c19829999aluwr n 0 a0eng dsn
85044580 AAAengAAACPNNSDOCLOCLCQ11098-58671098-58671016409USPSNo. 4,
Rucker Plaza, Enterprise, AL 36331P.O. Box 1536, Enterprise, AL
36331mscnsdpSoutheast sun (Enterprise, Ala.)The southeast
sun(Enterprise, Ala.)The Southeast sun.Enterprise, Ala. :QST
Publicationsvolumes :illustrations ;58
cmWeeklytexttxtrdacontentunmediatednrdamediavolumencrdacarrierBegan in
1982.Description based on: Vol. 1, no. 25 (Oct. 21, 1982).Latest issue
consulted: Vol. 16, no. 43 (Mar. 4, 1998).United
StatesAlabamaCoffeeEnterprise.AU using 000025827687info:srw/schema/1/marcxmlxml00000cas
a22000007a 450010487314840305c19819999aluwr ne 0 a0eng dsn
85044906
AAAengAAACPNNSDNSTCPNOCLOCLCQOCLCFOCLCOOCLCAOCLCQ900410885-16620885-16621749310USPSThe
New Times, 1618 1/2 St. Stephens Rd., Mobile, AL 36603mscnsdpn-us-alNew
times (Mobile, Ala.)The New times(Mobile, Ala.)The new times.Mobile,
Ala. :New Times Groupvolumes
:illustrationsWeeklytexttxtrdacontentunmediatednrdamediavolumencrdacarrierBegan
in 1981.Vol. 3, no. 49 (Dec. 15-21, 1983) and vol. 3, no. 50 (Dec.
22-28, 1983) are both called vol. 3, no. 49 (Dec. 15-21,
1983).Description based on: Vol. 2, no. 3 (Jan. 28-Feb. 3, 1982).African
AmericansAlabamaNewspapers.African
Americans.fast(OCoLC)fst00799558Alabama.fast(OCoLC)fst01204694Newspapers.fast(OCoLC)fst01423814United
StatesAlabamaMobileMobile.AAPUnknownAug. 15,
1985AU using 000024686659info:srw/schema/1/marcxmlxml00000cas a22000007a
450018922463881219d19811983alucr ne 0 0eng dsn 88050233
AARengAARCPNNSDOCLCQmscThe Sylacauga daily advance.Advance/Sylacauga
dailySylacauga advanceSunday advanceAdvanceSylacauga, Ala. :Mrs. W.A.
Moody,1981-1893.v.Semiweekly,<Nov. 24, 1982-Feb. 13, 1983>Daily (except
Mon., Tues. & Sat.),<May 26, 1982-Nov. 21, 1982>Daily (except Sat. &
Mon.),<Jan. 1, 1981-May 23, 1982>74th Year, no. 123 (Jan. 1, 1981)-76th
year, no. 83 (Feb. 13, 1983).Days of publication vary.Published as: The
Advance/Sylacauga daily, <Aug. 28, 1981-May 23, 1982>.Published as:
Sylacauga advance, <Nov. 24, 1982-Feb. 13, 1983>.On Sunday, published
as: Sunday advance.United StatesAlabamaTalladegaSylacauga.Childersburg
star(DLC)sn 88050232Coosa press(DLC)sn 86050293Daily
home1059-6461(DLC)sn 88050234info:srw/schema/1/marcxmlxml00000cas
a22000007a 450021026715cr un|||||||||900209c19809999aluwr ne 0
0eng dsn 90099002
AARengAARCPNCUSOCLOCLCQTJCOCLCQOCLCFOCLCOOCLCA926143844AU using 000020585756mscn-us-alSpeakin'
out news.Speaking out newsDecatur, Ala. :Minority Network,
Inc.v.WeeklyBegan in 1980.Published in Huntsville, Ala., <1987>-Also
issued by subscription via the World Wide Web.Description based on: Vol.
7, no. 8 (Jan. 7-13, 1987).African AmericansAlabamaNewspapers.African
American
newspapersAlabama.AlabamaNewspapers.Newspapers.fast(OCoLC)fst01423814African
American newspapers.fast(OCoLC)fst00799278African
Americans.fast(OCoLC)fst00799558Alabama.fast(OCoLC)fst01204694United
StatesAlabamaMorganDecatur.United
StatesAlabamaMadisonHuntsville.Speakin' out weekly news(DLC)sn
88050097http://www.softlineweb.com/softlineweb/ethnic.htminfo:srw/schema/1/marcxmlxml00000cas
a22000007a 450014996511861219c19809999aluwr ne 0 a0eng csn
86050472
AARengAARCPNNSDOCLCQ11080-15021080-15021328110USPSnsdppccWest-Alabama
gazetteWest-Alabama gazette.GazetteMillport, Ala. :Millport Pub.
Co.,1980-volumesWeeklytexttxtrdacontentunmediatednrdamediavolumencrdacarrier4th
year, no. 32 (Jan. 3, 1980)-United StatesAlabamaLamarMillport.Gazette
(Millport, Ala.)(DLC)sn 86050471info:srw/schema/1/marcxmlxml00000cas
a2200000 a 450011828156850320c19809999aluwr ne 0 0eng dsn
86050314 AAAengAAACPNOCLOCLCQmscThe Hartford news-herald.Hartford, Ala.
:Geneva Publications,1980-volumes :illustrations ;57-59
cmWeeklytexttxtrdacontentunmediatednrdamediavolumencrdacarrierVol. 80,
no. 20 (Feb. 14, 1980)-United StatesAlabamaGenevaHartford.News-herald
(Hartford, Ala.)(DLC)sn 86050313info:srw/schema/1/marcxmlxml00000cas
a22000007a 450017857788880427d198u198ualusr ne 0 0eng dsn
88050097 AARengAARCPNOCLOCLCQOCLCFOCLCOOCLCAmscn-us-alSpeakin' out
weekly news.Decatur, Ala. :Smothers PublicationsPublished every first
and third Wed. of each monthDescription based on: Vol. 3, no. 13 (May
4-17, 1983).African
AmericansAlabamaNewspapers.Newspapers.fast(OCoLC)fst01423814African
Americans.fast(OCoLC)fst00799558Alabama.fast(OCoLC)fst01204694United
StatesAlabamaMorganDecatur.Weekly news (Huntsville, Ala.)(DLC)sn
87050012Speakin' out news(DLC)sn
90099002info:srw/schema/1/marcxmlxml00000cas a2200000 a
450017807936880418c198u9999aluwr ne 0 a0eng dsn 90099001
AAAengAAACPNOCLOCLCQThe Daleville Sun-Courier, 310 Daleville Ave.,
Daleville, AL 36322mscn-us-alDaleville sun-courier.Daleville, Ala. :QST
Publicationsv. :ill. ;58 cm.WeeklyDescription based on: Vol. 2, no. 28
(Wed., Feb. 17, 1988).United
StatesAlabamaDaleDaleville.AU using 000020585749info:srw/schema/1/marcxmlxml00000cas
a22000007a 450015580838870423c198u9999aluwr ne 0 0eng dsn
87050128 AARengAARCPNOCLCQmscGreene County independent.Eutaw, Ala.
:Greene County Independent, Inc.v.WeeklyDescription based on: Vol. 2,
no. 10 (Mar. 12, 1987).United
StatesAlabamaGreeneEutaw.info:srw/schema/1/marcxmlxml00000cas a22000007a
450010125135831114d198u198ualucr ne 0 a0eng dsn 83003221
NSDengNSDOCLCQ0d0746-55210746-55211Auburn Bulletin & Lee County Eagle,
PO Box 2111, Auburn, Ala. 36830nsdpThe Auburn bulletin & the Lee County
eagleThe Auburn bulletin & the Lee County eagle.Lee County eagleAuburn
bulletin and the Lee County eagleAuburn, Ala. :[publisher not
identified]Semiweekly,<Sept. 5,
1984->WeeklytexttxtrdacontentunmediatednrdamediaDescription based on:
Oct. 19, 1983.United StatesAlabamaLeeAuburn.Auburn bulletin(DLC)sn
89050006Eagle (Auburn, Ala.)(OCoLC)18435663Sept. 5,
1984info:srw/schema/1/marcxmlxml00000cas a22000007a
450018370324880818c198u9999aluwr ne 0 0eng dsn 88050147
CPNengCPNOCLCQmscTri-city times (Geraldine, Ala.)The Tri-City
times.Geraldine, Ala. :Wanda Nelsonv.WeeklyDescription based on: Vol. 2,
no. 24 (Jan. 6, 1982).United
StatesAlabamaDeKalbGeraldine.info:srw/schema/1/marcxmlxml00000cas
a22000007a 450010199338831208c198u9999aluwr ne 0 a0eng dsn
83005367 NSDengNSDCPNOCLCQ10746-62770746-62771707590USPSSpringville Pub.
Co., 539 Main St., Springville, AL 35146nsdpThe St. Clair clarionThe St.
Clair clarion.Saint Clair clarionSpringville, AL :Gary L.
ShultsWeeklytexttxtrdacontentunmediatednrdamediaDescription based on:
Vol. 2, no. 1 (Jan. 5, 1982).United StatesAlabamaSt.
ClairSpringville.AU using 000025783743info:srw/schema/1/marcxmlxml00000cas
a22000007a 450013787251860627c198u9999aluwr ne 0 a0eng dsn
86001923 NSDengNSDCPNOCLCQ10889-00800889-00801The Westerner Star, P.O.
Box 2060, Bessemer, AL 35021nsdpWestern star (Bessemer, Ala.)The Western
star(Bessemer, Ala.)The western star.Bessemer, Ala. :Hal
HodgensWeeklytexttxtrdacontentunmediatednrdamediaDescription based on:
Vol. 3, no. 15 (Wednesday, June 11, 1986).United
StatesAlabamaJeffersonBessemer.Bessemer advertiser(DLC)sn
87050117AU using 000025805174511.1srw.pc any \"y\" and srw.mt any
\"newspaper\" and srw.cp exact
\"Alabama\"50info:srw/schema/1/marcxmlxml1Date,,0mq1lME887FoIbjulKUV6bx9ImwWQNCv9GqZzGS92IKS31lEbcpRJBNHgcE1l29tFaHP9CHe0Yexk1uWQofffull"
------------------------------------------------------------------------
Newspapers and Current Periodicals Reference Librarian
Jul 27 2022, 09:22am via System
Hello Spencer,
Thank you for reaching out about the bulk xml files for the US Newspaper
Directory.
We don't have documentation specific to these bulk xml files, but upon
further inspection I can say that not every one of those files contains
info for 50 newspaper titles. The structure of the titles for
California and New York for instance are different from say, Alabama.
If you look at California for example, the file naming structure
indicates the year the title started, and then the number of titles
included in that xml file. So for instance, the files below include info
for newspapers that started in 2000, 2001, and 2002 respectively. And
there is info for 30 titles in the xml file from 2000, and 14 in the
file for 2001, and so on.
* ndnp_California_2000_e_0001_0030.xml
* ndnp_California_2001_e_0001_0014.xml
* ndnp_California_2002_e_0001_0012.xml
If there are more than 50 titles for a given year, say for California
starting in 1880, then the next 50 titles will roll into the next xml
file, and so on. And the last xml file for that year may not include 50
titles.
Many of the states seem to group all the years together, so each xml
file contains 50 titles, until possibly the last one for a given state,
which may contain fewer.
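The naming scheme described above can be checked mechanically. The following is only a sketch: the pattern is inferred from the three California names and the one Alabama name quoted in this thread, the function name parse_bulk_name is made up for illustration, and it assumes the two trailing numbers are the first and last title index in the file.

```r
# Split bulk file names of the assumed form
#   ndnp_<State>_<year or "all-yrs">_e_<first>_<last>.xml
# into their parts. The pattern is inferred from the examples in this
# thread only; it is not documented by the Library of Congress.
parse_bulk_name <- function(files) {
  pat <- "^ndnp_(.+)_([^_]+)_e_([0-9]+)_([0-9]+)\\.xml$"
  ok  <- grepl(pat, files)
  x   <- ifelse(ok, files, NA_character_)   # non-matching names become NA
  first <- as.integer(sub(pat, "\\3", x))
  last  <- as.integer(sub(pat, "\\4", x))
  data.frame(
    file     = files,
    state    = sub(pat, "\\1", x),
    years    = sub(pat, "\\2", x),
    first    = first,
    last     = last,
    n_titles = last - first + 1L,           # assumed meaning of the two numbers
    stringsAsFactors = FALSE
  )
}

info <- parse_bulk_name(c("ndnp_California_2000_e_0001_0030.xml",
                          "ndnp_California_2001_e_0001_0014.xml",
                          "ndnp_Alabama_all-yrs_e_0001_0050.xml"))
sum(info$n_titles)   # total titles implied by the names, under the assumption above
```

Summing n_titles over all 6666 names would give a count to compare with the number of titles the directory reports, without assuming 50 per file.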
I hope this information helps explain the total number of records and
structure a bit better. Let me know if you have any further questions.
Best wishes,
Kerry Huller
Newspaper & Current Periodical Reading Room
Serial & Government Publications Division
Library of Congress
------------------------------------------------------------------------
Newspapers and Current Periodicals Reference Librarian
Jul 25 2022, 02:22pm via Email
Hi, Kerry:
Might there be documentation on the XML files you mentioned?
I've successfully read
'https://chroniclingamerica.loc.gov/data/bib/worldcat_titles/bulk5/',
extracted the names of 6666 XML files, and read the first one,
"ndnp_Alabama_all-yrs_e_0001_0050.xml". It contains 29415 characters,
beginning, "1.12250info:srw/schema/1/marcxmlxml00000nas a22000007i
45001030438981180404c20159999aluwr n 0 a0eng ". With a bit
more effort, I will likely be able to parse all 6666 of these. The
names suggest that each contains information on 50 newspapers, totaling
333,300. The main page
"https://chroniclingamerica.loc.gov/search/titles/" says there are only
157,521 "Titles currently listed". This suggests that these XML files
include place holders for a little more than double the number of
entries currently in "https://chroniclingamerica.loc.gov/search/titles/".
Thanks for this.
Progress.
------------------------------------------------------------------------
Newspapers and Current Periodicals Reference Librarian
Jul 07 2022, 08:55am via System
Hi Spencer,
I thought of one more option after I emailed you yesterday that I wanted
to make you aware of.
I had explained the other day how we pull the records from OCLC into our
U.S. Newspaper Directory. You can also access all of the raw MARC
records found in the directory in xml format from here if you choose:
https://chroniclingamerica.loc.gov/data/bib/worldcat_titles/bulk5/
These will provide you all of the data from the record fields in MARC
format, so you'd get all the data you see here, for example:
https://chroniclingamerica.loc.gov/lccn/sn98059792/marc/
but in xml. I
don't know if this might be more data and info than you want to work
with, but wanted to make sure you were aware of this option as well.
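For readers who want the MARC fields rather than the flattened text, a minimal sketch with the xml2 package follows. The sample document is hand-made: its values are abbreviated from the flattened text elsewhere in this thread, and the "collection" wrapper and the single default namespace are assumptions. The real bulk files appear to wrap the records in a search response and may need a different XPath or additional namespace handling.

```r
library(xml2)

# Hand-made sample in MARCXML layout. NOT a verbatim copy of the LoC file;
# the wrapper element and namespace are assumptions for illustration.
sample_xml <- '<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>00000cas a2200000 a 4500</leader>
    <controlfield tag="001">11828156</controlfield>
    <controlfield tag="008">850320c19809999aluwr ne       0   0eng d</controlfield>
    <datafield tag="245" ind1="0" ind2="4">
      <subfield code="a">The Hartford news-herald.</subfield>
    </datafield>
  </record>
  <record>
    <leader>00000cas a22000007a 4500</leader>
    <controlfield tag="001">15580838</controlfield>
    <controlfield tag="008">870423c198u9999aluwr ne       0   0eng d</controlfield>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">Greene County independent.</subfield>
    </datafield>
  </record>
</collection>'

doc <- read_xml(sample_xml)
xml_ns_strip(doc)                      # drop the default namespace so plain XPath works
recs <- xml_find_all(doc, ".//record")

# One value per record; records lacking the field give NA.
field <- function(xpath) xml_text(xml_find_first(recs, xpath))

f008 <- field("./controlfield[@tag='008']")
titles <- data.frame(
  control_no = field("./controlfield[@tag='001']"),
  title      = field("./datafield[@tag='245']/subfield[@code='a']"),
  date1      = substr(f008, 8, 11),    # MARC 008 character positions 07-10
  date2      = substr(f008, 12, 15),   # MARC 008 character positions 11-14
  stringsAsFactors = FALSE
)
titles
```

Working from tags and subfield codes keeps the field boundaries that xml_text() on the whole document throws away.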
Best wishes,
Kerry Huller
Newspaper & Current Periodical Reading Room
Serial & Government Publications Division
Library of Congress
------------------------------------------------------------------------
Newspapers and Current Periodicals Reference Librarian
Jul 06 2022, 10:55am via System
Hi Spencer,
Thanks for reaching out again. I have been looking at the json view a
bit closer this morning and your example of "9999."
After talking with a colleague this morning and looking at various
examples, I see there is some variation in how the titles with either an
unknown starting/ending date or currently published titles are being
handled - depending on the view.
As an example, I completed a search in the directory for Alaska and the
city of Anchorage. There are 80 results, and on the first page of
results you'll see # 4. Fort Richardson news, which was published from
1952-19??. The csv view of this state/city search result will show the
ending date of 19??. But if I append &format=json to this search result,
this specific title will show an ending date of 1999. After talking with
a colleague this morning, I discovered an integer had to be used in
these cases where dates were "?" so that the search based on year range
would work. Similarly, if you look at # 12 Alaska digest, which was
published 1994-current, the "current" becomes "9999" in the json view.
So, the records you are seeing with "9999" would most likely be titles
with an ending date of "current."
However, there is an issue with the unknown dates, like "1999" being
used for "19??" in the example above. The "9" does not get inserted in
place of "?" when you are looking at the title/LCCN view of a specific
newspaper. So for instance, if you view the #4 title: Fort Richardson
news at this url: https://chroniclingamerica.loc.gov/lccn/sn98059792/
but append .json to the end of the url, after the LCCN, like this:
https://chroniclingamerica.loc.gov/lccn/sn98059792.json
you'll see that the end_year is "19??." Viewing the title/LCCN json view
for titles that are currently published will also show the end_year as
"current." The Alaska digest example from above can be viewed here:
https://chroniclingamerica.loc.gov/lccn/sn97060056.json
I wasn't aware of the difference between the directory search json view
and the title/LCCN view. But I think it would be possible to grab
the data from the title/LCCN json url through an additional script
potentially. The json url is included in the view under the "url" field.
Of course, there are unknowns with publishing dates, but better to know
where the question marks are, and what titles are considered to be current.
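The year conventions described in this message can be turned into explicit ranges. This is a sketch under the assumption that the title/LCCN values look like "1952", "19??", or "current", and that "9999" from the search view means the same as "current"; the function name year_range is hypothetical.

```r
# Translate year strings into a numeric range.
#   "current" or "9999" -> no end year yet (latest = NA, current = TRUE)
#   "19??"              -> some year in 1900..1999
#   anything else that is not four digits or question marks -> NA
year_range <- function(x) {
  x <- as.character(x)
  current <- x %in% c("current", "9999")
  known   <- grepl("^[0-9?]{4}$", x) & !current
  lo <- ifelse(known, gsub("?", "0", x, fixed = TRUE), NA_character_)
  hi <- ifelse(known, gsub("?", "9", x, fixed = TRUE), NA_character_)
  data.frame(raw      = x,
             earliest = as.integer(lo),
             latest   = as.integer(hi),
             current  = current,
             stringsAsFactors = FALSE)
}

year_range(c("1952", "19??", "current", "9999"))
```

Note that a "1999" taken from the search view cannot be told apart from a substituted "19??", so this only helps with values taken from the title/LCCN view.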
I hope this clarifies the data a bit more - let me know if any of it
needs more clarification though. And let me know if you have follow-up
questions.
Thank you,
Kerry Huller
Newspaper & Current Periodical Reading Room
Serial & Government Publications Division
Library of Congress
------------------------------------------------------------------------
Newspapers and Current Periodicals Reference Librarian
Jul 05 2022, 04:42pm via Email
Hi, Kerry:
What would you suggest I do to get a count of the numbers of
newspapers and publishers operating by year from, say, 1790 to 2021?
I just determined that 20630 (13 percent) of the 157520 records in
the US Newspaper database I downloaded a week ago have end_year = 9999.
I don't think it's feasible to assume that all or even most of those
are still publishing.
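One way to frame the count is as a pair of bounds, once treating every end_year of 9999 as still publishing and once leaving those titles out. The sketch below uses toy data, not records from the directory, and the function name count_by_year is made up for illustration.

```r
# Toy data; not taken from the directory.
papers <- data.frame(
  start_year = c(1790, 1795, 1800, 1801),
  end_year   = c(1799, 9999, 1800, 1810)
)

# Number of titles in print in each of `years`.
# keep_open = TRUE counts open-ended titles (end_year 9999) from their
# start year onward; keep_open = FALSE leaves them out entirely.
count_by_year <- function(start, end, years, open_ended = 9999,
                          keep_open = TRUE) {
  open <- end %in% open_ended
  vapply(years, function(y) {
    closed_alive <- !open & start <= y & end >= y
    open_alive   <- open & keep_open & start <= y
    sum(closed_alive | open_alive, na.rm = TRUE)
  }, integer(1))
}

yrs   <- 1790:1801
upper <- count_by_year(papers$start_year, papers$end_year, yrs)
lower <- count_by_year(papers$start_year, papers$end_year, yrs,
                       keep_open = FALSE)
data.frame(year = yrs, lower = lower, upper = upper)
```

The gap between the two columns shows how much the answer for a given year depends on the open-ended records.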
Might there be some other database that might have this kind of
information?
I ask, because Robert McChesney (2004) The Problem of the Media
(Monthly Review Pr., esp. pp. 34-35) suggests that in the first half of
the nineteenth century, the US had more newspapers and newspaper
publishers per capita than any other place or time. He suggests that
that diversity of newspapers helped encourage literacy and limit
political corruption, both of which helped propel the young US to its
current dominance of the international political economy. I'm hoping to
get some data to evaluate this claim. Sadly, it looks like there is too
much missing and questionable data in this dataset for me to use this
without a fairly substantive data cleaning effort.
------------------------------------------------------------------------
Newspapers and Current Periodicals Reference Librarian
Jul 05 2022, 09:05am via System
Hello Spencer,
Thank you for reaching out about your additional questions.
I was looking at the records you mention above, and yes, you are correct
- those 9 records with the date inconsistencies and the one record for
The New Mexican mining news
<https://chroniclingamerica.loc.gov/lccn/sn93061507/> containing "Santa
Fe.\" have typos in them. Thanks for spotting these - it may be possible
to have the cataloger in our division correct those typos. I will look
into this further.
The U.S. Newspaper Directory doesn't have a connection with Wikimedia or
Wikipedia. The Library of Congress periodically pulls the records for
the Directory from OCLC Worldcat
<https://www.oclc.org/en/worldcat.html>. And those newspaper records in
OCLC Worldcat have been created by catalogers at various institutions
around the U.S. over the span of several years. So, occasionally, you
will find a typo in the records. Corrections can be made by OCLC and
library staff at the various institutions. Every time we complete a new
pull on the OCLC records, any corrected records will then populate our
Directory.
Regarding your question on the New-York weekly journal - yes, that is
also correct that it has two records. There is actually a record for
each format of the newspaper, so this record is for the microfilm format
<https://chroniclingamerica.loc.gov/lccn/2009252748/> and this one is
for the original print format
<https://chroniclingamerica.loc.gov/lccn/sn83030211/>. You can see in
the heading for the microfilm record where it says [microfilm reel] and
the print version shows [volume]. You are likely to see this for other
titles as well because each format has been cataloged with its own LCCN.
You are also likely to see additional records with [online resource]
identified as the format as more and more titles are available as
ePrints or online.
I hope this helps answer your additional questions a bit more. Please
reach out if you have any other questions.
Thank you,
Kerry Huller
Newspaper & Current Periodical Reading Room
Serial & Government Publications Division
Library of Congress
------------------------------------------------------------------------
Newspapers and Current Periodicals Reference Librarian
Jul 04 2022, 01:47pm via Email
Hi, Kerry:
At the risk of bombing your inbox with more emails than you want,
what is your relationship with Wikipedia and other Wikimedia Foundation
projects like Wikidata?
I ask, because I've logged over 20,000 edits in Wikimedia Foundation
projects since 2010, and I would happily try to answer questions about
Wikidata and other Wikimedia Foundation projects. I have NOT organized
an edit-a-thon, but I've made presentations at conferences with people
who have, and I would happily try to help organize such if you could
find a group of people who want to work to improve this US Newspaper
database. I think it would be good to establish links between this US
Newspaper database and Wikidata, with appropriate procedures so changes
to one could be evaluated for acceptance into the other.
FYI, John Peter Zenger's famous "New-York weekly journal" (1733-1751)
appears TWICE in your database with lccn = 2009252748 and sn83030211 and
ONCE in Wikidata WITHOUT an lccn, even though many other Wikidata items
have an lccn. See:
https://www.wikidata.org/wiki/Q23091960
There's a "WikiProject Newspapers" on Wikipedia and a companion
"WikiProject Periodicals" on Wikidata:
https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Newspapers/Wikidata
https://www.wikidata.org/wiki/Wikidata:WikiProject_Periodicals
I've tried to connect with others on those projects, so far with only
limited success. However, you may know that almost anyone can change
almost anything on Wikipedia and other Wikimedia Foundation projects.
What stays tends to be written from a neutral point of view citing
credible sources. They have problems with vandals, but the problems are
usually easily controlled. This makes Wikipedia and Wikidata very
useful platforms for cleaning up databases like your US Newspaper dataset.
Spencer Graves
------------------------------------------------------------------------
Newspapers and Current Periodicals Reference Librarian
Jul 03 2022, 10:39pm via Email
Hello, Kerry:
In addition to the invalid JSON, discussed below [NOTE: The "below"
contains a slight addition to the report I sent last Friday.], I
found 9 (NINE!) cases where start_year was AFTER end_year. These have
lccn = "sn86071531" "sn95069213" "sn90059096" "sn86058451" "sn90060926"
"sn99065409" "sn89065002" "sn98069857" "sn91059179"
See:
https://chroniclingamerica.loc.gov/lccn/sn86071531/
https://chroniclingamerica.loc.gov/lccn/sn95069213/
https://chroniclingamerica.loc.gov/lccn/sn90059096/
https://chroniclingamerica.loc.gov/lccn/sn86058451/
https://chroniclingamerica.loc.gov/lccn/sn90060926/
https://chroniclingamerica.loc.gov/lccn/sn99065409/
https://chroniclingamerica.loc.gov/lccn/sn89065002/
https://chroniclingamerica.loc.gov/lccn/sn98069857/
https://chroniclingamerica.loc.gov/lccn/sn91059179/
These all have obvious coding errors that can be easily fixed. The
data may not be completely accurate after the fix, but at least they are
not obviously wrong ;-)
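A check of this kind can be sketched as follows. The data are toy values with placeholder lccn strings, and the column names start_year and end_year are assumed to match the JSON fields mentioned in this thread. Years that are not plain integers are skipped rather than flagged.

```r
# Toy records; the lccn values are placeholders, not real identifiers.
recs <- data.frame(
  lccn       = c("a", "b", "c"),
  start_year = c("1880", "1905", "19??"),
  end_year   = c("1890", "1899", "1950"),
  stringsAsFactors = FALSE
)

# Non-numeric years such as "19??" become NA and are excluded from the check.
s <- suppressWarnings(as.integer(recs$start_year))
e <- suppressWarnings(as.integer(recs$end_year))

bad <- recs[!is.na(s) & !is.na(e) & s > e, ]
bad$lccn
```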
##################
I got invalid JSON from:
https://chroniclingamerica.loc.gov/search/titles/results/?rows=500&page=103&format=json
After some experimentation, I was able to replicate the problem with
a request for rows=10:
https://chroniclingamerica.loc.gov/search/titles/results/?rows=10&page=5117&format=json
Duncan Temple Lang <dtemplelang using ucdavis.edu>, Professor of Statistics
and Associate Dean for Graduate Programs at the University of California
- Davis, confirmed that it was a JSON error using:
https://codebeautify.org/jsonvalidator
He is part of the core team developing R, the free, open-source
programming language. He said that, starting at offsets 161070 and
161502 in the character string you get from [the R code RCurl::getURL()],
we have:
Santa Fe.\"
and these are in an entry such as
"city": ["Santa Fe.\"]
So the final " is escaped and therefore there is no closing " for the
string. The parser continues to consume characters looking for the end
of that string.
If one "repairs" the text from getURL() with
ftxt= gsub('Santa Fe.\\\\"', 'Santa Fe."', txt)
then the rest of my code worked fine.
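The repair can be tried on a small string shaped like the broken entry. This sketch uses fixed = TRUE instead of the regular expression above, which removes one layer of backslash escaping. The jsonlite check is optional and is an assumption, since the thread does not say which JSON parser was used.

```r
# A fragment shaped like the broken entry quoted above.
# In R source, '\\' is one backslash, so txt holds: {"city": ["Santa Fe.\"]}
txt <- '{"city": ["Santa Fe.\\"]}'

# fixed = TRUE matches the characters literally, so the backslash needs
# only R's own string escaping, not regex escaping as well.
ftxt <- gsub('Santa Fe.\\"', 'Santa Fe."', txt, fixed = TRUE)
cat(txt, ftxt, sep = "\n")

# Optional validity check, run only if jsonlite happens to be installed.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  print(jsonlite::validate(txt))
  print(jsonlite::validate(ftxt))
}
```

Validating each page before parsing it would also catch any other malformed entries instead of relying on a fix for this one string.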
You may wish to do something to implement other checks for valid JSON
and repair this problem. I've scanned all the 157520 records that were
in that database a couple of days ago, and this is the only JSON error
identified by the code I used.
NOTE: I was NOT able to replicate this error when downloading records
one at a time. That suggests a problem NOT in the database itself but
in the download algorithm. ???
Thank you for your help. I will almost certainly have other
questions ;-)
------------------------------------------------------------------------
Newspapers and Current Periodicals Reference Librarian
Jul 01 2022, 11:46am via Email
Hello, Kerry:
I got invalid JSON from:
https://chroniclingamerica.loc.gov/search/titles/results/?rows=500&page=103&format=json
After some experimentation, I was able to replicate the problem with
a request for rows=10:
https://chroniclingamerica.loc.gov/search/titles/results/?rows=10&page=5117&format=json
Duncan Temple Lang <dtemplelang using ucdavis.edu>, Professor of Statistics
and Associate Dean for Graduate Programs at the University of California
- Davis, confirmed that it was a JSON error using:
https://codebeautify.org/jsonvalidator
He is part of the core team developing R, the free, open-source
programming language. He said that, starting at offsets 161070 and
161502 in the character string you get from [the R code RCurl::getURL()],
we have:
Santa Fe.\"
and these are in an entry such as
"city": ["Santa Fe.\"]
So the final " is escaped and therefore there is no closing " for the
string. The parser continues to consume characters looking for the end
of that string.
If one "repairs" the text from getURL() with
ftxt= gsub('Santa Fe.\\\\"', 'Santa Fe."', txt)
then the rest of my code worked fine.
You may wish to do something to implement other checks for valid JSON
and repair this problem. I've scanned all the 157520 records that were
in that database a couple of days ago, and this is the only JSON error
identified by the code I used.
Thank you for your help. I will almost certainly have other
questions ;-)
------------------------------------------------------------------------
Newspapers and Current Periodicals Reference Librarian
Jun 28 2022, 02:20pm via System
Hello Spencer,
Thank you for sending along your follow-up questions.
I'm glad to hear the json view will work for you. It was recommended to
me that you limit your requests to 500 rows at a time. And a developer
here at LC suggests the following regarding rate limiting:
“To avoid being blocked by the server, the current rate-limiting rules
restrict un-cached requests to URLs starting with
https://chroniclingamerica.loc.gov/search/
to 120 requests every 10 minutes from a single IP address.”
So, I think if you limited each of your requests to 500 rows at a time
with the proper pauses, then you should be able to access what you need.
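The arithmetic behind these limits, and a download loop that respects them, might look like the sketch below. The record total of 157520 is the figure quoted elsewhere in this thread. The function name fetch_all and the extra one-second margin are illustrative choices, and the loop uses base readLines() rather than RCurl so the block needs no packages.

```r
base_url  <- "https://chroniclingamerica.loc.gov/search/titles/results/"
n_records <- 157520   # total reported elsewhere in this thread
rows      <- 500      # recommended maximum per request
n_pages   <- ceiling(n_records / rows)

# 120 requests per 10 minutes means at most one request every 5 seconds.
min_pause <- (10 * 60) / 120

urls <- sprintf("%s?rows=%d&page=%d&format=json",
                base_url, rows, seq_len(n_pages))

# Download every page as one character string, pausing between requests.
fetch_all <- function(urls, pause = min_pause + 1) {
  out <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    out[[i]] <- paste(readLines(urls[i], warn = FALSE), collapse = "\n")
    Sys.sleep(pause)
  }
  out
}

# pages <- fetch_all(urls)   # not run here: one request per page, with pauses
```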
As for the csv view, I checked on this as well, and was informed that
the csv view was not implemented for all url formats. The csv view was
only implemented for this view:
https://chroniclingamerica.loc.gov/newspapers/
and urls resulting from US Directory search results, e.g. if you wanted
to narrow down your search results by state, city, date range, etc.,
found at this link:
https://chroniclingamerica.loc.gov/search/titles/
So, if you wanted a csv and limited your search by state (for example:
https://chroniclingamerica.loc.gov/search/titles/results/?state=Alaska&county=&city=&year1=1690&year2=2022&terms=&frequency=&language=&ethnicity=&labor=&material_type=&lccn=&rows=20&format=csv
), you could append &format=csv to the search result url and get the csv
to automatically download. But, if your search results ended up being
over a couple thousand titles, then the system would probably time out.
I hope this info helps! Let me know if you have any other questions.
Best wishes,
Kerry Huller
Newspaper & Current Periodical Reading Room
Serial & Government Publications Division
Library of Congress
------------------------------------------------------------------------
Newspapers and Current Periodicals Reference Librarian
Jun 27 2022, 04:15pm via Email
Hello, Kerry:
Thanks for the reply. Can you please give me some further guidance
on two things "so that the system is not overwhelmed"?
1. The max size in a small batch?
2. Any limit on the number of small batches in a second or minute?
I've found that I can download small batches under program control
using "RCurl::getURL" in R (programming language) using, e.g.;
https://chroniclingamerica.loc.gov/search/titles/results/?rows=20&page=2&format=json
With this, I can control the batch size with "rows=20" vs. "rows=50"
vs., e.g., "rows=1000". A naive search says there are 157520 "results".
With "rows=1000", this would require 158 calls. With "rows=20", it
would require 7876 calls. Before I start, I need to decide which fields
I want; I don't need them all.
Thanks,
Spencer Graves
p.s. I tried appending "&format=csv" and got "Error 504 Ray ID:
7220896da85e86e7 • 2022-06-27 19:19:53 UTC Gateway time-out". I used:
https://chroniclingamerica.loc.gov/search/titles/results/?state=&county=&city=&year1=1690&year2=2022&terms=&frequency=&language=&ethnicity=&labor=&material_type=&lccn=&rows=20&format=csv
I can get what I want using json so do not need csv. However, I
thought you might want to know that I was unable to get csv to work.
------------------------------------------------------------------------
Newspapers and Current Periodicals Reference Librarian
Jun 27 2022, 10:54am via System
Hello Spencer,
Thank you for contacting the Library of Congress about searching the US
Newspaper Directory. I wanted to follow up with you regarding your
request to output the data in a machine readable format.
It looks like you were provided the link to the API documentation for
the website: About the Site and API
<https://chroniclingamerica.loc.gov/about/api/>. Scroll down to the
section with the heading, Searching the directory and newspaper pages
using OpenSearch. This section describes the search functionality and
structure for the US Newspaper Directory in more detail. It is possible
to return your directory searches in json format by appending
&format=json to the end of the url. It is also possible to return search
results in csv format by appending &format=csv to the end of the url,
but I would strongly suggest that you do this in small batches by
putting limits on your search so that the system is not overwhelmed.
So, from the search page for the US Newspaper Directory
<https://chroniclingamerica.loc.gov/search/titles/> you could
potentially limit your search based on state and city, or date range,
and/or even frequency. Then once you've completed the search, you can
add &format=csv to the end of the url to automatically download a csv of
those records. The resulting csv will contain several fields/headers:
lccn, title, place of publication, start year, end year, publisher,
edition, frequency, subject, state, city, country, language, oclc
number, and holding type. I think these fields include the information
you were looking for. But, again, I would like to stress that you put
limits on your search before creating the csv so as not to overwhelm the
system.
Please let me know if you have any other additional questions.
Best wishes,
Kerry Huller
Newspaper & Current Periodical Reading Room
Serial & Government Publications Division
Library of Congress
------------------------------------------------------------------------
Newspapers and Current Periodicals Reference Librarian
Jun 23 2022, 01:55pm via System
Mr. Graves,
I'm going to transfer your request to a member of our digital collections
team who may be of more assistance to you than me.
Mike
------------------------------------------------------------------------
Newspapers and Current Periodicals Reference Librarian
Jun 23 2022, 01:51pm via Email
Dear Mr. Queen:
Thanks for the reply. I'm still confused. I downloaded and
installed Docker Desktop and "docker-compose.yml" and ran their "Getting
Started" Tutorial, but I don't see what to do next.
I repeat: I'd like to analyze "U.S. Newspaper Directory,
1690-Present" (https://chroniclingamerica.loc.gov/search/titles/), which
------------------------------------------------------------------------
Newspapers and Current Periodicals Reference Librarian
Jun 22 2022, 07:15pm via System
Mr. Graves,
Programmatic access to the data for Chronicling America
<https://chroniclingamerica.loc.gov/> and possibly the U.S. Newspaper
Directory <https://chroniclingamerica.loc.gov/search/titles/> can be
found on the About the Site and API
<https://chroniclingamerica.loc.gov/about/api/> page in various formats.
Also, please note that Chronicling America contains newspapers published
from 1777-1963, but does not include every U.S. newspaper published in
that time period.
Please let me know if I can be of further assistance.
------------------------------------------------------------------------
Newspapers and Current Periodicals Reference Librarian
Jun 22 2022, 06:14pm via Email
Dear Mr. Queen:
Can we simplify this to just giving me the data behind "U.S.
Newspaper Directory, 1690-Present"
(https://chroniclingamerica.loc.gov/search/titles/) in a machine
readable format, e.g., csv or xlsx or a MySQL database?
As I mentioned in my original email, a naive search of that without
restrictions returned 157520 titles in 7876 pages with up to 20 titles
per page giving date ranges in at least some cases. I could probably
write software to scrape those 7876 pages from your web site and combine
them into a data file.
I have a PhD in statistics, I have been using the R programming
language and similar software for decades. This includes publishing
tutorials on how to analyze data like this on Wikiversity.[1] I'd like
to do something similar with this. I could help make your data more
useful to others and discuss with you how we might prioritize
improvements like accessing the other sources you mentioned.
Thanks very much for your reply.
Sincerely,
Spencer Graves, PhD
Founder, EffectiveDefense.org
4550 Warwick Blvd 508
Kansas City, MO 64111
m: 408-655-4567
[1] e.g.:
https://en.wikiversity.org/wiki/US_Gross_Domestic_Product_(GDP)_per_capita
------------------------------------------------------------------------
Newspapers and Current Periodicals Reference Librarian
Jun 22 2022, 05:27pm via System
Mr. Graves,
Your request is a little more complex than it first appears and requires
extensive research. A variety of resources should be consulted to
determine the circulation statistics of newspapers published prior to
1851. You will need to check newspaper union lists and newspaper
histories. Union lists present lists of newspapers in geographic
arrangement according to place of publication, and specify which
libraries or other institutions hold collections of those newspapers and
the dates of their holdings. These can also be useful for tracking title
changes throughout a newspaper's history. Newspaper
histories like American Journalism: A History: 1690-1960
<https://lccn.loc.gov/62007157> (Mott), The Penny Press
<https://lccn.loc.gov/2004043078> (Thompson), and The Press and America
<https://lccn.loc.gov/99044295> (Emery et al.) may not include
circulation statistics, but they do document the diversity and progress
of newspaper publishing, including notable newspapers of the era.
Newspaper histories also cover the history of the printers and printing
of newspapers in a state, county, or region more generally, and provide
more condensed histories of the editors, journalists, and evolution of
the newspapers in a specific area. Newspaper histories and union lists
should be available at most large public or university libraries. More
information about union lists, newspaper histories, and researching
newspapers in general can be found in the U.S. Newspaper Collections at
the Library of Congress
<https://guides.loc.gov/united-states-newspapers/introduction> research
guide (see Reference Sources).
Please let me know if I can be of further assistance.
------------------------------------------------------------------------
Original Question
Jun 20 2022, 02:34pm via System
How can I get counts of the numbers of newspapers by year in the US, and
preferably also elsewhere?
A search of "U.S. Newspaper Directory, 1690-Present"
(https://chroniclingamerica.loc.gov/search/titles/) returned 157520
titles in 7876 pages with up to 20 titles per page giving date ranges to
the extent that it's known. If I can get a data file (e.g., csv or xls),
I can summarize. I could also use data on circulation and frequency and
especially parent company for multiple newspapers published by the same
company, to the extent that such is available.
I'm interested in this, because McChesney quoted Tocqueville in
suggesting that the US had more newspapers per person (or per million
population) prior to 1851 than at any other time or place in history.
I'd like to evaluate that claim with data to the extent that I can. See
"https://en.wikiversity.org/wiki/Social_construction_of_crime_and_what_we_can_do_about_it#Newspapers_1790_-_present".
Thanks, Spencer Graves, PhD
m: 408-655-4567
------------------------------------------------------------------------
Thank you for using Newspapers & Current Periodicals Ask a Librarian
Service!
This email is sent from Ask a Librarian in relationship to ticket #9625195.
Read our privacy policy. <https://springshare.com/privacy.html>