[R] dealing with a messy dataset
jean-philippe
jeanphilippe.fontaine at gssi.infn.it
Thu Oct 5 18:46:18 CEST 2017
Dear Jim,
Yes, I fixed the problem. Thanks again to all of you for your contributions!
This worked:
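library(readr)
# Start position of each field; the last value closes the final field
start <- c(1, 20, 35, 41, 44, 48, 53, 59, 64, 70, 76, 78, 83, 88,
           93, 114, 122, 127)
# Skip the 70 header lines, then split each record by the field widths
data1 <- read_fwf("lvg_table2.txt", skip = 70, fwf_widths(diff(start)))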
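For anyone reading this later, here is a minimal sketch of what diff(start) hands to fwf_widths(). It uses base R only and reuses the start vector from the call above; the begin/end table is just an illustrative way of checking the widths against the byte-by-byte header.

```r
# Start positions copied from the call above. The final value acts as the
# boundary just past the last field, so 18 starts give 17 widths.
start <- c(1, 20, 35, 41, 44, 48, 53, 59, 64, 70, 76, 78, 83, 88,
           93, 114, 122, 127)

widths <- diff(start)

# Byte range covered by each field, for comparison with the header
layout <- data.frame(begin = head(start, -1),
                     end   = start[-1] - 1,
                     width = widths)
print(layout)
```

The widths add up to 126, so with these values the last field covers bytes 122-126. The header lists Thetaj at bytes 123-127, so the final boundary may need to be 128 if that column looks truncated; I have not checked this against the file itself.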
Well now I know how to deal with fixed-width files :)
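Since only the galaxy name and logMHI are needed, an alternative sketch is to pass the byte ranges from the header (1-18 and 78-82) straight to fwf_positions() and read just those two fields. The two records below are made up for illustration and written to a temporary file; with the real file you would presumably keep skip = 70 as in the call above.

```r
library(readr)

# Two illustrative records: Name in bytes 1-18, filler in bytes 19-77,
# logMHI in bytes 78-82. The values are placeholders, not real data.
lines <- c(sprintf("%-18s%59s%5.2f", "WLM", "", 7.84),
           sprintf("%-18s%59s%5.2f", "And XVIII", "", 6.65))
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Read only the two fields of interest, using the header's byte ranges
gal <- read_fwf(tmp,
                fwf_positions(start = c(1, 78),
                              end = c(18, 82),
                              col_names = c("Name", "logMHI")),
                col_types = "cd")
print(gal)
```

Reading by position avoids having to get every intermediate width right, and the columns arrive already named, so no later select() step is needed.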
Cheers
Jean-Philippe
On 05/10/2017 18:42, jim holtman wrote:
> You should be able to use that header information to create the
> correct parameters to the read_fwf function to read in the data.
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
>
> On Thu, Oct 5, 2017 at 11:02 AM, jean-philippe
> <jeanphilippe.fontaine at gssi.infn.it> wrote:
>> dear Jim,
>>
>> Thanks for your reply and your proposition.
>>
>> I forgot to provide the header of the dataframe, here it is:
>> ================================================================================
>> Byte-by-byte Description of file: lvg_table2.dat
>> --------------------------------------------------------------------------------
>>   Bytes Format Units       Label    Explanations
>> --------------------------------------------------------------------------------
>>   1- 18 A18    ---         Name     Galaxy name in well-known catalogs
>>  20- 21 I2     h           RAh      Hour of Right Ascension (J2000)
>>  22- 23 I2     min         RAm      Minute of Right Ascension (J2000)
>>  24- 27 F4.1   s           RAs      Second of Right Ascension (J2000)
>>      28 A1     ---         DE-      Sign of the Declination (J2000)
>>  29- 30 I2     deg         DEd      Degree of Declination (J2000)
>>  31- 32 I2     arcmin      DEm      Arcminute of Declination (J2000)
>>  33- 34 I2     arcsec      DEs      Arcsecond of Declination (J2000)
>>  36- 40 F5.2   kpc         a26      ? Major linear diameter (1)
>>  42- 43 I2     deg         inc      ? Inclination
>>  45- 47 I3     km/s        Vm       ? Amplitude of rotational velocity (2)
>>  49- 52 F4.2   mag         AB       ? Internal B band extinction (3)
>>  54- 58 F5.1   mag         BMag     ? Absolute B band magnitude (4)
>>  60- 63 F4.1   mag/arcsec2 SBB      ? Average B band surface brightness (5)
>>  65- 69 F5.2   [solLum]    logKLum  ? Log K_S_ band luminosity (6)
>>  71- 75 F5.2   [solMass]   logM26   ? Log mass within Holmberg radius (7)
>>      77 A1     ---         l_logMHI Limit flag on logMHI
>>  78- 82 F5.2   [solMass]   logMHI   ? Log hydrogen mass (8)
>>  84- 87 I4     km/s        VLG      ? Radial velocity (9)
>>  89- 92 F4.1   ---         Theta1   ? Tidal index (10)
>>  94-116 A23    ---         MD       Main disturber name (11)
>> 118-121 F4.1   ---         Theta5   ? Another tidal index (12)
>> 123-127 F5.2   [-]         Thetaj   ? Log K band luminosity density (13)
>> --------------------------------------------------------------------------------
>>
>> My goal is to select only the galaxy name and the logMHI value for each
>> galaxy, which is a simple job once the dataset is tidy. As usual, I was
>> planning to use select() from dplyr.
>> That is why I was asking how to read this kind of file, which I have not
>> come across before.
>>
>> Your suggestion formats most of the columns correctly, except for a few.
>> I will see how to adjust some of the widths to get them right:
>>
>> X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
>> X11 X12 X13 X14 X15 X16 X17
>> (chr) (chr) (dbl) (int) (dbl) (dbl) (chr) (dbl) (chr)
>> (chr) (int) (chr) (chr) (chr) (chr) (dbl) (chr)
>> 1 UGC12894 000022.5+392944 2.78 33 21 0 -13.3 25.2 7.5 8 8.1
>> 7 7.9 2 61 9 -1. 3 NGC7640 -1 0 0.12
>> 2 WLM 000158.1-152740 3.25 90 22 0 -14.1 24.8 7.7 0 8.2
>> 7 7.8 4 -1 6 0. 0 MESSIER031 0 2 1.75
>> 3 And XVIII 000214.5+450520 0.69 17 9 0 -8.7 26.8 6.4 4 6.7
>> 8 < 6.6 5 -4 4 0. 5 MESSIER031 0 6 1.54
>> 4 PAndAS-03 000356.4+405319 0.10 17 NA 0 -3.6 27.8 4.3 8
>> NA NA NA 2. 8 MESSIER031 2 8 1.75
>> 5 PAndAS-04 000442.9+472142 0.05 22 NA 0 -6.6 23.1 5.5 9
>> NA NA -10 8 2. 5 MESSIER031 2 5 1.75
>> 6 PAndAS-05 000524.1+435535 0.06 31 NA 0 -4.5 25.6 4.7 5
>> NA NA 10 3 2. 8 MESSIER031 2 8 1.75
>> 7 ESO409-015 000531.8-280553 3.00 78 23 0 -14.6 24.1 8.1 0 8.2
>> 5 8.1 0 76 9 -2. 0 NGC0024 -1 5 -2.05
>> 8 AGC748778 000634.4+153039 0.61 70 3 0 -10.4 24.9 6.3 9 5.7
>> 0 6.6 4 48 6 -1. 9 NGC0253 -1 5 -2.72
>> 9 And XX 000730.7+350756 0.20 33 5 0 -5.8 27.1 5.2 6 5.7
>> 0 NA -18 2 2. 4 MESSIER031 2 4 1.75
>>
>>
>> Cheers, thanks again
>>
>>
>> Jean-Philippe
>> On 05/10/2017 16:49, jim holtman wrote:
>>> start <- c(1, 20, 35, 41, 44, 48, 53, 59, 64, 69, 75, 77, 82, 87,
>>>            92, 114, 121, 127)
>>> read_fwf(input, fwf_widths(diff(start)))
>>
>> --
>> Jean-Philippe Fontaine
>> PhD Student in Astroparticle Physics,
>> Gran Sasso Science Institute (GSSI),
>> Viale Francesco Crispi 7,
>> 67100 L'Aquila, Italy
>> Mobile: +393487128593, +33615653774
>>