*) PREREQUISITES:
python2.6
python modules:
lxml
mx.DateTime or egenix_mx_base
rdflib 3
pysolr
simplejson
(Use pip or easy_install)
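A quick way to check that the Python prerequisites are in place is to try
importing each one. This is only a sketch and is not part of the semflow
repository: the helper name missing_modules is made up for illustration, and
the import names are assumed to match the module names listed above.

```python
def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


# Import names assumed from the prerequisites list above.
REQUIRED = ["lxml", "mx.DateTime", "rdflib", "pysolr", "simplejson"]
```

Calling missing_modules(REQUIRED) should return an empty list once everything
is installed; anything it does return still needs pip or easy_install.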
*) BASIC INSTALL:
mkdir ads-stuff
git clone git@github.com:rahuldave/semflow.git
rsync -avz ~/Dropbox/AstroExplorer .
If Installing AppSem:
Install node.js 0.4.8
install modules using npm (which therefore also needs to be installed):
connect connect-redis hiredis mime mustache redis qs
git clone git@github.com:rahuldave/appsem.git
cd appsem
git submodule init
git submodule update
In appsem run node server.js and point at
http://localhost:3000/semantic2/alpha/explorer/publications/
or open
appsem/static/ajax-solr/publications.html
in your browser (this will only work if you have the webapp up and
running on port 8983, as shown below).
*) VENDOR INSTALLS:
mkdir ads-stuff/vendor
cd ads-stuff/vendor
(a) Install Solr
tar zxvf ../AstroExplorer/vendor/apache-solr-3.2.0.tgz
cp -a apache-solr-3.2.0/example container
(b) Install Sesame
tar zxvf ../AstroExplorer/vendor/openrdf-sesame-2.4.0-sdk.tar.gz
cp openrdf-sesame-2.4.0/war/openrdf-* container/webapps/
(c) Install Redis
tar zxvf ../AstroExplorer/vendor/redis-2.2.10.tar.gz
cd redis-2.2.10
make; sudo make install
/usr/local/bin/redis-server (if installed)
redis-cli can be used to check it
(d) Start Sesame and Solr on port 8983:
pushd container/solr/conf
mv solrconfig.xml solrconfig.xml.bak
mv schema.xml schema.xml.bak
# Link to configuration files in semflow repository
ln -s ../../../../semflow/solr/schema.xml .
ln -s ../../../../semflow/solr/solrconfig.xml .
popd
cd container
#Now back in container directory
java -jar start.jar
# With more memory for the heap, use instead:
java -Xmx4g -jar start.jar
Solr admin is at:
http://localhost:8983/solr/admin/
Sesame Workbench is at:
http://localhost:8983/openrdf-workbench/
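To confirm that the container really is serving both web applications, the
two URLs above can be probed from Python. This is a rough sketch under the
assumption that both pages answer a plain GET with HTTP 200; the function
names service_urls and is_up are invented for this example.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def service_urls(host="localhost", port=8983):
    """Build the Solr admin and Sesame Workbench URLs quoted in this document."""
    base = "http://%s:%d" % (host, port)
    return {
        "solr": base + "/solr/admin/",
        "sesame": base + "/openrdf-workbench/",
    }


def is_up(url, timeout=5):
    """Return True if the URL answers with HTTP 200, False on any error."""
    try:
        response = urlopen(url, timeout=timeout)
        try:
            return response.getcode() == 200
        finally:
            response.close()
    except Exception:
        return False
```

Looping over service_urls().items() and calling is_up on each URL gives a
simple up/down report after running java -jar start.jar.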
*) CREATION OR CLEANING OF THE DATABASES:
We have three data stores that either need creating or cleaning:
(a) on disk storage
This is in the chandra-rdf/ and mast-rdf/ directories created in the
SET UP FOR DATA INSTALL step below.
(b) Sesame database
This can be created (or deleted and then created) using either the
Sesame workbench interface at
http://localhost:8983/openrdf-workbench/
or using the console.sh script at
vendor/openrdf-sesame-2.4.0/bin/console.sh
The repository should have
Type = Native Java Store
Id = testads8
Title = testads8
and accept the default value for the triple indexes (which will be
"spoc,posc").
Here's a run-through using the command-line interface, which is
described at
http://www.openrdf.org/doc/sesame2/users/ch07.html
% vendor/openrdf-sesame-2.4.0/bin/console.sh
03:11:25.843 [main] DEBUG info.aduna.platform.PlatformFactory - os.name = mac os x
03:11:25.848 [main] DEBUG info.aduna.platform.PlatformFactory - Detected Mac OS X platform
Connected to default data directory
Commands end with '.' at the end of a line
Type 'help.' for help
> connect http://localhost:8983/openrdf-sesame.
Disconnecting from default data directory
Connected to http://localhost:8983/openrdf-sesame
> show repositories.
+----------
|SYSTEM ("System configuration repository")
+----------
> create native.
Please specify values for the following variables:
Repository ID [native]: testads8
Repository title [Native store]: testads8
Triple indexes [spoc,posc]:
Repository created
> show repositories.
+----------
|SYSTEM ("System configuration repository")
|testads8 ("testads8")
+----------
> quit.
Disconnecting from http://localhost:8983/openrdf-sesame
Bye
The database should be deleted and re-created if re-running the
scripts. This can be done from the web page or from within the console
by saying
connect http://localhost:8983/openrdf-sesame.
drop testads8.
create native.
...
You may want to try physically deleting the files from disk after the
drop but before re-creating: on OS-X this worked for me:
rm -rf ~/Library/Application\ Support/Aduna/OpenRDF\ Sesame/repositories/testads8/
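The repository list can also be checked over HTTP rather than through the
console. The sketch below assumes, as we understand the Sesame 2 HTTP
protocol, that GET <server>/repositories returns SPARQL JSON results with an
"id" binding per repository when asked for application/sparql-results+json.
The function names are made up for this example and the response shape
should be verified against your server.

```python
import json

try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2

SESAME_URL = "http://localhost:8983/openrdf-sesame"


def repository_ids(results):
    """Extract repository ids from a parsed SPARQL JSON results document."""
    bindings = results.get("results", {}).get("bindings", [])
    return [row["id"]["value"] for row in bindings if "id" in row]


def fetch_repository_ids(server=SESAME_URL):
    """Ask the Sesame server for its repository list (needs the server running)."""
    request = Request(server + "/repositories",
                      headers={"Accept": "application/sparql-results+json"})
    response = urlopen(request)
    try:
        return repository_ids(json.loads(response.read().decode("utf-8")))
    finally:
        response.close()
```

After the create step, "testads8" in fetch_repository_ids() should be true;
after a drop it should be false again.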
(c) Solr database
It looks like there's no need to create a database for Solr. If
re-running the scripts then the existing data needs to be removed
which is achieved with the solrclear.py script:
cd semflow
python solrclear.py
There is also
python solropt.py
which optimizes the Solr database; this is mainly useful to reduce the number of open
files (so allowing more data to be added) but it may improve search times somewhat.
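For reference, clearing and optimizing a Solr index comes down to posting
small XML messages to the update handler. The sketch below shows that idea
using only the standard library; it is not the contents of solrclear.py or
solropt.py (which may well use pysolr instead), and the helper names and the
update URL are assumptions based on the port used in this document.

```python
try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2

# Assumed location of the update handler for the example Solr install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"

# Standard Solr XML update messages.
DELETE_ALL = "<delete><query>*:*</query></delete>"
COMMIT = "<commit/>"
OPTIMIZE = "<optimize/>"


def build_update_request(message, url=SOLR_UPDATE_URL):
    """Wrap an XML update message in a POST request for Solr."""
    return Request(url, data=message.encode("utf-8"),
                   headers={"Content-Type": "text/xml; charset=utf-8"})


def send(message, url=SOLR_UPDATE_URL):
    """POST one update message to Solr and return the raw response body."""
    response = urlopen(build_update_request(message, url))
    try:
        return response.read()
    finally:
        response.close()


def clear_index():
    """Delete every document, then commit so the change becomes visible."""
    send(DELETE_ALL)
    send(COMMIT)
```

Calling send(OPTIMIZE) would be the rough equivalent of the optimize step.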
*) SETUP FOR DATA INSTALL:
Depending on whether you are on a Mac or not, the directory will be named
Chandra or CHANDRA. Make sure one is symbolically linked to the other, as
shown below for Linux:
cd ads-stuff
tar zxvf AstroExplorer/Missions/Chandra/chandra/chda.tgz
pushd AstroExplorer/Missions/
ln -s Chandra CHANDRA
popd
mkdir chandra-rdf
mkdir mast-rdf
cd semflow
export PYTHONPATH=`pwd`
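The Chandra/CHANDRA link can also be created from Python in a way that is
safe on both kinds of file system. This is a small sketch, with the helper
name ensure_alias invented for the example: on a case-insensitive file system
(the usual Mac default) the second name already resolves and nothing is done,
otherwise a relative symlink is added.

```python
import os


def ensure_alias(parent, existing, alias):
    """Make parent/alias resolve to parent/existing, adding a symlink if needed."""
    target = os.path.join(parent, existing)
    link = os.path.join(parent, alias)
    if not os.path.isdir(target):
        raise ValueError("no such directory: %s" % target)
    # On a case-insensitive file system the alias already resolves.
    if not os.path.exists(link):
        os.symlink(existing, link)  # relative link, like `ln -s Chandra CHANDRA`
    return link
```

From the ads-stuff directory this would be called as
ensure_alias("AstroExplorer/Missions", "Chandra", "CHANDRA").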
OTHER THINGS THAT MAY NEED DOING:
(a) increase the Java heap size
(b) raise the ulimits on Ubuntu
*) CHANDRA INSTALL:
To do everything in one go, run the script below. There are similar scripts
for the other missions, but they are not covered in this document. The
scripts log information about each stage to <mission>.log, which makes it
easier to see how long each stage takes and whether any fail.
% ./scripts/doitchandra.sh
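The idea of logging each stage with its duration and outcome can be sketched
in a few lines of Python. This is not a transcription of doitchandra.sh: the
function names and the log line format are invented here, and the real script
may behave differently (for example, it may carry on after a failure).

```python
import subprocess
import time


def run_stage(name, command, log_path):
    """Run one command, append its outcome and duration to the log, return the exit code."""
    start = time.time()
    code = subprocess.call(command)
    elapsed = time.time() - start
    status = "ok" if code == 0 else "FAILED (exit %d)" % code
    log = open(log_path, "a")
    try:
        log.write("%s: %s in %.1fs\n" % (name, status, elapsed))
    finally:
        log.close()
    return code


def run_stages(stages, log_path):
    """Run (name, command) pairs in order, stopping at the first failure."""
    for name, command in stages:
        if run_stage(name, command, log_path) != 0:
            return False
    return True
```

Each stage is a (name, argument list) pair, so a stage named "adsrdf" would
carry the python adsclassic2rdf.py command line as a list of strings.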
Manual steps are:
adsrdf:
python adsclassic2rdf.py ../chandra-rdf ../AstroExplorer/Missions/Chandra/chandra/sherry.p.a.biblist.txt
python adsclassic2rdf.py ../chandra-rdf ../AstroExplorer/Missions/Chandra/chandra/hutoverlap.biblist.txt
simbadrdf:
python simbad2rdf.py ../AstroExplorer/Missions/Chandra/chandra/sherry.p.a.simbad.dict ../chandra-rdf
python simbad2rdf.py ../AstroExplorer/Missions/Chandra/chandra/hutoverlap.simbad.dict ../chandra-rdf
pubrdf:
python chandra/genrdf.py pub ../AstroExplorer/Missions/Chandra/chandra/sherry.p.a.linkedpubs.txt ../chandra-rdf/
python chandra/genrdf.py pub ../AstroExplorer/Missions/Chandra/chandra/hutoverlap.linkedpubs.txt ../chandra-rdf/
obsvrdf:
python chandra/genrdf.py obsv ../AstroExplorer/Missions/Chandra/chandra/global.obsids.txt ../chandra-rdf/
proprdf:
python chandra/genrdf.py prop ../AstroExplorer/Missions/Chandra/chandra/global.proposals.txt ../chandra-rdf/
adsload:
python loadfiles.py ../AstroExplorer/Missions/Chandra/chandra/sherry.p.a.biblist.txt
python loadfiles.py ../AstroExplorer/Missions/Chandra/chandra/hutoverlap.biblist.txt
simbadload:
python loadfiles-simbad.py ../AstroExplorer/Missions/Chandra/chandra/sherry.p.a.biblist.txt
python loadfiles-simbad.py ../AstroExplorer/Missions/Chandra/chandra/hutoverlap.biblist.txt
pubload:
python chandra/loadfiles.py ../AstroExplorer/Missions/Chandra/chandra/sherry.p.a.linkedpubs.txt pub
python chandra/loadfiles.py ../AstroExplorer/Missions/Chandra/chandra/hutoverlap.linkedpubs.txt pub
obsvload:
python chandra/loadfiles.py ../AstroExplorer/Missions/Chandra/chandra/global.obsids.txt obsv
propload:
python chandra/loadfiles.py ../AstroExplorer/Missions/Chandra/chandra/global.proposals.txt prop
# We had to produce a cut file below due to some linkage problems in Chandra
pubsolr:
#python solrclear.py #is doing this first
python rdf2solr5.py CHANDRA chandra ../AstroExplorer/Missions/Chandra/chandra/sherry.p.a.biblist.txt.cut
python rdf2solr5.py CHANDRA chandra ../AstroExplorer/Missions/Chandra/chandra/hutoverlap.biblist.txt
MAST:
For each mission there needs to be a file
mast/ingest_<mission>.py
newmast/mast_proprdf_<mission>.py
(only if there are proposals for the mission)
In most cases this can be simple, such as mast/ingest_wuppe.py, but it
can get complex (e.g. hut).
In order to match bibcodes, datasets and possibly proposals, you need
to understand how the obsid value in the map.<mission>.txt file
relates to the obsid values from the obscore.<mission>.psv file. In
most cases the map file just gives the prefix of the obscore value
(hopefully a unique prefix) but there are cases where more
manipulation is needed (e.g. case conversion), which should be done by
the getObsidForPubMap routine. This routine takes in the obsid value which we
use to create the URI for the object, so essentially the obscore
value, and converts it to a form that can be compared to the map
version via
if getObsidForPubMap(...).startswith(obsid-from-map-file):
start processing
Note that Doug has messed around with this flow slightly and now made
the comparison case insensitive, so you should not need to do case
conversion in the ingest_<mission>.py file, but this change may be
backed out.
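The prefix comparison described above can be sketched as follows. The names
here are stand-ins: get_obsid_for_pub_map plays the role of a mission's
getObsidForPubMap routine (the real ones live in the ingest_<mission>.py
files and may do more than lower-casing), and matching_obsids is a
hypothetical helper that applies the case-insensitive startswith test.

```python
def get_obsid_for_pub_map(obscore_obsid):
    """Hypothetical per-mission hook: normalise an obscore obsid for comparison."""
    return obscore_obsid.lower()


def matching_obsids(map_obsid, obscore_obsids, normalise=get_obsid_for_pub_map):
    """Return the obscore obsids whose normalised form starts with the map-file value."""
    prefix = map_obsid.lower()
    return [obsid for obsid in obscore_obsids
            if normalise(obsid).lower().startswith(prefix)]
```

If the map-file value is not a unique prefix, the returned list will contain
obsids from more than one observation, which is worth checking for when
adding a new mission.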
Ok, doing HUT
--------------
1992BAAS...24.1285L
1992AAS...18110208L
BAAS error, need to use the ADS bibcode synonym file
No output, need to fix first.
Multiple fixes were needed. We corrected these at the file level with the new alts.py in the scripts folder.
Also see errors.txt in hut, where we had to excise one non-existent bibcode.
adsrdf:
python adsclassic2rdf.py ../mast-rdf ../AstroExplorer/Missions/MAST/hut/hut.biblist.txt
simbadrdf:
python simbad2rdf.py ../AstroExplorer/Missions/MAST/hut/hut.simbad.dict ../mast-rdf
# Order is important here
obsvrdf:
python newmast/mast_obsvrdf.py hut ../AstroExplorer/Missions/MAST/hut/obscore.hut.psv
pubrdf:
python newmast/mast_pubrdf.py hut ../AstroExplorer/Missions/MAST/hut/map.hut.txt
proprdf:
echo None
adsload:
python loadfiles.py ../AstroExplorer/Missions/MAST/hut/hut.biblist.txt default2.conf
simbadload:
python loadfiles-simbad.py ../AstroExplorer/Missions/MAST/hut/hut.biblist.txt default2.conf
obsvload:
python newmast/mast_obsvload.py hut
pubload:
python newmast/mast_pubload.py hut
propload:
echo None
pubsolr:
python rdf2solr5.py MAST hut ../AstroExplorer/Missions/MAST/hut/hut.biblist.txt
WUPPE
adsrdf:
python adsclassic2rdf.py ../mast-rdf ../AstroExplorer/Missions/MAST/wuppe/wuppe.biblist.txt
simbadrdf:
python simbad2rdf.py ../AstroExplorer/Missions/MAST/wuppe/wuppe.simbad.dict ../mast-rdf
# Order is important here
obsvrdf:
python newmast/mast_obsvrdf.py wuppe ../AstroExplorer/Missions/MAST/wuppe/obscore.wuppe.psv
pubrdf:
python newmast/mast_pubrdf.py wuppe ../AstroExplorer/Missions/MAST/wuppe/map.wuppe.txt
proprdf:
echo None
adsload:
python loadfiles.py ../AstroExplorer/Missions/MAST/wuppe/wuppe.biblist.txt default2.conf
simbadload:
python loadfiles-simbad.py ../AstroExplorer/Missions/MAST/wuppe/wuppe.biblist.txt default2.conf
obsvload:
python newmast/mast_obsvload.py wuppe
pubload:
python newmast/mast_pubload.py wuppe
propload:
echo None
pubsolr:
python rdf2solr5.py MAST wuppe ../AstroExplorer/Missions/MAST/wuppe/wuppe.biblist.txt
HPOL
adsrdf:
python adsclassic2rdf.py ../mast-rdf ../AstroExplorer/Missions/MAST/hpol/hpol.biblist.txt
simbadrdf:
python simbad2rdf.py ../AstroExplorer/Missions/MAST/hpol/hpol.simbad.dict ../mast-rdf
# Order is important here
obsvrdf:
python newmast/mast_obsvrdf.py hpol ../AstroExplorer/Missions/MAST/hpol/obscore.hpol.psv
pubrdf:
python newmast/mast_pubrdf.py hpol ../AstroExplorer/Missions/MAST/hpol/map.hpol.txt
proprdf:
echo None
adsload:
python loadfiles.py ../AstroExplorer/Missions/MAST/hpol/hpol.biblist.txt default2.conf
simbadload:
python loadfiles-simbad.py ../AstroExplorer/Missions/MAST/hpol/hpol.biblist.txt default2.conf
obsvload:
python newmast/mast_obsvload.py hpol
pubload:
python newmast/mast_pubload.py hpol
propload:
echo None
pubsolr:
python rdf2solr5.py MAST hpol ../AstroExplorer/Missions/MAST/hpol/hpol.biblist.txt
EUVE
adsrdf:
python adsclassic2rdf.py ../mast-rdf ../AstroExplorer/Missions/MAST/euve/euve.biblist.txt
simbadrdf:
python simbad2rdf.py ../AstroExplorer/Missions/MAST/euve/euve.simbad.dict ../mast-rdf
# Order is important here
obsvrdf:
python newmast/mast_obsvrdf.py euve ../AstroExplorer/Missions/MAST/euve/obscore.euve.psv
pubrdf:
python newmast/mast_pubrdf.py euve ../AstroExplorer/Missions/MAST/euve/map.euve.txt
proprdf:
python newmast/mast_proprdf.py euve ../AstroExplorer/Missions/MAST/euve/euve_program.list
adsload:
python loadfiles.py ../AstroExplorer/Missions/MAST/euve/euve.biblist.txt default2.conf
simbadload:
python loadfiles-simbad.py ../AstroExplorer/Missions/MAST/euve/euve.biblist.txt default2.conf
obsvload:
python newmast/mast_obsvload.py euve
pubload:
python newmast/mast_pubload.py euve
propload:
python newmast/mast_propload.py euve
pubsolr:
python rdf2solr5.py MAST euve ../AstroExplorer/Missions/MAST/euve/euve.biblist.txt
FUSE
adsrdf:
python adsclassic2rdf.py ../mast-rdf ../AstroExplorer/Missions/MAST/fuse/fuse.biblist.txt
simbadrdf:
python simbad2rdf.py ../AstroExplorer/Missions/MAST/fuse/fuse.simbad.dict ../mast-rdf
# Order is important here
obsvrdf:
python newmast/mast_obsvrdf.py fuse ../AstroExplorer/Missions/MAST/fuse/obscore.fuse.psv
pubrdf:
python newmast/mast_pubrdf.py fuse ../AstroExplorer/Missions/MAST/fuse/map.fuse.txt
proprdf:
python newmast/mast_proprdf.py fuse ../AstroExplorer/Missions/MAST/fuse/fuse_program.list
adsload:
python loadfiles.py ../AstroExplorer/Missions/MAST/fuse/fuse.biblist.txt default2.conf
simbadload:
python loadfiles-simbad.py ../AstroExplorer/Missions/MAST/fuse/fuse.biblist.txt default2.conf
obsvload:
python newmast/mast_obsvload.py fuse
pubload:
python newmast/mast_pubload.py fuse
propload:
python newmast/mast_propload.py fuse
pubsolr:
python rdf2solr5.py MAST fuse ../AstroExplorer/Missions/MAST/fuse/fuse.biblist.txt
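The MAST missions above all follow the same sequence, differing only in the
mission name and in whether there are proposals (euve and fuse here). The
sketch below generates the command lines for one mission from that pattern.
The function mast_commands is made up for this example; the commands
themselves are copied from the listings above.

```python
MAST_DIR = "../AstroExplorer/Missions/MAST"


def mast_commands(mission, has_proposals=False):
    """Return the command lines for one MAST mission, in the order used above."""
    data = "%s/%s" % (MAST_DIR, mission)
    biblist = "%s/%s.biblist.txt" % (data, mission)
    commands = [
        "python adsclassic2rdf.py ../mast-rdf %s" % biblist,
        "python simbad2rdf.py %s/%s.simbad.dict ../mast-rdf" % (data, mission),
        # Order is important here: observations before publications.
        "python newmast/mast_obsvrdf.py %s %s/obscore.%s.psv" % (mission, data, mission),
        "python newmast/mast_pubrdf.py %s %s/map.%s.txt" % (mission, data, mission),
    ]
    if has_proposals:
        commands.append("python newmast/mast_proprdf.py %s %s/%s_program.list"
                        % (mission, data, mission))
    commands += [
        "python loadfiles.py %s default2.conf" % biblist,
        "python loadfiles-simbad.py %s default2.conf" % biblist,
        "python newmast/mast_obsvload.py %s" % mission,
        "python newmast/mast_pubload.py %s" % mission,
    ]
    if has_proposals:
        commands.append("python newmast/mast_propload.py %s" % mission)
    commands.append("python rdf2solr5.py MAST %s %s" % (mission, biblist))
    return commands
```

Printing the result of mast_commands("fuse", has_proposals=True) line by line
gives the FUSE listing; a new mission would still need its own
mast/ingest_<mission>.py before these commands can work.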