Discussion:
NetCDF for parallel usage
Samrat Rao
2014-10-01 05:52:18 UTC
Permalink
Hi,

I plan to use netCDF-4 format for the output of my CFD code written in
Fortran 90. The code is written for parallel computation.

I know how to read and write netCDF files from a single processor.

I would like to output the data from my CFD code into a single netCDF file
as opposed to outputting data from each processor individually. I know that
many GCMs use this method.

I am also aware of (but have not used) parallel-netCDF, and am not inclined
to use it unless there is no other option.

Can someone tell me the basic steps for outputting multi-processor data into
a single netCDF file? Or maybe point me to some links so that I can adapt
the approach for my use?

Thanks,
Samrat.
--
Samrat Rao
Research Associate
Engineering Mechanics Unit
Jawaharlal Centre for Advanced Scientific Research
Bangalore - 560064, India
Rob Latham
2014-10-01 15:24:03 UTC
Permalink
Well, even though you are down on parallel-netCDF, you have not hurt my
feelings. I would suggest you take a look at pnetcdf's quick tutorial
-- not for the API calls, but for the discussion of data decomposition.

http://trac.mcs.anl.gov/projects/parallel-netcdf/wiki/QuickTutorial

In particular, see the section on "real parallel I/O on shared files".
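
To give a flavor of what the shared-file approach looks like in netCDF-4's
parallel mode, here is a minimal sketch (untested; the file name, grid size,
and 1-D slab decomposition are placeholders, error checking is omitted, and
it assumes netCDF-4 built against parallel HDF5):

  program write_shared
    use mpi
    use netcdf
    implicit none
    integer, parameter :: nx_global = 1024     ! placeholder global grid size
    integer :: ierr, rank, nprocs, ncid, dimid, varid
    integer :: nx_local, x0
    real, allocatable :: u(:)

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

    ! 1-D slab decomposition: each rank owns one contiguous slab
    ! (assumes nx_global divides evenly among the ranks)
    nx_local = nx_global / nprocs
    x0 = rank * nx_local + 1
    allocate(u(nx_local))
    u = real(rank)

    ! every rank opens the same file; comm/info make it a parallel create
    ierr = nf90_create('field.nc', ior(NF90_NETCDF4, NF90_MPIIO), ncid, &
                       comm=MPI_COMM_WORLD, info=MPI_INFO_NULL)
    ierr = nf90_def_dim(ncid, 'x', nx_global, dimid)
    ierr = nf90_def_var(ncid, 'u', NF90_REAL, (/ dimid /), varid)
    ierr = nf90_enddef(ncid)

    ! collective access is usually the fast path on parallel file systems
    ierr = nf90_var_par_access(ncid, varid, NF90_COLLECTIVE)

    ! each rank writes only its own slab of the one global variable
    ierr = nf90_put_var(ncid, varid, u, start=(/ x0 /), count=(/ nx_local /))

    ierr = nf90_close(ncid)
    call MPI_Finalize(ierr)
  end program write_shared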

==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
Rob Latham
2014-10-02 14:22:14 UTC
Permalink
Post by Samrat Rao
Thanks for your replies.
I estimate that I will be requiring approximately 4,000 processors and a
total grid of about 2.5 billion points for my F90 code. So I need to
think about which is better: parallel-netCDF or the 'normal' one.
There are a few specific nifty features in pnetcdf that can let you get
really good performance, but 'normal' netCDF is a fine choice, too.
Post by Samrat Rao
Right now I do not know how to use parallel-netCDF.
It's almost as simple as replacing every 'nf' call with 'nfmpi', but you
will be just fine if you stick with UCAR netCDF-4.
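
To make that concrete, the same slab write through pnetcdf's Fortran
interface looks roughly like this (a sketch from memory, untested; note the
MPI_OFFSET_KIND integers and the communicator passed straight into the
create call):

  program write_pnetcdf
    use mpi
    implicit none
    include 'pnetcdf.inc'                    ! declares the nfmpi_* interface
    integer, parameter :: nx_local = 16      ! placeholder per-rank slab size
    integer :: ierr, rank, nprocs, ncid, dimid, varid
    integer(kind=MPI_OFFSET_KIND) :: nx_global, start(1), count(1)
    real :: u(nx_local)

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
    u = real(rank)
    nx_global = int(nx_local, MPI_OFFSET_KIND) * nprocs

    ! the communicator and MPI info object go into the create call itself
    ierr = nfmpi_create(MPI_COMM_WORLD, 'field.nc', NF_CLOBBER, &
                        MPI_INFO_NULL, ncid)
    ierr = nfmpi_def_dim(ncid, 'x', nx_global, dimid)
    ierr = nfmpi_def_var(ncid, 'u', NF_REAL, 1, (/ dimid /), varid)
    ierr = nfmpi_enddef(ncid)

    ! offsets are 1-based and must be MPI_OFFSET_KIND integers
    start(1) = int(rank, MPI_OFFSET_KIND) * nx_local + 1
    count(1) = nx_local
    ierr = nfmpi_put_vara_real_all(ncid, varid, start, count, u)

    ierr = nfmpi_close(ncid)
    call MPI_Finalize(ierr)
  end program write_pnetcdf
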
Post by Samrat Rao
Secondly, I hope that the netCDF-4 files created by either parallel-netCDF
or the 'normal' one are mutually compatible. For analysis I will be
extracting data using the usual netCDF library, so if I use parallel-netCDF
there should be no inter-compatibility issues.
For truly large variables, parallel-netcdf introduced, with some
consultation from the UCAR folks, a 'CDF-5' file format. You have to
request it explicitly, and then in that one case you would have a
pnetcdf file that netcdf tools would not understand.

In all other cases, we work hard to keep pnetcdf and "classic" netcdf
compatible. UCAR netCDF has the option of an HDF5-based backend -- in fact
it's not optional if you want parallel I/O with netCDF-4 -- and that format
is not compatible with parallel-netcdf. By now, your analysis tools surely
have been updated to understand the new HDF5-based backend?

I suppose it's possible you've got some six-year-old analysis tool that
does not understand netCDF-4's HDF5-based file format. Parallel-netcdf
would allow you to simulate with parallel I/O and produce a classic
netCDF file. But I would be shocked and a little bit angry if that was
actually a good reason to use parallel-netcdf in 2014.


==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
Samrat Rao
2014-10-17 12:33:04 UTC
Permalink
Hi,

I'm sorry for the late reply.

I have no classic/netCDF-3 datasets --- the datasets are yet to be
generated. All my codes are also new.

Initially I tried with pnetcdf, wrote a few variables, but found that the
format was CDF-5, which 'normal' netCDF would not read.

I also need to read some bits of netCDF data in Matlab, so I thought of
sticking to the usual netCDF-4 compiled for parallel I/O. It is also likely
that I will have to share my workload with others in my group and/or leave
the code for future people to work on.

Does Matlab read CDF-5 files?

So I preferred the usual netCDF. Rob, I hope you are not annoyed.

But most of the above is for another day. Currently I am stuck elsewhere.

With a smaller number of processors, 216, the single netCDF file gets
created (I create a single netCDF file for each variable), but for anything
above that I get these errors:
NetCDF: Bad chunk sizes.
Not sure where these errors come from.

Then I shifted to dumping output from each processor in simple binary ---
this works up to about 1,500 processors. Above this number the code gets
stuck and eventually aborts.

This issue is not new. My colleague, too, had problems running his code
on 1,500+ procs.

Today I learned that opening a large number of files (each proc writes
one file) can overwhelm the system --- solving this requires more than
rudimentary writing techniques, or at least an understanding of the
system's inherent parameters/bottlenecks.

So netCDF is probably out of reach for now --- I will try again once the
simple binary write from each processor gets sorted out.

Does anyone have any suggestion?

Thanks,
Samrat.
--
Samrat Rao
Research Associate
Engineering Mechanics Unit
Jawaharlal Centre for Advanced Scientific Research
Bangalore - 560064, India
Rob Latham
2014-10-17 14:23:33 UTC
Permalink
Post by Samrat Rao
Initially I tried with pnetcdf, wrote a few variables, but found that
the format was CDF-5, which 'normal' netCDF would not read.
Yes, sorry about that. We're working on improving CDF-5 support, but
it's not quite there yet.
Post by Samrat Rao
I also need to read some bits of netCDF data in Matlab, so I thought of
sticking to the usual netCDF-4 compiled for parallel I/O. It is also
likely that I will have to share my workload with others in my group
and/or leave the code for future people to work on.
Does Matlab read CDF-5 files?
So I preferred the usual netCDF. Rob, I hope you are not annoyed.
Sure, no problem. You need to tackle the data decomposition problem
first anyway.
Post by Samrat Rao
But most of the above is for another day. Currently I am stuck elsewhere.
With a smaller number of processors, 216, the single netCDF file gets
created (I create a single netCDF file for each variable), but for
anything above that I get these errors:
NetCDF: Bad chunk sizes.
Not sure where these errors come from.
That's an unusual error. If you simplify this a bit and have 216
processors writing a single variable, will you still get this error?
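
One thing worth trying while you narrow it down: set the chunk sizes
explicitly instead of taking the library defaults, e.g. one chunk per
process-local block. A sketch, replacing the nf90_def_var call from the
earlier slab example (the dimension ids and local sizes are placeholders):

  ! hypothetical 3-D field, chunked to match each rank's block
  ierr = nf90_def_var(ncid, 'u', NF90_DOUBLE, (/ xdim, ydim, zdim /), varid, &
                      chunksizes=(/ nx_local, ny_local, nz_local /))
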
Post by Samrat Rao
Then I shifted to dumping output from each processor in simple binary
--- this works up to about 1,500 processors. Above this number the code
gets stuck and eventually aborts.
Gets stuck how?
Post by Samrat Rao
This issue is not new. My colleague, too, had problems running his
code on 1,500+ procs.
Today I learned that opening a large number of files (each proc
writes one file) can overwhelm the system --- solving this requires more
than rudimentary writing techniques, or at least an understanding of the
system's inherent parameters/bottlenecks.
Well, it's not quite that simple. Some poorly designed file systems
require one file per process. GPFS, for a counter-example, will spend
quite a long time on file creation if you try to create one file per
process.
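
If raw per-process binary is the comparison point, note that the
shared-file idea also works one level down, at the MPI-IO layer: one file,
every rank writing at its own byte offset, collectively. A sketch (assumes
a 1-D slab of default reals, 4 bytes each; error checking omitted):

  program write_mpiio
    use mpi
    implicit none
    integer, parameter :: nx_local = 16      ! placeholder per-rank slab size
    integer :: ierr, rank, fh
    integer(kind=MPI_OFFSET_KIND) :: disp
    real :: u(nx_local)

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    u = real(rank)

    call MPI_File_open(MPI_COMM_WORLD, 'field.dat', &
                       ior(MPI_MODE_WRONLY, MPI_MODE_CREATE), &
                       MPI_INFO_NULL, fh, ierr)
    ! each rank writes its slab at its own offset in the one shared file
    disp = int(rank, MPI_OFFSET_KIND) * nx_local * 4   ! 4 bytes per real
    call MPI_File_write_at_all(fh, disp, u, nx_local, MPI_REAL, &
                               MPI_STATUS_IGNORE, ierr)
    call MPI_File_close(fh, ierr)
    call MPI_Finalize(ierr)
  end program write_mpiio
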
Post by Samrat Rao
So netCDF is probably out of reach for now --- I will try again once the
simple binary write from each processor gets sorted out.
Does anyone have any suggestion?
Yes, you need to re-think your data decomposition. Instead of one file
per process, partition your array. You can do this face-wise,
pencil-wise, subcube-wise... it really depends on your science demands.

(for example, a volume renderer will not do well with face-wise
decomposition: it requires a 3d region for the math to work out)

Also, ensure you are using MPI-IO, and collective MPI-IO at that.
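
For the subcube case, the start/count each rank passes to the shared
variable falls straight out of a Cartesian communicator. A fragment,
reusing the earlier netCDF-4 setup (assumes the global grid nglobal(3)
divides evenly by the process grid, and that u holds the rank's 3-D block):

  integer :: dims(3), coords(3), cart_comm
  logical :: periods(3)
  integer :: nglobal(3), nlocal(3), start(3)

  dims = 0; periods = .false.
  call MPI_Dims_create(nprocs, 3, dims, ierr)
  call MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, .true., &
                       cart_comm, ierr)
  call MPI_Cart_coords(cart_comm, rank, 3, coords, ierr)

  nlocal = nglobal / dims              ! per-rank subcube size
  start  = coords * nlocal + 1         ! 1-based netCDF offsets

  ! collective MPI-IO under the hood, as recommended above
  ierr = nf90_var_par_access(ncid, varid, NF90_COLLECTIVE)
  ierr = nf90_put_var(ncid, varid, u, start=start, count=nlocal)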

==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
Ed Hartnett
2014-10-17 16:25:07 UTC
Permalink
Unless things have changed since my day, it is possible to read pnetcdf
files with the netCDF library. It must be built with --enable-pnetcdf and
--with-pnetcdf=/some/location, IIRC.
Rob Latham
2014-10-17 16:30:03 UTC
Permalink
Ed!

In this case, Samrat Rao was using pnetcdf to create CDF-5 (giant
variable) formatted files. To refresh your memory, Argonne and
Northwestern developed this file format with UCAR's signoff, with the
understanding that we (ANL and NWU) would never expect UCAR to add
support for it unless we did the work. I took a stab at it a few years
back and Wei-keng is taking a second crack at it right now.

The classic file formats CDF-1 and CDF-2 are fully interoperable
between pnetcdf and netCDF.
==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
Samrat Rao
2014-10-18 09:39:05 UTC
Permalink
Hi Rob & Ed,

I think that the machine I am using is not that bad. It was commissioned in
'12. Some basic info:

Performance
360 TFLOPS Peak & 304 TFLOPS sustained on LINPACK
Hardware
HP blade system C7000 with BL460c Gen8 blades
1088 nodes with 300 GB disk/node (319 TB)
2,176 Intel Xeon E5-2670 processors @ 2.6 GHz
17,408 processor cores, 68 TB main memory
FDR Infiniband based fully non-blocking fat-tree topology
2 PB high-performance storage with Lustre parallel file system

----

Using netCDF configured for parallel applications, I did manage to write
data to a single netCDF file using 512 procs --- but this was when I
reduced the grid nodes per proc to about 20-30. When I increased the grid
nodes to about 100, I got this error too:

NetCDF: HDF error

----

There is another issue I need to share --- while compiling netCDF-4 for
parallel usage, I encountered errors during 'make check' in these
files: run_par_test.sh, run_f77_par_test.sh and run_f90_par_test.sh.

These were related to mpiexec commands --- an mpd.hosts issue. These errors
did not occur when I compiled netCDF for parallel on my desktop.

----

Dumping output from each processor gave me these errors --- not all such
errors appear together; they are a bit random.

[proxy:0:***@cn0083] HYDT_bscu_wait_for_completion
(./tools/bootstrap/utils/bscu_wait.c:73): one of the processes terminated
badly; aborting
[proxy:0:***@cn0083] HYDT_bsci_wait_for_completion
(./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for
completion
[proxy:0:***@cn0083] HYD_pmci_wait_for_childs_completion
(./pm/pmiserv/pmip_utils.c:1476): bootstrap server returned error waiting
for completion
[proxy:0:***@cn0083] main (./pm/pmiserv/pmip.c:392): error waiting for event
children completion

[***@cn0002] control_cb (./pm/pmiserv/pmiserv_cb.c:674): assert
(!closed) failed
[***@cn0002] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[***@cn0002] HYD_pmci_wait_for_completion
(./pm/pmiserv/pmiserv_pmci.c:388): error waiting for event
[***@cn0002] main (./ui/mpich/mpiexec.c:718): process manager error
waiting for completion

cn0137:b279:beba2700: 132021042 us(132021042 us!!!): ACCEPT_RTU: rcv ERR,
rcnt=0 op=1 <- 10.1.1.136
cn1068:48c5:4b280700: 132013538 us(132013538 us!!!): ACCEPT_RTU: rcv ERR,
rcnt=-1 op=1 <- 10.1.5.47
cn1075:dba3:f8d7700: 132099675 us(132099675 us!!!): CONN_REQUEST: SOCKOPT
ERR Connection refused -> 10.1.1.51 16193 - RETRYING... 5
cn1075:dba3:f8d7700: 132099826 us(151 us): CONN_REQUEST: SOCKOPT ERR
Connection refused -> 10.1.1.51 16193 - RETRYING...4
cn1075:dba3:f8d7700: 132099942 us(116 us): CONN_REQUEST: SOCKOPT ERR
Connection refused -> 10.1.1.51 16193 - RETRYING...3
cn1075:dba3:f8d7700: 132100049 us(107 us): CONN_REQUEST: SOCKOPT ERR
Connection refused -> 10.1.1.51 16193 - RETRYING...2
cn1075:dba3:f8d7700: 132100155 us(106 us): CONN_REQUEST: SOCKOPT ERR
Connection refused -> 10.1.1.51 16193 - RETRYING...1
cn1075:dba3:f8d7700: 132100172 us(17 us): dapl_evd_conn_cb() unknown event
0x0

----

Rob, I guess I will need to look into the I/O methods you listed.

Thanks for your time,
Samrat.
--
Samrat Rao
Research Associate
Engineering Mechanics Unit
Jawaharlal Centre for Advanced Scientific Research
Bangalore - 560064, India
Samrat Rao
2014-10-18 09:41:08 UTC
Permalink
Ed,

Even if netCDF-4 can access CDF-5 files, it will not help me unless Matlab
can read them --- the other option is to read CDF-5 files using serial
netCDF and then convert them to a form that Matlab can read.

In one of your replies here:
http://netcdf-group.1586084.n2.nabble.com/NetCDF-HDF-error-and-now-what-td6921602.html
you had said that beyond about 100 processors, parallel I/O does not save
significant time compared to sequential I/O, i.e. one master proc doing
the I/O. Is this the case even today?

Am I better off trying to see if netCDF with sequential I/O makes life
simpler for me?
--
Samrat Rao
Research Associate
Engineering Mechanics Unit
Jawaharlal Centre for Advanced Scientific Research
Bangalore - 560064, India
Rob Latham
2014-10-18 18:05:20 UTC
Permalink
Post by Samrat Rao
I think that the machine I am using is not that bad. It was commissioned
in '12. Some basic info:
Performance
360 TFLOPS Peak & 304 TFLOPS sustained on LINPACK
Hardware
HP blade system C7000 with BL460c Gen8 blades
1088 nodes with 300 GB disk/node (319 TB)
17,408 processor cores, 68 TB main memory
FDR Infiniband based fully non-blocking fat-tree topology
2 PB high-performance storage with Lustre parallel file system
OK, then let's work up the software stack.

You've got a Lustre file system, so you're going to need a halfway
decent MPI-IO implementation. Good news: OpenMPI, MPICH, and MVAPICH all
have good Lustre drivers. Please ensure you are running something
close to the latest version. (Sometimes we find users -- somehow --
running ten-year-old MPICH code.)

You need a recent-ish HDF5 library to make full use of the MPI-IO library.

You need the very latest netCDF library for assorted bug fixes (and
compatibility with the latest HDF5 library).
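
As a quick sanity check on what actually got linked, a little program
like this prints the library versions at run time (a sketch;
h5get_libversion_f and nf90_inq_libvers are the calls I mean, assuming
the HDF5 and netCDF Fortran modules are on hand):

  program stack_versions
    use hdf5
    use netcdf
    implicit none
    integer :: majnum, minnum, relnum, herr

    call h5open_f(herr)                          ! initialize HDF5 Fortran
    call h5get_libversion_f(majnum, minnum, relnum, herr)
    print '(a,i0,a,i0,a,i0)', 'HDF5   : ', majnum, '.', minnum, '.', relnum
    print '(2a)', 'netCDF : ', trim(nf90_inq_libvers())
    call h5close_f(herr)
  end program stack_versions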

Debugging this stack over the mailing list is a bit of a challenge.

==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
Ed Hartnett
2014-10-19 02:14:21 UTC
Permalink
WRT performance, there is a point at which you max out your file system.
You might have 10K processors, but you don't have 10K disk drives. So once
you reach the limits of your disk array and internal bandwidth, more
parallelization will not help your overall I/O performance.

Although there are limits to what may be achieved, it is still worth
achieving them, as this may provide an order of magnitude or more overall
performance improvement. But once you saturate your disk array, you will
not see further performance improvements when you add more processors.

Another reason to use the parallel I/O interfaces (either netCDF-4 with
parallel I/O enabled, or pnetcdf) is simplification of code. It is a lot
easier to write the parallel code using netCDF-4 or pnetcdf than to write
code which collects data from all the processes and writes it to a file in
a serial way. By using the parallel interfaces, you get very simple,
natural code, where each process directly writes its data without passing
it to another process.

With the parallel interfaces, you have to pay attention to collective vs.
independent in order to get good performance. See the docs for more.
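
Concretely, in the netCDF-4 Fortran interface the switch is per variable
(a fragment; independent access is the default):

  ! collective: all ranks participate in every put/get on this variable;
  ! usually the fast path on Lustre or GPFS
  ierr = nf90_var_par_access(ncid, varid, NF90_COLLECTIVE)

  ! independent: each rank does I/O on its own; flexible, often slower
  ierr = nf90_var_par_access(ncid, varid, NF90_INDEPENDENT)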

Good luck!
Ed