Discussion:
Missing data in netcdf4 using pnetcdf
Ted Mansell
2014-10-20 22:21:07 UTC
Howdy,

The recent discussion has prompted me to ask whether anybody has any idea what could be going on with my own attempt to use pnetcdf from within netcdf4. Since I already had parallel IO working (via HDF5), I thought I would try it with pnetcdf. It seemed that the main change needed was to switch the mode from NF90_NETCDF4 to NF90_PNETCDF, and everything else should just work. (I also turned off the chunking calls, which probably would have thrown an error.) I am also specifying NF90_MPIIO and NF90_COLLECTIVE.

Well, the resulting file has inconsistent missing data. The 3d data are tiled into two domains along the y axis (similar results for tiling along x). The data that do get written appear to be correct, so it seems that the writes are simply incomplete. The left panels are using netcdf4/hdf5 parallel for comparison. I modified nc4_pnc_put_vara.f for F90 calls and a 3d array, but can't get that to fail, so I'm not ready to blame the file system (plain OS X 10.9). The missing data sometimes have a pattern, but one time level out of 5 only had missing data at the highest z level.
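
For readers following along, the tiling described above (each rank writes its own slab along y) can be sketched roughly as follows. This is only an illustration, not the code from the original program: the names tile_range and tile_along_y are hypothetical, and the even split with the remainder going to the lowest ranks is an assumption about how the domain might be divided.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: one rank's slab along the tiled (y) axis. */
typedef struct {
    size_t start;  /* zero-based offset of the first row owned by this rank */
    size_t count;  /* number of rows owned by this rank */
} tile_range;

/* Split ny rows over nprocs ranks as evenly as possible.
 * The first (ny % nprocs) ranks receive one extra row each. */
tile_range tile_along_y(size_t ny, int nprocs, int rank)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);

    size_t base  = ny / (size_t)nprocs;   /* rows every rank gets */
    size_t extra = ny % (size_t)nprocs;   /* leftover rows to hand out */
    size_t r     = (size_t)rank;

    tile_range t;
    t.count = base + (r < extra ? 1 : 0);
    t.start = r * base + (r < extra ? r : extra);
    return t;
}
```

The start and count values would feed the y entry of the start and count arguments of a put_vara-style write. The offsets here are zero-based as in the C interface; the F90 interface uses one-based start indices, so add 1 there.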


[Figure: x-y cross-section (x = left-right). Left: netcdf4/HDF5, Right: netcdf4/pnetcdf]

[Figure: y-z cross-section (y = left-right). Left: netcdf4/HDF5, Right: netcdf4/pnetcdf]

These are the versions of things I'm using:

netcdf-c-4.3.1.1
hdf5 1.8.9
netcdf-fortran-4.4.0
parallel-netcdf-1.4.1

Any ideas? I suppose there must be something in my (somewhat elaborate) code that doesn't cause issues for HDF5 but pnetcdf doesn't tolerate? Probably something "obvious"!

Best regards,

-- Ted

__________________________________________________________
| Edward Mansell <***@noaa.gov>
| National Severe Storms Laboratory
| 120 David L. Boren Blvd.
| Room 4354
| Norman, OK 73072
| Phone: (405) 325-6177
| FAX: (405) 325-2316
|
|----------------------------------------------------------------------------
|
| "The contents of this message are mine personally and
| do not reflect any position of the U.S. Government or NOAA."
|
|----------------------------------------------------------------------------
Wei-keng Liao
2014-10-21 02:15:57 UTC
Hi, Ted,

Both netCDF and PnetCDF are large libraries, so it can be hard
to tell where the problem comes from. One thing you can do is check every
returned error code against NC_NOERR. If you have already done so, would
it be OK for me to test your program? Since you tested it on a Mac,
I assume it is fairly small.
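
The suggestion above, checking every return value against NC_NOERR, might look something like the sketch below. To keep it self-contained it does not include netcdf.h: SKETCH_NOERR is a stand-in for NC_NOERR (which is 0 in netcdf.h), and report_if_error and CHECK are hypothetical names, not part of either library.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for NC_NOERR from netcdf.h, where it is defined as 0. */
#define SKETCH_NOERR 0

/* Report a failed call and return 1, or return 0 on success.
 * With the real library, nc_strerror(status) would give a readable
 * message in place of the raw status number printed here. */
int report_if_error(int status, const char *what, const char *file, int line)
{
    assert(what != NULL && file != NULL);

    if (status == SKETCH_NOERR)
        return 0;

    fprintf(stderr, "%s:%d: %s failed with status %d\n",
            file, line, what, status);
    return 1;
}

/* Wrap a call so that its status is never silently dropped. */
#define CHECK(call) report_if_error((call), #call, __FILE__, __LINE__)
```

In real code each library call would be wrapped, for example CHECK(nc_enddef(ncid)), and the program would abort or clean up when the macro returns nonzero. The F90 equivalents are nf90_noerr and nf90_strerror.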

Is the test file nc4_pnc_put_vara.f the one from the PnetCDF web page?

Wei-keng
Ted Mansell
2014-10-23 15:19:20 UTC
Many thanks to Wei-keng Liao for finding the cause of the missing data. There is a bug in dfile.c where netcdf does not recognize NC_PNETCDF in the cmode when opening a file for writing. (In 4.3.1.1, it is at line 1675 in dfile.c) The test code with netcdf-fortran works because it writes data after the enddef, but before closing the file.

- Ted
I think I found the cause!
One question first. Does your code first create a new nc file, define
dimensions and variables, put attributes, and close the file without calling
any put_var? And then reopen the file and call many get_var and put_var?
Yes, it does.
If this is the case, then the cause is a bug in netCDF that I recently reported
to the netCDF team. See https://bugtracking.unidata.ucar.edu/browse/NCF-319
A simple fix is to add the 4 lines (1712-1715 below) after line 1711 of file libdispatch/dfile.c.
1710     else if(cmode & NC_NETCDF4) model |= NC_DISPATCH_NC4;
1711 }
1712 else if (model == NC_DISPATCH_NC3) {
1713     /* if file is a CDF-1 or CDF-2, and PnetCDF method is selected */
1714     if (cmode & NC_PNETCDF) model = NC_DISPATCH_NC5;
1715 }
Without this fix, netCDF will fail to call PnetCDF APIs even if NF90_PNETCDF
open mode is used. It will use POSIX read/write instead. Please re-build your
netCDF and let me know if it works.
Wei-keng
It works!!!
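
To see why the reopened file fell back to POSIX I/O, the effect of the patch can be modelled in a few lines. This is a simplified sketch, not the actual dfile.c code: the SK_ constants are illustrative stand-ins whose numeric values are assumptions, select_dispatch is a hypothetical name, and the PnetCDF branch inside the first if is assumed because the lines before 1710 are not shown in the patch excerpt.

```c
#include <assert.h>

/* Illustrative stand-ins; the real constants live in the netCDF headers
 * and their numeric values may differ. */
enum { SK_NETCDF4 = 0x1000, SK_MPIIO = 0x2000, SK_PNETCDF = 0x8000 };
enum { SK_DISPATCH_NC3 = 1, SK_DISPATCH_NC4 = 2, SK_DISPATCH_NC5 = 4 };

/* detected_model: what the file contents say the format is (0 = unknown).
 * cmode:          the flags the caller passed when opening the file.
 * apply_fix:      nonzero to include the branch added by the patch. */
int select_dispatch(int detected_model, int cmode, int apply_fix)
{
    int model = detected_model;
    assert(model == 0 || model == SK_DISPATCH_NC3 || model == SK_DISPATCH_NC4);

    if (model == 0) {
        /* Assumed shape of the existing branch: flags decide the model. */
        if (cmode & SK_PNETCDF) model |= SK_DISPATCH_NC5;
        else if (cmode & SK_NETCDF4) model |= SK_DISPATCH_NC4;
    } else if (apply_fix && model == SK_DISPATCH_NC3) {
        /* Patched branch: classic file (CDF-1 or CDF-2), PnetCDF requested. */
        if (cmode & SK_PNETCDF) model = SK_DISPATCH_NC5;
    }
    return model;
}
```

In this model, select_dispatch(SK_DISPATCH_NC3, SK_PNETCDF | SK_MPIIO, 0) stays on the classic dispatcher, while the same call with apply_fix set to 1 selects the PnetCDF dispatcher. That matches the symptom in the thread: a file created, closed, and then reopened was already recognized as a classic file, so the PnetCDF flag was ignored.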
