A 2nd test at trying to push the USB Host MSC has produced an interesting result -
A new version of ps also supplies a deeper view of the tasks and how much the stack has been used.
The end result was a series of hung processes - to be poetic a zombie undead process!
My take is that somewhere deep in the system (far deeper than I can go) occasionally the process isn't exiting correctly.
The good news for me is that the USB Host MSC has not failed in a bunch of system failures, zombie processes, Hard Faults, and that is what I need from the system.
Trying to be a good netizen (is that a nutter!) I've raised an issue against it in case anybody else is playing in this area.bitbucket.org/nuttx/nuttx/issues/11/hung-process
'ps' when called uses this as per this update.
https://bitbucket.org/nuttx/apps/commits/dd7c4c94935889fee77d79a188b62f11aa3fb0c6
To enable the stack (and heap) to be monitored with a watermark (the functionality changed about this time - but the Stack Coloration needs to be set now for its usage in ps, whenever its done)
From make menuconfigs
-->Build Setup-->Debug Options
[*] Stack coloration
[*] Heap coloration
Specifically Clicking on "Stack coloration" and then pressing 'h'
" Enable stack coloration to initialize the stack memory to the value
of STACK_COLOR and enable the stack checking APIs that can be used
to monitor the level of stack usage."
So pullin the latest changes, making a new build
$mknx.sh
$stwr
nsh> mount -t vfat /dev/sda /usb0
nsh>
Taking the one test script of the previous blog and using vi (cheat sheet for vi) I did the following
shuntils----
date >> /usb0/t/l_free
date >> /usb0/t/l_date
hello >> /usb0/t/l_free
free >> /usb0/t/l_free
ps >> /usb0/t/l_free
------
shspawn-----
set FSTOP /usb0/t/fstop
until [ -f $FSTOP ]
do
sh /usb0/t/shuntils &
sleep 1
done
--------
nsh> sh /usb0/t/shspawn &
# However after about 15 minutes it had failures. (see later) and seems to have left some zombie processes.
#The pid in this code is a signed integer, and wraps at 16xxxx
nsh>
#however also failed on
nsh> umount usb0
nsh: umount: umount failed: 16 "Device or resource busy"
It turns out this "Device or resource busy" also means files open on /usb0 so I'm guessing is likely to be derived from the hung processes - and while an indicator not the root cause.
It turns out this "Device or resource busy" also means files open on /usb0 so I'm guessing is likely to be derived from the hung processes - and while an indicator not the root cause.
nsh> uname -a
NuttX 17.6 nxd47wk_wkd47tst1_20160713_1444 arm stm32f4discovery
# off its starts testing with some manual 'ps'.
sh [121:100] #new spawn with PID
sh [124:100]
sh [127:100]
ps
PID PRI POLICY TYPE
NPX STATE EVENT SIGMASK
STACK USED FILLED COMMAND
0 0 FIFO
Kthread --- Ready
00000000 0 0
0.0% Idle Task
1 224 FIFO Kthread --- Waiting Signal
00000000 2028 292
14.3% hpwork
2 100 FIFO Task
--- Waiting Semaphore
00000000 1004 508
50.5% usbhost
3 100 FIFO Task
--- Running
00000000 4076 852
20.9% init
4 100 RR pthread --- Waiting Signal
00000000 2044 788
38.5% <pthread> 0x10002f80
nsh> sh [135:100]
sh [138:100]
sh [141:100]
sh [149:100]
sh [152:100]
ps
PID PRI POLICY TYPE
NPX STATE EVENT SIGMASK
STACK USED FILLED COMMAND
0 0 FIFO
Kthread --- Ready
00000000 0 0
0.0% Idle Task
1 224 FIFO Kthread --- Waiting Signal
00000000 2028 292
14.3% hpwork
2 100 FIFO Task
--- Waiting Semaphore
00000000 1004 508
50.5% usbhost
3 100 FIFO Task
--- Running
00000000 4076 852
20.9% init
4 100 RR pthread --- Waiting Signal
00000000 2044 788
38.5% <pthread> 0x10002f80
nsh> sh [155:100]
sh [158:100]
sh [166:100]
ps sh [169:100]
PID PRI POLICY TYPE
NPX STATE EVENT SIGMASK
STACK USED FILLED COMMAND
0 0 FIFO
Kthread --- Ready
00000000 0 0
0.0% Idle Task
1 224 FIFO Kthread --- Waiting Signal
00000000 2028 292
14.3% hpwork
2 100 FIFO Task
--- Waiting Semaphore
00000000 1004 508
50.5% usbhost
3 100 FIFO Task
--- Running 00000000 4076
852 20.9% init
4 100 RR pthread --- Waiting Signal
00000000 2044 788
38.5% <pthread> 0x10002f80
169 100 RR pthread --- Ready 00000000 2044
812 39.7% <pthread> 0x10004420
#NOTE: A spawned file snapshoted while running
nsh> sh [172:100]
sh [175:100]
sh [183:100]
sh [186:100]
# Hey Hey off for lunch while the system works
#start is- Jul 13 14:54:49 2016
#sadly on returning find that it failed again (with more of a log) 20minutes later at Jul 13 15:14:33 2016
# and the log shows ps: open failed: 24 "Too
many open files" from nuttx/include/errno.h:
Jul 13 15:14:32 2016
Hello, World!!
total used free largest
Mem: 188096 31392 156704 120368
PID PRI POLICY TYPE NPX STATE EVENT SIGMASK STACK USED FILLED COMMAND
0 0 FIFO Kthread --- Ready 00000000 0 0 0.0% Idle Task
1 224 FIFO Kthread --- Waiting Signal 00000000 2028 292 14.3% hpwork
2 100 FIFO Task --- Waiting Semaphore 00000000 1004 508 50.5% usbhost
3 100 FIFO Task --- Waiting Semaphore 00000000 4076 852 20.9% init
4 100 RR pthread --- Waiting Signal 00000000 2044 788 38.5% <pthread> 0x10002f80
5065 100 RR pthread --- Running 00000000 2044 1060 51.8% <pthread> 0x10005310
859 100 RR pthread --- Waiting Semaphore 00000000 2044 900 44.0% <pthread> 0x10004420
Jul 13 15:14:33 2016
Hello, World!!
<<<<missing ps>>>>>
Jul 13 15:14:34 2016
Hello, World!!
<<<<<strange ps>>>>
total used free largest
Mem: 188096 36336 151760 120368
PID PRI POLICY TYPE NPX STATE EVENT SIGMASK STACK USED FILLED COMMAND
nsh: ps: open failed: 24 "Too many open files"
0 nsh: ps: open failed: 24
0 0 0.0% nsh: ps: open failed: 24
Jul 13 15:14:35 2016
Hello, World!!
<<<<<strange ps>>>>
total used free largest
Mem: 188096 36336 151760 120368
PID PRI POLICY TYPE NPX STATE EVENT SIGMASK STACK USED FILLED COMMAND
nsh: ps: open failed: 24
0 nsh: ps: open failed: 24
0 0 0.0% nsh: ps: open failed: 24
# so now stop the script from running by
nsh> echo stop > /usb0/t/fstop #pwd is usb/tst1
#Trying 'ps' there are a few strang processes still present, that don't look like they died.
nsh> ps
PID PRI POLICY TYPE
NPX STATE EVENT SIGMASK
STACK USED FILLED COMMAND
0 0 FIFO
Kthread --- Ready
00000000 0 0
0.0% Idle Task
1 224 FIFO Kthread --- Waiting Signal
00000000 2028 292
14.3% hpwork
2 100 FIFO Task
--- Waiting Semaphore
00000000 1004 508
50.5% usbhost
3 100 FIFO Task
--- Running
00000000 4076 972
23.8% init
859 100 RR pthread --- Waiting Semaphore 00000000 2044
900 44.0% <pthread> 0x10004420
5069 100 RR pthread --- Waiting Semaphore 00000000 2044
812 39.7% <pthread> 0x10005310
7614 100 RR pthread --- Waiting Semaphore 00000000 2044
812 39.7% <pthread> 0x10005f80
So the ps that failed while the script was running is that it has exceeded the CONFIG_NFILE_DESCRIPTORS allocation of files at that point in time which is 8.
Of course the release investigation is why it isn't cleaning up, which is probably related to not dispatching the process correctly - leaving the zombie process.
include/errno.h: 24 -> EMFILE
fs/vfs/fs_open.c:
191 /* Associate the inode with a file structure */
192
193 fd = files_allocate(inode, oflags, 0, 0);
194 if (fd < 0)
195 {
196 ret = EMFILE;
197 goto errout_with_inode;
198 }
The suggestion from Greg Nuttx was "you need to set CONFIG_NFILE_DESCRIPTORS larger."
However trying builds for 10, 14, 32 resulted in it running for a few seconds then a "Hard Fault Exception" - won't go there. This seems like another Nuttx brittleness.
The good news is that the file system is good and removable for inspection on Win10.
nsh> ls -ls usb0/tst1
/usb0/tst1:
-rw-rw-rw- 172 shuntil
-rw-rw-rw- 1123855
l_free
-rw-rw-rw- 136 shuntils
-rw-rw-rw- 79 shspawn
-rw-rw-rw- 32823 l_date
-rw-rw-rw- 6 fstop
So I'm learning that having two destination files l_date & l_free seems to cause a total increment in pid of 3. Possibly one for the new process "shuntils" and one for each file - but that's a guess.
Rerunning this test with an extra date to verify time taken
/usb0/t/shuntils----
date >> /usb0/t/l_date
hello >> /usb0/t/l_free
date >> /usb0/t/l_free
free >> /usb0/t/l_free
ps >> /usb0/t/l_free
date >> /usb0/t/l_free
------
/usb0/t/shspawn-----
set FSTOP /usb0/tst1/fstop
until [ -f $FSTOP ]
do
sh /usb0/t//shuntils
&
sleep 1
done
--------
nsh> mount -t vfat /dev/sda /usb0
nsh> sh /usb0/t/shspawn
&
For a 2sec interval - this is a lot slower - I ran it overnight and the pid wrapped at least twice
sh [32764:100]
sh [32767:100]
sh [7:100]
sh [10:100]
For some reason three was one zombie process pid=283, and it happened in the first pid 0-32767
sh [28060:100]
sh [28063:100]
ps
sh [28071:100]
PID PRI POLICY TYPE NPX STATE EVENT SIGMASK STACK USED FILLED COMMAND
0 0 FIFO Kthread --- Ready 00000000 0 0 0.0% Idle Task
1 224 FIFO Kthread --- Waiting Signal 00000000 2028 292 14.3% hpwork
2 100 FIFO Task --- Waiting Semaphore 00000000 1004 508 50.5% usbhost
3 100 FIFO Task --- Running 00000000 4076 868 21.2% init
5 100 RR pthread --- Waiting Signal 00000000 2044 788 38.5% <pthread> 0x10002f80
28071 100 RR pthread --- Ready 00000000 2044 1100 53.8% <pthread> 0x10005330
283 100 RR pthread --- Waiting Semaphore 00000000 2044 812 39.7% <pthread> 0x10004420
nsh> sh [28074:100]
sh [28078:100]
sh [28086:100]
sh [28063:100]
ps
sh [28071:100]
PID PRI POLICY TYPE NPX STATE EVENT SIGMASK STACK USED FILLED COMMAND
0 0 FIFO Kthread --- Ready 00000000 0 0 0.0% Idle Task
1 224 FIFO Kthread --- Waiting Signal 00000000 2028 292 14.3% hpwork
2 100 FIFO Task --- Waiting Semaphore 00000000 1004 508 50.5% usbhost
3 100 FIFO Task --- Running 00000000 4076 868 21.2% init
5 100 RR pthread --- Waiting Signal 00000000 2044 788 38.5% <pthread> 0x10002f80
28071 100 RR pthread --- Ready 00000000 2044 1100 53.8% <pthread> 0x10005330
283 100 RR pthread --- Waiting Semaphore 00000000 2044 812 39.7% <pthread> 0x10004420
nsh> sh [28074:100]
sh [28078:100]
sh [28086:100]
So stopping
echo stop >/usb0/t/fstop
nsh> cd
nsh> umount usb0
nsh: umount: umount failed: 16
# Failed again - probably due to pid=283 - but no way of listing it.
# Checking ps, the stack useage is very pleasing, and the stack sizes could be reduced if needed - but better to have them to large at this stage.
nsh> ps
PID PRI POLICY TYPE NPX STATE EVENT SIGMASK STACK USED FILLED COMMAND
0 0 FIFO Kthread --- Ready 00000000 0 0 0.0% Idle Task
1 224 FIFO Kthread --- Waiting Signal 00000000 2028 292 14.3% hpwork
2 100 FIFO Task --- Waiting Semaphore 00000000 1004 508 50.5% usbhost
3 100 FIFO Task --- Running 00000000 4076 972 23.8% init
283 100 RR pthread --- Waiting Semaphore 00000000 2044 812 39.7% <pthread> 0x10004420