test161 Character Timeout

Hi,

After having used test161 a few times (rather than manually booting the kernel etc.), I attempted to run all ASST1 tests with test161 run asst1 earlier today. I was surprised to see cvt4 fail, as we had already gotten our solution to pass that test. However, after digging through the console output, I found that it had attempted to run cvt44 instead (note the extra 4).

There is a message from test161 mentioning the timeout before the failure. I have not been able to reproduce this problem so far. I have the relevant cvt4 output below, but I can provide the full console output if desired.

[synch/cvt4.t    ]      0.000000        0 serial
[synch/cvt4.t    ]      0.000000        1       emufs
[synch/cvt4.t    ]      0.000000        28      random seed=16338
[synch/cvt4.t    ]      0.000000        29      timer
[synch/cvt4.t    ]      0.000000        30      trace
[synch/cvt4.t    ]      0.000000        31      mainboard ramsize=1M cpus=1
[synch/cvt4.t    ]      0.000000        sys161: System/161 release 2.0.8, compiled Jan  9 2017 17:17:19
[synch/cvt4.t    ]      0.006289
[synch/cvt4.t    ]      0.006289        OS/161 base system version 2.0.2
[synch/cvt4.t    ]      0.006289        Copyright (c) 2000, 2001-2005, 2008-2011, 2013, 2014
[synch/cvt4.t    ]      0.006289           President and Fellows of Harvard College.  All rights reserved.
[synch/cvt4.t    ]      0.226752
[synch/cvt4.t    ]      0.226752        Put-your-group-name-here's system version 0 (ASST1 #9)
[synch/cvt4.t    ]      0.256814
[synch/cvt4.t    ]      0.256814        740k physical memory available
[synch/cvt4.t    ]      0.256814        Device probe...
[synch/cvt4.t    ]      0.256814        lamebus0 (system main bus)
[synch/cvt4.t    ]      0.256814        emu0 at lamebus0
[synch/cvt4.t    ]      0.256814        ltrace0 at lamebus0
[synch/cvt4.t    ]      0.256814        ltimer0 at lamebus0
[synch/cvt4.t    ]      0.256814        beep0 at ltimer0
[synch/cvt4.t    ]      0.256814        rtclock0 at ltimer0
[synch/cvt4.t    ]      0.256814        lrandom0 at lamebus0
[synch/cvt4.t    ]      0.256814        random0 at lrandom0
[synch/cvt4.t    ]      0.256814        lser0 at lamebus0
[synch/cvt4.t    ]      0.256814        con0 at lser0
[synch/cvt4.t    ]      0.256814
[synch/cvt4.t    ]      0.256814        cpu0: MIPS/161 (System/161 2.x) features 0x0
[synch/cvt4.t    ]      0.407492        1 CPU online
[synch/cvt4.t    ]      0.407492        OS/161 kernel [? for menu]:
test161: 2017/02/08 20:03:54 run.go:588: Test ID: 8fe7d3a3-6619-414a-8ebd-f3632415bb13  Character timeout in command line 'cvt4'
[synch/cvt4.t    ]      0.407492        cvt44
[synch/cvt4.t    ]      4.106879        cvt44: Command not found
[synch/cvt4.t    ]      4.167161        Menu command failed: Invalid argument
[synch/cvt4.t    ]      4.187262        OS/161 kernel [? for menu]:
[synch/cvt4.t    ]      4.197264        shutdown: panic expected
synch/cvt4.t                     incorrect                      ---          0/1

@shaseley: I guess this is still happening. It may be worth adding some logic there to make sure that test161 actually runs the command that it tried to.

Quick background: there is a small, known issue with feeding input commands to sys161 where it occasionally drop characters. For example, if you send cvt4, you might end up with ct4. Because of this, when test161 issues commands it does so one character at a time, waiting to see the character it sent in the output stream before sending the next character. There is a limit to how long it will wait, however, which is the character timeout that you experienced. If it does not see the character after 1 sec. of simulation time, it sends it again.

To your specific case: what happened is that it took more than 1 second of simulation time to register the 4 in cvt4. Assuming the character was dropped, test161 sent another 4, hence cvt44. Digging further, notice that it took about 4 seconds of simulation time to issue the cvt4 command (that’s a lot!). This leads me to believe it’s taking a lot of time to either register input or display the input as output on your system, which is not normal behavior. My suspicion is that you might be spinning somewhere, maybe in your lock implementation; I know the console uses locks for synchronization.

A few questions and notes:

  1. When this runs successfully, how long does it take to issue the command? This should be on the order of milliseconds, not seconds.
  2. Is your console laggy when you run the tests manually?
  3. You may want to stop by offices hours tomorrow and see if we can help you debug. I may have someone take a look at your lock implementation.

@gwa thoughts?

Agreed. This does seem suspicious. The character timeouts are quite long, precisely to work around this issue. I’d suspect that @shaseley is right: something’s fishy with your locks or with another primitive.

Thanks for the explanation.

In my latest test (extracted from running the full ASST1 test suite again):

...
[synch/cvt4.t    ]	0.404894	cvt4
[synch/cvt4.t    ]	1.208677	Starting cvt4...
...

In general, things are (fairly?) fast, though there are one or two places where it might stop for brief periods of time (probably not the same places, and probably expected(?)).

Having tried a few other times (without re-compiling), I still haven’t been able to reproduce it. Could the additional time potentially be a consequence of other tests being run concurrently (since, again it was doing the entire ASST 1 test suite)?

Here’s a close-up view of all the output within that region, if it helps.

[synch/cvt4.t    ]      0.407492        1 CPU online
[synch/cvt3.t    ]      0.376944        1 CPU online
[synchprobs/sp1.t]      0.266726        ltimer0 at lamebus0
[synch/cvt4.t    ]      0.407492        OS/161 kernel [? for menu]:
[synch/cvt3.t    ]      0.376944        OS/161 kernel [? for menu]:
[synchprobs/sp1.t]      0.266726        beep0 at ltimer0
[synchprobs/sp1.t]      0.326995        rtclock0 at ltimer0
[synchprobs/sp1.t]      0.326995        lrandom0 at lamebus0
[synchprobs/sp1.t]      0.326995        random0 at lrandom0
[synchprobs/sp1.t]      0.326995        lser0 at lamebus0
[synchprobs/sp1.t]      0.326995        con0 at lser0
[synchprobs/sp1.t]      0.326995
[synchprobs/sp1.t]      0.326995        cpu0: MIPS/161 (System/161 2.x) features 0x0
[synchprobs/sp1.t]      0.326995        8 CPUs online
[synchprobs/sp1.t]      0.326995        OS/161 kernel [? for menu]:
[synch/cvt2.t    ]      0.006289           President and Fellows of Harvard College.  All rights reserved.
[synch/cvt2.t    ]      0.006289
[synch/cvt2.t    ]      0.006289        Put-your-group-name-here's system version 0 (ASST1 #9)
[synch/cvt2.t    ]      0.006289
[synch/cvt2.t    ]      0.006289        740k physical memory available
[synch/cvt2.t    ]      0.006289        Device probe...
[synch/cvt1.t    ]      0.006289           President and Fellows of Harvard College.  All rights reserved.
[synch/cvt1.t    ]      0.006289
[synch/cvt1.t    ]      0.006289        Put-your-group-name-here's system version 0 (ASST1 #9)
[synch/cvt5.t    ]      0.006289           President and Fellows of Harvard College.  All rights reserved.
[synch/cvt5.t    ]      0.006289
[synch/cvt5.t    ]      0.006289        Put-your-group-name-here's system version 0 (ASST1 #9)
[synch/cvt5.t    ]      0.006289
[synch/cvt5.t    ]      0.006289        740k physical memory available
[synch/cvt5.t    ]      0.006289        Device probe...
[synch/cvt1.t    ]      0.006289
[synch/cvt1.t    ]      0.006289        740k physical memory available
test161: 2017/02/08 20:03:54 run.go:588: Test ID: 8fe7d3a3-6619-414a-8ebd-f3632415bb13  Character timeout in command line 'cvt4'
[synch/cvt3.t    ]      0.376944        cvt3
[synch/cvt4.t    ]      0.407492        cvt44
[synchprobs/sp1.t]      1.834524        khu
[synchprobs/sp1.t]      2.226219
[synch/cvt1.t    ]      0.006289        Device probe...
[synch/cvt3.t    ]      3.190941        Starting cvt3...
[synch/cvt4.t    ]      4.106879        cvt44: Command not found
[synchprobs/sp1.t]      2.226219        khu: 6624
[synch/cvt3.t    ]      3.201026        (This test panics on success!)
[synch/cvt3.t    ]      3.211085
[synch/cvt3.t    ]      3.211085        cvt3: Should panic...
[synch/cvt4.t    ]      4.167161        Menu command failed: Invalid argument
[synch/cvt5.t    ]      0.006289        lamebus0 (system main bus)
[synchprobs/sp1.t]      4.196090        Operation took 0.007942200 seconds
[synch/cvt2.t    ]      0.277077        lamebus0 (system main bus)
[synch/cvt4.t    ]      4.187262        OS/161 kernel [? for menu]:
[synch/cvt4.t    ]      4.197264        shutdown: panic expected

No: the character timeout is in simulation time, not wall-clock time, and the simulations are isolated – each test spawns a different instance of sys161. Using simulation time avoids issues where your machine may be overloaded.

You can also run cvt4 by itself to isolate its output: test161 run -no-dependencies synch/cvt4.t. (see test161 help for more command line options).

Okay, thanks for the clarification.

From the output of test161 run -no-dependencies synch/cvt4.t, I just got the following snippet (and it ran pretty much instantaneously):

[synch/cvt4.t]  0.517512        sys161: Elapsed real time: 0.123429 seconds (168                                                                                                                                  .069 mhz)
[synch/cvt4.t]  0.517512        sys161: Elapsed virtual time: 0.522035003 second                                                                                                                                  s (25 mhz)

The suspicious part in your first part was the huge gap in time here.

By the looks of it, there doesn’t appear to be a large gap in this run that you posted a part of. And, yes, this test should just pretty much run and exit (the test panics immediately). For reference, I just ran this test on our asst1 solution 1000 times. The median difference between the line cvt4 (sending command) and Starting cvt4... (start executing) is 0.04 seconds (sim time). There are some cases where it takes longer, up to 0.6 seconds. I’d suggest running this several times and seeing if you notice any gaps.

If my calculations are correct, I got a median of 0.05 seconds, with an average of 0.10 seconds, for 1000 runs. The maximum time I got for that interval (sending the command to starting execution) was 0.70 seconds.

Perhaps that just happened to be an edge case?

Hmm, that’s in line with what I got. Since you can’t reproduce it, it’s very possible that it was an anomaly. Feel free to stop by the lab and we can poke around at a few things if you have time tomorrow or Friday.

Will do, thanks again!