How to fix an error when training a model

deep-learning

#1

I am training this network by following the tutorial below: https://github.com/pierluigiferrari/ssd_keras/blob/master/ssd300_training.ipynb

I am training on GPU and the error below appears. Could anyone show me which part of the output to look at so that I can search for the error? I searched for the keyword “val-loss” but could not find anything:

I have copied the entire error output below; apologies if it is a bit long.

Using TensorFlow backend.
Loading labels: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 11540/11540 [00:05<00:00, 1951.90it/s]
Loading image IDs: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 11540/11540 [00:01<00:00, 7824.63it/s]
Loading evaluation-neutrality annotations: 100%|███████████████████████████████████████████████████████████████████████████| 11540/11540 [00:02<00:00, 5742.43it/s]
Loading labels: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 4952/4952 [00:02<00:00, 2023.48it/s]
Loading image IDs: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 4952/4952 [00:00<00:00, 7987.53it/s]
Loading evaluation-neutrality annotations: 100%|█████████████████████████████████████████████████████████████████████████████| 4952/4952 [00:00<00:00, 5703.35it/s]
Processing image set 'trainval.txt': 100%|██████████████████████████████████████████████████████████████████████████████████| 11540/11540 [01:04<00:00, 178.13it/s]
Processing image set 'test.txt': 100%|████████████████████████████████████████████████████████████████████████████████████████| 4952/4952 [00:13<00:00, 372.27it/s]
Loading labels: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 11540/11540 [00:03<00:00, 3452.84it/s]
Loading image IDs: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 11540/11540 [00:01<00:00, 8095.24it/s]
Loading evaluation-neutrality annotations: 100%|███████████████████████████████████████████████████████████████████████████| 11540/11540 [00:02<00:00, 5746.94it/s]
Loading labels: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 4952/4952 [00:01<00:00, 3453.67it/s]
Loading image IDs: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 4952/4952 [00:00<00:00, 8171.77it/s]
Loading evaluation-neutrality annotations: 100%|█████████████████████████████████████████████████████████████████████████████| 4952/4952 [00:00<00:00, 5714.35it/s]
Number of images in the training dataset:	 11540
Number of images in the validation dataset:	  4952
Epoch 1/2
2019-04-05 17:33:44.982750: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-05 17:33:46.671690: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6325
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2019-04-05 17:33:46.923581: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 1 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6325
pciBusID: 0000:02:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2019-04-05 17:33:46.925992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0, 1
2019-04-05 17:33:48.413691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-05 17:33:48.413921: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 1 
2019-04-05 17:33:48.414019: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N Y 
2019-04-05 17:33:48.414125: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 1:   Y N 
2019-04-05 17:33:48.416846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10407 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-04-05 17:33:48.599154: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10405 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)

Epoch 00001: LearningRateScheduler setting learning rate to 0.001.
2019-04-05 17:33:52.899066: W tensorflow/core/grappler/optimizers/arithmetic_optimizer.cc:1441] Failed to build SimpleGraphView.
2019-04-05 17:33:52.964235: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:586] Iteration = 0, topological sort failed with message: Non-existent input ^ConstantFoldingCtrl/loss/predictions_loss/cond/zeros/Less/Switch_1 for node ConstantFolding/loss/predictions_loss/cond/zeros/packed_const_axis
2019-04-05 17:33:53.017447: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:586] Iteration = 1, topological sort failed with message: Non-existent input ^ConstantFoldingCtrl/loss/predictions_loss/cond/zeros/Less/Switch_1 for node ConstantFolding/loss/predictions_loss/cond/zeros/packed_const_axis
 4/10 [===========>..................] - ETA: 31s - loss: nan           Batch 3: Invalid loss, terminating training
Traceback (most recent call last):
  File "ssd300_training.py", line 408, in <module>
    initial_epoch=initial_epoch)
  File "/home/manhdo/.conda/envs/SSD_Run/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/manhdo/.conda/envs/SSD_Run/lib/python3.6/site-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/home/manhdo/.conda/envs/SSD_Run/lib/python3.6/site-packages/keras/engine/training_generator.py", line 251, in fit_generator
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/manhdo/.conda/envs/SSD_Run/lib/python3.6/site-packages/keras/callbacks.py", line 79, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/home/manhdo/.conda/envs/SSD_Run/lib/python3.6/site-packages/keras/callbacks.py", line 429, in on_epoch_end
    filepath = self.filepath.format(epoch=epoch + 1, **logs)
KeyError: 'val_loss'
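
For anyone reading the traceback: the last frame calls `self.filepath.format(epoch=epoch + 1, **logs)`, and the log shows training stopped at batch 3 with `loss: nan`, so validation probably never ran and `logs` had no `val_loss` entry. Below is a minimal sketch of how that `KeyError` can arise in plain Python. It assumes the checkpoint filename template contains a `{val_loss}` placeholder; the template string and the helper `format_checkpoint_name` are hypothetical and only mimic the call shown in the traceback.

```python
# Hypothetical filename template; assumed to contain a {val_loss} placeholder
# like the checkpoint filepath used in the training script.
filepath = "ssd300_epoch-{epoch:02d}_loss-{loss:.4f}_val_loss-{val_loss:.4f}.h5"


def format_checkpoint_name(template, epoch, logs):
    """Mimic the call from the traceback: template.format(epoch=epoch + 1, **logs)."""
    return template.format(epoch=epoch + 1, **logs)


# Case 1: training was terminated on a NaN loss before validation ran,
# so the logs dict holds only the training loss.
logs_without_val = {"loss": float("nan")}
try:
    format_checkpoint_name(filepath, 0, logs_without_val)
    missing_key = None
except KeyError as err:
    # str.format raises KeyError for the placeholder it cannot fill.
    missing_key = err.args[0]

# Case 2: a normal epoch where validation ran and val_loss is present.
logs_with_val = {"loss": 5.1234, "val_loss": 4.5678}
name = format_checkpoint_name(filepath, 0, logs_with_val)

print("missing key:", missing_key)
print("checkpoint name:", name)
```

If this reading is right, the `KeyError` is only a side effect; the line to search for is the earlier one, `Batch 3: Invalid loss, terminating training`, together with `loss: nan`.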