
Hyperparameter Tuning with Ray Tune

Hyperparameter tuning can make the difference between an average model and a highly accurate one. Often, simple things like choosing a different learning rate or changing a network layer size can have a dramatic impact on model performance.

Fortunately, there are tools that help with finding the best combination of parameters. Ray Tune is an industry-standard tool for distributed hyperparameter tuning. Ray Tune includes the latest hyperparameter search algorithms, integrates with TensorBoard and other analysis libraries, and natively supports distributed training through Ray's distributed machine learning engine.

In this tutorial, we will show you how to integrate Ray Tune into your PyTorch training workflow. We will extend this tutorial from the PyTorch documentation to train a CIFAR10 image classifier.

As you will see, we only need to make a few slight modifications. Specifically, we need to

  1. wrap data loading and training in functions,

  2. make some network parameters configurable,

  3. add checkpointing (optional),

  4. and define the search space for the model tuning.


To run this tutorial, please make sure the following packages are installed:

  • ray[tune]: Distributed hyperparameter tuning library

  • torchvision: For the data transformers
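Assuming a pip-based environment, both can be installed with:

```shell
pip install "ray[tune]" torchvision
```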

Setup / Imports

Let's start with the imports:

from functools import partial
import os
import tempfile
from pathlib import Path
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import random_split
import torchvision
import torchvision.transforms as transforms
from ray import tune
from ray import train
from ray.train import Checkpoint, get_checkpoint
from ray.tune.schedulers import ASHAScheduler
import ray.cloudpickle as pickle

Most of the imports are needed for building the PyTorch model. Only the last few imports are for Ray Tune.

Data loaders

We wrap the data loaders in their own function and pass a global data directory. This way we can share a data directory between different trials.

def load_data(data_dir="./data"):
    transform = transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
    )

    trainset = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True, transform=transform
    )

    testset = torchvision.datasets.CIFAR10(
        root=data_dir, train=False, download=True, transform=transform
    )

    return trainset, testset

Configurable neural network

We can only tune parameters that are configurable. In this example, we can specify the layer sizes of the fully connected layers:

class Net(nn.Module):
    def __init__(self, l1=120, l2=84):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, l1)
        self.fc2 = nn.Linear(l1, l2)
        self.fc3 = nn.Linear(l2, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

The train function

Now it gets interesting, because we introduce some changes to the example from the PyTorch documentation.

We wrap the training script in a function train_cifar(config, data_dir=None). The config parameter receives the hyperparameters we would like to train with. The data_dir specifies the directory where we load and store the data, so that multiple runs can share the same data source. We also load the model and optimizer state at the start of the run, if a checkpoint is provided. Further down in this tutorial you will find information on how to save checkpoints and what they are used for.

net = Net(config["l1"], config["l2"])

checkpoint = get_checkpoint()
if checkpoint:
    with checkpoint.as_directory() as checkpoint_dir:
        data_path = Path(checkpoint_dir) / "data.pkl"
        with open(data_path, "rb") as fp:
            checkpoint_state = pickle.load(fp)
        start_epoch = checkpoint_state["epoch"]
        net.load_state_dict(checkpoint_state["net_state_dict"])
        optimizer.load_state_dict(checkpoint_state["optimizer_state_dict"])
else:
    start_epoch = 0

The learning rate of the optimizer is made configurable, too:

optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

We also split the training data into a training and a validation subset. We thus train on 80% of the data and compute the validation loss on the remaining 20%. The batch sizes with which we iterate through the training and test sets are configurable as well.
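The 80/20 split can be sketched without torch. This mirrors what the random_split call in the full training function below does; the helper split_indices is ours, for illustration only, and 50,000 is the CIFAR10 training set size:

```python
import random


def split_indices(n, train_frac=0.8, seed=0):
    """Shuffle indices 0..n-1 and split them train_frac / (1 - train_frac)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * train_frac)  # same arithmetic as int(len(trainset) * 0.8)
    return idx[:cut], idx[cut:]


train_idx, val_idx = split_indices(50_000)
print(len(train_idx), len(val_idx))  # 40000 10000
```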

Adding (multi) GPU support with DataParallel

Image classification benefits largely from GPUs. Luckily, we can continue to use PyTorch's abstractions in Ray Tune. Thus, we can wrap our model in nn.DataParallel to support data parallel training on multiple GPUs:

device = "cpu"
if torch.cuda.is_available():
    device = "cuda:0"
    if torch.cuda.device_count() > 1:
        net = nn.DataParallel(net)
net.to(device)

By using a device variable, we make sure that training also works when no GPU is available. PyTorch requires us to send our data to GPU memory explicitly, like this:

for i, data in enumerate(trainloader, 0):
    inputs, labels = data
    inputs, labels = inputs.to(device), labels.to(device)

The code now supports training on CPUs, on a single GPU, and on multiple GPUs. Notably, Ray also supports fractional GPUs, so we can share a GPU among trials, as long as the model still fits in GPU memory. We will come back to that later.

Communicating with Ray Tune

The most interesting part is the communication with Ray Tune:

checkpoint_data = {
    "epoch": epoch,
    "net_state_dict": net.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}
with tempfile.TemporaryDirectory() as checkpoint_dir:
    data_path = Path(checkpoint_dir) / "data.pkl"
    with open(data_path, "wb") as fp:
        pickle.dump(checkpoint_data, fp)

    checkpoint = Checkpoint.from_directory(checkpoint_dir)
    train.report(
        {"loss": val_loss / val_steps, "accuracy": correct / total},
        checkpoint=checkpoint,
    )

Here we first save a checkpoint and then report some metrics back to Ray Tune. Specifically, we send the validation loss and accuracy back to Ray Tune. Ray Tune can then use these metrics to decide which hyperparameter configuration leads to the best results. These metrics can also be used to stop badly performing trials early, in order to avoid wasting resources on those trials.

Saving checkpoints is optional, but it is necessary if we want to use advanced schedulers like Population Based Training. Also, by saving checkpoints we can later load the trained models and validate them on a test set. Lastly, saving checkpoints is useful for fault tolerance, as it allows us to interrupt training and continue it later.
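The save-and-restore pattern above can be exercised with plain Python objects in place of the torch state dicts. This is just an illustration of the pickle round trip through a checkpoint directory, not part of the tutorial code:

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for the real checkpoint payload (the tutorial stores the epoch
# plus the model and optimizer state dicts here).
checkpoint_data = {"epoch": 3, "net_state_dict": {"w": [0.1, 0.2]}}

with tempfile.TemporaryDirectory() as checkpoint_dir:
    data_path = Path(checkpoint_dir) / "data.pkl"
    # Save: serialize the payload into the checkpoint directory.
    with open(data_path, "wb") as fp:
        pickle.dump(checkpoint_data, fp)
    # Restore: read it back, as the training function does on resume.
    with open(data_path, "rb") as fp:
        restored = pickle.load(fp)

print(restored["epoch"])  # 3
```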

Full training function

The full code example looks like this:

def train_cifar(config, data_dir=None):
    net = Net(config["l1"], config["l2"])

    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if torch.cuda.device_count() > 1:
            net = nn.DataParallel(net)
    net.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    checkpoint = get_checkpoint()
    if checkpoint:
        with checkpoint.as_directory() as checkpoint_dir:
            data_path = Path(checkpoint_dir) / "data.pkl"
            with open(data_path, "rb") as fp:
                checkpoint_state = pickle.load(fp)
            start_epoch = checkpoint_state["epoch"]
            net.load_state_dict(checkpoint_state["net_state_dict"])
            optimizer.load_state_dict(checkpoint_state["optimizer_state_dict"])
    else:
        start_epoch = 0

    trainset, testset = load_data(data_dir)

    test_abs = int(len(trainset) * 0.8)
    train_subset, val_subset = random_split(
        trainset, [test_abs, len(trainset) - test_abs]
    )

    trainloader = torch.utils.data.DataLoader(
        train_subset, batch_size=int(config["batch_size"]), shuffle=True, num_workers=8
    )
    valloader = torch.utils.data.DataLoader(
        val_subset, batch_size=int(config["batch_size"]), shuffle=True, num_workers=8
    )

    for epoch in range(start_epoch, 10):  # loop over the dataset multiple times
        running_loss = 0.0
        epoch_steps = 0
        for i, data in enumerate(trainloader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            epoch_steps += 1
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print(
                    "[%d, %5d] loss: %.3f"
                    % (epoch + 1, i + 1, running_loss / epoch_steps)
                )
                running_loss = 0.0

        # Validation loss
        val_loss = 0.0
        val_steps = 0
        total = 0
        correct = 0
        for i, data in enumerate(valloader, 0):
            with torch.no_grad():
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)

                outputs = net(inputs)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

                loss = criterion(outputs, labels)
                val_loss += loss.cpu().numpy()
                val_steps += 1

        checkpoint_data = {
            "epoch": epoch,
            "net_state_dict": net.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        }
        with tempfile.TemporaryDirectory() as checkpoint_dir:
            data_path = Path(checkpoint_dir) / "data.pkl"
            with open(data_path, "wb") as fp:
                pickle.dump(checkpoint_data, fp)

            checkpoint = Checkpoint.from_directory(checkpoint_dir)
            train.report(
                {"loss": val_loss / val_steps, "accuracy": correct / total},
                checkpoint=checkpoint,
            )

    print("Finished Training")

As you can see, most of the code is adapted directly from the original example.

Test set accuracy

Commonly, the performance of a machine learning model is tested on a held-out test set containing data that was not used for training. We also wrap this in a function:

def test_accuracy(net, device="cpu"):
    trainset, testset = load_data()

    testloader = torch.utils.data.DataLoader(
        testset, batch_size=4, shuffle=False, num_workers=2
    )

    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    return correct / total

The function also expects a device parameter, so we can run the test set validation on a GPU.

Configuring the search space

Lastly, we need to define Ray Tune's search space. Here is an example:

config = {
    "l1": tune.choice([2 ** i for i in range(9)]),
    "l2": tune.choice([2 ** i for i in range(9)]),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([2, 4, 8, 16])
}

tune.choice() accepts a list of values that are sampled from uniformly. In this example, the l1 and l2 parameters are powers of 2 between 1 and 256, i.e. 1, 2, 4, 8, 16, 32, 64, 128, or 256. The lr (learning rate) is sampled log-uniformly between 0.0001 and 0.1. Lastly, the batch size is a choice between 2, 4, 8, and 16.
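For illustration, here is what those distributions boil down to, sketched with the standard library; the loguniform helper below is our own stand-in for tune.loguniform, not Ray Tune's implementation:

```python
import math
import random

# The values tune.choice() picks l1 and l2 from.
layer_sizes = [2 ** i for i in range(9)]
print(layer_sizes)  # [1, 2, 4, 8, 16, 32, 64, 128, 256]


def loguniform(low, high, rng=random):
    """Sample x so that log(x) is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


lr = loguniform(1e-4, 1e-1)
```

Sampling log-uniformly means that each order of magnitude (1e-4 to 1e-3, 1e-3 to 1e-2, ...) is equally likely, which is usually what you want for learning rates.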

For each trial, Ray Tune will now randomly sample a combination of parameters from these search spaces. It will then train a number of models in parallel and find the best performing one among them. We also use the ASHAScheduler, which terminates badly performing trials early.
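The idea behind ASHA can be sketched as successive halving: at each rung, only the best 1/reduction_factor of the trials survive and continue training. A toy, synchronous version for intuition only (the helper successive_halving is hypothetical, not Ray Tune's asynchronous implementation):

```python
def successive_halving(scores_by_trial, reduction_factor=2):
    """scores_by_trial: {trial_name: loss}. Return the survivors of one rung
    (the trials with the lowest loss)."""
    ranked = sorted(scores_by_trial, key=scores_by_trial.get)
    keep = max(1, len(ranked) // reduction_factor)
    return ranked[:keep]


losses = {"t0": 2.3, "t1": 1.4, "t2": 2.1, "t3": 1.8}
print(successive_halving(losses))  # ['t1', 't3']
```

ASHA makes this asynchronous: trials are promoted as soon as enough results are in at a rung, instead of waiting for all trials to finish the rung.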

We wrap the train_cifar function with functools.partial to set the constant data_dir parameter. We can also tell Ray Tune which resources should be available for each trial:

gpus_per_trial = 2
# ...
result = tune.run(
    partial(train_cifar, data_dir=data_dir),
    resources_per_trial={"cpu": 8, "gpu": gpus_per_trial},
    config=config,
    num_samples=num_samples,
    scheduler=scheduler,
    checkpoint_at_end=True)

You can specify the number of CPUs, which are then available, e.g., to increase the num_workers of the PyTorch DataLoader instances. The selected number of GPUs is made visible to PyTorch in each trial. Trials do not have access to GPUs that haven't been requested for them, so you don't have to worry about two trials using the same set of resources.

Here we can also specify fractional GPUs, so something like gpus_per_trial=0.5 is completely valid. The trials will then share a GPU among each other. You just have to make sure that the models still fit in the GPU memory.

After training the models, we will find the best performing one and load the trained network from its checkpoint file. We then obtain the test set accuracy and report everything by printing.

The full main function looks like this:

def main(num_samples=10, max_num_epochs=10, gpus_per_trial=2):
    data_dir = os.path.abspath("./data")
    load_data(data_dir)
    config = {
        "l1": tune.choice([2**i for i in range(9)]),
        "l2": tune.choice([2**i for i in range(9)]),
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([2, 4, 8, 16]),
    }
    scheduler = ASHAScheduler(
        metric="loss",
        mode="min",
        max_t=max_num_epochs,
        grace_period=1,
        reduction_factor=2,
    )
    result = tune.run(
        partial(train_cifar, data_dir=data_dir),
        resources_per_trial={"cpu": 2, "gpu": gpus_per_trial},
        config=config,
        num_samples=num_samples,
        scheduler=scheduler,
    )

    best_trial = result.get_best_trial("loss", "min", "last")
    print(f"Best trial config: {best_trial.config}")
    print(f"Best trial final validation loss: {best_trial.last_result['loss']}")
    print(f"Best trial final validation accuracy: {best_trial.last_result['accuracy']}")

    best_trained_model = Net(best_trial.config["l1"], best_trial.config["l2"])
    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if gpus_per_trial > 1:
            best_trained_model = nn.DataParallel(best_trained_model)
    best_trained_model.to(device)

    best_checkpoint = result.get_best_checkpoint(trial=best_trial, metric="accuracy", mode="max")
    with best_checkpoint.as_directory() as checkpoint_dir:
        data_path = Path(checkpoint_dir) / "data.pkl"
        with open(data_path, "rb") as fp:
            best_checkpoint_data = pickle.load(fp)

        best_trained_model.load_state_dict(best_checkpoint_data["net_state_dict"])
        test_acc = test_accuracy(best_trained_model, device)
        print("Best trial test set accuracy: {}".format(test_acc))


if __name__ == "__main__":
    # You can change the number of GPUs per trial here:
    main(num_samples=10, max_num_epochs=10, gpus_per_trial=0)
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to /var/lib/workspace/beginner_source/data/cifar-10-python.tar.gz

100% 170M/170M [00:01<00:00, 91.2MB/s]
Extracting /var/lib/workspace/beginner_source/data/cifar-10-python.tar.gz to /var/lib/workspace/beginner_source/data
Files already downloaded and verified
2024-10-17 21:58:28,302 WARNING services.py:1889 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 2147479552 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2024-10-17 21:58:28,552 INFO worker.py:1642 -- Started a local Ray instance.
2024-10-17 21:58:29,750 INFO tune.py:228 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `tune.run(...)`.
2024-10-17 21:58:29,752 INFO tune.py:654 -- [output] This will use the new output engine with verbosity 2. To disable the new output and use the legacy output engine, set the environment variable RAY_AIR_NEW_OUTPUT=0. For more information, please see https://github.com/ray-project/ray/issues/36949
+--------------------------------------------------------------------+
| Configuration for experiment     train_cifar_2024-10-17_21-58-29   |
+--------------------------------------------------------------------+
| Search algorithm                 BasicVariantGenerator             |
| Scheduler                        AsyncHyperBandScheduler           |
| Number of trials                 10                                |
+--------------------------------------------------------------------+

View detailed results here: /var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29
To visualize your results with TensorBoard, run: `tensorboard --logdir /var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29`

Trial status: 10 PENDING
Current time: 2024-10-17 21:58:30. Total running time: 0s
Logical resource usage: 0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+-------------------------------------------------------------------------------+
| Trial name                status       l1     l2            lr     batch_size |
+-------------------------------------------------------------------------------+
| train_cifar_f1b1b_00000   PENDING      16      1   0.00213327               2 |
| train_cifar_f1b1b_00001   PENDING       1      2   0.013416                 4 |
| train_cifar_f1b1b_00002   PENDING     256     64   0.0113784                2 |
| train_cifar_f1b1b_00003   PENDING      64    256   0.0274071                8 |
| train_cifar_f1b1b_00004   PENDING      16      2   0.056666                 4 |
| train_cifar_f1b1b_00005   PENDING       8     64   0.000353097              4 |
| train_cifar_f1b1b_00006   PENDING      16      4   0.000147684              8 |
| train_cifar_f1b1b_00007   PENDING     256    256   0.00477469               8 |
| train_cifar_f1b1b_00008   PENDING     128    256   0.0306227                8 |
| train_cifar_f1b1b_00009   PENDING       2     16   0.0286986                2 |
+-------------------------------------------------------------------------------+

Trial train_cifar_f1b1b_00002 started with configuration:
+--------------------------------------------------+
| Trial train_cifar_f1b1b_00002 config             |
+--------------------------------------------------+
| batch_size                                     2 |
| l1                                           256 |
| l2                                            64 |
| lr                                       0.01138 |
+--------------------------------------------------+

Trial train_cifar_f1b1b_00006 started with configuration:
+--------------------------------------------------+
| Trial train_cifar_f1b1b_00006 config             |
+--------------------------------------------------+
| batch_size                                     8 |
| l1                                            16 |
| l2                                             4 |
| lr                                       0.00015 |
+--------------------------------------------------+

Trial train_cifar_f1b1b_00003 started with configuration:
+--------------------------------------------------+
| Trial train_cifar_f1b1b_00003 config             |
+--------------------------------------------------+
| batch_size                                     8 |
| l1                                            64 |
| l2                                           256 |
| lr                                       0.02741 |
+--------------------------------------------------+

Trial train_cifar_f1b1b_00004 started with configuration:
+--------------------------------------------------+
| Trial train_cifar_f1b1b_00004 config             |
+--------------------------------------------------+
| batch_size                                     4 |
| l1                                            16 |
| l2                                             2 |
| lr                                       0.05667 |
+--------------------------------------------------+

Trial train_cifar_f1b1b_00000 started with configuration:
+--------------------------------------------------+
| Trial train_cifar_f1b1b_00000 config             |
+--------------------------------------------------+
| batch_size                                     2 |
| l1                                            16 |
| l2                                             1 |
| lr                                       0.00213 |
+--------------------------------------------------+

Trial train_cifar_f1b1b_00001 started with configuration:
+--------------------------------------------------+
| Trial train_cifar_f1b1b_00001 config             |
+--------------------------------------------------+
| batch_size                                     4 |
| l1                                             1 |
| l2                                             2 |
| lr                                       0.01342 |
+--------------------------------------------------+
(func pid=5895) Files already downloaded and verified

Trial train_cifar_f1b1b_00005 started with configuration:
+--------------------------------------------------+
| Trial train_cifar_f1b1b_00005 config             |
+--------------------------------------------------+
| batch_size                                     4 |
| l1                                             8 |
| l2                                            64 |
| lr                                       0.00035 |
+--------------------------------------------------+

Trial train_cifar_f1b1b_00007 started with configuration:
+--------------------------------------------------+
| Trial train_cifar_f1b1b_00007 config             |
+--------------------------------------------------+
| batch_size                                     8 |
| l1                                           256 |
| l2                                           256 |
| lr                                       0.00477 |
+--------------------------------------------------+
(func pid=5889) [1,  2000] loss: 2.319
(func pid=5904) Files already downloaded and verified [repeated 15x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)

Trial status: 8 RUNNING | 2 PENDING
Current time: 2024-10-17 21:59:00. Total running time: 30s
Logical resource usage: 16.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+-------------------------------------------------------------------------------+
| Trial name                status       l1     l2            lr     batch_size |
+-------------------------------------------------------------------------------+
| train_cifar_f1b1b_00000   RUNNING      16      1   0.00213327               2 |
| train_cifar_f1b1b_00001   RUNNING       1      2   0.013416                 4 |
| train_cifar_f1b1b_00002   RUNNING     256     64   0.0113784                2 |
| train_cifar_f1b1b_00003   RUNNING      64    256   0.0274071                8 |
| train_cifar_f1b1b_00004   RUNNING      16      2   0.056666                 4 |
| train_cifar_f1b1b_00005   RUNNING       8     64   0.000353097              4 |
| train_cifar_f1b1b_00006   RUNNING      16      4   0.000147684              8 |
| train_cifar_f1b1b_00007   RUNNING     256    256   0.00477469               8 |
| train_cifar_f1b1b_00008   PENDING     128    256   0.0306227                8 |
| train_cifar_f1b1b_00009   PENDING       2     16   0.0286986                2 |
+-------------------------------------------------------------------------------+
(func pid=5889) [1,  4000] loss: 1.153 [repeated 8x across cluster]
(func pid=5889) [1,  6000] loss: 0.768 [repeated 8x across cluster]

Trial train_cifar_f1b1b_00006 finished iteration 1 at 2024-10-17 21:59:29. Total running time: 59s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                  53.84218 |
| time_total_s                                      53.84218 |
| training_iteration                                       1 |
| accuracy                                            0.0991 |
| loss                                               2.29352 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00006 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2024-10-17_21-58-29/checkpoint_000000
(func pid=5899) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2024-10-17_21-58-29/checkpoint_000000)

Trial status: 8 RUNNING | 2 PENDING
Current time: 2024-10-17 21:59:30. Total running time: 1min 0s
Logical resource usage: 16.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+----------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status       l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+----------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_f1b1b_00000   RUNNING      16      1   0.00213327               2                                                    |
| train_cifar_f1b1b_00001   RUNNING       1      2   0.013416                 4                                                    |
| train_cifar_f1b1b_00002   RUNNING     256     64   0.0113784                2                                                    |
| train_cifar_f1b1b_00003   RUNNING      64    256   0.0274071                8                                                    |
| train_cifar_f1b1b_00004   RUNNING      16      2   0.056666                 4                                                    |
| train_cifar_f1b1b_00005   RUNNING       8     64   0.000353097              4                                                    |
| train_cifar_f1b1b_00006   RUNNING      16      4   0.000147684              8        1            53.8422   2.29352       0.0991 |
| train_cifar_f1b1b_00007   RUNNING     256    256   0.00477469               8                                                    |
| train_cifar_f1b1b_00008   PENDING     128    256   0.0306227                8                                                    |
| train_cifar_f1b1b_00009   PENDING       2     16   0.0286986                2                                                    |
+----------------------------------------------------------------------------------------------------------------------------------+

Trial train_cifar_f1b1b_00003 finished iteration 1 at 2024-10-17 21:59:31. Total running time: 1min 1s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00003 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                  55.38453 |
| time_total_s                                      55.38453 |
| training_iteration                                       1 |
| accuracy                                            0.1974 |
| loss                                                 2.088 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00003 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00003_3_batch_size=8,l1=64,l2=256,lr=0.0274_2024-10-17_21-58-29/checkpoint_000000

Trial train_cifar_f1b1b_00007 finished iteration 1 at 2024-10-17 21:59:31. Total running time: 1min 1s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                  55.54326 |
| time_total_s                                      55.54326 |
| training_iteration                                       1 |
| accuracy                                            0.4803 |
| loss                                               1.41309 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00007 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2024-10-17_21-58-29/checkpoint_000000
(func pid=5889) [1,  8000] loss: 0.576 [repeated 5x across cluster]
(func pid=5895) [1,  8000] loss: 0.572 [repeated 4x across cluster]
(func pid=5889) [1, 10000] loss: 0.461 [repeated 4x across cluster]

Trial status: 8 RUNNING | 2 PENDING
Current time: 2024-10-17 22:00:00. Total running time: 1min 30s
Logical resource usage: 16.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+----------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status       l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+----------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_f1b1b_00000   RUNNING      16      1   0.00213327               2                                                    |
| train_cifar_f1b1b_00001   RUNNING       1      2   0.013416                 4                                                    |
| train_cifar_f1b1b_00002   RUNNING     256     64   0.0113784                2                                                    |
| train_cifar_f1b1b_00003   RUNNING      64    256   0.0274071                8        1            55.3845   2.088         0.1974 |
| train_cifar_f1b1b_00004   RUNNING      16      2   0.056666                 4                                                    |
| train_cifar_f1b1b_00005   RUNNING       8     64   0.000353097              4                                                    |
| train_cifar_f1b1b_00006   RUNNING      16      4   0.000147684              8        1            53.8422   2.29352       0.0991 |
| train_cifar_f1b1b_00007   RUNNING     256    256   0.00477469               8        1            55.5433   1.41309       0.4803 |
| train_cifar_f1b1b_00008   PENDING     128    256   0.0306227                8                                                    |
| train_cifar_f1b1b_00009   PENDING       2     16   0.0286986                2                                                    |
+----------------------------------------------------------------------------------------------------------------------------------+
(func pid=5895) [1, 10000] loss: 0.455 [repeated 4x across cluster]
(func pid=5889) [1, 12000] loss: 0.384 [repeated 4x across cluster]

Trial train_cifar_f1b1b_00005 finished iteration 1 at 2024-10-17 22:00:12. Total running time: 1min 42s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                  96.45002 |
| time_total_s                                      96.45002 |
| training_iteration                                       1 |
| accuracy                                            0.3303 |
| loss                                               1.76039 |
+------------------------------------------------------------+
(func pid=5898) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2024-10-17_21-58-29/checkpoint_000000) [repeated 3x across cluster]
Trial train_cifar_f1b1b_00005 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2024-10-17_21-58-29/checkpoint_000000

Trial train_cifar_f1b1b_00001 finished iteration 1 at 2024-10-17 22:00:12. Total running time: 1min 42s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00001 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                  96.71572 |
| time_total_s                                      96.71572 |
| training_iteration                                       1 |
| accuracy                                             0.097 |
| loss                                               2.30901 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00001 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00001_1_batch_size=4,l1=1,l2=2,lr=0.0134_2024-10-17_21-58-29/checkpoint_000000

Trial train_cifar_f1b1b_00001 completed after 1 iterations at 2024-10-17 22:00:12. Total running time: 1min 42s

Trial train_cifar_f1b1b_00008 started with configuration:
+--------------------------------------------------+
| Trial train_cifar_f1b1b_00008 config             |
+--------------------------------------------------+
| batch_size                                     8 |
| l1                                           128 |
| l2                                           256 |
| lr                                       0.03062 |
+--------------------------------------------------+
(func pid=5890) Files already downloaded and verified
(func pid=5890) Files already downloaded and verified

Trial train_cifar_f1b1b_00004 finished iteration 1 at 2024-10-17 22:00:14. Total running time: 1min 44s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00004 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                   98.8246 |
| time_total_s                                       98.8246 |
| training_iteration                                       1 |
| accuracy                                            0.0961 |
| loss                                               2.31015 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00004 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00004_4_batch_size=4,l1=16,l2=2,lr=0.0567_2024-10-17_21-58-29/checkpoint_000000

Trial train_cifar_f1b1b_00004 completed after 1 iterations at 2024-10-17 22:00:14. Total running time: 1min 44s

Trial train_cifar_f1b1b_00009 started with configuration:
+-------------------------------------------------+
| Trial train_cifar_f1b1b_00009 config            |
+-------------------------------------------------+
| batch_size                                    2 |
| l1                                            2 |
| l2                                           16 |
| lr                                       0.0287 |
+-------------------------------------------------+

Trial train_cifar_f1b1b_00006 finished iteration 2 at 2024-10-17 22:00:22. Total running time: 1min 52s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000001 |
| time_this_iter_s                                  52.91409 |
| time_total_s                                     106.75627 |
| training_iteration                                       2 |
| accuracy                                            0.2044 |
| loss                                               2.08109 |
+------------------------------------------------------------+
(func pid=5899) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2024-10-17_21-58-29/checkpoint_000001) [repeated 3x across cluster]

Trial train_cifar_f1b1b_00006 saved a checkpoint for iteration 2 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2024-10-17_21-58-29/checkpoint_000001
(func pid=5895) [1, 12000] loss: 0.369
(func pid=5897) Files already downloaded and verified [repeated 2x across cluster]

Trial train_cifar_f1b1b_00007 finished iteration 2 at 2024-10-17 22:00:24. Total running time: 1min 54s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000001 |
| time_this_iter_s                                  52.83889 |
| time_total_s                                     108.38216 |
| training_iteration                                       2 |
| accuracy                                            0.5344 |
| loss                                               1.32872 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00007 saved a checkpoint for iteration 2 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2024-10-17_21-58-29/checkpoint_000001

Trial train_cifar_f1b1b_00003 finished iteration 2 at 2024-10-17 22:00:27. Total running time: 1min 57s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00003 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000001 |
| time_this_iter_s                                  56.17008 |
| time_total_s                                     111.55461 |
| training_iteration                                       2 |
| accuracy                                             0.155 |
| loss                                                2.2556 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00003 saved a checkpoint for iteration 2 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00003_3_batch_size=8,l1=64,l2=256,lr=0.0274_2024-10-17_21-58-29/checkpoint_000001

Trial train_cifar_f1b1b_00003 completed after 2 iterations at 2024-10-17 22:00:27. Total running time: 1min 57s
(func pid=5889) [1, 14000] loss: 0.329

Trial status: 7 RUNNING | 3 TERMINATED
Current time: 2024-10-17 22:00:30. Total running time: 2min 0s
Logical resource usage: 14.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_f1b1b_00000   RUNNING        16      1   0.00213327               2                                                    |
| train_cifar_f1b1b_00002   RUNNING       256     64   0.0113784                2                                                    |
| train_cifar_f1b1b_00005   RUNNING         8     64   0.000353097              4        1            96.45     1.76039       0.3303 |
| train_cifar_f1b1b_00006   RUNNING        16      4   0.000147684              8        2           106.756    2.08109       0.2044 |
| train_cifar_f1b1b_00007   RUNNING       256    256   0.00477469               8        2           108.382    1.32872       0.5344 |
| train_cifar_f1b1b_00008   RUNNING       128    256   0.0306227                8                                                    |
| train_cifar_f1b1b_00009   RUNNING         2     16   0.0286986                2                                                    |
| train_cifar_f1b1b_00001   TERMINATED      1      2   0.013416                 4        1            96.7157   2.30901       0.097  |
| train_cifar_f1b1b_00003   TERMINATED     64    256   0.0274071                8        2           111.555    2.2556        0.155  |
| train_cifar_f1b1b_00004   TERMINATED     16      2   0.056666                 4        1            98.8246   2.31015       0.0961 |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=5899) [3,  2000] loss: 2.023 [repeated 4x across cluster]
(func pid=5897) [1,  4000] loss: 1.168 [repeated 5x across cluster]
(func pid=5899) [3,  4000] loss: 0.961 [repeated 2x across cluster]
Trial status: 7 RUNNING | 3 TERMINATED
Current time: 2024-10-17 22:01:00. Total running time: 2min 30s
Logical resource usage: 14.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_f1b1b_00000   RUNNING        16      1   0.00213327               2                                                    |
| train_cifar_f1b1b_00002   RUNNING       256     64   0.0113784                2                                                    |
| train_cifar_f1b1b_00005   RUNNING         8     64   0.000353097              4        1            96.45     1.76039       0.3303 |
| train_cifar_f1b1b_00006   RUNNING        16      4   0.000147684              8        2           106.756    2.08109       0.2044 |
| train_cifar_f1b1b_00007   RUNNING       256    256   0.00477469               8        2           108.382    1.32872       0.5344 |
| train_cifar_f1b1b_00008   RUNNING       128    256   0.0306227                8                                                    |
| train_cifar_f1b1b_00009   RUNNING         2     16   0.0286986                2                                                    |
| train_cifar_f1b1b_00001   TERMINATED      1      2   0.013416                 4        1            96.7157   2.30901       0.097  |
| train_cifar_f1b1b_00003   TERMINATED     64    256   0.0274071                8        2           111.555    2.2556        0.155  |
| train_cifar_f1b1b_00004   TERMINATED     16      2   0.056666                 4        1            98.8246   2.31015       0.0961 |
+------------------------------------------------------------------------------------------------------------------------------------+

Trial train_cifar_f1b1b_00008 finished iteration 1 at 2024-10-17 22:01:03. Total running time: 2min 33s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00008 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                  50.87284 |
| time_total_s                                      50.87284 |
| training_iteration                                       1 |
| accuracy                                            0.2337 |
| loss                                               2.05218 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00008 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00008_8_batch_size=8,l1=128,l2=256,lr=0.0306_2024-10-17_21-58-29/checkpoint_000000
(func pid=5890) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00008_8_batch_size=8,l1=128,l2=256,lr=0.0306_2024-10-17_21-58-29/checkpoint_000000) [repeated 3x across cluster]

Trial train_cifar_f1b1b_00006 finished iteration 3 at 2024-10-17 22:01:06. Total running time: 2min 37s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000002 |
| time_this_iter_s                                  44.66284 |
| time_total_s                                     151.41911 |
| training_iteration                                       3 |
| accuracy                                            0.3018 |
| loss                                               1.85412 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00006 saved a checkpoint for iteration 3 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2024-10-17_21-58-29/checkpoint_000002
(func pid=5889) [1, 20000] loss: 0.230 [repeated 6x across cluster]

Trial train_cifar_f1b1b_00007 finished iteration 3 at 2024-10-17 22:01:10. Total running time: 2min 41s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000002 |
| time_this_iter_s                                  46.44777 |
| time_total_s                                     154.82992 |
| training_iteration                                       3 |
| accuracy                                            0.5743 |
| loss                                                  1.22 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00007 saved a checkpoint for iteration 3 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2024-10-17_21-58-29/checkpoint_000002
(func pid=5904) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2024-10-17_21-58-29/checkpoint_000002) [repeated 2x across cluster]
(func pid=5890) [2,  2000] loss: 2.095 [repeated 4x across cluster]
(func pid=5895) [1, 20000] loss: 0.232 [repeated 4x across cluster]

Trial train_cifar_f1b1b_00000 finished iteration 1 at 2024-10-17 22:01:28. Total running time: 2min 59s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00000 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                 173.29934 |
| time_total_s                                     173.29934 |
| training_iteration                                       1 |
| accuracy                                            0.0993 |
| loss                                               2.30543 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00000 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00000_0_batch_size=2,l1=16,l2=1,lr=0.0021_2024-10-17_21-58-29/checkpoint_000000

Trial train_cifar_f1b1b_00000 completed after 1 iterations at 2024-10-17 22:01:28. Total running time: 2min 59s
(func pid=5889) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00000_0_batch_size=2,l1=16,l2=1,lr=0.0021_2024-10-17_21-58-29/checkpoint_000000)

Trial status: 4 TERMINATED | 6 RUNNING
Current time: 2024-10-17 22:01:30. Total running time: 3min 0s
Logical resource usage: 12.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_f1b1b_00002   RUNNING       256     64   0.0113784                2                                                    |
| train_cifar_f1b1b_00005   RUNNING         8     64   0.000353097              4        1            96.45     1.76039       0.3303 |
| train_cifar_f1b1b_00006   RUNNING        16      4   0.000147684              8        3           151.419    1.85412       0.3018 |
| train_cifar_f1b1b_00007   RUNNING       256    256   0.00477469               8        3           154.83     1.22          0.5743 |
| train_cifar_f1b1b_00008   RUNNING       128    256   0.0306227                8        1            50.8728   2.05218       0.2337 |
| train_cifar_f1b1b_00009   RUNNING         2     16   0.0286986                2                                                    |
| train_cifar_f1b1b_00000   TERMINATED     16      1   0.00213327               2        1           173.299    2.30543       0.0993 |
| train_cifar_f1b1b_00001   TERMINATED      1      2   0.013416                 4        1            96.7157   2.30901       0.097  |
| train_cifar_f1b1b_00003   TERMINATED     64    256   0.0274071                8        2           111.555    2.2556        0.155  |
| train_cifar_f1b1b_00004   TERMINATED     16      2   0.056666                 4        1            98.8246   2.31015       0.0961 |
+------------------------------------------------------------------------------------------------------------------------------------+

Trial train_cifar_f1b1b_00005 finished iteration 2 at 2024-10-17 22:01:34. Total running time: 3min 4s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000001 |
| time_this_iter_s                                  81.94404 |
| time_total_s                                     178.39407 |
| training_iteration                                       2 |
| accuracy                                            0.4135 |
| loss                                               1.57162 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00005 saved a checkpoint for iteration 2 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2024-10-17_21-58-29/checkpoint_000001
(func pid=5898) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2024-10-17_21-58-29/checkpoint_000001)
(func pid=5897) [1, 12000] loss: 0.389 [repeated 2x across cluster]
(func pid=5904) [4,  4000] loss: 0.564 [repeated 3x across cluster]

Trial train_cifar_f1b1b_00002 finished iteration 1 at 2024-10-17 22:01:45. Total running time: 3min 16s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00002 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                 190.34233 |
| time_total_s                                     190.34233 |
| training_iteration                                       1 |
| accuracy                                            0.0957 |
| loss                                               2.32163 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00002 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00002_2_batch_size=2,l1=256,l2=64,lr=0.0114_2024-10-17_21-58-29/checkpoint_000000

Trial train_cifar_f1b1b_00002 completed after 1 iterations at 2024-10-17 22:01:45. Total running time: 3min 16s
(func pid=5895) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00002_2_batch_size=2,l1=256,l2=64,lr=0.0114_2024-10-17_21-58-29/checkpoint_000000)
(func pid=5898) [3,  2000] loss: 1.550
(func pid=5897) [1, 14000] loss: 0.334

Trial train_cifar_f1b1b_00006 finished iteration 4 at 2024-10-17 22:01:48. Total running time: 3min 18s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000003 |
| time_this_iter_s                                   41.1182 |
| time_total_s                                      192.5373 |
| training_iteration                                       4 |
| accuracy                                            0.3254 |
| loss                                                 1.763 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00006 saved a checkpoint for iteration 4 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2024-10-17_21-58-29/checkpoint_000003
(func pid=5899) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2024-10-17_21-58-29/checkpoint_000003)

Trial train_cifar_f1b1b_00008 finished iteration 2 at 2024-10-17 22:01:49. Total running time: 3min 20s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00008 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000001 |
| time_this_iter_s                                  46.32766 |
| time_total_s                                       97.2005 |
| training_iteration                                       2 |
| accuracy                                             0.206 |
| loss                                               2.06562 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00008 saved a checkpoint for iteration 2 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00008_8_batch_size=8,l1=128,l2=256,lr=0.0306_2024-10-17_21-58-29/checkpoint_000001

Trial train_cifar_f1b1b_00008 completed after 2 iterations at 2024-10-17 22:01:49. Total running time: 3min 20s

Trial train_cifar_f1b1b_00007 finished iteration 4 at 2024-10-17 22:01:53. Total running time: 3min 23s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000003 |
| time_this_iter_s                                  42.42913 |
| time_total_s                                     197.25905 |
| training_iteration                                       4 |
| accuracy                                             0.572 |
| loss                                               1.23591 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00007 saved a checkpoint for iteration 4 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2024-10-17_21-58-29/checkpoint_000003
(func pid=5904) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2024-10-17_21-58-29/checkpoint_000003) [repeated 2x across cluster]
(func pid=5898) [3,  4000] loss: 0.760
(func pid=5897) [1, 16000] loss: 0.292

Trial status: 6 TERMINATED | 4 RUNNING
Current time: 2024-10-17 22:02:00. Total running time: 3min 30s
Logical resource usage: 8.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_f1b1b_00005   RUNNING         8     64   0.000353097              4        2           178.394    1.57162       0.4135 |
| train_cifar_f1b1b_00006   RUNNING        16      4   0.000147684              8        4           192.537    1.763         0.3254 |
| train_cifar_f1b1b_00007   RUNNING       256    256   0.00477469               8        4           197.259    1.23591       0.572  |
| train_cifar_f1b1b_00009   RUNNING         2     16   0.0286986                2                                                    |
| train_cifar_f1b1b_00000   TERMINATED     16      1   0.00213327               2        1           173.299    2.30543       0.0993 |
| train_cifar_f1b1b_00001   TERMINATED      1      2   0.013416                 4        1            96.7157   2.30901       0.097  |
| train_cifar_f1b1b_00002   TERMINATED    256     64   0.0113784                2        1           190.342    2.32163       0.0957 |
| train_cifar_f1b1b_00003   TERMINATED     64    256   0.0274071                8        2           111.555    2.2556        0.155  |
| train_cifar_f1b1b_00004   TERMINATED     16      2   0.056666                 4        1            98.8246   2.31015       0.0961 |
| train_cifar_f1b1b_00008   TERMINATED    128    256   0.0306227                8        2            97.2005   2.06562       0.206  |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=5904) [5,  2000] loss: 1.039 [repeated 2x across cluster]
(func pid=5899) [5,  4000] loss: 0.864 [repeated 3x across cluster]
(func pid=5897) [1, 20000] loss: 0.234
(func pid=5904) [5,  4000] loss: 0.543

Trial train_cifar_f1b1b_00006 finished iteration 5 at 2024-10-17 22:02:22. Total running time: 3min 52s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000004 |
| time_this_iter_s                                  34.15932 |
| time_total_s                                     226.69663 |
| training_iteration                                       5 |
| accuracy                                            0.3465 |
| loss                                               1.71249 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00006 saved a checkpoint for iteration 5 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2024-10-17_21-58-29/checkpoint_000004
(func pid=5899) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2024-10-17_21-58-29/checkpoint_000004)

Trial train_cifar_f1b1b_00007 finished iteration 5 at 2024-10-17 22:02:29. Total running time: 3min 59s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000004 |
| time_this_iter_s                                  35.95874 |
| time_total_s                                      233.2178 |
| training_iteration                                       5 |
| accuracy                                            0.5716 |
| loss                                               1.25063 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00007 saved a checkpoint for iteration 5 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2024-10-17_21-58-29/checkpoint_000004
(func pid=5904) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2024-10-17_21-58-29/checkpoint_000004)
(func pid=5898) [3, 10000] loss: 0.294 [repeated 2x across cluster]

Trial status: 6 TERMINATED | 4 RUNNING
Current time: 2024-10-17 22:02:30. Total running time: 4min 0s
Logical resource usage: 8.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_f1b1b_00005   RUNNING         8     64   0.000353097              4        2           178.394    1.57162       0.4135 |
| train_cifar_f1b1b_00006   RUNNING        16      4   0.000147684              8        5           226.697    1.71249       0.3465 |
| train_cifar_f1b1b_00007   RUNNING       256    256   0.00477469               8        5           233.218    1.25063       0.5716 |
| train_cifar_f1b1b_00009   RUNNING         2     16   0.0286986                2                                                    |
| train_cifar_f1b1b_00000   TERMINATED     16      1   0.00213327               2        1           173.299    2.30543       0.0993 |
| train_cifar_f1b1b_00001   TERMINATED      1      2   0.013416                 4        1            96.7157   2.30901       0.097  |
| train_cifar_f1b1b_00002   TERMINATED    256     64   0.0113784                2        1           190.342    2.32163       0.0957 |
| train_cifar_f1b1b_00003   TERMINATED     64    256   0.0274071                8        2           111.555    2.2556        0.155  |
| train_cifar_f1b1b_00004   TERMINATED     16      2   0.056666                 4        1            98.8246   2.31015       0.0961 |
| train_cifar_f1b1b_00008   TERMINATED    128    256   0.0306227                8        2            97.2005   2.06562       0.206  |
+------------------------------------------------------------------------------------------------------------------------------------+

Trial train_cifar_f1b1b_00009 finished iteration 1 at 2024-10-17 22:02:33. Total running time: 4min 4s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00009 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000000 |
| time_this_iter_s                                  139.4363 |
| time_total_s                                      139.4363 |
| training_iteration                                       1 |
| accuracy                                             0.098 |
| loss                                               2.33758 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00009 saved a checkpoint for iteration 1 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00009_9_batch_size=2,l1=2,l2=16,lr=0.0287_2024-10-17_21-58-29/checkpoint_000000

Trial train_cifar_f1b1b_00009 completed after 1 iterations at 2024-10-17 22:02:33. Total running time: 4min 4s

Trial train_cifar_f1b1b_00005 finished iteration 3 at 2024-10-17 22:02:37. Total running time: 4min 7s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000002 |
| time_this_iter_s                                  63.37139 |
| time_total_s                                     241.76546 |
| training_iteration                                       3 |
| accuracy                                            0.4627 |
| loss                                               1.45451 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00005 saved a checkpoint for iteration 3 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2024-10-17_21-58-29/checkpoint_000002
(func pid=5898) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2024-10-17_21-58-29/checkpoint_000002) [repeated 2x across cluster]
(func pid=5904) [6,  2000] loss: 0.991 [repeated 2x across cluster]
(func pid=5898) [4,  2000] loss: 1.424 [repeated 2x across cluster]
(func pid=5904) [6,  4000] loss: 0.525

Trial train_cifar_f1b1b_00006 finished iteration 6 at 2024-10-17 22:02:54. Total running time: 4min 24s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000005 |
| time_this_iter_s                                   32.1987 |
| time_total_s                                     258.89533 |
| training_iteration                                       6 |
| accuracy                                            0.3643 |
| loss                                               1.67179 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00006 saved a checkpoint for iteration 6 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2024-10-17_21-58-29/checkpoint_000005
(func pid=5899) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2024-10-17_21-58-29/checkpoint_000005)
(func pid=5898) [4,  4000] loss: 0.708

Trial status: 7 TERMINATED | 3 RUNNING
Current time: 2024-10-17 22:03:00. Total running time: 4min 30s
Logical resource usage: 6.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_f1b1b_00005   RUNNING         8     64   0.000353097              4        3           241.765    1.45451       0.4627 |
| train_cifar_f1b1b_00006   RUNNING        16      4   0.000147684              8        6           258.895    1.67179       0.3643 |
| train_cifar_f1b1b_00007   RUNNING       256    256   0.00477469               8        5           233.218    1.25063       0.5716 |
| train_cifar_f1b1b_00000   TERMINATED     16      1   0.00213327               2        1           173.299    2.30543       0.0993 |
| train_cifar_f1b1b_00001   TERMINATED      1      2   0.013416                 4        1            96.7157   2.30901       0.097  |
| train_cifar_f1b1b_00002   TERMINATED    256     64   0.0113784                2        1           190.342    2.32163       0.0957 |
| train_cifar_f1b1b_00003   TERMINATED     64    256   0.0274071                8        2           111.555    2.2556        0.155  |
| train_cifar_f1b1b_00004   TERMINATED     16      2   0.056666                 4        1            98.8246   2.31015       0.0961 |
| train_cifar_f1b1b_00008   TERMINATED    128    256   0.0306227                8        2            97.2005   2.06562       0.206  |
| train_cifar_f1b1b_00009   TERMINATED      2     16   0.0286986                2        1           139.436    2.33758       0.098  |
+------------------------------------------------------------------------------------------------------------------------------------+

Trial train_cifar_f1b1b_00007 finished iteration 6 at 2024-10-17 22:03:02. Total running time: 4min 32s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000005 |
| time_this_iter_s                                  33.33426 |
| time_total_s                                     266.55205 |
| training_iteration                                       6 |
| accuracy                                            0.5779 |
| loss                                                1.2377 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00007 saved a checkpoint for iteration 6 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2024-10-17_21-58-29/checkpoint_000005
(func pid=5904) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2024-10-17_21-58-29/checkpoint_000005)
(func pid=5899) [7,  2000] loss: 1.644
(func pid=5898) [4,  6000] loss: 0.468
(func pid=5904) [7,  2000] loss: 0.956
(func pid=5899) [7,  4000] loss: 0.813

Trial train_cifar_f1b1b_00006 finished iteration 7 at 2024-10-17 22:03:25. Total running time: 4min 55s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000006 |
| time_this_iter_s                                  30.81064 |
| time_total_s                                     289.70597 |
| training_iteration                                       7 |
| accuracy                                            0.3875 |
| loss                                               1.61252 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00006 saved a checkpoint for iteration 7 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2024-10-17_21-58-29/checkpoint_000006
(func pid=5899) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2024-10-17_21-58-29/checkpoint_000006)
(func pid=5904) [7,  4000] loss: 0.507 [repeated 2x across cluster]

Trial status: 7 TERMINATED | 3 RUNNING
Current time: 2024-10-17 22:03:30. Total running time: 5min 0s
Logical resource usage: 6.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_f1b1b_00005   RUNNING         8     64   0.000353097              4        3           241.765    1.45451       0.4627 |
| train_cifar_f1b1b_00006   RUNNING        16      4   0.000147684              8        7           289.706    1.61252       0.3875 |
| train_cifar_f1b1b_00007   RUNNING       256    256   0.00477469               8        6           266.552    1.2377        0.5779 |
| train_cifar_f1b1b_00000   TERMINATED     16      1   0.00213327               2        1           173.299    2.30543       0.0993 |
| train_cifar_f1b1b_00001   TERMINATED      1      2   0.013416                 4        1            96.7157   2.30901       0.097  |
| train_cifar_f1b1b_00002   TERMINATED    256     64   0.0113784                2        1           190.342    2.32163       0.0957 |
| train_cifar_f1b1b_00003   TERMINATED     64    256   0.0274071                8        2           111.555    2.2556        0.155  |
| train_cifar_f1b1b_00004   TERMINATED     16      2   0.056666                 4        1            98.8246   2.31015       0.0961 |
| train_cifar_f1b1b_00008   TERMINATED    128    256   0.0306227                8        2            97.2005   2.06562       0.206  |
| train_cifar_f1b1b_00009   TERMINATED      2     16   0.0286986                2        1           139.436    2.33758       0.098  |
+------------------------------------------------------------------------------------------------------------------------------------+

Trial train_cifar_f1b1b_00005 finished iteration 4 at 2024-10-17 22:03:33. Total running time: 5min 3s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000003 |
| time_this_iter_s                                  55.51741 |
| time_total_s                                     297.28287 |
| training_iteration                                       4 |
| accuracy                                            0.5025 |
| loss                                               1.35613 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00005 saved a checkpoint for iteration 4 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2024-10-17_21-58-29/checkpoint_000003
(func pid=5898) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2024-10-17_21-58-29/checkpoint_000003)

Trial train_cifar_f1b1b_00007 finished iteration 7 at 2024-10-17 22:03:35. Total running time: 5min 6s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000006 |
| time_this_iter_s                                  33.28508 |
| time_total_s                                     299.83713 |
| training_iteration                                       7 |
| accuracy                                            0.5743 |
| loss                                               1.28765 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00007 saved a checkpoint for iteration 7 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2024-10-17_21-58-29/checkpoint_000006
(func pid=5899) [8,  2000] loss: 1.601 [repeated 2x across cluster]
(func pid=5898) [5,  2000] loss: 1.341
(func pid=5899) [8,  4000] loss: 0.794

Trial train_cifar_f1b1b_00006 finished iteration 8 at 2024-10-17 22:03:56. Total running time: 5min 26s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000007 |
| time_this_iter_s                                   31.4599 |
| time_total_s                                     321.16586 |
| training_iteration                                       8 |
| accuracy                                            0.4042 |
| loss                                                1.5959 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00006 saved a checkpoint for iteration 8 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2024-10-17_21-58-29/checkpoint_000007
(func pid=5899) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2024-10-17_21-58-29/checkpoint_000007) [repeated 2x across cluster]
(func pid=5904) [8,  4000] loss: 0.488 [repeated 3x across cluster]

Trial status: 7 TERMINATED | 3 RUNNING
Current time: 2024-10-17 22:04:00. Total running time: 5min 30s
Logical resource usage: 6.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_f1b1b_00005   RUNNING         8     64   0.000353097              4        4           297.283    1.35613       0.5025 |
| train_cifar_f1b1b_00006   RUNNING        16      4   0.000147684              8        8           321.166    1.5959        0.4042 |
| train_cifar_f1b1b_00007   RUNNING       256    256   0.00477469               8        7           299.837    1.28765       0.5743 |
| train_cifar_f1b1b_00000   TERMINATED     16      1   0.00213327               2        1           173.299    2.30543       0.0993 |
| train_cifar_f1b1b_00001   TERMINATED      1      2   0.013416                 4        1            96.7157   2.30901       0.097  |
| train_cifar_f1b1b_00002   TERMINATED    256     64   0.0113784                2        1           190.342    2.32163       0.0957 |
| train_cifar_f1b1b_00003   TERMINATED     64    256   0.0274071                8        2           111.555    2.2556        0.155  |
| train_cifar_f1b1b_00004   TERMINATED     16      2   0.056666                 4        1            98.8246   2.31015       0.0961 |
| train_cifar_f1b1b_00008   TERMINATED    128    256   0.0306227                8        2            97.2005   2.06562       0.206  |
| train_cifar_f1b1b_00009   TERMINATED      2     16   0.0286986                2        1           139.436    2.33758       0.098  |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=5899) [9,  2000] loss: 1.560 [repeated 2x across cluster]

Trial train_cifar_f1b1b_00007 finished iteration 8 at 2024-10-17 22:04:08. Total running time: 5min 39s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000007 |
| time_this_iter_s                                  32.91228 |
| time_total_s                                     332.74941 |
| training_iteration                                       8 |
| accuracy                                            0.5623 |
| loss                                               1.40129 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00007 saved a checkpoint for iteration 8 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2024-10-17_21-58-29/checkpoint_000007
(func pid=5904) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2024-10-17_21-58-29/checkpoint_000007)
(func pid=5899) [9,  4000] loss: 0.777 [repeated 2x across cluster]

Trial train_cifar_f1b1b_00006 finished iteration 9 at 2024-10-17 22:04:27. Total running time: 5min 58s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000008 |
| time_this_iter_s                                  31.09195 |
| time_total_s                                     352.25781 |
| training_iteration                                       9 |
| accuracy                                            0.4044 |
| loss                                               1.56051 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00006 saved a checkpoint for iteration 9 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2024-10-17_21-58-29/checkpoint_000008
(func pid=5899) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2024-10-17_21-58-29/checkpoint_000008)

Trial train_cifar_f1b1b_00005 finished iteration 5 at 2024-10-17 22:04:28. Total running time: 5min 59s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000004 |
| time_this_iter_s                                  55.47278 |
| time_total_s                                     352.75564 |
| training_iteration                                       5 |
| accuracy                                            0.5099 |
| loss                                               1.34254 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00005 saved a checkpoint for iteration 5 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2024-10-17_21-58-29/checkpoint_000004

Trial status: 7 TERMINATED | 3 RUNNING
Current time: 2024-10-17 22:04:30. Total running time: 6min 0s
Logical resource usage: 6.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_f1b1b_00005   RUNNING         8     64   0.000353097              4        5           352.756    1.34254       0.5099 |
| train_cifar_f1b1b_00006   RUNNING        16      4   0.000147684              8        9           352.258    1.56051       0.4044 |
| train_cifar_f1b1b_00007   RUNNING       256    256   0.00477469               8        8           332.749    1.40129       0.5623 |
| train_cifar_f1b1b_00000   TERMINATED     16      1   0.00213327               2        1           173.299    2.30543       0.0993 |
| train_cifar_f1b1b_00001   TERMINATED      1      2   0.013416                 4        1            96.7157   2.30901       0.097  |
| train_cifar_f1b1b_00002   TERMINATED    256     64   0.0113784                2        1           190.342    2.32163       0.0957 |
| train_cifar_f1b1b_00003   TERMINATED     64    256   0.0274071                8        2           111.555    2.2556        0.155  |
| train_cifar_f1b1b_00004   TERMINATED     16      2   0.056666                 4        1            98.8246   2.31015       0.0961 |
| train_cifar_f1b1b_00008   TERMINATED    128    256   0.0306227                8        2            97.2005   2.06562       0.206  |
| train_cifar_f1b1b_00009   TERMINATED      2     16   0.0286986                2        1           139.436    2.33758       0.098  |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=5904) [9,  4000] loss: 0.491 [repeated 3x across cluster]
(func pid=5899) [10,  2000] loss: 1.530
(func pid=5898) [6,  2000] loss: 1.281

Trial train_cifar_f1b1b_00007 finished iteration 9 at 2024-10-17 22:04:42. Total running time: 6min 12s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000008 |
| time_this_iter_s                                  33.33806 |
| time_total_s                                     366.08747 |
| training_iteration                                       9 |
| accuracy                                            0.5661 |
| loss                                               1.37246 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00007 saved a checkpoint for iteration 9 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2024-10-17_21-58-29/checkpoint_000008
(func pid=5904) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2024-10-17_21-58-29/checkpoint_000008) [repeated 2x across cluster]
(func pid=5898) [6,  4000] loss: 0.646
(func pid=5899) [10,  4000] loss: 0.758
(func pid=5898) [6,  6000] loss: 0.431 [repeated 2x across cluster]

Trial train_cifar_f1b1b_00006 finished iteration 10 at 2024-10-17 22:04:58. Total running time: 6min 28s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00006 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000009 |
| time_this_iter_s                                  30.85066 |
| time_total_s                                     383.10847 |
| training_iteration                                      10 |
| accuracy                                            0.4347 |
| loss                                               1.50765 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00006 saved a checkpoint for iteration 10 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2024-10-17_21-58-29/checkpoint_000009

Trial train_cifar_f1b1b_00006 completed after 10 iterations at 2024-10-17 22:04:58. Total running time: 6min 28s
(func pid=5899) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00006_6_batch_size=8,l1=16,l2=4,lr=0.0001_2024-10-17_21-58-29/checkpoint_000009)

Trial status: 8 TERMINATED | 2 RUNNING
Current time: 2024-10-17 22:05:00. Total running time: 6min 30s
Logical resource usage: 4.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_f1b1b_00005   RUNNING         8     64   0.000353097              4        5           352.756    1.34254       0.5099 |
| train_cifar_f1b1b_00007   RUNNING       256    256   0.00477469               8        9           366.087    1.37246       0.5661 |
| train_cifar_f1b1b_00000   TERMINATED     16      1   0.00213327               2        1           173.299    2.30543       0.0993 |
| train_cifar_f1b1b_00001   TERMINATED      1      2   0.013416                 4        1            96.7157   2.30901       0.097  |
| train_cifar_f1b1b_00002   TERMINATED    256     64   0.0113784                2        1           190.342    2.32163       0.0957 |
| train_cifar_f1b1b_00003   TERMINATED     64    256   0.0274071                8        2           111.555    2.2556        0.155  |
| train_cifar_f1b1b_00004   TERMINATED     16      2   0.056666                 4        1            98.8246   2.31015       0.0961 |
| train_cifar_f1b1b_00006   TERMINATED     16      4   0.000147684              8       10           383.108    1.50765       0.4347 |
| train_cifar_f1b1b_00008   TERMINATED    128    256   0.0306227                8        2            97.2005   2.06562       0.206  |
| train_cifar_f1b1b_00009   TERMINATED      2     16   0.0286986                2        1           139.436    2.33758       0.098  |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=5904) [10,  4000] loss: 0.472
(func pid=5898) [6,  8000] loss: 0.322

Trial train_cifar_f1b1b_00007 finished iteration 10 at 2024-10-17 22:05:12. Total running time: 6min 42s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00007 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000009 |
| time_this_iter_s                                   30.5672 |
| time_total_s                                     396.65467 |
| training_iteration                                      10 |
| accuracy                                            0.5751 |
| loss                                               1.35308 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00007 saved a checkpoint for iteration 10 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2024-10-17_21-58-29/checkpoint_000009

Trial train_cifar_f1b1b_00007 completed after 10 iterations at 2024-10-17 22:05:12. Total running time: 6min 42s
(func pid=5904) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00007_7_batch_size=8,l1=256,l2=256,lr=0.0048_2024-10-17_21-58-29/checkpoint_000009)
(func pid=5898) [6, 10000] loss: 0.249

Trial train_cifar_f1b1b_00005 finished iteration 6 at 2024-10-17 22:05:20. Total running time: 6min 50s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000005 |
| time_this_iter_s                                  51.96922 |
| time_total_s                                     404.72486 |
| training_iteration                                       6 |
| accuracy                                            0.5364 |
| loss                                               1.29211 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00005 saved a checkpoint for iteration 6 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2024-10-17_21-58-29/checkpoint_000005
(func pid=5898) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2024-10-17_21-58-29/checkpoint_000005)
(func pid=5898) [7,  2000] loss: 1.251

Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2024-10-17 22:05:30. Total running time: 7min 0s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_f1b1b_00005   RUNNING         8     64   0.000353097              4        6           404.725    1.29211       0.5364 |
| train_cifar_f1b1b_00000   TERMINATED     16      1   0.00213327               2        1           173.299    2.30543       0.0993 |
| train_cifar_f1b1b_00001   TERMINATED      1      2   0.013416                 4        1            96.7157   2.30901       0.097  |
| train_cifar_f1b1b_00002   TERMINATED    256     64   0.0113784                2        1           190.342    2.32163       0.0957 |
| train_cifar_f1b1b_00003   TERMINATED     64    256   0.0274071                8        2           111.555    2.2556        0.155  |
| train_cifar_f1b1b_00004   TERMINATED     16      2   0.056666                 4        1            98.8246   2.31015       0.0961 |
| train_cifar_f1b1b_00006   TERMINATED     16      4   0.000147684              8       10           383.108    1.50765       0.4347 |
| train_cifar_f1b1b_00007   TERMINATED    256    256   0.00477469               8       10           396.655    1.35308       0.5751 |
| train_cifar_f1b1b_00008   TERMINATED    128    256   0.0306227                8        2            97.2005   2.06562       0.206  |
| train_cifar_f1b1b_00009   TERMINATED      2     16   0.0286986                2        1           139.436    2.33758       0.098  |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=5898) [7,  4000] loss: 0.619
(func pid=5898) [7,  6000] loss: 0.415
(func pid=5898) [7,  8000] loss: 0.308
(func pid=5898) [7, 10000] loss: 0.245
Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2024-10-17 22:06:00. Total running time: 7min 30s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_f1b1b_00005   RUNNING         8     64   0.000353097              4        6           404.725    1.29211       0.5364 |
| train_cifar_f1b1b_00000   TERMINATED     16      1   0.00213327               2        1           173.299    2.30543       0.0993 |
| train_cifar_f1b1b_00001   TERMINATED      1      2   0.013416                 4        1            96.7157   2.30901       0.097  |
| train_cifar_f1b1b_00002   TERMINATED    256     64   0.0113784                2        1           190.342    2.32163       0.0957 |
| train_cifar_f1b1b_00003   TERMINATED     64    256   0.0274071                8        2           111.555    2.2556        0.155  |
| train_cifar_f1b1b_00004   TERMINATED     16      2   0.056666                 4        1            98.8246   2.31015       0.0961 |
| train_cifar_f1b1b_00006   TERMINATED     16      4   0.000147684              8       10           383.108    1.50765       0.4347 |
| train_cifar_f1b1b_00007   TERMINATED    256    256   0.00477469               8       10           396.655    1.35308       0.5751 |
| train_cifar_f1b1b_00008   TERMINATED    128    256   0.0306227                8        2            97.2005   2.06562       0.206  |
| train_cifar_f1b1b_00009   TERMINATED      2     16   0.0286986                2        1           139.436    2.33758       0.098  |
+------------------------------------------------------------------------------------------------------------------------------------+

Trial train_cifar_f1b1b_00005 finished iteration 7 at 2024-10-17 22:06:06. Total running time: 7min 36s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000006 |
| time_this_iter_s                                  45.67289 |
| time_total_s                                     450.39776 |
| training_iteration                                       7 |
| accuracy                                            0.5543 |
| loss                                               1.23163 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00005 saved a checkpoint for iteration 7 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2024-10-17_21-58-29/checkpoint_000006
(func pid=5898) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2024-10-17_21-58-29/checkpoint_000006)
(func pid=5898) [8,  2000] loss: 1.196
(func pid=5898) [8,  4000] loss: 0.611
(func pid=5898) [8,  6000] loss: 0.394

Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2024-10-17 22:06:30. Total running time: 8min 0s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_f1b1b_00005   RUNNING         8     64   0.000353097              4        7           450.398    1.23163       0.5543 |
| train_cifar_f1b1b_00000   TERMINATED     16      1   0.00213327               2        1           173.299    2.30543       0.0993 |
| train_cifar_f1b1b_00001   TERMINATED      1      2   0.013416                 4        1            96.7157   2.30901       0.097  |
| train_cifar_f1b1b_00002   TERMINATED    256     64   0.0113784                2        1           190.342    2.32163       0.0957 |
| train_cifar_f1b1b_00003   TERMINATED     64    256   0.0274071                8        2           111.555    2.2556        0.155  |
| train_cifar_f1b1b_00004   TERMINATED     16      2   0.056666                 4        1            98.8246   2.31015       0.0961 |
| train_cifar_f1b1b_00006   TERMINATED     16      4   0.000147684              8       10           383.108    1.50765       0.4347 |
| train_cifar_f1b1b_00007   TERMINATED    256    256   0.00477469               8       10           396.655    1.35308       0.5751 |
| train_cifar_f1b1b_00008   TERMINATED    128    256   0.0306227                8        2            97.2005   2.06562       0.206  |
| train_cifar_f1b1b_00009   TERMINATED      2     16   0.0286986                2        1           139.436    2.33758       0.098  |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=5898) [8,  8000] loss: 0.298
(func pid=5898) [8, 10000] loss: 0.240

Trial train_cifar_f1b1b_00005 finished iteration 8 at 2024-10-17 22:06:51. Total running time: 8min 21s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000007 |
| time_this_iter_s                                  45.03539 |
| time_total_s                                     495.43314 |
| training_iteration                                       8 |
| accuracy                                            0.5636 |
| loss                                               1.22532 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00005 saved a checkpoint for iteration 8 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2024-10-17_21-58-29/checkpoint_000007
(func pid=5898) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2024-10-17_21-58-29/checkpoint_000007)
(func pid=5898) [9,  2000] loss: 1.148

Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2024-10-17 22:07:00. Total running time: 8min 31s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_f1b1b_00005   RUNNING         8     64   0.000353097              4        8           495.433    1.22532       0.5636 |
| train_cifar_f1b1b_00000   TERMINATED     16      1   0.00213327               2        1           173.299    2.30543       0.0993 |
| train_cifar_f1b1b_00001   TERMINATED      1      2   0.013416                 4        1            96.7157   2.30901       0.097  |
| train_cifar_f1b1b_00002   TERMINATED    256     64   0.0113784                2        1           190.342    2.32163       0.0957 |
| train_cifar_f1b1b_00003   TERMINATED     64    256   0.0274071                8        2           111.555    2.2556        0.155  |
| train_cifar_f1b1b_00004   TERMINATED     16      2   0.056666                 4        1            98.8246   2.31015       0.0961 |
| train_cifar_f1b1b_00006   TERMINATED     16      4   0.000147684              8       10           383.108    1.50765       0.4347 |
| train_cifar_f1b1b_00007   TERMINATED    256    256   0.00477469               8       10           396.655    1.35308       0.5751 |
| train_cifar_f1b1b_00008   TERMINATED    128    256   0.0306227                8        2            97.2005   2.06562       0.206  |
| train_cifar_f1b1b_00009   TERMINATED      2     16   0.0286986                2        1           139.436    2.33758       0.098  |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=5898) [9,  4000] loss: 0.588
(func pid=5898) [9,  6000] loss: 0.391
(func pid=5898) [9,  8000] loss: 0.294
Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2024-10-17 22:07:30. Total running time: 9min 1s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_f1b1b_00005   RUNNING         8     64   0.000353097              4        8           495.433    1.22532       0.5636 |
| train_cifar_f1b1b_00000   TERMINATED     16      1   0.00213327               2        1           173.299    2.30543       0.0993 |
| train_cifar_f1b1b_00001   TERMINATED      1      2   0.013416                 4        1            96.7157   2.30901       0.097  |
| train_cifar_f1b1b_00002   TERMINATED    256     64   0.0113784                2        1           190.342    2.32163       0.0957 |
| train_cifar_f1b1b_00003   TERMINATED     64    256   0.0274071                8        2           111.555    2.2556        0.155  |
| train_cifar_f1b1b_00004   TERMINATED     16      2   0.056666                 4        1            98.8246   2.31015       0.0961 |
| train_cifar_f1b1b_00006   TERMINATED     16      4   0.000147684              8       10           383.108    1.50765       0.4347 |
| train_cifar_f1b1b_00007   TERMINATED    256    256   0.00477469               8       10           396.655    1.35308       0.5751 |
| train_cifar_f1b1b_00008   TERMINATED    128    256   0.0306227                8        2            97.2005   2.06562       0.206  |
| train_cifar_f1b1b_00009   TERMINATED      2     16   0.0286986                2        1           139.436    2.33758       0.098  |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=5898) [9, 10000] loss: 0.234

Trial train_cifar_f1b1b_00005 finished iteration 9 at 2024-10-17 22:07:37. Total running time: 9min 7s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000008 |
| time_this_iter_s                                  45.61971 |
| time_total_s                                     541.05286 |
| training_iteration                                       9 |
| accuracy                                             0.565 |
| loss                                               1.21076 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00005 saved a checkpoint for iteration 9 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2024-10-17_21-58-29/checkpoint_000008
(func pid=5898) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2024-10-17_21-58-29/checkpoint_000008)
(func pid=5898) [10,  2000] loss: 1.139
(func pid=5898) [10,  4000] loss: 0.559

Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2024-10-17 22:08:00. Total running time: 9min 31s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_f1b1b_00005   RUNNING         8     64   0.000353097              4        9           541.053    1.21076       0.565  |
| train_cifar_f1b1b_00000   TERMINATED     16      1   0.00213327               2        1           173.299    2.30543       0.0993 |
| train_cifar_f1b1b_00001   TERMINATED      1      2   0.013416                 4        1            96.7157   2.30901       0.097  |
| train_cifar_f1b1b_00002   TERMINATED    256     64   0.0113784                2        1           190.342    2.32163       0.0957 |
| train_cifar_f1b1b_00003   TERMINATED     64    256   0.0274071                8        2           111.555    2.2556        0.155  |
| train_cifar_f1b1b_00004   TERMINATED     16      2   0.056666                 4        1            98.8246   2.31015       0.0961 |
| train_cifar_f1b1b_00006   TERMINATED     16      4   0.000147684              8       10           383.108    1.50765       0.4347 |
| train_cifar_f1b1b_00007   TERMINATED    256    256   0.00477469               8       10           396.655    1.35308       0.5751 |
| train_cifar_f1b1b_00008   TERMINATED    128    256   0.0306227                8        2            97.2005   2.06562       0.206  |
| train_cifar_f1b1b_00009   TERMINATED      2     16   0.0286986                2        1           139.436    2.33758       0.098  |
+------------------------------------------------------------------------------------------------------------------------------------+
(func pid=5898) [10,  6000] loss: 0.385
(func pid=5898) [10,  8000] loss: 0.288
(func pid=5898) [10, 10000] loss: 0.228

Trial train_cifar_f1b1b_00005 finished iteration 10 at 2024-10-17 22:08:22. Total running time: 9min 52s
+------------------------------------------------------------+
| Trial train_cifar_f1b1b_00005 result                       |
+------------------------------------------------------------+
| checkpoint_dir_name                      checkpoint_000009 |
| time_this_iter_s                                  45.12275 |
| time_total_s                                     586.17561 |
| training_iteration                                      10 |
| accuracy                                            0.5756 |
| loss                                               1.17761 |
+------------------------------------------------------------+
Trial train_cifar_f1b1b_00005 saved a checkpoint for iteration 10 at: (local)/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2024-10-17_21-58-29/checkpoint_000009

Trial train_cifar_f1b1b_00005 completed after 10 iterations at 2024-10-17 22:08:22. Total running time: 9min 52s

Trial status: 10 TERMINATED
Current time: 2024-10-17 22:08:22. Total running time: 9min 52s
Logical resource usage: 2.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:M60)
+------------------------------------------------------------------------------------------------------------------------------------+
| Trial name                status         l1     l2            lr     batch_size     iter     total time (s)      loss     accuracy |
+------------------------------------------------------------------------------------------------------------------------------------+
| train_cifar_f1b1b_00000   TERMINATED     16      1   0.00213327               2        1           173.299    2.30543       0.0993 |
| train_cifar_f1b1b_00001   TERMINATED      1      2   0.013416                 4        1            96.7157   2.30901       0.097  |
| train_cifar_f1b1b_00002   TERMINATED    256     64   0.0113784                2        1           190.342    2.32163       0.0957 |
| train_cifar_f1b1b_00003   TERMINATED     64    256   0.0274071                8        2           111.555    2.2556        0.155  |
| train_cifar_f1b1b_00004   TERMINATED     16      2   0.056666                 4        1            98.8246   2.31015       0.0961 |
| train_cifar_f1b1b_00005   TERMINATED      8     64   0.000353097              4       10           586.176    1.17761       0.5756 |
| train_cifar_f1b1b_00006   TERMINATED     16      4   0.000147684              8       10           383.108    1.50765       0.4347 |
| train_cifar_f1b1b_00007   TERMINATED    256    256   0.00477469               8       10           396.655    1.35308       0.5751 |
| train_cifar_f1b1b_00008   TERMINATED    128    256   0.0306227                8        2            97.2005   2.06562       0.206  |
| train_cifar_f1b1b_00009   TERMINATED      2     16   0.0286986                2        1           139.436    2.33758       0.098  |
+------------------------------------------------------------------------------------------------------------------------------------+

Best trial config: {'l1': 8, 'l2': 64, 'lr': 0.0003530972286268149, 'batch_size': 4}
Best trial final validation loss: 1.1776135039269924
Best trial final validation accuracy: 0.5756
(func pid=5898) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2024-10-17_21-58-29/train_cifar_f1b1b_00005_5_batch_size=4,l1=8,l2=64,lr=0.0004_2024-10-17_21-58-29/checkpoint_000009)
Files already downloaded and verified
Files already downloaded and verified
Best trial test set accuracy: 0.5846

If you run the code, an example output could look like this:

Number of trials: 10/10 (10 TERMINATED)
+-----+--------------+------+------+-------------+--------+---------+------------+
| ... |   batch_size |   l1 |   l2 |          lr |   iter |    loss |   accuracy |
|-----+--------------+------+------+-------------+--------+---------+------------|
| ... |            2 |    1 |  256 | 0.000668163 |      1 | 2.31479 |     0.0977 |
| ... |            4 |   64 |    8 | 0.0331514   |      1 | 2.31605 |     0.0983 |
| ... |            4 |    2 |    1 | 0.000150295 |      1 | 2.30755 |     0.1023 |
| ... |           16 |   32 |   32 | 0.0128248   |     10 | 1.66912 |     0.4391 |
| ... |            4 |    8 |  128 | 0.00464561  |      2 | 1.7316  |     0.3463 |
| ... |            8 |  256 |    8 | 0.00031556  |      1 | 2.19409 |     0.1736 |
| ... |            4 |   16 |  256 | 0.00574329  |      2 | 1.85679 |     0.3368 |
| ... |            8 |    2 |    2 | 0.00325652  |      1 | 2.30272 |     0.0984 |
| ... |            2 |    2 |    2 | 0.000342987 |      2 | 1.76044 |     0.292  |
| ... |            4 |   64 |   32 | 0.003734    |      8 | 1.53101 |     0.4761 |
+-----+--------------+------+------+-------------+--------+---------+------------+

Best trial config: {'l1': 64, 'l2': 32, 'lr': 0.0037339984519545164, 'batch_size': 4}
Best trial final validation loss: 1.5310075663924216
Best trial final validation accuracy: 0.4761
Best trial test set accuracy: 0.4737

Most trials have been stopped early in order to avoid wasting resources. The best-performing trial achieved a validation accuracy of about 47%, which could be confirmed on the test set.
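The early stopping above is the work of the ``ASHAScheduler`` configured earlier in the tutorial: trials are compared at geometrically spaced rungs, and only the better-performing fraction is promoted to train longer. The following is a minimal, self-contained sketch of this successive-halving idea, with a made-up "quality" per trial and a reduction factor of 2; it is an illustration of the concept, not Ray Tune's actual implementation:

```python
import random

def successive_halving(trials, max_iters=10, reduction_factor=2):
    """Simulate ASHA-style early stopping: at each rung, only the
    best 1/reduction_factor of the surviving trials may continue."""
    random.seed(0)
    # Each trial gets a fixed, made-up "quality" standing in for its loss.
    quality = {t: random.uniform(1.0, 2.4) for t in trials}
    alive = list(trials)
    iters_used = {t: 0 for t in trials}
    for rung in range(1, max_iters + 1):
        for t in alive:
            iters_used[t] += 1  # every surviving trial trains one more epoch
        # Rungs at 1, 2, 4, 8 iterations (geometric spacing, base 2).
        if rung in (1, 2, 4, 8):
            alive.sort(key=lambda t: quality[t])  # lower loss is better
            alive = alive[: max(1, len(alive) // reduction_factor)]
    return iters_used, alive

iters_used, survivors = successive_halving([f"trial_{i}" for i in range(10)])
total = sum(iters_used.values())
# Only 25 trial-epochs are spent instead of the 100 a full grid run would need.
print(f"iterations spent: {total} instead of {10 * 10}")
print("survivors:", survivors)
```

With 10 trials and 10 iterations each, the full budget would be 100 trial-epochs; the halving schedule spends only a quarter of that, which mirrors why most of the trials in the table above terminated after 1 or 2 iterations.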

So that's it! You can now tune the parameters of your PyTorch models.

Total running time of the script: (10 minutes 10.049 seconds)

Gallery generated by Sphinx-Gallery
