Deep Convolutional Generative Adversarial Networks
Deep Convolutional Generative Adversarial Networks (DCGANs) are a class of GANs built on convolutional neural networks, and one of the first approaches that made GANs stable and usable for learning features from images in an unsupervised setting. This project is a port of pytorch/examples/dcgan. At the end of this example you will be able to use DCGANs to generate images from your own dataset.
In this guide we will train a PyTorch model on an unsupervised task and use it to generate images from an input vector Z (drawn from a 100-dimensional uniform distribution). Finally, we will deploy the trained model as a REST endpoint that can be used to generate images in real time.
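To make the generator idea concrete, here is a minimal sketch of a DCGAN generator in PyTorch, following the layer sizes from the original DCGAN paper. The repository's `main.py` may differ in its details; `nz`, `ngf`, and `nc` are the conventional names for the latent size, feature-map width, and channel count.

```python
import torch
import torch.nn as nn

# Minimal sketch of a DCGAN generator (sizes follow the original paper's
# defaults; the repository's main.py may differ in detail).
class Generator(nn.Module):
    def __init__(self, nz=100, ngf=64, nc=3):
        super().__init__()
        self.main = nn.Sequential(
            # Z (nz x 1 x 1) -> (ngf*8) x 4 x 4
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # -> (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # -> (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # -> ngf x 32 x 32
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # -> nc x 64 x 64, values squashed into [-1, 1]
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.main(z)

# A batch of 100-dimensional noise vectors produces 64x64 RGB images.
z = torch.randn(8, 100, 1, 1)
images = Generator()(z)
print(images.shape)  # torch.Size([8, 3, 64, 64])
```

Each transposed convolution doubles the spatial resolution, turning a 1x1 latent point into a 64x64 image.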
```
$ git clone https://github.com/floydhub/dcgan.git
$ cd dcgan
$ floyd init dcgan
```
For this project we will train on the Labeled Faces in the Wild (LFW) dataset. Since we are learning in an unsupervised regime there is no train/dev/test split; instead we track the BCE losses of the Discriminator and the Generator (the two players in the minimax game from game theory) as training metrics, and we can visualize the generated images after every epoch. A preprocessed version of this dataset is available publicly on FloydHub.
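The two BCE losses that drive the minimax game can be sketched as below. This is a simplified illustration, not the repository's exact training loop; `netD` and `netG` stand in for the discriminator and generator networks.

```python
import torch
import torch.nn as nn

# Sketch of the BCE losses behind the minimax game. netD and netG are
# stand-ins for the discriminator and generator; the repo's main.py
# training loop may differ in detail.
criterion = nn.BCELoss()

def gan_losses(netD, netG, real_images, nz=100):
    batch = real_images.size(0)
    real_labels = torch.ones(batch)
    fake_labels = torch.zeros(batch)

    # Discriminator objective: maximize log D(x) + log(1 - D(G(z))),
    # i.e. minimize BCE against "real" on real images and "fake" on fakes.
    d_real = criterion(netD(real_images).view(-1), real_labels)
    noise = torch.randn(batch, nz, 1, 1)
    fake = netG(noise)
    d_fake = criterion(netD(fake.detach()).view(-1), fake_labels)
    d_loss = d_real + d_fake

    # Generator objective (non-saturating form): maximize log D(G(z)),
    # i.e. minimize BCE of the fakes against the "real" label.
    g_loss = criterion(netD(fake).view(-1), real_labels)
    return d_loss, g_loss
```

In practice each step alternates: update the discriminator on `d_loss`, then the generator on `g_loss`; these are the two curves you will see in the training logs.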
You can train the DCGAN model by running the main.py script with the required parameters. Below is the command to start the training job on Floyd:
```
$ floyd run \
    --gpu \
    --env pytorch-0.2 \
    --data redeipirati/datasets/lfw/1:lfw \
    "python main.py --dataset lfw --dataroot /lfw --outf trained_models --cuda --ngpu 1 --niter 20"
```
- `--gpu` ensures the job will run on a GPU server.
- `--env pytorch-0.2` ensures the job will be run in FloydHub's PyTorch 0.2 environment.
- The input dataset is passed using `--data redeipirati/datasets/lfw/1:lfw`. This mounts the pre-processed LFW dataset (available on FloydHub under user @redeipirati's account) at `/lfw`. Also note that the script is given `--dataroot /lfw`, the path where the data will be available. This ensures that the code knows where to find the dataset.
This job takes about 20 minutes to run and generates a model. You can follow along with the progress by using the logs command. If you run the model with the default values, it will take about 1-5 minutes per epoch.
$ floyd logs <JOB_NAME> -t
Floyd saves any content stored in the working directory (`/floyd/home`) at the end of the job. This output can be used as a datasource in the next project. To get the name of the output generated by your job, use the info command:
$ floyd info <JOB_NAME>
To evaluate your model you can run the generate.py script included in the GitHub repository. The script needs the path of a checkpoint Generator model, which you pass to the script using its `--netG` flag. Just make sure that you mount the checkpoint Generator model at the path you pass to the `--netG` flag; below, we mount the model at `/model`. You can also provide a serialized Z vector (`--Zvector`) to experiment with latent-space arithmetic and analyze the semantic information encoded during training.
```
floyd run \
    --gpu \
    --env pytorch-0.2 \
    --data <REPLACE_WITH_JOB_NAME>:model \
    "python generate.py --netG /model/trained_models/<REPLACE_WITH_MODEL_CHECKPOINT_PATH> --ngpu 1 --cuda"

# Provide a serialized Zvector
floyd run \
    --gpu \
    --env pytorch-0.2 \
    --data <REPLACE_WITH_JOB_NAME>:model \
    "python generate.py --netG /model/trained_models/<REPLACE_WITH_MODEL_CHECKPOINT_PATH> --Zvector /model/trained_models/<REPLACE_WITH_SERIALIZED_Z_VECTOR_PATH> --ngpu 1 --cuda"
```
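The `--Zvector` flag expects a file serialized with `torch.save`. A minimal sketch of producing one (the batch size of 64 is an arbitrary choice; the shape `(batch, 100, 1, 1)` matches DCGAN's 100-dimensional noise input):

```python
import torch

# Create and serialize a latent Z vector that generate.py can consume
# via --Zvector. Shape (batch, 100, 1, 1) matches the 100-dimensional
# noise input; the batch size of 64 is an arbitrary choice.
zvector = torch.randn(64, 100, 1, 1)
torch.save(zvector, 'zvector.pth')

# Reload to verify the round-trip.
restored = torch.load('zvector.pth')
print(restored.shape)  # torch.Size([64, 100, 1, 1])
```

Upload the resulting file as a FloydHub dataset (or include it in a mounted output) so generate.py can read it from the mounted path.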
You can track the status of the run with the status or logs command.
```
$ floyd status <JOB_NAME>
$ floyd logs <JOB_NAME> -t
```
Improving your model
You may notice that the output does not look great. In fact, the algorithm has not yet learned how to represent a face well, because we ran the training for only a small number of iterations. To train a fully working model, try the training step again, this time setting the `--niter` flag to a larger value, such as 300. In general, about 300-500 epochs (or even many more) are necessary to obtain an acceptable model. (Note: this takes a few hours to run on a GPU instance!)
Keep in mind that no class of generative networks is yet stable or production-ready; this is an exciting field of research, and everyone can contribute new ideas.
Evaluate pre-trained models
If you want to try out a pre-trained model, FloydHub hosts a public dataset with the checkpoints for this project (trained for 300 epochs), located here. You can mount it with:
```
floyd run \
    --gpu \
    --env pytorch-0.2 \
    --data redeipirati/datasets/dcgan-300-epochs-models/1:model \
    "python generate.py --netG /model/netG_epoch_299.pth --ngpu 1 --cuda"
```
This model should perform better than the previous one. You can also provide the `--Zvector` parameter to explore the latent Z-vector landscape. The Z vector we used to evaluate our model is included in the attached dataset:
```
floyd run \
    --gpu \
    --env pytorch-0.2 \
    --data redeipirati/datasets/dcgan-300-epochs-models/1:model \
    "python generate.py --netG /model/netG_epoch_299.pth --Zvector /model/zvector.pth --ngpu 1 --cuda"
```
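One simple way to explore the latent landscape is to interpolate linearly between two latent points and feed each intermediate vector to the generator, which produces a smooth morph between two generated faces. A minimal sketch (the step count of 8 is an arbitrary choice):

```python
import torch

# Linear interpolation between two latent points. Feeding each row of
# z_path to the generator morphs smoothly between two generated faces.
z_start = torch.randn(1, 100, 1, 1)
z_end = torch.randn(1, 100, 1, 1)

steps = 8  # arbitrary number of interpolation steps
alphas = torch.linspace(0.0, 1.0, steps)
z_path = torch.cat([(1 - a) * z_start + a * z_end for a in alphas])
print(z_path.shape)  # torch.Size([8, 100, 1, 1])

# torch.save(z_path, 'zvector.pth')  # then pass it via --Zvector
```

Saving `z_path` with `torch.save` gives you a file you can hand to generate.py through `--Zvector`.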
Serve the Model with a REST API
If you run a job with the `--mode serve` flag, FloydHub will run the app.py file in your project and attach it to a dynamic service endpoint:
```
floyd run \
    --gpu \
    --mode serve \
    --env pytorch-0.3 \
    --data <REPLACE_WITH_JOB_NAME>:input
```
The above command will print out a service endpoint for this job in your terminal console.
The service endpoint will take a couple of minutes to become ready. Once it's up, you can interact with the model by sending a serialized Z-vector file with a POST request, or simply generate images from random noise with a GET request (you can also use the `ckp` parameter to choose a specific checkpoint):
```
# Example of a GET request
curl -X GET -o <NAME_&_PATH_DOWNLOADED_IMG> -F "ckp=<MODEL_CHECKPOINT>" <SERVICE_ENDPOINT>
curl -X GET -o prova.png -F "ckp=netG_epoch_99.pth" https://www.floydlabs.com/serve/redeipirati/projects/dcgan

# Example of a POST request
curl -X POST -o <NAME_&_PATH_DOWNLOADED_IMG> -F "file=@<ZVECTOR_SERIALIZED_PATH>" <SERVICE_ENDPOINT>
curl -X POST -o prova.png -F "file=@/parameter/zvector.pth" https://www.floydlabs.com/serve/redeipirati/projects/dcgan
```
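The same calls can be made from Python. The sketch below uses the `requests` library; the endpoint URL is a placeholder for the one printed by `floyd run --mode serve`, and the function names are illustrative.

```python
import requests

# Sketch of the GET/POST calls above using the requests library.
# The endpoint argument is the URL printed by `floyd run --mode serve`.

def fetch_random_image(endpoint, out_path, checkpoint="netG_epoch_99.pth"):
    """GET: generate an image from random noise, pinning a checkpoint."""
    resp = requests.get(endpoint, data={"ckp": checkpoint})
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)

def fetch_image_from_zvector(endpoint, zvector_path, out_path):
    """POST: generate an image from a serialized Z-vector file."""
    with open(zvector_path, "rb") as f:
        resp = requests.post(endpoint, files={"file": f})
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)

# Example usage (requires a live endpoint):
# fetch_random_image("https://www.floydlabs.com/serve/<USER>/projects/dcgan",
#                    "generated.png")
```

The response body is the generated PNG, so the helpers simply stream it to disk.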
Any job running in serve mode will stay up until it reaches its maximum runtime, so once you are done testing, remember to shut down the job.
In the original paper, the model was trained on the LSUN dataset, and the learned features were then used to perform an image classification task on the CIFAR-10 dataset. DCGAN was one of the first stable GAN-based models, and among the first to learn features from images in an unsupervised regime. GANs are an extremely active research area because they can provide an unlimited amount of high-quality data, which is necessary for training deep learning models.
Help make this document better
This guide, as well as the rest of our docs, is open-source and available on GitHub. We welcome your contributions.
- Suggest an edit to this page (by clicking the edit icon at the top next to the title).
- Open an issue about this page to report a problem.