Machine Learning for generative painting informed by visual arts
Investigating painting techniques for designing Machine Learning pipelines of generative painting

Master’s thesis in Computer science and engineering

CHAOMING WANG

Department of Computer Science and Engineering
CHALMERS UNIVERSITY OF TECHNOLOGY
UNIVERSITY OF GOTHENBURG
Gothenburg, Sweden 2023

© CHAOMING WANG, 2023.

Supervisor: Kıvanç Tatar, Department of Computer Science and Engineering
Examiner: Palle Dahlstedt, Department of Computer Science and Engineering

Master’s Thesis 2023
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
SE-412 96 Gothenburg
Telephone +46 76 397 6889

Typeset in LaTeX
Gothenburg, Sweden 2023

Abstract

Visual art practice is a complicated, varied, creative process based on the artist’s style and preferences. Although many studies have attempted to apply artificial intelligence techniques to art production and statistical analysis, there is still significant scope for exploring how to incorporate the techniques in visual arts practices into generative painting pipelines using Machine Learning. This thesis applies machine learning to analyzing painting techniques in painting practices with a research-through-design approach.
The problem is mainly presented as tasks such as segmentation of artworks (in this thesis, paintings), stroke prediction, and the presentation of painting processes based on different painting techniques through different algorithmic pipelines. The results show that most segmentation models trained on photographs are challenging to apply directly to the segmentation of artwork components, and relevant improvement solutions are discussed in Chapter 6. In addition, due to the diverse presentation of painting art, this thesis presents different painting techniques based on foreground and background segmentation and ’blocking-in’ techniques based on line detection. It discusses the possibility of transferring these painting processes to other painting processes.

Keywords: painting techniques, visual art, image segmentation, machine learning.

Acknowledgements

Completing this thesis would not have been possible without the help of many people. I extend my sincere gratitude to everyone who has contributed to my thesis, in ways both big and small. I am truly fortunate to have such a supportive network of people around me, and I am grateful for the opportunity to have undertaken this research.

I would like to express my sincere gratitude to my supervisor Kıvanç Tatar, for his patience, guidance, and support. I have gained a lot during the project, both in the task-related topics and in the ability to perceive and appreciate art. Most importantly, I received professional and constructive guidance on doing research. I am truly grateful for their mentorship and for pushing me to achieve my best work.

I would also like to express my gratitude to the esteemed professors at Chalmers, whose teachings and guidance have shaped my research and contributed significantly to my intellectual growth.
In particular, I would like to express my gratitude to Lennart Svensson, the professor of the deep learning course, for his extensive coverage of the field of deep learning and generative art; I found ideas to address many of the problems in this research by asking him for advice after class. In addition, I would like to thank other researchers in the generative arts and machine learning community whose related research has inspired my continued exploration in this area.

Finally, I am deeply grateful to my family and friends, who have been a constant source of support and encouragement during this challenging journey. Their unwavering belief in me has given me the motivation to continue working towards my goals, even when the going gets tough. I thank them for always being there for me, for listening to my ideas, and for their unwavering love and support.

Chaoming Wang, Gothenburg, April 2023

List of Acronyms

Below is the list of acronyms that have been used throughout this thesis, listed in order of appearance:

DALLE    Neural network created by OpenAI that can generate images from natural language descriptions
CNN      Convolutional neural network
GAN      Generative adversarial network
VQ-VAE   Vector quantized variational autoencoders
LDMs     Latent diffusion models
JPG      Joint photographic experts group
PNG      Portable network graphics
RGB      Color space in (Red, Green, Blue) format
HSV      Color space in (Hue, Saturation, Value) format
CMYK     Color space in (Cyan, Magenta, Yellow, Key) format
RtD      Research method of "Research-through-Design"
HCI      Human computer interaction
SNIC     Swedish national infrastructure for Computing
DCGAN    Deep convolutional generative adversarial network
DRAM     Diverse realism in Art Movements
ASPP     Atrous spatial pyramid pooling
KDE      Kernel density estimation
GMM      Gaussian mixture model
MSE      Mean squared error
LMSE     Local mean square error
ETF      Edge tangent flow
MOS      Mean opinion score

Nomenclature

Below is the nomenclature of indices, sets, parameters, and variables
that have been used throughout this thesis.

Indices
t                     Index for time step

Sets
C_0                   Empty canvas at initial time step
C_t                   Progressively generated canvas at time step t
C̃                     Reference artworks shown in canvas

Parameters
X_t                   Vector of parameters estimated in iteration t
µ                     Learning rate for optimization process
S_map, S_shape        Matrix of boolean values
Color_map             Matrix of optimized pixel values
Color_comp            Computed matrix of values for pixel color
w_t^s                 Weights for strokes at time step t
v_t^c                 Previous canvas state at time step t
s_t                   Weighted sum of generated strokes at time step t
P(x)                  Probability density of the input image
w_k                   Weight for each component in clustering
N(x|µ_k, θ_k)         Each Gaussian component with mean µ_k and covariance θ_k
G_x(x, y), G_y(x, y)  Horizontal and vertical gradients
N                     Number of local units of differences
R_f, G_f              Features extracted from referenced images and generated images, respectively
H(x, y)               The 2-order differential value at pixel [x, y]
M(x, y)               Magnitude of 2-order differential value at pixel [x, y]

Contents

List of Acronyms
Nomenclature
List of Figures
List of Tables

1 Introduction
  1.1 Background
  1.2 Motivations
  1.3 Research Questions
  1.4 Scope and Limitations
  1.5 Clarification of the issue
    1.5.1 Painting techniques
    1.5.2 Strokes input
    1.5.3 Baseline models
  1.6 Research Methodology
  1.7 Sustainability and Ethics
  1.8 Contributions
  1.9 Structure of the thesis
2 Theory
  2.1 Image Segmentation
    2.1.1 Art painting segmentation
    2.1.2 Using domain adaption in segmentation
  2.2 Stroke-based Painting
  2.3 Art Perception in Painting Process
  2.4 Color Space of Images
3 Methodology
  3.1 Research-through-design method
    3.1.1 Stroke Estimation Using Machine Learning
    3.1.2 Searching Strategy for Brush Size
  3.2 Platforms and Tools
  3.3 Dataset
4 Experiments
  4.1 Ablation Study: Segmentation techniques on art paintings
    4.1.1 K-means clustering on HSV/RGB images
    4.1.2 DeepLabV3
    4.1.3 DeepLabV3 with domain adaption
  4.2 Brush Stroke Models
    4.2.1 Circle Brush Strokes
    4.2.2 Template-based Brush Strokes
  4.3 Hyper-parameters for the training process
  4.4 Generate Brush Stroke Centroids
    4.4.1 GMM-based Density Estimation
    4.4.2 Gradient-based Density Map
  4.5 Estimate Parameters of Brush Strokes
    4.5.1 Rotation Angle (θ) Estimation
    4.5.2 Lengths (L1, L2, W1, W2) Estimation
  4.6 Render Strokes
  4.7 Experiments I: "Details first" vs. "Background First" in painting process
    4.7.1 Configurated pipelines
    4.7.2 Complete "voids"
  4.8 Experiments II: Iterative design for improvements
    4.8.1 Application of neural networks
    4.8.2 Explore on abstract paintings
5 Results and Analysis
  5.1 Evaluations
    5.1.1 Quantitative Evaluation
    5.1.2 Visual Analysis
  5.2 Segmentation Results
    5.2.1 K-means clustering
    5.2.2 DeepLabV3
    5.2.3 Using Domain Adaption
  5.3 Density Map Estimations
  5.4 Sampling results given density maps
  5.5 Stroke-based Painting Results
    5.5.1 Procedural Painting Processes
    5.5.2 Test results on abstract paintings
6 Conclusion
  6.1 Discussions on Sub-tasks
    6.1.1 Segmentation techniques for art painting
    6.1.2 Impacts of brush strokes
  6.2 Limitations
    6.2.1 Segmentation on art painting
    6.2.2 Art perception
    6.2.3 Absence of procedural information
  6.3 Future work
    6.3.1 Separate perception and rendering
    6.3.2 Modifications for generating abstract paintings
Bibliography

List of Figures

1.1 Painting process with the blocking-in first approach.
1.2 Painting process with Alla Prima approach.
3.1 Proposed painting pipeline.
3.2 Examples of art paintings in DRAM dataset.
4.1 Input RGB images for K-means clustering algorithms.
4.2 The visualization of value distributions for input images in RGB and HSV color space.
4.3 Segmentation results using RGB and HSV color space of images.
4.4 The examples of input images for DeepLabV3 models.
4.5 Sample results for DeepLabV3 segmentation.
4.6 Illustrations of Stroke models.
4.7 Example of computing and applying brush strokes in pixel-wise canvas.
4.8 Generated sample stroke.
5.1 K-means clustering results of art paintings of animals ("a fox in the wild") in Realism, K=2.
5.2 Segmentation results of DeepLabV3 (ResNet_101) on paintings of multi-portraits.
5.3 Segmentation results of DeepLabV3 (ResNet 101) on paintings of landscapes.
5.4 Results of sampling stroke centroids with different values of p_max.
5.5 The painting process of a portrait example using rectangular brush strokes.
5.6 Painting process on an abstract painting.
5.7 Illustration of painting process of a portrait with paint_mode = "f-b".

List of Tables

3.1 The content details of DRAM dataset.
4.1 Hyperparameters for initial system design of painting process.
5.1 Sample test images for segmentation.
5.2 Extracted foreground by using K-means clustering (K = 2) method.
5.3 Extracted foreground by using DeepLabV3 (ResNet 101).
5.4 Extracted foreground by using DeepLabV3 (ResNet101) with domain adaption.
5.5 Test results of GMM-based density estimation and gradient-based density estimation on 100 art paintings.

1 Introduction

Visual art practices involve all art forms that convey the ideas and emotions of artists in visual presentations. They are diverse, depending on painting styles and the technique preferences of artists. Artists can employ various painting techniques while creating visual artworks, whether acquired through training or by natural inclination. Unfortunately, the mainstream recognition of painting techniques varies greatly depending on factors such as categorization and art genres. For stroke-based paintings, for instance, one existing definition divides painting techniques into seven categories based on the order in which the regions of interest move during the art-creating process¹. However, other definitions of painting techniques are possible. For instance, painting techniques can involve different brush stroke appearances, including shapes and textures.
The two painting processes shown in Figure 1.1² and Figure 1.2³ demonstrate the different pipelines of creating artworks with the blocking-in and Alla Prima techniques, respectively. With the blocking-in approach, artists choose dominant colors and loosely paint them first on the whole canvas, so the painting process is segmented by color and tone blocks. Usually, the background and foreground items are separated. In Alla Prima, a quick painting technique, artists instead paint the background and foreground all at once; they do not contour the lines and edges as in blocking-in but lay every stroke in the same colors.

Figure 1.1: Painting process with the blocking-in first approach.

Figure 1.2: Painting process with Alla Prima approach.

¹ The description of these painting techniques is from www.liveabout.com: https://www.liveabout.com/techniques-for-creating-a-painting-2578799
² Figures were captured from a YouTube video by Michael James Smith: https://www.youtube.com/watch?v=dywD2D-GCS8&t=13s
³ Figures were captured from a YouTube video by Florent Farges - arts: https://www.youtube.com/watch?v=JYSdKQ31sqA

Generative art [1, 2, 3] has been prominent in various popular culture industries such as film-making, music, games, robotics, and Net Art [4]. As an ever-diversifying art practice that combines creative ideas with autonomous systems, generative art fosters discussions in modern aesthetics and art theory. The prosperity of AI technologies has specifically shed light on ideas that use machine learning to create digital artworks. For example, in 2022, an image made with Midjourney, an artificial intelligence (AI) system that, like DALLE, can produce detailed images when fed written prompts, won first prize at the Colorado State Fair Fine Arts Competition [5].
More explorations of AI art systems inspire research on combining generative art with natural visual art practices using machine learning, including AARON [6] by Harold Cohen as early as 1973, and recent initiatives such as DeepDream [7] by engineers at Google, StyleGAN [8] by NVIDIA, and Stable Diffusion.

1.1 Background

Among various forms of artistic presentation, visual arts have become mainstream thanks to their wide range of application scenarios in industry, including paradigmatic forms such as painting and other forms related to mass media such as film-making, dance, and photography. Differences in aesthetic perceptions and technical preferences among artists contribute to the diversity of visual artworks. In traditional paintings, for example, the preference for materials such as canvas, wood, and porcelain and for tools ranging from pigments and watercolor to charcoal, which can be seen as ’techniques’ in the creation of art, can produce unique aesthetics. Apart from that, the artist’s actions also have an impact on the aesthetics of the resulting artworks, such as the order and direction of lines in sketching or the distribution and blending of pigments in watercolors.

Recently, visual art practices have gained more flexibility in combining artists’ creativity with computer technologies. Whereas comprehension and appreciation of art are still considered to be exclusively human capabilities, recent advances in machine learning, computer graphics, and digital image processing have enabled computational analysis of artworks using AI technologies. An increasing number of large-scale digitized artwork collections gives opportunities to analyse the oeuvres and history of visual art. In particular, convolutional neural networks (CNNs) have boosted machine intelligence in large-scale classification and synthetic understanding of artworks.
On the other hand, these online available datasets of artworks enable more applications that produce creative practices using AI technologies. In the past decades, research on generative models, especially generative adversarial networks (GANs), inspired various explorations of making machines generate images given a set of visual art examples.

Links for the systems cited above: Midjourney https://midjourney.gitbook.io/docs/, DALLE https://openai.com/blog/dall-e/, Stable Diffusion https://stablediffusionweb.com/

Painting, as a visual art form predominantly presented in two-dimensional space, clearly serves as a breakthrough point in the attempt to combine AI technologies and artistic creation. Because more powerful computers allow computation on large datasets of very high-resolution images, many researchers have proposed making intelligent machines create artworks. A remarkable milestone among all these efforts is style transfer using CNN frameworks, which was first introduced in 2016 by Leon A. Gatys et al. [9], aiming to generate stylized images by separating the content and style of images. This initiative triggered growing interest in style transfer models and stylized painting methods. In terms of technological frameworks, subsequent discussions in this field fall into three categories: stroke-based rendering, image analogy, and image filtering.
Another field of creating novel visual content using AI technologies is image generation from text. Here, not only GAN-based architectures such as StackGAN [10] have emerged, but also multi-modal applications based on transformer frameworks; transformers were first introduced as natural language models for tasks such as text classification and machine translation [11] and have since been used successfully for text-to-image generation [12, 13]. One state-of-the-art transformer-based model that outperforms previous GAN-based models is CogView [14], which combines transformers with the framework of Vector Quantized Variational AutoEncoders (VQ-VAE) [15] to improve large-scale image generation from text by reducing fidelity losses [14]. In 2022, another remarkable AI invention for image generation was Stable Diffusion [16], based on Latent Diffusion Models (LDMs), which achieved competitive performance on image generation and text-to-image synthesis by applying diffusion models in the latent space of powerful pretrained autoencoders [17].

The growing attempts to combine AI technologies and visual arts have enriched approaches to understanding and appreciating visual artworks [18] and created efficient ways to generate artworks from textual or audio descriptions. Nevertheless, the details of the creative process, in terms of how painting techniques play roles in creating novel artworks using AI techniques, are still enigmatic. We can appreciate AI-generated visual artworks by evaluating the semantic synthesis between input information and generated results, and analyse the history and context of artworks in a computational manner. However, the painting techniques used in creative processes are necessary for analysing the dynamics of AI-generated visual arts, especially in the context of explaining the novelty of AI-generated artworks.
1.2 Motivations

As initiatives using AI art in industry grow, such as online exhibitions and interactive visual arts, AI technologies, especially deep learning with domain-specific neural networks, have empowered artists to create artworks in novel forms. However, most attention of researchers, artists, and viewers has centered on the resulting visual presentation of AI art, and how the knowledge in visual arts practices can be integrated into artistic creation processes using AI technologies is far from sufficiently explored [19]. Therefore, this project attempts to explore this integration. A promising outlook is to practically enrich the context of image-to-text research. For example, whereas recent work in image-to-text synthesis mostly concerns generating semantic descriptions of scenes, our work can foster subsequent research on generating aesthetic remarks for artworks. Once AI machines gain the capability of aesthetically analysing visual artworks, it becomes possible to make full use of the advances of machines in large-scale computing to create novel techniques for visual art creation.

The project explores how painting techniques can play a role in the architecture design of AI art technologies. Specifically, we propose generative painting systems that are customized for different kinds of painting techniques and their final artworks given some content images. Since painting techniques are usually presented in textual form, the system should be constructed on a multi-modal basis that can tackle problems such as extracting painting techniques from text and generating stroke-based images given a set of content images.
The desired outcome is a multi-modal architecture that takes a textual description of painting techniques and a set of referenced context images as input and generates stroke-based artworks, which can illustrate the impact of different painting techniques on artistic creation. The system is expected to be applicable to different painting techniques and to bring insights for visual arts creation.

1.3 Research Questions

The project aims to inform the generative system design with knowledge of the artistic process and machine learning technology. The research question is therefore how to design a procedural painting pipeline that can integrate painting techniques from artists.

Sub-tasks related to evaluation and generalization should be solved in the final stage of designing a generative painting pipeline. One sub-task in this research question is how to evaluate the quality of the generated paintings. Specifically, we decide on a way of quantitatively analysing the generated paintings as a complement to visual analysis. Regarding generalization, the generative painting pipeline was designed for specific painting practices; however, it aims to characterize any typical style of artwork and different painting techniques. The research therefore involves a discussion of possible improvements for generalization.

Specific principles and requirements were clarified within the research question. Since the procedural information about how artists create art paintings is unavailable from a dataset of images, there are diverse ways of translating procedural painting techniques into computational modules in the design of the generative painting pipeline. The design should therefore be clarified with fundamental assumptions. While designing the painting pipeline, specific machine learning models and techniques that are effective for the modular functions should be investigated, and the motivations for selecting these machine learning techniques should be explained. For instance, we compare the performance of the generative system with approaches based on neural networks and on parameter optimization.

1.4 Scope and Limitations

Recent advances in machine learning have led to interdisciplinary artists and researchers employing various machine learning techniques in the artistic creation of visual arts, especially in style transfer [20, 21, 22]. However, the problem of integrating the painting techniques of an artist with machine learning models is less addressed [23]. Therefore, one delimitation is that few principles of designing a generative painting pipeline involve the natural art-creating processes.

Another delimitation is defining the painting techniques that can be used to integrate the artistic ideas and methods in easel painting processes with machine learning models. Painting techniques, as manual and procedural elements in painting processes, are usually diversely defined regarding material choices, stroke textures, and the order in which items in a painting are worked on. The painting techniques used in this work are mainly procedural techniques involved in the creation of art paintings, since the generative process is highly modular and flow-oriented [24]. We assumed that all painting processes focus on one canvas region at a time, so that they can be simulated by a sequence of foreground patches extracted from image segmentations. For example, the order of object placement in a painting expresses the artist’s idea of procedural painting techniques such as "blocking in", in which artists place the main edges of colors and shapes [25], and "detail first" (to start with the details on the foreground objects) or "underpainting" (to overlay the colors from background to foreground items) [26].
Moreover, various brush stroke templates were used to represent other painting techniques regarding the material selection and crossing out the underlying colors.

The research scope was limited to designing a procedural generative pipeline focusing on stroke-based paintings. Unlike previous approaches in style transfer, which were deployed with the complete picture of artworks and aimed to achieve good resulting artworks [27], our system is designed in a stroke-based method with segments taken from the translation of painting techniques. By estimating the position, shape, and color of each stroke sequentially, it presents both the resulting art paintings and the processes that imitate actual painting practices.

1.5 Clarification of the issue

1.5.1 Painting techniques

To implement painting techniques in a generative process, we translate procedural painting techniques and represent them as different painting workflows. Specifically, these workflows contain three groups of parameters that affect the semantic segmentation module:

(1) "Blocking in": the image segmentation module first captures edges between foreground and background color shapes and then overlays different strokes;
(2) "Detail first": the segmentation module aims to extract foreground items iteratively;
(3) "Detail last": the segmentation module outputs the background first, given a set of digitized artworks.

1.5.2 Strokes input

To learn the painting techniques in art creation, pre-defined vector graphics of strokes need to be used as input to the training process. Strokes are stored as vector graphics rather than raster images because the predictions of strokes are based on changing the colors and rescaling the shapes. Since vector graphics store shape information as points in a coordinate system, they have the advantage over raster images such as JPG and PNG that they can be rescaled to any resolution without pixelation [28, 29].
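The rescaling property described in Section 1.5.2 can be illustrated with a minimal sketch. It is not the stroke format used in this thesis: the VectorStroke class, its normalized-coordinate fields, and the rasterize helper are hypothetical names introduced here for illustration only, and a plain rectangle stands in for a stroke template. The point is that a stroke stored as geometry can be rendered on a pixel grid of any size while covering the same fraction of the canvas.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorStroke:
    """Hypothetical rectangular stroke in resolution-independent units.

    All coordinates are fractions of the canvas size (0.0 to 1.0), so the
    same stroke can be rendered on any pixel grid.
    """
    cx: float      # centre x
    cy: float      # centre y
    length: float  # extent along the stroke direction
    width: float   # extent across the stroke direction
    angle: float   # rotation in radians
    color: tuple   # (R, G, B)

    def covers(self, x, y):
        """Return True if the normalized point (x, y) lies inside the stroke."""
        dx, dy = x - self.cx, y - self.cy
        # Rotate the point into the stroke's own axis-aligned frame.
        u = dx * math.cos(self.angle) + dy * math.sin(self.angle)
        v = -dx * math.sin(self.angle) + dy * math.cos(self.angle)
        return abs(u) <= self.length / 2 and abs(v) <= self.width / 2


def rasterize(stroke, size):
    """Render the stroke as a size x size boolean mask, sampling pixel centres."""
    return [
        [stroke.covers((col + 0.5) / size, (row + 0.5) / size)
         for col in range(size)]
        for row in range(size)
    ]


def coverage(mask):
    """Fraction of pixels in the mask that the stroke covers."""
    return sum(sum(row) for row in mask) / (len(mask) ** 2)


stroke = VectorStroke(cx=0.5, cy=0.5, length=0.5, width=0.25,
                      angle=0.0, color=(200, 30, 30))
small = rasterize(stroke, 8)    # coarse canvas
large = rasterize(stroke, 64)   # fine canvas, same stroke definition
print(coverage(small), coverage(large))  # 0.125 0.125
```

In this toy case both canvases report the same coverage because the stroke is re-rendered from its parameters at each resolution rather than upsampled from stored pixels, which is the property that motivates the choice of vector input above.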
1.5.3 Baseline models

The workflow of conducting experiments in this research contains three main phases:

• Construct flow-oriented representations for different procedural painting techniques;
• Implement semantic segmentation networks on digitized artworks;
• Generate strokes by optimizing the shapes and colors of pre-defined strokes.

Baseline models for the segmentation and generative processes in the second and third phases were needed. For semantic segmentation, a competitive pre-trained model that extracts the foreground objects can be implemented first. Models such as K-means clustering and the state-of-the-art DeepLabV3 [30] were run at this stage. Many machine learning architectures can estimate the colors and shapes of future strokes based on the output of the segmentation module and the stroke input. Brush stroke estimations were translated into parameter optimization problems. For the positions of brush strokes, we used sampling methods such as gradient-based sampling. To estimate the shape parameters of brush strokes, we used backbones from style transfer architectures, such as DCGAN.

1.6 Research Methodology

This thesis employs the research-through-design (RtD) method, a first-person method that uses personal experience within the design process. Implementing experiments and system design with personal experience can demonstrate relevance for design and yield in-depth knowledge about interacting with systems within the related fields [31].

The theory, methods, experiments, and evaluations are based on a proposed translation from visual art practices to tasks within the field of machine learning. Following the research-through-design methodology, the thesis work selects exemplary painting techniques and pipelines in actual art-creating processes and designs painting pipelines that use machine learning techniques to understand art-creating practices. The detailed design and motivations are presented in Chapter 4.
Experiments on different machine learning techniques, such as segmentation models, follow an empirical research methodology. In Chapter 5, the thesis work analyses the results from each sub-module by comparing the outputs across categories, such as "landscape" and "portrait" paintings, or paintings in Realism and Abstractionism. In different phases of this thesis work, quantitative and observational evaluations are performed to help analyze the proposed generative painting pipeline on specific tasks.

1.7 Sustainability and Ethics

The main focus of this study is to investigate a process of generating art based on machine learning techniques that incorporate the ideas of human artists in their paintings. The process does not involve human subjects, so this study raises no ethical approval issue.

However, it should be noted that the artworks used in the experiments are from publicly available digital art collections. The use of these paintings does not require permission from the creators, which would in any case be difficult to obtain. At the same time, the research in this thesis focuses only on a representative selection of artworks, mostly typical paintings from Western art movements. This may introduce bias, since the research was carried out on limited categories of widely available art styles and visual art forms, which were chosen based on the experiments’ feasibility and data availability. A specific description of the dataset is presented in Section 3.3.

Finally, there are a number of potential stakeholders for this study, including the creators of the artworks, the researchers who analyzed and designed the painting process, and the audience for the visual art produced by the painting system. A widely discussed AI ethics issue is the copyright of AI-generated art; however, this is not discussed in this study. Rather, the study acknowledges the subjectivity of the participants in the art perception process and attempts to mitigate it in the evaluation.
1.8 Contributions

The contributions of this thesis work are summarized below:
• The work proposes a machine learning pipeline built by analyzing painting techniques in conventional visual arts practices.
• The work compares different image segmentation methods for the painting segmentation task and discusses the results.
• The work provides know-how on how painting techniques can inform brush-based autonomous painting processes.
• The work provides an in-depth analysis of the proposed method, discusses its limitations, and proposes future research directions.

1.9 Structure of the thesis

The rest of the thesis is organized as follows. Chapter 2 presents theories about image segmentation methods, visual art creation, painting techniques, and art perception. Chapter 3 presents the methodology for designing the experiments and algorithms related to the sub-tasks, together with the evaluation methods. Chapter 4 presents the implementation details of the experiments and the tools and frameworks on which they depend. Chapter 5 presents the results of the experiments, including a validity analysis of each result and comparative analyses. Chapter 6 presents the conclusions drawn from this thesis and proposes limitations, improvements, and future work.

2 Theory

2.1 Image Segmentation

Segmentation is often used as an image analysis method to analyze an image’s local characteristics. The basic principle is to assign a class to each pixel of an image and then group neighboring pixels of the same class into regions. Depending on the domain of the task, the segmented images are often derived from subject-specific datasets. The datasets used for image segmentation tasks tend to be real-world photographs, as such datasets facilitate data annotation for use in supervised learning-based image segmentation schemes.
In addition, items in these photographs are rarely deformed, so segmentation models trained on such datasets can be applied to similar task scenarios. In contrast, the segmentation of art paintings varies widely with the art style, because the style tends to deform the target, so that even the same semantic target shows different local shapes in different art paintings. Therefore, image segmentation of art paintings is a non-trivial segmentation task.

2.1.1 Art painting segmentation

Art painting segmentation is the task of dividing an image of a painting into different regions or segments based on their visual characteristics or semantic meanings. This task aims to enable automated analysis and understanding of art paintings. This project uses art painting segmentation to analyze and represent different painting techniques. However, art painting segmentation is challenging due to the complexity and diversity of painting styles and techniques and the subjective and context-dependent nature of art interpretation. Different artists may use different painting styles, materials, and colors to convey different meanings and emotions, making it challenging to develop a general-purpose segmentation algorithm that works well for all paintings.

Computer vision techniques for art painting segmentation mainly use features such as the gradient, intensity, and neighboring similarity of images. These techniques include classical image analysis methods, such as K-means clustering, Canny edge detection, and GrabCut, as well as hybrid methods. Computer vision techniques theoretically suffer from limitations in art painting segmentation tasks because they ignore the higher-level semantic information of images. In addition, these methods are limited on super-pixel and large-scale images. In particular, the GrabCut algorithm requires the user to first mark out the foreground and background areas with rectangular boxes, strokes, or similar annotations as an initial estimate for the algorithm. It is therefore extremely inefficient when a large number of images must be segmented.

AI approaches can theoretically perform better than classical computer vision approaches because they combine various features such as pixel color, texture, shape, and edges. Specifically, machine learning methods such as K-means clustering and neural networks may perform better in segmentation tasks. They produce more accurate segmentation results, especially in distinguishing the foreground and background of the image. The neural approaches mainly learn a mapping from input art paintings to segmentation masks. The mapping is modeled as a deep neural network and trained on a large dataset of annotated paintings. Compared with classical computer vision methods, this approach handles large paintings better and scales to large datasets. However, annotated art paintings are generally lacking, especially for the semantic segmentation of abstract paintings.

2.1.2 Using domain adaption in segmentation

Although segmentation methods designed for the art painting domain are possible, domain adaption from photographic images to art paintings remains the priority. Directly transferring segmentation models trained in the photographic domain can hardly solve the problems in the art painting domain. The differences in colors, textures, and geometric features of content between photographs and art paintings lead to significant accuracy losses when a semantic segmentation model trained on photographs is applied directly to paintings, which usually contain unrealistic motifs and geometric distortions.

Existing image segmentation methods are rarely based on art paintings; they are based on photographs. A natural way to solve this problem is to propose an architecture that solves segmentation tasks in the art painting domain.
However, the need for ground truth makes training a segmentation model on artworks challenging. Even though art painting datasets are available for training machine learning models, they mostly target classification rather than segmentation. Therefore, valuable future work is to create art painting datasets for segmentation tasks. These datasets should contain ground truth for distinct artistic styles, since different styles may cause specific transformations of the same realistic object in the painting. However, this project prioritizes using domain adaption in segmentation for time efficiency.

Style transfer and Gram matrices are key techniques that make domain adaption from photograph-based segmentation models to art paintings possible. With style transfer, we can generate "pseudo paintings" from content images and style images. There are feature mappings between the generated images and the content images, which are usually realistic photographs with pixel labels for segmentation tasks. Besides using the ground truth labels of the content images, we want to be able to produce ground truth for each style of art painting for training, which depends on the mapping of that style. Gram matrices make this possible because they can be used to measure the mapping relationships for any of the art style transitions.

2.2 Stroke-based Painting

In digital painting, stroke-based painting involves applying brushstrokes or marks to an image to mimic traditional painting techniques. Stroke-based painting can create various artistic effects, from realistic simulations of traditional painting styles to more abstract and expressive images. It is particularly effective in creating textures or layering effects. In applications that mimic the brush stroke patterns of human artists, stroke-based painting first requires analysis of the reference art paintings.
Because brush strokes may vary in shape, size, and color and overlap or blend in complex ways, analysis using traditional computer vision techniques is challenging [32, 33]. However, recent advances in deep learning and image analysis techniques are beginning to make it possible to automatically segment and analyze stroke-based paintings, allowing for new insights into the artistic process and the underlying structures and patterns within the artwork. Existing research commonly employs a parametric approach to digital strokes to render their artistic effects in digital images, with the parameters typically including stroke shape, color, and position. This approach makes digital painting controllable and automated, especially in recent studies that use deep learning techniques to analyze the textures, edges, and other features of brush strokes to provide new approaches to digital art creation, such as style transfer [27] based on stroke analysis.

2.3 Art Perception in Painting Process

Art perception is a complex process that involves both artists and viewers. Artists use various techniques, materials, and concepts to create visual art that conveys a specific message or emotion [34]. Viewers then interpret the artwork based on their experience, knowledge, and cultural background [35]. In the painting process, the artist’s artistic perception is often present in the initial stages of the painting, i.e., in thinking about the ideas and concepts they want to express, the techniques of artistic presentation, and so on. Artists express their ideas of artistic perception through a series of decisions, such as how to represent the subject matter, which materials to use, and how to apply these materials to the surface. In this project, art perception is also present in the process of designing the generative painting architecture, which parallels the viewer’s perception.
Because we work with information about the artist’s completed work, rather than directly observing the actual painting process, our perception of art as viewers is subjective and depends on our art history and technical knowledge. Moreover, it suggests that diverse generative painting is possible.

2.4 Color Space of Images

The color space of an image refers to a mathematical model used to describe the color information of each pixel in an image. The representation of color information and the number of components differ between color spaces. Commonly used color spaces include RGB, HSV, Lab [36], and CMYK [37, 38]; in digital image processing, RGB, HSV, and Lab are the most common. For example, in image analysis, the digital input image is often in RGB space. In subsequent processing, such as conversion to a grayscale map, the conversion is based on the three channel values of RGB. In the HSV color space, the color of each pixel consists of three components: hue, saturation, and value (brightness). This color space is well suited to tasks such as image processing and color analysis in computer vision. Since color analysis of art paintings is an essential step in designing the generative painting process in this project, both RGB and HSV color spaces will be used.

3 Methodology

This thesis employs the research-through-design method, a first-person method that uses personal experience within the design process. Implementing experiments and system design with personal experience can show our relevance to the design and helps us gain in-depth knowledge of interacting with systems within the related fields [31].

3.1 Research-through-design method

Research-through-design is a research methodology that combines design and research activities to create new knowledge, concepts, and artifacts [39, 40]. This approach is advantageous in fields such as architecture, industrial design, and human-computer interaction (HCI), where designers often encounter complex problems that require innovative solutions. It involves a cyclical process of design, testing, and evaluation to explore and optimize feasible solutions [39]. The process can be summarized in four main steps:
• Define the research question or problem: The first step is to identify the research question or problem by reviewing existing research to find gaps or gain insights.
• Design: The design phase involves the creation of a new artifact, concept, or prototype. Designers may use various design tools, techniques, and methods, such as sketching, modeling, or rapid prototyping.
• Test: After the design is completed, the design results are tested to identify any design flaws or areas for improvement and to provide feedback for design improvement.
• Evaluate: The final step is to evaluate the results of the RtD process, including analyzing data from user studies, soliciting feedback from stakeholders, or conducting a comparative analysis of the design against existing solutions.

The main tasks in this project are categorized into three modules: the translation of painting techniques from actual painting practice to structured inputs, the generative models, and the optimization methods, as represented in Figure 3.1. There are various approaches to translating painting techniques into structured inputs, such as simply encoding a set of painting techniques into different classes or using text summarization [41] to extract tags from a text description of painting techniques. The practical idea adopted here is to use semantic segmentation to analyze the procedural painting techniques in creating artworks. The segmentation module in Figure 3.1(a) extracts foreground objects and background separately, which can be used in estimating stroke sequences. Other painting techniques related to materials and tools could also be explored to make the generated images stylized. In the generative painting module, a GAN-based architecture will be implemented to generate artistic images from low resolutions to super resolutions. Although basic GANs can learn how to paint with real artworks taken as inputs, generating fine art in super-resolution is always challenging, so this module can be implemented at low resolution from scratch. For the optimization method, we can take the pixel-wise L1 or L2 loss between the canvas with several strokes and the target images.

Figure 3.1: Proposed painting pipeline. (a) Stage 1: art perception process. (b) Stage 2: procedural painting using painting techniques.

3.1.1 Stroke Estimation Using Machine Learning

Once the image segmentation module extracts patterns that affect the stroke optimization, the generative module $G$ generates strokes step by step, starting from an empty canvas $C_0$. The estimation of each stroke’s position is therefore defined as an optimization process that minimizes the difference between the progressively generated painting $C_t$ and the reference artwork $\tilde{C}$. The update of the canvas state follows a weighted sum of the generated stroke $s_t$ at time step $t$ and the previous canvas state $C_{t-1}$:

\[ C_t = w_t^{s}\, s_t + w_t^{c}\, C_{t-1}, \quad t \in \{1, 2, 3, \dots\} \tag{3.1} \]

\[ w_t^{s} + w_t^{c} = 1 \tag{3.2} \]

for any $t \geq 1$, where $w_t^{s}$ and $w_t^{c}$ are the weights for the stroke and the previous canvas state, respectively.

The loss function measures the similarity between the rendered and the reference images. Previous research on style transfer and stroke-based paintings proposed some ideas for the structure of the loss function. The inspiration is that we should construct a loss function that takes both pixel loss and stroke loss into account [42].
In this project, the segmentation loss was also considered as a complement. The optimization can be carried out by gradient descent, which updates the parameters that determine the canvas as follows:

\[ X_{t+1} = X_t - \mu \frac{\partial L(\tilde{C}, C_t)}{\partial X} \tag{3.3} \]

where $X_t$ is the vector of parameters estimated in the $t$-th iteration and $\mu$ is the learning rate.

3.1.2 Searching Strategy for Brush Size

One goal of the painting pipeline is to find the optimal solution for the brush stroke placement, including the location and size of each brush stroke. A basic principle is that a larger brush size leads to a faster painting process but leaves out more details. Conversely, smaller brush sizes can be used to emphasize details, but the painting process can become extremely long. This principle is in line with the painting processes of human artists and can therefore be used in the pipeline design.

The brush stroke can be transformed based on different shapes. In this project, the basic shape of a stroke is defined as a circle with a variable radius $R$. The transformation between the basic shape and the stroke shape is based on a spline curve or a straight line. Regardless of the stroke model, the search strategy for the stroke size can be described as an exploration-implementation approach. We assume that the initial stroke size is the size of the reference image and guide the stroke size by iteratively shrinking the strokes and comparing the differences in the generated images. After the stroke size is reduced to a reasonable threshold, we assume that all smaller stroke sizes will be applied in the drawing process. Therefore, in the implementation phase, this project uses progressively decreasing brush sizes.

3.2 Platforms and Tools

The system is programmed in Python and based on standard frameworks such as PyTorch and OpenCV. Cloud computing resources and GPU platforms of the Swedish National Infrastructure for Computing (SNIC, https://snic.se/) were used for training models on large datasets.
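As a rough illustration of how the canvas update of Equations 3.1 and 3.2 and the coarse-to-fine brush-size search of Section 3.1.2 could be written with the Python tooling named above, the NumPy sketch below paints circular strokes of shrinking radius onto an empty canvas. It is not the thesis implementation: the function names, the halving schedule, the random position sampling, and the accept-if-the-loss-decreases rule are all assumptions made for illustration, and the random search stands in for the gradient-based update of Equation 3.3.

```python
import numpy as np


def disc_mask(height, width, cx, cy, radius):
    """Boolean mask of a filled circle, the basic stroke shape with radius R."""
    ys, xs = np.mgrid[0:height, 0:width]
    return (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2


def update_canvas(canvas, stroke_color, mask, w_s=1.0):
    """C_t = w_s * s_t + w_c * C_{t-1} with w_s + w_c = 1, applied inside the mask."""
    w = w_s * mask[..., None].astype(canvas.dtype)  # per-pixel stroke weight
    return w * np.asarray(stroke_color, dtype=canvas.dtype) + (1.0 - w) * canvas


def l2_loss(canvas, reference):
    """Pixel-wise L2 loss between the current canvas and the reference image."""
    return float(np.mean((canvas - reference) ** 2))


def radius_schedule(height, width, min_radius=2):
    """Assumed schedule: start at the image size and halve down to a threshold."""
    radius, radii = max(height, width), []
    while radius >= min_radius:
        radii.append(radius)
        radius //= 2
    return radii


def paint(reference, strokes_per_radius=20, seed=0):
    """Greedy coarse-to-fine painting; a stroke is kept only if it lowers the loss."""
    rng = np.random.default_rng(seed)
    height, width, _ = reference.shape
    canvas = np.zeros_like(reference)  # empty canvas C_0
    losses = [l2_loss(canvas, reference)]
    for radius in radius_schedule(height, width):
        for _ in range(strokes_per_radius):
            cx, cy = rng.integers(0, width), rng.integers(0, height)
            mask = disc_mask(height, width, cx, cy, radius)
            color = reference[mask].mean(axis=0)  # mean reference color under the stroke
            candidate = update_canvas(canvas, color, mask)
            loss = l2_loss(candidate, reference)
            if loss < losses[-1]:
                canvas = candidate
                losses.append(loss)
    return canvas, losses
```

Calling `paint(reference)` on a float image with values in [0, 1] returns the final canvas together with the sequence of accepted losses, which makes the trade-off between large, fast strokes and small, detailed strokes visible.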
For the tasks in each module, previous research has provided a wide range of approaches with excellent performance, so starting with pre-trained models and applying transfer learning can be effective strategies for this project. In the segmentation module, although machine learning approaches based on pixel classification can be used, methods based on neural networks, such as U-Net [43], ResNet [44, 45], and DeepLab [30], can be better options, since the datasets used in this project consist of a large number of artworks whose content can be difficult to classify. Therefore, we selected segmentation models based on K-means clustering and DeepLabV3 and compared them with domain adaption approaches. For the generative painting processes in this system, GAN-based architectures such as DCGANs [46] were used as an improvement of the rendering modules.

3.3 Dataset

The research requires three datasets: digitalized artworks for training semantic segmentation networks, stroke samples, and reference paintings for generative models. Previous research on creating art with AI has provided various artwork datasets. Ideally, the dataset used for this study should cover as many art painting genres and styles as possible to obtain generalizability. In addition, the dataset should be applicable to the subtasks of image segmentation and brushstroke rendering. However, existing datasets of art paintings, such as the WikiArt dataset (https://www.wikiart.org/en/paintings-by-style), are mainly designed for tasks such as classification by artist, style, or genre [18]. They are not designed explicitly for segmentation tasks and lack pixel-level annotations. Therefore, additional manual labeling or annotation may be required for specific segmentation tasks.

The research takes the Diverse Realism in Art Movements (DRAM) dataset (https://faculty.runi.ac.il/arik/site/artseg/Dram-Dataset.html) as the preliminary dataset of art paintings used in each module, since it can be used to address challenges in semantic segmentation of art paintings [47]. The DRAM dataset comprises four leading art movements: Realism, Impressionism, Post-Impressionism, and Expressionism. As a target domain of art paintings, it has an unlabeled training set and a fully annotated test set. The test annotations follow the guidelines of PASCAL VOC2012 (http://host.robots.ox.ac.uk/pascal/VOC/voc2012/), which serves as the source dataset in domain adaption. Therefore, DRAM is an appropriate dataset for this project.

Table 3.1: The content details of the DRAM dataset.

Art Movement        Number of Images  Average Size*
Expressionism       1603              [398, 500]
Impressionism       1538              [500, 376]
Post_impressionism  1462              [500, 425]
Realism             1074              [500, 354]

* The average size of each subset is given in pixels in the [width, height] format.

A standard set of characteristics, such as a particular style, subject matter, or painting technique, often defines an art movement. Art movements can therefore serve as a description of a wide range of artistic styles and techniques. Table 3.1 shows that the DRAM dataset contains about 5500 digital art paintings, with a maximum size of 500 × 500 pixels. Figure 3.2 shows examples of works from the four art movements to illustrate their differences. Since these original art paintings were not annotated for the segmentation task, they were used as templates for different art styles in the segmentation task. PASCAL VOC12 [48] is a dataset commonly used for semantic segmentation tasks and has 20 classes, including person, bird, bottle, dog, etc. The processed DRAM dataset contains model-created art paintings following the four art movements, in which the contents are taken from the photographic images in PASCAL VOC12.
Therefore, the created art paintings are mapped to the corresponding content images, which are annotated for training segmentation models.

Figure 3.2: Examples of art paintings in DRAM dataset.

4 Experiments

Experimentation in this thesis work is divided into four phases, following the workflow of designing a generative painting pipeline with painting techniques. The four phases in this chapter are:
• Ablation study on different segmentation methods in Section 4.1;
• Experiment design for sub-tasks in brush stroke estimation in Sections 4.2 - 4.6;
• Experiment on the first painting pipeline in Section 4.7;
• Iterative design for improvements in Section 4.8.

4.1 Ablation Study: Segmentation techniques on art paintings

In this section, we conducted an ablation study on different segmentation methods for art paintings using the DRAM dataset. Generally, image segmentation methods are based either on conventional image processing algorithms or on semantic segmentation models. Training a semantic segmentation model is a supervised learning process and therefore requires a large number of pixel labels for each class. However, previous research on art paintings and machine learning predominantly focuses on tasks such as image classification and style transfer, so existing datasets of art paintings with pixel labels for segmentation tasks are limited. Conventional image processing techniques for segmentation, in contrast, are primarily based on the color and texture features of paintings and usually require no pixel labels. Segmentation results can be presented in Python in different ways using standard libraries such as OpenCV and Pillow.

We experiment with the effect of three different segmentation methods, based on unsupervised learning, supervised learning, and domain adaptation, respectively, in the segmentation of art paintings.
In machine learning, K-means clustering is a typical unsupervised learning method that can be used for segmentation tasks. For supervised learning, we chose DeepLabV3 as the deep learning model for segmentation. Finally, we investigated domain adaption using DeepLabV3 for the same task, which is similar to the research of Nadav Cohen et al. [47]. The detailed implementations are described in Sections 4.1.1, 4.1.2, and 4.1.3.

OpenCV: https://pypi.org/project/opencv-python/; Pillow: https://pypi.org/project/Pillow/

4.1.1 K-means clustering on HSV/RGB images

In this experiment, two problems were explored:
• Is K-means clustering feasible for the foreground/background segmentation of art paintings?
• Which is more reasonable, K-means clustering based on the RGB or the HSV color space?

Since in this experiment we are mainly concerned with extracting the region of interest ("foreground") and the remaining part ("background") of art paintings, the number of clusters k is set to 2 when applying K-means clustering. Moreover, the RGB color space encodes different visual information from the HSV color space, so the choice of color space changes the value distributions that K-means has to separate and thus the difficulty of the problem.

Figure 4.1: Input RGB images for K-means clustering algorithms: (a) landscape, (b) portrait, (c) potted flowers.

This experimental process consists of three main steps: image pre-processing and feature analysis, foreground segmentation based on K-means clustering, and visualization of the results. In the image pre-processing step, whether or not the input image is converted to an HSV image is the critical factor in comparing the RGB image with the HSV image. The image pre-processing stage also involves vectorizing the image, i.e., converting the two-dimensional data into a vector of pixel values. Using the images shown in Figure 4.1 as examples, we first visualize the distribution of values in the RGB and HSV color spaces, as shown in Figure 4.2.
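The three steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the experiments: the function names are hypothetical, the image is assumed to be an 8-bit RGB array, and the rule that the smaller of the two clusters is treated as foreground is an assumption, since clustering alone does not say which cluster is the region of interest.

```python
import numpy as np
from sklearn.cluster import KMeans


def kmeans_foreground_mask(image_rgb, use_hsv=False, seed=0):
    """Cluster the pixels of an H x W x 3 uint8 image into k = 2 groups.

    Returns an H x W uint8 mask with 1 for foreground and 0 for background.
    """
    image = image_rgb
    if use_hsv:
        import cv2  # only needed for the optional color space conversion
        image = cv2.cvtColor(image_rgb, cv2.COLOR_RGB2HSV)

    height, width, channels = image.shape
    # Vectorize: one row per pixel, one column per color channel.
    pixels = image.reshape(-1, channels).astype(np.float32)

    labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(pixels)
    mask = labels.reshape(height, width).astype(np.uint8)

    # Assumption: the smaller cluster is the region of interest.
    if mask.sum() > mask.size / 2:
        mask = 1 - mask
    return mask


def overlay_mask(image_rgb, mask, color=(255, 0, 0), alpha=0.5):
    """Blend a highlight color over the foreground pixels for visualization."""
    out = image_rgb.astype(np.float32)
    tint = np.array(color, dtype=np.float32)
    out[mask == 1] = (1.0 - alpha) * out[mask == 1] + alpha * tint
    return out.astype(np.uint8)
```

Running the function once with `use_hsv=False` and once with `use_hsv=True` on the same painting gives the two masks to compare; which color space separates the subject better depends on the painting.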
Figure 4.2 shows that there is coupling in the distribution of the RGB values of the three input images, while the distribution of the HSV values shows noticeable aggregation. For example, the distributions in Figure 4.2(d), (e), and (f) show obvious aggregation points. In addition, the distribution characteristics of the HSV images are correlated with the color distributions of the input images. Since the colors of these scatter plots are derived from the color statistics of the input image, we can see from the visualization that the peaks of the HSV distribution have an obvious correlation with the color distribution. For example, in Figure 4.1(b), the orange color is distributed in the facial part of the original image, while the red color is mainly distributed in the background region. The different distributions of these two color regions also correspond to the HSV distribution in Figure 4.2(e). In contrast, in the RGB distribution, the distributions of red, green, and blue are often stacked to form new colors. Therefore, they do not have a clear correlation with the color distribution in the visual presentation. This also leads to the conclusion that the HSV color space is more suitable for the visual representation of images.

Figure 4.2: The visualization of value distributions for input images in RGB and HSV color space: (a) landscape-RGB, (b) portrait-RGB, (c) potted flowers-RGB, (d) landscape-HSV, (e) portrait-HSV, (f) potted flowers-HSV.

Figure 4.3: Segmentation results using RGB and HSV color space of images: (a) landscape-RGB, (b) portrait-RGB, (c) flowers-RGB, (d) landscape-HSV, (e) portrait-HSV, (f) flowers-HSV.

To implement K-means clustering, we used the KMeans class from sklearn.cluster in the Python environment with the parameter K = 2. This module generates the foreground mask by predicting binary labels for each pixel.
This step grouped pixels with the same clustering characteristics into one of two feature classes, foreground or background. The resulting mask is defined as an array with the same height and width as the input image, in which each element holds the classification result of the pixel at that coordinate (in the foreground/background segmentation, the foreground is marked as 1 and the background as 0). Finally, the generated foreground masks were overlaid on the input images to illustrate the results, which can be seen in Figure 4.3, where (a), (b), and (c) are the segmentation results based on the RGB color space and (d), (e), and (f) are the segmentation results based on the HSV color space.

4.1.2 DeepLabV3

DeepLabv3 is a Convolutional Neural Network (CNN) model created to address the semantic segmentation problem. It is an upgraded version of DeepLabv1 and v2 and is more effective than its predecessors. From an architectural perspective, it employs atrous convolution, also known as "dilated convolution", and an enhanced Atrous Spatial Pyramid Pooling (ASPP) module [30].

This experiment explores the possibility of using pre-trained DeepLabV3 models in PyTorch for segmentation inference on art paintings. PyTorch provides three pre-trained DeepLabv3 variants with three different backbone models: "deeplabv3_resnet50", "deeplabv3_resnet101", and "deeplabv3_mobilenet_v3_large". Specifically, the DeepLabv3 models with a ResNet backbone ("ResNet50" and "ResNet101") have output_stride = 8, whereas the DeepLabv3 model with the mobilenet_v3_large backbone has output_stride = 16. The pre-trained models can be loaded with "torch.hub.load()" in PyTorch.

In order to compare the segmentation effect with the K-means clustering model in Section 4.1.1, the input images in this experiment include the images shown in Figure 4.1. We also introduced other input images to analyze the segmentation results on paintings with different artistic features and contents, as shown in Figure 4.4.
All three DeepLabV3 variants were implemented for segmentation. After inference, the predicted pixel labels can be used to generate the foreground and background masks, which were overlaid on the input images. The results are shown in Figure 4.5.

Figure 4.4: The examples of input images for DeepLabV3 models: (a) portrait-hc, (b) portrait-lc, (c) portrait-br, (d) landscape, (e) plants, (f) ducks.

As shown in Figure 4.5, the DeepLabV3 models with three different backbones can achieve foreground segmentation of art paintings to different degrees. In the segmentation results based on ResNet50, the main content information of the art image can be included in the foreground. In contrast, in the results of ResNet101, less of this information is included in the foreground. The model with MobileNet_v3 as the backbone tends to include less content in the foreground, but it can maximize the inclusion of local information other than the main content, as shown in Figure 4.5(d2), (d3), (d4).

In addition, for the input images shown in Figures 4.4(d) and 4.4(e), the DeepLabV3 model shows significant limitations in the foreground segmentation task. Since the pre-training of this model is based on the PASCAL VOC dataset, which contains only 20 classes of photographic images, and Figure 4.4(d) does not contain a semantic subject of any of these classes, it is classified entirely as background. For Figure 4.4(e), the semantic content is potted plants, a class that exists in the PASCAL VOC dataset, so this model can extract local features and segment the foreground even though, in the domain of art paintings, the artistic style causes differences between the painted images and photos of natural potted plants. However, the ResNet50-based model classifies this image completely as background, which is related to the structure of the model and will be discussed in Section 5.2.
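The inference procedure of this section can be sketched roughly as follows. The sketch is an illustration rather than the experiment code: the hub tag `pytorch/vision:v0.10.0`, the `pretrained=True` argument, and the ImageNet normalization constants follow the public PyTorch Hub example and are assumptions here, as are the helper names. The foreground rule (every class other than index 0) assumes the PASCAL VOC label layout, in which index 0 is the background.

```python
import numpy as np


def load_deeplabv3(variant="deeplabv3_resnet50"):
    """Load a pre-trained DeepLabV3 variant through torch.hub (downloads weights)."""
    import torch  # imported lazily so the mask helper below needs only NumPy
    model = torch.hub.load("pytorch/vision:v0.10.0", variant, pretrained=True)
    return model.eval()


def predict_labels(model, image_rgb):
    """Return an H x W array of class indices for an H x W x 3 uint8 RGB image."""
    import torch
    from torchvision import transforms

    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    batch = preprocess(image_rgb).unsqueeze(0)  # shape (1, 3, H, W)
    with torch.no_grad():
        logits = model(batch)["out"][0]  # shape (num_classes, H, W)
    return logits.argmax(0).cpu().numpy()


def foreground_mask(labels):
    """Mark every pixel whose class is not background (index 0) as foreground."""
    return (np.asarray(labels) != 0).astype(np.uint8)
```

With these helpers, `foreground_mask(predict_labels(load_deeplabv3(name), image))` can be evaluated for each of the three variant names, and an image that contains none of the known classes yields an all-zero mask, which matches the behavior described for Figure 4.4(d).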
4.1.3 DeepLabV3 with domain adaption

Section 2.3 describes the theory of applying domain adaption to segmenting artworks. In summary, this approach attempts to use existing segmentation methods for photographic images and existing augmented datasets to solve the problem of missing augmentation in the artwork domain. This section describes the details of implementing this approach in the Python environment.

[Figure 4.5: Sample results for DeepLabV3 segmentation. Columns: input RGB images, ResNet50, ResNet101, MobileNet_v3; rows (a) to (d).]

The implementation of this method is divided into three main stages [47]:
• Use style transfer to generate pseudo-paintings with ground truth: In this step, an AdaIN model is trained on the photographic images from PASCAL VOC and the art paintings in DRAM. The photographic images serve as content images, and the paintings in DRAM are used as style images.
• Use different styles of pseudo-paintings to train the segmentation model: DeepLabV2 is used as the backbone of the segmentation model. Pseudo-paintings of each style are used separately as input images to train a segmentation model for the corresponding style. In this stage, the ground truth of each pseudo-painting is taken from its corresponding content image.
• Mix images from two sources to optimize the model: The model is finally trained to directly segment the art paintings in the target domain, using photographic images from PASCAL VOC 12 as the input dataset and the DRAM subdomains as the target.

4.2 Brush Stroke Models

Brush stroke patterns in art paintings refer to how the artist applies the brush to the canvas. Brush strokes vary in thickness, direction, texture, and color, and they can be used to create different effects, such as conveying movement, depth, or emotion.
For example, the Impressionist movement is characterized by loose, spontaneous brushstrokes that capture the effects of changing light and atmosphere. In contrast, the Realist movement is known for precise, detailed brushstrokes that seek to represent the world accurately. At the level of individual brush strokes, the characteristics of the brush differ markedly depending on the artistic style and the techniques the artist employs. For example, most of the brushes used by human artists are oil painting brushes, where the consistency of the paint varies with the strength of the artist's brushstrokes, creating a unique artistic expression early on. Moreover, different painting techniques such as rubbing, outlining, thwarting, tapping, dabbing, and brushing can manipulate the colors on the brush to produce a unique distribution on the panel, resulting in light and dark colors and curvilinear texture effects. Thus, brushstroke models can also be studied to understand the techniques and styles of different artists and art movements.

Brush stroke models are the basis on which generative artwork conveys the regional patterns of art paintings. In generative painting, a brush is the small unit of color and shape applied in a single painting step and visible in the final artistic effect. Such a brush can therefore be defined as an element containing information about color, shape, position, etc. Multiple brush elements with different property values form an artistic painting. A brush is also an essential element in the artistic analysis of a painting. Accordingly, in generative painting, local features of the art painting can be extracted as a sequence of interrelated brushes. Because different generative painting models extract local features differently, the brush sequence of a given art painting is not unique across models.
Therefore, the definition of a brush model can be specific to the painting system it is adapted to. In this project, brush colors can be translated easily by referring to human painters' techniques, while the variable brush shapes need to be specified as fixed patterns. We therefore summarize the basic brush shapes of the physical world generically; they can be classified as round, linear, rectangular, and triangular, as shown in Figure 4.6.

[Figure 4.6: Illustrations of stroke models. Panels: (a) circular brush stroke, (b) rectangular brush stroke, (c) triangular brush stroke.]

The circular brush stroke (Figure 4.6 (a)) is the most common type in paintings, widely found in Impressionist, Post-Impressionist, and Abstract Expressionist works. On the one hand, the circular brush can represent scenes and contents in paintings. For example, in Claude Monet's water lily paintings, circular brush strokes create the impression of flowers floating on the water's surface. On the other hand, it can also represent distinctive stylistic textures and the physical behavior of the painting process, as in Post-Impressionism, where artists such as Vincent van Gogh used circular brush strokes to create texture, movement, and depth in their paintings. Moreover, in Abstract Expressionism, artists like Jackson Pollock used circular brushstrokes to create complex and layered visual effects of dripping and splashing.

Rectangular brush strokes (Figure 4.6 (b)) can be used to construct strokes with linear textures, which are often produced by the linear movement of the brush, especially with techniques such as brushing, which often leave a distinctly square brush mark. The triangular stroke (Figure 4.6 (c)) is less frequently used than the two strokes mentioned earlier.
It appears in art paintings to describe natural forms, especially those with a pointed tip, or to convey movement in a picture, adding tension to it. For example, French painter Paul Cézanne used triangular brush strokes to suggest the form of a mountain peak or rocky landscape. Cézanne created the illusion of the rough, jagged surfaces of these natural forms by using a series of small, sharp, triangular brush strokes. In some specific oil painting techniques such as thwarting, artists press the brush down and then apply a slight force, thwarting and lifting it, as in the calligraphic counter blade stroke. The difference in color dipping between the nib and the root of the brush, together with the direction and weight with which the brush is pressed, can produce a variety of changes based on the triangle.

4.2.1 Circle Brush Strokes

A circular brush stroke can be defined by its center, radius, and color. The center of a circular stroke is a point within the pixel range of the reference image. For simplicity, the radius is taken as an integer, which allows a quick calculation of the pixels covered by the circle and of the color changes applied to them.

To implement circular brush strokes in code, this project uses the function 'cv2.circle' for drawing circles in OpenCV-python. According to the documentation, this function takes the target image, the center, the radius of the circle, the color, and the thickness, among other parameters. Notably, the circle is filled when the thickness is negative and drawn as an outline when the thickness is positive. Since the brush strokes are mainly filled with color, the thickness is set to a negative value such as -1 (OpenCV declares the thickness as an integer).

The circle brush strokes are applied by mapping stroke shapes and colors. In pixel-wise images, the referenced art paintings are composed of colored pixels.
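The pixel coverage of a filled circular stroke can also be computed directly. The sketch below is an illustrative NumPy version under stated assumptions, not the project's implementation: the helper names `circle_mask` and `paint_circle` are hypothetical, the center is given as (x, y) with x as the column index, and the exact set of border pixels may differ slightly from the rasterization performed by OpenCV.

```python
import numpy as np

def circle_mask(height: int, width: int, center: tuple, radius: int) -> np.ndarray:
    """Boolean (height, width) array that is True for pixels inside the circle.

    `center` is (x, y) with x as the column index.
    """
    ys, xs = np.ogrid[:height, :width]
    cx, cy = center
    return (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2

def paint_circle(canvas: np.ndarray, center: tuple, radius: int, color: tuple) -> np.ndarray:
    """Return a copy of `canvas` with a filled circle of `color` drawn on it."""
    mask = circle_mask(canvas.shape[0], canvas.shape[1], center, radius)
    out = canvas.copy()
    out[mask] = color  # only the covered pixels change color
    return out

canvas = np.zeros((5, 5, 3), dtype=np.uint8)
painted = paint_circle(canvas, center=(2, 2), radius=1, color=(10, 20, 30))
```

With an integer radius the covered pixels follow from a single vectorized comparison, which is what makes the circular model cheap to evaluate.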
At any pixel in an art painting, the color can be changed when the stroke shape covers that pixel. Therefore, the stroke shape can be defined as a matrix of boolean values of the same size as the referenced art painting. That is, for any pixel-wise referenced image ('canvas'),

\[ C_{new} := C_{now} \ast (1 - S_{map}) + Color_{map} \tag{4.1} \]

where C_{new} is the computed new canvas, C_{now} is the current canvas, S_{map} is the matrix of boolean values that indicate whether to change each pixel value, and Color_{map} is the matrix of optimized pixel values. C_{new}, C_{now}, S_{map}, and Color_{map} are of the same size as the referenced image, so the operation is entirely element-wise. Since the computed color Color_{comp} is a matrix of the same size as the referenced image, the colors must be set to 0 at pixels that are not covered by the stroke shape. That is,

\[ Color_{map} := S_{map} \ast Color_{comp} \tag{4.2} \]

Figure 4.7 showcases an example of the computation involved in updating a pixel-wise canvas. The canvas size is assumed to be (4,4), as shown in Figure 4.7(a), and the central pixels are presented in (b) as the current canvas C_{now}. Assuming the computed stroke map S_{map} and the color map Color_{map} are the 2*2 matrices shown in Figure 4.7(b), the updated canvas C_{new} is the result of the element-wise operations.

[Figure 4.7: Example of computing and applying brush strokes in pixel-wise canvas.]

(Footnote: https://docs.opencv.org/4.x/d6/d6e/group__imgproc__draw.html#gaf10604b069374903dbd0f0488cb43670)

4.2.2 Template-based Brush Strokes

The main brush shapes in the human artistic process are approximately rectangular or solid curved. Although the OpenCV module provides functions to generate circles and rectangles, these brush shapes, which consist of perfect outlines, do not fit well with the rich and varied brush shapes of actual paintings and are limited in rendering brush color variations.
Therefore, in this section, we explore a way to geometrically deform stroke templates using the geometric transformation parameters of rectangular images. For the color processing, the brush colors are estimated with reference to the input image and filled into the transformed brush shapes. Unlike the previous section, where the color is filled uniformly by setting the thickness, in this section we obtain the color coverage of the brush by converting the reference image to HSV (Hue, Saturation, Value) color space. The shape of the brush stroke is defined as

\[ S_{shape} = f(\mathrm{center}(x, y), \mathrm{width}(w), \mathrm{length}(l), \mathrm{rotation}(\theta)) \tag{4.3} \]

This brush stroke model is illustrated in Figure 4.8, where (a) is the brush template containing texture features; (b) is the rectangular stroke model; (c) is the generated stroke with the transformed shape and texture.

[Figure 4.8: Generated sample stroke.]

Centroids of brush strokes are generated by sampling pixels from the referenced image; the width w represents the distance between the centroid and the vertical edges of a rectangular brush stroke, while the length l represents the distance between the centroid and the horizontal edges.

4.3 Hyper-parameters for the training process

Hyperparameters are parameters that specify the external requirements of the painting process. In the procedural painting process (shown in Figure 3.1 (b)), only the resulting partial images are taken as inputs for the second stage of generating paintings, so parameters in the first stage, which focuses on art perception, do not affect the painting process. The hyperparameters in this section are therefore the external parameters that affect the painting process, including the brush stroke model, maximum brush size, brush sampling number, and paint mode.
Except for maximum brush size and brush sampling number, which take positive integer values, the brush stroke model and paint mode take descriptive values, usually set in code as strings or pre-stored interpretable values. The brush stroke model and maximum brush size cannot be decided independently, because the brush size depends on the brush stroke model. This dependency is generally expressed as the computation of the stroke area for a specific brush stroke model. For example, the area of a circular brush stroke is computed as πr², where r denotes the radius of the circular brush stroke, so the brush stroke size increases monotonically with the radius. The maximum brush stroke size can therefore be used to define thresholds for specific parameters of the brush stroke models.

On the other hand, the shape and size of the brushstrokes are closely related to the resulting artistic effect and the speed of the painting process. The larger the brush stroke, the more pixels are manipulated by a single stroke, so a complete painting can be formed quickly. However, since parametric strokes are less diverse in their local representation than human-drawn strokes, large strokes inevitably lose the ability to emphasize details. Such strokes are more common in some abstract artworks. Conversely, if enough strokes are used, for example with each stroke covering only a single pixel, the canvas can end up very close to the reference image. However, such a painting strategy would take considerable time and effort: an original reference image typically contains from about 480,000 (800 × 600) to millions of pixels. Although we can crop or scale the image to make it smaller, we still need to retain its reference information. Such a painting strategy also differs from how a human artist handles the painting process. Therefore, it is necessary to use a reasonable brush stroke size.
Moreover, the stroke size should be controllable. In this experiment, we assume that the maximum stroke size is close to the size of the reference image. Since the stroke size is calculated from the shape parameters of the stroke, we need to consider the range of these parameters when setting them. The exploration of the shape parameters of the strokes will be elaborated in Section 4.5.

The maximum sampling number defines the number of brush strokes in each iteration. The sampling decides the Ns possible brush stroke positions (called "centroids"). After generating the strokes, a simple placement strategy is to place them on the canvas in the order of the stored stroke sequence. However, to better show the painting process, the strokes can be placed in hierarchical order according to stroke size, e.g., from largest to smallest. In addition, we can place the strokes iteratively in batches. For example, when Ns = 5392 (less than the total number of pixels) and the number of iterations is Niter = 15, about 360 strokes (5392/15 ≈ 359.5) are randomly selected from the Ns strokes and painted in each iteration until all strokes have been placed.

4.4 Generate Brush Stroke Centroids

One of the primary roles of the reference image as an input to the painting system is to guide the random generation of stroke centroids during the drawing process, so that the centroids follow the same distribution as the perceived features of the image. Therefore, a suitable method is needed to analyze the feature distribution of the image during the perception stage. In image processing theory, a density map can be used as a representation of the image feature distribution. It is typically created by mapping the number of pixels or objects in a given area to a continuous grayscale or color value. Converting the image into a density map makes it easier to analyze and identify objects and patterns within the image.
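Once a density map is available, drawing stroke centroids from it amounts to weighted sampling of pixel positions. The following is a minimal sketch of that idea, assuming a non-negative 2-D density map and sampling without replacement; the function name `sample_centroids` and the fixed random seed are illustrative choices, not details taken from the experiments.

```python
import numpy as np

def sample_centroids(density: np.ndarray, n_samples: int, seed: int = 0) -> np.ndarray:
    """Draw `n_samples` distinct (x, y) pixel positions with probability
    proportional to the non-negative values of a 2-D density map."""
    weights = density.astype(np.float64).ravel()
    probs = weights / weights.sum()          # normalise so the weights sum to 1
    rng = np.random.default_rng(seed)
    flat = rng.choice(weights.size, size=n_samples, replace=False, p=probs)
    ys, xs = np.unravel_index(flat, density.shape)
    return np.stack([xs, ys], axis=1)        # one (x, y) row per centroid

# Toy density map with mass on only two pixels.
density = np.zeros((4, 4))
density[1, 2] = 1.0   # row 1, column 2
density[3, 0] = 3.0   # row 3, column 0
centroids = sample_centroids(density, n_samples=2)
```

Pixels with zero density are never selected, so the centroids concentrate where the density map assigns more weight.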
Density maps can be created using various techniques, such as histograms, heat maps, and kernel density estimation. The choice of technique depends on the specific needs of the image analysis task and the type of image data being processed.

This experiment compares the performance of kernel density estimation (KDE) and gradient-based density estimation in the context of images. KDE estimates the probability density map of an image by computing the weighted contributions of pixels using kernel functions. In particular, using a Gaussian Mixture Model (GMM) to estimate the probability density of images is closely related to the KDE method, as a GMM can represent multi-modal distributions and estimate the density of high-dimensional data. Although a GMM has a fixed number of components, it is data-driven in that the mean and covariance of each Gaussian are estimated from the images, and the weighted sum of the Gaussians gives the overall probability density of the image. Gradient-based density estimation for images can be more intuitive because it uses the heat kernel smoothing method, which involves solving the heat equation on the image data to estimate the density. The underlying intuition is that the gradients of an image represent intensity variations or edges. In code, techniques such as the Sobel operator or Canny edge detection can compute the gradients of images.

The comparison is based on the mean squared error (MSE) between the grayscale input image and the density map estimated by each of the two methods. The MSE measures the average squared difference between the pixel values of the grayscale input image and those of the density map, quantifying how much the two differ. Lower MSE values correspond to better density estimation.

4.4.1 GMM-based Density Estimation

GMM-based density estimation for feature extraction maps intensity values to sampling probabilities.
This is reasonable since the goal is to sample stroke anchors from the same distribution as the reference image, so that the stroke anchors cluster in higher-intensity regions of the image. This method assumes that the probability distribution can be modeled as a weighted sum of a finite number of Gaussian models. That is,

\[ P(x) = \sum_{k=1}^{K} w_k \ast N(x \mid \mu_k, \sigma_k) \tag{4.4} \]

where P(x) is the probability density of the input image, K is the number of components, w_k is the weight of each component, and N(x | \mu_k, \sigma_k) represents each Gaussian component with mean \mu_k and covariance \sigma_k.

The parameters, namely the weights, means, and covariances of the Gaussian distributions, are typically estimated from the data using maximum likelihood estimation (MLE) or maximum a posteriori (MAP) estimation. Once the parameters are estimated, the GMM can estimate the density of new data points or generate new samples from the same distribution.

In the technical implementation, we used the GMM module from scikit-learn with two hyperparameters: the number of Gaussian distributions ("n_components") and the kernel size for image blurring ("k_size"). For simplicity, we set n_components = 3 and k_size = 3.

The implementation consists of three steps. The first step is to convert the input images into grayscale, since the probability density is a single value per pixel while color images contain three channel values. The grayscale values are normalized into the range [0,1] for simplicity. The second step is to fit the given number of Gaussian models to the image data, which estimates the mean and covariance of each Gaussian. The third step is to evaluate the fitted model with the "score_samples" method of the GMM object, which returns log-likelihoods, on a grid of points covering the image to produce the density map of the input image.

4.4.2 Gradient-based Density Map

The gradient-based method can extract features of the reference image by computing the gradient magnitudes.
Because gradients represent pixel-level intensity differences, and larger gradients imply noticeable changes in intensity, the gradient may be associated with the probability density of the centroids. Pixels with large gradients are more likely to form edges, so gradient magnitudes are a reasonable basis for computing the probability density map of an image. In digital image processing, the gradient of an image is approximated by finite differences of the neighboring pixel values in the two dimensions. That is,

\[ G_x(x, y) = H(x + 1, y) - H(x - 1, y) \tag{4.5} \]
\[ G_y(x, y) = H(x, y + 1) - H(x, y - 1) \tag{4.6} \]

where G_x(x, y) and G_y(x, y) are the horizontal and vertical gradients respectively, and H(x, y) denotes the intensity value at pixel [x, y]. Thus the magnitude is

\[ M(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2} \tag{4.7} \]

(Footnote: https://scikit-learn.org/stable/)

We normalized the gradient magnitudes to represent the probability density distribution. That is, for each pixel [x, y],

\[ M(x, y) := \frac{M(x, y) - M_{min}}{M_{max} - M_{min}} \tag{4.8} \]

In code, the pre-processing steps are similar to those in GMM-based density estimation: load the color image, convert it to grayscale, and rescale the pixel values into the range [0,1]. The Sobel operator from the OpenCV-python module then computes the gradients in the horizontal and vertical directions (G_x and G_y, respectively), and the magnitude of the gradient is computed as the square root of the sum of the squared horizontal and vertical gradients. The last step is to present the resulting density maps as grayscale images.

Compared with GMM-based density estimation, this method ignores the statistical features of the data and instead emphasizes the local intensity of the input image, which is another way to present image features (especially edges) using intensity.
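Equations (4.5) to (4.8) can be written out directly with array slicing. The sketch below is a plain NumPy illustration of those four equations rather than the Sobel-based code used in the experiments; leaving border pixels at zero gradient and returning an all-zero map for a constant image are assumptions of this sketch, and the function name `gradient_density` is hypothetical.

```python
import numpy as np

def gradient_density(gray: np.ndarray) -> np.ndarray:
    """Min-max normalised gradient magnitude of a 2-D grayscale image.

    Border pixels are left at zero gradient (an assumption of this sketch).
    """
    h = gray.astype(np.float64)
    gx = np.zeros_like(h)
    gy = np.zeros_like(h)
    gx[:, 1:-1] = h[:, 2:] - h[:, :-2]    # Eq. (4.5): H(x+1, y) - H(x-1, y)
    gy[1:-1, :] = h[2:, :] - h[:-2, :]    # Eq. (4.6): H(x, y+1) - H(x, y-1)
    mag = np.sqrt(gx ** 2 + gy ** 2)      # Eq. (4.7)
    span = mag.max() - mag.min()
    if span == 0:                         # constant image: no edges at all
        return np.zeros_like(mag)
    return (mag - mag.min()) / span       # Eq. (4.8)

# A vertical step edge: the density is highest next to the intensity jump.
step = np.array([[0, 0, 1, 1]] * 3)
density_map = gradient_density(step)
```

To use the result for sampling, the map would additionally have to be divided by its sum so that the values form a probability distribution over pixels.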
4.5 Estimate Parameters of Brush Strokes

The shape of the brush is defined as an object consisting of four parameters: the stroke centroid C(x, y), the rotation angle θ, the lengths along the rotation-angle direction (L1, L2), and the lengths perpendicular to that direction (W1, W2). The estimation of each stroke is therefore equivalent to estimating these parameters for each stroke centroid. The centroids are generated by sampling from the estimated density maps of the referenced images, as described in Section 4.4. This section describes the parameter estimation given the reference images.

4.5.1 Rotation Angle (θ) Estimation

For the rotation angle of each brush stroke, researchers have described a method using the Edge Tangent Flow (ETF) vector field of images [49]. The underlying idea is that the ETF is calculated from the gradient field of the image, which includes distinct edge features produced by the stacked textures of brush strokes in artistic paintings. At each pixel, the gradients can thus guide the estimation of the direction of the brush strokes. For each stroke patch, which includes multiple pixels, the ETF vector field offers a way to derive an overall direction from the gradients of neighboring pixels in the same stroke.

The computation of the ETF consists of three steps. First, compute the gradient vector field (modulus and direction at each pixel). Second, rotate the direction by 90 degrees, so that it roughly points along the edges. Third, approximate the edges by changing the direction of pixels with a small modulus to that of nearby pixels with a large modulus, while keeping the modulus unchanged [49].

4.5.2 Lengths (L1, L2, W1, W2) Estimation

For each stroke, the vertical and horizontal length parameters (L1, L2, W1, W2) are determined by a regional approximation of the input image using HSV values.
The HSV (Hue, Saturation, Value) color space is better suited to this task than the RGB (Red, Green, Blue) format because it separates color and brightness information, capturing the perceptual qualities of the color at each pixel. Additionally, the HSV color space is more intuitive for humans to understand and work with, as it corresponds more closely to how we perceive and describe colors in the world around us [50, 51].

Since the color at each pixel can be expressed in HSV, length estimation is a process of proposing candidate lengths, comparing the HSV values of the pixel at that length with those of the starting centroid, and deciding whether to accept that length. The numerical comparison is defined on the hue and brightness values of the starting point (the stroke centroid) and the candidate pixel P(x, y) [49]. Assume the starting stroke centroid is C(x_0, y_0). The length L is then the largest distance for which the candidate pixel, reached by stepping from the centroid in direction β, satisfies

\[ P(x, y) = (x_0, y_0) + r \ast (\cos\beta, \sin\beta) \tag{4.9} \]
\[ |H(x, y) - H(x_0, y_0)| \le t_H \tag{4.10} \]
\[ |V(x, y) - V(x_0, y_0)| \le t_V \tag{4.11} \]

where r is the search step length (generally set as an integer) and β is the search direction for the different length estimations. Namely, β = θ for L1 and β = θ + π for L2, while β = θ + π/2 and β = θ − π/2 for W1 and W2 respectively. t_H and t_V are the thresholds for the comparison. Since the final output is the maximal length that satisfies formulas (4.9), (4.10), and (4.11), in code the search starts with L = 0 and updates the starting pixel (x_0, y_0) to (x, y) at each step. It repeats until formula (4.10) or (4.11) is no longer satisfied. Every move in the search should be at least one pixel, but search steps that are too large may yield zero lengths. For simplicity, we set r to one pixel.

4.6 Render Strokes

After generating a sequence of brush strokes with parameters, the next step is to place every stroke onto the canvas.
Generally, we can place the strokes one by one, but in the experiment, we sorted the stroke sequence by size to illustrate different painting pipelines. In this experiment, the painting pipelines, which reflect different painting techniques used in human painting processes, refer to the different orders of placing strokes onto the canvas. Painting pipelines such as "full painting", "detail first" and "background first" are encoded as the argument "paint_mode" in the code. In particular, we swapped the order of foreground and background drawing to illustrate painting techniques such as "detail first" and "background first".

Since the strokes are parameter sequences in both the circular stroke model and the template-based painting model, the two models share the same rendering strategies but use different parameters.

4.7 Experiments I: "Details first" vs. "Background First" in painting process

Since the segmentation phase has been explored in the previous work, the following experiments focus on the painting design part, the second phase shown in Figure 3.1. These experiments iteratively optimized the painting process following a research-through-design approach. The design starts with a simple drawing process and measures the system performance with a combination of visual evaluation and quantitative metrics, focusing on whether different drawing models can be applied to painting processes. In the initial stage, we consider a global optimization-based approach for estimating stroke parameters. Since the complexity of the estimation is related to the number of strokes and the number of parameters in a single stroke, and the circular brush has the fewest parameters among the three brush models shown in Figure 4.6, it is used in the initial stage of the experiments. Moreover, the maximum number of brushes is adapted to the optimized threshold value.
Table 4.1 showcases the hyperparameter settings.

Table 4.1: Hyperparameters for initial system design of painting process.

Parameter name                 Value
brush stroke model (S)         "circular" S(p,r,c)
painting mode (P)              "f-b" or "b-f"
max brush size                 0.25 × min(h, w)
approximation threshold (T)    100
line model                     Bézier curve
max iterations                 500

In Table 4.1, the circular brush model is represented parametrically as S(p,r,c), where p is the two-dimensional coordinate of the pixel point, r is the brush radius, and c is the color of the brush, taken as the color value at p. The drawing modes are defined as "f-b" and "b-f" to indicate the drawing order of different image areas. For the maximum brush size, h and w denote the height and width of the reference image, respectively; the maximum brush size, measured as the brush radius, does not exceed 1/4 of the reference image size (measured along its shorter edge). The termination threshold of the optimization process is set to 100.0 pixels to indicate the expected error between the generated image and the reference image.

4.7.1 Configured pipelines

The painting mode is configured to showcase procedural painting techniques such as "detail first" and "background first", or even starting at every content part. This section first shows the impact of a painting technique known as "detail first" on the placement of color blocks and areas of focus when creating artwork with image segmentation. Unlike block painting techniques, which emphasize segmenting the colors of the artwork, detail-first begins by capturing an area of interest in the image and then gradually paints around it until the entire foreground is completed, eventually extending to the entire background. This painting technique is commonly found in realist portraits, where the figure is the foreground content and the background is the remainder.
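The effect of the painting mode on stroke order can be sketched as a sort. This is an illustrative sketch under explicit assumptions: "f-b" is read as foreground before background and "b-f" as the reverse, each stroke carries a flag saying whether its centroid lies in the segmented foreground, and strokes within a region are placed from largest to smallest radius. The `Stroke` class and the `order_strokes` function are hypothetical names, not part of the project code.

```python
from dataclasses import dataclass

@dataclass
class Stroke:
    x: int
    y: int
    radius: int
    in_foreground: bool   # True if the centroid lies inside the foreground mask

def order_strokes(strokes, paint_mode):
    """Order strokes by region according to `paint_mode`, then by descending radius."""
    if paint_mode not in ("f-b", "b-f"):
        raise ValueError(f"unknown paint mode: {paint_mode!r}")
    foreground_first = paint_mode == "f-b"

    def sort_key(stroke):
        # Rank 0 for the region painted first, rank 1 for the region painted second.
        region_rank = 0 if stroke.in_foreground == foreground_first else 1
        return (region_rank, -stroke.radius)

    return sorted(strokes, key=sort_key)

strokes = [
    Stroke(0, 0, 2, False),
    Stroke(1, 1, 5, True),
    Stroke(2, 2, 3, True),
    Stroke(3, 3, 7, False),
]
detail_first = order_strokes(strokes, "f-b")
background_first = order_strokes(strokes, "b-f")
```

Because only the ordering changes, the same stroke set can be rendered under either mode, which keeps the two pipelines directly comparable.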
Therefore, this section designs a painting process based on detail-first and circular brushes. In this design, the input image is a portrait of a person from the DRAM dataset, and the painting process is divided into two modules: segmentation and rendering. The purpose of segmentation is to separate the painting's foreground ("portrait"), and the rendering process prioritizes rendering this part.

Conversely, the second part explores working on the background first, applying large brush strokes to complete the background quickly and then applying smaller brushes in the foreground to retain good painting details. The same test images are used for comparison, and the effects on landscape art paintings are also explored.

4.7.2 Complete "voids"

Based on the investigations in the previous two sections, this project achieved a basic analysis-painting process, but the presentation of details was still lacking. Typically, the random sampling and estimation of the painting system may cause local "voids" in the final painting, which rarely occur in the paintings of human artists. Hence, a preliminary conclusion is that the previous painting system left this problem unresolved. This section therefore updates the system design of the painting part by adding a detail filler module in the final stage to detect and fill the "voids" in the generated paintings.

Filling "voids" can be designed as a sub-iterative process. This process first identifies local voids by comparing the generated painting with the original reference image, producing a void mask. Then, with an error threshold set, it uses the local area covered by the void mask as the reference and continues painting on the image generated in the previous stage until the error threshold is reached.
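The void detection step can be sketched as a per-pixel comparison between the generated painting and the reference image. This is a minimal sketch under stated assumptions: a pixel counts as a void when its mean absolute color error exceeds a hypothetical tolerance `pixel_tol`, and the threshold for the sub-iteration is obtained by scaling the overall threshold by the fraction of void pixels, the proportional rule this section uses. The function names are illustrative.

```python
import numpy as np

def void_mask(generated: np.ndarray, reference: np.ndarray,
              pixel_tol: float = 30.0) -> np.ndarray:
    """Boolean mask that is True where the per-pixel absolute error
    (averaged over color channels) exceeds `pixel_tol`."""
    diff = np.abs(generated.astype(np.float64) - reference.astype(np.float64))
    if diff.ndim == 3:
        diff = diff.mean(axis=2)
    return diff > pixel_tol

def sub_threshold(global_threshold: float, mask: np.ndarray) -> float:
    """Scale the overall error threshold by the fraction of void pixels."""
    return float(global_threshold * mask.mean())

# Toy example: a 10x10 reference in which 10 pixels were never painted.
reference = np.full((10, 10, 3), 200, dtype=np.uint8)
generated = reference.copy()
generated[0, :, :] = 0                      # an unpainted top row (10 pixels)
mask = void_mask(generated, reference)
threshold = sub_threshold(100.0, mask)
```

The mask then restricts both the sampling of new stroke centroids and the error computation to the void region.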
Since the local area where the voids appear is small compared to the overall reference image, the error threshold for the sub-iteration can be scaled in proportion to that area. For example, given a (500,400) reference image, if the approximation threshold T is 100 as given in Table 4.1, and the voids cover 10% of the pixels of the reference image, then the approximation threshold for the sub-iteration should be 10, which is proportional to the overall threshold.

4.8 Experiments II: Iterative design for improvements

This experiment investigates how to improve the above drawing flow design. Based on the results of the previous testing phase, the improvements are mainly aimed at the stroke prediction module and the generalization of the drawing process. For the prediction module of brush strokes, this experiment mainly explores the possibility of using a neural network model as the backbone to improve the prediction. In addition, this project tries to generalize the drawing process, using the analysis of abstract paintings and the corresponding drawing results as an example.

4.8.1 Application of neural networks

In the research-through-design approach, completing a full system design is only the initial step in the design process, while iterations allow different design options to be explored and tested in order to discover new and innovative solutions. Section 4.7 completes the design of a full painting system by testing and evaluating it under different parameter conditions. However, the computation time of the optimization error-based brush estimation is long, and it is difficult for this method to capture features between the local coordinates of the image.

Deep learning models can theoretically outperform parameter optimization in terms of estimation efficiency and feature extraction.
Because brush estimation based on optimization errors involves solving an optimization problem to estimate the brush strokes in an image, this process is computationally expensive, especially for large reference images or more complex brush models. Deep learning models, on the other hand, can automatically learn texture features between local pixels by processing large amounts of training data. The model learns to recognize the patterns and features associated with strokes in the input image and generates the strokes that best represent these features.

This experiment therefore attempts to replace the brush prediction part of the preliminary design with a neural network model as the backbone. Regarding model selection, all the neural network models used for AI-generated paintings can be used as backbones. In this experiment, we mainly improved the approach based on the previous stroke prediction methods [27]. Its operation in practice will be presented in Section 5.5.

4.8.2 Exploration on abstract paintings

Testing the painting pipeline on abstract paintings is necessary to ensure that the pipeline can handle the complexity and creativity required by this art form and produce quality results that meet the standards of the art world. On the one hand, abstract paintings are often more complex than representational paintings, making them a more challenging test case for the painting pipeline. The use of color, shape, texture, and other elements in abstract art can be more diverse and less predictable than in representational art. Therefore, testing the pipeline on abstract paintings ensures that the pipeline can handle the complexity of such paintings. Moreover, testing against abstract paintings is essentially a test of the effectiveness of the perceived artistic process that has been designed. It is a highly creative and subjective art form that r