Keynote Speakers

Michel Valstar, PhD
Associate Professor, School of Computer Science
University of Nottingham, United Kingdom

Michel Valstar is an associate professor at the University of Nottingham and a member of both the Computer Vision and Mixed Reality Labs. He received his master’s degree in Electrical Engineering from Delft University of Technology in 2005 and his PhD in computer science from Imperial College London in 2008, and was a Visiting Researcher at MIT’s Media Lab in 2011. He works in the fields of computer vision and pattern recognition, where his main interest is the automatic recognition of human behaviour, specialising in the analysis of facial expressions. He is the founder of the facial expression recognition challenges (FERA 2011/2015/2017) and the Audio-Visual Emotion Recognition Challenge series (AVEC 2011-2019). He was the coordinator of the EU Horizon 2020 project ARIA-VALUSPA, which set out to build the next generation of virtual humans, deputy director of the £6M Biomedical Research Centre’s Mental Health and Technology theme, and recipient of Bill & Melinda Gates Foundation funding to help premature babies survive in the developing world, work which won the FG 2017 best paper award. His work has received popular press coverage in, among others, Science Magazine, The Guardian, New Scientist and on BBC Radio. He has published over 90 peer-reviewed papers at venues including PAMI, CVPR, ICCV, SMC-Cybernetics, and Transactions on Affective Computing (h-index 38, >8,400 citations).

Behaviomedics – Objective Assessment of Clinically Relevant Expressive Behaviour

Behaviomedics is the application of automatic analysis and synthesis of affective and social signals to aid objective diagnosis, monitoring, and treatment of medical conditions that alter one’s affective and socially expressive behaviour. Or, put more succinctly, it is the objective assessment of clinically relevant expressive behaviour. Objective assessment of expressive behaviour has been around for a couple of decades at least, perhaps most notably in the form of facial muscle action detection (FACS AUs) or pose estimation. It is often presented alongside work on emotion recognition, and many works are offered as a solution to both emotion recognition and objective behaviour assessment, yet the two are very different as machine learning problems. I would argue that a rethink of behaviour assessment is useful, with emotion recognition and other ‘higher level behaviours’ building on objective assessment methods. This is particularly pertinent in an era where the interpretability of machine learning systems is increasingly a basic requirement. In this talk I will first present our lab’s efforts in the objective assessment of expressive behaviour, followed by three areas where we have applied this to the automatic assessment of behaviomedical conditions: depression analysis, distinguishing ADHD from ASD, and measuring the intensity of pain in infants and in adults with shoulder pain. Finally, I will discuss how we see Virtual Humans being used to aid the screening, diagnosis, and monitoring of behaviomedical conditions.


Dr. Somnuk Phon-Amnuaisuk
Associate Professor, School of Computing & Informatics
Universiti Teknologi Brunei (UTB), Brunei Darussalam

Dr. Somnuk Phon-Amnuaisuk received his Ph.D. in Artificial Intelligence from the University of Edinburgh. He is currently an Associate Professor at the School of Computing & Informatics, Universiti Teknologi Brunei (UTB). Somnuk leads the Media Informatics Special Interest Group (MI6) and the International Neural Network Society (INNS) Brunei chapter. He serves as a reviewer and an editorial board member for international journals. He is also actively involved as a committee member in local and international conferences.

Promoting Citizen Well-being through Scene Analysis

Scene analysis aims to understand the semantic context of scenes of interest through visual and audio information. Machine vision and machine hearing could provide automated analysis of a scene and raise appropriate safety warnings and other relevant concerns. Enhancing safety and operational efficiency through machine perception is attractive because cameras and microphones are inexpensive sensors that can be quickly installed in desired locations to gather the necessary data. In this talk, recent advances in artificial intelligence and machine perception will be reviewed through various hypothetical scenarios such as smart traffic, smart environment and smart nursery. The feasibility of applying these technologies to promote the safety and well-being of citizens will be discussed.

Dr. Sani Muhamad Isa
Associate Professor, Master of Information Technology Program
Bina Nusantara University, Indonesia

Dr. Sani Muhamad Isa received his Doctoral degree in Computer Science from the University of Indonesia. He is currently an Associate Professor in the Master of Information Technology Program, Bina Nusantara University.

Land Use Change Analysis and Prediction of Bodetabek Area Using Remotely Sensed Imagery

Urban development is one of the logical consequences of economic growth. The RTRW (Rencana Tata Ruang Wilayah, the regional spatial plan) is the blueprint for city development in each city in Indonesia. The local government refers to the RTRW to ensure that city development always conforms to the blueprint. Unfortunately, what happens in reality sometimes does not match the city plan. The Bodetabek region (Bogor, Depok, Tangerang, and Bekasi), made up of the satellite cities of Indonesia’s capital, plays a strategic role in the development of Jakarta. The lack of city development monitoring has caused various problems for the community, such as environmental damage, flooding, garbage accumulation, and improper land use. It will be very difficult to fix those problems if the local government does not take enough preventive action to avoid them. In this study, we use remote sensing technology to monitor city development with a change detection approach. Remote sensing provides a more effective and efficient way of monitoring land change than a land survey. In addition, the availability of remote sensing data going back more than 30 years is very useful for analyzing land use changes over a long period. We use MODIS MCD12Q1 data (MODIS Terra and Aqua yearly global 500 m land cover type) for the Bodetabek region from 2007 to 2017. Besides the spatial analysis, we also use the area of each land cover class in each year as a reference for evaluating land use changes. The area changes of each land cover class over the 11-year period are then used as the input for developing a prediction model. In the resulting prediction models, every class shows a linear trend, with either a positive slope (urban, forest, and wetland) or a negative slope (cropland). The best prediction model is the one for the urban land cover class, where the changes in land use are very close to a linear trend. The prediction model generated from this study can be used to predict the future area of a given land cover class, so that local governments can use it in city development planning.
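
As a rough illustration of the kind of per-class trend model described above, the following Python sketch fits an ordinary least-squares line to yearly area totals and extrapolates it. It is only a sketch under stated assumptions: the abstract does not say how the model was fitted, the function names are hypothetical, and the area values are synthetic placeholders rather than results from the study.

```python
from typing import Sequence, Tuple


def fit_linear_trend(years: Sequence[float], areas: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of area = slope * year + intercept."""
    n = len(years)
    if n != len(areas) or n < 2:
        raise ValueError("need at least two (year, area) pairs of equal length")
    mean_x = sum(years) / n
    mean_y = sum(areas) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, areas))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(years: Sequence[float], areas: Sequence[float],
              slope: float, intercept: float) -> float:
    """Coefficient of determination: how close the series is to the fitted line."""
    mean_y = sum(areas) / len(areas)
    ss_tot = sum((y - mean_y) ** 2 for y in areas)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(years, areas))
    return 1.0 - ss_res / ss_tot if ss_tot else 1.0


def predict_area(year: float, slope: float, intercept: float) -> float:
    """Extrapolate the fitted line to a given year."""
    return slope * year + intercept


# Synthetic placeholder series for one land cover class (NOT data from the study):
# eleven yearly totals for 2007-2017, constructed to grow by 25 km^2 per year.
YEARS = list(range(2007, 2018))
URBAN_AREA_KM2 = [1000.0 + 25.0 * i for i in range(len(YEARS))]

slope, intercept = fit_linear_trend(YEARS, URBAN_AREA_KM2)
fit_quality = r_squared(YEARS, URBAN_AREA_KM2, slope, intercept)
area_2020 = predict_area(2020, slope, intercept)
```

With real per-class areas, the sign of `slope` would separate growing classes from shrinking ones, and `fit_quality` would give one way to compare how closely each class follows a linear trend.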

Oskar Riandi (Industrial Speaker)
PT. Bahasa Kita

The Development of Indonesian Smart Speaker Platform


Artificial intelligence, speech, and natural language processing technologies increasingly improve the user experience of human-machine interaction. In the last few years, the development of smart assistant platforms has grown rapidly, and one of the most striking examples is the smart speaker. A smart speaker is a type of wireless speaker and voice command device with an integrated virtual assistant that offers interactive actions and hands-free activation with the help of a wake word. We will explain the development of an Indonesian smart speaker platform, covering the following components: the speech processing technology, both automatic speech recognition and the speech synthesizer for the response system; the natural language understanding (NLU) component, which interprets the intent of a voice command and provides an appropriate response to the system; the skill platform, which extends smart speaker features with third-party skills; the IoT platform, which operates voice-enabled devices such as lamps, smart door locks, air conditioners and other home appliances; and, last but not least, the cloud infrastructure. Smart speakers rely on a client-server platform with many concurrent connections. An appropriate cloud architecture must therefore be configured properly to make the smart speaker platform robust and scalable.

Keywords: artificial intelligence, automatic speech recognition, wake word, speech synthesizer, natural language understanding, IoT, smart speaker