Industry Workshops

Tencent Industry Workshop

Session Title: Video Technologies Towards Human Perception

Organize Date: Sep. 23

Organize Room: 201BC

Brief Description:

With recent advances in deep neural networks, video technologies have taken a big leap towards serving human perception in a more refined manner. This includes artifact restoration, saliency detection and protection, and non-reference quality assessment. In this panel workshop, we invite elite video experts from academia and industry across the globe to share their visions and experiences on the next generation of video evolution. With serving human perception in mind, the open discussion will shed some light on interesting new trends and possible emerging applications.

Tencent Media Lab (TML) is dedicated to cutting-edge research on a broad spectrum of multimedia technologies, ranging from high-quality on-demand video service, webcasting, and real-time audio/video communications to multimedia standardization. Having served billions of Tencent customers for over a decade, TML is recognized as an elite leader and pioneer in the multimedia industry, with fruitful research contributions and award-winning innovations.

Speakers: TBA

Kuaishou Industry Workshop

Session Title: User Experience Driven Short-Video Streaming System Optimization

Organize Date: Sep. 23

Organize Room: 201DE

Brief Description:

As a unicorn in China's mobile video industry, Kuaishou is committed to building a universal, equal, and harmonious platform for short video streaming and live video broadcasting. In this workshop, experts from Kuaishou will present their techniques and methods for optimizing the whole platform to improve user experience. The workshop consists of two talks.

Talk 1: Short Video Streaming System and Technique Optimization

In this talk, the ecosystem of Kuaishou short video streaming will be introduced. The system ranges from content production to consumption and includes content capture, editing, transcoding, transmission, consumption, and data analysis. First, an overview of the architecture for short video sharing will be given. Second, the audio and video techniques used in this system to improve user experience will be presented. Finally, the use of big data analysis methods to guide the optimization will be elaborated.

Speaker 1: Yunfei Zheng

Yunfei Zheng received his B.S. and M.S. degrees from Tsinghua University in 1999 and 2002, respectively. He received his Ph.D. degree in Electrical and Computer Engineering from West Virginia University, US, in 2008. His main research areas are video modeling, video coding, video/image processing, and computer vision. He joined the Princeton research lab of Thomson in 2008 and focused on research into next-generation video coding techniques. In 2010, he joined Qualcomm and worked on HEVC standardization; several of his proposals were adopted into the HEVC standard. In 2011, he joined Apple Inc. and worked on many projects covering video coding and video/image understanding, such as core code development for FaceTime/iTunes and low-level video/image analysis algorithms for the iOS Memories feature. In 2018, he joined Kuaishou as a director to lead the video algorithm and engineering team, which provides core algorithm support for the company’s business.

Talk 2: Non-reference Non-Uniform Distorted Video Quality Assessment Based on Deep Multiple Instance Learning

Different parts of a non-uniformly distorted video exhibit different degrees of distortion, which leads to data ambiguity in a data set. When non-uniformly distorted video blocks are used as input, traditional machine learning-based methods frequently do not work effectively or may even fail. In this talk, we will present a novel multiple instance (MI) learning method to overcome this non-uniform distortion problem. How this video quality assessment method is used in Kuaishou’s business to improve user experience will also be introduced.

Speaker 2: Mading Li

Mading Li received his B.S. degree in computer science from Peking University in 2013, and his Ph.D. degree from the Institute of Computer Science & Technology of Peking University in 2018. He was a Visiting Scholar with McMaster University, ON, Canada, from 2016 to 2017. He is currently an algorithm engineer with the Video Technology Team at Kuaishou, China, where he focuses on image/video quality evaluation and smart video editing.

Google Industry Workshop

Session Title: TBA

Organize Date: Sep. 24

Organize Room: 201BC

Brief Description: TBA

Speakers: TBA

Qualcomm Industry Workshop

Session Title: Depth Sensing on Mobile Phones

Organize Date: Sep. 24

Organize Room: 201DE

Brief Description:

This session is an introduction to SLiM - an implementation of the Qualcomm depth sensing reference design by Himax. The workshop will center on a demo of the SLiM depth sensing module and its capabilities. The performance and usage of the module will be discussed, along with several reference applications implemented using the provided SDK.

Speaker 1: Champ Yen

Champ Yen is an Application Engineer at Qualcomm Taiwan Corporation. He received a B.S. degree in computer science and information engineering from the National Cheng Kung University in 2001, and an M.S. degree in computer science and information engineering from the National Chiao Tung University of Taiwan in 2003. He provides support for customer application development, including algorithm porting, problem solving, and technical support. Champ has significant experience in GPGPU, DSP, and domain-specific programming. In recent years he has worked specifically on optimization and development of camera and computer vision applications.

Facebook Industry Workshop

Session Title: Video Processing at Facebook - How to Increase Quality and Power Efficiency at Scale

Organize Date: Sep. 25

Organize Room: 201BC

Brief Description:

Facebook is the world's largest social network, offering a variety of products that support video, such as Facebook Live, Facebook Watch, Instagram TV (IGTV), Messenger and WhatsApp video calling, and Oculus/Portal hardware that allows user immersion. We handle both premium and user-generated content at varying source qualities, and we make it available all over the world over highly variable network conditions. We use adaptive bitrate streaming to maximize quality, but also end-to-end encryption to protect our members’ privacy. Video processing takes place in our own datacenters, where our focus is on the highest level of security, availability, quality, and energy efficiency. In our session we will cover topics such as how we measure video quality at scale, what we do to maximize that quality, and what steps we take to reduce the energy requirements of all video processing in our datacenters. We will highlight some of our research initiatives in this space and include a panel discussion with world experts on the challenges and possible research directions in efficient video processing.

Speaker 1: Dr. Ioannis Katsavounidis

Dr. Ioannis Katsavounidis is a member of Video Fundamentals and Research, part of the Video Infrastructure team, leading technical efforts in improving video quality across all video products at Facebook. Before joining Facebook, he spent 3.5 years at Netflix, contributing to the development and popularization of VMAF, Netflix's video quality metric, as well as inventing the Dynamic Optimizer, a shot-based video quality optimization framework that brought significant bitrate savings across the whole streaming spectrum. Before that, he was a professor for 8 years at the University of Thessaly's Electrical Engineering Department in Greece, teaching video compression, signal processing, and information theory. He has over 100 publications and patents in the general field of video coding, as well as in high-energy experimental physics. His research interests lie in video coding, video quality, adaptive streaming, and hardware/software partitioning of multimedia processing.

Speaker 2: Dr. Mani Malek Esmaeili

Mani Malek Esmaeili received his Ph.D. from the University of British Columbia. His research interests are multimedia retrieval, computer vision, and the general problem of approximate nearest neighbor search. He works in Facebook’s video infrastructure group as an algorithm developer. For the past year, he has led algorithm development for the media copyright team.

Speaker 3: Rahul Gowda

Rahul Gowda is a member of Video Fundamentals and Research at Facebook, helping build key video infrastructure pieces to serve video@scale. Prior to joining Facebook, he spent 8 years at Nvidia working on cloud gaming, GPU encoding, and streaming. He received a Master's degree in EE from Arizona State University. His research interests lie at the intersection of video coding, gaming, adaptive streaming, and hardware/software co-design of multimedia processing.

Speaker 4: Shankar Regunathan

Shankar Regunathan is a member of Video Algorithms at Facebook, working on video quality measurement and encoding improvements with a particular focus on user-generated content. Prior to joining Facebook, he spent several years at Microsoft working on VC-1, JPEG-XR, and contributions towards AVC/SVC. He received a Ph.D. in EE from the University of California, Santa Barbara. He received the IEEE Signal Processing Society Best Paper Award in 2004 and 2007. His research interests lie at the intersection of video compression, signal processing, and coding theory.

Microsoft Industry Workshop

Session Title: Machine Learning for Computer Vision Applications

Organize Date: Sep. 25

Organize Room: 201DE

Brief Description:

This workshop consists of three talks. The topics are 3D skeletal tracking on Azure Kinect, Optical Character Recognition (OCR) and its applications, and practical solutions for 3D face tracking and reconstruction. The three talks are described below:

Microsoft has built a new RGB-D sensor, called Azure Kinect, and released Azure Kinect DK, a developer kit and PC peripheral for computer vision and speech recognition models. In this talk, I’ll briefly introduce the hardware and describe in more detail the Azure Kinect Body Tracking SDK, a neural network-based solution for 3D skeletal tracking with the new RGB-D sensor.

OCR is an image processing task that has been studied for decades. Thanks to the recent development of deep learning algorithms, OCR has gone through a major algorithm redesign to achieve much better accuracy. In this talk, I will give a brief overview of our efforts to build a state-of-the-art OCR engine and how to apply it in enterprise applications.

3D face tracking and reconstruction both have many important applications. Although algorithmic progress in academia has been significant in recent years thanks to advances in deep learning, building robust and practical solutions is still very challenging in many scenarios. In this talk, I will present some of our ongoing research to share how we address those challenges. Specifically, I will talk about how to develop an end-to-end RGB-based 3D face tracker that works in real time on mobile devices, then share the latest progress on a scalable 3D face reconstruction system.

Speaker 1: Zicheng Liu

Zicheng Liu is currently a principal research manager at Microsoft. His current research interests include human pose estimation and activity understanding. He received a Ph.D. in Computer Science from Princeton University, an M.S. in Operational Research from the Institute of Applied Mathematics, Chinese Academy of Sciences, and a B.S. in Mathematics from Huazhong Normal University, China. Before joining Microsoft Research in 1997, he worked at Silicon Graphics as a member of technical staff and shipped the OpenGL NURBS tessellator and OpenGL Optimizer. He has co-authored three books: “Face Geometry and Appearance Modeling: concept and applications”, Cambridge University Press, “Human Action Recognition with Depth Cameras”, Springer Briefs, and “Human Action Analysis with Randomized Trees”, Springer Briefs. He was a technical co-chair of the 2010 and 2014 IEEE International Conference on Multimedia and Expo, and a general co-chair of 2012 IEEE Visual Communication and Image Processing. He is the Editor-in-Chief of the Journal of Visual Communication and Image Representation. He served as a Steering Committee member of IEEE Transactions on Multimedia. He was a distinguished lecturer of IEEE CAS from 2015 to 2016. He was the chair of the IEEE CAS Multimedia Systems and Applications technical committee from 2015 to 2017. He is a fellow of IEEE.

Speaker 2: Cha Zhang

Cha Zhang is a principal engineering manager at Microsoft Cloud & AI working on computer vision. He received the B.S. and M.S. degrees from Tsinghua University, Beijing, China, in 1998 and 2000, respectively, both in Electronic Engineering, and the Ph.D. degree in Electrical and Computer Engineering from Carnegie Mellon University in 2004. After graduation, he worked at Microsoft Research for 12 years, investigating research topics including multimedia signal processing, computer vision, and machine learning. He has published more than 100 technical papers and holds more than 30 U.S. patents. He served as Program Co-Chair for VCIP 2012 and MMSP 2018, and General Co-Chair for ICME 2016. He is a Fellow of the IEEE. Since joining Cloud & AI, he has led teams to ship industry-leading technologies in Microsoft Cognitive Services such as emotion recognition and optical character recognition.

Speaker 3: Baoyuan Wang

Baoyuan Wang is currently a principal research manager at Microsoft Cognition vision team, Redmond, US. His research interests include automatic 3D content creation, computational photography, and deep learning applications. He has shipped several key technologies to various Microsoft products including Bing Maps, Xbox/Kinect, Microsoft Pix camera, and SwiftKey. Dr. Wang received his bachelor's degree and Ph.D. from Zhejiang University in 2007 and 2012, respectively.