Zhiguan Hu
Senior Software Engineer
Toronto
I am a senior software engineer with 8 years of experience leading complex projects in distributed systems, information extraction, and content management. I specialize in designing and developing NLP-based AI products, guiding projects from research and development through to commercialization. I hold both a bachelor’s and a master’s degree in computer science.
WORK EXPERIENCE
Senior Software Engineer
Sfile Technology | January 2019 - Present
- Distributed System for Content Management
- Led the design and development of a self-discovery index synchronization system capable of facilitating large-scale file transfers between indexing and search servers, handling multi-terabyte data volumes.
- Engineered a service discovery protocol using Consul to enable dynamic server management, automated shard replication, and continuous service health monitoring.
- Improved performance by parallelizing the rsync algorithm and integrating new hash functions for specific use cases.
- Document Clustering Algorithm Optimization
- Automated training data collection with scripting, reducing human labeling effort in domain-specific data preparation by 90%.
- Ran comparative experiments on document representation algorithms, including Bag of Words (BoW), TF-IDF, Word2Vec, fastText, Doc2Vec, and BERT embeddings, as part of the clustering algorithm design.
- Developed a scalable job scheduling workflow in C# to execute clustering and auto-labeling algorithms efficiently.
- Re-architected a monolithic system into microservices exposing RESTful APIs built with .NET Core Web API, improving scalability and performance.
- Achieved a speedup of over 100x in the clustering process by porting CPU-based Python code to the GPU with CUDA.
- Oil & Gas Well Reliability Reporting
- Developed a data extraction process to compile asset attributes from diverse data sources, including unstructured and handwritten well logs within the oil & gas domain.
- Designed a data reconciliation algorithm to resolve conflicting information from disparate sources during data preparation.
- Collaborated closely with petroleum engineers to gather requirements for a reliability prediction algorithm.
- Implemented the algorithm to assess the integrity of well systems, components, and equipment.
- Developed a Power BI dashboard to visualize asset condition, availability simulation results, and well life timeline; integrated the dashboard with customized R and Python scripts for real-time analysis.
Software Engineer
Sfile Technology | May 2015 - December 2018
- Intelligent Document Processing Engine
- Engineered automation rules for a complex unstructured data processing pipeline, capable of handling millions of diverse legal and oil & gas sensor documents in varying file formats and styles.
- Led the discovery and implementation of data extraction tools, including a scalable OCR workflow with multiple engines, Excel header detection and table auto-correction utilizing the OpenXML SDK, logo detection using YOLO and Detectron models, and page splitting based on text or image similarities.
- Designed a document quality control (QC) process with an intuitive interface that lets reviewers inspect extracted attributes by highlighting source content within the original documents using the pdf.js library, making the review process ten times faster.
- Led a team of three in developing and deploying a containerized text extraction application with Docker, handling end-to-end data processing: loading documents into self-hosted object storage, retrieval, data extraction, and ingestion of data points into Elasticsearch.
Research Assistant
MD Anderson Cancer Center | June 2013 - August 2013
SKILLS
- Programming: .NET Framework, .NET Core, C#, Java, Python, C/C++, R, CUDA, Swift, JavaScript, Shell, SQL, HTML
- Others: Linux, Git, Docker, IIS, AWS EC2, Power BI
EDUCATION
Master's and Bachelor's degrees in Computer Science