Zhiguan Hu

Senior Software Engineer
Toronto

I am a senior software engineer with 8 years of experience leading complex projects in distributed systems, information extraction, and content management. I specialize in designing and developing NLP-based AI products, guiding projects from research and development through to commercialization. I hold both a bachelor’s and a master’s degree in computer science.

WORK EXPERIENCE

Senior Software Engineer

Sfile Technology | January 2019 - Present

  • Distributed System for Content Management

    • Led the design and development of a self-discovery index synchronization system that handles large-scale file transfers between indexing and search servers at multi-terabyte data volumes.
    • Engineered a service discovery protocol using Consul to enable dynamic server management, automated shard replication, and continuous service health monitoring.
    • Enhanced performance by optimizing the rsync algorithm through parallelization and the integration of new hash functions for specific use cases.
  • Document Clustering Algorithm Optimization

    • Automated the training data collection process with scripting, reducing human labeling effort in domain-specific data preparation by 90%.
    • Ran comparative experiments on document representation algorithms, including Bag of Words (BoW), TF-IDF, Word2Vec, fastText, Doc2Vec, and BERT embeddings, as part of the clustering algorithm design.
    • Developed a scalable job scheduling workflow in C# to execute clustering and auto-labeling algorithms efficiently.
    • Re-architected a monolithic system into microservices using RESTful APIs and .NET Core Web API, enhancing scalability and performance.
    • Achieved a speedup of over 100x in the clustering process by porting CPU-based Python code to the GPU with CUDA.
  • Oil & Gas Well Reliability Reporting

    • Developed a data extraction process to compile asset attributes from diverse data sources, including unstructured and handwritten well logs within the oil & gas domain.
    • Designed a data reconciliation algorithm, resolving conflicting information stemming from disparate sources during data preparation.
    • Collaborated closely with petroleum engineers to gather requirements for a reliability prediction algorithm.
    • Implemented the algorithm to assess the integrity of well systems, components, and equipment.
    • Developed a Power BI dashboard to visualize asset condition, availability simulation results, and well life timeline; integrated the dashboard with customized R and Python scripts for real-time analysis.

Software Engineer

Sfile Technology | May 2015 - December 2018

  • Intelligent Document Processing Engine
    • Engineered automation rules for a complex unstructured data processing pipeline, capable of handling millions of diverse legal and oil & gas sensor documents in varying file formats and styles.
    • Led the discovery and implementation of data extraction tools, including a scalable OCR workflow with multiple engines, Excel header detection and table auto-correction utilizing the OpenXML SDK, logo detection using YOLO and Detectron models, and page splitting based on text or image similarities.
    • Designed a document quality control (QC) process with an intuitive interface that lets reviewers inspect extracted attributes by highlighting source content within original documents using the pdf.js library, accelerating the review process tenfold.
    • Led a team of three in developing and deploying a Docker-containerized text extraction application handling end-to-end data processing, from document loading into self-hosted object storage through retrieval, data extraction, and data point ingestion into Elasticsearch.

Research Assistant

MD Anderson Cancer Center | June 2013 - August 2013

SKILLS

  • Programming: .NET Framework, .NET Core, C#, Java, Python, C/C++, R, CUDA, Swift, JavaScript, Shell, SQL, HTML
  • Others: Linux, Git, Docker, IIS, AWS EC2, Power BI

EDUCATION

Master’s and Bachelor’s degrees in Computer Science