[Esip-marinedata] Marine Data Cluster - 2023-03-09 - Extracting Metadata from CCHDO's PDF Cruise Reports - Dakshh Saraf

Carolina Berys-Gonzalez cberysgonzalez at ucsd.edu
Thu Mar 9 18:28:03 EST 2023


Hello everyone,

The slides and notes from today's great presentation by Dakshh Saraf on
"CCHDO - Knowledge Extraction from Cruise Reports" are available here:
https://drive.google.com/drive/u/0/folders/1wVwZ_8Cuy9DJ1xysKmo0fPAP5OXNk8Ev
Recording folder is here:
https://drive.google.com/drive/u/0/folders/1i6nnaPiOGdfpf8IFqOSsladpSbNw1XQT


This is an ongoing project and we may have a follow-up presentation in the
future, so please do share any comments or questions that come up while
watching the recording.

Thanks,
MDC Admin

On Wed, Mar 8, 2023 at 11:41 AM Carolina Berys-Gonzalez <
cberysgonzalez at ucsd.edu> wrote:

> *Time: Thursday, 2023-03-09, 11:00 PT / 2:00 ET*
>
> Hello everyone,
>
> Please join this month's Marine Data Cluster call where we will have a
> presentation on the CCHDO's current efforts to mine its collection of PDF
> cruise reports for machine readable metadata.
>
> This is a multistep project conducted by Dakshh Saraf, a senior
> undergraduate at UCSD studying data science.
>
> The project is divided into 2 parts:
>
> 1) Parsing the PDFs into formatted text for ingestion into other
> processing data pipelines, including language processing and machine
> learning APIs. This was done using Optical Character Recognition (OCR)
> technology, and the reasoning behind this choice will be presented.
>
> 2) Mining the formatted text for metadata. This portion of the project is
> still developing and several techniques and options are being explored,
> including known pattern matching (such as searching for expected keywords)
> as well as supervised and semi-supervised machine learning options.
>
> We will have a presentation of the project followed by questions and open
> discussion. This project is very much still active, and we invite feedback
> and suggestions as well as requests for output.
>
> We hope to see you there!
>
> *SEE UPDATED CONNECTION INFORMATION BELOW*
> ------------------------------------
>
> *Connection Information via Zoom: This group will convene every month on
> the fourth Thursday, 11:00 AM Pacific Time, 2:00 PM Eastern Time, 18:00 UTC*
>
> Marine Data
> Connection Information
>
> Join Zoom Meeting
>
> https://us02web.zoom.us/j/84964718283?pwd=V1VmaWRKaVBJVmY1alhzSVJvTmc0QT09
>
>
> Meeting ID: 849 6471 8283
>
> Passcode: 023686
>
> One tap mobile
>
> +13017158592,,84964718283# US (Washington DC)
>
> +19292056099,,84964718283# US (New York)
>
> Dial by your location
>
>         +1 301 715 8592 US (Washington DC)
>
>         +1 929 205 6099 US (New York)
>
>         +1 312 626 6799 US (Chicago)
>
>         +1 253 215 8782 US (Tacoma)
>
>         +1 346 248 7799 US (Houston)
>
>         +1 669 900 6833 US (San Jose)
>
> Meeting ID: 849 6471 8283
> Find your local number: https://us02web.zoom.us/u/kci1kp7GY
>
>
> --
> Carolina Berys-Gonzalez
> Scripps Institution of Oceanography
> http://cchdo.ucsd.edu/
>


-- 
Carolina Berys-Gonzalez
Scripps Institution of Oceanography
http://cchdo.ucsd.edu/