[Esip-marinedata] Marine Data Cluster - 2023-03-09 - Extracting Metadata from CCHDO's PDF Cruise Reports - Dakshh Saraf

Carolina Berys-Gonzalez cberysgonzalez at ucsd.edu
Thu Mar 9 14:04:05 EST 2023


Reminder that we are starting now if anyone wants to join!

On Wed, Mar 8, 2023 at 11:41 AM Carolina Berys-Gonzalez <
cberysgonzalez at ucsd.edu> wrote:

> *Topic: Extracting Metadata from CCHDO's PDF Cruise Reports*
> *Time: Thursday, 2023-03-09, 11:00 PT / 2:00 ET*
>
> Hello everyone,
>
> Please join this month's Marine Data Cluster call where we will have a
> presentation on the CCHDO's current efforts to mine its collection of PDF
> cruise reports for machine readable metadata.
>
> This multistep project is being conducted by Dakshh Saraf, a senior
> undergraduate studying data science at UCSD.
>
> The project is divided into two parts:
>
> 1) Parsing the PDFs into formatted text for ingestion into other data
> processing pipelines, including language processing and machine
> learning APIs. This was done using Optical Character Recognition (OCR)
> technology, and the reasoning behind that choice will be presented.
>
> 2) Mining the formatted text for metadata. This portion of the project is
> still in development, and several techniques are being explored,
> including known pattern matching (such as searching for expected keywords)
> as well as supervised and semi-supervised machine learning.
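
As a rough illustration of the "known pattern matching" approach mentioned in part 2, the sketch below searches OCR-style text for a few expected keywords and captures the value that follows each one. It is only a minimal sketch under stated assumptions: the field labels, the regular expressions, the `extract_metadata` helper, and the sample text are all hypothetical and are not taken from the CCHDO project.

```python
import re

# Hypothetical keyword patterns, for illustration only. The labels and
# expressions are assumptions, not the ones used in the CCHDO project.
FIELD_PATTERNS = {
    "expocode": re.compile(
        r"\bexpocode[ \t]*[:=]?[ \t]*([0-9A-Z_]{8,})", re.IGNORECASE
    ),
    "chief_scientist": re.compile(
        r"\bchief[ \t]+scientists?[ \t]*[:\-][ \t]*(.+)", re.IGNORECASE
    ),
    "ship": re.compile(
        r"\b(?:ship|vessel)\b[ \t]*[:\-][ \t]*(.+)", re.IGNORECASE
    ),
}


def extract_metadata(text):
    """Return a dict with the first match found for each known keyword."""
    found = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Keep only the captured value, trimmed of stray whitespace.
            found[field] = match.group(1).strip()
    return found


# Made-up sample standing in for text produced by the OCR step.
sample = """Cruise Report
Expocode: 33RO20230309
Chief Scientist: Jane Doe
Ship: R/V Example
"""

if __name__ == "__main__":
    for field, value in extract_metadata(sample).items():
        print(f"{field}: {value}")
```

Fields whose keyword never appears are simply left out of the result, which makes it easy to see which reports would need another technique, such as the machine learning options mentioned above.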
>
> We will have a presentation of the project followed by questions and open
> discussion. This project is very much still active, and we invite feedback
> and suggestions as well as requests for output.
>
> We hope to see you there!
>
> *SEE UPDATED CONNECTION INFORMATION BELOW*
> ------------------------------------
>
> *Connection Information via Zoom:*
> *This group will convene every month on the fourth Thursday*
> *11:00 AM Pacific Time, 2:00 PM Eastern Time, 18:00 UTC*
>
> Marine Data
> Connection Information
>
> Join Zoom Meeting
>
> https://us02web.zoom.us/j/84964718283?pwd=V1VmaWRKaVBJVmY1alhzSVJvTmc0QT09
>
>
> Meeting ID: 849 6471 8283
>
> Passcode: 023686
>
> One tap mobile
>
> +13017158592,,84964718283# US (Washington DC)
>
> +19292056099,,84964718283# US (New York)
>
> Dial by your location
>
>         +1 301 715 8592 US (Washington DC)
>
>         +1 929 205 6099 US (New York)
>
>         +1 312 626 6799 US (Chicago)
>
>         +1 253 215 8782 US (Tacoma)
>
>         +1 346 248 7799 US (Houston)
>
>         +1 669 900 6833 US (San Jose)
>
> Meeting ID: 849 6471 8283
> Find your local number: https://us02web.zoom.us/u/kci1kp7GY
>
>
> --
> Carolina Berys-Gonzalez
> Scripps Institution of Oceanography
> http://cchdo.ucsd.edu/
>


-- 
Carolina Berys-Gonzalez
Scripps Institution of Oceanography
http://cchdo.ucsd.edu/